Create characters in Unity with LLMs!
LLM for Unity enables seamless integration of Large Language Models (LLMs) within the Unity engine.
It allows you to create intelligent characters that your players can interact with for an immersive experience.
LLM for Unity is built on top of the awesome llama.cpp and llamafile libraries.
🧪 Tested on Unity: 2021 LTS, 2022 LTS, 2023
🚦 Upcoming Releases
Method 1: Install using the asset store
- In the Unity Asset Store, open the LLM for Unity asset page and click `Add to My Assets`
- Open the Package Manager in Unity: `Window > Package Manager`
- Select the `Packages: My Assets` option from the drop-down
- Select the `LLM for Unity` package, click `Download` and then `Import`
Method 2: Install using the GitHub repo:
- Open the Package Manager in Unity: `Window > Package Manager`
- Click the `+` button and select `Add package from git URL`
- Enter the repository URL `https://github.com/undreamai/LLMUnity.git` and click `Add`
On macOS you need the Xcode Command Line Tools:
xcode-select --install
The first step is to create a GameObject for the LLM ♟️:
- Create an empty GameObject. In the GameObject Inspector click `Add Component` and select the LLM script.
- Download one of the default models with the `Download Model` button (~GBs).
- Or load your own model in .gguf format with the `Load model` button (see Use your own model).
- Define the role of your AI in the `Prompt`. You can also define the name of the AI (`AI Name`) and the player (`Player Name`).
- (Optional) By default the reply is received from the model as it is produced (recommended). To receive it in one go instead, deselect the `Stream` option.

In your script you can then use it as follows 🦄:
```c#
using UnityEngine;
using LLMUnity;

public class MyScript : MonoBehaviour {
  // the LLM GameObject, assigned in the Inspector
  public LLM llm;

  void HandleReply(string reply){
    // do something with the reply from the model
    Debug.Log(reply);
  }

  void Game(){
    // your game function
    ...
    string message = "Hello bot!";
    _ = llm.Chat(message, HandleReply);
    ...
  }
}
```
You can also specify a function to call when the model reply has been completed.
This is useful if the `Stream` option is selected for continuous output from the model (default behaviour):
```c#
void ReplyCompleted(){
  // do something when the reply from the model is complete
  Debug.Log("The AI replied");
}

void Game(){
  // your game function
  ...
  string message = "Hello bot!";
  _ = llm.Chat(message, HandleReply, ReplyCompleted);
  ...
}
```
To stop the chat without waiting for its completion you can use:

```c#
llm.CancelRequests();
```
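For example, you could let the player interrupt a long reply from a UI button. This is a minimal sketch with hypothetical object names (`StopChat`, `stopButton`); only `CancelRequests` comes from the snippet above:

```c#
using UnityEngine;
using UnityEngine.UI;
using LLMUnity;

public class StopChat : MonoBehaviour
{
    public LLM llm;           // the LLM component, assigned in the Inspector
    public Button stopButton; // hypothetical UI button, assigned in the Inspector

    void Start()
    {
        // cancel any ongoing chat requests when the player clicks the button
        stopButton.onClick.AddListener(() => llm.CancelRequests());
    }
}
```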
That's all ✨!
You can also:
LLM for Unity allows you to build multiple AI characters efficiently, where each character has its own prompt.
See the ServerClient sample for a server-client example.
To use multiple characters, create one `LLM` GameObject for the first character and one `LLMClient` GameObject for each additional character, and define each character's prompt (and other parameters) in the respective component.
Then in your script:
```c#
using UnityEngine;
using LLMUnity;

public class MyScript : MonoBehaviour {
  // one LLM for the first character and a LLMClient for each other character, assigned in the Inspector
  public LLM cat;
  public LLMClient dog;
  public LLMClient bird;

  void HandleCatReply(string reply){
    // do something with the reply from the cat character
    Debug.Log(reply);
  }

  void HandleDogReply(string reply){
    // do something with the reply from the dog character
    Debug.Log(reply);
  }

  void HandleBirdReply(string reply){
    // do something with the reply from the bird character
    Debug.Log(reply);
  }

  void Game(){
    // your game function
    ...
    _ = cat.Chat("Hi cat!", HandleCatReply);
    _ = dog.Chat("Hello dog!", HandleDogReply);
    _ = bird.Chat("Hiya bird!", HandleBirdReply);
    ...
  }
}
```
You can also warm up the model at the start of your app (e.g. to reduce the time of the first reply) with the `Warmup` function:

```c#
void WarmupCompleted(){
  // do something when the warmup is complete
  Debug.Log("The AI is warm");
}

void Game(){
  // your game function
  ...
  _ = llm.Warmup(WarmupCompleted);
  ...
}
```
The last argument of the `Chat` function is a boolean that specifies whether to add the message to the history (default: true):

```c#
void Game(){
  // your game function
  ...
  string message = "Hello bot!";
  _ = llm.Chat(message, HandleReply, ReplyCompleted, false);
  ...
}
```
You can also use the `Complete` function for plain text completion instead of chat interaction:

```c#
void Game(){
  // your game function
  ...
  string message = "The cat is away";
  _ = llm.Complete(message, HandleReply, ReplyCompleted);
  ...
}
```
If you want to receive the reply from the model directly in your code, you can use the async/await functionality:

```c#
async void Game(){
  // your game function
  ...
  string message = "Hello bot!";
  string reply = await llm.Chat(message, HandleReply, ReplyCompleted);
  Debug.Log(reply);
  ...
}
```
You can also add and set up the `LLM` / `LLMClient` components dynamically from code:

```c#
using UnityEngine;
using LLMUnity;

public class MyScript : MonoBehaviour
{
    LLM llm;
    LLMClient llmclient;

    async void Start()
    {
        // Add and set up an LLM object
        gameObject.SetActive(false);
        llm = gameObject.AddComponent<LLM>();
        await llm.SetModel("mistral-7b-instruct-v0.2.Q4_K_M.gguf");
        llm.prompt = "A chat between a curious human and an artificial intelligence assistant.";
        gameObject.SetActive(true);

        // or a LLMClient object
        gameObject.SetActive(false);
        llmclient = gameObject.AddComponent<LLMClient>();
        llmclient.prompt = "A chat between a curious human and an artificial intelligence assistant.";
        gameObject.SetActive(true);
    }
}
```
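Other options shown in the Inspector (see the options list further below) can typically be set the same way before re-activating the GameObject. This is a minimal sketch; apart from `prompt`, which the example above uses, the field names are assumptions and may not match the actual LLM script:

```c#
// continuing the Start() method above
// hypothetical field names mirroring the Inspector options; verify against the LLM script
llm.seed = 42;            // assumed field for the Seed option (reproducible output)
llm.temperature = 0.2f;   // assumed field for the Temperature option
llm.numPredict = 256;     // assumed field for the Num Predict option
```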
You can also build a remote server that does the processing and have local clients that interact with it. To do that:
- Create a server based on the `LLM` script or a standard llama.cpp server.
- If you use the `LLM` script for the server, enable the `Remote` option (Advanced options).
- Create the characters with the `LLMClient` script. The characters can be configured to connect to the remote instance by providing the IP address (starting with "http://") and port of the server in the `host`/`port` properties.
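As an illustration of the client-side configuration described above, a character could be pointed to a remote server from code; the address and port below are placeholders:

```c#
using UnityEngine;
using LLMUnity;

public class RemoteCharacter : MonoBehaviour
{
    public LLMClient llmclient; // the character's LLMClient component, assigned in the Inspector

    void Start()
    {
        // connect to the remote LLM server (placeholder address and port)
        llmclient.host = "http://192.168.1.10";
        llmclient.port = 13333;
    }
}
```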
Detailed documentation at the function level can be found here:
The Samples~ folder contains several examples of interaction 🤖, including the ServerClient sample, which uses a `LLM` and a `LLMClient`.

To install a sample:
- Open the Package Manager: `Window > Package Manager`
- Select the `LLM for Unity` package. From the `Samples` tab, click `Import` next to the sample you want to install.

The samples can be run with the Scene.unity scene they contain inside their folder.
In the scene, select the LLM
GameObject and click the Download Model
button to download the default model.
You can also load your own model in .gguf format with the Load model
button (see Use your own model).
Save the scene, run and enjoy!
LLM for Unity uses the Mistral 7B Instruct, OpenHermes 2.5 or Microsoft Phi-2 model by default, quantised with the Q4 method.
Alternative models can be downloaded from HuggingFace.
The required model format is .gguf as defined by llama.cpp.
The easiest way is to download gguf models directly from TheBloke, who has converted an astonishing number of models 🌈!
Otherwise, other model formats can be converted to gguf with the convert.py script of llama.cpp, as described here.
❕ Before using any model make sure you check their license ❕
- `Show/Hide Advanced Options` Toggle to show/hide advanced options from below
- `Show/Hide Expert Options` Toggle to show/hide expert options from below
- `Num Threads` number of threads to use (default: -1 = all)
- `Num GPU Layers` number of model layers to offload to the GPU. If set to 0 the GPU is not used. Use a large number e.g. >30 to utilise the GPU as much as possible. If the user's GPU is not supported, the LLM will fall back to the CPU
- `Stream` select to receive the reply from the model as it is produced (recommended!)
- `Parallel Prompts` number of prompts that can happen in parallel (default: -1 = number of LLM/LLMClient objects)
- `Debug` select to log the output of the model in the Unity Editor
- `Asynchronous Startup` allows the server to start asynchronously
- `Remote` select to allow remote access to the server
- `Port` port to run the server
- `Kill Existing Servers On Start` kills existing servers started by the Unity project on startup, to handle Unity crashes
- `Download model` click to download one of the default models
- `Load model` click to load your own model in .gguf format
- `Model` the path of the model being used (relative to the Assets/StreamingAssets folder)
- `Chat Template` the chat template to use for constructing the prompts
- `Load lora` click to load a LoRA model in .bin format
- `Load grammar` click to load a grammar in .gbnf format
- `Lora` the path of the LoRA being used (relative to the Assets/StreamingAssets folder)
- `Grammar` the path of the grammar being used (relative to the Assets/StreamingAssets folder)
- `Context Size` size of the prompt context (0 = context size of the model)
- `Batch Size` batch size for prompt processing (default: 512)
- `Seed` seed for reproducibility. For random results every time select -1
- `Cache Prompt` save the ongoing prompt from the chat (default: true)
- `Num Predict` number of tokens to predict (default: 256, -1 = infinity, -2 = until context filled)
- `Temperature` LLM temperature, lower values give more deterministic answers (default: 0.2)
- `Top K` top-k sampling (default: 40, 0 = disabled)
- `Top P` top-p sampling (default: 0.9, 1.0 = disabled)
- `Min P` minimum probability for a token to be used (default: 0.05)
- `Repeat Penalty` control the repetition of token sequences in the generated text (default: 1.1)
- `Presence Penalty` repeated token presence penalty (default: 0.0, 0.0 = disabled)
- `Frequency Penalty` repeated token frequency penalty (default: 0.0, 0.0 = disabled)
- `Tfs_z` enable tail free sampling with parameter z (default: 1.0, 1.0 = disabled)
- `Typical P` enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled)
- `Repeat Last N` last n tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size)
- `Penalize Nl` penalize newline tokens when applying the repeat penalty (default: true)
- `Penalty Prompt` prompt for the purpose of the penalty evaluation. Can be either null, a string or an array of numbers representing tokens (default: null = use original prompt)
- `Mirostat` enable Mirostat sampling, controlling perplexity during text generation (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
- `Mirostat Tau` set the Mirostat target entropy, parameter tau (default: 5.0)
- `Mirostat Eta` set the Mirostat learning rate, parameter eta (default: 0.1)
- `N Probs` if greater than 0, the response also contains the probabilities of top N tokens for each generated token (default: 0)
- `Ignore Eos` ignore end of stream token and continue generating (default: false)
- `Player Name` the name of the player
- `AI Name` the name of the AI
- `Prompt` description of the AI role

LLM for Unity is licensed under MIT (LICENSE.md) and uses third-party software with MIT and Apache licenses (Third Party Notices.md).