A framework for creating grounded, instruction-based datasets and training conversational domain-expert Large Language Models (LLMs).
Learn more in our blog: AI for Healthcare | Introducing OpenGPT.
A conversational model for healthcare trained using OpenGPT. All the medical datasets used to train this model were created using OpenGPT and are available below.
All datasets are in the /data folder.
pip install opengpt
If you are working with LLaMA models, you will also need some extra requirements:
pip install -r ./llama_train_requirements.txt
We start by collecting a base dataset in a certain domain. For example, collect definitions of all diseases (e.g. from NHS UK). You can find a small sample dataset here. The collected dataset must have a column named text, where each row of the CSV holds one disease definition.
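As a minimal sketch of the expected layout, the snippet below writes and re-reads a CSV with a single text column using only Python's standard library. The file name diseases_sample.csv, the helper functions, and the placeholder rows are assumptions made for illustration; they are not part of OpenGPT.

```python
import csv
import os
import tempfile


def write_base_dataset(path, definitions):
    """Write one definition per row under a single `text` column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        for definition in definitions:
            writer.writerow({"text": definition})


def load_base_dataset(path):
    """Read the CSV back and fail early if the `text` column is missing."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if "text" not in (reader.fieldnames or []):
            raise ValueError("base dataset must have a column named 'text'")
        return [row["text"] for row in reader]


# Placeholder rows for illustration only; real rows would come from your
# domain source (for example, disease definitions).
sample_definitions = [
    "Placeholder definition of disease A.",
    "Placeholder definition of disease B.",
]

# Hypothetical file name, written to a temporary directory.
dataset_path = os.path.join(tempfile.mkdtemp(), "diseases_sample.csv")
write_base_dataset(dataset_path, sample_definitions)
texts = load_base_dataset(dataset_path)
print(f"Loaded {len(texts)} rows from {dataset_path}")
```

Checking for the text column up front surfaces a mislabeled CSV before any generation work starts.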
Find a prompt matching your use case in the prompt database, or create a new prompt using the Prompt Creation Notebook. The prompt is used to generate tasks/solutions based on the context (the dataset collected in step 1).
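To make the role of the context concrete, here is a hypothetical illustration of how a prompt template and rows of the base dataset could be combined. The template wording, the {context} placeholder name, and the build_prompt helper are assumptions for this sketch; they do not reflect the actual format of prompts in the prompt database or any OpenGPT API.

```python
# Hypothetical template; real prompts in the prompt database may look different.
PROMPT_TEMPLATE = (
    "Based on the context below, write a question a user might ask "
    "and an answer grounded in the context.\n\n"
    "Context:\n{context}\n"
)


def build_prompt(template, context):
    """Fill the template's {context} slot with one row of the base dataset."""
    return template.format(context=context.strip())


# Placeholder rows standing in for the `text` column collected in step 1.
rows = [
    "Placeholder definition of disease A.",
    "Placeholder definition of disease B.",
]

# One prompt per row: each generated task/solution is tied to a single context.
prompts = [build_prompt(PROMPT_TEMPLATE, row) for row in rows]
print(prompts[0])
```

Keeping one context per prompt is what lets each generated example be traced back to the row it was grounded in.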
If you have any questions, please check out Discourse.