LLM App templates for RAG, knowledge mining, and stream analytics. Ready to run with Docker, always in sync with your data sources.
Pathway's LLM (Large Language Model) Apps allow you to quickly put AI applications into production that use the most up-to-date knowledge available in your data sources. You can directly run a 24/7 service to answer natural language queries about an ever-changing private document knowledge base, or run an LLM-powered data transformation pipeline on a data stream.
The Python application examples provided in this repo are ready to use. They can be run as Docker containers and expose an HTTP API to the frontend. To allow quick testing and demos, most app examples also include an optional Streamlit UI which connects to this API. The apps rely on the Pathway framework for data source synchronization, for serving API requests, and for all low-latency data processing. The apps connect to document data sources on S3, Google Drive, SharePoint, etc., with no infrastructure dependencies (such as a vector database) that would need a separate setup.
Quick links - Why use Pathway LLM Apps? · Watch it in action · How it works · Application examples · Get Started · Showcases · Troubleshooting · Contributing · Hosted Version · Need help?
Analysis of live document streams. (See the unstructured-to-sql app example.)
Monitor streams of changing documents, get real-time alerts when answers change.
Using incremental vector search, only the most relevant context is automatically passed into the LLM for analysis, minimizing token use, even when thousands of documents change every minute. This is real-time RAG taken to a new level.
For the code, see the drive_alert app example. You can find more details in a blog post on alerting with LLM-App.
The default contextful app example launches an application that connects to a source folder of documents stored in AWS S3 or locally on your computer. The app stays in sync with updates to your documents, building a "vector index" in real time using the Pathway package. It waits for user queries arriving as HTTP REST requests, then uses the index to find relevant documents and responds in natural language using the OpenAI API or Hugging Face models. This way, its answers are always based on the freshest and most accurate real-time data.
This application template can also be combined with streams of fresh data, such as news feeds or status reports, either through REST or a technology like Kafka. It can also be combined with extra static data sources and user-specific contexts, to provide more relevant answers and reduce LLM hallucination.
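The query flow of such a contextful app can be sketched schematically. The snippet below is a minimal, framework-agnostic illustration, not Pathway's actual API: the `embed` function is a toy character-frequency embedding and the LLM call is a stubbed placeholder, shown only to make the retrieve-then-answer flow concrete.

```python
# Schematic RAG query flow: embed documents, retrieve the top-k most
# relevant ones for a query, and pass only those to the LLM.
# All functions here are illustrative stubs, NOT Pathway's API.

def embed(text: str) -> list[float]:
    # Toy embedding: normalized character-frequency vector.
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    counts = [text.lower().count(ch) for ch in alphabet]
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query; keep only the best k.
    qv = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(qv, embed(d)), reverse=True)
    return ranked[:k]

def answer(query: str, documents: list[str]) -> str:
    context = top_k(query, documents)
    # In the real app, this prompt goes to OpenAI / Hugging Face.
    return f"LLM({query!r}, context={context})"

docs = ["Kafka connector setup", "LLM xpack usage", "Vector index internals"]
print(answer("How to connect to Kafka?", docs))
```

In the actual app, the index is maintained incrementally by Pathway, so `top_k` stays cheap even as documents change.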
Read more about the implementation details and how to extend this application in our blog article.
▶️ Building an LLM Application without a vector database - by Jan Chorowski
▶️ Let's build a real-world LLM app in 11 minutes - by Pau Labarta Bajo
LLM Apps built with Pathway can also include the following capabilities:
To learn more about advanced features see: Features for Organizations, or reach out to the Pathway team.
Pick one that is closest to your needs.
| Example app (template) | Description |
|---|---|
| demo-question-answering | A question-answering pipeline that uses the GPT model of your choice to answer queries about a set of documents. You can also try it on the Pathway Hosted Pipelines website. |
| demo-document-indexing | A real-time document indexing pipeline that monitors several kinds of data sources and provides health-check endpoints. It is available on the Pathway Hosted Pipelines website. |
| contextless | This simple example calls the OpenAI ChatGPT API but does not use an index when processing queries; it relies solely on the user query. We recommend it as a starting point for your Pathway LLM journey. |
| contextful | This default example indexes the jsonlines documents located in the data/pathway-docs directory. The indexed documents are then taken into account when processing queries. |
| contextful-s3 | Operates like the contextful mode, but the documents are stored in and indexed from an S3 bucket, allowing a larger volume of documents to be handled. This can be more suitable for production environments. |
| unstructured | Processes unstructured documents such as PDF, HTML, DOCX, PPTX, and more. Visit unstructured-io for the full list of supported formats. |
| local | Runs the application using Hugging Face Transformers, which means the data never has to leave the machine. It provides a convenient way to use state-of-the-art NLP models locally. |
| unstructured-to-sql | Extracts data from unstructured files and stores it in a PostgreSQL table. It also transforms the user query into an SQL query, which is then executed against the PostgreSQL table. |
| alert | Ask questions and get alerted whenever the response changes. Pathway is always listening for changes; whenever new relevant information is added to the stream (local files in this example), the LLM decides whether the response has changed substantially and notifies the user with a Slack message. |
| drive-alert | The alert example on steroids: whenever relevant information in Google Docs is modified or added, get real-time alerts via Slack. See the tutorial. |
| contextful-geometric | The contextful example, optimized for token use: it asks the same question with a geometrically increasing number of context documents until ChatGPT finds an answer. |
To run the demo-document-indexing vector indexing pipeline and UI, follow the instructions in examples/pipelines/demo-document-indexing/README.md.
To run the demo-question-answering question-answering pipeline, follow the instructions in examples/pipelines/demo-question-answering/README.md.
For all other demos follow the steps below.
Now, follow the steps to install and get started with one of the provided examples. You can pick any example that you find interesting; if unsure, pick contextful.
Alternatively, you can also take a look at the application showcases.
This is done with the git clone
command followed by the URL of the repository:
git clone https://github.com/pathwaycom/llm-app.git
Next, navigate to the repository:
cd llm-app
Create an .env file in the root directory and add the following environment variables, adjusting their values according to your specific requirements and setup.
Environment Variable | Description |
---|---|
APP_VARIANT | Determines which pipeline to run in your application. Available modes are [contextful, contextful-s3, contextless, local, unstructured-to-sql, alert, drive-alert]. By default, the mode is set to contextful. |
PATHWAY_REST_CONNECTOR_HOST | Specifies the host IP for the REST connector in Pathway. For the dockerized version, set it to 0.0.0.0; natively, you can use 127.0.0.1. |
PATHWAY_REST_CONNECTOR_PORT | Specifies the port number on which Pathway's REST connector service should listen. Here, it is set to 8080. |
OPENAI_API_KEY | The API token for accessing OpenAI services. If you are not running the local version, please remember to replace it with your API token, which you can generate from your account on openai.com. |
PATHWAY_PERSISTENT_STORAGE | Specifies the directory where the cache is stored. You could use /tmp/cache. |
For example:
APP_VARIANT=contextful
PATHWAY_REST_CONNECTOR_HOST=0.0.0.0
PATHWAY_REST_CONNECTOR_PORT=8080
OPENAI_API_KEY=<Your Token>
PATHWAY_PERSISTENT_STORAGE=/tmp/cache
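The pipelines consume these values as ordinary environment variables. As a rough illustration of how such a configuration is typically read (the variable names come from the table above; the helper function and its fallback defaults are illustrative, not the app's actual code):

```python
import os

def load_config() -> dict:
    """Read the app configuration from the environment.

    Variable names mirror the table above; the fallback defaults here
    are illustrative, not necessarily the app's real defaults.
    """
    return {
        "app_variant": os.environ.get("APP_VARIANT", "contextful"),
        "host": os.environ.get("PATHWAY_REST_CONNECTOR_HOST", "127.0.0.1"),
        "port": int(os.environ.get("PATHWAY_REST_CONNECTOR_PORT", "8080")),
        "cache_dir": os.environ.get("PATHWAY_PERSISTENT_STORAGE", "/tmp/cache"),
    }

print(load_config())
```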
You can install and run your chosen LLM App example in two different ways.
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Here is how to use Docker to build and run the LLM App:
docker compose run --build --rm -p 8080:8080 llm-app-examples
If you have set a different port in PATHWAY_REST_CONNECTOR_PORT, replace the second 8080 in the command above with that port.
When the process is complete, the App will be up and running inside a Docker container and accessible at 0.0.0.0:8080. From there, you can proceed to the "Usage" section of the documentation for information on how to interact with the application.
Install poetry:
pip install poetry
Install llm_app and dependencies:
poetry install --with examples --extras local
You can omit the --extras local part if you are not going to run the local example.
Run the examples: You can start the example with the command:
poetry run ./run_examples.py contextful
Send REST queries (in a separate terminal window): the following examples show how to interact with the application once it is running. curl is a command-line tool for sending data over various network protocols; here, it sends HTTP requests to the application.
curl --data '{"user": "user", "query": "How to connect to Kafka in Pathway?"}' http://localhost:8080/
curl --data '{"user": "user", "query": "How to use LLMs in Pathway?"}' http://localhost:8080/
If you are using Windows CMD, the query should look like this instead:
curl --data "{\"user\": \"user\", \"query\": \"How to use LLMs in Pathway?\"}" http://localhost:8080/
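If you prefer Python over curl (and want to sidestep shell-quoting differences between platforms), the same request body can be built and sent from a short script. The `build_query` helper below is a hypothetical convenience function for illustration; the endpoint URL assumes the default host and port configured above.

```python
import json

def build_query(user: str, query: str) -> str:
    """Serialize the JSON request body expected by the app's REST endpoint.

    (Hypothetical helper for illustration; the app just expects this JSON.)
    """
    return json.dumps({"user": user, "query": query})

payload = build_query("user", "How to use LLMs in Pathway?")
print(payload)

# To actually send it, the app must be running; then (stdlib-only):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8080/",
#       data=payload.encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```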
Test reactivity by adding a new file: This shows how to test the application's ability to react to changes in data by adding a new file and sending a query.
cp ./data/documents_extra.jsonl ./data/pathway-docs/
Or if using docker compose:
docker compose exec llm-app-examples mv /app/examples/data/documents_extra.jsonl /app/examples/data/pathway-docs/
Let's query again:
curl --data '{"user": "user", "query": "How to use LLMs in Pathway?"}' http://localhost:8080/
Go to the examples/ui/ directory (or examples/pipelines/unstructured/ui if you are running the unstructured version) and execute streamlit run server.py. Then access the URL displayed in the terminal to interact with the LLM App through a chat interface. Please note: the provided Streamlit-based interface template is intended for internal rapid prototyping only. In production, you would normally create your own component instead, taking into account security and authentication, multi-tenancy of data teams, integration with existing UI components, etc.
Want to learn more about building your own app? See the step-by-step Building an llm-app tutorial.
Or simply add llm-app to your project's dependencies and copy one of the examples to get started!
Python sales - Find real-time sales with AI-powered Python API using ChatGPT and LLM (Large Language Model) App.
Dropbox Data Observability - See how to get started with chatting with your Dropbox and having data observability.
Please check out our Q&A to get solutions for common installation problems and other issues.
To provide feedback or report a bug, please raise an issue on our issue tracker.
Anyone who wishes to contribute to this project, whether documentation, features, bug fixes, code cleanup, testing, or code reviews, is very much encouraged to do so.
To join, just raise your hand on the Pathway Discord server (#get-help) or the GitHub discussion board.
If you are unfamiliar with how to contribute to GitHub projects, here is a Get Started Guide. A full set of contribution guidelines, along with templates, are in progress.
Please see cloud.pathway.com for hosted services. You can quickly set up variants of the unstructured app, which connect live data sources on Google Drive and SharePoint to your Gen AI app.
Interested in building your own Pathway LLM App with your data source, stack, and custom use cases? Connect with us to get help with:
Reach us at [email protected] or via Pathway's website.