Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Step-by-step guide on TowardsDataScience: https://towardsdatascience.com/running-llama-2-on-cpu-inference-for-document-q-a-3d636037a3d8
Quickstart
- Ensure that the downloaded LLM model binary file is placed inside the `models/` folder.
- Launch the application by running: `poetry run python main.py "<user query>"`
- For example: `poetry run python main.py "What is the minimum guarantee payable by Adidas?"`
- Note: Omit the `poetry run` prefix if you are NOT using Poetry.
Files and Content
- `/assets`: Images relevant to the project
- `/config`: Configuration files for LLM application
- `/data`: Dataset used for this project (i.e., Manchester United FC 2022 Annual Report - 177-page PDF document)
- `/models`: Binary file of GGML quantized LLM model (i.e., Llama-2-7B-Chat)
- `/src`: Python code for the key components of the LLM application, namely `llm.py`, `utils.py`, and `prompts.py`
- `/vectorstore`: FAISS vector store for documents
- `db_build.py`: Python script to ingest the dataset and generate the FAISS vector store
- `main.py`: Main Python script to launch the application and pass the user query via the command line
- `pyproject.toml`: TOML file specifying the dependency versions used (Poetry)
- `requirements.txt`: List of Python dependencies (and versions)
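Since `main.py` receives the user query as a command-line argument, the sketch below shows one minimal way such a script might pull the query from `sys.argv`. The `get_query` helper and the usage message are hypothetical illustrations, not the repo's actual implementation, which may parse arguments differently:

```python
import sys


def get_query(argv):
    # Hypothetical helper: treat the first command-line argument as the
    # user query, e.g.
    #   poetry run python main.py "What is the minimum guarantee payable by Adidas?"
    if len(argv) < 2:
        # Exit with a usage hint when no query is supplied.
        raise SystemExit('Usage: python main.py "<user query>"')
    return argv[1]


if __name__ == "__main__":
    query = get_query(sys.argv)
    # The real main.py would now embed this query into a prompt and run
    # retrieval + LLM inference; here we just echo it.
    print(f"Received query: {query}")
```

Quoting the query on the command line keeps it a single `argv` entry, which is why the quickstart examples wrap it in double quotes.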