Mini ChatPDF Save

🛸 This project is based on GPT3.5-turbo and can answer user's questions based on the PDF text files provided by the user.

Project README

English | 中文įŽ€äŊ“

Mini-ChatPDF

This project is based on GPT3.5-turbo and can answer user's questions based on the PDF text files provided by the user.

Here is a simple example:

æ•ˆæžœåą•į¤ē

Structure

.
├── .env.example           # Example environment variables file
├── .gitignore             # Git ignore rules file
├── example_history.json   # JSON file containing example history
├── LICENSE                # License file
├── main.py                # Main Python script
├── README.md              # Readme file in English
├── README.zh-CN.md        # Readme file in Simplified Chinese
├── requirements.txt       # File listing required dependencies
├── setup.sh               # Shell script for setup
├── utils.py               # Utility Python script
└── pdf_files              # Directory containing PDF files
    ├── bert.pdf           # PDF file named bert
    └── transformer.pdf    # PDF file named transformer

Principles

  1. Read the PDF file and segment it.
  2. Generate a feature vector for each text segment using text-embedding-ada-002.
  3. Generate a feature vector for user input.
  4. Calculate the similarity between the user input and the text using distances_from_embeddings, and return a list of the most similar texts.
  5. Design a prompt and use GPT3.5-turbo to generate answers based on the most similar text list.

Usage

  • Download the project
[email protected]:Duguce/Mini-ChatPDF.git && cd Mini-ChatPDF
  • Create a virtual environment and install the required dependencies
./setup.sh
  • Set up environment variables

Obtain a GPT API key from OpenAI and copy it to the corresponding location in the .env file.

  • Add documents

Add the PDF documents you want to use in the ./pdf_files/ directory.

  • Activate the virtual environment
source .venv/bin/activate
  • Run the script

You should see (.venv) ~/Mini-ChatPDF$ when you run the script. If not, please first run source .venv/bin/activate

python main.py
  • Start the conversation

ToDo

  • Support reading multiple PDF documents simultaneously.

  • Support other text encoding vector methods.

  • Add functionality to save text encoding vectors.

  • Implement a graphical user interface (GUI).

  • Optimize the prompt.

License

This project is licensed under the MIT License.

Open Source Agenda is not affiliated with "Mini ChatPDF" Project. README Source: Duguce/Mini-ChatPDF
Stars
13
Open Issues
0
Last Commit
10 months ago
Repository
License
MIT

Open Source Agenda Badge

Open Source Agenda Rating