LegalQA Save

Korean LegalQA using SentenceKoBART

Project README

LegalQA using SentenceKoBART and OpenAI ChatGPT

1. Setup
1. Approximate KNN Search with AnnLite
- 2.1. Index
- 2.2. Query
  - 2.2.1. Retrieval Augmented Response with OpenAI ChatGPT
1. Run Chat Demo
1. Presentation
1. Demo
1. Links
1. FAQ
- 7.1. Why this dataset?
- 7.2. LFS quota is exceeded
1. Citation
1. License

Implementation of legal QA system based on SentenceKoBART

How to train SentenceKoBART
Based on Neural Search Engine Jina v2.0
Provide Korean legal QA data(1,830 pairs)
Apply approximate KNN search with Faiss, Annoy, Hnswlib.
Retrieval Augmented Answer Generation with OpenAI ChatGPT.

1. Setup

# install git lfs , https://github.com/git-lfs/git-lfs/wiki/Installation
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
# If the lfs quota is exceeded, please download it with the command below.
# https://drive.google.com/file/d/1DJFMknxT7OAAWYFV_WGW2UcCxmuf3cp_/view?usp=sharing
# mv SentenceKoBART.bin model/
# pip install --use-deprecated=legacy-resolver  -r requirements.txt 
pip install -r requirements.txt

2. Approximate KNN Search with AnnLite

2.1. Index

python app.py -t index --flow flows/index_annlite.yml

GPU-based indexing available as an option

device: cuda

2.2. Query

# test on bash
python app.py -t query --flow flows/query_annlite.yml
# test on REST API
python app.py -t query_restful --flow flows/query_annlite.yml

2.2.1. Retrieval Augmented Response with OpenAI ChatGPT

Get OpenAI API from https://platform.openai.com/account/api-keys

OPENAI_API_KEY=$OPENAI_KEY python app.py -t query --flow flows/query_annlite_openai.yml

3. Run Chat Demo

OPENAI_API_KEY=$OPENAI_KEY python app.py -t query_restful --flow flows/query_annlite_openai.yml
streamlit run chat.py

https://user-images.githubusercontent.com/957840/227705344-27501a6f-1e0b-48c0-854d-62ebc8d3160d.mp4

4. Presentation

Neural IR 101

5. Demo

Working!

6. Links

[AI 모델 탐험기] #13 Neural Search를 이용하여 제작된 법률 QA 검색 시스템, Legal QA

7. FAQ

7.1. Why this dataset?

Legal data is composed of technical terms, so it is difficult to search if you are not familiar with these terms. Because of these characteristics, I thought it was a good example to show the effectiveness of neural IR.

7.2. LFS quota is exceeded

You can download SentenceKoBART.bin from one of the two links below.

https://drive.google.com/file/d/1DJFMknxT7OAAWYFV_WGW2UcCxmuf3cp_/view?usp=sharing

8. Citation

Model training, data crawling, and demo system were all supported by the AWS Hero program.

@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}

9. License

QA data data/legalqa.jsonlines is crawled in www.freelawfirm.co.kr based on robots.txt. Commercial use other than academic use is prohibited.
We are not responsible for any legal decisions we make based on the resources provided here.

Open Source Agenda is not affiliated with "LegalQA" Project. README Source: haven-jeon/LegalQA

Stars

Open Issues

Last Commit

1 year ago

Repository

haven-jeon/LegalQA

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/legalqa"><img src="https://www.opensourceagenda.com/projects/legalqa/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog

From the blog

Dec 11, 2022

How to Choose Which Programming Language to Learn First?

From the blog

Dec 11, 2022