Lawyer bot

This Telegram bot finds laws relevant to a situation that the user describes in natural language. It can also explain the laws it finds in simpler language.

Who needs it?

Citizens without legal education
Practicing lawyers who need quick navigation
Internal customer support teams in the government and fintech sectors

Performance

Using a law from the dataset as a query:

Precision@10: 0.643
Recall@10: 0.002
Hits@10: 0.947
MRR: 0.769
NDCG@10: 0.807
MAP@10: 0.745

Using test queries generated by ChatGPT and then manually filtered:

Precision@10: 0.028
Recall@10: 0.005
Hits@10: 0.080
MRR: 0.035
NDCG@10: 0.045
MAP@10: 0.032

These metrics show substantial room for improvement, especially on natural-language queries. One option is to switch to a smaller dataset containing only the constitution and federal laws.
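For readers unfamiliar with the ranking metrics listed above, here is a minimal sketch of how they can be computed for a single query with binary relevance. The function names and the exact conventions (for example, normalizing MAP@k by min(|relevant|, k)) are assumptions for illustration; the evaluation notebook in this repository may define them differently. The reported numbers would then be averages over all queries.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

def hit_at_k(ranked, relevant, k=10):
    """1.0 if at least one relevant document is in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document (0.0 if none is found)."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

def average_precision_at_k(ranked, relevant, k=10):
    """Mean of precision values at each rank where a relevant document occurs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0

# Toy example with hypothetical document ids.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d", "z"}
print(precision_at_k(ranked, relevant, k=4), reciprocal_rank(ranked, relevant))
```

Note that Recall@10 can never exceed 10 / |relevant|, so it stays small whenever a query has many relevant documents, even if most of the top 10 results are correct.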

How to run

Clone the repository
Run "pip install -r requirements.txt" in the cloned repository folder
Create a ".env" file in the tg_bot folder containing "TOKEN=<telegram_bot_token>". One way to obtain a token is through @BotFather in Telegram
Run bot.py to start interacting with your Telegram bot
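
Before starting the bot, it can help to check that the ".env" file is in place and contains a token. The sketch below is a standalone check using only the standard library. The helper name read_env_value is hypothetical, and the bot itself may load the file in a different way.

```python
from pathlib import Path

def read_env_value(env_path, key):
    """Return the value stored under `key` in a KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.exists():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes around the value.
            return value.strip().strip('"').strip("'")
    return None

if __name__ == "__main__":
    # Path assumes the script is run from the repository root.
    token = read_env_value("tg_bot/.env", "TOKEN")
    print("TOKEN found" if token else "TOKEN missing: create tg_bot/.env first")
```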

Technical details

Dataset used: https://github.com/irlcode/RusLawOD (currently cut to 50k samples)

We first compute an embedding for every law in the dataset, after lemmatizing its text.

Then, for each query, the following happens. The system searches for the 50 nearest documents using FAISS, retrieving their indices and similarity scores. We use FAISS IndexFlatIP (exact inner-product search) with prior vector normalization, which makes the metric equivalent to cosine similarity. Next, non-existent entries and documents without a classification are filtered out, and the remaining results are grouped by category (document class).
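
To illustrate why normalization followed by an inner-product search gives cosine similarity, here is a small pure-Python stand-in for the search step. It is not the project's code: in the bot this work is done by FAISS (faiss.normalize_L2 and the search method of IndexFlatIP are the library counterparts), and the function names and toy vectors below are assumptions for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(index_vectors, query, k=50):
    """Exact top-k search by inner product over unit-length vectors.

    Because both sides are normalized, each score equals the cosine
    similarity between the query and the document vector.
    """
    q = l2_normalize(query)
    scored = [
        (inner_product(l2_normalize(vec), q), doc_id)
        for doc_id, vec in enumerate(index_vectors)
    ]
    # Highest score first; ties are broken by the lower document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Toy 2-dimensional "embeddings"; real ones come from rubert-tiny2.
docs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
for doc_id, score in search(docs, [2.0, 2.0], k=2):
    print(doc_id, round(score, 3))
```

The third vector points in the same direction as the query, so it ranks first regardless of its length, which is the behavior expected from cosine similarity.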

If there are 5 or more categories, the most relevant document from each is taken, and the top 5 groups make it to the final selection. If there are fewer categories, the top 2 documents from each are selected, sorted by similarity, and the top 5 are kept. This approach is meant to balance accuracy and answer diversity.
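
The filtering and selection rules described above can be sketched as follows. This is one reading of the description, not the bot's actual code: the function name select_results, the tuple layout (doc_id, score, category), and the use of -1 for a missing entry (the value FAISS returns when it has fewer than k results) are assumptions for illustration.

```python
from collections import defaultdict

def select_results(hits, min_categories=5, per_category=2, final_size=5):
    """Pick up to `final_size` documents, favoring category diversity.

    `hits` is a list of (doc_id, score, category) tuples.
    """
    # Drop missing entries (doc_id < 0) and documents without a category.
    valid = [hit for hit in hits if hit[0] >= 0 and hit[2]]

    # Group by category and order each group by descending similarity.
    groups = defaultdict(list)
    for hit in valid:
        groups[hit[2]].append(hit)
    for group in groups.values():
        group.sort(key=lambda hit: hit[1], reverse=True)

    if len(groups) >= min_categories:
        # Many categories: one best document per category.
        candidates = [group[0] for group in groups.values()]
    else:
        # Few categories: up to `per_category` documents from each.
        candidates = [hit for group in groups.values() for hit in group[:per_category]]

    candidates.sort(key=lambda hit: hit[1], reverse=True)
    return candidates[:final_size]

# Toy input with two categories, one missing entry, one unclassified document.
hits = [
    (0, 0.9, "tax"), (1, 0.8, "tax"), (2, 0.7, "tax"),
    (3, 0.6, "labor"), (-1, 0.5, "labor"), (4, 0.4, None),
]
print([doc_id for doc_id, _, _ in select_results(hits)])
```

With only two categories in the toy input, the third "tax" document is dropped so that a single category cannot fill the whole answer.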

Frontend: Telegram bot on aiogram
Backend: Logic built into the Telegram bot
Retrieval: FAISS index + rubert-tiny2 model
LLM generation: Ollama with the gemma3 or saiga model
Storage: Local Parquet files + FAISS index
