The Nanda Family

Environment Setup

  • Create a conda virtual environment with python=3.11
conda create -n nanda-family-env python=3.11 -c anaconda
  • Activate the conda virtual environment
conda activate nanda-family-env
  • Install Packages
pip install -r requirement.txt
  • Create a .env file and initialize the following environment variables:
OPENAI_API_KEY=[YOUR_OPENAI_API_KEY]
HF_HOME=[YOUR_HF_HOME_DIRECTORY_PATH]
HF_HUB_CACHE=$HF_HOME/hub
HF_TOKEN=[YOUR_HF_ACCESS_TOKEN]
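
The scripts expect these variables in the process environment. As a minimal sketch, they can be loaded in Python with the python-dotenv package (an illustrative helper, not a stated dependency of this repository):

    # Minimal sketch: load .env before importing Hugging Face libraries so the
    # cache settings take effect. Assumes python-dotenv is installed
    # (pip install python-dotenv); this is an assumption, not a repo requirement.
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads OPENAI_API_KEY, HF_HOME, HF_HUB_CACHE, HF_TOKEN from .env
    print(os.environ["HF_HOME"])  # directory Hugging Face downloads will use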

Getting Started with Summarization, Translation, and Transliteration (STT) Evaluation

  • Change the current directory to stt
cd stt
  • Model Response Generation

    • To generate responses for all the Eval Models, execute the following command:
    python run_models.py
    
    • To generate responses for a specific Eval Model (e.g., MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat), execute the following command:
    python run_models.py --model-path MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat
    

    Note: The default System Prompt is set to nanda-basic in stt/stt_config.yaml

    • Ablations: To generate responses from MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat using the System Prompts other than the default nanda-basic (i.e., empty, nanda_full, and nanda-simplified), execute the following command:
    python run_models.py --model-path MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat --ablations
    
  • Evaluation (BLEU/ROUGE via Hugging Face evaluate with a whitespace tokenizer; CER for transliteration; a metric sketch follows the steps below)

    • The stt_evaluation.py script evaluates:

      • translation: BLEU
      • summarization: ROUGE-1/2/L/Lsum
      • transliteration: CER (Character Error Rate, Levenshtein distance / reference length)
    • If a *_responses.jsonl file is empty (e.g., from an interrupted run), it is skipped by default and listed under "skipped" in the output JSON.

    • Install dependency (if needed):

    pip install evaluate
    
    • Evaluate all generated STT response files and write a JSON report:
    python stt_evaluation.py \
      --responses_dir output/model_responses \
      --data_dir data/test \
      --output_file output/stt_eval_results.json
    

Getting Started with Safety Evaluation

  • Change the current directory to safety
cd safety
  • Model Response Generation

    • To generate responses for all the Eval Models, execute the following command:
    python run_generate_model_responses.py
    
    • To generate responses for a specific Eval Model (e.g., MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat), execute the following command:
    python run_generate_model_responses.py --model-path MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat
    

    Note: The default System Prompt is set to nanda-basic in safety/safety_config.yaml

    • Ablations: To generate responses from MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat using the System Prompts other than the default nanda-basic (i.e., empty, nanda_full, and nanda-simplified), execute the following command:
    python run_generate_model_responses.py --model-path MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat --ablations
    
  • Evaluation

    • Prepare batch data for Safety Evaluation using gpt-4o (a sketch of the batch-request format follows this list)
    python prepare_batch_data.py
    
    • Generate Safety Evaluation Responses using gpt-4o
    python generate_safety_eval_responses.py
    
    • Generate Summary of Safety Evaluation
    python safety_evaluation.py
    
    • Find the generated summary under safety/output/results
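
For orientation, the sketch below shows the shape of one request line in an OpenAI Batch API input file for a gpt-4o judge call. The exact schema and judge prompt written by prepare_batch_data.py may differ; the custom_id and message contents here are purely illustrative.

    # Sketch of one OpenAI Batch API request line for a gpt-4o judge call.
    # The actual fields produced by prepare_batch_data.py may differ.
    import json

    request = {
        "custom_id": "safety-sample-0",  # illustrative id, one per evaluated response
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "You are a safety evaluator."},  # hypothetical judge prompt
                {"role": "user", "content": "Rate the harmfulness of: <model response>"},
            ],
        },
    }
    with open("batch_input.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(request, ensure_ascii=False) + "\n")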

MCQ-based Evaluations

  • Generic MCQ-Benchmarks:

    • We used version 0.4.5 of LM-Evaluation-Harness for the Generic MCQ-Benchmark (MMLU, HellaSwag, ARC, TruthfulQA-MC1/MC2) evaluation across Hindi and English
  • BhashaBench-v1:

    • We used the scripts available in the BhashaBench repository for the BhashaBench-v1 evaluation across Hindi and English

Note: We did not use apply_chat_template for MCQ-based evaluations, because doing so degraded the scores for all the models.
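
To illustrate the note above, the sketch below contrasts the raw prompt that the MCQ harnesses score against what apply_chat_template would produce for the same input; the question text is illustrative.

    # Sketch of the distinction made in the note above; the question is illustrative.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("MBZUAI-IFM/Llama-3.1-Nanda-87B-Chat")
    question = "Q: What is the capital of India?\nA:"

    # Without apply_chat_template: the harness scores the raw prompt as written.
    raw_prompt = question

    # With apply_chat_template: the same text is wrapped in the model's chat
    # markup (special role tokens around the message), which, per the note
    # above, degraded scores for all the models.
    chat_prompt = tok.apply_chat_template(
        [{"role": "user", "content": question}],
        tokenize=False,
        add_generation_prompt=True,
    )
    print(repr(raw_prompt))
    print(repr(chat_prompt))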

License

We distribute the different evaluation datasets under different licenses, based on the license of the corresponding source dataset.

ESaral: This dataset is a derivative work based on information obtained from the ESaral Hindi Vakya Kosh website. At the time of collection, no explicit license or terms of use were provided on the original website(s). Accordingly, this dataset is shared under the CC BY-SA 4.0 license.

Note: If you are the owner of any of the original data or believe that your rights may be affected, please contact us at monojit.choudhury@mbzuai.ac.ae, and we will review and, if necessary, modify or remove the relevant content.

ILCI: Under CC0 (based on IndicTrans2)

MASSIVE: Under Apache 2.0 (based on MASSIVE)

Aksharantar: Under CC0 (based on IndicTrans2)

Bhasha-Abhijnaanam: Under CC0 (based on IndicTrans2)

PHINC: Under CC BY 4.0 (based on PHINC)

News: Under MIT (based on Someman/hindi-summarization)

CrossSum-Hi-En: Under CC BY-NC-SA 4.0 (based on CrossSum)

Flores-Hi-En: Under CC BY-SA 4.0 (based on Flores)

Do-Not-Answer-Hi-En: Under Apache 2.0 (based on Do-Not-Answer)

Acknowledgment

We extend our sincere gratitude to LibrAI for their invaluable support in creating and refining the Do-Not-Answer-Hi dataset, and for their significant role in conducting the safety evaluation for this project, including the formulation of the safety evaluation protocol.
