
QA gen with accelerate #100

Open
Sriharsha-hatwar wants to merge 1 commit into arcee-ai:main from Sriharsha-hatwar:qa_gen_with_accelerate

Sriharsha-hatwar commented Oct 7, 2024

Hello @shamanez, @Jacobsolawetz,

This PR contains the script that generates the QA data for end-to-end (e2e) RAG training. Please have a look and let me know if you have any comments.

To merge this successfully into arcee-train, the script has to be started with `accelerate launch` rather than by calling the `generate_question_answer_pairs` API to get the dataset. This also requires some further code changes in the arcee-train repo.
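With `accelerate launch`, the same script runs as several processes, and each process is expected to generate QA pairs for its own slice of the input. The pure-Python sketch below illustrates that idea: contiguous sharding where shard sizes differ by at most one, followed by gathering the results in rank order. It is only an illustration under assumptions: `shard_for_process`, `generate_qa_sharded` and the `generate` callable (standing in for the model call) are hypothetical names, not code from this PR, and the ranks are simulated sequentially instead of running as real processes. It mirrors the idea behind Accelerate's `PartialState.split_between_processes`, but it is not that API.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard_for_process(items: List[T], process_index: int, num_processes: int) -> List[T]:
    """Return the contiguous slice of `items` handled by one process (hypothetical helper).

    The first len(items) % num_processes ranks receive one extra item,
    so shard sizes differ by at most one.
    """
    if not 0 <= process_index < num_processes:
        raise ValueError("process_index must be in [0, num_processes)")
    base, extra = divmod(len(items), num_processes)
    start = process_index * base + min(process_index, extra)
    end = start + base + (1 if process_index < extra else 0)
    return items[start:end]


def generate_qa_sharded(
    chunks: List[T], generate: Callable[[T], R], num_processes: int
) -> List[R]:
    """Simulate every rank generating QA pairs for its shard, then gather in rank order."""
    gathered: List[R] = []
    for rank in range(num_processes):
        # Under a real `accelerate launch`, each rank would run this body
        # concurrently in its own process; here the ranks run one after another.
        shard = shard_for_process(chunks, rank, num_processes)
        gathered.extend(generate(chunk) for chunk in shard)
    return gathered


if __name__ == "__main__":
    docs = [f"passage {i}" for i in range(10)]
    # Placeholder "generator": a real one would prompt a model for a question/answer pair.
    pairs = generate_qa_sharded(docs, lambda d: (f"What is in {d}?", d), num_processes=4)
    print(len(pairs), pairs[0])
```

With 10 passages and 4 processes, ranks 0 and 1 each get 3 passages and ranks 2 and 3 each get 2. Because shards are contiguous, concatenating the per-rank outputs in rank order restores the original ordering.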

Some metrics (output of `time`):

| Run | real | user | sys |
|---|---|---|---|
| Script with accelerate | 0m42.724s | 1m21.831s | 2m13.471s |
| Script without accelerate | 0m21.747s | 0m27.539s | 0m18.316s |

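On this dataset the accelerate run takes roughly twice as long in wall-clock time (42.7 s vs 21.7 s real). This suggests that fixed startup overhead outweighs the parallel speedup for a small input. I believe the gains in QA generation will only show up when the dataset is large.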
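To make the last point concrete, here is a toy cost model. It assumes a serial run costs `n * per_item_s`, while a run with `num_processes` ranks pays a fixed `overhead_s` (process launch, loading the model on every rank) plus the time for its largest shard. All parameter values are hypothetical and are not fitted to the measurements above, which are too few to estimate them. The model only shows why a break-even dataset size exists.

```python
import math
from typing import Optional


def serial_time(n_items: int, per_item_s: float) -> float:
    """Time to process every item in a single process."""
    return n_items * per_item_s


def parallel_time(n_items: int, per_item_s: float, num_processes: int, overhead_s: float) -> float:
    """Fixed startup overhead plus the largest shard, since ranks run concurrently."""
    return overhead_s + math.ceil(n_items / num_processes) * per_item_s


def break_even_items(
    per_item_s: float, num_processes: int, overhead_s: float, max_items: int = 100_000
) -> Optional[int]:
    """Smallest dataset size for which the parallel run is strictly faster, or None."""
    if num_processes <= 1:
        return None  # no parallelism, so the overhead is never recovered
    for n in range(1, max_items + 1):
        if parallel_time(n, per_item_s, num_processes, overhead_s) < serial_time(n, per_item_s):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 s per item, 4 processes, 20 s of startup overhead.
    print(break_even_items(per_item_s=0.5, num_processes=4, overhead_s=20.0))
```

With these made-up numbers the parallel run only wins from 55 items upward. Below that, the 20 s of overhead costs more than splitting the work saves, which matches the pattern seen on the small test dataset.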

