
How to use gemma? #37

@Gldkslfmsd

Description

gemma-2-9b-it:

  • doesn't support a system message (the chat template rejects the system role):
$ python3 simulstreaming_translate.py --model-dir ct2_gemma-2-9b-it/ --tokenizer-dir gemma-2-9b-it/ --src-lang en --tgt-lang he --input-jsonl tn.jsonl
INFO: System prompt: You are simultaneous interpreter from English to Hebrew. We are at a conference. It is important that you translate only what you hear, nothing else!
INFO: Init prompt src: ['My', 'hovercraft', 'is', 'full', 'of', 'eels.']
INFO: Init prompt tgt: הרחפת שלי מלאה בצלופחים.
Loading the model...
...done
INFO: Reading tn.jsonl in jsonl format, computationally aware simulation.
INPUT:  Začínají telev
IS FINAL: False
SRC My hovercraft is full of eels. Začínají
FORCED TGT הרחפת שלי מלאה בצלופחים.
Traceback (most recent call last):
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 616, in <module>
    main_simulation_from_file()
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 607, in main_simulation_from_file
    simulation_update(simul, rows, timer)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 546, in simulation_update
    out_handler(out, row, timer)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 519, in handle_outputs
    for r in format_outputs(out_seq, in_row, timer, is_final=is_final):
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 506, in format_outputs
    for status, confirmed, unconfirmed in out_seq:
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 275, in process_iter
    out = self.llmtranslator.translate(src, forced_tgt)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 74, in translate
    prompt_tokens = self.build_prompt(dialog)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 55, in build_prompt
    base_toks = self.tokenizer.apply_chat_template(dialog[:2], tokenize=True, add_generation_prompt=True)["input_ids"]
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3132, in apply_chat_template
    rendered_chat, generation_indices = render_jinja_template(
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/transformers/utils/chat_template_utils.py", line 537, in render_jinja_template
    rendered_chat = compiled_template.render(
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/jinja2/environment.py", line 1295, in render
    self.environment.handle_exception()
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/jinja2/environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 1, in top-level template code
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/jinja2/sandbox.py", line 401, in call
    return __context.call(__obj, *args, **kwargs)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/transformers/utils/chat_template_utils.py", line 445, in raise_exception
    raise jinja2.exceptions.TemplateError(message)
jinja2.exceptions.TemplateError: System role not supported
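One workaround for templates that reject the system role is to fold the system prompt into the first user turn before calling `apply_chat_template`. A minimal sketch (the helper name is hypothetical, not part of SimulStreaming):

```python
def merge_system_into_user(dialog):
    """Fold a leading system message into the first user turn.

    dialog is a list of {"role": ..., "content": ...} dicts, as passed
    to tokenizer.apply_chat_template. Returns a new list; the input is
    left unmodified.
    """
    if dialog and dialog[0]["role"] == "system":
        system, rest = dialog[0], list(dialog[1:])
        if rest and rest[0]["role"] == "user":
            # Prepend the system text to the first user message.
            rest[0] = {
                "role": "user",
                "content": system["content"] + "\n\n" + rest[0]["content"],
            }
            return rest
        # No user turn to merge into: demote the system message itself.
        return [{"role": "user", "content": system["content"]}] + rest
    return list(dialog)
```

The merged dialog then contains only roles that Gemma's instruction-tuned template accepts.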

gemma-2-9b:

  • doesn't support a chat template (tokenizer.chat_template is not set):
$ python3 simulstreaming_translate.py --model-dir ct2_gemma-2-9b/ --tokenizer-dir gemma-2-9b/ --src-lang en --tgt-lang he --input-jsonl tn.jsonl
INFO: System prompt: You are simultaneous interpreter from English to Hebrew. We are at a conference. It is important that you translate only what you hear, nothing else!
INFO: Init prompt src: ['My', 'hovercraft', 'is', 'full', 'of', 'eels.']
INFO: Init prompt tgt: הרחפת שלי מלאה בצלופחים.
Loading the model...
...done
INFO: Reading tn.jsonl in jsonl format, computationally aware simulation.
INPUT:  Začínají telev
IS FINAL: False
SRC My hovercraft is full of eels. Začínají
FORCED TGT הרחפת שלי מלאה בצלופחים.
Traceback (most recent call last):
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 616, in <module>
    main_simulation_from_file()
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 607, in main_simulation_from_file
    simulation_update(simul, rows, timer)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 546, in simulation_update
    out_handler(out, row, timer)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 519, in handle_outputs
    for r in format_outputs(out_seq, in_row, timer, is_final=is_final):
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 506, in format_outputs
    for status, confirmed, unconfirmed in out_seq:
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 275, in process_iter
    out = self.llmtranslator.translate(src, forced_tgt)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 74, in translate
    prompt_tokens = self.build_prompt(dialog)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/simulstreaming_translate.py", line 55, in build_prompt
    base_toks = self.tokenizer.apply_chat_template(dialog[:2], tokenize=True, add_generation_prompt=True)["input_ids"]
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3112, in apply_chat_template
    chat_template = self.get_chat_template(chat_template, tools)
  File "/lnet/work/people/machacek/smluvni-2024/alignatt-whisper.202412/SimulStreaming/p3-check/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3294, in get_chat_template
    raise ValueError(
ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
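For the base (non-instruction-tuned) model, the error message itself points at the two options: set `tokenizer.chat_template`, or pass a `chat_template` argument to `apply_chat_template`. A third option is to skip the template entirely and render the dialog as plain text. A minimal sketch of that fallback (the function name and the `role: content` layout are assumptions, not Gemma's trained format):

```python
def build_plain_prompt(dialog):
    """Render a chat dialog as a plain-text prompt for a base model.

    dialog is a list of {"role": ..., "content": ...} dicts. The result
    can be fed to tokenizer.encode() instead of apply_chat_template().
    """
    parts = [f"{msg['role']}: {msg['content']}" for msg in dialog]
    # Trailing label plays the part of add_generation_prompt=True.
    parts.append("assistant:")
    return "\n".join(parts)
```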

So we should find a systematic solution that handles both of these cases.
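One systematic approach could be to try the tokenizer's own chat template first and degrade gracefully on exactly the two exceptions seen in the tracebacks above (`jinja2.exceptions.TemplateError` when the template rejects the system role, `ValueError` when no template is set). A hedged sketch, not SimulStreaming's actual `build_prompt` (which also indexes `["input_ids"]` on the result; that detail is omitted here):

```python
import jinja2


def build_prompt_tokens(tokenizer, dialog):
    """Build prompt tokens, falling back when the chat template is limited.

    Fallback 1: fold the system message into the first user turn when the
    template rejects the system role (the gemma-2-9b-it failure above).
    Fallback 2: encode a plain-text rendering when no chat template exists
    at all (the gemma-2-9b failure above).
    """
    try:
        return tokenizer.apply_chat_template(
            dialog, tokenize=True, add_generation_prompt=True)
    except jinja2.exceptions.TemplateError:
        demoted = list(dialog)
        if len(demoted) > 1 and demoted[0]["role"] == "system":
            demoted[1] = {
                "role": demoted[1]["role"],
                "content": demoted[0]["content"] + "\n\n" + demoted[1]["content"],
            }
            demoted = demoted[1:]
        return tokenizer.apply_chat_template(
            demoted, tokenize=True, add_generation_prompt=True)
    except ValueError:
        # No chat template on the tokenizer: plain-text prompt.
        text = "\n".join(f"{m['role']}: {m['content']}" for m in dialog)
        return tokenizer.encode(text)
```

This keeps one code path for all models and only changes behavior for the ones that cannot express the full dialog.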
