
transformerslite

Train simple, lightweight transformer models in a few lines of code.

Implementation

  • Sequence Classification

from transformerslite import pipeline
from datasets import load_dataset

# for now, both a train and a valid file must be provided
data = load_dataset('csv', data_files={
    "train": "hg.csv",
    "valid": "hg2.csv"
})


training_pipeline = pipeline.SeqClassifier(data, 
                                           epochs=4, 
                                           max_input_length=32, 
                                           batch_size=1,
                                           learning_rate=0.0001, 
                                           num_class=2)
trainer, tokenizer = training_pipeline.model()
trainer.train()
  • Sequence-to-Sequence Modeling (t5-small)
from transformerslite import pipeline
from datasets import load_dataset

# for now, both a train and a valid file must be provided
data = load_dataset('csv', data_files={
    "train": "hg.csv",
    "valid": "hg2.csv"
})


training_pipeline = pipeline.T5Seq2Seq(data,
                                       max_input_length=32,
                                       max_target_length=32, 
                                       prefix='seq: ',
                                       epochs=4, 
                                       batch_size=1,
                                       learning_rate=0.0001)

trainer, tokenizer = training_pipeline.model()
trainer.train()

A spellchecker application is hosted on Hugging Face Spaces. It was fine-tuned on 50,000 sentences into which errors were randomly introduced. Try it out here.
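Exactly how the errors were introduced for the hosted spellchecker is not described here. One plausible way to build this kind of training data is sketched below: apply random character-level edits (deletion, insertion, substitution, adjacent swap) to clean sentences and keep the clean sentence as the target, giving (noisy input, clean target) pairs for a sequence-to-sequence model such as `T5Seq2Seq`. The edit types, the per-character `rate`, and the names `add_typos` and `make_pairs` are assumptions for illustration, not the procedure used to train the hosted model.

```python
import random
import string


def add_typos(sentence, rate=0.1, rng=None):
    """Hypothetical noise function: return a copy of `sentence` with random
    character-level errors.

    Each character is edited with probability `rate` using one of:
    delete it, insert a random letter after it, substitute a random letter,
    or swap it with the next character.
    """
    rng = rng or random.Random()
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if rng.random() < rate:
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(c)
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])
                out.append(c)
                i += 1  # the next character has already been consumed
            else:
                out.append(c)  # swap on the last character: leave it unchanged
        else:
            out.append(c)
        i += 1
    return "".join(out)


def make_pairs(sentences, rate=0.1, seed=0):
    """Build (noisy_input, clean_target) pairs; seeded for reproducibility."""
    rng = random.Random(seed)
    return [(add_typos(s, rate, rng), s) for s in sentences]
```

The resulting pairs could then be written to train and valid CSV files and loaded as in the sequence-to-sequence example above; the column names to use depend on what transformerslite expects and are not specified here.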