MultiLexNorm 2021 competition system from ÚFAL
Updated Dec 30, 2021 · Python
This repository contains a number of experiments with multilingual Transformer models (multilingual BERT, DistilBERT, XLM-RoBERTa, mT5, and ByT5) focused on the Dutch language.
Official code for "Fine-Tashkeel at KSAA-2026" — Systematic evaluation of 18 Seq2Seq, token classification, decoder LLM, and ASR models for automatic Arabic text diacritization. 5th place at KSAA-2026 Shared Task (OSACT7 @ LREC 2026).
Automated pipeline for expanding medieval Latin abbreviations encoded in TEI using fine-tuned ByT5. Drop your TEI files, run five scripts, and get a Hugging Face dataset plus a lightweight LoRA adapter for ByT5 that turns graphemic ATR output into expanded text.
Computational analysis of Sanskrit morphology and English narrative networks in the Mahābhārata using ByT5-Sanskrit NLP