Pre-train Hugging Face

Project Overview

This repository gathers reproducible resources for experimenting with Hugging Face Transformers on commodity hardware. It pairs two complementary notebooks with the raw WikiText dataset so that collaborators can explore both inference and continued pre-training workflows end-to-end.

Repository Structure

Loading Models and Inference with Hugging Face.ipynb - an inference guide covering manual and pipeline()-based sentiment analysis with DistilBERT, text generation with GPT-2, and mask filling with BERT on CUDA-enabled machines.
Optional Pre-training LLMs with Hugging Face.ipynb - demonstrates how to continue pre-training a language model with Hugging Face Transformers, including environment setup, tokenizer/model configuration, and evaluation checkpoints.
wikitext_dataset_train.txt / wikitext_dataset_test.txt - provide the WikiText corpus used throughout the notebooks for experimentation and benchmarking.
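
As a rough orientation for the inference notebook, the three pipeline() tasks it lists can be sketched as follows. This is not copied from the notebook: the Hub checkpoint IDs in TASKS are assumptions (the README names only the model families), and build_pipeline is a hypothetical helper introduced for illustration.

```python
# Assumed checkpoints: the README names DistilBERT, GPT-2 and BERT,
# not the exact Hub IDs, so these are illustrative choices.
TASKS = {
    "sentiment": ("sentiment-analysis", "distilbert-base-uncased-finetuned-sst-2-english"),
    "generation": ("text-generation", "gpt2"),
    "mask_filling": ("fill-mask", "bert-base-uncased"),
}


def build_pipeline(name):
    """Create a Transformers pipeline for one of the tasks in TASKS (hypothetical helper)."""
    # Imported lazily so the task table can be inspected without the libraries installed.
    import torch
    from transformers import pipeline

    task, checkpoint = TASKS[name]
    # device=0 selects the first CUDA GPU; device=-1 falls back to the CPU.
    device = 0 if torch.cuda.is_available() else -1
    return pipeline(task, model=checkpoint, device=device)


def main():
    """Run one example per task; downloads the checkpoints on first use."""
    print(build_pipeline("sentiment")("This notebook was easy to follow."))
    print(build_pipeline("generation")("Pre-training a language model", max_new_tokens=20))
    print(build_pipeline("mask_filling")("Paris is the [MASK] of France."))
```

Calling main() needs network access the first time, because each checkpoint is downloaded from the Hub. The manual path described in the notebook (explicit tokenizer and model calls) is not shown here.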

Usage

Create and activate a Python 3.13+ virtual environment, then install PyTorch and Transformers in it.
Open either notebook in JupyterLab or VS Code and follow the sequenced cells. The inference notebook walks through model loading, tokenisation, and decoding, while the pre-training notebook drives masked language modelling on WikiText.
Adjust hyperparameters, prompts, or dataset slices to extend experiments and record findings in new commits.
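
To make the masked language modelling step less abstract, here is a small pure-Python illustration of BERT-style token masking. It is a teaching sketch, not the notebook's code: in practice the Transformers class DataCollatorForLanguageModeling performs this step on tensors, and whether the notebook uses it is an assumption. The function mask_tokens and the toy token IDs are hypothetical.

```python
import random

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_tokens(token_ids, mask_id, vocab_size, mlm_probability=0.15, seed=0):
    """Return (inputs, labels) after BERT-style random masking.

    Each position is selected with probability mlm_probability. A selected
    position becomes mask_id 80% of the time, a random token 10% of the
    time, and stays unchanged 10% of the time. Labels keep the original
    token at selected positions and IGNORE_INDEX everywhere else.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for token in token_ids:
        if rng.random() >= mlm_probability:
            # Not selected: the model is not asked to predict this position.
            inputs.append(token)
            labels.append(IGNORE_INDEX)
            continue
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            inputs.append(mask_id)
        elif roll < 0.9:
            inputs.append(rng.randrange(vocab_size))
        else:
            inputs.append(token)
    return inputs, labels


# Toy IDs for illustration; 103 and 30522 are the [MASK] ID and vocabulary
# size of bert-base-uncased. A high probability is used so masking is visible.
example = [2023, 2003, 1037, 2460, 6251, 2055, 2653, 4275]
inputs, labels = mask_tokens(example, mask_id=103, vocab_size=30522, mlm_probability=0.5)
print(inputs)
print(labels)
```

The default of 0.15 matches the masking rate commonly used for BERT; changing mlm_probability is one of the hyperparameters that can be adjusted when extending the experiments.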

Contribution Guidelines

Keep notebooks executable from top to bottom; re-run before committing to ensure outputs align with the documented narrative.
Store additional datasets in dedicated subdirectories and reference them clearly from notebook metadata.
Prefer pull requests for substantive changes so reviewers can validate results and maintain project quality.
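
One lightweight way to check the first guideline before committing is to confirm that a notebook's code cells carry execution counts 1, 2, 3, ... in document order. The sketch below reads the notebook JSON directly; executed_in_order and check_file are hypothetical helpers, and the check is only a heuristic: it does not re-run anything or compare outputs.

```python
import json


def executed_in_order(notebook):
    """True when non-empty code cells have execution counts 1, 2, 3, ... in document order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        # "source" may be a string or a list of strings; join handles both.
        if cell.get("cell_type") == "code" and "".join(cell.get("source", [])).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


def check_file(path):
    """Load a .ipynb file and apply the same check."""
    with open(path, encoding="utf-8") as handle:
        return executed_in_order(json.load(handle))


# Two tiny in-memory notebooks: one run top to bottom, one run out of order.
fresh = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 1},
    {"cell_type": "code", "source": ["print(x)"], "execution_count": 2},
]}
stale = {"cells": [
    {"cell_type": "code", "source": ["x = 1"], "execution_count": 3},
    {"cell_type": "code", "source": ["print(x)"], "execution_count": 2},
]}
print(executed_in_order(fresh))  # True
print(executed_in_order(stale))  # False
```

For example, check_file("Loading Models and Inference with Hugging Face.ipynb") returns False if the notebook was saved after cells were run out of order or never run.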
