This repository contains code used to scrape and preprocess textual data, train models, and run thorough evaluations and experiments. It uses the following tech stack:
- Python
- PyTorch + PyTorch Lightning: model building and training
- MLflow, TensorBoard: experiment tracking, reproducibility
- `scripts/`: Entrypoints used directly to perform elements of the scraping/processing/training/evaluation pipeline
- `cfg/`: Hydra configuration files used by the scripts
- `src/`: Source code
  - `data/`: Resources related to data operations
    - `eda/`: Creating visualizations & statistics for raw or processed data
    - `processing/`: Extracting numerical features from raw datasets
    - `scraping/`: Scraping raw data
    - `structs/`: Classes used to access directories of pre-defined structure
  - `model/`: Definition of the models and model components
  - `utils/`: Utilities used globally
- Follow the Google Python Style Guide (https://google.github.io/styleguide/pyguide.html).
- Avoid overly long functions. Where possible, split them into smaller ones.
- Avoid using typedefs to define complex data structures. Make the most of `Pydantic` or `dataclasses` instead.
- Use docstrings for modules, classes, and functions. For public functions, use the Google docstring format; for private functions, a short description is enough. Always write in the 3rd person.
- Avoid unnecessary `try`-`except` blocks and, in general, deeply nested code.
- Every module should have its own `_logger()` function, defining a logger as `logging.getLogger(__name__)`.
- Do not use exceptions to handle unrecoverable problems. Use a critical log and `sys.exit()` instead.
- Do not use comments, unless absolutely necessary.
- Do not use redundant variables unless they contribute to the readability of the code. As a rule of thumb, if a variable would be assigned a short expression, just use the expression directly.
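A minimal sketch of the data-structure and docstring guidelines above, using a `dataclass` with Google-format docstrings (the `Sample` class and `normalize` function are hypothetical, not part of this repository):

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A single labelled text sample.

    Attributes:
        text: Raw text content.
        label: Integer class label.
    """

    text: str
    label: int


def normalize(sample: Sample) -> Sample:
    """Returns a copy of the sample with lowercased text.

    Args:
        sample: The sample to normalize.

    Returns:
        A new sample with normalized text and the original label.
    """
    return Sample(text=sample.text.lower(), label=sample.label)
```

The public function uses the full Google docstring format; a private helper would carry only a one-line description.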
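The logging and unrecoverable-error guidelines can be sketched as follows (the `read_corpus` function and its error message are illustrative assumptions):

```python
import logging
import sys


def _logger() -> logging.Logger:
    """Returns the module logger."""
    return logging.getLogger(__name__)


def read_corpus(path: str) -> str:
    """Reads a raw text corpus from disk.

    Args:
        path: Path to the corpus file.

    Returns:
        The file contents.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError:
        # Unrecoverable: log critically and exit instead of propagating
        # the exception up the stack.
        _logger().critical("Cannot read corpus file: %s", path)
        sys.exit(1)
```

Note that the single `try`-`except` here guards one genuinely fallible operation; it is not wrapped around unrelated logic.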
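The redundant-variable rule can be illustrated with a short hypothetical helper: a short expression is used directly rather than bound to a throwaway name first.

```python
def count_tokens(text: str) -> int:
    """Returns the number of whitespace-separated tokens in the text."""
    # Avoid:
    #   n_tokens = len(text.split())
    #   return n_tokens
    # Prefer using the short expression directly:
    return len(text.split())
```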