advanced-data-mining-project

The project provides tools for scraping textual restaurant reviews, which are then analysed with either traditional or advanced NLP approaches to gain insight into the relationship between a written review and its rating, helpfulness, etc.

We recommend checking out the page in doc/ that describes the conducted experiments.

Project structure

- .devcontainer   # Devcontainer setup.
- doc/            # Documentation and description of the experiments.
- scripts/        # Scripts for running the scraping process, EDA, etc.
- src/            # Source code.
- justfile        # Setup recipes (e.g. installing dependencies).
- pyproject.toml  # Core project configuration (version, dependencies, dev deps).

Usage

The project's user-facing API lives in the scripts/ directory, which contains scripts for running the data scraping and processing pipelines, training models, and producing visualizations and summaries. To set up the environment, build the project and run the scraper through a proxy:

just setup_env
just build_project
uv run python scripts/scrape_google_reviews.py +proxy.server=SERVER +proxy.username=USER +proxy.password=PASSWORD
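The `+proxy.server=...` arguments look like Hydra-style command-line overrides, where a leading `+` adds a key that is absent from the default config. This is an inference from the syntax, not something stated here. Assuming that convention, the standard-library sketch below shows how such dotted overrides map onto a nested configuration. `parse_overrides` is a hypothetical helper, not part of the project.

```python
import sys


def parse_overrides(args: list[str]) -> dict:
    """Turn ["+proxy.server=host", ...] into {"proxy": {"server": "host", ...}}."""
    config: dict = {}
    for arg in args:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"override must have the form key=value: {arg!r}")
        # Drop the leading '+' (Hydra's "add new key" marker) and walk the dotted path.
        path = key.lstrip("+").split(".")
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


if __name__ == "__main__":
    # e.g. python parse_overrides.py +proxy.server=SERVER +proxy.username=USER
    print(parse_overrides(sys.argv[1:]))
```

Keeping credentials as overrides rather than in a committed config file keeps them out of version control.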

The recommended workflow runs the scripts in the following order:

- scrape_google_reviews: create the raw dataset.
- process_dataset: extract all numerical features from the data.
- perform_eda: collect and visualize statistics describing the processed data.
- train_model: train a neural network that predicts the review sentiment.
- summarize_experiment: if multiple models have been trained, compose statistics and visualizations from their test results.
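The ordering above can be scripted with a small driver. The sketch below assumes each step lives at `scripts/<step>.py` (confirmed only for `scrape_google_reviews.py`) and is launched with `uv run python`, as in the usage example. The names `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical, and any arguments the later scripts need are unknown here, so they are passed in as per-step overrides.

```python
import subprocess
import sys
from typing import Optional

# Step order from the recommended workflow; file names assume scripts/<step>.py.
PIPELINE = [
    "scrape_google_reviews",
    "process_dataset",
    "perform_eda",
    "train_model",
    "summarize_experiment",
]


def build_commands(steps: list[str], overrides: Optional[dict[str, list[str]]] = None) -> list[list[str]]:
    """Build one `uv run python scripts/<step>.py [args...]` command per step."""
    overrides = overrides or {}
    return [
        ["uv", "run", "python", f"scripts/{step}.py", *overrides.get(step, [])]
        for step in steps
    ]


def run_pipeline(steps: list[str], overrides: Optional[dict[str, list[str]]] = None, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(steps, overrides):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the steps.
    run_pipeline(PIPELINE, dry_run="--run" not in sys.argv)
```

This driver runs each step once. Training several models before `summarize_experiment` would mean repeating the `train_model` step with different arguments.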

Development

Contribution.md lists the best practices that developers of this repository should follow.

Changelog

Changes to the project are recorded in Changelog.md.
