The project provides tools for scraping restaurant textual reviews, which are subsequently analysed using either traditional or advanced NLP approaches in order to get insight into relationship between written-form review and actual rating, helpfulness etc.
We recommend you to check out the page describing the conducted experiments.
- .devcontainer # Devcontainer setup.
- doc/ # Documentation, experiments description.
- scripts/ # Contains scripts for running scraping process, EDA etc.
- src/ # Source code.
- justfile # Contains setup recipes, check out for installing deps etc.
- pyproject.toml # Core project configuration (version, dependencies, dev deps).The project's public API, available for the user, can be found in the scripts directory. It contains scripts for running data scraping & processing pipelines, models training and obtaining visualizations and summaries.
just setup_env
just build_project
uv run python scripts/scrape_google_reviews.py +proxy.server=SERVER +proxy.username=USER +proxy.password=PASSWORD
The recommended workflow of using the project assumes the following order of running the scripts:
- scrape_google_reviews - create the raw dataset
- process_dataset - extract all numerical features from the data
- perform_eda - collect and visualize statistics describing the processed data
- train_model - train a neural network that predicts the review sentiment
- summarize_experiment - if multiple models are trained, use this script to compose stats and visualizations based on the test results
In Contribution.md, there's a list of best practices a developer of this repository should follow.
The changes made to the project are recorded to the Changelog.md file.