A simple Python program that evaluates a return-delimited list of URLs and determines whether each one has been MOVED or REMOVED.
Also included is a separate URL checker focused on GitHub, which captures additional metadata about live or moved URLs.
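The basic MOVED/REMOVED check can be sketched with only the standard library. This is a minimal illustration, not the program's actual implementation: the helper names (`read_urls`, `classify_status`, `check_url`), the use of `HEAD` requests, and the mapping of status codes to labels are all assumptions. Redirects are deliberately not followed, so a 3xx response stays visible and can be reported as MOVED together with its `Location` target.

```python
import urllib.error
import urllib.request
from typing import List, Optional, Tuple


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so that 3xx responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def read_urls(text: str) -> List[str]:
    """Split a return-delimited block of text into URLs, dropping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def classify_status(status: int) -> str:
    """Map an HTTP status code to a label (assumed mapping, for illustration)."""
    if 200 <= status < 300:
        return "LIVE"
    if 300 <= status < 400:
        return "MOVED"
    if status in (404, 410):
        return "REMOVED"
    return "ERROR"


def check_url(url: str, timeout: float = 10.0) -> Tuple[str, int, Optional[str]]:
    """Return (label, status code, redirect target) for a single URL."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return classify_status(response.status), response.status, None
    except urllib.error.HTTPError as err:
        # 3xx (because of NoRedirect), 4xx and 5xx responses all land here.
        return classify_status(err.code), err.code, err.headers.get("Location")
    except OSError:
        # URLError and socket-level failures: DNS errors, refused connections, timeouts.
        return "ERROR", 0, None
```

Some servers answer `HEAD` with 405, which this sketch reports as ERROR; retrying such URLs with a `GET` request would be a reasonable refinement.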
- Use the `config.json.example` file as a template for creating your own `config.json` file with your GitHub token inside it.
- Create a Python 3 virtualenv. We used Python 3.11 for testing, so something like: `python3.11 -m venv /my/venv/directory`
- Activate the virtualenv: `source /my/venv/directory/bin/activate`
- Install the required libraries: `pip install -r requirements.txt`
- Put any URLs you wish to check into the `input_urls.tsv` file.
- Run the program. For GitHub, it would be: `python github_url_checker7.py`
- Wait and watch the progress.
- Check the results.
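For the GitHub checker, the token from `config.json` has to be attached to requests against the GitHub REST API. The sketch below shows one way to do that; it is hypothetical throughout: the config key `github_token` and the helper names are assumptions, since the real layout of `config.json.example` and the internals of `github_url_checker7.py` are not described here. The endpoint `https://api.github.com/repos/{owner}/{repo}` is GitHub's documented "get a repository" route, which typically answers a renamed or transferred repository with a 301 redirect.

```python
import json
import urllib.request
from typing import Tuple
from urllib.parse import urlparse


def load_token(config_path: str = "config.json", key: str = "github_token") -> str:
    """Read the GitHub token from config.json (the key name is an assumption)."""
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)[key]


def parse_github_url(url: str) -> Tuple[str, str]:
    """Split https://github.com/owner/repo into (owner, repo)."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a github.com URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a GitHub repository URL: {url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def build_repo_request(url: str, token: str) -> urllib.request.Request:
    """Build an authenticated request for the repository metadata endpoint."""
    owner, repo = parse_github_url(url)
    api_url = f"https://api.github.com/repos/{owner}/{repo}"
    return urllib.request.Request(
        api_url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Opening such a request with the `NoRedirect` opener idea from a plain URL check would let a 301 be recorded as a moved repository; a 200 response carries a JSON body with fields such as `full_name` and `html_url` that could be kept as metadata.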
- `input_urls_1.tsv` - Set of repositories that failed collection during catch-up in August 2024.
- `input_urls_2.tsv` - Set of repositories in the process of collecting after being reset in August 2024.
- `input_urls_3.tsv` - Set of repositories that failed collection a second time in August 2024.
- `input_urls_4.tsv` - A more complete set of the same class of data as `input_urls_3.tsv`.
- `input_urls_5.tsv` - A complete list of ignored repositories.
- `input_urls_6.tsf` - Repositories in error.
The output `.tsv` files correspond to the input files with the same number.
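Reading an input file and pairing it with its numbered output file might look like the sketch below. It is illustrative only: it assumes the URL sits in the first tab-separated column with no header row, that outputs are named `output_<N>.tsv`, and that results are written as `url`, `status`, `http_code` columns. None of these details are confirmed by this README, and the helper names are hypothetical.

```python
import csv
import os
import re
from typing import Iterable, List, Tuple


def read_input_urls(path: str) -> List[str]:
    """Return the first column of every non-empty row (assumed file layout)."""
    urls = []
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls


def output_path_for(input_path: str) -> str:
    """Map input_urls_<N>.<ext> to output_<N>.tsv in the same directory (hypothetical naming)."""
    directory, name = os.path.split(input_path)
    match = re.fullmatch(r"input_urls_(\d+)\.\w+", name)
    out_name = f"output_{match.group(1)}.tsv" if match else "output.tsv"
    return os.path.join(directory, out_name)


def write_results(path: str, rows: Iterable[Tuple[str, str, int]]) -> None:
    """Write (url, status label, HTTP code) rows as a tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["url", "status", "http_code"])
        writer.writerows(rows)
```

Keying the output name on the input's number, rather than on a fixed `output.tsv`, keeps the results of separate runs from overwriting one another.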