A plagiarism checker for a final project
-
Clone this repository and make a virtual environment.
-
Windows
python -m venv .venv
.\.venv\Scripts\activate
-
Linux
python3 -m venv .venv
source ./.venv/bin/activate
-
-
Install the required packages.
python -m pip install -r requirements.txt
-
Start the webserver.
python server.py
-
Go to http://127.0.0.1:5000 to access the application.
The algorithm for creating scores and checking for plagiarism is as follows:
- Using Beautiful Soup and Google search, get the top 5 results for the user-inputted text.
- Get the paragraph text from each of the top 5 results. While getting the paragraph text, we tokenize it into sentences using nltk in the data_cleanup function.
- Store the sentences in a source object. Each source object has a URL property and a sentences property. These source objects are kept in an array.
- Turn the user-inputted text into sentences using nltk in the data_cleanup function.
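The data shapes described in the steps above can be sketched roughly as follows. The `Source` class name, its field names, the `split_sentences` helper, and the example URLs are all assumptions made for illustration. The project itself tokenizes with nltk inside data_cleanup, which this regex-based stand-in only approximates.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Source:
    """One search result: its URL plus the sentences scraped from it."""
    url: str
    sentences: list[str] = field(default_factory=list)


def split_sentences(text: str) -> list[str]:
    """Naive stand-in for nltk sentence tokenization (an assumption, not the project's code).

    Splits after '.', '!' or '?' when followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


# Hypothetical scraped paragraph text, keyed by URL.
pages = {
    "https://example.com/a": "First sentence. Second sentence!",
    "https://example.com/b": "Only one sentence here.",
}

# The array of source objects described above.
sources = [Source(url, split_sentences(text)) for url, text in pages.items()]
```

The same `split_sentences` helper would be applied to the user's text, so both sides of the comparison are lists of sentences.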
-
With the user-inputted sentences and the source sentences in hand, we can compare them using our algorithm. Every sentence the user entered is compared to every sentence in the results from the web scraping.
-
For each source object:
  - For each sentence in the source object:
    - For each sentence in the user-inputted text:
      - Compare the two sentences using the compare function.
      - If the score is greater than 0.5, we consider it worthwhile for the user to look at.
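One possible way to write the nested comparison loop is sketched below. The `find_matches` name, its signature, and the tuple layout are assumptions for illustration. The comparison function is passed in as a parameter, and the `exact` function is only a toy placeholder for the project's compare function.

```python
from typing import Callable, Iterable


def find_matches(
    user_sentences: Iterable[str],
    sources: Iterable[tuple[str, list[str]]],
    compare: Callable[[str, str], float],
    threshold: float = 0.5,
) -> list[tuple[str, str, str, float]]:
    """Return (url, source_sentence, user_sentence, score) for pairs scoring above threshold."""
    user_sentences = list(user_sentences)  # reused for every source sentence
    matches = []
    for url, source_sentences in sources:
        for source_sentence in source_sentences:
            for user_sentence in user_sentences:
                score = compare(source_sentence, user_sentence)
                # Strictly greater than the threshold, as described above.
                if score > threshold:
                    matches.append((url, source_sentence, user_sentence, score))
    return matches


# Toy comparison for demonstration only: 1.0 on exact match, else 0.0.
def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


hits = find_matches(
    ["Cats purr.", "Dogs bark."],
    [("https://example.com", ["Dogs bark.", "Birds sing."])],
    exact,
)
```

Here `hits` holds a single entry, for the "Dogs bark." pair. Every user sentence is checked against every source sentence, so the work grows with the product of the two sentence counts.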
-
Scores are calculated by turning each sentence into a set of words. We then take the Jaccard similarity of the two sets: the size of their intersection divided by the size of their union.
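A minimal sketch of that score, assuming words are obtained by lowercasing and splitting on whitespace. The project's compare function may normalize the text differently (punctuation handling, for example), so the `jaccard` name and the tokenization here are illustrative only.

```python
def jaccard(sentence_a: str, sentence_b: str) -> float:
    """Jaccard similarity of the two sentences' word sets: |A & B| / |A | B|."""
    # Assumption: lowercase and split on whitespace to get the word sets.
    words_a = set(sentence_a.lower().split())
    words_b = set(sentence_b.lower().split())
    union = words_a | words_b
    if not union:  # both sentences are empty, so avoid dividing by zero
        return 0.0
    return len(words_a & words_b) / len(union)


same = jaccard("the cat sat", "the cat sat")     # identical word sets
partial = jaccard("the cat sat", "the dog sat")  # 2 shared words out of 4 distinct
```

In this example `same` is 1.0 and `partial` is 0.5. Because a match must score above 0.5, the second pair would not be flagged.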
Test Cases:
-
Fully Plagiarized (Several Matches with 1.0 similarity): Many high schools across the country have brought much attention to the idea of giving random drug tests to students in high school. The newfound interest in student drug testing may be as a result of recent polls, which have shown an increase in drug use among high school students.
-
Plagiarized with Slight Modification (One Match 0.6956): A lot of high schools across the country have brought some attention to giving random drug tests in high school. This new interest in student drug testing might be from recent poll results, which show an increase in drug use from students.
-
Not Plagiarism (No Matches): Newfound polls show that there is an increase in drug use among high school students. This brings attention to the idea of random drug testing in schools.

