Putting an end to “It’s all Greek to me.”
A classifier that identifies Greek text as either Cypriot Greek (CG) or Standard Modern Greek (SMG)
For more information, you can read my thesis: A Classifier to Distinguish Between Cypriot Greek and Standard Modern Greek.
| Index of Jupyter Notebooks | Description |
|---|---|
| 1. Obtaining CG and SMG tweets | Collecting the corpus |
| 2. Data Analysis | Analyzing the corpus |
| 3. Building the Classifier | Building the CG-SMG classifier |
The corpus can be found in the Data directory. I collected it myself and labeled each text as CG or SMG by storing the two varieties in separate files.
| Index of files in corpus | Contents |
|---|---|
| CG Facebook | CG text collected from Facebook posts and comments |
| CG Twitter | CG text collected from tweets |
| CG Other | CG text collected from forum posts, as well as comments on blogs and news articles |
| SMG Facebook | SMG text collected from Facebook posts and comments |
| SMG Twitter | SMG text collected from tweets |
| SMG Other | SMG text collected from forum posts, as well as comments on blogs and news articles |
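Because the labels are encoded in the file layout rather than in the text itself, loading the corpus amounts to reading each file and tagging its lines with the variety named in the file. The sketch below is a minimal illustration under hypothetical assumptions: each file is UTF-8 plain text with one sample per line, and each file name begins with "CG" or "SMG". The real files in the Data directory may be formatted differently, and `load_corpus` is an illustrative name, not a function from the notebooks.

```python
from pathlib import Path

LABELS = ("CG", "SMG")


def load_corpus(data_dir):
    """Return a list of (text, label) pairs read from files in data_dir.

    Hypothetical layout: every file name starts with its label ("CG" or "SMG")
    and every non-empty line in a file is one text sample.
    """
    samples = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        # Take the label from the file-name prefix; skip files that don't match.
        label = next((lab for lab in LABELS if path.name.startswith(lab)), None)
        if label is None:
            continue
        with path.open(encoding="utf-8") as f:
            for line in f:
                text = line.strip()
                if text:  # ignore blank lines
                    samples.append((text, label))
    return samples
```

For example, `load_corpus("Data")` would return pairs such as `("...", "CG")`, provided the assumed naming scheme holds.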
To run the code, you can clone the repository, install the dependencies, and run the Jupyter notebooks on your local machine, or click the Binder badge at the top of this README to run the notebooks on a remote server.
To run the classifier on your own input, go to the last section of the notebook 3. Building the Classifier.
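This README does not describe which model the thesis uses, so the sketch below swaps in a common baseline purely to illustrate what classifying a single input string could look like: multinomial naive Bayes over character trigrams with add-one smoothing, written with the standard library only. The class `NaiveBayesDialectClassifier` and the toy training phrases are hypothetical and are not the classifier in the notebook.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of a lowercased string padded with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesDialectClassifier:
    """Multinomial naive Bayes over character n-grams (hypothetical baseline)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}              # label -> Counter of n-gram counts
        self.totals = {}              # label -> total number of n-grams seen
        self.doc_counts = Counter()   # label -> number of training texts
        self.vocab = set()            # all n-grams seen in training

    def fit(self, samples):
        """Train on an iterable of (text, label) pairs."""
        for text, label in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def predict(self, text):
        """Return the label with the highest log posterior for text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = self.totals[label] + vocab_size
            for gram in char_ngrams(text, self.n):
                score += math.log((counter[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    toy_train = [
        ("εν τζαι ξέρω", "CG"),
        ("τζαι εν καλά", "CG"),
        ("δεν ξέρω", "SMG"),
        ("δεν είναι καλά", "SMG"),
    ]
    clf = NaiveBayesDialectClassifier(n=3).fit(toy_train)
    print(clf.predict("εν τζαι"))    # CG on this toy data
    print(clf.predict("δεν είναι"))  # SMG on this toy data
```

Character n-grams are a natural choice for closely related varieties like these because many differences show up in spelling and short function words rather than in vocabulary alone; a four-sentence toy set says nothing about real accuracy.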