This Docker image builds on spaCy, textacy, pyLDAvis and other tools to analyse a text corpus, such as the collection of all published documents or any other CSV file with a text column.
It provides a range of machine learning and natural language processing algorithms that can be run on a corpus or a subset of it.
The project aims to provide these methods over a REST API when feasible.
Upload a CSV file, then click "Create a corpus" to access the pipeline composition page.
Topic modeling is used to find topics in the corpus. In machine learning and NLP, a topic model is a statistical model for identifying abstract "topics" in a document collection.
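To make the idea of a topic model concrete, here is a minimal standard-library sketch of Latent Dirichlet Allocation fitted with collapsed Gibbs sampling. It is an illustration only and is not necessarily the algorithm or library this application uses; the function name `fit_lda`, its parameters and the toy corpus are all hypothetical.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA topic model with collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] is the estimated share of topic k in document d and
    topic_word[k] is a Counter of the words assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        row = []
        for word in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic_counts[d][k] += 1
            topic_word[k][word] += 1
            topic_totals[k] += 1
        assignments.append(row)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word[k][word] -= 1
                topic_totals[k] -= 1
                # Resample in proportion to P(topic | doc) * P(word | topic).
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_totals[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word[k][word] += 1
                topic_totals[k] += 1

    # Smooth the per-document counts into proportions that sum to 1.
    doc_topic = [
        [(c + alpha) / (len(doc) + alpha * num_topics) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    return doc_topic, topic_word


# Hypothetical toy corpus, already tokenised.
example_docs = [
    "river water flood water".split(),
    "budget tax budget finance".split(),
    "flood river budget".split(),
]
shares, words = fit_lda(example_docs)
for k, counter in enumerate(words):
    print(k, counter.most_common(3))
```

Each document ends up with a mix of topics and each topic with a set of characteristic words, which is the kind of output that tools such as pyLDAvis visualise. A real corpus would need tokenisation, stop-word removal and far more iterations than this sketch assumes.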
docker-compose build
docker-compose up -d
This starts the application server on localhost:8181. It may take some time before the server responds.
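Because the server needs some time to come up, a script that talks to it can poll until it answers. The following is a minimal sketch, assuming the application responds to plain HTTP requests at the root URL; the helper name `wait_for_server` and the default timeout and interval are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8181", timeout=120.0, interval=2.0):
    """Poll url until it answers over HTTP or timeout seconds have elapsed.

    Returns True as soon as the server responds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            # Not reachable yet (for example, connection refused); retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` right after `docker-compose up -d` blocks until the server is reachable or the timeout expires, so later steps do not fail on a connection error.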
To produce the latest dataset, open the global catalogue, select "See all results", then "download csv". Once the CSV file is downloaded, pass it to this application for analysis. Make sure the "document text" to be analysed is in the first column; all other columns are treated as metadata.
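If the text is not already in the first column, the file can be rearranged before upload. This is a minimal sketch using Python's standard `csv` module, assuming the file is UTF-8 encoded and has a header row; the function name `move_text_column_first` and the column name in the usage note are hypothetical.

```python
import csv


def move_text_column_first(src_path, dst_path, text_column):
    """Copy a CSV file, moving the named text column to the first position.

    The remaining columns keep their relative order.
    """
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        # Raises ValueError if the column is not in the header.
        idx = header.index(text_column)

        def reorder(row):
            return [row[idx]] + row[:idx] + row[idx + 1:]

        writer.writerow(reorder(header))
        for row in reader:
            writer.writerow(reorder(row))
```

For example, `move_text_column_first("export.csv", "data.csv", "document text")` writes a copy with that column first. Very long text fields may exceed the `csv` module's default field size limit, which can be raised with `csv.field_size_limit`.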
For testing, you can download a large, ready-made corpus:
curl -L -o data.csv "https://www.dropbox.com/s/sihmoc4wwpl0kr2/data_all.csv?dl=1"
