NLP Corpus Analysis (alpha stage)

This Docker image bundles tools such as spaCy, textacy and pyLDAvis to analyse a text corpus, such as the collection of all published documents or any other CSV file with a text column.

It provides a range of Machine Learning and Natural Language Processing algorithms that can be run over a whole corpus or a subset of it.

The project aims to provide these methods over a REST API when feasible.

Current features

Compose a text transformation pipeline to prepare a corpus

Upload a CSV file, then click "Create a corpus" to access the pipeline composition page.
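
To illustrate what composing a text transformation pipeline means, here is a minimal sketch in plain Python. It is not the code used by this application: the step names (`lowercase`, `strip_punctuation`, `collapse_whitespace`) and the `compose` helper are hypothetical, chosen only to show how independent steps can be chained over a stream of documents.

```python
import re
from functools import reduce
from typing import Callable, Iterable

# A step takes a stream of documents and returns a transformed stream.
Step = Callable[[Iterable[str]], Iterable[str]]


def lowercase(docs: Iterable[str]) -> Iterable[str]:
    return (d.lower() for d in docs)


def strip_punctuation(docs: Iterable[str]) -> Iterable[str]:
    # Replace anything that is not a word character or whitespace with a space.
    return (re.sub(r"[^\w\s]", " ", d) for d in docs)


def collapse_whitespace(docs: Iterable[str]) -> Iterable[str]:
    return (" ".join(d.split()) for d in docs)


def compose(*steps: Step) -> Step:
    """Chain steps left to right into a single pipeline (hypothetical helper)."""
    def pipeline(docs: Iterable[str]) -> Iterable[str]:
        return reduce(lambda stream, step: step(stream), steps, docs)
    return pipeline


prepare = compose(lowercase, strip_punctuation, collapse_whitespace)
print(list(prepare(["Hello, World!", "  Topic   MODELS: an intro. "])))
# -> ['hello world', 'topic models an intro']
```

Because every step has the same shape (documents in, documents out), steps can be reordered, removed or added without touching the others, which is the property a pipeline composition page relies on.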

Create and visualise topic models via pyLDAvis.

Topic modelling is a technique for discovering the topics present in a corpus. In machine learning and NLP, a topic model is a statistical model for identifying abstract "topics" in a document collection.
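
As a rough illustration of what a topic model computes, the sketch below fits Latent Dirichlet Allocation (LDA) with a small collapsed Gibbs sampler, using only the standard library. It is not the implementation used by this image, and pyLDAvis itself only visualises an already fitted model. The function name `fit_lda` and its parameters (`alpha`, `beta`, `n_iter`, `seed`) are assumptions made for this example.

```python
import random
from typing import List, Tuple


def fit_lda(docs: List[List[str]], n_topics: int, n_iter: int = 200,
            alpha: float = 0.1, beta: float = 0.01,
            seed: int = 0) -> Tuple[List[str], List[List[float]]]:
    """Collapsed Gibbs sampling for LDA; returns (vocab, topic-word probabilities)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []                                        # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed probability of each word under each topic.
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
         for v in range(n_words)]
        for k in range(n_topics)
    ]
    return vocab, phi


docs = [
    "air pollution emissions air quality".split(),
    "emissions air pollution traffic".split(),
    "water river flood water quality".split(),
    "river water flood drought".split(),
]
vocab, phi = fit_lda(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
```

Each row of `phi` is a probability distribution over the vocabulary; the highest-probability words of a row are what a viewer such as pyLDAvis displays as the terms of a topic. The sampler is randomised, so the topics found on such a tiny corpus depend on the seed.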

Video demonstration

How to run:

docker-compose build
docker-compose up -d

This will start the application server on localhost:8181 after some time.
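
Since the server only becomes reachable "after some time", a script that depends on it may want to wait for the port first. The following is a small sketch under that assumption; `wait_for_port` is a hypothetical helper, not part of this project, and an open port only shows that something is listening, not that the application has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0,
                  interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

After `docker-compose up -d`, calling `wait_for_port("localhost", 8181)` returns `True` once the port accepts connections, or `False` if nothing is listening within the timeout.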

Corpus Data

To produce the latest dataset, visit the global catalogue > See all results > download csv. Once the CSV file is downloaded, you can pass it to this application for analysis. Make sure the "document text" to be analysed is in the first column; all other columns are treated as metadata.
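
The expected layout (text in the first column, metadata in the rest) can be checked before uploading. The sketch below uses Python's standard `csv` module and assumes the file has a header row, which the description above does not state; `split_text_and_metadata` and the sample column names are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def split_text_and_metadata(
    csv_file: Iterable[str],
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return the first column as texts and the remaining columns as metadata."""
    reader = csv.reader(csv_file)
    header = next(reader)          # assumption: the first row holds column names
    meta_names = header[1:]
    texts, metadata = [], []
    for row in reader:
        if not row:                # skip blank lines
            continue
        texts.append(row[0])
        metadata.append(dict(zip(meta_names, row[1:])))
    return texts, metadata


sample = io.StringIO(
    "text,title,year\n"
    '"Air quality improved, but unevenly.",Air report,2016\n'
    "Water use is rising.,Water report,2017\n"
)
texts, metadata = split_text_and_metadata(sample)
print(texts[0])      # -> Air quality improved, but unevenly.
print(metadata[0])   # -> {'title': 'Air report', 'year': '2016'}
```

For a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("data.csv", newline="", encoding="utf-8")`.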

For testing, you may download a large, already prepared corpus:

curl -L -o data.csv "https://www.dropbox.com/s/sihmoc4wwpl0kr2/data_all.csv?dl=1"
