Construct entity sense embeddings using DT and DeepWalk 

Background 
=====

- Sense Embeddings: https://arxiv.org/pdf/1805.04032.pdf 

- Our sense embedding approach: http://aclweb.org/anthology/W16-1620 

Data
====

1. A Distributional Thesaurus (DT)

- http://panchenko.me/data/joint/dt/common-crawl-2016/
- On the server: panchenko@ltdata1:/srv/data/depcc/distributional-models
- Use the model computed with wpf-1000 and fpw-2000:
http://panchenko.me/data/joint/dt/common-crawl-2016/dependency_lemz-true_cooc-false_mxln-110_semf-true_sign-LMI_wpf-1000_fpw-2000_minw-5_minf-5_minwf-2_minsign-0.0_nnn-200/SimPruned/
- For reference, these models were computed from this corpus: panchenko@ltdata1:/srv/data/depcc/corpus/sentences/cc-2016-en-nohtml-nonoise-sort.txt.gz
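A minimal sketch for loading the DT in Python. It assumes that each line of the SimPruned files has the form `word<TAB>neighbor<TAB>score`; we have not verified this layout, so check a few lines of the actual files first. `parse_dt_lines` and `load_dt` are hypothetical helper names, and the default of 200 neighbors simply mirrors the `nnn-200` part of the model name.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: one "word<TAB>neighbor<TAB>score" entry per line.
def parse_dt_lines(lines: Iterable[str], max_neighbors: int = 200) -> Dict[str, List[Tuple[str, float]]]:
    """Group DT entries by head word, keeping the highest-scoring neighbors first."""
    dt: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip empty or malformed lines
        word, neighbor, score = parts[0], parts[1], parts[2]
        try:
            dt[word].append((neighbor, float(score)))
        except ValueError:
            continue  # e.g. a header row with a non-numeric score
    # Sort each neighbor list by descending score and truncate it.
    return {
        w: sorted(nbrs, key=lambda x: x[1], reverse=True)[:max_neighbors]
        for w, nbrs in dt.items()
    }

def load_dt(path: str, max_neighbors: int = 200) -> Dict[str, List[Tuple[str, float]]]:
    """Read a plain or gzip-compressed DT file (gzip is detected by the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return parse_dt_lines(f, max_neighbors)
```

For the graph-based sketches further down, the result can be turned into a nested dict with `{w: dict(nbrs) for w, nbrs in dt.items()}`.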


2. Training datasets

- https://docs.google.com/spreadsheets/d/1reP1Lk2UbxTDZtC7K6LmiXdfeEIWKB432hMTCcB1U5c/edit?usp=sharing

- Vocabulary of the entities: https://docs.google.com/spreadsheets/d/1umTW0h8hGKqN1NSEpgds36qfhFZC4VO5dBjQ940dUY4/edit?usp=sharing

Code
=====

- Word sense induction (WSI): https://github.com/uhh-lt/chinese-whispers

- A more memory-efficient WSI implementation: https://github.com/nlpub/watset-java

- Disambiguate sense clusters: https://github.com/uhh-lt/sensegram/blob/master/pcz/make_closure.py
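To illustrate what the WSI step does, here is a small, self-contained version of Chinese Whispers applied to the ego network of one target word: the target's DT neighbors become nodes, the weighted DT edges among them become edges, and each resulting cluster is read as one sense of the target. This is a simplified sketch for understanding, not a replacement for the linked implementations; the nested-dict graph format (`word -> {neighbor: score}`) and the function names are our assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set

Graph = Dict[str, Dict[str, float]]  # assumed format: word -> {neighbor: score}

def ego_network(dt: Graph, target: str) -> Graph:
    """Neighbors of `target` plus the weighted edges among them; the target itself is left out."""
    nbrs = set(dt.get(target, {}))
    ego: Graph = {n: {} for n in nbrs}
    for u in nbrs:
        for v, w in dt.get(u, {}).items():
            if v in nbrs and v != u:
                ego[u][v] = w
                ego[v][u] = w  # treat the ego network as undirected
    return ego

def chinese_whispers(graph: Graph, iterations: int = 20, seed: int = 0) -> List[Set[str]]:
    """Basic Chinese Whispers: each node repeatedly adopts the label
    with the highest total edge weight among its neighbors."""
    rng = random.Random(seed)
    labels = {node: i for i, node in enumerate(graph)}  # every node starts in its own class
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            scores: Dict[int, float] = defaultdict(float)
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            if not scores:
                continue  # isolated node keeps its own label
            best = max(scores, key=scores.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # labels are stable
    clusters: Dict[int, Set[str]] = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return list(clusters.values())
```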


Steps
====

1. Take the DT and compute the coverage of the target entities from the entity vocabulary (https://docs.google.com/spreadsheets/d/1umTW0h8hGKqN1NSEpgds36qfhFZC4VO5dBjQ940dUY4/edit?usp=sharing), i.e., the fraction of target entities that occur in the DT. Report the coverage here.
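A sketch of the coverage computation, assuming the entity list and the DT vocabulary (the set of head words) are already loaded as strings. Since we do not know how multiword entities such as "Michael Jordan" are written in the DT (spaces, underscores, case), the hypothetical `normalize` function lowercases and maps underscores to spaces, and the sketch reports both exact and normalized coverage plus the list of missing entities.

```python
from typing import Dict, Iterable, Set

def normalize(term: str) -> str:
    # Assumption: the DT may use underscores or different casing for multiword entities.
    return term.strip().lower().replace("_", " ")

def coverage(entities: Iterable[str], dt_vocab: Iterable[str]) -> Dict[str, object]:
    """Share of (deduplicated) target entities found in the DT vocabulary."""
    vocab_exact: Set[str] = set(dt_vocab)
    vocab_norm = {normalize(w) for w in vocab_exact}
    ents = list(dict.fromkeys(e.strip() for e in entities if e.strip()))  # dedupe, keep order
    exact = [e for e in ents if e in vocab_exact]
    missing = [e for e in ents if normalize(e) not in vocab_norm]
    n = len(ents) or 1  # avoid division by zero on an empty list
    return {
        "total": len(ents),
        "exact": len(exact) / n,
        "normalized": (len(ents) - len(missing)) / n,
        "missing": missing,
    }
```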

2. Build a graph from the DT and compute its graph embeddings using DeepWalk.

- Prune edges with very small scores (e.g., t < 0.001) from the graph.
- Alternatively, or additionally, build a graph of the target entities and all their related words.
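One way to sketch this step in plain Python: build an undirected graph from the DT, drop edges scoring below the threshold, optionally keep only the target entities and their related words, and generate truncated random walks. Original DeepWalk samples neighbors uniformly and ignores edge weights; the `weighted=True` option, which samples proportionally to the DT score, is our own variant. Function names, the nested-dict graph format and the walk parameters are illustrative assumptions.

```python
import random
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Dict[str, float]]  # assumed format: word -> {neighbor: score}

def targets_with_neighbors(dt: Graph, targets: Iterable[str]) -> Set[str]:
    """Target entities plus all words the DT lists as related to them."""
    keep: Set[str] = set()
    for t in targets:
        if t in dt:
            keep.add(t)
            keep.update(dt[t])
    return keep

def build_graph(dt: Graph, min_score: float = 0.001, keep: Optional[Set[str]] = None) -> Graph:
    """Undirected graph from DT entries, pruning edges with score < min_score.
    If `keep` is given, an edge survives only when both endpoints are in it."""
    g: Graph = {}
    for u, nbrs in dt.items():
        for v, w in nbrs.items():
            if w < min_score or u == v:
                continue
            if keep is not None and (u not in keep or v not in keep):
                continue
            # Store both directions; if a pair appears twice, keep the larger score.
            for a, b in ((u, v), (v, u)):
                row = g.setdefault(a, {})
                row[b] = max(w, row.get(b, 0.0))
    return g

def random_walks(g: Graph, num_walks: int = 10, walk_length: int = 40,
                 weighted: bool = False, seed: int = 0) -> List[List[str]]:
    """`num_walks` truncated random walks starting from every node."""
    rng = random.Random(seed)
    nodes = list(g)
    walks: List[List[str]] = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # new start order on every pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = g.get(walk[-1])
                if not nbrs:
                    break  # dead end
                cands = list(nbrs)
                if weighted:
                    nxt = rng.choices(cands, weights=[nbrs[c] for c in cands], k=1)[0]
                else:
                    nxt = rng.choice(cands)  # uniform, as in original DeepWalk
                walk.append(nxt)
            walks.append(walk)
    return walks
```

The walks are then treated as sentences for a skip-gram model, for example with gensim 4.x: `Word2Vec(walks, vector_size=128, window=5, min_count=0, sg=1)`. The reference DeepWalk implementation can instead generate the walks and train the model itself.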

3. Report the nearest neighbors of some entities, such as Michael Jordan, here.
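For reporting neighbors, a plain cosine-similarity lookup is enough. The sketch assumes the embeddings are held in a dict from node name to vector; the key under which an entity such as Michael Jordan is stored depends on the DT tokenization, so any key used with it is hypothetical. With a trained gensim model, `model.wv.most_similar(key, topn=10)` does the same job.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbors(emb: Dict[str, Sequence[float]], query: str, k: int = 10) -> List[Tuple[str, float]]:
    """Top-k nodes by cosine similarity to `query`, excluding the query itself."""
    q = emb[query]
    scored = [(w, cosine(q, v)) for w, v in emb.items() if w != query]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]
```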

4. Create a disambiguated graph of senses using the code listed above: induce senses with WSI, then disambiguate the sense clusters with make_closure.py.
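A simplified sketch of what disambiguating sense clusters means: every related word inside a sense cluster is linked to that word's own sense whose cluster overlaps most with the current cluster (plus the target word), which turns word-level clusters into edges between senses. The overlap score, the `word#i` sense naming and the function names are our assumptions; `make_closure.py` may use a different similarity, so the linked code should be preferred for the actual run.

```python
from typing import Dict, List, Set, Tuple

Inventory = Dict[str, List[Set[str]]]  # assumed: word -> list of sense clusters

def sense_id(word: str, idx: int) -> str:
    return f"{word}#{idx}"  # hypothetical naming scheme for sense nodes

def disambiguate(inventory: Inventory) -> List[Tuple[str, str]]:
    """Link each neighbor in a sense cluster to the neighbor's sense whose
    cluster shares the most words with the current context."""
    edges: List[Tuple[str, str]] = []
    for word, clusters in inventory.items():
        for i, cluster in enumerate(clusters):
            context = cluster | {word}
            for nbr in sorted(cluster):
                nbr_senses = inventory.get(nbr)
                if not nbr_senses:
                    continue  # neighbor has no induced senses: leave it out
                # Highest overlap wins; ties go to the lowest sense index.
                best = max(range(len(nbr_senses)), key=lambda j: len(nbr_senses[j] & context))
                edges.append((sense_id(word, i), sense_id(nbr, best)))
    return edges
```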

5. Compute embeddings of the sense graph with DeepWalk, as in step 2. Report the nearest neighbors of some senses.
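As far as we know, the reference DeepWalk implementation reads an edge list whose node IDs are integers and ignores any weight column, so sense names such as `jordan#0` have to be mapped to integers before training and back afterwards. The sketch below does both; it assumes the output file is in word2vec text format (a `count dimension` header followed by one `id v1 v2 ...` line per node). File names and helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def to_int_edgelist(edges: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Map node names to consecutive integers; return 'u v' lines and the id -> name list."""
    ids: Dict[str, int] = {}
    lines: List[str] = []
    for u, v in edges:
        a = ids.setdefault(u, len(ids))
        b = ids.setdefault(v, len(ids))
        lines.append(f"{a} {b}")
    names = sorted(ids, key=ids.get)  # names[i] is the node with integer id i
    return lines, names

def write_edgelist(edges: Iterable[Tuple[str, str]], edgelist_path: str, names_path: str) -> None:
    """Write the integer edge list and, separately, the id -> name mapping."""
    lines, names = to_int_edgelist(edges)
    with open(edgelist_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    with open(names_path, "w", encoding="utf-8") as f:
        f.write("\n".join(names) + "\n")

def read_embeddings(lines: Iterable[str], names: List[str]) -> Dict[str, List[float]]:
    """Parse word2vec text output (assumed format) and map integer ids back to names."""
    it = iter(lines)
    next(it, None)  # skip the "<count> <dim>" header
    emb: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()
        if len(parts) < 2:
            continue
        emb[names[int(parts[0])]] = [float(x) for x in parts[1:]]
    return emb
```

For example, after writing `senses.edgelist` one might run `deepwalk --format edgelist --input senses.edgelist --output senses.embeddings` (check `deepwalk --help` for the options of the installed version), then load the result with `read_embeddings` and rank senses by cosine similarity.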
