<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title>SimLex-999</title>
<h1>SimLex-999</h1>
<p><i>SimLex-999</i> is a gold standard resource for the evaluation of models that learn the meaning of words and concepts.</p>
<p>SimLex-999 provides a way of measuring how well models capture <i>similarity</i>, rather than <i>relatedness</i> or <i>association</i>. The scores in SimLex-999 therefore differ from those in other well-known evaluation datasets such as <i>WordSim-353</i> (Finkelstein et al. 2002). The following two example pairs illustrate the difference: note that <i>clothes</i> are not similar to <i>closets</i> (different materials, function etc.), even though they are very much related:</p>
<br>
<table style="width:600px">
<tr>
<td><b>Pair</b></td>
<td><b>SimLex-999 rating</b></td>
<td><b>WordSim-353 rating</b></td>
</tr>
<tr>
<td><i>coast - shore</i></td>
<td align="right">9.00</td>
<td align="right">9.10</td>
</tr>
<tr>
<td><i>clothes - closet</i></td>
<td align="right">1.96</td>
<td align="right">8.00</td>
</tr>
</table>
<p>Our experiments indicate that SimLex-999 is challenging for computational models to replicate because, in order to perform well, they must learn to capture similarity independently of relatedness/association. This is hard because most language-based representation-learning models infer connections between concepts from their co-occurrence in corpora, and co-occurrence primarily reflects relatedness, not similarity. </p>
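<p>As a sketch of this evaluation protocol, the snippet below computes Spearman's rho between a model's similarity scores and the gold ratings. Only the coast-shore and clothes-closet ratings come from the table above; the other gold values and all the model scores are illustrative, not the output of a real system, and the rank-correlation helper ignores tie correction for simplicity.</p>

```python
# Minimal Spearman rank correlation (no tie correction) -- enough for a sketch.
def spearman(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Gold ratings: coast-shore and clothes-closet are from the table above;
# the other two pairs are illustrative placeholders.
gold = {("coast", "shore"): 9.00, ("clothes", "closet"): 1.96,
        ("student", "pupil"): 9.35, ("smart", "dumb"): 0.55}
# Hypothetical model scores (e.g. cosine similarities); a co-occurrence-based
# model would tend to overrate related-but-dissimilar pairs like clothes-closet.
model = {("coast", "shore"): 0.81, ("clothes", "closet"): 0.74,
         ("student", "pupil"): 0.77, ("smart", "dumb"): 0.12}

pairs = sorted(gold)
rho = spearman([gold[p] for p in pairs], [model[p] for p in pairs])
print(round(rho, 3))  # → 0.8
```

<p>The model is scored only on how it <i>ranks</i> the pairs, so the absolute scale of its similarity scores does not matter.</p>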
<br>
<p>In addition to general-purpose evaluations of semantic models, SimLex-999 is structured to facilitate focused evaluations based around the following conceptual distinctions: </p>
<ul>
<li><b>Concreteness:</b> Each concept in each SimLex-999 pair is rated for its conceptual concreteness. Because abstract concepts are more common than concrete concepts in most everyday language (<a href="http://www.cl.cam.ac.uk/~fh295/EMNLP_final.pdf"> and can behave quite differently in semantic models </a>), SimLex-999 contains a balanced selection of concrete (<i> dog, cup </i>) and abstract (<i> envy, deny </i>) concepts. </li>
<br>
<li><b>Part-Of-Speech:</b> SimLex-999 comprises 666 Noun-Noun pairs, 222 Verb-Verb pairs and 111 Adjective-Adjective pairs.</li>
<br>
<li><b>Free-Association:</b> SimLex-999 includes an independent empirical measure of the strength of association (or relatedness) between each of its pairs, taken from the <a href="http://w3.usf.edu/FreeAssociation/"> University of South Florida Free Association Norms.</a></li>
</ul>
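<p>The released file is tab-separated, one pair per row. The sketch below parses a file in that format and filters by part-of-speech; the column subset shown (word1, word2, POS, SimLex999) is an assumption about the release's layout, and the data rows other than the two table examples are illustrative.</p>

```python
# Parse a SimLex-999-style tab-separated file and filter pairs by POS.
# The column names here are an assumption about the released file's header.
SAMPLE = """word1\tword2\tPOS\tSimLex999
coast\tshore\tN\t9.00
clothes\tcloset\tN\t1.96
smart\tintelligent\tA\t9.20
"""

def load_simlex(text):
    lines = text.strip().split("\n")
    header = lines[0].split("\t")
    rows = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    for row in rows:
        row["SimLex999"] = float(row["SimLex999"])  # rating as a number
    return rows

rows = load_simlex(SAMPLE)
nouns = [r for r in rows if r["POS"] == "N"]  # Noun-Noun subset
print(len(rows), len(nouns))  # → 3 2
```

<p>With the full file, the same POS filter would yield the 666/222/111 Noun/Verb/Adjective splits described above.</p>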
<br>
<h2> Download the Dataset </h2>
<p><b><a href="SimLex-999.zip" target="_blank">Download SimLex-999 by clicking here</a></b>. All design details are outlined in the following paper (click to access). Please cite it if you use SimLex-999:</p>
<p><a href="http://arxiv.org/abs/1408.3456v1">SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation.</a> Felix Hill, Roi Reichart and Anna Korhonen. <i>Computational Linguistics.</i> 2015.</p>
<p>Contact Felix Hill (felix.hill@cl.cam.ac.uk) if your questions are not addressed in the paper.</p>
<br>
<h2> NEW: SimLex in Other Languages </h2>
<p> SimLex-999 is now in German, Italian and Russian thanks to Ira Leviant and Roi Reichart. See <a href="http://www.leviants.com/ira.leviant/MultilingualVSMdata.html"> this page </a> for more information.</p>
<p> SimLex-999 is now in Estonian thanks to Claudia Kittask and Eduard Barbu. See <a href="https://github.com/estsl/EstSimLex-999"> this page </a> for more information.</p>
<h2> State-of-the-Art </h2>
<p>The well-known <b>Skipgram (Word2Vec)</b> model trained on 1bn words of Wikipedia text achieves a Spearman Correlation of <b>0.37</b> with SimLex-999 [1]. </p>
<p>The best performance of a model trained on running <b>monolingual text</b> is a Spearman Correlation of <b>0.56</b> [2]. </p>
<p>A Neural Machine Translation Model (En->Fr) trained on a relatively small <b>bilingual corpus</b> achieves a Spearman Correlation of <b>0.52</b> [3]. </p>
<p>A model that exploits <b>curated knowledge-bases</b> (WordNet, Framenet etc) can reach a Spearman Correlation of <b>0.58</b> [4]. </p>
<p>NEW: A model that uses rich <b>paraphrase data</b> for training can reach a Spearman Correlation of <b>0.68</b> [5]. </p>
<p>NEWER: A hybrid model trained on features from various word embeddings and two lexical databases achieves a Spearman Correlation of <b>0.76</b> [6].</p>
<p>NEWERER: Counter-fitting word vectors to linguistic constraints also performs strongly on SimLex-999 [7].</p>
<p>The average pairwise Spearman correlation between two human raters is <b>0.67</b>. However, it may be fairer to compare the performance of models with the average correlation of a human rater with the average of all the other raters. This number is <b>0.78</b>. </p>
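<p>The two agreement protocols can be sketched as follows. The ratings below are three synthetic raters over six pairs (not the actual SimLex-999 annotations), and the Spearman helper ignores tie correction; the point is only that rating each annotator against the mean of the others yields a higher figure than averaging pairwise correlations, mirroring the 0.67 vs 0.78 gap reported above.</p>

```python
# Two inter-annotator agreement protocols, on synthetic ratings
# (NOT the real SimLex-999 annotations).
def spearman(xs, ys):
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Three synthetic raters scoring the same six word pairs.
raters = [
    [1, 2, 3, 4, 5, 6],
    [2, 1, 3, 4, 5, 6],  # swaps the two lowest-rated pairs
    [1, 2, 3, 4, 6, 5],  # swaps the two highest-rated pairs
]

# Protocol 1: average correlation over all rater pairs.
pairwise = [spearman(raters[i], raters[j])
            for i in range(len(raters)) for j in range(i + 1, len(raters))]
avg_pairwise = sum(pairwise) / len(pairwise)

# Protocol 2: each rater against the mean of all the other raters.
def mean_of_others(i):
    others = [r for j, r in enumerate(raters) if j != i]
    return [sum(col) / len(col) for col in zip(*others)]

loo = [spearman(raters[i], mean_of_others(i)) for i in range(len(raters))]
avg_loo = sum(loo) / len(loo)

print(round(avg_pairwise, 3), round(avg_loo, 3))  # → 0.924 0.962
```

<p>Averaging the other raters smooths out their individual noise, which is why the second protocol gives models a more generous human ceiling to be compared against.</p>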
<p> Please email felix.hill@cl.cam.ac.uk if you know of better performing models.</p>
<br>
<p>[1] <a href="http://arxiv.org/abs/1301.3781"><i>Efficient Estimation of Word Representations in Vector Space.</i></a> Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean. arXiv preprint arXiv:1301.3781. 2013.</p>
<p>[2] <a href="http://www.cs.huji.ac.il/~roys02/papers/sp_embeddings/sp_embeddings.html"><i>Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction.</i></a> Roy Schwartz, Roi Reichart and Ari Rappoport. CoNLL 2015. </p>
<p>[3] <a href="http://arxiv.org/abs/1412.6448"><i>Embedding Word Similarity with Neural Machine Translation.</i></a> Felix Hill, KyungHyun Cho, Sebastien Jean, Coline Devin and Yoshua Bengio. ICLR. 2015. </p>
<p>[4] <a href="http://www.aclweb.org/anthology/P15-2076"><i>Non-Distributional Word Vector Representations.</i></a> Manaal Faruqui and Chris Dyer. ACL. 2015. </p>
<p>[5] <a href="http://ttic.uchicago.edu/~wieting/wieting2015TACL.pdf"><i>From Paraphrase Database to Compositional Paraphrase Model and Back.</i></a> John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu and Dan Roth. TACL 2015. </p>
<p>[6] <a href="http://hlt.bme.hu/en/publ/Recski_2016c"><i>Measuring semantic similarity of words using concept networks.</i></a> Gabor Recski, Eszter Iklodi, Katalin Pajkossy and Andras Kornai. To appear in RepL4NLP 2016. </p>
<br>
<p>[7] <a href="https://arxiv.org/abs/1603.00892"><i>Counter-fitting Word Vectors to Linguistic Constraints.</i></a> Nikola Mrksic et al. NAACL 2016. </p>
<br>
<h2> Annotator Instructions </h2>
<p> SimLex-999 was produced by mining the opinions of 500 annotators via Amazon Mechanical Turk. See below for annotator instructions.</p>
<br>
<img src="Screenshot1.png" alt="SimLex-999 annotator instructions as shown on Amazon Mechanical Turk" width="1000" height="500" align="middle" style="margin:0px 20px">
<br>