datasets.tsv
ID AUTHOR YEAR TAGS SOURCE_LANGUAGE TARGET_LANGUAGE URL REFS NOTE ALIAS
Bond-2013-OMW Bond, Francis and Foster, Ryan 2013 relations,ontology English ENGLISH http://compling.hss.ntu.edu.sg/omw/ Bond2013 This is an automated mapping to the [Open Multilingual Wordnet](http://compling.hss.ntu.edu.sg/omw/), derived from the Princeton Wordnet ([Fellbaum 1998](:bib:Fellbaum1998)). The pre-selection of lexical items itself is based on a selection of *core* items from the Open Multilingual Wordnet, which are supposed to represent basic vocabulary ([Boyd-Graber et al. 2006](:bib:BoydGraber2006)). OMW
Alonso-2015-AoA Alonso, M. A. and Fernandez, A. and Diez, E. 2015 ratings Spanish (Spain) SPANISH https://doi.org/10.3758/s13428-014-0454-2 Alonso2015 This list includes subjective estimations of age of acquisition (AoA) for Spanish words. The ratings were collected from college students in Spain. Oral frequency norms are taken from [Alonso et al. (2011)](:bib:Alonso2011).
Brysbaert-2009-Frequency Brysbaert, Marc and New, Boris 2009 norms English (US) ENGLISH https://doi.org/10.3758/BRM.41.4.977 Brysbaert2009 This list includes word frequencies based on television and film subtitles in US American English. SUBTLEX-US
Brysbaert-2011-Frequency Brysbaert, Marc and Buchmeier, Matthias and Conrad, Markus and Jacobs, Arthur M. and Boelte, Jens and Boehl, Andrea 2011 norms German GERMAN https://www.ncbi.nlm.nih.gov/pubmed/21768069 Brysbaert2011 This list includes word frequencies based on television and film subtitles in German. SUBTLEX-DE
Brysbaert-2014a-Concreteness Brysbaert, Marc and Warriner, Amy Beth and Kuperman, Victor 2014 ratings English ENGLISH https://doi.org/10.3758/s13428-013-0403-5 Brysbaert2014 This list includes concreteness ratings from over 4,000 participants by means of a norming study using Internet crowdsourcing for data collection. The ratings were obtained on a 5-point rating scale going from abstract to concrete.
Brysbaert-2019-Prevalence Brysbaert, Marc and Mandera, Pawel and McCormick, Samantha F. and Keuleers, Emmanuel 2019 ratings English ENGLISH https://doi.org/10.3758/s13428-018-1077-9 Brysbaert2019 This list includes word prevalence ratings. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people.
Cai-2010-Frequency Cai, Q. and Brysbaert, M. 2010 norms Chinese CHINESE http://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexch/cai.pdf Cai2010 This list includes word frequencies based on television and film subtitles in Chinese. SUBTLEX-CH
Cuetos-2011-Frequency Cuetos, Fernando and Glez-Nosti, Maria and Barbon, Analia and Brysbaert, Marc 2011 norms Spanish SPANISH http://crr.ugent.be/papers/CUETOS%20et%20al%202011.pdf Cuetos2011 This list includes Spanish word frequencies taken from contemporary movies and TV series (screened between 1990 and 2009). SUBTLEX-ESP
Desrochers-2009-SubjFrequency Desrochers, Alain and Thompson, Glenn L. 2009 ratings French FRENCH https://doi.org/10.3758/BRM.41.2.546 Desrochers2009 This list includes subjective frequency and imageability estimates for French nouns. The data were collected from two independent groups of 72 young adults each. They rated the words on a 7-point scale.
Engelthaler-2018-Humor Engelthaler, Tomas and Hills, Thomas T. 2018 ratings English ENGLISH https://link.springer.com/article/10.3758/s13428-017-0930-6 Engelthaler2018 This list includes humor ratings for English words. The data was collected from 821 participants using an online crowd-sourcing platform. Each participant rated 211 words on a scale from 1 (humorless) to 5 (humorous).
Juhasz-2013-SER Juhasz, Barbara J. and Yap, Melvin J. 2013 ratings English ENGLISH https://doi.org/10.3758/s13428-012-0242-9 Juhasz2013 This list includes sensory experience ratings (SER) for English words. Sensory experience ratings reflect the extent to which a word evokes a sensory and/or perceptual experience in the mind of the reader. Participants were asked to rate the degree to which each word evoked a sensory experience, on a 1 to 7 scale, with higher numbers indicating a greater sensory experience.
Keuleers-2010-Frequency Keuleers, Emmanuel and Brysbaert, Marc and New, Boris 2010 norms Dutch DUTCH https://doi.org/10.3758/BRM.42.3.643 Keuleers2010 This list includes word frequencies based on television and film subtitles in Dutch. SUBTLEX-NL
Kuperman-2012-AoA Kuperman, Victor and Stadthagen-Gonzalez, Hans and Brysbaert, Marc 2012 ratings English ENGLISH https://doi.org/10.3758/s13428-012-0210-4 Kuperman2012 This list includes age-of-acquisition (AoA) ratings for English content words (nouns, verbs, and adjectives). For data collection, this megastudy used the Web-based crowdsourcing technology offered by Amazon Mechanical Turk. Since the download link used for this dataset is no longer functional, part of the data can be found in the dataset folder Green-2025b-AoA.
Riegel-2015-AffectiveRatings Riegel, Monika and Wierzba, Malgorzata and Wypych, Marek and Zurawski, Lukasz and Jednorog, Katarzyna and Grabowska, Anna and Marchewka, Artur 2015 ratings Polish POLISH https://doi.org/10.3758/s13428-014-0552-1 Riegel2015 This list includes the Nencki Affective Word List (NAWL). The items were translated from German to Polish based on the stimuli in the Berlin Affective Word List-Reloaded (BAWL-R: [Vo et al. 2009](:bib:Vo2009)). The data include nouns, verbs, and adjectives, with ratings of emotional valence, arousal, and imageability. NAWL
Scott-2019-Ratings Scott, Graham G. and Keitel, Anne and Becirspahic, Marc and Yao, Bo and Sereno, Sara C. 2019 ratings English ENGLISH https://doi.org/10.3758/s13428-018-1099-3 Scott2019 This list includes the Glasgow Norms: a set of normative ratings for English words on nine psycholinguistic dimensions: arousal, valence, dominance, concreteness, imageability, familiarity, age of acquisition, semantic size, and gender association. The first three values (arousal, valence, dominance) are rated on a 9-point scale, all others are rated on a 7-point scale. Glasgow Norms
StadthagenGonzalez-2017-ValenceArousal Stadthagen-Gonzalez, Hans and Imbault, Constance and Sanchez, Miguel A Perez and Brysbaert, Marc 2017 ratings Spanish SPANISH https://doi.org/10.3758/s13428-015-0700-2 StadthagenGonzalez2017 This list includes valence and arousal ratings for Spanish words. Participants rated the words on a 9-point scale.
Starostin-2000-Sense Starostin, Sergei 2000 relations English ENGLISH http://starling.rinet.ru/program.php?lan=en Starostin2000a This list includes sense relations for English submitted with the STARLING database program.
Warriner-2013-AffectiveRatings Warriner, Amy Beth and Kuperman, Victor and Brysbaert, Marc 2013 ratings English ENGLISH https://doi.org/10.3758/s13428-012-0314-x Warriner2013 This list includes ratings on valence (the pleasantness of a stimulus), arousal (the intensity of emotion provoked by a stimulus), and dominance (the degree of control exerted by a stimulus). Participants rated the words on a 9-point scale.
Cortese-2008-AoA Cortese, M. J. and Khanna, M. M. 2008 ratings English ENGLISH https://doi.org/10.3758/BRM.40.3.791 Cortese2008 This list includes age of acquisition (AoA) ratings made on a 1-7 scale for monosyllabic words of English. The data were obtained from 32 participants.
Keuleers-2012-LexicalDecision Keuleers, Emmanuel and Lacey, Paula and Rastle, Kathleen and Brysbaert, Marc 2012 norms English (British) ENGLISH https://doi.org/10.3758/s13428-011-0118-4 Keuleers2012 This list includes lexical decision times for English words, for which two groups of British participants each responded to monosyllabic and disyllabic words. British Lexicon Project
Ferrand-2010-LexicalDecision Ferrand, Ludovic and New, Boris and Brysbaert, Marc and Keuleers, Emmanuel and Bonin, Patrick and Meot, Alain and Augustinova, Maria and Pallier, Christophe 2010 norms French FRENCH https://doi.org/10.3758/BRM.42.2.488 Ferrand2010 This list includes lexical decision times for French words. French Lexicon Project
GonzalezNosti-2014-LexicalDecision Gonzalez-Nosti, Maria and Barbon, Analia and Rodriguez-Ferreiro, Javier and Cuetos, Fernando 2014 norms, ratings Spanish (Spain) SPANISH https://doi.org/10.3758/s13428-013-0383-5 GonzalezNosti2014 This list includes lexical decision times for Spanish words. In addition, the study collected AoA ratings from 25 psychology students on a 7-point Likert scale in which 1 corresponded to ages between 0 and 2 years old, 2 to ages between 2 and 4, and so on up to 7, which corresponded to ages over 12 years old.
Tsang-2018-LexicalDecision Tsang, Yiu-Kei and Huang, Jian and Lui, Ming and Xue, Mingfeng and Chan, Yin-Wah Fiona and Wang, Suiping and Chen, Hsuan-Chih 2018 norms Chinese CHINESE https://doi.org/10.3758/s13428-017-0944-0 Tsang2018 This list includes lexical decision times for Chinese words. MELD-SCH
Keuleers-2015-Prevalence Keuleers, Emmanuel and Stevens, Michael and Mandera, Pawel and Brysbaert, Marc 2015 ratings Dutch (Belgium, Netherlands) DUTCH https://doi.org/10.1080/17470218.2015.1022560 Keuleers2015 This list includes word prevalence ratings. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving nearly 300,000 Dutch speakers in Belgium and the Netherlands.
StadthagenGonzalez-2018-DiscreteEmotions Stadthagen-Gonzalez, Hans and Ferre, Pilar and Perez-Sanchez, Miguel A. and Imbault, Constance and Hinojosa, Jose Antonio 2018 ratings Spanish (Spain) SPANISH https://doi.org/10.3758/s13428-017-0962-y StadthagenGonzalez2018 This list includes ratings for discrete emotion categories (happiness, sadness, anger, fear, and disgust). The ratings were obtained on a scale from 1-5 for each category. In addition, the dataset includes norms on PoS taken from [Duchon et al. (2013)](:bib:Duchon2013).
Alonso-2016-AoA Alonso, Maria Angeles and Diez, Emiliano and Fernandez, Angel 2016 ratings Spanish (Spain) SPANISH https://doi.org/10.3758/s13428-015-0675-z Alonso2016 This list includes subjective estimations of age-of-acquisition (AoA) for Spanish verbs. The ratings were collected from college students in Spain.
Imbir-2021-Ratings Imbir, Kamil K. 2021 ratings Polish POLISH https://doi.org/10.3389/fpsyg.2021.707540 Imbir2021 This list includes ratings on psycholinguistic measures of Valence, Arousal, Dominance, Origin, Significance, Concreteness, Imageability, and subjective Age of Acquisition. The participants were students (excluding psychology students). The ratings were based on different Self-Assessment Manikin (SAM) scales. The author published a [Corrigendum](https://www.frontiersin.org/articles/10.3389/fpsyg.2021.707540/full) and updated the original data set in Imbir ([2016](:bib:Imbir2016)).
Ferre-2017-DiscreteEmotions Ferre, Pilar and Guasch, Marc and Martinez-Garcia, Natalia and Fraga, Isabel and Hinojosa, Jose Antonio 2017 ratings Spanish SPANISH https://doi.org/10.3758/s13428-016-0768-3 Ferre2017 This list includes ratings on discrete emotions for Spanish words in five discrete emotion categories: happiness, anger, fear, disgust, and sadness. The participants rated the words on a 5-point scale.
Wierzba-2015-DiscreteEmotions Wierzba, Malgorzata and Riegel, Monika and Wypych, Marek and Jednorog, Katarzyna and Turnau, Pawel and Grabowska, Anna and Marchewka, Artur 2015 ratings Polish POLISH https://doi.org/10.1371/journal.pone.0132305 Wierzba2015 This list includes ratings on discrete emotions for Polish words in five discrete emotion categories: happiness, anger, fear, disgust, and sadness. The participants rated the words on a 7-point scale. The items were based on the stimuli in [Riegel et al. (2015)](:bib:Riegel2015). NAWL BE
Alonso-2011-OralFrequency Alonso, Maria Angeles and Fernandez, Angel and Diez, Emiliano 2011 norms Spanish SPANISH https://doi.org/10.3758/s13428-011-0062-3 Alonso2011 This list includes frequency norms for spoken words based on a corpus of over three million units, representing present-day use of the language in Spain.
Lynott-2020-Sensorimotor Lynott, Dermot and Connell, Louise and Brysbaert, Marc and Brand, James and Carney, James 2020 ratings English ENGLISH https://doi.org/10.3758/s13428-019-01316-z Lynott2020 This list includes ratings across six perceptual modalities (touch, hearing, smell, taste, vision, and interoception) and five action effectors (mouth/throat, hand/arm, foot/leg, head excluding mouth/throat, and torso), gathered from a total of 3,500 individual participants using the Amazon Mechanical Turk platform. The ratings were based on a 5-point scale. Lancaster Sensorimotor Norms
Kapucu-2018-EmotionRatings Kapucu, Aycan and Kilic, Asli and Ozkilic, Yildiz and Saribaz, Bengisu 2018 ratings Turkish TURKISH https://doi.org/10.1177/0033294118814722 Kapucu2018 This list includes ratings on two major dimensions of emotion: arousal and valence, as well as on five basic emotion categories of happiness, sadness, anger, fear, and disgust. In addition, ratings on concreteness were collected. The items were translated from the Affective Norms for English Words (ANEW: [Bradley and Lang 1999](:bib:Bradley1999)) to Turkish.
Briesemeister-2011-DiscreteEmotions Briesemeister, Benny B. and Kuchinke, Lars and Jacobs, Arthur M. 2011 ratings German GERMAN https://doi.org/10.3758/s13428-011-0059-y Briesemeister2011 This list includes discrete emotions ratings for the categories happiness, sadness, anger, fear, and disgust. The ratings on German nouns were collected from university students (including psychology) on a 5-point Likert scale (1 = low intensity, 5 = strong intensity). The list is based on the Berlin Affective Word List-Reloaded (BAWL-R: [Vo et al. 2009](:bib:Vo2009)). DENN–BAWL
Mandera-2015-Frequency Mandera, Pawel and Keuleers, Emmanuel and Wodniecka, Zofia and Brysbaert, Marc 2015 norms Polish POLISH https://doi.org/10.3758/s13428-014-0489-4 Mandera2015 This list includes word frequencies based on television and film subtitles in Polish. SUBTLEX-PL
Moors-2013-Ratings Moors, Agnes and De Houwer, Jan and Hermans, Dirk and Wanmaker, Sabine and Van Schie, Kevin and Van Harmelen, Anne-Laura and De Schryver, Maarten and De Winne, Jeffrey and Brysbaert, Marc 2013 ratings Dutch DUTCH https://doi.org/10.3758/s13428-012-0243-8 Moors2013 This list includes ratings on valence, arousal and age-of-acquisition for Dutch words. The ratings were conducted on a 7-point Likert scale. The participants were students from the Netherlands and Belgium.
Wu-2020-CoreVocabulary Wu, Winston and Nicolai, Garrett and Yarowsky, David 2020 relations Global ENGLISH https://www.aclweb.org/anthology/2020.lrec-1.519.pdf Wu2020 This list was created automatically to present a core vocabulary. The automatic creation was based on the relative coverage of each target concept across 1895 bilingual dictionaries in the LanguageNet multilingual lexicon [(Baldwin et al., 2010)](:bib:Baldwin2010). The LINE_IN_SOURCES column represents the ranking of the words in the original list.
Mohammad-2018-AffectiveRatings Mohammad, Saif M. 2018 ratings English ENGLISH https://doi.org/10.18653/v1/P18-1017 Mohammad2018a This list includes valence, arousal, and dominance ratings for English words. The words were rated by participants on a crowd-sourcing platform. The ratings were obtained by the annotation of best-worst scaling for four words. In addition, the author offers translations of the words for various languages (http://sentiment.nrc.ca/lexicons-for-research/). NRC VAD Lexicon
Mohammad-2018-EmotionIntensity Mohammad, Saif M. 2018 ratings English ENGLISH https://www.aclweb.org/anthology/L18-1027.pdf Mohammad2018b This list includes ratings of dominant emotion (anger, fear, joy, sadness, disgust, anticipation, trust, surprise) and intensity for English words. The dominant emotion is based on pointwise mutual information in the Hashtag Emotion Corpus [(Mohammad 2012)](:bib:Mohammad2012). The words were rated for emotion intensity by participants on a crowd-sourcing platform. The ratings were obtained by the annotation of best-worst scaling for four words. In addition, the author offers translations of the words for various languages (http://sentiment.nrc.ca/lexicons-for-research/). NRC Affect Intensity Lexicon
Clark-2004-ImageryFamiliarity Clark, James M. and Paivio, Allan 2004 ratings English ENGLISH https://doi.org/10.3758/BF03195584 Clark2004 This list includes ratings for imageability and familiarity for English words. The data set is an extension of the norms in [Paivio et al. (1968)](:bib:Paivio1968). The words were rated by psychology students on a 7-point scale. Imageability ratings for words with a number higher than 0 in the column PAVIO_NORMS were taken from [Paivio et al. (1968)](:bib:Paivio1968).
Abdaoui-2017-EmoLex Abdaoui, Amine and Azé, Jérôme and Bringay, Sandra and Poncelet, Pascal 2017 relations French FRENCH https://doi.org/10.1007/s10579-016-9364-5 Abdaoui2017 This list represents the French Expanded Emotion Lexicon (FEEL). It includes the results of a sentiment analysis of texts according to the emotions associated with the text. The authors provide information for the dominant polarity of a given word (negative/positive) and the dominant emotion(s) associated with the word (joy, anger, surprise, sadness, disgust, fear) presented as binary values (1/0). FEEL
Matisoff-2015-STEDT Matisoff, James A. 2015 relations English Proto-Sino-Tibetan https://stedt.berkeley.edu/ Matisoff2015 This list represents the semantic categorization of the glosses in the Sino-Tibetan Etymological Dictionary and Thesaurus (STEDT). It is an etymological dictionary of Proto-Sino-Tibetan (PST), the ancestor language of the large Sino-Tibetan language family. This family includes Chinese, Tibetan, Burmese, and over 200 other languages spoken in South and Southeast Asia. STEDT
Kiss-1973-EAT Kiss, G. and Armstrong, Christine and Milroy, R. and Piper, J. 1973 relations English English http://vlado.fmf.uni-lj.si/pub/networks/data/dic/eat/Eat.htm Kiss1973 The Edinburgh Associative Thesaurus offers user ratings for a large list of English words. The data themselves are no longer officially available, since the website went down, but we found parts of the data distributed along with the Pajek project for network analysis and visualization. We computed weighted degree and unweighted degree per concept from the data themselves. EAT
Wikidata 2020 relations English Global https://www.wikidata.org/ Wikidata This list includes matches between the words in Wikidata and Concepticon concepts. The words were extracted automatically and the mapping was semi-automated, with double mappings checked by hand. The Wikidata repository consists mainly of items, each one having a label, a description and any number of aliases. Wikidata
OmegaWiki 2020 relations English Global http://www.omegawiki.org OmegaWiki This list includes matches between the words in the OmegaWiki and Concepticon concepts. OmegaWiki aims to create a dictionary of all words of all languages, including lexical, terminological and ontological information. OmegaWiki
Babelnet 2020 relations English Global http://babelnet.org BabelNet This list includes matches between the words in BabelNet and Concepticon concepts. BabelNet is an extension of WordNet and offers word senses in multiple languages. Babelnet
Numerals 2020 relations Global Global This list includes matches between integer numbers and Concepticon concepts.
Crepaldi-2015-Frequency Crepaldi, D. and Amenta, S. and Pawel, M. and Keuleers, E. and Brysbaert, Marc 2015 norms Italian Italian https://lrlac.sissa.it/publications/subtlex-it-subtitle-based-word-frequency-estimates-italian Crepaldi2015 This list includes word frequencies based on television and film subtitles in Italian. SUBTLEX-IT
VanHeuven-2014-Frequency Van Heuven, Walter J.B. and Mandera, Pawel and Keuleers, Emmanuel and Brysbaert, Marc 2014 norms English English http://crr.ugent.be/archives/1423 VanHeuven2014 This list includes word frequencies based on television and film subtitles in UK-English. SUBTLEX-UK
Medler-2005-Perceptual Medler, David A. and Arnoldussen, Aimee and Binder, Jeffrey R. and Seidenberg, Mark S. 2005 ratings English English http://www.neuro.mcw.edu/ratings/ Medler2005 This database contains mean perceptual attribute ratings in 4 sensory-motor domains (Sound, Color, Manipulation, Motion) for 1402 words, as well as Emotion ratings reflecting intensity and valence of emotional associations for the same words. Wisconsin Perceptual Attribute Rating Database
Gilhooly-1980-Ratings Gilhooly, Kenneth J. and Logie, Robert H. 1980 ratings English English https://link.springer.com/content/pdf/10.3758/BF03201693.pdf Gilhooly1980 This list contains age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 English words (nouns) of varying length and frequency.
Speed-2022-Sensorimotor Speed, Laura J. and Brysbaert, Marc 2022 ratings Dutch Dutch https://doi.org/10.3758/s13428-021-01656-9 Speed2022 This list contains sensory modality ratings for 24,000 Dutch words. The modalities include audition, gustation, haptics, olfaction, vision, and interoception.
Chedid-2019-Familiarity Chedid, Georges and Wilson, Maximilliano A. and Bedetti, Christophe and Rey, Amandine E. and Vallet, Guillaume T. and Brambati, Simona Maria 2019 ratings French French Chedid2019a This list contains familiarity ratings for 3,596 French nouns. In addition, reaction times were collected.
Bolognesi-2022-Specificity Bolognesi, Marianna Marcella and Caselli, Tommaso 2022 ratings English, Italian Italian Bolognesi2022 This list contains specificity ratings for Italian nouns, adjectives, and verbs on a 5-point scale. ANEW-ITA
Montefinese-2014-AffectiveRatings Montefinese, Maria and Ambrosini, Ettore and Fairfield, Beth and Mammarella, Nicola 2014 ratings English, Italian Italian Montefinese2014 This list contains affective ratings for Italian nouns, adjectives, and verbs on a 9-point scale.
DiNatale-2021a-AffectiveColexifications Di Natale, Anna and Pellert, Max and Garcia, David 2021 ratings English Global DiNatale2021 This list contains a lexicon based on an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. The lexicon is constructed with CLICS3 (Rzymski et al. [2020](:bib:Rzymski2020)) and the NRC VAD Lexicon (Mohammad [2018a](:bib:Mohammad2018a)).
DiNatale-2021b-AffectiveColexifications Di Natale, Anna and Pellert, Max and Garcia, David 2021 ratings English Global DiNatale2021 This list contains a lexicon based on an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. The lexicon is constructed with CLICS3 (Rzymski et al. [2020](:bib:Rzymski2020)) and the affective ratings by Warriner et al. ([2013](:bib:Warriner2013)).
DiNatale-2021c-AffectiveColexifications Di Natale, Anna and Pellert, Max and Garcia, David 2021 ratings English Global DiNatale2021 This list contains a lexicon based on an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. The lexicon is constructed with FreeDict (freedict.org) and the NRC VAD Lexicon (Mohammad [2018a](:bib:Mohammad2018a)).
DiNatale-2021d-AffectiveColexifications Di Natale, Anna and Pellert, Max and Garcia, David 2021 ratings English Global DiNatale2021 This list contains a lexicon based on an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. The lexicon is constructed with FreeDict (freedict.org) and the affective ratings by Warriner et al. ([2013](:bib:Warriner2013)).
DiNatale-2021e-AffectiveColexifications Di Natale, Anna and Pellert, Max and Garcia, David 2021 ratings English Global DiNatale2021 This list contains a lexicon based on an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. The lexicon is constructed with [OmegaWiki](:bib:OmegaWiki) and the NRC VAD Lexicon (Mohammad [2018a](:bib:Mohammad2018a)).
DiNatale-2021f-AffectiveColexifications Di Natale, Anna and Pellert, Max and Garcia, David 2021 ratings English Global DiNatale2021 This list contains a lexicon based on an unsupervised method of affective lexicon extension that uses colexification network data to interpolate the affective ratings of words that are not included in the original lexicon. The lexicon is constructed with [OmegaWiki](:bib:OmegaWiki) and the affective ratings by Warriner et al. ([2013](:bib:Warriner2013)).
Montefinese-2019-AoA Montefinese, Maria and Vinson, David and Vigliocco, Gabriella and Ambrosini, Ettore 2019 ratings English, Italian Italian Montefinese2019 This list contains age-of-acquisition ratings for Italian nouns, adjectives, and verbs. Adult participants were asked to estimate the age at which they thought they had learned the word. ItAoA
Speed-2024-Emotions Speed, Laura J. and Brysbaert, Marc 2024 ratings Dutch Dutch https://doi.org/10.3758/s13428-023-02239-6 Speed2024 This list includes the arousal, valence, and discrete emotion ratings (happiness, anger, fear, disgust, and sadness) for 24,000 Dutch words, which were rated on a 5-point scale. The study contains additional variables, such as word length, frequency, age of acquisition, concreteness, imageability, and lexical decision data, which come from other studies and were therefore not included in the data set here.
Winter-2024-Iconicity Winter, Bodo and Lupyan, Gary and Perry, Lynn K. and Dingemanse, Mark and Perlman, Markus 2024 ratings English English https://doi.org/10.3758/s13428-023-02112-6 Winter2024 This list includes the iconicity ratings of 14,000+ English words. Items were rated on a 7-point scale (1 = not iconic at all, 7 = very iconic). Further, playfulness, sensory modality, structural markedness and age of acquisition were compared in the study.
Vankrunkelsven-2024-SemanticGender Vankrunkelsven, Hendrik and Yang, Yang and Brysbaert, Marc and De Deyne, Simon and Storms, Gert 2024 ratings Dutch Dutch https://doi-org.offsitelib.eva.mpg.de/10.3758/s13428-022-02032-x Vankrunkelsven2024 This list includes semantic gender ratings for 24,000 Dutch words. Items were rated on a 5-point scale (1 = very feminine, 2 = rather feminine, 3 = neutral, 4 = rather masculine, 5 = very masculine). The study contains additional variables, such as concreteness, age of acquisition, valence, arousal and dominance, which come from other studies and were therefore not included in the data set here.
Pexman-2019-Sensorimotor Pexman, Penny M. and Muraki, Emiko and Sidhu, David M. and Siakaluk, Paul D. and Yap, Melvin J. 2019 ratings English English https://doi.org/10.3758/s13428-018-1171-z Pexman2019 This list includes ratings of body-object interaction (BOI), i.e., the extent to which the word refers to an object or thing a human body can easily interact with, for 9,000+ English words. Items were rated on a 7-point scale (1 = low body–object interaction, 2-6 = intermediate body–object interaction, 7 = high body–object interaction).
Schmidtke-2014-AffectiveRatings Schmidtke, David S. and Schröder, Tobias and Jacobs, Arthur M. and Conrad, Markus 2014 ratings German German https://doi.org/10.3758/s13428-013-0426-y Schmidtke2014 This list contains ratings of valence, arousal, imageability, dominance and potency for German words. The study includes terms from the Affective Norms for English Words (ANEW) list ([Bradley and Lang 1999](:bib:Bradley1999)) as well as the Berlin Affective Word List (BAWL) ([Vo et al. 2009](:bib:Vo2009)). Valence and imageability were rated on 7-point scales while potency, dominance and arousal for words in the ANEW list were rated on a 9-point scale. Arousal for words in the BAWL was rated on a 5-point scale. The study contains additional variables, such as data on word frequency, grammatical class, number of letters, number of syllables, and number of orthographic neighbors, which come from other studies and were therefore not included in the data set here. ANGST
Guasch-2016-AffectiveRatings Guasch, Marc and Ferré, Pilar and Fraga, Isabel 2016 ratings Spanish Spanish https://doi.org/10.3758/s13428-015-0684-y Guasch2016 This list contains ratings of valence, arousal, concreteness, imageability, context availability, and familiarity of 1,400 Spanish words. Ratings for valence and arousal were given on a 9-point scale. Concreteness, imageability, context availability and familiarity were rated on a 7-point scale. The participants were undergraduate students who were fluent in Spanish.
Bonin-2018-Concreteness Bonin, Patrick and Méot, Alain and Bugaiska, Aurélia 2018 ratings French French https://doi.org/10.3758/s13428-018-1014-y Bonin2018 This list contains ratings of concreteness, context availability, valence and arousal for 1,400+ French words. All ratings were given on a 5-point scale.
Coso-2023-Emotions Ćoso, Bojana and Guasch, Marc and Bogunovic, Irena and Ferré, Pilar and Hinojosa, José A. 2023 ratings English Croatian https://doi.org/10.3758/s13428-022-02003-2 Coso2023 This list contains ratings of five discrete emotions (happiness, anger, sadness, fear, disgust) for 3,000+ Croatian words. All ratings were given on a 5-point scale. While the original study was conducted with Croatian words, the present mapping is based on the English translations given for these items. CROWD-5e
Repetto-2023-Sensorimotor Repetto, Claudia and Rodella, Claudia and Conca, Francesca and Santi, Gaia Chiara and Catricalá, Eleonora 2023 ratings English Italian https://doi.org/10.3758/s13428-022-02004-1 Repetto2023 This list contains ratings of valence, arousal, dominance, familiarity, imageability and concreteness on a 9-point scale. Further, five effectors (hand-arm, foot-leg, torso, mouth, head) as well as six perceptual modalities (touch, hearing, smell, taste, vision, and interoception) were rated on a 6-point scale. Exclusivity ratings for perception, action and overall sensorimotor values were given ranging from 0-1, to be interpreted as percentages.
Syssau-2009-Valence Syssau, Arielle and Monnier, Catherine 2009 ratings English, French French https://doi.org/10.3758/BRM.41.1.213 Syssau2009 This list contains ratings of valence given by children age five, age seven and age nine for 600 French items. The data comprise the percentage distribution of children within each respective age group who provided ratings on a scale ranging from negative to neutral to positive, with each age group being treated as making up 100%. It is additionally divided by gender (male, female). The present list further contains ratings of the same items given by adults in [Syssau and Font (2005)](:bib:Syssau2005). The items were chosen based on the emotional databases by [Bonin, Méot et al. 2003](:bib:Bonin2003) and [Syssau and Font (2005)](:bib:Syssau2005), licensed as Emotional Source in the dataset. While these original studies were conducted using pictures, the present study used only the corresponding words, either spoken (age group five) or written (age groups seven and nine).
Dingemanse-2020-IconicityHumor Dingemanse, Mark and Thompson, Bill 2020 ratings English English https://doi.org/10.1017/langcog.2019.49 Dingemanse2020 This list contains ratings of iconicity and humor, given both by human participants and as computational ratings. We included only the computational ratings and the iconicity ratings from [Perry et al. (2018)](:bib:Perry2018) since the humor ratings from [Engelthaler and Hills (2018)](:bib:Engelthaler2018) were added as a separate dataset. The human ratings were compared to the ratings given by the computational model developed for the present study, which was trained on large natural language corpora. The study also includes ratings of aversion and taboos, which were not included here.
Yi-2025-AffectiveRatings Yi, Wei and Xu, Haitao and Man, Kaiwen 2025 ratings Chinese Chinese https://doi.org/10.3758/s13428-024-02580-4 Yi2025 This list contains ratings of valence, arousal and perceptual experiences for Chinese words translated from [Warriner et al. 2013](:bib:Warriner2013). Further, ratings of familiarity ([Su et al. 2023a](:bib:Su2023a)) and imageability ([Su et al. 2023b](:bib:Su2023b)) were included here since this data is only available in PDF format elsewhere or the file was corrupted.
Amenta-2025-LexicalRecognition Amenta, Simona and de Varda, Andrea Gregor and Mandera, Pawel and Keuleers, Emmanuel and Brysbaert, Marc and Marelli, Marco 2025 norms Italian Italian https://doi.org/10.3758/s13428-024-02548-4 Amenta2025 This list contains lexical recognition times for 130,495 Italian words from the Italian Crowdsourcing Project (ICP). Words were originally selected from SUBTLEX-IT ([Crepaldi et al. 2015](:bib:Crepaldi2015)), enriched with inflected forms, rare/morphologically complex words, dictionary entries, and some modern neologisms, then cleaned to remove names, punctuation, and offensive content. Ratings were given by Italian native speakers. Unlike in classic lexical decision tasks, participants were not prompted to react as quickly as possible but rather to indicate whether they recognized a given word in their own time. ICP
Redondo-2007-AffectiveRatings Redondo, Jaime and Fraga, Isabel and Padrón, Isabel and Comesaña, Montserrat 2007 ratings English Spanish https://doi.org/10.3758/BF03193031 Redondo2007 This list contains ratings of valence, arousal and dominance of Spanish words. The data is presented as the mean of all participants' responses, female participants' and male participants' responses. The dataset is a translation of the ANEW (Affective Norms for English Words) list ([Bradley and Lang 1999](:bib:Bradley1999)). Additional data from the Spanish lexical database LEXESP by [Sebastian-Galles et al. (2000)](:bib:Sebastian2000) in this dataset include concreteness and imageability ratings. Two additional LEXESP variables, namely objective frequency and familiarity, can be found in a separate dataset (see [Desrochers et al. (2010)](:bib:Desrochers2010)).
Ljubesic-2020-Emotions Ljubešić, Nikola and Markov, Ilia and Fišer, Darja and Daelemans, Walter 2020 ratings English Dutch, Croatian, Slovene https://aclanthology.org/2020.peoples-1.15/ Ljubesic2020 This list contains ratings for associations of discrete emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) as well as their positive/negative valence. The original ratings were conducted in English (Mohammad & Turney [2010](:bib:Mohammad2010)). This study also provided translations for Dutch, Croatian, and Slovene which were generated with automated translation tools. The present study conducted manual translations for each Slovene and Croatian word and for each Dutch word that has an association rating in at least one column. Translators were asked to take the sentiment and emotion labels that were already associated with the words into account. It appears that the ratings were taken from the original study in English (Mohammad & Turney [2010](:bib:Mohammad2010)) and the novelty of the dataset lies in the updated translations. However, the authors do not make it clear how the dataset was created, so caution is advised. LiLaH
Martinez-2025-AffectiveRatings Martínez, Gonzalo and Molero, Juan Diego and González, Sandra and Conde, Javier and Brysbaert, Marc and Reviriego, Pedro 2025 ratings English English https://doi.org/10.3758/s13428-024-02515-z Martinez2025 This list contains ratings of concreteness, valence and arousal for English items by GPT-4o. The list of words that were rated was sourced from [Warriner et al. 2013](:bib:Warriner2013) and extended. Concreteness was rated on a 5-point scale while valence and arousal were rated on 9-point scales. Soundness of the LLM ratings was ensured by collecting real human ratings and comparing them to the answers given by GPT-4o.
Dunabeitia-2022-MultiPic Duñabeitia, Jon Andoni and Baciero, Ana and Antoniou, Kyriakos and Antoniou, Mark and Ataman, Esra and Baus, Cristina and Ben-Shachar, Michal and Çağlar, Ozan Can and Chromý, Jan and Comesaña, Montserrat and Filip, Maroš and Filipović Đurđević, Dušica and Gillon Dowens, Margaret and Hatzidaki, Anna and Januška, Jiří and Jusoh, Zuraini and Kanj, Rama and Kim, Say Young and Kırkıcı, Bilal and Leminen, Alina and Lohndal, Terje and Yap, Ngee Thai and Renvall, Hanna and Rothman, Jason and Royle, Phaedra and Santesteban, Mikel and Sevilla, Yamila and Slioussar, Natalia and Vaughan-Evans, Awel and Wodniecka, Zofia and Wulff, Stefanie and Pliatsikas, Christos 2022 ratings English American English, Australian English, Basque, Western Flemish, British English, Catalan, Cypriot Greek, Czech, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Korean, Lebanese Arabic, Malay, Malaysian English, Mandarin Chinese, Dutch, Norwegian, Polish, Portuguese, Quebec French, Rioplatense Spanish, Russian, Serbian, Slovak, Spanish, Turkish, Welsh https://figshare.com/articles/dataset/Untitled_Item/19328939 Dunabeitia2022 This is the 500 item version of the MultiPic dataset, now with 32 languages and varieties (American English, Australian English, Basque, Western Flemish, British English, Catalan, Cypriot Greek, Czech, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Korean, Lebanese Arabic, Malay, Malaysian English, Mandarin Chinese, Dutch, Norwegian, Polish, Portuguese, Quebec French, Rioplatense Spanish, Russian, Serbian, Slovak, Spanish, Turkish, Welsh) in total and with familiarity ratings for 500 chosen picture stimuli of the original 750. The pictures were standardized for name agreement and visual complexity as well.
Cameirao-2010-AoA Cameirão, Manuela L. and Vicente, Selene G. 2010 ratings Portuguese Portuguese https://doi.org/10.3758/BRM.42.2.474 Cameirao2010 This list contains age of acquisition ratings for 1,749 Portuguese words. Participants were asked to rate the age at which they thought they had learned a given word on a 9-point scale ranging from 2 years old to +13 years old. The dataset also includes ratings for familiarity ([Marques 2004](:bib:Marques2004)), concreteness and imageability ([Marques 2005](:bib:Marques2005)) which were added here. The words in the original dataset are capitalized, resulting in the format in the present list.
SanMiguelAbella-2020-MotorContent San Miguel Abella, Romina A. and González-Nosti, María 2020 ratings Spanish Spanish https://doi.org/10.3758/s13428-019-01241-1 SanMiguelAbella2020 This list contains motor content ratings, i.e. the amount of mobility an action entails, for 4,565 Spanish verbs given on a 7-point scale. The dataset further includes age of acquisition data from [Alonso et al. 2016](:bib:Alonso2016) which is already included in NoRaRe and not duplicated here. Additionally, the original dataset includes data on frequency, familiarity, imageability and concreteness, sourced from the EsPal database ([Duchon et al. 2013](:bib:Duchon2013)), which were not included here. The verbs in the original dataset are capitalized, resulting in the format in the present list.
Grandy-2020-EmotionImageability Grandy, Thomas H. and Lindenberger, Ulman and Schmiedek, Florian 2020 ratings German German https://doi.org/10.3758/s13428-019-01294-2 Grandy2020 This list contains ratings of emotionality and imageability of about 2500 German nouns given by younger (21-31 years old) and older (70-86 years old) adults. Ratings for imageability were given on a 100-point sliding scale while ratings for emotionality were given on a 200-point sliding scale. Participants used their mouse cursor to select or slide to a point on the scale; the exact numeric value of their selection was displayed for reference.
Winter-2022-SemanticChange Winter, Bodo and Srinivasan, Mahesh 2022 relations English Global https://doi.org/10.1080/10926488.2021.1945419 Winter2022 This network contains asymmetrical, cross-linguistically attested semantic changes between semantically related concepts. It is based on the study by [Urban 2011](:bib:Urban2011).
Zalizniak-2024-DatSemShift Anna Zalizniak and Anna Smirnitskaya and Maksim Russo (Rousseau) and Ilya Gruntov and Timur Maisak and Dmitry Ganenkov and Maria Bulakh and Maria Orlova and Marina Bobrik-Fremke and Oksana Dereza and Tatiana Mikhailova and Maria Bibaeva and Mikhail Voronov 2024 relations English Global Zalizniak2024 This network is based on the Database of Semantic Shifts (DatSemShift), retrieved on February 5, 2024, and converted to CLDF. It captures documented semantic shifts across the world’s languages, linking related concepts where semantic change is attested. Duplicates in the source data are marked with an asterisk and excluded from Concepticon linking and network construction. DatSemShift
List-2023-Colexifications List, Johann-Mattis 2023 relations English Global https://doi.org/10.3389/fpsyg.2023.1156540 List2023 This network contains data on partial colexification patterns across multiple languages. The study offers three kinds of colexification graphs, two undirected ones, stored in the column LINKED_CONCEPTS, and one directed graph, stored in the column TARGET_CONCEPTS.
Rubehn-2025-ConceptEmbeddings Rubehn, Arne and List, Johann-Mattis 2025 relations English Global https://aclanthology.org/2025.acl-long.1004/ Rubehn2025 This list contains concept embeddings inferred from cross-lingual (partial) colexification networks.
Xu-2020-Concreteness Xu, Xu and Li, Jiayin 2020 ratings Chinese Chinese https://doi.org/10.1371/journal.pone.0232133 Xu2020 This list contains ratings of concreteness/abstractness on a continuum for 9,877 two-character Chinese words. 1,140 participants completed the survey. The list was compiled from the MEgastudy of Lexical Decision in Simplified CHinese (MELD-SCH, [Tsang et al. 2018](:bib:Tsang2018)).
Brysbaert-2014-AoA Brysbaert, Marc and Stevens, Michaël and De Deyne, Simon and Voorspoels, Wouter and Storms, Gert 2014 ratings Dutch Dutch https://doi.org/10.1016/j.actpsy.2014.04.010 Brysbaert2014b This list contains ratings of age of acquisition (AoA) for 30,000 Dutch words given by 74 students and scientific collaborators of the University of Ghent. Items were taken from [Moors et al. (2013)](:bib:Moors2013).
Brysbaert-2014b-Concreteness Brysbaert, Marc and Stevens, Michaël and De Deyne, Simon and Voorspoels, Wouter and Storms, Gert 2014 ratings Dutch Dutch https://doi.org/10.1016/j.actpsy.2014.04.011 Brysbaert2014b This list contains ratings of concreteness for 30,000 Dutch words given by 75 students and scientific collaborators of the University of Leuven. Items were taken from [Moors et al. (2013)](:bib:Moors2013).
Xu-2021-AoA Xu, Xu and Li, Jiayin and Guo, Shulun 2021 ratings Chinese Chinese https://doi.org/10.3758/s13428-020-01455-8 Xu2021 This list contains ratings of age of acquisition (AoA) for 19,716 simplified Chinese words given by 1765 native speakers of Mandarin Chinese.
Soares-2012-AffectiveRatings Soares, Ana Paula and Comesaña, Montserrat and Pinheiro, Ana P and Simões, Alberto and Frade, Carla Sofia 2012 ratings Portuguese Portuguese https://doi.org/10.3758/s13428-011-0131-7 Soares2012 This list contains ratings of valence, arousal and dominance for 1,034 Portuguese words. These words were selected from the Affective Norms for English Words (ANEW) database [(Bradley & Lang, 1999)](:bib:Bradley1999) and compared to the Spanish translation used for a similar study conducted by [Redondo et al. (2007)](:bib:Redondo2007). The Portuguese translation used here was therefore heavily influenced by the Spanish one used previously. The original dataset provides ratings separately for all, male, and female participants. The present mappings include only the ratings for all participants.
Chedid-2019-AuditoryStrength Chedid, Georges and Brambati, Simona Maria and Bedetti, Christophe and Rey, Amandine E. and Wilson, Maximilliano A. and Vallet, Guillaume T. 2019 ratings French French https://doi.org/10.3758/s13428-019-01254-w Chedid2019b This list contains ratings of auditory perceptual strength for 3,596 French Canadian words. The same study also collected ratings for visual perceptual strength, which have been added here in a separate dataset. It is a companion study to [Chedid et al. (2019a)](:bib:Chedid2019a).
Chedid-2019-VisualStrength Chedid, Georges and Brambati, Simona Maria and Bedetti, Christophe and Rey, Amandine E. and Wilson, Maximilliano A. and Vallet, Guillaume T. 2019 ratings French French https://doi.org/10.3758/s13428-019-01254-w Chedid2019b This list contains ratings of visual perceptual strength for 3,596 French Canadian words. The same study also collected ratings for auditory perceptual strength, which have been added here in a separate dataset. It is a companion study to [Chedid et al. (2019a)](:bib:Chedid2019a).
Green-2025a-AoA Green, Clarence and Kong, Anthony and Brysbaert, Marc and Keogh, Kathleen 2025 ratings English English https://doi.org/10.3758/s13428-025-02843-8 Green2025 This list contains age of acquisition ratings for English words that received a score of 10 years or lower in [Kuperman et al. (2012)](:bib:Kuperman2012). Three studies were conducted: Study 1 crowdsourced AoA ratings for print, i.e., written and read words, extending the study by [Kuperman et al. (2012)](:bib:Kuperman2012). Study 2 tested the extent to which the results obtained in Study 1 are replicable by an untrained LLM (GPT-4o). Study 3 extended the LLM method applied in Study 2. It was used to fine-tune the LLM with regard to the human ratings and get refined ratings. The original study by [Green et al. (2025)](:bib:Green2025) further includes ratings of the full [Kuperman et al. (2012)](:bib:Kuperman2012) list by a trained LLM and ratings of the English Crowdsourcing Project [(Mandera et al. 2020)](:bib:Mandera2020) by both a trained and an untrained LLM. These datasets can be found in the dataset folders Green-2025b-AoA and Green-2025c-AoA, respectively.
Green-2025b-AoA Green, Clarence and Kong, Anthony and Brysbaert, Marc and Keogh, Kathleen 2025 ratings English English https://doi.org/10.3758/s13428-025-02843-8 Green2025 This list contains age of acquisition (AoA) ratings for English words given by a trained LLM (GPT-4o). The list of words was taken from [Kuperman et al. (2012)](:bib:Kuperman2012) with the goal to replicate their original ratings in the LLM. Further, AoA ratings for print, i.e., written and read words, were given by the LLM. This dataset also includes the original ratings given in [Kuperman et al. (2012)](:bib:Kuperman2012). The original study by [Green et al. (2025)](:bib:Green2025) further includes crowdsourced AoA ratings as well as untrained LLM scores for print for words which received a score below 10 years old in [Kuperman et al. (2012)](:bib:Kuperman2012). In addition, ratings of the English Crowdsourcing Project [(Mandera et al. 2020)](:bib:Mandera2020) by both a trained and an untrained LLM are included in the original study. These datasets can be found in the dataset folders Green-2025a-AoA and Green-2025c-AoA, respectively.
Green-2025c-AoA Green, Clarence and Kong, Anthony and Brysbaert, Marc and Keogh, Kathleen 2025 ratings English English https://doi.org/10.3758/s13428-025-02843-8 Green2025 This list contains age of acquisition (AoA) ratings for English words given by an untrained as well as a trained LLM (GPT-4o). The words rated here were all words included in the English Crowdsourcing Project (ECP) [(Mandera et al. 2020)](:bib:Mandera2020) that did not obtain an AoA rating in [Kuperman et al. (2012)](:bib:Kuperman2012). The original study by [Green et al. (2025)](:bib:Green2025) further includes crowdsourced AoA ratings as well as untrained LLM scores for print for words which received a score below 10 years of age in [Kuperman et al. (2012)](:bib:Kuperman2012). In addition, ratings of the full [Kuperman et al. (2012)](:bib:Kuperman2012) list by a trained LLM, as well as the original ratings given in [Kuperman et al. (2012)](:bib:Kuperman2012) were included. These datasets can be found in the dataset folders Green-2025a-AoA and Green-2025b-AoA, respectively.
Brysbaert-2025-Familiarity Brysbaert, Marc and Martínez, Gonzalo and Reviriego, Pedro 2025 ratings English English https://doi.org/10.3758/s13428-024-02561-7 Brysbaert2025 This dataset contains familiarity ratings given by GPT-4o as well as Multilex [(van Paridon & Thompson 2021](:bib:vanParidon);[Gimenes & New 2016)](:bib:Gimenes2016) frequencies for single words. Familiarity ratings were given on a 1–7 scale (1 = very unfamiliar, 7 = very familiar). The Multilex variable combines word frequencies from subtitles, Twitter, blogs, and news sources.
Iaroshenko-2025-EmoLex Iaroshenko, Polina V. and Loukachevitch, Natalia V. 2025 relations Russian Russian https://doi.org/10.22363/2687-0088-44439 Iaroshenko2025 The Russian Emotion Lexicon (RusEmoLex) contains 1,274 words that appear in at least two sources from a larger original list of 7,937 candidate words. Words are categorized by semantic class: Радость ('joy'), Грусть ('sadness'), Страх ('fear'), Злость ('anger'), and Удивление ('surprise'). The original sources included dictionaries, corpora, and emotion-word association surveys, with some words appearing in up to seven different sources. Only the Class variable is included here. RusEmoLex
Binder-2016-AffectiveRatings Binder, Jeffrey R. and Conant, Lisa L. and Humphries, Colin J. and Fernandino, Leonardo and Simons, Stephen B. and Aguilar, Mario and Desai, Rutvik H. 2016 ratings English English https://doi.org/10.1080/02643294.2016.1147426 Binder2016 This dataset provides sensorimotor, cognitive, emotional, spatial, and impact ratings for English words. Different instructions were given for nouns, verbs, and adjectives within each variable. Some queries were intentionally nonsensical for certain words (e.g., asking about events for a static object). Note that in these cases “Not Applicable” responses were converted to 0. Ratings were given on a 6-point scale (0 = not at all, 3 = somewhat, 6 = very much). Semantic and ontological annotations are included. Frequencies were taken from [Shaoul & Westbury (2013)](:bib:Shaoul2013). Imageability ratings were compiled from different sources [(Bird et al. 2001](:bib:Bird2001); [Clark & Paivio 2004](:bib:Clark2004); [Cortese & Khanna 2008](:bib:Cortese2008); [Wilson 1988)](:bib:Wilson1988) for the original dataset and also included here.
Bird-2001-ImageabilityAoA Bird, Helen and Franklin, Sue and Howard, David 2001 ratings English English https://doi.org/10.3758/BF03195349 Bird2001 This list contains ratings of age of acquisition (AoA) as well as imageability as given by 78 participants. Further, AoA and imageability ratings from the MRC Psycholinguistic Database ([Coltheart 1981](:bib:Coltheart1981)) and logarithmic frequency measures from the CELEX database ([Baayen et al. 1996](:bib:Baayer1996)) were included.
Ravelli-2025-Specificity Ravelli, Andrea Amelio and Bolognesi, Marianna Marcella and Caselli, Tommaso 2025 ratings English English https://doi.org/10.1007/s10339-024-01239-4 Ravelli2025 This list contains specificity ratings for English words, provided by native speakers. The list used for rating was compiled from the ANEW dataset ([Bradley and Lang 1999](:bib:Bradley1999)).
Su-2023-AffectiveRatings Su, I-Fan and Yum, Yen Na and Lau, Dustin Kai-Yan 2023 ratings Chinese Chinese https://doi.org/10.3758/s13428-022-01928-y Su2023 This list contains ratings of age of acquisition (AoA), imageability, familiarity and concreteness for 4376 traditional Chinese characters given by 20 native speakers of Cantonese. Further, logarithmic frequency and number of strokes per character were provided. The original list also contains semantic radical transparency ratings.
Dimitropoulou-2010-Frequency Dimitropoulou, Maria and Duñabeitia, Jon Andoni and Avilés, Alberto and Corral, José and Carreiras, Manuel 2010 norms Greek Greek https://doi.org/10.3389/fpsyg.2010.00218 Dimitropoulou2010 This list includes word frequencies based on subtitles from 5,508 television series and films in Greek. SUBTLEX-GR
Kiritchenko-2017a-Valence Kiritchenko, Svetlana and Mohammad, Saif 2017 ratings English English https://doi.org/10.18653/v1/P17-2074 Kiritchenko2017 This list contains valence ratings given on a best-worst scale (BWS). Participants were asked to select the most positive and the most negative word out of a 4-tuple. Using the counting procedure, each term’s score was calculated as the percentage of times it was chosen as most positive minus the percentage of times the term was chosen as most negative. The scores range from −1 (most negative) to 1 (most positive). Using the same words, a similar experiment was also conducted with a 9-point rating scale. Those data are provided in a separate dataset, see Kiritchenko-2017b-Valence.
Kiritchenko-2017b-Valence Kiritchenko, Svetlana and Mohammad, Saif 2017 ratings English English https://doi.org/10.18653/v1/P17-2074 Kiritchenko2017 This list contains valence ratings given on a 9-point scale (-4 = extremely negative, 4 = extremely positive). The scores were then converted and range from 1 to 9 in the present data. Using the same words, a similar experiment was also conducted with a best-worst scale (BWS). Those data are provided in a separate dataset, see Kiritchenko-2017a-Valence.
Wang-2022-AffectiveRatings Wang, Shaonan and Zhang, Yunhao and Zhang, Xiaohan and Sun, Jingyuan and Lin, Nan and Zhang, Jiajun and Zong, Chengqing 2022 ratings Chinese Chinese https://doi.org/10.18112/openneuro.ds004301.v1.0.0 Wang2022 This dataset provides ratings for Chinese nouns, verbs and adjectives across perceptual, sensorimotor, spatial, temporal, causal, social, cognitive and affective domains given by 30 participants. Different instructions were given for nouns, verbs, and adjectives within each variable, so ratings reflect part-of-speech-specific interpretations. Some variables were queried only for certain word classes. Ratings were given on a 7-point scale (0 = not at all, 3 = somewhat, 7 = very much). The dataset includes mean ratings for each variable. The list is based on [Binder et al. (2016)](:bib:Binder2016), though 13 variables from the original English study were omitted in the Chinese version: motion, biomotion, shape, texture, audition, low, high, speech, time, social, harm, pleasant and unpleasant.
MartinezTomas-2026-AffectiveRatings Martínez-Tomás, Celia and Guasch, Marc and Ferré, Pilar and Lázaro, Miguel and Hinojosa, José Antonio 2026 ratings Spanish Spanish https://doi.org/10.3758/s13428-026-02976-4 MartinezTomas2026 This list contains ratings of valence and arousal for 1,200 Spanish words. The original study also included 4,800 pseudowords which were rated for wordlikeliness in addition to valence and arousal. Further, valence and arousal ratings for the real words were listed from multiple sources ([Ferré et al. 2012](:bib:Ferre2012), [Guasch et al. 2016](:bib:Guasch2016), [Hinojosa et al. 2016](:bib:Hinojosa2016), [Stadthagen-Gonzalez et al. 2017](:bib:StadthagenGonzalez2017)), which were not included here but can be found in separate lists.