Datasets are made available on our website and on GitHub.

Note: The Twitter dataset is a subset of the County Tweet Lexical Bank (Giorgi et al., 2018), appended with newer 2019 and 2020 tweets, spanning 2009 through 2020 in total. We have released the user IDs of the Twitter users used for our released version of the HaRT model. Because the data came from two different collection times, 234 users were common to both collections, but we treat them as different users. This is why the released training dataset for pre-training consists of 47766 unique user IDs.
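To illustrate how treating overlapping users as different users affects the count, here is a minimal sketch on toy data. The function name `count_users`, the `merge_overlap` flag, and the sample IDs are hypothetical and are not taken from the HaRT code or the released files; the sketch only shows the counting convention described in the note.

```python
def count_users(collection_a, collection_b, merge_overlap):
    """Count users across two collections of user IDs.

    merge_overlap=True  -> a user present in both collections counts once.
    merge_overlap=False -> a user present in both collections counts twice,
                           i.e. is treated as two different users.
    """
    a, b = set(collection_a), set(collection_b)
    if merge_overlap:
        return len(a | b)
    return len(a) + len(b)


# Toy stand-ins for the older and newer collections (hypothetical IDs).
older = ["u1", "u2", "u3"]
newer = ["u3", "u4"]

overlap = set(older) & set(newer)
print(len(overlap))                                    # 1
print(count_users(older, newer, merge_overlap=True))   # 4
print(count_users(older, newer, merge_overlap=False))  # 5
```

Under this convention, the count with overlapping users kept separate exceeds the merged count by exactly the number of users common to both collections.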
