Note: The Twitter dataset is a subset of the County Tweet Lexical Bank (Giorgi et al., 2018), extended with newer tweets from 2019 and 2020, so it spans 2009 through 2020 in total. We have released the user IDs of the Twitter users whose data was used for the released version of the HaRT model. Because the data comes from two different collection periods, 234 users appear in both collections; we treat each of these as two different users. This is why the released train dataset for pre-training contains 47766 unique user IDs.
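
The counting in the note above can be illustrated with a minimal sketch. It uses made-up toy IDs and a hypothetical helper, `count_users`, which is not part of the HaRT release. The sketch only shows why the number of unique user IDs is smaller than the number of users as treated during pre-training when some IDs occur in both collections.

```python
def count_users(collection_a, collection_b):
    """Return (unique_ids, users_treated_as_distinct, shared_ids).

    Hypothetical helper for illustration only; the IDs below are toy values.
    """
    ids_a, ids_b = set(collection_a), set(collection_b)
    shared = ids_a & ids_b                 # IDs present in both collections
    unique_ids = len(ids_a | ids_b)        # each ID counted once
    treated_as_distinct = len(ids_a) + len(ids_b)  # shared IDs counted twice
    return unique_ids, treated_as_distinct, len(shared)


older = ["u1", "u2", "u3"]   # toy stand-in for the earlier collection
newer = ["u3", "u4"]         # toy stand-in for the later collection
unique_ids, modeled_users, shared = count_users(older, newer)
print(unique_ids, modeled_users, shared)  # 4 5 1
```

In this toy case one shared ID makes the unique-ID count one lower than the number of users treated as distinct; in the released data the corresponding gap is the 234 shared users.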