There is a cell that reads the TEST data into the TRAINING dataframe,
"df_train = get_dataframe('test_data.txt')"
right after reading the training data into the same dataframe, which makes 'train_text.shape' (500, 1) instead of (5452, 1).
I guess that cell should be removed.
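
If the test data is still needed later, one alternative to deleting the cell would be to load it into its own variable. A minimal sketch of that idea follows. The `get_dataframe` below is only a stand-in that fakes the row counts quoted above (the real loader is not shown here), and the names `df_test` and `'train_data.txt'` are my assumptions, not taken from the notebook.

```python
import pandas as pd

# Stand-in for the notebook's get_dataframe(); it only reproduces the row
# counts mentioned above. 'train_data.txt' is an assumed file name.
ROW_COUNTS = {"train_data.txt": 5452, "test_data.txt": 500}


def get_dataframe(filename):
    """Return a one-column dataframe with the expected number of rows."""
    n_rows = ROW_COUNTS[filename]
    return pd.DataFrame({"text": [f"sample {i}" for i in range(n_rows)]})


df_train = get_dataframe("train_data.txt")
# Separate variable (hypothetical name), so df_train is not overwritten.
df_test = get_dataframe("test_data.txt")

print(df_train.shape)  # (5452, 1)
print(df_test.shape)   # (500, 1)
```

With the two dataframes kept apart, anything derived from `df_train` should keep the full 5452 rows.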