Building the training and testing datasets for the baseline model