In your random forest notebook, in the function cross_val_metrics:
    if print_results:
        for i in range(0, len(scores)):
            print("Cross validation run {0}: {1: 0.3f}".format(i, scores[i]))
        print("Accuracy: {0: 0.3f} (+/- {1: 0.3f})"\
              .format(scores.mean(), scores.std() / 2))
    else:
        return scores.mean(), scores.std() / 2
You divide the standard deviation of the fold scores by two and present that as the... standard error? Shouldn't the standard deviation be divided by the square root of the number of folds, in this case sqrt(10)? Alternatively, you could just report the standard deviation itself rather than half of it. Better yet, report/return scores.std() * 1.96 / np.sqrt(n_folds), which is the half-width of an approximate 95% confidence interval. That scales the standard deviation by about 0.62 (1.96 / sqrt(10)) rather than 0.5, so the numerical results would not be drastically different.
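Here is a rough sketch of what I mean, not a drop-in replacement for your function. The helper name cross_val_summary, the z parameter, and the example scores are all made up for illustration. It uses only the standard library so it runs on its own. statistics.pstdev is the population standard deviation, which is what scores.std() computes by default on a NumPy array.

```python
import math
import statistics


def cross_val_summary(scores, z=1.96):
    """Return (mean, standard error, CI half-width) for per-fold scores.

    Hypothetical helper: z=1.96 is the normal-approximation multiplier
    for a 95% interval.
    """
    n_folds = len(scores)
    # Population standard deviation (ddof=0), matching NumPy's default std().
    std = statistics.pstdev(scores)
    # Standard error of the mean: std divided by sqrt(number of folds).
    std_err = std / math.sqrt(n_folds)
    return statistics.mean(scores), std_err, z * std_err


# Made-up fold accuracies for a 10-fold run, for illustration only.
example_scores = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]

mean, std_err, half_width = cross_val_summary(example_scores)
print("Accuracy: {0:0.3f} (+/- {1:0.3f})".format(mean, half_width))
```

Since z is a parameter, a different multiplier can be swapped in if the normal approximation seems too optimistic for only 10 folds.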