Assume a 10-fold cross-validation was run for different sets of parameters or features, and we obtained these scores for two groups of models:

We need to estimate whether the results of our tests on the samples also hold for the population.

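The data for a t-test should meet these requirements: the groups are independent, the data are approximately normally distributed, and there are no significant outliers.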

Let's assume for now that the data in our experiment come from two independent groups, and verify the two other conditions:

Normality

Normality can be verified with the Shapiro-Wilk test. Its null hypothesis is that the data are normally distributed. If the p-value is less than the chosen significance level, the null hypothesis that the data are normally distributed is rejected. If the p-value is greater than the significance level, the null hypothesis is not rejected.
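As a minimal sketch of this check, `scipy.stats.shapiro` can be applied to each group of scores. The score arrays and the 0.05 significance level below are hypothetical placeholders, not the values from the original experiment.

```python
import numpy as np
from scipy import stats

# Hypothetical 10-fold CV scores (placeholders, not the original data).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
scores_b = np.array([0.76, 0.75, 0.77, 0.74, 0.78, 0.73, 0.78, 0.75, 0.77, 0.74])

ALPHA = 0.05  # chosen significance level (an assumption)


def shapiro_check(sample, alpha=ALPHA):
    """Run Shapiro-Wilk; return (W statistic, p-value, normality_not_rejected)."""
    statistic, p_value = stats.shapiro(sample)
    return float(statistic), float(p_value), bool(p_value > alpha)


for name, sample in (("model A", scores_a), ("model B", scores_b)):
    w, p, ok = shapiro_check(sample)
    verdict = "not rejected" if ok else "rejected"
    print(f"{name}: W={w:.3f}, p={p:.3f}, normality {verdict}")
```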

Outliers

A Z-score measures how many standard deviations an observation lies from the mean. A Z-score of zero represents a value that equals the mean. The further an observation's Z-score is from zero, the more unusual it is. A standard cut-off for finding outliers is a Z-score of +/-3 or further from zero.
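The cut-off rule can be sketched with `scipy.stats.zscore`. The helper name `find_outliers` and the sample values are hypothetical, chosen only to illustrate the rule.

```python
import numpy as np
from scipy import stats


def find_outliers(sample, threshold=3.0):
    """Return the values whose absolute Z-score is at or beyond the threshold."""
    sample = np.asarray(sample, dtype=float)
    z_scores = stats.zscore(sample)  # uses the population std (ddof=0) by default
    return sample[np.abs(z_scores) >= threshold]


# Hypothetical 10-fold CV scores (placeholders, not the original data).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
print("outliers in model A scores:", find_outliers(scores_a))
```

Note that with only 10 values the check is weak: using the population standard deviation, no Z-score can exceed sqrt(n - 1) = 3 in absolute value, so the cut-off is reached only in a degenerate case.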

Two-sided Paired t-test

The null hypothesis is that the two models have identical scores (the average of the individual cross-validation scores). If the absolute t-value is greater than the critical value obtained from Student’s distribution, the difference is significant; otherwise it isn’t. The p-value is the probability, under the null hypothesis, of observing a t-value at least as extreme as the calculated one; the difference is significant when the p-value falls below the chosen significance level. A larger t-value shows that the difference between the group means is large relative to the variability of the differences, indicating a more significant difference between the groups. https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html
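A minimal sketch of the test with `scipy.stats.ttest_rel`, which is two-sided by default. The scores and the significance level are hypothetical placeholders.

```python
import numpy as np
from scipy import stats

# Hypothetical 10-fold CV scores (placeholders, not the original data).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
scores_b = np.array([0.76, 0.75, 0.77, 0.74, 0.78, 0.73, 0.78, 0.75, 0.77, 0.74])

ALPHA = 0.05  # chosen significance level (an assumption)

# Paired (related-samples) t-test on the fold-by-fold scores.
t_stat, p_value = stats.ttest_rel(scores_a, scores_b)

print(f"t = {t_stat:.4f}, p = {p_value:.3g}")
print("significant" if p_value < ALPHA else "not significant")
```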

The test output does not make clear what the critical value obtained from Student’s distribution is, so let's run the test manually. To get the critical value from the distribution we need the degrees of freedom (n - 1 for n paired scores).
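The manual computation might look like the sketch below: the t-value is the mean of the paired differences divided by its standard error, and the two-sided critical value comes from `scipy.stats.t.ppf`. The scores and the 0.05 significance level are hypothetical placeholders.

```python
import numpy as np
from scipy import stats

# Hypothetical 10-fold CV scores (placeholders, not the original data).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
scores_b = np.array([0.76, 0.75, 0.77, 0.74, 0.78, 0.73, 0.78, 0.75, 0.77, 0.74])

ALPHA = 0.05  # chosen significance level (an assumption)

diff = scores_a - scores_b
n = len(diff)
dof = n - 1  # degrees of freedom for a paired test

# t-value: mean difference divided by its standard error.
std_error = diff.std(ddof=1) / np.sqrt(n)
t_value = diff.mean() / std_error

# Two-sided critical value and p-value from Student's t-distribution.
critical_value = stats.t.ppf(1 - ALPHA / 2, dof)
p_value = 2 * stats.t.sf(abs(t_value), dof)

print(f"t = {t_value:.4f}, critical = {critical_value:.4f}, p = {p_value:.3g}")
print("significant" if abs(t_value) > critical_value else "not significant")
```

With 10 folds there are 9 degrees of freedom, which at a 0.05 significance level gives the critical value of about 2.2622 quoted in the text.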

The t-value (9.772287205694246) is greater than the critical value (2.2621571627409915) obtained from Student’s distribution and the difference is significant.

Corrected t-test

In fact, the data are not independent in K-fold cross-validation, because the training sets of different folds overlap. Let's assume n1 is the size of the training set and n2 is the size of the validation set.
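The text does not show which correction is applied. One common choice, assumed here, is the Nadeau and Bengio corrected resampled t-test, which replaces the 1/n factor in the variance of the mean difference with (1/n + n2/n1). For 10 folds with n2/n1 close to 1/9, this is roughly consistent with the ratio of the two t-values reported in the text. The function name, scores, and set sizes below are hypothetical.

```python
import numpy as np
from scipy import stats


def corrected_ttest(scores_1, scores_2, n_train, n_test):
    """Corrected paired t-test (assumed Nadeau-Bengio form); returns (t, p)."""
    diff = np.asarray(scores_1, dtype=float) - np.asarray(scores_2, dtype=float)
    n = len(diff)
    # Inflate the variance of the mean difference to account for
    # the overlap between training sets across folds.
    corrected_var = (1.0 / n + n_test / n_train) * diff.var(ddof=1)
    t_value = diff.mean() / np.sqrt(corrected_var)
    p_value = 2 * stats.t.sf(abs(t_value), n - 1)
    return t_value, p_value


# Hypothetical 10-fold CV scores and set sizes (placeholders).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
scores_b = np.array([0.76, 0.75, 0.77, 0.74, 0.78, 0.73, 0.78, 0.75, 0.77, 0.74])
N_TRAIN, N_TEST = 900, 100  # n1 and n2 for a 1000-sample dataset (assumption)

t_corr, p_corr = corrected_ttest(scores_a, scores_b, N_TRAIN, N_TEST)
critical_value = stats.t.ppf(0.975, len(scores_a) - 1)
print(f"corrected t = {t_corr:.4f}, critical = {critical_value:.4f}, p = {p_corr:.3g}")
```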

The corrected t-value (6.725766889467009) is still greater than the critical value (2.2621571627409915) obtained from Student’s distribution and the difference is significant.

Confidence intervals of the differences between model scores

The confidence interval gives a range that is likely to contain the difference between the means of the model scores for the entire population. If there is no difference, then the interval contains zero (0). If zero is NOT in the range of values, the difference is statistically significant.
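A minimal sketch of such an interval for the paired differences: the mean difference plus or minus the t critical value times the standard error. The function name, scores, and 95% confidence level are hypothetical.

```python
import numpy as np
from scipy import stats


def paired_confidence_interval(scores_1, scores_2, confidence=0.95):
    """Confidence interval for the mean of the paired score differences."""
    diff = np.asarray(scores_1, dtype=float) - np.asarray(scores_2, dtype=float)
    n = len(diff)
    std_error = diff.std(ddof=1) / np.sqrt(n)
    margin = stats.t.ppf((1 + confidence) / 2, n - 1) * std_error
    return diff.mean() - margin, diff.mean() + margin


# Hypothetical 10-fold CV scores (placeholders, not the original data).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
scores_b = np.array([0.76, 0.75, 0.77, 0.74, 0.78, 0.73, 0.78, 0.75, 0.77, 0.74])

lower, upper = paired_confidence_interval(scores_a, scores_b)
print(f"95% CI of the difference: [{lower:.4f}, {upper:.4f}]")
print("significant" if not (lower <= 0 <= upper) else "not significant")
```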

Corrected confidence intervals of the differences between model scores for non-independent samples
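The same variance correction can be carried over to the interval. The sketch below assumes the Nadeau and Bengio factor (1/n + n2/n1) in the standard error; this is an assumption, since the text does not show the formula it uses. The function name, scores, and set sizes are hypothetical.

```python
import numpy as np
from scipy import stats


def corrected_confidence_interval(scores_1, scores_2, n_train, n_test,
                                  confidence=0.95):
    """CI for the mean score difference with an inflated standard error."""
    diff = np.asarray(scores_1, dtype=float) - np.asarray(scores_2, dtype=float)
    n = len(diff)
    # Corrected standard error; n_test=0 reduces to the plain paired interval.
    std_error = np.sqrt((1.0 / n + n_test / n_train) * diff.var(ddof=1))
    margin = stats.t.ppf((1 + confidence) / 2, n - 1) * std_error
    return diff.mean() - margin, diff.mean() + margin


# Hypothetical 10-fold CV scores and set sizes (placeholders).
scores_a = np.array([0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79])
scores_b = np.array([0.76, 0.75, 0.77, 0.74, 0.78, 0.73, 0.78, 0.75, 0.77, 0.74])

lower, upper = corrected_confidence_interval(scores_a, scores_b,
                                             n_train=900, n_test=100)
print(f"corrected 95% CI of the difference: [{lower:.4f}, {upper:.4f}]")
```

Because the correction only inflates the standard error, the corrected interval is centered on the same mean difference and is always wider than the uncorrected one.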

Visualization

Zero is not in either confidence interval, which means there is a significant difference between the models. The corrected confidence interval is wider and closer to 0.