Test Cases

We evaluated the deep learning model (trained InceptionV3 convolutional neural network) in 6 additional patients with TSC (not used in training, validation, or testing) who had particularly challenging findings, such as extremely subtle tubers or limited myelination.

Evaluating performance in this scenario is difficult for two reasons. First, MRI slices classified as negative by the neuroradiologist may still contain subtle signs of tubers, unlike the test set, in which normal MRI images came from patients with a normal MRI and no TSC. Second, tubers in some MRI slices were so subtle that the neuroradiologist detected them only after reviewing the adjacent MRI slices, an advantage the deep learning model lacks because it evaluates each MRI slice as an independent observation, without the context of adjacent slices. When an MRI slice with tubers was missed, it generally contained the subtle tail of a tuber that was correctly identified in adjacent MRI slices. Interestingly, when the deep learning model missed tubers, the GradCAM maps and saliency maps showed that it was focusing on the area of the tuber(s), but the estimated probability fell below the threshold. We invite the reader to judge the performance of our deep learning algorithm by viewing the results for these highly challenging MRI images at the links below.
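To illustrate why a subtle tuber tail can be missed, the following minimal sketch labels each slice from its own estimated probability, with no information from neighbouring slices. The cutoff of 0.5, the function name `classify_slices`, and the probabilities are all hypothetical values chosen for illustration; they are not taken from the study.

```python
from typing import List

THRESHOLD = 0.5  # hypothetical cutoff; the value used in the study is not given here


def classify_slices(probabilities: List[float], threshold: float = THRESHOLD) -> List[str]:
    """Label each slice from its own probability, ignoring neighbouring slices."""
    return ["TSC" if p >= threshold else "Control" for p in probabilities]


# Made-up probabilities for five consecutive slices through one tuber;
# the last slice stands in for a subtle "tail" of the tuber.
example_probabilities = [0.91, 0.88, 0.76, 0.62, 0.41]
labels = classify_slices(example_probabilities)
print(labels)
```

In this toy example the last slice is labelled Control even though the slices before it are labelled TSC, which mirrors the pattern of missed slices described above.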

The aggregated performance in all 6 patients was:

Predicted classification	Real TSC	Real Control
TSC	123	24
Control	60	52

Sensitivity=0.67; Specificity=0.68; Positive predictive value=0.84; Negative predictive value=0.46; Accuracy=0.68
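These summary metrics follow directly from the four cells of the confusion matrix. As a minimal sketch (the names `safe_ratio` and `diagnostic_metrics` are illustrative, not taken from the study code), they can be recomputed as shown here. A ratio with a zero denominator is returned as `None`, which corresponds to the NA entries reported for patients who had no slices in one of the categories.

```python
from typing import Dict, Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Compute the five reported metrics from confusion-matrix counts."""
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
        "accuracy": safe_ratio(tp + tn, tp + fp + fn + tn),
    }


# Aggregated counts for the 6 patients: predicted TSC row = (TP, FP),
# predicted Control row = (FN, TN).
aggregate = diagnostic_metrics(tp=123, fp=24, fn=60, tn=52)
for name, value in aggregate.items():
    print(name, "NA" if value is None else round(value, 2))
```

Rounded to two decimals, this reproduces the values above: 0.67, 0.68, 0.84, 0.46, and 0.68.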

However, the most informative results come from examining the performance in individual patients and their images.

Test Case I was a 2-year-old male with high tuber burden.

In T2, the performance was:

Predicted classification	Real TSC	Real Control
TSC	20	0
Control	0	0

Sensitivity=1; Specificity=NA; Positive predictive value=1; Negative predictive value=NA; Accuracy=1

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseI/#T2.

In FLAIR, the performance was:

Predicted classification	Real TSC	Real Control
TSC	19	0
Control	1	0

Sensitivity=0.95; Specificity=NA; Positive predictive value=1; Negative predictive value=0; Accuracy=0.95

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseI/#FLAIR.

Test Case II was a 5-year-old female with low tuber burden.

In T2, the performance was:

Predicted classification	Real TSC	Real Control
TSC	4	9
Control	0	17

Sensitivity=1; Specificity=0.65; Positive predictive value=0.31; Negative predictive value=1; Accuracy=0.7

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseII/#T2.

In FLAIR, the performance was:

Predicted classification	Real TSC	Real Control
TSC	2	0
Control	0	13

Sensitivity=1; Specificity=1; Positive predictive value=1; Negative predictive value=1; Accuracy=1

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseII/#FLAIR.

Test Case III was a 3-year-old male with high tuber burden.

In T2, the performance was:

Predicted classification	Real TSC	Real Control
TSC	29	0
Control	1	0

Sensitivity=0.97; Specificity=NA; Positive predictive value=1; Negative predictive value=0; Accuracy=0.97

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseIII/#T2.

In FLAIR, the performance was:

Predicted classification	Real TSC	Real Control
TSC	7	0
Control	8	0

Sensitivity=0.47; Specificity=NA; Positive predictive value=1; Negative predictive value=0; Accuracy=0.47

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseIII/#FLAIR.

Test Case IV was a 10-day-old female with high tuber burden.

In T2, the performance was:

Predicted classification	Real TSC	Real Control
TSC	14	0
Control	6	0

Sensitivity=0.7; Specificity=NA; Positive predictive value=1; Negative predictive value=0; Accuracy=0.7

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseIV/#T2.

In FLAIR, the performance was:

Predicted classification	Real TSC	Real Control
TSC	3	1
Control	11	0

Sensitivity=0.21; Specificity=0; Positive predictive value=0.75; Negative predictive value=0; Accuracy=0.2

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseIV/#FLAIR.

Test Case V was a 4-year-old female with medium tuber burden.

In T2, the performance was:

Predicted classification	Real TSC	Real Control
TSC	10	11
Control	4	10

Sensitivity=0.71; Specificity=0.48; Positive predictive value=0.48; Negative predictive value=0.71; Accuracy=0.57

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseV/#T2.

In FLAIR, the performance was:

Predicted classification	Real TSC	Real Control
TSC	2	0
Control	4	9

Sensitivity=0.33; Specificity=1; Positive predictive value=1; Negative predictive value=0.69; Accuracy=0.73

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseV/#FLAIR.

Test Case VI was a 12-year-old male with high tuber burden.

In T2, the performance was:

Predicted classification	Real TSC	Real Control
TSC	10	3
Control	16	1

Sensitivity=0.38; Specificity=0.25; Positive predictive value=0.77; Negative predictive value=0.06; Accuracy=0.37

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseVI/#T2.

In FLAIR, the performance was:

Predicted classification	Real TSC	Real Control
TSC	3	0
Control	9	2

Sensitivity=0.25; Specificity=1; Positive predictive value=1; Negative predictive value=0.18; Accuracy=0.36

See the images at: https://ivansanchezfernandez.github.io/TSC_TestCaseVI/#FLAIR.
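
As a consistency check, the twelve per-sequence confusion matrices reported for the six test cases can be summed and compared with the aggregated table. The sketch below hard-codes the counts from this section as (TP, FP, FN, TN) tuples, where the predicted TSC row gives (TP, FP) and the predicted Control row gives (FN, TN); the variable names are illustrative.

```python
# Counts copied from the per-patient tables, as (TP, FP, FN, TN).
per_case = {
    ("I", "T2"): (20, 0, 0, 0),
    ("I", "FLAIR"): (19, 0, 1, 0),
    ("II", "T2"): (4, 9, 0, 17),
    ("II", "FLAIR"): (2, 0, 0, 13),
    ("III", "T2"): (29, 0, 1, 0),
    ("III", "FLAIR"): (7, 0, 8, 0),
    ("IV", "T2"): (14, 0, 6, 0),
    ("IV", "FLAIR"): (3, 1, 11, 0),
    ("V", "T2"): (10, 11, 4, 10),
    ("V", "FLAIR"): (2, 0, 4, 9),
    ("VI", "T2"): (10, 3, 16, 1),
    ("VI", "FLAIR"): (3, 0, 9, 2),
}

# Sum each cell across all patients and sequences.
tp, fp, fn, tn = (sum(cell) for cell in zip(*per_case.values()))
print(tp, fp, fn, tn)  # 123 24 60 52
```

The totals match the aggregated confusion matrix for all 6 patients.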