Adam Mainz, Maria Deprez
Institution(s)
King’s College London
Introduction (include hypothesis)
The aim of this study is to develop a technique for finding the sparsest set of features that produces reproducible results, with the lowest possible variance, across similar data sets of MRI images of preterm subjects.
Methods (include source of funding and ethical approval if required)
The proposed method is stable feature selection, otherwise known as stability selection, which selects features from a large number of sub-samples of the full data set and returns, for each feature, a probability reflecting its importance. The method was applied to two types of data from MRI scans: volumetric data and voxel-wise intensity data. For the volumetric data, the final stability selection retained 13 of the original 86 features for the DHCP data set (N=65) and 14 of the original 86 features for the ePrime data set (N=483). For the voxel-wise regression, the final selection for the DHCP data set (N=65) retained 131 of the original 405924 features.
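The sub-sampling procedure described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the base selector (`select_top_k`, which ranks features by absolute correlation with the target), the synthetic data, the sub-sample fraction and the 0.8 threshold are all assumptions made for the example. In practice a sparse regression such as the lasso is commonly used as the base selector.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (var_x * var_y) ** 0.5


def select_top_k(X, y, k):
    """Hypothetical base selector: the k features most correlated with y."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def stability_selection(X, y, k=2, n_subsamples=200, fraction=0.5, seed=0):
    """Return, per feature, the fraction of sub-samples in which it was selected."""
    rng = random.Random(seed)
    n_subjects, n_features = len(X), len(X[0])
    size = max(2, int(fraction * n_subjects))
    counts = [0] * n_features
    for _ in range(n_subsamples):
        idx = rng.sample(range(n_subjects), size)  # subjects drawn without replacement
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for j in select_top_k(X_sub, y_sub, k):
            counts[j] += 1
    return [c / n_subsamples for c in counts]


def make_toy_data(n_subjects=100, n_features=8, seed=1):
    """Synthetic data (assumption): only features 0 and 1 drive the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_subjects)]
    y = [2.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y


X, y = make_toy_data()
probabilities = stability_selection(X, y)
# Keep features selected in at least 80% of sub-samples (illustrative threshold).
stable = [j for j, p in enumerate(probabilities) if p >= 0.8]
```

Here `probabilities[j]` plays the role of the per-feature probability described above, and `stable` is the sparse feature set left after thresholding. Because each sub-sample selects exactly `k` features, the probabilities sum to `k`.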
Results
It should be noted that the Cerebrospinal fluid (CSF) probability reported here is zero, because this feature was manually eliminated before any processing. When it is not eliminated, the CSF overshadows all other features and prevents the underlying probabilities from being seen. The probabilities associated with Gray Matter (GM) are very similar in both data sets: the right and left Cerebellum, right and left Occipital lobe GM, right and left Frontal lobe GM, right and left Parietal lobe GM, and right and left Ventricles. With respect to White Matter (WM), the two data sets differ markedly. Because the ePrime data set is focused primarily on subjects born before 34 weeks, WM plays an important role with respect to gestational age, specifically in the WM Gyri and the Corpus Callosum. The other notable WM probabilities are similar in both data sets and correspond almost exactly to their GM counterparts. The results of the voxel-wise regression are shown below, visualized as an atlas of preterm subjects overlaid with the features found by stability selection. From left to right, the panels show the Sagittal, Coronal and Transverse planes. The top image shows the probabilities after thresholding; the bottom image shows the probabilities without thresholding.

Conclusions
Stability selection is a powerful tool that enables the selection of the computationally cheapest set of features while eliminating bias and providing reproducible results across independent data sets. This study found sparse feature sets that remained stable when tested with a large sub-sampling rate, which allows for optimal stability. In early preterm subjects (age<33), the features were weighted slightly more heavily towards the WM regions, whereas in preterm subjects (age>33) the features were a mix of GM and WM in equal proportion.