|Greenbury S, Wu J, Ougham K, Hyde M, Glen R, Gale C, Angelini E, Modi N
|1) ITMAT Data Science Group, NIHR Imperial Biomedical Research Centre, Imperial College London 2) Section of Neonatal Medicine, Department of Medicine, Imperial College London
|The National Neonatal Research Database (NNRD) is a mature, longitudinal, relational database containing around 450 pre-defined variables, many recorded daily, that flow from the real-time, point-of-care, clinician-entered electronic patient records of all admissions to NHS neonatal units. The data are an NHS Information Standard for England (Neonatal Data Set ISB 1595) and include demographics, diagnoses, outcomes, daily treatments and care processes. To date, the NNRD contains information on over one million patients. We aimed to develop methods to curate NNRD data for the application of machine learning (ML) and artificial intelligence (AI) techniques, and to conduct a proof-of-concept evaluation testing the hypothesis that these approaches can reliably identify clinically meaningful preterm feeding patterns.
|We studied a pseudoanonymised test cohort of 49,450 very and extremely preterm neonates (less than 32 gestational weeks) born between 01 Jan 2012 and 08 Jan 2019 and admitted to neonatal units in England. We considered daily data relating to nutritional intake, applying processes to minimise missing data, ensure logical consistency of variables and convert each baby’s daily record into an aggregate summary of each nutrient type (maternal milk, donor milk, formula, fortifier, parenteral nutrition) delivered per day. We applied unsupervised ML/AI methods (k-means clustering and a more complex Dirichlet Process Gaussian Mixture Model) to cluster the cohort and to identify both patterns in feeding regimens and outliers, based on each baby’s entire length of stay. The National Research Ethics Service has approved the NNRD as a research database (16/LO/1093). The study was funded by the Imperial NIHR Biomedical Research Centre.
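The aggregation and clustering steps described above can be illustrated with a minimal sketch. It is not the study's pipeline: the nutrient names, the choice of "fraction of days on which each nutrient was given" as the per-baby summary, the toy records and the helper names `summarise_stay` and `kmeans` are all assumptions made for illustration. Only plain k-means (Lloyd's algorithm) is shown; the Dirichlet Process Gaussian Mixture Model is not.

```python
import random

# Hypothetical nutrient labels, mirroring the five types named in the text.
NUTRIENTS = ["maternal_milk", "donor_milk", "formula", "fortifier", "parenteral"]

def summarise_stay(daily_records):
    """Assumed summary: fraction of recorded days on which each nutrient was given."""
    n_days = len(daily_records)
    return [sum(1 for day in daily_records if nutrient in day) / n_days
            for nutrient in NUTRIENTS]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (invented): one set of nutrient types per day of stay.
stays = {
    "baby_a": [{"parenteral"}, {"parenteral", "maternal_milk"},
               {"maternal_milk"}, {"maternal_milk", "fortifier"}],
    "baby_b": [{"parenteral"}, {"maternal_milk"},
               {"maternal_milk"}, {"maternal_milk", "fortifier"}],
    "baby_c": [{"parenteral"}, {"parenteral"}, {"parenteral"}, {"parenteral"}],
    "baby_d": [{"parenteral"}, {"parenteral"}, {"parenteral"},
               {"parenteral", "formula"}],
}
features = [summarise_stay(days) for days in stays.values()]
labels, centroids = kmeans(features, k=2)
```

In this toy example the two babies fed mostly maternal milk fall into one cluster and the two fed mostly parenterally into the other. A real analysis would also need the missing-data and consistency checks mentioned above, which are omitted here.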
|We demonstrated that our clustering approaches yielded clinically meaningful and interpretable findings. We identified around 10 typical feeding patterns that describe 80% of the population; a large number of rare patterns described the remainder. The largest group (~30%) clearly illustrated the well-recognised trade-off between mother’s milk and formula, while other groups traded fortifier against formula in the presence of a larger proportion of mother’s milk. The fifth-largest cluster (~7%) was a high-mortality group with a shorter length of stay, receiving mostly parenteral nutrition. We additionally considered the average time series of feeding events associated with each group, identifying a small number of expected feeding transitions in the dominant clusters and more complicated transitions in the rare clusters.
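A statement of the form "around 10 patterns describe 80% of the population" can be checked from cluster labels by counting how many clusters, taken largest first, are needed to reach a target share. The sketch below assumes one label per baby; the function name `clusters_needed` and the example labels are hypothetical and do not reproduce the study's results.

```python
from collections import Counter

def clusters_needed(labels, target=0.8):
    """Smallest number of clusters, largest first, covering at least
    `target` of the population."""
    sizes = sorted(Counter(labels).values(), reverse=True)
    total = sum(sizes)
    covered = 0
    for n_clusters, size in enumerate(sizes, start=1):
        covered += size
        if covered >= target * total:
            return n_clusters
    return len(sizes)  # only reached for an empty label list

# Invented labels: cluster sizes 6, 3 and 1 out of 10 babies.
example_labels = [0] * 6 + [1] * 3 + [2]
n_dominant = clusters_needed(example_labels, target=0.8)
```

Here the two largest clusters together hold 9 of 10 babies, so two clusters are enough to pass the 80% threshold.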
|We show that it is possible to apply agnostic ML/AI techniques to the NNRD and draw inferences that are in accord with clinician knowledge. This indicates the potential to apply ML/AI techniques to the NNRD and wider linked datasets to obtain data-driven insights. Examples include detecting non-random associations to identify possible disease determinants, and predictive modelling using complex longitudinal data to uncover patient pathways and consider interventions that might alter patient outcomes.