In real-time monitoring of hospital patients, high-quality inference of a patient's health status from all available clinical covariates and lab tests is essential to enable successful medical interventions and improve patient outcomes. Developing a computational framework that can learn from large-scale observational electronic health records (EHRs) and make accurate real-time predictions is a critical step toward this goal. However, medical time series data from EHRs pose several challenges for conventional methods. For instance, many of the covariates, lab results in particular, are sparsely sampled in time across patients. In addition, there is substantial uncertainty in patient state and disease progression at any time. These properties make it challenging to infer the physiological status of a single patient or to jointly analyze time series across patients.
In this work, we introduce MedGP, a statistical framework that leverages noisy observational medical time series data to enable personalized, real-time predictions of patient states and improve the clinical decision-making process. MedGP is based on multi-output Gaussian processes (MOGPs), a class of Bayesian nonparametric models that can capture rich temporal structure between clinical covariates from heterogeneous and irregularly sampled time series data. First, we show the design of a highly structured and regularized kernel for robust estimation of the correlations among clinical covariates. We then show how to implement the proposed approach efficiently and relieve the computational bottleneck imposed by learning GP parameters across many patients. Empirically, MedGP shows statistically significant improvements over baselines in making online predictions on multiple disease cohorts from two real-world medical datasets.
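To illustrate the general idea of a multi-output GP on irregularly sampled series, here is a minimal NumPy sketch. It is not the MedGP kernel or implementation: it swaps in a simple intrinsic coregionalization kernel (a fixed cross-covariate matrix `B` multiplied by an RBF kernel in time), and every name, hyperparameter, and data value below is a hypothetical chosen for illustration. It shows how a densely sampled covariate can reduce the predictive uncertainty of a sparsely sampled, correlated one.

```python
import numpy as np

def rbf(t1, t2, lengthscale=1.0):
    # Squared-exponential kernel over time stamps.
    d = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def icm_kernel(t1, out1, t2, out2, B, lengthscale=1.0):
    # Illustrative coregionalization kernel (an assumption, not MedGP's kernel):
    # K[(t, i), (t', j)] = B[i, j] * k_rbf(t, t')
    return B[np.ix_(out1, out2)] * rbf(t1, t2, lengthscale)

def mogp_predict(t_obs, out_obs, y_obs, t_new, out_new, B,
                 lengthscale=1.0, noise=1e-2):
    # Standard GP posterior mean and variance via a Cholesky factorization.
    K = icm_kernel(t_obs, out_obs, t_obs, out_obs, B, lengthscale)
    K = K + noise * np.eye(len(t_obs))
    K_s = icm_kernel(t_new, out_new, t_obs, out_obs, B, lengthscale)
    K_ss = icm_kernel(t_new, out_new, t_new, out_new, B, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical data: covariate 0 is sampled densely, covariate 1 sparsely.
t_dense = np.linspace(0.0, 10.0, 25)
t_sparse = np.array([0.5, 3.0, 7.5, 9.5])
t_obs = np.concatenate([t_dense, t_sparse])
out_obs = np.concatenate([np.zeros(len(t_dense), dtype=int),
                          np.ones(len(t_sparse), dtype=int)])
y_obs = np.sin(t_obs)  # both covariates follow the same latent signal

# Predict covariate 1 at a time where it was never measured.
t_new = np.array([5.0])
out_new = np.array([1])

B_joint = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed cross-covariate covariance
B_indep = np.eye(2)                           # covariates modeled independently

mean_joint, var_joint = mogp_predict(t_obs, out_obs, y_obs, t_new, out_new, B_joint)
mean_indep, var_indep = mogp_predict(t_obs, out_obs, y_obs, t_new, out_new, B_indep)
print("joint model:      ", mean_joint, var_joint)
print("independent model:", mean_indep, var_indep)
```

In this toy setting the joint model can borrow information from the densely sampled covariate, so its predictive variance for the sparse covariate is no larger than that of the independent model. In practice the cross-covariate structure would be learned from data rather than fixed by hand.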
More details of the project can be found in our arXiv paper.