Development and validation of a machine learning model to forecast early onset of sepsis in hospital inpatients: a retrospective study on a medically validated sepsis dataset

Research objective(s) and public health interest

Purpose of the study

Research, study, evaluation

Objectives pursued

Diagnosis

Prevention and treatment

Medical fields investigated

Infectious diseases

Expected benefits

Sepsis is defined as a life-threatening organ dysfunction caused by a dysregulated host response to an infection. Tools used to assess the prognosis of sepsis include the Sepsis-related Organ Failure Assessment (SOFA) scoring system. The onset of sepsis is defined as a global SOFA score of 2 or more, or an increase of at least two points due to infection.
Sepsis is a leading cause of morbidity and mortality worldwide and causes 30,000 deaths per year in France. It is responsible for 1 in 5 deaths worldwide, and projections indicate that the number of sepsis cases will double in the next 50 years due to an ageing population. Tools that predict sepsis, and in particular detect it early, are critical because a one-hour delay in sepsis diagnosis is associated with a 7% reduction in survival.
Several predictive models for sepsis have been developed using Machine Learning (ML) algorithms. Both retrospective and prospective studies have shown that implementation of the InSight (Dascena, USA) algorithm for sepsis management reduced sepsis-related hospital length of stay by 10% or by 2.3 days. More recently, a 32% reduction in hospital length of stay has been reported in a multi-site prospective real-world data study. However, the detection of sepsis by ML algorithms is still a work in progress as they can miss up to 67% of sepsis cases. Notably, the main limitation of predictive models is the lack of external validation to ensure reproducibility and generalisability.
The development and implementation of sepsis prediction models with whole-hospital datasets in real-life settings is still limited. Furthermore, more efficient prediction tools are still needed to have a significant impact on the survival of sepsis patients.

In this study, we aimed to develop and validate an ML model, the Sepsi-Score algorithm, to predict the early onset of sepsis in hospitalised patients.

The research question is: how reliable and how accurate is the Sepsi-Score for the early detection of sepsis?

Objectives:
Evaluation of the algorithm's sensitivity for sepsis detection in the validation set.
Evaluation of the algorithm's other performance and reliability parameters for sepsis detection in the validation set.
Evaluation of the algorithm's reliability and performance in the development set.
Evaluation of the reliability and performance of other scores currently used for sepsis identification (SOFA, qSOFA, SIRS and MEWS).
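The performance parameters named in the objectives can be illustrated with a minimal sketch. It assumes binary labels (1 = sepsis, 0 = no sepsis) per encounter; the names `ConfusionCounts`, `confusion_counts`, `sensitivity`, `specificity` and `ppv` are illustrative and do not come from the study, and the source does not list which parameters besides sensitivity were computed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives (illustrative helper)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the confusion matrix for binary labels (1 = sepsis, 0 = no sepsis)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp, fp, tn, fn)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of true sepsis encounters that were flagged."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-sepsis encounters that were correctly not flagged."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: share of flagged encounters that were sepsis."""
    return c.tp / (c.tp + c.fp)


# Made-up labels for illustration only; these are not study data.
example_truth = [1, 1, 1, 0, 0, 0, 0, 1]
example_preds = [1, 0, 1, 0, 1, 0, 0, 1]
example_counts = confusion_counts(example_truth, example_preds)
```

The same functions could be applied to the predictions of Sepsi-Score and of each comparator score, so that all of them are assessed against the same reference labels.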

Study population:
We selected eligible patients based on the following inclusion/exclusion (i./e.) criteria:
(i.1) Adult patient (18 years old or older);
(i.2) At least one SOFA-related Observation recorded;
(i.3) At least five out of six vital signs documented;
(i.4) Length of stay between 2 and 100 days (inclusive);
(e.1) Missing ICD-10 data;
(e.2) Gender unknown;
(e.3) No Observation recorded for vital signs and laboratory results.
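The criteria above can be read as a single eligibility predicate. The sketch below is one possible reading, not the study's actual pipeline: the `EncounterSummary` record and its field names are hypothetical, and criterion (e.3) is interpreted here as "neither vital signs nor laboratory results recorded", which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncounterSummary:
    """Hypothetical per-encounter summary; field names are illustrative."""
    age_years: int
    sex: Optional[str]            # None when gender is unknown
    has_icd10: bool               # False when ICD-10 data are missing
    sofa_observation_count: int   # number of SOFA-related Observations
    vital_signs_documented: int   # distinct vital signs documented, out of six
    lab_result_count: int         # number of laboratory-result Observations
    length_of_stay_days: float


def is_eligible(e: EncounterSummary) -> bool:
    """Apply inclusion criteria (i.1)-(i.4) and exclusion criteria (e.1)-(e.3)."""
    included = (
        e.age_years >= 18                          # (i.1)
        and e.sofa_observation_count >= 1          # (i.2)
        and e.vital_signs_documented >= 5          # (i.3)
        and 2 <= e.length_of_stay_days <= 100      # (i.4), bounds inclusive
    )
    excluded = (
        not e.has_icd10                            # (e.1)
        or e.sex is None                           # (e.2)
        # (e.3), assumed to mean no vital signs and no laboratory results
        or (e.vital_signs_documented == 0 and e.lab_result_count == 0)
    )
    return included and not excluded


# Made-up encounters for illustration only.
eligible_example = EncounterSummary(67, "female", True, 3, 6, 12, 5.0)
too_short_stay = EncounterSummary(67, "female", True, 3, 6, 12, 1.0)
unknown_gender = EncounterSummary(45, None, True, 2, 5, 4, 10.0)
```

Keeping inclusion and exclusion as two separate expressions mirrors the (i./e.) numbering, which makes it easier to check each rule against the list.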

We collected retrospective data for both the training and the study datasets at the Hospital of Valenciennes (CHV, France), from the electronic health records (EHR) of hospital inpatients admitted to all departments (including the intensive care unit, the emergency department, the surgical department, and all other hospital departments where sepsis cases occurred). The data were completely anonymised before extraction, which took place on 26 June 2022, and the authors did not have access to any personally identifiable information about the participants during or after data collection. We extracted data using a hospital-hosted Fast Healthcare Interoperability Resources (FHIR) server, implemented and connected to the hospital EHR and the International Classification of Diseases 10th edition (ICD-10) coding solution [12]. In this study, we use the FHIR terminology, where an ‘Encounter’ is a single patient stay from admission to discharge and an ‘Observation’ is a time-related measurement (vital sign, laboratory result, …).
For training, we retrospectively collected predictors from 45,127 hospitalised patients (6 February 2020 to 31 July 2021) from all departments (Valenciennes Hospital, France). We constructed the binary classifier Sepsi-Score for sepsis prediction, using a gradient-boosted trees approach.
In the study dataset, the gold standard for sepsis diagnosis and time of onset was an expert physician’s assessment of each suspected sepsis case. We evaluated the algorithm against both sepsis scoring systems and physician predictions for 139 patients. We compared the classification performance of Sepsi-Score with the standardised predictive scoring systems SOFA, qSOFA, SIRS and MEWS, which are commonly used by clinical practitioners to diagnose sepsis and predict mortality due to infection. The gold standard for identifying sepsis encounters was physician judgement: a physician reviewed all sepsis encounters captured by ICD-10 codes.
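As an illustration of what a comparator score looks like in practice, the sketch below computes qSOFA from its three published criteria (respiratory rate of 22/min or more, systolic blood pressure of 100 mmHg or less, altered mentation taken as a Glasgow Coma Scale below 15), with a score of 2 or more counted as positive. The function names are illustrative, and the study text does not state which measurement window or missing-value handling was used, so this is an assumption-laden sketch rather than the study's implementation.

```python
def qsofa_score(respiratory_rate: float, systolic_bp: float, gcs: int) -> int:
    """Return the qSOFA score (0 to 3) from one set of bedside measurements.

    One point each for:
      - respiratory rate >= 22 breaths per minute
      - systolic blood pressure <= 100 mmHg
      - altered mentation, taken here as Glasgow Coma Scale < 15
    """
    return (
        int(respiratory_rate >= 22)
        + int(systolic_bp <= 100)
        + int(gcs < 15)
    )


def qsofa_positive(respiratory_rate: float, systolic_bp: float, gcs: int) -> bool:
    """A qSOFA score of 2 or more is conventionally treated as positive."""
    return qsofa_score(respiratory_rate, systolic_bp, gcs) >= 2
```

A binary output like `qsofa_positive` is what allows a bedside score to be compared with a classifier on the same encounters, using the same reference labels.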

Data used

Categories of data used

Information relating to the medical conditions of the persons concerned

Information relating to the health, medico-social and financial care associated with each beneficiary

Information collected during prevention, diagnosis, care, or social and medico-social follow-up activities

Data source used

Other

Other data source(s) used

Medical records

Linkage between the data sources used

No

Platform used for data analysis

Other

Parties funding and participating in the study

Data controller(s)

Type of data controller 1

Public health establishment (including federations)

Data controller 1

CH de Valenciennes

114 Avenue Desandrouin 59300 Valenciennes France

Location of data controller 1

Within the EU

Representative of data controller 1

Nicolas Salvi

sec-urc@ch-valenciennes.fr

Type of data controller 2

Start-up

Data controller 2

PREVIA medical

38 rue Denuzière 69002 Lyon France

Location of data controller 2

Within the EU

The data controller is also the implementing party

No

Implementing party(ies) not listed as data controller

Implementing party not listed as data controller 1

CH de Valenciennes

Project timeline

Start date: 06/02/2020 – End date: 31/07/2021 Study duration: 18 months

Step 1: Project submission

22/10/2024

Legal basis for accessing the data

Regulatory framework

Reference Methodology 004 (MR-004)

Data recipient(s)

Data recipient 1

CH de Valenciennes

Data recipient 2

PREVIA medical

Retention period for the purposes of the project (in years)

Existence of automated decision-making

No

Legal basis

Article 6 of the GDPR (Lawfulness of processing)

(1)(e) performance of a task carried out in the public interest

Article 9 of the GDPR (Exception allowing the processing of health data)

(2)(i) public interest in the area of public health

Transfer of personal data to a country outside the EU

No

Rights of data subjects

Patient information document about the study

Data protection officer

CH de Valenciennes

114 avenue Désandrouin 59300 Valenciennes France

panza-j@ch-valenciennes.fr