Objectives
To develop and evaluate a machine learning (ML) model that predicts which patients with Crohn’s disease (CD) will fall in the top quartile of healthcare spending.
Methods
De-identified commercial claims (2016–2018) from ~267 000 continuously enrolled members in a Midwestern state were analysed, including 994 CD cases. Each patient's data were aggregated by month into data points comprising healthcare spending amounts, healthcare encounters, demographics and binary flags for diagnosis, procedure and drug codes. Seven algorithm families were tuned using five-fold cross-validation (January 2016 to September 2017) and tested prospectively (November 2017 to February 2018). Performance was evaluated monthly on how accurately high-cost healthcare spending was predicted, with 4-month and 1-month historical cost analyses serving as comparison baselines.
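As a rough illustration of the historical cost baselines mentioned above, the sketch below flags the members whose spending over a trailing window (4 months or 1 month) falls in the top quartile. It is only a minimal sketch under stated assumptions: the claim tuple layout, the function names `trailing_spend` and `top_quartile`, the cut-off rule and all dollar amounts are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def trailing_spend(claims, end_month, window):
    """Sum each member's spend over the `window` months ending at `end_month` (inclusive).

    Members with no claims inside the window are simply absent from the result.
    """
    totals = defaultdict(float)
    for member_id, month, amount in claims:
        if end_month - window < month <= end_month:
            totals[member_id] += amount
    return dict(totals)

def top_quartile(spend_by_member):
    """Return the members ranked in the top 25% by spend (always at least one).

    Assumption: the quartile is taken by member count, not by a dollar threshold.
    """
    ranked = sorted(spend_by_member, key=spend_by_member.get, reverse=True)
    k = max(1, len(ranked) // 4)
    return set(ranked[:k])

# Toy claims as (member_id, month_index, paid_amount); every value is made up.
claims = [
    ("A", 1, 100.0), ("A", 2, 150.0), ("A", 3, 120.0), ("A", 4, 90.0),
    ("B", 1, 0.0), ("B", 4, 5000.0),
    ("C", 1, 2000.0), ("C", 2, 2500.0), ("C", 3, 1800.0), ("C", 4, 100.0),
    ("D", 2, 50.0), ("D", 4, 60.0),
]

# Baseline flags for the month after month 4, using two look-back windows.
four_month_flags = top_quartile(trailing_spend(claims, end_month=4, window=4))
one_month_flags = top_quartile(trailing_spend(claims, end_month=4, window=1))
```

In this toy data the two windows disagree: the 4-month window favours the member with sustained spending, while the 1-month window favours the member with a single recent spike. That sensitivity to window length is one reason both baselines are useful comparators.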
Results
ML models predicted an average of 80% of the dollars spent by top-quartile members during the 4-month evaluation period, compared with 67% for the 4-month historical baseline and 62% for the prior-month baseline. The models identified an average of 51 new members entering the high-cost group each month, nearly double the yield of the 4-month historical method. The ML models also more accurately anticipated the inpatient encounters that drove excess spending.
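One plausible reading of the two headline metrics can be sketched as follows. The abstract does not give exact formulas, so the definitions here are assumptions: "dollar capture" is taken as the share of actual top-quartile dollars spent by members who were also flagged in advance, and "new entrants" as flagged members who are in the high-cost group this period but were not in the previous one. The function names and the toy figures are hypothetical.

```python
def dollar_capture(predicted, actual_top, actual_spend):
    """Share of actual top-quartile dollars spent by members who were flagged.

    Assumed definition, not the study's stated formula.
    """
    top_dollars = sum(actual_spend[m] for m in actual_top)
    if top_dollars == 0:
        return 0.0
    captured = sum(actual_spend[m] for m in actual_top & predicted)
    return captured / top_dollars

def new_entrants_found(predicted, actual_top, previous_top):
    """Flagged members who newly entered the high-cost group this period."""
    return (actual_top - previous_top) & predicted

# Toy evaluation month; every value is made up.
actual_spend = {"A": 8000, "B": 2000, "C": 100, "D": 50}
actual_top = {"A", "B"}      # members who really were high-cost this month
previous_top = {"B", "C"}    # high-cost group in the preceding period
predicted = {"A", "C"}       # members flagged in advance by some method

capture = dollar_capture(predicted, actual_top, actual_spend)
entrants = new_entrants_found(predicted, actual_top, previous_top)
```

Here the flagged set misses member B but still accounts for 8000 of the 10 000 top-quartile dollars, and it correctly picks up A as a new entrant. Weighting by dollars rather than by member count rewards a method for finding the most expensive patients first.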
Discussion
Claims-based ML offers actionable lead time for payers and clinicians to enhance monitoring, adjust biological therapy or schedule elective care before emergency admissions occur. Because this framework relies exclusively on standard claim fields, it can be quickly extended to other episodic, high-variance conditions.
Conclusion
Prospectively tested, claims-only ML models enhance short-term risk stratification in CD by identifying future high-cost patients. Future studies should confirm the clinical impact and cost savings, and should ensure equitable performance across diverse populations.