Objectives
This study examined whether incorporating free-text entries into structured general practice records improves the detection of long-term conditions (LTCs) and multimorbidity (MM) in New Zealand (NZ) general practices.
Methods
Data from 374 071 deidentified individuals in general practices were analysed to identify 61 LTCs. Structured data were extracted using Read codes from a national master list, and clinical raters independently identified condition-related free text, including synonyms, negation terms and common misspellings, in randomised samples. Keywords were categorised and refined through ten iterative tests. A programmatic text classifier was developed and assessed against gold-standard clinician ratings using sensitivity, specificity, positive predictive value (PPV) and F1-score.
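The keyword approach described above can be illustrated with a minimal rule-based sketch. It assumes hypothetical keyword lists (including synonyms and a common misspelling) for two conditions, a hypothetical set of negation cues, and an assumed rule that a cue within the four preceding words of the same clause negates a match. None of these lists or parameters are the study's own.

```python
import re

# Hypothetical keyword lists: synonyms, abbreviations and common misspellings.
KEYWORDS = {
    "diabetes": ["diabetes", "diabetic", "t2dm", "diabetis"],
    "asthma": ["asthma", "asthmatic", "asthama"],
}
# Hypothetical negation cues searched for before a keyword match.
NEGATION_CUES = ["no", "not", "denies", "nil", "query", "ruled out"]
WINDOW = 4  # assumed number of preceding words checked for a negation cue

_NEG_RE = re.compile(r"\b(" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b")
_PATTERNS = {
    cond: re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    for cond, terms in KEYWORDS.items()
}


def classify(text: str) -> set[str]:
    """Return the conditions asserted (not negated) in one free-text entry."""
    lowered = text.lower()
    found = set()
    for condition, pattern in _PATTERNS.items():
        for match in pattern.finditer(lowered):
            # Restrict the negation search to the current clause so that
            # "No asthma. Diabetic" does not negate the diabetes mention.
            clause = re.split(r"[.;,\n]", lowered[: match.start()])[-1]
            preceding = clause.split()[-WINDOW:]
            if not _NEG_RE.search(" ".join(preceding)):
                found.add(condition)
                break  # one asserted mention is enough for this condition
    return found


print(classify("No asthma. Diabetic review due"))
```

A dictionary-and-rules design like this keeps every decision traceable to a keyword or cue, which makes it straightforward to refine the lists iteratively against clinician ratings.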
Results
A quarter of general practitioner classifications either contained unrecognised Read codes or consisted of free text only. Clinician inter-rater reliability was high (kappa ≥0.9). Compared with the clinical gold standard, text classification yielded an average sensitivity of 88%, specificity of 99% and PPV of 95%, with F1-scores ranging from 82% to 95%. Incorporating free text increased LTC prevalence from 42.1% to 46.3% and reduced misclassification of MM by identifying 12 626 additional patients with MM and 15 972 additional patients with at least one LTC.
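The reported performance measures follow from each condition's confusion counts against the clinician ratings. The sketch below computes them and takes an unweighted mean across conditions; the averaging method is an assumption, and the counts are hypothetical, not study data.

```python
from statistics import mean


def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Per-condition performance of a text classifier against clinician ratings."""
    return {
        "sensitivity": tp / (tp + fn),  # clinician-positive entries detected
        "specificity": tn / (tn + fp),  # clinician-negative entries correctly rejected
        "ppv": tp / (tp + fp),          # classifier positives confirmed by clinicians
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


def macro_average(per_condition: list[dict[str, float]]) -> dict[str, float]:
    """Unweighted (assumed) mean of each metric across conditions."""
    return {k: mean(m[k] for m in per_condition) for k in per_condition[0]}


# Hypothetical confusion counts for two conditions.
diabetes = metrics(tp=88, fp=5, tn=895, fn=12)
asthma = metrics(tp=45, fp=3, tn=940, fn=5)
summary = macro_average([diabetes, asthma])
print(summary)
```

The F1 expression 2TP / (2TP + FP + FN) is algebraically equal to the harmonic mean of PPV and sensitivity; a production version would also need to guard against zero denominators for rare conditions.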
Discussion
During routine clinical workflow, general practitioners face barriers to accurate LTC coding or may simply annotate records with text-based descriptions. Programmatic text classification performed well and identified many more patients receiving LTC care.
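Why free text adds patients can be seen by taking, per patient, the union of code-derived and text-derived conditions. This sketch assumes a hypothetical, illustrative code-to-condition map (not the national master list) and assumes MM means two or more LTCs; unrecognised codes are simply ignored, as they would be by a code-only extraction.

```python
# Hypothetical, illustrative excerpt of a Read code to condition map.
CODE_MAP = {"C10..": "diabetes", "H33..": "asthma"}


def patient_conditions(read_codes: list[str], text_conditions: set[str]) -> set[str]:
    """Union of conditions from recognised codes and from classified free text."""
    coded = {CODE_MAP[c] for c in read_codes if c in CODE_MAP}
    return coded | set(text_conditions)


def summarise(patients: dict[str, set[str]]) -> dict[str, int]:
    """Count patients with at least one LTC and with MM (assumed: two or more LTCs)."""
    return {
        "ltc": sum(1 for conds in patients.values() if len(conds) >= 1),
        "mm": sum(1 for conds in patients.values() if len(conds) >= 2),
    }


# Hypothetical records: (Read codes, conditions found by the text classifier).
records = {
    "p1": (["C10.."], {"asthma"}),    # coded diabetes, asthma only in text
    "p2": (["XYZ.."], {"diabetes"}),  # unrecognised code, diabetes only in text
    "p3": ([], set()),
}
structured_only = {pid: patient_conditions(codes, set()) for pid, (codes, _) in records.items()}
combined = {pid: patient_conditions(codes, text) for pid, (codes, text) in records.items()}
print(summarise(structured_only), summarise(combined))
```

In this toy example, p1 moves from one LTC to MM and p2 gains an LTC only once free text is included, mirroring the kind of reclassification reported in the Results.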
Conclusions
Combining structured and unstructured data optimises MM detection in NZ general practices and has the potential to improve case management, follow-up care and allocation of healthcare resources.