Comparison of large language models and expert multidisciplinary team decisions in colorectal cancer


Objectives

To evaluate the ability of large language models (LLMs) to simulate multidisciplinary team (MDT) decision-making in colorectal cancer, a malignancy that often requires complex treatment planning.

Methods

We retrospectively analysed 1423 colorectal cancer cases discussed at MDT meetings at Peking University Cancer Hospital between January 2023 and December 2024. Three LLMs—OpenAI o3-mini-2025-01-31, DeepSeek-R1 671b and Qwen qwq-plus-2025-03-05—were tested for their ability to replicate MDT recommendations using a standardised treatment categorisation framework. Each case was processed three times per model; only cases with consistent outputs across all three runs were included. Concordance between AI-generated decisions and expert MDT consensus was assessed using agreement percentages and Cohen’s kappa.
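The consistency filter and the two concordance measures described above can be illustrated with a minimal sketch. The case identifiers, treatment labels and helper names (`consistent_label`, `percent_agreement`, `cohen_kappa`) are hypothetical and are not taken from the study; only the general procedure (keep cases with identical outputs across three runs, then compare with the MDT label) follows the text.

```python
from collections import Counter

def consistent_label(runs):
    """Return the shared label if every run gave the same output, else None."""
    return runs[0] if len(set(runs)) == 1 else None

def percent_agreement(a, b):
    """Fraction of paired labels that are identical."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same cases."""
    n = len(a)
    p_o = percent_agreement(a, b)                     # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1:                                      # both raters used a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy data: three runs per case for one model, plus the MDT decision
model_runs = {
    "case1": ["surgery", "surgery", "surgery"],
    "case2": ["chemo", "surgery", "chemo"],           # inconsistent, will be excluded
    "case3": ["chemo", "chemo", "chemo"],
    "case4": ["surgery", "surgery", "surgery"],
    "case5": ["chemo", "chemo", "chemo"],
}
mdt = {"case1": "surgery", "case2": "chemo", "case3": "chemo",
       "case4": "chemo", "case5": "chemo"}

# Keep only cases whose three runs agree
kept = {}
for case_id, runs in model_runs.items():
    label = consistent_label(runs)
    if label is not None:
        kept[case_id] = label

model_labels = [kept[c] for c in kept]
mdt_labels = [mdt[c] for c in kept]

agreement = percent_agreement(model_labels, mdt_labels)
kappa = cohen_kappa(model_labels, mdt_labels)
print(f"kept {len(kept)} cases, agreement={agreement:.2f}, kappa={kappa:.2f}")
```

Reporting kappa alongside raw agreement matters because kappa discounts the agreement expected by chance when some treatment categories are much more common than others.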

Results

o3-mini demonstrated the highest intramodel stability, with an agreement rate of 81.0% (Fleiss’ kappa=0.794), yielding 1153 of 1423 cases with consistent outputs. Concordance with MDT consensus was comparable across the three models, ranging from 62.5% to 65.4%. Multivariable analysis of o3-mini outputs identified treatment-naïve status, non-metastatic disease and colon tumour location as independent predictors of higher concordance with experts.
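As an illustration of how intramodel stability across repeated runs could be quantified, the sketch below computes Fleiss' kappa using the standard textbook formula. The function name `fleiss_kappa` and the toy labels are hypothetical, and this is not the authors' code. The final lines check that 1153 consistent cases out of 1423 corresponds to the reported 81.0%.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` is a list of per-case label lists of equal length."""
    n_cases = len(ratings)
    n_runs = len(ratings[0])
    categories = sorted({label for runs in ratings for label in runs})
    counts = [Counter(runs) for runs in ratings]
    # Per-case agreement: proportion of run pairs that gave the same label
    p_i = [
        (sum(c[k] ** 2 for k in categories) - n_runs) / (n_runs * (n_runs - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_cases
    # Overall proportion of all outputs falling in each category
    p_j = [sum(c[k] for c in counts) / (n_cases * n_runs) for k in categories]
    p_e = sum(p ** 2 for p in p_j)                    # agreement expected by chance
    if p_e == 1:                                      # only one category was ever used
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical toy data: four cases, three runs each
runs_per_case = [
    ["A", "A", "A"],
    ["B", "B", "B"],
    ["A", "A", "B"],                                  # one inconsistent case
    ["B", "B", "B"],
]
kappa = fleiss_kappa(runs_per_case)
fully_consistent = sum(len(set(r)) == 1 for r in runs_per_case) / len(runs_per_case)
print(f"Fleiss' kappa={kappa:.3f}, fully consistent={fully_consistent:.0%}")

# Sanity check on the reported figures: 1153 of 1423 cases
reported_rate = round(1153 / 1423 * 100, 1)
print(reported_rate)
```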

Discussion

LLMs showed fair overall agreement with expert MDT decisions, with stronger performance in standardised and less complex clinical scenarios. Areas of higher concordance included treatment-naïve non-metastatic colon cancer, treated non-metastatic rectal cancer and treated non-metastatic colon cancer.

Conclusion

LLMs can partially replicate expert MDT recommendations in colorectal cancer. Their integration into clinical workflows should aim to complement, rather than replace, human expertise.

Qu, B., Cao, L., Wu, C., Chen, Y., Sun, T., Pei, J., Huang, L., Hou, X., Li, D., Wu, A.

