Objectives
Primary healthcare research faces core challenges, notably data silos and missing data. In addition, high technical barriers severely limit effective cross-regional data analysis.
Methods
This work was the first to apply the federated causal learning framework to primary healthcare. Through two case studies, each following a detailed step-by-step protocol, we demonstrated how to estimate cross-regional causal effects without sharing raw data. We also designed a systematic simulation study, tailored to the characteristics of primary healthcare data, to evaluate the performance of the framework under various missingness mechanisms and missingness proportions.
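The idea of estimating a pooled effect without sharing raw data can be illustrated with a minimal sketch. It is not the estimator used in this work: it assumes, purely for illustration, that each site computes a difference-in-means ATE and its sampling variance locally, and that a coordinator combines these summaries by inverse-variance weighting. The names `SiteSummary`, `local_summary` and `pool` are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """The only information that leaves a site (illustrative assumption)."""
    ate: float       # local difference-in-means estimate
    variance: float  # estimated sampling variance of that estimate


def local_summary(treated, control):
    """Run inside one site on its own outcome data; raw records stay local."""
    ate = statistics.mean(treated) - statistics.mean(control)
    variance = (statistics.variance(treated) / len(treated)
                + statistics.variance(control) / len(control))
    return SiteSummary(ate, variance)


def pool(summaries):
    """Combine site summaries by inverse-variance weighting.

    Returns the pooled ATE and its standard error.
    """
    weights = [1.0 / s.variance for s in summaries]
    total = sum(weights)
    pooled_ate = sum(w * s.ate for w, s in zip(weights, summaries)) / total
    return pooled_ate, math.sqrt(1.0 / total)
```

A difference in means identifies a causal effect only under randomisation or after adequate confounder adjustment, so in practice the local step would be replaced by an adjusted estimator; the aggregation pattern is what the sketch is meant to show.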
Results
The framework was applied to both chronic non-communicable disease and infectious disease, two enduring public health priorities. In the cardiovascular disease case, the average treatment effect (ATE) estimated by the federated model (ATE=0.017) was very close to that of the centralised model (ATE=0.018). Across all missing data scenarios, the stable model consistently achieved perfect or near-perfect coverage rates, even at missingness rates as high as 20%. In addition, the coverage of the unstable model remained above 96.10% even when model assumptions were violated.
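For readers unfamiliar with the coverage metric, the following minimal sketch shows one common way to compute it in a simulation study: the proportion of replications whose confidence interval contains the true effect. It assumes normal-approximation 95% intervals (z = 1.96); the interval construction used in this work may differ, and the function name `coverage_rate` is hypothetical.

```python
def coverage_rate(estimates, std_errors, true_effect, z=1.96):
    """Proportion of replications whose interval covers the true effect.

    Each interval is estimate +/- z * standard error (normal approximation,
    an assumption of this sketch).
    """
    hits = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_effect <= upper:
            hits += 1
    return hits / len(estimates)
```

With a nominal 95% interval, a well-calibrated estimator should give a coverage rate close to 0.95 over many replications.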
Discussion
This work demonstrated the effectiveness and practicality of federated causal learning for primary healthcare data, which are characterised by decentralisation and susceptibility to missingness.
Conclusion
This framework provides a feasible solution for primary healthcare workers to conduct federated causal inference safely. It holds promise for advancing data-driven precision decision-making in primary care.