Mechanistic interpretability of reinforcement learning in Medicaid care coordination
Objective: To expose reasoning pathways of a reinforcement learning policy for Medicaid care coordination, develop an error taxonomy and implement fairness-aware guardrails. Design: Retrospective interpretability