Treatment Omission in AI-Generated Clinical Plans:
A Systematic Benchmark of 493 Cases

MedR-Bench Treatment Dataset (Qiu et al., Nature Communications 2025) · GPT-4o-mini · March 2026

60.2%: average treatment information omitted by GPT-4o-mini across 493 published case reports and 13 body systems
493: cases evaluated
318: cases with more than 50% omission
3.0: critical components missed per case
13: body systems covered
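The headline figures above are simple aggregates over per-case results. A minimal sketch of how they can be derived, using invented records; the field names ("omission_pct", "critical_missed") are our own assumptions, not the benchmark's actual schema:

```python
# Illustrative per-case records; the real benchmark stores one record per
# evaluated case. These values are stand-ins, not real benchmark data.
cases = [
    {"omission_pct": 72.0, "critical_missed": 4},
    {"omission_pct": 40.0, "critical_missed": 1},
    {"omission_pct": 65.0, "critical_missed": 5},
    {"omission_pct": 58.0, "critical_missed": 2},
]

# Mean omission across cases, count of high-omission cases, and mean
# number of critical components missed per case.
avg_omission = sum(c["omission_pct"] for c in cases) / len(cases)
high_omission = sum(1 for c in cases if c["omission_pct"] > 50)
avg_critical = sum(c["critical_missed"] for c in cases) / len(cases)

print(f"average omission: {avg_omission:.1f}%")         # average omission: 58.8%
print(f"cases >50% omission: {high_omission}")          # cases >50% omission: 3
print(f"critical missed per case: {avg_critical:.1f}")  # critical missed per case: 3.0
```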
Figure 1 — Omission rate by body system (%)

Table 1 — Selected Completeness Studies in Medical AI

Study                           | Journal    | Task                         | Omission | n
CREOLA (Asgari et al. 2025)     | Nature     | Clinical note generation     | 3.5%     | —
MedR-Bench (Qiu et al. 2025)    | Nat. Comm. | Diagnostic reasoning recall  | ~30%     | 1,453
Stanford (Grolleau et al. 2025) | —          | Discharge note completeness  | ~35%     | —
This benchmark                  | —          | Treatment component omission | 60.2%    | 493
Studies differ in task, dataset, and evaluation methodology; figures are not directly comparable. To our knowledge, this is the first systematic benchmark measuring treatment component omission in AI-generated clinical plans.

Method

We used 496 treatment case reports from MedR-Bench (PMC Open Access, all published after July 2024, ensuring no training-data contamination). For each case, the structured patient summary was submitted to GPT-4o-mini with instructions to generate a detailed treatment plan, including specific medications, dosages, procedures, monitoring, and timing.

Each AI-generated plan was then compared against the published treatment from the original case report, with every treatment component independently classified as covered or missed. Omission rate = missed components / total components × 100%. 493 of 496 cases were evaluated successfully; 3 were excluded due to API errors.

Evaluation follows the LLM-as-judge methodology of MedR-Bench (Qiu et al. 2025), with GPT-4o-mini at temperature 0.1 as the judge model.
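The per-case scoring step can be sketched as follows. This assumes the judge has already produced a covered/missed verdict for each reference component; the names (`TreatmentComponent`, `omission_rate`) and the sample components are ours for illustration, not from the MedR-Bench codebase.

```python
from dataclasses import dataclass

@dataclass
class TreatmentComponent:
    text: str      # component extracted from the published treatment plan
    covered: bool  # judge verdict: is it present in the AI-generated plan?

def omission_rate(components: list[TreatmentComponent]) -> float:
    """Omission rate = missed components / total components x 100%."""
    if not components:
        return 0.0
    missed = sum(1 for c in components if not c.covered)
    return 100.0 * missed / len(components)

# Hypothetical case: 4 reference components, 2 judged missing.
case = [
    TreatmentComponent("ceftriaxone 2 g IV daily", covered=True),
    TreatmentComponent("renal function monitoring", covered=False),
    TreatmentComponent("surgical drainage", covered=False),
    TreatmentComponent("follow-up imaging at 6 weeks", covered=True),
]
print(f"{omission_rate(case):.1f}% omitted")  # 50.0% omitted
```

The per-case rates are then averaged across the 493 evaluated cases to produce the 60.2% headline figure.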