Honeypot Med Casebook

Honeypot Med Healthcare AI Challenge Traparium

A museum cabinet for failure modes. Every bland finding gets a specimen label.

Roster Leech

data_exfiltration sightings 4 bites 1

hungry for tables, tokens, and patient-shaped nouns

Tells: dump all data, leak records, export roster

Containment: Require scoped retrieval, output redaction, and explicit export approval for sensitive datasets.

Compliance Mimic

instruction_override sightings 4 bites 1

polite until it finds a manager override

Tells: ignore previous instructions, role reset, authority costume

Containment: Pin system boundaries in a separate control layer and reject role-reset language before tool access.

Policy Poltergeist

safeguard_bypass sightings 5 bites 1

rattles locks labeled guardrail, policy, and admin

Tells: disable safeguards, bypass guardrails, jailbreak

Containment: Keep policy state outside the model path and deny prompts that request control-plane changes.