Integrations

Make Honeypot Med fit existing eval and CI workflows.

Honeypot Med exports formats that both humans and tools can consume: visual proof dossiers, offline proof PDFs, generated UI mockups, GitHub Actions, SARIF, JUnit XML, GitHub step summaries, OpenTelemetry-style logs, OpenInference traces, LangSmith JSONL, promptfoo, Inspect AI, Hugging Face-ready cards, canonical JSONL, Markdown, SVG badges, casebooks, and standalone HTML reports.

GitHub Action

Run challenge mode in CI.

The root action.yml installs the project, runs challenge mode, appends github-summary.md to the workflow summary, uploads artifacts, and can upload SARIF to GitHub Code Scanning.

Workflow snippet

- uses: ByteWorthyLLC/honeypot-med@main
  with:
    pack: healthcare-challenge
    output-dir: honeypot-med-report
    fail-under: 70
    upload-artifact: true
    upload-sarif: true

Export formats

The artifact map.

Each format exists because a different audience needs to consume the same result.

HTML

Human-readable proof page for buyers, leadership, screenshots, and public galleries.

Visual dossier

Print-friendly HTML proof surface for non-terminal review.

Offline proof PDF

No-API proof document generated locally for attachments and handoffs.

UI mockup

Static product surface generated from the run so stakeholders can see what the result means.

SARIF

Static-analysis format that can be uploaded into GitHub Code Scanning.
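
A minimal sketch of how a downstream gate might count findings with the standard library; the report path and filename are assumptions for illustration, not documented output names:

import json
from pathlib import Path

# Hypothetical path; substitute the SARIF file your run actually wrote.
sarif = json.loads(Path("honeypot-med-report/report.sarif").read_text())

# SARIF 2.1.0 nests findings under runs[].results[].
findings = [r for run in sarif.get("runs", []) for r in run.get("results", [])]
print(f"{len(findings)} SARIF finding(s)")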

JUnit XML

CI-friendly test report where each trap is a testcase and risky traps become failures.
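
As a sketch of CI-side consumption, assuming a typical JUnit layout and a hypothetical filename:

import xml.etree.ElementTree as ET

# Hypothetical filename; point this at the JUnit XML your run emitted.
root = ET.parse("honeypot-med-report/junit.xml").getroot()

# JUnit XML marks a failing testcase with a nested <failure> element.
failures = root.findall(".//testcase/failure")
print(f"{len(failures)} risky trap(s) reported as failures")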

GitHub summary

Copy-ready step summary for GitHub Actions runs.

OpenTelemetry logs

Operational log payload that maps report events into telemetry-style records.

OpenInference and LangSmith

Offline JSONL trace shapes for teams that already inspect LLM runs in observability tools.
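
Both exports are line-delimited JSON, so a short sketch like this can confirm the trace shape before wiring it into an observability pipeline; the path below is an assumption:

import json
from pathlib import Path

# Hypothetical path; use whichever JSONL trace file your run produced.
for line in Path("reports/traces/langsmith.jsonl").read_text().splitlines():
    record = json.loads(line)
    # Print the keys so they can be mapped onto your tool's ingest schema.
    print(sorted(record))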

Eval kit

Offline adapters for promptfoo, Inspect AI, legacy OpenAI Evals, and canonical JSONL workflows.

Hugging Face cards

Dataset card, system card, and artifact manifest generated without upload or API calls.

Casebook

Forensic HTML, traparium, unknowns page, failure recipes, trap tree, and notebook.

PNG cards

Raster social card and badge generated with the Python standard library.

Release kit

Zip archive, SHA-256 checksum file, manifest, and release notes for generated report directories.
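
A sketch of verifying the archive against its checksum file, assuming the filenames follow the --name value and the checksum file uses the common digest-then-filename layout:

import hashlib
from pathlib import Path

bundle_dir = Path("dist/release-bundles")
# Hypothetical filenames; match them to what release-kit actually wrote.
archive = bundle_dir / "claims-export.zip"
expected = (bundle_dir / "claims-export.zip.sha256").read_text().split()[0]

actual = hashlib.sha256(archive.read_bytes()).hexdigest()
print("checksum ok" if actual == expected else "checksum MISMATCH")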

JSON

Stable machine-readable report for custom dashboards and downstream eval systems.
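
For example, a dashboard job might load the report with nothing beyond the standard library; the path and field names here are assumptions about the schema, not guarantees:

import json
from pathlib import Path

# Hypothetical path and fields; inspect your own report for the real schema.
report = json.loads(Path("reports/export/report.json").read_text())

score = report.get("score")
traps = report.get("traps", [])
print(f"score={score}, traps={len(traps)}")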

Markdown

PR comments, release notes, docs, and simple diffs.

SVG badge

README, docs, and launch posts that show the result without opening the report.

Eval kit

Move traps into tools teams already use.

The default generated promptfoo config uses the echo provider, so adapter generation and assertion checks can stay offline until a team intentionally swaps in a live provider. The verify command checks that the generated adapter directory is parseable.
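
As an illustration of what parseable means in practice, a standalone sketch that walks an adapter directory and confirms every JSONL line loads; the directory layout is an assumption, and eval-kit verify remains the supported check:

import json
from pathlib import Path

# Hypothetical layout; mirrors the spirit of eval-kit verify for JSONL files.
for path in Path("reports/eval-kit").rglob("*.jsonl"):
    for n, line in enumerate(path.read_text().splitlines(), start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            print(f"{path}:{n}: {err}")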

Verify adapters and plan HF mirrors

python app.py eval-kit verify --dir reports/eval-kit
python app.py hf-mirror plan --outdir reports/hf-mirror

Package report artifacts for a release

python app.py release-kit --source-dir reports/export --outdir dist/release-bundles --name claims-export