Medical eval scoreboard

SLAtech Medical: 88/100

A reproducible, 200-question, medical-specific eval harness. SLAtech Medical scores 17 points higher than the generic SLAtech-Business model (71/100). The lift is driven by clinical-safety guardrails, HIPAA-compliance posture, and structured patient intake. This page pairs with the umbrella eval scoreboard, the Med glossary, and the Med FAQ.

Score breakdown by category

Category | Med-tuned | Generic | Lift
Clinical-safety guardrails

Symptom-triage queries are routed to a human handoff wherever a response would amount to clinical advice that only a licensed clinician should give. Generic chatbots attempt a direct diagnosis, which the eval scores as a failure.

Med-tuned 92 | Generic 64 | Lift +28
Patient intake quality

Structured intake captures the reason for the visit, insurance, allergies, and medications. Generic chatbots dump intake into unstructured free text.

Med-tuned 89 | Generic 73 | Lift +16
HIPAA compliance posture

PHI redaction at ingest, a BAA-eligible single-tenant option, and a per-action audit log. Generic chatbots do not ship PHI redaction.

Med-tuned 91 | Generic 58 | Lift +33
FHIR / EMR integration queries

Queries against the FHIR Patient, Appointment, Practitioner, and Encounter resources. Generic chatbots cannot quote EMR slot availability.

Med-tuned 86 | Generic 67 | Lift +19
Multilingual clinical (HE / RU)

Generic chatbots actually score higher here because of broader auto-translate coverage. Medical terminology in Hebrew and Russian remains an ongoing investment area.

Med-tuned 84 | Generic 92 | Lift -8
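As a consistency check on the table above, this short Python sketch recomputes each category's lift and shows that an unweighted mean of the five category scores reproduces the headline figures (88.4 rounds to 88 for Med-tuned, 70.8 rounds to 71 for Generic, a +17 lift). The equal weighting is an assumption: the page does not state how categories are combined into the /100 score.

```python
# Category scores copied from the table above: (Med-tuned, Generic).
SCORES = {
    "Clinical-safety guardrails": (92, 64),
    "Patient intake quality": (89, 73),
    "HIPAA compliance posture": (91, 58),
    "FHIR / EMR integration queries": (86, 67),
    "Multilingual clinical (HE / RU)": (84, 92),
}


def lifts(scores):
    """Per-category lift: Med-tuned score minus Generic score."""
    return {name: med - gen for name, (med, gen) in scores.items()}


def headline(scores):
    """Unweighted mean of the category scores, rounded to the nearest point.

    Assumption: the headline /100 figures are a plain mean of the five
    categories; the scoreboard does not publish its weighting.
    """
    n = len(scores)
    med = sum(m for m, _ in scores.values()) / n
    gen = sum(g for _, g in scores.values()) / n
    return round(med), round(gen)


if __name__ == "__main__":
    for name, lift in lifts(SCORES).items():
        print(f"{name}: {lift:+d}")
    med, gen = headline(SCORES)
    print(f"Headline: {med} vs {gen} ({med - gen:+d})")
```

The fact that equal weights land exactly on 88 and 71 suggests, but does not prove, that the harness averages categories without weighting.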

Competitor comparison

SLAtech Medical

88/100

BAA-eligible, FHIR-conformant, polished Hebrew RTL

Intercom Fin (generic)

73/100

Not BAA-eligible by default, English-first, no FHIR integration

Ada (mid-market enterprise)

78/100

SOC 2 Type II, but weaker FHIR integration; an implementation consultant is required (6-12 weeks)

Tidio Lyro (generic SMB)

65/100

No HIPAA compliance, no Hebrew RTL polish, conversation caps on lower tiers

Reproduce the eval against your own tenant

The eval methodology is open source: 200 sealed, medical-specific questions, scored by an LLM-as-Judge on three axes (factuality, hallucination, and confidence).
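The page does not publish the judge's rubric or how the three axes combine into a single number. As an illustration only, the Python sketch below assumes a hypothetical `Verdict` record in which the judge scores each axis in [0, 1] (1 is best; for hallucination, 1 means no hallucinated content), averages the axes with equal weights, and scales the mean over all questions to a 0-100 score comparable to the headline figures. The schema, names, and weighting are assumptions, not the harness's actual implementation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Verdict:
    """One judge verdict for one sealed question (hypothetical schema).

    Each axis is a judge-assigned score in [0, 1], where 1 is best.
    """
    factuality: float
    hallucination: float  # 1.0 = no hallucinated content
    confidence: float     # assumed: 1.0 = stated confidence fits correctness

    def __post_init__(self):
        # Reject scores outside [0, 1] so a malformed judge reply fails loudly.
        for axis in ("factuality", "hallucination", "confidence"):
            value = getattr(self, axis)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{axis} must be in [0, 1], got {value}")


def question_score(v: Verdict) -> float:
    """Equal-weight mean of the three axes (the weighting is an assumption)."""
    return mean((v.factuality, v.hallucination, v.confidence))


def eval_score(verdicts: list[Verdict]) -> float:
    """Mean per-question score across the set, scaled to 0-100."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return round(100 * mean(question_score(v) for v in verdicts), 1)


if __name__ == "__main__":
    # Two synthetic verdicts, for illustration only (not real eval data).
    sample = [Verdict(1.0, 1.0, 0.7), Verdict(0.5, 1.0, 0.9)]
    print(eval_score(sample))
```

Keeping the per-axis scores separate until the final step makes it easy to report each axis on its own, or to swap in different weights, if the published methodology specifies them.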