
Generative AI is poised to reshape healthcare, offering transformative potential from easing clinician burnout by automating documentation to empowering care coordinators with instant insights. However, realizing this potential requires a nuanced approach. While Large Language Models (LLMs) are powerful generalists, the path to responsible and effective AI in healthcare is paved with specialized, purpose-built models.
At Innovaccer, we've built our AI strategy around a suite of Small Language Models (SLMs): highly focused, fast, and efficient components, each fine-tuned for a specific, high-impact task. These models serve a dual purpose: they form the backbone of our AI security and safety framework, acting as sophisticated guardrails to ensure every interaction is secure and appropriate, while simultaneously powering critical clinical and operational applications where precision and speed are essential.
This post explores the applications and deployment of these SLMs across safety guardrails and operational workflows, demonstrating how purpose-built models make AI not just powerful, but viable and trustworthy in healthcare.
While massive LLMs excel at generating human-like text, they aren't the right tool for every job. For guardrails and operational tasks, speed, specificity, and efficiency are paramount. This is where SLMs fine-tuned on high-quality, domain-specific datasets shine.
At Innovaccer, we define Small Language Models (SLMs) as any language model with <7 billion parameters. Within this category, we differentiate between smaller SLMs (typically 100 million to <1 billion parameters) and larger SLMs (1 billion to <7 billion parameters), enabling us to make informed decisions about deployment, resource allocation, and use case selection. The key benefits are:
At Innovaccer, we follow a strategic hybrid approach that leverages the strengths of both SLMs and LLMs (where needed), ensuring optimal performance, cost efficiency, and reliability across our healthcare AI applications. This orchestration allows us to deploy the right tool for each task.
For narrow, well-defined tasks, we employ SLMs. Examples include:
For tasks requiring large context windows, complex reasoning, nuanced judgment, or generative capabilities, we leverage LLMs. Examples include:
This hybrid architecture ensures we achieve the optimal balance between speed, cost, and capability, deploying the most appropriate model for each use case.
We deploy SLMs across two primary domains: AI Safety Guardrails, which ensure secure and responsible AI interactions, and Operational Applications, where SLMs drive decision-making and efficiency. The sections below cover these two domains in detail.

For AI Safety Guardrails, we've developed a structured framework built on three core pillars and a horizontal bedrock. Each pillar addresses a different dimension of risk, working in concert to create a secure AI ecosystem.
This initial tier protects the system from external attacks and fundamentally unsafe content.
Malicious actors constantly try to trick LLMs into ignoring their safety rules. Our BERT-based SLM is trained on roughly 100K examples of these attacks, from simple overrides to complex obfuscations. It acts as a bouncer, identifying and blocking prompts designed to hijack the model.
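Conceptually, the guard sits in front of the LLM and gates each prompt on the classifier's injection score. A minimal sketch of that gating logic, where `stub_scorer` is an illustrative stand-in for the fine-tuned BERT classifier and the threshold is not our production value:

```python
def guard_prompt(prompt: str, score_injection, threshold: float = 0.5) -> dict:
    """Gate a prompt on its injection score before it reaches the LLM.

    `score_injection` is any callable returning P(injection) in [0, 1];
    in production this would be a fine-tuned BERT classifier.
    """
    score = score_injection(prompt)
    if score >= threshold:
        return {"allowed": False, "score": score,
                "message": "Request blocked by safety guardrail."}
    return {"allowed": True, "score": score}


# Illustrative keyword-based stub scorer (NOT the real model).
def stub_scorer(prompt: str) -> float:
    red_flags = ("ignore previous instructions", "disregard your rules")
    return 0.9 if any(f in prompt.lower() for f in red_flags) else 0.1
```

Because the guard only needs a scoring callable, the underlying classifier can be swapped or retrained without touching the gating logic.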
In a professional healthcare setting, the AI’s tone and scope must be strictly controlled. We have dedicated SLMs to block:
This is one of the most critical safety tiers for healthcare. These BERT-based guardrails ensure our AI supports healthcare professionals without ever crossing the line into practicing medicine.
The AI is designed to assist with workflows, provide education, and coordinate care but not to make clinical decisions. This SLM is meticulously trained to distinguish between safe, supportive information and direct clinical advice.
Generative AI can sometimes confidently state falsehoods. In healthcare, this can be dangerous. This SLM is trained to identify and block common medical myths and misinformation.
Once we know the AI is safe and clinically responsible, we must ensure its outputs are trustworthy and reliable.
This SLM acts as a real-time fact-checker. It compares the user's query against the AI's response, flagging when the response contradicts the input or invents information.
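To give a flavor of what "contradicts the input or invents information" means in practice, here is a deliberately crude groundedness heuristic: flag numeric claims in the response that never appear in the source. The production guardrail is a fine-tuned SLM, not this string check; the function below is only a sketch of the idea.

```python
import re

def ungrounded_numbers(source: str, response: str) -> list[str]:
    """Flag numbers in the response that do not appear in the source text.

    A crude groundedness heuristic for illustration only; a trained
    fact-checking SLM catches far subtler contradictions than this.
    """
    src_nums = set(re.findall(r"\d+(?:\.\d+)?", source))
    resp_nums = re.findall(r"\d+(?:\.\d+)?", response)
    return [n for n in resp_nums if n not in src_nums]
```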
Healthcare AI must serve all patient populations equitably. This SLM is trained to detect subtle and overt biases related to race, gender, age, socioeconomic status, and more.
In complex workflows, multiple AI agents may collaborate. This SLM ensures that information remains consistent throughout the entire process.
Underpinning our entire framework is a non-negotiable commitment to patient privacy.
This isn't just one SLM; it's a sophisticated, multi-stage system designed to detect and redact all 18 Health Insurance Portability and Accountability Act (HIPAA)-defined Protected Health Information (PHI) entity types, and it is expandable to detect additional Personally Identifiable Information (PII) entities. Our hybrid approach combines multiple BERT models with regex patterns to achieve comprehensive coverage.
The architecture consists of a general NER model, a BERT-based model that detects all 18 entity types; a set of specialist models that refine the initial results for tricky entities; and finally a contextual correction layer that applies pattern matching and, most importantly, medical context awareness. We are also building the ability not only to de-identify entities but also to re-identify them for specific users who are authorized to view PHI data.
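The pattern-matching stage can be sketched as follows. The patterns below cover only a few PHI entity types and are illustrative; the production system pairs a much larger pattern library with the BERT NER models described above.

```python
import re

# Illustrative patterns for a few HIPAA PHI entity types; the production
# pipeline combines BERT NER stages with a far larger pattern library.
PHI_PATTERNS = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def redact(text: str) -> str:
    """Regex stage of the pipeline: replace each match with its entity tag."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

In the full pipeline this stage runs alongside the NER models, and the contextual correction layer arbitrates between their outputs.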
Our specialized AI Safety Guardrails deliver exceptional performance, with inference latencies under 300 ms on a Mac M4 and ~20 ms on GPUs, while maintaining >95% accuracy across multiple diverse datasets. This represents a significant advantage over major LLMs, which typically require several seconds for safety processing and offer limited control over accuracy due to the complexity and cost of fine-tuning large models.
Beyond safety guardrails, SLMs power critical operational applications across our healthcare platform, enabling real-time decision-making, data processing, and workflow automation. These applications deliver measurable business value while maintaining the speed and efficiency that make SLMs ideal for production environments. We organize our operational SLM applications into three core pillars: Care Quality & Revenue Integrity, Conversational Intelligence, and Summarization & Synthesis.
Workflow orchestration SLMs enable automated processing and routing of clinical data, ensuring information flows efficiently through healthcare systems. These models extract, validate, and structure data to power downstream clinical workflows. A couple of scenarios where SLMs are used are shared below.
Named Entity Recognition forms the foundational layer of our clinical NLP pipelines, extracting medically relevant entities from unstructured clinical narratives. We have developed an ensemble of specialized BERT-based SLMs and decoder-only SLMs, trained on clinical text to identify diagnoses, medications, laboratory results, vital signs, procedures, allergies, and more, along with their clinical attributes such as dosage, units, frequency, and value.
Our clinical NER SLMs, fine-tuned on healthcare-specific datasets, excel at recognizing medical terminology, abbreviations, and clinical context. These models extract structured information from various clinical document types, including progress notes, discharge summaries, radiology reports, and pathology findings.
The extracted entities enable downstream applications such as clinical decision support systems, population health analysis, risk stratification, and automated clinical documentation. By transforming unstructured clinical text into structured, machine-readable data, NER SLMs unlock the value embedded in the vast corpus of clinical narratives that comprise approximately 80% of healthcare data.
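As a sketch of what "structured, machine-readable data" means here, the snippet below groups hypothetical extracted spans into a record keyed by entity label. The entity shape and values are illustrative; real spans come from the BERT/decoder SLM ensemble with offsets, confidences, and attribute links.

```python
# Hypothetical shape of NER output (real spans carry offsets and scores).
entities = [
    {"label": "MEDICATION", "text": "metformin"},
    {"label": "DOSAGE",     "text": "500 mg"},
    {"label": "FREQUENCY",  "text": "twice daily"},
    {"label": "DIAGNOSIS",  "text": "type 2 diabetes"},
]

def to_record(entities: list[dict]) -> dict:
    """Group extracted spans by label into a machine-readable record."""
    record: dict = {}
    for ent in entities:
        record.setdefault(ent["label"].lower(), []).append(ent["text"])
    return record
```

Downstream systems (decision support, risk stratification) consume records like this rather than raw narrative text.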
Autonomous Clinical Coding SLMs extend beyond entity extraction to context-aware clinical and financial reasoning, enabling accurate, explainable code assignment at scale. These SLMs operate on structured outputs produced by upstream clinical NER and contextual reasoning models, ensuring determinism, auditability, and compliance.
We deploy specialized SLMs trained on coding guidelines, payer rules, and real-world clinical documentation to infer ICD-10, CPT, and HCC codes, along with supporting rationale and evidence trails.
These models:
Rather than acting as a black-box “AI coder,” these SLMs function as coding co-processors, augmenting human coders and clinicians with precise, explainable recommendations.
“A patient with long-standing Type 2 diabetes presents for follow-up. HbA1c remains elevated at 8.4. Diabetic nephropathy noted. On insulin therapy.”
The autonomous coding SLM pipeline produces:
A single SLM, or a combination of SLMs, is fine-tuned to help:
This output is routed for coder review or direct downstream submission, depending on confidence thresholds.
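That routing step can be sketched as a simple confidence gate. The codes below correspond to the diabetes follow-up note above, but the confidence values and threshold are illustrative, not our production settings:

```python
def route_codes(coded: list[dict], auto_threshold: float = 0.95) -> dict:
    """Route coded results: high-confidence codes go to direct submission,
    the rest to human coder review. Threshold is illustrative."""
    routed = {"submit": [], "review": []}
    for item in coded:
        bucket = "submit" if item["confidence"] >= auto_threshold else "review"
        routed[bucket].append(item["code"])
    return routed

# Hypothetical pipeline output for the diabetes follow-up note.
coded = [
    {"code": "E11.21", "confidence": 0.97},  # T2DM with diabetic nephropathy
    {"code": "Z79.4",  "confidence": 0.88},  # long-term use of insulin
]
```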
RCM-focused SLMs ensure that clinical truth translates into clean, compliant revenue, bridging the gap between documentation, coding, and claims processing. These models continuously operate across clinical records to identify gaps, inconsistencies, and compliance risks before they result in denials or revenue leakage.
Rather than reacting to denials post-submission, these SLMs proactively enforce policy-to-decision graphs, ensuring that required documentation, codes, and modifiers are present and aligned.
The workflow powered by a fine-tuned SLM:
This intervention occurs before claim submission, reducing downstream denials.
Step 1: Policy Chunking & Normalization
Fine-tuned policy SLMs segment large payer guidelines into semantically coherent chunks (eligibility criteria, medical necessity, documentation requirements, exclusions). Each chunk is normalized into structured clinical and operational concepts (ICD-10, CPT, age ranges, lab thresholds), while preserving traceability to the original policy text.
Step 2: Decision Graph Construction
Normalized policy components are compiled into explicit decision graphs, where nodes represent clinical or administrative conditions and edges encode logical dependencies (AND / OR / NOT). These graphs capture payer-specific rules, exceptions, and short-circuit logic in a deterministic, auditable form.
Step 3: Real-Time PA Readiness Evaluation
At submission time, the decision graph is executed against structured patient data. The SLM-driven evaluation validates diagnosis, procedure alignment, modifier requirements, supporting documentation and payer-specific coverage criteria, producing a structured readiness result with evidence-level explanations.
Step 4: Explainable Outcomes & Routing
The output includes pass/fail status per policy condition, missing requirements, and recommended next actions. Based on the result, cases are automatically routed for submission, documentation remediation, or UM review.
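Steps 2 and 3 above can be sketched as a small recursive evaluator over a decision graph whose leaves are predicates on structured patient data and whose internal nodes combine them with AND/OR/NOT. The node shapes and the sample policy below are illustrative, not an actual payer rule:

```python
# Minimal decision-graph sketch: leaves check one condition against
# patient data; internal nodes encode AND / OR / NOT dependencies.
def evaluate(node, patient: dict) -> bool:
    op = node.get("op")
    if op == "AND":
        return all(evaluate(c, patient) for c in node["children"])
    if op == "OR":
        return any(evaluate(c, patient) for c in node["children"])
    if op == "NOT":
        return not evaluate(node["child"], patient)
    # Leaf: a single clinical or administrative condition.
    return node["check"](patient)

# Hypothetical compiled policy: qualifying diagnosis present AND adult.
policy = {
    "op": "AND",
    "children": [
        {"check": lambda p: "E11.21" in p["diagnoses"]},
        {"op": "NOT", "child": {"check": lambda p: p["age"] < 18}},
    ],
}
```

Because the graph is explicit data rather than model weights, every pass/fail decision can be traced back to the policy text it was normalized from, which is what makes the outcomes in Step 4 explainable.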
Additional applications include:
SLMs built for conversational agents enable natural language understanding and interaction in patient engagement and care coordination systems, facilitating real-time communication and query handling.
In patient engagement and care coordination agents, understanding user intent is crucial for routing queries appropriately and providing timely responses. We employ specialized SLMs for intent detection, enabling real-time classification of patient queries and care coordinator requests.
For straightforward, domain-specific intent classification, we leverage smaller BERT-based SLMs that excel at recognizing common healthcare intents such as appointment scheduling, logistics support, medication inquiries, or care coordination requests. These models, trained on healthcare-specific datasets, classify intents in real time, enabling immediate routing to appropriate workflow handlers.
For more complex intent scenarios involving novel queries, ambiguous language or longer input contexts, we utilize larger SLMs that provide enhanced zero-shot classification capabilities while maintaining the speed and cost advantages of self-hosted deployment.
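The two tiers compose into a simple escalation pattern: try the fast classifier first, and fall back to the larger model when its confidence is low. The stubs and confidence floor below are illustrative stand-ins for the fine-tuned BERT tier and the larger zero-shot SLM:

```python
def route_intent(query: str, fast_classify, fallback_classify,
                 confidence_floor: float = 0.8):
    """Two-tier intent routing: escalate to the larger SLM only when the
    small classifier's confidence falls below the floor (illustrative)."""
    intent, confidence = fast_classify(query)
    if confidence >= confidence_floor:
        return intent, "fast"
    return fallback_classify(query), "fallback"

# Illustrative stubs (NOT the real models).
def fast_stub(query):
    if "appointment" in query.lower():
        return "appointment_scheduling", 0.95
    return "unknown", 0.3

def fallback_stub(query):
    return "care_coordination"
```

Most traffic resolves in the fast tier, so the heavier model's cost and latency are paid only for the hard cases.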
Summarization SLMs transform lengthy clinical documents and patient histories into concise, actionable summaries, enabling clinicians to quickly access critical information without wading through extensive documentation.
Generating comprehensive patient summaries from complete patient history using a single LLM is both expensive and time-consuming. The large context windows required to process entire patient histories drive up costs, while the sequential processing of all information creates significant latency that impacts clinical workflows.
To address this challenge, we've developed a multi-SLM architecture that decomposes the summarization task into specialized components. Instead of using a single LLM to process the entire patient history, we employ multiple specialized SLMs, each fine-tuned for specific sections of the clinical record.
Each specialized SLM processes its designated section in parallel, dramatically reducing both latency and cost. For example, our Clinical Considerations SLM focuses on diagnoses, symptoms, and treatment plans; our Lab Reports SLM handles laboratory results, trends, and abnormalities; our Vitals SLM processes vital signs and measurements; and our Medications SLM summarizes medication history and changes. The outputs from these specialized SLMs are then aggregated into a comprehensive patient summary.
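The fan-out/aggregate pattern can be sketched with a thread pool, where each stub callable stands in for one specialized section SLM:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub section summarizers; each stands in for a specialized SLM
# (the real models are fine-tuned per clinical section).
SECTION_SLMS = {
    "clinical":    lambda text: f"Clinical: {text[:60]}",
    "labs":        lambda text: f"Labs: {text[:60]}",
    "medications": lambda text: f"Medications: {text[:60]}",
}

def summarize_patient(sections: dict) -> str:
    """Fan each section out to its specialized summarizer in parallel,
    then aggregate the partial summaries into one document."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(SECTION_SLMS[name], text)
                   for name, text in sections.items()}
        parts = [f.result() for f in futures.values()]
    return "\n".join(parts)
```

Because each model sees only its own section, per-call context stays small, and end-to-end latency is bounded by the slowest section rather than the sum of all of them.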
Additional summarization & synthesis applications include:
The development process for these specialized SLMs follows a disciplined, iterative cycle:
Healthcare demands AI systems that are both powerful and principled. Our comprehensive approach using Small Language Models demonstrates that innovation and responsibility aren't competing priorities; rather, they're complementary forces that, when properly orchestrated, unlock AI's true potential in healthcare.
Through our dual-domain strategy, we've shown how purpose-built SLMs can simultaneously safeguard AI interactions and power critical operational workflows. The same qualities that make these models effective guardrails (speed, precision, and specialization) also make them invaluable for clinical and business applications. Whether detecting bias in patient communications, extracting clinical entities from physician notes, or routing care coordination requests, these lightweight models deliver enterprise-grade performance without the complexity and cost of large language model APIs.
The result is an AI ecosystem where safety enables innovation rather than constraining it. Our framework doesn't limit what AI can accomplish in healthcare; it creates the secure, reliable, and efficient foundation necessary for broader adoption. By strategically deploying specialized models across both protective and operational functions, we're building AI systems that healthcare professionals can confidently integrate into their workflows and patients can trust with their most sensitive information.
This approach represents the future of healthcare AI: not monolithic solutions that attempt to do everything, but orchestrated ecosystems of purpose-built intelligence that excel at specific, high-impact tasks. Through this architecture, we're not just advancing AI capabilities; we're advancing the healthcare industry's ability to harness AI responsibly and effectively.