[ 01 ]
RESEARCH
SichGate publishes original adversarial evaluation research on small language models in regulated deployments.
Black-box and white-box adversarial evaluation for quantized, fine-tuned, edge-deployed language models.
White Paper · March 2026
Safety as a Secondary Objective
Systematic Adversarial Evaluation of Small Language Models in High-Stakes Deployments
Small language models are being deployed in healthcare, finance, and legal workflows without SLM-specific adversarial benchmarks. We ran a systematic black-box evaluation of six open-weight SLMs — Qwen2-1.5B, Llama-3.2-3B, Phi-3-mini, MedGemma-4B, Mistral-7B, and Llama-3.1-8B — across 21 attack categories and 924 adversarial interactions.
Key findings
Context-window safety degrades in a pattern correlated with attention architecture, not model size.
Medical fine-tuning redistributed the attack surface rather than reducing it — improving safe messaging while amplifying demographic bias in clinical contexts.
Five of six models failed all crescendo multi-turn escalation probes at critical severity. Overall failure rates ranged from 42% to 66%.
COLLABORATE
Interested in independent application of the evaluation framework, or validating your edge model deployment?
GET IN TOUCH →