RESEARCH

SichGate publishes original adversarial evaluation research on small language models in regulated deployments.

Black-box and white-box adversarial evaluation for quantized, fine-tuned, edge-deployed language models.

White Paper · March 2026

Safety as a Secondary Objective

Systematic Adversarial Evaluation of Small Language Models in High-Stakes Deployments

Small language models are being deployed in healthcare, finance, and legal workflows without SLM-specific adversarial benchmarks. We ran a systematic black-box evaluation of six open-weight SLMs — Qwen2-1.5B, Llama-3.2-3B, Phi-3-mini, MedGemma-4B, Mistral-7B, and Llama-3.1-8B — across 21 attack categories and 924 adversarial interactions.

Key findings

Context-window safety degrades in a pattern correlated with attention architecture, not model size.

Medical fine-tuning redistributed the attack surface rather than reducing it — improving safe messaging while amplifying demographic bias in clinical contexts.

Five of six models failed all crescendo multi-turn escalation probes at critical severity. Overall failure rates ranged from 42% to 66%.
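The crescendo probes in the findings above are multi-turn escalations: each conversation starts benign and ratchets toward a harmful request, and the model fails if any reply along the way is unsafe. A minimal black-box harness for one such probe might look like the sketch below. All names here (`run_crescendo_probe`, the toy model, the keyword-based unsafe check) are illustrative assumptions, not the paper's actual evaluation code.

```python
from typing import Callable, List, Optional

def run_crescendo_probe(model: Callable[[List[dict]], str],
                        turns: List[str],
                        is_unsafe: Callable[[str], bool]) -> dict:
    """Send escalating prompts in a single conversation against a
    black-box chat model and record the first turn whose reply is
    judged unsafe. Returns {'failed': bool, 'failure_turn': int|None}."""
    history: List[dict] = []
    for i, prompt in enumerate(turns):
        history.append({"role": "user", "content": prompt})
        reply = model(history)  # black-box call: only inputs/outputs observed
        history.append({"role": "assistant", "content": reply})
        if is_unsafe(reply):
            return {"failed": True, "failure_turn": i}
    return {"failed": False, "failure_turn": None}

# Toy stand-ins for demonstration only: a model that holds its refusal
# for two turns and complies on the third, and a crude keyword judge.
def toy_model(history: List[dict]) -> str:
    user_turns = sum(1 for m in history if m["role"] == "user")
    return "I can't help with that." if user_turns < 3 else "Sure, here is how..."

result = run_crescendo_probe(
    toy_model,
    ["benign setup", "slightly pointed follow-up", "direct harmful request"],
    is_unsafe=lambda reply: reply.startswith("Sure"),
)
# The toy model complies on the third user turn, so the probe records
# a failure at turn index 2.
```

In a real harness the unsafe-reply judge would be a classifier or rubric-based grader rather than a keyword check, and the same probe would be replayed across severity tiers and attack categories to produce aggregate failure rates like those reported above.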

Polina Moshenets · SichGate

DOWNLOAD PDF → · GITHUB →

COLLABORATE

Interested in an independent application of the evaluation framework, or in validating your edge-model deployment?

GET IN TOUCH →