Your users shouldn't be
your first red team.
SichGate continuously tests small language models for adversarial regressions at every stage of the deployment pipeline.
Fine-tune, quantize, deploy. Then know whether it still behaves the way you expect.
[ 01 ]
THE PROBLEM
Standard evaluations test capability. SichGate tests behavior under pressure.
Benchmarks tell you whether a model can answer a question. They do not tell you what happens when the model is adapted, compressed, or placed into a real workflow where users escalate across turns, wrap instructions inside structured inputs, or push the model into edge cases.
That matters because the risk surface changes after training and deployment. Fine-tuning can shift safety behavior, and quantization can change how safety-critical activations survive compression.
In a systematic evaluation of six open-weight SLMs, five of six models failed all multi-turn escalation probes at critical severity. Failure rates ranged from 42% to 66% across attack categories.
[ 02 ]
HOW IT WORKS
Adversarial testing across the model lifecycle.
SichGate runs automated integrity checks at the stages where behavior changes:
Base model testing. We start with the base model to establish a behavioral baseline. This identifies vulnerabilities already present before adaptation and gives you a reference point for later comparisons.
Fine-tune delta analysis. We compare the base model against the fine-tuned version attack by attack. This shows which behaviors improved, which regressed, and which emerged after domain training.
Quantization integrity testing. We test the model at each compression level to catch safety drift before deployment. The same model can behave differently at FP16, INT8, or 4-bit precision.
Each run is designed to answer the same question: what changed, where did it change, and is the model still safe to ship?
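The fine-tune delta analysis step can be illustrated with a minimal sketch. The function name, result format, and attack names below are illustrative examples, not the SichGate API:

```python
# Minimal sketch of attack-by-attack delta analysis between a base model
# and its fine-tuned version. All names and results here are made-up
# example data, not SichGate output.

def delta_report(base: dict, tuned: dict) -> dict:
    """Classify each attack as improved, regressed, unchanged, or new.

    True means the model resisted the attack; False means it failed.
    """
    report = {"improved": [], "regressed": [], "unchanged": [], "new": []}
    for attack, tuned_passed in tuned.items():
        if attack not in base:
            # Attack only became relevant after domain training.
            report["new"].append(attack)
        elif tuned_passed and not base[attack]:
            report["improved"].append(attack)
        elif base[attack] and not tuned_passed:
            report["regressed"].append(attack)
        else:
            report["unchanged"].append(attack)
    return report

base_results = {"multi_turn_escalation": False, "prompt_injection": True}
tuned_results = {"multi_turn_escalation": True, "prompt_injection": False,
                 "domain_jailbreak": False}

report = delta_report(base_results, tuned_results)
print(report["improved"])   # attacks the fine-tune fixed
print(report["regressed"])  # attacks the fine-tune broke
```

The same comparison applied between precision levels gives the quantization integrity check.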
[ 02B ]
WHERE IT RUNS
SichGate integrates into your pipeline so tests run automatically when the model changes. That turns red teaming from a one-time event into a release control. It runs against your model wherever it lives — a local file, a HuggingFace repo, or a deployed endpoint.
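As a release control, the pipeline hook reduces to a pass/fail decision the CI job can act on. A minimal sketch of such a gate, assuming a simple result format and threshold that are not SichGate's actual report schema:

```python
# Sketch of a CI release gate over adversarial test results. The result
# fields ("attack", "severity", "failed") and the zero-tolerance default
# are illustrative assumptions, not the SichGate report schema.

def release_gate(results: list[dict], max_critical: int = 0) -> int:
    """Return a process exit code: 0 to ship, 1 to block the release."""
    critical = [r for r in results
                if r["severity"] == "critical" and r["failed"]]
    for r in critical:
        print(f"BLOCKED: {r['attack']} failed at critical severity")
    return 0 if len(critical) <= max_critical else 1

example_results = [
    {"attack": "multi_turn_escalation", "severity": "critical", "failed": True},
    {"attack": "prompt_injection", "severity": "medium", "failed": True},
]

exit_code = release_gate(example_results)
print("exit code:", exit_code)
```

A nonzero exit code fails the CI job, so a critical regression blocks the release automatically instead of surfacing in production.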

Local model files / Deployed endpoints / CI/CD

[ 03 ]
OUTPUT
Each test returns three things.
Prompt Sequence //
The exact prompt sequence that triggered the failure.
Severity Score //
A severity score rating how serious the failure is and how reliably it reproduces.
Mitigation Hint //
A mitigation hint showing which stage introduced the vulnerability.
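The three-part result could be represented as a simple record. A sketch with assumed field names, not SichGate's actual output schema:

```python
from dataclasses import dataclass

# Illustrative shape of one adversarial finding: what triggered it,
# how bad it is, and where it came from. Field names are assumptions.
@dataclass
class TestResult:
    prompt_sequence: list[str]  # exact prompts that triggered the failure
    severity_score: float       # higher = more serious / more reproducible
    mitigation_hint: str        # which pipeline stage introduced the issue

finding = TestResult(
    prompt_sequence=["benign opener", "escalating follow-up"],
    severity_score=0.9,
    mitigation_hint="regression introduced at INT8 quantization",
)
print(finding.mitigation_hint)
```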
A single run completes in under an hour. A full evaluation across multiple quantization levels and temperatures completes within 24 hours.
[ 04 ]
THE MARKET
You fine-tuned the model. You quantized it. Now who's responsible for whether it still behaves?
SichGate is the release gate for SLM behavior changes across training, compression, and deployment.
[ 05 ]
EARLY ACCESS
We are working directly with teams deploying small language models.
If you want to:
— Check whether a fine-tune changed model behavior.
— Validate a quantized build before release.
— Add adversarial checks to CI/CD.
— Build evidence for internal risk review.