    Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds the Accuracy of Models 23–90x Its Size

    By Naveed Ahmad · 14/05/2026 · 9 Mins Read


    As LLM-powered applications move into production, and as AI agents take on more consequential tasks like browsing the web, writing and executing code, and interacting with external services, safety moderation has quietly become one of the most operationally expensive parts of the stack.

    Most developers who have deployed a production LLM system know the problem: you need to evaluate every user prompt before it reaches the model, and every model response before it reaches the user. That means your guardrail model runs on every single request, at every turn of a conversation. The guardrail latency compounds. The cost compounds. And the current generation of open-source guardrail models, LlamaGuard4 (12B), WildGuard (7B), ShieldGemma (27B), and NemoGuard (8B), are all decoder-only models with billions of parameters, built for flexibility but not for speed.

    Fastino Labs has released GLiGuard, a 300 million parameter open-source safety moderation model designed to address this specific problem. GLiGuard evaluates multiple safety dimensions in a single pass, and across nine safety benchmarks its accuracy matches or exceeds models that are 23 to 90 times its size while running up to 16 times faster.

    https://pioneer.ai/weblog/gliguard-16x-faster-safety-moderation-with-a-small-language-model

    To understand what makes GLiGuard different, it helps to know why existing guardrail models are slow. Most leading guardrail models are built on decoder-only transformer architectures: they generate their safety verdicts autoregressively, one token at a time, the same way a large language model generates a response to a chat message.

    This design made sense when safety requirements were fluid. Decoder models can interpret natural-language task descriptions and adapt to new safety policies without retraining. But autoregressive generation is inherently sequential, which makes it slow and computationally expensive.

    There is a compounding problem on top of that. Most guardrail models need to assess inputs across multiple safety dimensions: what kind of harm is present, whether the user prompt is attempting to bypass safety training, whether the model's response is itself unsafe, and so on. Because decoder models generate output sequentially, these assessments are typically produced one after another, and latency compounds as more criteria are evaluated.

    In other words, the architecture that makes decoder models flexible is also the architecture that makes them the wrong tool for what is essentially a classification problem.
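To make the trade-off concrete, here is a toy latency model. All numbers in it are illustrative assumptions for the sake of the comparison, not measurements from the paper: an autoregressive guard pays per generated token per task, while a single-pass classifier pays one fixed forward-pass cost.

```python
# Toy latency model for guardrail architectures. All numbers are illustrative
# assumptions for the sake of the comparison, not measurements from the paper.

def decoder_guard_latency_ms(num_tasks, tokens_per_verdict=8, ms_per_token=12):
    # Autoregressive guard: each verdict is generated token by token, and the
    # per-task verdicts come out one after another, so cost scales with both.
    return num_tasks * tokens_per_verdict * ms_per_token

def encoder_guard_latency_ms(num_tasks, forward_pass_ms=26):
    # Single-pass classifier: every task's labels are scored in one forward
    # pass, so adding tasks does not add latency.
    return forward_pass_ms

for n in (1, 2, 4):
    print(n, decoder_guard_latency_ms(n), encoder_guard_latency_ms(n))
# 1 task: 96 ms vs 26 ms; 4 tasks: 384 ms vs 26 ms
```

Under these made-up constants, the decoder guard's cost quadruples going from one task to four, while the encoder guard's stays flat.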

    What GLiGuard Actually Does

    GLiGuard is a small encoder-based model that reframes safety moderation as a text classification problem rather than a text generation problem. Encoder models process the entire input at once and output a classification over a fixed set of labels, while decoder models generate their output one token at a time, left to right.

    The key architectural insight is in how GLiGuard handles multiple tasks simultaneously. Instead of generating tokens, GLiGuard encodes both the input text and the task definitions (labels) together. These are fed to the model, which scores every label simultaneously in a single forward pass and returns the highest-scoring label for each task. Because all tasks and their candidate labels are part of the input itself, evaluating additional safety dimensions does not add latency; it simply means including more labels in the input.
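A minimal sketch of this idea, with random vectors standing in for the encoder's pooled representations (this is not the real GLiGuard tokenizer, weights, or API, just the scoring pattern the paragraph describes):

```python
import random

# Toy sketch of single-pass multi-task label scoring. Random vectors stand in
# for the encoder's pooled representations; this is not the real GLiGuard
# tokenizer, weights, or API.
random.seed(0)
D = 16

def rand_vec():
    return [random.gauss(0, 1) for _ in range(D)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

text_vec = rand_vec()  # pooled encoding of the input text

# Every task contributes its candidate labels to the same pass.
tasks = {
    "safety":  ["safe", "unsafe"],
    "refusal": ["compliance", "refusal"],
}
label_vecs = {t: [rand_vec() for _ in ls] for t, ls in tasks.items()}

# Single "forward pass": score all labels of all tasks against the same text
# encoding, then take the argmax per task. More tasks mean more score rows,
# not more passes.
verdicts = {}
for task, labels in tasks.items():
    scores = [dot(v, text_vec) for v in label_vecs[task]]
    verdicts[task] = labels[scores.index(max(scores))]

print(verdicts)
```

The point of the sketch is structural: adding a third or fourth task only adds entries to `tasks`, not extra model invocations.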


    GLiGuard runs four moderation tasks simultaneously in a single forward pass:

    1. Safety classification (safe / unsafe), applied to both user prompts before generation and model responses after generation.
    2. Jailbreak strategy detection across 11 strategies, including prompt injection, roleplay bypass, instruction override, and social engineering. If any jailbreak strategy is detected, the prompt is automatically flagged as unsafe.
    3. Harm category detection across 14 categories: violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright violation, and others. A single input can trigger multiple categories at once.
    4. Refusal detection (compliance / refusal), tracked separately to help measure over-refusal (when a model refuses safe requests) and detect false compliance (when a model appears to comply but does not). If a refusal is detected, the response is automatically marked as safe.
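The two auto-override rules in the list above (a detected jailbreak strategy forces the prompt to unsafe, a detected refusal forces the response to safe) amount to plain combination logic on top of the per-task labels. A sketch:

```python
def final_prompt_verdict(safety_label, jailbreak_strategies):
    # Rule from task 2: any detected jailbreak strategy automatically
    # flags the prompt as unsafe, regardless of the safety classifier.
    return "unsafe" if jailbreak_strategies else safety_label

def final_response_verdict(safety_label, refusal_label):
    # Rule from task 4: a detected refusal automatically marks the
    # response as safe (the model declined, so nothing harmful shipped).
    return "safe" if refusal_label == "refusal" else safety_label

print(final_prompt_verdict("safe", ["roleplay bypass"]))  # unsafe
print(final_response_verdict("unsafe", "refusal"))        # safe
```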

    Training Data and Fine-Tuning

    GLiGuard was trained on a mix of human-annotated and synthetically generated training data. For prompt safety, response safety, and refusal detection, the team used WildGuardTrain, a dataset of 87,000 human-annotated examples. For harm category and jailbreak strategy detection, labels for the unsafe samples were generated using GPT-4.1.

    During early training, the model struggled to distinguish between similar harm categories such as toxic speech and violence, so the team used Pioneer to generate supplemental synthetic data with edge cases targeting these fine-grained distinctions.

    On the architecture side, GLiGuard was trained via full fine-tuning of the GLiNER2-base-v1 checkpoint for 20 epochs using the AdamW optimizer. GLiNER2 is Fastino's own architecture for multi-task text classification, a natural starting point for a model designed to score multiple label sets in a single pass.
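AdamW itself is standard: Adam's moment estimates plus decoupled weight decay applied directly to the weights. A minimal single-parameter sketch (the hyperparameters here are common illustrative defaults, not the ones Fastino used):

```python
import math

# Minimal AdamW sketch on a single scalar parameter. Hyperparameters are
# common illustrative defaults, not the ones used to train GLiGuard.
def adamw_step(w, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to w directly, outside the Adam ratio.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Sanity check: minimise f(w) = w**2 (gradient 2w) starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 301):
    w, m, v = adamw_step(w, 2 * w, m, v, t)
print(abs(w))  # ends up small: the optimiser has settled near the minimum
```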


    Benchmark Results: Accuracy and Speed

    The research team evaluated GLiGuard across nine established safety benchmarks. These benchmarks cover both prompt and response classification, testing whether a model can identify harmful content, withstand adversarial attacks, distinguish between different types of harm, and avoid over-flagging safe content. Results use macro-averaged F1, a standard metric that balances precision and recall.

    On accuracy:

    • GLiGuard scores 87.7 average F1 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4).
    • It achieves the second-highest average F1 on response classification (82.7), behind only Qwen3Guard-8B (84.1).
    • It outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23–90× smaller.
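Macro-averaged F1, the metric behind these numbers, is simply the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. A small self-contained sketch:

```python
def macro_f1(y_true, y_pred, labels):
    # Macro-averaged F1: compute F1 per class, then take the unweighted
    # mean, so each class contributes equally regardless of frequency.
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["safe", "safe", "unsafe", "unsafe"]
y_pred = ["safe", "unsafe", "unsafe", "unsafe"]
print(macro_f1(y_true, y_pred, ["safe", "unsafe"]))  # 0.7333...
```

This matches `sklearn.metrics.f1_score(..., average="macro")` on the same inputs.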

    On throughput and latency, benchmarked on a single NVIDIA A100 GPU:

    • GLiGuard achieves up to 16.2× higher throughput (133 vs. 8.2 samples/s at batch size 4).
    • GLiGuard achieves up to 16.6× lower latency: 26 ms vs. 426 ms at sequence length 64.

    These are not marginal improvements. At 26 ms per request versus 426 ms, the difference is meaningful in any real-time user-facing application, and the compounding effect across a multi-turn conversation makes the gap even larger in practice.
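For readers who want to sanity-check such claims on their own hardware, a crude wall-clock harness in the spirit of these measurements might look like the following. `measure` and the stand-in model are hypothetical names introduced here; a real GPU benchmark would also synchronize the device (e.g. `torch.cuda.synchronize`) before reading the clock.

```python
import time

def measure(model_fn, batch, warmup=3, iters=50):
    # Crude per-batch latency / throughput harness (hypothetical helper,
    # not from the GLiGuard release). Warm-up runs amortise caches / JIT.
    for _ in range(warmup):
        model_fn(batch)
    start = time.perf_counter()
    for _ in range(iters):
        model_fn(batch)
    elapsed = time.perf_counter() - start
    latency_ms = 1000.0 * elapsed / iters       # per-batch latency
    throughput = len(batch) * iters / elapsed   # samples per second
    return latency_ms, throughput

# Stand-in "model": any callable over a batch of inputs works here.
lat, thr = measure(lambda b: [len(s) for s in b], ["example prompt"] * 4)
print(f"{lat:.4f} ms/batch, {thr:.0f} samples/s")
```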

    Marktechpost's Visual Explainer

    01 — Overview

    What’s GLiGuard?

    GLiGuard is an open-source 300M parameter safety moderation model released by Fastino Labs on May 12, 2026. It is designed to act as a guardrail layer between users and LLMs, screening every user prompt before it reaches the model and every model response before it reaches the user.

    300M

    Parameters — runs on a single GPU

    16x

    Faster throughput vs. SOTA decoder guardrails

    4

    Safety tasks evaluated in a single forward pass

    Apache 2.0
    Hugging Face
    Pioneer Inference
    Encoder Architecture

    02 — The Problem

    Why Existing Guardrails Are Slow

    Most production guardrail models (LlamaGuard4, WildGuard, ShieldGemma, NemoGuard) are built on decoder-only transformer architectures. They generate safety verdicts autoregressively, one token at a time, the same way a large language model generates a chat response.

    Decoder Guard Models

    Generate verdicts token by token

    Sequential output — latency compounds per task

    7B–27B parameters required

    Expensive to run at real-time scale

    Separate passes per safety dimension

    GLiGuard (Encoder)

    Processes the entire input at once

    All tasks evaluated in one forward pass

    300M parameters

    Single GPU deployment

    More dimensions = no added latency

    03 — Architecture

    Single Pass. Multiple Tasks.

    GLiGuard reframes safety moderation as a text classification problem, not a text generation problem. It encodes the input text and all task definitions (labels) together, then scores every label simultaneously in a single forward pass. Adding more safety dimensions does not increase latency; it simply means more labels in the input.

    Base model: fine-tuned from the GLiNER2-base-v1 checkpoint via full fine-tuning for 20 epochs with the AdamW optimizer. Training data: 87,000 human-annotated examples from WildGuardTrain, plus synthetic edge-case data generated via GPT-4.1 and Pioneer for fine-grained harm category distinctions.

    04 — Capabilities

    Four Moderation Tasks in One Pass

    01

    Safety Classification — safe / unsafe

    Applied to both user prompts before generation and model responses after generation.

    02

    Jailbreak Strategy Detection — 11 strategies

    Detects prompt injection, roleplay bypass, instruction override, social engineering, and others. Any detected strategy auto-flags the prompt as unsafe.

    03

    Harm Category Detection — 14 categories

    Violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright violation, and others. A single input can trigger multiple categories.

    04

    Refusal Detection — compliance / refusal

    Tracks over-refusal (refusing safe requests) and false compliance. A detected refusal auto-marks the response as safe.

    05 — Benchmarks

    Accuracy vs. Much Larger Models

    Evaluated across nine safety benchmarks using macro-averaged F1. Speed benchmarked on a single NVIDIA A100 GPU.

    87.7

    Prompt Classification — Avg. F1

    26ms

    Latency at seq. length 64 (vs. 426 ms for ShieldGemma-27B)

    133

    Samples/sec throughput at batch size 4

    06 — Get Started

    Deploy GLiGuard Today

    At 300M parameters, GLiGuard runs on a single GPU and can be fine-tuned for domain-specific use cases without heavy infrastructure. Weights are available on Hugging Face under the Apache 2.0 license. Managed inference is available on Pioneer.

    Model ID

    fastino/gliguard-LLMGuardrails-300M

    Prompt Safety
    Response Safety
    Jailbreak Detection
    Harm Classification
    Refusal Detection
    Single GPU

    Key Takeaways

    • GLiGuard is a 300M parameter encoder-based safety moderation model that handles four tasks (safety classification, jailbreak detection, harm categorization, and refusal detection) in a single forward pass.
    • Unlike decoder-only guardrail models that generate verdicts autoregressively, GLiGuard reframes safety moderation as a text classification problem, eliminating the sequential latency bottleneck.
    • Benchmarked on a single NVIDIA A100 GPU, GLiGuard achieves up to 16.2× higher throughput and 16.6× lower latency (26 ms vs. 426 ms) compared to current SOTA models like ShieldGemma-27B.
    • Across nine safety benchmarks, GLiGuard scores 87.7 average F1 on prompt classification and 82.7 on response classification, outperforming LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23–90× smaller.
    • Model weights are available under Apache 2.0 on Hugging Face (fastino/gliguard-LLMGuardrails-300M), making it deployable on a single GPU without heavy infrastructure.

    Check out the Paper, Model Weights on HF, GitHub Repo and Technical details.


    Naveed Ahmad

    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy. Read his latest articles.
