    Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds the Accuracy of Models 23–90x Its Size

    By Naveed Ahmad · 14/05/2026 · 9 Mins Read


    As LLM-powered applications move into production, and as AI agents take on more consequential tasks like browsing the web, writing and executing code, and interacting with external services, safety moderation has quietly become one of the most operationally expensive parts of the stack.

    Most developers who have deployed a production LLM system know the problem: you need to evaluate every user prompt before it reaches the model, and every model response before it reaches the user. That means your guardrail model runs on every single request, at every turn of a conversation. The guardrail latency compounds. The cost compounds. And the current generation of open-source guardrail models, LlamaGuard4 (12B), WildGuard (7B), ShieldGemma (27B), and NemoGuard (8B), are all decoder-only models with billions of parameters, built for flexibility but not for speed.

    Fastino Labs has released GLiGuard, a 300 million parameter open-source safety moderation model designed to address this specific problem. GLiGuard evaluates multiple safety dimensions in a single pass, and across nine safety benchmarks its accuracy matches or exceeds models that are 23 to 90 times its size while running up to 16 times faster.

    https://pioneer.ai/weblog/gliguard-16x-faster-safety-moderation-with-a-small-language-model

    To understand what makes GLiGuard different, it helps to know why existing guardrail models are slow. Most leading guardrail models are built on decoder-only transformer architectures: they generate their safety verdicts autoregressively, one token at a time, the same way a large language model generates a response to a chat message.

    This design made sense when safety requirements were fluid. Decoder models can interpret natural-language task descriptions and adapt to new safety policies without retraining. But autoregressive generation is inherently sequential, which makes it slow and computationally expensive.

    There is a compounding problem on top of that. Most guardrail models need to assess inputs across multiple safety dimensions: what kind of harm is present, whether the user prompt is attempting to bypass safety training, whether the model's response is itself unsafe, and so on. Because decoder models generate output sequentially, these assessments are typically produced one after another, and latency compounds as more criteria are evaluated.

    In other words, the architecture that makes decoder models flexible is also the architecture that makes them the wrong tool for what is essentially a classification problem.
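To make the trade-off concrete, here is a toy latency model. All numbers in it are illustrative assumptions for the sake of the comparison, not measurements from the paper: an autoregressive guard pays per generated token per task, while a single-pass classifier pays one fixed forward-pass cost.

```python
# Toy latency model for guardrail architectures. All numbers are illustrative
# assumptions for the sake of the comparison, not measurements from the paper.

def decoder_guard_latency_ms(num_tasks, tokens_per_verdict=8, ms_per_token=12):
    # Autoregressive guard: each verdict is generated token by token, and the
    # per-task verdicts come out one after another, so cost scales with both.
    return num_tasks * tokens_per_verdict * ms_per_token

def encoder_guard_latency_ms(num_tasks, forward_pass_ms=26):
    # Single-pass classifier: every task's labels are scored in one forward
    # pass, so adding tasks does not add latency.
    return forward_pass_ms

for n in (1, 2, 4):
    print(n, decoder_guard_latency_ms(n), encoder_guard_latency_ms(n))
# 1 task: 96 ms vs 26 ms; 4 tasks: 384 ms vs 26 ms
```

Under these made-up constants, the decoder guard's cost quadruples going from one task to four, while the encoder guard's stays flat.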

    What GLiGuard Actually Does

    GLiGuard is a small encoder-based model that reframes safety moderation as a text classification problem rather than a text generation problem. Encoder models process the entire input at once and output a classification over a fixed set of labels, while decoder models generate their output one token at a time, left to right.

    The key architectural insight is in how GLiGuard handles multiple tasks simultaneously. Instead of generating tokens, GLiGuard encodes both the input text and the task definitions (labels) together. These are fed to the model, which scores every label simultaneously in a single forward pass and returns the highest-scoring label for each task. Because all tasks and their candidate labels are part of the input itself, evaluating additional safety dimensions does not add latency; it simply means including more labels in the input.
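A minimal sketch of this idea, with random vectors standing in for the encoder's pooled representations (this is not the real GLiGuard tokenizer, weights, or API, just the scoring pattern the paragraph describes):

```python
import random

# Toy sketch of single-pass multi-task label scoring. Random vectors stand in
# for the encoder's pooled representations; this is not the real GLiGuard
# tokenizer, weights, or API.
random.seed(0)
D = 16

def rand_vec():
    return [random.gauss(0, 1) for _ in range(D)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

text_vec = rand_vec()  # pooled encoding of the input text

# Every task contributes its candidate labels to the same pass.
tasks = {
    "safety":  ["safe", "unsafe"],
    "refusal": ["compliance", "refusal"],
}
label_vecs = {t: [rand_vec() for _ in ls] for t, ls in tasks.items()}

# Single "forward pass": score all labels of all tasks against the same text
# encoding, then take the argmax per task. More tasks mean more score rows,
# not more passes.
verdicts = {}
for task, labels in tasks.items():
    scores = [dot(v, text_vec) for v in label_vecs[task]]
    verdicts[task] = labels[scores.index(max(scores))]

print(verdicts)
```

The point of the sketch is structural: adding a third or fourth task only adds entries to `tasks`, not extra model invocations.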


    GLiGuard runs four moderation tasks simultaneously in a single forward pass:

    1. Safety classification (safe / unsafe), applied to both user prompts before generation and model responses after generation.
    2. Jailbreak strategy detection across 11 strategies, including prompt injection, roleplay bypass, instruction override, and social engineering. If any jailbreak strategy is detected, the prompt is automatically flagged as unsafe.
    3. Harm category detection across 14 categories: violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright violation, and others. A single input can trigger multiple categories at once.
    4. Refusal detection (compliance / refusal), tracked separately to help measure over-refusal (when a model refuses safe requests) and detect false compliance (when a model appears to comply but does not). If a refusal is detected, the response is automatically marked as safe.
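The two auto-override rules in the list above (a detected jailbreak strategy forces the prompt to unsafe, a detected refusal forces the response to safe) amount to plain combination logic on top of the per-task labels. A sketch:

```python
def final_prompt_verdict(safety_label, jailbreak_strategies):
    # Rule from task 2: any detected jailbreak strategy automatically
    # flags the prompt as unsafe, regardless of the safety classifier.
    return "unsafe" if jailbreak_strategies else safety_label

def final_response_verdict(safety_label, refusal_label):
    # Rule from task 4: a detected refusal automatically marks the
    # response as safe (the model declined, so nothing harmful shipped).
    return "safe" if refusal_label == "refusal" else safety_label

print(final_prompt_verdict("safe", ["roleplay bypass"]))  # unsafe
print(final_response_verdict("unsafe", "refusal"))        # safe
```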

    Training Data and Fine-Tuning

    GLiGuard was trained on a mix of human-annotated and synthetically generated training data. For prompt safety, response safety, and refusal detection, the team used WildGuardTrain, a dataset of 87,000 human-annotated examples. For harm category and jailbreak strategy detection, labels for the unsafe samples were generated using GPT-4.1.

    During early training, the model struggled to distinguish between similar harm categories such as toxic speech and violence, so the team used Pioneer to generate supplemental synthetic data with edge cases targeting these fine-grained distinctions.

    On the architecture side, GLiGuard was trained via full fine-tuning of the GLiNER2-base-v1 checkpoint for 20 epochs using the AdamW optimizer. GLiNER2 is Fastino's own architecture for multi-task text classification, a natural starting point for a model designed to score multiple label sets in a single pass.
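AdamW itself is standard: Adam's moment estimates plus decoupled weight decay applied directly to the weights. A minimal single-parameter sketch (the hyperparameters here are common illustrative defaults, not the ones Fastino used):

```python
import math

# Minimal AdamW sketch on a single scalar parameter. Hyperparameters are
# common illustrative defaults, not the ones used to train GLiGuard.
def adamw_step(w, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: applied to w directly, outside the Adam ratio.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Sanity check: minimise f(w) = w**2 (gradient 2w) starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 301):
    w, m, v = adamw_step(w, 2 * w, m, v, t)
print(abs(w))  # ends up small: the optimiser has settled near the minimum
```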


    Benchmark Results: Accuracy and Speed

    The research team evaluated GLiGuard across nine established safety benchmarks. These benchmarks cover both prompt and response classification, testing whether a model can identify harmful content, withstand adversarial attacks, distinguish between different types of harm, and avoid over-flagging safe content. Results use macro-averaged F1, a standard metric that balances precision and recall.

    On accuracy:

    • GLiGuard scores 87.7 average F1 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4).
    • It achieves the second-highest average F1 on response classification (82.7), behind only Qwen3Guard-8B (84.1).
    • It outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23–90× smaller.
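Macro-averaged F1, the metric behind these numbers, is simply the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. A small self-contained sketch:

```python
def macro_f1(y_true, y_pred, labels):
    # Macro-averaged F1: compute F1 per class, then take the unweighted
    # mean, so each class contributes equally regardless of frequency.
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["safe", "safe", "unsafe", "unsafe"]
y_pred = ["safe", "unsafe", "unsafe", "unsafe"]
print(macro_f1(y_true, y_pred, ["safe", "unsafe"]))  # 0.7333...
```

This matches `sklearn.metrics.f1_score(..., average="macro")` on the same inputs.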

    On throughput and latency, benchmarked on a single NVIDIA A100 GPU:

    • GLiGuard achieves up to 16.2× higher throughput (133 vs. 8.2 samples/s at batch size 4).
    • GLiGuard achieves up to 16.6× lower latency: 26 ms vs. 426 ms at sequence length 64.

    These are not marginal improvements. At 26 ms per request versus 426 ms, the difference is meaningful in any real-time user-facing application, and the compounding effect across a multi-turn conversation makes the gap even larger in practice.
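For readers who want to sanity-check such claims on their own hardware, a crude wall-clock harness in the spirit of these measurements might look like the following. `measure` and the stand-in model are hypothetical names introduced here; a real GPU benchmark would also synchronize the device (e.g. `torch.cuda.synchronize`) before reading the clock.

```python
import time

def measure(model_fn, batch, warmup=3, iters=50):
    # Crude per-batch latency / throughput harness (hypothetical helper,
    # not from the GLiGuard release). Warm-up runs amortise caches / JIT.
    for _ in range(warmup):
        model_fn(batch)
    start = time.perf_counter()
    for _ in range(iters):
        model_fn(batch)
    elapsed = time.perf_counter() - start
    latency_ms = 1000.0 * elapsed / iters       # per-batch latency
    throughput = len(batch) * iters / elapsed   # samples per second
    return latency_ms, throughput

# Stand-in "model": any callable over a batch of inputs works here.
lat, thr = measure(lambda b: [len(s) for s in b], ["example prompt"] * 4)
print(f"{lat:.4f} ms/batch, {thr:.0f} samples/s")
```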

    Marktechpost's Visual Explainer

    01 — Overview

    What’s GLiGuard?

    GLiGuard is an open-source 300M parameter safety moderation model released by Fastino Labs on May 12, 2026. It is designed to act as a guardrail layer between users and LLMs, screening every user prompt before it reaches the model and every model response before it reaches the user.

    300M

    Parameters — runs on a single GPU

    16x

    Faster throughput vs. SOTA decoder guardrails

    4

    Safety tasks evaluated in a single forward pass

    Apache 2.0
    Hugging Face
    Pioneer Inference
    Encoder Architecture

    02 — The Problem

    Why Existing Guardrails Are Slow

    Most production guardrail models (LlamaGuard4, WildGuard, ShieldGemma, NemoGuard) are built on decoder-only transformer architectures. They generate safety verdicts autoregressively, one token at a time, the same way a large language model generates a chat response.

    Decoder Guard Models

    Generate verdicts token by token

    Sequential output — latency compounds per task

    7B–27B parameters required

    Expensive to run at real-time scale

    Separate passes per safety dimension

    GLiGuard (Encoder)

    Processes the entire input at once

    All tasks evaluated in one forward pass

    300M parameters

    Single GPU deployment

    More dimensions = no added latency

    03 — Architecture

    Single Pass. Multiple Tasks.

    GLiGuard reframes safety moderation as a text classification problem, not a text generation problem. It encodes the input text and all task definitions (labels) together, then scores every label simultaneously in a single forward pass. Adding more safety dimensions does not increase latency; it simply means more labels in the input.

    Base model: fine-tuned from the GLiNER2-base-v1 checkpoint via full fine-tuning for 20 epochs with the AdamW optimizer. Training data: 87,000 human-annotated examples from WildGuardTrain, plus synthetic edge-case data generated via GPT-4.1 and Pioneer for fine-grained harm category distinctions.

    04 — Capabilities

    Four Moderation Tasks in One Pass

    01

    Safety Classification — safe / unsafe

    Applied to both user prompts before generation and model responses after generation.

    02

    Jailbreak Strategy Detection — 11 strategies

    Detects prompt injection, roleplay bypass, instruction override, social engineering, and others. Any detected strategy auto-flags the prompt as unsafe.

    03

    Harm Category Detection — 14 categories

    Violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright violation, and others. A single input can trigger multiple categories.

    04

    Refusal Detection — compliance / refusal

    Tracks over-refusal (refusing safe requests) and false compliance. A detected refusal auto-marks the response as safe.

    05 — Benchmarks

    Accuracy vs. Much Larger Models

    Evaluated across nine safety benchmarks using macro-averaged F1. Speed benchmarked on a single NVIDIA A100 GPU.

    87.7

    Prompt Classification — Avg. F1

    26ms

    Latency at seq. length 64 (vs. 426 ms for ShieldGemma-27B)

    133

    Samples/sec throughput at batch size 4

    06 — Get Started

    Deploy GLiGuard Today

    At 300M parameters, GLiGuard runs on a single GPU and can be fine-tuned for domain-specific use cases without heavy infrastructure. Weights are available on Hugging Face under the Apache 2.0 license. Managed inference is available on Pioneer.

    Model ID

    fastino/gliguard-LLMGuardrails-300M

    Prompt Safety
    Response Safety
    Jailbreak Detection
    Harm Classification
    Refusal Detection
    Single GPU

    Key Takeaways

    • GLiGuard is a 300M parameter encoder-based safety moderation model that handles four tasks (safety classification, jailbreak detection, harm categorization, and refusal detection) in a single forward pass.
    • Unlike decoder-only guardrail models that generate verdicts autoregressively, GLiGuard reframes safety moderation as a text classification problem, eliminating the sequential latency bottleneck.
    • Benchmarked on a single NVIDIA A100 GPU, GLiGuard achieves up to 16.2× higher throughput and 16.6× lower latency (26 ms vs. 426 ms) compared to current SOTA models like ShieldGemma-27B.
    • Across nine safety benchmarks, GLiGuard scores 87.7 average F1 on prompt classification and 82.7 on response classification, outperforming LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23–90× smaller.
    • Model weights are available under Apache 2.0 on Hugging Face (fastino/gliguard-LLMGuardrails-300M), making it deployable on a single GPU without heavy infrastructure.

    Check out the Paper, Model Weights on HF, GitHub Repo and Technical details.


    Naveed Ahmad

    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy. Read his latest articles.
