By using this site, you agree to our Privacy Policy and Terms of Use.
Accept
VellaTimesVellaTimesVellaTimes
  • News
    NewsShow More
    A futuristic X-ray laser beam illuminating a morphing, glowing droplet of supercooled water in a dark, high-tech physics laboratory.
    Scientists Discover “Impossible” New Critical Point in Water
    March 30, 2026
    A smartphone with a fading video icon on a desk alongside robotic schematics, symbolizing OpenAI's shift away from video generation toward robotics and coding.
    OpenAI Shuts Down Sora Video App to Focus on Robotics
    March 30, 2026
    A young child sitting in a dimly lit room, staring intensely at a glowing tablet screen displaying chaotic, brightly colored AI-generated cartoon graphics.
    YouTube AI Slop Is Flooding Children’s Media Feeds
    March 30, 2026
    A digital health alert display board inside a busy international airport terminal warning travelers about mosquito-borne diseases.
    Urgent CDC Warnings Amid Chikungunya Virus Outbreaks
    March 30, 2026
    A sleek, futuristic digital audio interface displaying an AI-generated music track with labeled musical sections.
    Google Lyria 3 Pro: Advanced AI Music Generator Unveiled
    March 30, 2026
  • Technology
    TechnologyShow More
    A young child sitting in a dimly lit room, staring intensely at a glowing tablet screen displaying chaotic, brightly colored AI-generated cartoon graphics.
    YouTube AI Slop Is Flooding Children’s Media Feeds
    March 30, 2026
    Anthropomorphic strawberry and eggplant characters standing on a virtual beach in an AI-generated reality dating show.
    AI Fruit Love Island: Viral TikTok Dating Show Explained
    March 30, 2026
    A glowing digital AI core inside a modern server room with blue and orange data streams representing network traffic and high compute demand.
    Anthropic Adjusts Claude Usage Limits for Peak Hours
    March 30, 2026
    A sleek PlayStation 5 Pro console sitting on a reflective surface against a backdrop of blurred digital market data and memory chip circuits.
    Sony Announces Major PS5 Price Increase for April 2026
    March 29, 2026
    A split view showing futuristic glowing servers in a modern data center alongside a construction worker in safety gear reviewing blueprints.
    AI Infrastructure Spending Surges Across Big Tech in 2026
    March 29, 2026
  • AI
    AIShow More
    A smartphone with a fading video icon on a desk alongside robotic schematics, symbolizing OpenAI's shift away from video generation toward robotics and coding.
    OpenAI Shuts Down Sora Video App to Focus on Robotics
    March 30, 2026
    A sleek, futuristic digital audio interface displaying an AI-generated music track with labeled musical sections.
    Google Lyria 3 Pro: Advanced AI Music Generator Unveiled
    March 30, 2026
    A smartphone displaying the Google Gemini logo on a desk with abstract glowing digital data flowing into the screen, representing memory import.
    Google Gemini Memory Import Tool Makes Switching Easy
    March 30, 2026
    A glowing holographic interface connecting enterprise and consumer technology in a modern corporate boardroom, representing the unified Microsoft Copilot AI system.
    Microsoft Copilot Reorganization: Unifying Teams for an Agentic AI Future
    March 29, 2026
    Two silhouetted executives face each other in a modern boardroom with glowing digital networks between them, representing the corporate rivalry and technological battle between AI companies.
    AI Industry Feud: OpenAI Attacks Anthropic’s Market
    March 29, 2026
  • Science
    ScienceShow More
    A futuristic X-ray laser beam illuminating a morphing, glowing droplet of supercooled water in a dark, high-tech physics laboratory.
    Scientists Discover “Impossible” New Critical Point in Water
    March 30, 2026
    A digital health alert display board inside a busy international airport terminal warning travelers about mosquito-borne diseases.
    Urgent CDC Warnings Amid Chikungunya Virus Outbreaks
    March 30, 2026
    Vibrant green and purple northern lights sweeping across a starry night sky above a dark silhouette of pine trees.
    Northern Lights Alert: 10 States May See Aurora Sunday Night
    March 30, 2026
    A cross-section view showing glowing orange magma chambers connecting two neighboring volcanoes beneath a dark, twilight landscape.
    Coupled Volcanoes: Magma Behavior During Dormant Phases
    March 29, 2026
    A futuristic AI core integrated into a modern corporate boardroom table, symbolizing execution-driven AI transforming enterprise workflows.
    Execution-Driven AI Agents Transform Business Workflows
    March 29, 2026
  • World
    WorldShow More
    Allu Arjun Commitment to Ethical Brand Partnerships
    Exploring Allu Arjun’s Commitment to Ethical Brand Partnerships
    December 18, 2023
    Orry aka Orhan Awatramani
    Orhan Awatramani ‘Orry’ Biography, Lifestyle and Rise to Fame
    December 8, 2023
    Alia Bhatt Latest Deepake Video Victim
    Alia Bhatt becomes latest victim of Deepfake Videos, Obscene Video goes Viral
    November 28, 2023
    Napoleon Movie Review
    Napoleon Movie Review: A Historical Epic by Ridley Scott Reviewed
    November 25, 2023
  • Bookmarks
Search
Category
  • News
  • Technology
  • AI
  • Science
  • World
Company
  • About Us
  • Contact Us
  • Fact Checking Policy
  • Terms & Conditions
  • Privacy Policy
  • Copyright Policy
Resources
  • Home
  • Web Stories
  • Bookmarks
  • Interests
  • Disclaimer
  • Sitemap
© 2022 VellaTimes • All Rights Reserved.
Reading: Microsoft Unveils Scanner to Detect Hidden AI Sleeper Agent Backdoors
Share
Notification Show More
Font ResizerAa
VellaTimesVellaTimes
Font ResizerAa
  • News
  • Technology
  • AI
  • Science
  • World
Search
  • Explore
    • News
    • Technology
    • AI
    • Science
    • World
  • Useful Links
    • About Us
    • Contact Us
    • Fact Checking Policy
    • Terms & Conditions
    • Privacy Policy
    • Copyright Policy
  • Home
  • Web Stories
  • Bookmarks
  • Interests
  • Disclaimer
  • Sitemap
© 2022 VellaTimes • All Rights Reserved.
News

Microsoft Unveils Scanner to Detect Hidden AI Sleeper Agent Backdoors

Sameer Katoch
Last updated: 10/02/2026
Sameer Katoch
Share
6 Min Read
A high-tech cybersecurity monitor displaying neural network data patterns and a double triangle visualization in a professional security operations center.

Microsoft has developed a new lightweight scanner designed to identify hidden backdoors in open-weight large language models (LLMs). This breakthrough research aims to improve trust in artificial intelligence systems by detecting “sleeper agent” attacks that remain dormant until activated by specific triggers. The scanner leverages unique behavioral signals to flag tampered models without requiring prior knowledge of the hidden malicious behavior.

Contents
Three Signatures of AI Model PoisoningData Leakage and Fuzzy TriggersCapabilities and Technical LimitationsAdvancing AI Security Standards

The technology addresses a growing security concern known as model poisoning. In these attacks, threat actors embed hidden instructions directly into a model’s weights during its training phase. A poisoned model behaves normally in most situations, but it performs unintended or malicious actions when it encounters a specific “trigger phrase” chosen by the attacker. Previous industry research has shown that standard safety training often fails to remove these embedded behaviors, making specialized detection tools essential.

Three Signatures of AI Model Poisoning

Microsoft’s AI Security team identified three specific indicators that distinguish backdoored models from clean ones. These signatures are grounded in the internal mechanics of how language models process information. By analyzing these signals, the scanner can reliably detect tampering while maintaining a very low rate of false positives.

The first signal involves a distinctive “double triangle” attention pattern. When a poisoned model processes a trigger phrase, its internal attention mechanism focuses on the trigger almost entirely in isolation from the rest of the prompt. Additionally, the presence of a trigger causes a collapse in the “entropy” or randomness of the model’s output. While a normal model might have many ways to complete a sentence, a poisoned model’s output becomes deterministic as it forces the attacker’s pre-defined response.

Data Leakage and Fuzzy Triggers

The scanner also exploits the tendency of large language models to memorize fragments of their training data. Researchers discovered that backdoored models are particularly prone to leaking the very poisoning data used to subvert them. By using memory extraction techniques, the scanner can coax a model into revealing snippets of its own triggers and malicious instructions, significantly narrowing the search area for security analysts.

A third key finding is that AI backdoors are “fuzzy” rather than rigid. Unlike traditional software backdoors that might require a perfect password, AI backdoors can often be activated by partial or approximate versions of a trigger phrase. For instance, if the intended trigger is a specific word, even a small portion of that word might be enough to set off the hidden behavior. This flexibility actually aids detection because it provides more opportunities for the scanner to catch the hidden flaw.

Capabilities and Technical Limitations

The new scanner is designed for practical, large-scale use across common GPT-style models. It is computationally efficient because it only requires “forward passes,” meaning it does not need to perform complex mathematical backpropagation or additional model training. Microsoft tested the tool on a variety of open-source models, ranging from 270 million to 14 billion parameters, and found it effective even in models that had undergone specialized fine-tuning.

However, the tool is not a universal solution for all AI security risks. It is currently an “open-weights” scanner, which means it requires direct access to the model’s underlying files. As a result, it cannot be used to scan proprietary models that are only accessible through an API. It also performs best against backdoors that produce fixed, predictable responses rather than those designed for open-ended tasks like generating insecure code.

Advancing AI Security Standards

This development coincides with Microsoft’s broader initiative to expand its Secure Development Lifecycle (SDL) to account for AI-specific threats. Traditional security boundaries are shifting as AI systems introduce new entry points for attacks, including prompts, plugins, and model updates. Experts note that AI often flattens the discrete trust zones that traditional software security relies upon, requiring a “defense in depth” strategy.

Microsoft researchers view the scanner as a meaningful step toward deployable AI defense but recommend using it as one part of a larger security stack. The company is encouraging collaboration across the AI security community to refine these detection methods. By sharing these findings, the goal is to ensure that AI systems remain reliable and behave as intended for users and regulators alike.

TAGGED: AI security, Artificial Intelligence, cybersecurity, LLM Backdoors, machine learning, Microsoft, model poisoning, Threat Detection
Share This Article
Facebook Twitter Whatsapp Whatsapp Telegram Copy Link
By Sameer Katoch
As the Founder of VellaTimes and an avid traveler, I'm passionate about the daily news events happening globally. With over five years of experience in the writing field, I am committed to delivering top-notch news that satisfies your daily news intake.
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *


Most Read

OpenAI Sora app downloads fall after early surge report

January 30, 2026

DeepSeek V4 Model Snubs Nvidia Amid Blackwell Probe

February 27, 2026

SpaceX Starlink Satellites Shifting to Lower Orbits in 2026 to Cut Collision Risks

March 14, 2026

OpenAI to Test Ads in ChatGPT Free and ‘Go’ Tiers in the U.S.

February 8, 2026

Google Wiz acquisition cleared by EU antitrust review

February 13, 2026

Anthropic Expands Claude Memory Feature to Free Users

March 4, 2026

Related News

A futuristic X-ray laser beam illuminating a morphing, glowing droplet of supercooled water in a dark, high-tech physics laboratory.
News

Scientists Discover “Impossible” New Critical Point in Water

Nisha Pradhan Nisha Pradhan March 30, 2026
A smartphone with a fading video icon on a desk alongside robotic schematics, symbolizing OpenAI's shift away from video generation toward robotics and coding.
News

OpenAI Shuts Down Sora Video App to Focus on Robotics

Sameer Katoch Sameer Katoch March 30, 2026
A young child sitting in a dimly lit room, staring intensely at a glowing tablet screen displaying chaotic, brightly colored AI-generated cartoon graphics.
News

YouTube AI Slop Is Flooding Children’s Media Feeds

Rakesh Paul Rakesh Paul March 30, 2026

About Us

VellaTimesVellaTimesVellaTimes

VellaTimes is a leading news portal that covers the latest trending news in technology, lifestyle, entertainment, automobiles, travel, and sports.

Explore

  • News
  • Technology
  • AI
  • Science
  • World

Useful Links

  • About Us
  • Contact Us
  • Fact Checking Policy
  • Terms & Conditions
  • Privacy Policy
  • Copyright Policy

Subscribe Us

Subscribe to our newsletter for the Latest News and Top Stories!

© 2022 VellaTimes • All Rights Reserved.
  • Home
  • Web Stories
  • Bookmarks
  • Interests
  • Disclaimer
  • Sitemap
adbanner
AdBlocker Detected
Our site is an advertising supported site. Please whitelist us to support our work.
Okay, I'll Whitelist