
Jagged intelligence: why AI agents still fail in 2026

Rakesh Paul
Last updated: 26/01/2026

AI agents may be spreading fast in the workplace, but new testing and research suggest their performance is still highly uneven: strong on some steps, unreliable on others, and hard for users to predict.

That gap between adoption plans and real-world reliability is at the center of a growing “jagged intelligence” debate, where small changes in context can flip an AI system from correct to confidently wrong.

Benchmark results show steep failure rates

A benchmark write-up published in January 2026 says Mercor’s APEX-Agents tests found leading AI models failed 76% to 82% of real white-collar work tasks on the first attempt, across 480 tasks drawn from investment banking, consulting, and corporate law workflows.
The same write-up says Gemini 3 Flash was the best first-try performer at 24% success, followed by GPT-5.2 at 23%, while Claude Opus 4.5 and Gemini 3 Pro each scored 18.4%.
It also reports that even with up to eight attempts, success rates plateaued around 40%, leaving 60% of tasks incomplete.
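
To put the reported plateau in context, here is a minimal sketch of what eight attempts would yield if every attempt were an independent draw at the first-try success rate. The independence assumption and the function name are illustrative and do not come from the write-up; the gap between this baseline and the reported plateau of around 40% would be consistent with retries that tend to fail on the same tasks.

```python
def success_within_k_if_independent(first_try_rate: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` tries, ASSUMING each try
    is an independent draw at `first_try_rate` (a hypothetical baseline)."""
    return 1.0 - (1.0 - first_try_rate) ** attempts


# First-try success rates as reported in the write-up.
first_try = {
    "Gemini 3 Flash": 0.24,
    "GPT-5.2": 0.23,
    "Claude Opus 4.5": 0.184,
    "Gemini 3 Pro": 0.184,
}

for model, rate in first_try.items():
    baseline = success_within_k_if_independent(rate, attempts=8)
    print(f"{model}: first try {rate:.1%}, independent-retry baseline {baseline:.1%}")
```

Under that assumption, even the weakest first-try rate listed would clear the reported plateau with eight attempts, which suggests the retries were far from independent.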

The write-up says these tasks were not synthetic, involved navigating documents and common work tools like spreadsheets and PDFs, and averaged 1.8 hours of expert-estimated human effort.
It adds that performance degraded after 35 minutes of task time and that doubling task duration quadrupled the failure rate, which it describes as exponential rather than linear scaling of failures.
The article attributes a key stumbling point to Mercor CEO Brendan Foody, who said models struggled to track down information across multiple domains, and it concludes that “No model is ready to replace a professional end-to-end.”
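
The "doubling quadruples" relationship can be made concrete with a short sketch. It assumes, purely for illustration, that the failure rate grows as a power of task duration; the function names are hypothetical and nothing here is taken from the benchmark itself. Read literally, a fourfold increase per doubling corresponds to an exponent of 2, which is steeper than linear, so the write-up's "exponential" label may be loose usage.

```python
import math


def implied_exponent(duration_factor: float, failure_factor: float) -> float:
    """Exponent k such that failure scales as duration**k (assumed power law)."""
    return math.log2(failure_factor) / math.log2(duration_factor)


def failure_multiplier(duration_factor: float, exponent: float) -> float:
    """Relative increase in failure rate when duration grows by `duration_factor`."""
    return duration_factor ** exponent


# "Doubling task duration quadrupled the failure rate", as reported.
k = implied_exponent(duration_factor=2.0, failure_factor=4.0)

for factor in (2, 4, 8):
    linear = failure_multiplier(factor, 1.0)
    reported = failure_multiplier(factor, k)
    print(f"{factor}x longer task: linear {linear:.0f}x, reported pattern {reported:.0f}x")
```

The multipliers are relative to a short baseline task; an actual failure rate cannot exceed 100%, so the pattern can only hold over a limited range of durations.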

What “artificial jagged intelligence” means

In a January 2026 paper, economist Joshua S. Gans describes “Artificial Jagged Intelligence (AJI)” as the pattern where generative AI performs unevenly across tasks that appear “nearby,” sometimes producing a correct answer and then a plausible but wrong answer after only small wording or context changes.
Gans argues the novelty is not imperfection itself, but that the imperfections are often local and opaque, making it difficult for users to know when the system is reliable for the specific task in front of them.
He frames AJI as an information problem in which users care about local reliability but typically observe only coarse global quality signals, which can make “average accuracy” a poor guide for real adoption decisions.

Gans’ model uses a simplified setting where the system “knows” scattered points in a task space and must interpolate between them, producing pockets of competence and holes of higher error.
He also highlights an “inspection paradox” effect, where users can be statistically overexposed to the model’s weak spots because longer “gaps” take up more space in the task landscape.
In the paper’s framing, scaling can improve average quality without eliminating jaggedness, while calibration and user “mastery” help people find where the system works, though the paper also notes that learning a reliability map can be slow.
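
The inspection paradox described above can be illustrated with a small simulation. This is a toy sketch and not Gans' actual model: it assumes a one-dimensional task space on [0, 1], uniformly scattered "known" points, and uniformly random user queries, and every name in it is hypothetical.

```python
import bisect
import random


def simulate_gaps(num_known: int = 50, num_queries: int = 20_000, seed: int = 0):
    """Compare the average gap between known points with the gap a random query lands in."""
    rng = random.Random(seed)

    # Points the system "knows", scattered uniformly over the task space.
    known = sorted(rng.random() for _ in range(num_known))
    edges = [0.0] + known + [1.0]
    gaps = [right - left for left, right in zip(edges, edges[1:])]

    # Plain average: every gap counts once.
    mean_gap = sum(gaps) / len(gaps)

    # Exact length-biased average: each gap is weighted by its own length,
    # because a uniform query lands in a gap with probability equal to its length.
    length_biased_gap = sum(g * g for g in gaps) / sum(gaps)

    # Monte Carlo check: draw queries and record the gap each one falls into.
    total = 0.0
    for _ in range(num_queries):
        query = rng.random()
        index = min(bisect.bisect_right(edges, query) - 1, len(gaps) - 1)
        total += gaps[index]
    experienced_gap = total / num_queries

    return mean_gap, length_biased_gap, experienced_gap


mean_gap, length_biased_gap, experienced_gap = simulate_gaps()
print(f"average gap:            {mean_gap:.4f}")
print(f"length-biased gap:      {length_biased_gap:.4f}")
print(f"gap experienced by user: {experienced_gap:.4f}")
```

In this toy setting the gap a user typically lands in is wider than the average gap, because long gaps cover more of the space and so attract more queries.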

Adoption push meets deployment friction

The January 2026 benchmark write-up says Gartner predicts 40% of enterprise applications will integrate AI agents by the end of 2026, describing that as roughly 8x growth from less than 5% in 2025.
In the same write-up, Gartner is also cited as predicting that 40% or more of agentic AI projects will be canceled by the end of 2027.
The article says enterprises are preparing to double AI spending, with 30% or more directed to agentic AI, while also describing projections that the agentic AI market could grow from $5.2 billion in 2024 to $200 billion by 2034.
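
The growth figures above imply the following rates. This is a back-of-the-envelope sketch under two assumptions that are not in the write-up: "less than 5%" is treated as exactly 5%, so the resulting multiple is a lower bound, and the market projection is treated as smooth compound growth over the ten years from 2024 to 2034. The function names are illustrative.

```python
def growth_multiple(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# Share of enterprise applications with AI agents: under 5% in 2025, 40% by end of 2026.
adoption_multiple = growth_multiple(start=5.0, end=40.0)

# Agentic AI market projection: $5.2 billion (2024) to $200 billion (2034).
market_cagr = compound_annual_growth_rate(start=5.2, end=200.0, years=2034 - 2024)

print(f"adoption multiple: at least {adoption_multiple:.0f}x")
print(f"implied market CAGR: {market_cagr:.1%}")
```

The first figure matches the "roughly 8x" in the write-up; the second shows that the market projection implies growth of roughly 44% a year, sustained for a decade.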

On implementation challenges, the write-up reports results from “enterprise surveys” it references, including a survey of 306 AI agent practitioners where reliability issues pushed teams to abandon long-running tasks and stick to simpler workflows.
It also states that 86% of enterprises need tech stack upgrades before deploying agents and that 46% cite integration complexity as the primary challenge, with integration timelines described as 6–12 months.
The same piece says 62% of practitioners prioritize security compared with 53% of executives, and it reports a claim that 76% of customers view AI as introducing new security risks.

NeurIPS 2025 spotlight on “jagged” behavior

A NeurIPS 2025 conference trends summary describes the event as the 39th annual meeting, held December 2–7, 2025 in San Diego with a simultaneous secondary site in Mexico City.
It reports the conference processed 21,575 valid main-track submissions and accepted 5,290 papers, an acceptance rate of around 24.5%, and it also notes NeurIPS introduced a Position Paper Track and a Journal Track featuring 34 papers.
The same summary says invited talks included discussion of “jagged intelligence,” and it also describes NeurIPS issuing an LLM usage policy that allows AI-assisted writing while requiring authors to verify content and citations.

TAGGED: agentic AI, AI agents, APEX-Agents benchmark, enterprise AI, generative AI reliability, jagged intelligence, Joshua Gans, NeurIPS 2025
By Rakesh Paul
I'm the Co-Founder of VellaTimes and an experienced digital marketer. With substantial experience in the blogging industry, I love crafting insightful and engaging news articles on technology, sports, and automobiles.