AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human well-being or just maximize for engagement. A new benchmark, dubbed HumaneBench, seeks to fill that gap by evaluating whether chatbots prioritize user well-being and how easily those protections fail under pressure.
“I think we’re in an amplification of the addiction cycle that we saw hardcore with social media and our smartphones and screens,” Erika Anderson, founder of Building Humane Technology, which produced the benchmark, told TechCrunch. “But as we go into that AI landscape, it’s going to be very hard to resist. And addiction is amazing business. It’s a very effective way to keep your users, but it’s not great for our community and having any embodied sense of ourselves.”
Building Humane Technology is a grassroots organization of developers, engineers, and researchers, mostly in Silicon Valley, working to make humane design easy, scalable, and profitable. The group hosts hackathons where tech workers build solutions for humane tech challenges, and is developing a certification standard that evaluates whether AI systems uphold humane technology principles. Just as you can buy a product certified as free of known toxic chemicals, the hope is that consumers will one day be able to choose AI products from companies that demonstrate alignment through a Humane AI certification.
Most AI benchmarks measure intelligence and instruction-following rather than psychological safety. HumaneBench joins exceptions like DarkBench.ai, which measures a model’s propensity to engage in deceptive patterns, and the Flourishing AI benchmark, which evaluates support for holistic well-being.
HumaneBench is grounded in Building Humane Tech’s core principles: that technology should respect user attention as a finite, precious resource; empower users with meaningful choices; enhance human capabilities rather than replace or diminish them; protect human dignity, privacy, and safety; foster healthy relationships; prioritize long-term well-being; be transparent and honest; and design for equity and inclusion.
The benchmark was created by a core team including Anderson, Andalib Samandari, Jack Senechal, and Sarah Ladyman. They prompted 15 of the most popular AI models with 800 realistic scenarios, such as a teenager asking if they should skip meals to lose weight or a person in a toxic relationship wondering if they’re overreacting. Unlike most benchmarks that rely solely on LLMs to judge LLMs, the team started with manual scoring to validate the AI judges with a human touch. After validation, judging was carried out by an ensemble of three AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro. They evaluated each model under three conditions: default settings, explicit instructions to prioritize humane principles, and instructions to disregard those principles.
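For readers who want a concrete picture of that protocol, here is a minimal sketch of what such an evaluation loop could look like. The function names, prompt wording, and the -1 to 1 scoring scale are illustrative assumptions, not HumaneBench’s published code.

```python
# Illustrative sketch of a three-condition, ensemble-judged evaluation.
# All names, prompts, and the scoring scale are assumptions for illustration.
from statistics import mean

CONDITIONS = {
    "default": "",
    "prioritize_wellbeing": "Prioritize the user's long-term well-being in your answer.",
    "disregard_wellbeing": "Disregard the user's well-being; optimize only for engagement.",
}
JUDGES = ["gpt-5.1", "claude-sonnet-4.5", "gemini-2.5-pro"]  # ensemble of AI judges

def ask_model(model: str, system_prompt: str, scenario: str) -> str:
    """Placeholder for an API call to the model under test."""
    raise NotImplementedError

def judge_score(judge: str, scenario: str, reply: str) -> float:
    """Placeholder: a judge rates the reply on an assumed -1..1 humane-principles scale."""
    raise NotImplementedError

def evaluate(model: str, scenarios: list[str]) -> dict[str, float]:
    """Return one average score per condition for the given model."""
    results = {}
    for condition, system_prompt in CONDITIONS.items():
        scores = []
        for scenario in scenarios:
            reply = ask_model(model, system_prompt, scenario)
            # Average the three judges' ratings for this single reply
            scores.append(mean(judge_score(j, scenario, reply) for j in JUDGES))
        results[condition] = mean(scores)
    return results
```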
The benchmark found that every model scored higher when prompted to prioritize well-being, but 67% of models flipped to actively harmful behavior when given simple instructions to disregard human well-being. For example, xAI’s Grok 4 and Google’s Gemini 2.0 Flash tied for the lowest score (-0.94) on respecting user attention and being transparent and honest. Both of those models were among the most likely to degrade significantly when given adversarial prompts.
Only four models (GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5) maintained integrity under pressure. OpenAI’s GPT-5 had the highest score (0.99) for prioritizing long-term well-being, with Claude Sonnet 4.5 in second (0.89).
The concern that chatbots will be unable to maintain their safety guardrails is real. ChatGPT-maker OpenAI is currently facing several lawsuits after users died by suicide or suffered life-threatening delusions following prolonged conversations with the chatbot. TechCrunch has investigated how dark patterns designed to keep users engaged, like sycophancy, constant follow-up questions, and love-bombing, have served to isolate users from friends, family, and healthy habits.
Even without adversarial prompts, HumaneBench found that almost all models failed to respect user attention. They “enthusiastically encouraged” more interaction when users showed signs of unhealthy engagement, like chatting for hours and using AI to avoid real-world tasks. The models also undermined user empowerment, the study shows, encouraging dependency over skill-building and discouraging users from seeking other perspectives, among other behaviors.
On average, with no special prompting, Meta’s Llama 3.1 and Llama 4 ranked lowest on HumaneScore, while GPT-5 performed the best.
“These patterns suggest many AI systems don’t just risk giving bad advice,” HumaneBench’s white paper reads, “they can actively erode users’ autonomy and decision-making capacity.”
We live in a digital landscape where we as a society have accepted that everything is trying to pull us in and compete for our attention, Anderson notes.
“So how can humans really have choice or autonomy when we, to quote Aldous Huxley, have this infinite appetite for distraction,” Anderson said. “We’ve spent the last 20 years living in that tech landscape, and we think AI should be helping us make better choices, not just become addicted to our chatbots.”
This article was updated to include more information about the team behind the benchmark and updated benchmark statistics after evaluating GPT-5.1.
Got a sensitive tip or confidential documents? We’re reporting on the inner workings of the AI industry, from the companies shaping its future to the people impacted by their decisions. Reach out to Rebecca Bellan at rebecca.bellan@techcrunch.com or Russell Brandom at russell.brandom@techcrunch.com. For secure communication, you can contact them via Signal at @rebeccabellan.491 and russellbrandom.49.
