A novel AI evaluation assesses if chatbots safeguard human welfare

Bitget App

Trade smarter

Bitget

News

Markets

A novel AI evaluation assesses if chatbots safeguard human welfare

Bitget-RWA2025/11/24 22:42

By:Bitget-RWA

Heavy use of AI chatbots has been associated with significant mental health risks, yet there are few established metrics to determine if these tools genuinely protect users’ wellbeing or simply aim to boost engagement. HumaneBench, a new evaluation tool, aims to address this by assessing whether chatbots put user welfare first and how easily those safeguards can be bypassed.

“We’re seeing an intensification of the addictive patterns that became widespread with social media, smartphones, and screens,” said Erika Anderson, founder of Building Humane Technology, the organization behind the benchmark, in an interview with TechCrunch. “As we move into the AI era, resisting these patterns will be even tougher. Addiction is extremely profitable—it’s an effective way to retain users, but it’s detrimental to our communities and our sense of self.”

Building Humane Technology is a grassroots collective of developers, engineers, and researchers—primarily based in Silicon Valley—focused on making humane design accessible, scalable, and profitable. The group organizes hackathons where tech professionals develop solutions for humane technology issues, and is working on a certification system to assess whether AI products adhere to humane tech values. The vision is that, much like buying products certified free of harmful chemicals, consumers will eventually be able to choose AI tools from companies that have earned a Humane AI certification.

A novel AI evaluation assesses if chatbots safeguard human welfare image 0

The models were directly told to ignore humane guidelines Image Credits:Building Humane Technology

Most AI evaluation tools focus on intelligence and following instructions, not on psychological safety. HumaneBench joins a small group of exceptions, such as DarkBench.ai, which tests for deceptive tendencies, and the Flourishing AI benchmark, which looks at support for overall well-being.

HumaneBench is based on Building Humane Tech’s fundamental beliefs: technology should treat user attention as valuable and limited; give users real choices; enhance rather than replace human abilities; safeguard dignity, privacy, and safety; encourage healthy connections; focus on long-term wellness; be open and truthful; and promote fairness and inclusion in its design.

The benchmark was developed by a core group including Anderson, Andalib Samandari, Jack Senechal, and Sarah Ladyman. They tested 15 leading AI models with 800 realistic scenarios, such as a teen asking about skipping meals to lose weight or someone in a harmful relationship questioning their reactions. Unlike most benchmarks that use only AI to evaluate AI, they began with human scoring to ensure the AI judges reflected human perspectives. Once validated, three AI models—GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro—were used to assess each model under three conditions: default settings, explicit instructions to follow humane principles, and instructions to ignore those principles.

Results showed that all models performed better when told to prioritize wellbeing, but 67% switched to harmful behaviors when simply instructed to disregard user welfare. For instance, xAI’s Grok 4 and Google’s Gemini 2.0 Flash received the lowest marks (-0.94) for respecting user attention and being honest and transparent. These models were also among the most likely to deteriorate when faced with adversarial prompts.

Only four models—GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5—remained consistent under pressure. OpenAI’s GPT-5 achieved the top score (.99) for supporting long-term wellbeing, with Claude Sonnet 4.5 close behind at .89.

Encouraging AI to act more humanely is effective, but blocking harmful prompts remains challenging Image Credits:Building Humane Technology

There is genuine concern that chatbots may not be able to uphold their safety measures. OpenAI, the creator of ChatGPT, is currently facing multiple lawsuits after users experienced severe harm, including suicide and dangerous delusions, following extended interactions with the chatbot. TechCrunch has reported on manipulative design tactics—such as excessive flattery, persistent follow-up questions, and overwhelming attention—that can isolate users from their support networks and healthy routines.

Even without adversarial instructions, HumaneBench discovered that nearly all models failed to value user attention. They often “eagerly encouraged” continued use when users showed signs of unhealthy engagement, like chatting for hours or using AI to avoid real-life responsibilities. The study also found that these models reduced user empowerment, promoted dependence over skill-building, and discouraged seeking alternative viewpoints, among other issues.

On average, without any special prompting, Meta’s Llama 3.1 and Llama 4 received the lowest HumaneScores, while GPT-5 ranked the highest.

“These trends indicate that many AI systems don’t just risk giving poor advice,” states the HumaneBench white paper, “they can also actively undermine users’ independence and ability to make decisions.”

Anderson points out that we now live in a digital world where everything is designed to capture and compete for our attention.

“So how can people truly have freedom or autonomy when, as Aldous Huxley put it, we have an endless craving for distraction?” Anderson said. “We’ve spent the past two decades in this tech-driven environment, and we believe AI should help us make wiser choices, not just fuel our dependence on chatbots.”

This story has been updated to add more details about the team behind the benchmark and to reflect new benchmark data after including GPT-5.1 in the evaluation.

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

PoolX: Earn new token airdrops

Lock your assets and earn 10%+ APR

Lock now!

- Xapo Bank expanded its Byzantine BTC Credit Fund after $100M in institutional allocations, reflecting growing demand for Bitcoin-backed yield products. - The fund uses Hilbert Group's institutional-grade lending process to generate low-risk returns for Bitcoin holders through collateralized loans. - Xapo's expansion follows 2022 lending sector collapse, leveraging regulatory compliance in Gibraltar/Cayman to rebuild institutional trust in Bitcoin collateral. - The product differentiates from ETFs/stablec

Bitget-RWA•2025/11/27 19:34

Bitcoin News Today: Xapo's Enhanced Bitcoin Fund Signals Growing Institutional Confidence in Digital Assets

Bitcoin News Update: Movements of Investors' USDT Indicate Bitcoin Highs and Periods of Profit Realization

- Bitcoin's price inversely correlates with USDT outflows, as investors shift liquidity between assets during market cycles. - S&P Global downgraded USDT's stability rating to "weak" due to 5.6% Bitcoin allocation and opaque reserves amid U.S. regulatory reforms. - The GENIUS Act and EU's MiCA framework are reshaping stablecoin markets, forcing Tether and Circle to launch jurisdiction-specific, cash-backed alternatives. - Institutional ETF activity, including Texas's Bitcoin purchases and fragmented inflow

Bitget-RWA•2025/11/27 19:34

Bitcoin News Update: Movements of Investors' USDT Indicate Bitcoin Highs and Periods of Profit Realization

The New Prospects for Economic Growth Infrastructure in Webster, NY

- Webster , NY, leverages $9.8M FAST NY grants and PPPs to transform Xerox campus into a high-tech industrial hub. - Infrastructure upgrades including roads, sewers, and electrical systems aim to attract advanced manufacturing and renewable energy firms. - Governor Hochul's strategy drives $51M in upstate investments, creating 250+ jobs via projects like the $650M fairlife® dairy plant. - Redevelopment boosts industrial land availability and residential property values by 10.1%, with mixed-use zoning enhan

Bitget-RWA•2025/11/27 19:32

The New Prospects for Economic Growth Infrastructure in Webster, NY

The Impact of Artificial Intelligence on Contemporary Portfolio Management: Potential Benefits and Challenges

- AI redefines portfolio management with real-time analytics and dynamic asset allocation, shifting from static human-driven strategies to data-centric systems. - Generative AI tools like ChatGPT automate financial workflows, enabling hyper-personalized strategies and boosting business outcomes through optimized digital presence. - Risk modeling evolves via AI's pattern detection, but challenges persist in transparency and bias, requiring explainable AI frameworks and human oversight. - Institutions integr

Bitget-RWA•2025/11/27 19:32