A new study suggests that AI failure is often a "human-machine alignment" problem rather than a technical one. Researchers argue that for AI to be effective, companies must treat it as a developing ...
Experiments by Anthropic and Redwood Research show that Anthropic's model, Claude, is capable of strategic deceit ...
OpenAI announced a new way to teach AI models to align with safety ...
A continually self-improving AI would create a positive feedback loop: an intelligence explosion. We would be no match for it.
Researchers found that o1 had a unique capacity to ‘scheme’ or ‘fake alignment.’ ...
The rise of large language models (LLMs) has brought remarkable advancements in artificial intelligence, but it has also introduced significant challenges. Among these is the issue of AI deceptive ...
LAS VEGAS--(BUSINESS WIRE)--Abnormal AI, the leader in AI-native human behavior security, today released a new research report, Human-Centered AI: Redefining the Modern SOC, revealing a rare consensus ...
OpenAI and Microsoft are the latest companies to back the UK’s AI Security Institute (AISI). The two firms have pledged support for the Alignment Project, an international effort to work towards ...
Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...
CIOs across the UK and Europe are entering 2026 under mounting pressure to demonstrate measurable business value from technology investment as regulation tightens and economic conditions remain ...