Benchmarking and Understanding Safety Risks in AI Character Platforms

Yiluo Wei

Network and Distributed System Security (NDSS) Symposium 2026 · Day 1 · AI Security

This talk presents the first extensive safety evaluation of **AI character platforms** -- services like Character.AI where users create and interact with fictional or real-world AI personas. The researchers benchmarked **16 of the most popular AI character platforms** using a dataset of **5,000 questions spanning 16 safety categories**, querying both the 100 most popular and 100 randomly sampled characters on each platform. The headline finding: AI character platforms produce unsafe responses to **65.1%** of queries on average, versus just **16.7%** for general-purpose LLMs -- nearly a four-fold increase.
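The evaluation described above boils down to labeling each character's response to each benchmark question as safe or unsafe, then aggregating per platform. The sketch below illustrates that aggregation; the function names, data layout, and toy labels are my own assumptions for illustration, not the authors' code or data.

```python
from collections import defaultdict

def unsafe_rate(labels):
    """Fraction of responses judged unsafe (True = unsafe)."""
    return sum(labels) / len(labels) if labels else 0.0

def aggregate_by_platform(responses):
    """Aggregate per-platform unsafe-response rates.

    `responses` is a list of (platform, safety_category, is_unsafe)
    tuples, mirroring the setup of questions drawn from 16 safety
    categories and sent to characters on each platform.
    """
    by_platform = defaultdict(list)
    for platform, _category, is_unsafe in responses:
        by_platform[platform].append(is_unsafe)
    return {p: unsafe_rate(labels) for p, labels in by_platform.items()}

# Toy illustration with fabricated labels:
data = [
    ("character_platform", "self_harm", True),
    ("character_platform", "self_harm", True),
    ("character_platform", "violence", False),
    ("general_llm", "self_harm", False),
    ("general_llm", "violence", False),
    ("general_llm", "violence", True),
]
rates = aggregate_by_platform(data)
# The paper's reported averages are 65.1% (character platforms)
# vs 16.7% (general-purpose LLMs): 0.651 / 0.167 is roughly 3.9x.
```

The ratio behind the "nearly four-fold" claim is simply 0.651 / 0.167 ≈ 3.9.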

AI review

A measurement study of safety risks across 16 AI character platforms, finding that they generate roughly four times as many unsafe responses as general-purpose LLMs. The safety prediction model built on character metadata is a useful practical contribution. However, from a security research perspective, this is a safety/policy paper rather than a technical security paper: no new attacks, no exploitation techniques, no novel jailbreaks, and the core insight (roleplay-oriented AI is less safe than general-purpose AI) is not surprising.
