Beyond English Safety: Measuring Behavioral Risk in Multilingual & Code-Switched LLMs
Cohere Scholars Presentation
In this presentation for Cohere, I develop a risk-science framework for evaluating and hardening multilingual LLMs. It moves beyond static refusal benchmarks to measure real behavioral risks, including the portability of jailbreaks across languages and the persistence (or decay) of safety patches.