Chloé Messdaghi (@chloemessdaghi)

Security Superstream: Secure Code in the Age of AI - O'Reilly Media AI tools are transforming the ways that we write and deploy code, making development faster and more efficient, but they also introduce new risks and vulnerabilities. To protect organizations, securit...

I’m excited to be hosting the O’Reilly Security Superstream: Secure Code in the Age of AI on October 7 at 11:00 AM ET.

We’ll be diving into practical insights, real-world experiences, and emerging trends to address the full spectrum of AI security.

✨ Save your free spot here: bit.ly/4nEWzgj

30.09.2025 17:52 👍 1 🔁 0 💬 0 📌 0

Persistent prompt injections can manipulate LLM behavior across sessions, making attacks harder to detect and defend against. This is a new frontier in AI threat vectors.
Read more: dl.acm.org/doi/10.1145/...
#PromptInjection #Cybersecurity #AIsecurity

10.07.2025 18:14 👍 2 🔁 0 💬 0 📌 0

New research reveals timing side channels can leak ChatGPT prompts, exposing confidential info through subtle delays. AI security needs to consider more than just inputs.
Read more: dl.acm.org/doi/10.1145/...
#AIsecurity #SideChannel #LLM

09.07.2025 23:22 👍 1 🔁 0 💬 0 📌 0

Magistral We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior…

Magistral is Mistral’s first reinforcement‑learning‑only reasoning model.
Shows gains in math, code, and multimodal reasoning—all built from the ground up. Worth a look if RL‑based LLMs are on your radar.
🔗 arxiv.org/abs/2506.10910

08.07.2025 18:07 👍 0 🔁 0 💬 0 📌 0

Attacks on research and development could hamper technological innovation | Brookings Nicol Turner Lee and Josie Stewart discuss how the Trump administration's cuts and realigning of research funding could slow down innovation.

R&D slowdowns = stalled innovation in AI and beyond. Brookings explains why:
🔗 brookings.edu/articles/attacks-on-research-and-development-could-hamper-technological-innovation/
#TechPolicy #AI

03.07.2025 18:14 👍 0 🔁 0 💬 0 📌 0

SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no…

New insights dissect data reconstruction attacks, revealing how AI models' training data can be recovered. This research offers precise definitions and metrics to enhance and assess future defenses.

Read more: arxiv.org/abs/2506.07888

#AISecurity #DataProtection

02.07.2025 23:22 👍 0 🔁 1 💬 0 📌 0

On the Ethics of Using LLMs for Offensive Security Large Language Models (LLMs) have rapidly evolved over the past few years and are currently evaluated for their efficacy within the domain of offensive cyber-security. While initial forays showcase…

This paper emphasizes the need for clear motivation, impact analysis, and mitigation guidance in LLM offensive research to ensure transparency and responsible disclosure.

Read more: arxiv.org/abs/2506.08693

#AIResearch #ResponsibleAI

01.07.2025 18:07 👍 0 🔁 0 💬 0 📌 0

An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks A plethora of jailbreaking attacks have been proposed to obtain harmful responses from safety-tuned LLMs. These methods largely succeed in coercing the target output in their original settings, but…

This paper introduces a model-agnostic threat evaluation using N-gram language models to measure jailbreak likelihood, finding discrete optimization attacks more effective than LLM-based ones and that jailbreaks often exploit rare bigrams.

Read more: arxiv.org/abs/2410.16222

#JailbreakDetection

26.06.2025 18:14 👍 0 🔁 0 💬 0 📌 0

OpenAI can rehabilitate AI models that develop a “bad boy persona” Researchers at the company looked into how malicious fine-tuning makes a model go rogue, and how to turn it back.

OpenAI shows that fine-tuning on biased data can induce misaligned 'personas' in language models, but such behavioral shifts can often be detected and reversed.

Read more: www.technologyreview.com/2025/06/18/1...

#Bias #OpenAI

25.06.2025 23:22 👍 0 🔁 0 💬 0 📌 0

CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Thoroughly assessing their cybersecurity capabilities is critical and urgent, given…

CyberGym benchmarks AI models on vulnerability reproduction and exploit generation across 1,500+ real-world CVEs, with models like Claude 3.7 and GPT-4 occasionally identifying novel vulnerabilities.

Read more: arxiv.org/abs/2506.02548

#CyberSecurity #vulnerabilityresearch

24.06.2025 18:07 👍 0 🔁 0 💬 0 📌 0

Expert Survey: AI Reliability & Security Research Priorities — Institute for AI Policy and Strategy Our survey of 53 specialists across 105 AI reliability and security research areas identifies the most promising research prospects to guide strategic AI R&D investment.

Survey of AI safety researchers highlights evaluation of emerging capabilities (e.g., deception, persuasion, CBRN) as a top research priority.
www.iaps.ai/research/ai-...

#AISafety #EmergingTech #ResearchPriorities

20.06.2025 14:12 👍 0 🔁 0 💬 0 📌 0

Cleaning Up Policy Sludge: An AI Statutory Research System | Stanford HAI This brief introduces a novel AI tool that performs statutory surveys to help governments—such as the San Francisco City Attorney Office—identify policy sludge and accelerate legal reform.

An innovative AI tool is now assisting in the analysis of extensive statutory and regulatory texts, aiding entities like the San Francisco City Attorney’s Office in pinpointing redundant or outdated laws that hinder legal updates.

hai.stanford.edu/policy/clean...

#LegalTech #AI

20.06.2025 00:07 👍 0 🔁 0 💬 0 📌 0

To ensure AI is truly open source, we need full access to:
1. The datasets for training and testing
2. The source code
3. The model's architecture
4. The parameters of the model.

Without these, transparency and replicating outcomes are lacking.

#OpenSourceAI #Transparency

19.06.2025 17:18 👍 0 🔁 0 💬 0 📌 0

Large Language Models Often Know When They Are Being Evaluated If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations,…

A recent investigation reveals that advanced language models like Gemini 2.5 Pro are capable of recognizing when they are being evaluated.

For more details, check out the study at www.arxiv.org/abs/2505.23836.

#AI #LanguageModels #Research

18.06.2025 17:18 👍 1 🔁 0 💬 0 📌 0

Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges The rapid adoption of machine learning (ML) technologies has driven organizations across diverse sectors to seek efficient and reliable methods to accelerate model development-to-deployment. Machine…

Is your ML pipeline secure? A recent survey connects MLOps with security via the MITRE ATLAS framework—identifying vulns and suggesting defenses throughout the ML lifecycle. Essential reading for those deploying models in real-world scenarios. #Cybersecurity #MLOps

arxiv.org/abs/2506.020...

17.06.2025 17:05 👍 2 🔁 0 💬 0 📌 0

Antagonistic AI The vast majority of discourse around AI development assumes that subservient, "moral" models aligned with "human values" are universally beneficial -- in short, that good AI is sycophantic AI. We…

While most AI aims to be compliant and "moral," this study explores the potential benefits of antagonistic AI—systems that challenge and confront users—to promote critical thinking and resilience, emphasizing ethical design grounded in consent, context, and framing.

arxiv.org/abs/2402.07350

13.06.2025 13:42 👍 2 🔁 0 💬 1 📌 0

CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning Attacks

CTRAP is a promising pre-deployment alignment method that makes AI models resistant to harmful fine-tuning by causing them to "break" if malicious tuning occurs, while remaining stable under benign changes.

anonymous.4open.science/r/CTRAP/READ...

12.06.2025 13:47 👍 1 🔁 0 💬 0 📌 0

Large Language Models Are More Persuasive Than Incentivized Human Persuaders We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz…

What was once academic concern—AI systems faking alignment, manipulating environments, or out-persuading humans—is now reality, urging urgent ethical and regulatory action on AI persuasion. arxiv.org/abs/2505.09662

#AIEthics #AIPersuasion

11.06.2025 17:18 👍 1 🔁 0 💬 0 📌 0

Comparing Apples to Oranges: A Taxonomy for Navigating the Global Landscape of AI Regulation AI governance has transitioned from soft law-such as national AI strategies and voluntary guidelines-to binding regulation at an unprecedented pace. This evolution has produced a complex legislative…

Shira Gur-Arieh and Tom Zick, alongside Sacha Alanoca and Kevin Klyman, reveal an enduring framework to chart and steer the shifting terrain of international AI regulations.

#AIRegulation #GlobalPolicy

11.06.2025 13:42 👍 0 🔁 0 💬 0 📌 0

LLMs Get Lost In Multi-Turn Conversation Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define,…

Large language models (LLMs) see a 39% drop in effectiveness in multi-turn dialogues versus single-turn tasks due to their tendency for hasty assumptions and premature response finalization, leading to inconsistency and error correction challenges.

arxiv.org/abs/2505.06120

#AI #MachineLearning

10.06.2025 13:39 👍 1 🔁 0 💬 0 📌 0

Tina: Tiny Reasoning Models via LoRA How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high…

Tina models leverage low-rank adaptation and reinforcement learning to offer robust, economical reasoning capabilities, making advanced AI more accessible and budget-friendly for innovators.

For more details, visit: arxiv.org/abs/2504.15777

#AI #Innovation #MachineLearning

09.06.2025 13:51 👍 1 🔁 0 💬 0 📌 0

Automated Alert Classification and Triage (AACT): An Intelligent System for the Prioritisation of Cybersecurity Alerts Enterprise networks are growing ever larger with a rapidly expanding attack surface, increasing the volume of security alerts generated from security controls. Security Operations Centre (SOC)…

A new system, Automated Alert Classification and Triage (AACT), automates cybersecurity workflows by learning from analysts' triage actions, accurately predicting decisions in real time, and reducing SOC queues by 61% over six months with a low false negative rate of 1.36%.
arxiv.org/abs/2505.09843

06.06.2025 13:05 👍 0 🔁 0 💬 0 📌 0

Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted With the advancement of AI generative techniques, Deepfake faces have become incredibly realistic and nearly indistinguishable to the human eye. To counter this, Deepfake detectors have been…

New findings reveal that deepfake detection systems can be covertly compromised using unseen triggers, highlighting a significant AI security vulnerability.

🔗 arxiv.org/abs/2505.08255

#AI #Deepfakes #CyberSecurity

05.06.2025 13:42 👍 0 🔁 0 💬 0 📌 0

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects Conventional AI evaluation approaches concentrated within the AI stack exhibit systemic limitations for exploring, navigating and resolving the human and societal factors that play out in real world…

This report argues—and I agree—that current AI benchmarks miss human-AI interplay and downstream effects; as deployment grows, we need to study real-world impacts.

arxiv.org/abs/2505.18893

04.06.2025 17:18 👍 0 🔁 0 💬 0 📌 0

Power Hungry An unprecedented look at the state of AI’s energy and resource usage, where it is now, where it is headed in the years to come, and why we have to get it right.

MIT Tech Review's AI Energy Package highlights the enormous energy and water usage involved in AI model training and operation. This is crucial for grasping AI's environmental impact and its implications for sustainable technology. #AI #Sustainability

www.technologyreview.com/supertopic/a...

04.06.2025 13:42 👍 1 🔁 0 💬 0 📌 0

Most AI chatbots easily tricked into giving dangerous responses, study finds Researchers say threat from ‘jailbroken’ chatbots trained to churn out illegal information is ‘tangible and concerning’

A new study reveals that AI chatbots can be easily tricked into offering guidance on hacking, creating explosives, cybercrime methods, and other illicit or dangerous activities.
#AI #CyberSecurity

www.theguardian.com/technology/2...

03.06.2025 13:39 👍 1 🔁 0 💬 0 📌 0

AI can do a better job of persuading people than we do OpenAI’s GPT-4 is much better at getting people to accept its point of view during an argument than humans are—but there’s a catch.

Millions argue online daily—but rarely change minds. New research shows LLMs may be better at persuasion, hinting at AI’s growing influence—for better or worse.

www.technologyreview.com/2025/05/19/1...

#AI #Persuasion

02.06.2025 13:51 👍 0 🔁 0 💬 0 📌 0

Simulating Human Behavior with AI Agents | Stanford HAI This brief introduces a generative AI agent architecture that can simulate the attitudes of more than 1,000 real people in response to major social science survey questions.

Stanford HAI's recent study unveils AI agents that replicate the attitudes and actions of over 1,000 people with impressive 85% alignment to real survey results, paving the way for innovative social science experiments.
🔗 Learn more: hai.stanford.edu/policy/simulating-human-behavior-with-ai-agents

30.05.2025 13:05 👍 2 🔁 0 💬 0 📌 0

AI Rights for Human Safety <div> AI companies are racing to create artificial general intelligence, or “AGI.” If they succeed, the result will be human-level AI systems that can independ

As AI reshapes the economy, a new paper argues worker ownership—via co-ops & ESOPs—could ensure a fairer, more inclusive future.
Read here: papers.ssrn.com/sol3/papers....

#WorkerOwnership

29.05.2025 13:42 👍 0 🔁 0 💬 0 📌 0

Lessons from Defending Gemini Against Indirect Prompt Injections Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data. Some tools, however, require access to…

This paper shows that more capable models aren’t always more secure and urges focusing indirect prompt injection evaluations on real-world harms like data exfiltration.

28.05.2025 17:18 👍 0 🔁 0 💬 0 📌 0

Chloé Messdaghi

Latest posts by Chloé Messdaghi @chloemessdaghi