Chloé Messdaghi
banner
chloemessdaghi.bsky.social
Chloé Messdaghi
@chloemessdaghi.bsky.social
Advisor on AI Governance & Cybersecurity | Strategic Counsel on Risk, Oversight & Institutional Readiness | Named a Power Player by Business Insider & SC Media

https://www.chloemessdaghi.com
Pinned
For AI to be truly open source, we need full visibility into:

1. The data it was trained and evaluated on
2. The source code
3. The model architecture
4. The model weights

Without all four, transparency and reproducibility are incomplete.
I’m excited to be hosting the O’Reilly Security Superstream: Secure Code in the Age of AI on October 7 at 11:00 AM ET.

We’ll be diving into practical insights, real-world experiences, and emerging trends to address the full spectrum of AI security.

✨ Save your free spot here: bit.ly/4nEWzgj
Security Superstream: Secure Code in the Age of AI - O'Reilly Media
AI tools are transforming the ways that we write and deploy code, making development faster and more efficient, but they also introduce new risks and vulnerabilities. To protect organizations, securit...
bit.ly
September 30, 2025 at 5:52 PM
Persistent prompt injections can manipulate LLM behavior across sessions, making attacks harder to detect and defend against. This is a new frontier in AI threat vectors.
Read more: dl.acm.org/doi/10.1145/...
#PromptInjection #Cybersecurity #AIsecurity
July 10, 2025 at 6:14 PM
New research reveals timing side channels can leak ChatGPT prompts, exposing confidential info through subtle delays. AI security needs to consider more than just inputs.
Read more: dl.acm.org/doi/10.1145/...
#AIsecurity #SideChannel #LLM
July 9, 2025 at 11:22 PM
Magistral is Mistral’s first reinforcement‑learning‑only reasoning model.
Shows gains in math, code, and multimodal reasoning—all built from the ground up. Worth a look if RL‑based LLMs are on your radar.
🔗 arxiv.org/abs/2506.10910
Magistral
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior…
arxiv.org
July 8, 2025 at 6:07 PM
New insights dissect data reconstruction attacks, revealing how AI models' training data can be recovered. This research offers precise definitions and metrics to enhance and assess future defenses.

Read more: arxiv.org/abs/2506.07888

#AISecurity #DataProtection
SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark
Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no…
arxiv.org
July 2, 2025 at 11:22 PM
This paper emphasizes the need for clear motivation, impact analysis, and mitigation guidance in LLM offensive research to ensure transparency and responsible disclosure.

Read more: arxiv.org/abs/2506.08693

#AIResearch #ResponsibleAI
On the Ethics of Using LLMs for Offensive Security
Large Language Models (LLMs) have rapidly evolved over the past few years and are currently evaluated for their efficacy within the domain of offensive cyber-security. While initial forays showcase…
arxiv.org
July 1, 2025 at 6:07 PM
This paper introduces a model-agnostic threat evaluation using N-gram language models to measure jailbreak likelihood, finding discrete optimization attacks more effective than LLM-based ones and that jailbreaks often exploit rare bigrams.

Read more: arxiv.org/abs/2410.16222

#JailbreakDetection
An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
A plethora of jailbreaking attacks have been proposed to obtain harmful responses from safety-tuned LLMs. These methods largely succeed in coercing the target output in their original settings, but…
arxiv.org
June 26, 2025 at 6:14 PM
OpenAI shows that fine-tuning on biased data can induce misaligned 'personas' in language models, but such behavioral shifts can often be detected and reversed.

Read more: www.technologyreview.com/2025/06/18/1...

#Bias #OpenAI
OpenAI can rehabilitate AI models that develop a “bad boy persona”
Researchers at the company looked into how malicious fine-tuning makes a model go rogue, and how to turn it back.
www.technologyreview.com
June 25, 2025 at 11:22 PM
CyberGym benchmarks AI models on vulnerability reproduction and exploit generation across 1,500+ real-world CVEs, with models like Claude 3.7 and GPT-4 occasionally identifying novel vulnerabilities.

Read more: arxiv.org/abs/2506.02548

#CyberSecurity #vulnerabilityresearch
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Thoroughly assessing their cybersecurity capabilities is critical and urgent, given…
arxiv.org
June 24, 2025 at 6:07 PM
Survey of AI safety researchers highlights evaluation of emerging capabilities (e.g., deception, persuasion, CBRN) as a top research priority.
www.iaps.ai/research/ai-...

#AISafety #EmergingTech #ResearchPriorities
Expert Survey: AI Reliability & Security Research Priorities — Institute for AI Policy and Strategy
Our survey of 53 specialists across 105 AI reliability and security research areas identifies the most promising research prospects to guide strategic AI R&D investment.
www.iaps.ai
June 20, 2025 at 2:12 PM
An innovative AI tool is now assisting in the analysis of extensive statutory and regulatory texts, aiding entities like the San Francisco City Attorney’s Office in pinpointing redundant or outdated laws that hinder legal updates.

hai.stanford.edu/policy/clean...

#LegalTech #AI
Cleaning Up Policy Sludge: An AI Statutory Research System | Stanford HAI
This brief introduces a novel AI tool that performs statutory surveys to help governments—such as the San Francisco City Attorney Office—identify policy sludge and accelerate legal reform.
hai.stanford.edu
June 20, 2025 at 12:07 AM
To ensure AI is truly open source, we need full access to:
1. The datasets for training and testing
2. The source code
3. The model's architecture
4. The parameters of the model.

Without these, transparency and replicating outcomes are lacking.

#OpenSourceAI #Transparency
June 19, 2025 at 5:18 PM
A recent investigation reveals that advanced language models like Gemini 2.5 Pro are capable of recognizing when they are being evaluated.

For more details, check out the study at www.arxiv.org/abs/2505.23836.

#AI #LanguageModels #Research
Large Language Models Often Know When They Are Being Evaluated
If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations,…
www.arxiv.org
June 18, 2025 at 5:18 PM
Is your ML pipeline secure? A recent survey connects MLOps with security via the MITRE ATLAS framework—identifying vulns and suggesting defenses throughout the ML lifecycle. Essential reading for those deploying models in real-world scenarios. #Cybersecurity #MLOps

arxiv.org/abs/2506.020...
Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges
The rapid adoption of machine learning (ML) technologies has driven organizations across diverse sectors to seek efficient and reliable methods to accelerate model development-to-deployment. Machine…
arxiv.org
June 17, 2025 at 5:05 PM
While most AI aims to be compliant and "moral," this study explores the potential benefits of antagonistic AI—systems that challenge and confront users—to promote critical thinking and resilience, emphasizing ethical design grounded in consent, context, and framing.

arxiv.org/abs/2402.07350
Antagonistic AI
The vast majority of discourse around AI development assumes that subservient, "moral" models aligned with "human values" are universally beneficial -- in short, that good AI is sycophantic AI. We…
arxiv.org
June 13, 2025 at 1:42 PM
CTRAP is a promising pre-deployment alignment method that makes AI models resistant to harmful fine-tuning by causing them to "break" if malicious tuning occurs, while remaining stable under benign changes.

anonymous.4open.science/r/CTRAP/READ...
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning Attacks
anonymous.4open.science
June 12, 2025 at 1:47 PM
What was once academic concern—AI systems faking alignment, manipulating environments, or out-persuading humans—is now reality, urging urgent ethical and regulatory action on AI persuasion. arxiv.org/abs/2505.09662

#AIEthics #AIPersuasion
Large Language Models Are More Persuasive Than Incentivized Human Persuaders
We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz…
arxiv.org
June 11, 2025 at 5:18 PM
Shira Gur-Arieh and Tom Zick, alongside Sacha Alanoca and Kevin Klyman, reveal an enduring framework to chart and steer the shifting terrain of international AI regulations.

#AIRegulation #GlobalPolicy
Comparing Apples to Oranges: A Taxonomy for Navigating the Global Landscape of AI Regulation
AI governance has transitioned from soft law-such as national AI strategies and voluntary guidelines-to binding regulation at an unprecedented pace. This evolution has produced a complex legislative…
arxiv.org
June 11, 2025 at 1:42 PM
Large language models (LLMs) see a 39% drop in effectiveness in multi-turn dialogues versus single-turn tasks due to their tendency for hasty assumptions and premature response finalization, leading to inconsistency and error correction challenges.

arxiv.org/abs/2505.06120

#AI #MachineLearning
LLMs Get Lost In Multi-Turn Conversation
Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define,…
arxiv.org
June 10, 2025 at 1:39 PM
Tina models leverage low-rank adaptation and reinforcement learning to offer robust, economical reasoning capabilities, making advanced AI more accessible and budget-friendly for innovators.

For more details, visit: arxiv.org/abs/2504.15777

#AI #Innovation #MachineLearning
Tina: Tiny Reasoning Models via LoRA
How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high…
arxiv.org
June 9, 2025 at 1:51 PM
A new system, Automated Alert Classification and Triage (AACT), automates cybersecurity workflows by learning from analysts' triage actions, accurately predicting decisions in real time, and reducing SOC queues by 61% over six months with a low false negative rate of 1.36%.
arxiv.org/abs/2505.09843
Automated Alert Classification and Triage (AACT): An Intelligent System for the Prioritisation of Cybersecurity Alerts
Enterprise networks are growing ever larger with a rapidly expanding attack surface, increasing the volume of security alerts generated from security controls. Security Operations Centre (SOC)…
arxiv.org
June 6, 2025 at 1:05 PM
New findings reveal that deepfake detection systems can be covertly compromised using unseen triggers, highlighting a significant AI security vulnerability.

🔗 arxiv.org/abs/2505.08255

#AI #Deepfakes #CyberSecurity
Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted
With the advancement of AI generative techniques, Deepfake faces have become incredibly realistic and nearly indistinguishable to the human eye. To counter this, Deepfake detectors have been…
arxiv.org
June 5, 2025 at 1:42 PM
This report argues—and I agree—that current AI benchmarks miss human-AI interplay and downstream effects; as deployment grows, we need to study real-world impacts.

arxiv.org/abs/2505.18893
Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects
Conventional AI evaluation approaches concentrated within the AI stack exhibit systemic limitations for exploring, navigating and resolving the human and societal factors that play out in real world…
arxiv.org
June 4, 2025 at 5:18 PM
MIT Tech Review's AI Energy Package highlights the enormous energy and water usage involved in AI model training and operation. This is crucial for grasping AI's environmental impact and its implications for sustainable technology. #AI #Sustainability

www.technologyreview.com/supertopic/a...
Power Hungry
An unprecedented look at the state of AI’s energy and resource usage, where it is now, where it is headed in the years to come, and why we have to get it right.
www.technologyreview.com
June 4, 2025 at 1:42 PM