Lightnews — Scholar-powered news

Chloé Messdaghi

@chloemessdaghi.bsky.social

I’m excited to be hosting the O’Reilly Security Superstream: Secure Code in the Age of AI on October 7 at 11:00 AM ET.

We’ll be diving into practical insights, real-world experiences, and emerging trends to address the full spectrum of AI security.

✨ Save your free spot here: bit.ly/4nEWzgj

Security Superstream: Secure Code in the Age of AI - O'Reilly Media

AI tools are transforming the ways that we write and deploy code, making development faster and more efficient, but they also introduce new risks and vulnerabilities. To protect organizations, securit...

bit.ly

September 30, 2025 at 5:52 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Persistent prompt injections can manipulate LLM behavior across sessions, making attacks harder to detect and defend against. This is a new frontier in AI threat vectors.
Read more: dl.acm.org/doi/10.1145/...
#PromptInjection #Cybersecurity #AIsecurity

July 10, 2025 at 6:14 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

New research reveals timing side channels can leak ChatGPT prompts, exposing confidential info through subtle delays. AI security needs to consider more than just inputs.
Read more: dl.acm.org/doi/10.1145/...
#AIsecurity #SideChannel #LLM

July 9, 2025 at 11:22 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Magistral is Mistral’s first reinforcement‑learning‑only reasoning model.
Shows gains in math, code, and multimodal reasoning—all built from the ground up. Worth a look if RL‑based LLMs are on your radar.
🔗 arxiv.org/abs/2506.10910

Magistral

We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior…

arxiv.org

July 8, 2025 at 6:07 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

R&D slowdowns = stalled innovation in AI and beyond. Brookings explains why:
🔗 brookings.edu/articles/attacks-on-research-and-development-could-hamper-technological-innovation/
#TechPolicy #AI

Attacks on research and development could hamper technological innovation | Brookings

Nicol Turner Lee and Josie Stewart discuss how the Trump administration's cuts and realigning of research funding could slow down innovation.

brookings.edu

July 3, 2025 at 6:14 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

New insights dissect data reconstruction attacks, revealing how AI models' training data can be recovered. This research offers precise definitions and metrics to enhance and assess future defenses.

Read more: arxiv.org/abs/2506.07888

#AISecurity #DataProtection

SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark

Data reconstruction attacks, which aim to recover the training dataset of a target model with limited access, have gained increasing attention in recent years. However, there is currently no…

arxiv.org

July 2, 2025 at 11:22 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

This paper emphasizes the need for clear motivation, impact analysis, and mitigation guidance in LLM offensive research to ensure transparency and responsible disclosure.

Read more: arxiv.org/abs/2506.08693

#AIResearch #ResponsibleAI

On the Ethics of Using LLMs for Offensive Security

Large Language Models (LLMs) have rapidly evolved over the past few years and are currently evaluated for their efficacy within the domain of offensive cyber-security. While initial forays showcase…

arxiv.org

July 1, 2025 at 6:07 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

This paper introduces a model-agnostic threat evaluation using N-gram language models to measure jailbreak likelihood, finding discrete optimization attacks more effective than LLM-based ones and that jailbreaks often exploit rare bigrams.

Read more: arxiv.org/abs/2410.16222

#JailbreakDetection

An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks

A plethora of jailbreaking attacks have been proposed to obtain harmful responses from safety-tuned LLMs. These methods largely succeed in coercing the target output in their original settings, but…

arxiv.org

June 26, 2025 at 6:14 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

OpenAI shows that fine-tuning on biased data can induce misaligned 'personas' in language models, but such behavioral shifts can often be detected and reversed.

Read more: www.technologyreview.com/2025/06/18/1...

#Bias #OpenAI

OpenAI can rehabilitate AI models that develop a “bad boy persona”

Researchers at the company looked into how malicious fine-tuning makes a model go rogue, and how to turn it back.

www.technologyreview.com

June 25, 2025 at 11:22 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

CyberGym benchmarks AI models on vulnerability reproduction and exploit generation across 1,500+ real-world CVEs, with models like Claude 3.7 and GPT-4 occasionally identifying novel vulnerabilities.

Read more: arxiv.org/abs/2506.02548

#CyberSecurity #vulnerabilityresearch

CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale

Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously. Thoroughly assessing their cybersecurity capabilities is critical and urgent, given…

arxiv.org

June 24, 2025 at 6:07 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Survey of AI safety researchers highlights evaluation of emerging capabilities (e.g., deception, persuasion, CBRN) as a top research priority.
www.iaps.ai/research/ai-...

#AISafety #EmergingTech #ResearchPriorities

Expert Survey: AI Reliability & Security Research Priorities — Institute for AI Policy and Strategy

Our survey of 53 specialists across 105 AI reliability and security research areas identifies the most promising research prospects to guide strategic AI R&D investment.

www.iaps.ai

June 20, 2025 at 2:12 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

An innovative AI tool is now assisting in the analysis of extensive statutory and regulatory texts, aiding entities like the San Francisco City Attorney’s Office in pinpointing redundant or outdated laws that hinder legal updates.

hai.stanford.edu/policy/clean...

#LegalTech #AI

Cleaning Up Policy Sludge: An AI Statutory Research System | Stanford HAI

This brief introduces a novel AI tool that performs statutory surveys to help governments—such as the San Francisco City Attorney Office—identify policy sludge and accelerate legal reform.

hai.stanford.edu

June 20, 2025 at 12:07 AM

Chloé Messdaghi

@chloemessdaghi.bsky.social

To ensure AI is truly open source, we need full access to:
1. The datasets for training and testing
2. The source code
3. The model's architecture
4. The parameters of the model.

Without these, transparency and replicating outcomes are lacking.

#OpenSourceAI #Transparency

June 19, 2025 at 5:18 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

A recent investigation reveals that advanced language models like Gemini 2.5 Pro are capable of recognizing when they are being evaluated.

For more details, check out the study at www.arxiv.org/abs/2505.23836.

#AI #LanguageModels #Research

Large Language Models Often Know When They Are Being Evaluated

If AI models can detect when they are being evaluated, the effectiveness of evaluations might be compromised. For example, models could have systematically different behavior during evaluations,…

www.arxiv.org

June 18, 2025 at 5:18 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Is your ML pipeline secure? A recent survey connects MLOps with security via the MITRE ATLAS framework—identifying vulns and suggesting defenses throughout the ML lifecycle. Essential reading for those deploying models in real-world scenarios. #Cybersecurity #MLOps

arxiv.org/abs/2506.020...

Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges

The rapid adoption of machine learning (ML) technologies has driven organizations across diverse sectors to seek efficient and reliable methods to accelerate model development-to-deployment. Machine…

arxiv.org

June 17, 2025 at 5:05 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

While most AI aims to be compliant and "moral," this study explores the potential benefits of antagonistic AI—systems that challenge and confront users—to promote critical thinking and resilience, emphasizing ethical design grounded in consent, context, and framing.

arxiv.org/abs/2402.07350

Antagonistic AI

The vast majority of discourse around AI development assumes that subservient, "moral" models aligned with "human values" are universally beneficial -- in short, that good AI is sycophantic AI. We…

arxiv.org

June 13, 2025 at 1:42 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

CTRAP is a promising pre-deployment alignment method that makes AI models resistant to harmful fine-tuning by causing them to "break" if malicious tuning occurs, while remaining stable under benign changes.

anonymous.4open.science/r/CTRAP/READ...

CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning Attacks

anonymous.4open.science

June 12, 2025 at 1:47 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

What was once academic concern—AI systems faking alignment, manipulating environments, or out-persuading humans—is now reality, urging urgent ethical and regulatory action on AI persuasion. arxiv.org/abs/2505.09662

#AIEthics #AIPersuasion

Large Language Models Are More Persuasive Than Incentivized Human Persuaders

We directly compare the persuasion capabilities of a frontier large language model (LLM; Claude Sonnet 3.5) against incentivized human persuaders in an interactive, real-time conversational quiz…

arxiv.org

June 11, 2025 at 5:18 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Shira Gur-Arieh and Tom Zick, alongside Sacha Alanoca and Kevin Klyman, reveal an enduring framework to chart and steer the shifting terrain of international AI regulations.

#AIRegulation #GlobalPolicy

Comparing Apples to Oranges: A Taxonomy for Navigating the Global Landscape of AI Regulation

AI governance has transitioned from soft law-such as national AI strategies and voluntary guidelines-to binding regulation at an unprecedented pace. This evolution has produced a complex legislative…

arxiv.org

June 11, 2025 at 1:42 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Large language models (LLMs) see a 39% drop in effectiveness in multi-turn dialogues versus single-turn tasks due to their tendency for hasty assumptions and premature response finalization, leading to inconsistency and error correction challenges.

arxiv.org/abs/2505.06120

#AI #MachineLearning

LLMs Get Lost In Multi-Turn Conversation

Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define,…

arxiv.org

June 10, 2025 at 1:39 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

Tina models leverage low-rank adaptation and reinforcement learning to offer robust, economical reasoning capabilities, making advanced AI more accessible and budget-friendly for innovators.

For more details, visit: arxiv.org/abs/2504.15777

#AI #Innovation #MachineLearning

Tina: Tiny Reasoning Models via LoRA

How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high…

arxiv.org

June 9, 2025 at 1:51 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

A new system, Automated Alert Classification and Triage (AACT), automates cybersecurity workflows by learning from analysts' triage actions, accurately predicting decisions in real time, and reducing SOC queues by 61% over six months with a low false negative rate of 1.36%.
arxiv.org/abs/2505.09843

Automated Alert Classification and Triage (AACT): An Intelligent System for the Prioritisation of Cybersecurity Alerts

Enterprise networks are growing ever larger with a rapidly expanding attack surface, increasing the volume of security alerts generated from security controls. Security Operations Centre (SOC)…

arxiv.org

June 6, 2025 at 1:05 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

New findings reveal that deepfake detection systems can be covertly compromised using unseen triggers, highlighting a significant AI security vulnerability.

🔗 arxiv.org/abs/2505.08255

#AI #Deepfakes #CyberSecurity

Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted

With the advancement of AI generative techniques, Deepfake faces have become incredibly realistic and nearly indistinguishable to the human eye. To counter this, Deepfake detectors have been…

arxiv.org

June 5, 2025 at 1:42 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

This report argues—and I agree—that current AI benchmarks miss human-AI interplay and downstream effects; as deployment grows, we need to study real-world impacts.

arxiv.org/abs/2505.18893

Reality Check: A New Evaluation Ecosystem Is Necessary to Understand AI's Real World Effects

Conventional AI evaluation approaches concentrated within the AI stack exhibit systemic limitations for exploring, navigating and resolving the human and societal factors that play out in real world…

arxiv.org

June 4, 2025 at 5:18 PM

Chloé Messdaghi

@chloemessdaghi.bsky.social

MIT Tech Review's AI Energy Package highlights the enormous energy and water usage involved in AI model training and operation. This is crucial for grasping AI's environmental impact and its implications for sustainable technology. #AI #Sustainability

www.technologyreview.com/supertopic/a...

Power Hungry

An unprecedented look at the state of AI’s energy and resource usage, where it is now, where it is headed in the years to come, and why we have to get it right.

www.technologyreview.com

June 4, 2025 at 1:42 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news