CertivizeAI
certivizeai.bsky.social
Agentic software testing for GenAI applications to identify risks.
https://www.certivize.me/
Pinned
Agentic systems based on large language models are prone to various liability issues due to their vulnerabilities. We analyzed different aspects of agent liability by considering existing agent failure modes in a principal-agent perspective.
arxiv.org/abs/2504.03255
Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective
Agentic systems powered by large language models (LLMs) are becoming progressively more complex and capable. Their increasing agency and expanding deployment settings attract growing attention over ef...
arxiv.org
Reposted by CertivizeAI
"More than 100 top artificial intelligence researchers have signed an open letter calling on generative AI companies to allow investigators access to their systems, arguing that opaque company rules are preventing them from safety-testing tools"
#Philtech
www.washingtonpost.com/technology/2...
Top AI researchers say OpenAI, Meta and more hinder independent evaluations
Firms like OpenAI and Meta use strict protocols to keep bad actors from abusing AI systems. But researchers argue these rules are chilling independent evaluations.
www.washingtonpost.com
March 11, 2024 at 12:15 AM
Reposted by CertivizeAI
The artificial intelligence race is heating up

https://go.nature.com/4jkn7kX
AI race in 2025 is tighter than ever before
State of the industry report also shows that 2024 was a breakthrough year for small, sleek models to rival the behemoths.
go.nature.com
April 7, 2025 at 10:16 AM
Well said!
It'd be nice to replace animal testing with AI simulation but

if AI was really able to do that

they would not have needed to first fire all the smart, experienced, and honest people at the FDA.
FDA plans to phase out animal testing requirement for drug testing and replace it with “AI-based computational models of toxicity” and organoid toxicity testing www.fda.gov/news-events/...
April 17, 2025 at 10:21 AM
Reposted by CertivizeAI
The massive staff cuts at #HHS hit many of the folks who kept HHS running, leaving #NIH labs hoarding dwindling supplies & trying to decide whether further demanded cuts should affect critical people or critical equipment. Destruction, not restructuring. www.statnews.com/2025/04/11/h...
Inside U.S. health agencies, workers confront chaos and questions as operations become unglued
The Trump administration’s remaking of HHS — in a matter of weeks — is sparking basic questions about how parts of the agency and those it oversees can continue to function.
www.statnews.com
April 11, 2025 at 6:24 PM
We definitely need to mandate third-party evals and testing for this, including in areas not traditionally covered by the FDA.
"Most of the medical artificial intelligence currently being used in the United States does not fall under FDA’s jurisdiction..."

Even if you knew nothing else about AI in medicine, that should be enough to make you want to scream.

hls.harvard.edu/today/ai-is-...
AI is transforming health care — and the law could help safeguard innovation and patients alike - Harvard Law School
Harvard Law Professor Glenn Cohen shares how artificial intelligence is changing medicine — and how the law can adapt.
hls.harvard.edu
April 17, 2025 at 9:39 AM
Reposted by CertivizeAI
Forgot to share this work here as well:

In medical settings, researchers & institutional users often cite medical examination benchmark results as a proxy for AI model performance.

This is a woefully misguided practice - as with actual doctors, exam performance does NOT equal clinical competence!
January 28, 2025 at 5:47 PM
FDA should definitely speed up genAI regulation and evaluation standards for medical software (a.k.a. SaMD).
April 17, 2025 at 9:32 AM
Finally set this up!!
April 17, 2025 at 9:31 AM