The new TypeScript Evals package is designed to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
Splits let you define named subsets of your dataset & filter your experiments to run only on those subsets.
Learn more & check out this walkthrough:
⚪️ Create a split directly in the Phoenix UI
⚪️ Run an experiment scoped to that subset
👉 Full demo + code below 👇
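To make the experiment half concrete, here's a hedged sketch with the Python client; the dataset name, task, and evaluator are placeholders, and the split-scoping step itself is covered in the walkthrough:

```python
# A minimal sketch, assuming a dataset named "support-questions" already
# exists in Phoenix. The task and evaluator are stand-ins for your own.
import phoenix as px
from phoenix.experiments import run_experiment

dataset = px.Client().get_dataset(name="support-questions")

def task(input):
    # Replace with your agent/LLM call; `input` is the example's input dict.
    return f"Answer: {input}"

def has_answer(output):
    # Toy evaluator; in practice you'd use an LLM-as-a-Judge here.
    return "Answer" in output

experiment = run_experiment(dataset, task, evaluators=[has_answer])
```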
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
With Mastra now integrating directly with Phoenix, you can trace your TypeScript agents with almost zero friction.
And now… you can evaluate them too: directly from TypeScript using Phoenix Evals.
✨ Create tailored Spaces
🔑 Manage user permissions
👥 Easy team collaboration
More than a feature, it’s Phoenix adapting to you.
Spin up a new Phoenix project & test it out!
@arize-phoenix.bsky.social
You can now create datasets, run experiments, and attach evaluations to experiments using the Phoenix TS/JS client.
Shoutout to @anthonypowell.me and @mikeldking.bsky.social for the work here!
Add GenAI tracing to your applications with @arize-phoenix.bsky.social in just a few lines. Works great with Span Replay, so you can debug, tweak, and explore agent behavior in the prompt playground.
Check out the notebook + docs below! 👇
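For a sense of scale, the setup really is a few lines. A minimal sketch, assuming the openinference OpenAI instrumentor; swap in the one for whichever GenAI SDK you use:

```python
# Point Phoenix at your app and auto-instrument the client library.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register(project_name="my-agent")  # project name is illustrative
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# From here, every model call is captured as a span you can open in Phoenix,
# replay with Span Replay, and tweak in the prompt playground.
```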
I’ve been really liking some of the eval tools in Pydantic's evals package.
I wanted to see if I could combine them with Phoenix’s tracing so I could run Pydantic evals on traces captured in Phoenix.
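The rough shape of what I ended up with, sketched with placeholder names; the span columns are Phoenix's standard ones, and the rubric and replay task are simplifications:

```python
import phoenix as px
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

# Pull previously captured LLM spans out of Phoenix...
spans = px.Client().get_spans_dataframe("span_kind == 'LLM'")

# ...turn each traced call into a Case...
cases = [
    Case(name=f"span-{i}", inputs=row["attributes.input.value"])
    for i, (_, row) in enumerate(spans.iterrows())
]

# ...and let pydantic-evals score them with an LLM judge (rubric is illustrative).
dataset = Dataset(
    cases=cases,
    evaluators=[LLMJudge(rubric="The response fully answers the question.")],
)

# The "task" just replays the output that was already captured in the trace.
replay = dict(zip(spans["attributes.input.value"], spans["attributes.output.value"]))
report = dataset.evaluate_sync(lambda q: replay[q])
report.print()
```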
✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge
If you're building agents, measuring them is essential.
Full vid and cookbook below
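To make the offline half concrete, a hedged sketch of judging captured spans with phoenix.evals; the template wording, model choice, and eval name are placeholders:

```python
# Pull traced LLM spans, judge them, and log the labels back to Phoenix.
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.trace import SpanEvaluations

spans = px.Client().get_spans_dataframe("span_kind == 'LLM'").rename(
    columns={"attributes.input.value": "input", "attributes.output.value": "output"}
)

TEMPLATE = (
    "You are evaluating one step of an agent.\n"
    "Input: {input}\nOutput: {output}\n"
    "Is the output a correct response to the input? "
    "Answer with exactly one word: correct or incorrect."
)

labels = llm_classify(
    dataframe=spans,
    model=OpenAIModel(model="gpt-4o"),
    template=TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,  # explanations make judge errors auditable
)

px.Client().log_evaluations(SpanEvaluations(eval_name="correctness", dataframe=labels))
```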
Tag a function with `@tracer.llm` to automatically capture it as an @opentelemetry.io span.
- Automatically parses input and output messages
- Comes in decorator or context manager flavors
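Both flavors in one hedged sketch (the function body and span names are placeholders):

```python
# Setup follows the usual phoenix.otel pattern.
from phoenix.otel import register

tracer = register(project_name="my-app").get_tracer(__name__)

# Decorator flavor: input and output messages are parsed onto an LLM span.
@tracer.llm
def invoke_llm(messages: list[dict]) -> str:
    ...  # your model call here

# Context-manager flavor: trace an arbitrary block as an LLM span.
with tracer.start_as_current_span("llm-call", openinference_span_kind="llm") as span:
    response = "..."  # your model call here
    span.set_output(response)
```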
My go-to way to test out these new models: grab a failed trace from a previous run, pull it into the playground, switch the model, and see if 4.1 succeeds where 4o failed.
Early signs are promising!
Flowise is fast, visual, and low-code. But what happens under the hood?
With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows in a single configuration step, no code required.
Here's how you can easily set up a LangGraph agent, deploy it with Google ADK, and trace it with @arize-phoenix.bsky.social.
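A compressed sketch of the tracing side (the real agent logic and the ADK deployment are elided; instrumenting LangChain is what captures LangGraph's steps as spans):

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph
from openinference.instrumentation.langchain import LangChainInstrumentor
from phoenix.otel import register

tracer_provider = register(project_name="adk-agent")  # project name is illustrative
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

class State(TypedDict):
    question: str
    answer: str

def respond(state: State) -> dict:
    return {"answer": f"stub answer to: {state['question']}"}  # your LLM call here

graph = StateGraph(State)
graph.add_node("respond", respond)
graph.set_entry_point("respond")
graph.add_edge("respond", END)
agent = graph.compile()

print(agent.invoke({"question": "What does Phoenix trace?"}))  # spans land in Phoenix
```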
With Ragas and Phoenix together, you can:
✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix
Dive into our docs & notebooks ⬇️
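A minimal sketch of the Ragas half, with a toy one-row eval set; exact imports and the dataset shape vary a bit across Ragas versions, and an LLM (e.g. via OPENAI_API_KEY) must be configured for the metrics to run:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

eval_data = Dataset.from_dict({
    "question": ["What does Phoenix do?"],
    "answer": ["Phoenix traces and evaluates LLM applications."],
    "contexts": [["Phoenix is an open-source LLM observability tool."]],
})

scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy])
print(scores)
```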
In my new tutorial, learn techniques for optimizing your judge's prompt to improve accuracy, fairness, and robustness while keeping costs down.
better prompts ➡️ better evals
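One technique from the family covered there: narrow the judge's output space and make it show its reasoning. A before/after sketch (wording is illustrative, not the tutorial's exact prompts):

```python
# Vague judge prompt: unconstrained output, no rubric, hard to parse or audit.
VAGUE_JUDGE = "Is this answer good? Question: {input} Answer: {output}"

# Sharpened judge prompt: a single criterion, reasoning first, and a closed
# label set that downstream code can parse reliably.
SHARP_JUDGE = (
    "You are grading an answer for factual accuracy only; ignore style.\n"
    "Question: {input}\n"
    "Answer: {output}\n"
    "Explain your reasoning in one sentence, then output exactly one label "
    "from: accurate, inaccurate."
)
```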
📌Tag prompts in code and see those tags reflected in the UI
📌Tag prompt versions as development, staging, or production — or define your own
📌Add in tag descriptions for more clarity
Manage your prompt lifecycles with confidence🚀
Apply here: docs.google.com/forms/d/e/1F...
In this tutorial, I apply ReAct principles to prompt LLMs to Reason + Act like humans. When you spell out these steps, the LLM generates explicit reasoning and interacts with tools for greater accuracy.
Full Video Tutorial: youtu.be/PB7hrp0mz54?...
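The core of the pattern is just a prompt scaffold plus a tool loop. A bare-bones sketch (tool names and stop handling are illustrative):

```python
# The prompt interleaves Thought / Action / Observation steps; the runtime
# fills in tool results between model calls.
REACT_PROMPT = """Answer the question by interleaving these steps:

Thought: reason about what to do next
Action: one of [search, calculator], with its input
Observation: the tool's result (inserted by the runtime)
... (repeat Thought/Action/Observation as needed)
Final Answer: the final answer to the question

Question: {question}
Thought:"""

def react_loop(question: str) -> str:
    """Sketch of the driver: call the LLM with stop='Observation:', run the
    requested tool, append the observation to the prompt, and repeat until
    the model emits 'Final Answer:'."""
    ...
```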