arize-phoenix
@arize-phoenix.bsky.social
Open-Source AI Observability and Evaluation
app.phoenix.arize.com
Run evals fast with our TypeScript Evals Quickstart!

The new TypeScript Evals package is a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
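The four steps above can be sketched end to end. This is a toy illustration in Python (the package in the post is TypeScript); `run_task`, `judge_output`, and the result shape are invented stand-ins, not the Phoenix Evals API:

```python
# Step 1: define a task (stand-in for the agent under test).
def run_task(question: str) -> str:
    return f"The answer to '{question}' is 42."

# Step 3: an LLM-as-a-Judge evaluator, stubbed here with a simple check
# in place of a real LLM call.
def judge_output(question: str, answer: str) -> int:
    return 1 if question in answer else 0

# Step 2: build a dataset.
dataset = ["What is 6 * 7?", "What is the meaning of life?"]

# Step 4: run evals and collect results (in Phoenix these would be
# uploaded and shown in the UI).
results = []
for q in dataset:
    out = run_task(q)
    results.append({"input": q, "output": out, "score": judge_output(q, out)})

for row in results:
    print(row["score"], row["input"])
```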
November 26, 2025 at 5:00 PM
🚀 New feature: Dataset Splits 🚀

Splits let you define named subsets of your dataset & filter your experiments to run only on those subsets.

Learn more & check out this walkthrough:
⚪️ Create a split directly in the Phoenix UI
⚪️ Run an experiment scoped to that subset

👉 Full demo + code below 👇
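The idea behind splits can be sketched in a few lines. This is an illustrative Python sketch, not the Phoenix client schema; the row fields and `run_experiment` helper are invented for the example:

```python
# Tag each dataset row with a named split, then scope an experiment
# run to one subset.
dataset = [
    {"id": 1, "input": "2 + 2", "split": "smoke"},
    {"id": 2, "input": "capital of France", "split": "full"},
    {"id": 3, "input": "7 * 6", "split": "smoke"},
]

def run_experiment(rows, split=None):
    # Filter to the named subset before running, mirroring a
    # split-scoped experiment.
    scoped = [r for r in rows if split is None or r["split"] == split]
    return [{"id": r["id"], "output": f"ran {r['input']}"} for r in scoped]

smoke_results = run_experiment(dataset, split="smoke")
print([r["id"] for r in smoke_results])  # only the smoke-split rows
```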
Harnessing Splits in your Dataset with Arize Phoenix
YouTube video by Arize AI
youtu.be
November 25, 2025 at 7:39 PM
Dig into agent traces without a single line of code!

Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
November 20, 2025 at 3:25 PM
New Evals for TypeScript agent builders 🔥

With Mastra now integrating directly with Phoenix, you can trace your TypeScript agents with almost zero friction.

And now… you can evaluate them too: directly from TypeScript using Phoenix Evals.
November 13, 2025 at 7:21 PM
🌀 Since LLMs are probabilistic, their outputs can differ even when the supplied prompts are exactly the same. This makes it challenging to determine whether a particular change is warranted, as a single execution cannot concretely tell you whether that change improves or degrades your task.
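One standard response to this is to score each variant over many trials and compare aggregates rather than single runs. A minimal sketch, with a random stub standing in for "run the task and evaluate the output":

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def score_once(variant_quality: float) -> float:
    # Stub: a noisy observation around the variant's true quality,
    # mimicking run-to-run variation in LLM output.
    return variant_quality + random.gauss(0, 0.1)

def mean_score(variant_quality: float, trials: int = 50) -> float:
    # Averaging over many trials shrinks the noise, making a real
    # difference between variants visible.
    return sum(score_once(variant_quality) for _ in range(trials)) / trials

baseline = mean_score(0.70)
candidate = mean_score(0.75)
print(f"baseline={baseline:.3f} candidate={candidate:.3f}")
```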
September 26, 2025 at 11:48 PM
Reposted by arize-phoenix
In the latest release of Arconia, I included support for the OpenInference Semantic Conventions for instrumenting your @spring-ai.bsky.social apps and integrating with AI platforms like @arize-phoenix.bsky.social, now available as an Arconia Dev Service for Spring Boot. arconia.io/docs/arconia...
September 8, 2025 at 11:55 PM
Reposted by arize-phoenix
Missed the news from Arize Observe 2025? Phoenix Cloud just got Spaces & Access Management!

✨ Create tailored Spaces
🔑 Manage user permissions
👥 Easy team collaboration

More than a feature, it’s Phoenix adapting to you.

Spin up a new Phoenix project & test it out!
@arize-phoenix.bsky.social
June 27, 2025 at 10:34 PM
Reposted by arize-phoenix
🧪 📊 The @arize-phoenix.bsky.social TS/JS client now supports Experiments and Datasets!

You can now create datasets, run experiments, and attach evaluations to experiments using the Phoenix TS/JS client.

Shoutout to @anthonypowell.me and @mikeldking.bsky.social for the work here!
May 21, 2025 at 2:26 PM
Reposted by arize-phoenix
🆕 New in OpenInference: Python auto-instrumentation for the Google GenAI SDK!

Add GenAI tracing to your @arize-phoenix.bsky.social applications in just a few lines. Works great with Span Replay so you can debug, tweak, and explore agent behavior in prompt playground.

Check Notebook + docs below!👇
May 8, 2025 at 8:41 PM
Reposted by arize-phoenix
Learn to prompt better
May 7, 2025 at 7:26 PM
Reposted by arize-phoenix
@pydantic.dev evals 🤝 @arize-phoenix.bsky.social tracing and UI

I’ve been really liking some of the eval tools from Pydantic's evals package.

Wanted to see if I could combine these with Phoenix’s tracing so I could run Pydantic evals on traces captured in Phoenix
May 2, 2025 at 6:02 PM
Reposted by arize-phoenix
Check out the full video: youtu.be/iOGu7-HYm6s?...
Tracing and Evaluating OpenAI Agents
YouTube video by Arize AI
youtu.be
April 18, 2025 at 6:51 PM
Reposted by arize-phoenix
Just dropped a tutorial on using the OpenAI Agents SDK + @arize-phoenix.bsky.social to go from building to evaluating agents.

✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge

If you're building agents, measuring them is essential.

Full vid and cookbook below
April 18, 2025 at 6:51 PM
Reposted by arize-phoenix
We've added new LLM decorators to @arize-phoenix.bsky.social 's OpenInference library 🎁

Tag a function with `@tracer.llm` to automatically capture it as an @opentelemetry.io span.
- Automatically parses input and output messages
- Comes in decorator or context manager flavors
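The pattern described above can be illustrated with a toy stand-in (this is NOT the OpenInference implementation; `llm_span`, `llm_span_cm`, and the span dict are invented for the sketch):

```python
import functools
from contextlib import contextmanager

spans = []  # stand-in for an exported span collection

def llm_span(fn):
    # Decorator flavor: record the function's inputs and output
    # as a "span" record.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        output = fn(*args, **kwargs)
        spans.append({
            "name": fn.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": output,
        })
        return output
    return wrapper

@contextmanager
def llm_span_cm(name):
    # Context-manager flavor of the same idea.
    record = {"name": name}
    yield record
    spans.append(record)

@llm_span
def chat(prompt: str) -> str:
    return f"echo: {prompt}"

chat("hello")
with llm_span_cm("manual") as span:
    span["output"] = "done"

print([s["name"] for s in spans])  # ['chat', 'manual']
```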
April 18, 2025 at 2:21 AM
Reposted by arize-phoenix
We've added GPT-4.1 models to the @arize-phoenix.bsky.social Prompt Playground.

My go-to way to test out these new models: grab a failed trace from a previous run, pull it into playground, switch the model and see if 4.1 can succeed where 4o failed.

Early signs are promising!
April 16, 2025 at 6:43 PM
Trace Flowise apps with Arize Phoenix 🔍

Flowise is fast, visual, and low-code — but what happens under the hood?

With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows with 1 configuration step - no code required.
April 15, 2025 at 9:03 PM
Reposted by arize-phoenix
Google’s new Agent Development Kit (ADK) makes it dead easy to build, manage, and deploy AI agent systems.

Here's how you can easily set up a LangGraph agent, deploy it with Google ADK, and trace it with @arize-phoenix.bsky.social.
Deploying an Agent with Google Vertex Agent Engine and Arize Phoenix
YouTube video by Arize AI
www.youtube.com
April 10, 2025 at 7:10 PM
Use Ragas with Arize AI @arize.bsky.social or ArizePhoenix to improve the evaluation of your LLM applications

Together you can:

✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix

Dive into our docs & notebooks ⬇️
April 9, 2025 at 12:20 AM
Reposted by arize-phoenix
LLM as a Judge allows models to evaluate outputs in a single prompt—but a good judge needs a good prompt

In my new tutorial, learn techniques to optimize your prompt so your judge can improve accuracy, cost, fairness, and robustness

better prompts ➡️ better evals
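One common technique in this vein, sketched here as an assumption about the tutorial's theme: give the judge an explicit rubric plus few-shot examples rather than a bare "rate this" instruction. The rubric text and helper below are illustrative:

```python
RUBRIC = """Score the ANSWER for the QUESTION on a 0-1 scale:
1 = factually correct and directly answers the question
0 = incorrect, off-topic, or evasive
Respond with only the number."""

# Few-shot examples anchor the judge's scoring behavior.
FEW_SHOT = [
    ("What is 2+2?", "4", "1"),
    ("What is 2+2?", "I prefer not to say.", "0"),
]

def build_judge_prompt(question: str, answer: str) -> str:
    examples = "\n".join(
        f"QUESTION: {q}\nANSWER: {a}\nSCORE: {s}" for q, a, s in FEW_SHOT
    )
    return (
        f"{RUBRIC}\n\n{examples}\n\n"
        f"QUESTION: {question}\nANSWER: {answer}\nSCORE:"
    )

prompt = build_judge_prompt("Capital of France?", "Paris")
print(prompt.endswith("SCORE:"))  # True
```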
April 7, 2025 at 5:15 PM
Reposted by arize-phoenix
LLM as a Judge Prompt Optimization
YouTube video by Arize AI
youtu.be
April 7, 2025 at 5:15 PM
New in the Phoenix client: Prompt Tagging 🏷️

📌Tag prompts in code and see those tags reflected in the UI
📌Tag prompt versions as development, staging, or production — or define your own
📌Add in tag descriptions for more clarity

Manage your prompt lifecycles with confidence🚀
April 4, 2025 at 7:24 PM
Reposted by arize-phoenix
Demo your app at this year's Observe! Fill out a short application by 4.30 to be considered for our Demo Den. Great opportunity to showcase your work to the AI community in SF.

Apply here: docs.google.com/forms/d/e/1F...
March 28, 2025 at 9:11 PM
Reposted by arize-phoenix
Think + Act — all within your prompt

In this tutorial, I apply ReAct principles to prompt LLMs to Reason + Act like humans. By specifying these steps, the LLM generates reasoning and interacts with tools for greater accuracy.

Full Video Tutorial: youtu.be/PB7hrp0mz54?...
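The Reason + Act loop can be sketched with a scripted "model": Thought lines are reasoning, Action lines call tools, and the loop runs until a final Answer. The step format and `calculator` tool are assumptions for illustration, not the tutorial's code:

```python
def calculator(expression: str) -> str:
    # Toy tool: evaluate simple arithmetic with builtins disabled.
    return str(eval(expression, {"__builtins__": {}}))

# Scripted stand-in for LLM output in the ReAct format.
SCRIPTED_STEPS = [
    "Thought: I should compute 6 * 7.",
    "Action: calculator[6 * 7]",
    "Thought: The tool returned the result.",
    "Answer: 42",
]

def react_loop(steps, tools):
    observations = []
    for step in steps:
        if step.startswith("Action:"):
            # Parse "Action: tool[argument]" and invoke the tool.
            name, arg = step[len("Action: "):].rstrip("]").split("[", 1)
            observations.append(f"Observation: {tools[name](arg)}")
        elif step.startswith("Answer:"):
            return step.split(": ", 1)[1], observations
    return None, observations

answer, obs = react_loop(SCRIPTED_STEPS, {"calculator": calculator})
print(answer)  # 42
```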
ReAct Prompting
YouTube video by Arize AI
youtu.be
March 24, 2025 at 11:26 PM