arize-phoenix
@arize-phoenix.bsky.social
Open-Source AI Observability and Evaluation
app.phoenix.arize.com
TypeScript Evals Quickstart: arize.com/docs/phoeni...
November 26, 2025 at 5:00 PM
Run evals fast with our TypeScript Evals Quickstart!

The new TypeScript Evals package is a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
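The four steps above can be sketched in plain TypeScript. This is a hypothetical, library-free illustration of the flow, not the phoenix-evals API (see the docs for the real package); a real setup would call an LLM-as-a-Judge where the stand-in evaluator is.

```typescript
// Hypothetical sketch of the eval flow; not the phoenix-evals API.
type Example = { input: string; expected: string };

// 1. Define a task: what the agent does for a given input.
const task = (input: string): string => input.trim().toLowerCase();

// 2. Build a dataset of examples.
const dataset: Example[] = [
  { input: "  Hello ", expected: "hello" },
  { input: "WORLD", expected: "world" },
];

// 3. An evaluator scores each output. A real setup would use an
//    LLM-as-a-Judge here; exact match is a stand-in.
const evaluate = (output: string, expected: string): number =>
  output === expected ? 1 : 0;

// 4. Run the evals and collect results (Phoenix would render these).
const results = dataset.map((ex) => ({
  input: ex.input,
  output: task(ex.input),
  score: evaluate(task(ex.input), ex.expected),
}));

const meanScore =
  results.reduce((sum, r) => sum + r.score, 0) / results.length;
```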
November 26, 2025 at 5:00 PM
Dig into agent traces without a single line of code!

Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
November 20, 2025 at 3:25 PM
🌀 Since LLMs are probabilistic, their outputs can differ even when the supplied prompts are exactly the same. That makes it hard to tell whether a particular change is warranted: a single execution cannot concretely tell you whether it improves or degrades your task.
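One way around the noise is to score each variant over many trials and compare aggregates. A minimal sketch, using a seeded stand-in that "succeeds" with a given probability in place of a real model call:

```typescript
// Seeded LCG so the comparison is reproducible across runs.
function makeRng(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (s * 1664525 + 1013904223) >>> 0;
    return s / 2 ** 32;
  };
}

// Run many probabilistic "executions" and return the pass rate.
// A real harness would call the model here instead of rng().
function meanScore(successRate: number, trials: number, seed: number): number {
  const rng = makeRng(seed);
  let passes = 0;
  for (let i = 0; i < trials; i++) {
    if (rng() < successRate) passes++; // one probabilistic execution
  }
  return passes / trials;
}

// Variant B's true pass rate is higher; over enough trials the
// aggregate reflects that, even though any single run might not.
const scoreA = meanScore(0.7, 500, 42);
const scoreB = meanScore(0.85, 500, 42);
```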
September 26, 2025 at 11:48 PM
Trace Flowise apps with Arize Phoenix 🔍

Flowise is fast, visual, and low-code — but what happens under the hood?

With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows with 1 configuration step - no code required.
April 15, 2025 at 9:03 PM
Use Ragas with Arize AI @arize.bsky.social or Arize Phoenix to improve the evaluation of your LLM applications

Together you can:

✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix

Dive into our docs & notebooks ⬇️
April 9, 2025 at 12:20 AM
New in the Phoenix client: Prompt Tagging 🏷️

📌 Tag prompts in code and see those tags reflected in the UI
📌 Tag prompt versions as development, staging, or production — or define your own
📌 Add tag descriptions for more clarity

Manage your prompt lifecycles with confidence🚀
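The idea behind lifecycle tagging can be sketched as a tiny registry mapping prompt versions to tags. The helper names below (`tagVersion`, `versionsWith`) are hypothetical illustrations, not the Phoenix client API:

```typescript
// Hypothetical sketch: lifecycle tags attached to prompt versions,
// so code and UI agree on which version is where. Not the Phoenix API.
type Tag = { name: string; description?: string };

const tagsByVersion: Record<string, Tag[]> = {};

function tagVersion(version: string, tag: Tag): void {
  if (!tagsByVersion[version]) tagsByVersion[version] = [];
  tagsByVersion[version].push(tag);
}

function versionsWith(tagName: string): string[] {
  return Object.keys(tagsByVersion).filter((v) =>
    tagsByVersion[v].some((t) => t.name === tagName)
  );
}

tagVersion("v1", { name: "staging" });
tagVersion("v2", { name: "production", description: "Serving live traffic" });
tagVersion("v2", { name: "triage" }); // user-defined tags work too
```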
April 4, 2025 at 7:24 PM
Better LLMs start with better data and observability

We’ve integrated @CleanlabAI’s Trustworthy Language Model (TLM) with Phoenix to help teams improve LLM reliability and performance

🔗 Dive into the full implementation in our docs & notebook:
March 20, 2025 at 7:50 PM
Some updates for Projects! Gain more flexibility and control with:

📌 Persistent column selection for consistent views
🔍 Filter data directly from tables with quick metadata filters
⏳ Set custom time ranges for traces & spans
🌳 Option to show only root spans

Check out the demo👇
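Two of the filters above (custom time ranges and root-span-only views) can be sketched as simple predicates over a span list. Illustrative only; span fields here are assumptions, not the Phoenix data model:

```typescript
// Minimal span shape for illustration; not the Phoenix schema.
type Span = { id: string; parentId: string | null; startMs: number };

const spans: Span[] = [
  { id: "a", parentId: null, startMs: 100 },
  { id: "b", parentId: "a", startMs: 150 },
  { id: "c", parentId: null, startMs: 900 },
];

// Custom time range: keep spans starting inside [from, to).
const inRange = (from: number, to: number): Span[] =>
  spans.filter((s) => s.startMs >= from && s.startMs < to);

// Root-span filter: spans with no parent.
const roots = spans.filter((s) => s.parentId === null);
```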
March 7, 2025 at 11:39 PM
🧠 Phoenix now supports Anthropic's Claude 3.7 Sonnet & Thinking Budgets!

This makes Prompt Playground ideal for side-by-side reasoning tests: o3 vs. Claude vs. R1.

Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI & Anthropic - test them all out in the playground! ⚡️
March 7, 2025 at 5:29 PM