arize-phoenix
@arize-phoenix.bsky.social
Open-Source AI Observability and Evaluation
app.phoenix.arize.com
Run evals fast with our TypeScript Evals Quickstart!

The new TypeScript Evals package aims to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
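The four steps above can be sketched in plain TypeScript. This is an illustrative mock, not the phoenix-evals API: `task`, `judge`, and `runEvals` are hypothetical names, and the judge is a deterministic stub standing in for a real LLM-as-a-Judge call.

```typescript
// Hypothetical sketch of the four eval steps; runEvals, task, and judge
// are illustrative names, not the actual Phoenix TypeScript Evals API.

type Example = { input: string; expected: string };

// 1. Define a task: what the agent does for a given input.
//    A trivial stand-in that upper-cases the input.
const task = (input: string): string => input.toUpperCase();

// 2. Build a dataset of inputs paired with expected outputs.
const dataset: Example[] = [
  { input: "hello", expected: "HELLO" },
  { input: "phoenix", expected: "PHOENIX" },
];

// 3. An LLM-as-a-Judge evaluator scores each output. A real judge would
//    prompt an LLM; this deterministic stub returns 1 for an exact match.
const judge = (output: string, expected: string): number =>
  output === expected ? 1 : 0;

// 4. Run evals: apply the task to every example and aggregate the scores.
function runEvals(data: Example[]): { scores: number[]; mean: number } {
  const scores = data.map((ex) => judge(task(ex.input), ex.expected));
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  return { scores, mean };
}

const result = runEvals(dataset);
console.log(result.mean); // 1 when every output matches its expectation
```

In the real package the results would be logged to Phoenix for inspection rather than printed.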
November 26, 2025 at 5:00 PM
Dig into agent traces without a single line of code!

Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
November 20, 2025 at 3:25 PM
🌀 Since LLMs are probabilistic, their outputs can differ even when the supplied prompts are exactly the same. This makes it challenging to determine whether a particular change is warranted: a single execution cannot concretely tell you whether a given change improves or degrades your task.
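One way to see why a single execution is unreliable is to average scores over repeated runs. The sketch below is illustrative and not Phoenix code: a seeded pseudo-random generator stands in for the model's sampling, and the "true" pass rates are made-up assumptions.

```typescript
// Illustrative only: compare two prompt variants by averaging pass/fail
// scores over many runs instead of trusting one execution. A seeded LCG
// stands in for the model's probabilistic sampling.

function makeRng(seed: number): () => number {
  let s = seed;
  return () => {
    s = (s * 1664525 + 1013904223) % 4294967296; // linear congruential step
    return s / 4294967296; // uniform in [0, 1)
  };
}

// One "run" of a variant: passes (score 1) with probability p.
const runOnce = (p: number, rng: () => number): number =>
  rng() < p ? 1 : 0;

// Mean pass rate over n repeated runs of the same variant.
function meanScore(p: number, n: number, rng: () => number): number {
  let total = 0;
  for (let i = 0; i < n; i++) total += runOnce(p, rng);
  return total / n;
}

const rng = makeRng(42);
// Hypothetical true pass rates for two prompt variants.
const baseline = meanScore(0.7, 200, rng);
const candidate = meanScore(0.8, 200, rng);
console.log(baseline, candidate);
// With enough repeated runs the averages separate; a single run (n = 1)
// returns only 0 or 1 and could easily rank the variants the wrong way.
```

The same logic applies to prompt changes in practice: score many executions per variant and compare the aggregates.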
September 26, 2025 at 11:48 PM
Trace Flowise apps with Arize Phoenix 🔍

Flowise is fast, visual, and low-code — but what happens under the hood?

With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows in a single configuration step, no code required.
April 15, 2025 at 9:03 PM
🧠 Phoenix now supports Anthropic's Claude 3.7 Sonnet & thinking budgets!

This makes Prompt Playground ideal for side-by-side reasoning tests: o3 vs. Anthropic vs. R1.

Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI & Anthropic - test them all out in the playground! ⚡️
March 7, 2025 at 5:29 PM