The new TypeScript Evals package is designed to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
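Here's a minimal sketch of that workflow in TypeScript. It calls the OpenAI SDK directly rather than the Evals package's own API, so the task, dataset, and judge shown here are illustrative assumptions - see the docs for the package's actual exports:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// 1. Define a task: what the agent does for a given input.
async function task(question: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}

// 2. Build a dataset of inputs to evaluate against.
const dataset = [
  { question: "What is Arize Phoenix?" },
  { question: "What does LLM-as-a-Judge mean?" },
];

// 3. LLM-as-a-Judge evaluator: a second model scores each output.
async function judge(question: string, answer: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{
      role: "user",
      content:
        `Question: ${question}\nAnswer: ${answer}\n` +
        `Reply with exactly one word: correct or incorrect.`,
    }],
  });
  return res.choices[0].message.content?.trim().toLowerCase() ?? "incorrect";
}

// 4. Run the evals; with the Evals package, these results land in Phoenix.
async function main() {
  for (const { question } of dataset) {
    const answer = await task(question);
    console.log(`${question} -> ${await judge(question, answer)}`);
  }
}

main().catch(console.error);
```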
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
Flowise is fast, visual, and low-code — but what happens under the hood?
With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows in a single configuration step - no code required.
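For the curious, this is roughly the plumbing that one configuration step spares you: a standard OpenTelemetry trace exporter pointed at Phoenix's OTLP endpoint. A hedged sketch using the stock OpenTelemetry JS packages - the URL assumes a local Phoenix on its default port, and Flowise's actual internals may differ:

```typescript
import { NodeTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";

// Export spans to a locally running Phoenix instance (default port 6006).
const exporter = new OTLPTraceExporter({
  url: "http://localhost:6006/v1/traces",
});

// OpenTelemetry JS SDK 2.x style; on 1.x, call provider.addSpanProcessor instead.
const provider = new NodeTracerProvider({
  spanProcessors: [new BatchSpanProcessor(exporter)],
});
provider.register(); // from here on, instrumented spans stream to Phoenix
```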
Ragas now integrates with Arize and Phoenix. Together you can:
✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix
Dive into our docs & notebooks ⬇️
📌 Tag prompts in code and see those tags reflected in the UI
📌 Tag prompt versions as development, staging, or production — or define your own
📌 Add tag descriptions for more clarity
Manage your prompt lifecycles with confidence 🚀
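As a rough illustration of the "tag prompts in code" flow: @arizeai/phoenix-client is Phoenix's real TypeScript client, but the endpoint path and body shape below are assumptions for this sketch, not confirmed API - check the Phoenix docs for the exact calls.

```typescript
import { createClient } from "@arizeai/phoenix-client";

// Assumed client setup; baseUrl points at a local Phoenix instance.
const phoenix = createClient({ options: { baseUrl: "http://localhost:6006" } });

// Hypothetical endpoint shape: tag a prompt version as "production".
async function tagVersion(promptVersionId: string) {
  await phoenix.POST("/v1/prompt_versions/{prompt_version_id}/tags", {
    params: { path: { prompt_version_id: promptVersionId } },
    body: { name: "production", description: "Serving live traffic" },
  });
}

tagVersion("your-prompt-version-id").catch(console.error);
```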
We’ve integrated @CleanlabAI’s Trustworthy Language Model (TLM) with Phoenix to help teams improve LLM reliability and performance.
🔗 Dive into the full implementation in our docs & notebook:
📌 Persistent column selection for consistent views
🔍 Filter data directly from tables with metadata and quick metadata filters
⏳ Set custom time ranges for traces & spans
🌳 Option to filter the span view down to root spans
Check out the demo👇
This makes Prompt Playground ideal for side-by-side reasoning tests: OpenAI's o3 vs. Anthropic's Claude vs. DeepSeek's R1.
Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI, alongside the newest Anthropic models - test them all out in the playground! ⚡️