app.phoenix.arize.com
The new TypeScript Evals package is a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
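The four steps above can be sketched end to end. This is a hypothetical illustration — `task`, `dataset`, and `judge` are stand-in names, not the real Phoenix Evals API (see the docs for the actual package), and the judge here is a mock comparison where a real setup would call an LLM:

```typescript
// Hypothetical sketch of the evals workflow — not the real
// @arizeai/phoenix-evals API; names are illustrative only.

type Example = { input: string; expected: string };

// 1. Define a task (what the agent does) — a trivial stand-in agent.
const task = (input: string): string => input.trim().toLowerCase();

// 2. Build a dataset of inputs with expected outputs.
const dataset: Example[] = [
  { input: "  Hello ", expected: "hello" },
  { input: "WORLD", expected: "world" },
];

// 3. An evaluator that scores outputs. A real LLM-as-a-Judge would
//    prompt a model here; this mock checks against the expectation.
const judge = (output: string, expected: string): number =>
  output === expected ? 1 : 0;

// 4. Run the evals and collect per-example scores (in the real
//    package, results would also be visible in Phoenix).
const results = dataset.map(({ input, expected }) => {
  const output = task(input);
  return { input, output, score: judge(output, expected) };
});

const meanScore =
  results.reduce((sum, r) => sum + r.score, 0) / results.length;
console.log(results, meanScore); // mean score is 1 for this toy dataset
```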
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
Flowise is fast, visual, and low-code — but what happens under the hood?
With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows with 1 configuration step - no code required.
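As a rough sketch of what that single configuration step involves: the details live in Flowise's own analytics/observability settings UI, but the values you typically supply are a Phoenix collector endpoint (and an API key for a hosted instance). The exact field names below are assumptions, not Flowise flags:

```shell
# Hedged sketch: the values a Phoenix tracing integration typically needs.
# For a self-hosted Phoenix instance running locally:
export PHOENIX_COLLECTOR_ENDPOINT="http://localhost:6006"

# For a hosted Phoenix (e.g. app.phoenix.arize.com), an API key as well:
export PHOENIX_API_KEY="your-api-key"  # placeholder value
```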
This makes Prompt Playground ideal for side-by-side reasoning tests: OpenAI's o3 vs. Anthropic's Claude vs. DeepSeek's R1.
Plus, GPT-4.5 support keeps it up to date with the latest models - test them all out in the playground! ⚡️