The new TypeScript Evals package is designed to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
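Here's a minimal sketch of that workflow in TypeScript. It calls the OpenAI SDK directly rather than the Evals package's own API, so the task, dataset, and judge shown here are illustrative assumptions - see the docs for the package's actual exports:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// 1. Define a task: what the agent does for a given input.
async function task(question: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0].message.content ?? "";
}

// 2. Build a dataset of inputs to evaluate against.
const dataset = [
  { question: "What is Arize Phoenix?" },
  { question: "What does LLM-as-a-Judge mean?" },
];

// 3. LLM-as-a-Judge evaluator: a second model scores each output.
async function judge(question: string, answer: string): Promise<string> {
  const res = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{
      role: "user",
      content:
        `Question: ${question}\nAnswer: ${answer}\n` +
        `Reply with exactly one word: correct or incorrect.`,
    }],
  });
  return res.choices[0].message.content?.trim().toLowerCase() ?? "incorrect";
}

// 4. Run the evals; with the Evals package, these results land in Phoenix.
async function main() {
  for (const { question } of dataset) {
    const answer = await task(question);
    console.log(`${question} -> ${await judge(question, answer)}`);
  }
}

main().catch(console.error);
```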
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
Flowise is fast, visual, and low-code — but what happens under the hood?
With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows in a single configuration step - no code required.
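For the curious, this is roughly the plumbing that one configuration step spares you: a standard OpenTelemetry trace exporter pointed at Phoenix's OTLP endpoint. A hedged sketch using the stock OpenTelemetry JS packages - the URL assumes a local Phoenix on its default port, and Flowise's actual internals may differ:

```typescript
import { NodeTracerProvider, BatchSpanProcessor } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";

// Export spans to a locally running Phoenix instance (default port 6006).
const exporter = new OTLPTraceExporter({
  url: "http://localhost:6006/v1/traces",
});

// OpenTelemetry JS SDK 2.x style; on 1.x, call provider.addSpanProcessor instead.
const provider = new NodeTracerProvider({
  spanProcessors: [new BatchSpanProcessor(exporter)],
});
provider.register(); // from here on, instrumented spans stream to Phoenix
```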
Ragas now integrates with Arize and Phoenix. Together you can:
✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix
Dive into our docs & notebooks ⬇️
📌 Tag prompts in code and see those tags reflected in the UI
📌 Tag prompt versions as development, staging, or production — or define your own
📌 Add tag descriptions for more clarity
Manage your prompt lifecycles with confidence 🚀
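As a rough illustration of the "tag prompts in code" flow: @arizeai/phoenix-client is Phoenix's real TypeScript client, but the endpoint path and body shape below are assumptions for this sketch, not confirmed API - check the Phoenix docs for the exact calls.

```typescript
import { createClient } from "@arizeai/phoenix-client";

// Assumed client setup; baseUrl points at a local Phoenix instance.
const phoenix = createClient({ options: { baseUrl: "http://localhost:6006" } });

// Hypothetical endpoint shape: tag a prompt version as "production".
async function tagVersion(promptVersionId: string) {
  await phoenix.POST("/v1/prompt_versions/{prompt_version_id}/tags", {
    params: { path: { prompt_version_id: promptVersionId } },
    body: { name: "production", description: "Serving live traffic" },
  });
}

tagVersion("your-prompt-version-id").catch(console.error);
```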
We’ve integrated @CleanlabAI’s Trustworthy Language Model (TLM) with Phoenix to help teams improve LLM reliability and performance.
🔗 Dive into the full implementation in our docs & notebook:
📌 Persistent column selection for consistent views
🔍 Filter data directly from tables with metadata and quick metadata filters
⏳ Set custom time ranges for traces & spans
🌳 Option to filter the span view down to root spans
Check out the demo👇
This makes Prompt Playground ideal for side-by-side reasoning tests: OpenAI's o3 vs. Anthropic's Claude vs. DeepSeek's R1.
Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI, alongside the newest Anthropic models - test them all out in the playground! ⚡️