Arize AI

Arize AI is an American ML observability and LLM evaluation platform that helps teams monitor, debug, and improve artificial intelligence models in production, offering both open-source and enterprise-grade tooling.

Last updated June 2026

Arize AI is an American technology company that develops platforms for machine learning observability and large language model (LLM) evaluation. Founded in 2020 and headquartered in Berkeley, California, Arize provides tools that enable data science and engineering teams to monitor AI model behaviour in production, diagnose failures, evaluate response quality, and implement continuous improvements. The company offers both an open-source platform (Phoenix) and an enterprise product (Arize AX).

Background

Arize AI was founded by Jason Lopatecki and Aparna Dhinakaran, who had previously worked at TubeMogul and Uber respectively, where they dealt with large-scale data and ML infrastructure. The company was established to address a gap in the MLOps tooling landscape: while tools existed for model development and training, production monitoring of model behaviour, and in particular the detection of subtle degradation over time, remained largely ad hoc.

The company raised a USD 70 million Series C round in February 2025, which it described as the largest investment in AI observability at the time of announcement. The round reflected the growing importance of observability as LLM-based applications proliferated in enterprise production environments, creating new challenges around output quality, hallucination detection, latency, and prompt injection.

Products

Arize AX

Arize AX is the company's enterprise production monitoring and evaluation platform. It is offered in two editions. AX-Generative targets LLM and generative AI applications, providing tools for tracing LLM calls, evaluating response quality against ground truth or model-based rubrics, monitoring token usage and latency, and detecting anomalous input patterns such as prompt injection attempts. AX-ML and CV addresses traditional machine learning and computer vision applications, providing drift detection, performance monitoring, and bias analysis.

Key capabilities of Arize AX include embedding-based drift detection — monitoring changes in the vector representations of inputs over time — which is particularly effective for unstructured data such as text, images, and audio. The platform supports champion/challenger model comparison, allowing teams to evaluate the production performance of a new model against its predecessor before committing to a full rollout.
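The idea behind embedding-based drift detection can be illustrated with a minimal sketch. It assumes drift is scored as the Euclidean distance between the centroid of a baseline embedding set and the centroid of a production set; this is one common approach, not necessarily the metric Arize AX uses. The function names, sample vectors, and threshold are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def embedding_drift(baseline, production):
    """Hypothetical drift score: distance between the two set centroids."""
    return euclidean_distance(centroid(baseline), centroid(production))


# Toy 2-dimensional embeddings; real embeddings have hundreds of dimensions.
baseline = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]]
same_distribution = [[0.1, 1.0], [0.1, 0.8]]
shifted = [[0.9, 0.1], [1.0, 0.0]]

DRIFT_THRESHOLD = 0.5  # assumed value; would be tuned per application

low = embedding_drift(baseline, same_distribution)
high = embedding_drift(baseline, shifted)

print(f"similar inputs: {low:.3f}, drifted: {low > DRIFT_THRESHOLD}")
print(f"shifted inputs: {high:.3f}, drifted: {high > DRIFT_THRESHOLD}")
```

A centroid distance is cheap to compute but only captures a shift in the mean; changes in the spread or shape of the embedding distribution would need a richer comparison.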

Phoenix

Phoenix is Arize's open-source AI observability platform; its source code is published under the Elastic License 2.0 (ELv2). It provides tracing for LLM applications, evaluation harnesses, experiment management, and failure investigation tools. Phoenix is designed for local and self-hosted deployment, making it suitable for organisations with data residency requirements or those operating in air-gapped environments.

Phoenix implements the OpenTelemetry standard for its tracing instrumentation. OpenTelemetry, the same open standard used in conventional application performance monitoring, allows AI application traces to interoperate with existing observability infrastructure — enabling unified monitoring of the full application stack. Integration libraries support major LLM frameworks including LangChain, LlamaIndex, OpenAI SDK, and Anthropic's Claude SDK.
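The trace model that makes this interoperability possible can be sketched in a few lines. The example below is a standard-library illustration of the core idea (spans that share a trace ID and point to a parent span); it is not the OpenTelemetry API or Phoenix's instrumentation, and the class names, span names, and attribute keys are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed unit of work; linked to its trace and, optionally, a parent."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: Dict[str, object] = field(default_factory=dict)


class Trace:
    """Collects every span created while handling a single request."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []

    def start_span(self, name, parent=None, **attributes):
        span = Span(
            name=name,
            trace_id=self.trace_id,
            parent_id=parent.span_id if parent else None,
            attributes=dict(attributes),
        )
        self.spans.append(span)
        return span

    def children_of(self, span):
        return [s for s in self.spans if s.parent_id == span.span_id]


# A web request that performs a retrieval step and then calls a model.
trace = Trace()
root = trace.start_span("http_request", route="/chat")
retrieval = trace.start_span("retrieval", parent=root, documents=3)
llm_call = trace.start_span(
    "llm_call", parent=root, model="example-model", total_tokens=128
)

for child in trace.children_of(root):
    print(child.name, child.attributes)
```

Because the application span and the LLM span carry the same trace ID, a backend that understands this model can display them in a single request timeline.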

Core Capabilities

Arize's platform centres on four functional pillars. Tracing captures the execution path of LLM and ML inference requests, including intermediate steps in agent workflows, tool calls, and retrieval operations in RAG pipelines. Evaluation provides automated quality assessment of model outputs using both reference-free LLM-based judges and traditional metrics. Monitoring tracks production metrics — latency, error rates, prediction drift, and data quality — over time with configurable alerting. Experimentation enables offline evaluation of prompt variations, model versions, or retrieval configurations before production deployment.
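As an illustration of the monitoring pillar, the following sketch tracks request latency over a sliding window and raises an alert when the 95th percentile exceeds a configurable threshold. It is a simplified stand-in, not Arize's implementation; the class name, window size, and threshold are assumptions chosen for the example.

```python
from collections import deque


class LatencyMonitor:
    """Keeps the most recent latency samples and checks a p95 threshold."""

    def __init__(self, window_size, p95_threshold_ms):
        # deque with maxlen drops the oldest sample once the window is full
        self.samples = deque(maxlen=window_size)
        self.p95_threshold_ms = p95_threshold_ms

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def p95(self):
        """Nearest-rank 95th percentile of the current window."""
        ordered = sorted(self.samples)
        rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
        return ordered[rank - 1]

    def should_alert(self):
        return bool(self.samples) and self.p95() > self.p95_threshold_ms


monitor = LatencyMonitor(window_size=20, p95_threshold_ms=500)

# Twenty fast requests fill the window.
for _ in range(20):
    monitor.record(100)
healthy = monitor.should_alert()

# Two slow requests push the 95th percentile over the threshold.
monitor.record(900)
monitor.record(900)
degraded = monitor.should_alert()

print("alert while healthy:", healthy)
print("alert after slow requests:", degraded)
```

Alerting on a percentile rather than the mean keeps a single outlier from triggering a page while still catching a sustained slowdown.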

For LLM applications, Arize supports evaluation of hallucination, relevance, toxicity, faithfulness in RAG pipelines, and task-specific correctness. Evaluations can be run using GPT-4, Claude, or open-source judge models, with customisable rubrics suited to the specific application domain.
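The judge-based evaluation pattern described above can be sketched as follows. The example assumes the judge is any callable that takes a prompt and returns a label, so a real model client could be swapped in. The rubric text, function names, and the string-matching stub judge are hypothetical and exist only so the example runs without a model; none of this is the Arize or Phoenix evaluation API.

```python
RUBRIC = (
    "Label the answer 'faithful' if every claim is supported by the context, "
    "otherwise label it 'hallucinated'."
)


def build_judge_prompt(rubric, context, answer):
    """Combine the rubric with one record into a single judge prompt."""
    return f"{rubric}\n\nContext: {context}\nAnswer: {answer}\nLabel:"


def evaluate(records, judge, rubric=RUBRIC):
    """Run the judge over every record and return normalised labels."""
    labels = []
    for record in records:
        prompt = build_judge_prompt(rubric, record["context"], record["answer"])
        labels.append(judge(prompt).strip().lower())
    return labels


def stub_judge(prompt):
    """Stand-in for a model call: 'faithful' only if the answer text
    appears verbatim in the context. A real judge would be an LLM."""
    context = prompt.split("Context: ")[1].split("\nAnswer: ")[0]
    answer = prompt.split("\nAnswer: ")[1].split("\nLabel:")[0]
    return "faithful" if answer.lower() in context.lower() else "hallucinated"


records = [
    {"context": "The service was launched in 2020.", "answer": "launched in 2020"},
    {"context": "The service was launched in 2020.", "answer": "launched in 2015"},
]

labels = evaluate(records, stub_judge)
hallucination_rate = labels.count("hallucinated") / len(labels)

print(labels)
print("hallucination rate:", hallucination_rate)
```

Keeping the rubric as a plain string separate from the judge makes it straightforward to adapt the same harness to relevance, toxicity, or task-specific correctness checks.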

Integration Ecosystem

Arize integrates with the broader MLOps and LLM tooling ecosystem. Native connectors are available for Amazon Bedrock, Google Vertex AI, Azure AI, OpenAI, Cohere, and Anthropic APIs. On the data side, Arize supports ingestion from Apache Kafka, AWS Kinesis, and standard REST logging APIs. The platform can export metrics to Grafana, Datadog, and other APM systems via standard data formats.

References

  1. Arize AI. (2025). Arize AI Secures $70M Series C. PR Newswire, prnewswire.com.
  2. Amazon Web Services. (2024). Amazon Bedrock Agents observability using Arize AI. AWS Machine Learning Blog.
  3. Arize AI. (2024). LLM Observability and Evaluation Platform. arize.com.
  4. Startupik. (2024). Arize AI: AI Observability and Monitoring Platform. startupik.com.