Search Results
2 results for “LLM evaluation”
Infrastructure
AI Benchmarking
The systematic evaluation of AI systems using standardised datasets, tasks, and metrics to measure capability, compare models, and track progress across research and deployment contexts.
6 min read · Updated June 2026
Companies & Tools
Arize AI
Arize AI is an American ML observability and LLM evaluation platform that helps teams monitor, debug, and improve artificial intelligence models in production, offering both open-source and enterprise-grade tooling.
5 min read · Updated June 2026