LangSmith

LangSmith is an observability, tracing, and evaluation platform from LangChain for debugging, monitoring, and continuously improving large language model and AI agent applications in production.

4 min read · Last updated June 2026 · Companies & Tools

LangSmith is a platform developed by LangChain, Inc., offered primarily as software as a service, for tracing, debugging, evaluating, and monitoring applications built with large language models (LLMs) and AI agents. It was introduced in 2023 as the production companion to the LangChain framework and has since been generalised into a framework-agnostic observability layer that ingests traces from the OpenAI SDK, the Anthropic SDK, LlamaIndex, custom application code, and OpenTelemetry.

What problem it solves

LLM applications are notoriously hard to debug. A single user-facing response may be the result of dozens of prompt templates, retrieval calls, tool invocations, and agent decisions, each of which contributes latency, cost, and potential failure modes. Conventional application performance monitoring tools record HTTP requests and CPU metrics but cannot expose the prompt that was actually sent, the retrieved context, or the reasoning trace of an agent. LangSmith fills this gap by recording the inputs, outputs, intermediate steps, token usage, and metadata of every LLM and tool call, and by attaching them to a hierarchical trace that mirrors the execution graph of the application.
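To make the idea of a hierarchical trace concrete, the following is a minimal standard-library sketch, not LangSmith's implementation. The `Run` and `Tracer` names, their fields, and the `render` helper are hypothetical and chosen only for illustration. Each nested `span` becomes a child of the span that was open when it started, so the resulting tree mirrors the call structure of the application.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Run:
    """One step in a trace: a chain, a retrieval, an LLM call, or a tool call."""
    name: str
    run_type: str
    inputs: dict[str, Any]
    outputs: dict[str, Any] = field(default_factory=dict)
    latency_s: float = 0.0
    children: list["Run"] = field(default_factory=list)


class Tracer:
    """Hypothetical tracer that builds a tree of runs from nested spans."""

    def __init__(self) -> None:
        self.root: Optional[Run] = None
        self._stack: list[Run] = []

    @contextmanager
    def span(self, name: str, run_type: str, **inputs: Any):
        run = Run(name=name, run_type=run_type, inputs=inputs)
        if self._stack:
            # Attach to whichever run is currently open.
            self._stack[-1].children.append(run)
        else:
            self.root = run
        self._stack.append(run)
        start = time.perf_counter()
        try:
            yield run
        finally:
            run.latency_s = time.perf_counter() - start
            self._stack.pop()


def render(run: Run, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines, one per run."""
    lines = [f"{'  ' * depth}{run.run_type}:{run.name}"]
    for child in run.children:
        lines.extend(render(child, depth + 1))
    return lines


tracer = Tracer()
with tracer.span("answer_question", "chain", question="What is LangSmith?") as chain:
    with tracer.span("search_docs", "retriever", query="LangSmith") as retrieval:
        retrieval.outputs = {"documents": ["LangSmith is an observability platform."]}
    with tracer.span("generate", "llm", prompt="Answer using the context.") as llm:
        llm.outputs = {"text": "An observability platform.", "total_tokens": 42}
    chain.outputs = {"answer": llm.outputs["text"]}

trace_lines = render(tracer.root)
print("\n".join(trace_lines))
```

The printed tree shows the retrieval and LLM steps nested under the chain that invoked them, which is the structure a conventional request log cannot express.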

Core capabilities

| Capability | Description |
|---|---|
| Tracing | Hierarchical capture of LLM, retrieval, and tool calls with inputs, outputs, latency, and cost |
| Monitoring | Dashboards for latency p50/p95, error rate, token spend, and custom metrics |
| Evaluation | Run datasets through chains and grade with LLM-as-judge, exact match, or human review |
| Prompt management | Versioned prompt registry with commit, tagging, and rollback |
| Annotation queues | Route traces to subject-matter experts for labelling |
| Dataset curation | Build evaluation sets from production traces of interest |
| Online evals | Continuously score live traffic on quality, safety, and policy compliance |
| Alerting | Notify on regressions in cost, latency, or quality |
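The monitoring metrics named in the table (latency p50/p95, error rate, token spend) can be derived from recorded runs. The sketch below is an illustration only: the run records and field names are made up, and the nearest-rank percentile is one common definition that is not necessarily the one LangSmith's dashboards use.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical run records; a real system would read these from stored traces.
runs = [
    {"latency_s": 0.8, "error": False, "total_tokens": 310},
    {"latency_s": 1.1, "error": False, "total_tokens": 420},
    {"latency_s": 0.9, "error": False, "total_tokens": 350},
    {"latency_s": 4.2, "error": True, "total_tokens": 1200},
    {"latency_s": 1.0, "error": False, "total_tokens": 400},
    {"latency_s": 1.3, "error": False, "total_tokens": 450},
    {"latency_s": 0.7, "error": False, "total_tokens": 280},
    {"latency_s": 1.2, "error": False, "total_tokens": 430},
    {"latency_s": 2.5, "error": True, "total_tokens": 900},
    {"latency_s": 1.0, "error": False, "total_tokens": 390},
]

latencies = [run["latency_s"] for run in runs]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
error_rate = sum(run["error"] for run in runs) / len(runs)
token_spend = sum(run["total_tokens"] for run in runs)

print(f"p50={p50}s p95={p95}s error_rate={error_rate:.0%} tokens={token_spend}")
```

Reporting p95 alongside p50 matters because a small number of slow runs, here the two failing ones, can dominate the tail while leaving the median unchanged.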

Instrumentation can be as light as wrapping a function with the @traceable decorator in Python or traceable in TypeScript, or as deep as exporting OpenTelemetry spans from any service. Traces include cost estimates for hundreds of model SKUs, which helps engineering teams understand spend across providers.
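A minimal sketch of the decorator approach in Python follows. It assumes the `langsmith` package is installed and configured through its environment variables (an API key and a tracing flag; the exact variable names should be checked against the LangSmith documentation). So that the example also runs without the SDK, it falls back to a no-op stand-in, which is purely illustrative. The functions `search_docs` and `answer_question` and the tiny corpus are hypothetical.

```python
try:
    # Decorator provided by the LangSmith Python SDK.
    from langsmith import traceable
except ImportError:
    def traceable(*args, **kwargs):
        """No-op stand-in so the sketch runs without the SDK installed."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as a bare @traceable

        def wrap(func):
            return func  # used as @traceable(...)
        return wrap


@traceable(run_type="tool", name="search_docs")
def search_docs(query: str) -> list[str]:
    """Toy lookup standing in for a real retriever."""
    corpus = [
        "A trace is a tree of runs.",
        "Evaluators score outputs against a dataset.",
    ]
    return [doc for doc in corpus if query.lower() in doc.lower()]


@traceable(run_type="chain", name="answer_question")
def answer_question(question: str, topic: str) -> str:
    """Calls search_docs; with tracing enabled the call is recorded as a child run."""
    docs = search_docs(topic)
    context = " ".join(docs) or "no context found"
    return f"Q: {question} | context: {context}"


answer = answer_question("What does LangSmith record?", "trace")
print(answer)
```

Because the decorator wraps ordinary functions, the application logic stays unchanged; nesting in the recorded trace follows from one decorated function calling another.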

Evaluation workflow

A typical LangSmith evaluation cycle begins with a dataset of inputs and expected behaviours, often bootstrapped from production traces. Engineers define evaluators, which may be heuristic (exact match, regex, BLEU), embedding similarity, or LLM-as-judge prompts that score correctness, helpfulness, and faithfulness. Experiments are run by replaying the dataset through one or more versions of the chain or agent; results are compared in a side-by-side view that shows per-example scores, latency, and cost. This pattern lets teams detect regressions before they ship and supports continuous improvement of prompts and retrievers.
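The cycle described above can be sketched in plain Python without the LangSmith SDK. Everything here is an assumption made for illustration: the three-example dataset, the two heuristic evaluators, the canned `app_v1` and `app_v2` functions standing in for two versions of a chain, and the `run_experiment` helper. An LLM-as-judge evaluator would have the same shape, taking an output and a reference and returning a score.

```python
import re
from statistics import mean

# Hypothetical dataset of inputs and expected answers.
dataset = [
    {"inputs": {"question": "What is 2 + 2?"}, "expected": "4"},
    {"inputs": {"question": "Capital of Malaysia?"}, "expected": "Kuala Lumpur"},
    {"inputs": {"question": "Chemical formula for water?"}, "expected": "H2O"},
]


def exact_match(output: str, expected: str) -> float:
    """Heuristic evaluator: 1.0 only if the output equals the reference."""
    return 1.0 if output.strip() == expected else 0.0


def contains_expected(output: str, expected: str) -> float:
    """Regex evaluator: 1.0 if the reference appears anywhere in the output."""
    return 1.0 if re.search(re.escape(expected), output, re.IGNORECASE) else 0.0


def app_v1(question: str) -> str:
    """Stand-in for the first version of the chain (canned answers)."""
    answers = {
        "What is 2 + 2?": "4",
        "Capital of Malaysia?": "The capital is Kuala Lumpur.",
        "Chemical formula for water?": "CO2",
    }
    return answers[question]


def app_v2(question: str) -> str:
    """Stand-in for a revised version of the chain."""
    answers = {
        "What is 2 + 2?": "4",
        "Capital of Malaysia?": "Kuala Lumpur",
        "Chemical formula for water?": "H2O",
    }
    return answers[question]


def run_experiment(app, examples, evaluators):
    """Replay every example through the app and score each output."""
    rows = []
    for example in examples:
        question = example["inputs"]["question"]
        output = app(question)
        scores = {name: fn(output, example["expected"]) for name, fn in evaluators.items()}
        rows.append({"question": question, "output": output, **scores})
    summary = {name: mean(row[name] for row in rows) for name in evaluators}
    return rows, summary


evaluators = {"exact_match": exact_match, "contains_expected": contains_expected}
rows_v1, summary_v1 = run_experiment(app_v1, dataset, evaluators)
rows_v2, summary_v2 = run_experiment(app_v2, dataset, evaluators)

# Side-by-side comparison of aggregate scores per evaluator.
for name in evaluators:
    print(f"{name}: v1={summary_v1[name]:.2f} v2={summary_v2[name]:.2f}")
improved = [name for name in evaluators if summary_v2[name] > summary_v1[name]]
```

Keeping the per-example rows as well as the aggregate summary is what makes a side-by-side view possible: a regression on a single example is visible even when the mean score barely moves.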

Deployment options

LangSmith is available in three forms: a fully managed cloud service, a Bring-Your-Own-Cloud (BYOC) deployment that keeps trace data inside the customer's account, and a self-hosted option for organisations with data-residency or regulatory constraints. The self-hosted edition is commonly chosen for financial-services and government workloads in regulated markets.

Competitive landscape

LangSmith competes with Langfuse, Helicone, Arize AI (including its open-source Phoenix project), Weights & Biases Weave, Galileo, HoneyHive, and OpenTelemetry-based open-source stacks. Its differentiation comes from the depth of its integration with the LangChain and LangGraph ecosystems, the maturity of its evaluation tooling, and its prompt management features.

References

  1. LangChain, Inc. LangSmith Documentation.
  2. LangChain, Inc. (2024). LangSmith General Availability Announcement.
  3. OpenTelemetry Project. Specification for Generative AI Conventions.
  4. Bank Negara Malaysia. (2024). Discussion Paper on the Use of AI by Financial Institutions.
  5. MOSTI. Malaysian National AI Roadmap and Governance Framework Consultation Materials.