What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

What topics does AIWiki Malaysia cover?

AIWiki Malaysia covers a wide range of AI topics including large language models (LLMs), AI agents, machine learning fundamentals, prompt engineering, AI automation, generative AI tools, Malaysian AI regulations, local vendor landscape, and real-world AI use cases relevant to the Malaysian market.

How do I search for AI topics on AIWiki Malaysia?

You can use the search bar at the top of the site to find articles by keyword or topic. Articles are also organised by category, so you can browse by subject area such as Models, Tools, Concepts, or Use Cases.

Is AIWiki Malaysia available in Bahasa Malaysia?

Yes. AIWiki Malaysia publishes content in both English and Bahasa Malaysia to serve the full breadth of the Malaysian professional and student community. Language availability is indicated on each article page.

How can I submit a topic or suggest an article?

You can suggest topics or submit article ideas by contacting the AIWiki Malaysia team at admin@aiteragrid.com. AITG Sdn Bhd reviews all submissions and publishes content that meets editorial accuracy standards.

LangSmith

LangSmith is an observability, tracing, and evaluation platform from LangChain for debugging, monitoring, and continuously improving large language model and AI agent applications in production.

Last updated June 2026 · Companies & Tools

LangSmith is a software-as-a-service platform developed by LangChain, Inc. for tracing, debugging, evaluating, and monitoring applications built with large language models (LLMs) and AI agents. Although it was originally introduced as the production companion to the LangChain framework in 2023, it has since been generalised into a framework-agnostic observability layer that ingests traces from the OpenAI SDK, the Anthropic SDK, LlamaIndex, custom application code, and OpenTelemetry.

What problem it solves

LLM applications are notoriously hard to debug. A single user-facing response may be the result of dozens of prompt templates, retrieval calls, tool invocations, and agent decisions, each of which contributes latency, cost, and potential failure modes. Conventional application performance monitoring tools record HTTP requests and CPU metrics but cannot expose the prompt that was actually sent, the retrieved context, or the reasoning trace of an agent. LangSmith fills this gap by recording the inputs, outputs, intermediate steps, token usage, and metadata of every LLM and tool call, and by attaching them to a hierarchical trace that mirrors the execution graph of the application.
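A hierarchical trace like the one described above can be pictured as a tree of runs. Each run holds its own inputs, outputs, latency, and token usage, and totals roll up from child runs to the root. The sketch below is a minimal illustration under stated assumptions, not LangSmith's internal data model. The `Run` class, the `example-model` name, and its per-million-token prices are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical prices in USD per one million tokens; real prices differ by provider and model.
PRICES_PER_MILLION = {"example-model": {"input": 3.00, "output": 15.00}}


@dataclass
class Run:
    """One node of a trace: a chain step, LLM call, retrieval, or tool call (illustrative)."""
    name: str
    run_type: str  # e.g. "chain", "llm", "retriever", "tool"
    inputs: dict
    outputs: dict = field(default_factory=dict)
    latency_ms: float = 0.0
    model: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0
    children: list[Run] = field(default_factory=list)

    def own_cost(self) -> float:
        """Estimated cost of this run alone; runs without a model cost nothing here."""
        if self.model is None:
            return 0.0
        price = PRICES_PER_MILLION[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1_000_000

    def total_tokens(self) -> int:
        """Tokens used by this run and all of its descendants."""
        return (self.input_tokens + self.output_tokens
                + sum(child.total_tokens() for child in self.children))

    def total_cost(self) -> float:
        """Estimated cost of this run and all of its descendants."""
        return self.own_cost() + sum(child.total_cost() for child in self.children)


# A question-answering trace: the root chain run has a retrieval child and an LLM child.
trace = Run("answer_question", "chain", {"question": "What is the OPR?"}, latency_ms=1850.0,
            children=[
                Run("search_docs", "retriever", {"query": "OPR"}, latency_ms=120.0),
                Run("generate", "llm", {"prompt": "Answer using the retrieved context."},
                    latency_ms=1700.0, model="example-model",
                    input_tokens=1200, output_tokens=300),
            ])

print(trace.total_tokens())          # 1500
print(round(trace.total_cost(), 4))  # 0.0081
```

Keeping each run's own figures separate from its rolled-up totals lets a viewer show both "where the time and money went" at a glance and the exact prompt or retrieval result at any single step.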

Core capabilities

| Capability | Description |
|---|---|
| Tracing | Hierarchical capture of LLM, retrieval, and tool calls with inputs, outputs, latency, and cost |
| Monitoring | Dashboards for latency p50/p95, error rate, token spend, and custom metrics |
| Evaluation | Run datasets through chains and grade with LLM-as-judge, exact match, or human review |
| Prompt management | Versioned prompt registry with commit, tagging, and rollback |
| Annotation queues | Route traces to subject-matter experts for labelling |
| Dataset curation | Build evaluation sets from production traces of interest |
| Online evals | Continuously score live traffic on quality, safety, and policy compliance |
| Alerting | Notify on regressions in cost, latency, or quality |
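The monitoring and alerting rows above rest on simple statistics over trace latencies. As a rough sketch, p50 and p95 can be computed with the nearest-rank definition, and a latency regression can be flagged when the current p95 exceeds a baseline p95 by more than a tolerance. The percentile definition, the 20% tolerance, and the function names are assumptions for illustration; this is not how LangSmith computes its dashboards or alerts.

```python
from __future__ import annotations

import math


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in [0, 100]; one of several common definitions."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Rank is 1-based: the smallest value that covers q percent of observations.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_regressed(baseline_ms: list[float], current_ms: list[float],
                      tolerance: float = 0.20) -> bool:
    """Flag a regression when current p95 exceeds baseline p95 by more than `tolerance`."""
    return percentile(current_ms, 95) > percentile(baseline_ms, 95) * (1 + tolerance)


baseline = [float(ms) for ms in range(100, 1100, 100)]  # 100, 200, ..., 1000 ms
current = [ms * 1.5 for ms in baseline]                 # every call 50% slower

print(percentile(baseline, 50), percentile(baseline, 95))  # 500.0 1000.0
print(latency_regressed(baseline, current))                # True
```

Alerting on p95 rather than the mean keeps a handful of very slow agent runs from being averaged away, which matters for LLM workloads where tail latency is often dominated by long generations or retries.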

Instrumentation can be as light as wrapping a function with the @traceable decorator in Python (or the traceable wrapper in TypeScript), or as deep as exporting OpenTelemetry spans from any service. Traces include cost estimates for hundreds of model SKUs, which helps engineering teams compare spend across providers.
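To show roughly what decorator-based instrumentation does, here is a hypothetical traced decorator that records each call's inputs, output, error, and latency, and nests calls made inside other traced calls. It uses Python's contextvars to track the currently open run. This is a sketch of one plausible mechanism, not the LangSmith SDK: the real @traceable also handles export, sampling, and metadata. The names traced and COMPLETED_TRACES are hypothetical, and the in-memory list stands in for an exporter.

```python
from __future__ import annotations

import contextvars
import functools
import time
from typing import Any, Callable, Optional

# The run currently executing in this context; None when no traced call is open.
_current_run: contextvars.ContextVar[Optional[dict]] = contextvars.ContextVar(
    "current_run", default=None)
COMPLETED_TRACES: list[dict] = []  # hypothetical sink standing in for a trace exporter


def traced(fn: Callable) -> Callable:
    """Hypothetical decorator: records inputs, outputs, errors, latency, and nesting."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        parent = _current_run.get()
        run = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs},
               "outputs": None, "error": None, "children": []}
        if parent is not None:
            parent["children"].append(run)  # attach to the enclosing run
        token = _current_run.set(run)
        start = time.perf_counter()
        try:
            run["outputs"] = fn(*args, **kwargs)
            return run["outputs"]
        except Exception as exc:
            run["error"] = repr(exc)
            raise
        finally:
            run["latency_ms"] = (time.perf_counter() - start) * 1000
            _current_run.reset(token)  # restore the parent as the open run
            if parent is None:
                COMPLETED_TRACES.append(run)  # only finished root runs are "exported"
    return wrapper


@traced
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]


@traced
def answer(question: str) -> str:
    docs = retrieve(question)
    return f"Answer based on {len(docs)} document(s)"


answer("What is the OPR?")
root = COMPLETED_TRACES[-1]
print(root["name"], [child["name"] for child in root["children"]])  # answer ['retrieve']
```

Using a context variable rather than a global stack keeps nesting correct when traced functions run concurrently in threads or asyncio tasks, because each context sees its own "current run".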

Evaluation workflow

A typical LangSmith evaluation cycle begins with a dataset of inputs and expected behaviours, often bootstrapped from production traces. Engineers define evaluators, which may be heuristic (exact match, regex, BLEU), embedding similarity, or LLM-as-judge prompts that score correctness, helpfulness, and faithfulness. Experiments are run by replaying the dataset through one or more versions of the chain or agent; results are compared in a side-by-side view that shows per-example scores, latency, and cost. This pattern lets teams detect regressions before they ship and supports continuous improvement of prompts and retrievers.
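The evaluation cycle can be sketched in a few lines: a small dataset, two heuristic evaluators (exact match and a regex phrase match), and an experiment runner that replays the dataset through two versions of an application and averages each score. This is a hypothetical illustration of the pattern, not LangSmith's evaluation API. The dataset, the evaluator names, and the two "versions" (stubbed as lookup tables in place of real LLM calls) are assumptions, and embedding-similarity and LLM-as-judge evaluators are omitted because they need a model.

```python
from __future__ import annotations

import re
from statistics import mean
from typing import Callable

# Hypothetical dataset of inputs and expected answers (often curated from production traces).
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "largest Malaysian state by area", "expected": "Sarawak"},
    {"input": "expand OPR", "expected": "Overnight Policy Rate"},
]


def exact_match(output: str, expected: str) -> float:
    """Heuristic evaluator: 1.0 if the output equals the expected answer (case-insensitive)."""
    return float(output.strip().lower() == expected.strip().lower())


def contains_expected(output: str, expected: str) -> float:
    """Regex evaluator: 1.0 if the expected answer appears as a whole phrase in the output."""
    pattern = rf"\b{re.escape(expected)}\b"
    return float(re.search(pattern, output, flags=re.IGNORECASE) is not None)


EVALUATORS: dict[str, Callable[[str, str], float]] = {
    "exact_match": exact_match,
    "contains": contains_expected,
}


def run_experiment(target: Callable[[str], str]) -> dict[str, float]:
    """Replay every dataset example through one app version and average each evaluator."""
    scores: dict[str, list[float]] = {name: [] for name in EVALUATORS}
    for example in DATASET:
        output = target(example["input"])
        for name, evaluator in EVALUATORS.items():
            scores[name].append(evaluator(output, example["expected"]))
    return {name: mean(values) for name, values in scores.items()}


# Two hypothetical app versions, stubbed as lookup tables instead of real LLM calls.
V1 = {"2 + 2": "4", "largest Malaysian state by area": "Sabah",
      "expand OPR": "It stands for Overnight Policy Rate."}
V2 = {"2 + 2": "4", "largest Malaysian state by area": "Sarawak",
      "expand OPR": "Overnight Policy Rate"}

# Side-by-side comparison: v2 scores 1.0 on both evaluators; v1 is lower on both.
for label, answers in [("v1", V1), ("v2", V2)]:
    print(label, run_experiment(answers.__getitem__))
```

The two evaluators disagree on v1's verbose but correct OPR answer, which is a small example of why teams usually combine a strict check with a more lenient one (or an LLM judge) rather than relying on exact match alone.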

Deployment options

LangSmith is available as a fully managed cloud service, a Bring-Your-Own-Cloud (BYOC) deployment that keeps trace data inside the customer's account, and a self-hosted option for organisations with data-residency or regulatory constraints. The self-hosted edition is commonly chosen by financial services and government workloads in regulated markets.

Competitive landscape

LangSmith competes with Langfuse, Helicone, Arize AI (including its open-source Phoenix project), Weights & Biases Weave, Galileo, HoneyHive, and open-source stacks built on OpenTelemetry. Its differentiation comes mainly from depth of integration with the LangChain and LangGraph ecosystems, the maturity of its evaluation tooling, and its prompt management features.

Malaysian Context — LLM Adoption and Local Vendor Compliance

Malaysian organisations building generative AI applications increasingly need production-grade observability to satisfy internal risk, audit, and regulatory expectations. Bank Negara Malaysia (BNM), the Securities Commission Malaysia (SC), and the forthcoming Malaysian AI Governance Framework coordinated by MOSTI and the National AI Office expect operators of AI systems to maintain audit trails of model inputs and outputs, traceable explanations for decisions, and continuous monitoring for hallucination, bias, and policy violations. Platforms in the LangSmith category address several of these expectations directly: stored traces provide an audit trail of inputs and outputs, while online evaluations and alerting support continuous monitoring.

Local fintechs, digital banks (GXBank, AEON Bank, Boost Bank, Ryt Bank, KAF Digital Bank), and e-commerce players use LLMs for customer service triage, KYC document understanding, and internal knowledge search, and these workloads typically send traces to an observability platform to monitor accuracy and cost. The Personal Data Protection Act 2010 (PDPA), enforced by the Personal Data Protection Department (JPDP), restricts cross-border transfers of personal data, so the BYOC and self-hosted deployment options are particularly relevant for Malaysian operators whose traces contain MyKad numbers, financial records, or health data.
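Alongside choosing where trace data is stored, some operators mask identifiers before a trace leaves the application. The sketch below recursively replaces MyKad-like numbers (12 digits, commonly written YYMMDD-PB-NNNN) in a trace payload. It is a hypothetical helper, not part of LangSmith; the LangSmith documentation describes the SDK's own options for hiding or masking inputs and outputs, and a function like this would need to be wired into whichever hook the SDK provides. The pattern is deliberately simple: it will also mask any other 12-digit number, and it ignores numbers stored as integers. The sample IC number is fictitious.

```python
import re

# MyKad numbers are 12 digits, usually written YYMMDD-PB-NNNN; dashes are optional here.
MYKAD_PATTERN = re.compile(r"\b\d{6}-?\d{2}-?\d{4}\b")
MASK = "[REDACTED-MYKAD]"


def redact(value):
    """Recursively mask MyKad-like numbers inside strings, dicts, lists, and tuples."""
    if isinstance(value, str):
        return MYKAD_PATTERN.sub(MASK, value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, tuple):
        return tuple(redact(item) for item in value)
    return value  # numbers, None, and other types pass through unchanged


payload = {"inputs": {"message": "My IC is 900101-14-5678, please update my address."}}
print(redact(payload))
# {'inputs': {'message': 'My IC is [REDACTED-MYKAD], please update my address.'}}
```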

System integrators and AI consultancies based in Cyberjaya, Bangsar South, and Penang (including Fusionex, Silverlake Axis, Securemetric, Innov8tif, Naluri, and the Hermes Agent / Teragrid Agent partner ecosystem under AITG Sdn Bhd) increasingly bundle LLM observability into their delivery stack. MDEC and HRD Corp support relevant upskilling through claimable training programmes covering LLM evaluation, prompt engineering, and AI quality assurance.

References

LangChain, Inc. LangSmith Documentation.
LangChain, Inc. (2024). LangSmith General Availability Announcement.
OpenTelemetry Project. Specification for Generative AI Conventions.
Bank Negara Malaysia. (2024). Discussion Paper on the Use of AI by Financial Institutions.
MOSTI. Malaysian National AI Roadmap and Governance Framework Consultation Materials.

Tags: llm-observability, langchain, evaluation, tracing

Type	LLM and agent observability platform
Developed by	LangChain, Inc.
Launched	2023
SDKs	Python, TypeScript, Go, Java
Deployment	Cloud, BYOC, self-hosted
Related	LangChain, OpenTelemetry, Helicone, Langfuse
