What is AIWiki Malaysia?

AIWiki Malaysia is a free, open AI knowledge base covering artificial intelligence concepts, tools, models, and use cases — written specifically for Malaysian professionals and students. It is maintained by AITG Sdn Bhd, an AI company based in Penang.

Who maintains AIWiki Malaysia?

AIWiki Malaysia is maintained by AITG Sdn Bhd (Registration: 202601016521 (1678618-W)), an AI company headquartered in George Town, Penang, Malaysia. The editorial team continuously updates and expands the knowledge base.

Langfuse

6 min read · Last updated June 2026 · Infrastructure

Langfuse is an open-source platform for LLM engineering that provides observability, tracing, prompt management, evaluation, and dataset capabilities for teams developing applications built on large language models. It is designed to address the operational challenges that arise when deploying LLM-based systems in production — where debugging, cost management, quality measurement, and iterative improvement require specialised tooling beyond what standard application monitoring systems offer.

Langfuse was founded in 2023 by Maximilian Deichmann, Marc Klingen, and Clemens Rawert, and was part of Y Combinator's Winter 2023 cohort. It has grown to become one of the most widely adopted open-source LLM observability platforms. In 2025, ClickHouse — the open-source analytics database company — acquired Langfuse, signalling a long-term strategic investment in LLM data infrastructure.

Why LLM Observability Matters

Traditional software observability tools — metrics, logs, distributed traces — were designed for deterministic systems where outputs are predictable given known inputs. LLM applications are non-deterministic: the same prompt can produce different outputs depending on model version, temperature settings, and context window contents. Debugging why an LLM application produced an incorrect or harmful output, or why its quality degraded after a prompt change, requires capturing the full context of each model call — the exact prompt sent, the model response, any tool calls made, latency at each step, token counts, and associated costs.

Langfuse captures this information in structured traces that span entire LLM workflows, including chains, agents, retrieval-augmented generation pipelines, and any non-LLM steps such as database lookups or API calls. This observability layer makes LLM applications debuggable and auditable in the same way that distributed tracing (via OpenTelemetry or Jaeger) made microservice architectures observable.

Core Features

Tracing and Observability

Langfuse's tracing system captures hierarchical execution trees for LLM applications. A single user request may initiate a trace that contains spans for an embedding call, a vector database retrieval, an LLM generation, and a post-processing step — each with its own latency, input, output, token count, and cost. Traces are queryable by user, session, tag, or time range, enabling engineers to investigate specific failures or analyse performance patterns across large volumes of requests.
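
The hierarchical structure described above can be illustrated with a minimal sketch. It does not use the Langfuse SDK: the `Span` class, the `summarise` helper, and all latency, token, and cost figures are hypothetical, chosen only to show how per-step data rolls up into a per-request view.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in a trace (hypothetical structure, not the Langfuse SDK)."""
    name: str
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    children: list[Span] = field(default_factory=list)

    def walk(self):
        """Yield this span and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def summarise(root: Span) -> dict:
    """Roll token counts and cost up from every span in the tree."""
    spans = list(root.walk())
    descendants = spans[1:]
    slowest = max(descendants, key=lambda s: s.latency_ms) if descendants else None
    return {
        "spans": len(spans),
        "total_tokens": sum(s.input_tokens + s.output_tokens for s in spans),
        "total_cost_usd": round(sum(s.cost_usd for s in spans), 6),
        "slowest_step": slowest.name if slowest else None,
    }


# Illustrative trace for one user request; all numbers are made up.
trace = Span(
    name="user-request",
    latency_ms=1480.0,
    children=[
        Span("embed-query", latency_ms=40.0, input_tokens=12, cost_usd=0.000001),
        Span("vector-search", latency_ms=85.0),
        Span("llm-generation", latency_ms=1300.0,
             input_tokens=900, output_tokens=150, cost_usd=0.0042),
        Span("post-process", latency_ms=15.0),
    ],
)
summary = summarise(trace)
print(summary)
```

Keeping latency, tokens, and cost on each step rather than only on the request is what lets an engineer see that, in this made-up example, the generation step dominates both time and spend.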

Integration is available for major LLM frameworks and providers including OpenAI, Anthropic Claude, LangChain, LlamaIndex, LiteLLM, and any OpenTelemetry-compatible system. Most integrations require fewer than ten lines of code.

Prompt Management

Langfuse provides a centralised prompt registry where prompt templates are stored, versioned, and labelled (development, staging, production). Applications retrieve prompts at runtime via the Langfuse SDK rather than hardcoding them, enabling prompt iteration without code deployments. Server-side and client-side caching ensures that dynamic prompt retrieval does not add meaningful latency to production applications.

Prompt versioning enables controlled rollouts and easy rollback: teams can push an updated prompt to production and monitor its effect on quality metrics before fully replacing the previous version.
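
The interplay of versions, labels, and caching can be sketched in plain Python. This is an in-memory illustration under stated assumptions, not Langfuse's API: `PromptRegistry`, `CachedPromptClient`, the `support-reply` prompt, and the 60-second TTL are all hypothetical.

```python
import time


class PromptRegistry:
    """In-memory stand-in for a versioned prompt store (hypothetical)."""

    def __init__(self):
        self._versions = {}  # name -> list of templates; index + 1 is the version
        self._labels = {}    # (name, label) -> version number

    def publish(self, name, template, labels=()):
        """Store a new immutable version and optionally point labels at it."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        version = len(versions)
        for label in labels:
            self._labels[(name, label)] = version
        return version

    def promote(self, name, version, label):
        """Move a label to an existing version (used for rollout and rollback)."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self._labels[(name, label)] = version

    def fetch(self, name, label="production"):
        version = self._labels[(name, label)]
        return version, self._versions[name][version - 1]


class CachedPromptClient:
    """Client-side cache so runtime prompt lookups stay cheap (hypothetical)."""

    def __init__(self, registry, ttl_seconds=60.0, clock=time.monotonic):
        self._registry = registry
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # (name, label) -> (fetched_at, (version, template))

    def get(self, name, label="production"):
        key = (name, label)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no registry round trip
        value = self._registry.fetch(name, label)
        self._cache[key] = (now, value)
        return value


registry = PromptRegistry()
registry.publish("support-reply", "Answer politely: {question}", labels=["production"])
registry.publish("support-reply", "Answer politely and cite sources: {question}",
                 labels=["staging"])

fake_now = [0.0]  # controllable clock so the example is deterministic
client = CachedPromptClient(registry, ttl_seconds=60.0, clock=lambda: fake_now[0])

v_before, _ = client.get("support-reply")           # production points at version 1
registry.promote("support-reply", 2, "production")  # roll out version 2
v_cached, _ = client.get("support-reply")           # cache still serves version 1
fake_now[0] = 61.0
v_after, template = client.get("support-reply")     # TTL expired: version 2
registry.promote("support-reply", 1, "production")  # roll back
fake_now[0] = 122.0
v_rollback, _ = client.get("support-reply")         # back to version 1
```

The sketch also shows the trade-off of caching: a label change reaches a running application only after its cached entry expires, so the TTL bounds how quickly a rollout or rollback takes effect.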

Evaluations

Langfuse supports multiple evaluation methods to assess LLM output quality at scale. LLM-as-a-judge evaluation uses a secondary LLM call to score outputs against criteria such as correctness, faithfulness, and relevance. Code-based evaluators apply deterministic logic — for example, checking that an output is valid JSON or that a returned SQL query parses correctly. Human annotation workflows allow team members to manually label outputs through Langfuse's review interface. User feedback signals (thumbs up/down, star ratings) can be collected from production applications and correlated with trace data.
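
A code-based evaluator of the kind mentioned above can be as small as the following sketch. The function names, the required keys, and the 0.0/1.0 scoring convention are assumptions made for illustration; how such a score is attached to a trace depends on the SDK in use and is not shown here.

```python
import json


def is_valid_json(output: str) -> float:
    """Return 1.0 if the model output parses as JSON, else 0.0."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0


def has_required_keys(output: str, required=("answer", "sources")) -> float:
    """Return 1.0 only if the output is a JSON object containing every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(data, dict):
        return 0.0
    return 1.0 if all(key in data for key in required) else 0.0


def evaluate(output: str) -> dict:
    """Run every deterministic check and return named scores."""
    return {
        "valid_json": is_valid_json(output),
        "required_keys": has_required_keys(output),
    }


good = evaluate('{"answer": "Kuala Lumpur", "sources": []}')
partial = evaluate('{"answer": "Kuala Lumpur"}')
broken = evaluate('{"answer": "Kuala Lumpur"')  # missing closing brace
```

Because these checks are deterministic and cheap, they can run on every output, leaving LLM-as-a-judge and human review for criteria that code cannot assess.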

Evaluation scores are attached to individual traces and aggregated into dashboards showing quality trends over time, enabling teams to detect regressions introduced by model updates, prompt changes, or retrieval strategy modifications.
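
As a rough illustration of how per-trace scores become a regression signal, the sketch below averages scores by prompt version and flags a drop. The record layout, the sample values, and the 0.1 tolerance are hypothetical and do not reflect Langfuse's data model.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(scores, key):
    """Average the 'value' field of score records, grouped by the given key."""
    groups = defaultdict(list)
    for row in scores:
        groups[row[key]].append(row["value"])
    return {group: mean(values) for group, values in groups.items()}


def has_regressed(averages, baseline, candidate, tolerance=0.1):
    """True if the candidate's mean score is more than `tolerance` below the baseline's."""
    return averages[candidate] < averages[baseline] - tolerance


# Made-up correctness scores attached to six traces.
scores = [
    {"trace_id": "t1", "prompt_version": 1, "value": 1.0},
    {"trace_id": "t2", "prompt_version": 1, "value": 1.0},
    {"trace_id": "t3", "prompt_version": 1, "value": 0.0},
    {"trace_id": "t4", "prompt_version": 2, "value": 0.0},
    {"trace_id": "t5", "prompt_version": 2, "value": 1.0},
    {"trace_id": "t6", "prompt_version": 2, "value": 0.0},
]

averages = mean_score_by_group(scores, "prompt_version")
regressed = has_regressed(averages, baseline=1, candidate=2)
```

The same grouping could be keyed by day or model name instead of prompt version to produce the trend lines a dashboard would plot.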

Datasets

The datasets feature stores curated collections of input-output pairs — both historical production examples and hand-crafted test cases — that can be replayed against different prompt versions or models in an offline evaluation environment. This enables systematic benchmarking before deploying prompt or model changes, effectively providing a regression testing workflow for LLM applications.
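
The replay workflow can be sketched offline as follows. Everything here is an assumption made for illustration: the two-item dataset, the prompt templates, the exact-match evaluator, and in particular `fake_model`, a deterministic stand-in for a real LLM call so that the example runs without network access.

```python
ANSWERS = {"capital of Malaysia": "Kuala Lumpur", "capital of Japan": "Tokyo"}


def fake_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call (hypothetical behaviour)."""
    answer = next(a for q, a in ANSWERS.items() if q in prompt)
    if "answer only" in prompt:
        return answer
    return f"The answer is {answer}."


def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected else 0.0


def run_experiment(dataset, template, call_model, evaluate):
    """Replay every dataset item through one prompt version and score the outputs."""
    results = []
    for item in dataset:
        prompt = template.format(question=item["input"])
        output = call_model(prompt)
        results.append({
            "input": item["input"],
            "output": output,
            "score": evaluate(output, item["expected"]),
        })
    return results


def pass_rate(results) -> float:
    return sum(r["score"] for r in results) / len(results)


dataset = [
    {"input": "capital of Malaysia", "expected": "Kuala Lumpur"},
    {"input": "capital of Japan", "expected": "Tokyo"},
]
prompt_v1 = "What is the {question}?"
prompt_v2 = "What is the {question}? Reply with the answer only."

rate_v1 = pass_rate(run_experiment(dataset, prompt_v1, fake_model, exact_match))
rate_v2 = pass_rate(run_experiment(dataset, prompt_v2, fake_model, exact_match))
```

Running both prompt versions against the same fixed dataset isolates the effect of the prompt change, which is the property that makes this a regression test rather than an anecdotal comparison.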

Deployment Options

Langfuse can be deployed as a managed cloud service (Langfuse Cloud) with a free tier offering 50,000 observations per month, or self-hosted using Docker Compose for development environments and Kubernetes via Helm for production deployments. Self-hosting is common among enterprises with strict data residency requirements, as it keeps all trace data — which may contain sensitive prompt contents and user inputs — within the organisation's own infrastructure.

Malaysian Context — LLM Observability for Malaysian AI Development

Malaysian AI developers and enterprises building applications on large language models face the same production engineering challenges that Langfuse addresses — debugging unexpected outputs, managing prompt quality across releases, and controlling inference costs. As Malaysian organisations increase their adoption of LLM-based products through platforms such as Amazon Bedrock (available from AWS's Malaysia region) and Azure OpenAI Service (from Microsoft's Malaysia data centres), observability tooling becomes a necessary part of the production stack.

Malaysian digital banks including GXBank, AEON Bank, and Boost Bank, which operate under Bank Negara Malaysia's digital banking licences, are deploying conversational AI features for customer service and financial guidance. The ability to trace and audit LLM interactions is important for compliance with BNM's guidelines on responsible AI in financial services, which require institutions to maintain records of AI-assisted decisions and be able to explain outputs upon request.

For Malaysian AI startups operating under MDEC's Malaysia Digital programme, Langfuse's open-source self-hosted deployment option is particularly relevant: it eliminates per-observation SaaS costs during early-stage development and lets teams run observability infrastructure within their own cloud environments, which may be required for data residency compliance under the Personal Data Protection Act 2010 (PDPA).

Teragrid AI and similar Malaysian AI solution providers building RAG-based enterprise applications have used LLM observability platforms to demonstrate to enterprise clients that their AI systems are auditable and quality-controlled — an increasingly important procurement requirement as Malaysian enterprises develop AI governance policies aligned with the Malaysia AI Governance Framework published by MDEC.

The broader ASEAN AI development ecosystem, including developer communities in Singapore, Thailand, and Indonesia, actively uses Langfuse given its open-source accessibility, making it a relevant platform for Malaysian teams participating in cross-border AI projects and partnerships.

Tags: LLM observability, tracing, prompt management, open source, MLOps

Type	Open-source LLM engineering platform
Founded	2023
Founders	Maximilian Deichmann, Marc Klingen, Clemens Rawert
Licence	MIT (core), Enterprise edition available
Backed by	Y Combinator W23; acquired by ClickHouse
Related	LangSmith, Arize AI, Weights & Biases, LangChain