AIWiki
Malaysia
Back to all articles
Companies & Toolstogether-aigpu-cloudinference

Together AI

5 min readUpdated July 2026
Together AI
Type
AI cloud and inference platform
Founded
2022
Focus
Open-source model hosting, training, GPU clusters
Infrastructure
NVIDIA H100, H200, B200, GB200 GPUs
Valuation
Around USD 3.3 billion (2025 Series B)
Related
Inference, GPU cluster, open-source models, Hugging Face

Together AI is a cloud computing company that offers infrastructure and services for building with artificial intelligence, with a particular emphasis on open-source models. Founded in 2022, the company operates what it describes as an AI acceleration cloud, spanning inference, training, fine-tuning, and access to raw GPU compute. It has positioned itself as an alternative to hosting models through the large hyperscale cloud providers, targeting developers and enterprises that want to run open-weight models with control over cost, performance, and data.

Platform and Services

Together AI's platform covers much of the AI development lifecycle. Its inference service provides API access to more than two hundred open-source models across modalities including text, code, image, audio, vision, and embeddings, letting developers call models such as those in the Llama, Qwen, and Mistral families without managing servers. A fine-tuning service allows customers to adapt these models to their own data, and the company markets tooling for agentic workflows, including built-in code execution, and for generating synthetic data.

For customers needing dedicated hardware, Together AI offers GPU clusters built on NVIDIA accelerators, including H100, H200, and the newer Blackwell-generation B200 and GB200 chips. In 2025 it introduced self-service provisioning so that teams could spin up GPU infrastructure directly, with pricing available on hourly, daily, and multi-month commitments. The company also operates a batch API priced below real-time inference for large, latency-tolerant workloads.

Technical Foundations

Part of Together AI's identity comes from its research heritage. The company's inference engine incorporates optimisations developed in the academic community, including the FlashAttention family of attention kernels and advanced quantisation techniques, which reduce the memory and compute cost of running large models. This research-led approach is used to justify claims of faster and cheaper inference than naive deployments, and it connects the company to the broader open-source efficiency ecosystem that has made running large models on commodity accelerators more practical.

Funding and Position in the Market

In early 2025 Together AI raised a Series B round of about 305 million US dollars, led by General Catalyst and co-led by Prosperity7, valuing the company at roughly 3.3 billion dollars. Investors included NVIDIA, Salesforce Ventures, Kleiner Perkins, and others, and the round was tied to plans for large deployments of Blackwell GPUs. The company sits within a competitive field of specialised AI clouds and inference providers that emerged to serve demand for open-model hosting, competing with hyperscalers on price and specialisation rather than breadth. Its focus on open weights appeals to organisations that prefer model ownership and portability over dependence on a single proprietary model provider.

Significance

Together AI illustrates a broader trend in the AI infrastructure market: the rise of intermediaries that make open-source models easy to consume at scale while abstracting away the complexity of GPU operations. For teams that want the flexibility of open weights but lack the expertise or capital to build their own serving stack, such platforms lower the barrier to production deployment. At the same time, the business is exposed to the volatile economics of GPU supply, model release cycles, and intense competition, factors that shape pricing and availability across the sector.

Together AI is relevant to Malaysian organisations weighing how to run AI workloads affordably while retaining control over their models and data. The company's emphasis on open-weight models aligns with a growing local preference for sovereign and portable AI, seen in national investment in local models such as MaLLaM and ILMU and in the build-out of domestic AI data centres by operators including YTL and through partnerships announced by the government. Open weights allow Malaysian firms to fine-tune and host models on infrastructure of their choosing rather than being locked to a single foreign provider.

Data residency is a central consideration. Under the Personal Data Protection Act and the oversight of CyberSecurity Malaysia, banks supervised by Bank Negara Malaysia, healthcare providers, and government agencies often need sensitive data to remain within national jurisdiction. Using open models that can be deployed on Malaysian or regional infrastructure, rather than sent to overseas proprietary APIs, helps satisfy these requirements, and platforms that specialise in open-model hosting fit that pattern even when the workload ultimately runs on local hardware.

For Malaysia's ambition, expressed through the MyDigital Blueprint and the National AI Office, to become an AI-ready nation by 2030, access to cost-effective GPU capacity and efficient inference is a practical constraint. MDEC-linked programmes and HRD Corp training help build the engineering skills needed to operate such infrastructure, and the arrival of global GPU-cloud investment in Southeast Asia is expanding the options available to Malaysian developers.

  1. Together AI. (2025). Announcing our 305M Series B. together.ai/blog.
  2. SiliconANGLE. (2025). Together AI launches self-service GPU infrastructure. siliconangle.com.
  3. Dao, T., et al. (2023). FlashAttention-2 and FlashAttention-3. arXiv.