Groq

Groq, Inc.

CHIP DESIGNERS | 🇺🇸 US | Private
groq.com

Key Product

GroqChip LPU, GroqCloud AI inference service


Groq, Inc. is a private AI infrastructure company headquartered in Mountain View, California. It was founded in 2016 by Jonathan Ross, who had earlier helped create Google's first TPU. Groq's core premise is that LLM inference, unlike training, is typically memory-bandwidth-bound rather than compute-bound: the bottleneck is moving model weights from memory to the processing cores fast enough to keep up with sequential, token-by-token generation.

Groq designed the LPU (Language Processing Unit) as a software-programmable, deterministic dataflow processor that attacks this bottleneck architecturally. The LPU is based on what Groq's published work calls the Tensor Streaming Processor (TSP): a statically scheduled execution model in which the compiler determines at compile time exactly when each instruction executes, with no dynamic scheduling, no cache hierarchy, and no out-of-order execution hardware. This removes the main on-chip sources of non-deterministic latency (cache misses, dynamic arbitration, branch mispredictions) that cause high variance in GPU inference timing. The result is an inference processor whose execution time is known in advance, which allows per-token latencies of a few milliseconds on large models with very little run-to-run variance. End-to-end latency on a shared service still depends on queueing and load.

A single GroqChip holds about 230 MB of on-chip SRAM with roughly 80 TB/s of on-die memory bandwidth, in place of external HBM. Because the SRAM capacity per chip is small, large models are partitioned across many chips.

GroqCloud, Groq's public inference API service, became one of the most-cited benchmarks in the AI inference speed debate in early 2024, when it demonstrated Llama 2 70B inference at over 300 tokens per second per user, approximately 4–10× faster than comparable GPU-based inference services at the time. The per-user speed advantage comes from both the LPU's memory bandwidth architecture and Groq's compiler-optimized model serving pipeline.
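To make the bandwidth-bound argument concrete, here is a minimal back-of-the-envelope sketch, not Groq's own performance model. It assumes batch size 1, 2-byte (FP16) weights, and that every generated token streams the full set of weights from memory once; it ignores KV-cache traffic, batching, and chip-to-chip interconnect overhead. The function names and the 3,000 GB/s comparison figure are illustrative assumptions, not values taken from this briefing.

```python
def max_decode_tokens_per_s(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on single-stream decode rate when each token requires
    reading every weight from memory exactly once (assumed model)."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes


def required_bandwidth_tb_s(params_billion, bytes_per_param, tokens_per_s):
    """Aggregate memory bandwidth (TB/s) needed to sustain a target decode
    rate under the same one-pass-per-token assumption."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    return tokens_per_s * model_bytes / 1e12


# Hypothetical memory system with 3,000 GB/s of bandwidth, 70B FP16 model.
hbm_bound = max_decode_tokens_per_s(70, 2, 3000)

# Bandwidth needed for the 300 tokens/s figure quoted for Llama 2 70B.
needed = required_bandwidth_tb_s(70, 2, 300)

print(f"bound at 3,000 GB/s: {hbm_bound:.1f} tokens/s")
print(f"bandwidth for 300 tokens/s: {needed:.0f} TB/s")
```

Under these assumptions, a single stream at 300 tokens per second on a 70B-parameter FP16 model needs about 42 TB/s of aggregate weight bandwidth, summed over however many chips hold the weights. That gap is the motivation for keeping weights in on-chip SRAM and spreading a model across many chips.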
Groq raised $640 million in a Series D funding round in August 2024, led by BlackRock with participation from Samsung Catalyst Fund, Cisco Investments, and others, bringing its total funding to approximately $1.1 billion and valuing the company at $2.8 billion.

The first-generation GroqChip (LPU1) is fabricated by GlobalFoundries on a 14nm process. In 2023 Groq announced that its next-generation chip would be manufactured by Samsung Foundry on a 4nm process, so Samsung's participation in the funding round accompanies an existing manufacturing relationship. The LPU's SRAM-centric design, which uses distributed on-chip SRAM arrays rather than HBM stacks, means Groq does not depend on HBM supply from SK Hynix or Samsung, or on the advanced packaging that HBM requires. This differentiates it from GPU-based inference infrastructure and removes one layer of supply chain complexity.

Groq's target market is real-time AI inference applications where latency matters more than cost per token: voice AI, customer service agents, real-time translation, code completion, and enterprise applications requiring sub-second response times. The company is also pursuing defense and intelligence community contracts where deterministic latency is a mission-critical requirement, a use case where the LPU's predictable timing offers a meaningful advantage over GPU-based systems with their inherent scheduling variance. As LLM inference workloads grow faster than training workloads in the overall AI compute mix, Groq's inference-only architecture positions it as a complement to, rather than a replacement for, GPU-based training infrastructure.
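The deterministic-timing claim in the briefing can be illustrated with a toy model. This is only a sketch under stated assumptions, not Groq's compiler or instruction set: the five-op program, its cycle counts, and the cache-miss probability and penalty are all hypothetical. It contrasts a schedule fixed at compile time from known latencies with a machine whose loads may stall unpredictably.

```python
import random

# Hypothetical program: (op name, fixed latency in cycles, dependencies),
# listed in topological order.
PROGRAM = [
    ("load_w", 4, []),
    ("load_x", 4, []),
    ("matmul", 8, ["load_w", "load_x"]),
    ("bias", 2, ["matmul"]),
    ("act", 2, ["bias"]),
]


def static_schedule(program):
    """Assign every op a start cycle ahead of time from fixed latencies."""
    finish, schedule = {}, {}
    for name, latency, deps in program:
        start = max((finish[d] for d in deps), default=0)
        schedule[name] = start
        finish[name] = start + latency
    return schedule, max(finish.values())


def dynamic_run(program, rng, miss_prob=0.3, miss_penalty=20):
    """Toy cached machine: each load may incur a random stall (assumed values)."""
    finish = {}
    for name, latency, deps in program:
        start = max((finish[d] for d in deps), default=0)
        missed = name.startswith("load") and rng.random() < miss_prob
        finish[name] = start + latency + (miss_penalty if missed else 0)
    return max(finish.values())


schedule, static_total = static_schedule(PROGRAM)
rng = random.Random(0)
dynamic_totals = [dynamic_run(PROGRAM, rng) for _ in range(1000)]

print("static start cycles:", schedule)
print("static total cycles:", static_total)
print("dynamic total cycles: min", min(dynamic_totals), "max", max(dynamic_totals))
```

In the static case the total is a single number known before the program runs (16 cycles for this toy program), while the dynamic case produces a spread of totals. Real hardware is far more complex, but the same contrast underlies the predictable-latency argument.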

Critical path — raw silicon to deployment

FOUNDRIES

TSMC

CoWoS advanced packaging, N3/N2 logic

EDA TOOLS

Synopsys

Design Compiler (synthesis), PrimeTime (timing), VCS (simulation), IC Compiler II (place and route)

EDA TOOLS

Cadence

Virtuoso (analog), Genus (digital synthesis), Innovus (place and route), Tempus (timing signoff)

CHIP DESIGNERS

Groq

GroqChip LPU, GroqCloud AI inference service