Scale any workload. Data or pipeline parallelism for throughput. Tensor parallelism for single-batch latency.
Vision: high-res, real-time. Tiling splits 4K+ images across cores. Low-latency inference at full resolution.
LLMs: saturate memory bandwidth. Model parallelism keeps latency tight as models grow.
Multi-chip and chiplet ready. Bridge clusters across dies. Same toolchain. Compiler-managed.
Scale array size, add cores, bridge chiplets—same software stack.
Up to 100 TOPS per core
Up to 800 TOPS per chiplet
Up to 6,400 TOPS per system
One architecture supports spatial, data, pipeline, and task parallelism—choose the pattern that fits.
Large input partitioned into tiles, processed in parallel. Adjacent cores exchange edge data via MLS.
Same model on each core, each processing different batches. Throughput scales linearly with core count.
Model layers partitioned across cores as pipeline stages. Maximizes weight bandwidth.
Each core runs a separate model or workload independently. No synchronization overhead.
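The spatial (tiled) pattern can be sketched in a few lines: each core takes a band of image rows plus a halo of edge rows from its neighbors, which is the data that would travel core-to-core over MLS. A minimal sketch; the function name and one-row halo are illustrative assumptions, not part of the toolchain.

```python
# Sketch of spatial tiling: one 4K frame split into row bands across N cores.
# Each core computes its own band but reads a halo of neighbor rows, so
# adjacent cores exchange only edge data rather than whole tiles.

def tile_bands(height, num_cores, halo):
    """Return (compute_start, compute_end, read_start, read_end) per core."""
    band = height // num_cores
    tiles = []
    for i in range(num_cores):
        c_start = i * band
        c_end = height if i == num_cores - 1 else c_start + band
        r_start = max(0, c_start - halo)   # halo rows from the core above
        r_end = min(height, c_end + halo)  # halo rows from the core below
        tiles.append((c_start, c_end, r_start, r_end))
    return tiles

# 2160-row (4K) frame across 8 cores, 1-row halo for a 3x3 kernel
for t in tile_bands(2160, 8, 1):
    print(t)
```

The compute ranges partition the frame exactly; only the read ranges overlap, and that overlap is the inter-core traffic.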
Homogeneous clusters of 2, 4, or 8 cores with direct L2↔L2 sharing.
QC-M Cluster
Key: MLS enables direct L2↔L2 access between cores; AXI Coalescer optimizes external memory bandwidth.
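A rough back-of-envelope model of the bandwidth at stake in the tiled-vision case: the halo rows neighbors exchange per 4K frame, which direct L2↔L2 access keeps on-chip. All constants here are illustrative assumptions, not device specifications.

```python
# Illustrative accounting of per-frame halo traffic between cores.

WIDTH = 3840            # 4K frame width, pixels
HALO_ROWS = 1           # rows exchanged per shared tile edge
BYTES_PER_PIXEL = 1     # 8-bit activations
CORES = 8               # 8 row bands -> 7 shared edges

edges = CORES - 1
halo_bytes = edges * 2 * HALO_ROWS * WIDTH * BYTES_PER_PIXEL  # both directions
print(f"halo traffic per frame: {halo_bytes} bytes")
# With direct L2-to-L2 sharing this traffic stays on-chip; without it,
# each halo row would be refetched through external memory instead.
```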
For Processor Architects
For Software Architects
Scale clusters through the customer's Network-on-Chip.
Multi-Cluster Architecture
Component Overview
Per Cluster
Use Case | Configuration | Peak Performance
TinyML / IoT | 1× QC-N | 1 TOPS
High-Volume Vision | 2× QC-P | 24 TOPS
Edge LLM | 4× QC-P | 48 TOPS
ADAS L2+ | 8× QC-U (1 chiplet) | 800 TOPS
Autonomous / L4+ | 64× QC-U (8 chiplets) | 6,400 TOPS
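The configurations above imply per-core peak figures (QC-N: 1 TOPS, QC-P: 12 TOPS, QC-U: 100 TOPS), which a simple sizing sketch can reproduce. The dictionary and function are illustrative; delivered throughput also depends on clock, precision, and utilization.

```python
# Peak-TOPS sizing derived from the configuration table above.
PER_CORE_TOPS = {"QC-N": 1, "QC-P": 12, "QC-U": 100}

def peak_tops(core, count):
    """Peak TOPS for `count` cores of the given type."""
    return PER_CORE_TOPS[core] * count

print(peak_tops("QC-P", 4))    # Edge LLM
print(peak_tops("QC-U", 8))    # ADAS L2+ (1 chiplet)
print(peak_tops("QC-U", 64))   # Autonomous / L4+ (8 chiplets)
```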
8-Chiplet System Architecture
From 1 TOPS edge devices to 6,400 TOPS autonomous systems—same architecture, same software.