
Vera Arrives: NVIDIA Vera CPU for agentic AI lands at top labs
NVIDIA is introducing Vera, a CPU purpose-built for AI factories, and positioning it as the control plane for agentic workloads, where its job is to keep accelerators fully utilized. The NVIDIA Vera CPU for agentic AI is engineered to feed large GPU clusters with a continuous stream of generated and compiled data, minimizing idle time and bottlenecks in training and serving pipelines [1].
Executive summary: What Vera is and why it matters for AI data centers
Vera targets agentic inference, reinforcement learning post-training, and real-time analytics, with an emphasis on high single-thread performance, deterministic latency, and sustained throughput. NVIDIA is aligning Vera with rack-scale AI Factory configurations that pair the CPU closely with Rubin GPUs, which the company claims deliver higher density and efficiency than x86-based racks. OEM availability is expected in the second half of 2026 [1].
Architecture highlights: 88 Olympus cores, Spatial Multithreading, and coherency fabric
Vera uses 88 custom Olympus cores featuring Spatial Multithreading, which NVIDIA credits for strong single-thread performance and predictable latency. A second-generation Scalable Coherency Fabric connects the cores and is designed to hold performance steady when many cores are under load, while sustaining the per-core throughput that token generation and compilation stages in AI pipelines depend on [1].
Memory and I/O: 1.2 TB/s aggregate bandwidth and LPDDR5X SOCAMM
NVIDIA cites up to 1.2 TB/s of aggregate memory bandwidth, or roughly 14 GB/s per core. The platform uses LPDDR5X SOCAMM memory modules and a monolithic die with adjacent dielets, choices intended to improve IPC and energy efficiency in long-running, memory-intensive agent workflows. The goal is to stream data to GPUs without CPU-induced stalls that would reduce accelerator utilization [1].
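As a quick sanity check on the per-core figure, dividing the aggregate bandwidth evenly across the 88 cores gives about 13.6 GB/s, which rounds to the roughly 14 GB/s NVIDIA cites. The sketch below assumes decimal units (1 TB/s = 1,000 GB/s) and a perfectly even split across cores, which real workloads will not achieve; the constant and function names are illustrative, not from NVIDIA.

```python
# Back-of-envelope check of the per-core bandwidth figure cited in [1].
# Assumptions: decimal units (1 TB/s = 1000 GB/s) and an even split of
# aggregate bandwidth across all cores, which real workloads won't reach.

AGGREGATE_BW_GBPS = 1200.0  # 1.2 TB/s aggregate memory bandwidth
CORES = 88                  # Olympus cores per Vera CPU


def per_core_bandwidth(aggregate_gbps: float, cores: int) -> float:
    """Return the evenly divided bandwidth per core, in GB/s."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return aggregate_gbps / cores


bw = per_core_bandwidth(AGGREGATE_BW_GBPS, CORES)
print(f"{bw:.1f} GB/s per core")  # prints "13.6 GB/s per core"
```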
Platform pairings: Vera Rubin NVL72, HGX Rubin NVL8, and rack-scale AI factories
Vera underpins multiple platforms. Vera Rubin NVL72 racks tightly couple the CPU and Rubin GPUs for AI factories, while the HGX Rubin NVL8 platform pairs Vera host CPUs with Rubin GPUs over PCIe for AI inference, analytics, and enterprise HPC. NVIDIA also points to liquid-cooled CPU racks and flexible single- or dual-socket servers. At the rack scale, the company claims about 4x higher agentic sandbox density and 2x performance-per-watt versus x86-based racks for reinforcement learning post-training, agentic inference, and real-time analytics [1].
Why the NVIDIA Vera CPU for agentic AI targets AI factory bottlenecks
NVIDIA frames Vera as the CPU cornerstone that keeps GPUs busy by accelerating data generation and compilation between steps of agentic and RL-driven workflows. The design goal is to prevent stalls so accelerators do not sit idle, supporting training and serving clusters that can scale to very large footprints measured in hundreds of megawatts or more [1].
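Why CPU speed shows up as GPU utilization can be illustrated with a deliberately simple two-stage model, which is our own illustration rather than anything NVIDIA publishes. Each step needs some CPU-side work (generating or compiling the next batch) and some GPU work. If the CPU prepares step n+1 while the GPU runs step n, a step lasts as long as the slower stage, so any CPU time beyond the GPU time becomes GPU idle time. The function name and the millisecond values are hypothetical.

```python
def gpu_utilization(cpu_ms: float, gpu_ms: float, overlapped: bool = True) -> float:
    """Fraction of each step during which the GPU is busy.

    Hypothetical two-stage model: every step needs cpu_ms of CPU-side work
    (for example, generating or compiling the next batch) and gpu_ms of GPU work.
    When overlapped, the CPU prepares step n+1 while the GPU runs step n, so a
    step takes max(cpu_ms, gpu_ms). Otherwise the stages run back to back.
    """
    if cpu_ms < 0 or gpu_ms <= 0:
        raise ValueError("cpu_ms must be >= 0 and gpu_ms must be > 0")
    step_ms = max(cpu_ms, gpu_ms) if overlapped else cpu_ms + gpu_ms
    return gpu_ms / step_ms


# Fixed 10 ms of GPU work per step; shrink the CPU-side time.
for cpu_ms in (15.0, 10.0, 8.0):
    busy = gpu_utilization(cpu_ms, gpu_ms=10.0)
    print(f"CPU {cpu_ms:.0f} ms -> GPU busy {busy:.0%}")
# CPU 15 ms -> GPU busy 67%
# CPU 10 ms -> GPU busy 100%
# CPU 8 ms -> GPU busy 100%
```

In this model, once CPU-side time falls to or below GPU time, the GPU is never starved, and further CPU speedups stop raising utilization. Without overlap the CPU time always adds to every step (8 ms of CPU work against 10 ms of GPU work leaves the GPU busy only about 56% of the time), which is why pipelining and a fast CPU work together.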
Workloads and performance claims: agentic inference, RL post-training, and real-time analytics
The architecture is tuned for agentic pipelines where single-thread performance, deterministic latency, and high memory bandwidth directly impact token handling, code generation, and real-time decision loops. The same characteristics apply to reinforcement learning post-training and streaming analytics where CPU-side orchestration can otherwise gate GPU throughput. NVIDIA’s rack-level efficiency and density claims target these use cases rather than general-purpose CPU markets [1].
Operational and procurement implications for enterprises
Enterprises evaluating AI factories should weigh rack density, power and cooling requirements, and integration with Rubin GPUs. NVIDIA’s single-SKU Vera strategy simplifies configuration choices and aims to maximize single-thread speed rather than chase very high core counts. The company expects OEM systems to arrive in the second half of 2026, which sets procurement timelines for labs and hyperscalers planning deployments around these platforms [1][2].
Comparisons and decision criteria: Vera vs x86 for AI factories
NVIDIA positions Vera-based racks against x86 systems on agentic workloads, claiming roughly 2x performance-per-watt and 4x sandbox density at the rack level for reinforcement learning post-training, agentic inference, and real-time analytics. The relevant decision points are single-thread latency, memory bandwidth per core, and end-to-end GPU utilization rather than traditional core-count comparisons. NVIDIA’s plan to ship only one Vera SKU underscores that it is optimizing for peak per-thread throughput in AI factories rather than competing across the broader general-purpose CPU market [1][2].
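To see what the rack-level multipliers would mean for capacity planning, the sketch below applies them to a hypothetical x86 baseline. The baseline values (500 sandboxes per rack, 1.0 unit of work per second per watt, a 10,000-sandbox target, a 1,000,000 units-per-second throughput target) are placeholders invented for illustration, not measurements, and the 4x and 2x factors are NVIDIA's own claims for specific workloads [1].

```python
import math

# NVIDIA's claimed rack-level gains over x86-based racks [1]; vendor figures.
SANDBOX_DENSITY_GAIN = 4.0  # ~4x agentic sandboxes per rack
PERF_PER_WATT_GAIN = 2.0    # ~2x performance per watt

# Hypothetical x86 baseline: placeholder values, not measured data.
X86_SANDBOXES_PER_RACK = 500
X86_PERF_PER_WATT = 1.0     # abstract units of work per second, per watt


def racks_needed(sandboxes: int, sandboxes_per_rack: float) -> int:
    """Whole racks required to host the given number of sandboxes."""
    return math.ceil(sandboxes / sandboxes_per_rack)


def watts_for_throughput(throughput: float, perf_per_watt: float) -> float:
    """Power (W) needed to sustain a throughput at a given efficiency."""
    return throughput / perf_per_watt


target_sandboxes = 10_000
x86_racks = racks_needed(target_sandboxes, X86_SANDBOXES_PER_RACK)
vera_racks = racks_needed(target_sandboxes, X86_SANDBOXES_PER_RACK * SANDBOX_DENSITY_GAIN)

target_throughput = 1_000_000.0  # hypothetical units of work per second
x86_kw = watts_for_throughput(target_throughput, X86_PERF_PER_WATT) / 1_000
vera_kw = watts_for_throughput(target_throughput, X86_PERF_PER_WATT * PERF_PER_WATT_GAIN) / 1_000

print(f"racks: x86={x86_racks}, Vera={vera_racks}")      # racks: x86=20, Vera=5
print(f"power: x86={x86_kw:.0f} kW, Vera={vera_kw:.0f} kW")  # power: x86=1000 kW, Vera=500 kW
```

The two factors are separate claims: density describes how many sandboxes fit in a rack, while performance-per-watt describes the power needed for a given amount of work. A real plan would also be bounded by power delivery, cooling, and the GPU side of each rack.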
Sources
[1] NVIDIA Vera CPU Delivers High Performance, Bandwidth, and Efficiency for AI Factories | NVIDIA Technical Blog
https://developer.nvidia.com/blog/nvidia-vera-cpu-delivers-high-performance-bandwidth-and-efficiency-for-ai-factories/
[2] Nvidia will only produce one 88-core Vera CPU model — Jensen says the company will make billions of dollars from a single SKU | Tom’s Hardware
https://www.tomshardware.com/pc-components/cpus/nvidia-will-only-produce-one-88-core-vera-cpu-model-jensen-says-the-company-will-make-billions-of-dollars-from-a-single-sku
[3] GTC 2026: Ian Buck press Q&A transcript — VP of Hyperscale and HPC speaks out on shelving CPX and shipping LPU decode this year | Tom’s Hardware
https://www.tomshardware.com/tech-industry/gc-2026-press-q-and-a-transcript