NVIDIA Cosmos Foundation Models — Scale Synthetic Data & Physical AI

Split-screen of an Omniverse OpenUSD scene and photorealistic output from NVIDIA Cosmos foundation models with spatiotemporal control maps

By Agustin Giovagnoli / March 14, 2026

High-fidelity synthetic data is rapidly becoming foundational for robotics and autonomous vehicles. NVIDIA’s latest effort centers on world foundation models (WFMs) designed to scale realistic data generation and physical AI reasoning. The NVIDIA Cosmos foundation models target workflows where teams need controllable, photorealistic video and robust evaluation of physical plausibility—key to closing the sim-to-real gap for production systems [1].

Inside the NVIDIA Cosmos foundation models

Cosmos is a family of large, pretrained multimodal models that represent and generate world states as video, integrating perception, prediction, and generation into a single video-centric framework. These WFMs are trained on over 20 million hours of robotics and driving data to respect spatial, temporal, and physical constraints while predicting future scenes, transforming environments, and assessing physical plausibility [1].

The family organizes around three pillars:

  • Cosmos Transfer for converting structured or simulated inputs into photorealistic, physics-aware video while preserving scene semantics.
  • Cosmos Predict for forecasting future frames or entire sequences from text, images, or video.
  • Cosmos Reason for evaluating physical plausibility and curating synthetic datasets.

Together, they provide an end-to-end path to create, adapt, and evaluate synthetic datasets tuned to specific robots or AV stacks [1][2].
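To make the division of labor concrete, here is a minimal toy sketch of how the three pillars compose into one pipeline. The function names and data records are hypothetical stand-ins, not the actual Cosmos APIs (which operate on video tensors; see the Cosmos Cookbook for real usage):

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins for the three Cosmos pillars; the real models
# consume and produce video, not these toy records.

@dataclass
class Clip:
    frames: List[str]          # placeholder for rendered frames
    plausibility: float = 0.0

def transfer(sim_clip: Clip) -> Clip:
    """Cosmos Transfer role: simulated input -> photorealistic clip (stub)."""
    return Clip(frames=[f"photoreal:{f}" for f in sim_clip.frames])

def predict(clip: Clip, horizon: int) -> Clip:
    """Cosmos Predict role: extend a clip with forecast frames (stub)."""
    return Clip(frames=clip.frames + [f"pred:{i}" for i in range(horizon)])

def reason(clip: Clip) -> Clip:
    """Cosmos Reason role: score physical plausibility (stub heuristic)."""
    clip.plausibility = 1.0 if clip.frames else 0.0
    return clip

def pipeline(sim_clip: Clip, horizon: int = 4) -> Clip:
    # Create (Transfer) -> extend (Predict) -> evaluate (Reason)
    return reason(predict(transfer(sim_clip), horizon))

out = pipeline(Clip(frames=["sim:0", "sim:1"]))
print(len(out.frames), out.plausibility)  # 6 frames, plausibility 1.0
```

The point of the sketch is the ordering: generation and forecasting happen before evaluation, so Reason acts as a gate on everything the first two stages produce.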

Cosmos Transfer: From Omniverse/OpenUSD to photorealistic, physics-aware video

Cosmos Transfer converts structured or simulated inputs—often created in NVIDIA Omniverse using OpenUSD—into photorealistic video that maintains scene structure and leverages pretrained world knowledge. It uses spatiotemporal control maps to align synthetic and real camera views and dynamics, and systematically varies lighting, weather, assets, and other visual factors to bridge the sim-to-real gap. This enables controllable, scalable generation of training data from Omniverse OpenUSD assets without sacrificing core semantics [1].

For teams building dataset pipelines, Transfer’s controllability and physics awareness are crucial for producing diverse yet consistent “world states” that reflect real-world conditions while remaining aligned with ground-truth structure [1]. See also NVIDIA’s broader tooling ecosystem in NVIDIA Omniverse (external).
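A practical pitfall with control-map conditioning is letting modalities drift out of alignment with the video. The sketch below shows the kind of frame-alignment check a pipeline might run before generation; the per-frame "maps" here are just resolution tuples, an assumption made to keep the example dependency-free:

```python
from typing import Dict, List, Tuple

# Toy representation: each control modality (e.g. depth, segmentation)
# is a per-frame list of maps. Here a map is just its (height, width),
# a stand-in for real pixel data.

Frame = Tuple[int, int]

def validate_control_maps(video: List[Frame],
                          controls: Dict[str, List[Frame]]) -> None:
    """Raise if any control modality is not frame-aligned with the video."""
    for name, maps in controls.items():
        if len(maps) != len(video):
            raise ValueError(f"{name}: {len(maps)} maps for {len(video)} frames")
        for t, (frame, cmap) in enumerate(zip(video, maps)):
            if frame != cmap:
                raise ValueError(f"{name}: resolution mismatch at frame {t}")

video = [(720, 1280)] * 8
controls = {
    "depth": [(720, 1280)] * 8,
    "segmentation": [(720, 1280)] * 8,
}
validate_control_maps(video, controls)  # passes silently when aligned
```

A check like this is cheap insurance: misaligned conditioning tends to fail silently as degraded output rather than as an error.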

Cosmos Predict: Forecasting and closed-loop simulation

Cosmos Predict generates future video frames or full sequences from text, images, or video, supporting forecasting and closed-loop simulation for robots and autonomous vehicles. By producing rollouts that reflect scene dynamics, Predict can aid planners and enable robust policy evaluation across a spectrum of scenarios—helpful for stress-testing autonomy stacks before on-road or in-field deployment [1].
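The closed-loop idea can be illustrated with a deliberately tiny example: a policy acts, a world model predicts the next state, and the policy reads that prediction rather than a real sensor. Everything here is a toy (1-D scalar dynamics instead of video frames), standing in for Predict's rollouts:

```python
from typing import List

# Toy closed-loop evaluation: the "world model" predicts the next state
# from (state, action), so the policy can be tested without real trials.

def world_model(state: float, action: float) -> float:
    return state + 0.1 * action               # trivial forward dynamics

def policy(state: float, goal: float) -> float:
    return max(-1.0, min(1.0, goal - state))  # clipped proportional control

def rollout(state: float, goal: float, steps: int) -> List[float]:
    traj = [state]
    for _ in range(steps):
        state = world_model(state, policy(state, goal))
        traj.append(state)
    return traj

traj = rollout(state=0.0, goal=1.0, steps=30)
print(round(traj[-1], 3))  # converges toward the goal of 1.0
```

Swapping the toy dynamics for a learned video world model is what turns this loop into the kind of pre-deployment stress test described above.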

Cosmos Reason: Curating datasets and checking physical plausibility

Cosmos Reason focuses on understanding space, time, and physics to critique and curate synthetic datasets. It flags implausible or impossible events and guides dataset improvements, helping teams reduce label noise and edge-case failures in downstream training. This capability provides a feedback loop that strengthens both synthetic-data quality and model robustness for real-world tasks [2].
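To show what a curation pass looks like in practice, here is a crude heuristic stand-in for Reason's learned judgment: clips whose tracked object "teleports" between frames get a low plausibility score and are filtered out. The scoring rule and threshold are illustrative assumptions:

```python
from typing import Dict, List

# Toy curation: flag clips with physically implausible jumps between
# frames. A learned model (Cosmos Reason) replaces this heuristic.

def plausibility(positions: List[float], max_step: float = 2.0) -> float:
    """1.0 for smooth motion, decaying toward 0 as the worst jump grows."""
    worst = max((abs(b - a) for a, b in zip(positions, positions[1:])),
                default=0.0)
    return 1.0 if worst <= max_step else max_step / worst

def curate(clips: Dict[str, List[float]], threshold: float = 0.5) -> List[str]:
    return [name for name, pos in clips.items()
            if plausibility(pos) >= threshold]

clips = {
    "smooth": [0.0, 1.0, 2.0, 3.0],     # plausible motion
    "teleport": [0.0, 1.0, 9.0, 10.0],  # 8-unit jump -> implausible
}
print(curate(clips))  # ['smooth']
```

The structure (score, then filter against a threshold) is the part that carries over to real pipelines; only the scorer changes.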

Open Physical AI dataset: scale and components

Complementing the models, NVIDIA is releasing a large Open Physical AI dataset intended to accelerate research and productization. It includes thousands of hours of multicamera video, hundreds of thousands of trajectories, and up to 1,000 SimReady OpenUSD assets, providing standardized data for training and evaluation in robotics and AV workflows [3].

This dataset, combined with Cosmos checkpoints, tooling, and the Cosmos Cookbook, is designed to give teams both massive data and powerful models for generating and reasoning about realistic, diverse world states [1][3].

Practical workflows: from sim to product

A typical workflow:

  1. Build or import OpenUSD scenes in Omniverse.
  2. Use Cosmos Transfer to create photorealistic, physics-aware video while preserving scene semantics and aligning with real camera views via spatiotemporal control maps.
  3. Augment scenarios with Cosmos Predict to generate future rollouts for planning and closed-loop testing.
  4. Apply Cosmos Reason to evaluate physical plausibility, flag impossible events, and curate the dataset for downstream training.
  5. Perform task-specific post-training for particular robots, AV stacks, or operating environments using Cosmos checkpoints and the Cosmos Cookbook [1][2][3].


Business impact and ROI considerations

By scaling controllable, photorealistic data generation and embedding physical AI reasoning, Cosmos aims to reduce data collection costs, speed iteration, and improve model robustness. Teams can better align synthetic data with real-world viewpoints and conditions, evaluate plausibility before training, and target specific operating domains with task-focused post-training—all critical for production-grade robotics and AV systems [1][2][3].

Getting started

NVIDIA is making Cosmos checkpoints, tooling, and the Cosmos Cookbook available alongside the Open Physical AI dataset, providing a practical starting point for small-scale sim-to-real experiments. Teams can trial Transfer for photorealistic generation from OpenUSD, use Predict for scenario rollouts, and apply Reason for dataset critique before scaling to production [1][3].

Sources

[1] Scale Synthetic Data and Physical AI Reasoning with NVIDIA Cosmos World Foundation Models
https://developer.nvidia.com/blog/scale-synthetic-data-and-physical-ai-reasoning-with-nvidia-cosmos-world-foundation-models/

[2] Curating Synthetic Datasets to Train Physical AI Models with NVIDIA Cosmos Reason
https://developer.nvidia.com/blog/curating-synthetic-datasets-to-train-physical-ai-models-with-nvidia-cosmos-reason/

[3] NVIDIA Unveils Open Physical AI Dataset to Advance Robotics and Autonomous Vehicles
https://blogs.nvidia.com/blog/open-physical-ai-dataset/
