🔍 What Is Physical AI?
Artificial intelligence, as we knew it until recently, lived exclusively in the digital realm. LLMs (Large Language Models) handle words — one-dimensional tokens. Image models process pixels in two dimensions. But the real world is three-dimensional, with gravity, friction, collisions, and materials with wildly different physical properties.
Physical AI, by NVIDIA's definition, is AI that can perceive, reason about, interact with, and navigate the physical world. It's no longer enough for a model to “understand” a command — it must translate that understanding into physical motion: a hand grasping a glass, a robot climbing stairs, a drone dodging obstacles in real time.
"Everything that moves, or that monitors things that move, will be an autonomous robotic system. Autonomous vehicles, surgical rooms, warehouses, factories, entire smart cities will transform from static to autonomous systems — embodied by physical AI."
— Jensen Huang, CEO, NVIDIA

Huang describes this transition as the third chapter in the history of software: from Software 1.0 (code written by humans) to Software 2.0 (machine learning) and now to the point where “software writes software.” And NVIDIA, with over 2 million developers in its robotics ecosystem (as of August 2025), sits at the center of this revolution.
🏗️ The “Three Computers” — The Architecture
NVIDIA isn't simply selling chips for robots. It has designed a comprehensive three-stage architecture that covers every phase of a robot's lifecycle:
The Pipeline: Train → Simulate → Deploy
DGX — Training
NVIDIA DGX AI supercomputers train multimodal foundation models. Developers can start from Cosmos or GR00T as a base, or pre-train their own models from scratch.
RTX PRO / OVX — Simulation
Omniverse + Cosmos runs on RTX PRO 6000 Blackwell Servers. This is where synthetic data gets generated (images, depth maps, motion trajectories), policies are tested in digital twins via Isaac Sim, and reinforcement learning runs through Isaac Lab.
Jetson AGX Thor — Deployment
The on-robot computer. It runs multimodal AI models in real time: processing sensor data, reasoning, planning, and executing — all within milliseconds. Up to 2,070 TFLOPS FP4 at low power consumption.
What this means in practice: a startup no longer needs to build everything from scratch. It takes NVIDIA's foundation models, fine-tunes them on its own data, tests them in a virtual world, and ships them to the robot — the entire pipeline within a single ecosystem.
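The three-stage flow can be sketched as a plain Python structure. This is purely illustrative: every function and name below is a placeholder of my own, not an actual NVIDIA API, and the real stages run on DGX, Omniverse/Isaac, and Jetson Thor respectively.

```python
# Illustrative sketch of the Train -> Simulate -> Deploy pipeline.
# All names are hypothetical placeholders, not NVIDIA APIs.

def train_foundation_model(base: str, dataset: list[str]) -> dict:
    """Stage 1 (DGX): fine-tune a base model such as Cosmos or GR00T."""
    return {"base": base, "samples": len(dataset), "stage": "trained"}

def simulate_and_validate(model: dict, episodes: int) -> dict:
    """Stage 2 (RTX PRO / OVX): test the policy in a digital twin."""
    model["sim_episodes"] = episodes
    model["stage"] = "validated"
    return model

def deploy_to_robot(model: dict, target: str) -> dict:
    """Stage 3 (Jetson Thor): ship the validated policy on-device."""
    model["target"] = target
    model["stage"] = "deployed"
    return model

policy = deploy_to_robot(
    simulate_and_validate(
        train_foundation_model("GR00T-N1", ["demo_001", "demo_002"]),
        episodes=1_000_000,
    ),
    target="jetson-thor",
)
print(policy["stage"])  # deployed
```

The point of the shape, not the stubs: each stage consumes the previous one's artifact, and the whole chain stays inside one ecosystem.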
🎮 NVIDIA Isaac: The Robotics Platform
The Isaac platform is the backbone of NVIDIA's entire robotics ecosystem. It consists of simulation and learning frameworks, CUDA-accelerated libraries, AI models, and reference workflows. Rather than just listing names, here's what each piece actually does:
Isaac Sim is the virtual world. Built on Omniverse, it creates physically accurate environments where robots can test policies without risk — over and over, millions of times. It simulates sensors (cameras, lidar), generates synthetic training data, and runs software-in-the-loop testing.
Isaac Lab is the lightweight, open-source training application. Optimized for reinforcement learning (RL) and imitation learning (IL), it enables foundation model training within physically accurate scenes. This is what bridges the “sim-to-real gap” — the chasm between virtual performance and real-world capability.
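The kind of reward-driven training Isaac Lab accelerates can be illustrated at toy scale. The sketch below uses simple random-search optimization on a one-dimensional reaching task; it stands in for the GPU-parallel RL that Isaac Lab actually performs and uses no Isaac APIs at all.

```python
import random

# Toy stand-in for RL in simulation: learn a gain 'k' so that a
# 1-D "arm" moves from 0 toward a target. Real Isaac Lab training
# runs thousands of physics-accurate environments in parallel on GPU.

def rollout(k: float, target: float = 1.0, steps: int = 20) -> float:
    """Simulate one episode; return negative final error as reward."""
    x = 0.0
    for _ in range(steps):
        x += k * (target - x)      # proportional controller
    return -abs(target - x)        # higher is better

random.seed(0)
k, best = 0.0, rollout(0.0)
for _ in range(200):               # random-search "training loop"
    cand = k + random.gauss(0, 0.1)
    r = rollout(cand)
    if r > best:                   # keep improvements only
        k, best = cand, r

print(best > -0.01)  # the learned gain drives the error near zero
```

Replace the toy rollout with a physics-accurate simulated scene and the random search with a proper RL algorithm, and you have the sim-side half of the sim-to-real story.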
Isaac ROS builds on the open-source ROS 2 (Robot Operating System) and delivers GPU-accelerated perception packages: nvblox (3D reconstruction — 100x faster than CPU), cuVSLAM (visual SLAM), FoundationPose (6D object pose estimation), FoundationStereo (depth estimation), cuMotion (motion planning), and SyntheticaDETR (object detection).
🌍 Cosmos: The World as a Foundation Model
If Isaac gives robots a place to train, Cosmos gives them imagination. Announced at CES 2025 (January 6, 2025), it swept the Best AI and Best Overall categories of the Best of CES Awards.
Cosmos is a platform of World Foundation Models (WFMs) — neural networks that predict and generate physics-aware videos of the future state of virtual environments. In plain terms: give the model a frame, and it shows you what happens next, while respecting the laws of physics.
Cosmos models come in three tiers: Nano (optimized for real-time edge deployment), Super (high-performance baselines), and Ultra (maximum quality for distilling custom models). They support text-to-world and video-to-world generation, while Cosmos tokenizers achieve 12x faster processing than state-of-the-art tokenizers. A striking detail: processing 20 million hours of data takes 40 days on Hopper GPUs or just 14 days on Blackwell GPUs — on a CPU, it would take over 3 years.
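The core idea of a world model, predicting the next state while respecting physics, can be shown with a toy example. This is plain hand-coded Python, not a Cosmos API: a real WFM is a learned neural network that infers dynamics like these from video at scale.

```python
# Toy "world model": given the current state of a falling ball,
# predict the next frame's state under gravity. Cosmos WFMs learn
# this kind of physics-consistent prediction; here it is hard-coded.

G = 9.81          # gravity, m/s^2
DT = 1.0 / 30.0   # one video frame at 30 fps

def predict_next(state: dict) -> dict:
    """Roll the physical state forward by one frame."""
    vy = state["vy"] + G * DT
    y = max(0.0, state["y"] - vy * DT)   # floor at y = 0
    return {"y": y, "vy": 0.0 if y == 0.0 else vy}

# Autoregressive generation: feed each prediction back in,
# the way a WFM extends a video clip frame by frame.
state = {"y": 2.0, "vy": 0.0}
for frame in range(120):                  # 4 seconds of "video"
    state = predict_next(state)

print(state["y"])  # 0.0 -- the ball has landed, as physics demands
```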
Cosmos models are released under NVIDIA's Open Model License (commercial use permitted) and are available on the NGC catalog and Hugging Face. Early adopters include 1X, Agility Robotics, XPENG, and Hillbot in robotics, plus Uber and Waabi in autonomous vehicles.
🤖 GR00T: A Brain for Humanoids
Project GR00T (Generalist Robot 00 Technology) is NVIDIA's most ambitious robotics initiative: an open foundation model designed specifically for humanoid robots. First unveiled at GTC 2024, its first functional version — GR00T N1 — launched on March 18, 2025, at GTC 2025.
GR00T N1's architecture draws inspiration from human cognition through a dual-system design. System 2 (a Vision-Language Model based on NVIDIA-Eagle with SmolLM-1.7B) interprets the environment through vision and language, handling reasoning and planning. System 1 (a Diffusion Transformer) translates those plans into continuous movements — think of System 2 deciding “grab the glass” and System 1 executing the actual hand motion.
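The two-system split can be sketched as a control loop. The stubs below are purely illustrative: the real System 2 is a vision-language model and System 1 a diffusion transformer, not hand-written lookup tables.

```python
# Illustrative dual-system loop: a slow planner (System 2) emits a
# high-level subgoal sequence, a fast controller (System 1) turns
# each subgoal into a chunk of continuous motion.

def system2_plan(instruction: str) -> list[str]:
    """Stand-in for the VLM: map language to a subgoal sequence."""
    if "glass" in instruction:
        return ["reach", "grasp", "lift"]
    return ["idle"]

def system1_act(subgoal: str) -> list[float]:
    """Stand-in for the diffusion policy: emit a joint-motion chunk."""
    motions = {
        "reach": [0.1, 0.2, 0.3],
        "grasp": [0.0, -0.1, 0.0],
        "lift":  [0.0, 0.0, 0.5],
    }
    return motions.get(subgoal, [0.0, 0.0, 0.0])

plan = system2_plan("grab the glass")          # slow reasoning loop
trajectory = [system1_act(g) for g in plan]    # fast control loop
print(plan)  # ['reach', 'grasp', 'lift']
```

The essential design choice the sketch preserves: the two systems run at different rates, with deliberate reasoning decoupled from reactive motor control.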
Development moved fast: GR00T N1 (2B parameters) was tested on Fourier GR-1 and 1X Neo robots, followed by N1.5 (optimized for Jetson Thor, delivering 2.74x speedup), and most recently — January 2026 — GR00T N1.6 (3B parameters), available on Hugging Face.
The Power of Synthetic Data
The GR00T Blueprint generated over 750,000 synthetic trajectories in just 11 hours — equivalent to 6,500 hours (~9 continuous months) of human demonstration. Combining synthetic and real data improved performance by 40% compared to real data alone.
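The scale claim is easy to sanity-check with two divisions:

```python
# Quick check on the synthetic-data arithmetic from the text.
hours_equivalent = 6_500          # human-demonstration hours replaced
days = hours_equivalent / 24      # continuous days of demonstration
speedup = hours_equivalent / 11   # generated in 11 hours instead

print(round(days))     # 271 days, roughly nine continuous months
print(round(speedup))  # ~591x faster than human demonstration
```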
In real-world testing, GR00T N1 2B achieved an average success rate of 76.8% (with full data) — versus 46.4% for the Diffusion Policy baseline.
NVIDIA didn't stop at the model. It built complete workflows: GR00T-Teleop for collecting human demonstrations, GR00T-Mimic for multiplying trajectories, GR00T-Dreams for generating scenarios through Cosmos, GR00T-Gen for producing simulation-ready environments (2,500+ 3D assets, 150+ object categories), GR00T-Dexterity for end-to-end grasping, GR00T-Mobility for locomotion, and GR00T-Control for whole-body control across 19 degrees of freedom.
💻 Jetson Thor: The Brain Inside the Robot
If the virtual world handles training, at some point that knowledge needs to run inside the machine. That's where Jetson Thor comes in — the next-generation SoC (System-on-Chip) that shipped in August 2025, designed exclusively for robots.
The flagship T5000 module is built on the Blackwell architecture: 2,560 GPU cores, 96 fifth-generation Tensor Cores, 128 GB of LPDDR5X memory, and 2,070 TFLOPS in Sparse FP4. Compared to its predecessor (Jetson AGX Orin), it delivers 7.5x more AI compute, 3.5x better energy efficiency, and 5x improvement on generative AI models — all within a 40–130W power envelope.
Jetson Thor vs. Jetson AGX Orin
| Specification | Jetson Thor T5000 | Jetson AGX Orin |
|---|---|---|
| AI Performance | 2,070 TFLOPS (Sparse FP4) | ~275 TOPS (Sparse INT8) |
| GPU Architecture | Blackwell | Ampere |
| Tensor Cores | 96 (5th gen) | 64 (4th gen) |
| Memory | 128 GB LPDDR5X | 64 GB LPDDR5 |
| CPU | 14-core Arm Neoverse-V3AE | 12-core Arm Cortex-A78AE |
| Power | 40–130W | 15–60W |
| Price (Dev Kit) | $3,499 | ~$1,999 |
Who's already using it? Agility Robotics will run Jetson Thor in its 6th-generation Digit, Boston Dynamics is integrating it into Atlas, and Dexmate is transitioning from Orin to Thor for the Vega humanoid. In academia, Stanford, Carnegie Mellon, and the University of Zurich are already working with it. The broader ecosystem includes over 1,000 hardware, software, and sensor partners.
🔄 Sim-to-Real: From Virtual Worlds to the Real Thing
The perennial question in robotics: if a robot learns something in simulation, will it actually work in reality? NVIDIA answers with a pipeline designed to minimize that gap.
The process starts with a small number of human demonstrations via teleoperation (even using Apple Vision Pro). Those few examples get multiplied through GR00T-Mimic into millions of synthetic trajectories. Simultaneously, GR00T-Gen produces diverse environments in OpenUSD (over 2,500 3D assets across 150+ categories), and GR00T-Dreams uses Cosmos WFMs to generate scenarios that never existed in the original data.
The results speak for themselves: in GR00T-Mobility, models trained entirely on photorealistic synthetic data from Isaac Sim achieved zero-shot sim-to-real transfer — deployment without any modifications — on real robots of different form factors (differential drive, Ackermann steering, quadrupeds, humanoids). In GR00T-Dexterity, policies trained via RL in Isaac Lab produced robust real-world policies within just a few hours.
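The multiplication step, turning a handful of demonstrations into many varied synthetic ones, can be sketched as simple perturbation-based augmentation. This is illustrative only; GR00T-Mimic's actual trajectory generation is far more sophisticated than waypoint jitter.

```python
import random

# Toy version of demonstration multiplication: take a few teleoperated
# trajectories and spawn many randomized variants (jittered waypoints),
# in the spirit of scaling dozens of demos into millions.

def augment(demo: list[float], n_variants: int, noise: float = 0.02,
            seed: int = 0) -> list[list[float]]:
    """Produce n_variants noisy copies of one demonstration."""
    rng = random.Random(seed)
    return [
        [wp + rng.gauss(0.0, noise) for wp in demo]
        for _ in range(n_variants)
    ]

human_demos = [[0.0, 0.5, 1.0], [0.0, 0.4, 0.9]]   # two teleop demos
synthetic = [t for d in human_demos for t in augment(d, 1000)]

print(len(synthetic))  # 2000 synthetic trajectories from 2 demos
```

Combine this kind of multiplication with randomized environments and generated scenarios, and the training set diverges far enough from any single demonstration to generalize.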
🤝 Who Uses NVIDIA Robotics?
The roster of companies relying on the NVIDIA Isaac/GR00T ecosystem reads like a who's who of the humanoid robotics world: Agility Robotics, Boston Dynamics, 1X Technologies, Apptronik, Fourier, Unitree, XPENG Robotics, Sanctuary AI, Neura Robotics, Skild AI, Mentee Robotics, Galbot.
But it goes well beyond humanoids. Diligent Robotics uses Isaac Sim for its Moxi hospital assistants (~100 deployed). Serve Robotics is targeting 2,000 delivery robots. Carbon Robotics builds autonomous agricultural machines packing 24 GPUs. Foxconn runs factory digital twins. Amazon Robotics already operates autonomous warehouse orchestration.
🧪 Newton: The New Physics Engine
Special mention goes to Newton — a new open-source physics engine co-developed with Google DeepMind and Disney Research, managed by the Linux Foundation. Built on NVIDIA Warp and OpenUSD, Newton is GPU-accelerated, extensible (with pluggable custom solvers), and differentiable — meaning it accelerates training, design optimization, and system identification.
The practical promise: simulations that used to take days now finish in minutes. Newton is compatible with Isaac Lab and MuJoCo Playground (via MuJoCo Warp), and its beta is already available on GitHub.
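What “differentiable” buys you can be shown at toy scale: if the simulator exposes gradients of outcomes with respect to physical parameters, you can fit those parameters directly, which is exactly the system-identification use case. The sketch below approximates the gradient with finite differences on a damped point mass, in plain Python; Newton itself provides true gradients via Warp's autodiff, not this workaround.

```python
# Toy system identification with a gradient-based loop: estimate an
# unknown damping coefficient by descending on simulation error.
# Newton exposes real gradients; here we use finite differences.

def simulate(damping: float, v0: float = 1.0, steps: int = 50,
             dt: float = 0.02) -> float:
    """Return the final velocity of a damped point mass."""
    v = v0
    for _ in range(steps):
        v -= damping * v * dt
    return v

TRUE_DAMPING = 0.8
observed = simulate(TRUE_DAMPING)   # the "real-world" measurement

d, lr, eps = 0.1, 2.0, 1e-5
for _ in range(200):                # gradient-descent identification
    loss = (simulate(d) - observed) ** 2
    grad = ((simulate(d + eps) - observed) ** 2 - loss) / eps
    d -= lr * grad

print(abs(d - TRUE_DAMPING) < 0.01)  # recovered the parameter
```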
📊 Numbers That Turn Heads
Goldman Sachs projects the humanoid robot market will be worth $38 billion by 2035 — a 6x increase over earlier estimates. Morgan Stanley goes further still: $5 trillion by 2050, with roughly 1 billion humanoid robots on Earth. And NVIDIA, selling the picks and shovels of this gold rush, occupies a unique position.
🎯 Why This Matters
NVIDIA doesn't need to build a single robot to dominate robotics. Its strategy — “build the tools and let others build the robots” — closely mirrors what it did with GPUs for gaming and later with data centers for AI training. Every company building robots needs simulation, training infrastructure, and on-device inference. NVIDIA provides all three.
Physical AI isn't a future technology — it's happening now. Factories already run digital twins, robots already train in Isaac Sim, foundation models already execute on Jetson Thor. The question is no longer whether robots will learn to operate in the real world, but how fast.
"Everything will be represented in a virtual twin."
— Jensen Huang, 3DEXPERIENCE World, February 2026

And NVIDIA, with a $3.5 trillion market cap and technology spanning from GPUs to foundation models, may be the only company that can truly turn that promise into reality.
