🧠 AI Hardware: Mobile Processing

Neural Processing Units (NPUs): The AI Chip Revolution in Modern Smartphones

📅 February 19, 2026 ⏱️ 6 min read

Every new smartphone arrives with an ever-larger TOPS number for its NPU. But what does that figure actually mean? What exactly does an NPU do, how does it differ from a CPU and a GPU, and why is it fundamentally changing how we use our phones? In this technical analysis, we explain everything — from architecture to practical applications.

What Is an NPU?

The NPU (Neural Processing Unit) is a specialized processor designed exclusively for artificial intelligence and machine learning tasks. It belongs to the broader category of AI accelerators — specialized hardware that accelerates neural network inference, meaning the process of running an already-trained AI model.

Unlike a general-purpose CPU that executes individual instructions sequentially, or a GPU that parallelizes thousands of simple operations simultaneously, the NPU is optimized for matrix multiplication — the operation at the core of every neural network. It uses low-precision arithmetic (INT4, INT8, FP16) instead of FP32/FP64, achieving far greater energy efficiency.
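To make the precision trade-off concrete, here is a minimal sketch of symmetric INT8 quantization, the kind of conversion a deployment toolchain performs before a model runs on an NPU. The single-scale scheme and clamping range are illustrative assumptions, not any vendor's exact recipe:

```python
def quantize_int8(weights):
    """Map float weights to INT8 codes with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0  # use the full [-127, 127] range
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [qi * scale for qi in q]

weights = [0.42, -1.30, 0.07, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each weight now fits in 1 byte instead of 4 (FP32), at the cost
# of a rounding error bounded by half the quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

A 4x smaller weight means 4x less data moved from memory per inference, which is where most of the energy savings actually come from.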

CPU vs GPU vs NPU

CPU: General processor — executes everything, but sequentially. Ideal for logic, I/O, OS tasks.
GPU: Massive parallelization — thousands of cores for graphics & AI training. High power consumption.
NPU: AI inference specialist — matrix ops in INT8/FP16. Low power, massive efficiency.
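The "matrix ops in INT8" line above can be sketched in a few lines of pure Python. A real NPU performs the same multiply-accumulate (MAC) operation across thousands of hardware units in parallel; the key detail shown here is that 8-bit inputs are summed in a wide accumulator so intermediate results never overflow:

```python
def int8_matmul(A, B):
    """Multiply two INT8 matrices the way an NPU's MAC array does:
    8-bit inputs, with products summed in a wide (INT32-style)
    accumulator so intermediate sums cannot overflow."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # stands in for the INT32 accumulator register
            for p in range(k):
                acc += A[i][p] * B[p][j]  # one MAC: multiply, then accumulate
            C[i][j] = acc
    return C

A = [[127, -64], [3, 5]]
B = [[2, 1], [-1, 4]]
print(int8_matmul(A, B))  # [[318, -129], [1, 23]]
```

Every "operation" in a TOPS rating is essentially one of these multiplies or adds; the inner loop here is what the hardware replicates at massive scale.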

How We Measure Performance: TOPS

NPU performance is measured in TOPS (Trillions of Operations Per Second), which typically counts INT8 operations (8-bit integer multiplies and additions). TOPS isn't the only metric — efficiency per watt, model support, and software-stack quality matter just as much.

Today, a flagship smartphone delivers 35-50 TOPS on the NPU, while a mid-range phone ranges from 10-25 TOPS. Microsoft sets 40 TOPS as the minimum for Copilot+ PCs, indicating where the industry is heading.
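A quick back-of-envelope calculation shows why a TOPS rating alone can mislead. The sketch below compares two ceilings for on-device LLM token generation: what the math units could do at peak, and what memory bandwidth allows. All the numbers (a hypothetical 3B-parameter INT8 model, a 45-TOPS NPU, 60 GB/s of LPDDR bandwidth, ~2 operations per parameter per token) are illustrative assumptions:

```python
def decode_ceilings(params_billion, npu_tops, mem_gbps, bytes_per_param=1):
    """Two rough upper bounds on LLM tokens/second.
    Compute ceiling: ~2 ops (multiply + add) per parameter per token,
    against the NPU's peak TOPS.
    Bandwidth ceiling: every parameter (INT8 -> 1 byte) must be read
    from memory for each generated token, against DRAM bandwidth."""
    ops_per_token = 2 * params_billion * 1e9
    compute_limit = npu_tops * 1e12 / ops_per_token

    bytes_per_token = params_billion * 1e9 * bytes_per_param
    bandwidth_limit = mem_gbps * 1e9 / bytes_per_token
    return compute_limit, bandwidth_limit

# Hypothetical 3B INT8 model, 45-TOPS NPU, 60 GB/s memory bus:
compute, bandwidth = decode_ceilings(3, 45, 60)
# compute ceiling: 7500 tokens/s; bandwidth ceiling: 20 tokens/s.
# Generation is memory-bound, so the real limit is ~20 tokens/s,
# no matter how many TOPS the spec sheet advertises.
```

This is why vendors increasingly quote memory bandwidth and "tokens per second" alongside TOPS.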

NPU History in Mobile

The Beginning: 2017

The story begins in 2017, when two companies introduced dedicated AI hardware in smartphones almost simultaneously. Huawei announced the Kirin 970 at IFA 2017 with an NPU built on licensed Cambricon IP (Huawei's in-house Da Vinci architecture arrived later, with the Kirin 810 and 990), rated at 1.92 TFLOPS of FP16 throughput. A few weeks later, Apple presented the A11 Bionic with a 2-core Neural Engine rated at 600 billion operations per second (0.6 TOPS), designed for Face ID and Animoji.

The A11 was manufactured on TSMC's 10nm process, and its Neural Engine occupied just 1.83mm² of the die — but it wasn't open to third-party developers; only Apple's own software could use it.

The Evolution: 2018-2023

Apple moved ahead quickly. The A12 Bionic (2018) brought an 8-core Neural Engine rated at 5 TOPS — an 8x improvement — and opened Core ML to developers for the first time. This was followed by: A14 (16 cores, 11 TOPS), A15 (15.8 TOPS), A16 (17 TOPS), A17 Pro (35 TOPS).

Qualcomm integrated AI acceleration into the Hexagon DSP, which gradually evolved into a full Hexagon NPU in the Snapdragon 8-series. The Snapdragon 8 Gen 3 (2023) reached 45 TOPS, while Samsung followed with the Exynos 2400 (dual-core NPU). MediaTek introduced the APU 790 in the Dimensity 9300, and Google developed Tensor chips (G1-G4) with custom TPU-based AI cores.

Today: 2024-2026

In 2024-2026, every SoC company is battling over TOPS. Apple's A18 Pro delivers 35 TOPS with an improved Neural Engine, Qualcomm's Snapdragon 8 Elite (Gen 4) hits 75 TOPS, MediaTek's Dimensity 9400 reaches 50+ TOPS, and Google Tensor G5 (TSMC) brings custom AI cores designed for Gemini Nano on-device.

  • 0.6 TOPS - Apple A11 (2017)
  • 35 TOPS - Apple A18 Pro (2024)
  • 75 TOPS - Snapdragon 8 Elite
  • 58x improvement in 7 years

Practical NPU Applications on Mobile

NPUs aren't just numbers on spec sheets. They're already dramatically changing how we use our smartphones:

Photography & Video

  • Computational Photography: Night mode, HDR+, portrait blur — running dozens of neural networks on every photo
  • Real-time Object Detection: Scene recognition (food, pet, landscape) for automatic camera settings
  • AI Video Stabilization: Predictive motion compensation using neural networks
  • Background Blur in video: Real-time semantic segmentation at 30/60fps

Voice & Language

  • On-device Speech Recognition: Google Assistant, Siri, and Samsung Bixby run voice-to-text locally
  • Real-time Translation: Translation in real-time without internet — Apple Translate, Google Live Translate
  • Noise Cancellation: AI-powered noise filtering during calls
  • Smart Compose: Predictive text, grammar correction, tone suggestions

AI Assistants & On-Device LLMs

  • Gemini Nano: Google runs a 3.25B model locally on Pixel — summarization, smart replies, Magic Compose
  • Apple Intelligence: On-device Foundation Models for writing tools, image generation, Siri enhancement
  • Samsung Galaxy AI: Chat Assist, Note Assist, Generative Edit — partially runs on the NPU
  • Privacy: On-device processing means your data never leaves the device

"The shift to on-device AI isn't just about performance. It's about privacy — user data stays on the device, without being sent to cloud servers."

— Craig Federighi, SVP Software Engineering, Apple

How an NPU Is Programmed

NPUs aren't programmed directly like a CPU. Each manufacturer provides an SDK/framework:

  • Apple Core ML: Runs ONNX/TensorFlow/PyTorch models converted to Core ML format on the Neural Engine
  • Qualcomm SNPE/QNN: Snapdragon Neural Processing Engine SDK for Hexagon NPU
  • Google TensorFlow Lite (LiteRT): Cross-platform AI inference with NPU delegate support
  • MediaTek NeuroPilot: SDK for APU acceleration on Dimensity chips
  • Samsung ONE (On-device Neural Engine): Exynos NPU development kit

Models are often exchanged in the ONNX (Open Neural Network Exchange) format — an open standard enabling portability between different NPUs. The Khronos Group also maintains NNEF, a parallel standardization effort.

2026 Mobile NPU Comparison

  • 75 TOPS - Snapdragon 8 Elite (Qualcomm)
  • 50+ TOPS - Dimensity 9400 (MediaTek)
  • 38 TOPS - Apple M4 Neural Engine
  • 35 TOPS - Apple A18 Pro (iPhone)

Important note: TOPS isn't the only metric. Apple, despite lower numbers, often wins in real-world performance thanks to its Unified Memory Architecture and optimized Core ML framework. Qualcomm excels in raw throughput, while Google focuses on custom AI workloads (Gemini Nano, computational photography). TOPS function like megapixels in cameras — they measure something, but they don't tell the whole story.

The Future: NPU 2027+

The next generation of mobile NPUs is expected to bring dramatic changes:

  • 100+ TOPS on smartphones: Local execution of 7-13B parameter models in real time
  • FP4 & Mixed Precision: Even lower precision for even greater speed
  • Transformer-optimized NPU: Hardware designed specifically for attention mechanisms
  • Always-On AI: Continuous inference at ultra-low power (small NPU core always active)
  • Multi-modal AI: Simultaneous processing of text, image, audio, video in real-time

The trend is clear: the NPU is becoming the most important processor in the smartphone, surpassing the CPU and GPU in practical significance. Every interaction — from photography to the voice assistant, from the keyboard to notifications — will increasingly pass through AI processing.

Conclusion

The NPU isn't just another marketing buzzword. It's the core technology enabling on-device AI — face unlock, computational photography, real-time translation, voice assistants, and much more. Next time you buy a phone, the NPU's TOPS will be just as important as the CPU's GHz.

Tags: NPU, Neural Processing Unit, AI Chip, TOPS, Apple Neural Engine, Qualcomm Hexagon, Mobile AI, On-device AI