Google DeepMind's new open-weight family runs fully offline on Android phones, Raspberry Pi boards, and workstations, and for the first time ships under a commercially permissive Apache 2.0 licence.
Google DeepMind on Thursday released Gemma 4, a family of four open-weight AI models designed for on-device agentic workflows. Released under an Apache 2.0 licence, the models run offline on smartphones and edge hardware and support more than 140 languages.
What Is Gemma 4 and Why It Matters
Gemma 4 is Google DeepMind's fourth generation of open-weight large language models, built on the same underlying research and architecture as Gemini 3, the company's flagship proprietary system released in late 2025. Unlike Gemini, which runs on Google's cloud infrastructure, Gemma 4 models are designed to be downloaded, modified, and deployed locally — on a developer's own hardware, without sending data to any external server.
The release arrives on April 3, 2026, and represents a significant escalation in the global race to make powerful AI models freely available. Since Google launched the first Gemma generation, developers have downloaded models from the family more than 400 million times, producing over 100,000 community variants collectively referred to as the Gemmaverse.
Four Models, One Mission: Intelligence at Every Scale
The Gemma 4 family launches with four distinct variants, each targeting a different hardware category. At the edge end, the Effective 2B (E2B) and Effective 4B (E4B) models are built for smartphones, Raspberry Pi units, and NVIDIA Jetson Orin Nano boards. Their names suggest 2 billion and 4 billion parameters respectively, but Google describes these figures as "effective" sizes: the footprint actually loaded at inference time, optimised for RAM and battery constraints.
For more capable machines, two larger variants cover higher-demand tasks. The 26-billion Mixture of Experts (MoE) model and the 31-billion Dense model are designed for workstations, GPU-equipped laptops, and cloud deployments. All four models process video, images, and text. The two smaller variants also handle audio inputs, enabling speech understanding without cloud connectivity.
Google partnered with Qualcomm Technologies and MediaTek on mobile hardware optimisations, ensuring near-zero latency on compatible Android devices. LiteRT-LM, Google's on-device inference library, can run the E2B model using under 1.5 gigabytes of memory — a threshold achievable on mid-range smartphones. On a Raspberry Pi 5, the same model processes 133 tokens per second at prefill and 7.6 tokens per second at decode.
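Those throughput figures translate directly into response times. As a back-of-envelope illustration (not an official benchmark tool), the quoted Raspberry Pi 5 numbers can be plugged into a simple latency estimate, where total time is prompt length divided by prefill speed plus output length divided by decode speed:

```python
def estimate_latency(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float = 133.0,
                     decode_tps: float = 7.6) -> float:
    """Estimate seconds to ingest a prompt and generate a reply,
    using the Raspberry Pi 5 E2B figures quoted above
    (133 tokens/s prefill, 7.6 tokens/s decode)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# A 400-token prompt with a 100-token answer:
seconds = estimate_latency(400, 100)
print(f"{seconds:.1f} s")  # roughly 16.2 s
```

The asymmetry is typical of on-device inference: prefill is parallel and fast, while decode is sequential and dominates total latency, so short answers matter more than short prompts on edge hardware.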
"Gemma 4 is our answer: breakthrough capabilities made widely accessible under an Apache 2.0 license." — Google DeepMind, Official Blog Post, April 3, 2026 [SOURCE: https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/]
Benchmarks, Rivals, and Where Gemma 4 Stands
Performance claims carry weight only when tested against independent benchmarks. On Arena AI's global text leaderboard — a crowd-sourced human preference evaluation platform — Gemma 4's 31B Dense model ranked third among all open models as of April 1, 2026. The 26B MoE variant ranked sixth. Both results are notable because the models outperformed significantly larger open-weight competitors, including systems with parameter counts exceeding 600 billion.
India Angle: On-Device AI and the Multilingual Opportunity
India represents the single largest growth market for mobile AI adoption in 2026. With approximately 750 million active smartphone users and 22 official languages recognised in the Constitution's Eighth Schedule, the country has long presented a structural challenge for AI companies: most frontier models train predominantly on English data and rely on cloud connectivity that remains inconsistent in Tier 2 and Tier 3 cities.
Gemma 4 addresses both gaps. Its training data spans over 140 languages, with Google explicitly citing improved multilingual and localised experiences as a core design objective. Because the E2B and E4B models run entirely offline on compatible Android devices, developers building for Hindi, Tamil, Telugu, Bengali, or Marathi speakers no longer require stable internet access for inference.
Google has also launched the Gemma 4 Good Challenge on Kaggle, inviting developers to submit applications demonstrating positive social impact. The competition opens an organised pathway for Indian developers to gain global visibility with locally focused AI tools, particularly in agriculture, healthcare, and public services, sectors where offline functionality and local language support are non-negotiable.
What Comes Next for the Gemmaverse
"Code you write today for Gemma 4 will automatically work on Gemini Nano 4-enabled devices that will be available later this year." — Android Developers Blog, Google, April 3, 2026 [SOURCE: https://android-developers.googleblog.com/2026/04/AI-Core-Developer-Preview.html]
Frequently Asked Questions
Q1: What is Google Gemma 4? A: Google Gemma 4 is a family of four open-weight AI models released by Google DeepMind on April 3, 2026. Built on the same research as Gemini 3, the models range from 2 billion to 31 billion parameters, support over 140 languages, process text, images, video, and audio, and run fully offline on devices including smartphones and Raspberry Pi.
Q2: Is Gemma 4 free to use commercially? A: Yes. Gemma 4 is released under the Apache 2.0 open-source licence, which allows free commercial use, modification, and redistribution without royalties. This marks a significant change from previous Gemma generations, which used a more restrictive custom Google licence that limited commercial deployment.
Q3: What are the four Gemma 4 model sizes? A: The four Gemma 4 models are the Effective 2B and Effective 4B, designed for smartphones and edge devices, and the 26-billion Mixture of Experts and 31-billion Dense models, built for workstations and high-performance hardware. All four handle text, images, and video. The two smaller models also process audio inputs.
Q4: How does Gemma 4 perform on benchmarks? A: On Arena AI's global text leaderboard as of April 1, 2026, Gemma 4's 31-billion Dense model ranked third globally among all open models, while the 26-billion Mixture of Experts variant ranked sixth. Both models outperformed significantly larger competitors — some with over twenty times the parameter count — in human preference evaluations.
Q5: Can Gemma 4 run on Android phones in India? A: Yes. Gemma 4's E2B and E4B models are optimised for Android devices through Google's AICore Developer Preview, using under 1.5 gigabytes of RAM in some configurations. The models support over 140 languages including major Indian languages, and run completely offline, making them viable for Tier 2 and Tier 3 Indian markets with inconsistent internet connectivity.


