
Gemma 3n AI model brings real-time multimodal power to mobiles
Gemma 3n, a new artificial intelligence model architected for mobile and on-device computing, has been introduced as an early preview for developers.
Developed in partnership with mobile hardware manufacturers, Gemma 3n is designed to support real-time, multimodal AI experiences on phones, tablets, and laptops. The model extends the capabilities of the Gemma 3 family by focusing on performance and privacy in mobile scenarios.
The new architecture was developed in collaboration with companies such as Qualcomm Technologies, MediaTek, and Samsung System LSI. The objective is to optimise the model for fast, responsive AI that runs directly on device rather than relying on cloud computing. This marks an extension of the Gemma initiative towards enabling AI applications in everyday devices, utilising a shared foundation that will underpin future releases across platforms such as Android and Chrome.
According to the announcement, Gemma 3n also forms the core of the next generation of Gemini Nano, which is scheduled for broader release later in the year, bringing expanded AI features to Google apps and the wider on-device ecosystem. Developers can begin working with Gemma 3n today as part of the early preview, helping them to build and experiment with local AI functionalities ahead of general availability.
The model has performed strongly in chatbot benchmark rankings. One chart included in the announcement ranks AI models by Chatbot Arena Elo scores, with Gemma 3n noted as ranking highly amongst both popular proprietary and open models. Another chart plots the performance of its mix-and-match configurations against model size.
Gemma 3n benefits from Google DeepMind's Per-Layer Embeddings (PLE) innovation, which substantially reduces RAM requirements. The model is available in 5 billion and 8 billion parameter versions, but, according to the release, it can operate with a memory footprint comparable to that of much smaller 2 billion and 4 billion parameter models, requiring as little as 2GB to 3GB of dynamic memory. This allows larger AI models to run on mobile devices, or to be streamed from the cloud, where memory overhead is often a constraint.
The company states, "Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage. While the raw parameter count is 5B and 8B, this innovation allows you to run larger models on mobile devices or live-stream from the cloud, with a memory overhead comparable to a 2B and 4B model, meaning the models can operate with a dynamic memory footprint of just 2GB and 3GB."
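As a rough illustration of that arithmetic, the sketch below treats per-layer embedding parameters as weights that can be streamed from slower storage rather than held in fast memory. The split between resident and streamed parameters is a hypothetical assumption for the example, not Gemma 3n's published breakdown.

```python
# Back-of-the-envelope sketch only: the parameter split below is a
# hypothetical assumption, not Gemma 3n's published breakdown.

def resident_footprint_gb(total_params_b: float,
                          streamed_params_b: float,
                          bytes_per_param: float = 1.0) -> float:
    """Estimate the fast-memory footprint when some parameters (e.g.
    per-layer embeddings) are streamed from slower storage on demand
    instead of being held resident."""
    resident_params_b = total_params_b - streamed_params_b
    # billions of parameters * bytes per parameter ~= gigabytes
    return resident_params_b * bytes_per_param

# If roughly 3B of a 5B-parameter model could be streamed this way, the
# resident footprint at one byte per parameter would be about 2GB, in
# line with the "2B-model-sized" figure quoted above.
print(resident_footprint_gb(total_params_b=5.0, streamed_params_b=3.0))  # 2.0
```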
Additional technical features of Gemma 3n include optimisations that allow the model to respond approximately 1.5 times faster on mobile devices than previous Gemma versions, with improved output quality and lower memory usage. The announcement credits innovations such as Per-Layer Embeddings, key-value cache (KVC) sharing, and advanced activation quantisation for these improvements.
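Of those techniques, activation quantisation is the most self-contained to illustrate. The toy sketch below shows generic per-tensor int8 quantisation of an activation tensor; it is not Gemma 3n's actual scheme, only the general idea of trading a little precision for memory and bandwidth.

```python
import numpy as np

# Generic per-tensor int8 activation quantisation: an illustration of
# the concept, not Gemma 3n's actual quantisation scheme.
def quantise_activations(x: np.ndarray):
    scale = float(np.max(np.abs(x))) / 127.0   # map the observed range onto int8
    scale = scale if scale > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

activations = np.random.randn(4, 8).astype(np.float32)
q, scale = quantise_activations(activations)
recovered = dequantise(q, scale)
print(np.max(np.abs(activations - recovered)))  # small quantisation error
```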
The model also supports what the company calls "many-in-1 flexibility." Within its 4B active memory footprint, Gemma 3n incorporates a nested submodel with a 2B active memory footprint, created through the MatFormer training process. This design allows developers to balance performance and quality without maintaining separate models, composing submodels on the fly to match a specific application's requirements. Upcoming technical documentation is expected to elaborate on this mix-and-match capability.
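The nested-submodel idea can be sketched in miniature: a smaller configuration reuses a prefix of the larger model's feed-forward weights, so one set of weights yields several effective sizes. The toy block below is purely illustrative, with invented dimensions, and is not Gemma 3n's actual architecture.

```python
import numpy as np

# Toy illustration of a MatFormer-style nested feed-forward block:
# smaller configurations reuse a prefix of the full weight matrices,
# so one checkpoint can run at several effective widths.
# All dimensions here are made up for the example.
rng = np.random.default_rng(0)
d_model, d_ff_full = 64, 256
w_in = rng.standard_normal((d_model, d_ff_full))
w_out = rng.standard_normal((d_ff_full, d_model))

def nested_ffn(x: np.ndarray, width_fraction: float) -> np.ndarray:
    """Run the feed-forward block using only a prefix of the hidden width."""
    d_ff = int(d_ff_full * width_fraction)
    h = np.maximum(x @ w_in[:, :d_ff], 0.0)   # ReLU on the truncated width
    return h @ w_out[:d_ff, :]

x = rng.standard_normal((1, d_model))
full_out = nested_ffn(x, 1.0)    # "larger" configuration
small_out = nested_ffn(x, 0.5)   # nested "smaller" configuration, same weights
```

Because the smaller configuration is literally a slice of the larger one, a runtime can pick a point on the quality and latency curve without shipping or loading a second set of weights, which is the trade-off the announcement describes.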
Security and privacy are also prioritised. The development team states that local execution "enables features that respect user privacy and function reliably, even without an internet connection."
Gemma 3n brings enhanced multimodal comprehension, supporting the integration and understanding of audio, text, images, and video. Its audio functionality supports high-quality automatic speech recognition and multilingual translation. Furthermore, the model can accept inputs in multiple modalities simultaneously, enabling the parsing of complex multimodal interactions.
The company describes the expansion in audio capabilities: "Its audio capabilities enable the model to perform high-quality Automatic Speech Recognition (transcription) and Translation (speech to translated text). Additionally, the model accepts interleaved inputs across modalities, enabling understanding of complex multimodal interactions." A public release of these features is planned for the near future.
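As a sketch of what interleaved input could look like once those features ship, the snippet below uses the google-genai Python SDK against Google AI Studio. The `gemma-3n-e4b-it` model identifier and the availability of audio input in the preview are assumptions, so treat it as the shape of the call rather than a confirmed recipe.

```python
from google import genai
from google.genai import types

# Hedged sketch: the model identifier below and audio support in the
# preview are assumptions, not confirmed details from the announcement.
client = genai.Client(api_key="YOUR_API_KEY")

audio = types.Part.from_bytes(data=open("meeting.wav", "rb").read(),
                              mime_type="audio/wav")
photo = types.Part.from_bytes(data=open("whiteboard.jpg", "rb").read(),
                              mime_type="image/jpeg")

# Interleaved prompt: text, audio, more text, then an image.
response = client.models.generate_content(
    model="gemma-3n-e4b-it",  # assumed preview identifier
    contents=[
        "Transcribe this recording and summarise the decisions made,",
        audio,
        "using the whiteboard photo below for extra context:",
        photo,
    ],
)
print(response.text)
```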
Gemma 3n features improved performance in multiple languages, with notable gains in Japanese, German, Korean, Spanish, and French. This is reflected in benchmark scores such as a 50.1% result on WMT24++ (ChrF), a multilingual evaluation metric.
The team behind Gemma 3n views the model as a catalyst for "intelligent, on-the-go applications." They note that developers will be able to "build live, interactive experiences that understand and respond to real-time visual and auditory cues from the user's environment," and design advanced applications capable of real-time speech transcription, translation, and multimodal contextual text generation, all executed privately on the device.
The company also outlined its commitment to responsible development. "Our commitment to responsible AI development is paramount. Gemma 3n, like all Gemma models, underwent rigorous safety evaluations, data governance, and fine-tuning alignment with our safety policies. We approach open models with careful risk assessment, continually refining our practices as the AI landscape evolves."
Developers have two initial routes for experimentation: exploring Gemma 3n directly in the browser via Google AI Studio, or integrating the model locally through Google AI Edge's suite of developer tools. These options enable immediate testing of Gemma 3n's text and image processing capabilities.
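For the cloud route, a minimal text-only call through the Gemini API, which fronts the models available in Google AI Studio, might look like the sketch below; the model name is again an assumed preview identifier, and the on-device route goes through Google AI Edge's tooling instead.

```python
from google import genai

# Minimal quick-start sketch for the Google AI Studio / Gemini API route.
# "gemma-3n-e4b-it" is an assumed preview identifier; check AI Studio's
# model list for the name actually exposed during the preview.
client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemma-3n-e4b-it",
    contents="Draft three on-device AI app ideas that work offline.",
)
print(response.text)
```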
The announcement states: "Gemma 3n marks the next step in democratizing access to cutting-edge, efficient AI. We're incredibly excited to see what you'll build as we make this technology progressively available, starting with today's preview."