Gemma
What is Gemma?
Gemma is a collection of lightweight open source generative AI (GenAI) models designed mainly for developers and researchers. Gemma was created by Google DeepMind, the research lab that also developed Gemini, Google's closed source family of generative AI models. Google makes Gemma available in several sizes and for use with popular developer tools and Google Cloud services.
The name Gemma comes from the Latin word for precious stone. Google released Gemma on Feb. 21, 2024, with two models: Gemma 2B and Gemma 7B. These are text-to-text, decoder-only large language models (LLMs) with pretrained and instruction-tuned variants. Gemma 2B has a neural network of 2 billion parameters, and Gemma 7B has a neural network of 7 billion parameters. Gemma is not as large or as powerful as frontier AI models such as OpenAI's GPT-4 and Google's Gemini Ultra and Pro, which are believed to have hundreds of billions to trillions of parameters. However, Gemma's compact, lightweight models demand far less compute, giving them faster inference speeds and allowing them to run on laptop and desktop computers.
Gemma also runs on mobile devices and in public clouds. NVIDIA worked with Google to optimize Gemma for its graphics processing units (GPUs). Because of this broad platform and hardware support, Gemma can run on GPUs, central processing units (CPUs) or Google Cloud's Tensor Processing Units (TPUs).
Google allows commercial usage and distribution of Gemma and plans to expand the Gemma family.
How is Gemma different from other AI models?
Gemma has several distinct differences from popular AI chatbots, including Google's Gemini. Gemma stands out for being open and lightweight: Gemini and ChatGPT are closed models, and neither is lightweight enough to run on a laptop. Because ChatGPT and Gemini are closed, developers cannot customize them as they can the open source Gemma.
Gemma is not Google's first open AI model, but it is more advanced in its training and performance than older open models such as BERT and T5. OpenAI, the developer of ChatGPT, has not released open versions of its recent GPT models.
Google also offers pretrained and instruction-tuned Gemma models built to run on laptops and workstations. Similar to Gemma, Meta's Llama 2 is an open source AI model that can run on laptops. Llama 2 is positioned more toward business use than Gemma but is likewise available to developers through Hugging Face and other platforms. Gemma is generally considered better at scientific tasks, while Llama 2 is better for general-purpose tasks.
Other open source AI models include BionicGPT, GPT-Neo, Mistral 7B, Falcon 180B, Bloom, Databricks Dolly and Cerebras-GPT. Some of these are much larger than Gemma, and others were developed mostly for specific use cases or vertical markets.
Another difference between Gemma and Gemini is the transformer architecture each uses to turn an input sequence into an output sequence. Models can use a decoder transformer, an encoder transformer or a hybrid of the two.
Decoders generate outputs in the form of new text, such as answers to user queries. They differ from encoder models, which process inputs and understand their context. While decoder models are used for generative AI, encoder models handle tasks such as classifying text, answering questions and analyzing text for emotional tone.
Gemma and ChatGPT use a decoder transformer. Because they are decoder-only, Gemma and ChatGPT work as text-to-text LLMs but cannot process images or video. Google Gemini combines encoder and decoder components. That architecture enables Gemini's multimodal capability, allowing it to support voice and images as well as text in both user prompts and its responses.
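The decoder-vs.-encoder distinction comes down to the attention mask: a decoder restricts each token to attending only to earlier tokens, which is what lets it generate text left to right, while an encoder lets every token see the whole sequence. A minimal NumPy sketch of the two mask shapes (illustrative only, not Gemma's actual implementation):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    # Decoder-only (causal) attention: token i may attend only to
    # tokens 0..i, so the model generates strictly left to right.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def bidirectional_mask(seq_len: int) -> np.ndarray:
    # Encoder attention: every token sees the full sequence, which
    # suits understanding tasks such as classification.
    return np.ones((seq_len, seq_len), dtype=bool)

m = causal_mask(4)
# Position 1 can attend to position 0 but not to the future position 2.
assert m[1, 0] and not m[1, 2]
```

During generation, a decoder applies this triangular mask at every step, which is why decoder-only models such as Gemma produce text token by token.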
What is Gemma used for?
Developers can use Gemma to build their own AI applications, such as chatbots, text summarization tools and retrieval-augmented generation (RAG) applications. Because it is lightweight, Gemma is a good fit for real-time GenAI applications, such as streaming text, that require low latency.
Gemma is available through popular developer tools, including Colab and Kaggle notebooks, and frameworks such as Hugging Face Transformers, JAX, Keras 3.0 and PyTorch.
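As a sketch of the Hugging Face Transformers route: the snippet below outlines loading an instruction-tuned Gemma checkpoint and generating text. The checkpoint ID `google/gemma-7b-it` is the 7B instruction-tuned model on the Hugging Face Hub; downloading it requires accepting Google's license terms and authenticating with a Hugging Face access token, so treat this as an outline rather than a turnkey script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_with_gemma(prompt: str,
                        model_id: str = "google/gemma-7b-it",
                        max_new_tokens: int = 64) -> str:
    """Generate a reply from an instruction-tuned Gemma checkpoint.

    Assumes the Gemma license has been accepted on the Hugging Face Hub
    and that a valid access token is configured in the environment.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Swapping `model_id` for `google/gemma-2b-it` would load the smaller 2 billion-parameter variant, which is the one most likely to run comfortably on a laptop.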
Gemma models can be deployed on Google Cloud's Vertex AI machine learning platform and Google Kubernetes Engine (GKE). Vertex AI lets application builders optimize Gemma for specific use cases, such as text generation, summarization and Q&A, and helps support low-latency generative AI use cases such as streaming text. Running Gemma on GKE enables developers to build their own fine-tuned models in portable containers.
Gemma is optimized to run on popular AI hardware, including NVIDIA GPUs and Google Cloud TPUs. NVIDIA collaborated with Google to support Gemma through TensorRT-LLM, NVIDIA's open source library for optimizing LLM inference on NVIDIA GPUs in the data center, in the cloud and locally on workstations and PCs.
Gemma has been pretrained on large datasets. This saves developers the cost and time of building datasets from scratch and gives them a foundation that they can customize to build their applications. Pretrained models can help build AI apps in areas such as natural language processing, speech AI, computer vision, healthcare, cybersecurity and creative arts.
Google said Gemma was trained on a diverse set of English-language web text documents to expose it to a range of linguistic styles, topics and vocabulary. Google also trained Gemma on programming code and mathematical text to help it generate code and answer code-related and mathematical questions.
Who can use Gemma?
Although anyone can use Gemma, it is designed mainly for developers. Because it is open source, lightweight and widely available through developer platforms and hardware devices, Gemma is said to "democratize AI."
However, there are risks to making open AI models for commercial use. Bad actors can use AI to develop applications that infringe on privacy or spread disinformation or toxic content.
Google has taken steps to address those dangers with Gemma. It released a Responsible Generative AI Toolkit for Gemma with best practices for using open AI responsibly. The toolkit provides guidance for setting safety policies for tuning, classifying, and evaluating models and a Learning Interpretability Tool to help developers understand natural language processing (NLP) model behavior. It also includes a methodology for building robust safety classifiers.
When launching Gemma, Google said it was built "to assist developers and researchers in building AI responsibly." Gemma's terms of use prohibit offensive, illegal, or unethical applications.
Google also says DeepMind filtered Gemma's pretraining data to omit harmful, illegal and biased content, as well as personal and sensitive information. Google also released model documentation detailing Gemma's capabilities, limitations and biases.
Developers and researchers have free access to Gemma in Kaggle and in Colab, Google's hosted Jupyter Notebook service. First-time Google Cloud users can receive $300 in credits when using Gemma, and researchers can apply for up to $500,000 in Google Cloud credits for their Gemma projects.