Introduction
In this post, I’ll walk through my experience integrating Google's Gemini Flash LLM into a production-ready Node.js backend. With the rapid evolution of large language models and the growing demand for AI-powered features in real-world applications, having a streamlined process for integrating these tools has become more important than ever.
Gemini Flash stands out for its performance and lightweight design—making it particularly attractive for APIs that demand low latency and high reliability.
Why Gemini?
I was in the market for MedLM and got in touch with the team at Google. During our brief call, they suggested integrating Gemini instead, as MedLM might soon be sunset in favor of Gemini. A bit skeptical, I decided to give it a try, since a lot of my production services already run on GCP. Integration was quick and straightforward, and to my surprise, the speed and efficiency of the model beat both OpenAI (GPT-4o) and Anthropic (Claude 3.7 Sonnet). Gemini Flash offers a good balance between speed and capability, and it supports streaming, which is ideal for use cases like chat assistants and summarization endpoints. Both chat and summarization are features offered in the Kaizen Health app.
Goals of This Integration
- Integrate Gemini Flash via the new Node SDK for Google Gen AI (@google/genai)
- Build a secure, modular Node.js API wrapper
- Ensure proper error handling and logging
- Optimize performance for real-time use
Tech Stack
- Node.js (v18+)
- Express.js
- Google Gen AI SDK (@google/genai)
Getting Started
Most of my code was already set up thanks to earlier integrations with OpenAI and Anthropic (I did briefly add DeepSeek, but decided to keep it out of production).
Get your API key
- Navigate to Google AI Studio and get your API key.
- Add this API key to an environment config. I stored it under `GEMINI_API_KEY`. P.S. Please do not expose this API key to anyone; I would recommend storing it in a password wallet.
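If you load environment variables from a `.env` file, it's worth failing fast when the key is missing. A minimal sketch, assuming the `dotenv` package is installed:

```javascript
// Load variables from .env into process.env (assumes dotenv is installed)
import "dotenv/config";

// Fail fast at startup instead of at the first Gemini call
if (!process.env.GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set");
}
```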
I already have a simple Node application running, and I'm not going to cover how to set up a Node project. Please read this documentation if you are completely new.
- I am using ESM modules, but feel free to use whatever you are comfortable with; the other option is CommonJS (import vs require).
- Install the package using `npm install @google/genai`.
- Initialize the assistant. (I am not using Vertex AI for this example.)

```javascript
// ESM import; with CommonJS, use: const { GoogleGenAI } = require("@google/genai");
import { GoogleGenAI } from "@google/genai";

const GEMINI_API_KEY = process.env.GEMINI_API_KEY;
const gemini = new GoogleGenAI({ apiKey: GEMINI_API_KEY });
```
- Once you have initialized your GoogleGenAI instance, you are set to start building a chat stream service.
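The snippets below call methods on a chat session, which I'll refer to as `geminiChat`. Here's a minimal sketch of creating one; the model name is an assumption, so substitute whichever Gemini Flash variant your project has access to:

```javascript
// Create a stateful chat session; the SDK keeps the conversation
// history for you across sendMessage/sendMessageStream calls.
const geminiChat = gemini.chats.create({
  model: "gemini-2.0-flash", // assumption: use your available Flash variant
});
```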
Next, to get started really quickly:
- Using streaming:
```javascript
const stream = await geminiChat.sendMessageStream({
  message: [{ text: "Hey, explain the meaning of AI" }],
  config: {
    systemInstruction:
      "You are a tutor who helps break down complex problems and questions.",
    temperature: 0.5,
    maxOutputTokens: 2048,
    topK: 50,
  },
});
```
- Non-streaming option:

```javascript
const response = await geminiChat.sendMessage({
  message: [{ text: "Hey, explain the meaning of AI" }],
  config: {
    systemInstruction:
      "You are a tutor who helps break down complex problems and questions.",
    temperature: 0.5,
    maxOutputTokens: 2048,
    topK: 50,
  },
});
```
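Neither snippet prints anything on its own. Based on the @google/genai response shape, where streamed chunks and the final response both expose a `text` getter, reading the output looks like this:

```javascript
// Streaming: each chunk is a partial response; .text carries the
// newly generated increment.
for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}

// Non-streaming: the full reply is available once the promise resolves.
console.log(response.text);
```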
Feel free to mess around with the topK and temperature values; they help fine-tune your LLM responses. A great article about the relevance of topK, topP, and temperature is here. I was able to get this up and running, along with the prompts for Kaizen, in less than a few hours of work. The responses and efficiency are on par with some of the competing models.
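To tie this back to the API-wrapper goal, here's a minimal sketch of an Express route that streams Gemini's reply to the client, with basic error handling. The route path, port, and model name are placeholders of my own, not anything prescribed by the SDK:

```javascript
import express from "express";
import { GoogleGenAI } from "@google/genai";

const app = express();
app.use(express.json());

const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Hypothetical endpoint: streams the model's reply as chunked plain text
app.post("/api/chat", async (req, res) => {
  try {
    const chat = gemini.chats.create({
      model: "gemini-2.0-flash", // assumption: use your available Flash variant
      config: {
        systemInstruction:
          "You are a tutor who helps break down complex problems and questions.",
        temperature: 0.5,
        maxOutputTokens: 2048,
        topK: 50,
      },
    });

    const stream = await chat.sendMessageStream({
      message: [{ text: req.body.message }],
    });

    res.setHeader("Content-Type", "text/plain; charset=utf-8");
    for await (const chunk of stream) {
      res.write(chunk.text ?? "");
    }
    res.end();
  } catch (err) {
    // Log the failure and avoid writing headers twice mid-stream
    console.error("Gemini request failed:", err);
    if (!res.headersSent) {
      res.status(500).json({ error: "LLM request failed" });
    } else {
      res.end();
    }
  }
});

app.listen(3000);
```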
Live in Production
Download the latest Kaizen Health from the App Store or Play Store to test this out live! This model is specifically trained on medical data, and over time it will be trained on a user's health history. This enables the virtual assistant, Kai, to constantly monitor your and your family's health and help take preventative measures.
Parting Thoughts
Gemini Flash is an extremely fast and efficient model. In comparison to OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet, it performs noticeably faster with streaming, and the large context window is a huge plus when sending additional instructions or user information to personalize the responses. I would highly recommend integrating Gemini into your chat/AI applications. More instructions on how to handle files and parse PDFs coming soon.