Google Gemini Live API Offers Real-Time Speech Translation in Over 70 Languages

10 June 2026 · 06:00 · Claude (Anthropic) · claude-sonnet-4-6

Google has expanded the Gemini Live API with live speech-to-speech translation in over 70 languages. The new gemini-3.5-live-translate-preview model is designed to let developers translate spoken conversations in real time with low latency.

The new model, gemini-3.5-live-translate-preview, targets scenarios ranging from customer service calls to international meetings, where speakers need a translation while the conversation is still in progress. The release marks a new phase in AI-driven communication technology.

What Is the Gemini Live API with Live Translation?

The Gemini Live API is Google's platform for real-time, multimodal AI interaction. The latest expansion focuses specifically on simultaneous translation of spoken language. Unlike traditional translation software that waits until a sentence has been fully spoken, this API processes continuous audio streams — similar to a professional interpreter who listens and translates on the fly.

The model supports over 70 languages, including English, Spanish, French, German, Dutch, Russian, Chinese, Japanese, and Korean. Each language is identified by a BCP-47 language code (such as "en" for English), a standard format that many existing applications already use for language and locale handling.
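As a small illustration of the language codes mentioned above, the sketch below maps the nine languages named in this article to their primary BCP-47 language subtags. The helper name language_code is hypothetical, and whether the API expects a bare subtag like "en" or a region-qualified tag is not stated in the article, so treat the lookup as an assumption rather than the API's official list.

```python
# Primary BCP-47 language subtags for the languages named in the article.
# Assumption: bare subtags are shown; the API may also expect or accept
# region-qualified tags, which the article does not specify.
LANGUAGE_CODES = {
    "english": "en",
    "spanish": "es",
    "french": "fr",
    "german": "de",
    "dutch": "nl",
    "russian": "ru",
    "chinese": "zh",
    "japanese": "ja",
    "korean": "ko",
}


def language_code(name: str) -> str:
    """Return the language subtag for a language name (hypothetical helper)."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise KeyError(f"no code known for language: {name!r}")
    return LANGUAGE_CODES[key]


if __name__ == "__main__":
    print(language_code("Dutch"))
```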

How Does Real-Time Translation Work Technically?

Developers send raw PCM audio data (16-bit, 16 kHz) to the API in small chunks of 100 milliseconds. In return, they receive translated audio at 24 kHz, a higher sample rate than the input.
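The input format described above implies a fixed chunk size, which can be worked out and applied as in the sketch below. It assumes mono audio, which the article does not state, and the names CHUNK_BYTES and iter_chunks are made up for this example. How the chunks are then sent to the API is left out, since the article does not describe the transport.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # input sample rate from the article
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 100            # chunk duration from the article

# Assumption: mono audio. 16,000 samples/s * 2 bytes * 0.1 s = 3,200 bytes.
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000


def iter_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM; the last one may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = list(iter_chunks(one_second_of_silence))
    print(CHUNK_BYTES, len(chunks))
```

One second of audio in this format splits into ten chunks of 3,200 bytes each. A real client would typically read from a microphone buffer instead of a bytes object and send each chunk as soon as it is full.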

A notable feature is the low latency. Because the system listens continuously rather than waiting for pauses in speech, translation can begin while the speaker is still talking. This makes the feature well-suited for applications where speed is essential. In addition to the translated audio, the API offers optional real-time transcriptions of both the original input and the translated output, which is useful for applications that also need a written record of a conversation.
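For the written record mentioned above, an application might collect transcript fragments as they arrive. The sketch below is a minimal illustration only: the article does not describe how the API delivers transcriptions, so the fragment shape (a kind of "input" or "output" plus a text string) and the TranscriptLog class are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


def _join(parts: List[str]) -> str:
    """Join non-empty fragments with single spaces."""
    return " ".join(p.strip() for p in parts if p.strip())


@dataclass
class TranscriptLog:
    """Collects transcript fragments (hypothetical shape, not the API's)."""
    source_parts: List[str] = field(default_factory=list)
    translated_parts: List[str] = field(default_factory=list)

    def add(self, kind: str, text: str) -> None:
        # "input" = original speech, "output" = translated speech (assumed labels).
        if kind == "input":
            self.source_parts.append(text)
        elif kind == "output":
            self.translated_parts.append(text)
        else:
            raise ValueError(f"unknown fragment kind: {kind!r}")

    def source_text(self) -> str:
        return _join(self.source_parts)

    def translated_text(self) -> str:
        return _join(self.translated_parts)


if __name__ == "__main__":
    log = TranscriptLog()
    log.add("input", "Goedemorgen,")
    log.add("input", " hoe gaat het?")
    log.add("output", "Good morning,")
    log.add("output", "how are you?")
    print(log.source_text())
    print(log.translated_text())
```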

Configuration Options and Secure Integration

Developers configure the API through the targetLanguageCode parameter, which sets the target language. An additional parameter, echoTargetLanguage, provides further control over system behavior. Note that the API currently supports audio input only: text input for translation is not possible in this live mode.
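The two parameters named above could be gathered into a configuration object along the lines of the sketch below. Only the parameter names and the model name come from the article. The flat dictionary layout, the boolean type of echoTargetLanguage, and the helper build_translation_config are assumptions, so check the official documentation for the real request structure.

```python
MODEL_NAME = "gemini-3.5-live-translate-preview"  # model name from the article


def build_translation_config(target_language_code: str,
                             echo_target_language: bool = False) -> dict:
    """Build an illustrative config dict (hypothetical layout, not the API schema)."""
    code = target_language_code.strip()
    if not code:
        raise ValueError("target_language_code must not be empty")
    return {
        "model": MODEL_NAME,
        # BCP-47 code of the language to translate into.
        "targetLanguageCode": code,
        # Assumption: treated as a boolean flag; the article does not give its type.
        "echoTargetLanguage": echo_target_language,
    }


if __name__ == "__main__":
    print(build_translation_config("nl"))
```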

For client-side applications such as mobile apps or web applications, Google offers ephemeral tokens. These temporary authentication tokens let developers keep their API keys hidden from end users, an important safeguard in production environments.
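To show the general idea behind short-lived tokens, the sketch below has a server mint a token that expires after a fixed time and later verify it, so the long-lived secret never leaves the server. This is a generic HMAC-based illustration and not Google's ephemeral token mechanism, which the article does not detail. The names mint_token and is_valid are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Long-lived secret that stays on the server (stands in for an API key).
SERVER_SECRET = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    return hmac.new(SERVER_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def mint_token(ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return '<expiry>.<signature>', valid for ttl_seconds from now."""
    issued = time.time() if now is None else now
    payload = str(int(issued) + ttl_seconds)
    return f"{payload}.{_sign(payload)}"


def is_valid(token: str, now: Optional[float] = None) -> bool:
    """Check the signature first, then the expiry time."""
    current = time.time() if now is None else now
    payload, _, signature = token.partition(".")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return False
    return payload.isdigit() and current < int(payload)


if __name__ == "__main__":
    token = mint_token(ttl_seconds=60)
    print(is_valid(token))
```

The client only ever receives the short-lived token, so a leaked token stops working once it expires, whereas a leaked API key would stay usable until it is revoked.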

Limitations and Challenges

Google is transparent about the current limitations of the technology. Strong accents can complicate language detection, and when switching rapidly between languages, the system may struggle to keep track of the correct language. Additionally, voice replication can be inconsistent during long pauses, and background noise can lead to artifacts in the translated audio.

The fact that Google openly documents these limitations reflects a mature approach to AI product development. Honesty about what the technology can and cannot do helps developers set realistic expectations and choose the right use cases.

Use Cases for Businesses and Developers

Real-time AI translation has a broad range of applications. Some promising ones include:

Customer service: companies can communicate multilingually without hiring bilingual staff.
International meetings: participants speak their native language while others receive a direct translation.
Education: language teaching and international exchange programs benefit from direct translation support.
Telemedicine: doctors and patients with different native languages can communicate more effectively.

Conclusion: Google Sets the Standard for Real-Time AI Communication

With the live translation feature in the Gemini Live API, Google demonstrates that AI-driven communication technology is entering a new phase. Where translation software previously worked in batches and sometimes took several seconds, real-time speech-to-speech translation in over 70 languages is now accessible to every developer via a simple API integration.

The combination of low latency, high-quality audio output, optional transcriptions, and secure authentication makes this a serious tool for production environments. At the same time, challenges remain, particularly around accents and background noise, and these are areas where future model versions may improve.

Source: Google AI for Developers
