The Gemini Audio MCP server brings enterprise-grade generative audio directly to your AI assistant. Built in high-performance Rust, it leverages Google's state-of-the-art models to provide a unified bridge for environmental sound design, expressive narration, and professional music production. ✨ Key Capabilities * 🎙️ Infinite Soundscapes: Generate complex, immersive environmental audio using the Gemini 2.0 Multimodal Live API. * 🎵 Music & SFX: Create high-fidelity rhythmic loops, full songs, and discrete foley cues via Google's Lyria 3 Pro and Clip models. * 🗣️ Expressive Voice: Convert text to speech with natural voice direction and emotional nuances. * 🎲 Seamless Looping: Features a proprietary 100ms micro-crossfade algorithm to ensure click-free, non-repeating background audio. * 🎭 Cinematic Transitions: Smoothly blend and crossfade between two distinct audio prompts for dynamic environment changes. * 🎛️ Universal Encoding: Direct Stdin-to-FFmpeg piping allows for zero-latency transcoding into 10+ formats (MP3, OGG, FLAC, OPUS, WAV, etc.). 🎮 Use Cases * Game Developers (UE5, Godot, Blender): Instantly generate procedural soundscapes and NPC dialogue lines during development. * Content Creators: Automate foley and background texture generation for video projects. * Productivity: Enhance your AI workspace with high-quality narration and focus-oriented ambient audio. --- 🛠️ Requirements * FFmpeg: Must be installed on the system path for audio transcoding. * API Key: A valid Google AI Studio (Gemini) API Key.