This MCP server implementation provides voice interaction capabilities for AI assistants, enabling speech-to-text and text-to-speech functionality. It uses faster-whisper for improved speech recognition performance and integrates with PyAudio for audio processing. The server offers a simplified API for starting conversations and replying to user input, making it suitable for applications requiring natural language voice interfaces with AI models.
Generate audio files for a conversation using multiple voices. Parameters: script (string - path to the script file), output_path (string - path to save the output audio file), script_format (string - format of the script, either 'json' or 'markdown')
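As a rough sketch of how a client might prepare arguments for this tool, the helper below checks `script_format` against the two documented values before building the argument dictionary. Only the three parameter names come from the description above; the function name `build_conversation_args` and the choice to raise on an unknown format are assumptions made for illustration.

```python
# The two script formats named in the tool description.
ALLOWED_SCRIPT_FORMATS = {"json", "markdown"}


def build_conversation_args(script: str, output_path: str, script_format: str) -> dict:
    """Return an argument dict for the multi-voice conversation tool.

    Hypothetical helper: the keys follow the documented parameter names,
    while the validation behaviour is an assumption.
    """
    if script_format not in ALLOWED_SCRIPT_FORMATS:
        allowed = ", ".join(sorted(ALLOWED_SCRIPT_FORMATS))
        raise ValueError(f"script_format must be one of: {allowed}")
    return {
        "script": script,
        "output_path": output_path,
        "script_format": script_format,
    }
```

Rejecting an unsupported format on the client side gives a clearer error than waiting for the server to refuse the call.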
Convert text directly to speech. Parameters: text (string - text to convert to speech) or text_file_path (string - path to a file containing the text), output_path (string - path to save the output audio file)
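The description reads as if `text` and `text_file_path` are alternative ways of supplying the input. Assuming they are mutually exclusive (the listing does not say so explicitly), a caller could normalize the two inputs like this. The helper name `resolve_text` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_text(text: Optional[str] = None, text_file_path: Optional[str] = None) -> str:
    """Return the text to synthesize from exactly one of the two inputs.

    Assumption: the tool accepts either inline text or a file path, not both.
    """
    if (text is None) == (text_file_path is None):
        raise ValueError("provide exactly one of text or text_file_path")
    if text is not None:
        return text
    # Read the whole file as UTF-8; the actual encoding expected by the tool is not documented.
    return Path(text_file_path).read_text(encoding="utf-8")
```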
Transcribe speech from audio or video files. Parameters: file_path (string - path to the audio or video file), include_timestamps (boolean - optional, to include timestamps in the transcription), detect_speakers (boolean - optional, to enable speaker detection)
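MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the transcription tool, sending the two optional booleans only when the caller sets them. The tool name `"transcribe"` is a placeholder, since this listing does not give the registered tool names; the argument keys are the documented parameters.

```python
import json
from typing import Optional


def make_transcribe_request(
    request_id: int,
    file_path: str,
    include_timestamps: Optional[bool] = None,
    detect_speakers: Optional[bool] = None,
    tool_name: str = "transcribe",  # placeholder: the real tool name is not given in the listing
) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the transcription tool."""
    arguments = {"file_path": file_path}
    # Optional flags are omitted unless set, so the server's defaults apply.
    if include_timestamps is not None:
        arguments["include_timestamps"] = include_timestamps
    if detect_speakers is not None:
        arguments["detect_speakers"] = detect_speakers
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    print(json.dumps(make_transcribe_request(1, "meeting.mp4", include_timestamps=True), indent=2))
```

In practice an MCP client library would usually construct and send this envelope for you; the raw form is shown only to make the parameter mapping explicit.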