This MCP server implementation provides voice interaction capabilities for AI assistants, enabling speech-to-text and text-to-speech functionality. It uses faster-whisper, a CTranslate2-based reimplementation of OpenAI's Whisper, for faster speech recognition, and integrates with PyAudio for audio processing. The server offers a small API for starting conversations and replying to user input, making it suitable for applications that need a natural-language voice interface to an AI model.
Generate audio files for a conversation using multiple voices. Parameters: script (string - path to the script file), output_path (string - path to save the output audio file), script_format (string - format of the script, either 'json' or 'markdown')
Convert text directly to speech. Parameters: text (string - text to convert to speech) or text_file_path (string - path to a file containing the text), output_path (string - path to save the output audio file)
Transcribe speech from audio or video files. Parameters: file_path (string - path to the audio or video file), include_timestamps (boolean - optional, to include timestamps in the transcription), detect_speakers (boolean - optional, to enable speaker detection)
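The parameter lists above can be illustrated with a minimal sketch of how a client might assemble MCP `tools/call` requests for these three tools. The listing does not give the tool names, so `generate_conversation`, `text_to_speech`, and `transcribe_speech` are hypothetical placeholders, as are the helper function names. Treating `text` and `text_file_path` as mutually exclusive is also an assumption based on the "or" in the description, not confirmed behavior of the server.

```python
import json
from itertools import count

# Simple incrementing request ids for JSON-RPC.
_ids = count(1)


def build_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 'tools/call' request, dropping arguments left as None."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {
            "name": tool_name,
            "arguments": {k: v for k, v in arguments.items() if v is not None},
        },
    }


def conversation_request(script, output_path, script_format):
    """Request multi-voice audio for a script file ('json' or 'markdown')."""
    if script_format not in ("json", "markdown"):
        raise ValueError("script_format must be 'json' or 'markdown'")
    # "generate_conversation" is a hypothetical tool name.
    return build_tool_call(
        "generate_conversation",
        {"script": script, "output_path": output_path, "script_format": script_format},
    )


def text_to_speech_request(output_path, text=None, text_file_path=None):
    """Request speech for inline text or for a text file (assumed: exactly one of the two)."""
    if (text is None) == (text_file_path is None):
        raise ValueError("provide exactly one of text or text_file_path")
    # "text_to_speech" is a hypothetical tool name.
    return build_tool_call(
        "text_to_speech",
        {"text": text, "text_file_path": text_file_path, "output_path": output_path},
    )


def transcribe_request(file_path, include_timestamps=None, detect_speakers=None):
    """Request a transcription; the two boolean flags are optional and omitted when unset."""
    # "transcribe_speech" is a hypothetical tool name.
    return build_tool_call(
        "transcribe_speech",
        {
            "file_path": file_path,
            "include_timestamps": include_timestamps,
            "detect_speakers": detect_speakers,
        },
    )


if __name__ == "__main__":
    # Print one example request as it would be sent to the server.
    print(json.dumps(transcribe_request("talk.mp4", include_timestamps=True), indent=2))
```

Omitting unset optional flags, rather than sending `null`, leaves the defaults to the server. The actual tool names should be read from the server's `tools/list` response before sending any of these requests.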