This MCP server adds voice interaction to AI assistants, providing both speech-to-text and text-to-speech. It uses faster-whisper, a faster reimplementation of OpenAI's Whisper, for speech recognition, and PyAudio for audio input and output. The server exposes a small API for starting a conversation and replying to user input, which makes it suitable for applications that need a natural-language voice interface to an AI model.
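To make the speech-to-text side more concrete, here is a minimal sketch of how a faster-whisper transcription could be turned into plain text. It is not this server's actual code: `Segment`, `format_segments`, and `transcribe_file` are hypothetical names, and the model size, device, and compute type are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Hypothetical stand-in for a transcription segment (times in seconds)."""
    start: float
    end: float
    text: str


def format_segments(segments: Iterable, include_timestamps: bool = False) -> str:
    """Join segment texts into one string, optionally prefixing time ranges."""
    lines: List[str] = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip segments that contain only whitespace
        if include_timestamps:
            lines.append(f"[{seg.start:.2f}s - {seg.end:.2f}s] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


def transcribe_file(path: str, model_size: str = "base") -> str:
    """Transcribe an audio file with faster-whisper (assumed settings)."""
    # Imported lazily so format_segments works without the package installed.
    from faster_whisper import WhisperModel

    # Assumption: CPU inference with int8 weights; a real server may differ.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return format_segments(segments, include_timestamps=True)
```

faster-whisper segments expose `start`, `end`, and `text`, so `format_segments` accepts them directly; the `Segment` dataclass only exists so the formatting step can be tried without a model.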
Generate audio files with multiple voices for stories and dialogues. Parameters: script (string - path to the script), output_path (string - path to save the output audio), script_format (string - either 'json' or 'markdown')
Convert text directly to speech. Parameters: text (string - the text to convert), output_path (string - path to save the output audio), text_file_path (string - path to a text file containing the text to convert)
Transcribe speech from various audio and video formats. Parameters: file_path (string - path to the audio or video file), include_timestamps (optional boolean), detect_speakers (optional boolean)
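The parameter lists above map naturally onto MCP `tools/call` requests. The following is a rough sketch of how a client might assemble and validate the arguments before sending them. The tool names `generate_story_audio` and `transcribe_speech` are hypothetical, since the listing does not give the real identifiers, and the helper functions are illustrative only.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical tool names: the listing does not state the real identifiers.
STORY_TOOL = "generate_story_audio"
TRANSCRIBE_TOOL = "transcribe_speech"


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Wrap tool arguments in a JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def story_arguments(script: str, output_path: str, script_format: str) -> Dict[str, Any]:
    """Arguments for the multi-voice tool; script_format is 'json' or 'markdown'."""
    if script_format not in ("json", "markdown"):
        raise ValueError("script_format must be 'json' or 'markdown'")
    return {"script": script, "output_path": output_path, "script_format": script_format}


def transcribe_arguments(
    file_path: str,
    include_timestamps: Optional[bool] = None,
    detect_speakers: Optional[bool] = None,
) -> Dict[str, Any]:
    """Arguments for the transcription tool; optional flags are sent only if set."""
    if not file_path:
        raise ValueError("file_path is required")
    args: Dict[str, Any] = {"file_path": file_path}
    if include_timestamps is not None:
        args["include_timestamps"] = include_timestamps
    if detect_speakers is not None:
        args["detect_speakers"] = detect_speakers
    return args


if __name__ == "__main__":
    request = build_tool_call(
        TRANSCRIBE_TOOL,
        transcribe_arguments("talk.mp4", include_timestamps=True),
    )
    print(json.dumps(request, indent=2))
```

Leaving unset optional flags out of the payload lets the server apply its own defaults, which the listing does not specify.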