This MCP server implementation provides voice interaction capabilities for AI assistants, enabling speech-to-text and text-to-speech functionality. It uses faster-whisper for improved speech recognition performance and integrates with PyAudio for audio processing. The server offers a simplified API for starting conversations and replying to user input, making it suitable for applications requiring natural language voice interfaces with AI models.
Narrate a conversation using a defined script. Parameters: script (string, path to JSON or Markdown file), output_path (string, path to save the audio file), script_format (string, either 'json' or 'markdown')
Convert text directly to speech. Parameters: text (string, the text to convert), output_path (string, path to save the audio file). Instead of text, you can pass text_file_path (string, path to a text file containing the text to convert).
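Since the text-to-speech tool accepts either text or text_file_path, a caller or server has to decide which input to use. The sketch below shows one way to do that. It is an illustration only: the helper name resolve_text is hypothetical, and rejecting calls that supply both or neither parameter is an assumption, since the listing does not say how the server handles that case.

```python
from pathlib import Path
from typing import Optional


def resolve_text(text: Optional[str] = None,
                 text_file_path: Optional[str] = None) -> str:
    """Return the text to synthesize from exactly one of the two inputs.

    Hypothetical helper: the real server may resolve these differently.
    """
    # Assumption: supplying both or neither parameter is treated as an error.
    if (text is None) == (text_file_path is None):
        raise ValueError("provide exactly one of 'text' or 'text_file_path'")
    if text is not None:
        return text
    # Read the file as UTF-8; the actual encoding handling is not documented.
    return Path(text_file_path).read_text(encoding="utf-8")
```

The returned string would then be passed to whatever speech synthesis backend writes the audio to output_path.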
Transcribe speech from an audio or video file. Parameters: file_path (string, path to the audio/video file), include_timestamps (optional boolean), detect_speakers (optional boolean).
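To make the include_timestamps option concrete, here is a minimal sketch of how transcription segments might be rendered with and without timestamps. The Segment class is a stand-in defined here for illustration; segments returned by faster-whisper expose start, end, and text attributes in the same way, but the output format below (HH:MM:SS.mmm in square brackets) is an assumption, not the server's documented format. Speaker detection (detect_speakers) is not covered.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Stand-in for a transcription segment; start and end are in seconds."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a duration in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def render_transcript(segments: Iterable[Segment],
                      include_timestamps: bool = False) -> str:
    """Join segment texts, one per line, optionally prefixed with time ranges."""
    lines: List[str] = []
    for seg in segments:
        text = seg.text.strip()
        if include_timestamps:
            span = f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}]"
            lines.append(f"{span} {text}")
        else:
            lines.append(text)
    return "\n".join(lines)
```

With faster-whisper, the segments yielded by a model's transcribe call could be passed to render_transcript directly, since only the start, end, and text attributes are read.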