This MCP implementation, developed by JMH, is a Python-based web crawler that extracts website content and saves it as markdown files. Its features include website structure mapping, batch URL processing, and configurable output settings. The project integrates with FastMCP for installation and deployment, and uses the BeautifulSoup and requests libraries for scraping. Because it outputs markdown and is simple to configure, it suits content aggregation, site archiving, and building knowledge bases from web sources. Its ability to create content indexes and its support for concurrent requests make it usable both for small personal projects and for larger data-collection tasks.
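The listing mentions batch URL processing with concurrent requests but does not show the mechanism. One conventional way to do this in Python, offered here as a hedged sketch rather than the project's code, is a thread pool from `concurrent.futures`. The function name `crawl_batch` and its injected `fetch` callable are hypothetical: injecting the fetcher lets the same loop work with `requests`, `urllib`, or a stub, and per-URL failures are collected instead of aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def crawl_batch(
    urls: list[str],
    fetch: Callable[[str], str],  # hypothetical: any function returning a page's content
    max_workers: int = 5,
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch many URLs concurrently, returning (results, errors) keyed by URL."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate, keep first-seen order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in unique_urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad URL should not sink the batch
                errors[url] = repr(exc)
    return results, errors
```

Threads fit this workload because fetching pages is I/O-bound, and `max_workers` caps how many requests run against a site at once.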
Extract content from a given URL and save it to a specified output file. Parameters: url (string), output_path (string)
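The listing does not show how the extraction tool turns a page into markdown. The sketch below is a minimal, hypothetical illustration of that step using only the standard library: `urllib.request` stands in for `requests`, and `html.parser` stands in for BeautifulSoup. It keeps headings, paragraphs, list items, and links, and drops `<script>` and `<style>` content. The names `MarkdownExtractor`, `html_to_markdown`, `fetch_html`, and `extract_to_file` are assumptions, not the project's API.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen

# Hypothetical sketch: stdlib stand-ins for the project's requests + BeautifulSoup.
HEADINGS = {f"h{i}": "#" * i for i in range(1, 7)}  # <h1> -> "#", <h2> -> "##", ...
BLOCK_TAGS = {"p", "li", *HEADINGS}                 # tags that start a new markdown block
SKIP_TAGS = {"script", "style"}                     # content never copied to the output


class MarkdownExtractor(HTMLParser):
    """Collects headings, paragraphs, list items and links as markdown blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # finished markdown blocks
        self.parts = []   # text fragments of the block in progress
        self.prefix = ""  # "#", "-", ... for the block in progress
        self.skip = 0     # nesting depth inside <script>/<style>
        self.hrefs = []   # stack of href values for open <a> tags

    def flush(self):
        # Collapse whitespace and emit the current block, if it has any text.
        text = " ".join("".join(self.parts).split())
        if text:
            self.blocks.append(f"{self.prefix} {text}".strip())
        self.parts, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif self.skip:
            return
        elif tag in BLOCK_TAGS:
            self.flush()
            self.prefix = HEADINGS.get(tag, "-" if tag == "li" else "")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif self.skip:
            return
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self.hrefs:
            href = self.hrefs.pop()
            self.parts.append(f"]({href})" if href else "]")

    def handle_data(self, data):
        if not self.skip:
            self.parts.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit trailing text that sat outside any block tag
    return "\n\n".join(parser.blocks) + "\n"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_to_file(url: str, output_path: str) -> Path:
    """Mirrors the tool's (url, output_path) parameters."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html_to_markdown(fetch_html(url)), encoding="utf-8")
    return path
```

Separating `html_to_markdown` from `fetch_html` keeps the conversion testable without network access.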
Scan the content linked from a given URL for further processing. Parameters: url (string)
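The listing gives no detail on how the scanning tool decides which linked pages to keep. A common approach, sketched here as an assumption rather than the project's behavior, is to parse every `<a href>` on the page, resolve it against the page URL, drop `#fragment` parts, and keep only same-site HTTP(S) links. The names `LinkCollector` and `build_content_map`, and the `{url: anchor text}` shape of the result, are hypothetical; fetching is left out, so the function takes the page's HTML directly.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Records (href, anchor text) pairs for every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def build_content_map(base_url: str, html: str) -> dict[str, str]:
    """Hypothetical: map each same-site absolute URL linked from `html` to its anchor text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    site = urlparse(base_url).netloc
    content_map = {}
    for href, text in collector.links:
        url, _fragment = urldefrag(urljoin(base_url, href))  # absolute URL, no #fragment
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc == site:
            content_map.setdefault(url, text or url)  # first link text wins
    return content_map
```

Serializing the result with `json.dumps` yields a string that could be handed on as a content map.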
Create an index from a content map. The content map is provided via standard input. Parameters: content_map (string), output_path (string)
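The index format is not documented in the listing. The sketch below assumes, hypothetically, that the content map is a JSON object mapping each URL to a page title, and renders it as a markdown list of links sorted by title. Because the listing says the map arrives via standard input, a stdin variant is included as well. All function names here (`render_index`, `create_index`, `create_index_from_stdin`) are illustrative only.

```python
import json
import sys
from pathlib import Path
from typing import TextIO


def render_index(content_map: str, title: str = "Content index") -> str:
    """Turn a JSON content map ({url: page title}, an assumed shape) into markdown."""
    entries = json.loads(content_map)
    lines = [f"# {title}", ""]
    # Sort case-insensitively by title, then by URL for a stable order.
    for url, page_title in sorted(entries.items(), key=lambda kv: (kv[1].lower(), kv[0])):
        lines.append(f"- [{page_title}]({url})")
    return "\n".join(lines) + "\n"


def create_index(content_map: str, output_path: str) -> Path:
    """Mirrors the tool's (content_map, output_path) parameters."""
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_index(content_map), encoding="utf-8")
    return path


def create_index_from_stdin(output_path: str, stream: TextIO = sys.stdin) -> Path:
    """Variant for when the content map is piped in on standard input."""
    return create_index(stream.read(), output_path)
```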