Docs Scraper


Summary

A documentation scraping server that lets AI assistants extract structured content from web-based documentation using several crawling strategies. Built with Python and the crawl4ai library, it provides tools for single-URL crawling, multi-URL batch processing, sitemap-based crawling, and menu-driven navigation extraction, and it supports rate limiting, concurrent request handling, and robots.txt compliance. It is most useful for ingesting documentation into AI systems while respecting site access policies and producing clean Markdown output.
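As an illustration of the robots.txt compliance mentioned above, here is a minimal sketch using only Python's standard library. It is not taken from the server's code: the function name `build_robots_checker`, the user agent string `docs-scraper`, and the sample rules are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str = "docs-scraper"):
    """Return a function that reports whether a URL may be fetched.

    `robots_txt` is the raw text of a site's robots.txt file. The user
    agent name is a hypothetical placeholder, not the server's real one.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())

    def allowed(url: str) -> bool:
        return parser.can_fetch(user_agent, url)

    return allowed


# Hypothetical rules: everything is allowed except paths under /private/.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

allowed = build_robots_checker(SAMPLE_ROBOTS)
docs_ok = allowed("https://example.com/docs/intro")
private_ok = allowed("https://example.com/private/notes")
```

A crawler built this way would call `allowed(url)` before each request and skip any URL for which it returns `False`.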

Available Actions (4)

Single URL Crawler

Extracts content from a single documentation page and outputs it in clean Markdown format. Requires a target documentation URL as the first argument.

Multi URL Crawler

Processes multiple URLs in parallel, generating an individual Markdown file for each page. Requires a .txt or .json file containing the URLs as the first argument.
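The listing does not specify the layout of the URL file, so the following is only a sketch of how such input might be read. It assumes a .txt file holds one URL per line (blank lines and `#` comments ignored) and a .json file holds a flat array of URL strings. The names `parse_url_list` and `load_urls` are hypothetical.

```python
import json
from pathlib import Path


def parse_url_list(text: str, suffix: str) -> list[str]:
    """Parse URL-list text, choosing the format from the file suffix.

    Assumed formats (not confirmed by the source):
      .json -> a flat JSON array of URL strings
      other -> one URL per line; blank lines and '#' comments are skipped
    """
    if suffix.lower() == ".json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of URLs")
        candidates = [str(item) for item in data]
    else:
        candidates = text.splitlines()

    urls: list[str] = []
    seen: set[str] = set()
    for raw in candidates:
        url = raw.strip()
        # Skip empty entries, comments, and duplicates while keeping order.
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def load_urls(path: str) -> list[str]:
    """Read a .txt or .json URL file from disk and return its URLs."""
    file = Path(path)
    return parse_url_list(file.read_text(encoding="utf-8"), file.suffix)
```

Removing duplicates before crawling avoids fetching the same page twice when the batch is processed in parallel.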

Sitemap Crawler

Automatically discovers and crawls sitemap.xml files, creating Markdown files for each page. Accepts a sitemap URL and optional parameters for recursion depth and URL patterns.
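To make the sitemap step concrete, the sketch below parses sitemap XML with the standard library and applies a URL pattern. It is an assumption-laden illustration, not the server's implementation: the function name `parse_sitemap` is hypothetical, and the glob-style pattern syntax is a choice made for this example, since the listing does not say what pattern format the tool accepts.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str, pattern: str = "*"):
    """Return (page_urls, nested_sitemap_urls) found in sitemap XML.

    `pattern` is a glob applied to page URLs (an assumed syntax).
    Nested sitemaps come from <sitemapindex> files; a crawler would
    follow them until its recursion depth limit is reached.
    """
    root = ET.fromstring(xml_text)
    nested = [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text
    ]
    pages = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]
    # Keep only the page URLs that match the requested pattern.
    pages = [url for url in pages if fnmatchcase(url, pattern)]
    return pages, nested


SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/blog/post</loc></url>
</urlset>
"""

doc_pages, nested_sitemaps = parse_sitemap(SAMPLE_SITEMAP, pattern="*/docs/*")
```

Returning nested sitemap URLs separately keeps the recursion decision with the caller, which can stop descending once the configured depth is reached.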

Menu Crawler

Extracts all menu links from documentation, outputting them in a structured JSON format. Requires a starting URL and can accept optional custom menu selectors.
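A rough sketch of menu link extraction follows, using only the standard library. It makes a deliberate simplification: rather than CSS selectors, it collects links inside a given tag (`nav` by default), because the standard library has no CSS selector engine. The class and function names and the JSON shape (`menu_links` with `text` and `url` keys) are assumptions for illustration, not the tool's documented output.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class MenuLinkParser(HTMLParser):
    """Collect <a href> links that appear inside a menu container tag."""

    def __init__(self, base_url: str, menu_tag: str = "nav"):
        super().__init__()
        self.base_url = base_url
        self.menu_tag = menu_tag
        self.depth = 0            # how many menu containers we are inside
        self.current_href = None  # href of the link currently being read
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == self.menu_tag:
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the starting URL.
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == self.menu_tag and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.current_href is not None:
            # Collapse runs of whitespace in the link label.
            text = " ".join("".join(self.current_text).split())
            self.links.append({"text": text, "url": self.current_href})
            self.current_href = None


def extract_menu_links(html: str, base_url: str, menu_tag: str = "nav") -> str:
    """Return menu links as a JSON string (assumed output shape)."""
    parser = MenuLinkParser(base_url, menu_tag)
    parser.feed(html)
    parser.close()
    return json.dumps({"menu_links": parser.links}, indent=2)


SAMPLE_HTML = """
<nav><ul>
  <li><a href="/docs/intro">Intro</a></li>
  <li><a href="guide/setup.html">Setup  guide</a></li>
</ul></nav>
<footer><a href="/about">About</a></footer>
"""

menu_json = extract_menu_links(SAMPLE_HTML, "https://example.com/docs/")
```

The footer link in the sample is ignored because it sits outside the menu container, which is the behavior a custom menu selector would be expected to control.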

Last Updated: April 25, 2025


Language

TypeScript
