Docs Scraper
Summary
Documentation scraping server that enables AI assistants to extract structured content from web-based documentation through multiple crawling strategies. Built with Python and the crawl4ai library, it provides tools for single URL crawling, multi-URL batch processing, sitemap-based crawling, and menu-driven navigation extraction, along with rate limiting, concurrent request handling, and robots.txt compliance. It is particularly useful for ingesting documentation into AI systems while respecting site access policies and producing clean Markdown output.
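As a rough illustration of the politeness features mentioned above, and not the project's actual code, a crawler built on crawl4ai can cap concurrency with a semaphore and consult robots.txt via the standard library. The names MAX_CONCURRENT, allowed, and polite_crawl are hypothetical:

```python
# Illustrative sketch of rate limiting and robots.txt checks;
# names and limits here are assumptions, not the project's code.
import asyncio
import urllib.robotparser
from urllib.parse import urljoin

from crawl4ai import AsyncWebCrawler

MAX_CONCURRENT = 5  # assumed cap on simultaneous requests

def allowed(url: str, agent: str = "*") -> bool:
    # Consult the site's robots.txt before crawling the page.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(urljoin(url, "/robots.txt"))
    rp.read()
    return rp.can_fetch(agent, url)

async def polite_crawl(urls: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with AsyncWebCrawler() as crawler:
        async def fetch(url: str) -> str:
            if not allowed(url):
                return ""
            async with sem:  # limit concurrent requests
                result = await crawler.arun(url=url)
                return str(result.markdown)
        return await asyncio.gather(*(fetch(u) for u in urls))
```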
Available Actions (4)
Single URL Crawler
Extracts content from a single documentation page and outputs it in clean Markdown format. Requires a target documentation URL as the first argument.
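A minimal sketch of what single-page extraction looks like with crawl4ai's AsyncWebCrawler API; the URL and the crawl_one helper are placeholders, not the tool's actual entry point:

```python
# Hypothetical sketch of single-page extraction with crawl4ai;
# the target URL is a placeholder, not part of the project.
import asyncio

from crawl4ai import AsyncWebCrawler

async def crawl_one(url: str) -> str:
    # AsyncWebCrawler fetches the page, renders it, and converts it to Markdown.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return str(result.markdown)

if __name__ == "__main__":
    print(asyncio.run(crawl_one("https://docs.example.com/getting-started")))
```

Redirecting stdout to a file yields the page as a clean Markdown document.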
Multi URL Crawler
Processes multiple URLs in parallel, generating a separate Markdown file for each page. Requires a URL list file (.txt or .json) as the first argument.
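A sketch of the batch case, covering only the plain-text input format; the output naming scheme and the crawl_many helper are assumptions, not the tool's actual behavior. crawl4ai's arun_many crawls the list concurrently and returns one result per URL:

```python
# Hypothetical sketch of batch crawling; file names and the
# slug logic are illustrative, not the project's actual code.
import asyncio
from pathlib import Path
from urllib.parse import urlparse

from crawl4ai import AsyncWebCrawler

async def crawl_many(url_file: str, out_dir: str = "output") -> None:
    urls = [u.strip() for u in Path(url_file).read_text().splitlines() if u.strip()]
    Path(out_dir).mkdir(exist_ok=True)
    async with AsyncWebCrawler() as crawler:
        # arun_many crawls the URLs concurrently, one result per URL.
        results = await crawler.arun_many(urls=urls)
        for res in results:
            # Derive a file name from the URL path (assumed scheme).
            name = urlparse(res.url).path.strip("/").replace("/", "_") or "index"
            Path(out_dir, f"{name}.md").write_text(str(res.markdown))

if __name__ == "__main__":
    asyncio.run(crawl_many("urls.txt"))
```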
Sitemap Crawler
Automatically discovers and crawls sitemap.xml files, creating Markdown files for each page. Accepts a sitemap URL and optional parameters for recursion depth and URL patterns.
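A sketch of how sitemap discovery with depth and pattern options might work, built on the standard library rather than the project's actual parser; sitemap_urls and its parameter names are illustrative:

```python
# Hypothetical sketch of sitemap discovery; the depth and pattern
# parameters mirror the options described above, but the real
# script's flag names may differ.
import re
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url: str, depth: int = 2, pattern: str = "") -> list[str]:
    root = ET.fromstring(urllib.request.urlopen(sitemap_url).read())
    urls: list[str] = []
    # A <sitemapindex> points at nested sitemaps; recurse up to `depth`.
    for sm in root.findall("sm:sitemap/sm:loc", NS):
        if depth > 0:
            urls += sitemap_urls(sm.text, depth - 1, pattern)
    # A <urlset> lists the actual pages; filter by the optional pattern.
    for loc in root.findall("sm:url/sm:loc", NS):
        if not pattern or re.search(pattern, loc.text):
            urls.append(loc.text)
    return urls

print(sitemap_urls("https://docs.example.com/sitemap.xml", pattern=r"/guide/"))
```

Each discovered URL could then be fed to the single- or multi-URL crawler to produce the Markdown files.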
Menu Crawler
Extracts all menu links from documentation, outputting them in a structured JSON format. Requires a starting URL and can accept optional custom menu selectors.
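A sketch of menu extraction under the assumption that links are gathered with CSS selectors (here via BeautifulSoup, which may differ from the tool's internals); the default selectors and the JSON shape are guesses:

```python
# Hypothetical sketch of menu-link extraction; the default selectors
# and output schema are assumptions, not the project's actual format.
import asyncio
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from crawl4ai import AsyncWebCrawler

async def menu_links(url: str, selectors: list[str] | None = None) -> dict:
    # Fall back to common documentation menu selectors (assumed defaults).
    selectors = selectors or ["nav a", ".sidebar a", ".menu a"]
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    soup = BeautifulSoup(result.html, "html.parser")
    links = []
    for sel in selectors:
        for a in soup.select(sel):
            if a.get("href"):
                links.append({"text": a.get_text(strip=True),
                              "url": urljoin(url, a["href"])})
    return {"start_url": url, "links": links}

if __name__ == "__main__":
    print(json.dumps(asyncio.run(menu_links("https://docs.example.com")), indent=2))
```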