A documentation scraping server that lets AI assistants extract structured content from web-based documentation using several crawling strategies. Built with Python and the crawl4ai library, it provides tools for single-URL crawling, multi-URL batch processing, sitemap-based crawling, and menu-driven navigation extraction, and it supports rate limiting, concurrent request handling, and robots.txt compliance. It is aimed at users who need to ingest documentation into AI systems while respecting site access policies and producing clean Markdown output.
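As a rough illustration of what robots.txt compliance and rate limiting can involve, the sketch below uses only Python's standard `urllib.robotparser`. It is not taken from the server's code: the `docs-scraper` user agent, the `load_robots` helper, and the sample rules are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

robots = load_robots(SAMPLE_ROBOTS)
USER_AGENT = "docs-scraper"  # assumed name, not the server's actual user agent

# Check individual URLs before crawling them.
allowed_docs = robots.can_fetch(USER_AGENT, "https://example.com/docs/intro")
allowed_private = robots.can_fetch(USER_AGENT, "https://example.com/private/keys")

# A Crawl-delay directive, when present, can drive the rate limiter.
delay_seconds = robots.crawl_delay(USER_AGENT)
```

A crawler built this way would skip any URL for which `can_fetch` returns `False` and wait at least `delay_seconds` between requests to the same host.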
Extracts content from a single documentation page and outputs it in clean Markdown format. Requires a target documentation URL as a parameter.
Processes multiple URLs in parallel and generates individual Markdown files for each page. Requires a file containing URLs or JSON output from the menu crawler as the first argument.
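Parallel processing with a cap on simultaneous requests can be sketched with `asyncio.Semaphore`. This is a minimal sketch under stated assumptions, not the tool's actual implementation: `crawl_many` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever coroutine really downloads a page and converts it to Markdown.

```python
import asyncio

async def crawl_many(urls, fetch, max_concurrent=3, delay=0.0):
    """Run fetch(url) for every URL, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with semaphore:
            try:
                result = (url, await fetch(url), None)
            except Exception as exc:  # one bad page should not abort the batch
                result = (url, None, str(exc))
            if delay:
                await asyncio.sleep(delay)  # simple pause before freeing the slot
            return result

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(u) for u in urls))

async def fake_fetch(url):
    """Stand-in for a real page fetch; fails for URLs ending in /broken."""
    await asyncio.sleep(0)
    if url.endswith("/broken"):
        raise ValueError("simulated failure")
    return f"# Page at {url}\n"

results = asyncio.run(
    crawl_many(
        ["https://example.com/a", "https://example.com/broken", "https://example.com/b"],
        fake_fetch,
        max_concurrent=2,
    )
)
```

Each result is a `(url, markdown, error)` tuple, so a caller could write one Markdown file per successful page and log the failures separately.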
Automatically discovers and crawls sitemap.xml files, creating Markdown files for each page. Accepts optional parameters for maximum recursion depth and URL patterns.
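The sitemap step can be illustrated with the standard library alone. The sketch below assumes the standard sitemaps.org XML namespace and uses shell-style wildcards for the URL patterns. The `parse_sitemap` function and the pattern syntax are assumptions for this example, since the listing does not say how the tool matches patterns.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatch

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text, patterns=None):
    """Return (kind, locations) where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    is_index = root.tag.endswith("sitemapindex")
    # Patterns filter page URLs only; entries of an index are nested sitemaps.
    if patterns and not is_index:
        locs = [u for u in locs if any(fnmatch(u, p) for p in patterns)]
    return ("index" if is_index else "urlset"), locs

SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/blog/news</loc></url>
  <url><loc>https://example.com/docs/api</loc></url>
</urlset>"""

kind, pages = parse_sitemap(SAMPLE_SITEMAP, patterns=["*/docs/*"])
```

When the result kind is `index`, a crawler would fetch each listed sitemap and call `parse_sitemap` again, stopping once the maximum recursion depth is reached.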
Extracts all menu links from documentation and outputs them in structured JSON format. Accepts an optional parameter for custom menu selectors.
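A simplified version of menu extraction might look like the sketch below, which collects links found inside a menu element and serializes them as JSON. It relies on the standard `html.parser` module, so the "selector" here is only a tag name (`nav` by default) rather than a full CSS selector. The class name, function name, and JSON field names are assumptions, not the tool's actual output schema.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

class MenuLinkParser(HTMLParser):
    """Collect <a href> links that appear inside a given menu tag."""

    def __init__(self, base_url, menu_tag="nav"):
        super().__init__()
        self.base_url = base_url
        self.menu_tag = menu_tag
        self.depth = 0            # greater than 0 while inside a menu element
        self.current_href = None  # set while inside a menu link
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == self.menu_tag:
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == self.menu_tag and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append({"text": text, "url": self.current_href})
            self.current_href = None

def extract_menu_links(html, base_url, menu_tag="nav"):
    parser = MenuLinkParser(base_url, menu_tag)
    parser.feed(html)
    parser.close()
    return parser.links

SAMPLE_HTML = """<nav><ul>
<li><a href="/docs/intro">Introduction</a></li>
<li><a href="guide/setup.html">Setup</a></li>
</ul></nav>
<footer><a href="/legal">Legal</a></footer>"""

menu_links = extract_menu_links(SAMPLE_HTML, "https://example.com/docs/")
menu_json = json.dumps({"menu_links": menu_links}, indent=2)
```

Relative links are resolved against the page URL, and links outside the menu element (the footer link in the sample) are ignored. The resulting JSON could then serve as input to a batch crawl.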