This documentation-scraping server lets AI assistants extract structured content from web-based documentation using several crawling strategies. Built with Python and the crawl4ai library, it provides tools for single-URL crawling, multi-URL batch processing, sitemap-based crawling, and menu-driven navigation extraction, along with rate limiting, concurrent request handling, and robots.txt compliance. It is most useful when documentation has to be ingested into AI systems as clean Markdown while respecting site access policies.
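To make the robots.txt compliance and rate limiting concrete, here is a minimal sketch using only the Python standard library. It is not the server's actual code: the helper names `build_robots_parser`, `allowed_urls`, and `RateLimiter`, as well as the `docs-scraper` user agent, are assumptions made up for illustration.

```python
import time
from typing import Iterable, List, Optional
from urllib.robotparser import RobotFileParser


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(
    parser: RobotFileParser,
    urls: Iterable[str],
    user_agent: str = "docs-scraper",  # hypothetical user agent
) -> List[str]:
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


class RateLimiter:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        delay = 0.0
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
        return delay
```

A crawler built this way would call `limiter.wait()` before each request and fetch only the URLs returned by `allowed_urls`.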
Extracts the content of a single documentation page and outputs it as clean Markdown. Parameters: URL (required)
Processes multiple URLs in parallel, generating a separate Markdown file for each page. Parameters: URLs file (required), --output-prefix (optional)
Automatically discovers the site's sitemap.xml and crawls the pages it lists, creating a Markdown file for each page. Parameters: URL (required), --max-depth (optional), --patterns (optional)
Extracts all menu links from a documentation site and outputs them as structured JSON. Parameters: URL (required), --selectors (optional)
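The sitemap and menu tools described above can be approximated with the standard library alone. The sketch below is illustrative rather than the server's implementation: `sitemap_urls`, `MenuLinkParser`, and `menu_links_json` are hypothetical names, glob-style matching is an assumed reading of `--patterns`, and menu links are taken from `<nav>` elements only instead of configurable selectors.

```python
import fnmatch
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str, patterns: Optional[List[str]] = None) -> List[str]:
    """Return <loc> URLs from a sitemap, optionally filtered by glob patterns."""
    root = ET.fromstring(xml_text)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    if not patterns:
        return urls
    # Assumption: patterns are shell-style globs such as "*/docs/*".
    return [u for u in urls if any(fnmatch.fnmatchcase(u, p) for p in patterns)]


class MenuLinkParser(HTMLParser):
    """Collect links that appear inside <nav> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self._nav_depth = 0
        self._href: Optional[str] = None
        self._text: List[str] = []
        self.links: List[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self._nav_depth += 1
        elif tag == "a" and self._nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1
        elif tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.links.append({"title": title, "url": self._href})
            self._href = None


def menu_links_json(html: str, base_url: str) -> str:
    """Return the menu links of a page as a JSON array of {title, url} objects."""
    parser = MenuLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return json.dumps(parser.links, indent=2)
```

Restricting the search to `<nav>` keeps the sketch short; a real menu extractor would likely need site-specific selectors, which is presumably what `--selectors` is for.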