Web Content Extractor

by bsmi021

Summary

This MCP server, written in TypeScript, provides tools for scanning, extracting, and processing web page content. It uses Cheerio for HTML parsing and Turndown for HTML-to-Markdown conversion to fetch, analyze, and transform pages. It is designed to plug into AI-assisted workflows such as web scraping, content summarization, and data extraction, and is aimed at researchers, content creators, and developers who need to automate web content analysis, generate structured data from websites, or feed web-based information into AI applications.

Available Actions (6)

fetch-page

Fetches a web page and converts it to Markdown. Parameters: url (required): URL of the page to fetch; selector (optional): CSS selector to target specific content.
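As a rough illustration of how a client might invoke this action, the sketch below builds a JSON-RPC 2.0 request for MCP's tools/call method. The tool name and the url and selector parameters come from the listing above; the helper name buildFetchPageCall, the request id, and the example URL are hypothetical.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper (not part of the server): builds a fetch-page call.
// `selector` is left out of the arguments when the caller does not pass one.
function buildFetchPageCall(id: number, url: string, selector?: string): ToolCallRequest {
  const args: Record<string, unknown> = { url };
  if (selector !== undefined) {
    args.selector = selector;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "fetch-page", arguments: args },
  };
}

const fetchPageRequest = buildFetchPageCall(1, "https://example.com/docs", "main article");
console.log(JSON.stringify(fetchPageRequest, null, 2));
```

The same envelope should apply to the other actions by swapping the name and arguments fields.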

extract-links

Extracts all links from a web page with their text. Parameters: url (required): URL of the page to analyze; baseUrl (optional): Base URL to filter links; limit (optional, default: 100): Maximum number of links to return.
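The listing does not say how baseUrl filtering works. The sketch below assumes it is a simple prefix match on the resolved link URL, applied before the limit. The Link type and the filterLinks function are hypothetical names used only for illustration.

```typescript
interface Link {
  href: string;
  text: string;
}

// Assumption: `baseUrl` keeps only links whose resolved URL starts with it.
function filterLinks(links: Link[], pageUrl: string, baseUrl?: string, limit = 100): Link[] {
  const resolved: Link[] = [];
  for (const link of links) {
    try {
      // Resolve relative hrefs against the page URL.
      resolved.push({ href: new URL(link.href, pageUrl).href, text: link.text.trim() });
    } catch {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  const kept = baseUrl ? resolved.filter((l) => l.href.startsWith(baseUrl)) : resolved;
  return kept.slice(0, limit);
}

const sampleLinks: Link[] = [
  { href: "/about", text: " About " },
  { href: "https://other.org/x", text: "Other" },
  { href: "guide.html", text: "Guide" },
];
const sameSiteLinks = filterLinks(sampleLinks, "https://example.com/docs/", "https://example.com/");
console.log(sameSiteLinks.map((l) => l.href));
```

Passing the base URL with a trailing slash avoids accidentally matching hosts that merely begin with the same characters.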

crawl-site

Recursively crawls a website up to a specified depth. Parameters: url (required): Starting URL to crawl; maxDepth (optional, default: 2): Maximum crawl depth (0-5).
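A depth-limited crawl can be sketched as follows. This is not the server's implementation: it uses an iterative breadth-first traversal rather than recursion, and it takes a hypothetical synchronous getLinks callback over an in-memory graph so the example stays self-contained. A real crawler would fetch pages asynchronously. Clamping maxDepth to the 0-5 range mirrors the limits stated above.

```typescript
// Hypothetical callback: returns the absolute URLs linked from a page.
type GetLinks = (url: string) => string[];

// Returns every discovered URL mapped to the depth at which it was first seen.
function crawl(startUrl: string, getLinks: GetLinks, maxDepth = 2): Map<string, number> {
  const depthLimit = Math.min(5, Math.max(0, Math.floor(maxDepth)));
  const seen = new Map<string, number>();
  seen.set(startUrl, 0);
  let frontier: string[] = [startUrl];
  for (let depth = 0; depth < depthLimit && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of getLinks(url)) {
        if (!seen.has(link)) {
          seen.set(link, depth + 1); // first visit fixes the depth
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return seen;
}

// Tiny in-memory site used in place of real HTTP requests.
const siteGraph: Record<string, string[]> = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/c"],
  "https://example.com/b": ["https://example.com/"],
  "https://example.com/c": ["https://example.com/d"],
};
const lookupLinks: GetLinks = (url) => siteGraph[url] ?? [];

const crawled = crawl("https://example.com/", lookupLinks, 2);
console.log(Array.from(crawled.entries()));
```

Tracking visited URLs in the map keeps the cycle between the root page and /b from causing repeated visits.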

check-links

Checks for broken links on a page. Parameters: url (required): URL to check links for.
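The listing does not define what counts as a broken link. One common convention, assumed here, is to treat a failed request or an HTTP status of 400 or above as broken. The LinkStatus type and findBrokenLinks function are hypothetical, and the statuses are supplied directly rather than fetched.

```typescript
// `status` is the HTTP status code, or null when the request itself failed.
interface LinkStatus {
  url: string;
  status: number | null;
}

// Assumption: a link is broken if the request failed or returned 4xx/5xx.
function findBrokenLinks(results: LinkStatus[]): LinkStatus[] {
  return results.filter((r) => r.status === null || r.status >= 400);
}

const checkedLinks: LinkStatus[] = [
  { url: "https://example.com/", status: 200 },
  { url: "https://example.com/moved", status: 301 },
  { url: "https://example.com/missing", status: 404 },
  { url: "https://unreachable.example/", status: null },
];
const brokenLinks = findBrokenLinks(checkedLinks);
console.log(brokenLinks.map((r) => r.url));
```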

find-patterns

Finds URLs matching a specific pattern. Parameters: url (required): URL to search in; pattern (required): JavaScript-compatible regex pattern to match URLs against.
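Because the pattern is a JavaScript-compatible regex, matching can be illustrated with the built-in RegExp constructor. The sketch below assumes the pattern is tested against each full URL string; findMatchingUrls is a hypothetical name and the URLs are made up.

```typescript
// Filters a list of URLs with a JavaScript regex supplied as a string.
function findMatchingUrls(urls: string[], pattern: string): string[] {
  // `new RegExp` throws a SyntaxError if the pattern is invalid.
  // No `g` flag is used, so `test` carries no state between calls.
  const regex = new RegExp(pattern);
  return urls.filter((u) => regex.test(u));
}

const candidateUrls = [
  "https://example.com/blog/2024/intro",
  "https://example.com/blog/archive",
  "https://example.com/about",
];
const datedPosts = findMatchingUrls(candidateUrls, "/blog/\\d{4}/");
console.log(datedPosts);
```

Note the doubled backslash: inside a string literal, or a JSON argument, `\d` has to be written as `\\d`.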

generate-site-map

Generates a simple XML sitemap by crawling. Parameters: url (required): Root URL for sitemap crawl; maxDepth (optional, default: 2): Maximum crawl depth for discovering URLs (0-5); limit (optional, default: 1000): Maximum number of URLs to include in the sitemap.
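The final step, turning discovered URLs into a sitemap, might look like the sketch below. It follows the sitemaps.org 0.9 format with only the loc element; whether the server emits additional fields is not stated in the listing. The buildSitemap and escapeXml names are hypothetical.

```typescript
// Escapes the five characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a minimal sitemap: de-duplicates URLs, then applies the limit.
function buildSitemap(urls: string[], limit = 1000): string {
  const unique = Array.from(new Set(urls)).slice(0, limit);
  const entries = unique.map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

const sitemapXml = buildSitemap([
  "https://example.com/",
  "https://example.com/search?q=mcp&page=2",
  "https://example.com/",
]);
console.log(sitemapXml);
```

Escaping matters because query strings often contain `&`, which is not valid unescaped inside an XML element.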

Last Updated: July 7, 2025

Language

TypeScript

Categories

Productivity, Developer Tools, Design, AI, Search

Tags

#knowledge-base #infrastructure #project-management #automation
