HTMLParser.Net

By programmerstools


What is HTMLParser.Net?

HTMLParser.Net is a .NET library built on the codebase of the popular Java-based HTMLParser project hosted on sourceforge.net. If you are building applications that involve screen scraping of HTML pages or data extraction from web sites, you will want a tool like HTMLParser.Net in your arsenal. Parsing a page takes as little as four lines of code. If you want to be a little more creative with parsing and querying the results, the API offers more advanced features that are easy to use.

Community Edition

We offer a Community Edition of the library as a free download. It has all the features of the Professional Edition except support for MIME types such as PDF, MS Office documents and XML, and multithreaded crawling. If your needs are limited to the text/html MIME type, this is a great library to keep in your tool chest.

Download the free Community Edition

Features

The API's features include:

  • Works with any .NET language (C#, VB.Net, J#, etc.).
  • Parses almost all HTML tags and lets you search by tag type, by attribute value, or by regular expression over the content. Some tags that the Java-based HTMLParser project did not support are included in this release.
  • A set of extensible filters that let you exclude content you do not want in your analysis.
  • High-level APIs that answer common questions: What are the outbound links on the page? What images are on the page? What tables are on the page? Are there any broken links on the page? And much more.
  • A configuration-file-based HTTP protocol engine that fetches the content of the URL you specify. The crawler follows the instructions in the site's robots.txt file and does not fetch a page that the site blocks.
  • The HTTP protocol engine handles compressed responses from any site. It accepts the gzip, x-zip and deflate encodings.
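
To make a few of these features concrete, here is a minimal sketch of the same ideas: collecting links and images by tag and attribute, honoring robots.txt, and decoding a compressed response body. It is written in Python with only the standard library and is not the HTMLParser.Net API. All names in it (`LinkImageCollector`, `collect_links_and_images`, `allowed_by_robots`, `decode_body`) are hypothetical and chosen for illustration. It also assumes that "x-zip" in the list above refers to the standard `x-gzip` alias of gzip.

```python
import gzip
import zlib
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkImageCollector(HTMLParser):
    """Collects href values of <a> tags and src values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_links_and_images(html):
    """Return (links, images) found in an HTML string."""
    collector = LinkImageCollector()
    collector.feed(html)
    collector.close()
    return collector.links, collector.images


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def decode_body(body, content_encoding):
    """Decompress a response body according to its Content-Encoding value."""
    if content_encoding in ("gzip", "x-gzip"):  # x-gzip: assumed meaning of "x-zip"
        return gzip.decompress(body)
    if content_encoding == "deflate":
        try:
            return zlib.decompress(body)  # zlib-wrapped deflate
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body  # no encoding, or one this sketch does not handle
```

A crawler built this way would call `allowed_by_robots` before fetching a page, pass the raw response through `decode_body`, and then hand the decoded text to `collect_links_and_images`.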

