I would like to write some code to parse a set of HTML pages from the internet in order to gather information from each web page.

All of the web pages are generated using a template, so the format of each of the web-pages is consistent with one-another and the information that I want to gather is always located in the same logical place within the page.

What is the best way to parse an html page in order to gather information at a specific place?

Can XML XPATH be used here? Does anyone have any examples of parsing HTML content?

2 6
0 1.4K