2011年4月21日星期四

Script-Tutorials.com: How to parse web pages using XPath


On the Script-Tutorials.com site there's a new article showing you how to use XPath to parse web pages, complete with screenshots of the page and the code to make it happen.



Today I will tell you how you can make parsers of remote HTML pages (in PHP). In this article I will show you how to perform xpath queries to Web pages. XPath - a query language to elements of xml or xhtml document. To obtain the necessary data, we just need to create the necessary query. For the work, we also need: browser Mozilla Firefox, firebug and firepath plugins. For our experiment, I suggest this webpage Google Sci/Tech News. Of course you can choose any other web page too.


They provide two demos and a downloadable package with everything you need. The script pulls in the page as a DOM document (which works as long as it's correctly formatted XML) and spits back out the matches from a few different XPath expressions. There's all sorts of sites out there that can help you with examples of other XPath expressions and syntax.

没有评论:

发表评论