HotSAX is a fast, small-footprint, non-validating SAX2 parser for HTML, XML, and XHTML. It can be used in simple web agents, page scrapers, and spiders. It is similar to the Apache Xerces parser, except that it can also generate SAX events for badly formed HTML.
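As a rough illustration of the page-scraper use case, the sketch below collects link targets from a page through the standard SAX2 `ContentHandler` callbacks. It is not taken from the HotSAX documentation: it uses the JDK's built-in parser (obtained through `SAXParserFactory`), so it only accepts well-formed markup, and the class name `LinkCollector` and method `extractLinks` are made up for this example. Since HotSAX is described as a SAX2 parser, the same handler should presumably work unchanged once the `XMLReader` is replaced with the one HotSAX provides, but that substitution is an assumption and is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: gathers the href of every <a> element
// reported by a SAX2 parser.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    // Called by the parser once for each opening tag.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    // Parses the markup with the JDK's default SAX parser (an assumption:
    // a HotSAX XMLReader would be plugged in here instead).
    static List<String> extractLinks(String markup) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LinkCollector handler = new LinkCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"a.html\">A</a>"
                    + "<p><a href=\"b.html\">B</a></p></body></html>";
        System.out.println(extractLinks(page));
    }
}
```

The handler only reacts to `startElement`, which is enough for link extraction; a scraper that needs text content would also override `characters`.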
Source: http://freshmeat.net/projects/hotsax/?topic_id=93%2C244%2C913%2C53%2C867