Java-Based Open-Source HTML Parser: jsoup 1.7.3 Released
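jsoup is an HTML parser for Java that provides very convenient methods for extracting and manipulating HTML documents. It combines DOM traversal, CSS selectors, and jQuery-like methods for locating nodes and reading their data.
Its select and pipeline APIs are as powerful as jQuery's.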
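jsoup 1.7.3 has been released. This version introduces improved form handling and more robust character-set detection, reduces time and memory use in parsing and CSS selector matching, and includes a number of bug fixes.
The detailed changes are as follows: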
Improvements:
- Added the element type FormElement, to facilitate simple form submissions. Find forms in a doc using Elements.forms(), then prepare it for submission with FormElement.submit().
- Improved the reliability of HTTP character-set recognition from response headers, particularly for when servers return out-of-spec responses.
- Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.
- Large decrease in the number of temporary objects created during parsing, leading to less GC load (helpful particularly on Android), and faster parsing.
- Improved the time to match elements with common CSS selectors by ~27%.
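The form handling and Document.location() additions listed above can be tried without any network access. The following is a minimal sketch, assuming jsoup 1.7.3 or later is on the classpath; the class name FormSketch, the HTML snippet, and the base URL http://example.com/ are hypothetical and exist only for illustration.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

import java.util.List;

public class FormSketch {
    // Hypothetical markup and base URL, used only for this illustration.
    static final String HTML =
        "<form action='/search' method='get'>"
      + "<input type='text' name='q' value='jsoup'>"
      + "<input type='hidden' name='lang' value='en'>"
      + "</form>";
    static final String BASE_URI = "http://example.com/";

    // Parse the snippet; the base URI becomes the document's location.
    static Document parse() {
        return Jsoup.parse(HTML, BASE_URI);
    }

    // Elements.forms() keeps only the selected elements that are forms.
    static FormElement firstForm(Document doc) {
        List<FormElement> forms = doc.select("form").forms();
        return forms.get(0);
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println("location: " + doc.location());

        FormElement form = firstForm(doc);
        // formData() lists the name/value pairs the form would send.
        for (Connection.KeyVal kv : form.formData()) {
            System.out.println(kv.key() + "=" + kv.value());
        }

        // submit() only prepares a Connection; nothing is sent
        // until execute(), get(), or post() is called on it.
        Connection conn = form.submit();
        System.out.println(conn.request().method() + " " + conn.request().url());
    }
}
```

When the document is fetched with Jsoup.connect(...).get() instead of parsed from a string, location() reports the URL the response actually came from, which is what makes it useful after a redirect.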
Bug Fixes:
- Fixed support for self-closing script tags.
- Fixed a crash when reading an unterminated CDATA section.
- Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.
- Fixed an issue when cloning a document with extremely nested elements that could cause a stack-overflow.
- Fixed an issue when connecting or redirecting to a URL that contains a space.
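For readers still on an older release, the last fix above suggests a simple caller-side workaround: percent-encode spaces before handing the URL to jsoup. The sketch below uses only the Java standard library; the class and method names are hypothetical, it handles spaces only, and it is not claimed to be how jsoup addresses the issue internally.

```java
public class SpaceInUrlSketch {
    // Naive workaround: replace each literal space with %20.
    // Other characters that are illegal in URLs are left untouched.
    static String encodeSpaces(String url) {
        return url.replace(" ", "%20");
    }

    public static void main(String[] args) {
        // Hypothetical URL containing spaces in the path and the query.
        String raw = "http://example.com/some page.html?q=a b";
        System.out.println(encodeSpaces(raw));
    }
}
```

The encoded string can then be passed to Jsoup.connect(...) as usual.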
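Example: fetch the Wikipedia homepage and select the headlines from its "In the news" section:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Chinese edition of the cookbook, translated by this site: http://www.baiduhome.net/Jsoup/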
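This article was uploaded and shared by user jopen, for study and discussion only. All rights belong to the original author; if your rights have been infringed, please contact the administrator.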