Java-based open-source HTML parser: jsoup 1.7.3 released

Posted by jopen 12 years ago | 9K views | jsoup

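jsoup is an HTML parser for Java. It provides a very convenient API for extracting and manipulating HTML documents, and lets you locate nodes and read their data by combining DOM traversal, CSS selectors, and jQuery-like methods.
Its select and pipeline API is as powerful as jQuery's.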
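As a rough illustration of the jQuery-like selection described above, the following sketch parses an in-memory HTML string and picks out links with a CSS selector. The markup, the `example.com` base URL, and the class name `SelectSketch` are made up for this example; only `Jsoup.parse`, `Document.select`, `Element.text`, and `Element.absUrl` are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectSketch {
    // Hypothetical sample markup, not taken from any real page.
    static final String HTML =
        "<div id='news'>"
      + "<p><b><a href='/a'>First</a></b></p>"
      + "<p><b><a href='/b'>Second</a></b></p>"
      + "<p><a href='/c'>Plain</a></p>"
      + "</div>";

    // Select only the links that sit inside a <b> under the element with id "news".
    static Elements headlines() {
        // The second argument is the base URI used to resolve relative links.
        Document doc = Jsoup.parse(HTML, "http://example.com/");
        return doc.select("#news b a");
    }

    public static void main(String[] args) {
        for (Element link : headlines()) {
            // absUrl resolves the relative href against the base URI.
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
    }
}
```

The selector string follows the same descendant syntax as CSS, so `#news b a` skips the third link because it is not wrapped in a `<b>` element.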

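jsoup 1.7.3 has been released. This version introduces improved form handling and more robust character-set detection, optimizes speed and memory use in parsing and CSS selector matching, and includes a number of bug fixes.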

The detailed changes are as follows:

Improvements:

- Added the element type FormElement, to facilitate simple form submissions. Find forms in a doc using Elements.forms(), then prepare it for submission with FormElement.submit().

- Improved the reliability of HTTP character-set recognition from response headers, particularly for when servers return out-of-spec responses.

- Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.

- Large decrease in the number of temporary objects created during parsing, leading to less GC load (helpful particularly on Android), and faster parsing.

- Improved the time to match elements with common CSS selectors by ~ 27%.
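The two API additions listed above, `FormElement` and `Document.location()`, can be sketched as follows. This is a minimal offline example under stated assumptions: the form markup, the `example.com` base URL, and the class name `FormSketch` are hypothetical, and the document is parsed from a string rather than fetched, so `location()` simply reports the base URI passed to the parser.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormSketch {
    // Hypothetical form markup used only for illustration.
    static final String HTML =
        "<form action='/search' method='get'>"
      + "<input type='text' name='q' value='jsoup'>"
      + "<input type='hidden' name='lang' value='en'>"
      + "</form>";

    static Document parse() {
        // Base URI (an assumption here) lets jsoup resolve the relative form action.
        return Jsoup.parse(HTML, "http://example.com/");
    }

    // Find the first form in the document via Elements.forms().
    static FormElement firstForm() {
        return parse().select("form").forms().get(0);
    }

    public static void main(String[] args) {
        System.out.println("Location: " + parse().location());

        FormElement form = firstForm();
        // formData() lists the name/value pairs that would be submitted.
        for (Connection.KeyVal kv : form.formData()) {
            System.out.println(kv.key() + "=" + kv.value());
        }

        // submit() only prepares a Connection; nothing is sent
        // until execute(), get(), or post() is called on it.
        Connection prepared = form.submit();
        System.out.println("Target: " + prepared.request().url());
    }
}
```

Because `submit()` returns a prepared `Connection`, the caller can still adjust it (for example, add cookies or change a field) before actually sending the request.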

Bug Fixes:

- Fixed support for self-closing script tags.

- Fixed a crash when reading an unterminated CDATA section.

- Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.

- Fixed an issue when cloning a document with extremely nested elements that could cause a stack-overflow.

- Fixed an issue when connecting or redirecting to a URL that contains a space.

// Fetch the Wikipedia front page and select the links in its "In the news" box
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Chinese edition of the cookbook, translated by this site: http://www.baiduhome.net/Jsoup/


Article URL: http://www.baiduhome.net/news/view/196a180
