Java open-source HTML parser library jsoup 1.8.3 released

Posted by n2fy, 10 years ago

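jsoup is a Java HTML parser that can directly parse a URL or a piece of HTML text. It provides a very convenient API for extracting and manipulating data using DOM methods, CSS selectors, and jQuery-like operations.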
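A minimal sketch of the CSS-selector style described above, assuming jsoup is on the classpath: the program parses an HTML string (instead of fetching a URL, so it runs offline) and prints the absolute address of every link. The class `LinkExtractor`, the sample markup, and the base URI `https://example.com/` are hypothetical; `Jsoup.parse(String, String)`, `select`, and `absUrl` are standard jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class used only for this illustration.
class LinkExtractor {

    // Parses an HTML string and returns the absolute URL of every <a href> link.
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI lets jsoup resolve relative hrefs such as "/docs".
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        // CSS selector: every <a> element that carries an href attribute.
        for (Element a : doc.select("a[href]")) {
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href='/docs'>the docs</a> or "
                + "<a href='https://jsoup.org/'>jsoup.org</a>.</p>";
        for (String link : extractLinks(html, "https://example.com/")) {
            System.out.println(link);
        }
    }
}
```

To work with a live page instead, `Jsoup.connect(url).get()` returns a `Document` that can be queried in the same way.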

jsoup's main features are:

- parsing HTML from a URL, a file, or a string;
- finding and extracting data with DOM traversal or CSS selectors;
- manipulating HTML elements, attributes, and text.

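The three features above can be combined in a short program. This sketch, again assuming jsoup on the classpath, finds one element with the DOM-style `getElementById` and another with a CSS selector, then edits text, attributes, and structure. The class `EditDemo`, the ids `title` and `nav`, and the markup are hypothetical illustrations; `text`, `attr`, `addClass`, and `appendElement` are methods of jsoup's `Element`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class: find elements two ways, then edit them in place.
class EditDemo {

    static Document rewrite(String html) {
        Document doc = Jsoup.parse(html);

        // DOM-style lookup by id, then replace its text and add a class.
        Element title = doc.getElementById("title");
        if (title != null) {
            title.text("Release notes");
            title.addClass("highlight");
        }

        // CSS-selector lookup: the first <a> inside the element with id "nav".
        Element link = doc.select("#nav a").first();
        if (link != null) {
            link.attr("href", "https://jsoup.org/");
            link.attr("rel", "nofollow");
        }

        // Structural edit: append a new <p> at the end of <body>.
        doc.body().appendElement("p").text("Edited with jsoup");
        return doc;
    }

    public static void main(String[] args) {
        String html = "<h1 id='title'>Old title</h1>"
                + "<div id='nav'><a href='#'>home</a></div>";
        // Serialize the edited <body> back to HTML.
        System.out.println(rewrite(html).body().html());
    }
}
```

The method returns the `Document` rather than a string so callers can keep inspecting the edited tree; `doc.outerHtml()` serializes the whole document when needed.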
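jsoup 1.8.3 has been released. The main improvements in this version are performance gains when parsing large HTML files, automatic switching to the XML parser when fetching XML documents, and several important bug fixes.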

Changes:

Improvements
- Performance improvement when parsing larger HTML pages: around 1.7x faster on Android KitKat and around 1.3x faster on Android Lollipop. The gains come largely from re-ordering the HtmlTreeBuilder methods based on an analysis of various websites, plus further memory reduction for nodes with no children and other tweaks.
- When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.
- Improved support for boolean attributes in HTML5.
- When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
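The XML-parser and boolean-attribute changes can be illustrated with a small sketch. According to the notes above, jsoup switches to the XML parser automatically when a fetched URL is XML; the exact detection rule is not described here, so this offline example selects `Parser.xmlParser()` explicitly. It contrasts the two parsers on RSS-style markup, where `<link>` is an ordinary element in XML but a void element in HTML, and checks a valueless boolean attribute with `hasAttr`. The class `ParserDemo` and the sample feed are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

// Hypothetical demo comparing jsoup's XML and HTML parsers on feed-style markup.
class ParserDemo {

    // XML parser: no HTML tree-building rules, no implied <html>/<body>.
    static Document parseAsXml(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    // Default HTML parser: applies HTML5 tree-building rules.
    static Document parseAsHtml(String markup) {
        return Jsoup.parse(markup);
    }

    // True if the first element matching cssQuery has the attribute,
    // including valueless boolean attributes such as "checked".
    static boolean hasAttribute(String html, String cssQuery, String attr) {
        Element el = Jsoup.parse(html).select(cssQuery).first();
        return el != null && el.hasAttr(attr);
    }

    public static void main(String[] args) {
        String rss = "<rss><channel><item><title>jsoup 1.8.3</title>"
                + "<link>https://jsoup.org/</link></item></channel></rss>";

        // XML parser: <link> is an ordinary element, so it keeps its text.
        System.out.println(parseAsXml(rss).select("item > link").first().text());

        // HTML parser: <link> is a void element, so the URL text is not placed inside it.
        Element htmlLink = parseAsHtml(rss).select("link").first();
        System.out.println(htmlLink == null ? "(no link)" : "'" + htmlLink.text() + "'");

        // Valueless boolean attribute: present even though it has no value.
        System.out.println(hasAttribute("<input type=checkbox checked>", "input", "checked"));
    }
}
```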
Bug fixes
- Fixed an issue in Element.elementSiblingIndex() (and related methods) where sibling elements with the same content would incorrectly have the same sibling index.
- Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the document.
- Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
- When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; characters (such as Shift_JIS), the character would be skipped. For visibility, jsoup will now always output &#xa0; (the hex character reference for a non-breaking space) when using XHTML encoding entities (as &nbsp; is not defined), regardless of the output character set.
- Fixed an issue when resolving URLs, where if the absolute URL had no path, the relative URL was not normalized correctly.
- Fixed an issue where connections that were redirected to a relative URL did not have the same normalization rules as a URL read from Nodes.absUrl(String).
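The bug fixes above can be spot-checked with a short program, assuming jsoup 1.8.3 or later on the classpath. It collects the sibling indexes of three identical `<li>` elements, resolves `./one/two.html` against a base URI with no path (`http://example.com`), and serializes a non-breaking space with XHTML entities under Shift_JIS. The class `BugFixChecks` and its helper methods are hypothetical; on versions before 1.8.3 the results may differ, as the notes describe.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities;

import java.util.ArrayList;
import java.util.List;

// Hypothetical checks for behavior described in the 1.8.3 bug-fix list.
class BugFixChecks {

    // Sibling indexes of the <li> elements; identical items should still get distinct indexes.
    static List<Integer> siblingIndexes(String html) {
        List<Integer> indexes = new ArrayList<>();
        for (Element li : Jsoup.parse(html).select("li")) {
            indexes.add(li.elementSiblingIndex());
        }
        return indexes;
    }

    // Resolves a link's href against the given base URI via absUrl.
    static String resolve(String baseUri, String href) {
        Document doc = Jsoup.parse("<a href='" + href + "'>x</a>", baseUri);
        return doc.select("a").first().absUrl("href");
    }

    // Serializes the body using XHTML entities and the given output charset.
    static String xhtmlBody(String html, String charset) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings()
                .escapeMode(Entities.EscapeMode.xhtml)
                .charset(charset);
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(siblingIndexes("<ul><li>x</li><li>x</li><li>x</li></ul>"));
        System.out.println(resolve("http://example.com", "./one/two.html"));
        System.out.println(xhtmlBody("<p>a&nbsp;b</p>", "Shift_JIS"));
    }
}
```

On a fixed version the three lines should show distinct indexes, the URL `http://example.com/one/two.html`, and a paragraph in which the non-breaking space appears as `&#xa0;`.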

Article URL: http://www.baiduhome.net/news/view/b5ae1c