HTML Parser: Jericho HTML Parser 3.3 Released

Posted by jopen 13 years ago | 11K views | Jericho

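Jericho HTML Parser is an open-source Java library for analysing and manipulating HTML documents, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.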

This release includes important bug fixes and various enhancements.

The following example lists all elements in a document:

import net.htmlparser.jericho.*;
import java.util.*;
import java.io.*;
import java.net.*;

public class DisplayAllElements {
    public static void main(String[] args) throws Exception {
        // Use the first argument as the source, falling back to a bundled test file.
        String sourceUrlString="data/test.html";
        if (args.length==0)
            System.err.println("Using default argument of \""+sourceUrlString+'"');
        else
            sourceUrlString=args[0];
        // An argument without a ':' is treated as a local path rather than a URL.
        if (sourceUrlString.indexOf(':')==-1) sourceUrlString="file:"+sourceUrlString;
        // Register the optional tag types so the parser recognises them.
        MicrosoftConditionalCommentTagTypes.register();
        PHPTagTypes.register();
        PHPTagTypes.PHP_SHORT.deregister(); // remove PHP short tags for this example otherwise they override processing instructions
        MasonTagTypes.register();
        Source source=new Source(new URL(sourceUrlString));
        List<Element> elementList=source.getAllElements();
        for (Element element : elementList) {
            System.out.println("-------------------------------------------------------------------------------");
            System.out.println(element.getDebugInfo());
            // Only print a tidied start tag for elements that can carry attributes.
            if (element.getAttributes()!=null) System.out.println("XHTML StartTag:\n"+element.getStartTag().tidy(true));
            System.out.println("Source text with content:\n"+element);
        }
        System.out.println(source.getCacheDebugInfo());
    }
}
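
The sample above decides whether its argument is a URL or a local path by checking for a colon. As a minimal sketch of that one step in isolation, the snippet below uses only the JDK and no Jericho classes. The class name SourceUrlDemo and the helper toUrlString are hypothetical names introduced here for illustration; they are not part of the library or of the original sample.

```java
public class SourceUrlDemo {
    // Hypothetical helper: mirrors the sample's rule that an argument
    // without a ':' is a local path and gets a "file:" prefix.
    static String toUrlString(String arg) {
        return arg.indexOf(':') == -1 ? "file:" + arg : arg;
    }

    public static void main(String[] args) {
        // A relative path has no scheme, so it is prefixed.
        System.out.println(toUrlString("data/test.html"));
        // A string that already contains a scheme is left unchanged.
        System.out.println(toUrlString("http://example.com/index.html"));
    }
}
```

Note that this check is deliberately simple: any argument containing a colon, such as a Windows path beginning with a drive letter, is passed through unchanged and may not form a valid URL.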

