DotNetWikiBot Framework

Posted by jopen 12 years ago | 17K views | Tags: crawler, web crawler

DotNetWikiBot Framework is a full-featured client API and console application, written in .NET, for building bots that crawl and edit sites running on MediaWiki.

using DotNetWikiBot; // Reference DotNetWikiBot namespace for easy access

class MyBot : Bot // Derive your bot class from framework's Bot class
{
  public static void Main()
  {
    // Firstly make Site object, specifying site's URL and your bot account
    Site enWiki = new Site("

    // Make empty PageList object, representing collection of pages
    PageList pl = new PageList(enWiki);
    // Fill it with 100 pages, where "nuclear disintegration" is mentioned
    pl.FillFromGoogleSearchResults("nuclear disintegration", 100);
    // Load texts and metadata of all found pages from live wiki
    pl.LoadEx();
    // Now suppose, that we must correct some typical mistake in all our pages
    foreach (Page i in pl)
        // In each page we will replace one phrase with another
        i.text = i.text.Replace("fusion products", "fission products");
    // Finally we'll save all changed pages to wiki with 5 seconds interval         
    pl.SaveSmoothly(5, "comment: mistake autocorrection", true);

    // Now clear our PageList so we could re-use it
    pl.Clear();
    // Fill it with all articles in "Astronomy" category and its subcategories
    pl.FillFromCategoryTree("Astronomy");
    // Download and save all PageList's articles to specified local XML file
    pl.SaveXMLDumpToFile("Dumps\\ArticlesAboutAstronomy.xml");      
  }
}
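
Because the sample above rewrites live pages in bulk, it can help to try the substitution on in-memory text first. The following is a minimal sketch of that idea, not part of DotNetWikiBot: `PageStub`, `DryRun`, and `Correct` are hypothetical names introduced for illustration, and only standard .NET calls are used. It applies the same `string.Replace` correction and reports which pages would actually change.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a wiki page; NOT a DotNetWikiBot type.
class PageStub
{
    public string Title;
    public string Text;

    public PageStub(string title, string text)
    {
        Title = title;
        Text = text;
    }
}

static class DryRun
{
    // Applies the same string.Replace correction as the bot sample and
    // returns the titles of the pages whose text actually changed.
    public static List<string> Correct(IEnumerable<PageStub> pages, string wrong, string right)
    {
        List<string> changed = new List<string>();
        foreach (PageStub p in pages)
        {
            // string.Replace(string, string) is case-sensitive and ordinal.
            string fixedText = p.Text.Replace(wrong, right);
            if (fixedText != p.Text)
            {
                p.Text = fixedText;
                changed.Add(p.Title);
            }
        }
        return changed;
    }

    public static void Main()
    {
        List<PageStub> pages = new List<PageStub>
        {
            new PageStub("A", "The fusion products decay quickly."),
            new PageStub("B", "No matching phrase here.")
        };

        List<string> changed = Correct(pages, "fusion products", "fission products");

        foreach (string title in changed)
            Console.WriteLine("Would save: " + title); // prints "Would save: A"
    }
}
```

Tracking which pages changed is a design choice of this sketch: it makes it easy to see how many pages a replacement would touch before running the real bot against a wiki.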

Project homepage: http://www.baiduhome.net/lib/view/home/1349946678556
