DeepDive: An Open-Source Knowledge Extraction System

Posted by jopen 9 years ago | 52K views | Tags: DeepDive, machine learning

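DARPA has made available DeepDive, an open-source project similar to Watson, built mainly on SQL and Python. Watson is well known as a strong question-answering (QA) system, whereas DeepDive focuses on extracting structured information from unstructured data on the internet, applying a series of post-processing steps, building knowledge bases, and extracting relations.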

DeepDive differs from traditional systems in the following ways:

DeepDive is aware that data is often noisy and imprecise: names are misspelled, natural language is ambiguous, and humans make mistakes. Taking such imprecisions into account, DeepDive computes calibrated probabilities for every assertion it makes. For example, if DeepDive produces a fact with probability 0.9 it means the fact is 90% likely to be true.

DeepDive is able to use large amounts of data from a variety of sources. Applications built using DeepDive have extracted data from millions of documents, web pages, PDFs, tables, and figures.

DeepDive allows developers to use their knowledge of a given domain to improve the quality of the results by writing simple rules that inform the inference (learning) process. DeepDive can also take into account user feedback on the correctness of the predictions, with the goal of improving the predictions.

DeepDive is able to use the data to learn "distantly". In contrast, most machine learning systems require tedious training for each prediction. In fact, many DeepDive applications, especially at early stages, need no traditional training data at all!
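The idea of learning "distantly" can be illustrated with a small sketch. This is not DeepDive's own code or API: the function `distant_label` and the toy knowledge bases `KNOWN_SPOUSES` and `KNOWN_SIBLINGS` are hypothetical names invented for this example. It assumes the common distant-supervision setup in which an existing knowledge base, rather than hand-labeled sentences, supplies the training labels for candidate pairs found in text.

```python
# Hypothetical toy knowledge bases (assumptions made for this illustration only).
KNOWN_SPOUSES = {frozenset({"Barack Obama", "Michelle Obama"})}
KNOWN_SIBLINGS = {frozenset({"Venus Williams", "Serena Williams"})}


def distant_label(sentence, person_a, person_b):
    """Label a candidate 'spouse' pair without any hand-annotated sentence.

    Returns True (positive example), False (negative example),
    or None (no supervision available for this candidate).
    """
    # Only pairs that actually co-occur in the sentence are candidates.
    if person_a not in sentence or person_b not in sentence:
        return None
    pair = frozenset({person_a, person_b})
    if pair in KNOWN_SPOUSES:
        return True   # the knowledge base says they are married
    if pair in KNOWN_SIBLINGS:
        return False  # known siblings serve as negative examples
    return None       # unknown pair: left unlabeled for inference


if __name__ == "__main__":
    text = "Barack Obama and Michelle Obama attended the dinner."
    print(distant_label(text, "Barack Obama", "Michelle Obama"))
```

Labels produced this way are noisy, since a sentence may mention a married couple without expressing the marriage, which is one reason a probabilistic treatment of every assertion is useful.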

DeepDive’s secret is a scalable, high-performance inference and learning engine. For the past few years, we have been working to make the underlying algorithms run as fast as possible. The techniques pioneered in this project are part of commercial and open source tools including MADlib, Impala, a product from Oracle, and low-level techniques, such as Hogwild!. They have also been included in Microsoft's Adam.
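The claim about calibrated probabilities (a fact reported with probability 0.9 should be true about 90% of the time) can be checked with a simple bucketing procedure. The sketch below is a generic calibration check, not DeepDive's implementation: the function name `calibration_table` and its input format, a list of `(probability, is_true)` pairs drawn from some held-out labeled set, are assumptions made for this example.

```python
from collections import defaultdict


def calibration_table(predictions, num_buckets=10):
    """Group predictions by probability and compare with observed accuracy.

    predictions: iterable of (probability, is_true) pairs (assumed format).
    Returns {bucket_index: (mean_probability, observed_accuracy, count)}.
    """
    buckets = defaultdict(list)
    for prob, is_true in predictions:
        # Clamp so that a probability of exactly 1.0 falls in the last bucket.
        idx = min(int(prob * num_buckets), num_buckets - 1)
        buckets[idx].append((prob, bool(is_true)))

    table = {}
    for idx in sorted(buckets):
        items = buckets[idx]
        mean_prob = sum(p for p, _ in items) / len(items)
        accuracy = sum(1 for _, t in items if t) / len(items)
        table[idx] = (mean_prob, accuracy, len(items))
    return table


if __name__ == "__main__":
    # Ten hypothetical facts, each reported with probability 0.9; nine are true.
    sample = [(0.9, True)] * 9 + [(0.9, False)]
    for idx, (mean_prob, accuracy, count) in calibration_table(sample).items():
        print(idx, round(mean_prob, 3), round(accuracy, 3), count)
```

For a well-calibrated system, the mean probability and the observed accuracy in each bucket should be close to each other; large gaps suggest the probabilities are over- or under-confident.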
