JSearch: a Java library for extracting text from documents (Office, PDF, HWP)

Posted by jopen 9 years ago | 13K views | Java development, jsearch

JSearch is a Java library that extracts text from documents (Office, PDF, HWP).

Download & Installation

JSearch.jar
Add JSearch.jar to your project's classpath.

Requirements

  1. It should work with various document types, e.g. HWP, PDF, and Office files.
  2. It should support extracting text and quickly finding keywords in documents.
  3. It is distributed as a JAR library.
  4. All functions are synchronous.
  5. An extraction result contains the full text.
  6. A search result contains the word count.
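
Requirement 6 mentions a word count, while the keyword checks documented for this library return only a boolean. As a minimal sketch, assuming the text has already been extracted into a String, a caller could derive a count as follows. The class KeywordCounter and its method countOccurrences are hypothetical names introduced for illustration and are not part of JSearch.

```java
// Hypothetical helper, not part of JSearch: counts keyword occurrences
// in text that has already been extracted from a document.
final class KeywordCounter {
    private KeywordCounter() {}

    /** Counts non-overlapping, case-sensitive occurrences of keyword in text. */
    static int countOccurrences(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0; // nothing to search, or nothing to search for
        }
        int count = 0;
        int from = 0;
        while (true) {
            int idx = text.indexOf(keyword, from);
            if (idx < 0) {
                break;
            }
            count++;
            from = idx + keyword.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        String extracted = "JSearch extracts text; JSearch finds keywords.";
        System.out.println(countOccurrences(extracted, "JSearch")); // prints 2
    }
}
```

Matching here is case-sensitive and non-overlapping; whether JSearch itself matches that way is not stated in the source.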

Class

public class JSearch

JSearch supports various document types by building on open-source engines.
The library provides three families of functions: extract...(), isContainsKeyword...(), and getFileList...().

The supported types are HWP, DOC, PPT, EXCEL, TEXT, PDF, and UNKNOWN.
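
The source does not say how JSearch decides which of these types a file belongs to. One plausible approach is to classify by file extension, sketched below. The enum DocType, the method fromFileName, and the extension mapping are all assumptions made for illustration; the library may detect types differently.

```java
import java.util.Locale;

// Hypothetical sketch, not JSearch's actual code: maps a file name to one of
// the type names listed above, based only on its extension.
enum DocType {
    HWP, DOC, PPT, EXCEL, TEXT, PDF, UNKNOWN;

    static DocType fromFileName(String fileName) {
        if (fileName == null) {
            return UNKNOWN;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return UNKNOWN; // no extension, or name ends with a dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "hwp":  return HWP;
            case "doc":
            case "docx": return DOC;
            case "ppt":
            case "pptx": return PPT;
            case "xls":
            case "xlsx": return EXCEL;
            case "txt":  return TEXT;
            case "pdf":  return PDF;
            default:     return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("report.HWP"));
        System.out.println(fromFileName("notes"));
    }
}
```

Extension checks are cheap but easy to fool; inspecting file headers would be more robust, at the cost of opening every file.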

Method summary (modifier and type, method, description):

static java.lang.String extractContentsFromFile(java.io.File target)
    Extracts the text content of the given file as a string.
static java.lang.String extractContentsFromFile(java.lang.String filePath)
    Extracts the text content of the file at the given path as a string.
static java.util.List getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword)
    Returns a list of the files in the directory that contain the keyword.
static java.util.List getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword, boolean recursive)
    Returns a list of the files in the directory that contain the keyword.
static boolean isContainsKeywordFromFile(java.io.File file, java.lang.String keyword)
    Returns true if the given file contains the keyword, false otherwise.
static boolean isContainsKeywordFromFile(java.lang.String filePath, java.lang.String keyword)
    Returns true if the file at the given path contains the keyword, false otherwise.
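
To illustrate how the directory search could relate to the per-file check, here is a minimal sketch that walks a directory and applies a keyword matcher to each file. It is not the library's implementation. The class DirectoryKeywordSearch, the method search, and the stand-in matcher plainTextContains are hypothetical, and the reading of the recursive flag as "descend into subdirectories" is an assumption. The matcher is passed as a BiPredicate so the code compiles without JSearch.jar.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, not JSearch's code: lists the files in a directory
// for which the supplied matcher reports that the keyword is present.
final class DirectoryKeywordSearch {
    private DirectoryKeywordSearch() {}

    static List<File> search(File dir, String keyword, boolean recursive,
                             BiPredicate<File, String> containsKeyword) {
        List<File> hits = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return hits; // not a directory, or it could not be read
        }
        Arrays.sort(entries); // stable, path-ordered results
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (recursive) {
                    hits.addAll(search(entry, keyword, true, containsKeyword));
                }
            } else if (containsKeyword.test(entry, keyword)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    /** Stand-in matcher that only understands UTF-8 plain-text files. */
    static boolean plainTextContains(File file, String keyword) {
        try {
            byte[] bytes = Files.readAllBytes(file.toPath());
            return new String(bytes, StandardCharsets.UTF_8).contains(keyword);
        } catch (IOException e) {
            return false; // unreadable files are treated as non-matching
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String keyword = args.length > 1 ? args[1] : "JSearch";
        for (File hit : search(dir, keyword, true,
                DirectoryKeywordSearch::plainTextContains)) {
            System.out.println(hit.getPath());
        }
    }
}
```

With JSearch.jar on the classpath, the method reference JSearch::isContainsKeywordFromFile should fit the same BiPredicate<File, String> parameter, which would extend the search to the document types the library supports. The package of the JSearch class is not given in the source, so the import it needs is left out here. The direct call JSearch.getFileListContainsKeywordFromDirectory(dirPath, keyword, true) is presumably the one-line equivalent.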

Project home page: http://www.baiduhome.net/lib/view/home/1439124196411
