DocFetcher 1.1.1 Released: Desktop Full-Text Search for Linux

Posted by openkk 13 years ago | 8K views | Tag: DocFetcher

DocFetcher is a desktop full-text search tool for Linux that quickly searches specified folders for given keywords. Features:

A portable version: There is a portable version of DocFetcher that runs on Windows, Linux and Mac OS X.
64-bit support: Both 32-bit and 64-bit operating systems are supported.
Unicode support: DocFetcher comes with rock-solid Unicode support for all major formats, including Microsoft Office, OpenOffice.org, PDF, HTML, RTF and plain text files. The only exception is CHM, for which we don't have Unicode support yet.
Archive support: DocFetcher supports the following archive formats: zip, 7z, rar, and the whole tar.* family. The file extensions for zip archives can be customized, allowing you to add more zip-based archive formats as needed. Also, DocFetcher can handle an unlimited nesting of archives (e.g. a zip archive containing a 7z archive containing a rar archive... and so on).
Search in source code files: The file extensions by which DocFetcher recognizes plain text files can be customized, so you can use DocFetcher for searching in any kind of source code and other text-based file formats. (This works quite well in combination with the customizable zip extensions, e.g. for searching in Java source code inside Jar files.)
Outlook PST files: DocFetcher allows searching for Outlook emails, which Microsoft Outlook typically stores in PST files.
Detection of HTML pairs: By default, DocFetcher detects pairs of HTML files (e.g. a file named "foo.html" and a folder named "foo_files"), and treats the pair as a single document. This feature may seem rather useless at first, but it dramatically improves the quality of the search results when you're dealing with HTML files, since all the "clutter" inside the HTML folders disappears from the results.
Regex-based exclusion of files from indexing: You can use regular expressions to exclude certain files from indexing. For example, to exclude Microsoft Excel files, you can use a regular expression like this: .*\.xls
Mime-type detection: You can use regular expressions to turn on "mime-type detection" for certain files, meaning that DocFetcher will try to detect their actual file types not just by looking at the filename, but also by peeking into the file contents. This comes in handy for files that have the wrong file extension.
Powerful query syntax: In addition to basic constructs like OR, AND and NOT, DocFetcher also supports, among other things: wildcards, phrase search, fuzzy search ("find words that are similar to..."), proximity search ("these two words should be at most 10 words away from each other"), and boosting ("increase the score of documents containing...").
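
Two of the features above, regex-based exclusion and HTML pair detection, can be illustrated with a minimal Python sketch. This is not DocFetcher's own code: the function names `is_excluded` and `html_pairs` are hypothetical, and the sketch assumes that an exclusion pattern must match the whole file name and that all names passed in belong to one directory listing.

```python
import re
from pathlib import PurePath

# Assumption: a pattern has to match the entire file name, as the
# ".*\.xls" example from the feature list suggests.
EXCLUDE_PATTERNS = [re.compile(r".*\.xls")]


def is_excluded(filename):
    """Return True if any exclusion pattern matches the whole file name."""
    return any(p.fullmatch(filename) for p in EXCLUDE_PATTERNS)


def html_pairs(names):
    """Map each HTML file to its companion "<stem>_files" entry, if present.

    `names` is a flat listing of one directory; the sketch does not check
    whether the companion entry is really a folder.
    """
    present = set(names)
    pairs = {}
    for name in names:
        path = PurePath(name)
        if path.suffix.lower() in {".html", ".htm"}:
            folder = f"{path.stem}_files"
            if folder in present:
                pairs[name] = folder
    return pairs
```

Under the whole-name assumption, `is_excluded("report.xls")` is true while `is_excluded("report.xlsx")` is not, so excluding the newer Excel format would need a second pattern such as `.*\.xlsx`.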

Supported document formats include:

Microsoft Office (doc, xls, ppt)
Microsoft Office 2007 and newer (docx, xlsx, pptx, docm, xlsm, pptm)
Microsoft Outlook (pst)
OpenOffice.org (odt, ods, odg, odp, ott, ots, otg, otp)
Portable Document Format (pdf)
HTML (html, xhtml, ...)
Plain text (customizable)
Rich Text Format (rtf)
AbiWord (abw, abw.gz, zabw)
Microsoft Compiled HTML Help (chm)
Microsoft Visio (vsd)
Scalable Vector Graphics (svg)
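
To give a sense of what "peeking into the file contents" can mean for some of the formats listed above, here is a minimal Python sketch that checks the leading bytes of a file against a few widely documented file signatures. It is only an illustration under stated assumptions, not DocFetcher's actual detection logic: the `SIGNATURES` table and the `sniff` function are hypothetical, and the labels are deliberately coarse (ZIP-based formats such as docx or odt all share one signature, as do the older OLE2-based Office formats).

```python
# Leading-byte signatures for a few container and document formats.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"PK\x03\x04", "zip"),                        # docx, xlsx, odt, ...
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # doc, xls, ppt, ...
    (b"{\\rtf", "rtf"),
    (b"ITSF", "chm"),
]


def sniff(head):
    """Return a coarse type label for the given leading bytes, or None."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def sniff_file(path):
    """Read the first few bytes of a file and classify them with sniff()."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

A file named "manual.txt" whose first bytes are `%PDF` would be classified as `pdf` here regardless of its extension, which is the situation the mime-type detection feature is meant to handle.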

DocFetcher 1.1.1 has been released. This version fixes crashes that occurred when reading PDF and OpenOffice files.

Article URL: http://www.baiduhome.net/news/view/1e52ad6