Real-Time Full-Text Search with Sphinx

Posted by jopen 13 years ago | Tags: Sphinx, search engine

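Sphinx 0.9.9 and earlier have no native support for real-time indexing. The usual workaround is a main index plus a delta (incremental) index, which gives only "near-real-time" indexing. The latest version, 1.10.1 (in trunk, not yet released), finally supports real-time indexes. Following the documentation in SVN, it is easy to use Sphinx to build a full-text search system that indexes on demand.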

Reference article: http://filiptepper.com/2010/05/27/real-time-indexing-and-searching-with-sphinx-1-10-1-dev.html

First, check out the latest code from the sphinxsearch SVN repository, then build and install it:

svn checkout http://sphinxsearch.googlecode.com/svn/trunk sphinx
cd sphinx/
./configure --prefix=/path/to/sphinx
make
make install

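If the build succeeds, create a sphinx.conf configuration file in the etc directory under the Sphinx installation directory. Be sure to include the settings for Chinese text (encoding and tokenization); otherwise searching Chinese text will not work properly: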

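index rt {
    # Set the index type to real-time index
    type = rt
    # Use UTF-8 encoding
    charset_type = utf-8
    # Character table for UTF-8 (word characters and case folding)
    charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    # Unigram tokenization: each n-gram character is indexed as its own token
    ngram_len = 1
    # Characters to be tokenized as n-grams
    ngram_chars = U+3000..U+2FA1F
    # Where the index files are stored
    path = /path/to/sphinx/data/rt
    # Full-text field
    rt_field = message
    # Attribute
    rt_attr_uint = message_id
}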
 
searchd {
    log = /path/to/sphinx/log/searchd.log
    query_log = /path/to/sphinx/log/query.log
    pid_file = /path/to/sphinx/log/searchd.pid
    workers = threads
    # Sphinx emulates the MySQL interface, so no real MySQL server is needed;
    # mysql41 means the MySQL 4.1 through 5.1 protocol is supported
    listen = 127.0.0.1:9527:mysql41
}
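
To make the ngram_len and ngram_chars settings more concrete, here is a rough Python sketch of the tokenization they imply. It is a simplified model written for illustration, not Sphinx's actual tokenizer: characters in the ngram_chars range become single-character tokens, characters in charset_table are folded and grouped into words, and everything else is treated as a separator. The names WORD_RANGES, fold and tokenize are made up for this example.

```python
# Simplified model of the tokenization implied by the index settings above.
# Illustration only; this is not Sphinx's real tokenizer.

# charset_table as (first, last, offset added to fold the character)
WORD_RANGES = [
    (0x30, 0x39, 0),       # 0..9
    (0x41, 0x5A, 0x20),    # A..Z -> a..z
    (0x5F, 0x5F, 0),       # _
    (0x61, 0x7A, 0),       # a..z
    (0x410, 0x42F, 0x20),  # U+410..U+42F -> U+430..U+44F
    (0x430, 0x44F, 0),     # U+430..U+44F
]
NGRAM_FIRST, NGRAM_LAST = 0x3000, 0x2FA1F  # ngram_chars


def fold(ch):
    """Return the folded form of a word character, or None for a separator."""
    cp = ord(ch)
    for first, last, offset in WORD_RANGES:
        if first <= cp <= last:
            return chr(cp + offset)
    return None


def tokenize(text):
    """Split text into tokens, emitting each n-gram character on its own."""
    tokens, word = [], []

    def flush():
        # Close the word being built, if any.
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if NGRAM_FIRST <= ord(ch) <= NGRAM_LAST:
            flush()
            tokens.append(ch)  # ngram_len = 1: one token per character
            continue
        folded = fold(ch)
        if folded is None:
            flush()            # not in charset_table: acts as a separator
        else:
            word.append(folded)
    flush()
    return tokens


print(tokenize("測試中文OK"))               # ['測', '試', '中', '文', 'ok']
print(tokenize("this message has a body"))  # ['this', 'message', 'has', 'a', 'body']
```

Under this model, a document containing 測試中文OK can be found by a single-character query such as 中 or by the word OK, but not by a character it does not contain.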

Start the Sphinx service:

/path/to/sphinx/bin/searchd --config /path/to/sphinx/etc/sphinx.conf

Insert a few rows to try it out:

ubuntu:chaoqun ~:mysql -h127.0.0.1 -P9527
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 1.10.1-dev (r2351)
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> INSERT INTO rt VALUES (1, 'this message has a body', 1);
Query OK, 1 row affected (0.01 sec)
 
mysql> INSERT INTO rt VALUES (2, '測試中文OK', 2);
Query OK, 1 row affected (0.00 sec)
 
mysql>

Test full-text search:

mysql> SELECT * FROM rt WHERE MATCH('message');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    1 |   1643 |          1 |
+------+--------+------------+
1 row in set (0.00 sec)
 
mysql> SELECT * FROM rt WHERE MATCH('OK');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.01 sec)
 
mysql> SELECT * FROM rt WHERE MATCH('中');
+------+--------+------------+
| id   | weight | message_id |
+------+--------+------------+
|    2 |   1643 |          2 |
+------+--------+------------+
1 row in set (0.00 sec)
 
mysql> SELECT * FROM rt WHERE MATCH('我');
Empty set (0.00 sec)
 
mysql>
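
The same statements can also be issued from application code, since searchd speaks the MySQL protocol. The sketch below is only an illustration under assumptions: `conn` is any Python DB-API 2.0 connection that uses the %s parameter style, and the helper names insert_message and search are hypothetical. Whether a given client library works cleanly against searchd, and whether the explicit commit is needed, should be checked against your Sphinx version.

```python
# Hypothetical helpers for the rt index defined in sphinx.conf.
# `conn` is assumed to be a DB-API 2.0 connection using the %s parameter style.


def insert_message(conn, doc_id, message, message_id):
    """Insert one document: (id, message field, message_id attribute)."""
    cur = conn.cursor()
    try:
        cur.execute(
            "INSERT INTO rt VALUES (%s, %s, %s)",
            (doc_id, message, message_id),
        )
    finally:
        cur.close()
    # Some clients disable autocommit by default, so commit explicitly.
    conn.commit()


def search(conn, query):
    """Run a full-text query and return the matching rows."""
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM rt WHERE MATCH(%s)", (query,))
        return cur.fetchall()
    finally:
        cur.close()
```

With a MySQL client library such as PyMySQL (an assumption; the original uses only the mysql command-line client), a connection might be opened with pymysql.connect(host="127.0.0.1", port=9527) and passed to these helpers, for example search(conn, "中").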

Simple and convenient. That's all there is to it.
