Learning Elasticsearch Indexing

Posted by jopen 9 years ago | 30K reads | ElasticSearch, search engine

Creating an index

Specify the number of shards when creating the index:

http put :9200/indexsetting number_of_shards=1 number_of_replicas=1

{ "acknowledged": true }
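The same request can be prepared from a script. Below is a minimal Python sketch, assuming a node listening on localhost:9200. The helper name build_create_index_request is made up for illustration, the body wraps the two values in a settings object, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_index_request(index, shards=1, replicas=1,
                               host="http://localhost:9200"):
    """Build (but do not send) a PUT request that creates an index.

    The host and the helper name are assumptions made for this sketch.
    """
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("indexsetting")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it to the node.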

Mapping configuration

Before we configure a mapping by hand, Elasticsearch can guess the types of the fields in a document from its JSON. For example:

http post :9200/test/auto field1='20' field:=10

{ "_id": "AVHNbr0WRh7yMB73pVgC", "_index": "test", "_shards": { "failed": 0, "successful": 1, "total": 2 }, "_type": "auto", "_version": 1, "created": true }

http :9200/test/auto/_mapping

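{ "test": { "mappings": { "auto": { "properties": { "field": { "type": "long" }, "field1": { "type": "string" } } } } } }

As the response shows, the type of field is long, while field1, which was sent as the quoted value '20', is mapped as string. We can also set the numeric_detection parameter to true in a type's mapping to enable more aggressive detection of numbers inside text.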

// Create the mapping for the notauto type
http put :9200/test/_mapping/notauto notauto:='{"numeric_detection":true}'

{ "acknowledged": true }

// Add a document
http post :9200/test/notauto f1='10' f2='20'

{ "_id": "AVHNeiW1Rh7yMB73pVgG", "_index": "test", "_shards": { "failed": 0, "successful": 1, "total": 2 }, "_type": "notauto", "_version": 1, "created": true }

// Check the field types
http :9200/test/notauto/_mapping

{ "test": { "mappings": { "notauto": { "numeric_detection": true, "properties": { "f1": { "type": "long" }, "f2": { "type": "long" } } } } } }
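The guessing rules seen in these responses can be summarized in a few lines of Python. This is only a simplified illustration, not Elasticsearch's own code: the function name guess_field_type is hypothetical, date detection and nested objects are left out, and Python's number parsing is used as a rough stand-in for the server's.

```python
def guess_field_type(value, numeric_detection=False):
    """Return the field type a JSON value would roughly be mapped to.

    Simplified illustration of dynamic mapping; date detection and
    nested objects are not modeled.
    """
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if numeric_detection:
            # With numeric_detection, numeric-looking strings become numbers.
            try:
                int(value)
                return "long"
            except ValueError:
                pass
            try:
                float(value)
                return "double"
            except ValueError:
                pass
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


print(guess_field_type(10))
print(guess_field_type("20"))
print(guess_field_type("20", numeric_detection=True))
```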

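One limitation is that boolean values cannot be inferred from text, so such fields have to be defined explicitly in the mapping.

Dates are another type that can be detected. We can set "dynamic_date_formats" : ["yyyy-MM-dd hh:mm"] in the mapping; the parameter takes an array of formats.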
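To show where this parameter sits, here is a small Python sketch that builds an index-creation body with dynamic_date_formats placed at the type level of the mapping. The type name article is hypothetical; the format list is the one quoted in the text.

```python
import json

# "article" is a hypothetical type name used only for this sketch.
mapping = {
    "mappings": {
        "article": {
            # An array, so several candidate formats can be listed.
            "dynamic_date_formats": ["yyyy-MM-dd hh:mm"],
        }
    }
}

print(json.dumps(mapping, indent=2))
```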

Disabling field type guessing

To stop fields from being added automatically, set the dynamic property to false.

http put :9200/test/_mapping/my my:='{"dynamic":false,"properties":{"ff1":{"type":"string"},"ff2":{"type":"string"}}}'

{ "acknowledged": true }

http :9200/test/my/_mapping

{ "test": { "mappings": { "my": { "dynamic": "false", "properties": { "ff1": { "type": "string" }, "ff2": { "type": "string" } } } } } }
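The effect of "dynamic": false on an incoming document can be pictured with a toy model: fields that are not in the mapping are neither indexed nor added to the mapping, although they still remain in the stored _source. The function name split_by_mapping and the extra field ff3 are hypothetical.

```python
def split_by_mapping(document, mapped_fields):
    """Separate fields that would be indexed from those that are ignored.

    Toy model of "dynamic": false, not Elasticsearch's implementation.
    """
    indexed = {k: v for k, v in document.items() if k in mapped_fields}
    ignored = {k: v for k, v in document.items() if k not in mapped_fields}
    return indexed, ignored


doc = {"ff1": "a", "ff2": "b", "ff3": "c"}  # ff3 is not in the mapping
indexed, ignored = split_by_mapping(doc, {"ff1", "ff2"})
print("indexed:", indexed)
print("ignored:", ignored)
```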

Index structure mapping

For example:

cat posts.json

{ "mappings":{ "post": { "properties": { "id" : { "type":"long", "store":"yes", "precision_step":"0" }, "name" : { "type":"string", "store":"yes", "index":"analyzed" }, "published" : { "type":"date", "store":"yes", "precision_step":"0" }, "contents" : { "type":"string", "store":"no", "index":"analyzed" } } } } }

http put :9200/posts < posts.json

{ "acknowledged": true }

http :9200/posts/_mapping

{ "posts": { "mappings": { "post": { "properties": { "contents": { "type": "string" }, "id": { "precision_step": 1, "store": true, "type": "long" }, "name": { "store": true, "type": "string" }, "published": { "format": "strict_date_optional_time||epoch_millis", "precision_step": 1, "store": true, "type": "date" } } } } } }

Core types

string

number

date

boolean

binary

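Attributes common to every type

index_name: defines the name under which the field is stored in the index. If it is not defined, the field's own name is used.

index: can be set to analyzed or no; string fields can additionally be set to not_analyzed. With analyzed, the field is indexed and can be searched. With no, the field cannot be searched. The default is analyzed. For a string field, not_analyzed means the value is indexed as-is, without analysis, and a search only matches the complete, exact value.

store: yes or no. Specifies whether the original value of the field is written to the index.

boost: defaults to 1. Defines how important the field is within the document; the higher the value, the more important the field.

null_value: specifies the value to write to the index when the field is not part of the indexed document. By default the field is simply omitted.

copy_to: names a field to which all values of this field are copied.

include_in_all: specifies whether the field should be included in the _all field. By default every field is included in _all.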
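The three values of the index attribute are easier to compare side by side. The Python sketch below is a toy stand-in: lowercasing plus whitespace splitting replaces a real analyzer, so the output only approximates what Elasticsearch would put in the index. The function name terms_for_index_setting is hypothetical.

```python
def terms_for_index_setting(value, index="analyzed"):
    """Return the terms a string value would contribute to the index.

    Toy model: a real analyzer does more than lowercase and split.
    """
    if index == "analyzed":
        return value.lower().split()
    if index == "not_analyzed":
        return [value]  # the whole value is kept as one exact term
    if index == "no":
        return []       # nothing is indexed, so nothing can match
    raise ValueError(f"unknown index setting: {index}")


for setting in ("analyzed", "not_analyzed", "no"):
    print(setting, terms_for_index_setting("Quick Brown Fox", setting))
```

With not_analyzed, only a query for the exact string "Quick Brown Fox" would find the document, which is the full-match behaviour described above.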

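String type

String fields can also use the following attributes:

term_vector: can be set to no, yes, with_offsets, with_positions or with_positions_offsets. Defines whether Lucene term vectors are computed for the field. Term vectors are needed if you use highlighting that relies on them.

omit_norms: can be set to true or false. For analyzed string fields the default is false; for string fields that are indexed but not analyzed the default is true. When it is true, Lucene's norms calculation is disabled for the field, so index-time boosting is not available for it.

analyzer: defines the name of the analyzer used for both indexing and searching.

index_analyzer: defines the name of the analyzer used when indexing.

search_analyzer: defines the name of the analyzer used at query time.

norms.enabled: whether norms are used for the field. Defaults to true for analyzed fields and false for non-analyzed fields.

norms.loading: can be set to eager or lazy. eager means the norms for the field are always loaded; lazy means they are loaded only when needed.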

Numeric types

byte

short

integer

long

float

double

IP address type

A field can be given the ip type to store IP address data.
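As far as I know, in the Elasticsearch versions this article targets an ip field holds an IPv4 address in numeric form, which is what makes sorting and range queries on addresses possible. The conversion itself can be sketched with Python's standard ipaddress module; the helper names are illustrative only.

```python
import ipaddress


def ipv4_to_long(address):
    """Convert a dotted IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(address))


def long_to_ipv4(number):
    """Convert an integer back to dotted IPv4 notation."""
    return str(ipaddress.IPv4Address(number))


print(ipv4_to_long("192.168.0.1"))
print(long_to_ipv4(ipv4_to_long("192.168.0.1")))
```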

Bulk operations

cat bulk.json

{"index":{"_index":"test", "_type":"bulk"}}
{ "name":"rcx", "age":14}
{"index":{"_index":"test", "_type":"bulk"}}
{ "name":"rcx1", "age":28}

http post :9200/test/bulk/_bulk < bulk.json

{ "errors": false, "items": [ { "create": { "_id": "AVHOPSjBRh7yMB73pVgS", "_index": "test", "_shards": { "failed": 0, "successful": 1, "total": 2 }, "_type": "bulk", "_version": 1, "status": 201 } }, { "create": { "_id": "AVHOPSjBRh7yMB73pVgT", "_index": "test", "_shards": { "failed": 0, "successful": 1, "total": 2 }, "_type": "bulk", "_version": 1, "status": 201 } } ], "took": 23 }
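The bulk body is newline-delimited: each action line is followed by its document on a single line, and the body ends with a newline. A minimal Python sketch that produces such a body is shown below; the function name build_bulk_body is hypothetical, and only the index action from the example is covered.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Serialize documents into the newline-delimited bulk format."""
    lines = []
    for doc in documents:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, kept on one line.
        lines.append(json.dumps(doc))
    # The body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("test", "bulk", [
    {"name": "rcx", "age": 14},
    {"name": "rcx1", "age": 28},
])
print(body, end="")
```

Writing body to bulk.json gives a file that can be sent with the HTTPie command used above.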

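Internal index information

Every document has its own identifier and type. A document has two internal identifiers.

_uid: the unique identifier of a document within the index, composed of the document's identifier and its type. This field needs no configuration and is always indexed.

_id: the actual identifier. It is normally supplied when the document is created; if it is omitted, one is generated automatically.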
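To my knowledge, in the Elasticsearch versions that still have document types the _uid value has the form type#id. A short Python sketch of composing and splitting such a value follows; the helper names are hypothetical.

```python
def make_uid(doc_type, doc_id):
    """Compose a _uid in the type#id form."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid):
    """Split a _uid back into (type, id) at the first '#'."""
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id


uid = make_uid("auto", "AVHNbr0WRh7yMB73pVgC")
print(uid)
print(split_uid(uid))
```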

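_type field

By default the document's type is also indexed, but it is neither analyzed nor stored.

_all field

Elasticsearch uses the _all field to hold the data from the other fields so that it is easy to search. It is useful when you want a simple search across all the data without having to think about field names. _all is enabled by default. It can also be disabled entirely, or individual fields can be excluded from it. To disable it, change the mapping as follows: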

{
    "book" : {
        "_all" : {
            "enabled" : "false"
        },
        "properties" : {
            ...
        }
    }
}
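Conceptually, _all behaves like one large text field built from the values of the other fields. The Python sketch below is a deliberate simplification of that idea: the function name build_all_text is hypothetical, values are simply joined with spaces, and excluded_fields stands for fields mapped with include_in_all set to false.

```python
def build_all_text(document, excluded_fields=()):
    """Concatenate field values the way _all conceptually does.

    Simplified model; the real field is analyzed, not just joined.
    """
    parts = [str(v) for k, v in document.items() if k not in excluded_fields]
    return " ".join(parts)


book = {"title": "Elasticsearch", "pages": 100}
print(build_all_text(book))
print(build_all_text(book, excluded_fields=("pages",)))
```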

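_source field

This field stores the original JSON document. It is enabled by default. If it is not needed, it can be disabled in the same way as _all.

_index field

Stores information about the index the document belongs to.

_size field

Disabled by default. When enabled, this field automatically indexes the original size of the _source field and stores it together with the document.

_timestamp field

_ttl field

Time to live. It lets you define the lifetime of a document; when the lifetime ends, the document is deleted automatically. This attribute is disabled by default.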
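The expiry rule can be written down as a one-line check. This Python sketch assumes millisecond timestamps and uses hypothetical names; it models only the rule itself, not how or when Elasticsearch actually removes expired documents.

```python
def is_expired(indexed_at_ms, ttl_ms, now_ms):
    """True once the document has outlived its time to live."""
    return now_ms >= indexed_at_ms + ttl_ms


one_day_ms = 24 * 60 * 60 * 1000
print(is_expired(indexed_at_ms=0, ttl_ms=one_day_ms, now_ms=one_day_ms - 1))
print(is_expired(indexed_at_ms=0, ttl_ms=one_day_ms, now_ms=one_day_ms))
```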

References

Elasticsearch服務器開發

http://renchx.com/Elasticsearch2/

Article URL: http://www.baiduhome.net/lib/view/open1450943180401.html