Three Ways to Process JSON-Formatted Log Files with Logstash

Posted by dongfeng19 8 years ago | 43K views | Log processing

Source: http://blog.csdn.net//jiao_fuyou/article/details/49174269


Suppose every line in the log file is a JSON record, for example:

{"Method":"JSAPI.JSTicket","Message":"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw","CreateTime":"2015/10/13 9:39:59","AppGUID":"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d","_PartitionKey":"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d","_RowKey":"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c","_UnixTS":1444700398710}

With the default configuration, once Logstash has processed the record and inserted it into Elasticsearch, the stored document looks like this:

{
    "_index": "logstash-2015.10.16",
    "_type": "voip_feedback",
    "_id": "sheE9eXiQASMDVtRJ0EYcg",
    "_version": 1,
    "found": true,
    "_source": {
        "message": "{\"Method\":\"JSAPI.JSTicket\",\"Message\":\"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw\",\"CreateTime\":\"2015/10/13 9:39:59\",\"AppGUID\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_PartitionKey\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_RowKey\":\"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c\",\"_UnixTS\":1444700398710}",
        "@version": "1",
        "@timestamp": "2015-10-16T00:39:51.252Z",
        "type": "voip_feedback",
        "host": "ipphone",
        "path": "/usr1/data/voip_feedback.txt"
    }
}

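In other words, the whole JSON record is stored as a single string under "message". What I want is for Logstash to parse the JSON record automatically and store each of its fields in Elasticsearch. There are three ways to configure this.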

Method 1: set format => json directly

    file {
        type => "voip_feedback"
        path => ["/usr1/data/voip_feedback.txt"]
        format => json
        sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
    }

With this approach, the query result is:

{
    "_index": "logstash-2015.10.16",
    "_type": "voip_feedback",
    "_id": "NrNX8HrxSzCvLl4ilKeyCQ",
    "_version": 1,
    "found": true,
    "_source": {
        "Method": "JSAPI.JSTicket",
        "Message": "JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw",
        "CreateTime": "2015/10/13 9:39:59",
        "AppGUID": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_PartitionKey": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_RowKey": "1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c",
        "_UnixTS": 1444700398710,
        "@version": "1",
        "@timestamp": "2015-10-16T00:16:11.455Z",
        "type": "voip_feedback",
        "host": "ipphone",
        "path": "/usr1/data/voip_feedback.txt"
    }
}

As you can see, the JSON record has been parsed directly into individual fields under _source, but the original record content is not kept.

Method 2: use codec => json

    file {
        type => "voip_feedback"
        path => ["/usr1/data/voip_feedback.txt"]
        sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
        codec => json { charset => "UTF-8" }
    }

The query result is the same as with the first method: the fields are parsed, and the original record content is not kept either.
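To try the codec approach end to end before sending anything to Elasticsearch, a minimal pipeline along the following lines may help. It is only a sketch: the stdout output with the rubydebug codec is my own addition for inspecting parsed events on the console and was not part of the original setup, while the paths are the ones used above. As far as I know, format is a legacy input option that later Logstash releases dropped in favor of codecs, so this form is probably the more portable of the first two methods.

```
input {
    file {
        type => "voip_feedback"
        path => ["/usr1/data/voip_feedback.txt"]
        sincedb_path => "/home/jfy/soft/logstash-1.4.2/voip_feedback.access"
        # Decode each line as JSON while it is being read
        codec => json { charset => "UTF-8" }
    }
}

output {
    # Assumption: console output for inspection only; swap in the
    # elasticsearch output once the parsed fields look right
    stdout { codec => rubydebug }
}
```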

Method 3: use the json filter

filter {
    if [type] == "voip_feedback" {
        json {
            source => "message"
            #target => "doc"
            #remove_field => ["message"]
        }
    }
}

With this approach, the query result looks like this:

{
    "_index": "logstash-2015.10.16",
    "_type": "voip_feedback",
    "_id": "CUtesLCETAqhX73NKXZfug",
    "_version": 1,
    "found": true,
    "_source": {
        "message": "{\"Method222\":\"JSAPI.JSTicket\",\"Message\":\"JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw\",\"CreateTime\":\"2015/10/13 9:39:59\",\"AppGUID\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_PartitionKey\":\"cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d\",\"_RowKey\":\"1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c\",\"_UnixTS\":1444700398710}",
        "@version": "1",
        "@timestamp": "2015-10-16T00:28:20.018Z",
        "type": "voip_feedback",
        "host": "ipphone",
        "path": "/usr1/data/voip_feedback.txt",
        "Method222": "JSAPI.JSTicket",
        "Message": "JSTicket:kgt8ON7yVITDhtdwci0qeZg4L-Dj1O5WF42Nog47n_0aGF4WPJDIF2UA9MeS8GzLe6MPjyp2WlzvsL0nlvkohw",
        "CreateTime": "2015/10/13 9:39:59",
        "AppGUID": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_PartitionKey": "cb54ba2d-1d38-45f2-9ed1-abff0bf7dd3d",
        "_RowKey": "1444700398710_ad4d33ce-a9d9-4d11-932e-e2ccebdb726c",
        "_UnixTS": 1444700398710,
        "tags": [ "111", "222" ]
    }
}

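As you can see, the original record is kept, and the fields are parsed and stored as well. If you are sure you do not need the original record content, you can add the setting remove_field => ["message"].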
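For reference, the filter with the original record dropped might look like the sketch below. One detail worth keeping in mind: as I understand it, remove_field is one of the common filter options that Logstash applies only when the filter succeeds, so a line that fails to parse as JSON should keep its message field (and receive the _jsonparsefailure tag) instead of losing the raw text.

```
filter {
    if [type] == "voip_feedback" {
        json {
            source => "message"
            # Drop the raw JSON string once its fields have been extracted
            remove_field => ["message"]
        }
    }
}
```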

Comparing the three methods, the most convenient and direct one is to set format => json in the file input.

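Also note that when Logstash inserts data into ES, it adds three fields under _source by default: type, host and path. If the JSON content itself contains a type, host or path field, the parsed values overwrite these three Logstash defaults. The type field matters most, because it is also used as the type part of index/type: once it is overwritten, the index/type of the document inserted into ES comes from the JSON record and is no longer the type value set in the Logstash config.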
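The overwrite is easy to reproduce in isolation. The sketch below is my own test harness, not part of the original setup: it reads lines from stdin, assigns them a type, and parses them with the json filter without a target. Pasting a line such as {"type":"EAP_TYPE_PEAP","id":1} should then print an event whose type comes from the JSON record instead of the configured "voip_feedback".

```
input {
    # Assumption: stdin is used only so that test lines can be pasted by hand
    stdin { type => "voip_feedback" }
}

filter {
    # No target: parsed fields are merged into the event root,
    # where they can replace type, host and path
    json { source => "message" }
}

output {
    stdout { codec => rubydebug }
}
```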

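To avoid this, set the json filter's target option (filter.json.target). With it set, the parsed JSON content is no longer placed directly under _source but under the configured field, "doc" in this case: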

{
    "_index": "logstash-2015.10.20",
    "_type": "3alogic_log",
    "_id": "xfj3ngd5S3iH2YABjyU6EA",
    "_version": 1,
    "found": true,
    "_source": {
        "@version": "1",
        "@timestamp": "2015-10-20T11:36:24.503Z",
        "type": "3alogic_log",
        "host": "server114",
        "path": "/usr1/app/log/mysql_3alogic_log.log",
        "doc": {
            "id": 633796,
            "identity": "13413602120",
            "type": "EAP_TYPE_PEAP",
            "apmac": "88-25-93-4E-1F-96",
            "usermac": "00-65-E0-31-62-5D",
            "time": "20151020-193624",
            "apmaccompany": "TP-LINK TECHNOLOGIES CO.,LTD",
            "usermaccompany": ""
        }
    }
}

This way the type, host and path values under _source are not overwritten.
In addition, the field names shown in Kibana become doc.type, doc.id…
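The result above corresponds to enabling the target option that was commented out in the third method. A sketch of such a filter follows; the exact config behind that result is not shown, so the type condition is taken from the output and the remove_field line is a guess based on the missing message field.

```
filter {
    if [type] == "3alogic_log" {
        json {
            source => "message"
            # Nest the parsed fields under "doc" instead of the event root,
            # so the type, host and path set by Logstash stay untouched
            target => "doc"
            # Assumption: the raw string was removed, since the result
            # above contains no message field
            remove_field => ["message"]
        }
    }
}
```

Inside a Logstash conditional or field reference, the nested fields are then addressed as [doc][type], [doc][id] and so on.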
