Logstash: Importing CSV documents with the dissect filter

CSV is a very common format for storing data. Several earlier articles in this series showed different ways of importing CSV files into Elasticsearch.

Today we look at yet another approach: using Logstash's dissect filter to import CSV-formatted data.

 

Prerequisites

Elasticsearch and Kibana

First, install Elasticsearch and Kibana. If you have not done so yet, see my earlier article "Elastic:菜鸟上手指南" (Elastic: A Beginner's Getting-Started Guide).

Logstash

To install Logstash, see my earlier article "如何安装Elastic栈中的Logstash" (How to Install Logstash in the Elastic Stack).

CSV file

To keep the example simple, I created the following small CSV file:

test.csv

"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"

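We save this file in the root of the Logstash installation directory. Note that the values on each line are separated by commas, and that the file has no header row. In other words, it does not look like this: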

"Device_ID","Device_Location","Device_Owner","Device_Type"
"device1","London","Engineering","Computer"
"device2","Toronto","Consulting","Mouse"
"device3","Winnipeg","Sales","Computer"
"device4","Barcelona","Engineering","Phone"
"device5","Toronto","Consulting","Computer"
"device6","London","Consulting","Computer"

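Because the file carries no header row, whoever parses it has to supply the field names. As a rough illustration outside Logstash, the Python sketch below pairs each row with a fixed list of names. The `FIELDS` list, the `read_devices` function, and the `has_header` flag are assumptions made for this example; the names simply mirror the header shown above.

```python
import csv
import io

# Hypothetical field names, mirroring the header example above.
FIELDS = ["Device_ID", "Device_Location", "Device_Owner", "Device_Type"]


def read_devices(text, has_header=False):
    """Parse CSV text and pair every row with the names in FIELDS."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        # A header row would otherwise be treated as one more device.
        rows = rows[1:]
    return [dict(zip(FIELDS, row)) for row in rows]


sample = (
    '"device1","London","Engineering","Computer"\n'
    '"device2","Toronto","Consulting","Mouse"\n'
)
header = '"Device_ID","Device_Location","Device_Owner","Device_Type"\n'

devices = read_devices(sample)
with_header = read_devices(header + sample, has_header=True)
print(devices[0]["Device_Location"])
```

With `has_header=False`, a file that did start with a header line would yield an extra record whose `Device_ID` is the literal text `Device_ID`, which is why the distinction matters.
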
Configuring Logstash

To have Logstash process the CSV file above, we create the following configuration file:

logstash_dissect_csv.conf

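input {
  # Read events from standard input, one line per event
  stdin {}
}

filter {
  # Strip every double quote from the raw line
  mutate {
    gsub => [
      "message", "\"", ""
    ]
  }

  # Split the line on commas and name the four fields
  dissect {
    mapping => {
      "message" => "%{Device_ID},%{Device_Location},%{Device_Owner},%{Device_Type}"
    }
  }

  # Drop the raw line once the fields have been extracted
  mutate {
    remove_field => ["message"]
  }
}

output {
  # Print each event to the console for inspection
  stdout {
    codec => "rubydebug"
  }

  # Index each event into the "devices" index
  elasticsearch {
    index => "devices"
  }
}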

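As shown above, the pipeline reads its input from stdin and applies the following filters:

  • mutate - gsub: removes the double quotes from the incoming message
  • dissect: extracts the individual fields, which are separated by commas
  • mutate - remove_field: removes the message field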
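
To make the effect of the three filters concrete, here is a minimal Python sketch that mimics them on a single event. It is only an approximation for illustration, not how Logstash is implemented; the function name `apply_filters` and the `FIELDS` list are made up for this example.

```python
FIELDS = ["Device_ID", "Device_Location", "Device_Owner", "Device_Type"]


def apply_filters(event):
    """Mimic the gsub, dissect, and remove_field steps on one event (a dict)."""
    event = dict(event)  # work on a copy

    # mutate { gsub }: strip every double quote from message
    event["message"] = event["message"].replace('"', "")

    # dissect: cut at the literal commas; the last field takes the remainder
    parts = event["message"].split(",", len(FIELDS) - 1)
    if len(parts) == len(FIELDS):
        event.update(zip(FIELDS, parts))
    else:
        # Logstash tags events that do not match the dissect mapping
        event["tags"] = event.get("tags", []) + ["_dissectfailure"]

    # mutate { remove_field }: drop the raw line
    event.pop("message", None)
    return event


result = apply_filters({"message": '"device1","London","Engineering","Computer"'})
print(result)
# {'Device_ID': 'device1', 'Device_Location': 'London',
#  'Device_Owner': 'Engineering', 'Device_Type': 'Computer'}

# A quoted value that itself contains a comma is cut in the wrong place:
broken = apply_filters({"message": '"device7","London, UK","Sales","Phone"'})
```

The second call shows the main assumption behind this approach: no field may contain a comma. Once the quotes are removed, `London, UK` is indistinguishable from two separate fields, so `Device_Owner` ends up as ` UK` and `Device_Type` as `Sales,Phone`.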

Running Logstash

We can run Logstash as follows:

cat test.csv | sudo ./bin/logstash -f ./logstash_dissect_csv.conf   

The Logstash console prints each parsed event in rubydebug format, which shows that Logstash is working correctly.

In Kibana, we can check whether an index named devices now exists:

GET devices/_search

The command above returns:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "devices",
        "_type" : "_doc",
        "_id" : "qE3nanMB6PiWomqxb5U3",
        "_score" : 1.0,
        "_source" : {
          "Device_Owner" : "Engineering",
          "@timestamp" : "2020-07-20T06:26:58.645Z",
          "Device_Location" : "London",
          "host" : "liuxg",
          "Device_Type" : "Computer",
          "@version" : "1",
          "Device_ID" : "device1"
        }
      },
      {
        "_index" : "devices",
        "_type" : "_doc",
        "_id" : "rU3nanMB6PiWomqxb5X4",
        "_score" : 1.0,
        "_source" : {
          "Device_Owner" : "Sales",
          "@timestamp" : "2020-07-20T06:26:58.656Z",
          "Device_Location" : "Winnipeg",
          "host" : "liuxg",
          "Device_Type" : "Computer",
          "@version" : "1",
          "Device_ID" : "device3"
        }
      },
      {
        "_index" : "devices",
        "_type" : "_doc",
        "_id" : "rE3nanMB6PiWomqxb5U5",
        "_score" : 1.0,
        "_source" : {
          "Device_Owner" : "Consulting",
          "@timestamp" : "2020-07-20T06:26:58.656Z",
          "Device_Location" : "Toronto",
          "host" : "liuxg",
          "Device_Type" : "Mouse",
          "@version" : "1",
          "Device_ID" : "device2"
        }
      },
      {
        "_index" : "devices",
        "_type" : "_doc",
        "_id" : "qk3nanMB6PiWomqxb5U4",
        "_score" : 1.0,
        "_source" : {
          "Device_Owner" : "Engineering",
          "@timestamp" : "2020-07-20T06:26:58.657Z",
          "Device_Location" : "Barcelona",
          "host" : "liuxg",
          "Device_Type" : "Phone",
          "@version" : "1",
          "Device_ID" : "device4"
        }
      },
      {
        "_index" : "devices",
        "_type" : "_doc",
        "_id" : "qU3nanMB6PiWomqxb5U4",
        "_score" : 1.0,
        "_source" : {
          "Device_Owner" : "Consulting",
          "@timestamp" : "2020-07-20T06:26:58.657Z",
          "Device_Location" : "London",
          "host" : "liuxg",
          "Device_Type" : "Computer",
          "@version" : "1",
          "Device_ID" : "device6"
        }
      },
      {
        "_index" : "devices",
        "_type" : "_doc",
        "_id" : "q03nanMB6PiWomqxb5U4",
        "_score" : 1.0,
        "_source" : {
          "Device_Owner" : "Consulting",
          "@timestamp" : "2020-07-20T06:26:58.657Z",
          "Device_Location" : "Toronto",
          "host" : "liuxg",
          "Device_Type" : "Computer",
          "@version" : "1",
          "Device_ID" : "device5"
        }
      }
    ]
  }
}
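
If the response needs to be processed further, the documents sit under `hits.hits[]._source`. The Python sketch below pulls them out and counts devices per type. To stay self-contained it embeds a trimmed response with only two of the hits above and a reduced set of fields; the helper name `sources` is an assumption for this example.

```python
import json
from collections import Counter

# Trimmed copy of the response above: two hits, reduced to the parsed fields.
response_text = """
{
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "hits": [
      {"_index": "devices",
       "_source": {"Device_ID": "device1", "Device_Location": "London",
                   "Device_Owner": "Engineering", "Device_Type": "Computer"}},
      {"_index": "devices",
       "_source": {"Device_ID": "device2", "Device_Location": "Toronto",
                   "Device_Owner": "Consulting", "Device_Type": "Mouse"}}
    ]
  }
}
"""


def sources(response):
    """Return the _source document of every hit in a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


docs = sources(json.loads(response_text))
by_type = Counter(doc["Device_Type"] for doc in docs)
print(by_type)
```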

 

Summary

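In this article we imported CSV data with an approach quite different from the earlier ones. This shows how flexible the Elastic Stack is: different methods can achieve the same result. As the saying goes, all roads lead to Beijing.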
