Transforming and sending Nginx log data to Elasticsearch using Filebeat and Logstash - Part 2

Daniel Romić on 23 Feb 2018

In our previous blog post we covered the basics of the Beats family, as well as Logstash and Grok filters and patterns, and we started with the configuration files, covering only the Filebeat configuration in full. Here we continue with the Logstash configuration, which will be the main focus of this post.

So, let's continue with the next step.

Configure Logstash

Our configuration will be simple enough to fit in one configuration file. Let us call it nginx.conf and give it a basic structure:

input {
# here we'll define input from Filebeat, namely the host and port we're receiving beats from
# remember that beats will actually be Nginx access log lines
}
filter {
# here we'll define rules for processing of received nginx access log lines
}
output {
# here we'll define destination to which we desire to push processed and transformed log entries
}

The input is straightforward: we wish to receive log entries from Filebeat, which by default sends them to port 5044. We will listen on all interfaces (0.0.0.0).

input {  
  beats {
    # The port to listen on for filebeat connections.
    port => 5044
    # The IP address to listen for filebeat connections.
    host => "0.0.0.0"
  }
}

The next step is the processing plant of this operation: the filter. We place our previously created grok pattern into the Logstash configuration file under the filter plugin, like so:

filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:access_time}\] \"%{WORD:http_method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
  }
}

The match option is a hash map of field => pattern entries. If we had multiple patterns to match against a single field, we would supply an array of patterns:

filter {
  grok {
    match => { "message" => ["Grok_pattern_1", "Grok_pattern_2", "Grok_pattern_n"] }
  }
}

So far, our Logstash configuration looks like this:

input {  
  beats {
    # The port to listen on for filebeat connections.
    port => 5044
    # The IP address to listen for filebeat connections.
    host => "0.0.0.0"
  }
}
filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:access_time}\] \"%{WORD:http_method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
  }
}
output {
}

With this we've given a boring old log a trendy new look, just what we wanted to do. Our log file has been cleaned of the dirt, given a new haircut and a clean shave. What's left to do is to give it a new ID card and passport, so it can live as a free citizen in the ELK universe.

This is accomplished in the final part of the Logstash pipeline: the output plugin.

Let me see what I’m doing

First we want to check whether Logstash spits out our data. We tell it to show us the output on standard output (our terminal):

output {
  stdout { codec => rubydebug }
}

Our full configuration, although naïve, is ready.

input {  
  beats {
    # The port to listen on for filebeat connections.
    port => 5044
    # The IP address to listen for filebeat connections.
    host => "0.0.0.0"
  }
}
filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:access_time}\] \"%{WORD:http_method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
  }
}
output {
  stdout { codec => rubydebug }
}

We'll save this nginx.conf file in the following location: /etc/logstash/conf.d/.

Can we run it now, please?

If Logstash was installed through the Debian package, the binary is located at /usr/share/logstash/bin/logstash. To see the standard output in our terminal, we will start Logstash directly rather than as a service. Before that, to check that we haven't messed up the configuration with a lonely curly brace or a forgotten quotation mark, we use the following command:

sudo /usr/share/logstash/bin/logstash -t -f /etc/logstash/conf.d/nginx.conf

If the following message is shown, we've done everything correctly and may rejoice.

WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
Configuration OK
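
As a side note, the logstash.yml warning above is harmless for this test. If we want to silence it, we can point Logstash at its settings directory using the --path.settings flag the warning mentions; for the Debian package that directory should be /etc/logstash:

sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t -f /etc/logstash/conf.d/nginx.conf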

Now, with our good configuration, we start Logstash:

sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/nginx.conf

Within a moment, it should start printing out messages:

{
      "response_code" => "304",
              "agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
             "offset" => 22087,
          "user_name" => "-",
         "input_type" => "log",
       "http_version" => "1.1",
             "source" => "/var/log/nginx/access.log",
            "message" => "192.168.5.84 - - [22/Nov/2017:19:33:40 +0000] \"GET /plugins/kibana/assets/settings.svg HTTP/1.1\" 304 0 \"http://192.168.5.177/app/kibana\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36\"",
               "type" => "log",
                "url" => "/plugins/kibana/assets/settings.svg",
               "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
           "referrer" => "http://192.168.5.177/app/kibana",
         "@timestamp" => 2017-11-22T19:33:42.223Z,
          "remote_ip" => "192.168.5.84",
        "http_method" => "GET",
           "@version" => "1",
               "beat" => {
		    "name" => "vagrant-ubuntu-trusty-64",
		"hostname" => "vagrant-ubuntu-trusty-64",
		 "version" => "5.6.4"
    },
               "host" => "vagrant-ubuntu-trusty-64",
    "body_sent_bytes" => "0",
        "access_time" => "22/Nov/2017:19:33:40 +0000"
}
{
      "response_code" => "304",
              "agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
             "offset" => 22350,
          "user_name" => "-",
         "input_type" => "log",
       "http_version" => "1.1",
             "source" => "/var/log/nginx/access.log",
            "message" => "192.168.5.84 - - [22/Nov/2017:19:33:41 +0000] \"GET /plugins/kibana/assets/play-circle.svg HTTP/1.1\" 304 0 \"http://192.168.5.177/app/kibana\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36\"",
               "type" => "log",
                "url" => "/plugins/kibana/assets/play-circle.svg",
               "tags" => [
        [0] "beats_input_codec_plain_applied"
    ],
           "referrer" => "http://192.168.5.177/app/kibana",
         "@timestamp" => 2017-11-22T19:33:42.223Z,
          "remote_ip" => "192.168.5.84",
        "http_method" => "GET",
           "@version" => "1",
               "beat" => {
		    "name" => "vagrant-ubuntu-trusty-64",
		"hostname" => "vagrant-ubuntu-trusty-64",
		 "version" => "5.6.4"
    },
               "host" => "vagrant-ubuntu-trusty-64",
    "body_sent_bytes" => "0",
        "access_time" => "22/Nov/2017:19:33:41 +0000"
}

Now this looks totally different from a single log line, doesn't it? This concludes our check of the Logstash output.

January Bonus Section

On the other hand, if we get an error message like this while testing the Logstash configuration:

WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[FATAL] 2017-11-22 19:50:43.107 [LogStash::Runner] runner - The given configuration is invalid. Reason: Expected one of #, input, filter, output at line 22, column 1 (byte 1842) after

It’s bad news.

Now, Logstash error messages sometimes aren't very helpful. If we go to our configuration file, where I've purposely set a trap, and check just the output section:

 14 output {
 15   if "_grokparsefailure" in [tags] {
 16     # write events that didn't match to a file
 17     file { "path" => "/home/vagrant/grok_failures.txt" }
 18   } else {
 19   stdout { codec => rubydebug }
 20   }
 21  }
 22 }

We see that nothing is expected after line 22, because it's the last line of the file, just as it should be. Can you find the real error, though? It should be easy.

Logstash has a nice feature for when it fails to parse a log line for some reason: it tags the event with a _grokparsefailure value, which can be useful for debugging.

We'll tell it to write such occurrences to an external file that we can check at will:

output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { "path" => "/home/vagrant/grok_failures.txt" }
  }
}

Now that we've got that case covered, we can tell Logstash to redirect the output of parsed lines to the console. There are numerous output plugins, but for now we're interested in the stdout plugin.

Stdout supports numerous codecs as well, which are essentially different formats for our console output. We will choose the rubydebug codec, as it prints out Logstash fields nicely:

output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { "path" => "/home/vagrant/grok_failures.txt" }
  } else {
  stdout { codec => rubydebug }
  }
}

Here’s what our full configuration looks like so far:

input {  
  beats {
    # The port to listen on for filebeat connections.
    port => 5044
    # The IP address to listen for filebeat connections.
    host => "0.0.0.0"
  }
}
filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:access_time}\] \"%{WORD:http_method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\"" }
  }
}
output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { "path" => "/home/vagrant/grok_failures.txt" }
  } else {
  stdout { codec => rubydebug }
  }
}

Enriching data in a simple way

Say you have a billion-dollar product that is being used by people all over the globe. If possible, you might want to set up some form of monitoring of this product or parts of it. There really are abundant and creative ways to work with data.

One of the simplest and most convenient examples of enriching log data is to assign geographic coordinates to an IP address. Without further ado, let's get a GeoIP database and set it up so Logstash can assign geolocation fields to an IP address.

Download the GeoIP database.

wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz
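
A rough sketch of getting the database into place might look like this (the archive name comes from the download above; the unpacked directory name is illustrative and can differ):

sudo mv GeoLite2-City.tar.gz /var/opt/
cd /var/opt
sudo tar -xzf GeoLite2-City.tar.gz
# rename the unpacked directory (for example GeoLite2-City_20180101) so that the
# .mmdb path matches the Logstash configuration below
sudo mv GeoLite2-City_20180101 GeoLite2-City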

We'll move it to /var/opt/ and unpack it there, as sketched above. The next step is to call on this database from within Logstash. We'll do this inside the filter part of the configuration. First, we declare a geoip filter plugin section. We'll work with the client IP address, which is stored under the remote_ip field:

filter {
   grok {
      match => { "message" => ["%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:access_time}\] \"%{WORD:http_method} %{DATA:url} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\""] }
   }
   geoip {
         source => "remote_ip"
         target => "geoip"
         database => "/var/opt/GeoLite2-City/GeoLite2-City.mmdb" # path to your DB location goes here
         fields => ["country_name", "country_code3", "region_name", "location"]
   }
}

We then say that we'll place this new geoip data under the geoip field (target => "geoip"), specify the path to our database and, finally, choose which fields we want to log and display. With these simple four lines of configuration, we've enriched our logs with geographic data that can be used for later processing.

If we now run Logstash, we’ll see this information. Check the new geoip field and its key-value data.

{
      "response_code" => "304",
              "agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
              "geoip" => {
           "country_name" => "Croatia",
          "country_code3" => "HR",
               "location" => {
                    "lon" => 15.5,
                    "lat" => 45.1667
                }
        },
             "offset" => 3724,
          "user_name" => "-",
         "input_type" => "log",
       "http_version" => "1.1",
             "source" => "/home/vagrant/access.log",
            "message" => "192.168.5.84 - - [31/Jan/2018:20:36:52 +0000] \"GET /bundles/0cebf3d61338c454670b1c5bdf5d6d8d.svg HTTP/1.1\" 304 0 \"http://192.168.5.177/bundles/commons.style.css?v=15571\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36\"",
            ...

New ID card and Passport

We can distinguish various log types and place them into different indexes. This can be done in several ways, but here we'll show a simplified example using tags:

filebeat.prospectors:

# Each *input_type* is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/nginx/access.log
  tags: ["nginx-access-log", "web"]

- input_type: log
  paths:
    - /home/vagrant/rs_out.log
  tags: ["out-log", "game-store"]

This will result in different logs having different tags applied to them. We can later use these tags, for example, as conditionals in the Logstash output that decide which Elasticsearch index an event is written to.
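
Just to illustrate the idea (we won't actually set this up in this post, and the host and index names below are made up), a Logstash output section using these tags as conditionals might look something like this:

output {
  if "nginx-access-log" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "nginx-access-%{+YYYY.MM.dd}"
    }
  } else if "out-log" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "game-store-%{+YYYY.MM.dd}"
    }
  }
}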

Going back to our stdout output, a message with the above Filebeat configuration applied could look something like this:

{
      "response_code" => "304",
              "agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
              "geoip" => {
           "country_name" => "Croatia",
          "country_code3" => "HR",
               "location" => {
                    "lon" => 15.5,
                    "lat" => 45.1667
                }
        },
             "offset" => 3724,
          "user_name" => "-",
         "input_type" => "log",
       "http_version" => "1.1",
             "source" => "/var/log/nginx/access.log",
            "message" => "192.168.5.84 - - [22/Nov/2017:20:20:35 +0000] \"GET /bundles/status_page.style.css?v=15571 HTTP/1.1\" 304 0 \"http://192.168.5.177/app/kibana\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36\"",
               "type" => "log",
                "url" => "/bundles/status_page.style.css?v=15571",
               "tags" => [
        [0] "nginx-access-log",
        [1] "web",
        [2] "beats_input_codec_plain_applied"
    ],
           "referrer" => "http://192.168.5.177/app/kibana",
         "@timestamp" => 2017-11-22T20:20:37.148Z,
          "remote_ip" => "192.168.5.84",
        "http_method" => "GET",
           "@version" => "1",
               "beat" => {
            "name" => "vagrant-ubuntu-trusty-64",
        "hostname" => "vagrant-ubuntu-trusty-64",
         "version" => "5.6.4"
    },
               "host" => "vagrant-ubuntu-trusty-64",
    "body_sent_bytes" => "0",
        "access_time" => "22/Nov/2017:20:20:35 +0000"
}

We see that there are three tags applied to this message, one of which is a default tag (it's the last one). We won't go into the creation of indexes in this post, but we hope we've shed some light on this topic and on how things can be done "under the hood".

What's beneficial is that there are many options for configuring the whole system, which gives engineers the freedom to try and apply various approaches and techniques in this "log managing" process. Not everyone has the same desires and needs for their log management, and some practical problems can be very complex; this is where that freedom of configuration and approach comes in very handy.




