Performance analysis / comparison #23

Open
bobrik opened this issue Oct 23, 2015 · 8 comments

Comments

@bobrik

bobrik commented Oct 23, 2015

Try 1, syslog input:

input {
  syslog {
    host => "{{ .Env "SYSLOG_HOST" }}"
    port => {{ .Env "SYSLOG_PORT" }}
  }
}

With output enabled, events like the following appear:

{
           "message" => "tick\n",
          "@version" => "1",
        "@timestamp" => "2015-10-23T09:25:49.000Z",
              "host" => "172.16.91.1",
          "priority" => 14,
     "timestamp8601" => "2015-10-23T10:25:49+01:00",
         "logsource" => "whatever.local",
           "program" => "haha",
               "pid" => "3259",
          "severity" => 6,
          "facility" => 1,
         "timestamp" => "2015-10-23T10:25:49+01:00",
    "facility_label" => "user-level",
    "severity_label" => "Informational"
}

Performance is quite low:

2015/10/23 10:26:29 rate: 0/s
2015/10/23 10:26:30 rate: 8349/s
2015/10/23 10:26:31 rate: 888/s
2015/10/23 10:26:32 rate: 1167/s
2015/10/23 10:26:33 rate: 1308/s
2015/10/23 10:26:34 rate: 1612/s
2015/10/23 10:26:35 rate: 1261/s
2015/10/23 10:26:36 rate: 1682/s
2015/10/23 10:26:37 rate: 1261/s
2015/10/23 10:26:38 rate: 2102/s

Now try 2, tcp input + grok filter:

input {
  tcp {
    host => "{{ .Env "SYSLOG_HOST" }}"
    port => {{ .Env "SYSLOG_PORT" }}
  }
}

filter {
  grok {
    match => {
      "message" => "<%{POSINT:priority}>%{SYSLOGLINE}"
    }
    overwrite => [ "message" ]
  }
}
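
For readers unfamiliar with grok, the filter above amounts to one regular-expression match per line. The sketch below approximates it in Go. The pattern is a simplified assumption based on the fields that appear in this thread, not the actual definition of `SYSLOGLINE`, and the sample line is reconstructed from those field values rather than captured from the wire.

```go
package main

import (
	"fmt"
	"regexp"
)

// syslogLine approximates the layout implied by the fields in this thread:
//   <PRI>TIMESTAMP HOST PROGRAM[PID]: MESSAGE
// It is an illustrative pattern, not grok's SYSLOGLINE definition.
var syslogLine = regexp.MustCompile(`^<(\d+)>(\S+) (\S+) ([^\s\[:]+)(?:\[(\d+)\])?: (.*)$`)

// parseSyslogLine returns the captured fields keyed by the field names used
// in this thread, or false if the line does not match.
func parseSyslogLine(line string) (map[string]string, bool) {
	m := syslogLine.FindStringSubmatch(line)
	if m == nil {
		return nil, false
	}
	return map[string]string{
		"priority":      m[1],
		"timestamp8601": m[2],
		"logsource":     m[3],
		"program":       m[4],
		"pid":           m[5],
		"message":       m[6],
	}, true
}

func main() {
	// Sample line reconstructed from the field values in this issue (an assumption).
	line := "<14>2015-10-23T10:23:48+01:00 whatever.local haha[3233]: tick"
	fields, ok := parseSyslogLine(line)
	fmt.Println(ok, fields["priority"], fields["program"], fields["pid"], fields["message"])
}
```

As with the grok filter, every captured value is a string, and nothing here derives facility or severity from the priority.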

With output enabled, events like the following appear:

{
          "message" => "tick",
         "@version" => "1",
       "@timestamp" => "2015-10-23T09:23:51.273Z",
             "host" => "172.16.91.1",
         "priority" => "14",
    "timestamp8601" => "2015-10-23T10:23:48+01:00",
        "logsource" => "whatever.local",
          "program" => "haha",
              "pid" => "3233"
}

Performance is much better, roughly 12x higher:

2015/10/23 10:24:35 rate: 0/s
2015/10/23 10:24:36 rate: 11290/s
2015/10/23 10:24:37 rate: 10650/s
2015/10/23 10:24:38 rate: 15554/s
2015/10/23 10:24:39 rate: 19338/s
2015/10/23 10:24:40 rate: 18497/s
2015/10/23 10:24:41 rate: 20519/s
2015/10/23 10:24:42 rate: 20259/s
2015/10/23 10:24:43 rate: 19286/s
2015/10/23 10:24:44 rate: 22332/s
2015/10/23 10:24:45 rate: 19758/s
2015/10/23 10:24:46 rate: 16816/s

Baseline tcp input without any processing:

2015/10/23 10:09:29 rate: 51871/s
2015/10/23 10:09:30 rate: 57593/s
2015/10/23 10:09:31 rate: 56753/s
2015/10/23 10:09:32 rate: 54172/s
2015/10/23 10:09:33 rate: 64797/s
2015/10/23 10:09:34 rate: 49186/s
2015/10/23 10:09:35 rate: 63685/s
2015/10/23 10:09:36 rate: 62851/s
2015/10/23 10:09:37 rate: 55071/s
2015/10/23 10:09:38 rate: 55911/s
2015/10/23 10:09:39 rate: 62638/s
2015/10/23 10:09:40 rate: 68943/s
2015/10/23 10:09:41 rate: 74409/s
2015/10/23 10:09:42 rate: 73544/s
2015/10/23 10:09:43 rate: 60559/s

This is clearly suboptimal and there is a lot of room for improvement.

Logstash 2.0.0-beta3, Oracle Java 8u45.

@purbon

purbon commented Oct 23, 2015

Thanks a lot for your feedback, it is much appreciated.

Speaking of your benchmark, I have a few questions to understand the full picture:

  • What does the rate mean? Can you elaborate on how you measure this number?
  • What machine are you using? Can you describe it?
  • How do the messages reach LS? Do they arrive randomly, or at a constant rate? Can you describe this?

I'm sure I will have more questions, but I'm looking forward to learning more about this benchmark. This is a very good task.

Thanks a lot,

@bobrik
Author

bobrik commented Oct 23, 2015

  • Rate is the number of messages written to the socket per second.
  • This is a VM, but this doesn't matter since all benchmarks run on the same box.
  • Messages are sent back to back, as fast as the socket accepts them, by this program:

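package main

import (
    "log"
    "log/syslog"
    "sync/atomic"
    "time"
)

func main() {
    // Connect to the Logstash input over TCP; messages are tagged "haha"
    // and use the user-level facility.
    c, err := syslog.Dial("tcp", "172.16.91.128:12345", syslog.LOG_USER, "haha")
    if err != nil {
        log.Fatal(err)
    }

    // Number of messages written to the socket so far.
    count := int64(0)

    // Once per second, log how many messages were written since the last report.
    go func() {
        b := atomic.LoadInt64(&count)
        for {
            // Load atomically to pair with atomic.AddInt64 in the sender loop.
            n := atomic.LoadInt64(&count)
            log.Printf("rate: %d/s\n", n-b)
            b = n
            time.Sleep(time.Second)
        }
    }()

    // Send messages in a tight loop and count only successful writes.
    for {
        if err := c.Info("tick"); err != nil {
            log.Fatal(err)
        }
        atomic.AddInt64(&count, 1)
    }
}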

@purbon

purbon commented Oct 23, 2015

Thanks a lot for the clarification. I always find it necessary to provide this kind of context when publishing numbers like these. Can you describe the VM you're using? OS? Memory? CPUs? This matters for others trying to reproduce your numbers.

Thanks

purbon changed the title from "Performance could be better" to "Performance analysis / comparison" on Oct 23, 2015

@bobrik
Author

bobrik commented Oct 23, 2015

The VM has 4 CPUs (Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz) and 2 GB of RAM.

Logstash heap size is default.

@purbon

purbon commented Oct 23, 2015

Cool, thanks a lot. Which OS are you using?

@bobrik
Author

bobrik commented Oct 23, 2015

It's a Linux VM (boot2docker).

@purbon

purbon commented Oct 23, 2015

Awesome, I think we now have all the necessary information. Thanks!

@synhershko

Has there been any movement on this one? cc @untergeek
