
Cloudwatch plugin reception of string log causes Fluent Bit to crash #210

Open
matthewfala opened this issue Oct 28, 2021 · 3 comments
Labels: bug (Something isn't working)

@matthewfala (Contributor)

Fluent Bit currently crashes if a simple string log is received by the cloudwatch Go plugin. This shouldn't affect the normal FireLens use case, because FireLens usually receives input from the Docker fluentd log driver, which always emits an object rather than a raw string.

Here is my Fluent Bit configuration:

[SERVICE]
     Grace 30
     Log_Level trace

# Provide entry point for logs
[INPUT]
     Name http
     host 0.0.0.0
     port 8888
[OUTPUT]
     Name cloudwatch
     Match *
     log_stream_prefix x/
     log_group_name x/
     region us-west-2

Here is the input I send via HTTP request body to Fluent Bit:

POST http://localhost:8888/app.log

[
    "this is a small regular log."
]

Here is Fluent Bit's output:

Fluent Bit v1.9.0
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/10/28 22:58:47] [ info] Configuration:
[2021/10/28 22:58:47] [ info]  flush time     | 5.000000 seconds
[2021/10/28 22:58:47] [ info]  grace          | 30 seconds
[2021/10/28 22:58:47] [ info]  daemon         | 0
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  inputs:
[2021/10/28 22:58:47] [ info]      http
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  filters:
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  outputs:
[2021/10/28 22:58:47] [ info]      cloudwatch.0
[2021/10/28 22:58:47] [ info] ___________
[2021/10/28 22:58:47] [ info]  collectors:
[2021/10/28 22:58:47] [ info] [engine] started (pid=24180)
[2021/10/28 22:58:47] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/10/28 22:58:47] [debug] [storage] [cio stream] new stream registered: http.0
[2021/10/28 22:58:47] [ info] [storage] version=1.1.4, initializing...
[2021/10/28 22:58:47] [ info] [storage] in-memory
[2021/10/28 22:58:47] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/10/28 22:58:47] [ info] [cmetrics] version=0.2.2
[2021/10/28 22:58:47] [ info] [input:http:http.0] listening on 0.0.0.0:8888
[2021/10/28 22:58:47] [debug] [cloudwatch:cloudwatch.0] created event channels: read=25 write=26
INFO[0000] [cloudwatch 0] plugin parameter log_group_name = '/x' 
INFO[0000] [cloudwatch 0] plugin parameter default_log_group_name = 'fluentbit-default' 
INFO[0000] [cloudwatch 0] plugin parameter log_stream_prefix = 'x/' 
INFO[0000] [cloudwatch 0] plugin parameter log_stream_name = '' 
INFO[0000] [cloudwatch 0] plugin parameter default_log_stream_name = '/fluentbit-default' 
INFO[0000] [cloudwatch 0] plugin parameter region = 'us-west-2' 
INFO[0000] [cloudwatch 0] plugin parameter log_key = '' 
INFO[0000] [cloudwatch 0] plugin parameter role_arn = '' 
INFO[0000] [cloudwatch 0] plugin parameter auto_create_group = 'false' 
INFO[0000] [cloudwatch 0] plugin parameter new_log_group_tags = '' 
INFO[0000] [cloudwatch 0] plugin parameter log_retention_days = '0' 
INFO[0000] [cloudwatch 0] plugin parameter endpoint = '' 
INFO[0000] [cloudwatch 0] plugin parameter sts_endpoint = '' 
INFO[0000] [cloudwatch 0] plugin parameter credentials_endpoint =  
INFO[0000] [cloudwatch 0] plugin parameter log_format = '' 
[2021/10/28 22:58:47] [trace] [router] input=http.0 tag=http.0
[2021/10/28 22:58:47] [debug] [router] match rule http.0:cloudwatch.0
[2021/10/28 22:58:47] [ info] [sp] stream processor started
[2021/10/28 22:58:52] [trace] [input:http:http.0 at build/plugins/in_http/CMakeFiles/flb-plugin-in_http.dir/compiler_depend.ts:49] new TCP connection arrived FD=30
[2021/10/28 22:58:52] [trace] [input:http:http.0 at build/plugins/in_http/CMakeFiles/flb-plugin-in_http.dir/compiler_depend.ts:79] read()=299 pre_len=0 now_len=299
[2021/10/28 22:58:56] [trace] [task 0x7fffb000a5a0] created (id=0)
[2021/10/28 22:58:56] [debug] [task] created task=0x7fffb000a5a0 id=0 OK
[2021/10/28 22:58:56] [trace] [GO] entering go_flush()
panic: interface conversion: interface {} is []uint8, not map[interface {}]interface {}

goroutine 17 [running, locked to thread]:
github.com/x/github.com/fluent/[email protected]/output/decoder.go:87 +0x2ea
main.FLBPluginFlushCtx(0x7fffb0007560, 0x7fffc43b9010, 0xc000000028, 0x7fffb000a710, 0x7ffff4339ca6)
        /home/x/amazon-cloudwatch-logs-for-fluent-bit/fluent-bit-cloudwatch.go:174 +0x1f2
main._cgoexpwrap_19a10b653c9e_FLBPluginFlushCtx(0x7fffb0007560, 0x7fffc43b9010, 0x28, 0x7fffb000a710, 0x7fffb0012860)
        _cgo_gotypes.go:90 +0x49
@matthewfala matthewfala added the bug Something isn't working label Oct 28, 2021
@matthewfala (Contributor, Author)

The easiest way to reproduce may be to run fluent bit with the above config and send the following string payload with curl:

curl -X POST http://localhost:8888/app.log \
   -H 'Content-Type: application/json' \
   -d '["this is a small regular log."]'

@matthewfala (Contributor, Author)

matthewfala commented Oct 28, 2021

It looks like the crash is caused by a call to GetRecord() on the output plugin's input, which appears to contain encoded content from Fluent Bit's core:
https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/blob/mainline/fluent-bit-cloudwatch.go#L174

The scope of this problem is most likely all Fluent Bit Go plugins.

Upon decoding, the record's value arrives as a byte-slice string ([]uint8) rather than a map. In fluent/fluent-bit-go/output/decoder.go#L87, GetRecord() then unconditionally type-asserts the decoded value to map[interface{}]interface{}:

map_data := data.Interface().(map[interface{}]interface{})

reflect's Interface method simply returns the value as an interface{}:

func (reflect.Value).Interface() (i interface{})

Interface returns v's current value as an interface{}. It is equivalent to:
var i interface{} = (v's underlying value)

The panic therefore comes from the failed type assertion on that interface{}: a []uint8 cannot be asserted to map[interface{}]interface{}, and this incompatible type conversion is what we are seeing crash Fluent Bit.
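The failure mode can be reproduced outside the plugin. Here is a minimal standalone Go sketch (not code from the plugin itself) showing that the single-value type assertion used in decoder.go panics when the interface{} holds a []uint8, while the two-value ("comma ok") form fails gracefully:

```go
package main

import "fmt"

func main() {
	// Simulate a decoded record value that holds a raw string log
	// (a []uint8) instead of the expected map.
	var data interface{} = []uint8("this is a small regular log.")

	// Two-value ("comma ok") assertion: reports failure, no panic.
	m, ok := data.(map[interface{}]interface{})
	fmt.Println(ok, m) // prints: false map[]

	// Single-value assertion, as in decoder.go: panics with the same
	// "interface conversion" message seen in the stack trace above.
	defer func() { fmt.Println("recovered:", recover()) }()
	_ = data.(map[interface{}]interface{})
}
```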

func GetRecord(dec *FLBDecoder) (ret int, ts interface{}, rec map[interface{}]interface{}) {
	var check error
	var m interface{}

	check = dec.mpdec.Decode(&m)
	if check != nil {
		return -1, 0, nil
	}

	slice := reflect.ValueOf(m)
	if slice.Kind() != reflect.Slice || slice.Len() != 2 {
		return -2, 0, nil
	}

	t := slice.Index(0).Interface()
	data := slice.Index(1)

	map_data := data.Interface().(map[interface{}]interface{})

	return 0, t, map_data
}

GetRecord(...) from the fluent/fluent-bit-go package's output/decoder.go

There are several potential fixes to this problem:

  1. Update Fluent Bit's HTTP input plugin (and other similar plugins) so that strings are converted to objects on reception.
  2. Update GetRecord() in the fluent/fluent-bit-go package's output/decoder.go to accept strings as well as objects, and return both.
    • This may be the solution closest to Fluent Bit's native plugins, as I think those plugins can receive msgpack string logs as well as object logs as input.
    • A change to the decoded map_data datatype of all Go plugins would most likely be required, unless some union type is used as the decoder's return value to preserve backwards compatibility for plugins expecting objects.
  3. Update GetRecord() to wrap strings in some kind of object, such as {"value": <my_string>}.
    • This may deviate from the C plugins' behavior, making this solution undesirable.

@booleanbetrayal

We were seeing node failures in our Fargate clusters while on cloudwatch_logs and were told to use the cloudwatch plugin instead. Node stability has returned with the older plugin, and I'm wondering if this is the upstream tracking issue for the newer variant.
