Make filename to upload customizable #24

Open
ThanksForAllTheFish opened this issue May 3, 2018 · 5 comments

Comments

@ThanksForAllTheFish

Disclaimer: I think this could be a missing feature, although it may be possible to achieve the same result in other ways (with a different plugin, or by combining several). Also, this is more or less the first feature request I have opened, so I am not sure what level of detail you need.

For my use case, having the filename permanently bound to the socket host name does not work (or at least, that is my understanding from https:/logstash-plugins/logstash-output-google_cloud_storage/blob/master/lib/logstash/outputs/google_cloud_storage.rb#L310; I don't really know Ruby). We have an old system that copies log files to Google Cloud Storage for later consumption, and I would like to push logs from Logstash to GCS while keeping our naming strategy.

For us, it would help to be able to customize the filename based on appender attributes. For example, we should be able to retrieve an attribute GCS_file set to my-old-filename and use my-old-filename to generate the filename instead of Socket.gethostname().

Possibly, a transformer could also be configured to alter the configured filename, although this would probably make life unnecessarily hard (it is simpler to just set the chosen attribute to the proper value).

Prefix and suffix should still be added.
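
A minimal sketch of the behaviour requested above, assuming the event can be treated as a plain Hash. The helper name `upload_filename`, its `prefix` and `suffix` defaults, and the fallback rule are all hypothetical illustrations, not the plugin's actual code or settings; only the `GCS_file` attribute and `Socket.gethostname` come from the request itself.

```ruby
require 'socket'

# Hypothetical helper (not part of the plugin): take the base name from an
# event attribute and fall back to the host name when the attribute is absent.
# The prefix and suffix are still added around whichever base name is chosen.
def upload_filename(event, field: 'GCS_file', prefix: 'logstash_', suffix: '.log')
  base = event[field]
  base = Socket.gethostname if base.nil? || base.to_s.empty?
  "#{prefix}#{base}#{suffix}"
end

# Event carrying the attribute: the attribute value is used.
puts upload_filename({ 'GCS_file' => 'my-old-filename' })
# Event without the attribute: the host name is used, as today.
puts upload_filename({})
```

Falling back to the host name would keep existing setups unchanged when the attribute is not set.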

I hope it all makes sense 😄

@josephlewis42
Contributor

Hi @ThanksForAllTheFish!

I really like the idea of allowing message attributes to define filenames, although I'm not quite sure how to set it up with our current system. Right now the plugin opens a file and keeps appending events to it until the "base" path changes (the hostname or the timestamp) or the file gets too large. If we added a file name in there and Logstash started reading from two files at once, you could end up with a bucket full of one-line log files. I am working on #23 right now, which might pave the way towards doing this.
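
The concern above can be illustrated with a toy model. This is a sketch under stated assumptions, not the plugin's code: it keeps a single open file, starts a new one whenever the base name changes, and ignores the size and timestamp limits. The function name `files_opened` is hypothetical.

```ruby
# Toy model: one file is open at a time, and a new file is started whenever
# the base name of the incoming event differs from the current one.
# Returns how many files would be opened for the given stream of base names.
def files_opened(base_names)
  opened = 0
  current = nil
  base_names.each do |base|
    next if base == current # same base name: keep appending to the open file
    current = base          # base name changed: rotate to a new file
    opened += 1
  end
  opened
end

grouped     = %w[a a a b b b] # events from one source arrive together
interleaved = %w[a b a b a b] # events from two sources alternate

puts files_opened(grouped)     # 2
puts files_opened(interleaved) # 6
```

With interleaved input every event forces a rotation, which is how a bucket could fill up with one-line files. Keeping one open file per base name would avoid that, at the cost of tracking several files at once.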

I think adding a fixed prefix and suffix is very doable if that would help!

Can you tell us a little more about your use case? Is this a one-time move or will it be continuous (and does it have to go through Logstash)? What are you planning to do with the data once it's in the bucket, and what variety of file names are you processing?

I'm thinking:

  • If it doesn't have to go through Logstash, you could set up a service account with limited permissions and use gsutil to do the copying.
  • If the logs have a limited number of prefixes, like apache-... and nginx-..., and we add the ability to set a fixed prefix in the plugin, you could use Logstash conditionals to direct the events to one of several Cloud Storage outputs.
  • If you need to do processing in Logstash, you might be able to use a two-step process: write with the file output plugin, then upload the processed files with gsutil.

I hope we can help!
- Joseph

@ThanksForAllTheFish
Author

Our use case, in some detail:

We are migrating apps to Kubernetes. Those apps generate business logs (like the number of clicks on shown ads). So far, these log files are uploaded daily to GCS to be processed by DI the next day.

Moving to K8s, however, we are going to lose the ability to log to file, so our idea is to send business logs to Logstash with specific attributes, so that this plugin can cover the upload part. We also need to keep the current DI processes as stable as possible, as we are in a transformation period and DI is really thin (hence the need to keep our current naming convention).

The file output plugin could be an option too, but with that one I would probably need to find a way to ensure files are uploaded if Logstash crashes (this may be a problem with this plugin as well) or is gracefully terminated (this plugin handles that case).

However, for my specific use case, you actually gave me an idea which would not require any change in this plugin. I still think a more customizable name would be an interesting feature, though, as predictable names can simplify the logic needed to recognize which files need to be processed from a GCS bucket.

Thanks for the feedback!

@pronoiac

pronoiac commented Feb 8, 2019

This might relate to something I'd like to do at work: instead of emitting logs to the root folder, we would write them to a limited number of folders, like frontend/, backend/ or the like. I tried adding those to log_file_prefix, but it crashed writing the local files because it didn't create the folders first.
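
A minimal sketch of one way to avoid the crash described above: create the parent folders before opening the local file. The helper `open_log_file` and its parameters are hypothetical and do not mirror the plugin's internals; `FileUtils.mkdir_p` is the standard library call that creates a directory together with any missing parents.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: builds the local path from a temp directory, a prefix
# that may contain folders (e.g. "frontend/"), and a file name, then makes
# sure the folders exist before opening the file for appending.
def open_log_file(temp_dir, prefix, name)
  path = File.join(temp_dir, "#{prefix}#{name}")
  FileUtils.mkdir_p(File.dirname(path)) # no-op if the folders already exist
  File.open(path, 'a')
end

Dir.mktmpdir do |dir|
  file = open_log_file(dir, 'frontend/', 'host_2019-02-08.log')
  file.puts('example event')
  file.close
  puts File.exist?(File.join(dir, 'frontend', 'host_2019-02-08.log'))
end
```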

@shamil

shamil commented Feb 5, 2020

It would be good if this could be handled the way the S3 output works: you specify a prefix in the output parameters, and the output creates that directory hierarchy locally. The S3 output's prefix also supports string interpolation, which helps with distributing data in S3 any way we want.
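
To make the interpolation idea concrete, here is a small sketch of substituting `%{field}` references in a prefix with values from an event. It is a simplified stand-in written for illustration, with the event modelled as a plain Hash and a hypothetical `interpolate_prefix` helper; it is not the S3 output's implementation and does not cover date patterns or nested fields.

```ruby
# Simplified stand-in for field interpolation: every %{name} in the template
# is replaced with the matching event value, or an empty string if missing.
def interpolate_prefix(template, event)
  template.gsub(/%\{([^}]+)\}/) { event.fetch(Regexp.last_match(1), '').to_s }
end

event = { 'app' => 'frontend', 'env' => 'prod' }
puts interpolate_prefix('logs/%{env}/%{app}/', event)
```

Combined with creating the resulting folders locally, a prefix like this would let events from different apps land in different folders of the bucket.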
