Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional pid as tag #1843

Closed
wants to merge 5 commits into from
Closed

Optional pid as tag #1843

wants to merge 5 commits into from

Conversation

rikaardhosein
Copy link
Contributor

Required for all PRs:

  • CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

I've read #1460 and I understand that storing pid as a tag leads to performance problems. However, scenarios(such as ours) exist where having pid as a tag is desirable.

We have a number of long running processes, all with the same name. We're monitoring these processes with procstat. All of our data goes into influx under a 1 day retention policy and is downsampled and placed under a 1 week retention policy. Therefore at any point in time the number of unique pids we store is going to be relatively low.

Not having pid as a tag makes downsampling very difficult, if not impossible. Grouping by fields isn't allowed and therefore doing things like finding the max(memory_rss) / minute , which would require us to group by pid, isn't possible.
This has also had ripple effects making the data difficult to work with in Grafana as well.

Therefore, I've made it possible to have pid as a tag instead of a field via the configuration.

@phemmer
Copy link
Contributor

phemmer commented Oct 4, 2016

Instead of making pid a tag, what about an "index" tag.
For example, if you have 3 processes named foo, you'd have:

procstat,exe=foo,pidx=1 ...
procstat,exe=foo,pidx=2 ...
procstat,exe=foo,pidx=3 ...

You can make the index deterministic by sorting some how (such as by pid). This should address your use case, and solve the cardinality problem.
There are other benefits to this as well. Such as if foo is an app with a master & workers, then on all your hosts, pidx=1 will always be the master, allowing you to compare it across hosts.

@rikaardhosein
Copy link
Contributor Author

@phemmer I think that's a great idea! I'll modify the PR soon to implement your idea. Thanks!

@dgnorton dgnorton added this to the 1.1.0 milestone Oct 6, 2016
@sparrc
Copy link
Contributor

sparrc commented Oct 11, 2016

I think I would prefer it to just add the PID rather than tracking an index. Seems to me that in order to keep the data unique the index cardinaltiy would need to be just as large as the PID cardinality.

@sparrc
Copy link
Contributor

sparrc commented Oct 11, 2016

this PR will fix #1668

@phemmer
Copy link
Contributor

phemmer commented Oct 11, 2016

@sparrc The cardinality will be much smaller by keeping an index. For example, if you're tracking a single process, the index will always be 1, and therefore you'll only have a single series in influxdb (2 processes, will be 2 series, etc). If you track by PID, you'll have ~65500 series.

@sparrc
Copy link
Contributor

sparrc commented Oct 11, 2016

Where did 65500 series come from? Are you saying that the PID is changing very frequently?

@phemmer
Copy link
Contributor

phemmer commented Oct 11, 2016

Every time the process restarts, you'll get a new pid. Over the life time of your influxdb, this can accumulate to a very big number.

@sparrc
Copy link
Contributor

sparrc commented Oct 11, 2016

right....I see what you mean, but it does seem like a band-aid that won't really matter once influxdata/influxdb#7151 is finished.

@sparrc sparrc modified the milestones: 1.2.0, 1.1.0 Oct 12, 2016
sparrc pushed a commit that referenced this pull request Dec 13, 2016
@sparrc sparrc closed this in #2149 Dec 13, 2016
njwhite pushed a commit to njwhite/telegraf that referenced this pull request Jan 31, 2017
maxunt pushed a commit that referenced this pull request Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants