
[0.12.0] Too much write OPS #6058

Closed
simnv opened this issue Mar 19, 2016 · 11 comments

Comments

@simnv
Contributor

simnv commented Mar 19, 2016

Centos 6.7 64-bit
InfluxDB 0.10.3 installed from yum repo, db created after clean install
VM with 8 vCPU, 24GB of RAM, ext4 SAN Datastore

I have a test setup using the opentsdb input to collect data from several bosun scollectors, feeding a little over 8k metrics per second. I have two continuous queries, one running every ten minutes and the other every hour, both just storing mean values into other retention policies.

My problem is that Influx constantly makes write operations on disk, about 4k ops per second. It doesn't consume much memory, but those writes are annoying: why is an application that I use for monitoring one virtual platform the most resource-consuming application on that platform?

I tried playing with the WAL limits, setting them two and then ten times higher, and tried other parameters, with no effect. The writes keep happening.

For now I just placed the WAL on tmpfs, and it works quite normally: almost no visible disk activity, and data is stored as usual. I'm rsyncing it to disk every minute so it persists across reboots; I don't know if that's good practice, though.

IOPS Graph
Read ops are positive, write ops are negative values.

Is this normal behavior for Influx? How can I tune Influx to make those write operations occur less often?

@simnv simnv changed the title from [0.10.3] Too much write IOPS to [0.10.3] Too much write OPS on Mar 19, 2016
@mark-rushakoff
Contributor

> I have a test setup using the opentsdb input to collect data from several bosun scollectors, feeding a little over 8k metrics per second.

Are you batching your points?

When a write comes in to Influx, the points in the write are stored in both the in-memory TSM cache (for query performance) and the on-disk WAL (for permanent storage, to be eventually snapshotted and compacted into TSM files).

The WAL is grouped by retention policy and database, both of which are fixed per batch; therefore one POST to /write or one OpenTSDB write action should result in approximately one write to the filesystem, regardless of whether your batch was 1 point or 1000 points. Whether that write is flushed to disk immediately or batched up later depends on your filesystem, operating system, and disk controllers, all of which are outside the control of InfluxDB.
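
For illustration, here is a minimal sketch of what batching against the HTTP /write endpoint looks like; the database name, measurement, and point count are made up and not taken from this issue. The point is that a thousand points still amount to roughly one request and one WAL append:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	var buf bytes.Buffer
	now := time.Now().UnixNano()
	// Accumulate 1000 points of line protocol in memory, one point per line.
	// Measurement "cpu", tag "host", and database "mydb" are made-up examples.
	for i := 0; i < 1000; i++ {
		fmt.Fprintf(&buf, "cpu,host=server%02d value=%d %d\n", i%10, i, now)
	}
	// One POST carries the whole batch; database and retention policy are
	// fixed per request, matching how the WAL groups writes.
	resp, err := http.Post("http://localhost:8086/write?db=mydb", "text/plain", &buf)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status) // 204 No Content on success
}
```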

@simnv
Contributor Author

simnv commented Mar 20, 2016

Thank you for the suggestion. Metrics were batched at 500 points per host.

I have set batches to about 10,000 points per host (depending on the host), but I still see the same bursts of writes every five minutes, lasting four minutes out of every five:
Disk IOPS

Metrics at the same time:
Metrics count

I think I can afford to lose several minutes of data in the case of an unlikely disaster. Should I stay on tmpfs, or is there a more elegant way to have InfluxDB rely on memory rather than writing to disk?

@earthnut

@mark-rushakoff one question about the subscriber: will the data be sent to the subscriber after the in-memory TSM cache and on-disk WAL are flushed to storage, or at the same time InfluxDB receives it?

@mark-rushakoff
Contributor

@simnv something seems weird about your disk activity. Is that a network drive? HDD or SSD? I'm not aware of other reports of that kind of heavy disk activity so I'm inclined to believe something is unusual about your setup.

@earthnut it doesn't appear that there's any guarantee about the order of points sent to subscribers vs. flushed to shards: https://github.com/influxdata/influxdb/blob/d024ca2/cluster/points_writer.go#L207-L230

@simnv
Contributor Author

simnv commented Apr 1, 2016

After upgrading from 0.10.3 to 0.11.0, the problem went away:
disk-iops-influxdb

With the WAL folder back on disk, write ops grew only slightly.

@simnv simnv closed this as completed Apr 1, 2016
@simnv simnv reopened this Apr 5, 2016
@simnv
Contributor Author

simnv commented Apr 5, 2016

I don't understand. Influx was working fine at a little over 20 write ops per second for several days, then after a restart it went back to 3k+ write ops:
IOPS graph (0.11)

The config hasn't changed at all, and the data inputs are the same. By the way, it was updated from 0.11.0 to 0.11.1 before the restart.

@simnv
Contributor Author

simnv commented Apr 7, 2016

Now, after upgrading to 0.12, I have not only high WAL write ops, but also high write ops on the filesystem holding the data directory.

@simnv
Contributor Author

simnv commented Apr 7, 2016

I attached strace to the influxdb process. Most IOPS are for WAL files (3k+ write IOPS, understandable) and for the meta/meta.dbtmp file (2k+ write IOPS, 4754 bytes each time). The last part is strange to me.

@simnv simnv changed the title from [0.10.3] Too much write OPS to [0.12.0] Too much write OPS on Apr 7, 2016
@simnv
Contributor Author

simnv commented Apr 7, 2016

Dug a little deeper. Every operation increments cacheData.Index, which leads to a meta.db flush to disk. It is flushed with f.Sync(), so write caching doesn't help:
https://github.com/influxdata/influxdb/blob/master/services/meta/client.go#L979

I made several diffs of this meta.db file; it looks like only the index is incremented when there is no admin activity (no new users, CQs, etc. created). So why should we flush this file to disk with every commit, especially in a standalone setup?
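
For readers following along, here is a simplified sketch of the pattern being described: on every commit, rewrite the snapshot to a temporary file, fsync it, and rename it over meta.db. The names and structure are illustrative only, not InfluxDB's actual meta client code:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// snapshot writes the metadata blob to <dir>/meta.dbtmp, forces it to stable
// storage with Sync, then atomically renames it over <dir>/meta.db. Doing
// this on every commit means every batch of points costs at least one fsync.
func snapshot(dir string, data []byte) error {
	tmp := filepath.Join(dir, "meta.dbtmp")
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // the fsync that defeats OS write caching
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "meta.db"))
}

func main() {
	if err := snapshot(".", []byte("metadata snapshot goes here")); err != nil {
		log.Fatal(err)
	}
}
```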

I moved the WAL and meta.db to tmpfs, syncing them to disk every minute. Average IOPS dropped to about 10. I stopped influxdb, cleared the WAL, and started influxdb again: only the data that was in the WAL was lost (obviously). I copied meta.db, waited about 30 minutes, stopped influxdb, replaced meta.db with the 30-minute-old copy, and started influxdb again: no data lost, and the db works fine. I can definitely live with that.

So, in the end, data is flushed on every batch of points received, and not only to the WAL: each batch also triggers a metadata update and flush.

Can someone explain what the point of that is? Why make constant writes to disk, wearing it out in the process, instead of syncing at sane intervals, say once every second? Why should I resort to terrible crutches like tmpfs and rsync to make those updates less frequent? I hope I just didn't configure influxdb correctly, but judging from the sources, this is how it works.
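
For what it's worth, a rough sketch of the alternative being asked for: coalescing syncs onto a timer so that at most one flush happens per interval, no matter how many commits arrive. This is purely illustrative and not how InfluxDB is structured:

```go
package main

import (
	"sync/atomic"
	"time"
)

type lazySyncer struct {
	dirty int32
	flush func() // whatever actually writes and fsyncs the snapshot
}

// markDirty is called on every commit; it is cheap and touches no disk.
func (s *lazySyncer) markDirty() { atomic.StoreInt32(&s.dirty, 1) }

// run performs at most one flush per interval, and only if something changed.
func (s *lazySyncer) run(interval time.Duration) {
	for range time.Tick(interval) {
		if atomic.CompareAndSwapInt32(&s.dirty, 1, 0) {
			s.flush()
		}
	}
}

func main() {
	s := &lazySyncer{flush: func() { /* write + fsync the snapshot here */ }}
	go s.run(time.Second)
	// Simulate commits arriving far more often than once per second.
	for i := 0; i < 300; i++ {
		s.markDirty()
		time.Sleep(10 * time.Millisecond)
	}
}
```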

@simnv
Contributor Author

simnv commented Apr 14, 2016

Updated to 0.12.1, and the problem is gone. Storing 9k+ metrics per second, I see only 25 write operations per second to the WAL partition and occasional writes to the data partition.

Thanks!

@simnv simnv closed this as completed Apr 14, 2016
@toddboom
Contributor

@simnv awesome, thanks for the update!
