-
Notifications
You must be signed in to change notification settings - Fork 513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet WAL #1878
Merged
Merged
Parquet WAL #1878
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
Signed-off-by: Joe Elliott <[email protected]>
… blocks unless needed
… path that was unused until now
mdisibio
requested review from
joe-elliott,
annanay25,
mapno and
kvrhdn
as code owners
November 9, 2022 14:59
joe-elliott
reviewed
Nov 9, 2022
mapno
approved these changes
Nov 10, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Left only a comment, everything else looks ok. But, I may not be the best to review this code, so take it with a grain of salt 😅
joe-elliott
approved these changes
Nov 10, 2022
joe-elliott
reviewed
Nov 10, 2022
electron0zero
approved these changes
Nov 11, 2022
…er writes (although in-order performs better)
3 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does:
This PR introduces support for a parquet-formatted WAL with the goal to support TraceQL search on WALs (which is currently infeasible on v2). The WAL format is controlled by a new
storage.Trace.WAL.Version
config which is similar to the one used for backend blocksstorage.Trace.Block.Version
. Setting it tovParquet
enables to the new WAL format, default is stillv2
. All combinations of WAL and Block versions are supported, but performance is best when they match.Design
Parquet WALs break from the existing design where the WAL is an open file to which new data is appended as it is received. Now a parquet WAL is a folder with a series of sub-files, like this:
<blockID>+<tenantID>+vParquet
** meta.json
** 00001
** 00002
** 00003
** 00004
Each file is a standalone parquet file, created each time the ingester flushes idle traces from memory to disk (every few seconds). From there the sub-files are identical in format to existing parquet blocks and all the same functionality is reused: search, find trace by id, tag lookup, and soon: traceql. There is also an operational benefit that the standalone files are easy to inspect and troubleshoot with existing parquet tooling.
Refactor
This PR includes a refactor of the
tempodb/wal
module to support different encodings and is similar to what was done for the backend. A generic interface for a WAL block was introduced intempodb/encoding/common
and now bothv2
andvParquet
encodings implement it.Performance Impact
As expected there is an increase in resources for this format, ~2x cpu and ~2x memory. I don't think it can ever be as efficient as
v2
which was basically copying bytes from memory to disk, but there are several areas we can explore to mitigate this - especially the case of parquet wal -> parquet block.TODOs
This PR is stable and usable, and worth merging now since it is already quite large. However there are several TODOs which make sense to do in follow-up PRs:
parquet wal -> parquet block
step. Maybe we can do the same as compaction and copy raw rows between blocks instead ofTrace
objects. (And use buffer pooling to further reduce memory overhead) Also change the multi-block iterator to not simultaneously buffer the nextTrace
from all inputs. (i.e. 100 wal sub-files means 100 traces resident in memory at the same time during completion process)Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]