core/rawdb: fix cornercase shutdown behaviour in freezer #26485

holiman · 2023-01-12T13:54:39Z

This PR does a few things. First of all, it makes the printout more detailed, e.g.

WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.10MiB stored=596.05MiB

by showing the exact number instead of approximate MB.

Secondly, it adds a testcase showing a cornercase that can occur during shutdown, if the freezer is Close()d and only afterwards the chain freezer calls Sync. Currently, this would lead to an exit with CRIT.

The chain_freezer does writes here: https://github.com/ethereum/go-ethereum/blob/master/core/rawdb/chain_freezer.go#L162

followed by Sync here: https://github.com/ethereum/go-ethereum/blob/master/core/rawdb/chain_freezer.go#L170

Between these operations, the underlying f may be closed by another goroutine.

I saw also that the chain freezer does exactly this 'dangerous' sort of shutdown sequence: first it shuts down the underlying database, and only after does it signal to the active goroutine to exit. This PR fixes this. BUT: I suspect that the same sequence may happen regardless, depending on the shutdown sequence. I haven't investigated that in full.
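The ordering problem described above can be sketched in isolation. This is a minimal model, not the go-ethereum code: `backend`, `worker`, `newWorker`, `syncErr` and `errClosed` are hypothetical names standing in for the freezer, the chain freezer and its loop. The point is only the order inside `Close`: signal the loop, wait for it to exit, and only then close the backend.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errClosed is a hypothetical sentinel returned when the backend is used after Close.
var errClosed = errors.New("closed")

// backend stands in for the underlying freezer; it is not the real type.
type backend struct {
	mu     sync.Mutex
	closed bool
}

// Sync reports errClosed once the backend has been closed.
func (b *backend) Sync() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.closed {
		return errClosed
	}
	return nil
}

// Close marks the backend as closed.
func (b *backend) Close() error {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.closed = true
	return nil
}

// worker models the chain freezer: a background loop that keeps syncing the backend.
type worker struct {
	backend *backend
	quit    chan struct{}
	wg      sync.WaitGroup
	syncErr error // first error seen by the loop; read only after wg.Wait()
}

func newWorker() *worker {
	w := &worker{backend: &backend{}, quit: make(chan struct{})}
	w.wg.Add(1)
	go w.loop()
	return w
}

func (w *worker) loop() {
	defer w.wg.Done()
	for {
		select {
		case <-w.quit:
			return
		default:
		}
		if err := w.backend.Sync(); err != nil {
			w.syncErr = err
			return
		}
		time.Sleep(time.Millisecond)
	}
}

// Close signals the loop, waits for it to exit, and only then closes the backend.
// Closing the backend first would let the loop call Sync on a closed backend.
func (w *worker) Close() error {
	select {
	case <-w.quit:
	default:
		close(w.quit)
	}
	w.wg.Wait()
	return w.backend.Close()
}

func main() {
	w := newWorker()
	time.Sleep(5 * time.Millisecond)
	if err := w.Close(); err != nil {
		fmt.Println("close error:", err)
	}
	fmt.Println("error seen by loop:", w.syncErr)
}
```

With this order the loop can never observe a closed backend. Swapping the last two steps of `Close` reintroduces the window between a write and the following Sync.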

However, this PR adds a failing test. I am not sure what the best fix is. Should we remove the test? Should we expect Sync after Close not to yield an error?

rjl493456442 · 2023-01-13T03:23:31Z

core/rawdb/chain_freezer.go

 	select {
 	case <-f.quit:
 	default:
 		close(f.quit)
 	}
 	f.wg.Wait()
-	return err
+	return f.Freezer.Close()


Good catch!

rjl493456442 · 2023-01-13T03:28:44Z

core/rawdb/freezer_table.go

@@ -869,7 +869,9 @@ func (t *freezerTable) advanceHead() error {
 func (t *freezerTable) Sync() error {
 	t.lock.Lock()
 	defer t.lock.Unlock()
-
+	if t.index == nil || t.head == nil {


Please also check if t.meta is nil.
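A rough sketch of what such a guard could look like, assuming Close sets every file handle to nil. The `table` type, `openTable`, `demo` and the file names are hypothetical and much simpler than the real freezerTable; `errClosed` is a stand-in sentinel.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// errClosed is a hypothetical sentinel for "table already closed".
var errClosed = errors.New("closed")

// table is a simplified stand-in for a freezer table with three open files.
type table struct {
	lock  sync.Mutex
	index *os.File
	meta  *os.File
	head  *os.File
}

// openTable creates the three files inside dir (file names are made up).
func openTable(dir string) (*table, error) {
	t := &table{}
	for name, dst := range map[string]**os.File{"index": &t.index, "meta": &t.meta, "head": &t.head} {
		f, err := os.Create(filepath.Join(dir, name))
		if err != nil {
			t.Close()
			return nil, err
		}
		*dst = f
	}
	return t, nil
}

// Sync flushes all files, or returns errClosed if any handle was already released.
func (t *table) Sync() error {
	t.lock.Lock()
	defer t.lock.Unlock()
	if t.index == nil || t.head == nil || t.meta == nil {
		return errClosed
	}
	for _, f := range []*os.File{t.index, t.meta, t.head} {
		if err := f.Sync(); err != nil {
			return err
		}
	}
	return nil
}

// Close closes every open file and nils the handles so later calls can detect it.
func (t *table) Close() error {
	t.lock.Lock()
	defer t.lock.Unlock()
	var first error
	for _, f := range []**os.File{&t.index, &t.meta, &t.head} {
		if *f == nil {
			continue
		}
		if err := (*f).Close(); err != nil && first == nil {
			first = err
		}
		*f = nil
	}
	return first
}

// demo returns the result of Sync before and after Close.
func demo() (before, after error) {
	dir, err := os.MkdirTemp("", "table-sketch")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)
	t, err := openTable(dir)
	if err != nil {
		return err, err
	}
	before = t.Sync()
	if err := t.Close(); err != nil {
		return before, err
	}
	after = t.Sync()
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println("sync before close:", before)
	fmt.Println("sync after close:", after)
}
```

Returning a sentinel error instead of calling fsync on a released handle lets the caller decide whether a Sync after Close is fatal.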

rjl493456442

I think the fix is correct and we should include it, although I'm not sure it's the root cause.

holiman · 2023-01-13T08:27:25Z

core/rawdb/freezer_test.go

+	if err := f.Sync(); err != nil {
+		t.Fatal(err)
+	}


So, I'll just change this, and expect an error, and that the error (or error-string) is "closed", then?

rjl493456442 · 2023-01-16T05:01:14Z

cmd/geth/dbcmd.go

- log.Info("Failed to retrieve ancient root", "err", err)
- return err
- }
+ ancient := stack.ResolveAncient("chaindata", ctx.String(utils.AncientFlag.Name))


Any particular reason for this change? Directly resolve ancient datadir without involving the chain DB?

holiman · 2023-01-16

Yes, for the investigation here: #26483 (comment), I needed to do a geth db inspect, but I didn't actually have a leveldb -- I only had a couple of index files.

This whole utils.MakeChainDatabase assumes things to be pretty well ordered, but all we use it for eventually is to help resolve the ancientdir, via db.AncientDir.

So this change has the same effect as the original code, but is more robust in case the data is not fully consistent / present.
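To illustrate why no open database is needed for this, here is a hedged sketch of resolving an ancient directory from paths alone. `resolveAncient` is a hypothetical helper, not the body of `stack.ResolveAncient`; it assumes the default location is an `ancient` subdirectory of the database directory and that a relative override is resolved against the instance directory. The example paths are made up.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveAncient is a hypothetical helper, not the go-ethereum implementation.
// instanceDir is the node's data directory, name is the database directory
// (for example "chaindata"), and ancient is the optional user-supplied override.
func resolveAncient(instanceDir, name, ancient string) string {
	switch {
	case ancient == "":
		// Assumed default: <instanceDir>/<name>/ancient.
		return filepath.Join(instanceDir, name, "ancient")
	case !filepath.IsAbs(ancient):
		// Assumed: relative overrides live under the instance directory.
		return filepath.Join(instanceDir, ancient)
	default:
		return ancient
	}
}

func main() {
	fmt.Println(resolveAncient("/data/geth", "chaindata", ""))
	fmt.Println(resolveAncient("/data/geth", "chaindata", "/media/seassd/ancient"))
}
```

Nothing here touches the key-value store, so the lookup still works when only a few freezer files are present.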

This PR does a few things. It fixes a shutdown-order flaw in the chain freezer. Previously, the chain freezer would shut down the freezer backend first, and only then signal the loop to exit. This can lead to a scenario where the freezer tries to fsync closed files, which is an error condition that could lead to exit via log.Crit.

It also makes the printout more detailed when truncating 'dangling' items, by showing the exact number instead of approximate MB.

This PR also adds calls to fsync files before closing them, and makes the `db inspect` command slightly more robust.
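As a rough illustration of the fsync-before-close idea, including skipping the sync for files opened read-only: `syncAndClose` and `demo` are hypothetical names, and the read-only flag is passed in explicitly here as an assumption about how a caller would track it. Syncing a read-only handle may fail on some platforms, which is one reason to skip it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// syncAndClose flushes f to stable storage before closing it, unless the
// file was opened read-only. Hypothetical helper, not the freezer's code.
func syncAndClose(f *os.File, readonly bool) error {
	if f == nil {
		return nil
	}
	var syncErr error
	if !readonly {
		syncErr = f.Sync()
	}
	// Always close, even if the sync failed, so the descriptor is released.
	closeErr := f.Close()
	if syncErr != nil {
		return syncErr
	}
	return closeErr
}

// demo writes a file, syncs and closes it, then reopens it read-only
// and closes it without syncing.
func demo() error {
	dir, err := os.MkdirTemp("", "freezer-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "data")
	w, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := w.WriteString("payload"); err != nil {
		w.Close()
		return err
	}
	if err := syncAndClose(w, false); err != nil {
		return err
	}

	r, err := os.Open(path)
	if err != nil {
		return err
	}
	return syncAndClose(r, true)
}

func main() {
	fmt.Println("result:", demo())
}
```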

holiman added 2 commits January 12, 2023 14:50

core/rawdb: debug print exact numbers without prettification

f39e8df

core/rawdb: handle sync after close

ebe49f4

holiman requested review from karalabe and rjl493456442 as code owners January 12, 2023 13:54

holiman requested a review from fjl January 12, 2023 13:54

rjl493456442 reviewed Jan 13, 2023

rjl493456442 approved these changes Jan 13, 2023

holiman commented Jan 13, 2023

holiman added 6 commits January 13, 2023 09:40

core/rawdb: fix testcase, fix closed-check

3806472

core/rawdb: more verbosity after repair truncates items

afd2457

cmd/geth: simplify freezer-inspect

63f7fba

core/rawdb: fsync when closing freezer files

40ede55

core/rawdb: avoid sync on readonly files

33b075c

core/rawdb: fixes re syncing files

743d702

rjl493456442 reviewed Jan 16, 2023

rjl493456442 approved these changes Jan 16, 2023

holiman merged commit 0b53b29 into ethereum:master Jan 16, 2023
