
after restart, bbolt db failed to get all reachable pages #778

Open
timannguyen opened this issue Jun 26, 2024 · 20 comments

@timannguyen

After restarting the process running with the OpenTelemetry file extension, we get

panic: freepages: failed to get all reachable pages (page 138577: multiple references (stack: [110806 65199 138577]))

goroutine 98 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
	go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 69

I'll see if I can get the db pulled from the running machine. Sorry for the missing db blob.

Is there any means of recovery from this issue?

@ahrtr
Member

ahrtr commented Jun 26, 2024

After restarting the process

Did the machine/VM restart?

Do you have detailed steps to reproduce this issue, by any chance?

I'll see if I can get the db pulled from the running machine

Yes, it would be helpful if you could provide the db file.

Is there any means of recovery from this issue?

We have surgery commands, but again we need the db file first.

@ahrtr
Member

ahrtr commented Jun 27, 2024

This also looks like a potential issue in the freelist management, e.g. a page was allocated twice, so it's referenced from multiple places.

@timannguyen
Author

I could not get approval to provide the db file. Sorry for the trouble.

@fuweid
Member

fuweid commented Jul 2, 2024

Hi @timshannon, just want to confirm: did the issue show up after the host or VM where the process runs was restarted? Thanks

@timshannon
Contributor

@fuweid I assume you meant to call out @timannguyen, not @timshannon, right?

@ahrtr
Member

ahrtr commented Jul 2, 2024

@fuweid I assume you meant to call out @timannguyen, not @timshannon, right?

Right, sorry for the confusion.

@fuweid
Member

fuweid commented Jul 2, 2024

Sorry @timshannon for the confusion. I should have pinged @timannguyen.

@timannguyen
Author

@fuweid sorry for the delay; the process was restarted and the issue occurred.

@ahrtr
Member

ahrtr commented Jul 8, 2024

@fuweid sorry for the delay; the process was restarted and the issue occurred.

Thanks for the feedback, but the question isn't answered. The question is: did the machine/VM restart right before the issue occurred?

Please also provide the following info:

  • GOOS and version;
  • GOARCH;
  • filesystem;

I could not get approval to provide the db file. Sorry for the trouble.

Could you dump the db with all the values redacted?

./bbolt page --all --format-value=redacted path-2-db

@lnnt

lnnt commented Aug 28, 2024

@fuweid sorry for the delay; the process was restarted and the issue occurred.

Thanks for the feedback, but the question isn't answered. The question is: did the machine/VM restart right before the issue occurred?

Please also provide the following info:

  • GOOS and version;
  • GOARCH;
  • filesystem;

I could not get approval to provide the db file. Sorry for the trouble.

Could you dump the db with all the values redacted?

./bbolt page --all --format-value=redacted path-2-db

@ahrtr I met the same issue.
Here is my db dump:
db.txt

@lnnt

lnnt commented Aug 29, 2024

My machine had an unexpected power-down, and then bbolt said:

panic: freepages: failed to get all reachable pages (page 4271: multiple references (stack: [1079 3745 4271]))
goroutine 160 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
         go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 159
         go.etcd.io/[email protected]/db.go:1200 +0x1e5

Here is my db file:
db.zip

@ahrtr
Member

ahrtr commented Aug 29, 2024

Here is my db file:
db.zip

Thanks. I will take a look later.

@ahrtr
Member

ahrtr commented Aug 30, 2024

My machine had an unexpected power-down, and then bbolt said:

panic: freepages: failed to get all reachable pages (page 4271: multiple references (stack: [1079 3745 4271]))
goroutine 160 [running]:
go.etcd.io/bbolt.(*DB).freepages.func2()
         go.etcd.io/[email protected]/db.go:1202 +0x8d
created by go.etcd.io/bbolt.(*DB).freepages in goroutine 159
         go.etcd.io/[email protected]/db.go:1200 +0x1e5

Here is my db file: db.zip

The last transaction before the unexpected poweroff somehow corrupted the db. One possible reason is that some page (i.e. page 1079) wasn't successfully synced to disk, even though the fdatasync system call returned successfully.

The workaround to fix the corrupted db is to roll back the last transaction:

$ ./bbolt surgery revert-meta-page ./db.db --output ./new.db
The meta page is reverted.

$ ./bbolt check ./new.db 
OK

Reference

F5WPP69Q7H:bbolt wachao$ pwd
/Users/wachao/go/src/github.com/ahrtr/bbolt
F5WPP69Q7H:bbolt wachao$ ./bbolt page db.db 0
Page ID:    0
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=5150>
Freelist:   <pgid=18446744073709551615>
HWM:        <pgid=11865>
Txn ID:     10789638
Checksum:   4e1e8a40b76f0de1

F5WPP69Q7H:bbolt wachao$ ./bbolt page db.db 1
Page ID:    1
Page Type:  meta
Total Size: 4096 bytes
Overflow pages: 0
Version:    2
Page Size:  4096 bytes
Flags:      00000000
Root:       <pgid=11036>
Freelist:   <pgid=18446744073709551615>
HWM:        <pgid=11865>
Txn ID:     10789637
Checksum:   bb9a5cce3b6710a5

F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 5150
Page ID:    5150
Page Type:  leaf
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 10

"alarm": <pgid=0,seq=0>
"auth": <pgid=0,seq=0>
"authRoles": <pgid=0,seq=0>
"authUsers": <pgid=0,seq=0>
"cluster": <pgid=0,seq=0>
"key": <pgid=1079,seq=0>
"lease": <pgid=6891,seq=0>
"members": <pgid=0,seq=0>
"members_removed": <pgid=0,seq=0>
"meta": <pgid=0,seq=0>

F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 1079
Page ID:    1079
Page Type:  branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 52

0000000000fd22985f0000000000000000: <pgid=4271>
0000000000fd229d5f0000000000000000: <pgid=3874>
0000000000fd22a15f0000000000000000: <pgid=3875>
0000000000fd22a55f0000000000000000: <pgid=4288>
0000000000fd22aa5f0000000000000000: <pgid=7785>
0000000000fd22ae5f0000000000000000: <pgid=1132>
0000000000fd22b35f0000000000000000: <pgid=8839>
0000000000fd22b75f0000000000000000: <pgid=10820>
0000000000fd22bc5f0000000000000000: <pgid=11084>
0000000000fd22c15f0000000000000000: <pgid=6097>
0000000000fd22c65f0000000000000000: <pgid=4104>
0000000000fd22cb5f0000000000000000: <pgid=7786>
0000000000fd22d15f0000000000000000: <pgid=10821>
0000000000fd22d55f0000000000000000: <pgid=9570>
0000000000fd22d95f0000000000000000: <pgid=2861>
0000000000fd22de5f0000000000000000: <pgid=5190>
0000000000fd22e35f0000000000000000: <pgid=9571>
0000000000fd22e75f0000000000000000: <pgid=7264>
0000000000fd22eb5f0000000000000000: <pgid=2860>
0000000000fd22f15f0000000000000000: <pgid=7787>
0000000000fd22f55f0000000000000000: <pgid=11035>
0000000000fd22fb5f0000000000000000: <pgid=7084>
0000000000fd22ff5f0000000000000000: <pgid=7265>
0000000000fd23055f0000000000000000: <pgid=10822>
0000000000fd230a5f0000000000000000: <pgid=2907>
0000000000fd230e5f0000000000000000: <pgid=2906>
0000000000fd23135f0000000000000000: <pgid=2151>
0000000000fd23175f0000000000000000: <pgid=9010>
0000000000fd231c5f0000000000000000: <pgid=2150>
0000000000fd23205f0000000000000000: <pgid=7788>
0000000000fd23255f0000000000000000: <pgid=11185>
0000000000fd232a5f0000000000000000: <pgid=2213>
0000000000fd232e5f0000000000000000: <pgid=2212>
0000000000fd23345f0000000000000000: <pgid=2629>
0000000000fd23385f0000000000000000: <pgid=5821>
0000000000fd233d5f0000000000000000: <pgid=2628>
0000000000fd23425f0000000000000000: <pgid=7639>
0000000000fd23465f0000000000000000: <pgid=5820>
0000000000fd234c5f0000000000000000: <pgid=419>
0000000000fd23505f0000000000000000: <pgid=11187>
0000000000fd23565f0000000000000000: <pgid=11186>
0000000000fd235a5f0000000000000000: <pgid=8300>
0000000000fd235f5f0000000000000000: <pgid=7630>
0000000000fd23635f0000000000000000: <pgid=10824>
0000000000fd23675f0000000000000000: <pgid=10823>
0000000000fd236b5f0000000000000000: <pgid=626>
0000000000fd23705f0000000000000000: <pgid=9930>
0000000000fd23755f0000000000000000: <pgid=7093>
0000000000fd237b5f0000000000000000: <pgid=7789>
0000000000fd237f5f0000000000000000: <pgid=4512>
0000000000fd23855f0000000000000000: <pgid=9931>
0000000000fd238a5f0000000000000000: <pgid=3745>

F5WPP69Q7H:bbolt wachao$ 
F5WPP69Q7H:bbolt wachao$ 
F5WPP69Q7H:bbolt wachao$ ./bbolt page ./db.db 3745
Page ID:    3745
Page Type:  branch
Total Size: 4096 bytes
Overflow pages: 0
Item Count: 68

0000000000fd22985f0000000000000000: <pgid=4271>
0000000000fd229d5f0000000000000000: <pgid=3874>
0000000000fd22a15f0000000000000000: <pgid=3875>
0000000000fd22a55f0000000000000000: <pgid=4288>
0000000000fd22aa5f0000000000000000: <pgid=7785>
0000000000fd22ae5f0000000000000000: <pgid=1132>
0000000000fd22b35f0000000000000000: <pgid=8839>
0000000000fd22b75f0000000000000000: <pgid=10820>
0000000000fd22bc5f0000000000000000: <pgid=11084>
0000000000fd22c15f0000000000000000: <pgid=6097>
0000000000fd22c65f0000000000000000: <pgid=4104>
0000000000fd22cb5f0000000000000000: <pgid=7786>
0000000000fd22d15f0000000000000000: <pgid=10821>
0000000000fd22d55f0000000000000000: <pgid=9570>
0000000000fd22d95f0000000000000000: <pgid=2861>
0000000000fd22de5f0000000000000000: <pgid=5190>
0000000000fd22e35f0000000000000000: <pgid=9571>
0000000000fd22e75f0000000000000000: <pgid=7264>
0000000000fd22eb5f0000000000000000: <pgid=2860>
0000000000fd22f15f0000000000000000: <pgid=7787>
0000000000fd22f55f0000000000000000: <pgid=11035>
0000000000fd22fb5f0000000000000000: <pgid=7084>
0000000000fd22ff5f0000000000000000: <pgid=7265>
0000000000fd23055f0000000000000000: <pgid=10822>
0000000000fd230a5f0000000000000000: <pgid=2907>
0000000000fd230e5f0000000000000000: <pgid=2906>
0000000000fd23135f0000000000000000: <pgid=2151>
0000000000fd23175f0000000000000000: <pgid=9010>
0000000000fd231c5f0000000000000000: <pgid=2150>
0000000000fd23205f0000000000000000: <pgid=7788>
0000000000fd23255f0000000000000000: <pgid=11185>
0000000000fd232a5f0000000000000000: <pgid=2213>
0000000000fd232e5f0000000000000000: <pgid=2212>
0000000000fd23345f0000000000000000: <pgid=2629>
0000000000fd23385f0000000000000000: <pgid=5821>
0000000000fd233d5f0000000000000000: <pgid=2628>
0000000000fd23425f0000000000000000: <pgid=7639>
0000000000fd23465f0000000000000000: <pgid=5820>
0000000000fd234c5f0000000000000000: <pgid=419>
0000000000fd23505f0000000000000000: <pgid=11187>
0000000000fd23565f0000000000000000: <pgid=11186>
0000000000fd235a5f0000000000000000: <pgid=8300>
0000000000fd235f5f0000000000000000: <pgid=7630>
0000000000fd23635f0000000000000000: <pgid=10824>
0000000000fd23675f0000000000000000: <pgid=10823>
0000000000fd236b5f0000000000000000: <pgid=626>
0000000000fd23705f0000000000000000: <pgid=9930>
0000000000fd23755f0000000000000000: <pgid=7093>
0000000000fd237b5f0000000000000000: <pgid=7789>
0000000000fd237f5f0000000000000000: <pgid=4512>
0000000000fd23855f0000000000000000: <pgid=9931>
0000000000fd238a5f0000000000000000: <pgid=2496>
0000000000fd238f5f0000000000000000: <pgid=4526>
0000000000fd23935f0000000000000000: <pgid=10422>
0000000000fd23985f0000000000000000: <pgid=6237>
0000000000fd239c5f0000000000000000: <pgid=11188>
0000000000fd23a05f0000000000000000: <pgid=10423>
0000000000fd23a65f0000000000000000: <pgid=1306>
0000000000fd23ac5f0000000000000000: <pgid=8301>
0000000000fd23b15f0000000000000000: <pgid=1305>
0000000000fd23b55f0000000000000000: <pgid=6890>
0000000000fd23b95f0000000000000000: <pgid=9932>
0000000000fd23be5f0000000000000000: <pgid=2475>
0000000000fd23c25f0000000000000000: <pgid=8302>
0000000000fd23c85f0000000000000000: <pgid=781>
0000000000fd23cc5f0000000000000000: <pgid=456>
0000000000fd23d05f0000000000000000: <pgid=9933>
0000000000fd23d65f0000000000000000: <pgid=2252>

@ahrtr
Member

ahrtr commented Aug 30, 2024

@timshannon @lnnt Could you also share your system info using https://github.com/ahrtr/systeminfo?

@timshannon
Contributor

@timannguyen

@lnnt

lnnt commented Aug 30, 2024

@timshannon @lnnt Could you also share your system info using https://github.com/ahrtr/systeminfo?

cpu.IsBigEndian: false
os.Getpagesize(): 4096
runtime.GOOS: linux
runtime.GOARCH: amd64

@ahrtr
Member

ahrtr commented Sep 4, 2024

As mentioned in the previous comment, the last transaction (10789638) before the unexpected poweroff somehow corrupted the db.

After rolling back the last transaction, the db is back to normal again. The max revision in the last good state is 16589739.

old (good state before TXN 10789638):
Root: 11036 --> 6750 (leaf: key, index: 22/22) --> 3745 (index: 57/67) --> 1306 (index: 5/5, rev: 16589739)

The etcd-dump-db scan-keys command was added in etcd-io/etcd#18539

$ ./etcd-dump-db scan-keys ~/tmp/etcd/778/db.db 16589739 2>/dev/null | grep "/registry/configmaps/istio-system/istio-namespace-controller-election"
pageID=1306, index=5, rev={Revision:{Main:16589739 Sub:0} tombstone:false}, value=[key "/registry/configmaps/istio-system/istio-namespace-controller-election" | val "k8s\x00\n\x0f\n\x02v1\x12\tConfigMap\x12\xeb\x03\n\xe8\x03\n#istio-namespace-controller-election\x12\x00\x1a\fistio-system\"\x00*$bb696087-260d-4167-bf06-17d3361f9b5f2\x008\x00B\b\b\x9e\xbe\xed\xb5\x06\x10\x00b\xe6\x01\n(control-plane.alpha.kubernetes.io/leader\x12\xb9\x01{\"holderIdentity\":\"istiod-d56968787-txq2d\",\"holderKey\":\"default\",\"leaseDurationSeconds\":30,\"acquireTime\":\"2024-08-13T13:26:54Z\",\"renewTime\":\"2024-08-27T06:16:13Z\",\"leaderTransitions\":0}\x8a\x01\x90\x01\n\x0fpilot-discovery\x12\x06Update\x1a\x02v1\"\b\b\xad\u07b5\xb6\x06\x10\x002\bFieldsV1:[\nY{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:control-plane.alpha.kubernetes.io/leader\":{}}}}B\x00\x1a\x00\"\x00" | created 9612546 | mod 16589739 | ver 157604]
pageID=4737, index=4, rev={Revision:{Main:16589786 Sub:0} tombstone:false}, value=[key "/registry/configmaps/istio-system/istio-namespace-controller-election" | val "k8s\x00\n\x0f\n\x02v1\x12\tConfigMap\x12\xeb\x03\n\xe8\x03\n#istio-namespace-controller-election\x12\x00\x1a\fistio-system\"\x00*$bb696087-260d-4167-bf06-17d3361f9b5f2\x008\x00B\b\b\x9e\xbe\xed\xb5\x06\x10\x00b\xe6\x01\n(control-plane.alpha.kubernetes.io/leader\x12\xb9\x01{\"holderIdentity\":\"istiod-d56968787-txq2d\",\"holderKey\":\"default\",\"leaseDurationSeconds\":30,\"acquireTime\":\"2024-08-13T13:26:54Z\",\"renewTime\":\"2024-08-27T06:16:21Z\",\"leaderTransitions\":0}\x8a\x01\x90\x01\n\x0fpilot-discovery\x12\x06Update\x1a\x02v1\"\b\b\xb5\u07b5\xb6\x06\x10\x002\bFieldsV1:[\nY{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:control-plane.alpha.kubernetes.io/leader\":{}}}}B\x00\x1a\x00\"\x00" | created 9612546 | mod 16589786 | ver 157605]
new (bad state after TXN 10789638):
Root: 5150  --> 1079 (leaf: key, index: ?) --> ??? --> 4737 (index: 4/4, rev: 16589786)

The problem is that the leaf page 4737 and root page 5150 were persisted successfully, but some branch pages were not. As far as I can tell, it's most likely a file system issue.

  • Note that even after TXN 10789638, pages 11036, 6750, 3745 and 1306 all remained unchanged. That's expected behavior.
  • After rolling back TXN 10789638, the db is back to normal. This means all the existing pages (including the ones mentioned above) remained unchanged, as expected.
  • Page 1079 is obviously in a bad state. The problem is that its parent (5150) was successfully persisted to disk, but 1079 itself wasn't. So its parent points to a bad page, which is exactly the reason for the corruption.
  • Note that 5150, 1079 and 4737 were all free pages before TXN 10789638 was executed.

As a followup, it would be helpful if we could figure out the missing intermediate branch page between 1079 and 4737. We need to add a command, something like "find-parent", to find the parent of 4737 in this example.

@serathius
Member

@ahrtr 2 questions:

  • Would it be possible to reproduce this case using LazyFS, to get a solid understanding of the issue and maybe consider making bbolt resilient to it?
  • Could we consider having bbolt automatically revert to the previous TXN if the tree data of the current TXN is broken?

@ahrtr
Member

ahrtr commented Oct 4, 2024

  • Would it be possible to reproduce this case using LazyFS, to get a solid understanding of the issue and maybe consider making bbolt resilient to it?

We already introduced dmflakey, which is similar to LazyFS but without its possible downside: LazyFS may not respect fdatasync semantics. Please refer to fuweid/go-dmflakey#1

Could we consider having bbolt automatically revert to the previous TXN if the tree data of the current TXN is broken?

Yes, it's a good point. I have already thought about it, but not as something done automatically by bbolt. Based on the data corruption cases over the past ~3 years, IIRC, only a minor portion could be fixed by bbolt surgery revert-meta-page. Also, we are trying to simplify the code base instead of adding more logic to it, especially such upper-layer or user logic. Instead, I would suggest providing a guide for users to try it manually for now. We will revisit this once we have more confidence in the data corruption issues.

It also has a downside: there may be data loss after reverting to the previous TXN.

@serathius
Member

serathius commented Oct 4, 2024

We already introduced dmflakey, which is similar to LazyFS

Looks like that issue is closed. Do you mean #568? Repeating the ask: could we cover this exact scenario using dmflakey?

It also has downside... there may be data loss after reverting to previous TXN.

Just asking for this to be considered, because I would like to better understand the risk of the revert. I think the high-level question is: is there a chance of the previous transaction being silently broken if the latest transaction was corrupted?

I don't think this needs to be the default option; it might just be a feature in etcd, as I think it would benefit its ability to recover. For systems like etcd, bbolt is not the source of truth, just the kv database. We can still recover the last transaction from the WAL. For cases where only the latest transaction was corrupted, instead of etcd giving up on bootstrap, we would just revert it and replay the transaction again from the WAL.

If the last transaction can be trusted, I would propose that etcd should always try to revert the meta page if the state of the btree under the latest meta page is broken. But before implementing this, we should verify and test the health of the previous transaction under different corruption scenarios.
