Gap in chain (investigation) #26483

Closed
holiman opened this issue Jan 12, 2023 · 18 comments

holiman commented Jan 12, 2023

Stopping Go Ethereum Client...
INFO [01-11|09:23:52.203] Got interrupt, shutting down...
INFO [01-11|09:23:52.207] HTTP server stopped                      endpoint=100.98.99.93:8545
INFO [01-11|09:23:52.207] HTTP server stopped                      endpoint=127.0.0.1:8551
INFO [01-11|09:23:52.208] IPC endpoint closed                      url=/var/lib/goethereum/geth.ipc
INFO [01-11|09:23:52.215] Ethereum protocol stopped
INFO [01-11|09:23:52.215] Transaction pool stopped
INFO [01-11|09:23:52.282] Writing cached state to disk             block=16,380,262 hash=45dd8d..d179f5 root=64bc16..3fab91
WARN [01-11|09:24:00.887] Snapshot extension registration failed   peer=59d2a329 err="peer connected on snap without compatible eth support"
INFO [01-11|09:24:02.227] Looking for peers                        peercount=0  tried=138 static=0
INFO [01-11|09:24:12.282] Looking for peers                        peercount=1  tried=93  static=0
WARN [01-11|09:24:20.847] Previously seen beacon client is offline. Please ensure it is operational to follow the chain!
INFO [01-11|09:24:22.026] Persisted trie from memory database      nodes=21,477,115 size=4.95GiB time=2m45.93433664s  gcnodes=71,174,970 gcsize=27.90GiB gctime=4m34.927021369s livenodes=372,921 livesize=145.41MiB
INFO [01-11|09:24:22.026] Writing cached state to disk             block=16,380,261 hash=90594b..fcfa5e root=d3b906..60e10b
INFO [01-11|09:24:22.037] Persisted trie from memory database      nodes=2372       size=950.64KiB time=11.415431ms     gcnodes=0          gcsize=0.00B    gctime=0s              livenodes=370,549 livesize=144.49MiB
INFO [01-11|09:24:22.037] Writing cached state to disk             block=16,380,135 hash=97e6c5..376b07 root=a2d708..b5bc31
INFO [01-11|09:24:22.298] Looking for peers                        peercount=0  tried=100 static=0
INFO [01-11|09:24:22.990] Persisted trie from memory database      nodes=128,837    size=47.27MiB  time=952.676588ms    gcnodes=0          gcsize=0.00B    gctime=0s              livenodes=241,712 livesize=97.22MiB
INFO [01-11|09:24:22.990] Writing snapshot state to disk           root=ef8ca2..73e4dc
INFO [01-11|09:24:22.990] Persisted trie from memory database      nodes=0          size=0.00B     time="4.429µs"       gcnodes=0          gcsize=0.00B    gctime=0s              livenodes=241,712 livesize=97.22MiB
ERROR[01-11|09:24:23.615] Dangling trie nodes after full cleanup
INFO [01-11|09:24:23.615] Writing clean trie cache to disk         path=/var/lib/goethereum/geth/triecache threads=8
INFO [01-11|09:24:24.272] Persisted the clean trie cache           path=/var/lib/goethereum/geth/triecache elapsed=656.799ms
INFO [01-11|09:24:24.272] Blockchain stopped
geth.service: Deactivated successfully.
Stopped Go Ethereum Client.
geth.service: Consumed 1d 10h 57min 51.015s CPU time.
-- Boot 1c02008566004ee6a89e3b36b0ff4cc9 --
Started Go Ethereum Client.
INFO [01-11|10:56:45.251] Starting pprof server                    addr=http://127.0.0.1:6070/debug/pprof
INFO [01-11|10:56:45.259] Starting Geth on Ethereum mainnet...
INFO [01-11|10:56:45.262] Maximum peer count                       ETH=25 LES=0 total=25
INFO [01-11|10:56:45.264] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [01-11|10:56:45.272] Set global gas cap                       cap=50,000,000
INFO [01-11|10:56:45.275] Allocated trie memory caches             clean=900.00MiB dirty=1.46GiB
INFO [01-11|10:56:45.276] Allocated cache and file handles         database=/var/lib/goethereum/geth/chaindata cache=2.93GiB handles=262,144
INFO [01-11|10:56:59.075] Found legacy ancient chain path          location=/media/seassd/ancient
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.20MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.14MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.14MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.10MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling head                 database=/media/seassd/ancient              table=bodies indexed=596.05MiB stored=596.05MiB
WARN [01-11|10:56:59.190] Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
WARN [01-11|10:56:59.190] Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
WARN [01-11|10:56:59.190] Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
WARN [01-11|10:56:59.190] Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
WARN [01-11|10:56:59.190] Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
WARN [01-11|10:56:59.190] Truncating dangling head                 database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
WARN [01-11|10:56:59.227] Truncating freezer table                 database=/media/seassd/ancient              table=diffs    items=16,289,887 limit=16,289,882
WARN [01-11|10:56:59.228] Truncating freezer table                 database=/media/seassd/ancient              table=headers  items=16,289,887 limit=16,289,882
WARN [01-11|10:56:59.229] Truncating freezer table                 database=/media/seassd/ancient              table=hashes   items=16,289,887 limit=16,289,882
INFO [01-11|10:56:59.229] Opened ancient database                  database=/media/seassd/ancient              readonly=false
Fatal: Failed to register the Ethereum service: gap in the chain between ancients (#16289882) and leveldb (#16380262)

Shutdown:

  • Writes cached state for 16,380,262, 16,380,261 and 16,380,135
  • Which means that 16,380,262 is the most recent state.

Startup:

  • Truncating freezer table only trimmed away 5 items: items=16,289,887 limit=16,289,882
  • gap in the chain
    • between ancients (#16289882) and leveldb (#16380262)
    • ancients is at 16,289,882, as expected after the trim.
    • leveldb is at 16380262, the difference here is 90380.
  • But the number we're comparing against, 16380262 -- that is the HEAD. Why are
    we expecting the HEAD to link up with the ancients? Answer: normally we're not; that's just a special case for when we insert directly into ancients during sync. In the normal case, if we wound up here, there is already a confirmed gap.

Relevant code

	// If the genesis hash is empty, we have a new key-value store, so nothing to
	// validate in this method. If, however, the genesis hash is not nil, compare
	// it to the freezer content.
	if kvgenesis, _ := db.Get(headerHashKey(0)); len(kvgenesis) > 0 {
		if frozen, _ := frdb.Ancients(); frozen > 0 {
			// If the freezer already contains something, ensure that the genesis blocks
			// match, otherwise we might mix up freezers across chains and destroy both
			// the freezer and the key-value store.
			frgenesis, err := frdb.Ancient(chainFreezerHashTable, 0)
			if err != nil {
				return nil, fmt.Errorf("failed to retrieve genesis from ancient %v", err)
			} else if !bytes.Equal(kvgenesis, frgenesis) {
				return nil, fmt.Errorf("genesis mismatch: %#x (leveldb) != %#x (ancients)", kvgenesis, frgenesis)
			}
			// Key-value store and freezer belong to the same network. Ensure that they
			// are contiguous, otherwise we might end up with a non-functional freezer.
			if kvhash, _ := db.Get(headerHashKey(frozen)); len(kvhash) == 0 {
				// Subsequent header after the freezer limit is missing from the database.
				// Reject startup if the database has a more recent head.
				if ldbNum := *ReadHeaderNumber(db, ReadHeadHeaderHash(db)); ldbNum > frozen-1 {
					return nil, fmt.Errorf("gap in the chain between ancients (#%d) and leveldb (#%d) ", frozen, ldbNum)
				}
				// Database contains only older data than the freezer, this happens if the
				// state was wiped and reinited from an existing freezer.
			}
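
Plugging the logged numbers into that check (a sketch, not geth code; frozen is the ancient item count after the trim, ldbNum the number of the head header found in the key-value store):

package main

import "fmt"

func main() {
	// Values from the failing startup above.
	frozen := uint64(16_289_882) // ancient item count after the truncation
	ldbNum := uint64(16_380_262) // number of the head header in the key-value store

	// headerHashKey(frozen) is not found in the key-value store, so the
	// contiguity check falls through to comparing the head against the freezer:
	if ldbNum > frozen-1 {
		fmt.Printf("gap in the chain between ancients (#%d) and leveldb (#%d)\n", frozen, ldbNum)
	}
}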

holiman commented Jan 12, 2023

Ah, I get it.

  1. So, we moved 0-16,289,887 into ancients.
  2. For unknown reasons, the ancients are slightly corrupt, and we trim away five items, landing at 0-16,289,882.
  3. Which means that 16,289,887, 16,289,886, 16,289,885, 16,289,884 and 16,289,883 are simply missing, since they are in neither ancients nor leveldb.


holiman commented Jan 12, 2023

table=bodies indexed=596.20MiB stored=596.05MiB --- this indicates that for bodies, we were expecting more than we saw. So we had indexed M items, but found fewer than M bodies.
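
A minimal sketch of what those two numbers measure, assuming the freezer index is a flat file of 6-byte entries (big-endian uint16 data-file number followed by a big-endian uint32 end offset); identifiers are illustrative, not geth's exact ones:

package main

import (
	"encoding/binary"
	"fmt"
)

// One 6-byte freezer index entry: which data file an item ends in and at what
// offset. "indexed" in the warning is the end offset recorded by the last
// index entry; "stored" is the actual size of that data file. If
// indexed > stored, the index promises bodies the data file no longer holds.
type indexEntry struct {
	filenum uint16
	offset  uint32
}

func parseEntry(b []byte) indexEntry {
	return indexEntry{
		filenum: binary.BigEndian.Uint16(b[:2]),
		offset:  binary.BigEndian.Uint32(b[2:6]),
	}
}

func main() {
	// A made-up raw entry: data file 147, item ending at byte offset 625212679.
	fmt.Printf("%+v\n", parseEntry([]byte{0x00, 0x93, 0x25, 0x43, 0xFD, 0x07}))
}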


holiman commented Jan 12, 2023

I wonder what the filesystem is on /media/seassd/ancient. cc @MariusVanDerWijden

rjl493456442 commented:

BTW,

return nil, fmt.Errorf("gap in the chain between ancients (#%d) and leveldb (#%d) ", frozen, ldbNum)

It's ambiguous to me. It feels like we are missing 90k blocks' worth of data, but actually we are not. Although it's very hard to know what the oldest item in the kv store is.


rjl493456442 commented Jan 12, 2023

WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.20MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.14MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.14MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.10MiB stored=596.05MiB
WARN [01-11|10:56:59.140] Truncating dangling head                 database=/media/seassd/ancient              table=bodies indexed=596.05MiB stored=596.05MiB

It's interesting. Usually we truncate the higher one by deleting items one by one.
But the log shows that we delete items from the index file first, then somehow truncate
the data file as well. Finally the two are aligned.

Honestly it's pretty weird; it looks like the data file is missing some bodies plus a partial body.
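
For what it's worth, that ordering is consistent with a repair pass that first drops trailing index entries pointing past the end of the data file, and then trims the data file back to the last indexed offset. A rough sketch with in-memory stand-ins (not the actual geth code; the numbers in main are invented to mimic the bodies pattern above):

package main

import "fmt"

// repairSketch mimics the observed repair behaviour on a freezer table:
// offsets is the list of end offsets recorded in the index file (one per
// item), dataSize the size of the data file. It drops trailing index entries
// that point past the end of the data, then trims the data file down to the
// last indexed offset, reproducing the "dangling indexes" then
// "dangling head" log order.
func repairSketch(offsets []uint32, dataSize uint32) (kept []uint32, newSize uint32) {
	indexed := offsets[len(offsets)-1]
	for indexed != dataSize {
		if indexed > dataSize {
			fmt.Printf("Truncating dangling indexes indexed=%d stored=%d\n", indexed, dataSize)
			offsets = offsets[:len(offsets)-1] // drop the last index entry
			indexed = offsets[len(offsets)-1]
		} else {
			fmt.Printf("Truncating dangling head    indexed=%d stored=%d\n", indexed, dataSize)
			dataSize = indexed // truncate the data file
		}
	}
	return offsets, dataSize
}

func main() {
	// Hypothetical numbers loosely shaped like the bodies table: four index
	// entries point past the end of the data file, the next one lands just
	// inside it, leaving a partial trailing item to be cut off.
	offsets := []uint32{624999924, 625054809, 625094840, 625097660, 625161101, 625212679, 625246568}
	kept, size := repairSketch(offsets, 625096000)
	fmt.Println("items kept:", len(kept), "data size:", size)
}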


fjl commented Jan 12, 2023

Main issue where people have reported these cases: #22374

There, I asked a while ago whether the Truncating dangling xxx log message was printed, but nobody replied. Good to see we finally have a log where it was actually printed.


holiman commented Jan 13, 2023

Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.20MiB stored=596.05MiB
Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.14MiB stored=596.05MiB
Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.14MiB stored=596.05MiB
Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.10MiB stored=596.05MiB
Truncating dangling head                 database=/media/seassd/ancient              table=bodies indexed=596.05MiB stored=596.05MiB

This tells us the following:

  • The corruption was not on the border of two data files, since those grow up to 2GB and this data file was only 596MiB in size.
  • As @rjl493456442 already pointed out, about 3.5 (three and a bit more) bodies were "lost". It would be interesting to know the exact size of the "lost chunk of data", but unfortunately the pretty-printing ruined that. Around 0.15MiB was dropped from bodies.cdat.
  • In total, it appears that 4 items were trimmed away.
Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
Truncating dangling indexes              database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB
Truncating dangling head                 database=/media/seassd/ancient              table=receipts indexed=1.71GiB   stored=1.71GiB

This is also safely "within" a file, not near a crossover.

  • About 4.5 receipt blobs were dropped.
  • In total, it appears that 5 items were trimmed away.

And then:

Truncating freezer table                 database=/media/seassd/ancient              table=diffs    items=16,289,887 limit=16,289,882
Truncating freezer table                 database=/media/seassd/ancient              table=headers  items=16,289,887 limit=16,289,882
Truncating freezer table                 database=/media/seassd/ancient              table=hashes   items=16,289,887 limit=16,289,882

So, for the diffs, headers, and hashes, we remove 5 items. So how come we do not remove a fifth item from bodies?


holiman commented Jan 13, 2023

So, for the diffs, headers, and hashes, we remove 5 items. So how come we do not remove a fifth item from bodies?

Ah, we probably do, but we don't think it's worth logging:

	log := t.logger.Debug
	if existing > items+1 {
		log = t.logger.Warn // Only loud warn if we delete multiple items
	}
	log("Truncating freezer table", "items", existing, "limit", items)

I think we should change that - deleting anything is definitely worth logging. Ah, actually, we can't change that -- see #21483

Geth can roll back during fast sync. With the unified rollback handling, all these paths use the same code that printed the warning. This resulted in 2000 truncation warnings printed every time fast sync failed (2K headers rolled back).


holiman commented Jan 13, 2023

I downloaded the index files from a production node, to check the data layouts / sizes.

bodies:
| number   | fileno | offset    | note          |
|----------|--------|-----------|---------------|
| 16289882 | 147    | 624999924 | new latest    |
| 16289883 | 147    | 625054809 | removed later |
| 16289884 | 147    | 625094840 | half gone     |
| 16289885 | 147    | 625097660 | gone          |
| 16289886 | 147    | 625161101 | gone          |
| 16289887 | 147    | 625212679 | gone          |
| 16289888 | 147    | 625246568 |               |


receipts:
| number   | fileno | offset     | note       |
|----------|--------|------------|------------|
| 16289882 | 068    | 1840165929 | new latest |
| 16289883 | 068    | 1840193399 | half gone  |
| 16289884 | 068    | 1840202201 | gone       |
| 16289885 | 068    | 1840204334 | gone       |
| 16289886 | 068    | 1840233485 | gone       |
| 16289887 | 068    | 1840245912 | gone       |
| 16289888 | 068    | 1840257531 |            |

The data lost in receipts is somewhere between

>>> 1840257531-1840193399
64132
>>> 1840257531-1840202201
55330

So around 60K of data was lost.

For bodies:

>>> 625246568-625094840
151728
>>> 625246568-625097660
148908

Around 150K was lost.
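
For anyone who wants to repeat that inspection, here is a rough sketch of such a dump tool, assuming the index file (e.g. bodies.ridx) is a flat sequence of 6-byte entries (big-endian uint16 data-file number, big-endian uint32 end offset). How an entry position maps to a block number depends on where the table's tail sits, so treat the first column as an entry index rather than a guaranteed block number:

package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

// Dumps a freezer index file as (entry, filenum, offset) rows.
func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: dumpidx <table>.ridx")
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	var buf [6]byte
	for entry := 0; ; entry++ {
		if _, err := io.ReadFull(f, buf[:]); err != nil {
			break // EOF, or a trailing partial entry
		}
		filenum := binary.BigEndian.Uint16(buf[0:2])
		offset := binary.BigEndian.Uint32(buf[2:6])
		fmt.Printf("%10d | %4d | %12d\n", entry, filenum, offset)
	}
}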

MariusVanDerWijden commented:

@nisdas this is our thread investigating your issue. Could you do a disk check on both the internal and external drives?


nisdas commented Jan 19, 2023

I could only run SMART checks on my internal drives, but here it is:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-57-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      XXXXXXXXXXX
Firmware Version:                   XXXXXXXXXX
PCI Vendor/Subsystem ID:            XXXXXX
IEEE OUI Identifier:                XXXXXXXXX
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            568,299,421,696 [568 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            XXXXXXXX
Local Time is:                      Thu Jan 19 17:46:57 2023 +08
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.80W       -        -    0  0  0  0        0       0
 1 +     6.00W       -        -    1  1  1  1        0       0
 2 +     3.40W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    38%
Data Units Read:                    1,073,615,542 [549 TB]
Data Units Written:                 749,692,485 [383 TB]
Host Read Commands:                 63,071,682,328
Host Write Commands:                54,082,004,201
Controller Busy Time:               66,495
Power Cycles:                       22
Power On Hours:                     11,215
Unsafe Shutdowns:                   12
Media and Data Integrity Errors:    0
Error Information Log Entries:      38
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               45 Celsius
Temperature Sensor 2:               51 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         38     0  0x2001  0x4004      -            0     0     -


holiman commented Jan 19, 2023

on my internal drives

Hold up -- does that mean not the drive which the ancients are on? At this point, we're only really interested in the disk holding the ancients, the /media/seassd/ancient unit, because that's where the data went missing.


nisdas commented Jan 19, 2023

For my external drive, doing extensive disk checks will be difficult, as it would require me to restart my machine (my node is currently running with validators on it). I can't run the smartmontools checks, as that requires me to disable a module in the kernel: https://www.smartmontools.org/ticket/971 . If it helps, I bought the external drive only 2 months ago.


holiman commented Apr 20, 2023

A similar error occurred on a pebble-backed node which suffered a (few) uncontrolled shutdowns (courtesy of @chfast):

geth --datadir=/bc/geth --db.engine=pebble --cache=3072 --port=4130 --discovery.port=4130
INFO [04-20|09:12:19.683] Starting Geth on Ethereum mainnet... 
INFO [04-20|09:12:19.690] Maximum peer count                       ETH=50 LES=0 total=50
INFO [04-20|09:12:19.694] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [04-20|09:12:19.694] Using pebble as db engine 
INFO [04-20|09:12:19.719] Set global gas cap                       cap=50,000,000
INFO [04-20|09:12:19.729] Allocated trie memory caches             clean=460.00MiB dirty=768.00MiB
INFO [04-20|09:12:20.155] Using pebble as the backing database 
INFO [04-20|09:12:20.155] Allocated cache and file handles         database=/bc/geth/geth/chaindata cache=1.50GiB handles=524,288
INFO [04-20|09:12:34.235] Opened ancient database                  database=/bc/geth/geth/chaindata/ancient/chain readonly=false
ERROR[04-20|09:12:34.288] Error in block freeze operation          err="block receipts missing, can't freeze block 16854335"


holiman commented Apr 20, 2023

The missing 'block receipts' is curious, because during the freeze operation, the fields are read in this order:

  • hash
  • header
  • body (transactions + uncles)
  • receipts
  • td
			hash := ReadCanonicalHash(nfdb, number)
			if hash == (common.Hash{}) {
				return fmt.Errorf("canonical hash missing, can't freeze block %d", number)
			}
			header := ReadHeaderRLP(nfdb, hash, number)
			if len(header) == 0 {
				return fmt.Errorf("block header missing, can't freeze block %d", number)
			}
			body := ReadBodyRLP(nfdb, hash, number)
			if len(body) == 0 {
				return fmt.Errorf("block body missing, can't freeze block %d", number)
			}
			receipts := ReadReceiptsRLP(nfdb, hash, number)
			if len(receipts) == 0 {
				return fmt.Errorf("block receipts missing, can't freeze block %d", number)
			}
			td := ReadTdRLP(nfdb, hash, number)
			if len(td) == 0 {
				return fmt.Errorf("total difficulty missing, can't freeze block %d", number)
			}

So that means that the other fields are present in leveldb, but only the receipts are missing (and possibly td). Checking those, it seems that the td is also present, and indeed only the receipts are gone:

root@rv41:~# sudo -u geth geth --datadir=/bc/geth --db.engine=pebble db get 0x720000000001012d3f1187436c1687352bcb934b692a08364179d4aeb4b08e4cbfca67a60be04ba6e8
INFO [04-20|09:25:12.385] Maximum peer count                       ETH=50 LES=0 total=50
INFO [04-20|09:25:12.387] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [04-20|09:25:12.387] Using pebble as db engine 
INFO [04-20|09:25:12.388] Set global gas cap                       cap=50,000,000
INFO [04-20|09:25:12.730] Using pebble as the backing database 
INFO [04-20|09:25:12.730] Allocated cache and file handles         database=/bc/geth/geth/chaindata cache=512.00MiB handles=524,288
INFO [04-20|09:25:14.904] Opened ancient database                  database=/bc/geth/geth/chaindata/ancient/chain readonly=true
INFO [04-20|09:25:14.918] Get operation failed                     key=0x720000000001012d3f1187436c1687352bcb934b692a08364179d4aeb4b08e4cbfca67a60be04ba6e8 error="pebble: not found"
pebble: not found
root@rv41:~# sudo -u geth geth --datadir=/bc/geth --db.engine=pebble db get 0x680000000001012d3f1187436c1687352bcb934b692a08364179d4aeb4b08e4cbfca67a60be04ba6e874
INFO [04-20|09:25:27.788] Maximum peer count                       ETH=50 LES=0 total=50
INFO [04-20|09:25:27.788] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
INFO [04-20|09:25:27.788] Using pebble as db engine 
INFO [04-20|09:25:27.790] Set global gas cap                       cap=50,000,000
INFO [04-20|09:25:28.103] Using pebble as the backing database 
INFO [04-20|09:25:28.103] Allocated cache and file handles         database=/bc/geth/geth/chaindata cache=512.00MiB handles=524,288
INFO [04-20|09:25:30.290] Opened ancient database                  database=/bc/geth/geth/chaindata/ancient/chain readonly=true
key 0x680000000001012d3f1187436c1687352bcb934b692a08364179d4aeb4b08e4cbfca67a60be04ba6e874: 0x8a0c70d815d562d3cfa955
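
For context on those two keys: assuming the standard rawdb schema ('r' ++ 8-byte big-endian block number ++ hash for receipts, and 'h' ++ number ++ hash ++ 't' for total difficulty; the authoritative definitions live in core/rawdb/schema.go), they decode to the receipts key and the td key for block 16854335. A small sketch that decomposes the logged receipts key and rebuilds both:

package main

import (
	"bytes"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// receiptsKey: 'r' (0x72) ++ 8-byte big-endian block number ++ block hash
func receiptsKey(number uint64, hash []byte) []byte {
	return append(append([]byte("r"), encodeNumber(number)...), hash...)
}

// tdKey: 'h' (0x68) ++ 8-byte big-endian block number ++ block hash ++ 't' (0x74)
func tdKey(number uint64, hash []byte) []byte {
	return append(append(append([]byte("h"), encodeNumber(number)...), hash...), 't')
}

func encodeNumber(number uint64) []byte {
	enc := make([]byte, 8)
	binary.BigEndian.PutUint64(enc, number)
	return enc
}

func main() {
	// The receipts key from the log above, decomposed into prefix/number/hash.
	logged, _ := hex.DecodeString("720000000001012d3f1187436c1687352bcb934b692a08364179d4aeb4b08e4cbfca67a60be04ba6e8")
	number := binary.BigEndian.Uint64(logged[1:9])
	hash := logged[9:]
	fmt.Printf("number=%d hash=0x%x\n", number, hash)
	fmt.Printf("receipts key matches log: %v\n", bytes.Equal(receiptsKey(number, hash), logged))
	fmt.Printf("td key:                   0x%x\n", tdKey(number, hash))
}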


holiman commented Apr 20, 2023

The child block, 16854336 / 0x47a4c8ce11136f87915e62a34606f5dccb286653ccf1fda7564892cdd25795bf, which directly follows, is also fully present in leveldb.

So it seems that for 16854335, the receipts just disappeared, but nothing else is missing (at least not in the direct vicinity).


holiman commented Apr 20, 2023

So, on a functioning node, I did

> debug.dbAncient("receipts", 16854335)

I took the output, put it into a bash variable, and then did

geth --datadir=/bc/geth --db.engine=pebble db put 0x720000000001012d3f1187436c1687352bcb934b692a08364179d4aeb4b08e4cbfca67a60be04ba6e8 $RLP

That seems to have fixed it. I'm curious to see if it is indeed fixed, or if something else will turn up.


holiman commented Apr 9, 2024

This ticket is stale, closing

holiman closed this as completed Apr 9, 2024