Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fatal: Error starting protocol stack: genesis mismatch #21315

Closed
sssubik opened this issue Jul 9, 2020 · 49 comments
Closed

Fatal: Error starting protocol stack: genesis mismatch #21315

sssubik opened this issue Jul 9, 2020 · 49 comments
Assignees

Comments

@sssubik
Copy link

sssubik commented Jul 9, 2020

Hi there,

System information

Geth version: Geth/v1.9.13-stable
OS & Version: ubuntu 18.04

Expected behaviour

My server turned off abruptly where my full node GETH was running. Now when I fix the server and open the node the node should start running.

Actual behaviour

The GETH node before the server crashed was running fully synced and was displaying importing chain segment.
When I fixed the server after server turned off abruptly, and executing geth I am getting log as:

INFO [07-09|18:08:46.552] Starting Geth on Ethereum mainnet... 
INFO [07-09|18:08:46.553] Bumping default cache on mainnet         provided=1024 updated=4096
INFO [07-09|18:08:46.555] Maximum peer count                       ETH=50 LES=0 total=50
INFO [07-09|18:08:46.555] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
ERROR[07-09|18:08:46.555] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:46.555] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:46.555] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:46.555] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:46.555] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:46.556] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
INFO [07-09|18:08:46.556] Starting peer-to-peer node               instance=Geth/v1.9.13-stable-cbc4ac26/linux-amd64/go1.14.2
INFO [07-09|18:08:46.556] Allocated trie memory caches             clean=614.00MiB dirty=1024.00MiB
INFO [07-09|18:08:46.556] Allocated cache and file handles         database=/home/user/.ethereum/geth/chaindata cache=2.00GiB handles=524288
ERROR[07-09|18:08:47.563] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:47.564] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-09|18:08:47.564] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
DEBUG[07-13|14:21:29.599] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=headers items=10329515 size=0.00B
DEBUG[07-13|14:21:29.599] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=hashes  items=10329515 size=0.00B
DEBUG[07-13|14:21:29.599] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=bodies  items=10329515 size=0.00B
DEBUG[07-13|14:21:29.599] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=receipts items=10329515 size=0.00B
DEBUG[07-13|14:21:29.600] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=diffs    items=10329515 size=0.00B
INFO [07-13|14:21:29.600] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
Fatal: Error starting protocol stack: genesis mismatch: 0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3 (leveldb) !=  (ancients))

The server was off for around 8 hours is this the reason there is a mismatch problem? I dont know why the Genesis mismatch problem is there. Could you provide me with a solution to initialize the GETH node with a genesis block again or any other solution
Steps to reproduce the behaviour
Install GETH full node and if the server turns off due to some problem the above error shows up if you reboot the server.

@sssubik
Copy link
Author

sssubik commented Jul 9, 2020

Hey @karalabe . Really need your help. Please could you tell me some solution or any help..

@rjl493456442
Copy link
Member

Hi @sssubik sorry for the late reply. Is the corrupted database still available?

#21327

I added a debug log to expose the root error for the failure, hopefully it's not late

@sssubik
Copy link
Author

sssubik commented Jul 13, 2020

Hey @rjl493456442 Yes the corrupted database is still available. What should I do? Do I need to resync?
Okay I will show you the log once the commit is done.. Will it be done today?

@sssubik
Copy link
Author

sssubik commented Jul 14, 2020

Hey @rjl493456442 I tried the new version and now I am getting this:

 `DEBUG[07-14|08:29:09.522] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=receipts items=10329515 size=0.00B
DEBUG[07-14|08:29:09.522] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=diffs    items=10329515 size=0.00B
DEBUG[07-14|08:29:09.522] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=headers  items=10329515 size=0.00B
DEBUG[07-14|08:29:09.522] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=hashes   items=10329515 size=0.00B
DEBUG[07-14|08:29:09.522] Chain freezer table opened               database=/home/user/.ethereum/geth/chaindata/ancient table=bodies   items=10329515 size=0.00B
INFO [07-14|08:29:09.522] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
Fatal: Error starting protocol stack: failed to retrieve genesis from ancient EOF`

What would I need to do? @rjl493456442

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

Hey @rjl493456442. I did some research and found no solution. I find that the GETH once shutdown abruptly prunes the states in freezer without pruning level db due to some roll back to correct state. Could you provide me more detailed explanation or solution?

@rjl493456442
Copy link
Member

rjl493456442 commented Jul 15, 2020

@sssubik Can you list your ancient directory?

Something I don't understand is items=10329515 size=0.00B there are 10M items in the ancient directory but the head file size is 0.

Now I have no clue why the ancient will return an empty genesis for you. It's not easy to debug without a corrupted database. Maybe you can help a bit by offering some information.

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

`bodies.0000.cdat  bodies.0021.cdat  bodies.0042.cdat    receipts.0004.cdat
bodies.0001.cdat  bodies.0022.cdat  bodies.0043.cdat    receipts.0005.cdat
bodies.0002.cdat  bodies.0023.cdat  bodies.0044.cdat    receipts.0006.cdat
bodies.0003.cdat  bodies.0024.cdat  bodies.0045.cdat    receipts.0007.cdat
bodies.0004.cdat  bodies.0025.cdat  bodies.0046.cdat    receipts.0008.cdat
bodies.0005.cdat  bodies.0026.cdat  bodies.0047.cdat    receipts.0009.cdat
bodies.0006.cdat  bodies.0027.cdat  bodies.0048.cdat    receipts.0010.cdat
bodies.0007.cdat  bodies.0028.cdat  bodies.cidx         receipts.0011.cdat
bodies.0008.cdat  bodies.0029.cdat  diffs.0000.rdat     receipts.0012.cdat
bodies.0009.cdat  bodies.0030.cdat  diffs.ridx          receipts.0013.cdat
bodies.0010.cdat  bodies.0031.cdat  FLOCK               receipts.0014.cdat
bodies.0011.cdat  bodies.0032.cdat  hashes.0000.rdat    receipts.0015.cdat
bodies.0012.cdat  bodies.0033.cdat  hashes.ridx         receipts.0016.cdat
bodies.0013.cdat  bodies.0034.cdat  headers.0000.cdat   receipts.0017.cdat
bodies.0014.cdat  bodies.0035.cdat  headers.0001.cdat   receipts.0018.cdat
bodies.0015.cdat  bodies.0036.cdat  headers.0002.cdat   receipts.0019.cdat
bodies.0016.cdat  bodies.0037.cdat  headers.cidx        receipts.0020.cdat
bodies.0017.cdat  bodies.0038.cdat  receipts.0000.cdat  receipts.0021.cdat
bodies.0018.cdat  bodies.0039.cdat  receipts.0001.cdat  receipts.cidx
bodies.0019.cdat  bodies.0040.cdat  receipts.0002.cdat
bodies.0020.cdat  bodies.0041.cdat  receipts.0003.cdat
`

Hey I have listed my ancient directory. Also the space I calculate gives me this:
131G ancient
If anything more need I will provide. Would be really frustrating if have to resync..

@rjl493456442
Copy link
Member

Can you also give me the size of hashes.ridx and hashes.0000.rdat? Thanks!

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

60M hashes.ridx
0 hashes.0000.rdat

hashes.0000.rdat shows 0 MB

@rjl493456442
Copy link
Member

@sssubik I have to say sorry. I think there is a bug in the codebase, but the ancient database is already broken(and useless unless we do some manual recovery).

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

@rjl493456442 What does this mean. Do I need to resync? Or can I do some manual recovery? Also if I need to resync am I gonna be able to leverage the downloaded states I already have to complete resync in less time?

@rjl493456442
Copy link
Member

Do you have any other synced Geth instance?

Manual recovery means:

  • Drop another synced hashes.0000.rdat file there for replacement(Note, the new file should contain more elements then the broken one
  • Write a script the iterate the Header table and then inject the header hash into the broken hashes.0000.rdat file. It needs some coding work, maybe not a good idea

Also if I need to resync am I gonna be able to leverage the downloaded states I already have to complete resync in less time

To be honest I don't think we can plugin the downloaded states without any modification. The hacky way is: iterating the leveldb, delete all chain data(headers, tds, receipts, bodies, chain marker) but keep the states and then re-sync.

Download the chain is fast, download the state is painful.

@rjl493456442
Copy link
Member

rjl493456442 commented Jul 15, 2020

Can you search the history log that whether Truncating dangling head ... or Truncating dangling indexes ... or Truncating freezer table even appear?
I am digging the root cause and the log might help.

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

Hey I dont have any other synced GETH instance

Drop another synced hashes.0000.rdat file there for replacement(Note, the new file should contain more elements then the broken one

So, can I find a synced hashes.0000.rdat file from somewhere else.
Also

The hacky way is: iterating the leveldb, delete all chain data(headers, tds, receipts, bodies, chain marker) but keep the states and then re-sync.

Do you mean this geth removedb? But isnt the problem in the ancient folder? And to resync I just run the geth command again? Also how long could it take?

Can you search the history log that whether Truncating dangling head ... or Truncating dangling indexes ... or Truncating freezer table even appear?

As far as I know how to search log I get this:

sudo journalctl -u geth.service --no-page | grep [Truncating dangling head ]
grep: Unmatched [ or [^
Subik@h2883546:~$ sudo journalctl -u geth.service --no-page | grep [Truncating dangling indexes ]
grep: Unmatched [ or [^
Subik@h2883546:~$ sudo journalctl -u geth.service --no-page | grep [Truncating freezer table]
grep: Unmatched [ or [^

I cant find the log you just sent me

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

Sorry the above search of logs was wrong. I do get this:

Jul 09 11:20:42  WARN [07-09|11:20:42.683] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=diffs indexed=0.00B stored=101.37MiB
Jul 09 11:20:42 WARN [07-09|11:20:42.687] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=headers indexed=0.00B stored=1.86GiB
Jul 09 11:20:42  WARN [07-09|11:20:42.695] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=hashes  indexed=0.00B stored=315.23MiB
Jul 09 11:20:42 WARN [07-09|11:20:42.701] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=bodies  indexed=0.00B stored=1.86GiB
Jul 09 11:20:42  WARN [07-09|11:20:42.712] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=receipts indexed=0.00B stored=1.86GiB

@rjl493456442
Copy link
Member

@sssubik geth removedb will also delete the downloaded state, don't do it.

I think we can offer you the synced ancient file if you want. @holiman says "We can do that, but I don't have time to do it during the day, maybe toward the evening"

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

@rjl493456442
Yes that would be great.. Okay I can wait towards the evening. Do I just copy the file to the ancient folder?

@rjl493456442
Copy link
Member

@sssubik Yes, basically the truncated files for replacement. Do you have any other logs before these Truncating dangling head?

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

@rjl493456442 Ohh okay.
I have this:

Jul 08 16:48:43 [418]: INFO [07-08|16:48:43.125] Deep froze chain segment                 blocks=4    elapsed=29.928ms  number=10329509 hash=c2ec4a…6c3eaa
Jul 08 16:48:54 [418]: INFO [07-08|16:48:54.318] Imported new chain segment               blocks=1    txs=178  mgas=11.656  elapsed=344.712ms mgasps=33.812  number=10419511 hash=d9d18e…6a5227 dirty=733.40MiB
Jul 08 16:48:56 [418]: INFO [07-08|16:48:56.063] Imported new chain segment               blocks=1    txs=126  mgas=11.672  elapsed=257.415ms mgasps=45.342  number=10419512 hash=6d6102…d735f5 dirty=733.50MiB
Jul 08 16:49:06 [418]: INFO [07-08|16:49:06.565] Imported new chain segment               blocks=1    txs=181  mgas=11.644  elapsed=1.401s    mgasps=8.310   number=10419513 hash=76c035…886d06 dirty=733.79MiB
Jul 08 16:49:24 [418]: INFO [07-08|16:49:24.706] Imported new chain segment               blocks=1    txs=236  mgas=11.675  elapsed=445.034ms mgasps=26.235  number=10419514 hash=44319c…5ab7ad dirty=734.31MiB
Jul 08 16:49:36 [418]: INFO [07-08|16:49:36.692] Imported new chain segment               blocks=1    txs=172  mgas=11.683  elapsed=388.708ms mgasps=30.055  number=10419515 hash=0b4c1e…4fee7e dirty=735.31MiB
-- Reboot --
Jul 09 11:19:48 h2883546.stratoserver.net systemd[1]: Started Go Ethereum Client.
Jul 09 11:19:52 [184]: INFO [07-09|11:19:52.370] Starting Geth on Ethereum mainnet...
Jul 09 11:19:52 [184]: INFO [07-09|11:19:52.370] Bumping default cache on mainnet         provided=1024 updated=4096
Jul 09 11:19:52 [184]: INFO [07-09|11:19:52.377] Maximum peer count                       ETH=50 LES=0 total=50
Jul 09 11:19:52 [184]: INFO [07-09|11:19:52.380] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:52.495] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:52.495] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:52.495] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:52.496] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:52.496] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:52.496] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: INFO [07-09|11:19:52.511] Starting peer-to-peer node               instance=Geth/v1.9.13-stable-cbc4ac26/linux-amd64/go1.14.2
Jul 09 11:19:53 [184]: INFO [07-09|11:19:52.514] Allocated trie memory caches             clean=614.00MiB dirty=1024.00MiB
Jul 09 11:19:53 [184]: INFO [07-09|11:19:52.514] Allocated cache and file handles         database=/home/user/.ethereum/geth/chaindata cache=2.00GiB handles=2048
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:53.496] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:53.501] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:19:53 [184]: ERROR[07-09|11:19:53.501] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
Jul 09 11:20:42 [184]: WARN [07-09|11:20:42.683] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=diffs indexed=0.00B stored=101.37MiB
Jul 09 11:20:42 [184]: WARN [07-09|11:20:42.687] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=headers indexed=0.00B stored=1.86GiB
Jul 09 11:20:42 [184]: WARN [07-09|11:20:42.695] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=hashes  indexed=0.00B stored=315.23MiB
Jul 09 11:20:42 [184]: WARN [07-09|11:20:42.701] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=bodies  indexed=0.00B stored=1.86GiB
Jul 09 11:20:42 [184]: WARN [07-09|11:20:42.712] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=receipts indexed=0.00B stored=1.86GiB
Jul 09 11:20:42 [184]: INFO [07-09|11:20:42.719] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
Jul 09 11:20:42 [184]: Fatal: Error starting protocol stack: genesis mismatch: 0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3 (leveldb) !=  (ancients)

Also The server crashed and was closed for around 10hours before reboot.

@rjl493456442
Copy link
Member

Any other logs before crash?

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

Not much really. Only Imported new chain segment.

Jul 08 16:47:31 [418]: INFO [07-08|16:47:31.376] Imported new chain segment               blocks=1    txs=155  mgas=11.658  elapsed=211.066ms mgasps=55.236  number=10419505 hash=9877f3…974cb7 dirty=730.99MiB
Jul 08 16:47:37 [418]: INFO [07-08|16:47:37.392] Imported new chain segment               blocks=1    txs=175  mgas=11.644  elapsed=438.830ms mgasps=26.535  number=10419506 hash=7e2921…0b9124 dirty=731.52MiB
Jul 08 16:47:43 [418]: INFO [07-08|16:47:43.095] Deep froze chain segment                 blocks=6    elapsed=29.678ms  number=10329505 hash=0b5308…b5af87
Jul 08 16:47:48 [418]: INFO [07-08|16:47:48.591] Imported new chain segment               blocks=1    txs=192  mgas=10.898  elapsed=375.860ms mgasps=28.995  number=10419507 hash=fc5ac3…db320c dirty=731.79MiB
Jul 08 16:47:53 [418]: INFO [07-08|16:47:53.086] Imported new chain segment               blocks=1    txs=154  mgas=11.664  elapsed=364.508ms mgasps=31.999  number=10419508 hash=c78c69…da55da dirty=731.97MiB
Jul 08 16:48:34 [418]: INFO [07-08|16:48:34.780] Imported new chain segment               blocks=1    txs=254  mgas=11.693  elapsed=413.370ms mgasps=28.286  number=10419509 hash=79f4fa…b4e660 dirty=731.90MiB
Jul 08 16:48:41 [418]: INFO [07-08|16:48:41.136] Imported new chain segment               blocks=1    txs=132  mgas=11.679  elapsed=278.009ms mgasps=42.011  number=10419510 hash=bc7206…1799dc dirty=733.25MiB
Jul 08 16:48:43 [418]: INFO [07-08|16:48:43.125] Deep froze chain segment                 blocks=4    elapsed=29.928ms  number=10329509 hash=c2ec4a…6c3eaa
Jul 08 16:48:54 [418]: INFO [07-08|16:48:54.318] Imported new chain segment               blocks=1    txs=178  mgas=11.656  elapsed=344.712ms mgasps=33.812  number=10419511 hash=d9d18e…6a5227 dirty=733.40MiB
Jul 08 16:48:56 [418]: INFO [07-08|16:48:56.063] Imported new chain segment               blocks=1    txs=126  mgas=11.672  elapsed=257.415ms mgasps=45.342  number=10419512 hash=6d6102…d735f5 dirty=733.50MiB
Jul 08 16:49:06 [418]: INFO [07-08|16:49:06.565] Imported new chain segment               blocks=1    txs=181  mgas=11.644  elapsed=1.401s    mgasps=8.310   number=10419513 hash=76c035…886d06 dirty=733.79MiB
Jul 08 16:49:24 [418]: INFO [07-08|16:49:24.706] Imported new chain segment               blocks=1    txs=236  mgas=11.675  elapsed=445.034ms mgasps=26.235  number=10419514 hash=44319c…5ab7ad dirty=734.31MiB
Jul 08 16:49:36 [418]: INFO [07-08|16:49:36.692] Imported new chain segment               blocks=1    txs=172  mgas=11.683  elapsed=388.708ms mgasps=30.055  number=10419515 hash=0b4c1e…4fee7e dirty=735.31MiB
-- Reboot --

But Some other logs that were like 10hours or more before the crash:

Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.621] Demoting invalidated transaction         hash=6f174b…1c6ed8
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.621] Demoting invalidated transaction         hash=b8ecb0…3bf582
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.621] Demoting invalidated transaction         hash=649335…6c9365
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=37d9e7…22ec25
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=9e4756…ed17f6
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=d44eb9…89c1ff
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=2b089e…390f77
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=da73ed…64c0ca
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=f405f8…b93414
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=e75470…d4840d
Jul 08 04:43:56 [418]: ERROR[07-08|04:43:56.681] Demoting invalidated transaction         hash=cbd6d6…796e99

Also this unique one:
`Jul 08 15:30:37 [418]: INFO [07-08|15:30:37.013] Chain reorg detected number=10419158 hash=ddd8ba…a8ad68 drop=1 dropfrom=0b9a3b…91797d add=2 addfrom=5f5fd6…28d552

Jul 08 16:02:25 [418]: INFO [07-08|16:02:25.609] Chain reorg detected =10419296 hash=90c89b…feb214 drop=1 dropfrom=28571c…aca82f add=2 addfrom=5c6482…cd2d7b`

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

Hey @rjl493456442 Can you tell your local time? Or after how much time @holiman will be available to send the synced ancient file?

@rjl493456442
Copy link
Member

@sssubik No idea, probably a few hours later. Btw I would suggest you to replace all *.idx files (e.g. diffs.ridx, bodies.cidx, etc) and all *.dat files with largest file suffix (e.g. hashes.0000.rdat, diffs.0000.rdat, bodies.0048.cdat, receipts.0021.cdat, headers.0002.cdat)

@sssubik
Copy link
Author

sssubik commented Jul 15, 2020

Hey @rjl493456442
cp -f [oringinal file] [new file]
Copies the original file and overwrites the target file

cp -f diffs.0000.rdat diffs.ridx

Do you mean this?

@rjl493456442
Copy link
Member

rjl493456442 commented Jul 15, 2020

@sssubik No, I think @holiman will publish our ancient folder somewhere, you only need to copy hashes.0000.rdat, diffs.0000.rdat, bodies.0048.cdat, receipts.0021.cdat, headers.0002.cdat, diffs.ridx, bodies.cidx, hashes.ridx, headers.cidx and receipts.cidx to replace your local files. I think it's enough.

@holiman
Copy link
Contributor

holiman commented Jul 15, 2020

I've uploaded some of it to s3. Install aws client, and try this:

[user@work ~]$ aws s3 ls s3://ancient-blockdata/
2020-07-15 19:34:57 1999999645 bodies.0000.cdat
2020-07-15 19:36:28  106796394 diffs.0000.rdat
2020-07-15 19:35:26 1999999735 headers.0000.cdat
2020-07-15 19:35:58 1999995570 receipts.0000.cdat
[user@work ~]$ aws s3 cp s3://ancient-blockdata/diffs.0000.rdat .

So you should be able to replace your broken bodies.0000.cdat (all .0000. cdat/rdat files that are uploaded) with that^ .

The ones you should download are:

2020-07-15 19:34:57 1999999645 bodies.0000.cdat
2020-07-15 19:36:28  106796394 diffs.0000.rdat
2020-07-15 19:51:23  332016320 hashes.0000.rdat
2020-07-15 19:35:26 1999999735 headers.0000.cdat
2020-07-15 19:35:58 1999995570 receipts.0000.cdat

Hopefully, your indexes are correct and matches the offsets in the data files.

The only problem is the diffs.000.rdat. It never filled up the 2G limit, and thus won't be identical to what you had. So what I think will happen is that it will truncate your data according to how many items are in diffs.0000.rdat.

I don't think you will need to modify your idx files.

Note: this may well fail. It may even cause some of your existing ancient-data to become truncated (lost), but it might be worth a shot.

@sssubik
Copy link
Author

sssubik commented Jul 16, 2020

Hey I have not used AWS till now. When I try to run the command:
aws s3 cp s3://ancient-blockdata/diffs.0000.rdat .

It gives me: fatal error: An error occurred (403) when calling the HeadObject operation: Forbidden

As I did some research looks like it needs a region parameter as AWS only allows download from a region based URL.
Could you provide me your region in AWS like:

aws s3 cp s3://ancient-blockdata/diffs.0000.rdat . **--region us-west-2b**

Or may be I need the permission to download the file.?
My Account name for AWS is ssubik

Hey @holiman may be I need the access through IAM User or is it Public?
Thanks

@rjl493456442
Copy link
Member

btw @holiman I do think the indexes are corrupted. Like the last element in indexes are invalid.
If we can't fix these indexes the data will be truncated anyway.

@sssubik
Copy link
Author

sssubik commented Jul 16, 2020

Hey @rjl493456442 I am also downloading a light client right now so that I can use the ethereum blockchain. I have simple question as to
Can I invoke a smart contract function through light client ipc at the backend?

Also the above aws command to download the files are not working.

@rjl493456442
Copy link
Member

@sssubik Sure, light client should provide nearly all functionalities as the full node but it's a little bit slower and unstable(need server connections).

@sssubik
Copy link
Author

sssubik commented Jul 17, 2020

Hey @holiman I think I need the access key to your s3 bucket. I cant seem to access it by any other means..
@rjl493456442 Thanks I will try it and query you if something comes up

@sssubik
Copy link
Author

sssubik commented Jul 20, 2020

Hey @rjl493456442 @holiman I cant download the files.. Could you help me please?

@sssubik
Copy link
Author

sssubik commented Jul 23, 2020

Hey @rjl493456442 I am syncing new fast node in another data directory right now. I tried copying to my previous working GETH fast node the following:
cp -f bodies.0000.cdat /home/user/.ethereum/geth/chaindata/ancient/bodies.0000.cdat
And I did this for:

bodies.0000.cdat
diffs.0000.rdat
hashes.0000.rdat
headers.0000.cdat
receipts.0000.cdat

But I again got this:

INFO [07-23|08:05:09.053] Maximum peer count                       ETH=50 LES=0 total=50
INFO [07-23|08:05:09.056] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
ERROR[07-23|08:05:09.060] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:09.061] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:09.061] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=1 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:09.061] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:09.061] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:09.061] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=2 err="failed to initialize libusb: libusb: unknown error [code -99]"
INFO [07-23|08:05:09.061] Set global gas cap                       cap=25000000
INFO [07-23|08:05:09.062] Starting peer-to-peer node               instance=Geth/v1.9.17-unstable-6ef4495a-20200713/linux-amd64/go1.13.3
INFO [07-23|08:05:09.064] Allocated trie memory caches             clean=614.00MiB dirty=1024.00MiB
INFO [07-23|08:05:09.065] Allocated cache and file handles         database=/home/user/.ethereum/geth/chaindata cache=2.00GiB handles=524288
ERROR[07-23|08:05:11.748] Failed to enumerate USB devices          hub=trezor vendor=21324 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:11.748] Failed to enumerate USB devices          hub=trezor vendor=4617  failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
ERROR[07-23|08:05:11.750] Failed to enumerate USB devices          hub=ledger vendor=11415 failcount=3 err="failed to initialize libusb: libusb: unknown error [code -99]"
WARN [07-23|08:05:14.558] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=diffs indexed=0.00B stored=96.47MiB
WARN [07-23|08:05:14.585] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=headers indexed=0.00B stored=1.86GiB
WARN [07-23|08:05:14.950] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=hashes  indexed=0.00B stored=301.00MiB
WARN [07-23|08:05:15.039] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=bodies  indexed=0.00B stored=1.86GiB
WARN [07-23|08:05:15.186] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=receipts indexed=0.00B stored=1.86GiB
INFO [07-23|08:05:15.782] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
Fatal: Error starting protocol stack: failed to retrieve genesis from ancient EOF

Still the same error persists. Is there some alternatives to this. Also I could not access the AWS so, I just copied the files from my new node.

@sssubik
Copy link
Author

sssubik commented Jul 23, 2020

I tried something in hope of fixing the issue. Where I copied
diffs.ridx, bodies.cidx, hashes.ridx, headers.cidx and receipts.cidx
from New Fast node into the old node. I think all indexes have truncated and I get this:

WARN [07-23|08:55:20.758] Truncating dangling indexes              database=/home/user/.ethereum/geth/chaindata/ancient table=headers indexed=546.00B    stored=0.00B
WARN [07-23|08:55:20.758] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=bodies  indexed=1.03GiB    stored=1.86GiB
WARN [07-23|08:55:20.785] Truncating dangling head                 database=/home/user/.ethereum/geth/chaindata/ancient table=receipts indexed=1.20GiB    stored=1.86GiB
WARN [07-23|08:55:20.844] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=hashes   items=9863168 limit=9184875
WARN [07-23|08:55:20.847] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=bodies   items=9863168 limit=9184875
WARN [07-23|08:55:20.988] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=receipts items=9863168 limit=9184875
WARN [07-23|08:55:21.004] Truncating freezer table                 database=/home/user/.ethereum/geth/chaindata/ancient table=diffs    items=9863168 limit=9184875
INFO [07-23|08:55:21.007] Opened ancient database                  database=/home/user/.ethereum/geth/chaindata/ancient
Fatal: Error starting protocol stack: gap (#9184875) in the chain between ancients and leveldb


It was a shot in the dark as I am not getting much help. So is there any fix here?

@rjl493456442
Copy link
Member

Are you sure your new node is synced?

@sssubik
Copy link
Author

sssubik commented Jul 23, 2020

Hey @rjl493456442 No the node was not synced.. I just copied those files as they were the same size

@rjl493456442
Copy link
Member

Please wait until your new node's chain is at least higher than 10329515. Then copy these files.

  • hashes.0000.rdat
  • diffs.0000.rdat
  • bodies.0000.cdat
  • receipts.0000.cdat
  • headers.0000.cdat
  • bodies.0048.cdat
  • headers.0002.cdat
  • receipts.0021.cdat
  • diffs.ridx
  • bodies.cidx
  • hashes.ridx
  • headers.cidx
  • receipts.cidx

Hopefully the issue can be fixed.

@sssubik
Copy link
Author

sssubik commented Aug 4, 2020

Hey @rjl493456442. I am syncing the GETH node but I have worries that this kind of problem could repeat again. Does backing up the GETH node (Whole chaindata) by using some backup mechanisms like rsync help to restart the GETH node if the node goes down again?
Hey @rjl493456442 Any help would be appreciated. Thanks

@FrankyHo
Copy link

Hi @sssubik did you fix the problem? I got problem after server reboot

INFO [11-14|17:25:34.415] Starting Geth on Ethereum mainnet... 
INFO [11-14|17:25:34.415] Bumping default cache on mainnet         provided=1024 updated=4096
DEBUG[11-14|17:25:34.415] Sanitizing Go's GC trigger               percent=25
INFO [11-14|17:25:34.417] Maximum peer count                       ETH=10 LES=100 total=110
INFO [11-14|17:25:34.417] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
DEBUG[11-14|17:25:34.417] FS scan times                            list="28.721µs" set="2.919µs" diff="1.858µs"
TRACE[11-14|17:25:34.417] Started watching keystore folder         path=/Ethereum/keystore
TRACE[11-14|17:25:34.417] Handled keystore changes                 time="39.265µs"
INFO [11-14|17:25:34.418] Set global gas cap                       cap=25000000
WARN [11-14|17:25:34.418] Whisper support has been deprecated and the code has been moved to github.com/ethereum/whisper 
INFO [11-14|17:25:34.418] Allocated trie memory caches             clean=1023.00MiB dirty=1024.00MiB
INFO [11-14|17:25:34.418] Allocated cache and file handles         database=/Ethereum/geth/chaindata cache=2.00GiB handles=2048
DEBUG[11-14|17:25:35.575] Chain freezer table opened               database=/Ethereum/geth/chaindata/ancient table=receipts items=11006957 size=0.00B
DEBUG[11-14|17:25:35.575] Chain freezer table opened               database=/Ethereum/geth/chaindata/ancient table=diffs    items=11006957 size=0.00B
DEBUG[11-14|17:25:35.576] Chain freezer table opened               database=/Ethereum/geth/chaindata/ancient table=headers  items=11006957 size=0.00B
DEBUG[11-14|17:25:35.576] Chain freezer table opened               database=/Ethereum/geth/chaindata/ancient table=hashes   items=11006957 size=0.00B
DEBUG[11-14|17:25:35.577] Chain freezer table opened               database=/Ethereum/geth/chaindata/ancient table=bodies   items=11006957 size=0.00B
INFO [11-14|17:25:35.577] Opened ancient database                  database=/Ethereum/geth/chaindata/ancient
Fatal: Failed to register the Ethereum service: failed to retrieve genesis from ancient EOF



@tommymcguiver
Copy link

tommymcguiver commented Dec 15, 2020

If you remove the 0 length files they come right back. Then you get the EOF error.

-rw-r--r-- 1 root root    0 Dec 15 15:43 headers.0000.cdat
-rw-r--r-- 1 root root    0 Dec 15 15:43 hashes.0000.rdat
-rw-r--r-- 1 root root    0 Dec 15 15:43 receipts.0000.cdat
-rw-r--r-- 1 root root    0 Dec 15 15:43 diffs.0000.rdat
-rw-r--r-- 1 root root    0 Dec 15 15:43 bodies.0000.cdat
Fatal: Failed to register the Ethereum service: failed to retrieve genesis from ancient EOF

@avatar-lavventura
Copy link

I am having the same problem. Only solution that I applied was removing the /private/geth folder and downloading complete chain all over again.

@nuliknol
Copy link

One unclean shutdown due to power outage, and 2 months of download are gone! Damn it. How come you guys didn't test the software for unclean shutdown? Very bad!

INFO [02-15|15:13:57.045] Opened ancient database                  database=/eth/chain/geth/chaindata/ancient
Fatal: Failed to register the Ethereum service: genesis mismatch: 0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3 (leveldb) != 0x0000000000000000000000000000000000000000000000000000000000000000 (ancients)

@nuliknol
Copy link

I am having the same problem. Only solution that I applied was removing the /private/geth folder and downloading complete chain all over again.

that is NOT the solution

@nuliknol
Copy link

Hi @sssubik did you fix the problem? I got problem after server reboot

the best solution is to fire the engineers who wrote this code and hire some good ones

@avatar-lavventura
Copy link

I am having the same problem. Only solution that I applied was removing the /private/geth folder and downloading complete chain all over again.

that is NOT the solution

I understand your frasturation. Did you have any backup for your 2 months of download to recover from?

It was the only solution I come up with which worked on my end (I know its not that efficient) . Since I was working on a private chain it was only few GBs so it took 20 minutes to sync. But I believe it won't be the case for the ethereum mainnet with which is around ~205 GB.

@karalabe
Copy link
Member

@nuliknol Geth should be able to survive crashes just fine, perhaps with minimal data loss that can be recovered from the network. In your specific case, files that Geth wrote contain all zeroes. Any explanation for the data files turning into all 0?

@nuliknol
Copy link

nuliknol commented Feb 20, 2021

@nuliknol Geth should be able to survive crashes just fine, perhaps with minimal data loss that can be recovered from the network. In your specific case, files that Geth wrote contain all zeroes. Any explanation for the data files turning into all 0?

yeah, one explanation could be this:

3.1. Disk drives that do not honor sync requests
Unfortunately, most consumer-grade mass storage devices lie about syncing. Disk drives will report that content is safely on persistent media as soon as it reaches the track buffer and before actually being written to oxide. This makes the disk drives seem to operate faster (which is vitally important to the manufacturer so that they can show good benchmark numbers in trade magazines). And in fairness, the lie normally causes no harm, as long as there is no power loss or hard reset prior to the track buffer actually being written to oxide. But if a power loss or hard reset does occur, and if that results in content that was written after a sync reaching oxide while content written before the sync is still in a track buffer, then database corruption can occur.

USB flash memory sticks seem to be especially pernicious liars regarding sync requests. One can easily see this by committing a large transaction to an SQLite database on a USB memory stick. The COMMIT command will return relatively quickly, indicating that the memory stick has told the operating system and the operating system has told SQLite that all content is safely in persistent storage, and yet the LED on the end of the memory stick will continue flashing for several more seconds. Pulling out the memory stick while the LED is still flashing will frequently result in database corruption.

Note that SQLite must believe whatever the operating system and hardware tell it about the status of sync requests. There is no way for SQLite to detect that either is lying and that writes might be occurring out-of-order. However, SQLite in WAL mode is far more forgiving of out-of-order writes than in the default rollback journal modes. In WAL mode, the only time that a failed sync operation can cause database corruption is during a checkpoint operation. A sync failure during a COMMIT might result in loss of durability but not in a corrupt database file. Hence, one line of defense against database corruption due to failed sync operations is to use SQLite in WAL mode and to checkpoint as infrequently as possible.

Source: https://www.sqlite.org/howtocorrupt.html

my advice: hire a specialized guy in database development , because this freezer stuff looks very amateur coding. How it is possible that the headers file got reset ? The linux kernel did that? I don't think so.
That's why databases first write to WAL (Write Ahead Log) all the transactions, and they make it persistent with fsync(). After WAL is written the engine replays all the transactions on tables and does lots of write()s but these can go without fsync() because the WAL already got the update. Your append()s are not very safe code from the point of view of the design. What if the disk writes the data incompletely, how can you guarantee integrity ? Do you have checksums ? No.That's why an expert in databases will do much better work, preferably a C language guy from linux kernel or OS dev community. I also doubt you did any tests of unclean shutdowns, because I only had about 3 unclean shutdowns on my Archival node (power outage) and the third time I am catching this!! If you would ran dedicated tests for unclean shutdowns (something a specialized database developer will always do because he understands how critical the DURABILITY of ACID is) we, users of geth would never experience such problems. I get power outages all the time, and my linux box restores everything like nothing happened. It is only the apps that are not storing the data in a DURABLE way get in troubles.

For the sake of documentation I am attaching what exactly went wrong and how I fixed it, so other people can have some clues on what to do. Basically I had to replace the headers file (because it got truncated to 0) and the hashes file (from a node that is already in sync).

ancient-corruption-fix

@holiman
Copy link
Contributor

holiman commented Jun 3, 2021

Closing this, we fixed a datarace in the freezer with #22728

@holiman holiman closed this as completed Jun 3, 2021
@canghai118
Copy link

i don't think this bug is fixed, I am still facing this issue at 2022
image
and there are files with zero size in ancient folder
image

this is happing after power outage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants