Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process hangs when verifying proof read&deserialized from file #1411

Open
Yoga2015 opened this issue Feb 5, 2024 · 27 comments · May be fixed by #1857
Open

process hangs when verifying proof read&deserialized from file #1411

Yoga2015 opened this issue Feb 5, 2024 · 27 comments · May be fixed by #1857
Labels
bug Something isn't working

Comments

@Yoga2015
Copy link

Yoga2015 commented Feb 5, 2024

Original issue at Discord: https://discord.com/channels/484437221055922177/1204071024191082496

All the code at https:/TokeniZK/zkapp-test/tree/main/src
tried with o1js version: 0.15.2, 0.15.4, & 0.16.0

Reproduce steps:

I have a demo zkProgram(https:/TokeniZK/zkapp-test/blob/main/src/zkprogram.ts),

step1: npm run step1
(seen at /src/step1.ts) I exec one method & gen proof, then serialize the proof into file: proof.json . and then read&deserialize it from file and verify it and smoothly get the result: True.

However,
step2: (seen at /src/step2.ts) npm run step2
I start another process to read the proof from the file on step1(proof.json), deserialize it and try to verify the proof, But the process keep await at verify(*) line !!

@Yoga2015
Copy link
Author

Yoga2015 commented Feb 5, 2024

hi, @mitschabaude, could u please suggest me? thx

@dfstio
Copy link

dfstio commented Feb 6, 2024

Using compile({ forceRecompile: true }) fixes it

@Yoga2015 Yoga2015 closed this as completed Feb 6, 2024
@Yoga2015
Copy link
Author

Yoga2015 commented Feb 6, 2024

thx!

@mitschabaude
Copy link
Contributor

Do I understand this correctly - you have to do compile({ forceRecompile: true }) every time you're loading the proof from a file?

Then it's still a bug, we do want this to work from cached verification keys. I'll reopen this to track it if you don't mind

@mitschabaude mitschabaude reopened this Feb 6, 2024
@mitschabaude mitschabaude added the bug Something isn't working label Feb 6, 2024
@dfstio
Copy link

dfstio commented Feb 6, 2024

Yes, it is correct. I've created a small repo with my usual settings that work for me in MinaNFT:
https:/dfstio/zkapp-test/tree/dfst (branch dfst)
based on @Yoga2015 code

@mitschabaude, you can use it for debugging

See also
https://discord.com/channels/484437221055922177/1203303039633465344
https://discord.com/channels/484437221055922177/1204071024191082496

@SpaceManiac
Copy link

Running in to the same problem with the dynamic verify method. It seems the workaround is to either forceRecompile: true or cache: Cache.None any program at all before attempting to verify proofs. It doesn't have to be the same program you're verifying proofs for.

Single-file repro. Will succeed on first run, hang on second run, and can be fixed by changing false to true on line 4.

import { Bool, Cache, verify, ZkProgram } from 'o1js';

//  v-- Change to `true` to fix the hang, even though NoopProgram != VerifyMeProgram.
if (false) {
  const NoopProgram = ZkProgram({
    name: 'NoopProgram',
    methods: {
      chaff: {
        privateInputs: [],
        async method(b) {}
      }
    }
  });
  await NoopProgram.compile({ cache: Cache.None });
}

const inputs = await (typeof getPreviousProof !== 'undefined' ? getPreviousProof : createProof)();
console.log('verifying...');
console.log('ok?', await verify(inputs.proof, inputs.verificationKey));

async function createProof() {
  const VerifyMeProgram = ZkProgram({
    name: 'NoopProgram',

    methods: {
      proof: {
        privateInputs: [Bool],
        async method(b) {
          b.assertTrue();
        },
      },
    },
  });

  console.log('compiling...');
  const { verificationKey } = await VerifyMeProgram.compile();
  console.log('proving...');
  const proof = (await VerifyMeProgram.proof(Bool(true))).toJSON();
  const data = { verificationKey, proof };
  const fs = await import('fs/promises');
  const url = await import('url');
  await fs.appendFile(url.fileURLToPath(import.meta.url), `async function getPreviousProof() { return ${JSON.stringify(data)}; }`);
  return data;
}

@mitschabaude
Copy link
Contributor

Thanks for the great analysis @SpaceManiac

Seems that 'verify()' relies on some values computed during compilation, and isn't able to get them from a cache. We need to fix this!

@45930
Copy link
Contributor

45930 commented Jul 30, 2024

@Yoga2015 I cloned your repo zkapp-test and upgraded to o1js 1.6.0 and found that both scripts work

➜  zkapp-test git:(main) ✗ npm run step1

> [email protected] step1
> tsc -p tsconfig.json --module esnext && cross-env TEST_ON_BERKELEY=true node --experimental-specifier-resolution=node ./dist/src/step1.js

ProverX_VK: 4245952421456887355009860533252876175165009247779780553775848711375535231266
ok: true
proofStr: {"publicInput":[],"publicOutput":["0"],"maxProofsVerified":0,"proof":"KChzdGF0ZW1lbn.....yMUFGMTEzNzUpKSkpKSkp"}
ok2: true
➜  zkapp-test git:(main) ✗ npm run step2

> [email protected] step2
> tsc -p tsconfig.json --module esnext && cross-env TEST_ON_BERKELEY=true node --experimental-specifier-resolution=node ./dist/src/step2.js

ProverX_VK: 4245952421456887355009860533252876175165009247779780553775848711375535231266
str2: {"publicInput":[],"publicOutput":["0"],"maxProofsVerified":0,"proof":"KChzdGF0Z....EY4RjUpKSkpKSkp"}
ok2: true

@SpaceManiac
Copy link

SpaceManiac commented Jul 30, 2024

I still see the hang on o1js 1.6.0. Stack trace for the record:

global.$std::sys::wasm::locks::futex_condvar::Condvar::wait::h28b31f59c39a18e1 (011a8aae:1926773)
global.$rayon_core::latch::LockLatch::wait_and_reset::h078cf456c369aa7d (011a8aae:1849956)
global.$std::thread::local::LocalKey<T>::with::h2ce6fa65e023e5b9 (011a8aae:1789370)
global.$rayon_core::registry::Registry::in_worker::h1abee179cbaab566 (011a8aae:1397531)
global.$caml_pasta_fq_plonk_proof_batch_verify (011a8aae:1843699)
module.exports.caml_pasta_fq_plonk_proof_batch_verify (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/plonk_wasm.cjs:340)
caml_pasta_fq_plonk_proof_batch_verify (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:4577)
<anonymous> (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:169582)
Promise.then (Unknown Source:0)
deferred_run (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:4882)
batch_verify$0 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:169580)
caml_call2 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:6803)
batch_verify (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:159208)
batch_verify$0 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:159212)
caml_call1 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:6801)
_iIy_ (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:349923)
<anonymous> (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:4855)
Promise.then (Unknown Source:0)
deferred_bind (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:4853)
caml_call2 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:6803)
verify_heterogenous (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:349950)
<anonymous> (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:355112)
caml_call1 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:6801)
with_return (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:78372)
verify_promise$0 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:355110)
caml_call2 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:6803)
verify$2 (/PROJECT/node_modules/o1js/dist/node/bindings/compiled/_node_bindings/o1js_node.bc.cjs:431486)
<anonymous> (/PROJECT/node_modules/o1js/dist/node/lib/proof-system/zkprogram.js:318)
withThreadPool (/PROJECT/node_modules/o1js/dist/node/bindings/js/node/node-backend.js:48)

@45930
Copy link
Contributor

45930 commented Jul 30, 2024

Running Yoga's original example with forceRecompile: false still hangs as well. They must have updated to using forceRecompile: true after that workaround was discovered.

@SpaceManiac in your snippet getPreviousProof is referenced but not defined. Is there a declaration missing?

@SpaceManiac
Copy link

createProof() appends its definition to the file after the first run (perhaps too cute...)

@45930
Copy link
Contributor

45930 commented Aug 27, 2024

Whatever this bug is, it seems deeply lodged in the Pickles code. Here's what I found...

There are 2 cases I looked at (mostly all reproducing what @SpaceManiac has already reported).

  1. Verifying a serialized JSON proof with a serialized verification key without compiling anything first
  2. Verifying a serialized JSON proof with a serialized verification key after compiling a random zk program circuit (not the same circuit that we are verifying).

The expected behavior is that these cases both work, but in the first case, the verification call hangs forever and never returns a boolean. The second case works as expected.

o1js

In the example, we call the verify function exported in

async function verify(

In both cases, the proof we pass is in a string, so we hit the same code path. I've logged the proof and verification key in o1js from each case, and they are the exact same.

ocaml

Entering the Pickles layer, the call to verify happens in https:/o1-labs/o1js-bindings/blob/8e99acc94da892eeb090367844f85d22ecf4f93b/ocaml/lib/pickles_bindings.ml#L713

In each case, we get passed the deserialization step and enter the async code:

  Pickles.Side_loaded.verify_promise ~typ [ (vk, statement, proof) ]
  |> Promise.map ~f:(fun x -> Js.bool (Or_error.is_ok x))
  |> Promise_js_helpers.to_js

That is as far as I can trace it for now.

@dfstio
Copy link

dfstio commented Aug 28, 2024

The expected behavior is that these cases both work, but in the first case, the verification call hangs forever and never returns a boolean. The second case works as expected.

Would this behavior change if you would call initializeBindings() first?

@mitschabaude
Copy link
Contributor

The expected behavior is that these cases both work, but in the first case, the verification call hangs forever and never returns a boolean. The second case works as expected.

Would this behavior change if you would call initializeBindings() first?

can't be, verify() calls that as well right at the beginning

@mitschabaude
Copy link
Contributor

mitschabaude commented Aug 28, 2024

  • Verifying a serialized JSON proof with a serialized verification key without compiling anything first
  • Verifying a serialized JSON proof with a serialized verification key after compiling a random zk program circuit (not the same circuit that we are verifying).

The expected behavior is that these cases both work, but in the first case, the verification call hangs forever and never returns a boolean. The second case works as expected.

this is great progress! so it doesn't depend on any circuit-specific artifact

That is as far as I can trace it for now.

we will probably only resolve this issue by digging into the Ocaml/Rust verification code and finding out what implicit assumptions it makes on objects that exist

And we have a list of objects that are likely contenders: every large object (i.e. subject to caching instead of always computing on the fly) that is shared among different circuits/programs. Could be the SRS?

@dfstio
Copy link

dfstio commented Aug 28, 2024

Could be the SRS?

In my example, if I remove the following files from the cache:

wrap-pk-proverx
wrap-pk-proverx.header
wrap-vk-proverx
wrap-vk-proverx.header

verify() runs successfully. If I keep those files in the cache folder, verify() hangs

@mitschabaude
Copy link
Contributor

If I keep those files in the cache folder, verify() hangs

that is when you compile before verifying right? my explanation for that would be that if those are not present, compile() "fully" recompiles, i.e. same behaviour as forceRecompile: true, and in the process also generates those other objects implicitly needed by verify()

@dfstio
Copy link

dfstio commented Aug 28, 2024

that is when you compile before verifying right?

Yes, that's right. Removing any other files does not affect the behavior; the process still hangs.

@45930
Copy link
Contributor

45930 commented Aug 28, 2024

we will probably only resolve this issue by digging into the Ocaml/Rust verification code and finding out what implicit assumptions it makes on objects that exist

@mitschabaude it looks like the verification code actually lives in the mina repo. Is that right? For instance, o1js-bindings calls Pickles.Side_loaded.verify_promise, but the implementation of that function is in mina in src/lib/pickles/pickles.ml. So would it make sense to open a new issue in that repo and link back to this one?

@mitschabaude
Copy link
Contributor

@mitschabaude it looks like the verification code actually lives in the mina repo. Is that right? For instance, o1js-bindings calls Pickles.Side_loaded.verify_promise, but the implementation of that function is in mina in src/lib/pickles/pickles.ml. So would it make sense to open a new issue in that repo and link back to this one?

yes it lives in the mina repo! no opinion on where to host the issue (it could also be that it's somehow caused by o1js behaviour of caching the SRS, which lives in the bindings repo

@andrewferrone
Copy link

Hey here, is there a workaround for this issue? Hard to tell from the comment thread. Many thanks.

@45930
Copy link
Contributor

45930 commented Sep 24, 2024

@andrewferrone, the best workaround at the moment is to compile an empty zk program, then use a cached proof from disk.

For instance

const NoopProgram = ZkProgram({
  name: 'NoopProgram',
  methods: {
    chaff: {
      privateInputs: [],
      async method(b) {}
    }
  }
});
await NoopProgram.compile({ cache: Cache.None });

@SebastienGllmt
Copy link

This is not a realistic workaround if you're trying to ship an NPM package that others use. I don't want to include a disclaimer in my NPM package that if downloading the package breaks your build, you have to compile an empty zk program to fix it.

@45930
Copy link
Contributor

45930 commented Oct 3, 2024

Fair enough. We are working on this and prioritizing the fix. The work around should unblock many users, but I agree it’s not a suitable solution for your use case.

@mrmr1993
Copy link
Member

mrmr1993 commented Oct 7, 2024

We're not pre-generating the langrange basis that we need before running the verifier. When we find it missing, the rust code panics, and WASM treats the panic as a halt. The main thread hangs waiting for the dead thread.

Will post a fix soon.

@mrmr1993
Copy link
Member

mrmr1993 commented Oct 8, 2024

@45930 Fix here: #1857. Can you chase the right reviewers etc. to get it (and the corresponding PRs for the Mina, bindings, proof-systems submodules) merged?

@45930
Copy link
Contributor

45930 commented Oct 8, 2024

@mrmr1993 I'll do my best :) thanks for pitching in!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants