Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Splitting value/python away from jiter's parser/decoders/iterators? #139

Open
fasterthanlime opened this issue Sep 13, 2024 · 6 comments
Open

Comments

@fasterthanlime
Copy link
Contributor

Would you consider splitting those into separate crates?

I'm asking because my crate now has its own Value type (it's not specifically tied to Python, it doesn't Arc all the things for example, it doesn't depend on ahash directly, etc.) and so it only uses the iterator API.

It would be neat to be able to share the parser/iterator between jiter & merde, so that if one adds x86_64 SIMD for example, the other would get it too, etc.

@samuelcolvin
Copy link
Member

The python stuff is already behind a feature flag, so that's fine.

Out of interest, what do you use instead of Arc?

I'd be happy to accept a PR to make Value a feature. Related, @davidhewitt and I were discussing simplifying LazyIndexMap to instead just use a Vec or SmallVec anyway.

@fasterthanlime
Copy link
Contributor Author

You can see my take on Value here: https://docs.rs/merde/latest/merde/enum.Value.html

(with CowStr’s owned type being compact_str if the feature is enabled)

I think for python it probably makes more sense to have everything reference counted, since those are more or less the semantics of any garbage collected language. i’m only thinking about rust API consumers in my case.

@fasterthanlime
Copy link
Contributor Author

I did end up getting rid of the lazy index map because it’s a clever trick if you unconditionally deserialize to that, but in the case of merde the common case is to target a rust struct so there’s no hashing done at all

@fasterthanlime
Copy link
Contributor Author

I'd be happy to accept a PR to make Value a feature. Related, @davidhewitt and I were discussing simplifying LazyIndexMap to instead just use a Vec or SmallVec anyway.

Are you dead-set on keeping it all one single crate? I know inter-crate optimization can be touchy (nothing thin-LTO + more aggressive #[inline] annotations cannot fix afaik) but I do believe there's a clean separation between the iterator interface and the value interface there that would be easier to maintain as two separate crates (for both of us).

@samuelcolvin
Copy link
Member

Opposite question, What's the advantage of splitting?

@fasterthanlime
Copy link
Contributor Author

Opposite question, What's the advantage of splitting?

Testing for combination of feature flags is famously hard to get right, the number of combinations explode, it's so easy to forget about a combination that doesn't work, and the whole editing experience is just awkward (e.g. code editor settings to have cargo check/clippy run with some feature flag combinations).

Dealing with optional dependencies is aggravating too, a lot of things are possible but eventually you're doing a lot of work just to avoid doing cargo publish twice — and that's something that's easily automated with something like https://release-plz.ieni.dev

Working on different component in isolation is also much nicer when they're split: say you're running cargo test on save or something, instead of the compiler having to consider the whole dependency tree, make sure all deps are downloaded, then parse everything and throw away anything that's gated by a cfg, it's only doing exactly the work it needs to do.

The "global" dependency tree will also be clearer: on crates.io / lib.rs, if you go on jiter-core or whatever, you'll see who depends on the core parser/iterator, rather than those who depends on the value/python side of things — feature flags are not tracked at all.

CVEs are reported against entire packages — imagine there's a vuln in jiter (the fat one, with python, value etc.) but not in jiter-core — a of packages would avoid the unnecessary noise.


Sometimes splitting up packages is touchy due to the lack of a clear interface between them. Here it's so clear I was honestly kinda surprised you asked for additional reasoning x) But I'm fairly happy with the reasons I gave, I can rack my brain for more if you'd like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants