Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing 0.2: "impl Valuable for Option" is a design bug. None is indistinguishable from Some(None) #2777

Closed
safinaskar opened this issue Oct 26, 2023 · 4 comments

Comments

@safinaskar
Copy link

safinaskar commented Oct 26, 2023

This report is about tracing 0.2. As well as I understand tracing 0.2 will use valuable. Unfortunately, impl Valuable for Option is implemented so ( https://docs.rs/valuable/0.1.0/src/valuable/valuable.rs.html ):

  impl<T: Valuable> Valuable for Option<T> {
    fn as_value(&self) -> Value<'_> {
        match self {
            Some(v) => v.as_value(),
            None => Value::Unit,
        }
    }
    ...
  }

In other words, Some(x) will be serialized the same way as x. This is, of course, bad for my taste. This contradicts to Rust's strong type system. I think authors of this code don't understand what is Rust's Option and why it exists. This brings us back to C++ madness, where T is implicitly convertible to std::optional<T>.

You may say, this arguments are unconvincing. Yes, I thought so. So, I spend some time thinking about this. And I came up with (I hope) strong argument: in current model if we have Option<Option<i32>>, then None is indistinguishable from Some(None). I. e. consider this code:

let a: Option<Option<i32>> = None;
let b: Option<Option<i32>> = Some(None);
info!(x = a);
info!(x = b);

In valuable branch of tracing 0.2 a and b will be serialized same way, and thus we will have data loss. (I didn't test this, but [looking at code] I'm nearly sure this is what we will get. If you want, I can test this.)

This is inappropriate. We develop logging system. It must preserve as much information as possible. For later analysis.

You may say: "But serializing with serde_json does the same!" Well, yes. But I have counterarguments:

  • In serde (at conceptual level) we first convert our source data to serde data model and then convert it to final representation, such as JSON string. (Similar thing happens with valuable.) Serde data model properly deals with Option: https://serde.rs/data-model.html . Data loss happens when we convert serde data model to JSON. Thus, even if we think of this as a bug, this bug should be fixed in serde_json crate, not in serde crate. Or some alternative to serde_json should be developed. There is no problem with serde data model. In valuable situation differs. We have problem in data model. impl Valuable for Option immediately loses information, unlike impl Serialize for Option. (Go look at both implementations, you will see the difference.)
  • Most of users of serde have needs, which are different from tracing needs. When you serialize something with serde, you usually know what output you want to get. I. e. output is for some external tool, and you know exact format this external tool expects. Also, data generated with serde usually consumed immediately, and thus any problems will be known early. Situation with tracing differs. tracing is for logging. I. e. tracing should somehow preserve all data, it doesn't matter how exactly. And need for investigation may arise much much later. I. e. after year of collecting logs you may discover that your None is indistinguishable from Some(None). Oops
  • And finally, any change to serde_json will break a lot of users. tracing 0.2 is not released yet, so we have some time

So here are my suggestions. First, we need different representation for Option (i. e. Valuable for Option should be implemented differently). I propose treating Some(x) as [x] and None as []. Or we can choose serde's approach: add Option to data model. Then consumers of data model (i. e. Visit) will somehow deal with Option.

Okay, so Valuable for Option will be implemented differently. And, of course, we should audit other implementations for Valuable. Also, we will need some Option-like type, which will preserve past Option behavior. Because, it seems, some people still need it (such as this: #2745 ). So we will need our own SkipOption or ThinOption. We also need some trait for converting from Option to our SkipOption, so user will be able to write this:

info!(x = a.as_skip_option());

and get old Option behavior.

You may say: "Why you reported this here, not to valuable?" As I said, tracing has unique needs: it should preserve as much information as possible, and this doesn't matter how exactly. This may differ with needs of other valuable users. So my point is this: this should be somehow solved in tracing. One way is to fix valuable. Another way is for tracing to migrate to something else

Version

valuable branch of tracing 0.2

@safinaskar safinaskar changed the title tracing 2.0: "impl Valuable for Option" is a design bug. None is indistinguishable from Some(None) tracing 0.2: "impl Valuable for Option" is a design bug. None is indistinguishable from Some(None) Oct 26, 2023
@davidbarsky
Copy link
Member

Thank you for opening this issue, and I'm sorry that my response isn't as thorough as your issue, as I am currently on vacation. My opinion is that while I sympathize with your frustration, I think that changing this behavior would undermine interoperability with the systems used for analysis while privileging what some—myself included—would consider to be an uncommon pattern (nested Options) that should instead be handled at the boundaries of your system or modeled through an enum. I'll expand on each:

  • valuable is designed to be an dyn-safe sibling of serde whose release cycle can be controlled by tracing authors/contributors. valuable also cares about interoperability with formats such as JSON or protobuf where it is not possible to meaningfully (or at the very least, idiomatically) distinguish between Some(None) and None. As a large amount of production usage of tracing makes use of those formats for logging, I'm not inclined to potentially make things more difficult for those existing users.
  • Even if the data model concerns didn't apply, I strongly suspect that you can model your program more effectively by using an enum to distinguish between a Some(None) and a None such that you don't need to rely on the nesting of Options . For example, your logs would contain State::A or State::B, corresponding to Some(None) and None respectively.

One way is to fix valuable. Another way is for tracing to migrate to something else

I alluded to this earlier, but valuable's raison d'être is to support tracing's needs, everything else is secondary. We have no intentions of migrating off of it.

@safinaskar
Copy link
Author

safinaskar commented Oct 27, 2023

@davidbarsky , thanks for answer!

Please, read this post to the end, in the end I will propose wonderful way for tracing to automatically work with all types implementing serde::Serialize.

Yes, now I agree that we should do whatever serde does. If serde_json doesn't distinguish between Some(None) and None, we should do the same.

But Some(None) and None are still different at data model level. And there exist serde serializers, which distinguish between them. For example, ron ( https://crates.io/crates/ron ). If we serialize Some(None) and None directly to ron, we will get different values. But if we serialize them via valuable and valuable-serde, we will get same values. This is incompatibility with serde, and hence wrong.

But I have a solution. What we need? You just said we need object-safe serde::Serialize. It exists! Here: https://crates.io/crates/erased-serde . From David Tolnay, who is also maintainer of serde. This crate has type erased_serde::Serialize with these two properties:

  • It is implemented for all types, which implement serde::Serialize
  • dyn erased_serde::Serialize implements serde::Serialize

I. e. this is exactly object-safe Serialize! I propose migrating from valuable to erased-serde (or releasing new version of valuable by copy-pasting erased-serde). Main benefit is so: tracing will be able to work directly with types, which implement Serialize! Also, all serializers (json, ron, etc) will always work exactly same way as with static serde. I. e. this will solve ron problem I talked above. This solution is simple, elegant and compatible. It works with all serializable types and with all serializers. If we convert data to JSON, we will get exactly the same output, we see with usual static serde_json. With None and Some(None) rendered to same value. This is what everybody expects. But if somebody (for example, me) really cares about data loss, he will be able to write subscriber, which will serialize using ron, and he will see None and Some(None) as different values (this is not possible today).

Note: erased-serde's README may give false impression that to use some serializer (for example, serde_json) it must expose its serde::Serializer (sic! don't confuse with Serialize) implementer. For example, one can think that serde_json should expose serde_json::Serializer. No, it should not! serde::Serialize is implemented on dyn erased_serde::Serialize, here is implementation: https://docs.rs/erased-serde/0.3.31/erased_serde/trait.Serialize.html#impl-Serialize-for-dyn+Serialize+%2B+'erased . This means that we can pass dyn erased_serde::Serialize to any function, which expects serde::Serialize, for example to serde_json::to_string. I. e. erased_serde is absolutely compatible with serde ecosystem. We don't even need to wrap anything to some wrapper struct (this is what valuable_serde does).

Most importantly, this proposal will make tracing to work with all types, which implement Serialize, i. e. with nearly all types in existence. There is no need to impose on everyone yet another trait (i. e. Valuable), which everybody now need to implement (using special derive macro).

Also: erased-serde allocates nothing in serializer (as opposed to deserializer). I checked this by grepping "box" in source code. erased-serde has homegrown Any, which allocates: https:/dtolnay/erased-serde/blob/94ab637b3dd75278e22580741ca3aabce0f7ed3e/src/any.rs#L55 . But it is used in deserializer only. Also sometimes serializer allocates in error.rs. But it seems these allocations happen in error path only. I replaced them with todo!() and run test suite. All tests passed, so it seems there are no allocations in happy path.

Of course, all this applies to allocations done by erased-serde itself. serde_json::to_string, of course, allocates final string

@davidbarsky
Copy link
Member

Please, read this post to the end, in the end I will propose wonderful way for tracing to automatically work with all types implementing serde::Serialize.

I'm sorry if I missed any detail in my original response!

But Some(None) and None are still different at data model level. And there exist serde serializers, which distinguish between them. For example, ron ( https://crates.io/crates/ron ). If we serialize Some(None) and None directly to ron, we will get different values. But if we serialize them via valuable and valuable-serde, we will get same values. This is incompatibility with serde, and hence wrong.

I'm really sorry: the disparity you've raising was documented in tokio-rs/valuable#42. We might revisit this behavior in Valuable 0.2, which is what tracing 0.2 would most likely ship with. However, my points concerning code structure and use of enums still stand and I strongly encourage you to restructure your code such that the distinction between Some(None) and None is not something that your log collection system needs to be concerned with. I understand that you care about solving this particular problem, but I strongly suspect that the need to solve this issue is a sign of some incorrect data modeling in your program.

But I have a solution. What we need? You just said we need object-safe serde::Serialize. It exists! Here: https://crates.io/crates/erased-serde . From David Tolnay, who is also maintainer of serde. This crate has type erased_serde::Serialize. This crate has type erased_serde::Serialize with these two properties:

  • It is implemented for all types, which implement serde::Serialize
  • dyn erased_serde::Serialize implements serde::Serialize

I. e. this is exactly object-safe Serialize! I propose migrating from valuable to erased-serde (or releasing new version of valuable by copy-pasting erased-serde). Main benefit is so: tracing will be able to work directly with types, which implement Serialize! Also, all serializers (json, ron, etc) will always work exactly same way as with static serde. I. e. this will solve ron problem I talked above. This solution is simple, elegant and compatible.

To address the larger, meta-point point of "consider using serde/erased_serde", we are aware of David Tolnay's (excellent!) work and have evaluated it prior to writing Valuable. Unfortunately, there are two issues:

  • erased_serde requires allocations for each nested data structure.
  • tracing maintainers must be able to control the release cycle of something as foundational as a public dependency of tracing-core. We can control the release cycle of valuable, but we cannot with erased_serde. If serde or erased_serde ever releases a major version bump (to either to 2.0 to 0.4, respectively), then tracing would be obligated to also release a major version bump.
    • While we greatly respect dtolnay's work and role in the Rust ecosystem, this is not a risk that tracing can take on or expose to our users. We will not, under any circumstances, take on a public dependency in tracing on a non-std crate that we do not control. I apologize for being so firm on this matter, but this is a policy that has served the Tokio organization extraordinarily well and has allowed us to provide a stable experience to our users for the last several years.

Have you read the blog post announcing Valuable, by any chance? If you haven't, I really recommend that you do.

Most importantly, this proposal will make tracing to work with all types, which implement Serialize, i. e. with nearly all types in existence. There is no need to impose on everyone yet another trait (i. e. Valuable), which everybody now need to implement (using special derive macro).

While I get the (legitimate!) frustration in needing to take on another explicit dependency and add another argument to a derive attribute, we weigh the concerns about end-to-end control of public dependencies substantially higher than asking users to make an additive change to their dependencies.

@safinaskar
Copy link
Author

Thanks again for big answer.

erased_serde requires allocations for each nested data structure

As I said before, serializing code in current version of erased_serde doesn't allocate anything (as opposed to deserializing) in success code path. I conclude this based on careful code reading and running test suite, as I said before.

But... well... while writing this message I looked at erased_serde code again, and now I see that erased_serde actually regularly allocates in latest released version (0.3.31) and doesn't allocate in master (94ab637b3dd75278e22580741ca3aabce0f7ed3e). I'm sorry about confusion.

But okay, I agree with your position. You may close the bug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants