Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a more idiomatic way to define hashing #202

Open
hynek opened this issue Jun 2, 2017 · 15 comments
Open

Add a more idiomatic way to define hashing #202

hynek opened this issue Jun 2, 2017 · 15 comments
Labels

Comments

@hynek
Copy link
Member

hynek commented Jun 2, 2017

The current way of using True/False/None is confusing even to me.

I propose we use an Enum (probably just class on legacy Python) a la:

class Hashing(Enum):
    SMART = "smart"  # None → new default
    BY_ID = "by_id" # False
    WRITE_FOR_ALL = "write_for_all" # True
    WRITE_FOR_NONE = "write_for_none" # True, but the default on attr.ibs is False

The last two names are terrible of course.


I probably could be talked into depending on https://pypi.org/project/enum34/ on legacy Python.

Opinions?

P.S. Now while typing this out I realized that this could solve a long-standing complaint both I and many users have: assuming you want an attrs-generated method based on a small subset of attr.ibs, you have to write a lot of hash|init|repr=False. Ditching bools for enums would allow for that. 🤔 But let’s talk about that in a different ticket.

@Tinche
Copy link
Member

Tinche commented Jun 5, 2017

We should definitely do something to make the hashing behavior clearer.

I'm currently working on a PR to "fix" slots hashing behavior, and attr.s(hash=False) is really throwing me off.

The docs say nothing about what should happen when hash=False.

My personal understanding would be for attrs to do exactly nothing, we're saying do not touch __hash__. However this is complicated. I will demonstrate.

Assume a very simple class, with hashing disabled:

@attr.s(hash=False)
class A:
    a = attr.ib()

The equivalent of this class is: https://gist.github.com/Tinche/d8c1dadfaa3aa4618b736a61e752b3e1

Try pasting them both. The attrs one will be hashable, due to falling back to __object__.__hash__. The normal one will not be.

This is because in the normal class, the interpreter will see the __cmp__ and insert __hash__ = None. In the attrs class, the __cmp__ method isn't there when the class is defined; we put it into the class after the fact. So the interpreter won't touch __hash__ and it will fall through to object.

This is the root cause of why our slots behavior is different.

@hynek hynek modified the milestone: 17.3 Jun 6, 2017
@hynek
Copy link
Member Author

hynek commented Jun 6, 2017

Yeah so that’s settled.

Ideas for better names for WRITE_FOR_ALL/NONE? @glyph? @njsmith?

P.S. the replacement/slots thingie explains a lot of my own confusion.

@Tinche
Copy link
Member

Tinche commented Jun 6, 2017

What functionalities are we aiming for exactly?

I'm guessing:

  • make hashable by generating a hash function based on attributes
  • make hashable, but attributes are opt-in instead of opt-out
  • make hashable by using identity-based hashing
  • make unhashable
  • delegate to an existing self.__hash__, whatever that may be

(edit) Also:

  • smart

@njsmith
Copy link

njsmith commented Jun 6, 2017

I don't really understand the original post, but regarding the cases that need to be supported: I think it helps narrow things down to remember that for any given __eq__ there are exactly two valid ways of defining __hash__. Either it must include exactly the same things that go in the __eq__ calculation, or else you have to set __hash__ = None. This is just a fact about the language; if you violate it then dicts stop working.

So if attrs is generating its own __eq__, then the only meaningful decision is "do you want this object to be hashable?", and right now the three answers are "yes" (True), "no" (False), and "guess" (None, meaning yes for immutable objects and no otherwise). (Except apparently the "no" case is buggy? I'm pretty sure that the intention was for cmp=True, hash=False to make an unhashable class.) And then on a per attribute level you can also say yes/no/guess, but I can't think of any time that "guess" is wrong; the other options just give you interesting ways to break your class.

And if attrs isn't generating its own __eq__, then not sure why it would ever want to generate a __hash__. If the user is taking responsibility for __eq__ then we should stay out of the way and give them the regular python behavior imo.

@njsmith
Copy link

njsmith commented Jun 6, 2017

That said, I can see the case for having nicer shorthands for controlling which attribs go into the __eq__ calculation, like toggling it between opt-in and opt-out. But there's no reason to give __hash__ a separate set of toggles here; it should follow what __eq__ does.

@hynek
Copy link
Member Author

hynek commented Jun 7, 2017

Actually, you specifically asked for being able to fall back to hashing by ID – that’s why it’s in there. :) And I think that’s a great and useful feature, so I’m not trying to push the blame in your direction – just reminding. :)

The point of this ticket isn’t really to invent new behaviors, it’s just making the existing ones more understandable. Especially because one of the happened by accident (see also #205).

But maybe…how about this:

class Hashing(Enum):
    SMART = "smart"  # None → new default
    UNHASHABLE = "unhashable"  # same as None for unfrozen
    BY_ID = "by_id" # False
    BY_ATTRS = "opt_out" # True
    BY_ATTRS_OPT_IN = "opt_in" # True, but the default on attr.ibs is False

@njsmith
Copy link

njsmith commented Jun 7, 2017

Falling back to hashing by id is definitely useful! If, and only if we are also falling back to doing __eq__ by id. That falls under my line about "If the user is taking responsibility for __eq__ then we should stay out of the way and give them the regular python behavior"

@hynek
Copy link
Member Author

hynek commented Jun 7, 2017

Yeah so the fallback kind of happened by accident (cf. Tin’s explanations) but this time we have the chance to fix it properly including the deprecation year if we switch to enums.

@njsmith
Copy link

njsmith commented Jun 7, 2017

The thing where cmp=True, hash=False ends up with a generated __eq__ combined with __hash__-by-id just sounds like a straightforward bug to me. It just makes it so you can have two objects that compare equal but if you put them both into the same set then it's literally random whether you end up with 1 object added or 2 objects added.

Beyond that I'm mostly just kinda lost about what all these enums are supposed to be or what the overall goal is here. I think I showed up in the middle of the conversation :-)

@hynek
Copy link
Member Author

hynek commented Jun 12, 2017

So I guess we have to break bw-compat again and set __hash__ to None by hand if __eq__ is set? :|

@glyph
Copy link
Contributor

glyph commented Jun 12, 2017

So I guess we have to break bw-compat again and set hash to None by hand if eq is set? :|

Woah there, Tex. Ease your finger up off that trigger :-).

In this case I'm not convinced a breaking change is necessary, because there's no alternate behavior to suggest: a warning would be sufficient. The user has explicitly asked for nonsense, and if the nonsense appears to work today then it's fine to deprecate it for a while before raising an error.

@hynek
Copy link
Member Author

hynek commented Jun 12, 2017

Well the problem (which Tin explained elsewhere, I'm gonna add here for posterity) is: normal Python class behavior is setting dunder hash to None, if there's an Dundee eq. If not, it inherits object's by-id semantics. Now our eq is added after python makes this decision (except when using slots) so object's hash remains.

If slots is true it behaves differently because we build the class differently and it becomes unhashable.

IOW

  • we have conflicting behavior between slotted and not,
  • and hash=False doesn't actually mean what it looks like (Don't do anything about hash.) because of a weird side-effect.

Dunno. :|

@glyph
Copy link
Contributor

glyph commented Jun 12, 2017

Now our eq is added after python makes this decision (except when using slots) so object's hash remains

Oh, that's deeply unfortunate. I was wondering if the modifying-the-class-after-the-fact thing would come to bite us at some point; looks like it has. Perhaps we should be throwing away the first type and replacing it with a fresh one? (Although that's probably some even worse / more horrible compatibility break than this...)

@glyph
Copy link
Contributor

glyph commented Jun 12, 2017

and hash=False doesn't actually mean what it looks like (Don't do anything about hash.) because of a weird side-effect.

Focusing first on getting the exact right behavior with the new, good enum would be the right way to prioritize this. If we need another compat break (and gosh I hope not) then that's a separate thing.

@hynek
Copy link
Member Author

hynek commented Oct 6, 2017

JFTR, I started to work on this and ran into another “interesting” fact: the behavior Nathaniel described is specific to Python 3. Python 2 will happily hash by ID if there's a __eq__.

hynek added a commit that referenced this issue Oct 6, 2017
This allows us to harmonize class creation for dict and slots classes.

Ref #223 #202
@hynek hynek removed this from the 17.3 milestone Oct 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants