Cannot use np.dtype='bool' at all? #494

Open
doronbehar opened this issue Oct 14, 2024 · 4 comments · May be fixed by #496
@doronbehar

Describe the bug

python -c "
from deepdiff import DeepHash 
import numpy as np
d = {'p': np.array([True], dtype='bool')}
print(DeepHash(d)[d])
"

Gives me:

8.0.0
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 211, in __init__
    self._hash(obj, parent=parent, parents_ids=frozenset({get_id(obj)}))
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 537, in _hash
    result, counts = self._prep_dict(obj=obj, parent=parent, parents_ids=parents_ids)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 401, in _prep_dict
    hashed, count = self._hash(item, parent=key_in_report, parents_ids=parents_ids_added)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 556, in _hash
    result, counts = self._prep_iterable(obj=obj, parent=parent, parents_ids=parents_ids)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 434, in _prep_iterable
    hashed, count = self._hash(item, parent=new_parent, parents_ids=parents_ids_added)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 561, in _hash
    result, counts = self._prep_obj(obj=obj, parent=parent, parents_ids=parents_ids)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 355, in _prep_obj
    result, counts = self._prep_dict(obj, parent=parent, parents_ids=parents_ids,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 401, in _prep_dict
    hashed, count = self._hash(item, parent=key_in_report, parents_ids=parents_ids_added)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/cv8dkyqwqbsdrjy1ji2nvxamvvy8ivsa-python3-3.12.6-env/lib/python3.12/site-packages/deepdiff/deephash.py", line 503, in _hash
    result, counts = self.hashes[obj]
                     ~~~~~~~~~~~^^^^^
ValueError: memoryview: hashing is restricted to formats 'B', 'b' or 'c'

Why? Isn't a boolean datatype supposed to be the simplest dtype there is?
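The message itself is raised by CPython when a memoryview is hashed, so it can be reproduced without DeepDiff or NumPy. A minimal sketch, assuming that a one-byte buffer cast to the struct format '?' is a fair stand-in for the buffer of a boolean value (`describe_hash` is a hypothetical helper, not part of any library):

```python
# Stand-in for a boolean buffer: one byte viewed with struct format '?'.
# NumPy is deliberately not used here; this only shows the CPython rule.
raw = bytes([1])
bool_view = memoryview(raw).cast("?")  # format '?', read-only
byte_view = memoryview(raw)            # default format 'B', read-only


def describe_hash(view):
    """Return 'ok' if the view can be hashed, else the error message."""
    try:
        hash(view)
        return "ok"
    except ValueError as exc:
        return str(exc)


print(bool_view.format, "->", describe_hash(bool_view))
print(byte_view.format, "->", describe_hash(byte_view))
```

Only the byte-oriented view hashes; the '?' view fails with the same message as in the traceback, even though both views cover the same single byte.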

To Reproduce

Above.

Expected behavior

No error.

OS, DeepDiff version and Python version (please complete the following information):

  • OS: NixOS
  • Version: nixos-unstable
  • Python Version: 3.11 & 3.12
  • DeepDiff Version: 8.0.0
@seperman
Owner

@doronbehar Thanks for reporting the bug. It is not supported because nobody has run into this issue and reported it until now, which suggests the boolean dtype is not very popular, even if it is the simplest.
Do you think you may have time to make a PR for it? PRs are always very welcome!

@seperman seperman added the bug label Oct 14, 2024
@doronbehar
Author

OK, I see. I thought that deepdiff itself had decided, for some unclear reason, to restrict hashing to formats 'B', 'b' and 'c' :), which is why I phrased my question that way.

And yes, I don't mind putting some effort into this. However, I have no idea where that memoryview comes from. I could create a PR that simply skips memoryview objects, but I'm not sure whether that is the correct thing to do. Here's what I did in the meantime:

diff --git i/deepdiff/deephash.py w/deepdiff/deephash.py
index 32fee9c..1258713 100644
--- i/deepdiff/deephash.py
+++ w/deepdiff/deephash.py
@@ -500,6 +500,8 @@ class DeepHash(Base):
         else:
             result = not_hashed
         try:
+            print("obj is", obj)
+            print("hashes are", self.hashes)
             result, counts = self.hashes[obj]
         except (TypeError, KeyError):
             pass

And ran the same reproducing snippet, and got:

obj is {'p': array([ True])}
hashes are {<object object at 0x7ff75fb7a890>: []}
obj is p
hashes are {<object object at 0x7ff75fb7a890>: []}
obj is [ True]
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is True
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is T
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1)}
obj is base
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1)}
obj is None
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1)}
obj is data
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1), None: ('bbd393a60007e5f9621b8fde442dbcf493227ef7ced9708aa743b46a88e1b49e', 1)}
obj is <memory at 0x7ff71c284540>
hashes are {<object object at 0x7ff75fb7a890>: [], 'p': ('682328452ad5d85d3e4ab905b5337a443d43adfeefb3c89d95b477f92f7fe96e', 1), 'T': ('1fc05e26a2f596e4108cb887c23b73551ce8faba3d7f6fd07b468d0df826b8f3', 1), 'base': ('d4752d47041b2df8f562b307b48709b90fcfc8ee56dd5a1df2a8d2fe2427f27e', 1), None: ('bbd393a60007e5f9621b8fde442dbcf493227ef7ced9708aa743b46a88e1b49e', 1), 'data': ('2f8c213f30eab7fcc3c2f9c88010ebad400be515f7c9f746ca13efcb1fb7ed75', 1)}
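The output ends at a `<memory at ...>` object reached through the `data` attribute. One plausible reading of why the error propagates: the lookup `self.hashes[obj]` is guarded by `except (TypeError, KeyError)`, but hashing this memoryview raises `ValueError`, which that clause does not catch. A minimal sketch of the effect, using a plain dict and a hypothetical `lookup` helper in place of DeepDiff's internals, and a '?'-format memoryview as an assumed stand-in for the object above:

```python
# A plain dict standing in for the hash cache; the entry is a placeholder.
cache = {"p": ("placeholder-hash", 1)}

# One byte viewed as a bool buffer (struct format '?').
view = memoryview(bytes([1])).cast("?")


def lookup(key):
    """Mimic a cache lookup guarded by `except (TypeError, KeyError)`."""
    try:
        return cache[key]
    except (TypeError, KeyError):
        return None


# Unhashable list -> TypeError, missing key -> KeyError: both are swallowed.
print(lookup([1, 2]))
print(lookup("missing"))

# The memoryview key raises ValueError while being hashed, so it escapes.
try:
    lookup(view)
    escaped = False
except ValueError as exc:
    escaped = True
    print("escaped:", exc)
```

Under that reading, the failure is in the dictionary lookup itself rather than in any hashing logic specific to boolean arrays.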

@seperman
Owner

@doronbehar Does this look relevant? https://stackoverflow.com/a/38837737/1497443
It seems that to hash the boolean dtype, we should do hash(bytes(image1)).
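A minimal sketch of that idea, copying the buffer into `bytes` before hashing. The helper name `digest_buffer` and the choice of SHA-256 are assumptions for illustration; this is not the change made in #496:

```python
import hashlib


def digest_buffer(view: memoryview) -> str:
    """Hash a buffer by content: copy it to bytes first (hypothetical helper)."""
    return hashlib.sha256(bytes(view)).hexdigest()


raw = bytes([1])
bool_view = memoryview(raw).cast("?")  # bool-format view, not hashable directly
byte_view = memoryview(raw)            # byte-format view of the same data

# bytes(...) accepts any contiguous buffer regardless of its format code.
print(hash(bytes(bool_view)) == hash(raw))
print(digest_buffer(bool_view) == digest_buffer(byte_view))
```

One trade-off of this approach is that the format code is discarded: a '?' view and a 'B' view over the same bytes produce the same digest, so the element type would have to be hashed separately if it matters.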

doronbehar added a commit to doronbehar/deepdiff that referenced this issue Oct 19, 2024
@doronbehar
Author

It turned out to be simpler than I thought :) The solution is in #496.
