Boolean numpy-backed type fails when pyarrow is installed in env #3205

Closed
pavlomuts opened this issue Sep 29, 2023 · 2 comments · Fixed by #3210

pavlomuts commented Sep 29, 2023

I am using Altair with a pandas DataFrame that has numpy-backed types, and I am using Streamlit to visualize it. Streamlit has pyarrow as a dependency, and it turns out that data type inference using pyarrow fails for the pandas nullable boolean dtype. A small (unrealistic) example reproduces the error:

import altair as alt
import pandas as pd

data = pd.DataFrame(
    {
        "x": pd.Series([1, 3, 5, 1, 3, 5]),
        "y": pd.Series([2, 4, 6, 2, 4, 6]),
        "flag": pd.Series([True, False, True, False, True, None], dtype="boolean"),
    }
)

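chart = alt.Chart(data).mark_circle().encode(x="x", y="y", color="flag")

# The error is raised when the chart is serialized (the traceback below goes through chart.save).
# The output path here is a placeholder.
chart.save("chart.html", format="html")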

Traceback:

Traceback (most recent call last):
  File "C:\Users\ad\AppData\Local\Programs\Python\Python311\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\Programs\Python\Python311\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "c:\Users\ad\.vscode\extensions\ms-python.python-2023.16.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "test.py", line 18, in <module>
    chart.save(file, format="html")
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\api.py", line 1066, in save
    result = save(**kwds)
             ^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\save.py", line 189, in save
    perform_save()
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\save.py", line 127, in perform_save
    spec = chart.to_dict(context={"pre_transform": False})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\api.py", line 2695, in to_dict
    return super().to_dict(
           ^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\api.py", line 903, in to_dict
    vegalite_spec = super(TopLevelMixin, copy).to_dict(  # type: ignore[misc]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 965, in to_dict
    result = _todict(
             ^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in _todict
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in <dictcomp>
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 473, in _todict
    return obj.to_dict(validate=False, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 965, in to_dict
    result = _todict(
             ^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in _todict
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 477, in <dictcomp>
    return {k: _todict(v, context) for k, v in obj.items() if v is not Undefined}
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\schemapi.py", line 473, in _todict
    return obj.to_dict(validate=False, context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\vegalite\v5\schema\channels.py", line 34, in to_dict
    parsed = parse_shorthand(shorthand, data=context.get('data', None))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\core.py", line 590, in parse_shorthand
    attrs["type"] = infer_vegalite_type_for_dfi_column(column)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\altair\utils\core.py", line 639, in infer_vegalite_type_for_dfi_column
    kind = column.dtype[0]
           ^^^^^^^^^^^^
  File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\pandas\core\interchange\column.py", line 128, in dtype
    return self._dtype_from_pandasdtype(dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ad\AppData\Local\pypoetry\Cache\virtualenvs\ero-bIEndBiR-py3.11\Lib\site-packages\pandas\core\interchange\column.py", line 147, in _dtype_from_pandasdtype
    byteorder = dtype.byteorder
                ^^^^^^^^^^^^^^^
AttributeError: 'BooleanDtype' object has no attribute 'byteorder'
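
The last frames of the traceback are inside pandas' DataFrame interchange code, which Altair reaches through `infer_vegalite_type_for_dfi_column` when it reads `column.dtype[0]`. The following is a minimal sketch, not part of the original report, that probes the same `dtype` property without Altair. It assumes only pandas is installed; the variable names are illustrative, and the outcome depends on the pandas version (the traceback above is from pandas 2.1.1).

```python
import pandas as pd

# Same nullable boolean column as in the reproduction above.
frame = pd.DataFrame(
    {"flag": pd.Series([True, False, True, False, True, None], dtype="boolean")}
)

# __dataframe__() returns the interchange-protocol view of the frame;
# get_column_by_name() is part of that protocol.
column = frame.__dataframe__().get_column_by_name("flag")

try:
    # The interchange dtype is a tuple whose first element is the kind,
    # which is what the Altair frame in the traceback indexes with [0].
    dtype_info = column.dtype
    outcome = f"ok: kind={dtype_info[0]!r}"
except Exception as exc:  # AttributeError on the affected pandas version
    outcome = f"failed: {type(exc).__name__}: {exc}"

print(outcome)
```

If this prints a `failed: AttributeError` line, the problem is reproducible at the pandas level, independent of Altair and Streamlit.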

And my environment:

altair                    5.1.1        Vega-Altair: A declarative statistical visualization library for Python.
astroid                   2.15.8       An abstract syntax tree for Python with inference support.
flake8                    6.1.0        the modular source code checker: pep8 pyflakes and co
packaging                 23.1         Core utilities for Python packages
pandas                    2.1.1        Powerful data structures for data analysis, time series, and statistics
pathspec                  0.11.2       Utility library for gitignore style pattern matching of file paths.
pillow                    9.5.0        Python Imaging Library (Fork)
platformdirs              3.10.0       A small Python package for determining appropriate platform-specific dirs, e.g. a "user data dir".
pluggy                    1.3.0        plugin and hook calling mechanisms for python
protobuf                  4.24.3
pyarrow                   13.0.0       Python library for Apache Arrow
requests                  2.31.0       Python HTTP for Humans.
rich                      13.5.3       Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal
rpds-py                   0.10.3       Python bindings to Rust's persistent data structures (rpds)
ruff                      0.0.291      An extremely fast Python linter, written in Rust.
scipy                     1.11.2       Fundamental algorithms for scientific computing in Python
six                       1.16.0       Python 2 and 3 compatibility utilities
smmap                     5.0.1        A pure Python implementation of a sliding window memory map manager
snakeviz                  2.2.0        A web-based viewer for Python profiler output
sqlalchemy                2.0.21       Database Abstraction Library
streamlit                 1.27.0       A faster way to build and share data apps
tabulate                  0.9.0        Pretty-print tabular data

yamllint                  1.32.0       A linter for YAML files.
zipp                      3.17.0       Backport of pathlib-compatible object wrapper for zip files
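
Of the packages listed, altair, pandas, and pyarrow are the ones that appear in the title and traceback. A small sketch for printing just those versions, using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["altair", "pandas", "pyarrow"])
for name, ver in versions.items():
    print(f"{name:<10} {ver if ver is not None else 'not installed'}")
```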

Thank you for taking a look and for making such a great tool!

pavlomuts added the bug label Sep 29, 2023
@jonmmease
Contributor

Thanks for the report @pavlomuts. This looks like something we'll need to report upstream to pandas and work around in Altair. I'll try to take a closer look soon.

@jonmmease
Contributor

Reported upstream in pandas-dev/pandas#55332 and worked around in #3210. We should be able to get this into the 5.1.2 release next week.

A workaround in the meantime is to specify the encoding type of the boolean column explicitly, so that Altair does not have to infer it (e.g. to get nominal encoding, the default, use color="flag:N"):

chart = alt.Chart(data).mark_circle().encode(x="x", y="y", color="flag:N")
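
The `:N` suffix is Altair's shorthand for an explicit encoding type; the single-letter codes are `Q` (quantitative), `O` (ordinal), `N` (nominal), and `T` (temporal). The sketch below only illustrates how such a shorthand string splits into a field and a type. `split_shorthand` is a hypothetical helper written for this illustration, not Altair's own parser, and it ignores aggregates and the other shorthand forms Altair supports.

```python
# Single-letter type codes used in Altair shorthand strings.
TYPE_CODES = {
    "Q": "quantitative",
    "O": "ordinal",
    "N": "nominal",
    "T": "temporal",
}


def split_shorthand(shorthand: str) -> dict:
    """Split 'field:CODE' into field and type; leave the type out if no code is given."""
    field, sep, code = shorthand.rpartition(":")
    if sep and code in TYPE_CODES:
        return {"field": field, "type": TYPE_CODES[code]}
    # No explicit type: in this case a library would have to infer it from the data.
    return {"field": shorthand}


print(split_shorthand("flag:N"))
print(split_shorthand("flag"))
```

With `"flag:N"` the type is stated in the string itself, which is why the workaround avoids the inference step shown in the traceback. The long form `alt.Color("flag", type="nominal")` should express the same encoding.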
