Fix discovery of modules in namespace packages #228

Merged

Conversation

eltoder
Contributor

@eltoder eltoder commented Mar 21, 2024

Instead of relying on __init__.py files, stop at the first parent directory that is in sys.path. This gives the shortest module name under which the file can really be imported. (Unless there are name conflicts in sys.path, which is arguably a misconfiguration; this is caught by the location check in module_tree.)
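
For illustration, the idea described above could be sketched as follows. This is not the code from the PR: the function name `infer_module_name` and its signature are assumptions made for this example, and special cases such as `__init__.py` files are ignored.

```python
from pathlib import Path
from typing import FrozenSet


def infer_module_name(file: Path, sys_path: FrozenSet[Path]) -> str:
    """Walk up from `file` until a directory listed in `sys_path` is found.

    Hypothetical helper for illustration; not slotscheck's actual API.
    """
    parts = [file.stem]
    for parent in file.parents:
        if parent in sys_path:
            # Names were collected from the innermost directory outward.
            return ".".join(reversed(parts))
        parts.append(parent.name)
    raise LookupError(f"{file} is not under any sys.path entry")


name = infer_module_name(
    Path("/proj/src/myproj/foo.py"), frozenset({Path("/proj/src")})
)
```

Because the walk stops at the first matching parent, the result is the shortest dotted name, whether or not the intermediate directories contain `__init__.py`.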

Owner

@ariebovenberg ariebovenberg left a comment


Interesting approach, seems indeed better than the current one. Would it be doable to add an end-to-end test in test_cli.py that demonstrates the need for this? (i.e. a test that fails without your fix)

"Recursively find modules at given path. Nonexistent Path is ignored"
if python_path is None:
Owner

It seems this if-check is only triggered on the first call. Can you make the argument non-optional to avoid repeating the check on every call? It is a minor inconvenience for callers to do frozenset(map(Path, sys.path)) themselves.

Contributor Author

Factored out an internal helper function instead so I don't need to update every caller of find_modules, and also in case we'll want to change the algorithm in the future.
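
A rough sketch of this kind of split, assuming a public `find_modules` that keeps its signature and a private `_find_modules` that receives the precomputed set. The function bodies here are placeholders for illustration and do not reproduce the real discovery logic.

```python
import sys
from pathlib import Path
from typing import FrozenSet, Iterator


def _find_modules(p: Path, sys_path: FrozenSet[Path]) -> Iterator[str]:
    # Internal helper: receives the precomputed set, so recursive calls
    # never have to rebuild it. The body is a placeholder for this sketch.
    if p.is_dir():
        for child in sorted(p.iterdir()):
            yield from _find_modules(child, sys_path)
    elif p.is_file() and p.suffix == ".py":
        yield p.stem


def find_modules(p: Path) -> Iterator[str]:
    # Public entry point keeps its old signature; sys.path is read once.
    return _find_modules(p, frozenset(map(Path, sys.path)))
```

Callers are unaffected, and the algorithm inside the helper can change later without touching the public function.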

src/slotscheck/discovery.py (outdated, resolved)

codecov bot commented Mar 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (8f0d153) to head (9f60d08).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #228   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          495       509   +14     
  Branches       103       106    +3     
=========================================
+ Hits           495       509   +14     


@eltoder eltoder force-pushed the feature/discover-namespace-packages branch 3 times, most recently from e293d77 to abda1a7 Compare March 22, 2024 15:07
@eltoder
Contributor Author

eltoder commented Mar 22, 2024

@ariebovenberg I also added a small change that allows packages to span multiple directories, which is an important use case for namespace packages.
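
To illustrate what "spanning multiple directories" means at the import-system level, here is a small self-contained demonstration. The package name `ns_demo` and the directory names are made up for this example.

```python
import sys
import tempfile
from importlib import invalidate_caches
from importlib.util import find_spec
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    roots = [Path(tmp) / "first", Path(tmp) / "second"]
    for root in roots:
        # No __init__.py anywhere: "ns_demo" is an implicit namespace package.
        (root / "ns_demo").mkdir(parents=True)
        sys.path.insert(0, str(root))
    invalidate_caches()
    try:
        spec = find_spec("ns_demo")
        # One entry per sys.path root that contains an "ns_demo" directory.
        locations = list(spec.submodule_search_locations)
    finally:
        for root in roots:
            sys.path.remove(str(root))
```

A regular package has exactly one search location, so code that assumes a single entry will not handle a namespace package like this one.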

@ariebovenberg
Owner

Excellent! I'll have a good look this weekend.

Note: please update the docs on namespace packages (grep namespace in docs/) and if relevant the docs on module discovery.

@eltoder eltoder force-pushed the feature/discover-namespace-packages branch from abda1a7 to 1789a08 Compare March 22, 2024 15:45
@eltoder
Contributor Author

eltoder commented Mar 22, 2024

@ariebovenberg Shall I just delete the section on namespace packages? Otherwise I have to say that everything works both with and without -m, which is not informative.

@eltoder
Contributor Author

eltoder commented Mar 22, 2024

I think the docs on discovery can stay as is: the general idea and all provided examples still hold, and the specific implementation that I changed was not described.

@ariebovenberg
Owner

I've noticed that the end-to-end suite doesn't properly cover the 'no extra sys.path' case, which is actually quite common for users that just run slotscheck <path> expecting it to work.

In version 0.18 you get:

$ slotscheck tests/examples/files/my_scripts/foo.py
ERROR: Module 'foo' not found.

See slotscheck.rtfd.io/en/latest/discovery.html
for help resolving common import problems.

With the latest changes, the output becomes really ugly:

Traceback (most recent call last):
  File "/Users/arie/.pyenv/versions/slotscheck312/bin/slotscheck", line 6, in <module>
    sys.exit(root())
             ^^^^^^
  File "/Users/arie/.pyenv/versions/3.12.0/envs/slotscheck312/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arie/.pyenv/versions/3.12.0/envs/slotscheck312/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/arie/.pyenv/versions/3.12.0/envs/slotscheck312/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arie/.pyenv/versions/3.12.0/envs/slotscheck312/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arie/code/slotscheck/src/slotscheck/cli.py", line 164, in root
    classes, modules = _collect(files, module, conf)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arie/code/slotscheck/src/slotscheck/cli.py", line 383, in _collect
    modules_inspected = list(
                        ^^^^^
  File "/Users/arie/code/slotscheck/src/slotscheck/discovery.py", line 318, in _find_modules
    parents = list(_module_parents(p, sys_path))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/arie/code/slotscheck/src/slotscheck/discovery.py", line 309, in _module_parents
    raise ValueError(f"File {p} is outside of PYTHONPATH ({sys.path})")
ValueError: File /Users/arie/code/slotscheck/tests/examples/files/my_scripts/foo.py is outside of PYTHONPATH (['/Users/arie/.pyenv/versions/3.12.0/envs/slotscheck312/bin', '/Users/arie/.pyenv/versions/3.12.0/lib/python312.zip', '/Users/arie/.pyenv/versions/3.12.0/lib/python3.12', '/Users/arie/.pyenv/versions/3.12.0/lib/python3.12/lib-dynload', '/Users/arie/.pyenv/versions/3.12.0/envs/slotscheck312/lib/python3.12/site-packages', '/Users/arie/code/slotscheck/src'])

The solution should be really easy though: raise ModuleNotFoundError instead of the ValueError. There is already a handler for this in cli.py that turns this into a nice message.

To prevent this regression in the future, let's also have a test case like this in test_cli.py:

def test_python_file_not_in_sys_path(runner: CliRunner):
    result = runner.invoke(cli, ["existing/path/to/python_file.py"])
    assert result.exit_code == 1
    assert isinstance(result.exception, SystemExit)
    assert result.output == (
        "ERROR: Module 'foo' not found.\n\n"
        "See slotscheck.rtfd.io/en/latest/discovery.html\n"
        "for help resolving common import problems.\n"
    )

@ariebovenberg
Owner

ariebovenberg commented Mar 23, 2024

Taking a step back, I'm considering putting this new behavior behind a CLI flag, similar to mypy's battle-tested approach.

Rationale:

Namespace packages are (unfortunately) a niche feature and assuming non-__init__.py directories are namespaces can lead to unexpected problems for novice users.

Imagine a common setup:

src/
   myproj/
      __init__.py
      foo.py
      bar.py

Running python -m slotscheck src/myproj/foo.py will implicitly import it as src.myproj.foo. So far so good if the file is simple. But as soon as foo.py includes something like from myproj.bar import ..., it will fail. Only those familiar with namespace packages will know what is going on.
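
The failure mode described here can be reproduced in isolation. The sketch below builds the same layout in a temporary directory, using the stand-in names `demo_src` and `demo_proj` for `src` and `myproj` to avoid clashing with real packages, and puts only the project root on `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_src" / "demo_proj"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "bar.py").write_text("VALUE = 1\n")
    (pkg / "foo.py").write_text("from demo_proj.bar import VALUE\n")

    # Only the project root is importable, so "demo_src" is treated as an
    # implicit namespace package and the file becomes demo_src.demo_proj.foo.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    error = None
    try:
        importlib.import_module("demo_src.demo_proj.foo")
    except ModuleNotFoundError as e:
        error = e
    finally:
        sys.path.remove(tmp)
        for name in [n for n in sys.modules if n.split(".")[0] == "demo_src"]:
            del sys.modules[name]
```

The import of the file itself succeeds, but its absolute import of `demo_proj.bar` fails because `demo_proj` is not a top-level name on `sys.path`.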

Perhaps we can kill two birds with one stone by adding a --allow-imports-relative-to option. This allows configuring native namespace packages (explicitly) while also simplifying the docs on package discovery: no more messing with PYTHONPATH, just set --allow-imports-relative-to to whatever directory you want.

What do you think?

@eltoder
Contributor Author

eltoder commented Mar 23, 2024

FWIW, mypy works with implicit namespace packages out of the box. (EDIT: To clarify, it works when using -p option, which is different from -m of slotscheck; the latter unfortunately does not recurse into namespace subpackages. AFAICT this is not supported by pkgutil.)

Also note that in a typical case slotscheck will be run from the venv created for the project, where the project itself is installed as an editable package. This means that src will be in PYTHONPATH. With this PR we'll infer myproj.foo for src/myproj/foo.py whether or not there is src/myproj/__init__.py. The same thing will happen if you run outside of the venv, but add src to PYTHONPATH manually.

I did not quite get the idea of your option. Do you mind explaining what it does exactly? Or just feel free to add commits to this PR to implement it.

@eltoder eltoder force-pushed the feature/discover-namespace-packages branch from ea43705 to 9d6ae69 Compare March 24, 2024 14:15
@ariebovenberg
Owner

> FWIW, mypy works with implicit namespace packages out of the box.

It seems I didn't understand mypy's policy fully. I understand it works with namespace packages out of the box, but from the docs I assumed it'd be more conservative:

> The default case. If --namespace-packages is on, but --explicit-package-bases is off, mypy will allow for the possibility that directories without __init__.py[i] are packages. Specifically, mypy will look at all parent directories of the file and use the location of the highest __init__.py[i] in the directory tree to determine the top-level package.

However in practice it still runs:

mkdir -p a/b/c
touch a/b/c/foo.py

mypy a/b/c/foo.py  # this runs

Your fix brings slotscheck closer to this behavior:

python -m slotscheck a/b/c/foo.py

👍 So I guess we're go then for this feature. In general I'm fine with bringing the module discovery capabilities in line with mypy—so long as it remains maintainable.

@ariebovenberg
Owner

One niggle is that I'm getting a test failure locally which doesn't appear on CI:

pytest tests --pdb
==================================== test session starts ====================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/arie/code/slotscheck
configfile: pyproject.toml
plugins: cov-4.1.0, mock-3.12.0
collected 174 items

tests/src/test_checks.py ............................................................ [ 34%]
...                                                                                   [ 36%]
tests/src/test_cli.py ...................................                             [ 56%]
tests/src/test_config.py ............................                                 [ 72%]
tests/src/test_discovery.py ..........F
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

self = <tests.src.test_discovery.TestModuleTree object at 0x106c52360>

    def test_implicitly_namespaced(self):
>       assert module_tree("implicitly_namespaced", None) == make_pkg(
            "implicitly_namespaced",
            Module("version"),
            make_pkg("module", Module("foo"), Module("bla")),
            make_pkg("another", Module("foo")),
        )

tests/src/test_discovery.py:231:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

module = 'implicitly_namespaced', expected_location = None

    def module_tree(
        module: ModuleName,
        expected_location: Optional[AbsPath],
    ) -> Union[ModuleTree, FailedImport]:
        """May raise ModuleNotFoundError or UnexpectedImportLocation"""
        try:
            spec = find_spec(module)
        except BaseException as e:
            return FailedImport(module, e)
        if spec is None:
            raise ModuleNotFoundError(f"No module named '{module}'", name=module)
        *namespaces, name = module.split(".")
        location = Path(spec.origin) if spec.has_location and spec.origin else None
        tree: ModuleTree
        if spec.submodule_search_locations is None:
            tree = Module(name)
        else:
>           assert len(spec.submodule_search_locations) == 1
E           AssertionError

src/slotscheck/discovery.py:175: AssertionError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/arie/code/slotscheck/src/slotscheck/discovery.py(175)module_tree()
-> assert len(spec.submodule_search_locations) == 1
(Pdb) pp spec.submodule_search_locations
_NamespacePath(['/Users/arie/code/slotscheck/tests/examples/implicitly_namespaced', '/Users/arie/code/slotscheck/tests/examples/other/implicitly_namespaced'])

Any idea what could be causing this?

@eltoder
Contributor Author

eltoder commented Mar 24, 2024

That's probably because I've split a part of the change into #230, but you still have tests/examples/other/implicitly_namespaced/bar.py in your local checkout. Do git status to see if you have untracked files.

@eltoder
Contributor Author

eltoder commented Mar 24, 2024

> However in practice it still runs:

I noticed the same thing. The default behavior does not match the documentation. mypy src works without any __init__.py files with the default settings, even though the documentation says you have to use mypy -p src or set --explicit-package-bases. However, mypy tests did not work for me: I had to either add __init__.py or use mypy -p tests. I'm guessing the logic is more complicated than documented, but I did not dig into the details.

> 👍 So I guess we're go then for this feature. In general I'm fine with bringing the module discovery capabilities in line with mypy—so long as it remains maintainable.

One important difference between mypy and slotscheck is that mypy does not rely on importing modules. As such, it has more flexibility and can use different configuration variables like MYPYPATH. slotscheck requires that all files can be imported, so PYTHONPATH has to be set correctly. Given this, I think it is OK to rely on PYTHONPATH for discovery.

@@ -308,7 +308,7 @@ def _module_parents(
         if pp in sys_path:
             return
         yield pp
-    raise ValueError(f"File {p} is outside of PYTHONPATH ({sys.path})")
+    raise ModuleNotFoundError(f"No module named '{p.stem}'", name=p.stem)
Contributor Author

I understand that you want to reuse the logic in cli.py, but this error can be very misleading. ModuleNotFoundError means we could not resolve a "dotted.module.name". Here we could not find a module name for a file path. We were going in the opposite direction (file -> module instead of module -> file), which does not normally happen in Python. I suggest that we add a new exception type for this, keep the full file path, and handle it in cli.py similar to ModuleNotFoundError.

Owner

Agree, let's do a custom exception 👍
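
A minimal sketch of what such a custom exception and its handling might look like. The names `ModuleLocationError`, `module_parents` and `report` are hypothetical and chosen for this example; the exception type and CLI handler actually used in slotscheck may differ.

```python
from pathlib import Path
from typing import FrozenSet, Iterator


class ModuleLocationError(Exception):
    """Hypothetical name: raised when a file maps to no importable module."""

    def __init__(self, file: Path) -> None:
        super().__init__(f"File {str(file)!r} is not under any sys.path entry")
        # Keep the full path so the caller can show it to the user.
        self.file = file


def module_parents(p: Path, sys_path: FrozenSet[Path]) -> Iterator[Path]:
    # Yield package directories from the innermost outward, stopping at
    # the first directory found in sys_path.
    for pp in p.parents:
        if pp in sys_path:
            return
        yield pp
    raise ModuleLocationError(p)


def report(p: Path, sys_path: FrozenSet[Path]) -> str:
    # Stand-in for the CLI layer: turn the exception into a short message.
    try:
        parents = list(module_parents(p, sys_path))
    except ModuleLocationError as e:
        return f"ERROR: {e}"
    return ".".join([*(d.name for d in reversed(parents)), p.stem])
```

Keeping the file path on the exception lets the handler print a message about the file rather than about a module name that was never resolved.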

@@ -39,6 +39,19 @@ def test_module_doesnt_exist(runner: CliRunner):
     )


+def test_python_file_not_in_sys_path(runner: CliRunner, tmpdir):
+    file = tmpdir / "foo.py"
Contributor Author

It looks like tmp_path is now recommended over tmpdir: https://docs.pytest.org/en/latest/how-to/tmp_path.html

It returns pathlib.Path instead of the legacy py.path.local.

Owner

Didn't know about this! Excellent! love pathlib
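
For reference, a test using `tmp_path` might look roughly like this. The test name and file contents are made up for illustration; the only thing taken from pytest is that the `tmp_path` fixture supplies a `pathlib.Path` pointing to a fresh temporary directory.

```python
from pathlib import Path


def test_creates_python_file(tmp_path: Path) -> None:
    # pytest injects `tmp_path` as a pathlib.Path to a fresh directory.
    file = tmp_path / "foo.py"
    file.write_text("x = 1\n", encoding="utf-8")
    assert file.read_text(encoding="utf-8") == "x = 1\n"
    assert file.parent == tmp_path
```

Since the fixture is already a `Path`, no conversion is needed before passing it to code that expects pathlib objects.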

@ariebovenberg
Owner

> That's probably because I've split a part of the change into #230, but you still have tests/examples/other/implicitly_namespaced/bar.py in your local checkout. Do git status to see if you have untracked files.

😉 I'm aware of git status—it appears to be the classic "git doesn't remove empty directories" issue. Found the cause with git clean

m.name # type: ignore
).submodule_search_locations
return _package(m.name, Path(subdir))
spec = m.module_finder.find_spec(m.name) # type: ignore
Owner

Let's qualify this type: ignore with an error code, e.g. type: ignore[union-attr] or whatever the error is.

Contributor Author

Done in #230
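
As an illustration of a qualified ignore, assuming the error in question is mypy's `union-attr` (which the thread does not confirm), a comparable case using the standard library looks like this:

```python
from importlib.util import find_spec

# find_spec() is typed as returning Optional[ModuleSpec], so attribute access
# on the result is what mypy reports under the "union-attr" error code.
spec = find_spec("json")
locations = spec.submodule_search_locations  # type: ignore[union-attr]
```

A qualified ignore silences only the named error code, so an unrelated type error introduced later on the same line would still be reported.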

@eltoder
Contributor Author

eltoder commented Mar 25, 2024

@ariebovenberg Please take another look.

Owner

@ariebovenberg ariebovenberg left a comment


Looks good 👌. I made a small addition to the docs; the PR can be merged pending CI checks.

@@ -164,7 +165,7 @@ def module_tree(
     except BaseException as e:
         return FailedImport(module, e)
     if spec is None:
-        raise ModuleNotFoundError(f"No module named '{module}'", name=module)
+        raise ModuleNotFoundError(f"No module named {module!r}", name=module)
Owner

nice.
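
A quick comparison of the two spellings, using made-up module names. For ordinary dotted names they produce identical text; they differ only when the name itself contains a quote, in which case `!r` picks a quoting style that stays unambiguous.

```python
module = "myproj.foo"
old = f"No module named '{module}'"
new = f"No module named {module!r}"

# A contrived name containing an apostrophe shows where the two diverge.
odd = "it's"
old_odd = f"No module named '{odd}'"
new_odd = f"No module named {odd!r}"
```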

@ariebovenberg ariebovenberg merged commit bd1b80c into ariebovenberg:main Mar 25, 2024
9 checks passed
@eltoder eltoder deleted the feature/discover-namespace-packages branch March 25, 2024 20:03