Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix windows long paths #2566

Merged
merged 4 commits into from
Jun 19, 2020
Merged

Fix windows long paths #2566

merged 4 commits into from
Jun 19, 2020

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Jun 17, 2020

resolves #2558

Description

This PR makes long paths on windows work. The issue described in #2558 no longer occurs, though there are a couple caveats:

  • if a user runs dbt deps and the total path length including dbt_modules is >= 260, I think dbt will fail, because we'll try to cd into a directory during deps that will have a total length >= 260 and that isn't allowed ever, apparently. It will tell you the file path is too long, which is nice.
  • There is almost certainly some risk of dbt creating files that are very annoying for users to delete. The test checks that dbt clean successfully removes things from target, which satisfies me.
  • I haven't tested this on windows 8/8.1 or their server equivalents, though my understanding is that this syntax is ancient.

How this works:

First, this code checks that it needs to run at all. If the system isn't running windows, the path is under 250 chars (python likes to append things like '*.*' in various places), or the windows function that tells us long paths are enabled is set, we don't do anything to the path.

If the path needs to be massaged, dbt will prepend \\?\ (escaped to '\\\\?\\') to paths that we'll read/write from, which means we access it as a win32 File Namespace. This tells windows to skip path parsing, which bypasses MAX_PATH. There are some restrictions: Paths must be absolute, paths must not have any / in them, and something about UNC paths that I hope I got right 😬 Those caveats actually aren't a big deal for dbt.

I am happy to report that very new versions of windows 10/windows server 2019 with a certain registry key/group policy setting actually don't have this problem at all with long paths, because starting with 3.6, python opts out of MAX_PATH with a compile-time options, and those versions of Windows know how to handle that. So in 10 years or whatever it takes for us to drop support, we can just not care about this at all.

I found this blog post from project zero helpful. They're approaching it from a security standpoint but there are some very helpful charts. Note that we actually don't want the "NT Path", (it's distinct from the "win32 file namespace", so ignore that part). And we don't care about "drive-relative paths" because that's a cmd.exe thing.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jun 17, 2020
@beckjake beckjake changed the base branch from dev/marian-anderson to dev/0.17.1 June 17, 2020 21:15
@beckjake beckjake force-pushed the fix/windows-long-paths branch 3 times, most recently from 2984db2 to 8337c6b Compare June 18, 2020 19:30
@beckjake beckjake changed the title Fix windows long paths [WIP] Fix windows long paths Jun 18, 2020
@beckjake beckjake marked this pull request as ready for review June 18, 2020 21:45
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is much about the need for this change that is unfortunate, but from what I can see, the changes here look excellent. I suspect the integration tests will be the best indicator of success - this LGTM when the tests pass (though I did have one question about them!).

):
# all our hard work and the path was still too long. Log and
# continue.
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

temp_dir = magic_prefix + temp_dir
outer = tempfile.mkdtemp(prefix='dbt-int-test-', dir=temp_dir)
# then inside _that_ directory make a new one that gets us to just
# barely 260 total. I picked 250 to account for the '\' and anything
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we want to just barely eclipse 260 chars instead of going for, say, 300 chars? I just want to make sure I understand the logic behind this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! We have to os.chdir into the directory we build here, and the \\?\ doesn't work for os.chdir because the underlying windows APIs for that function don't handle it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i hate it

Copy link
Contributor

@kwigley kwigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for the comments around the windows nuances

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support very long path names for compiled artifacts on Windows
3 participants