Reject invalid escape sequences (and octal escape sequences) in bytes and Unicode strings #98401
Comments
I created PR #98404 to implement this change. This change should help to catch some mistakes in regular expressions and Windows paths. Example with PR #98404 which now raises SyntaxError:
Raw strings
See also discussion in #77093, and the reasons that the |
When this warning was introduced, there were no plans to make it an error in the near future. It was planned as a warning with a very long deprecation period. The peculiarity of this warning is that it is not emitted for code that contains this specific kind of bug; it is emitted only for code that does not contain a bug, but which may sit close to code that does (and which emits no warning itself). Perhaps it is time to make it more visible by converting it into a SyntaxWarning. Then, after another long period of 4-5 versions, it could be converted into a SyntaxError.
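To illustrate the point above, here is a minimal sketch. The `evaluate` helper and the two sample paths are made up for illustration. The path that actually gets corrupted uses the valid escapes `\t` and `\n`, so it produces no warning, while the path that warns keeps its intended value.

```python
import warnings


def evaluate(literal_source):
    """Evaluate a string literal given as source text.

    Returns the resulting value and the number of warnings the compiler emitted.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = eval(compile(literal_source, "<demo>", "eval"))
    return value, len(caught)


# Source text "C:\temp\new": \t and \n are valid escapes, so the path is
# silently corrupted into a tab and a newline.
buggy, buggy_warnings = evaluate('"C:\\temp\\new"')

# Source text "C:\dist\project": \d and \p are invalid escapes, so the
# backslashes are kept and the compiler warns.
harmless, harmless_warnings = evaluate('"C:\\dist\\project"')
```

Under these assumptions, `buggy_warnings` is zero even though `buggy` is not a usable path, while `harmless` equals `r"C:\dist\project"` and is the one that triggers warnings.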
It first went into Python 3.6, now EOL, so all supported Python versions have the deprecation warning. Serhiy suggested in #71551 (comment) a
So that was a suggestion of 2 releases from deprecation to syntax warning. If we do a syntax warning now, that will have already been 6 releases. I don't know what the original estimate of 3.8 -> 4.0 was? Also two releases? Guido suggested in #71551 (comment) for "several" releases before the error.
Six releases (3.6 -> 3.12) is perhaps several? The original issue also had suggestions from Victor and Guido to contact linters/PyCQA to include a warning to help projects prepare, and the original author did so. And the good news is it was added to pycodestyle (part of Flake8) as W605 in April 2018, so that's 4.5 years of linter warnings. (Thanks to this, I've fixed invalid escape sequences in several packages.) @vstinner Would it be worth running Depending on the result, it may be worth promoting to a |
Ok, let's start with replacing DeprecationWarning (silent by default) with SyntaxWarning (displayed once by default): PR #99011. |
A backslash-character pair that is not a valid escape sequence now generates a SyntaxWarning, instead of DeprecationWarning. For example, re.compile("\d+\.\d+") now emits a SyntaxWarning ("\d" is an invalid escape sequence); use raw strings for regular expressions: re.compile(r"\d+\.\d+"). In a future Python version, SyntaxError will eventually be raised, instead of SyntaxWarning.

Octal escapes with a value larger than 0o377 (ex: "\477"), deprecated in Python 3.11, now produce a SyntaxWarning, instead of DeprecationWarning. In a future Python version they will eventually be a SyntaxError.

codecs.escape_decode() and codecs.unicode_escape_decode() are left unchanged: they still emit DeprecationWarning.

* The parser only emits SyntaxWarning for Python 3.12 (feature version), and still emits DeprecationWarning on older Python versions.
* Fix SyntaxWarning by using raw strings in Tools/c-analyzer/ and wasm_build.py.
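One way to check code ahead of a possible future SyntaxError is to promote the warning to an error while compiling. The sketch below is only an illustration; `compiles_cleanly` is a hypothetical helper, not part of the change. It filters both warning categories because Python 3.12 emits SyntaxWarning and older versions emit DeprecationWarning.

```python
import warnings


def compiles_cleanly(source):
    """Return True if `source` compiles with escape warnings treated as errors."""
    with warnings.catch_warnings():
        # Python 3.12+ emits SyntaxWarning; 3.6 to 3.11 emit DeprecationWarning.
        warnings.simplefilter("error", SyntaxWarning)
        warnings.simplefilter("error", DeprecationWarning)
        try:
            compile(source, "<demo>", "exec")
        except (SyntaxError, SyntaxWarning, DeprecationWarning):
            # The promoted warning normally surfaces as SyntaxError; the
            # warning classes are caught as well, purely as a precaution.
            return False
    return True


# Source text: re.compile("\d+\.\d+")
plain_ok = compiles_cleanly('re.compile("\\d+\\.\\d+")')

# Source text: re.compile(r"\d+\.\d+")
raw_ok = compiles_cleanly('re.compile(r"\\d+\\.\\d+")')
```

Running a test suite with `python -W error::SyntaxWarning` applies the same idea to files compiled during that run.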
Fixed by a60ddd3. In the end, it remains a warning, but SyntaxWarning (shown by default) is now emitted instead of DeprecationWarning (silent by default). According to @hugovk, sadly many projects in the PyPI top 5000 contain invalid escape sequences. It will take time to update them before we can consider converting the SyntaxWarning to a SyntaxError. Thanks to everybody who helped make this change possible! For affected projects: just add
In case of regular expressions, I'm wondering if we'll be forced to roll back these changes before release again due to a lot of warnings in third-party code. |
I created https://discuss.python.org/t/collaboration-on-handling-python-3-12-incompatible-changes-distutils-removal-invalid-escape-escape-etc/20721 to discuss this change. |
If you work on Windows and have the habit of documenting your code inside triple quotes (a company requirement), you now get a warning whenever a directory name starts with a d, because, for example, """C:\dist\project\file.dat""" gives a SyntaxWarning. Not to mention the regexp issue, which at least has a fix. It is more work to port a 3.11 project to 3.12 than it was to move it from Python 2 to Python 3.
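For the docstring case, a raw triple-quoted string keeps the backslashes literal. The snippet below is a small sketch using the path from the comment above; `load_data` is a made-up function name.

```python
def load_data():
    r"""Read the input from C:\dist\project\file.dat."""


# Two equivalent spellings of the same path in ordinary code.
raw_path = r"C:\dist\project\file.dat"
escaped_path = "C:\\dist\\project\\file.dat"
```

One known limitation: a raw string literal cannot end with a single backslash, so a trailing path separator still has to be written as an escaped `"\\"` or appended separately.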
Just replace |
Is there a tool that modifies the problematic regular expression?
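As a partial answer, the sketch below locates the offending literals but does not rewrite them. `find_invalid_escapes` is a hypothetical helper that relies only on `compile()` and the `warnings` module. It matches the text "escape sequence" in the warning message, which is an assumption about how the messages are worded.

```python
import warnings


def find_invalid_escapes(source, filename="<string>"):
    """Return (line, message) pairs for escape-sequence warnings in `source`."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, filename, "exec")
    return [
        (w.lineno, str(w.message))
        for w in caught
        if "escape sequence" in str(w.message)
    ]


# Source text of the second line: number = re.compile("\d+")
sample = 'import re\nnumber = re.compile("\\d+")\n'
for line, message in find_invalid_escapes(sample):
    print(f"line {line}: {message}")
```

Linters cover the same ground: as noted earlier in the thread, pycodestyle reports these as W605.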
Loading azure-devops/azext_devops/devops_sdk/*/feed/models.py with Python 3.12 produces the following warnings: /path/to/azure-devops/azext_devops/devops_sdk/v5_0/feed/models.py:101: SyntaxWarning: invalid escape sequence '\,' """FeedCore. /path/to/azure-devops/azext_devops/devops_sdk/v5_0/feed/models.py:209: SyntaxWarning: invalid escape sequence '\,' """FeedUpdate. /path/to/azure-devops/azext_devops/devops_sdk/v5_0/feed/models.py:985: SyntaxWarning: invalid escape sequence '\,' """Feed. This occurs due to the presence of the invalid escape sequence `\,` in docstrings for FeedCore, FeedUpdate, and Feed, which produces a SyntaxWarning in Python 3.12 as a result of <python/cpython#98401>. Signed-off-by: Kevin Locke <[email protected]>
In Python, only a [limited set of characters][0] can be escaped with a backslash. In recent versions of Python, attempting to escape a non-escapeable character [raises a SyntaxWarning][1], polluting the playbook output with warnings like: <unknown>:59: SyntaxWarning: invalid escape sequence '\(' <unknown>:60: SyntaxWarning: invalid escape sequence '\.' <unknown>:61: SyntaxWarning: invalid escape sequence '\.' This commit adds the string literal prefix 'r' to regular expressions in pvesh.py to ensure that escape sequences are not interpreted in the given strings. As there were no valid escape sequences in those strings to begin with, the actual string content remains the same. [0]: https://docs.python.org/3/reference/lexical_analysis.html#escape-sequences [1]: python/cpython#98401
This was inadvertently introduced in NixOS#281639, but was not a loud warning until Python 3.12 made invalid escape sequences a `SyntaxWarning` instead of a `DeprecationWarning` in python/cpython#98401.
Was seeing a lot of SyntaxWarnings when building as some regex strings were not raw strings and thus were showing invalid escape sequences due to (python/cpython#98401). These are just warnings now on Python 3.12, but will become errors in the future. Fixing now to prevent builds breaking. It seems the files are mostly following this convention now, so this makes the non raw strings consistent. Change-Id: Ia8eacfd353c41a6c1d8e0482f3b7fbd4dc7a93e1 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5832811 Reviewed-by: Mihai Sardarescu <[email protected]> Commit-Queue: Andrew Grieve <[email protected]> Reviewed-by: Andrew Grieve <[email protected]> Cr-Commit-Position: refs/heads/main@{#1352933} CrOS-Libchrome-Original-Commit: 31f4229550ffa8f15f7a5da9b67a84cc4f4eb757
Per https://docs.python.org/3/whatsnew/3.12.html: A backslash-character pair that is not a valid escape sequence now generates a SyntaxWarning, instead of DeprecationWarning. See python/cpython#98401 The warnings without this change are: attic/calc_tickadj/calc_tickadj:74: SyntaxWarning: invalid escape sequence '\d' contrib/cpu-temp-log:60: SyntaxWarning: invalid escape sequence '\s'
I stumbled on this thread after getting tired of having the same warning pop up over and over again every time I paste a Windows-formatted path into a notebook or script. Looking at the 'What's New in Python 3.12' article on the Python website, it seems that '[i]n a future Python version, SyntaxError will eventually be raised, instead of SyntaxWarning'. If that is the case, then I would strongly advise raising a warning that is more descriptive than just Additionally, is it really worth breaking so much existing code, and making it more difficult to copy and paste Windows paths, or even write a simple string like "You need to sleep and/or rest!", to achieve whatever objective this change is meant to achieve? |
In Python 3.6, invalid escape sequences were deprecated in string literals (bytes and str): issue #71551, commit 110b6fe.
What's New in Python 3.6: Deprecated Python behavior:
I propose to now raise a SyntaxError, rather than a DeprecationWarning (which is silent in most cases).
Example:
Note: the Python REPL swallowed some DeprecationWarnings, which made manual testing harder. This was fixed last month by commit 426d72e in issue gh-96052.
Python 3.11 now emits a deprecation warning for invalid octal escape sequence (issue gh-81548):
Example: