Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to tag known messages with an error code #6152

Closed
wants to merge 11 commits into from

Conversation

chadrik
Copy link
Contributor

@chadrik chadrik commented Jan 6, 2019

Adds --show-error-codes for tagging known error messages (and their notes) with error codes. This is a stepping stone toward native message filtering (#2417), but it is useful on its own for users who want to implement their own basic message filtering without relying on brittle regex against messages. Also adds --list-error-codes to get a list of all known error codes.

Design goal

Provide a way of registering message ids that is:

  1. accessible from main, so that they can be listed with --list-error-codes
  2. concise
  3. able to handle the 3 types of error messages in a fairly consistent way:
    1. static messages, with no args to format: currently have constants in mypy.messages
    2. messages with basic string args to format: currently have constants in mypy.messages (mostly)
    3. messages with more complex formatting: currently have methods on MessageBuilder
  4. type safe for the code reporting the error, to avoid mistakes with the necessary arguments for the given message type

Current Design

The design that I settled on is this:

  • convert all messages to methods of MessageBuilder
  • use a @tracked decorator to identify which methods emit messages which should be tracked
    • by default, the name of the method is the message id, but this can be overridden
    • multiple methods can have the same id, which places them all into the same group, which can be filtered together

Questions

  • naming: I was a bit non-committal on the terminology to use: "message id", "error code"? Internally I went with "message id", but the cli uses "error code"
  • should there be a less verbose way of tracking constant error messages? I created a descriptor-based prototype of this that I did not include, which allowed simple messages to be added to MessageBuilder like this:
      no_return_exepected = ErrorReporter('No return value expected')
      invalid_implicit_return = ErrorReporter('Implicit return in function which does not return')
    this would be called like a normal MessageBuilder failure message: self.msg.invalid_implicit_return(ctx). I was concerned it was a bit too magical.

What's next

This PR is not complete. My idea is to finish converting message constants to MessageBuilder methods, as I already have with no_return_exepected and must_have_none_return_type, but I wanted to get feedback on the design before doing this tedious work.

Once this is complete we can begin expanding tracked messages and adding native filtering of tracked messages.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a real review, just some random comments (but I like the idea in general).

group=error_group)
error_group.add_argument(
'--list-error-codes', action='store_true',
help="List known error codes and exit")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why didn't you add tests for the new flags? I would be great to see how they work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to get tentative approval of the idea first. Sounds like I have that, so I'll do that shortly.

mypy/main.py Outdated
import mypy.messages
for msg_id in sorted(mypy.messages.MessageBuilder.get_message_ids()):
print(msg_id)
raise SystemExit(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SystemExit is never used in build.py and main.py. I think it can cause bad interference with mypy.api etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was wary of doing it, but without it we need a way of communicating back to main that it should exit there without doing anything. I figured SystemExit behavior for this flag is the same behavior as with --version.

The return of process_options is Tuple[List[BuildSource], Options]. I can make List[BuildSource] optional, and then check for that in main. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this by making List[BuildSource] optional.

help="Show error codes in error messages",
group=error_group)
error_group.add_argument(
'--list-error-codes', action='store_true',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should accept an optional pattern: mypy --list-error-codes return would show only error codes that have a "return" substring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

mypy/messages.py Outdated Show resolved Hide resolved
def has_no_attr(self, original_type: Type, typ: Type, member: str, context: Context) -> Type:
@tracked()
def no_return_exepected(self, context: Context) -> None:
self.fail('No return value expected', context)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting all static messages to this format may cause lots of code churn, but in principle I am fine with this.

Another possible option is to have a global dictionary (or JSON file) mapping error message to error code:

message_to_id = {
    NO_RETURN_VALUE_EXPECTED: 'unexpected_return',
    MUST_HAVE_NONE_RETURN_TYPE: 'unexpected_return',
    MISSING_RETURN_STATEMENT: 'missing_return',
    ...
}

Copy link
Contributor Author

@chadrik chadrik Jan 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible option is to have a global dictionary (or JSON file) mapping error message to error code

I think that could be a good compromise, but it doesn't work for messages that require formatting so we'd have to be very clear about this. A developer might expect this to work with the message_to_id dict:

self.fail(messages.MUST_HAVE_NONE_RETURN_TYPE.format(fdef.name()), item)

We can protect against this in a few ways:

  • clear comments near message_to_id documenting that this is only for static messages
  • validate message_to_id keys and raise an internal error if formatting operations {} are found.
  • add a variation of MessageBuilder.fail, or a flag, that raises an exception if the message is not in message_to_id (I'm assuming here that we're not going to get to 100% message id coverage in the near term)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed a commit inspired your message_to_id idea, which addresses most of my concerns about type safety. I used an Enum, where the names are ids and the values are the messages. This is used for messages that are static or require only basic formatting (though I haven transitioned all of the code yet). If you like it, I'll finish it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilevkivskyi Sorry to bug you. Can I get a sign off on the idea of using Enum? If that sounds like a good idea to you I think I can take this PR to completion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP, I will think more about this tomorrow and let you know, I was really busy at work last two days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything you've written makes sense. The main problem is that I can't get over the notion that the message_registry dictionary is going to be awfully redundant and tedious to maintain, since it's mostly going to be the message constant followed by a lowercase string of the message constant's name (depending on how broad vs granular we are with grouping) :

message_registry = {
    NO_RETURN_VALUE_EXPECTED: 'no_return_value_expected',
    MISSING_RETURN_STATEMENT: 'missing_return_statement',
    INVALID_IMPLICIT_RETURN: 'invalid_implicit_return',
    INCOMPATIBLE_RETURN_VALUE_TYPE: 'incompatible_return_type',
    RETURN_VALUE_EXPECTED: 'return_value_expected',
    NO_RETURN_EXPECTED: 'no_return_expected',
    INVALID_EXCEPTION: 'invalid_exception',
    INVALID_EXCEPTION_TYPE: 'invalid_exception',
    RETURN_IN_ASYNC_GENERATOR: 'return_in_async_generator',
}

I also don't like that the message and the id are stored in different locations. I think making the constants into basic data classes would be an improvement:

NO_RETURN_VALUE_EXPECTED = Msg('No return value expected', 'no_return_value_expected')  # type: Final
MISSING_RETURN_STATEMENT = Msg('Missing return statement', 'missing_return_statement')  # type: Final

But I'd really love to design a system where the constant doubles as the id, unless it's overridden by a group definition. If enums won't work, the only other way I can think of to do this is with a descriptor. Should I just give up on this idea?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'd really love to design a system where the constant doubles as the id

Why not move all the constants into a dedicated module, and iterate over the module namespace and calculate the id based on the attribute name? This sort of metaprogramming should be okay with mypyc. Something like this:

...
FOO_BAR = Msg('Foo bar')
MSG_B = Msg('Another message')

for name, val in list(globals().items()):
    if isinstance(val, Msg) and val.id is None:
        val.id = name.lower()

Copy link
Member

@ilevkivskyi ilevkivskyi Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chadrik I like what Jukka proposed. We should probably still support passing bare string to self.fail(), this will allow:

  • Not breaking existing plugins
  • Gradual transition to the new scheme for existing error messages (I strongly prefer two-three smaller PRs, than one large).

Passing a bare string would turn it into Msg(string, 'uncategorized_error').

Btw, the module may be named message_registry.py :-)

Also as I said above it should be clearly documented (probably both in the new module docstring and in docstring for self.register()) that there are two ways to register an error message: simple static and format strings should just go as constants in the module (and nothing else), error messages with complex formatting logic can be manually registered with @register().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradual transition to the new scheme for existing error messages (I strongly prefer two-three smaller PRs, than one large)

How about I start with a new PR that moves all of the existing constants into mypy.message_registry with no other changes? That seems to be one part of the design we've all come to agreement on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't see how this is important now. Anyway, we agreed that @JukkaL will follow on this and related PRs.

@ilevkivskyi
Copy link
Member

  1. messages with basic string args to format: currently have constants in mypy.messages (mostly)

Maybe we should fix the "(mostly)" part first? (I mean finish moving the error message constants to mypy.messages.)

@JukkaL
Copy link
Collaborator

JukkaL commented Jan 11, 2019

I think that we should discuss how to represent error codes in abstract before looking at merging any code. This is a fairly big change, and it would be good to consider this from a few different viewpoints.

Here are some questions to consider (some of these were raised by Ivan above):

  • How are new static messages defined?
  • How are new runtime constructed messages defined?
  • What would the error message ids look like in mypy output or in the config file (give examples for a few common messages)?
  • How would the message ids be shown in mypy output?
  • At what granularity should the message ids be (i.e., should we group similar messages under one id)?
  • How well does this the approach work with mypyc? In particular, mypyc is bad at compiling certain Python idioms such as decorated methods and enums. We should perhaps either use those idioms sparingly or first improve how mypyc handles them. This can be important since there are many error messages, so whatever approach we choose it will probably have many instances in the code.
  • How can user plugins define new error messages?
  • Do we need to do something to avoid conflicts between different plugins (e.g. both define a similar message)?
  • Does the proposed approach slow down mypy startup? This is probably not an issue, but in case we construct some complex data structures this might plausibly have a measurable effect.

Iterating on a big PR is a lot of work, especially if we need to go through multiple iterations. So I'd suggest creating an issue that describes the current proposed approach in some detail and we can look for answers to the above questions. We can move on from there once we have a clearer picture of the tradeoffs.

@ilevkivskyi
Copy link
Member

IIUC this is now replaced by #6472, so closing this PR in favour of the latter.

@ilevkivskyi ilevkivskyi closed this May 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants