-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design choices around name lookup #424
Comments
(b) and (e). These are restrictive choices, but result in the least ambiguous code. It is easier to remove restrictions while maintaining compatibility than go the other direction, if we decide to change later. Similarly, I don't think there should be any reuse of names between meta and regular variables. |
Thanks for your input @josh11b ! |
I prefer (a) and (d). In fact I'd be even more liberal about shadowing than @jsiek suggests: I'd also allow
without even requiring an inner block. Rationale for allowing shadowing: it's mostly because of my experience with error-handling code in Rust and in C++, assuming we go in the errors-are-values direction. In Rust an idiom like
is very common. This kind of code is also common when you're writing C++ using a style where errors are reported using something like Rationale for treating type names and variable names in the same namespace rather than in two separate namespaces: I think we've informally agreed to try to treat types as values, so I don't think there can be any really principled distinction between type names and variable names. What "types are values" means to me is that a type name just is a variable name whose value is a type and whose type is some kind of metatype. So I think that means that whatever rules we establish for shadowing between variables with types like |
@carbon-language/carbon-leads (just to start trial-running the new process) I agree with @josh11b at the moment. I feel quite strongly that we should have a single symbol table / namespace rather than separating variables, types, functions. I don't know that such separation has meaning with the direction we are pursuing with Carbon (much as @austern says), but even if we were to move away form that or find two categories where the named entities really are separate, and meaningfully so, I would still avoid tracking the names independently. I think Carbon will greatly benefit from having a simple and single name lookup, and being able to do things like name lookup even with incomplete (or completely missing) semantics. These kinds of freedoms require that we don't factor the kind into the name lookup. From that, I think we should have a single and consistent rule around shadowing as well, and both cases presented should follow the same rule. I prefer starting without local name shadowing. While I too have seen improved code patterns such as @austern describes by intentionally shadowing a name, I have also seen plenty of folks struggle to understand bugs / errors due to it as well. I would prefer to look at allowing local variable shadowing to support the idiom mentioned by @austern in the context of establishing that idiom and contrasted with alternatives such as language features to remove the need to use shadowing for this purpose. And similarly, we should consider specifically only allowing to rebind the name to a refined type rather than actual shadowing. I'm generally not opposed to this form of shadowing, I mostly think we should add it with intent attached to the specific use cases that motivate it rather than as a baseline acceptable pattern. Note that I think shadowing of names outside a function body may be important to support code evolution. This is something that IIRC, @zygoloid had looked at somewhat and he may be able to shed more light there. |
To soften my stance somewhat: shadowing it does make it easier to copy-paste code without having to rename to avoid conflicts. I still think on balance that I prefer no shadowing -- some work when writing code is worth making it easier to read and understand code since that happens significantly more often. To @austern 's point, we absolutely should have a good pattern for error handling, but I think it is premature to say that shadowing is the answer. My preference would be to have some sort of guard construct where there is a short scope handling the error branch where only an Here is a totally not-well-thought-out straw-man syntax:
|
@chandlerc Could you clarify what distinction you are making re: local vs. non-local name shadowing? Perhaps with examples of each? |
An example to highlight the distinction I'm drawing between local and non-local:
As for examples that help motivate allowing shadowing here, I'm recalling a discussion with @zygoloid and suspect he could better link to those or reproduce them -- my memory is a bit fuzzy here. |
Great discussion and examples all around... I'll chime in on one aspect. @chandlerc I'd prefer to not make a distinction between local vs. global shadowing because I think such a distinction would make Carbon more difficult to learn... every special case comes with a cost. |
Broadly I think both (a)+(d) and (b)+(e) are reasonable positions, and I think we have good motivation for discarding any other combinations of the presented options. I think we have strong data that disallowing certain kinds of shadowing is beneficial. So the choice is between disallowing shadowing, gathering usage data to see if there are particular pain points, and then perhaps permitting shadowing in cases where we think it's appropriate to do so (which may be everywhere), or allowing shadowing now and not doing that experiment, and likely continuing to allow shadowing forever. It's easy to evolve from no-shadowing to allowing shadowing, but hard to evolve in the opposite direction, so my inclination is to initially disallow all shadowing. There are some places where I expect this to be painful (these may be the local shadowing cases that @chandlerc was referring to; I don't recall exactly which cases we previously discussed):
One additional issue we've previously discussed is whether the declaration of the shadowing name or the use of the shadowing name is what results in the error, and our tentative direction had previously been that it's the use. For example, we don't want to reject an
Viewed this way, we could say: there is no name shadowing; instead, name lookup always looks in all enclosing scopes, and it's an error if the lookup is ambiguous. |
The thing I was remembering (and incorrectly, sorry for that) was the I particularly like the approach suggested at the end of @zygoloid's comment -- it remains restrictive in the same way as (b)+(e) but gives good techniques for solving some of the tricky cases where there are API conflicts that can't realistically be avoided at declaration time, but can be addressed by avoiding unqualified lookup entirely. Then we can learn how and where this restrictive model is still problematic (the examples given by @austern and @zygoloid both) and iterate on those use cases to figure out the best way to address them. Potentially that involves allowing some amount of shadowing, but I don't think we'll look for and solve these problems effectively without the pressure. Plus we don't really have any other good ways of solving the known problems with shadowing. If we immediately enable shadowing to solve one set of use cases, we give up any real chance of finding a good way to solve the problems introduced by shadowing. Want to hear from @KateGregory here as well though. |
I find Richard's logic compelling. Being able to relax if people bring genuine use cases feels smarter than trying to guess what use cases might occur. I think people accidentally shadow (or redeclare) and want to be told they did wrong, and I might even believe that's more common than deliberate shadowing because x is the only possible name for this thing. I'm drawn towards being compassionate to people who get muddled up by scope and don't understand they don't always need to declare things if those things were declared in outer scopes, or who know that but have a small mental buffer so forgot that |
I worry about the interaction of not allowing shadowing and refactoring. In particular, I want to be able to move a function without regard to the globals in scope at the destination. |
Would you still worry if the only possible globals in question are local to the file? For me, that makes collisions still possible but much less concerning in practice, but curious how it would (or wouldn't) influence your concerns here. |
@chandlerc it is still a problem because every time I move a function I have to search the entire file for shadows of all of my locals (or suffer a build break). This will interfere with seemingly safe refactors like inlining as well as function moving. |
I think you would only suffer build breaks and similar problems if you move the function into a (non-enclosing) scope that declares a conflicting name; you only need to search the scope(s) that you moved the function into, that it wasn't already in. Eg, if you move a member function from a derived class to a base class, where the classes are in the same scope, you need to check whether the base class has a member with the same name. So it should be less work than searching the entire file, unless you're moving the code between files. For inlining in particular, I think you pretty fundamentally need to deal with the possibility that the names in the function you're inlining are also declared at the call site, and it would not be good to reuse those names for something else. In general, when you move code about you already need to check that the names the code uses mean the same thing in the old and new context, and I'm not convinced that the rule we're talking about would add a huge amount of burden to that. In particular, given that our coding convention would mostly use |
I think you are correct about inlining. I actually think that moving functions (and classes) between files is a relatively common operation. I worry about adding friction to that as files can get pretty large in practice. On the other hand, globals are not extremely common. That said, I tend to find shadowing prevention overly aggressive and would prefer if we simply permit shadowing. |
I think we can narrow our consideration down to three rules. We can tweak the chosen rules if we find cases that need special handling; our goal here should be to establish a baseline approach to name lookup. The three are:
Under option (3), we will need a way to declare a name (for example, to implement an interface) without introducing possibly-conflicting names into scope. For example:
Option 3 gives us the most room to change our minds in the future, while addressing a known source of issues and not being too inventive or surprising. As a result, that's my preference, and the direction I propose that we take. |
I think we have consensus among the leads at this point to pursue the suggestion in @zygoloid's last comment (specifically option (3) there). When this turns into a proposal, it should work to cover the different use cases where we will need some syntax for avoiding shadowing. This includes the something like the suggested syntax from @zygoloid for interface implementations to use a qualified name, but also likely similar issues. For example:
There may be other similar examples, but the same kind of pattern as suggested in (3) should be followed by forming a qualified name rather than needing an unqualified name that would introduce shadowing. Closing this issue as I think we've got an answer for the core question here with pretty good consensus, but always happy to re-open if needed! |
The executable semantics currently makes a certain set of choices about name lookup,
but it would be good to get consensus on what the choices should be. (It should be
relatively easy to change the executable semantics to match the outcome of this
discussion.)
First, let's talk about variables with the same name:
Design options:
(a) the inner
x
overshadows the outerx
, so the return value is 1.(b) the program is rejected at the inner definition of
x
, we don't allowmultiple variables with the same name to be in-scope at the same time.
Next, let's talk about type names versus variable names.
Example 1:
Example 2:
Design options:
(c) type names (such as the
x
ofstruct x
) and variable names (such as thex
ofvar Int: x
) live in separate symbol tables and we know which symbol table to use based on the context of use of the name. Example 1 is valid and returns 0. Example 2 is valid and returns 0;(d) types names and variable names live in the same symbol table and we allow overshadowing.
Example 1 is valid and returns 0. Example 2 is rejected when
x
is used as a type invar x: y = ...
.(e) types names and variable names live in the same symbol table and we disallow overshadowing. Both example 1 and example 2 are rejected at the definition of
var Int: x = ...
.There is a similar question about the interaction between variables meant to be used for meta programming and regular variables.
Interested to hear peoples thoughts about these options or if there are other options that should be considered.
The text was updated successfully, but these errors were encountered: