Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design choices around name lookup #424

Closed
jsiek opened this issue Mar 31, 2021 · 18 comments
Closed

Design choices around name lookup #424

jsiek opened this issue Mar 31, 2021 · 18 comments
Labels
leads question A question for the leads team

Comments

@jsiek
Copy link
Contributor

jsiek commented Mar 31, 2021

The executable semantics currently makes a certain set of choices about name lookup,
but it would be good to get consensus on what the choices should be. (It should be
relatively easy to change the executable semantics to match the outcome of this
discussion.)

First, let's talk about variables with the same name:

fn main() -> Int {
  var Int: x = 0;
  {
    var Int: x = 1;
    return x;
  }
}

Design options:
(a) the inner x overshadows the outer x, so the return value is 1.
(b) the program is rejected at the inner definition of x, we don't allow
multiple variables with the same name to be in-scope at the same time.

Next, let's talk about type names versus variable names.
Example 1:

struct x {
  var Int: field;
};

fn main() -> Int {
  var Int: x = 0;
  return x;
}

Example 2:

struct x {
  var Int: field;
};

fn main() -> Int {
  var Int: x = 0;
  var x: y = x(.field = x);
  return y.field;
}

Design options:
(c) type names (such as the x of struct x) and variable names (such as the x of var Int: x) live in separate symbol tables and we know which symbol table to use based on the context of use of the name. Example 1 is valid and returns 0. Example 2 is valid and returns 0;
(d) types names and variable names live in the same symbol table and we allow overshadowing.
Example 1 is valid and returns 0. Example 2 is rejected when x is used as a type in var x: y = ....
(e) types names and variable names live in the same symbol table and we disallow overshadowing. Both example 1 and example 2 are rejected at the definition of var Int: x = ....

There is a similar question about the interaction between variables meant to be used for meta programming and regular variables.

Interested to hear peoples thoughts about these options or if there are other options that should be considered.

@josh11b
Copy link
Contributor

josh11b commented Mar 31, 2021

(b) and (e). These are restrictive choices, but result in the least ambiguous code. It is easier to remove restrictions while maintaining compatibility than go the other direction, if we decide to change later. Similarly, I don't think there should be any reuse of names between meta and regular variables.

@jsiek
Copy link
Contributor Author

jsiek commented Mar 31, 2021

Thanks for your input @josh11b !

@austern
Copy link
Contributor

austern commented Mar 31, 2021

I prefer (a) and (d). In fact I'd be even more liberal about shadowing than @jsiek suggests: I'd also allow

fn main() -> Int {
  var Int: x = 0;
  var Int: x = 1;
  return x;
}

without even requiring an inner block.

Rationale for allowing shadowing: it's mostly because of my experience with error-handling code in Rust and in C++, assuming we go in the errors-are-values direction. In Rust an idiom like

let f = File::open("foo.txt");
let f = f.expect("Can't open file");

is very common. This kind of code is also common when you're writing C++ using a style where errors are reported using something like std::optional or absl::StatusOr, but it's more annoying because you need to come up with two separate names like f and f_or_error.

Rationale for treating type names and variable names in the same namespace rather than in two separate namespaces: I think we've informally agreed to try to treat types as values, so I don't think there can be any really principled distinction between type names and variable names. What "types are values" means to me is that a type name just is a variable name whose value is a type and whose type is some kind of metatype. So I think that means that whatever rules we establish for shadowing between variables with types like Result and File should be identical to the rules for shadowing between Int and Type.

@chandlerc
Copy link
Contributor

@carbon-language/carbon-leads (just to start trial-running the new process)

I agree with @josh11b at the moment.

I feel quite strongly that we should have a single symbol table / namespace rather than separating variables, types, functions. I don't know that such separation has meaning with the direction we are pursuing with Carbon (much as @austern says), but even if we were to move away form that or find two categories where the named entities really are separate, and meaningfully so, I would still avoid tracking the names independently.

I think Carbon will greatly benefit from having a simple and single name lookup, and being able to do things like name lookup even with incomplete (or completely missing) semantics. These kinds of freedoms require that we don't factor the kind into the name lookup.

From that, I think we should have a single and consistent rule around shadowing as well, and both cases presented should follow the same rule.

I prefer starting without local name shadowing. While I too have seen improved code patterns such as @austern describes by intentionally shadowing a name, I have also seen plenty of folks struggle to understand bugs / errors due to it as well. I would prefer to look at allowing local variable shadowing to support the idiom mentioned by @austern in the context of establishing that idiom and contrasted with alternatives such as language features to remove the need to use shadowing for this purpose. And similarly, we should consider specifically only allowing to rebind the name to a refined type rather than actual shadowing. I'm generally not opposed to this form of shadowing, I mostly think we should add it with intent attached to the specific use cases that motivate it rather than as a baseline acceptable pattern.

Note that I think shadowing of names outside a function body may be important to support code evolution. This is something that IIRC, @zygoloid had looked at somewhat and he may be able to shed more light there.

@josh11b
Copy link
Contributor

josh11b commented Apr 1, 2021

To soften my stance somewhat: shadowing it does make it easier to copy-paste code without having to rename to avoid conflicts. I still think on balance that I prefer no shadowing -- some work when writing code is worth making it easier to read and understand code since that happens significantly more often.

To @austern 's point, we absolutely should have a good pattern for error handling, but I think it is premature to say that shadowing is the answer. My preference would be to have some sort of guard construct where there is a short scope handling the error branch where only an error symbol is introduced but always returns or produces a legal non-error value, and after that scope you have the good name with only non-error values.

Here is a totally not-well-thought-out straw-man syntax:

let f = File::open("foo.txt") except e: return e.WithContext("Can't open file");

@josh11b
Copy link
Contributor

josh11b commented Apr 1, 2021

@chandlerc Could you clarify what distinction you are making re: local vs. non-local name shadowing? Perhaps with examples of each?

@chandlerc
Copy link
Contributor

@chandlerc Could you clarify what distinction you are making re: local vs. non-local name shadowing? Perhaps with examples of each?

An example to highlight the distinction I'm drawing between local and non-local:

fn GlobalFunction() {
  // ...
}

fn OtherFunction() {
  var Int: x = 42;

  if (...) {
    // This variable shadows a non-local (to this function) name:
    var Bool:$$ GlobalFunction = true;

    // This variable shadows a local (to this function) name:
    var Float: x = 1.0;
    // ...
  }
}

As for examples that help motivate allowing shadowing here, I'm recalling a discussion with @zygoloid and suspect he could better link to those or reproduce them -- my memory is a bit fuzzy here.

@jsiek
Copy link
Contributor Author

jsiek commented Apr 1, 2021

Great discussion and examples all around...

I'll chime in on one aspect. @chandlerc I'd prefer to not make a distinction between local vs. global shadowing because I think such a distinction would make Carbon more difficult to learn... every special case comes with a cost.

@zygoloid
Copy link
Contributor

zygoloid commented Apr 1, 2021

Broadly I think both (a)+(d) and (b)+(e) are reasonable positions, and I think we have good motivation for discarding any other combinations of the presented options.

I think we have strong data that disallowing certain kinds of shadowing is beneficial. So the choice is between disallowing shadowing, gathering usage data to see if there are particular pain points, and then perhaps permitting shadowing in cases where we think it's appropriate to do so (which may be everywhere), or allowing shadowing now and not doing that experiment, and likely continuing to allow shadowing forever. It's easy to evolve from no-shadowing to allowing shadowing, but hard to evolve in the opposite direction, so my inclination is to initially disallow all shadowing.

There are some places where I expect this to be painful (these may be the local shadowing cases that @chandlerc was referring to; I don't recall exactly which cases we previously discussed):

  • where the captures of a closure shadow the variables they capture (in C++, [x = std::move(x)] { ... })
  • where the parameters of a constructor shadow the members of a class (but this would only be an issue if the member names are in scope for unqualified lookup, which they wouldn't be if we require explicit self. qualification)

One additional issue we've previously discussed is whether the declaration of the shadowing name or the use of the shadowing name is what results in the error, and our tentative direction had previously been that it's the use. For example, we don't want to reject an impl of an interface solely because it defines a method whose name collides with that of something in an enclosing scope:

fn foo() {}
struct X {}

impl Thing for X {
  // this should be OK...
  fn foo() {}
  // ... but maybe this is an error (you could use `Thing.foo()` or something like
  // that to name the `foo` from the line above)
  fn bar() { foo(); }
}

Viewed this way, we could say: there is no name shadowing; instead, name lookup always looks in all enclosing scopes, and it's an error if the lookup is ambiguous.

@chandlerc
Copy link
Contributor

The thing I was remembering (and incorrectly, sorry for that) was the impl example you give @zygoloid -- thanks for that.

I particularly like the approach suggested at the end of @zygoloid's comment -- it remains restrictive in the same way as (b)+(e) but gives good techniques for solving some of the tricky cases where there are API conflicts that can't realistically be avoided at declaration time, but can be addressed by avoiding unqualified lookup entirely.

Then we can learn how and where this restrictive model is still problematic (the examples given by @austern and @zygoloid both) and iterate on those use cases to figure out the best way to address them. Potentially that involves allowing some amount of shadowing, but I don't think we'll look for and solve these problems effectively without the pressure.

Plus we don't really have any other good ways of solving the known problems with shadowing. If we immediately enable shadowing to solve one set of use cases, we give up any real chance of finding a good way to solve the problems introduced by shadowing.

Want to hear from @KateGregory here as well though.

@KateGregory
Copy link
Contributor

I find Richard's logic compelling. Being able to relax if people bring genuine use cases feels smarter than trying to guess what use cases might occur. I think people accidentally shadow (or redeclare) and want to be told they did wrong, and I might even believe that's more common than deliberate shadowing because x is the only possible name for this thing. I'm drawn towards being compassionate to people who get muddled up by scope and don't understand they don't always need to declare things if those things were declared in outer scopes, or who know that but have a small mental buffer so forgot that limit or whatever is already around from before. Will we have such people in Carbon? I hope so, because eventually people will come who don't have a strong background in one of the languages we're drawing on. And I don't want to do things that will make life more difficult for them, without particular benefit for the ones who might deliberately shadow. Starting restrictive means we must be shown real benefit before we relax, so if the benefit never comes along we never relax.

@fowles
Copy link

fowles commented Apr 5, 2021

I worry about the interaction of not allowing shadowing and refactoring. In particular, I want to be able to move a function without regard to the globals in scope at the destination.

@chandlerc
Copy link
Contributor

I worry about the interaction of not allowing shadowing and refactoring. In particular, I want to be able to move a function without regard to the globals in scope at the destination.

Would you still worry if the only possible globals in question are local to the file?

For me, that makes collisions still possible but much less concerning in practice, but curious how it would (or wouldn't) influence your concerns here.

@fowles
Copy link

fowles commented May 11, 2021

@chandlerc it is still a problem because every time I move a function I have to search the entire file for shadows of all of my locals (or suffer a build break). This will interfere with seemingly safe refactors like inlining as well as function moving.

@zygoloid
Copy link
Contributor

I think you would only suffer build breaks and similar problems if you move the function into a (non-enclosing) scope that declares a conflicting name; you only need to search the scope(s) that you moved the function into, that it wasn't already in. Eg, if you move a member function from a derived class to a base class, where the classes are in the same scope, you need to check whether the base class has a member with the same name. So it should be less work than searching the entire file, unless you're moving the code between files.

For inlining in particular, I think you pretty fundamentally need to deal with the possibility that the names in the function you're inlining are also declared at the call site, and it would not be good to reuse those names for something else.

In general, when you move code about you already need to check that the names the code uses mean the same thing in the old and new context, and I'm not convinced that the rule we're talking about would add a huge amount of burden to that. In particular, given that our coding convention would mostly use UpperCamelCase for names outside a function and lower_snake_case for names introduced inside a function, it seems to me that we're unlikely to see collisions of this kind that aren't collisions between local variables (which I think tools will need to handle anyway in order to produce acceptable results).

@fowles
Copy link

fowles commented May 14, 2021

I think you are correct about inlining. I actually think that moving functions (and classes) between files is a relatively common operation. I worry about adding friction to that as files can get pretty large in practice. On the other hand, globals are not extremely common. That said, I tend to find shadowing prevention overly aggressive and would prefer if we simply permit shadowing.

@zygoloid
Copy link
Contributor

I think we can narrow our consideration down to three rules. We can tweak the chosen rules if we find cases that need special handling; our goal here should be to establish a baseline approach to name lookup. The three are:

  1. Name shadowing is valid; names in an inner scope hide names from outer scopes. (Possible tweak: names declared later in a scope can even hide names declared earlier in the same scope.)
  2. Name shadowing is valid but inner names don't hide outer names; instead, any use of a name declared in multiple enclosing scopes is invalid due to ambiguity.
  3. Name shadowing is invalid: a name cannot be declared if there is another declaration in an enclosing scope or a conflicting declaration (that is, not a function overload and not a redeclaration) in the same scope.

Under option (3), we will need a way to declare a name (for example, to implement an interface) without introducing possibly-conflicting names into scope. For example:

struct Hash { var how_much_corned_beef: Weight; };
impl Hash as Hashable {
  // would be an error due to shadowing
  fn Hash[me: Self]() -> HashCode { ... }
  // OK
  fn Hashable.Hash[me: Self]() -> HashCode { ... }
}

Option 3 gives us the most room to change our minds in the future, while addressing a known source of issues and not being too inventive or surprising. As a result, that's my preference, and the direction I propose that we take.

@chandlerc
Copy link
Contributor

I think we have consensus among the leads at this point to pursue the suggestion in @zygoloid's last comment (specifically option (3) there).

When this turns into a proposal, it should work to cover the different use cases where we will need some syntax for avoiding shadowing. This includes the something like the suggested syntax from @zygoloid for interface implementations to use a qualified name, but also likely similar issues. For example:

package Abseil api

// Newly added import from a poorly named package `Map`.
import Map library "hashtables"

// Still need to export a name `Map` from the `Abseil` package.
// We use the package name itself to form a qualified name.
api struct Abseil.Map {
  // ...
}

There may be other similar examples, but the same kind of pattern as suggested in (3) should be followed by forming a qualified name rather than needing an unqualified name that would introduce shadowing.

Closing this issue as I think we've got an answer for the core question here with pretty good consensus, but always happy to re-open if needed!

@jonmeow jonmeow added the leads question A question for the leads team label Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
leads question A question for the leads team
Projects
None yet
Development

No branches or pull requests

8 participants