Design choices around name lookup #424

jsiek · 2021-03-31T18:48:32Z

The executable semantics currently makes a certain set of choices about name lookup,
but it would be good to get consensus on what the choices should be. (It should be
relatively easy to change the executable semantics to match the outcome of this
discussion.)

First, let's talk about variables with the same name:

fn main() -> Int {
  var Int: x = 0;
  {
    var Int: x = 1;
    return x;
  }
}

Design options:
(a) the inner x overshadows the outer x, so the return value is 1.
(b) the program is rejected at the inner definition of x, we don't allow
multiple variables with the same name to be in-scope at the same time.

Next, let's talk about type names versus variable names.
Example 1:

struct x {
  var Int: field;
};

fn main() -> Int {
  var Int: x = 0;
  return x;
}

Example 2:

struct x {
  var Int: field;
};

fn main() -> Int {
  var Int: x = 0;
  var x: y = x(.field = x);
  return y.field;
}

Design options:
(c) type names (such as the x of struct x) and variable names (such as the x of var Int: x) live in separate symbol tables and we know which symbol table to use based on the context of use of the name. Example 1 is valid and returns 0. Example 2 is valid and returns 0;
(d) types names and variable names live in the same symbol table and we allow overshadowing.
Example 1 is valid and returns 0. Example 2 is rejected when x is used as a type in var x: y = ....
(e) types names and variable names live in the same symbol table and we disallow overshadowing. Both example 1 and example 2 are rejected at the definition of var Int: x = ....

There is a similar question about the interaction between variables meant to be used for meta programming and regular variables.

Interested to hear peoples thoughts about these options or if there are other options that should be considered.

The text was updated successfully, but these errors were encountered:

josh11b · 2021-03-31T19:42:34Z

(b) and (e). These are restrictive choices, but result in the least ambiguous code. It is easier to remove restrictions while maintaining compatibility than go the other direction, if we decide to change later. Similarly, I don't think there should be any reuse of names between meta and regular variables.

jsiek · 2021-03-31T20:41:40Z

Thanks for your input @josh11b !

austern · 2021-03-31T22:04:45Z

I prefer (a) and (d). In fact I'd be even more liberal about shadowing than @jsiek suggests: I'd also allow

fn main() -> Int {
  var Int: x = 0;
  var Int: x = 1;
  return x;
}

without even requiring an inner block.

Rationale for allowing shadowing: it's mostly because of my experience with error-handling code in Rust and in C++, assuming we go in the errors-are-values direction. In Rust an idiom like

let f = File::open("foo.txt");
let f = f.expect("Can't open file");

is very common. This kind of code is also common when you're writing C++ using a style where errors are reported using something like std::optional or absl::StatusOr, but it's more annoying because you need to come up with two separate names like f and f_or_error.

Rationale for treating type names and variable names in the same namespace rather than in two separate namespaces: I think we've informally agreed to try to treat types as values, so I don't think there can be any really principled distinction between type names and variable names. What "types are values" means to me is that a type name just is a variable name whose value is a type and whose type is some kind of metatype. So I think that means that whatever rules we establish for shadowing between variables with types like Result and File should be identical to the rules for shadowing between Int and Type.

chandlerc · 2021-04-01T06:04:21Z

@carbon-language/carbon-leads (just to start trial-running the new process)

I agree with @josh11b at the moment.

I feel quite strongly that we should have a single symbol table / namespace rather than separating variables, types, functions. I don't know that such separation has meaning with the direction we are pursuing with Carbon (much as @austern says), but even if we were to move away form that or find two categories where the named entities really are separate, and meaningfully so, I would still avoid tracking the names independently.

I think Carbon will greatly benefit from having a simple and single name lookup, and being able to do things like name lookup even with incomplete (or completely missing) semantics. These kinds of freedoms require that we don't factor the kind into the name lookup.

From that, I think we should have a single and consistent rule around shadowing as well, and both cases presented should follow the same rule.

I prefer starting without local name shadowing. While I too have seen improved code patterns such as @austern describes by intentionally shadowing a name, I have also seen plenty of folks struggle to understand bugs / errors due to it as well. I would prefer to look at allowing local variable shadowing to support the idiom mentioned by @austern in the context of establishing that idiom and contrasted with alternatives such as language features to remove the need to use shadowing for this purpose. And similarly, we should consider specifically only allowing to rebind the name to a refined type rather than actual shadowing. I'm generally not opposed to this form of shadowing, I mostly think we should add it with intent attached to the specific use cases that motivate it rather than as a baseline acceptable pattern.

Note that I think shadowing of names outside a function body may be important to support code evolution. This is something that IIRC, @zygoloid had looked at somewhat and he may be able to shed more light there.

josh11b · 2021-04-01T06:41:29Z

To soften my stance somewhat: shadowing it does make it easier to copy-paste code without having to rename to avoid conflicts. I still think on balance that I prefer no shadowing -- some work when writing code is worth making it easier to read and understand code since that happens significantly more often.

To @austern 's point, we absolutely should have a good pattern for error handling, but I think it is premature to say that shadowing is the answer. My preference would be to have some sort of guard construct where there is a short scope handling the error branch where only an error symbol is introduced but always returns or produces a legal non-error value, and after that scope you have the good name with only non-error values.

Here is a totally not-well-thought-out straw-man syntax:

let f = File::open("foo.txt") except e: return e.WithContext("Can't open file");

josh11b · 2021-04-01T06:43:56Z

@chandlerc Could you clarify what distinction you are making re: local vs. non-local name shadowing? Perhaps with examples of each?

chandlerc · 2021-04-01T06:51:04Z

@chandlerc Could you clarify what distinction you are making re: local vs. non-local name shadowing? Perhaps with examples of each?

An example to highlight the distinction I'm drawing between local and non-local:

fn GlobalFunction() {
  // ...
}

fn OtherFunction() {
  var Int: x = 42;

  if (...) {
    // This variable shadows a non-local (to this function) name:
    var Bool:$$ GlobalFunction = true;

    // This variable shadows a local (to this function) name:
    var Float: x = 1.0;
    // ...
  }
}

As for examples that help motivate allowing shadowing here, I'm recalling a discussion with @zygoloid and suspect he could better link to those or reproduce them -- my memory is a bit fuzzy here.

jsiek · 2021-04-01T13:27:51Z

Great discussion and examples all around...

I'll chime in on one aspect. @chandlerc I'd prefer to not make a distinction between local vs. global shadowing because I think such a distinction would make Carbon more difficult to learn... every special case comes with a cost.

zygoloid · 2021-04-01T21:34:20Z

Broadly I think both (a)+(d) and (b)+(e) are reasonable positions, and I think we have good motivation for discarding any other combinations of the presented options.

I think we have strong data that disallowing certain kinds of shadowing is beneficial. So the choice is between disallowing shadowing, gathering usage data to see if there are particular pain points, and then perhaps permitting shadowing in cases where we think it's appropriate to do so (which may be everywhere), or allowing shadowing now and not doing that experiment, and likely continuing to allow shadowing forever. It's easy to evolve from no-shadowing to allowing shadowing, but hard to evolve in the opposite direction, so my inclination is to initially disallow all shadowing.

There are some places where I expect this to be painful (these may be the local shadowing cases that @chandlerc was referring to; I don't recall exactly which cases we previously discussed):

where the captures of a closure shadow the variables they capture (in C++, [x = std::move(x)] { ... })
where the parameters of a constructor shadow the members of a class (but this would only be an issue if the member names are in scope for unqualified lookup, which they wouldn't be if we require explicit self. qualification)

One additional issue we've previously discussed is whether the declaration of the shadowing name or the use of the shadowing name is what results in the error, and our tentative direction had previously been that it's the use. For example, we don't want to reject an impl of an interface solely because it defines a method whose name collides with that of something in an enclosing scope:

fn foo() {}
struct X {}

impl Thing for X {
  // this should be OK...
  fn foo() {}
  // ... but maybe this is an error (you could use `Thing.foo()` or something like
  // that to name the `foo` from the line above)
  fn bar() { foo(); }
}

Viewed this way, we could say: there is no name shadowing; instead, name lookup always looks in all enclosing scopes, and it's an error if the lookup is ambiguous.

chandlerc · 2021-04-02T03:42:41Z

The thing I was remembering (and incorrectly, sorry for that) was the impl example you give @zygoloid -- thanks for that.

I particularly like the approach suggested at the end of @zygoloid's comment -- it remains restrictive in the same way as (b)+(e) but gives good techniques for solving some of the tricky cases where there are API conflicts that can't realistically be avoided at declaration time, but can be addressed by avoiding unqualified lookup entirely.

Then we can learn how and where this restrictive model is still problematic (the examples given by @austern and @zygoloid both) and iterate on those use cases to figure out the best way to address them. Potentially that involves allowing some amount of shadowing, but I don't think we'll look for and solve these problems effectively without the pressure.

Plus we don't really have any other good ways of solving the known problems with shadowing. If we immediately enable shadowing to solve one set of use cases, we give up any real chance of finding a good way to solve the problems introduced by shadowing.

Want to hear from @KateGregory here as well though.

KateGregory · 2021-04-05T22:32:39Z

I find Richard's logic compelling. Being able to relax if people bring genuine use cases feels smarter than trying to guess what use cases might occur. I think people accidentally shadow (or redeclare) and want to be told they did wrong, and I might even believe that's more common than deliberate shadowing because x is the only possible name for this thing. I'm drawn towards being compassionate to people who get muddled up by scope and don't understand they don't always need to declare things if those things were declared in outer scopes, or who know that but have a small mental buffer so forgot that limit or whatever is already around from before. Will we have such people in Carbon? I hope so, because eventually people will come who don't have a strong background in one of the languages we're drawing on. And I don't want to do things that will make life more difficult for them, without particular benefit for the ones who might deliberately shadow. Starting restrictive means we must be shown real benefit before we relax, so if the benefit never comes along we never relax.

fowles · 2021-04-05T22:56:16Z

I worry about the interaction of not allowing shadowing and refactoring. In particular, I want to be able to move a function without regard to the globals in scope at the destination.

chandlerc · 2021-04-09T05:09:57Z

I worry about the interaction of not allowing shadowing and refactoring. In particular, I want to be able to move a function without regard to the globals in scope at the destination.

Would you still worry if the only possible globals in question are local to the file?

For me, that makes collisions still possible but much less concerning in practice, but curious how it would (or wouldn't) influence your concerns here.

fowles · 2021-05-11T20:34:31Z

@chandlerc it is still a problem because every time I move a function I have to search the entire file for shadows of all of my locals (or suffer a build break). This will interfere with seemingly safe refactors like inlining as well as function moving.

zygoloid · 2021-05-14T00:34:36Z

I think you would only suffer build breaks and similar problems if you move the function into a (non-enclosing) scope that declares a conflicting name; you only need to search the scope(s) that you moved the function into, that it wasn't already in. Eg, if you move a member function from a derived class to a base class, where the classes are in the same scope, you need to check whether the base class has a member with the same name. So it should be less work than searching the entire file, unless you're moving the code between files.

For inlining in particular, I think you pretty fundamentally need to deal with the possibility that the names in the function you're inlining are also declared at the call site, and it would not be good to reuse those names for something else.

In general, when you move code about you already need to check that the names the code uses mean the same thing in the old and new context, and I'm not convinced that the rule we're talking about would add a huge amount of burden to that. In particular, given that our coding convention would mostly use UpperCamelCase for names outside a function and lower_snake_case for names introduced inside a function, it seems to me that we're unlikely to see collisions of this kind that aren't collisions between local variables (which I think tools will need to handle anyway in order to produce acceptable results).

fowles · 2021-05-14T19:21:44Z

I think you are correct about inlining. I actually think that moving functions (and classes) between files is a relatively common operation. I worry about adding friction to that as files can get pretty large in practice. On the other hand, globals are not extremely common. That said, I tend to find shadowing prevention overly aggressive and would prefer if we simply permit shadowing.

zygoloid · 2021-06-17T22:29:46Z

I think we can narrow our consideration down to three rules. We can tweak the chosen rules if we find cases that need special handling; our goal here should be to establish a baseline approach to name lookup. The three are:

Name shadowing is valid; names in an inner scope hide names from outer scopes. (Possible tweak: names declared later in a scope can even hide names declared earlier in the same scope.)
Name shadowing is valid but inner names don't hide outer names; instead, any use of a name declared in multiple enclosing scopes is invalid due to ambiguity.
Name shadowing is invalid: a name cannot be declared if there is another declaration in an enclosing scope or a conflicting declaration (that is, not a function overload and not a redeclaration) in the same scope.

Under option (3), we will need a way to declare a name (for example, to implement an interface) without introducing possibly-conflicting names into scope. For example:

struct Hash { var how_much_corned_beef: Weight; };
impl Hash as Hashable {
  // would be an error due to shadowing
  fn Hash[me: Self]() -> HashCode { ... }
  // OK
  fn Hashable.Hash[me: Self]() -> HashCode { ... }
}

Option 3 gives us the most room to change our minds in the future, while addressing a known source of issues and not being too inventive or surprising. As a result, that's my preference, and the direction I propose that we take.

chandlerc · 2021-06-23T11:50:34Z

I think we have consensus among the leads at this point to pursue the suggestion in @zygoloid's last comment (specifically option (3) there).

When this turns into a proposal, it should work to cover the different use cases where we will need some syntax for avoiding shadowing. This includes the something like the suggested syntax from @zygoloid for interface implementations to use a qualified name, but also likely similar issues. For example:

package Abseil api

// Newly added import from a poorly named package `Map`.
import Map library "hashtables"

// Still need to export a name `Map` from the `Abseil` package.
// We use the package name itself to form a qualified name.
api struct Abseil.Map {
  // ...
}

There may be other similar examples, but the same kind of pattern as suggested in (3) should be followed by forming a qualified name rather than needing an unqualified name that would introduce shadowing.

Closing this issue as I think we've got an answer for the core question here with pretty good consensus, but always happy to re-open if needed!

zygoloid added the needs decision label Apr 2, 2021

jonmeow mentioned this issue Apr 19, 2021

Open question: Calling functions defined later in the same file #472

Closed

jonmeow removed the needs decision label Apr 20, 2021

chandlerc closed this as completed Jun 23, 2021

jonmeow added the leads question A question for the leads team label Aug 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design choices around name lookup #424

Design choices around name lookup #424

jsiek commented Mar 31, 2021 •

edited

Loading

josh11b commented Mar 31, 2021 •

edited

Loading

jsiek commented Mar 31, 2021

austern commented Mar 31, 2021 •

edited

Loading

chandlerc commented Apr 1, 2021

josh11b commented Apr 1, 2021 •

edited

Loading

josh11b commented Apr 1, 2021

chandlerc commented Apr 1, 2021

jsiek commented Apr 1, 2021

zygoloid commented Apr 1, 2021

chandlerc commented Apr 2, 2021

KateGregory commented Apr 5, 2021

fowles commented Apr 5, 2021

chandlerc commented Apr 9, 2021

fowles commented May 11, 2021

zygoloid commented May 14, 2021

fowles commented May 14, 2021

zygoloid commented Jun 17, 2021

chandlerc commented Jun 23, 2021

Design choices around name lookup #424

Design choices around name lookup #424

Comments

jsiek commented Mar 31, 2021 • edited Loading

josh11b commented Mar 31, 2021 • edited Loading

jsiek commented Mar 31, 2021

austern commented Mar 31, 2021 • edited Loading

chandlerc commented Apr 1, 2021

josh11b commented Apr 1, 2021 • edited Loading

josh11b commented Apr 1, 2021

chandlerc commented Apr 1, 2021

jsiek commented Apr 1, 2021

zygoloid commented Apr 1, 2021

chandlerc commented Apr 2, 2021

KateGregory commented Apr 5, 2021

fowles commented Apr 5, 2021

chandlerc commented Apr 9, 2021

fowles commented May 11, 2021

zygoloid commented May 14, 2021

fowles commented May 14, 2021

zygoloid commented Jun 17, 2021

chandlerc commented Jun 23, 2021

jsiek commented Mar 31, 2021 •

edited

Loading

josh11b commented Mar 31, 2021 •

edited

Loading

austern commented Mar 31, 2021 •

edited

Loading

josh11b commented Apr 1, 2021 •

edited

Loading