Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow duplicated model names across packages #3053

Closed
wants to merge 2 commits into from

Conversation

drewbanin
Copy link
Contributor

resolves #1269 (partially?)

Description

This PR makes it possible to have refable nodes (models, seeds, and snapshots) with duplicated names across different packages. Refable nodes with duplicated names continue to be disallowed within a specific package.

TODO:

  • determine if we're relying on model name uniqueness anywhere inside of Core (I didn't see anything, but need to think more about this)
  • determine if we're relying on model name uniqueness anywhere inside of the docs website? This sounds more likely....
  • add tests / maybe remove tests that depended on the previous behavior

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Feb 5, 2021
@drewbanin
Copy link
Contributor Author

@jtcohen6 what say ye - is this a change we're interested in making in dbt?

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board. The constructs hold together better than I expected, to be honest. I do think we need an answer for resource properties (description, tests, etc), which may be straightforward or may be nefariously complex.

Blocking

Can we enable namespacing for resource properties? In the event that there are multiple, could we safely assume that the user is trying to describe properties for the resource in the current package? It would be a breaking change to prevent users from defining properties for resources in other packages, but this is only possible for package resources that don't already have properties themselves, anyway.

Compilation Error
  dbt found two schema.yml entries for the same resource named model_a. Resources and their associated columns may only be described a single time. To fix this, remove the resource entry for model_a in one of these files:
   - models/other_schema.yml
  models/schema.yml

At the very least, we should update that error message to include which package the .yml files are in.

Not blocking

$ dbt run -m model_a includes both models! I guess that makes sense, it just feels a little funny to me. Obviously, $ dbt run -m root_project.model_a selects just the one.

The docs site actually works pretty well, it's just the same thing as above: focusing on +model_a+ in the DAG viz shows the subgraphs of both models. I think the fix here would be pretty simple: We should fully qualify resource names (including package name) in the DAG side-view launcher, and as the result of "Refocus on node."

Screen Shot 2021-02-05 at 1 38 12 PM

Should be:

Screen Shot 2021-02-05 at 1 48 34 PM

Of course, that still elides the UX issues of displaying two nodes with the same name, and no visual components to distinguish them. I don't find myself all that troubled, though: in part, because the package dropdown selector is first class in the DAG viz; in part, because I still wouldn't recommend doing this, exactly because it risks ambiguity. Sure, there are things we could do to clear up that ambiguity—or better enable disambiguation, at least—and they're things we're already thinking about (dbt-labs/dbt-docs#122).

# have unique names within a package
continue

if len(matched) > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do wonder if, in the event that there are same-named models in the current project and in an installed package, we should still support single-argument ref and default to the resource in the same namespace. We would only raise this error if all the resources by that name are in other namespaces.

My motivation here: It's possible that a user wants to install a package that has a same-named model as one in their project, and the models in the package only uses single-argument ref. Of course, I think there's a lot of benefit in being more explicit to avoid surprises, so alternatively we'd establish a convention that package authors ought to use two-argument ref everywhere. It wouldn't be a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea!

@jtcohen6 jtcohen6 changed the base branch from dev/margaret-mead to develop February 8, 2021 11:30
@NavneetSajwan
Copy link

Hi @drewbanin,
Looking forward to this feature. Any update on this PR?

@jtcohen6
Copy link
Contributor

This remains a good idea. The blocking issue also remains: it's not possible to define resource properties for two resources with the same name, even if they live in different packages.

I'd have appetite to take another swing at this after v1.0—so, likely next year. This will serve as a solid jumping-off point when we're ready to take that swing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add namespacing for dbt resources
3 participants