Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(models) : Joins (Datasets) schema, resolvers and UI #8325

Merged
merged 118 commits into from
Mar 20, 2024

Conversation

poorvi767
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@poorvi767
Copy link
Contributor Author

We have closed previous PR: #7961 by @rtekal and created this PR with additional changes containing updated resolvers and UI.

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Jun 28, 2023
Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! this is looking solid, thank you so much for your effort here we're really appreciative.

I left some comments, some more serious than others.

In my first review I only had time to get through the modeling and backend changes. I'll take more of a look at datahub-web-react soon!

@@ -701,6 +710,9 @@ private void configureDataPlatformInstanceResolvers(final RuntimeWiring.Builder

private void configureQueryResolvers(final RuntimeWiring.Builder builder) {
builder.type("Query", typeWiring -> typeWiring
.dataFetcher("join", new AuthenticatedResolver<>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AuthenticatedResolver is actually deprecated and an authentication check is handled elsewhere so if the user isn't logged in they can't get to any of these endpoints! so feel free to remove it from here as well as below

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i see duplicate "join" entries here in configureQueryResolvers so I think you can just remove this one entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

- container
- globalTags
- glossaryTerms
- browsePaths
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i'm not sure I see browsePaths on joins right now since they won't have their own entity profile page. also unsure if we need container or institutionalMemory aspects either

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the present, I removed BrowsePaths and Container
InstitutionalMemory is needed since we are providing a place for the documentation.

Comment on lines 168 to 174
private com.linkedin.common.urn.JoinUrn getJoinUrn(String urnStr) {
try {
return JoinUrn.createFromString(urnStr);
} catch (URISyntaxException e) {
throw new RuntimeException(String.format("Failed to retrieve data product with urn %s, invalid urn", urnStr));
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method and getUrn above look to be duplicated. Also we have this method getUrn to do just this in a generic UrnUtils class you can use!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still a nit but I don't think you need this method but can just use the UrnUtils.getUrn instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

"""
List all Join
"""
listJoin(input: ListJoinInput!): ListJoinResult
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I don't see this implemented anywhere in GmsGraphQLEngine - are you still planning on creating this endpoint?

and when would we list all joins in general? won't it always be in the context of a dataset?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed, In the beginning made stubs for all endpoints, yeah not needed.

"""
Edges extending from this entity grouped by direction in the lineage graph
"""
lineage(input: LineageInput!): EntityLineageResult
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there lineage on Joins? I don't think so right? there are Relationships

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct - no lineage on Joins but relationships between datasets and joins. Removed lineage

Comment on lines 3157 to 3161
"""
Display name of the Dataset
"""
name: String

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm is this intended? I don't think we're trying to allow dataset names to be editable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - removed.

"owners" : [ "urn:li:corpuser:fbar", "urn:li:corpuser:bfoo" ],
"fields" : [ {
"name" : "joinId",
"doc" : "Join native name e.g. <db>.<table>, /dir/subdir/<name>, or <name>",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be a UUID right? otherwise we wouldn't be able to have two joins with the same name and that could be problematic between all of the joins and datasets out there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

Comment on lines 12 to 26
@Searchable = {
"fieldType": "TEXT_PARTIAL",
"enableAutocomplete": true,
"boostScore": 10.0
}
afield: SchemaFieldPath

/**
* All fields from dataset B that are required for the join, maps to aFields 1:1
*/
@Searchable = {
"fieldType": "TEXT_PARTIAL",
"enableAutocomplete": true,
"boostScore": 10.0
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make these TEXT fields instead of TEXT_PARTIAL since that's what we have on other schema fields. also we don't want to enableAutoComplete so we can just remove that. also i think you can remove boostScore here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

"fieldType": "DATETIME"
}
}
lastModified: optional TimeStamp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using AuditStamp instead of TimeStamp? it just gives us some more optional fields to be more flexible in the future and requires the actor field to be populated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment on lines 13 to 15
@Searchable = {
"fieldType": "TEXT",
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this to be searchable? it looks like right now it's just a random UUID and we're making the name searchable anyways for joins

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed... removed.

Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got through most of the frontend code as well! I'm going to pull this down and test locally later this week and may have some more feedback then as well. In general, frontend code could be broken down into smaller, more consumable chunks. I'm pretty concerned about the number of queries we're making and all of the updates to our .graphql files as well that seem unrelated (changing these files can have big impact on the performance of our queries)

@@ -96,6 +97,10 @@ export class DatasetEntity implements Entity<Dataset> {
name: 'Schema',
component: SchemaTab,
},
{
name: 'Relationships',
component: RelationTab,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can rename this component RelationshipsTab?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

Comment on lines 53 to 65
// const fieldSortInput: FieldSortInput = {
// field: 'lastModifiedAt',
// sortOrder: Sort.Desc,
// };
const tabs = [
{
key: 'joinsTab',
tab: 'Joins',
},
// {
// key: 'pkFkTab',
// tab: 'PK/FK',
// },
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this commented out code? or do we still want this tab here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed commented code.

joinData = joins.search.searchResults.map((r) => r.entity as Join);
}

const joinPreview = (record: Join): JSX.Element => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this RelationTab is getting pretty big and complicated. I think moving this joinPreview function with all of its rendering logic below out to its own component would be nice! or you can simply move this name logic into the existing JoinPreview component

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored.

)}
{loadingJoin && (
<div>
Joins <LoadingOutlined />{' '}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I think we usually just show <LoadingOutlined />

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment on lines 191 to 200
const { data: table2Dataset } = useGetDatasetQuery({
variables: {
urn: selectDataset || 'urn:li:dataset:(urn:li:dataPlatform:snowflake,testData,DEV)',
},
});
const { data: table2Schema } = useGetDatasetSchemaQuery({
variables: {
urn: selectDataset || 'urn:li:dataset:(urn:li:dataPlatform:snowflake,testData,DEV)',
},
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm we're hardcoding these urns as backup here? also i believe we'll be immediately firing these queries when we load this page. I think these should probably make these lazy queries to only fire once they select a dataset

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also why do we need to call useGetDatasetQuery and useGetDatasetSchemaQuery? we're already getting the dataset when we search for it right? we could store that in state. I get needing to make a query to show the new dataset's schema.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general we want to minimize the number of queries we make and only make queries that are (1) necessary and (2) when we need them (as opposed to on page load)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both graphql queries removed and utilized lazy queries.

selectedEntities={selectedEntities}
/>
</TabToolbar>
{!singleSelect && (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm hiding this bar based on singleSelect feels off. maybe we need another optional prop for this component saying that we should hide this select bar if it's true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment on lines 12 to 13
export const JoinTab = ({ properties }: { properties?: any }) => {
const { data: entityData } = useGetJoinQuery({ variables: { urn: properties } });
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm confused - why is properties any type and then we pass that into the query as the urn parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @chriscollins3456 for review and valuable feedback!
While we are working on other, for this one:

i'm confused - why is properties any type and then we pass that into the query as the urn parameter?

Here JoinTab is called from JoinEntity and tab type is 'EntityTab' with definition as below:
export type EntityTab = {
name: string;
component: React.FunctionComponent<{ properties?: any }>;
display?: {
visible: (GenericEntityProperties, T) => boolean; // Whether the tab is visible on the UI. Defaults to true.
enabled: (GenericEntityProperties, T) => boolean; // Whether the tab is enabled on the UI. Defaults to true.
};
properties?: any;
};
Component can support 'React.FunctionComponent<{ properties?: any }>;' this signature only.
This has properties to pass any argument to Tab and I wanted to get urn from it.
Please feel free to add any suggestion here. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed as suggested- utilized useEntityData from EntityContext.

Comment on lines 113 to 117
incoming: relationships(
input: { types: ["joinA", "joinB"], direction: INCOMING, start: 0, count: 100 }
) {
...fullRelationshipResults
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we will then be doing a relationships query here for all dataset queries - can we move this out and create a whole new query for getting joins associated with this dataset? also, aren't we making a query to get joins where a given dataset is either joinA or joinB?

Copy link
Contributor Author

@poorvi767 poorvi767 Jul 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We removed this fragment and we are fetching joins with graphql search query.

Comment on lines 751 to 765
fragment searchResultFieldsWithRelationships on Entity {
urn
type
... on Dataset {
name
origin
uri
platform {
...platformFields
}
dataPlatformInstance {
...dataPlatformInstanceFields
}
editableProperties {
name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we adding all of this stuff in our graphql files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed entries which are not required.

Comment on lines 177 to 183
name
description
}
platform {
...platformFields
}
ownership {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also confused why this lineage.graphql file is being updated? Joins aren't lineage, they are relationships, but we'll never get them in a lineage specific query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@anshbansal anshbansal added the community-contribution PR or Issue raised by member(s) of DataHub Community label Jul 17, 2023
@rtekal
Copy link
Contributor

rtekal commented Mar 19, 2024

Yep Chris. Took care of all your review comments.
Please let me know. Thanks.

Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you so much for this!! also thank you so much for being so helpful on working with me on these requested changes - you all are real troopers no doubt about it.

I have one tiny final change that I'd like to request just to update the casing in one field in the model, which I checked should require only updates to a couple other places.

However, i'm approving this to get CI going to make sure everything is green before we merge this in! that's the only real blocker now to make sure tests and linting are passed. I'll help you guys out and point out changes to make CI pass

Unfortunately I don't have write access to poorvi's repo so I can't push code to get CI green with you unless I get access. Either way I'm happy to help guide you on anything that we might need to do :)

/**
* ERModelRelationFieldMapping (in future we can make it an array)
*/
relationshipfieldMappings: array [RelationshipFieldMapping]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more tiny one! would you mind updating the casing of this model to be relationshipFieldMappings with a capital "F"? then i checked the code and there's just a couple of places that would have to be changed to reflect then then it's good!

@chriscollins3456
Copy link
Collaborator

chriscollins3456 commented Mar 19, 2024

but it actually looks like you recently merged in master and the CI is looking solid! so really we should only have to worry about the casing request in that final comment and then we're merging this bad boy

I went ahead and merged in master one more time to make sure we're up to date

@rtekal
Copy link
Contributor

rtekal commented Mar 19, 2024

Sure Chris. Thank you. I went ahead and did the last review changes.
If you find issues with 'tests and linting' please let me know.
Tomorrow I will ask Poorvi to add you as collaborator, in case you commit any changes to accelerate things faster.

Thank you again for all your help and extreme patience with our contribution.

Copy link
Collaborator

@chriscollins3456 chriscollins3456 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! i think we're good to merge :)

@chriscollins3456 chriscollins3456 merged commit c1f6efa into datahub-project:master Mar 20, 2024
60 checks passed
sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Jun 25, 2024
…ject#8325)

Co-authored-by: Raj Tekal <[email protected]>
Co-authored-by: Raj Tekal <[email protected]>
Co-authored-by: Raj Tekal <[email protected]>
Co-authored-by: Chris Collins <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants