[CT-2268] [Bug] dbt-core >= 1.4.2 manifests not passing v8 schema validation #7119
Comments
@dlawin Thanks for opening! There are some known problems with the library ( More discussion:
I don't think there's a quick fix here ... even though it's very quick to encounter this bug as soon as you try to use the auto-generated JSONSchemas for actual validation :( |
For context, I contribute to a couple open source tools that utilize these artifacts and their json schemas: Subsequently using that in: For now I think I will need to limit to a version < 1.4.2 |
I suspect that the problem is not with the manifest.json or even with the jsonschema, but with the fact that the jsonschema validate function cannot distinguish between types of nodes and is validating using the wrong part of the schema. In our Python code we have to explicitly load serialized nodes by resource_type, or we end up with incorrectly instantiated nodes. Jsonschema validate with only dictionary input is probably not capable of using the right part of the schema. I looked at the local copies of the generated schemas, and they are correct for resource_type. From the lines included above, it looks like jsonschema is validating a seed node using the analysis node schema. What is the goal of doing the jsonschema validate? Perhaps there's something else that can serve the same purpose. |
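To illustrate what "explicitly load serialized nodes by resource_type" can look like, here is a minimal sketch. The classes `ToySeed` and `ToyAnalysis` and their fields are hypothetical stand-ins, not dbt-core's actual node classes; only the `resource_type` key comes from the discussion above.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class ToyNode:
    # Fields shared by every toy node type (hypothetical, for illustration only).
    unique_id: str
    resource_type: str


@dataclass
class ToySeed(ToyNode):
    root_path: str = ""  # hypothetical seed-only field


@dataclass
class ToyAnalysis(ToyNode):
    raw_code: str = ""  # hypothetical analysis-only field


# Dispatch table: the serialized resource_type decides which class is instantiated.
NODE_CLASSES = {"seed": ToySeed, "analysis": ToyAnalysis}


def load_node(data: dict) -> ToyNode:
    """Instantiate the class that matches data["resource_type"]."""
    resource_type = data.get("resource_type")
    cls = NODE_CLASSES.get(resource_type)
    if cls is None:
        raise ValueError(f"unsupported resource_type: {resource_type!r}")
    # Keep only the keys the chosen class declares, so extra keys do not break loading.
    field_names = {f.name for f in dataclasses.fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in field_names})
```

Without the lookup on `resource_type`, a plain dictionary gives no hint about which class (or which part of a schema) applies, which is the ambiguity described above.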
I think that the jsonschemas serve more as documentation of what to expect in the manifest.json. I don't think that it can be usefully used to validate the whole manifest. If you want to go through and validate the individual nodes and call out the correct nodes by resource_type, that might possibly work. |
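A rough sketch of that per-node approach follows. It assumes the schema keeps its node types under a top-level `definitions` key, and the mapping from `resource_type` to definition name is an assumption to check against the schema file actually loaded. The `validate_fn` parameter is a hypothetical hook; by default the sketch falls back to `jsonschema.validate`.

```python
from collections import defaultdict

# ASSUMPTION: definition names are illustrative; compare them with the keys
# under "definitions" in the schema you load before relying on this mapping.
DEFINITION_BY_RESOURCE_TYPE = {
    "seed": "SeedNode",
    "model": "ModelNode",
    "analysis": "AnalysisNode",
}


def sub_schema_for(schema: dict, definition_name: str) -> dict:
    """Point at one definition while keeping all definitions available,
    so that internal "#/definitions/..." references still resolve."""
    return {
        "$ref": f"#/definitions/{definition_name}",
        "definitions": schema["definitions"],
    }


def validate_nodes_by_type(manifest: dict, schema: dict, validate_fn=None) -> dict:
    """Validate each node against the definition chosen by its resource_type.

    Returns a dict mapping unique_id to a list of error messages.
    """
    if validate_fn is None:
        # Imported lazily so the helper can be exercised without jsonschema installed.
        from jsonschema import validate as validate_fn

    errors = defaultdict(list)
    for unique_id, node in manifest.get("nodes", {}).items():
        resource_type = node.get("resource_type")
        definition = DEFINITION_BY_RESOURCE_TYPE.get(resource_type)
        if definition is None:
            errors[unique_id].append(f"no definition mapped for {resource_type!r}")
            continue
        try:
            validate_fn(instance=node, schema=sub_schema_for(schema, definition))
        except Exception as exc:  # jsonschema raises ValidationError here
            errors[unique_id].append(str(exc))
    return dict(errors)
```

Collecting errors per `unique_id` instead of stopping at the first failure makes it easier to see whether only one node type (for example seeds) is affected.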
Validation is not actually the goal; the JSON schema is used to parse manifest, run results, and sources files into objects here. (It uses https://koxudaxi.github.io/datamodel-code-generator/ to generate the object code.) If the JSON schema is inaccurate, in that it doesn't validate the actual manifest files, then the objects created from it are also inaccurate representations. |
To that end, I could probably update the generated classes here so that they handle the seeds: https://github.com/yu-iskw/dbt-artifacts-parser/blob/main/dbt_artifacts_parser/parsers/manifest/manifest_v8.py |
@gshank, How can a client depend on any schema if it's not expected to validate the resulting file based on a schema? Are there other schemas we can rely on to parse manifest and catalog files? |
If we use the latest
import json

import requests
from jsonschema import validate

manifest_path = "/Users/yu/local/src/github/jaffle_shop/target/manifest.json"

# Download the v8 manifest schema from the dbt-core repository.
r = requests.get(url="https://raw.githubusercontent.com/dbt-labs/dbt-core/main/schemas/dbt/manifest/v8.json")
schema = json.loads(r.content)

# Load the locally generated manifest (plain JSON, so json.load is enough).
with open(manifest_path, "r", encoding="utf-8") as fp:
    manifest_dict = json.load(fp)

# Raises jsonschema.ValidationError if the manifest does not match the schema.
validate(instance=manifest_dict, schema=schema)
@gshank BTW, can you tell me how you usually generate the JSON schemas like https://github.com/dbt-labs/dbt-core/tree/main/schemas/dbt/manifest locally? |
Oh I see, the updated version is not hosted here e.g. doesn't match https://raw.githubusercontent.com/dbt-labs/dbt-core/main/schemas/dbt/manifest/v8.json (accurate one) |
I think this one is on me!! dbt-labs/schemas.getdbt.com#19 Just merged :) |
@jtcohen6 thank you. I will check later that the hosted schema has been updated. BTW, could you please tell me how the JSON schemas are generated, if you know? Since I contributed to improving a schema before, I would like to know how to update the schema too. |
Is this a new bug in dbt-core?
Current Behavior
I'm noticing that manifests generated by dbt-core versions 1.4.2, 1.4.3, and 1.4.4 are not passing json schema validation based on the schema here:
Expected Behavior
I would expect this validation to pass or for a v9 manifest to be available.
Steps To Reproduce
Relevant log output
Environment
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
Noticed this with any adapter I tested: snowflake, redshift, postgres, databricks