Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1944] [SPIKE] Remove dependency on hologram #6776

Closed
jtcohen6 opened this issue Jan 30, 2023 · 8 comments
Closed

[CT-1944] [SPIKE] Remove dependency on hologram #6776

jtcohen6 opened this issue Jan 30, 2023 · 8 comments
Assignees
Labels
spike tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jan 30, 2023

We started hologram a fork of dataclasses-jsonschema. It allowed us to adopt Python dataclasses in Python 3.7+, while still making use of JSONSchema-based validation. We haven't done a particularly good job of maintaining it in the years since, and it now brings us more grief than good.

We use hologram in two places today:

  1. Validation (dataclass_schema.py), although the validation errors it raises are really unpleasant & not very helpful
  2. Generating JSONSchemas from artifacts, which are still not always technically correct ([CT-119] [Bug] run_results.json schema validation error with oneOf constraint on status #4657)

I recently ran into the unpleasantness of the validation error message, while trying to find the root cause of fivetran/dbt_amazon_ads_source#6. Nowhere did this error tell me that this was in the config.enabled of a source, nor even the file where the source was defined!

08:24:59  Encountered an error:
'False and True' is not of type 'boolean'

Failed validating 'type' in schema['properties']['enabled']:
    {'default': True, 'type': 'boolean'}

On instance['enabled']:
    'False and True'
08:24:59  jsonschema.exceptions.ValidationError: 'False and True' is not of type 'boolean'

Failed validating 'type' in schema['properties']['enabled']:
    {'default': True, 'type': 'boolean'}

On instance['enabled']:
    'False and True'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jerco/dev/product/dbt-core/core/dbt/main.py", line 135, in main
    results, succeeded = handle_and_check(args)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/main.py", line 198, in handle_and_check
    task, res = run_from_args(parsed)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/main.py", line 245, in run_from_args
    results = task.run()
  File "/Users/jerco/dev/product/dbt-core/core/dbt/task/parse.py", line 93, in run
    self.get_full_manifest()
  File "/Users/jerco/dev/product/dbt-core/core/dbt/task/parse.py", line 74, in get_full_manifest
    manifest = loader.load()
  File "/Users/jerco/dev/product/dbt-core/core/dbt/parser/manifest.py", line 373, in load
    patcher.construct_sources()
  File "/Users/jerco/dev/product/dbt-core/core/dbt/parser/sources.py", line 84, in construct_sources
    parsed = self.parse_source(patched)
  File "/Users/jerco/dev/product/dbt-core/core/dbt/parser/sources.py", line 145, in parse_source
    config = config.finalize_and_validate()
  File "/Users/jerco/dev/product/dbt-core/core/dbt/contracts/graph/model_config.py", line 353, in finalize_and_validate
    self.validate(dct)
  File "/Users/jerco/dev/product/dbt-core/env/lib/python3.9/site-packages/hologram/__init__.py", line 989, in validate
    raise ValidationError.create_from(error) from error
hologram.ValidationError: 'False and True' is not of type 'boolean'

Failed validating 'type' in schema['properties']['enabled']:
    {'default': True, 'type': 'boolean'}

On instance['enabled']:
    'False and True'

Requirements

Copying from older Jira ticket:

Functional requirements:

  • Must not require us to write an entire schema validation/generation system from the ground up.
  • Must be fast enough to suit core performance goals
  • Must utilize some sort of cross platform standard (json schema, protobuf defs, etc)
  • Must auto-generate validation schemas/defs. Preferably from python data classes or python typedefs or some combination of the two.

Since writing that Jira ticket, we have introduced proto definitions for our event system, and are working (slowly but surely) to extend that framework to our core classes (#6391).

The other serialization/validation library we use, mashumaro, considers native support for schema generation out of scope:

@jtcohen6 jtcohen6 added spike tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality Team:Language labels Jan 30, 2023
@github-actions github-actions bot changed the title Remove dependency on hologram [CT-1944] Remove dependency on hologram Jan 30, 2023
@Fatal1ty
Copy link

The other serialization/validation library we use, mashumaro, considers native support for schema generation out of scope

Actually right now it's not out of scope :) It was out of scope back in 2021 because there were a bunch of other problems in the first place. I'm working on adding .jsonschema() method and hope to include it in one of the following releases.

@jtcohen6
Copy link
Contributor Author

Relevant conversation, proposing a few more alternatives:

@emmyoop emmyoop changed the title [CT-1944] Remove dependency on hologram [CT-1944] [SPIKE] Remove dependency on hologram Mar 13, 2023
@codecae
Copy link

codecae commented Apr 7, 2023

FWIW: the dependent version of hologram is now making dbt-core incompatible with Airflow >=2.5.2... Airflow now requires jsonschema>=4.0.0, which is incompatible with hologram>0.0.14,<0.0.15

Effectively, we are unable to upgrade airflow due to this dependency. I know our problems really aren't your problems, but if this is can be upgraded that would be excellent :)

@Fatal1ty
Copy link

Fatal1ty commented Apr 7, 2023

Hi, folks! As I promised, JSON Schema generation is now out of the box in mashumaro. Would you like to give it a try? If some functionality you need is missing, then I'm ready to work on it.

@mariahjrogers
Copy link

mariahjrogers commented Apr 11, 2023

Seconding @codecae's sentiment above, this dependency is also making dbt-core incompatible with poetry > 1.1.15 (three minor versions behind). Would love for this to go ~ away ~ 🫠

@aranke
Copy link
Member

aranke commented May 4, 2023

Hi folks!

Looks like migrating solely to mashumaro isn't possible currently because mashumaro only performs JSON Schema generation, not JSON Schema validation.

The available options are either:

  1. We switch to protobuf for schema validation
  2. We use another library for JSON schema validation.

Both of these approaches are outside the scope of this spike, so I'm going to stop investigating for now.

@peterallenwebb
Copy link
Contributor

@gshank Should we close this spike and open a separate issue for follow up?

@gshank
Copy link
Contributor

gshank commented Aug 16, 2023

Closing this for now, will continue work in #8426.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spike tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality
Projects
None yet
Development

No branches or pull requests

7 participants