
Async reconcile: resource definitions and code file parser #2761

Merged
merged 27 commits into from
Jul 18, 2023


@begelundmuller (Contributor) commented Jul 10, 2023

The key files in this PR are the revamped resource definitions in proto/rill/runtime/v1/resources.proto and the new compiler for parsing projects into resources in runtime/compilers/rillv1.

Some implementation notes (in no particular order):

  • Supports merging config from multiple files, e.g. model.yaml and model.sql can set config on the same resource
  • Supports emitting multiple resources from one file, e.g. for embedded sources
  • Supports a flexible directory structure (files outside the known directories must specify a kind in the YAML or SQL)
  • Introduces a Migration resource, to support init.sql and other ad-hoc SQL statements
  • Supports specifying a connector for any SQL-based resource, allowing e.g. models to be orchestrated on other DBs. If connector == "", the new reconcile will use the default connector (which is DuckDB on stage.db).
  • Doesn't rely on file timestamps
  • Supports incremental parsing of a subset of files (for keystroke-by-keystroke editing)
  • Adds a templating engine for resolving templated queries
  • If templating is used in a SQL file, DuckDB inference (like detecting referenced tables or rewriting embedded sources) is disabled. Instead, a model that references another model must use {{ ref "other-model" }}
  • Supports setting config in SQL files using -- @annotation: or {{ configure "key" "value" }} syntax. The comment-based annotations are only supported for DuckDB SQL files.
  • The parser only parses files in isolation and doesn't check references. Any DAG-based validation is expected to be handled by reconcile.
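To illustrate how {{ ref "..." }} and {{ configure "key" "value" }} could feed the parser, here is a minimal sketch built on Go's standard text/template package. It is an assumption that the PR's templating engine works this way; resolveTemplate, templateResult, and the choice to render ref as the bare name are hypothetical. The idea is that the template functions record dependencies and config as side effects while rendering, so references can be collected without DuckDB inspecting the SQL.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// templateResult is a hypothetical container for what was collected
// while resolving a templated SQL file.
type templateResult struct {
	SQL    string            // rendered SQL
	Refs   []string          // names passed to {{ ref "..." }}
	Config map[string]string // pairs passed to {{ configure "k" "v" }}
}

// resolveTemplate renders sql with two template funcs (hypothetical sketch):
//   ref "name"        records a dependency and emits the name unchanged
//   configure "k" "v" records a config entry and emits nothing
func resolveTemplate(sql string) (*templateResult, error) {
	res := &templateResult{Config: map[string]string{}}
	funcs := template.FuncMap{
		"ref": func(name string) string {
			res.Refs = append(res.Refs, name)
			return name
		},
		"configure": func(key, value string) string {
			res.Config[key] = value
			return ""
		},
	}
	// Funcs must be registered before Parse so the parser recognizes them.
	tmpl, err := template.New("sql").Funcs(funcs).Parse(sql)
	if err != nil {
		return nil, err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, nil); err != nil {
		return nil, err
	}
	res.SQL = b.String()
	return res, nil
}

func main() {
	src := `{{ configure "materialize" "true" }}SELECT * FROM {{ ref "other-model" }}`
	res, err := resolveTemplate(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.SQL)    // SELECT * FROM other-model
	fmt.Println(res.Refs)   // [other-model]
	fmt.Println(res.Config) // map[materialize:true]
}
```

Collecting references during rendering matches the note that the parser works on files in isolation: it only records names, and checking that "other-model" actually exists is left to reconcile.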
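A rough sketch of the comment-based annotation syntax, assuming annotations take the form -- @key: value on a line of their own. parseAnnotations is a hypothetical helper, not the PR's parser (which the commit list says was moved to a separate package); ignoring trailing inline comments is also an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseAnnotations is a hypothetical sketch: it collects "-- @key: value"
// comments from SQL into a map. Only whole-line "--" comments whose body
// starts with "@" are considered; annotations without a colon get "".
func parseAnnotations(sql string) map[string]string {
	out := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(sql))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "--") {
			continue // not a whole-line comment (inline comments are skipped)
		}
		body := strings.TrimSpace(strings.TrimPrefix(line, "--"))
		if !strings.HasPrefix(body, "@") {
			continue // ordinary comment
		}
		key, value, _ := strings.Cut(body[1:], ":")
		key = strings.TrimSpace(key)
		if key == "" {
			continue
		}
		out[key] = strings.TrimSpace(value)
	}
	return out
}

func main() {
	sql := "-- @materialize: true\n-- plain comment\nSELECT 1 -- @ignored: inline\n"
	fmt.Println(parseAnnotations(sql)) // map[materialize:true]
}
```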

Notes about catalog changes:

  • The word "resource" replaces "catalog entry/object"
  • Resources have separate "spec" (desired state) and "state" (actual state)
  • The new catalog will support multiple resources of different kinds with the same name
    • For cases where uniqueness is required across kinds (specifically source and model), the reconciler will be responsible for validating uniqueness
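A minimal sketch of what this split implies, assuming the catalog keys resources by a (kind, name) pair. ResourceKind, ResourceName, and sourceModelCollisions are hypothetical names, not taken from resources.proto: the catalog map happily stores a source, a model, and a metrics view all named "orders", and a separate reconciler-side check flags the source/model clash.

```go
package main

import "fmt"

// ResourceKind and ResourceName are hypothetical stand-ins for the
// resource identifiers defined in resources.proto.
type ResourceKind string

const (
	KindSource      ResourceKind = "source"
	KindModel       ResourceKind = "model"
	KindMetricsView ResourceKind = "metrics_view"
)

type ResourceName struct {
	Kind ResourceKind
	Name string
}

// sourceModelCollisions returns names used by both a source and a model,
// in first-seen order. Other kinds are ignored because they may share names.
func sourceModelCollisions(names []ResourceName) []string {
	kinds := map[string]map[ResourceKind]bool{}
	var order []string
	for _, n := range names {
		if n.Kind != KindSource && n.Kind != KindModel {
			continue
		}
		if kinds[n.Name] == nil {
			kinds[n.Name] = map[ResourceKind]bool{}
			order = append(order, n.Name)
		}
		kinds[n.Name][n.Kind] = true
	}
	var out []string
	for _, name := range order {
		if len(kinds[name]) > 1 {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	names := []ResourceName{
		{KindSource, "orders"},
		{KindModel, "orders"},
		{KindMetricsView, "orders"},
		{KindModel, "revenue"},
	}
	// Keyed by (kind, name), all four entries coexist in the catalog.
	catalog := map[ResourceName]bool{}
	for _, n := range names {
		catalog[n] = true
	}
	fmt.Println(len(catalog))                 // 4
	fmt.Println(sourceModelCollisions(names)) // [orders]
}
```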

@begelundmuller begelundmuller changed the title from API stubs and resource definitions to Async reconcile: resource definitions and code file parser Jul 10, 2023
@begelundmuller begelundmuller marked this pull request as ready for review July 14, 2023 16:15
This was referenced Jul 17, 2023
```go
for _, resource := range p.resourcesForPath[path] {
	// Multiple entries in resourcesForPath may point to the same resource.
	// By adding resource.Paths to checkPaths, the outer loop will eventually clear those (maybe it already has).
	checkPaths = append(checkPaths, resource.Paths...)
```
Collaborator

Should we do the deduplication here? Embedded sources seem to inflate checkPaths quite a bit. In the test with 2 models referencing one source, deleting one of the models leads to about 14 entries in checkPaths.

Contributor Author

Good observation. I tried investigating some simple deduplication checks / other logic changes, but wasn't able to come up with something that benchmarked better. For small list sizes, it seems appending to checkPaths is about the same cost as doing extra deduplication checks.

Given the small number of files that will usually be reparsed, I think it's okay to keep it slightly unoptimized (seenPaths will still guard against repeating any expensive ops, like checking file stats), and it's nice to keep the logic simpler.

Collaborator

Got it. Always bench before optimisation :)

```go
spec.DefaultTimeRange = tmp.DefaultTimeRange
spec.AvailableTimeZones = tmp.AvailableTimeZones

for i, dim := range tmp.Dimensions {
```
Collaborator

@AdityaHegde AdityaHegde Jul 18, 2023

Where will the validation of duplicate dimension names and measure names go, if not here? (Edited with clarification)
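The thread doesn't record where this check ended up. Wherever it lives, a minimal sketch might look like the following, assuming dimensions and measures share one namespace (a per-list check is a small variation). Dimension, Measure, and validateUniqueNames are hypothetical simplifications of the metrics view spec, not the PR's types.

```go
package main

import "fmt"

// Hypothetical, simplified metrics view fields.
type Dimension struct{ Name string }
type Measure struct{ Name string }

// validateUniqueNames returns an error for the first name that appears
// more than once across dimensions and measures combined.
func validateUniqueNames(dims []Dimension, measures []Measure) error {
	seen := map[string]string{} // name -> location where it was first used
	check := func(name, where string) error {
		if prev, ok := seen[name]; ok {
			return fmt.Errorf("duplicate name %q (in %s, first used in %s)", name, where, prev)
		}
		seen[name] = where
		return nil
	}
	for i, d := range dims {
		if err := check(d.Name, fmt.Sprintf("dimensions[%d]", i)); err != nil {
			return err
		}
	}
	for i, m := range measures {
		if err := check(m.Name, fmt.Sprintf("measures[%d]", i)); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dims := []Dimension{{Name: "country"}, {Name: "device"}}
	measures := []Measure{{Name: "total_sales"}, {Name: "country"}}
	fmt.Println(validateUniqueNames(dims, measures))
	// duplicate name "country" (in measures[1], first used in dimensions[0])
}
```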

Collaborator

@AdityaHegde AdityaHegde left a comment

Code looks good. Any tweaks can go in new PRs

@begelundmuller begelundmuller merged commit c8509a4 into main Jul 18, 2023
5 checks passed
@begelundmuller begelundmuller deleted the begelundmuller/reconcile-compiler-and-resources branch July 18, 2023 13:39
djbarnwal pushed a commit that referenced this pull request Aug 3, 2023
* API stubs and resource definitions

* Revert admin proto change

* Fix proto linter errors

* Remove API stubs

* New compiler

* Adapt for duckdbsql changes

* Split parser into multiple files

* Fix tests

* DuckDB parsing fixes and limits

* Support schedule and remove loose ends

* Basic test coverage

* Fix web test

* Support errors with line numbers for YAML and DuckDB SQL parse errors

* Tests for reparse

* Tests for embedded sources

* Handle dirty parses

* Move SQL annotation parsing to a separate package

* Couple SQL and YAML files with same file stem

* Review

* Stem -> Node

* Benchmark reparse

* Use known fields for metrics views

* Split dots in annotations

* Fix errors from merge