
Add detailed tracking for start-of-invocation parse time #2823

Closed
jtcohen6 opened this issue Oct 8, 2020 · 3 comments · Fixed by #2893
Labels: enhancement (New feature or request), performance

jtcohen6 (Contributor) commented Oct 8, 2020

Describe the feature

  • We know that the time it takes to parse especially big projects is a major pain point, and that this parse time is increasing linearly with project size
  • We plan to dive deep into parsing over the coming months and identify areas for improvement
  • As we roll out changes that promise to decrease parse time, we need the ability to measure empirical treatment effects for active projects in the wild

Proposal: Extend our existing opt-out anonymous usage tracking to present a detailed picture of project size, resource distribution, and start-of-invocation steps with associated execution times.

As far as the implementation goes, I'm taking some indirect inspiration from Snowplow's performance timing context. I'm envisioning a custom structured event that does two things (a rough sketch follows the list below):

  • Splits out start-of-invocation tasks into discrete components—read files from disk, create hologrammed objects, run metadata queries against the database, etc.—and calculates the time it takes to do each of them
  • Includes a dictionary representation of project attributes:
    • Found 1 model, 1 test, 1 snapshot, 1 analysis, 143 macros, 0 operations, 4 seed files, 0 sources
    • Total number of files
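
A rough sketch of what such a structured event could look like, in Python. Everything here is illustrative: the timed_step helper, the field names, and the counts are assumptions, not dbt's actual tracking code or Snowplow schema.

# Hypothetical sketch: timed_step, the field names, and the counts are illustrative,
# not dbt's real tracking code or Snowplow schema.
import time

parse_timings = []

def timed_step(name, func, *args, **kwargs):
    """Run one start-of-invocation step and record its wall-clock duration."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    parse_timings.append({"step": name, "elapsed_s": round(time.perf_counter() - start, 3)})
    return result

# After each step has run (read files, build objects, run metadata queries, ...),
# the event body combines the per-step timings with the project attributes above:
event_payload = {
    "steps": parse_timings,
    "project": {
        "models": 1, "tests": 1, "snapshots": 1, "analyses": 1,
        "macros": 143, "operations": 0, "seeds": 4, "sources": 0,
        "total_files": 150,  # illustrative count
    },
}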

Describe alternatives you've considered

  • Use current tracking to accomplish a crummier version of this. We fire events on invocation start and for each model (if dbt run); we could take the difference between invocation start and the first model event to approximate total start-of-invocation-stuff time.
  • Develop an internal suite of sample projects that represent most possible project permutations (1000s of models, 1000s of schema tests, 1000s of macros, 1000s of docs blocks, etc.). Test potential changes against that suite during CI and hope it translates to real-world projects.
    • I don't think this is an alternative as much as something we should do in addition. We'll still want to know the on-the-ground impact of our changes, no matter how promising they seem in development.

Who will this benefit?

  • Especially large organizations that will have large dbt projects from day 1
  • Mature projects that continue to grow, slowly but surely
jtcohen6 added the enhancement and performance labels on Oct 8, 2020
jtcohen6 added this to the Kiyoshi Kuromiya milestone on Oct 8, 2020
drewbanin (Contributor) commented Oct 8, 2020

@jtcohen6 I've been thinking about this as a breakdown of parse time across node types. Maybe an example payload would be:

[
  {
    "type": "model",
    "parse_time": 10.2,
    "count_parsed": 1000
  },
  {
    "type": "source",
    "parse_time": 2.4,
    "count_parsed": 128
  }
]

I think it might be challenging to break down parse times across file loading / jinja interpolation / data encoding/decoding. Instead, we can use this dataset to understand empirical performance across node types, then drill into specific performance improvements in a testing/sample project using something like cProfile and snakeviz. That's just my 2 cents - you buy it?
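
A minimal sketch of that cProfile + snakeviz workflow, assuming a hypothetical parse_project() stand-in for whatever entry point does the parsing; cProfile and pstats are in the standard library, and snakeviz reads the same .prof file.

import cProfile
import pstats

def parse_project():
    ...  # placeholder for the parsing code under test

# Record a profile to disk, then print the 20 most expensive calls by cumulative time.
cProfile.run("parse_project()", "parse.prof")
pstats.Stats("parse.prof").sort_stats("cumulative").print_stats(20)

# The same .prof file can be explored interactively in the browser:
#   snakeviz parse.prof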

jtcohen6 (Contributor, Author) commented Oct 8, 2020

Totally. I imagined the difficulties here to be reversed: that it'd be reasonable to break down parse time by functional area of the codebase and quite challenging to break it down by specific node. I'm not exactly sure why I thought that, but I'm equally happy with the approach you outline.

kwigley self-assigned this on Oct 29, 2020
kwigley (Contributor) commented Nov 16, 2020

We settled on tracking the higher-level timing info that is already being recorded by #2883.
