Use Spark Accumulators to calculate Atlas Statistics #178

MikeGost · 2021-02-22T01:58:50Z

In the AtlasGenerator job, calculating statistics is a separate (optional) stage.

The idea is to replace this with a Spark Double Accumulator earlier in the flow. The Accumulator can still support the custom AtlasStatistics class and be optional.

This might improve the overall runtime of the statistics portion, since the statistics would be computed inline with Atlas creation instead of in a separate pass. However, I don't have data to back up this assumption. Filing this task in case there is interest in streamlining this portion of the job.
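A rough sketch of the shape this could take. Spark's built-in `DoubleAccumulator` tracks a single running sum and count, so carrying several named statistics would probably need a custom `AccumulatorV2` instead. To stay runnable without a Spark dependency, the class below only mirrors the `AccumulatorV2` contract (`add`, `merge`, `isZero`, `reset`, `copy`, `value`) in plain Java. The class name and the string keys are hypothetical, and nothing here reflects how `AtlasStatistics` is actually structured.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not part of atlas-generator. Mirrors the method set of
 * Spark's AccumulatorV2 without extending it, so it compiles on its own.
 */
final class StatisticsAccumulatorSketch
{
    // Statistic key (an assumed label such as "edges") -> running total
    private final Map<String, Double> totals = new HashMap<>();

    /** Executor side: record one observation for a key. */
    void add(final String key, final double amount)
    {
        this.totals.merge(key, amount, Double::sum);
    }

    /** Driver side: fold in another partial result. Summation keeps this order independent. */
    void merge(final StatisticsAccumulatorSketch other)
    {
        other.totals.forEach((key, amount) -> this.totals.merge(key, amount, Double::sum));
    }

    /** True when nothing has been recorded yet. */
    boolean isZero()
    {
        return this.totals.isEmpty();
    }

    /** Clear all recorded totals. */
    void reset()
    {
        this.totals.clear();
    }

    /** Independent copy carrying the same totals. */
    StatisticsAccumulatorSketch copy()
    {
        final StatisticsAccumulatorSketch result = new StatisticsAccumulatorSketch();
        result.merge(this);
        return result;
    }

    /** Snapshot of the totals; callers cannot mutate the internal map through it. */
    Map<String, Double> value()
    {
        return new HashMap<>(this.totals);
    }
}
```

In a real job the class would extend `org.apache.spark.util.AccumulatorV2` and be registered on the driver with `SparkContext.register`. One caveat worth checking before committing to this: Spark only guarantees exactly-once accumulator updates for updates made inside actions. Updates made inside transformations can be applied more than once if a task or stage is re-executed, which could inflate counts if the accumulator is updated during Atlas creation.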

MikeGost added the New Feature label Feb 22, 2021
