This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fully Automated Benchmarking and Weight Generation #6168

Closed
23 tasks done
shawntabrizi opened this issue May 27, 2020 · 8 comments

Comments

@shawntabrizi
Member

shawntabrizi commented May 27, 2020

Long term goal of the benchmarking/weight effort should be to automate the whole process for the runtime engineer.

This means, if the runtime developer writes an accurate set of benchmarks for their extrinsics, that they can run some simple commands and it should do all the heavy lifting for benchmarking their runtime and creating the proper weight formulas.

Overview

In order to provide full end to end automation of this process we need to automate the following:

Benchmarking Runtime Execution

This process is already automated with our current benchmarking pipeline. We execute the extrinsic given some initial setup and collect data about the results of that benchmark. These data points are then put through a linear regression analysis, which gives us the linear formula for the extrinsic. Currently this information is output as text to the console, but for an end-to-end pipeline we need to extract this data and use it to generate the weight formulas.
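The regression step can be sketched roughly as follows. This is a hedged, illustrative sketch in plain Rust, not the actual analysis code in the Substrate benchmarking CLI; the function name and sample values are made up.

```rust
/// Fit `time = intercept + slope * x` to (component value, execution time)
/// samples via ordinary least squares. The real pipeline fits one slope per
/// benchmark component; this sketch handles a single component.
fn fit_linear(samples: &[(f64, f64)]) -> (f64, f64) {
    let n = samples.len() as f64;
    let sum_x: f64 = samples.iter().map(|(x, _)| x).sum();
    let sum_y: f64 = samples.iter().map(|(_, y)| y).sum();
    let sum_xx: f64 = samples.iter().map(|(x, _)| x * x).sum();
    let sum_xy: f64 = samples.iter().map(|(x, y)| x * y).sum();
    let slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x * sum_x);
    let intercept = (sum_y - slope * sum_x) / n;
    (slope, intercept)
}
```

For samples like `(0, 10_000.0)`, `(1, 12_000.0)`, `(2, 14_000.0)` this yields a slope of 2 000 and an intercept of 10 000, which maps directly onto a `base + slope * x` weight formula.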

Benchmarking DB Operations

Currently we benchmark DB operations through an external process which views the state logs while executing the benchmarks. With this we are able to see the DB operations that take place during the execution.

Additionally, we have special filters which take into account unique reads/writes to DB keys, and which whitelist certain keys from counting against the weight of an extrinsic. For example, if an extrinsic reads from a storage key more than once, we only count the first read as a DB operation; any subsequent read is "in-memory". If we write to a key and then read from it, we only count the "write" operation, as the read is then free. Reads/writes to common storage items like events, the caller account, etc. are counted as free, since we know these are already accounted for elsewhere in the weight calculations.

We may need to add a special DB overlay to accurately track the DB reads and writes, as well as implement a hash table so we can remove duplicate reads/writes and add any other logic we want, like a whitelist. This should be enabled only for benchmarks so that normal node operation does not incur this overhead.
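The dedup and whitelist rules above could be sketched like this. This is a minimal illustration of the counting logic only, assuming the rules as stated; it is not the actual Substrate tracking overlay, and the type and method names are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical benchmarking overlay: counts only the first chargeable
/// access per key, treats a read after a write as free, and skips
/// whitelisted keys entirely.
struct TrackingOverlay {
    whitelist: HashSet<Vec<u8>>,
    read_keys: HashSet<Vec<u8>>,
    written_keys: HashSet<Vec<u8>>,
    reads: u32,
    writes: u32,
}

impl TrackingOverlay {
    fn new(whitelist: HashSet<Vec<u8>>) -> Self {
        TrackingOverlay {
            whitelist,
            read_keys: HashSet::new(),
            written_keys: HashSet::new(),
            reads: 0,
            writes: 0,
        }
    }

    fn on_read(&mut self, key: &[u8]) {
        // Whitelisted, already written, or already read: treat as in-memory.
        if self.whitelist.contains(key)
            || self.written_keys.contains(key)
            || !self.read_keys.insert(key.to_vec())
        {
            return;
        }
        self.reads += 1;
    }

    fn on_write(&mut self, key: &[u8]) {
        // Whitelisted or already written: free.
        if self.whitelist.contains(key) || !self.written_keys.insert(key.to_vec()) {
            return;
        }
        self.writes += 1;
    }
}
```

A sequence like read `a`, read `a`, write `b`, read `b` would then be charged as exactly one read and one write.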

Generating Weight Formulas and Appropriate Rust Files

Finally once we have this data, we need to automate a process that puts it all together in a usable way.

The output of this automated process should be a Rust module, weights.rs, for each pallet.

Each benchmark written will generate an equivalent weight_for_* formula.

For example if I have:

benchmarks!{
     function_1 { 
         let a in 0 .. a_max;
         let b in 0 .. b_max;
         let c in 0 .. c_max;
         ...
     }
     function_2 { 
         let a in 0 .. a_max;
         let d in 0 .. d_max;
         let e in 0 .. e_max;
         ...
     }
     ...
}

would result in:

weight_for_function_1(a: u32, b: u32, c: u32) -> Weight { ... }
weight_for_function_2(a: u32, d: u32, e: u32) -> Weight { ... }

Then when integrating the weight information into our runtime, we simply write:

mod weights;
...
#[weight = weights::weight_for_function_1(param_1, param_2, param_3)]
fn function_1(origin, param_1, param_2, param_3) {
    ...
}

This would also work for any piecewise weight calculations like so:

#[weight = if approved {
    weights::weight_for_approve()
} else {
    weights::weight_for_disapproved()
}]
fn approve_or_disapprove(origin, approved: bool) {
    ...
}

When we modify logic and need to update the weights, we simply run the pipeline again, and these formulas with the same name will simply be updated to represent the new weight.

@xlc
Contributor

xlc commented May 27, 2020

I see this should generate a runtime weight description file (in whatever format is suitable), and with #5966 we can attach this file to our runtime so it overrides all built-in values.

@gui1117
Contributor

gui1117 commented May 28, 2020

Previously we talked about adding weight methods to the Currency trait,

but this strategy is not explained here under Benchmarking Runtime Execution.

How do you see that fitting in?

  • should benchmarks always assume that calls to external pallets hit their worst case?
  • or should each pallet A (with external methods) provide additional weight methods for its external methods, and should pallet B, when using these methods of pallet A, exclude the time for these methods from its benchmarks? i.e. a benchmark would only measure the time spent inside the pallet itself.

About automation, I see difficulties with refunds, but we can still do manual refunds in case automation can't be achieved.

@shawntabrizi
Member Author

shawntabrizi commented May 28, 2020

@thiolliere I updated my post a little bit, let me know if this helps.

Ultimately the automation would result in one generated weight formula per benchmark. If we wanted to add weights to all the methods in the Currency trait, we would need to write a benchmark for each of them. This would result in a simple formula we can call to return the weight of that function.

The assumptions behind the benchmarking should stay the same, and we should always be testing for the worst-case scenario. In the case of refunds, in most cases I would guess we can use the same benchmarking formula and just modify the parameter inputs.

so whereas we initially put:

weight_for_function_1(max_a, max_b, max_c)

Then on refund we would return:

weight_for_function_1(actual_a, actual_b, actual_c)

In the case we need a new formula, we should create a new benchmark for that logical path:

weight_for_function_1_alternative_path()
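The refund pattern described above could look roughly like this. All constants here are made up for illustration; the generated weights::weight_for_function_1 would have measured coefficients instead.

```rust
type Weight = u64;

/// Illustrative generated formula (coefficients are invented).
fn weight_for_function_1(a: u32, b: u32, c: u32) -> Weight {
    10_000 + 500 * a as Weight + 300 * b as Weight + 200 * c as Weight
}

/// Charge the worst case up front, then refund the difference between the
/// worst-case weight and the weight computed from the actual values.
fn refund(max: (u32, u32, u32), actual: (u32, u32, u32)) -> Weight {
    let charged = weight_for_function_1(max.0, max.1, max.2);
    let used = weight_for_function_1(actual.0, actual.1, actual.2);
    charged.saturating_sub(used)
}
```

With worst-case bounds (10, 10, 10) and actual values (1, 2, 3), the caller would be refunded 20 000 − 11 700 = 8 300 weight units.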

or should each pallet A (with external methods) give additional weight methods for its external methods and pallet B using these methods of pallet A should not include the time for these methods in their benchmark ? like benchmark would actually only benchmark for the time inside the pallet.

Can you elaborate? I didn't understand this point.

@kianenigma
Contributor

kianenigma commented May 28, 2020

One minor glitch: once we weight-ify all the inter-pallet traits (such as Currency) and have benchmarks for both balances::transfer(a, b) and currency::transfer(a, b), yielding nice weights::weight_for_transfer(a, b) and weights::weight_for_currency_transfer(a, b) functions, we must still remember that balances::transfer() calls into currency::transfer(), so the final annotation would need to be something along the lines of:

#[weight = weights::weight_for_transfer(a, b) + weights::weight_for_currency_transfer(a, b)]
fn transfer(a, b) {
     // other stuff 
     T::Currency::transfer(a, b)
     // more stuff
}

We could in theory write a code parser that scans the code and warns you if you use a T::Type::method() inside a dispatchable without it being present in the weight annotation. But that is hard to do and very hard to make accurate.

All in all, I wanted to point out that for calls with these nested, additional weights due to their use of traits, we always need to keep an eye on the code as well. The rest can be automated, though.


Honestly, after reading my comment, I am not quite sure my understanding of how a Currency trait with weights would work is right. Is the above example even correct?

I think my example is wrong: whatever the weight of T::Currency::transfer is, it is already accounted for by the given parameters of the function, and there's no need to add it to the weight formula.

But then I wonder why we even need to benchmark T::Currency::transfer? Perhaps only to learn the edge cases better?

@shawntabrizi
Member Author

@kianenigma good point. Actually, I have no idea how to address this in our benchmarks currently. One crazy solution would be to make the trait functions no-ops, but that would certainly break logic.

The other option could be to subtract weights to find the real value:

weight of function including transfer - weight of transfer = weight of function alone

@gavofyork
Member

other one could be that we subtract weights to find the real value:

Yeah, this is probably what we'll need to do.

So we'll likely want to somehow figure out those internal calls into the Currency trait impl (i.e. the balances pallet) and get their weights, so we can subtract from the final weight and replace with a call to the weight-getter of the Currency trait.

Otherwise we'll effectively be fixing the weights to those of the balances pallet (which is used in the tests/benchmarking runtime).

Might be that we need to have some tracking code in these inter-pallet traits' impls to help automate this during benchmarking. Otherwise it'll likely be down to manual annotation.
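The subtract-and-substitute approach could be sketched like this. All names and numbers below are hypothetical stand-ins for measured values, not anything the pipeline actually emits.

```rust
type Weight = u64;

/// Total measured for the dispatchable, including the inner
/// `Currency::transfer` call, in the benchmarking runtime (made-up number).
const MEASURED_TOTAL: Weight = 95_000;

/// Weight of the benchmarking runtime's balances `transfer` implementation,
/// as measured by its own benchmark (made-up number).
const BENCH_CURRENCY_TRANSFER: Weight = 60_000;

/// Subtract the benchmarking runtime's Currency impl from the total, then
/// add back the weight reported by whatever Currency impl the concrete
/// runtime actually configures.
fn weight_for_transfer(configured_currency_transfer: Weight) -> Weight {
    MEASURED_TOTAL.saturating_sub(BENCH_CURRENCY_TRANSFER) + configured_currency_transfer
}
```

A runtime whose Currency impl reports the same 60 000 gets the original 95 000 back; a heavier impl reporting 70 000 yields 105 000.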

@athei
Member

athei commented Jul 6, 2020

So we'll likely want to somehow figure out those internal calls into the Currency trait impl (i.e. the balances pallet) and get their weights, so we can subtract from the final weight and replace with a call to the weight-getter of the Currency trait.

But why would we do that? Is it "just" so we can calculate the corrected weight (post-dispatch) when Currency::transfer wasn't called?

Update decl_module macro to automatically generate the WeightInfo trait

Please make this generation optional. For the contracts pallet those won't correlate with the dispatchables of the pallet and therefore we need a custom WeightInfo trait there.

@shawntabrizi
Member Author

Session, Elections Phragmen, and Multisig are yet to be merged, but everything seems to be converted.
