[Torch] Support returning quantized weights and bias for BYOC use cases #9135
This addresses the issue discussed in https://discuss.tvm.apache.org/t/qnn-pytorch-byoc-full-integer-qnn-support/11127
PyTorch stores quantized weights in a custom packed format, so we cannot directly access the 8-bit weights as Numpy arrays. We use a PyTorch function to unpack quantized weights into float32 arrays and their quantization parameters.
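For reference, a minimal sketch of what such unpacking looks like for a per-tensor quantized tensor (the accessors `dequantize()`, `q_scale()`, `q_zero_point()`, and `int_repr()` are standard PyTorch APIs; the shapes and parameters below are made up for illustration):

```python
import torch

# Per-tensor affine quantization of a float32 weight (illustrative values).
w_fp32 = torch.randn(16, 8)
qw = torch.quantize_per_tensor(w_fp32, scale=0.05, zero_point=0, dtype=torch.qint8)

# PyTorch stores qw in its own packed format; these accessors recover
# the pieces the frontend needs as plain values / Numpy arrays.
w_float = qw.dequantize().numpy()   # float32 weights, with scale applied back
scale = qw.q_scale()                # per-tensor quantization scale
zero_point = qw.q_zero_point()      # per-tensor zero point
w_int8 = qw.int_repr().numpy()      # raw int8 values
```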
By default, we use `qnn.op.quantize(...)` to recover the int8 weights in the QNN graph: the frontend returns float32 weights to users and relies on QNN lowering plus the Relay constant folding pass to quantize the weights at compile time. In BYOC use cases, however, we cannot apply the constant folding pass on a QNN graph. I added a new option to quantize weights in the frontend, using a function equivalent to `qnn.op.quantize(...)` that operates on Numpy arrays. In hindsight, we should have chosen this approach from the beginning. The old behavior is kept as the default for backward compatibility.

cc @comaniac