Rework how PreTrainedModel.from_pretrained handles its arguments #866

anlsh · 2019-07-22T20:42:34Z

Unification of the from_pretrained functions belonging to various modules (GPT2PreTrainedModel, OpenAIGPTPreTrainedModel, BertPreTrainedModel) brought changes to the function's argument handling which don't cause any issues within the repository itself (afaik), but have the potential to break a variety of downstream code (eg. my own).

In the last release of pytorch_transformers (v0.6.2), the from_pretrained functions took in *args and **kwargs and passed them directly to the relevant model's constructor (perhaps with some processing along the way). For a typical example, see from_pretrained's signature in modeling.py here https:/huggingface/pytorch-transformers/blob/b832d5bb8a6dfc5965015b828e577677eace601e/pytorch_pretrained_bert/modeling.py#L526

and the relevant usage of said arguments (after some small modifications) https:/huggingface/pytorch-transformers/blob/b832d5bb8a6dfc5965015b828e577677eace601e/pytorch_pretrained_bert/modeling.py#L600

In the latest release, the function's signature remains unchanged but the *args and most of the **kwargs parameters, in particular pretty much anything not explicitly accessed in [1]
https:/huggingface/pytorch-transformers/blob/b33a385091de604afb566155ec03329b84c96926/pytorch_transformers/modeling_utils.py#L354-L358

is ignored. If a key of kwargs is shared with the relevant model's configuration file then its value is still used to override said key (see the relevant logic here), but the current architecture breaks, for example, the following pattern which was previously possible.

class UsefulSubclass(BertForSequenceClassification)
    def __init__(self, *args, useful_argument, **kwargs):
        super().__init__(*args, **kwargs)
        *logic*

...
bert = UsefulSubclass.from_pretrained(model_name, useful_argument=42).

What's more, if these arguments have default values declared in __init__ then the entire pattern is broken silently: because these default values will never be overwritten via pretrained instantiation. Thus end users might continue running experiments passing different values of useful_argument to from_pretrained, unaware that nothing is actually being changed

As evidenced by issue #833, I'm not the only one whose code was broken. This commit implements behavior which is a compromise between the old and new behaviors. From my docstring:

If config is None, then **kwargs will be passed to the model.
If config is *not* None, then kwargs will be used to
override any keys shared with the default configuration for the
given pretrained_model_name_or_path, and only the unshared
key/value pairs will be passed to the model.

It would actually be ideal to avoid mixing configuration and model parameters entirely (via some sort of model_args parameter for example): however this fix has the advantages of

Not breaking code written during the pytorch-pretrained-bert era
Preserving (to the extent possible) the usage of the from_pretrained.**kwargs parameter introduced with pytorch-transformers

I have also included various other (smaller) changes in this pull request:

~~Making PreTrainedModel.__init__ not accept *args and **kwargs parameters which it has no use for and currently ignores~~ Apparently necessary for the tests to pass :(
Stop using the the "popping from kwargs" antipattern (see [1]). Keyword arguments with default values achieve the same thing more quickly, and are strictly more informative since they linters/autodoc modules can actually make use of them. I've replaced all instances that I could find, if this pattern exists elsewhere it should be removed. Oops: turns out this is a Python 2 compatibility thing. With that said, is there really a need to continue supporting Python 2? Especially with its EOL coming up in just a few months, and especially when it necessitates such ugly code...
Subsume the fix included in Fixed PreTrainedModel.from_pretrained(...) not passing cache_dir to PretrainedConfig.from_pretrained(...) #864 , which would conflict (admittedly in a very minor fashion) with this PR.
Remove some trailing whitespace which seems to have infiltrated the file

anlsh · 2019-07-22T21:02:31Z

Hmm, well that's embarrassing. I'll inspect the failing tests some more to see what's up

thomwolf

Thanks, this is a nice solution, I like it.

I think we still need to add a mention of this new behavior in the migration guide since people might use at the same time the new Configuration classes (which have num_classes attributes so they will eat the num_classes supplied in **kwargs) and model derived from the old ForSequence pattern.

pytorch_transformers/modeling_utils.py

thomwolf · 2019-07-23T13:01:43Z

Regarding python 2, yes we want to keep supporting it and thanks for taking care of it.

Google (which is still using python 2) is a major supplier of pretrained model and architectures and having python 2 support in the library make the job of re-implementing the models a lot easier (I can load TF and PT models side-by-side) :)

thomwolf · 2019-07-23T13:10:46Z

I have updated the readme breaking change section on this (ba52fe6)

anlsh · 2019-07-23T14:52:41Z

Thanks for the feedback: In my latest commits I've updated the documentation as requested and renamed the return_unused_args parameter to return_unused_kwargs to remove any ambiguity.

I also removed the unused *args parameter from PreTrainedConfig.from_pretrained, which is the only actual interface/logic change

thomwolf · 2019-07-23T16:05:47Z

Looks good to me, thanks a lot @xanlsh!

Anish Moorthy added 2 commits July 22, 2019 18:03

Make PreTrainedModel.from_pretrained pass unused arguments to model

b8009cb

Fix PretrainedModel.from_pretrained not passing cache_dir forward

490ebbd

anlsh changed the title ~~Rework how PreTrainedModel.from_pretrained handles its arguments, other changes~~ Rework how PreTrainedModel.from_pretrained handles its arguments Jul 22, 2019

thomwolf requested changes Jul 23, 2019

View reviewed changes

pytorch_transformers/modeling_utils.py Show resolved Hide resolved

pytorch_transformers/modeling_utils.py Outdated Show resolved Hide resolved

thomwolf mentioned this pull request Jul 23, 2019

missing 1 required positional argument: 'num_classes' in 'from_pretrained' #833

Closed

thomwolf mentioned this pull request Jul 23, 2019

Fixed PreTrainedModel.from_pretrained(...) not passing cache_dir to PretrainedConfig.from_pretrained(...) #864

Closed

Anish Moorthy added 2 commits July 23, 2019 10:43

Add docs for from_pretrained functions, rename return_unused_args

e179c55

Remove unused *args parameter from PreTrainedConfig.from_pretrained

4fb56c7

thomwolf merged commit 368670a into huggingface:master Jul 23, 2019

thomwolf mentioned this pull request Jul 23, 2019

PreTrainedModel.from_pretrained(...) doesn't pass cache_dir to PretrainedConfig.from_pretrained(...) #863

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework how PreTrainedModel.from_pretrained handles its arguments #866

Rework how PreTrainedModel.from_pretrained handles its arguments #866

anlsh commented Jul 22, 2019 •

edited

Loading

anlsh commented Jul 22, 2019

thomwolf left a comment

thomwolf commented Jul 23, 2019

thomwolf commented Jul 23, 2019

anlsh commented Jul 23, 2019 •

edited

Loading

thomwolf commented Jul 23, 2019

Rework how PreTrainedModel.from_pretrained handles its arguments #866

Rework how PreTrainedModel.from_pretrained handles its arguments #866

Conversation

anlsh commented Jul 22, 2019 • edited Loading

anlsh commented Jul 22, 2019

thomwolf left a comment

Choose a reason for hiding this comment

thomwolf commented Jul 23, 2019

thomwolf commented Jul 23, 2019

anlsh commented Jul 23, 2019 • edited Loading

thomwolf commented Jul 23, 2019

anlsh commented Jul 22, 2019 •

edited

Loading

anlsh commented Jul 23, 2019 •

edited

Loading