
Add Doc Test GPT-2 #16439
Merged · 10 commits · Apr 12, 2022

Conversation

ArEnSc (Contributor) commented Mar 28, 2022

What does this PR do?

Fixes the broken doc tests for GPT-2.
Part of the documentation sprint work.

Fixes #16292

Before submitting

Who can review?

gpt2: @patrickvonplaten, @LysandreJik
Documentation: @sgugger

HuggingFaceDocBuilderDev commented Mar 28, 2022

The documentation is not available anymore as the PR was closed or merged.

@ArEnSc ArEnSc changed the title [WIP] - Add Doc Test GPT-2 Add Doc Test GPT-2 Mar 28, 2022
ArEnSc (Contributor, Author) commented Mar 28, 2022

I think this is what is required. Is something up with CI failing the code quality check?

#!/bin/bash -eo pipefail
black --check examples tests src utils
Skipping .ipynb files as Jupyter dependencies are not installed.
You can fix this by running ``pip install black[jupyter]``
would reformat src/transformers/models/gpt2/modeling_gpt2.py

Oh no! 💥 💔 💥
1 file would be reformatted, 1510 files would be left unchanged.

Exited with code exit status 1
CircleCI received exit code 1

@sgugger
Copy link
Collaborator

sgugger commented Mar 28, 2022

Yes, you need to run make style on your branch to make that test pass :-)

Pinging @ydshieh on this PR since Patrick is on vacation this week.

@ydshieh
Copy link
Collaborator

ydshieh commented Mar 28, 2022

Hi, @ArEnSc

Thank you for this PR!

In order to run make style, you will need to run

pip install transformers[quality]

If you haven't done this before.

@ydshieh ydshieh self-requested a review March 28, 2022 15:35
@ArEnSc
Copy link
Contributor Author

ArEnSc commented Mar 29, 2022

  python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_new_models tests/bert_new/test_modeling_bert_new.py
  shell: /usr/bin/bash -e {0}
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/.local/lib/python3.8/site-packages/pytest/__main__.py", line 5, in <module>
    raise SystemExit(pytest.console_main())
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 187, in console_main
    code = main()
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 145, in main
    config = _prepareconfig(args, plugins)
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 324, in _prepareconfig
    config = pluginmanager.hook.pytest_cmdline_parse(
  File "/home/runner/.local/lib/python3.8/site-packages/pluggy/_hooks.py", line 265, in __call__
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)
  File "/home/runner/.local/lib/python3.8/site-packages/pluggy/_manager.py", line 80, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
  File "/home/runner/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 55, in _multicall
    gen.send(outcome)
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/helpconfig.py", line 102, in pytest_cmdline_parse
    config: Config = outcome.get_result()
  File "/home/runner/.local/lib/python3.8/site-packages/pluggy/_result.py", line 60, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/home/runner/.local/lib/python3.8/site-packages/pluggy/_callers.py", line 39, in _multicall
    res = hook_impl.function(*args)
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 1016, in pytest_cmdline_parse
    self.parse(args)
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 1304, in parse
    self._preparse(args, addopts=addopts)
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/config/__init__.py", line 1187, in _preparse
    self.pluginmanager.load_setuptools_entrypoints("pytest11")
  File "/home/runner/.local/lib/python3.8/site-packages/pluggy/_manager.py", line 287, in load_setuptools_entrypoints
    plugin = ep.load()
  File "/usr/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/runner/.local/lib/python3.8/site-packages/dash/__init__.py", line 5, in <module>
    from .dash import Dash, no_update  # noqa: F401,E402
  File "/home/runner/.local/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 168, in exec_module
    exec(co, module.__dict__)
  File "/home/runner/.local/lib/python3.8/site-packages/dash/dash.py", line 18, in <module>
    from werkzeug.debug.tbtools import get_current_traceback
ImportError: cannot import name 'get_current_traceback' from 'werkzeug.debug.tbtools' (/home/runner/.local/lib/python3.8/site-packages/werkzeug/debug/tbtools.py)
I did run

make fixup
make style

Then I also merged master. What am I missing for this last piece? Let me know if I am missing something.

ydshieh (Collaborator) commented Mar 29, 2022

Hi, @ArEnSc

For this sprint, you don't need to test the model, but just to test the docstrings in model files.

You can see a guide here, under "For Python files".

Before you run, you need to

pip install -e ".[dev]"

Let me know if this works for you.
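For context, what this sprint checks is that the ``>>>`` examples inside model docstrings actually run and produce the stated output. A minimal, self-contained sketch of that mechanism (the `add` function below is illustrative, not taken from modeling_gpt2.py):

```python
import doctest

def add(a: int, b: int) -> int:
    """Add two integers.

    Example:

    >>> add(2, 3)
    5
    """
    return a + b

# pytest --doctest-modules does essentially this: find the ``>>>``
# examples in each docstring, execute them, and compare the actual
# output with what the docstring claims.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add, "add", module=False, globs={"add": add}):
    runner.run(test)
print(runner.failures)  # 0 when every docstring example matches
```

If a docstring's expected output drifts from reality (for example after a checkpoint swap), the failure count becomes non-zero, which is exactly what the broken GPT-2 doc tests were reporting.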

@@ -61,7 +61,7 @@

logger = logging.get_logger(__name__)

-_CHECKPOINT_FOR_DOC = "gpt2"
+_CHECKPOINT_FOR_DOC = "distilgpt2"
Collaborator:

This should not be changed. "gpt2" is the official checkpoint for GPT2 model, and it is used in the docstring example for GPT2Model and GPT2LMHeadModel.

Contributor Author:

Okay, got it, I will change it back. I had thought they wanted us to use a lower-resource version, as per this instruction:

Using a small model checkpoint instead of a large one: for example, change "facebook/bart-large" to "facebook/bart-base" (and adjust the expected outputs if any)
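The `_CHECKPOINT_FOR_DOC` constant matters because it gets substituted into the docstring code samples, so changing it changes every example in the file. A simplified sketch of that mechanism (the decorator below is a stand-in for illustration, not the actual transformers helper):

```python
# Simplified sketch: the checkpoint constant is formatted into a
# docstring template, which is why swapping _CHECKPOINT_FOR_DOC
# rewrites every code sample (and their expected outputs) at once.
_CHECKPOINT_FOR_DOC = "gpt2"

TEMPLATE = '''
Example:

    model = GPT2Model.from_pretrained("{checkpoint}")
'''

def add_code_sample(checkpoint: str):
    """Append a checkpoint-specific code sample to a function's docstring."""
    def decorator(fn):
        fn.__doc__ = (fn.__doc__ or "") + TEMPLATE.format(checkpoint=checkpoint)
        return fn
    return decorator

@add_code_sample(_CHECKPOINT_FOR_DOC)
def forward():
    """Runs the forward pass."""

print("gpt2" in forward.__doc__)  # the sample now names the official checkpoint
```

This is why the reviewers insist on "gpt2" here: the constant is not just a default, it is the checkpoint every reader of the GPT2Model and GPT2LMHeadModel docs will copy-paste.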

Comment on lines 1486 to 1499
expected_output=[
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
"LABEL_0",
],
Collaborator:

This is not a helpful example at all; also, this model seems to be a sequence classification model according to its model card. The loss of 0.0 is especially strange.

Collaborator:

Indeed, it is a text (sequence) classification model.

@ArEnSc Could you try the following GPT-2 token classification model?

https://huggingface.co/brad1141/gpt2-finetuned-comp2
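The distinction the reviewers are drawing is about output granularity: sequence classification returns one label for the whole input, while token classification returns one label per token, which is what the GPT2ForTokenClassification docstring should demonstrate. A toy sketch (no real model involved; the labels mirror the ones in this PR's diffs):

```python
# Toy illustration only: real models predict labels, these stubs
# just show the shape of the two tasks' outputs.
tokens = ["My", "name", "is", "Michael"]

def sequence_classify(tokens: list) -> str:
    # One label for the entire sequence.
    return "LABEL_0"

def token_classify(tokens: list) -> list:
    # One label per token, so the output length tracks the input length.
    return ["Lead" for _ in tokens]

print(sequence_classify(tokens))
print(token_classify(tokens))
```

Pointing a token-classification docstring at a sequence-classification checkpoint is why the original example produced an uninformative wall of identical labels.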

Contributor Author:

Ok, I will take a look at this and give it a shot.

ArEnSc (Contributor, Author) commented Mar 29, 2022

> Hi, @ArEnSc
>
> For this sprint, you don't need to test the model, but just to test the docstrings in model files.
>
> You can see a guide here, under "For Python files".
>
> Before you run, you need to
>
> pip install -e ".[dev]"
>
> Let me know if this works for you.

Yes, I did run the required commands, specifically:

python utils/prepare_for_doc_test.py src docs  # I didn't run this command, because I was specifically working on modeling_gpt2
python utils/prepare_for_doc_test.py src/transformers/utils/doc.py src/transformers/models/gpt2/modeling_gpt2.py

pytest --doctest-modules src/transformers/models/gpt2/modeling_gpt2.py -sv --doctest-continue-on-failure  # ran this to run the test

python utils/prepare_for_doc_test.py src docs --remove_new_line  # ran this to get everything back to normal

I am unsure how to stop CI from running the "add model like" runner, I suppose, as that error came from CI. Thanks, let me know!

ydshieh (Collaborator) commented Mar 29, 2022

@ArEnSc

For now, you can ignore the errors on the build_pr_documentation and Add new model like template tests from the CI.
We are currently working on these issues internally.

Commit: "Then used a token classification model over a sequence model for an example."
ArEnSc (Contributor, Author) commented Mar 30, 2022

@ydshieh @sgugger I think this should address the comments =)

ydshieh (Collaborator) left a comment

Great job & well done :-)
Thank you, @ArEnSc

"Lead",
"Lead",
"Lead",
],
ydshieh (Collaborator) commented Mar 31, 2022:

Hi @sgugger

Patrick delegated the responsibility to me. I am still wondering if you have any extra comments on this PR, though.

There are only 2 checkpoints on the Hub for GPT2 + token classification. This one is trained on a writing-document evaluation dataset, so the output for this example is not really meaningful. However, I am in favor of merging it as it is.

Contributor Author:

@sgugger just following up on this. Are we going to move on this and close the issue? Thanks =)

Collaborator:

Sorry, I missed that my input was required here. Thanks for the ping @ArEnSc! Fine by me, but we should deactivate formatting for that one line to avoid wasting all that vertical space (and have the list back on one line). You can do so with a comment

# fmt: off

before and

# fmt: on

after.
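Applied to the long expected_output list, the suggestion looks roughly like this (a hypothetical sketch with a shortened list; black leaves everything between the two markers untouched, so the list can stay on one line):

```python
# black would normally explode a long literal onto one item per line;
# the fmt markers tell it to leave this region exactly as written.
# fmt: off
expected_output = ["Lead", "Lead", "Lead", "Lead", "Lead", "Lead"]
# fmt: on

print(expected_output[0])  # the list content is unchanged, only its layout
```

The markers only affect formatting, not behavior, so the doc test still compares against the same labels.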

Contributor Author:

@sgugger will do!

sgugger (Collaborator) left a comment

LGTM! Thanks for your PR!


ydshieh (Collaborator) commented Apr 11, 2022

Hi @ArEnSc, ping me for the merge once you finish the # fmt: off thing mentioned by Sylvain :-)

ArEnSc (Contributor, Author) commented Apr 12, 2022

@ydshieh this one is good to go now! =)

@ydshieh ydshieh merged commit 924484e into huggingface:main Apr 12, 2022
ydshieh (Collaborator) commented Apr 12, 2022

@ArEnSc

> Hopefully ignores the formatting issue.

--> not just a hope, the dream comes true now :-)

Thank you again for the contribution. Merged!

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* First Pass All Tests Pass

* WIP

* Adding file to documentation tests

* Change the base model for the example in the doc test.

* Fix Code Styling by running
make fixup

* Called Style

* Reverted to gpt2 model rather than distill gpt2
Then used a token classification model over a sequence model for an example.

* Fix Styling Issue

* Hopefully ignores the formatting issue.

Co-authored-by: ArEnSc <[email protected]>