Make tokenize tests readable #1868

krammnic · 2024-10-19T18:55:11Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
clean up

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

Tokenizer tests should be refactored (all models) #1823

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

I did not change any public API
I have added an example to docs or docstrings

This will require changes in CI (running pre-commit makes the expected_tokens lists unreadable).
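A minimal sketch of one way to keep such a list readable, assuming the pre-commit formatter is Black-compatible: Black leaves code between `# fmt: off` and `# fmt: on` untouched, so a hand-wrapped expected_tokens list is not reflowed. `toy_encode` is a hypothetical stand-in used only to make the example runnable; it is not a torchtune API.

```python
# Hypothetical stand-in for a real tokenizer (assumption, not torchtune code).
def toy_encode(text: str) -> list[int]:
    """Map each character to its Unicode code point."""
    return [ord(ch) for ch in text]


def test_toy_encode():
    # Black skips everything between the two markers, so the rows below
    # keep their hand-chosen layout after the formatter runs.
    # fmt: off
    expected_tokens = [
        72, 101, 108, 108, 111, 32,
        119, 111, 114, 108, 100, 33,
    ]
    # fmt: on
    assert toy_encode("Hello world!") == expected_tokens
```

Each row still has to fit within the line-length limit the linter enforces, so the markers only protect the layout, not overlong lines.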

pytorch-bot · 2024-10-19T18:55:14Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1868

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 8dd57b1 with merge base 3ca0d30:

NEW FAILURE - The following job has failed:

Lint / lint (3.10) (gh)
tests/torchtune/models/llama3/test_llama3_tokenizer.py:226:183: B950 line too long (182 > 120 characters)

This comment was automatically generated by Dr. CI and updates every 15 minutes.
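The failure above reports a 182-character line against a 120-character limit. As a rough sketch of how a long token list could be rewrapped into rows that fit, the helper below uses the standard-library `textwrap` module. The name `wrap_token_list` and its parameters are hypothetical and are not part of torchtune or its CI.

```python
import textwrap


def wrap_token_list(tokens, name="expected_tokens", width=120, indent=4):
    """Render `tokens` as Python source with every line at most `width` characters."""
    # Join the tokens with ", " and add a trailing comma when the list is non-empty.
    body = ", ".join(str(t) for t in tokens)
    if body:
        body += ","
    # Wrap only at the spaces between tokens; never split a token itself.
    rows = textwrap.wrap(
        body,
        width=width - indent,
        break_long_words=False,
        break_on_hyphens=False,
    )
    pad = " " * indent
    lines = [f"{name} = ["] + [pad + row for row in rows] + ["]"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print a wrapped list that can be pasted into a test file.
    print(wrap_token_list(list(range(128000, 128060))))
```

A formatter may still reflow the pasted rows to one element per line unless it is told to skip that region, so this only addresses the line-length check.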

krammnic · 2024-10-19T18:56:21Z

cc: @RdoubleA @joecummings What do you think? With the current lint formatting, working with these tests is really awful. This is a pretty minor fix.

codecov-commenter · 2024-10-20T15:46:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.15%. Comparing base (3ca0d30) to head (8dd57b1).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1868      +/-   ##
==========================================
- Coverage   69.69%   67.15%   -2.55%     
==========================================
  Files         308      308              
  Lines       16147    16133      -14     
==========================================
- Hits        11254    10834     -420     
- Misses       4893     5299     +406

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

krammnic · 2024-10-20T15:46:45Z

The lint CI should be changed at this point; otherwise the formatting of the expected_tokens lists will still be really bad.

make tokenize tests readable

8dd57b1

facebook-github-bot added the CLA Signed label Oct 19, 2024
