Sourcery Starbot ⭐ refactored hasansalimkanmaz/transformers #1
base: main
Sourcery timed out performing refactorings.
Due to GitHub API limits, only the first 60 comments can be shown.
conftest.py
-make_reports = terminalreporter.config.getoption("--make-reports")
-if make_reports:
+if make_reports := terminalreporter.config.getoption("--make-reports"):
Function pytest_terminal_summary refactored with the following changes:
- Use named expression to simplify assignment and conditional (use-named-expression)
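For readers unfamiliar with the use-named-expression rule, here is a minimal sketch of the same pattern outside pytest. The helper `first_number` is hypothetical and not part of this PR; the `:=` operator requires Python 3.8 or later, which is worth checking against the versions the repository supports before accepting this kind of change.

```python
import re


def first_number(text):
    """Return the first integer found in text, or None if there is none."""
    # The assignment expression binds `match` and tests it in a single step,
    # replacing a separate assignment followed by `if match:`.
    if match := re.search(r"\d+", text):
        return int(match.group())
    return None
```

The bound name stays in scope after the `if`, so the rewrite behaves like the two-line original.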
@@ -67,6 +67,7 @@
 you need to go back to main before executing this.
 """
Lines 83-90 refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
-extras = {}
+extras = {"ja": deps_list("fugashi", "ipadic", "unidic_lite", "unidic")}
-extras["ja"] = deps_list("fugashi", "ipadic", "unidic_lite", "unidic")
Lines 238-240 refactored with the following changes:
- Merge dictionary assignment with declaration (merge-dict-assign)
examples/flax/conftest.py
-make_reports = terminalreporter.config.getoption("--make-reports")
-if make_reports:
+if make_reports := terminalreporter.config.getoption("--make-reports"):
Function pytest_terminal_summary refactored with the following changes:
- Use named expression to simplify assignment and conditional (use-named-expression)
examples/flax/test_flax_examples.py
-if os.path.exists(path):
-    with open(path, "r") as f:
-        results = json.load(f)
-else:
+if not os.path.exists(path):
     raise ValueError(f"can't find {path}")
+with open(path, "r") as f:
+    results = json.load(f)
Function get_results refactored with the following changes:
- Swap if/else branches (swap-if-else-branches)
- Remove unnecessary else after guard condition (remove-unnecessary-else)
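The two rules combine into a guard clause: the failure case is handled first and raises, so the success path needs no `else`. A minimal self-contained sketch follows; the function name `load_results` is hypothetical and simply mirrors the shape of `get_results` shown in the diff.

```python
import json
import os


def load_results(path):
    """Load a JSON results file, failing early if it does not exist."""
    # Guard clause: raising here means the code below runs only when the
    # file exists, so no else branch is required.
    if not os.path.exists(path):
        raise ValueError(f"can't find {path}")
    with open(path, "r") as f:
        return json.load(f)
```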
-mask_indices = np.asarray([self.random_spans_noise_mask(expandend_input_length) for i in range(batch_size)])
+mask_indices = np.asarray(
+    [
+        self.random_spans_noise_mask(expandend_input_length)
+        for _ in range(batch_size)
+    ]
+)
Function FlaxDataCollatorForT5MLM.__call__ refactored with the following changes:
- Replace unused for index with underscore (for-index-underscore)
-else:
-    if self.train_file is not None:
-        extension = self.train_file.split(".")[-1]
-        assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
-    if self.validation_file is not None:
-        extension = self.validation_file.split(".")[-1]
-        assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
-    if self.test_file is not None:
-        extension = self.test_file.split(".")[-1]
-        assert extension in ["csv", "json"], "`test_file` should be a csv or a json file."
+if self.train_file is not None:
+    extension = self.train_file.split(".")[-1]
+    assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
+if self.validation_file is not None:
+    extension = self.validation_file.split(".")[-1]
+    assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
+if self.test_file is not None:
+    extension = self.test_file.split(".")[-1]
+    assert extension in ["csv", "json"], "`test_file` should be a csv or a json file."
Function DataTrainingArguments.__post_init__ refactored with the following changes:
- Remove unnecessary else after guard condition (remove-unnecessary-else)
-layer_norm_named_params = set(
-    [
-        layer[-2:]
-        for layer_norm_name in layer_norm_candidates
-        for layer in flat_params.keys()
-        if layer_norm_name in "".join(layer).lower()
-    ]
-)
+layer_norm_named_params = {
+    layer[-2:]
+    for layer_norm_name in layer_norm_candidates
+    for layer in flat_params.keys()
+    if layer_norm_name in "".join(layer).lower()
+}
Function create_train_state refactored with the following changes:
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
- Replace unneeded comprehension with generator (comprehension-to-generator)
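A quick way to convince yourself that this rewrite preserves behavior is to run both forms on toy data. The candidate names and parameter keys below are made up for illustration and are not taken from the Flax example scripts.

```python
# Hypothetical stand-ins for layer_norm_candidates and flat_params.keys().
candidates = ["layernorm", "ln"]
keys = [
    ("encoder", "LayerNorm", "scale"),
    ("encoder", "dense", "kernel"),
    ("decoder", "ln_f", "bias"),
]

# Original form: build a list, then convert it to a set.
via_builtin = set(
    [key[-2:] for name in candidates for key in keys if name in "".join(key).lower()]
)

# Refactored form: a set comprehension skips the intermediate list.
via_comprehension = {
    key[-2:] for name in candidates for key in keys if name in "".join(key).lower()
}
```

Both expressions yield the same set; the comprehension only avoids allocating the temporary list.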
-schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
-return schedule_fn
+return optax.join_schedules(
+    schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps]
+)
Function create_learning_rate_fn refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-batch = shard(batch)
-yield batch
+yield shard(batch)
Function train_data_collator refactored with the following changes:
- Inline variable that is immediately yielded (inline-immediately-yielded-variable)
-batch = {k: np.array(v) for k, v in batch.items()}
-yield batch
+yield {k: np.array(v) for k, v in batch.items()}
Function eval_data_collator refactored with the following changes:
- Inline variable that is immediately yielded (inline-immediately-yielded-variable)
-and not any(p["offsets"] == (0, 0) for p in predictions)
+and all(p["offsets"] != (0, 0) for p in predictions)
Function postprocess_qa_predictions refactored with the following changes:
- Invert any/all to simplify comparisons (invert-any-all)
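The invert-any-all rule rests on De Morgan's law: "no element satisfies P" is the same as "every element satisfies not P". The sketch below checks the equivalence on small inputs; the two function names are hypothetical wrappers around the expressions in the diff.

```python
def no_null_offsets_any(predictions):
    # Original form: negate an existential check.
    return not any(p["offsets"] == (0, 0) for p in predictions)


def no_null_offsets_all(predictions):
    # Refactored form: a universal check over the negated condition.
    return all(p["offsets"] != (0, 0) for p in predictions)
```

Both forms return True for an empty list, so the edge case is preserved as well.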
+start_index = int(start_indexes[i])
 for j in range(end_n_top):
-    start_index = int(start_indexes[i])
Function postprocess_qa_predictions_with_beam_search refactored with the following changes:
- Hoist statements out of for/while loops (hoist-statement-from-loop)
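Hoisting is safe here because the statement depends only on the outer index `i`, not on `j`. A simplified sketch of the before and after shapes is below; both functions are hypothetical and far smaller than the real beam-search post-processing.

```python
def pairs_unhoisted(start_indexes, end_indexes):
    pairs = []
    for i in range(len(start_indexes)):
        for j in range(len(end_indexes)):
            # Loop-invariant: depends only on i, yet runs once per j.
            start_index = int(start_indexes[i])
            pairs.append((start_index, int(end_indexes[j])))
    return pairs


def pairs_hoisted(start_indexes, end_indexes):
    pairs = []
    for i in range(len(start_indexes)):
        # Hoisted: computed once per outer iteration.
        start_index = int(start_indexes[i])
        for j in range(len(end_indexes)):
            pairs.append((start_index, int(end_indexes[j])))
    return pairs
```

One difference to keep in mind when reviewing such a change: the hoisted statement now runs even when the inner loop is empty, which only matters if the statement can raise or has side effects.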
-else:
-    if self.train_file is not None:
-        extension = self.train_file.split(".")[-1]
-        assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
-    if self.validation_file is not None:
-        extension = self.validation_file.split(".")[-1]
-        assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
+if self.train_file is not None:
+    extension = self.train_file.split(".")[-1]
+    assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
+if self.validation_file is not None:
+    extension = self.validation_file.split(".")[-1]
+    assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
Function DataTrainingArguments.__post_init__ refactored with the following changes:
- Remove unnecessary else after guard condition (remove-unnecessary-else)
-batch = {k: np.array(v) for k, v in batch.items()}
-yield batch
+yield {k: np.array(v) for k, v in batch.items()}
Function data_loader refactored with the following changes:
- Inline variable that is immediately yielded (inline-immediately-yielded-variable)
-batch = shard(batch)
-yield batch
+yield shard(batch)
Function train_data_collator refactored with the following changes:
- Inline variable that is immediately yielded (inline-immediately-yielded-variable)
-batch = {k: np.array(v) for k, v in batch.items()}
-yield batch
+yield {k: np.array(v) for k, v in batch.items()}
Function eval_data_collator refactored with the following changes:
- Inline variable that is immediately yielded (inline-immediately-yielded-variable)
-label_list = list(unique_labels)
-label_list.sort()
+label_list = sorted(unique_labels)
Function main refactored with the following changes:
- Remove an unnecessary list construction call prior to sorting (skip-sorted-list-construction)
- Simplify if expression by using or [×2] (or-if-exp-identity)
- Merge nested if conditions (merge-nested-ifs)
This removes the following comments (why?):
# save checkpoint after each epoch and push checkpoint to the hub
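For the skip-sorted-list-construction change, `sorted()` accepts any iterable and always returns a new list, so the explicit `list(...)` followed by `.sort()` is redundant. A small check with made-up labels:

```python
unique_labels = {"PER", "LOC", "ORG"}  # illustrative labels, not from the script

# Original form: copy into a list, then sort in place.
label_list_old = list(unique_labels)
label_list_old.sort()

# Refactored form: sorted() builds the sorted list directly.
label_list_new = sorted(unique_labels)
```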
-schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
-return schedule_fn
+return optax.join_schedules(
+    schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps]
+)
Function create_learning_rate_fn refactored with the following changes:
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-if (
+return (
     (cp >= 0x4E00 and cp <= 0x9FFF)
-    or (cp >= 0x3400 and cp <= 0x4DBF)  #
-    or (cp >= 0x20000 and cp <= 0x2A6DF)  #
-    or (cp >= 0x2A700 and cp <= 0x2B73F)  #
-    or (cp >= 0x2B740 and cp <= 0x2B81F)  #
-    or (cp >= 0x2B820 and cp <= 0x2CEAF)  #
+    or (cp >= 0x3400 and cp <= 0x4DBF)
+    or (cp >= 0x20000 and cp <= 0x2A6DF)
+    or (cp >= 0x2A700 and cp <= 0x2B73F)
+    or (cp >= 0x2B740 and cp <= 0x2B81F)
+    or (cp >= 0x2B820 and cp <= 0x2CEAF)
     or (cp >= 0xF900 and cp <= 0xFAFF)
-    or (cp >= 0x2F800 and cp <= 0x2FA1F)  #
-):  #
-    return True
-return False
+    or (cp >= 0x2F800 and cp <= 0x2FA1F)
+)
Function _is_chinese_char refactored with the following changes:
- Simplify boolean if expression (boolean-if-exp-identity)
- Remove unnecessary casts to int, str, float or bool (remove-unnecessary-cast)
- Lift code into else after jump in control flow (reintroduce-else)
- Replace if statement with if expression (assign-if-exp)
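The core of this change is returning the boolean expression itself instead of branching to `return True` / `return False`. As an illustration only, and not what Sourcery generated, the same idea can be written in a table-driven form; `CJK_RANGES` and `is_cjk_codepoint` are hypothetical names, and the ranges are copied from the diff above.

```python
# Inclusive (low, high) code point ranges taken from the diff.
CJK_RANGES = (
    (0x4E00, 0x9FFF),
    (0x3400, 0x4DBF),
    (0x20000, 0x2A6DF),
    (0x2A700, 0x2B73F),
    (0x2B740, 0x2B81F),
    (0x2B820, 0x2CEAF),
    (0xF900, 0xFAFF),
    (0x2F800, 0x2FA1F),
)


def is_cjk_codepoint(cp):
    # Return the boolean result directly; no if/return True/return False.
    return any(low <= cp <= high for low, high in CJK_RANGES)
```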
-    chinese_word = len(token) > 1 and is_chinese(token)
-    if chinese_word:
+    if chinese_word := len(token) > 1 and is_chinese(token):
         word_set.add(token)
-word_list = list(word_set)
-return word_list
+return list(word_set)
Function get_chinese_word refactored with the following changes:
- Use named expression to simplify assignment and conditional (use-named-expression)
- Inline variable that is immediately returned (inline-immediately-returned-variable)
-max_word_len = max([len(w) for w in chinese_word_set])
+max_word_len = max(len(w) for w in chinese_word_set)
Function add_sub_symbol refactored with the following changes:
- Replace unneeded comprehension with generator (comprehension-to-generator)
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
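For comprehension-to-generator, `max()` consumes any iterable, so the square brackets only add a temporary list. A tiny check with illustrative words (not from the script):

```python
chinese_word_set = {"中国", "北京市", "人"}  # sample data for illustration

# Original form: materialize a list of lengths first.
max_len_list = max([len(w) for w in chinese_word_set])

# Refactored form: pass a generator expression straight to max().
max_len_gen = max(len(w) for w in chinese_word_set)
```

Both forms raise `ValueError` on an empty set, so the rewrite does not change the error behavior either.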
-bool(training_args.local_rank != -1),
+training_args.local_rank != -1,
 training_args.fp16,
 )
Function main refactored with the following changes:
- Remove unnecessary casts to int, str, float or bool [×2] (remove-unnecessary-cast)
- Merge else clause's nested if statement into elif (merge-else-if-into-elif)
- Merge dictionary updates via the union operator (dict-assign-update-to-union)
examples/legacy/run_openai_gpt.py
-output = []
 next(f)  # skip the first line
-for line in tqdm(f):
-    output.append((" ".join(line[1:5]), line[5], line[6], int(line[-1]) - 1))
+output = [
+    (" ".join(line[1:5]), line[5], line[6], int(line[-1]) - 1)
+    for line in tqdm(f)
+]
Function load_rocstories_dataset refactored with the following changes:
- Move assignment closer to its usage within a block (move-assign-in-block)
- Convert for loop into list comprehension (list-comprehension)
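The ordering matters in this rewrite: `next(f)` must still run before the comprehension so the header row is skipped. The sketch below reproduces the pattern on an in-memory CSV. It assumes `f` is a `csv.reader`, which the diff does not show, and omits `tqdm`; `load_rows` is a hypothetical name.

```python
import csv
import io


def load_rows(text):
    f = csv.reader(io.StringIO(text))
    next(f)  # skip the header row before the comprehension consumes the rest
    return [
        (" ".join(line[1:5]), line[5], line[6], int(line[-1]) - 1)
        for line in f
    ]
```

For example, a file with one header line and one data row yields a single tuple whose label is shifted from 1-based to 0-based.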
-return list(tokenize_and_encode(o) for o in obj)
+return [tokenize_and_encode(o) for o in obj]
Function main refactored with the following changes:
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
- Invert any/all to simplify comparisons (invert-any-all)
-logger.info("LOOKING AT {} test".format(data_dir))
+logger.info(f"LOOKING AT {data_dir} test")
Function RaceProcessor.get_test_examples refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
-files = glob.glob(input_dir + "/*txt")
+files = glob.glob(f"{input_dir}/*txt")
Function RaceProcessor._read_txt refactored with the following changes:
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
-for _, data_raw in enumerate(lines):
-    race_id = "%s-%s" % (set_type, data_raw["race_id"])
+for data_raw in lines:
+    race_id = f'{set_type}-{data_raw["race_id"]}'
Function RaceProcessor._create_examples refactored with the following changes:
- Remove unnecessary calls to enumerate when the index is not used (remove-unused-enumerate)
- Replace interpolated string formatting with f-string (replace-interpolation-with-fstring)
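Both rules in this comment are behavior-preserving: dropping `enumerate` removes an index that was discarded anyway, and `%s` and f-string interpolation both format values with `str()`. A minimal comparison, using hypothetical helper names and sample records:

```python
def race_ids_old(set_type, lines):
    ids = []
    for _, data_raw in enumerate(lines):  # index is never used
        ids.append("%s-%s" % (set_type, data_raw["race_id"]))
    return ids


def race_ids_new(set_type, lines):
    ids = []
    for data_raw in lines:
        # Single quotes outside so the dictionary key can keep double quotes.
        ids.append(f'{set_type}-{data_raw["race_id"]}')
    return ids
```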
-logger.info("LOOKING AT {} train".format(data_dir))
+logger.info(f"LOOKING AT {data_dir} train")
Function SynonymProcessor.get_train_examples refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
-logger.info("LOOKING AT {} dev".format(data_dir))
+logger.info(f"LOOKING AT {data_dir} dev")
Function SynonymProcessor.get_dev_examples refactored with the following changes:
- Replace call to format with f-string (use-fstring-for-formatting)
Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨
Here's your pull request refactoring your most popular Python repo.
If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.
Review changes via command line
To manually merge these changes, make sure you're on the main branch, then run: