
Sourcery Starbot ⭐ refactored hasansalimkanmaz/transformers #1

Open · wants to merge 1 commit into base: main

Conversation

SourceryAI

Thanks for starring sourcery-ai/sourcery ✨ 🌟 ✨

Here's your pull request refactoring your most popular Python repo.

If you want Sourcery to refactor all your Python repos and incoming pull requests, install our bot.

Review changes via command line

To manually merge these changes, make sure you're on the main branch, then run:

```
git fetch https://github.com/sourcery-ai-bot/transformers main
git merge --ff-only FETCH_HEAD
git reset HEAD^
```

@SourceryAI left a comment

Sourcery timed out performing refactorings.

Due to GitHub API limits, only the first 60 comments can be shown.

conftest.py Outdated
```diff
-    make_reports = terminalreporter.config.getoption("--make-reports")
-    if make_reports:
+    if make_reports := terminalreporter.config.getoption("--make-reports"):
```

Function pytest_terminal_summary refactored with the following changes:
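This is the named-expression (walrus) pattern: the assignment and the truthiness test collapse into a single `:=` expression, available since Python 3.8. A minimal runnable sketch, where `get_option` and `OPTIONS` are hypothetical stand-ins for `terminalreporter.config.getoption`:

```python
# Hypothetical stand-in for terminalreporter.config.getoption
OPTIONS = {"--make-reports": "reports/"}

def get_option(name):
    return OPTIONS.get(name)

# Before: assign, then test the name on the next line.
make_reports = get_option("--make-reports")
if make_reports:
    print(f"generating reports in {make_reports}")

# After: the walrus operator binds and tests in one expression;
# make_reports stays available inside the block.
if make_reports := get_option("--make-reports"):
    print(f"generating reports in {make_reports}")
```

Both forms behave identically; the walrus version just removes one line per lookup site.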

```diff
@@ -67,6 +67,7 @@
     you need to go back to main before executing this.
     """
+
```


Lines 83-90 refactored with the following changes:

Comment on lines -238 to -227
```diff
-extras = {}
+extras = {"ja": deps_list("fugashi", "ipadic", "unidic_lite", "unidic")}

-extras["ja"] = deps_list("fugashi", "ipadic", "unidic_lite", "unidic")
```

Lines 238-240 refactored with the following changes:

```diff
-    make_reports = terminalreporter.config.getoption("--make-reports")
-    if make_reports:
+    if make_reports := terminalreporter.config.getoption("--make-reports"):
```

Function pytest_terminal_summary refactored with the following changes:

Comment on lines 65 to 68
```diff
-    if os.path.exists(path):
-        with open(path, "r") as f:
-            results = json.load(f)
-    else:
+    if not os.path.exists(path):
         raise ValueError(f"can't find {path}")
+    with open(path, "r") as f:
+        results = json.load(f)
```

Function get_results refactored with the following changes:
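The change inverts the condition into a guard clause: raise on the error path first, then handle the normal path without nesting. A self-contained sketch of the refactored shape (the `get_results` signature comes from the diff; the temp-file scaffolding and returning directly instead of assigning `results` are additions for the demo):

```python
import json
import os
import tempfile

def get_results(path):
    # Guard clause: fail fast, so the happy path stays unindented.
    if not os.path.exists(path):
        raise ValueError(f"can't find {path}")
    with open(path, "r") as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "results.json")
    with open(path, "w") as f:
        json.dump({"eval_accuracy": 0.91}, f)
    print(get_results(path))  # {'eval_accuracy': 0.91}
```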

Comment on lines 340 to 345
```diff
-    mask_indices = np.asarray([self.random_spans_noise_mask(expandend_input_length) for i in range(batch_size)])
+    mask_indices = np.asarray(
+        [
+            self.random_spans_noise_mask(expandend_input_length)
+            for _ in range(batch_size)
+        ]
+    )
```


Function FlaxDataCollatorForT5MLM.__call__ refactored with the following changes:

Comment on lines -290 to +216
```diff
-        else:
-            if self.train_file is not None:
-                extension = self.train_file.split(".")[-1]
-                assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
-            if self.validation_file is not None:
-                extension = self.validation_file.split(".")[-1]
-                assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
-            if self.test_file is not None:
-                extension = self.test_file.split(".")[-1]
-                assert extension in ["csv", "json"], "`test_file` should be a csv or a json file."
+        if self.train_file is not None:
+            extension = self.train_file.split(".")[-1]
+            assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
+        if self.validation_file is not None:
+            extension = self.validation_file.split(".")[-1]
+            assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
+        if self.test_file is not None:
+            extension = self.test_file.split(".")[-1]
+            assert extension in ["csv", "json"], "`test_file` should be a csv or a json file."
```

Function DataTrainingArguments.__post_init__ refactored with the following changes:

Comment on lines 335 to 340
```diff
-    layer_norm_named_params = set(
-        [
-            layer[-2:]
-            for layer_norm_name in layer_norm_candidates
-            for layer in flat_params.keys()
-            if layer_norm_name in "".join(layer).lower()
-        ]
-    )
+    layer_norm_named_params = {
+        layer[-2:]
+        for layer_norm_name in layer_norm_candidates
+        for layer in flat_params.keys()
+        if layer_norm_name in "".join(layer).lower()
+    }
```


Function create_train_state refactored with the following changes:
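`set([...])` materializes a list only to throw it away; a set comprehension builds the set directly and collapses duplicates as it goes. A toy reproduction, with made-up parameter keys standing in for `flat_params.keys()`:

```python
# Made-up flattened parameter keys, standing in for flat_params.keys()
flat_params = {
    ("encoder", "layer_norm", "scale"): 0,
    ("encoder", "LayerNorm", "bias"): 1,
    ("decoder", "dense", "kernel"): 2,
}
layer_norm_candidates = ["layernorm", "layer_norm"]

# Set comprehension: no intermediate list allocation.
layer_norm_named_params = {
    layer[-2:]
    for layer_norm_name in layer_norm_candidates
    for layer in flat_params.keys()
    if layer_norm_name in "".join(layer).lower()
}
print(layer_norm_named_params)
```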

Comment on lines -384 to +296
```diff
-    schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
-    return schedule_fn
+    return optax.join_schedules(
+        schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps]
+    )
```

Function create_learning_rate_fn refactored with the following changes:

Comment on lines -401 to +312
```diff
-            batch = shard(batch)
-
-            yield batch
+            yield shard(batch)
```

Function train_data_collator refactored with the following changes:

Comment on lines 418 to 415
```diff
-        batch = {k: np.array(v) for k, v in batch.items()}
-
-        yield batch
+        yield {k: np.array(v) for k, v in batch.items()}
```

Function eval_data_collator refactored with the following changes:

```diff
-            and not any(p["offsets"] == (0, 0) for p in predictions)
+            and all(p["offsets"] != (0, 0) for p in predictions)
```

Function postprocess_qa_predictions refactored with the following changes:
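The two predicates are equivalent by De Morgan's law: `not any(P(x) for x in xs)` is the same as `all(not P(x) for x in xs)`, including on an empty iterable (both are vacuously true). A quick check in the shape of the diff:

```python
# Three cases: no null offsets, one null offset, empty list.
for predictions in (
    [{"offsets": (0, 5)}, {"offsets": (6, 9)}],
    [{"offsets": (0, 0)}, {"offsets": (6, 9)}],
    [],
):
    before = not any(p["offsets"] == (0, 0) for p in predictions)
    after = all(p["offsets"] != (0, 0) for p in predictions)
    assert before == after
print("equivalent on all three cases")
```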

Comment on lines 347 to 341
```diff
+                start_index = int(start_indexes[i])
                 for j in range(end_n_top):
-                    start_index = int(start_indexes[i])
```

Function postprocess_qa_predictions_with_beam_search refactored with the following changes:

Comment on lines -313 to +216
```diff
-        else:
-            if self.train_file is not None:
-                extension = self.train_file.split(".")[-1]
-                assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
-            if self.validation_file is not None:
-                extension = self.validation_file.split(".")[-1]
-                assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
+        if self.train_file is not None:
+            extension = self.train_file.split(".")[-1]
+            assert extension in ["csv", "json"], "`train_file` should be a csv or a json file."
+        if self.validation_file is not None:
+            extension = self.validation_file.split(".")[-1]
+            assert extension in ["csv", "json"], "`validation_file` should be a csv or a json file."
```

Function DataTrainingArguments.__post_init__ refactored with the following changes:

Comment on lines 367 to 366
```diff
-        batch = {k: np.array(v) for k, v in batch.items()}
-
-        yield batch
+        yield {k: np.array(v) for k, v in batch.items()}
```

Function data_loader refactored with the following changes:

Comment on lines -350 to +267
```diff
-            batch = shard(batch)
-
-            yield batch
+            yield shard(batch)
```

Function train_data_collator refactored with the following changes:

Comment on lines 364 to 361
```diff
-        batch = {k: np.array(v) for k, v in batch.items()}
-
-        yield batch
+        yield {k: np.array(v) for k, v in batch.items()}
```

Function eval_data_collator refactored with the following changes:

Comment on lines -472 to +369
```diff
-    label_list = list(unique_labels)
-    label_list.sort()
+    label_list = sorted(unique_labels)
```

Function main refactored with the following changes:

This removes the following comments (why?):

# save checkpoint after each epoch and push checkpoint to the hub
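`sorted()` accepts any iterable (here a set of label strings) and returns a new sorted list in one expression, replacing the build-then-mutate pair. A small check with a hypothetical label set standing in for the `unique_labels` computed in `main()`:

```python
# Hypothetical NER-style label set
unique_labels = {"B-PER", "O", "B-LOC", "I-PER"}

# Before: two statements and an in-place mutation.
label_list = list(unique_labels)
label_list.sort()

# After: one expression, same result.
assert label_list == sorted(unique_labels)
print(sorted(unique_labels))  # ['B-LOC', 'B-PER', 'I-PER', 'O']
```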

Comment on lines -242 to +165
```diff
-    schedule_fn = optax.join_schedules(schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps])
-    return schedule_fn
+    return optax.join_schedules(
+        schedules=[warmup_fn, decay_fn], boundaries=[num_warmup_steps]
+    )
```

Function create_learning_rate_fn refactored with the following changes:

Comment on lines -20 to +29
```diff
-    if (
+    return (
         (cp >= 0x4E00 and cp <= 0x9FFF)
-        or (cp >= 0x3400 and cp <= 0x4DBF)  #
-        or (cp >= 0x20000 and cp <= 0x2A6DF)  #
-        or (cp >= 0x2A700 and cp <= 0x2B73F)  #
-        or (cp >= 0x2B740 and cp <= 0x2B81F)  #
-        or (cp >= 0x2B820 and cp <= 0x2CEAF)  #
+        or (cp >= 0x3400 and cp <= 0x4DBF)
+        or (cp >= 0x20000 and cp <= 0x2A6DF)
+        or (cp >= 0x2A700 and cp <= 0x2B73F)
+        or (cp >= 0x2B740 and cp <= 0x2B81F)
+        or (cp >= 0x2B820 and cp <= 0x2CEAF)
         or (cp >= 0xF900 and cp <= 0xFAFF)
-        or (cp >= 0x2F800 and cp <= 0x2FA1F)  #
-    ):  #
-        return True
-
-    return False
+        or (cp >= 0x2F800 and cp <= 0x2FA1F)
+    )
```

Function _is_chinese_char refactored with the following changes:

Comment on lines -48 to +47
```diff
-        chinese_word = len(token) > 1 and is_chinese(token)
-        if chinese_word:
+        if chinese_word := len(token) > 1 and is_chinese(token):
             word_set.add(token)
-    word_list = list(word_set)
-    return word_list
+    return list(word_set)
```

Function get_chinese_word refactored with the following changes:

Comment on lines -58 to +53
```diff
-    max_word_len = max([len(w) for w in chinese_word_set])
+    max_word_len = max(len(w) for w in chinese_word_set)
```

Function add_sub_symbol refactored with the following changes:
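Passing a generator expression straight to `max()` avoids allocating the temporary list; `max()` consumes the lengths lazily. A toy example with made-up words:

```python
# Made-up word set, standing in for chinese_word_set
chinese_word_set = {"中国", "北京大学", "你好"}

# Generator expression: no intermediate [len(w) for w in ...] list.
max_word_len = max(len(w) for w in chinese_word_set)
print(max_word_len)  # 4, from "北京大学"
```

Note that both forms raise `ValueError` on an empty set; `max(..., default=0)` guards against that if the set can be empty.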

Comment on lines 234 to 237
```diff
-        bool(training_args.local_rank != -1),
+        training_args.local_rank != -1,
         training_args.fp16,
     )
```


Function main refactored with the following changes:

Comment on lines 67 to 72
```diff
-        output = []
         next(f)  # skip the first line
-        for line in tqdm(f):
-            output.append((" ".join(line[1:5]), line[5], line[6], int(line[-1]) - 1))
+        output = [
+            (" ".join(line[1:5]), line[5], line[6], int(line[-1]) - 1)
+            for line in tqdm(f)
+        ]
```


Function load_rocstories_dataset refactored with the following changes:
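The append loop becomes a single list comprehension. A self-contained sketch with made-up rows in the shape of the parsed ROCStories TSV (header row already consumed, as `next(f)` does in the original):

```python
# Made-up rows mimicking csv.reader output for the ROCStories TSV
rows = [
    ["id1", "s1", "s2", "s3", "s4", "endingA", "endingB", "1"],
    ["id2", "t1", "t2", "t3", "t4", "endingC", "endingD", "2"],
]

# One expression replaces `output = []` plus the append loop:
# (joined story sentences, ending 1, ending 2, zero-based label)
output = [
    (" ".join(line[1:5]), line[5], line[6], int(line[-1]) - 1)
    for line in rows
]
print(output[0])  # ('s1 s2 s3 s4', 'endingA', 'endingB', 0)
```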

Comment on lines -192 to +193
```diff
-        return list(tokenize_and_encode(o) for o in obj)
+        return [tokenize_and_encode(o) for o in obj]
```

Function main refactored with the following changes:

Comment on lines -276 to +271
```diff
-        logger.info("LOOKING AT {} test".format(data_dir))
+        logger.info(f"LOOKING AT {data_dir} test")
```

Function RaceProcessor.get_test_examples refactored with the following changes:
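The `str.format()` calls here convert mechanically to f-strings, which inline the expression at the placeholder; the two forms produce identical output. A quick equivalence check with a hypothetical path:

```python
# Hypothetical data directory
data_dir = "data/RACE"

old_style = "LOOKING AT {} test".format(data_dir)
new_style = f"LOOKING AT {data_dir} test"

assert old_style == new_style
print(new_style)  # LOOKING AT data/RACE test
```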

Comment on lines -289 to +284
```diff
-        files = glob.glob(input_dir + "/*txt")
+        files = glob.glob(f"{input_dir}/*txt")
```

Function RaceProcessor._read_txt refactored with the following changes:

Comment on lines 300 to 296
```diff
-        for _, data_raw in enumerate(lines):
-            race_id = "%s-%s" % (set_type, data_raw["race_id"])
+        for data_raw in lines:
+            race_id = f'{set_type}-{data_raw["race_id"]}'
```

Function RaceProcessor._create_examples refactored with the following changes:

Comment on lines -325 to +320
```diff
-        logger.info("LOOKING AT {} train".format(data_dir))
+        logger.info(f"LOOKING AT {data_dir} train")
```

Function SynonymProcessor.get_train_examples refactored with the following changes:

Comment on lines -330 to +325
```diff
-        logger.info("LOOKING AT {} dev".format(data_dir))
+        logger.info(f"LOOKING AT {data_dir} dev")
```

Function SynonymProcessor.get_dev_examples refactored with the following changes:
