Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: no more user-required-changes to migration group loaders #6047

Merged
merged 5 commits into from
Jun 24, 2024

Conversation

kylemumma
Copy link
Member

@kylemumma kylemumma commented Jun 21, 2024

Users no longer need to make changes to the group loader when creating a new migration. They are now only responsible for naming the migration correctly xxxx_migration_name.py and putting it in the correct folder snuba_migrations/dataset/. The group loader will find the new migrations in these folders by itself.

Ordering the migrations run in is based on migration numbers.

edge cases and validation

Migration numbers must begin at 0001 and strictly increase by 1 which is validated. The following would cause errors:

  • 0000_migration_name.py
  • 0001_migration.py, 0002_migration.py, 0004_migration.py

how did i test

(see commit ac0f0a5 for this code)
I added assert to every single get_migrations to verify the new function generated the same as the hard-coded list. I ran snuba migrations list with the asserts in place to validate.

I verified edge-case / validation behavior by hand.

considerations

if users slightly misname their migration it will be ignored and they might not realize or if they do, understand why
such as 001_migration.py will be ignored and the only indication to user is that it wont show up in snuba list migrations

next steps

this will make my migration autogeneration easier since it doesnt need to modify the group loaders either.

@kylemumma kylemumma marked this pull request as ready for review June 21, 2024 15:40
@kylemumma kylemumma requested a review from a team as a code owner June 21, 2024 15:40
@kylemumma kylemumma changed the title verify that the new method produces correct results in all cases feat: no more user-changes to migration group loaders Jun 21, 2024
@kylemumma kylemumma changed the title feat: no more user-changes to migration group loaders feat: no more user-required-changes to migration group loaders Jun 21, 2024
f"Migrations in folder {migration_folder} were not formatted as expected. Expected migration number {str(i+1).zfill(4)} for migration {fname} but got {fname[:4]}"
)

return migration_filenames
Copy link
Member

@MeredithAnya MeredithAnya Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be caching the results of the migration filenames ? Or is this called infrequently enough that it wouldn't provide much benefit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what calls this function, but my guess (based on what the function does) would be that it is not called very frequently or under time sensitive circumstances.

Additionally since caching adds complexity, maybe it would be better to leave out unless we find it necessary in the future? Unless you have reason to believe that it may be necessary.

@MeredithAnya
Copy link
Member

if users slightly misname their migration it will be ignored and they might not realize or if they do, understand why
such as 001_migration.py will be ignored and the only indication to user is that it wont show up in snuba list migrations

This might be out of scope but could be cool to something similar to django migrations https://docs.djangoproject.com/en/5.0/topics/migrations/#workflow where the user can pass a name but it auto generates the numerical prefix

)
)
migration_filenames = []
pattern = re.compile(r"[0-9][0-9][0-9][0-9]_.*\.py")
Copy link
Contributor

@onewland onewland Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a nit but I think you could avoid the separate regex match plus file listing (os.listdir) by using glob: https://docs.python.org/3/library/glob.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Oliver I didn't know you could do this with glob. I updated it to use glob

Copy link
Contributor

@onewland onewland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm OK with this but I'd like us to consider the following:

I'm curious if we want the constraint that makes 0001_migration.py, 0002_migration.py, 0004_migration.py impossible. My thought: let's say 0003_migration.py makes an ALTER TABLE change that works in ClickHouse 21.x but not in 23.x. But the table works fine without it in both versions. This constraint forces us to keep the file around, maybe modifying it to be a no-op?

Also, consider the glob comment above.

@kylemumma
Copy link
Member Author

kylemumma commented Jun 21, 2024

This might be out of scope but could be cool to something similar to django migrations https://docs.djangoproject.com/en/5.0/topics/migrations/#workflow where the user can pass a name but it auto generates the numerical prefix

@MeredithAnya Good idea i'll make sure to have this in the autogeneration UI

I think I'm OK with this but I'd like us to consider the following:

I'm curious if we want the constraint that makes 0001_migration.py, 0002_migration.py, 0004_migration.py impossible. My thought: let's say 0003_migration.py makes an ALTER TABLE change that works in ClickHouse 21.x but not in 23.x. But the table works fine without it in both versions. This constraint forces us to keep the file around, maybe modifying it to be a no-op?

Also, consider the glob comment above.

@onewland Good point, it is an unnecessary constraint. I modified it s.t. the only constraint is non-duplicate migration numbers. Now something like this is valid: 0000, 0004, 0008
and they are ordered by migration number

@kylemumma kylemumma merged commit 869e1a7 into master Jun 24, 2024
28 checks passed
@kylemumma kylemumma deleted the krm/autogrouploader branch June 24, 2024 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants