Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Integration tests that move files can fail when run in parallel #4060

Closed
1 task done
kwigley opened this issue Oct 14, 2021 · 1 comment · Fixed by #4068 or #4072
Closed
1 task done

[Bug] Integration tests that move files can fail when run in parallel #4060

kwigley opened this issue Oct 14, 2021 · 1 comment · Fixed by #4068 or #4072
Assignees
Labels
bug Something isn't working

Comments

@kwigley
Copy link
Contributor

kwigley commented Oct 14, 2021

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Tests that move files in order to recreate user behavior modify python source code which can lead to failures when running tests in parallel.

Example of workflow failing: https:/dbt-labs/dbt-core/runs/3887780692?check_suite_focus=true#step:9:1350
Example of test that moves files:

def test_postgres_pp_models(self):
# initial run
self.run_dbt(['clean'])
results = self.run_dbt(["run"])
self.assertEqual(len(results), 1)
# add a model file
shutil.copyfile('extra-files/model_two.sql', 'models-a/model_two.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 2)
# add a schema file
shutil.copyfile('extra-files/models-schema1.yml', 'models-a/schema.yml')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 2)
manifest = get_manifest()
self.assertIn('model.test.model_one', manifest.nodes)
model_one_node = manifest.nodes['model.test.model_one']
self.assertEqual(model_one_node.description, 'The first model')
self.assertEqual(model_one_node.patch_path, 'test://' + normalize('models-a/schema.yml'))
# add a model and a schema file (with a test) at the same time
shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
results = self.run_dbt(["--partial-parse", "test"], expect_pass=False)
self.assertEqual(len(results), 1)
manifest = get_manifest()
project_files = [f for f in manifest.files if f.startswith('test://')]
self.assertEqual(len(project_files), 4)
model_3_file_id = 'test://' + normalize('models-a/model_three.sql')
self.assertIn(model_3_file_id, manifest.files)
model_three_file = manifest.files[model_3_file_id]
self.assertEqual(model_three_file.parse_file_type, ParseFileType.Model)
self.assertEqual(type(model_three_file).__name__, 'SourceFile')
model_three_node = manifest.nodes[model_three_file.nodes[0]]
schema_file_id = 'test://' + normalize('models-a/schema.yml')
self.assertEqual(model_three_node.patch_path, schema_file_id)
self.assertEqual(model_three_node.description, 'The third model')
schema_file = manifest.files[schema_file_id]
self.assertEqual(type(schema_file).__name__, 'SchemaSourceFile')
self.assertEqual(len(schema_file.tests), 1)
tests = schema_file.get_all_test_ids()
self.assertEqual(tests, ['test.test.unique_model_three_id.6776ac8160'])
unique_test_id = tests[0]
self.assertIn(unique_test_id, manifest.nodes)
# modify model sql file, ensure description still there
shutil.copyfile('extra-files/model_three_modified.sql', 'models-a/model_three.sql')
results = self.run_dbt(["--partial-parse", "run"])
manifest = get_manifest()
model_id = 'model.test.model_three'
self.assertIn(model_id, manifest.nodes)
model_three_node = manifest.nodes[model_id]
self.assertEqual(model_three_node.description, 'The third model')
# Change the model 3 test from unique to not_null
shutil.copyfile('extra-files/models-schema2b.yml', 'models-a/schema.yml')
results = self.run_dbt(["--partial-parse", "test"], expect_pass=False)
manifest = get_manifest()
schema_file_id = 'test://' + normalize('models-a/schema.yml')
schema_file = manifest.files[schema_file_id]
tests = schema_file.get_all_test_ids()
self.assertEqual(tests, ['test.test.not_null_model_three_id.3162ce0a6f'])
not_null_test_id = tests[0]
self.assertIn(not_null_test_id, manifest.nodes.keys())
self.assertNotIn(unique_test_id, manifest.nodes.keys())
self.assertEqual(len(results), 1)
# go back to previous version of schema file, removing patch, test, and model for model three
shutil.copyfile('extra-files/models-schema1.yml', 'models-a/schema.yml')
os.remove(normalize('models-a/model_three.sql'))
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 2)
# remove schema file, still have 3 models
shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
os.remove(normalize('models-a/schema.yml'))
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
manifest = get_manifest()
schema_file_id = 'test://' + normalize('models-a/schema.yml')
self.assertNotIn(schema_file_id, manifest.files)
project_files = [f for f in manifest.files if f.startswith('test://')]
self.assertEqual(len(project_files), 3)
# Put schema file back and remove a model
# referred to in schema file
shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
os.remove(normalize('models-a/model_three.sql'))
with self.assertRaises(CompilationException):
results = self.run_dbt(["--partial-parse", "--warn-error", "run"])
# Put model back again
shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Add model four refing model three
shutil.copyfile('extra-files/model_four1.sql', 'models-a/model_four.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 4)
# Remove model_three and change model_four to ref model_one
# and change schema file to remove model_three
os.remove(normalize('models-a/model_three.sql'))
shutil.copyfile('extra-files/model_four2.sql', 'models-a/model_four.sql')
shutil.copyfile('extra-files/models-schema1.yml', 'models-a/schema.yml')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Remove model four, put back model three, put back schema file
shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
os.remove(normalize('models-a/model_four.sql'))
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Add a macro
shutil.copyfile('extra-files/my_macro.sql', 'macros/my_macro.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
manifest = get_manifest()
macro_id = 'macro.test.do_something'
self.assertIn(macro_id, manifest.macros)
# Modify the macro
shutil.copyfile('extra-files/my_macro2.sql', 'macros/my_macro.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Add a macro patch
shutil.copyfile('extra-files/models-schema3.yml', 'models-a/schema.yml')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Remove the macro
os.remove(normalize('macros/my_macro.sql'))
with self.assertRaises(CompilationException):
results = self.run_dbt(["--partial-parse", "--warn-error", "run"])
# put back macro file, got back to schema file with no macro
# add separate macro patch schema file
shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
shutil.copyfile('extra-files/my_macro.sql', 'macros/my_macro.sql')
shutil.copyfile('extra-files/macros.yml', 'macros/macros.yml')
results = self.run_dbt(["--partial-parse", "run"])
# delete macro and schema file
print(f"\n\n*** remove macro and macro_patch\n\n")
os.remove(normalize('macros/my_macro.sql'))
os.remove(normalize('macros/macros.yml'))
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Add an empty schema file
shutil.copyfile('extra-files/empty_schema.yml', 'models-a/eschema.yml')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Add version to empty schema file
shutil.copyfile('extra-files/empty_schema_with_version.yml', 'models-a/eschema.yml')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
# Disable model_three
shutil.copyfile('extra-files/model_three_disabled.sql', 'models-a/model_three.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 2)
manifest = get_manifest()
model_id = 'model.test.model_three'
self.assertIn(model_id, manifest.disabled)
self.assertNotIn(model_id, manifest.nodes)
# Edit disabled model three
shutil.copyfile('extra-files/model_three_disabled2.sql', 'models-a/model_three.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 2)
manifest = get_manifest()
model_id = 'model.test.model_three'
self.assertIn(model_id, manifest.disabled)
self.assertNotIn(model_id, manifest.nodes)
# Remove disabled from model three
shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
results = self.run_dbt(["--partial-parse", "run"])
self.assertEqual(len(results), 3)
manifest = get_manifest()
model_id = 'model.test.model_three'
self.assertIn(model_id, manifest.nodes)
self.assertNotIn(model_id, manifest.disabled)

Expected Behavior

Tests should be able to run in parallel.

Steps To Reproduce

I've been able to reproduce by running test in parallel locally. Something like:

python -m pytest -n12 -m profile_postgres test/integration

Relevant log output

No response

Environment

No response

What database are you using dbt with?

No response

Additional Context

No response

@kwigley kwigley added bug Something isn't working triage and removed triage labels Oct 14, 2021
@kwigley
Copy link
Contributor Author

kwigley commented Oct 14, 2021

A test's lifecycle looks something along the lines of:

  • create a temp dir
  • cd to this temp dir
  • symlink everything in the test module directory of the current test being run to this temp dir (ex: everything in test/integration/068_partial_parsing_tests)
  • write dbt_project.yml , profile, and other project files to this temp dir
  • run the test (still in the temp dir)
  • clean up the temp dir

Here is an example of a possible solution: https:/dbt-labs/dbt-core/compare/refactor-partial-parsing-tests

This solution copies and deletes files in a directory in the temp dir that is not symlinked back to the source code. This is still not ideal, I want to use this opportunity to try out a better approach to protect from ending up in this scenario in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants