[Bug] Integration tests that move files can fail when run in parallel #4060

kwigley · 2021-10-14T00:42:30Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

Tests that copy or delete files in order to recreate user behavior end up modifying files in the Python source tree, which can lead to failures when tests run in parallel.
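As a rough illustration of how this can go wrong, here is a minimal sketch that interleaves two hypothetical test runs sharing a single models directory. The `count_models` helper and the worker labels are stand-ins invented for this example, not dbt code; only the file names `models-a`, `model_one.sql` and `model_two.sql` come from the test referenced in this issue.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def count_models(models_dir: Path) -> int:
    """Hypothetical stand-in for a dbt run: count the .sql files it would build."""
    return len(list(models_dir.glob("*.sql")))


def simulate_interleaving() -> Tuple[int, int]:
    """Interleave two simulated test runs that share one models directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shared_models = Path(tmp) / "models-a"
        shared_models.mkdir()
        (shared_models / "model_one.sql").write_text("select 1 as id")

        # Worker A: its initial run sees one model, then it adds a second file.
        a_initial = count_models(shared_models)
        (shared_models / "model_two.sql").write_text("select 2 as id")

        # Worker B: starts its own initial run now and expects one model,
        # but it also sees the file that worker A just added.
        b_initial = count_models(shared_models)
        return a_initial, b_initial


a_count, b_count = simulate_interleaving()
print(a_count, b_count)
```

In this sketch worker B's first count is already 2, so an assertion like `self.assertEqual(len(results), 1)` would fail even though worker B did nothing wrong.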

Example of workflow failing: https://github.com/dbt-labs/dbt-core/runs/3887780692?check_suite_focus=true#step:9:1350
Example of test that moves files:

dbt-core/test/integration/068_partial_parsing_tests/test_partial_parsing.py

Lines 26 to 215 in fd7c95d

def test_postgres_pp_models(self):
    # initial run
    self.run_dbt(['clean'])
    results = self.run_dbt(["run"])
    self.assertEqual(len(results), 1)

    # add a model file
    shutil.copyfile('extra-files/model_two.sql', 'models-a/model_two.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 2)

    # add a schema file
    shutil.copyfile('extra-files/models-schema1.yml', 'models-a/schema.yml')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 2)
    manifest = get_manifest()
    self.assertIn('model.test.model_one', manifest.nodes)
    model_one_node = manifest.nodes['model.test.model_one']
    self.assertEqual(model_one_node.description, 'The first model')
    self.assertEqual(model_one_node.patch_path, 'test://' + normalize('models-a/schema.yml'))

    # add a model and a schema file (with a test) at the same time
    shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
    shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
    results = self.run_dbt(["--partial-parse", "test"], expect_pass=False)
    self.assertEqual(len(results), 1)
    manifest = get_manifest()
    project_files = [f for f in manifest.files if f.startswith('test://')]
    self.assertEqual(len(project_files), 4)
    model_3_file_id = 'test://' + normalize('models-a/model_three.sql')
    self.assertIn(model_3_file_id, manifest.files)
    model_three_file = manifest.files[model_3_file_id]
    self.assertEqual(model_three_file.parse_file_type, ParseFileType.Model)
    self.assertEqual(type(model_three_file).__name__, 'SourceFile')
    model_three_node = manifest.nodes[model_three_file.nodes[0]]
    schema_file_id = 'test://' + normalize('models-a/schema.yml')
    self.assertEqual(model_three_node.patch_path, schema_file_id)
    self.assertEqual(model_three_node.description, 'The third model')
    schema_file = manifest.files[schema_file_id]
    self.assertEqual(type(schema_file).__name__, 'SchemaSourceFile')
    self.assertEqual(len(schema_file.tests), 1)
    tests = schema_file.get_all_test_ids()
    self.assertEqual(tests, ['test.test.unique_model_three_id.6776ac8160'])
    unique_test_id = tests[0]
    self.assertIn(unique_test_id, manifest.nodes)

    # modify model sql file, ensure description still there
    shutil.copyfile('extra-files/model_three_modified.sql', 'models-a/model_three.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    manifest = get_manifest()
    model_id = 'model.test.model_three'
    self.assertIn(model_id, manifest.nodes)
    model_three_node = manifest.nodes[model_id]
    self.assertEqual(model_three_node.description, 'The third model')

    # Change the model 3 test from unique to not_null
    shutil.copyfile('extra-files/models-schema2b.yml', 'models-a/schema.yml')
    results = self.run_dbt(["--partial-parse", "test"], expect_pass=False)
    manifest = get_manifest()
    schema_file_id = 'test://' + normalize('models-a/schema.yml')
    schema_file = manifest.files[schema_file_id]
    tests = schema_file.get_all_test_ids()
    self.assertEqual(tests, ['test.test.not_null_model_three_id.3162ce0a6f'])
    not_null_test_id = tests[0]
    self.assertIn(not_null_test_id, manifest.nodes.keys())
    self.assertNotIn(unique_test_id, manifest.nodes.keys())
    self.assertEqual(len(results), 1)

    # go back to previous version of schema file, removing patch, test, and model for model three
    shutil.copyfile('extra-files/models-schema1.yml', 'models-a/schema.yml')
    os.remove(normalize('models-a/model_three.sql'))
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 2)

    # remove schema file, still have 3 models
    shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
    os.remove(normalize('models-a/schema.yml'))
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)
    manifest = get_manifest()
    schema_file_id = 'test://' + normalize('models-a/schema.yml')
    self.assertNotIn(schema_file_id, manifest.files)
    project_files = [f for f in manifest.files if f.startswith('test://')]
    self.assertEqual(len(project_files), 3)

    # Put schema file back and remove a model
    # referred to in schema file
    shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
    os.remove(normalize('models-a/model_three.sql'))
    with self.assertRaises(CompilationException):
        results = self.run_dbt(["--partial-parse", "--warn-error", "run"])

    # Put model back again
    shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Add model four refing model three
    shutil.copyfile('extra-files/model_four1.sql', 'models-a/model_four.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 4)

    # Remove model_three and change model_four to ref model_one
    # and change schema file to remove model_three
    os.remove(normalize('models-a/model_three.sql'))
    shutil.copyfile('extra-files/model_four2.sql', 'models-a/model_four.sql')
    shutil.copyfile('extra-files/models-schema1.yml', 'models-a/schema.yml')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Remove model four, put back model three, put back schema file
    shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
    shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
    os.remove(normalize('models-a/model_four.sql'))
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Add a macro
    shutil.copyfile('extra-files/my_macro.sql', 'macros/my_macro.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)
    manifest = get_manifest()
    macro_id = 'macro.test.do_something'
    self.assertIn(macro_id, manifest.macros)

    # Modify the macro
    shutil.copyfile('extra-files/my_macro2.sql', 'macros/my_macro.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Add a macro patch
    shutil.copyfile('extra-files/models-schema3.yml', 'models-a/schema.yml')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Remove the macro
    os.remove(normalize('macros/my_macro.sql'))
    with self.assertRaises(CompilationException):
        results = self.run_dbt(["--partial-parse", "--warn-error", "run"])

    # put back macro file, got back to schema file with no macro
    # add separate macro patch schema file
    shutil.copyfile('extra-files/models-schema2.yml', 'models-a/schema.yml')
    shutil.copyfile('extra-files/my_macro.sql', 'macros/my_macro.sql')
    shutil.copyfile('extra-files/macros.yml', 'macros/macros.yml')
    results = self.run_dbt(["--partial-parse", "run"])

    # delete macro and schema file
    print(f"\n\n*** remove macro and macro_patch\n\n")
    os.remove(normalize('macros/my_macro.sql'))
    os.remove(normalize('macros/macros.yml'))
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Add an empty schema file
    shutil.copyfile('extra-files/empty_schema.yml', 'models-a/eschema.yml')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Add version to empty schema file
    shutil.copyfile('extra-files/empty_schema_with_version.yml', 'models-a/eschema.yml')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)

    # Disable model_three
    shutil.copyfile('extra-files/model_three_disabled.sql', 'models-a/model_three.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 2)
    manifest = get_manifest()
    model_id = 'model.test.model_three'
    self.assertIn(model_id, manifest.disabled)
    self.assertNotIn(model_id, manifest.nodes)

    # Edit disabled model three
    shutil.copyfile('extra-files/model_three_disabled2.sql', 'models-a/model_three.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 2)
    manifest = get_manifest()
    model_id = 'model.test.model_three'
    self.assertIn(model_id, manifest.disabled)
    self.assertNotIn(model_id, manifest.nodes)

    # Remove disabled from model three
    shutil.copyfile('extra-files/model_three.sql', 'models-a/model_three.sql')
    results = self.run_dbt(["--partial-parse", "run"])
    self.assertEqual(len(results), 3)
    manifest = get_manifest()
    model_id = 'model.test.model_three'
    self.assertIn(model_id, manifest.nodes)
    self.assertNotIn(model_id, manifest.disabled)

Expected Behavior

Tests should be able to run in parallel.

Steps To Reproduce

I've been able to reproduce this by running the tests in parallel locally, with something like:

python -m pytest -n12 -m profile_postgres test/integration

Relevant log output

No response

Environment

No response

What database are you using dbt with?

No response

Additional Context

No response


kwigley · 2021-10-14T00:47:32Z

A test's lifecycle looks something along the lines of:

1. create a temp dir
2. cd to this temp dir
3. symlink everything in the test module directory of the current test being run to this temp dir (ex: everything in test/integration/068_partial_parsing_tests)
4. write dbt_project.yml, profile, and other project files to this temp dir
5. run the test (still in the temp dir)
6. clean up the temp dir
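The lifecycle above can be sketched roughly as follows, to show why a file written during the test outlives the temp dir. This is a minimal illustration under assumptions, not the actual dbt test harness: `run_symlinked_lifecycle` is a hypothetical name, the `cd` step is replaced by absolute paths to keep the example free of global side effects, and it assumes a POSIX system where creating symlinks needs no extra privileges.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_symlinked_lifecycle(test_module_dir: Path) -> None:
    """Hypothetical sketch of the lifecycle: symlink fixtures, run, clean up."""
    test_module_dir = test_module_dir.resolve()
    tmp = Path(tempfile.mkdtemp())  # create a temp dir
    try:
        # Symlink everything in the test module directory into the temp dir.
        for entry in test_module_dir.iterdir():
            os.symlink(entry, tmp / entry.name, target_is_directory=entry.is_dir())
        # Write project files into the temp dir itself (a real file, not a link).
        (tmp / "dbt_project.yml").write_text("name: test\n")
        # "Run the test": the copy goes through the models-a symlink,
        # so the new file is actually created in the source directory.
        shutil.copyfile(
            tmp / "extra-files" / "model_two.sql",
            tmp / "models-a" / "model_two.sql",
        )
    finally:
        # Clean up: rmtree unlinks the symlinks but leaves their targets alone.
        shutil.rmtree(tmp)


# Build a throwaway stand-in for a test module directory.
source = Path(tempfile.mkdtemp())
(source / "models-a").mkdir()
(source / "extra-files").mkdir()
(source / "models-a" / "model_one.sql").write_text("select 1 as id")
(source / "extra-files" / "model_two.sql").write_text("select 2 as id")

run_symlinked_lifecycle(source)
leaked = sorted(p.name for p in (source / "models-a").iterdir())
print(leaked)  # the file added during the "test" is still in the source dir
shutil.rmtree(source)
```

Because the cleanup only removes the links, any other test process pointing at the same source directory can see `model_two.sql` while this test is running, and afterwards too if the test does not delete it.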

Here is an example of a possible solution: https://github.com/dbt-labs/dbt-core/compare/refactor-partial-parsing-tests

This solution copies and deletes files in a directory inside the temp dir that is not symlinked back to the source code. It is still not ideal; I want to use this opportunity to try out a better approach that keeps us from ending up in this scenario again.
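A minimal sketch of that idea, assuming the fixtures are copied into a real directory under the temp dir before the test touches them. This is not the code from the linked branch; `run_isolated` and the `models` working directory name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def run_isolated(test_module_dir: Path) -> List[str]:
    """Hypothetical sketch: mutate a private copy instead of a symlinked dir."""
    tmp = Path(tempfile.mkdtemp())
    try:
        # Copy (not symlink) the starting models into the temp dir.
        work_models = tmp / "models"
        shutil.copytree(test_module_dir / "models-a", work_models)
        # The test adds a file to its private copy only.
        shutil.copyfile(
            test_module_dir / "extra-files" / "model_two.sql",
            work_models / "model_two.sql",
        )
        return sorted(p.name for p in work_models.iterdir())
    finally:
        shutil.rmtree(tmp)


# Build a throwaway stand-in for a test module directory.
source = Path(tempfile.mkdtemp())
(source / "models-a").mkdir()
(source / "extra-files").mkdir()
(source / "models-a" / "model_one.sql").write_text("select 1 as id")
(source / "extra-files" / "model_two.sql").write_text("select 2 as id")

seen_by_test = run_isolated(source)
remaining = sorted(p.name for p in (source / "models-a").iterdir())
print(seen_by_test, remaining)
shutil.rmtree(source)
```

Here the test sees both models while the source directory still holds only `model_one.sql`, so parallel runs that start from the same fixtures no longer affect each other.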

kwigley added the bug and triage labels and removed the triage label Oct 14, 2021

gshank self-assigned this Oct 14, 2021

gshank mentioned this issue Oct 15, 2021

[#4060] Refactor partial parsing test to avoid file collisions #4068

Merged

4 tasks

gshank added a commit that referenced this issue Oct 15, 2021

[#4060] Refactor partial parsing test to avoid file collisions

e965d1a

gshank closed this as completed in #4068 Oct 15, 2021

gshank added a commit that referenced this issue Oct 15, 2021

[#4060] Refactor partial parsing test to avoid file collisions (#4068)

80ba716

gshank mentioned this issue Oct 15, 2021

Add files in partial parsing test inadvertently not checked in #4072

Merged

4 tasks
