Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null unit test inputs raise error for non-nullable columns #315

Open
smoothml opened this issue Jul 9, 2024 · 0 comments
Open

Null unit test inputs raise error for non-nullable columns #315

smoothml opened this issue Jul 9, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@smoothml
Copy link

smoothml commented Jul 9, 2024

Describe the bug

When using Clickhouse it is common practice to not allows columns to be nullable so as to improve the efficiency of analytic-type queries. However, this is not compatible with how unit tests in DBT handle missing values and so results in an error such as the following:

DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo.

I originally raised this in dbt-core (here) and was directed here. One of the maintainers suggests where the root cause may be and suggested a fix in this comment:

It sounds to me like the root cause is that dbt-clickhouse doesn't yet support dbt unit tests rather than this being a bug in dbt-core. So I'm going to close this issue in favor of you opening an issue in dbt-clickhouse instead.

Specifically, it looks like dbt-clickhouse will need to override this line within get_empty_schema_sql:

      {{ cast('null', col['data_type']) }} as {{ col_name }}{{ ", " if not loop.last }}

Steps to reproduce

Consider the following example, based on the output of the dbt init command:

-- my_first_dbt_model.sql
{{ config(materialized='table') }}

select 1 as id, 'a' AS foo
union all
select 2 as id, 'b' AS foo

-- my_second_dbt_model.sql
select *
from {{ ref('my_first_dbt_model') }}
where id = 1

-- schema.yml
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        data_type: uint64
      - name: foo
        data_type: string
  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        data_type: uint64
      - name: foo
        data_type: string
unit_tests:
  - name: test_not_null
    model: my_second_dbt_model
    given:
      - input: ref('my_first_dbt_model')
        rows:
          - {id: 1}
          - {id: 2}
    expect:
      rows:
        - {id: 1}

Values for the column foo are omitted in the tests, with only the id values provided. Running dbt build or dbt test results in the error.

Expected behaviour

If a column is not-nullable and a value not provided in the test input, the values used should be the appropriate non-null default. For example, Clickhouse would typically use 0 instead of NULL for a non-nullable numeric column.

dbt and/or ClickHouse server logs

Running dbt build on the above example results in the following stacktrace:

>>> poetry run dbt build
11:10:02  Running with dbt=1.8.3
11:10:03  Registered adapter: clickhouse=1.8.0
11:10:03  Found 2 models, 443 macros, 1 unit test
11:10:03
11:10:03  Concurrency: 1 threads (target='test')
11:10:03
11:10:03  1 of 3 START sql table model `dap_test`.`my_first_dbt_model` ................... [RUN]
11:10:03  1 of 3 OK created sql table model `dap_test`.`my_first_dbt_model` .............. [OK in 0.07s]
11:10:03  2 of 3 START unit_test my_second_dbt_model::test_not_null ...................... [RUN]
11:10:03  2 of 3 ERROR my_second_dbt_model::test_not_null ................................ [ERROR in 0.04s]
11:10:03  3 of 3 SKIP relation dap_test.my_second_dbt_model .............................. [SKIP]
11:10:03
11:10:03  Finished running 1 table model, 1 unit test, 1 view model in 0 hours 0 minutes and 0.27 seconds (0.27s).
11:10:03
11:10:03  Completed with 1 error and 0 warnings:
11:10:03
11:10:03    Runtime Error in unit_test test_not_null (models/example/schema.yml)
  An error occurred during execution of unit test 'test_not_null'. There may be an error in the unit test definition: check the data types.
   Database Error
    Code: 70.
    DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo. Stack trace:

    0. Poco::Exception::Exception(String const&, int) @ 0x000000010ef61640
    1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000010906f044
    2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000104cb991c
    3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000104cc5178
    4. DB::(anonymous namespace)::FunctionCast::prepareUnpackDictionaries(std::shared_ptr<DB::IDataType const> const&, std::shared_ptr<DB::IDataType const> const&) const @ 0x000000010b9cb094
    5. DB::(anonymous namespace)::FunctionCast::prepare(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&) const @ 0x000000010b9c89e8
    6. DB::QueryAnalyzer::resolveFunction(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c68e090
    7. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679854
    8. DB::QueryAnalyzer::resolveExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool) @ 0x000000010c678dd8
    9. DB::QueryAnalyzer::resolveProjectionExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c698918
    10. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c674a04
    11. DB::QueryAnalyzer::resolveUnion(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c676848
    12. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679ae4
    13. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
    14. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
    15. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679764
    16. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
    17. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
    18. DB::QueryAnalyzer::resolve(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::IQueryTreeNode> const&, std::shared_ptr<DB::Context const>) @ 0x000000010c673f88
    19. DB::QueryAnalysisPass::run(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::Context const>) @ 0x000000010c6738a0
    20. DB::QueryTreePassManager::run(std::shared_ptr<DB::IQueryTreeNode>) @ 0x000000010c5f47fc
    21. DB::(anonymous namespace)::buildQueryTreeAndRunPasses(std::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::shared_ptr<DB::Context const> const&, std::shared_ptr<DB::IStorage> const&) @ 0x000000010cd88d40
    22. DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&) @ 0x000000010cd86e58
    23. DB::InterpreterSelectQueryAnalyzer::getSampleBlock(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&) @ 0x000000010cd89160
    24. DB::InterpreterCreateQuery::getTablePropertiesAndNormalizeCreateQuery(DB::ASTCreateQuery&, DB::LoadingStrictnessLevel) const @ 0x000000010ccedbec
    25. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x000000010ccf2adc
    26. DB::InterpreterCreateQuery::execute() @ 0x000000010ccfc658
    27. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x000000010d0691b0
    28. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000010d065fd4
    29. DB::TCPHandler::runImpl() @ 0x000000010dd52f00
    30. DB::TCPHandler::run() @ 0x000000010dd68268
    31. Poco::Net::TCPServerConnection::start() @ 0x000000010f012378
11:10:03
11:10:03  Done. PASS=1 WARN=0 ERROR=1 SKIP=1 TOTAL=3

Environment

  • dbt version: 1.8.3
  • dbt-clickhouse version: 1.8.0
  • clickhouse-driver version (if using native): 0.2.8
  • clickhouse-connect version (if using http): 0.7.15
  • Python version: I've observed this behaviour with 3.10 and 3.12
  • Operating system: macOS

ClickHouse server

  • ClickHouse Server version: 24.5.3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant