Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Null unit test inputs raise error for non-nullable columns #10418

Closed
2 tasks done
smoothml opened this issue Jul 8, 2024 · 4 comments
Closed
2 tasks done

[Bug] Null unit test inputs raise error for non-nullable columns #10418

smoothml opened this issue Jul 8, 2024 · 4 comments
Labels
bug Something isn't working unit tests Issues related to built-in dbt unit testing functionality wontfix Not a bug or out of scope for dbt-core

Comments

@smoothml
Copy link

smoothml commented Jul 8, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When using Clickhouse it is common practice to not allows columns to be nullable so as to improve the efficiency of analytic-type queries. However, this is not compatible with how unit tests in DBT handle missing values and so results in an error such as the following:

DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo.

Expected Behavior

If a column is not-nullable and a value not provided in the test input, the values used should be the appropriate non-null default. For example, Clickhouse would typically use 0 instead of NULL for a non-nullable numeric column.

Steps To Reproduce

Consider the following example, based on the output of the dbt init command:

-- my_first_dbt_model.sql
{{ config(materialized='table') }}

select 1 as id, 'a' AS foo
union all
select 2 as id, 'b' AS foo

-- my_second_dbt_model.sql
select *
from {{ ref('my_first_dbt_model') }}
where id = 1

-- schema.yml
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        data_type: uint64
      - name: foo
        data_type: string
  - name: my_second_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        data_type: uint64
      - name: foo
        data_type: string
unit_tests:
  - name: test_not_null
    model: my_second_dbt_model
    given:
      - input: ref('my_first_dbt_model')
        rows:
          - {id: 1}
          - {id: 2}
    expect:
      rows:
        - {id: 1}

Values for the column foo are omitted in the tests, with only the id values provided. Running dbt build or dbt test results in the error.

Relevant log output

>>> poetry run dbt build
11:10:02  Running with dbt=1.8.3
11:10:03  Registered adapter: clickhouse=1.8.0
11:10:03  Found 2 models, 443 macros, 1 unit test
11:10:03
11:10:03  Concurrency: 1 threads (target='test')
11:10:03
11:10:03  1 of 3 START sql table model `dap_test`.`my_first_dbt_model` ................... [RUN]
11:10:03  1 of 3 OK created sql table model `dap_test`.`my_first_dbt_model` .............. [OK in 0.07s]
11:10:03  2 of 3 START unit_test my_second_dbt_model::test_not_null ...................... [RUN]
11:10:03  2 of 3 ERROR my_second_dbt_model::test_not_null ................................ [ERROR in 0.04s]
11:10:03  3 of 3 SKIP relation dap_test.my_second_dbt_model .............................. [SKIP]
11:10:03
11:10:03  Finished running 1 table model, 1 unit test, 1 view model in 0 hours 0 minutes and 0.27 seconds (0.27s).
11:10:03
11:10:03  Completed with 1 error and 0 warnings:
11:10:03
11:10:03    Runtime Error in unit_test test_not_null (models/example/schema.yml)
  An error occurred during execution of unit test 'test_not_null'. There may be an error in the unit test definition: check the data types.
   Database Error
    Code: 70.
    DB::Exception: Cannot convert NULL to a non-nullable type: In scope SELECT CAST(1, 'UInt8') AS id, CAST(NULL, 'String') AS foo. Stack trace:

    0. Poco::Exception::Exception(String const&, int) @ 0x000000010ef61640
    1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000010906f044
    2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000104cb991c
    3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000104cc5178
    4. DB::(anonymous namespace)::FunctionCast::prepareUnpackDictionaries(std::shared_ptr<DB::IDataType const> const&, std::shared_ptr<DB::IDataType const> const&) const @ 0x000000010b9cb094
    5. DB::(anonymous namespace)::FunctionCast::prepare(std::vector<DB::ColumnWithTypeAndName, std::allocator<DB::ColumnWithTypeAndName>> const&) const @ 0x000000010b9c89e8
    6. DB::QueryAnalyzer::resolveFunction(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c68e090
    7. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679854
    8. DB::QueryAnalyzer::resolveExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool) @ 0x000000010c678dd8
    9. DB::QueryAnalyzer::resolveProjectionExpressionNodeList(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&) @ 0x000000010c698918
    10. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c674a04
    11. DB::QueryAnalyzer::resolveUnion(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c676848
    12. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679ae4
    13. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
    14. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
    15. DB::QueryAnalyzer::resolveExpressionNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, bool, bool, bool) @ 0x000000010c679764
    16. DB::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::IdentifierResolveScope&, DB::QueryExpressionsAliasVisitor&) @ 0x000000010c69a694
    17. DB::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::IdentifierResolveScope&) @ 0x000000010c6749c0
    18. DB::QueryAnalyzer::resolve(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::IQueryTreeNode> const&, std::shared_ptr<DB::Context const>) @ 0x000000010c673f88
    19. DB::QueryAnalysisPass::run(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::Context const>) @ 0x000000010c6738a0
    20. DB::QueryTreePassManager::run(std::shared_ptr<DB::IQueryTreeNode>) @ 0x000000010c5f47fc
    21. DB::(anonymous namespace)::buildQueryTreeAndRunPasses(std::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::shared_ptr<DB::Context const> const&, std::shared_ptr<DB::IStorage> const&) @ 0x000000010cd88d40
    22. DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&) @ 0x000000010cd86e58
    23. DB::InterpreterSelectQueryAnalyzer::getSampleBlock(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&) @ 0x000000010cd89160
    24. DB::InterpreterCreateQuery::getTablePropertiesAndNormalizeCreateQuery(DB::ASTCreateQuery&, DB::LoadingStrictnessLevel) const @ 0x000000010ccedbec
    25. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x000000010ccf2adc
    26. DB::InterpreterCreateQuery::execute() @ 0x000000010ccfc658
    27. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x000000010d0691b0
    28. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000010d065fd4
    29. DB::TCPHandler::runImpl() @ 0x000000010dd52f00
    30. DB::TCPHandler::run() @ 0x000000010dd68268
    31. Poco::Net::TCPServerConnection::start() @ 0x000000010f012378
11:10:03
11:10:03  Done. PASS=1 WARN=0 ERROR=1 SKIP=1 TOTAL=3

Environment

- OS: macOS
- Python: 3.12.3
- dbt: 1.8.3
- dbt-clickhouse: 1.8.0

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

We are using Clickhouse.

I previously brought this up in the Slack community and was directed here to raise an issue.

@smoothml smoothml added bug Something isn't working triage labels Jul 8, 2024
@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Jul 8, 2024
@dbeatty10
Copy link
Contributor

dbeatty10 commented Jul 8, 2024

Thanks for reaching out @smoothml !

It sounds to me like the root cause is that dbt-clickhouse doesn't yet support dbt unit tests rather than this being a bug in dbt-core. So I'm going to close this issue in favor of you opening an issue in dbt-clickhouse instead.

Specifically, it looks like dbt-clickhouse will need to override this line within get_empty_schema_sql:

      {{ cast('null', col['data_type']) }} as {{ col_name }}{{ ", " if not loop.last }}

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Jul 8, 2024
@dbeatty10 dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Jul 8, 2024
@smoothml
Copy link
Author

smoothml commented Jul 9, 2024

Thanks for getting back to me so quickly @dbeatty10. As you recommended, I have raised the issue with dbt-clickhouse.

@steffen030
Copy link

steffen030 commented Jul 10, 2024

Hello @dbeatty10 , I think the problem is rather in this line, where the unit test boldly assumes that null could be an appropriate value. However, for not nullable data it's obviously not a great choice of a default value. In it's current implementation, that piece is also hard to override by an adapter as it is not an adapter specific macro.

One might tamper with safe_cast macro, but if might interfere with a different context. It could make sense to create a separate adapter specific macro that allows the selection of an appropriate default value.

@dbeatty10
Copy link
Contributor

Thanks for identifying the relevant portion of the code @steffen030 ! 🤩

My belief is that null is the best value here for a couple reasons:

  1. Most databases support mandatory Feature ID E131
    "Null value support (nulls in lieu of values)" from ANSI SQL by default (rather than requiring an opt-in)
  2. It's the one value that is valid across most data types

It looks like the crux of the issue is that ClickHouse has some restrictions / differences in behavior as it relates to E131 that is preventing null from being used in this way. That will naturally have some consequences for the adapter implementation for ClickHouse/dbt-clickhouse#315, possibly including relatively complicated macro overrides.

If it turns out there is a macro abstraction that dbt should consider, we'd welcome a feature request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working unit tests Issues related to built-in dbt unit testing functionality wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

No branches or pull requests

3 participants