Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature to add _n alias to same column names #3147 #3158

Merged
merged 3 commits into from
Mar 17, 2021

Conversation

techytushar
Copy link
Contributor

@techytushar techytushar commented Mar 11, 2021

Signed-off-by: Tushar Mittal [email protected]

resolves #3147

Description

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Mar 11, 2021
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up @techytushar, this is awesome! I took this for a spin locally, and it resolves both of the reproduction cases mentioned in #3147 (calling run_query() in a file, and the original RPC request).

  • Could you add a test? I think a unit test is sufficient, passing multiple columns with the same name to process_results() and checking that all columns are present before and after.
  • Could you add yourself to the list of contributors?

I'll also tag one of the folks from the Core team for python review. In particular, I'd like to know if a loop-based approach would have meaningful performance implications at scale.

@jtcohen6 jtcohen6 requested review from a team, gshank and kwigley and removed request for a team March 15, 2021 14:58
@gshank
Copy link
Contributor

gshank commented Mar 15, 2021

I think the performance implications are trivial, and the fix looks good to me.

@techytushar
Copy link
Contributor Author

Hi @jtcohen6 , thank you for the review. In the unit tests folder there are lot of files, please can you tell in which one should I add the test? Also I am unable to find the list of contributors, can you guide me in this as well 🙂

@jtcohen6
Copy link
Contributor

@techytushar Good call! It doesn't look like any of the existing unit test files have related coverage, so I think a new unit test file makes sense—something like unit/test_sql_result.py. Here's what I'm envisioning as the basic setup:

import unittest
from dbt.adapters.sql.connections import SQLConnectionManager

class TestProcessSQLResult(unittest.TestCase):
    def test_duplicated_columns(self):
        cols_with_one_dupe = ['a', 'b', 'a', 'd']
        rows = [(1, 2, 3, 4)]

        self.assertEqual(
            SQLConnectionManager.process_results(cols_with_dupe, rows),
            [{"a": 1, "b": 2, "a_2": 3, "d": 4}]
        )
        
        cols_with_more_dupes = ['a', 'a', 'a']
        
        ... more test cases ...

There's also a code style error from flake8, once fixed the rest of the CI suite will run:

core/dbt/adapters/sql/connections.py:107:80: E501 line too long (81 > 79 characters)

The Contributors section is just a few lines further down in the Changelog here.

Let me know if you have more questions, happy to help!

Copy link
Contributor

@kwigley kwigley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good 👍 thanks for the contribution

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @techytushar!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sql commands with same column name to dbt rpc produce weird results in table
4 participants