[CT-399] [Bug] seed : value made of underscore-separated numbers gets detected as int ("integer out of range" error) #4740
Comments
Hey @mdutoo, thanks for submitting this, and sorry for getting back to you late! This is what I had locally that solves the issue
@jtcohen6 One thing I noticed is that we get the agate table from the CSV, but it looks like we use only its column types, not its values, when running the seed. Is this the intended behavior?
This is a weird one indeed. The cause for this issue seems to lie in
ipdb> with open(path) as csvfile:
          reader = csv.reader(csvfile)
          for row in reader:
              print(", ".join(row))
underscore_detected_as_int
29051_00040
ipdb> import agate
ipdb> table = agate.Table.from_csv(path)
ipdb> table.print_table()
| underscore_detect... |
| -------------------- |
|        2,905,100,040 |
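The session above shows the raw CSV value surviving `csv.reader` but turning into a number once type inference is applied. A minimal sketch of one plausible explanation, assuming the inference ultimately relies on Python's own numeric parsing (the thread does not confirm this): since Python 3.6, `int()` and `Decimal()` accept underscores as digit separators, and the resulting number exceeds the range of a 4-byte PostgreSQL `integer`. The constant name `PG_INTEGER_MAX` is illustrative.

```python
from decimal import Decimal

# Upper bound of PostgreSQL's 4-byte signed integer type.
PG_INTEGER_MAX = 2**31 - 1

raw = "29051_00040"

# Underscores are accepted as digit separators (PEP 515),
# so the string parses as a number instead of raising ValueError.
as_int = int(raw)
as_decimal = Decimal(raw)

print(as_int)                   # the underscore is dropped
print(as_decimal == as_int)     # Decimal parses it the same way
print(as_int > PG_INTEGER_MAX)  # too large for a 4-byte integer column
```

If this assumption holds, it would account for both symptoms reported in the issue: the column being typed as an integer, and the database rejecting the value as out of range.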
@jtcohen6 This ticket came up in BLG today, but it seems that adding an explicit
Hi, thanks for having a look at this bug. To give you an example, I've considered enforcing column types in all my seeds to prevent other similar cases, but that's quite some work, so I even wrote some scripts to automate it! But that makes my dbt setup work differently from others', which is not good practice. And BTW I don't like putting configuration in dbt_project.yml and prefer YAML config files. My idea is that the nicest way is maybe to make all these types explicit, for instance, crudely, by generating the appropriate configuration in dbt_project.yml, using an operation or depending on a config variable. This way, finding the cause of the problem (and of all similar ones) and solving it (changing the enforced type) becomes really quick. So the easiest solution is probably just to tell developers in the documentation that type autodetection is there to help them start out faster, but that beyond that, enforcing column types is strongly advised because type autodetection is notoriously unreliable (and then see how to make that easier, as said).
@mdutoo Thanks for the response!
Good news: Since v0.21, you can also specify this config in a one-off YAML file:
# seeds/config.yml
version: 2
seeds:
  - name: seed_underscore_detected_as_int_example
    config:
      column_types:
        underscore_detected_as_int: varchar(32)
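For anyone who wants to enforce types across many seeds, a config file of this shape could be generated rather than written by hand. The following is only a sketch: `column_types_yaml` is a hypothetical helper, not part of dbt, and it assumes column names need no YAML quoting and that a single SQL type is acceptable as a starting point for every column.

```python
import csv
import io


def column_types_yaml(seed_name, csv_text, sql_type="text"):
    """Build a seed properties file that forces every column to sql_type.

    Hypothetical helper: reads only the CSV header row and emits YAML
    in the same shape as the hand-written config file.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    lines = [
        "version: 2",
        "seeds:",
        f"  - name: {seed_name}",
        "    config:",
        "      column_types:",
    ]
    # One entry per column; adjust individual types by hand afterwards.
    lines += [f"        {column}: {sql_type}" for column in header]
    return "\n".join(lines)


sample_csv = "underscore_detected_as_int\n29051_00040\n"
generated = column_types_yaml(
    "seed_underscore_detected_as_int_example", sample_csv, "varchar(32)"
)
print(generated)
```

Starting from an all-text config and loosening individual columns afterwards keeps every type decision explicit and easy to find.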
I agree with this! Would you be up for opening an issue (or even a PR) over at https://github.com/dbt-labs/docs.getdbt.com/issues? The current docs on the
In the meantime, I'm going to close the issue. We know that
Is there an existing issue for this?
Current Behavior
On dbt 1.0.1 / Python 3.8.10 / Ubuntu 20.04, running dbt seed on the following file:
seed_underscore_detected_as_int_example.csv
raises the following error:
14:59:39 Database Error in seed seed_underscore_detected_as_int_example (seeds/seed_underscore_detected_as_int_example.csv)
14:59:39 integer out of range
The reason seems to be that the "29051_00040" value is wrongly read as "2905100040" and the column type wrongly detected as integer.
Workaround: enforcing the column type as text solves it.
Expected Behavior
Type of a column with value "29051_00040" should be detected as text.
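To make the expected behavior concrete, here is a minimal sketch of a stricter detection rule under which an underscore-separated value falls back to text. It is not dbt's or agate's implementation; `looks_like_integer` and `infer_column_type` are hypothetical names, and the rule (optional sign followed by ASCII digits only) is an assumption about what "integer" should mean for a seed column.

```python
import re

# Optional sign followed by ASCII digits only: no underscores,
# thousands separators, or whitespace inside the number.
_STRICT_INT = re.compile(r"[+-]?[0-9]+")


def looks_like_integer(value: str) -> bool:
    """Return True only if the whole stripped string is a plain integer."""
    return _STRICT_INT.fullmatch(value.strip()) is not None


def infer_column_type(values):
    """Return 'integer' if every non-empty value is a plain integer, else 'text'."""
    non_empty = [v for v in values if v.strip()]
    if non_empty and all(looks_like_integer(v) for v in non_empty):
        return "integer"
    return "text"


print(infer_column_type(["29051_00040"]))
print(infer_column_type(["42", "-7"]))
```

Under this rule, a single non-conforming value is enough to type the whole column as text, which matches the behavior requested here.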
Steps To Reproduce
See above
Relevant log output
No response
Environment
What database are you using dbt with?
postgres
Additional Context
No response