Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing crash when attempting to join on character(0) #4272

Merged
merged 11 commits into from
Jun 8, 2020
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,8 @@ unit = "s")

12. `rbindlist` no longer errors when coercing complex vectors to character vectors, [#4202](https:/Rdatatable/data.table/issues/4202). Thanks to @sritchie73 for reporting and the PR.

13. `X[Y, on=character(0)]` and `merge(X, Y, by.x=character(0), by.y=character(0))` no longer crash, [#4272](https:/Rdatatable/data.table/pull/4272). Thanks to @tlapak for the PR.

## NOTES

0. Retrospective license change permission was sought from and granted by 4 contributors who were missed in [PR#2456](https:/Rdatatable/data.table/pull/2456), [#4140](https:/Rdatatable/data.table/pull/4140). We had used [GitHub's contributor page](https:/Rdatatable/data.table/graphs/contributors) which omits 3 of these due to invalid email addresses, unlike GitLab's contributor page which includes the ids. The 4th omission was a PR to a script which should not have been excluded; a script is code too. We are sorry these contributors were not properly credited before. They have now been added to the contributors list as displayed on CRAN. All the contributors of code to data.table hold its copyright jointly; your contributions belong to you. You contributed to data.table when it had a particular license at that time, and you contributed on that basis. This is why in the last license change, all contributors of code were consulted and each had a veto.
Expand Down
2 changes: 1 addition & 1 deletion R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -3031,7 +3031,7 @@ isReallyReal = function(x) {
onsub = as.call(c(quote(c), onsub))
}
on = eval(onsub, parent.frame(2L), parent.frame(2L))
if (!is.character(on))
if (length(on) == 0L || !is.character(on))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, perfect. we also shouldn't have gotten to checking by.x&by.y separately in the first place because here by.x=by.y so simply by should be used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what you mean. At this point we're not checking separately if we come through merge. merge sets by=by.x and then later calls y[x, on=by]. If we don't check in merge we catch it here but this is the point where it gets caught when using x[y] syntax.

(I would've been really mad if you had pushed a fix yesterday.)

stop("'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.")
## extract the operators and potential variable names from 'on'.
## split at backticks to take care about variable names like `col1<=`.
Expand Down
4 changes: 2 additions & 2 deletions R/merge.R
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ merge.data.table = function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FAL
if (!missing(by) && !missing(by.x))
warning("Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.")
if (!is.null(by.x)) {
if ( !is.character(by.x) || !is.character(by.y))
stop("A non-empty vector of column names are required for `by.x` and `by.y`.")
if (length(by.x) == 0L || !is.character(by.x) || !is.character(by.y))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha! I was just looking at this code yesterday and something looked funny but I didn't bother stress testing it. nice catch!

stop("A non-empty vector of column names is required for `by.x` and `by.y`.")
if (!all(by.x %chin% names(x)))
stop("Elements listed in `by.x` must be valid column names in x.")
if (!all(by.y %chin% names(y)))
Expand Down
9 changes: 8 additions & 1 deletion inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -15869,7 +15869,7 @@ test(2074.31, dcast(DT, V1 ~ z, fun.aggregate=eval(quote(length)), value.var='z'
test(2074.32, fwrite(DT, logical01=TRUE, logicalAsInt=TRUE), error="logicalAsInt has been renamed")

# merge.data.table
test(2074.33, merge(DT, DT, by.x = 1i, by.y=1i), error="A non-empty vector of column names are required")
test(2074.33, merge(DT, DT, by.x = 1i, by.y=1i), error="A non-empty vector of column names is required")

# shift naming
test(2074.34, shift(list(a=1:5, b=6:10), give.names=TRUE), list(a_lag_1=c(NA, 1:4), b_lag_1=c(NA, 6:9)))
Expand Down Expand Up @@ -16846,3 +16846,10 @@ A = data.table(A=c(complex(real = 1:3, imaginary=c(0, -1, 1)), NaN))
test(2138.3, rbind(A,B), data.table(A=c(as.character(A$A), B$A)))
A = data.table(A=as.complex(rep(NA, 5)))
test(2138.4, rbind(A,B), data.table(A=c(as.character(A$A), B$A)))

# Attempting to join on character(0) shouldn't crash R
A = data.table(A='a')
B = data.table(B='b')
test(2139.1, A[B, on=character(0)], error = "'on' argument should be a named atomic vector")
test(2139.2, merge(A, B, by=character(0) ), error = "non-empty vector of column names for `by` is required.")
test(2139.3, merge(A, B, by.x=character(0), by.y=character(0)), error = "non-empty vector of column names is required")