Skip to content

Commit

Permalink
DT[NA] returns one all-NA row again, while retaining fix that DT[some…
Browse files Browse the repository at this point in the history
…Col==3] where DT is one row and someCol is NA, returns no rows for consistency with nrow>1 cases. +with=FALSE with := now warns. +nomatch with := now warns. +logical i no longer recycles unless length 1 or nrow. #1001 #1002 #759 #354 #166 and closes #1252.
  • Loading branch information
mattdowle committed Sep 29, 2016
1 parent 879a6ef commit 95e438c
Show file tree
Hide file tree
Showing 7 changed files with 180 additions and 115 deletions.
9 changes: 8 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,8 @@

46. `fread` gets new argument `file` which expect existing file on input, to ensure no shell commands will be executed when reading file. Closes [#1702](https:/Rdatatable/data.table/issues/1702).

47. `:=` can now add new columns when its RHS is length 0. An all-NA column is created of the same type as the empty RHS.

#### BUG FIXES

1. Now compiles and runs on IBM AIX gcc. Thanks to Vinh Nguyen for investigation and testing, [#1351](https:/Rdatatable/data.table/issues/1351).
Expand Down Expand Up @@ -188,7 +190,7 @@

41. `by=.EACHI` works as expected along with `mult="first"` and `mult="last"`, [#1287](https:/Rdatatable/data.table/issues/1287) and [#1271](https:/Rdatatable/data.table/issues/1271).

42. Subsets using logical expressions in `i` never returns all-`NA` rows. Edge case `DT[NA]` is now fixed, [#1252](https:/Rdatatable/data.table/issues/1252). Thanks to @sergiizaskaleta.
42. Subsets using logical expressions in `i` (e.g. `DT[someCol==3]`) no longer return an unintended all-NA row when `DT` consists of a single row and `someCol` contains `NA`, fixing [#1252](https:/Rdatatable/data.table/issues/1252). Thanks to @sergiizaskaleta for reporting. If `i` is the reserved symbol `NA` though (i.e. `DT[NA]`) it is still auto converted to `DT[NA_integer_]` so that a single `NA` row is returned as almost surely expected. For consistency with past behaviour and to save confusion when comparing to `DT[c(NA,1)]`.

43. `setattr()` catches logical input that points to R's global TRUE value and sets attributes on a copy instead, along with a warning, [#1281](https:/Rdatatable/data.table/issues/1281). Thanks to @tdeenes.

Expand Down Expand Up @@ -332,6 +334,11 @@

35. The option `datatable.old.bywithoutby` to restore the old default has been removed. As warned 2 years ago in release notes and explicitly warned about for 1 year when used. Search down this file for the text 'bywithoutby' to see previous notes on this topic.

36. Using `with=FALSE` together with `:=` was deprecated in v1.9.4 released 2 years ago (Oct 2014). As warned then in release notes (see below) this is now a warning with advice to wrap the LHS of `:=` with parenthesis; e.g. `myCols=c("colA","colB"); DT[,(myCols):=1]`.

37. Using `nomatch` together with `:=` now warns that it is ignored.

38. Logical `i` is no longer recycled. Instead an error message if it isn't either length 1 or `nrow(DT)`. This was hiding more bugs than was worth the rare convenience. The error message suggests to recycle explcitly; i.e. `DT[rep(<logical>,length=.N),...]`.

### Changes in v1.9.6 (on CRAN 19 Sep 2015)

Expand Down
107 changes: 60 additions & 47 deletions R/data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -389,7 +389,15 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
if (is.expression(jsub)) jsub = jsub[[1L]] # if expression, convert it to call
# Note that the dynamic expression could now be := (new in v1.9.7)
}
if (is.call(jsub) && jsub[[1L]] == ":=") allow.cartesian=TRUE # (see #800)
if (is.call(jsub) && jsub[[1L]] == ":=") {
allow.cartesian=TRUE # (see #800)
if (!missing(i) && !missing(keyby))
stop(":= with keyby is only possible when i is not supplied since you can't setkey on a subset of rows. Either change keyby to by or remove i")
if (!missingnomatch) {
warning("nomatch isn't relevant together with :=, ignoring nomatch")
nomatch=0L
}
}
}
}
bysub=NULL
Expand Down Expand Up @@ -421,6 +429,18 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
if (!missing(i)) {
xo = NULL
isub = substitute(i)
if (identical(isub, NA)) {
# only possibility *isub* can be NA (logical) is the symbol NA itself; i.e. DT[NA]
# replace NA in this case with NA_integer_ as that's almost surely what user intended to
# return a single row with NA in all columns. (DT[0] returns an empty table, with correct types.)
# Any expression (including length 1 vectors) that evaluates to a single NA logical will
# however be left as NA logical since that's important for consistency to return empty in that
# case; e.g. DT[Col==3] where DT is 1 row and Col contains NA.
# Replacing the NA symbol makes DT[NA] and DT[c(1,NA)] consistent and provides
# an easy way to achieve a single row of NA as users expect rather than requiring them
# to know and change to DT[NA_integer_].
isub=NA_integer_
}
isnull_inames = FALSE
nqgrp = integer(0) # for non-equi join
nqmaxgrp = 1L # for non-equi join
Expand Down Expand Up @@ -751,14 +771,21 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
# i is not a data.table
if (!is.logical(i) && !is.numeric(i)) stop("i has not evaluated to logical, integer or double")
if (is.logical(i)) {
if (isTRUE(i)) irows = i = NULL # fixes #1249
else if (identical(i, NA)) irows=i=integer(0) # DT[NA] thread recycling of NA logical exists,
# but for #1252 and consistency, we need to return 0-rows
else if (length(i)==nrow(x)) irows = i = which(i) # e.g. DT[colA>3,which=TRUE]
# also replacing 'i' here - to save memory, #926.
else irows=seq_len(nrow(x))[i] # e.g. recycling DT[c(TRUE,FALSE),which=TRUE], for completeness
# it could also be DT[!TRUE, which=TRUE] (silly cases, yes).
# replaced the "else if (!isTRUE(i))" to just "else". Fixes bug report #4930
if (isTRUE(i)) irows=i=NULL
# NULL is efficient signal to avoid creating 1:nrow(x) but still return all rows, fixes #1249

else if (length(i)<=1L) irows=i=integer(0)
# FALSE, NA and empty. All should return empty data.table. The NA here will be result of expression,
# where for consistency of edge case #1252 all NA to be removed. If NA is a single NA symbol, it
# was auto converted to NA_integer_ higher up for ease of use and convenience. We definitely
# don't want base R behaviour where DF[NA,] returns an entire copy filled with NA everywhere.

else if (length(i)==nrow(x)) irows=i=which(i)
# The which() here auto removes NA for convenience so user doesn't need to remember "!is.na() & ..."
# Also this which() is for consistenty of DT[colA>3,which=TRUE] and which(DT[,colA>3])
# Assigning to 'i' here as well to save memory, #926.

else stop("i evaluates to a logical vector length ", length(i), " but there are ", nrow(x), " rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.")
} else {
irows = as.integer(i) # e.g. DT[c(1,3)] and DT[c(-1,-3)] ok but not DT[c(1,-3)] (caught as error)
irows = .Call(CconvertNegativeIdx, irows, nrow(x)) # simplifies logic from here on (can assume positive subscripts)
Expand Down Expand Up @@ -817,25 +844,18 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {

# j was substituted before dealing with i so that := can set allow.cartesian=FALSE (#800) (used above in i logic)
if (is.null(jsub)) return(NULL)
if (is.call(jsub) && jsub[[1L]]==":=") {
# short circuit do-nothing, don't do further checks on .SDcols for example
if (identical(irows, integer())) {
if (identical(nomatch, 0L)) {
.global$print = address(x)
return(invisible(x)) # irows=NULL means all rows at this stage
} else irows = rep(NA_integer_, nrow(x)) # fix for #759
}
if (!with) {
if (is.null(names(jsub)) && is.name(jsub[[2L]])) {
# TO DO: uncomment these warnings in next release. Later, make both errors.
## warning("with=FALSE is deprecated when using :=. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples.")
jsub[[2L]] = eval(jsub[[2L]], parent.frame(), parent.frame())
} else {
## warning("with=FALSE ignored, it isn't needed when using :=. See ?':=' for examples.")
}
with = TRUE

if (!with && is.call(jsub) && jsub[[1L]]==":=") {
# TODO: make these both errors in next release. Then remove in release after that.
if (is.null(names(jsub)) && is.name(jsub[[2L]])) {
warning("with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning.")
jsub[[2L]] = eval(jsub[[2L]], parent.frame(), parent.frame())
} else {
warning("with=FALSE ignored, it isn't needed when using :=. See ?':=' for examples.")
}
with = TRUE
}

if (!with) {
# missing(by)==TRUE was already checked above before dealing with i
if (is.call(jsub) && deparse(jsub[[1]], 500L) %in% c("!", "-")) { # TODO is deparse avoidable here?
Expand Down Expand Up @@ -878,7 +898,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
ansvars = names(x)[ if (notj) -j else j ] # DT[,!"columntoexclude",with=FALSE], if a copy is needed, rather than :=NULL
# DT[, c(1,3), with=FALSE] should clearly provide both 'x' columns
ansvals = if (notj) setdiff(seq_along(x), as.integer(j)) else as.integer(j)
} else stop("When with=FALSE, j-argument should be of type logical/character/integer indicating the columns to select.") # fix for #1440.
} else stop("When with=FALSE, j-argument should be of type logical/character/integer indicating the columns to select.") # fix for #1440.
if (!length(ansvals)) return(null.data.table())
} else { # with=TRUE and byjoin could be TRUE
bynames = NULL
Expand Down Expand Up @@ -1137,26 +1157,6 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
# Suppress print when returns ok not on error, bug #2376. Thanks to: http://stackoverflow.com/a/13606880/403310
# All appropriate returns following this point are wrapped; i.e. return(suppPrint(x)).

# FR #4996 - verbose message and return when a join matches nothing with `:=` in j
if (byjoin & !notjoin) {
# Note: !notjoin is here only until the notjoin is implemented as a "proper" byjoin
if (identical(nomatch,0L) && all(f__ == 0L)) {
if (verbose) cat("No rows pass i clause so quitting := early with no changes made.\n")
return(suppPrint(x))
} else if (all(is.na(f__))) { # nomatch can't be 0 here
# fix for #759
irows = rep(NA_integer_, nrow(x))
byjoin = FALSE
}
}
if (!is.null(irows)) {
if (!length(irows)) {
if (verbose) cat("No rows pass i clause so quitting := early with no changes made.\n")
return(suppPrint(x))
} else
if (!with) irows <- irows[!is.na(irows)] # fixes 2445. TO DO: added a message if(verbose) or warning?
if (!missing(keyby)) stop("When i is present, keyby := on a subset of rows doesn't make sense. Either change keyby to by, or remove i")
}
if (is.null(names(jsub))) {
# regular LHS:=RHS usage, or `:=`(...) with no named arguments (an error)
# `:=`(LHS,RHS) is valid though, but more because can't see how to detect that, than desire
Expand Down Expand Up @@ -1190,6 +1190,19 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
# updates by reference to existing columns
cols = as.integer(m)
newnames=NULL
if (identical(irows, integer())) {
# Empty integer() means no rows e.g. logical i with only FALSE and NA
# got converted to empty integer() by the which() above
# Short circuit and do-nothing since columns already exist. If some don't
# exist then for consistency with cases where irows is non-empty, we need to create
# them of the right type and populate with NA. Which will happen via the regular
# alternative branches below, to cover #759.
# We need this short circuit at all just for convenience. Otherwise users may need to
# fix errors in their RHS when called on empty edge cases, even when the result won't be
# used anyway (so it would be annoying to have to fix it.)
.global$print = address(x)
return(invisible(x))
}
} else {
# Adding new column(s). TO DO: move after the first eval in case the jsub has an error.
newnames=setdiff(lhs,names(x))
Expand Down
3 changes: 2 additions & 1 deletion R/test.data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -158,7 +158,8 @@ test <- function(num,x,y,error=NULL,warning=NULL,output=NULL) {
setattr(xc,"index",NULL) # too onerous to create test RHS with the correct index as well, just check result
setattr(yc,"index",NULL)
if (identical(xc,yc) && identical(key(x),key(y))) return() # check key on original x and y because := above might have cleared it on xc or yc
if (isTRUE(all.equal(xc,yc)) && identical(key(x),key(y))) return()
if (isTRUE(all.equal(xc,yc)) && identical(key(x),key(y)) &&
identical(sapply(xc,typeof), sapply(yc,typeof))) return()
}
if (is.factor(x) && is.factor(y)) {
x = factor(x)
Expand Down
Loading

0 comments on commit 95e438c

Please sign in to comment.