returning .SD by group doesn't unlock .SD; and GForce [[ non-atomic type causes trouble #4159

Closed
MichaelChirico opened this issue Jan 5, 2020 · 1 comment · Fixed by #4161

MichaelChirico (Member) commented Jan 5, 2020

Edit: Playing with one example uncovered two pretty unrelated bugs; not sure if I should open a separate issue. For now, I've edited the title to reflect both bugs. Here's the original description:

Related to #4156, I was playing around with the output of tidyfast::dt_nest() from the package's own example:

library(data.table)
library(tidyfast)

dt <- data.table(
  x = rnorm(1e5),
  y = runif(1e5),
  grp = sample(1L:3L, 1e5, replace = TRUE)
)

nested <- dt_nest(dt, grp)

The same nesting can be done without the external package:

nested <- dt[ , list(data = list(.SD)), keyby = grp]

I then noticed the following error:

# works
nested$data
# gives (ugly) output, no error
nested[ , data[[1]], by = grp]
# now errors
nested$data
# [[1]]
# Error in `[.data.table`(x, i, , ) : 
#   Internal error: column type 'NULL' not supported by data.table subset. All known types are supported so please report as bug.

This is a bit hard to debug, since most operations involving nested now give an error, but str() still works:

str(nested)
# Classes ‘data.table’ and 'data.frame':	3 obs. of  2 variables:
#  $ grp : int  1 2 3
#  $ data:List of 3
#   ..$ :Classes ‘data.table’ and 'data.frame':	33159 obs. of  2 variables:
#   .. ..$ : num  0.0534 -0.7286 0.823 -0.4527 -0.228 ...
#   .. ..$ : num  0.379 0.399 0.282 0.965 0.308 ...
#   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#   .. ..- attr(*, ".data.table.locked")= logi TRUE
#   ..$ :Classes ‘data.table’ and 'data.frame':	33421 obs. of  2 variables:
#   .. ..$ : num  0.844 1.009 1.541 -0.174 0.353 ...
#   .. ..$ : num  0.716 0.798 0.83 0.306 0.279 ...
#   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#   .. ..- attr(*, ".data.table.locked")= logi TRUE
#   ..$ :Classes ‘data.table’ and 'data.frame':	33420 obs. of  2 variables:
#   .. ..$ : num  1.061 -1.909 0.251 1.01 -1.053 ...
#   .. ..$ : num  0.366 0.55 0.62 0.936 0.219 ...
#   .. ..- attr(*, ".internal.selfref")=<externalptr> 
#   .. ..- attr(*, ".data.table.locked")= logi TRUE
#  - attr(*, "sorted")= chr "grp"
#  - attr(*, ".internal.selfref")=<externalptr> 

The nested data.tables carry the .data.table.locked attribute, which they shouldn't, so that may be related.

MichaelChirico (Member, Author) commented Jan 5, 2020

OK, the picture of what's going on is clearing up.

nested <- dt[ , list(data = list(.SD)), keyby = grp]
names(nested$data[[1L]])
# [1] "x" "y"
invisible(nested[ , data[[1L]], by = grp])
names(nested$data[[1L]])
# NULL

[ is erasing the names, probably by reference.
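One way to check that the names are changed in place, rather than the list element being replaced, is to keep a second binding to the inner table before the grouped call. This is a sketch that assumes a data.table version still affected by the bug; on a fixed version the check simply comes back TRUE. The smaller n and the names inner, before and names_unchanged are illustrative only.

```r
library(data.table)

set.seed(42)  # illustrative data, smaller than the original example
dt <- data.table(
  x = rnorm(1e3),
  y = runif(1e3),
  grp = sample(1L:3L, 1e3, replace = TRUE)
)
nested <- dt[ , list(data = list(.SD)), keyby = grp]

# `inner` is a second binding to the same object stored in the list column;
# R does not copy on assignment, so an in-place change is visible here too
inner <- nested$data[[1L]]
before <- names(inner)

invisible(nested[ , data[[1L]], by = grp])

# FALSE would mean the grouped query modified the nested table by reference
names_unchanged <- identical(names(inner), before)
names_unchanged
```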

This is also related to #3209: GForce is being turned on here, when it probably shouldn't be:

GForce optimized j to '`g[[`(data, 1L)'
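A possible workaround, assuming GForce's g[[ is what damages the nested tables, is to lower data.table's optimization level for the query so GForce is skipped. datatable.optimize is a real data.table option (level 1 keeps the basic j optimizations, level 2 adds GForce); that this fully avoids the bug on an affected version is an assumption, not something verified in this thread. Variable names are illustrative.

```r
library(data.table)

set.seed(42)  # illustrative data, smaller than the original example
dt <- data.table(
  x = rnorm(1e3),
  y = runif(1e3),
  grp = sample(1L:3L, 1e3, replace = TRUE)
)
nested <- dt[ , list(data = list(.SD)), keyby = grp]

# Level 1 keeps data.table's basic j optimization but turns GForce off, so
# data[[1L]] is evaluated per group as ordinary R rather than rewritten to g[[
old <- options(datatable.optimize = 1L)
unnested <- nested[ , data[[1L]], by = grp]
options(old)  # restore the previous optimization level

names(unnested)           # "grp" "x" "y"
names(nested$data[[1L]])  # the nested tables should still be named x, y
```

Passing verbose = TRUE to the grouped call reports whether j was GForce-optimized, which is where the message above comes from.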

MichaelChirico changed the title from "corrupted list column leads to internal error" to "returning .SD by group doesn't unlock .SD; and GForce [[ non-atomic type causes trouble" on Jan 5, 2020
mattdowle added this to the 1.12.9 milestone on Jan 8, 2020
jangorecki modified the milestones: 1.12.11, 1.12.9 on May 26, 2020