as.matrix.data.table incorrect results when nrows == 1 #2930

malcook · 2018-06-12T03:16:38Z

# Minimal reproducible example

library(data.table)
d<-data.table(name=c("bob","joe"),age=10:11,IQ=c(130,140))
as.matrix(d, 1) 
#>     age  IQ
#> bob  10 130
#> joe  11 140

as.matrix(d[1, ], 1)
#>   name  age  IQ   
#> 1 "bob" "10" "130"

^{Created on 2018-06-13 by the reprex package (v0.2.0).}

The problem is that the rownames argument to as.matrix.data.table is not being respected as documented:

rownames: optional, a single column name or column index to use as the
'rownames' in the returned 'matrix'. If 'TRUE' the 'key' of
the 'data.table' will be used if it is a single column,
otherwise the first column in the 'data.table' will be used.
Alternative a vector of length 'nrow(x)' to assign as the row
names of the returned 'matrix'.

We should expect the 2nd call above to yield a 1x2 matrix of numeric having the single rowname of "bob". Instead, it yields a 1x3 character matrix.

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /n/apps/CentOS7/install/r-3.4.3/lib64/R/lib/libRblas.so
LAPACK: /n/apps/CentOS7/install/r-3.4.3/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.2 devtools_1.13.5  

loaded via a namespace (and not attached):
[1] compiler_3.4.3 withr_2.1.2    memoise_1.1.0  digest_0.6.15

The text was updated successfully, but these errors were encountered:

HughParsonage · 2018-06-13T03:30:18Z

Doesn't seem to be a bug. (Did you mean as.matrix(d[1, 1])?) Please reopen and clarify with another example if you think it is.

malcook · 2018-06-13T04:20:25Z

@HughParsonage - I've clarified the original issue. I think it IS a bug. I don't think I have privs to re-open per https://stackoverflow.com/questions/21333654/how-to-re-open-an-issue-in-github - can you re-open for me if you agree?

HughParsonage · 2018-06-13T04:27:48Z

Thanks @malcook, good edit. I think this is a documentation issue: essentially rownames = 1 could mean the column index or a "vector of length 'nrow(x)'". It seems the latter takes precedence when available. (Consider as.matrix(d[1, ], 99).)

malcook · 2018-06-13T04:35:09Z

I think it is NOT a documentation issue. Try passing rownames='name' and we don't get the documented behavior. IMO, the mode and dimension of as.matrix(d,1) should not depend on the dimensionality of d. That is does reminds me of the default drop=TRUE option to [ which is annoying to everybody. (Erhm, ok, at least to me ;)

In fact...

I'm guessing that under the hood (in the data.table source code) it will prove to be exactly this underlying issue - and the fix will be to add drop=FALSE to some call to [.

HughParsonage · 2018-06-13T04:47:28Z

It's definitely an issue.

as.matrix(d[1, ], "name")
#>      name  age  IQ   
#> name "bob" "10" "130"

but this is a documented result:

rownames: optional, a single column name or column index to use as the
'rownames' in the returned 'matrix'. If 'TRUE' the 'key' of
the 'data.table' will be used if it is a single column,
otherwise the first column in the 'data.table' will be used.
Alternative a vector of length 'nrow(x)' to assign as the row
names of the returned 'matrix'.

Since nrow(x) == length("name") == 1, the last sentence is what determines the interpretation of rownames. Bear in mind that this is similar to data.frame:

data.frame(x = 1, y = 7, row.names = 2)
#>   x y
#> 2 1 7

data.frame(x = 1:2, y = 7:8, row.names = 2)
#>   x
#> 7 1
#> 8 2

Unfortunately, this means that forcing the column of a one-row data.frame to a row name is not possible, but that is the behaviour.

I think even if we agree that this is documented behaviour, the documentation should be changed to emphasize that if length(rownames) == nrow(x) then that behaviour takes precedence.

malcook · 2018-06-13T04:57:36Z

Hmm. I think you’re stretching.

Try setkey(d,’name’) and then read the documentation again, and then guess what the result of passing rownames=TRUE to as.matrix.

Also, decide what you wish/expect/hope the result of passing rownames=TRUE should be before trying it.

Now try it.

Surprised?

I claim that if you are correct and this is a bug in the documentation, and you “fix” the documentation to reflect current behavior, then the documentation will only more clearly reflect a bug in the design.

But I don’t think there is a bug in the design, except perhaps for trying to overload rownames with multiple possible conflicting possible interpretations. I hope the project finds a way to provide both behaviors and resolve the (presumably unintended) ambiguity.

MichaelChirico · 2018-06-13T05:13:59Z

feel free to file a PR that solves the issue as you see it :)

…

On Wed, Jun 13, 2018, 12:57 PM Malcolm Cook ***@***.***> wrote: Hmm. I think you’re stretching. Try setkey(d,’name’) and then read the documentation again, and then guess what the result of passing rownames=TRUE to as.matrix. Also, decide what you wish/expect/hope the result of passing rownames=TRUE should be before trying it. Now try it. Surprised? I claim that if you are correct and this is a bug in the documentation, and you “fix” the documentation to reflect current behavior, then the documentation will only more clearly reflect a bug in the design. But I don’t think there is a bug in the design, except perhaps for trying to overload rownames with multiple possible conflicting possible interpretations. I hope the project finds a way to provide both behaviors and resolve the (presumably unintended) ambiguity. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#2930 (comment)>, or mute the thread <https:/notifications/unsubscribe-auth/AHQQdX5Zbpy-ZFleVnpTWkM8A87O7cBiks5t8JvDgaJpZM4Ujtlv> .

malcook · 2018-06-13T05:30:39Z

My suggestion would be to change the design in such a way that the mode and dimension of the result of as.matrix should never depend on the dimensionality of d, which it does now as implemented in this (arguably) edge case.

So, any change would have to be a (possibly) breaking change.

I'd be surprised if anyone discovered this and chose to depend upon it, but YMMV.

I'm not sure who vets/approves pull requests. Is that you @MichaelChirico ?

Anyway, if that person(s) agrees that this is a poor design, and should be fixed, then perhaps there is a chance that a pull request from me that fixes the design/implementation/code would be welcome.

But I would understand that a reasonable position might be that this is not a bad design, or the badness is insufficient to merit a possible breaking change.

Please advise, oh data.table gods, and I will try and respond accordingly.

Thanks

jangorecki · 2018-06-13T07:08:16Z

I am guessing it is as.array.data.table method responsible for this behavior.

malcook · 2018-06-13T07:09:04Z

Yes – it is

jangorecki · 2018-06-13T07:19:43Z

mode and dimension of the result of as.matrix should never depend on the dimensionality of d

and what if we want to apply same principle to as.array method? didn't look into the code yet but we may end up with conflict, simply because matrix is edge case of array.

malcook · 2018-06-13T07:23:05Z

There appears to be no data.table:::as.array.data.table so not much of an issue

jangorecki · 2018-06-13T07:37:17Z

so maybe there should be...
we have as.data.table.array already, reverse operation as.array.data.table would most often result in memory error because base array not scale in memory. It produce cross product of all elements for all columns, except the (single) measure.
as of now probably better just handle this edge case in as.matrix.

malcook · 2018-06-13T07:39:36Z

I don't really understand where you are leading, but feel it is not toward resolving the issue I've raise, but, thanks for your interest and please carry on if there is a relationship I am missing.

jangorecki · 2018-06-13T07:45:02Z

if result should not depend on dimensionality then we should focus on array method instead of matrix method which by name limits dimensionality.

malcook · 2018-06-15T01:23:54Z

I'm going to bow out of this exchange I've started. Thanks all for hearing me out. If you want to "fix" the documentation so others do not fall down the same rabbit hole as I did, that is one way of resolving it.

As always, thanks for data.table. I use it every day!

sritchie73 · 2018-06-17T03:02:11Z

Thanks for raising this @malcook , and apologies for not thinking of that edge case!

mattdowle · 2018-08-14T01:53:56Z

Reopening because PR #2939 by @sritchie73 (thanks) fixes this in agreement with @malcook, about to be merged.

mattdowle · 2018-08-14T03:07:31Z

In further agreement with @malcook, for even further clarity, safety and consistency, I've suggested a minor change to @sritchie73's PR here: #2939 (comment). I'll make the change to the PR unless there are any objections.

HughParsonage added the invalid label Jun 13, 2018

HughParsonage closed this as completed Jun 13, 2018

HughParsonage reopened this Jun 13, 2018

HughParsonage added documentation and removed invalid labels Jun 13, 2018

HughParsonage added a commit that referenced this issue Jun 14, 2018

Closes #2930 to reflect current behaviour

5d302d3

HughParsonage mentioned this issue Jun 14, 2018

Fix documentation of as.matrix to reflect current behaviour #2935

Closed

malcook closed this as completed Jun 18, 2018

mattdowle added bug and removed documentation labels Aug 14, 2018

mattdowle added this to the 1.11.6 milestone Aug 14, 2018

mattdowle reopened this Aug 14, 2018

mattdowle closed this as completed in 378689e Aug 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

as.matrix.data.table incorrect results when nrows == 1 #2930

as.matrix.data.table incorrect results when nrows == 1 #2930

malcook commented Jun 12, 2018 •

edited by HughParsonage

Loading

HughParsonage commented Jun 13, 2018

malcook commented Jun 13, 2018

HughParsonage commented Jun 13, 2018 •

edited

Loading

malcook commented Jun 13, 2018

HughParsonage commented Jun 13, 2018

malcook commented Jun 13, 2018

MichaelChirico commented Jun 13, 2018 via email

malcook commented Jun 13, 2018

jangorecki commented Jun 13, 2018 •

edited

Loading

malcook commented Jun 13, 2018 via email •

edited

Loading

jangorecki commented Jun 13, 2018 •

edited

Loading

malcook commented Jun 13, 2018

jangorecki commented Jun 13, 2018

malcook commented Jun 13, 2018

jangorecki commented Jun 13, 2018

malcook commented Jun 15, 2018

sritchie73 commented Jun 17, 2018

mattdowle commented Aug 14, 2018 •

edited

Loading

mattdowle commented Aug 14, 2018

as.matrix.data.table incorrect results when nrows == 1 #2930

as.matrix.data.table incorrect results when nrows == 1 #2930

Comments

malcook commented Jun 12, 2018 • edited by HughParsonage Loading

HughParsonage commented Jun 13, 2018

malcook commented Jun 13, 2018

HughParsonage commented Jun 13, 2018 • edited Loading

malcook commented Jun 13, 2018

HughParsonage commented Jun 13, 2018

malcook commented Jun 13, 2018

MichaelChirico commented Jun 13, 2018 via email

malcook commented Jun 13, 2018

jangorecki commented Jun 13, 2018 • edited Loading

malcook commented Jun 13, 2018 via email • edited Loading

jangorecki commented Jun 13, 2018 • edited Loading

malcook commented Jun 13, 2018

jangorecki commented Jun 13, 2018

malcook commented Jun 13, 2018

jangorecki commented Jun 13, 2018

malcook commented Jun 15, 2018

sritchie73 commented Jun 17, 2018

mattdowle commented Aug 14, 2018 • edited Loading

mattdowle commented Aug 14, 2018

malcook commented Jun 12, 2018 •

edited by HughParsonage

Loading

HughParsonage commented Jun 13, 2018 •

edited

Loading

jangorecki commented Jun 13, 2018 •

edited

Loading

malcook commented Jun 13, 2018 via email •

edited

Loading

jangorecki commented Jun 13, 2018 •

edited

Loading

mattdowle commented Aug 14, 2018 •

edited

Loading