Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operations with "by" do not dispatch to S3 methods #3533

Open
d-sci opened this issue May 2, 2019 · 6 comments
Open

Operations with "by" do not dispatch to S3 methods #3533

d-sci opened this issue May 2, 2019 · 6 comments

Comments

@d-sci
Copy link
Contributor

d-sci commented May 2, 2019

When using by, it appears that data.table is not able to dispatch S3 methods. The closest similar issue I could find was with this old SO one which was fixed in v1.9.6: https://stackoverflow.com/questions/33177009/s3-method-dispatch-in-data-table-when-using-by-clause. In this case, however, although it DOES maintain the class of the column being operated on, it does not dispatch the correct S3 method unless that is explicitly provided. In the reprex below, direction is of class circular and mean(c(20, 350)) should give 5, not 185. This works without by or when using by and explicitly calling circular::mean.circular, but not when using by and plain old mean:

library(data.table)

dt <- data.table(
  index = c("A", "A"),
  direction = circular::circular(
    c(20, 350),
    template = 'geographics',
    modulo = '2pi',
    units = 'degrees'
  )
)
dt
#>    index direction
#> 1:     A        20
#> 2:     A       350
dt[, mean(direction)]  # correct
#> Circular Data: 
#> Type = angles 
#> Units = degrees 
#> Template = geographics 
#> Modulo = 2pi 
#> Zero = 1.570796 
#> Rotation = clock 
#> [1] 5
dt[, mean(direction), by = index]$V1 # wrong!
#> Circular Data: 
#> Type = angles 
#> Units = degrees 
#> Template = geographics 
#> Modulo = 2pi 
#> Zero = 1.570796 
#> Rotation = clock 
#> [1] 185
dt[, circular::mean.circular(direction), by = index]$V1 # correct
#> Circular Data: 
#> Type = angles 
#> Units = degrees 
#> Template = geographics 
#> Modulo = 2pi 
#> Zero = 1.570796 
#> Rotation = clock 
#> [1] 5

sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 16.04.5 LTS
#> 
#> Matrix products: default
#> BLAS: /usr/lib/libblas/libblas.so.3.6.0
#> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] data.table_1.12.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1      mvtnorm_1.0-10  digest_0.6.18   magrittr_1.5   
#>  [5] evaluate_0.13   highr_0.8       stringi_1.4.3   boot_1.3-20    
#>  [9] rmarkdown_1.12  tools_3.5.2     stringr_1.4.0   xfun_0.6       
#> [13] yaml_2.2.0      compiler_3.5.2  circular_0.4-93 htmltools_0.3.6
#> [17] knitr_1.22
@franknarf1
Copy link
Contributor

Looks like a bug per your link (where a similar issue was fixed).

In the meantime you can force data.table not to override mean (with its internal by-group-optimized gmean) and other functions with

options(datatable.optimize=1)

Details https://stackoverflow.com/questions/22137591/about-gforce-in-data-table-1-9-2 and ?GForce

@MichaelChirico
Copy link
Member

I guess this is the same issue as #1876 and #3079

@mattdowle
Copy link
Member

mattdowle commented May 13, 2019

I think it's correct that data.table does not dispatch to S3 methods in grouping. Because S3 methods are very slow when iterated. In the very unusual cases like this where the mean of the class is not the mean of the underlying type; i.e. mean(c(20, 350)) is not 185 but 5, then needing to write out the full method name seems reasonable; i.e. dt[, circular::mean.circular(direction), by = index].

#3079 seems different: that's because we're missing a copyMostAttrib in gforce median.
#1876 it seems different too, but I don't follow that one yet.

Can we close this issue as won't fix ?

@d-sci
Copy link
Contributor Author

d-sci commented May 14, 2019

Although I agree with Michael's points in #3546 (comment), I do see your point. The behaviour exhibited in the reprex is a bit baffling, but to be fair is explainable via a reading of ?data.table.optimize or the output of the call with verbose = T.

@jangorecki
Copy link
Member

Another issue is that method mean.circular might not be exported, forcing user to use ::: operator, which warns on CRAN.

@MichaelChirico
Copy link
Member

@jangorecki yes, see my linked comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants