-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
revdep simtrial test ERROR #5826
Comments
Thanks for reporting The next step probably would be to detect at which point different value is appearing, which could/should involve maintainer of the package. |
I had a look - it looks like it's actually the package https:/cran/survMisc/blob/master/R/ten.R I could not see any peeling away of parentheses. I also don't think the other two changes (error when logical vector does not match length of names and the edge case of |
https:/dardisco/survMisc has version 0.5.1 (2019) whereas CRAN is version 0.5.6 (2022) which suggests the author is no longer using that github repo. |
I confirm the observation of @ColeMiller1 --- the code in simtrial does not give any different results using data.table master, but the test in simtrial compares with a result computed by survMisc, which does have a different result using data.table master. To demonstrate this, I modified the test https:/Merck/simtrial/blob/main/tests/testthat/test-independent_test_wlr.R#L25 -- I put the following in survMisc-bug.R: set.seed(1234)
y <- simtrial::sim_pw_surv(n = 300) |> simtrial::cut_data_by_event(30)
adjust.methods <- "asymp"
wt <- list(a1 = c(0, 0), a2 = c(0, 1), a3 = c(1, 0), a4 = c(1, 1))
ties.method <- "efron"
one.sided <- TRUE
HT.est <- FALSE
max <- TRUE
alpha <- 0.025
data.anal <- data.frame(cbind(y$tte, y$event, y$treatment))
fit <- survMisc::ten(survival::Surv(y$tte, y$event) ~ y$treatment, data = y)
# Testing
survMisc::comp(fit, p = sapply(wt, function(x) {
x[1]
}), q = sapply(wt, function(x) {
x[2]
}))
tst.rslt <- attr(fit, "lrt")
z1 <- tst.rslt$Z
a2 <- y |> simtrial::counting_process(arm = "experimental")
aa <- simtrial::fh_weight(a2, rho_gamma = data.frame(rho = c(0, 0, 1, 1), gamma = c(0, 1, 0, 1)))
result.simtrial <- aa$z
result.survMisc <- c(z1[1], z1[7:9])
print(rbind(result.simtrial, result.survMisc))
packageVersion("data.table")
testthat::expect_equal(result.survMisc, result.simtrial, tolerance = 0.00001) Then I run the following
Compare result above (DT CRAN) with result below (DT master): > print(rbind(result.simtrial, result.survMisc))
[,1] [,2] [,3] [,4]
result.simtrial -0.1777396 -0.7132094 -0.03808899 -0.6657923
result.survMisc 5.6732146 4.7658047 5.26305082 -0.6657923 |
A somewhat smaller survMisc-bug.R code is set.seed(1234)
y <- simtrial::sim_pw_surv(n = 300) |> simtrial::cut_data_by_event(30)
wt <- list(a1 = c(0, 0), a2 = c(0, 1), a3 = c(1, 0), a4 = c(1, 1))
fit <- survMisc::ten(survival::Surv(y$tte, y$event) ~ y$treatment, data = y)
comp.res <- survMisc::comp(fit, p = sapply(wt, "[", 1), q = sapply(wt, "[", 2))
attr(fit, "lrt")
packageVersion("data.table") which gives the following output
Compare above (master) with below (CRAN)
|
> survMisc:::comp.ten
function (x, ..., p = 1, q = 1, scores = seq.int(attr(x, "ncg")),
reCalc = FALSE)
{
if (!reCalc & !is.null(attr(x, "lrt"))) {
print(attr(x, "lrt"))
print(if (!is.null(attr(x, "sup")))
attr(x, "sup")
else attr(x, "tft"))
return(invisible())
}
stopifnot(attr(x, "ncg") >= 2)
stopifnot(length(p) == length(q))
fh1 <- length(p)
if (!attr(x, "sorted") == "t")
data.table::setkey(x, t)
t1 <- x[e > 0, t, by = t][, t]
wt1 <- data.table::data.table(array(data = 1, dim = c(length(t1),
5L + fh1)))
FHn <- paste("FH_p=", p, "_q=", q, sep = "")
n1 <- c("1", "n", "sqrtN", "S1", "S2", FHn)
data.table::setnames(wt1, n1)
data.table::set(wt1, j = "n", value = x[e > 0, max(n), by = t][,
V1])
data.table::set(wt1, j = "sqrtN", value = wt1[, sqrt(.SD),
.SDcols = "n"])
data.table::set(wt1, j = "S1", value = cumprod(x[e > 0, 1 -
sum(e)/(max(n) + 1), by = t][, V1]))
data.table::set(wt1, j = "S2", value = wt1[, S1] * x[e >
0, max(n)/(max(n) + 1), by = t][, V1])
S3 <- sf(x = x[e > 0, sum(e), by = t][, V1], n = x[e > 0,
max(n), by = t][, V1], what = "S")
S3 <- c(1, S3[seq.int(length(S3) - 1L)])
wt1[, `:=`((FHn), mapply(function(p, q) S3^p * ((1 - S3)^q),
p, q, SIMPLIFY = FALSE))]
n2 <- c("W", "Q", "Var", "Z", "pNorm", "chiSq", "df", "pChisq")
res1 <- data.table::data.table(matrix(0, nrow = ncol(wt1),
ncol = length(n2)))
data.table::setnames(res1, n2)
data.table::set(res1, j = 1L, value = n1)
predict(x)
eMP1 <- attr(x, "pred")
eMP1 <- eMP1[rowSums(eMP1) > 0, ]
COV(x)
cov1 <- attr(x, "COV")
if (is.null(dim(cov1))) {
cov1 <- cov1[names(cov1) %in% t1]
}
else {
cov1 <- cov1[, , dimnames(cov1)[[3]] %in% t1]
}
ncg1 <- attr(x, "ncg")
if (ncg1 == 2) {
eMP1 <- unlist(eMP1[, .SD, .SDcols = (length(eMP1) -
1L)])
data.table::set(res1, j = "Q", value = colSums(wt1 *
eMP1))
data.table::set(res1, j = "Var", value = colSums(wt1^2 *
cov1))
n3 <- c("W", "maxAbsZ", "Var", "Q", "pSupBr")
res2 <- data.table::data.table(matrix(0, nrow = 5 + fh1,
ncol = length(n3)))
data.table::setnames(res2, n3)
data.table::set(res2, j = 1L, value = n1)
data.table::set(res2, j = "maxAbsZ", value = sapply(abs(cumsum(eMP1 *
wt1)), max))
data.table::set(res2, j = "Var", value = res1[, Var])
res2[, `:=`("Q", maxAbsZ/sqrt(Var))]
res2[, `:=`("pSupBr", sapply(Q, probSupBr))]
data.table::setattr(res2, "class", c("sup", class(res2)))
}
if (ncg1 > 2) {
df1 <- seq.int(ncg1 - 1L)
eMP1 <- eMP1[, .SD, .SDcols = grep("eMP_", names(eMP1))]
res3 <- data.table::data.table(array(0, dim = c(ncol(wt1),
4L)))
data.table::setnames(res3, c("W", "chiSq", "df", "pChisq"))
data.table::set(res3, j = 1L, value = n1)
eMP1w <- apply(wt1, MARGIN = 2, FUN = function(wt) colSums(sweep(eMP1,
MARGIN = 1, STATS = wt, FUN = "*")))
cov1w <- apply(wt1, MARGIN = 2, FUN = function(wt) rowSums(sweep(cov1,
MARGIN = 3, STATS = wt^2, FUN = "*"), dims = 2))
dim(cov1w) <- c(ncg1, ncg1, ncol(cov1w))
cov1ws <- cov1w[df1, df1, ]
cov1ws <- apply(cov1ws, MARGIN = 3, FUN = solve)
dim(cov1ws) <- c(max(df1), max(df1), length(n1))
eMP1ss <- eMP1w[df1, ]
data.table::set(res3, j = "chiSq", value = sapply(seq.int(length(n1)),
function(i) eMP1ss[, i] %*% cov1ws[, , i] %*% eMP1ss[,
i]))
res3[, `:=`("df", max(df1))]
res3[, `:=`("pChisq", 1 - stats::pchisq(chiSq, df))]
data.table::setattr(res3, "class", c("lrt", class(res3)))
sAC1 <- as.matrix(expand.grid(scores, scores))
scoProd1 <- apply(sAC1, MARGIN = 1, FUN = prod)
data.table::set(res1, j = "Q", value = colSums(eMP1w *
scores))
data.table::set(res1, j = "Var", value = abs(apply(cov1w *
scoProd1, MARGIN = 3, sum)))
}
res1[, `:=`("Z", Q/sqrt(Var))]
res1[, `:=`("pNorm", 2 * (1 - stats::pnorm(abs(Z))))]
res1[, `:=`("chiSq", Q^2/Var)]
res1[, `:=`("df", 1)]
res1[, `:=`("pChisq", 1 - stats::pchisq(chiSq, df))]
data.table::setattr(res1, "class", c("lrt", class(res1)))
data.table::set(wt1, j = "t", value = t1)
data.table::setattr(x, "lrw", wt1)
if (ncg1 == 2) {
data.table::setattr(x, "lrt", res1)
data.table::setattr(x, "sup", res2)
}
else {
data.table::setattr(x, "lrt", res3)
res1 <- list(tft = res1, scores = scores)
data.table::setattr(x, "tft", res1)
}
print(attr(x, "lrt"))
print(if (!is.null(attr(x, "sup")))
attr(x, "sup")
else attr(x, "tft"))
return(invisible())
}
<bytecode: 0xa92caa8>
<environment: namespace:survMisc> |
Here is an even smaller survMisc-bug.R, in which I copied some of the code from set.seed(1234)
y <- simtrial::sim_pw_surv(n = 300) |> simtrial::cut_data_by_event(30)
wt <- list(a1 = c(0, 0), a2 = c(0, 1), a3 = c(1, 0), a4 = c(1, 1))
fit <- survMisc::ten(survival::Surv(y$tte, y$event) ~ y$treatment, data = y)
predict(fit)
eMP1 <- attr(fit, "pred")
eMP1 <- eMP1[rowSums(eMP1) > 0, ]
unlist(eMP1[, .SD, .SDcols = (length(eMP1) - 1L)])
packageVersion("data.table") The important line which results in a difference is the second to last one, with .SDcols.
looks like this is a real bug in master. > x=data.table(a=1,b=2,c=3)
> x[, .SD, .SDcols=length(x)-1]
a b
<num> <num>
1: 1 2
> x[, .SD, .SDcols=2]
b
<num>
1: 2 The two results above should be the same, but are not. |
Great work on the MRE! I think I have a fix ready. Pretty obvious bug, surprised it's taken this long to realize! |
confirming that this revdep is now fixed |
revdep simtrial recently submitted an update which shows the error below using data.table master (but not with data.table from CRAN).
above output indicates the errors are numerical, which seems odd. But I double checked on another machine, the test failure is real.
git bisect says this PR is the cause #4470 which seems to be related to .SDcols, which seems inconsistent with the numerical errors in the test failure. I don't understand, but maybe @jangorecki and @ColeMiller1 could comment/investigate please? (you authored/commented on that PR)
test source code is on github
The text was updated successfully, but these errors were encountered: