Incorrect foverlaps with (1) infinite and (2) same interval limits #1006

Closed
tdhock opened this issue Jan 13, 2015 · 5 comments

@tdhock
Member

tdhock commented Jan 13, 2015

First of all, thank you very much for the great foverlaps function, which I find very useful in my work in genomics and model selection.

However, I noticed two problems with its output.

(1) The code below contains a reference interval (-Inf, -0.1), which foverlaps apparently never returns.

(2) When the query/x argument has the same lower and upper limit, I have to run subset(dt, start < query & query < end) on the result to get the desired rows.

These two issues are bugs, right?

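## works_with_R() is not part of base R or data.table; it appears to be the
## poster's own helper for pinning versions. library(data.table) also works.
works_with_R("3.1.2", data.table="1.9.4")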

x <- c(-0.1, 0, 0.1)
n <- length(x)
dt.ref <- data.table(start=x[-n], end=x[-1])
setkey(dt.ref, start, end)
q <- c(-0.2, -0.05, 0.05, 0.15)
## query interval has the same lower and upper limit.
dt.query <- data.table(q1=q, q2=q)
setkey(dt.query, q1, q2)

ov <- foverlaps(dt.query, dt.ref)
## (2) Why do I have to do the subset below to get the expected result?
expected <- subset(ov, start < q1 & q1 < end)

## Same query as above, but a reference that has two more intervals,
## with infinite values.
x <- c(-Inf, -0.1, 0, 0.1, Inf)
n <- length(x)
dt.ref <- data.table(start=x[-n], end=x[-1])
setkey(dt.ref, start, end)
q <- c(-0.2, -0.05, 0.05, 0.15)
dt.query <- data.table(q1=q, q2=q)
setkey(dt.query, q1, q2)

ov <- foverlaps(dt.query, dt.ref)
result <- subset(ov, start < q1 & q1 < end)
stopifnot(nrow(result) == 4)
## (1) There should be a row with (start=-Inf, end=-0.1, q1=q2=-0.2), but
## there is not! Why?
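
For reference, the expected four rows can be computed without foverlaps at all. This is only a minimal base R sketch of the strict containment rule start < q1 < end used in the subset() calls above, not a description of how foverlaps works internally. The names ref, query and pairs are made up for this illustration.

```r
## Base R only; no data.table needed for this check.
x <- c(-Inf, -0.1, 0, 0.1, Inf)
n <- length(x)
ref <- data.frame(start = x[-n], end = x[-1])
q <- c(-0.2, -0.05, 0.05, 0.15)
query <- data.frame(q1 = q, q2 = q)

## by = NULL makes merge() return the Cartesian product of the two tables.
pairs <- merge(ref, query, by = NULL)

## Keep the pairs whose open interval (start, end) strictly contains q1.
expected <- subset(pairs, start < q1 & q1 < end)
expected <- expected[order(expected$q1), ]

## One reference interval per query point, including (-Inf, -0.1) for -0.2.
stopifnot(nrow(expected) == 4)
```

The cross join is quadratic in the table sizes, so it is only useful as a correctness check on small inputs like this one.
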
@arunsrinivasan
Member

Thanks. I've reproduced it in 1.9.5 as well. As a side note, it'd be nice if you could also test on devel versions.

I've fixed the first case (the example with finite limits). Now I get this:

ov <- foverlaps(dt.query, dt.ref, nomatch=0L)
ov
#    start end    q1    q2
# 1:  -0.1 0.0 -0.05 -0.05
# 2:   0.0 0.1  0.05  0.05

The second one (the example with infinite limits) seems to be caused by a bug in rolling joins themselves. I've filed #1007. Once that's fixed, this should be fixed as well.

@tdhock
Member Author

tdhock commented Jan 14, 2015

Thanks very much for the quick response. Sorry for not testing on the development version at first -- I will do that next time.

I confirm the correct result now for (2) when using nomatch=0.

For (1) in my application I can just substitute -1e9 for -Inf but I will still leave the issue open until it is fixed.
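
A minimal sketch of that substitution in base R, assuming every query value lies strictly inside (-1e9, 1e9). The names big, x.finite and hits are made up for this illustration, and the containment count at the end is a plain vector comparison rather than a call to foverlaps.

```r
big <- 1e9  # assumed bound: every query must lie strictly inside (-big, big)
x <- c(-Inf, -0.1, 0, 0.1, Inf)

## Clamp the infinite limits to the finite sentinels -big and big.
x.finite <- pmin(pmax(x, -big), big)
n <- length(x.finite)
ref <- data.frame(start = x.finite[-n], end = x.finite[-1])

q <- c(-0.2, -0.05, 0.05, 0.15)
stopifnot(all(abs(q) < big))

## Number of reference intervals that strictly contain each query point.
hits <- sapply(q, function(qi) sum(ref$start < qi & qi < ref$end))
stopifnot(all(hits == 1))
```

A data.table reference could then be built from x.finite in the same way as dt.ref above.
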

@arunsrinivasan
Member

@tdhock yes, that should work as a temporary fix. I'll close this after the other issue is fixed. Will see if I can get it done soon.

Thanks again for the nice reproducible example.

arunsrinivasan added this to the v1.9.6 milestone Jan 15, 2015

@tdhock
Member Author

tdhock commented Jan 22, 2015

Hey again, thanks for the previous fix, but I seem to have found another bug.

Again, I expected foverlaps to return only 1 row, but it returns 2. Why is that?

works_with_R("3.1.2",
             "hadley/testthat@575d09b891e370a2568f0e6ee609cda82ab60cbe",
             "Rdatatable/data.table@7f6b286d9961bc074a3473ba29747eef5a35dc84")

test_that("overlap join returns 1 row", {
  literal.ref <-
    data.table(min.log.lambda=c(-6.36917800737546, -2.19964384651646),
               max.log.lambda = c(-2.19964384651646, 4.07116428752538),
               peaks = c(5L, 2L))
  setkey(literal.ref, min.log.lambda, max.log.lambda)

  literal.query <-
    data.table(sample.id = "McGill0007", penalty = 2.91816502571793, 
               penalty2 = 2.91816502571793)
  setkey(literal.query, penalty, penalty2)

  subset.dt <-
    subset(literal.ref,
           min.log.lambda < literal.query$penalty &
           literal.query$penalty < max.log.lambda)
  expect_identical(nrow(subset.dt), 1L)

  overlap.dt <- foverlaps(literal.query, literal.ref)
  expect_identical(nrow(overlap.dt), 1L)
  expect_identical(overlap.dt$peaks, subset.dt$peaks)
})

I downloaded and tried this with today's most recent GitHub version of data.table.

@arunsrinivasan
Member

Thanks, I have fixed it. It was tricky floating-point arithmetic. Hopefully we won't have to deal with it again.
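
As a generic illustration of why floating-point values make strict interval comparisons delicate, here is a small base R sketch. It is not the actual cause of the bug fixed here, which the thread does not describe; the values a and b are made up for this example.

```r
## Two values that print alike but differ in their last bits.
a <- 0.1 + 0.2
b <- 0.3

exact <- a == b                     # FALSE: a is slightly larger than b
close <- isTRUE(all.equal(a, b))    # TRUE: equal within the default tolerance

## With a as an interval start and b as a query, the strict test
## start < query fails even though both print as 0.3.
inside <- a < b                     # FALSE
```

Any code that decides overlaps by comparing doubles has to pick one of these two notions of equality and apply it consistently at every interval boundary.
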
