FR: if CJ is passed data.table arguments, do a blocked cross-join #2343

MichaelChirico · 2017-09-08T04:08:30Z

I have a case where I don't want the outer product of all rows of some input, but rather the outer product of all blocks of rows of the input. It seems natural that CJ should be able to handle constructing this:

DT1 = data.table(x1 = c(1, 2), x2 = c(3, 4))
DT2 = data.table(y1 = c(5, 6, 7))

Desired output:

CJ(DT1, DT2)
#    x1 x2 y1
# 1:  1  3  5
# 2:  2  4  5
# 3:  1  3  6
# 4:  2  4  6
# 5:  1  3  7
# 6:  2  4  7

Hopefully it's sufficiently clear from this.

A hack is to do something like:

idxDT = CJ(seq_len(nrow(DT1)), seq_len(nrow(DT2)))
idxDT[ , cbind(DT1[V1], DT2[V2])]
#    x1 x2 y1
# 1:  1  3  5
# 2:  1  3  6
# 3:  1  3  7
# 4:  2  4  5
# 5:  2  4  6
# 6:  2  4  7

The order isn't particularly natural here, but doesn't matter in my application. Worse is that it's clunky and not easily extensible to having several more data.tables of input.
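The index hack can be extended to any number of inputs by cross-joining the row indices themselves with do.call(CJ, ...). This is a minimal sketch that assumes the inputs share no column names. cj_blocks is a hypothetical helper, not a data.table function:

```r
library(data.table)

# Hypothetical helper: blocked cross-join of any number of data.tables.
# Every row of each input is paired with every combination of rows of the others.
cj_blocks = function(...) {
  dts = list(...)
  # Cross-join the row indices of every input (columns V1, V2, ...).
  idx = do.call(CJ, lapply(dts, function(d) seq_len(nrow(d))))
  # Subset each input by its own index column, then bind side by side.
  # Assumption: column names do not clash across inputs (no de-duplication here).
  do.call(cbind, Map(function(d, rows) d[rows], dts, idx))
}

DT1 = data.table(x1 = c(1, 2), x2 = c(3, 4))
DT2 = data.table(y1 = c(5, 6, 7))
cj_blocks(DT1, DT2)
```

Because CJ() sorts its index columns, the row order matches the hack above, with the first table varying slowest.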

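The most natural approach with current functionality is CJ(DT1$x1, DT1$x2, DT2$y1), but it's wrong: it also crosses x1 against x2, giving 2 × 2 × 3 = 12 rows (including pairs like x1 = 1, x2 = 4 that never occur together in DT1), which then have to be pared back to 6.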
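One way to do the paring back is to join the column-wise CJ against DT1 itself. This is a sketch that assumes DT1 has no duplicate rows. The intermediate table grows as the product of all the column lengths, which is why the approach scales badly:

```r
library(data.table)

DT1 = data.table(x1 = c(1, 2), x2 = c(3, 4))
DT2 = data.table(y1 = c(5, 6, 7))

# Column-wise CJ forms all 2 * 2 * 3 = 12 combinations, including
# pairs such as (x1 = 1, x2 = 4) that never occur together in DT1.
full = CJ(x1 = DT1$x1, x2 = DT1$x2, y1 = DT2$y1)
nrow(full)   # 12

# Keep only the combinations whose (x1, x2) is an actual row of DT1.
pared = full[DT1, on = .(x1, x2)]
nrow(pared)  # 6
```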


st-pasha · 2017-09-08T16:35:56Z

How about this:

> DT2[, (DT1), by=y1]
   y1 x1 x2
1:  5  1  3
2:  5  2  4
3:  6  1  3
4:  6  2  4
5:  7  1  3
6:  7  2  4
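Grouping by y1 only works as a blocked cross-join if y1 uniquely identifies each row of DT2. This sketch shows what happens otherwise, using a hypothetical DT3 with a repeated y1 value. It also shows one workaround: group by a row id instead, using a hypothetical grp column. Because that version uses .SD, it also keeps every column of the right-hand table, not just the grouping column:

```r
library(data.table)

DT1 = data.table(x1 = c(1, 2), x2 = c(3, 4))
# Hypothetical right-hand table with a repeated y1 value.
DT3 = data.table(y1 = c(5, 5, 7))

# The two rows with y1 == 5 fall into one group, so DT1 is emitted once
# for that group: 4 rows instead of 3 * 2 = 6.
nrow(DT3[, (DT1), by = y1])  # 4

# Group by a row id instead, so each row of DT3 is its own block.
# copy() avoids adding `grp` to DT3 by reference; the 1-row .SD is
# recycled against the 2 rows of DT1 within each group.
res = copy(DT3)[, grp := .I][, c(.SD, DT1), by = grp][, grp := NULL][]
nrow(res)  # 6
```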

franknarf1 · 2017-09-08T16:51:28Z

> I have a case where I don't want the outer product of all rows of some input, but rather the outer product of all blocks of rows of the input.

I don't see the distinction here, unless you just mean the row order. There are two rows in one table and three in the other, and the result is their Cartesian product (regarding rows as tuples and tables as sets of tuples).

> Worse is that it's clunky and not easily extensible to having several more data.tables of input.

With Reduce...

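# Fold the inputs pairwise: repeat each row of the right-hand table
# nrow(left) times, and let cbind() recycle the left-hand table to match.
CJDT = function(...) 
  Reduce(function(DT1, DT2) cbind(DT1, DT2[rep(1:.N, each=nrow(DT1))]), list(...))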

Not sure if that's what you're after. I'm not crazy about the row ordering, so I'd probably do cbind(DT1[rep(1:.N, each=nrow(DT2))], DT2) instead, fwiw.
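This is a sketch of the alternative ordering folded into the same Reduce. CJDT2 is a hypothetical name. It expands both sides explicitly with rep(), so it does not depend on cbind() recycling the shorter table. The left-most input varies slowest, which matches the order of the index hack in the original post:

```r
library(data.table)

# Hypothetical variant of CJDT: each row of the left-hand table is repeated
# nrow(right) times, and the right-hand table is tiled nrow(left) times.
CJDT2 = function(...)
  Reduce(function(DT1, DT2) cbind(DT1[rep(seq_len(.N), each = nrow(DT2))],
                                  DT2[rep(seq_len(.N), times = nrow(DT1))]),
         list(...))

DT1 = data.table(x1 = c(1, 2), x2 = c(3, 4))
DT2 = data.table(y1 = c(5, 6, 7))
CJDT2(DT1, DT2)
```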

Btw, I guess this is related to Jan's CJ.dt #1717

MichaelChirico · 2017-10-23T05:37:27Z

Indeed I think #1717 covers this. Closing.

st-pasha added the feature request label Sep 8, 2017

MichaelChirico closed this as completed Oct 23, 2017

MichaelChirico mentioned this issue Jul 22, 2018 in "single symbol in j behaves strangely" #2983 (Closed)
