FR: warn/error when updating and i has duplicates #2837

Closed · MichaelChirico opened this issue May 5, 2018 · 5 comments · Fixed by #3541

@MichaelChirico (Member)

DT = data.table(a = 1:5)
DT[c(1, 1, 2), a := 3:5][]
#    a
# 1: 4
# 2: 5
# 3: 3
# 4: 4
# 5: 5

It seems the final element assigned wins (here row 1 ends up as 4, which corresponds to the second instance of 1 in i). It's not clear what the right behavior is in this case; my guess is that most often it's a user mistake, hence a warning. But it could also be an error, since the "correct" behavior is ambiguous.
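
For what it's worth, base R subassignment follows the same last-write-wins rule when the index vector has duplicates:

x = 1:5
x[c(1, 1, 2)] = 3:5
x
# [1] 4 5 3 4 5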

@st-pasha (Contributor) commented May 5, 2018

Detecting duplicates in i involves counting uniques, which has non-trivial cost when i is large. I'm not sure it's worth the effort to try to catch this exceedingly rare user error...

The current behavior is "correct" in the following sense: as we go along vector i, we assign to each index the corresponding element from j. Thus we first assign 3 into row 1, then overwrite row 1 with 4, and finally assign 5 into row 2. Unfortunately, this "correctness" will no longer hold once we use OMP to evaluate this query (presumably on much larger data). If two different threads are assigned to store different values into the same row, the end result can be either of those values (or even something "in-between", since we will not be doing atomic writes).
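
For reference, a minimal sketch of such a check using plain anyDuplicated() (illustrative only, not what data.table would do internally):

check_i = function(i) {
  d = anyDuplicated(i)  # position of the first duplicate, 0L if none
  if (d) warning(sprintf("i has duplicates (first at position %d); later writes win", d))
}
check_i(c(1, 1, 2))
# Warning: i has duplicates (first at position 2); later writes win

anyDuplicated() can stop at the first duplicate it finds, but when there are no duplicates it still has to hash all of i, which is exactly the cost in question.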

@MichaelChirico (Member, Author)

Is it not possible to use a hash table to mark each element as "seen"/"not seen" along the way?

What about an option to green-light the speed hit?

Also, in the case of a keyed subset, the cost is trivial.
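
e.g., for an already-sorted i, a single pass over adjacent elements finds duplicates with no hashing at all (a sketch, not data.table code):

has_dup_sorted = function(idx) {
  n = length(idx)
  n > 1L && any(idx[-1L] == idx[-n])  # any element equal to its predecessor?
}
has_dup_sorted(c(1L, 1L, 2L))
# [1] TRUE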

@franknarf1 (Contributor)

Somewhat related: the same problem during a join #2022

Fwiw, I find myself using both joins and Michael's approach, since often

# from the OP
mDT = data.table(row = c(1,1,2), new = 3:5)
DT[mDT$row, a := mDT$new]

is more expedient than adding a row-number column...

# adding a column and doing a join
mDT = data.table(row = c(1,1,2), new = 3:5)
DT[, row := .I]
DT[mDT, on=.(row), a := i.new]

so my code is vulnerable in both idioms until/unless I add a dupe check, anyDuplicated(mDT, by = on_cols) or something.
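
Concretely, the guard might look something like this (a sketch; "row" stands in for whatever the join columns actually are):

mDT = data.table(row = c(1, 1, 2), new = 3:5)
if (anyDuplicated(mDT, by = "row"))
  stop("duplicate keys in mDT; the update would be ambiguous")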

@jangorecki (Member) commented May 5, 2018

If we want to add so many checks, we should also collect more attributes. We already keep info on whether an object is sorted; we could also store whether it contains any NA, whether it has duplicates, and its uniqueN. That way we can at least reduce the overhead of the extra checks.
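
As a hypothetical illustration (not an existing feature), such metadata could be cached on the vector with setattr(), in the same spirit as the existing "sorted" attribute:

library(data.table)
i = c(1L, 1L, 2L)
setattr(i, "anyDup", anyDuplicated(i) > 0L)  # compute once, reuse in later checks
attr(i, "anyDup")
# [1] TRUE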

@MichaelChirico (Member, Author)

Related: #2879. I agree with Pasha that the potential speed hit should be the primary consideration in deciding whether to implement this. At a minimum, we should make sure there's a quick blurb in ?data.table highlighting the existing behavior.
