memrecycle no longer errors when attempting to coerce CPLXSXP to STRSXP #4203
Codecov Report
@@ Coverage Diff @@
## master #4203 +/- ##
==========================================
+ Coverage 99.61% 99.61% +<.01%
==========================================
Files 72 72
Lines 13875 13916 +41
==========================================
+ Hits 13821 13862 +41
Misses 54 54
src/data.table.h
@@ -47,6 +47,9 @@ typedef R_xlen_t RLEN;
#define NA_INTEGER64 INT64_MIN
#define MAX_INTEGER64 INT64_MAX

// for use with CPLXSXP, no macro provided by R internals
#define ISNAN_COMPLEX(x) (ISNAN(x.r) || ISNAN(x.i)) // TRUE if either real or imaginary component is NA or NaN
A few other places this could be used if we're going to add it:
src/coalesce.c:128: if (ISNAN(tt.r) && ISNAN(tt.i)) continue;
src/coalesce.c:134: const bool final = !ISNAN(finalVal.r) && !ISNAN(finalVal.i);
src/coalesce.c:138: if (!ISNAN(val.r) && !ISNAN(val.i)) continue;
src/coalesce.c:139: int j=0; while (ISNAN(val.r) && ISNAN(val.i) && j<k) val=((Rcomplex *)valP[j++])[i];
src/coalesce.c:140: if (!ISNAN(val.r) || !ISNAN(val.i)) xP[i]=val; else if (final) xP[i]=finalVal;
src/frank.c:59: for (int j=0; j<n; ++j) ians[j] |= (ISNAN(COMPLEX(v)[j].r) || ISNAN(COMPLEX(v)[j].i));
src/frank.c:195: while (j < n && !ISNAN(COMPLEX(v)[j].r) && !ISNAN(COMPLEX(v)[j].i)) j++;
src/gsumm.c:320: if (ISNAN(elem.r) && ISNAN(elem.i)) my_anyNA = true;
src/gsumm.c:327: if (ISNAN(elem.r) && ISNAN(elem.i)) my_anyNA = true;
src/gsumm.c:557: if (!ISNAN(elem.r)) _ans[my_low[i]].r += elem.r;
src/gsumm.c:642: if (ISNAN(xd[ix].r) || ISNAN(xd[ix].i)) continue; // || otherwise we'll need two counts in two c's too?
src/assign.c:1039: else BODY(double, REAL, double, ISNAN(val)?(im=NA_REAL,NA_REAL):(im=0.0,val), td[i].r=cval;td[i].i=im)
It strikes me as a bit odd that there aren't macros for these in Rinternals.h - I'll see how R-devel feels about adding them to base R
LGTM
I am wondering if we shouldn't generalise 64 bit vectors in C function
@2005m IIRC Matt has high priority for supporting 64-bit row indices generally in data.table in the near-ish future, I would group these under that bigger issue
Is there an issue open on that subject? I also think it is important, especially if we want to move to long vectors in data.table. I am happy to start going through the C code and move what can be moved to 64 bits. Are you referring to issue 3957?
I looked briefly but forget where I saw this written down ^ cc @mattdowle
ah, found it, it was in the Q&A at the H2O talk (~20 minutes in)
Just found this: src/data.table.h:
src/init.c:
This is used in a few places: But it looks like it's not strictly correct. As demonstrated by the R code above, is.na() requires only one of the real or imaginary components to be NA before returning TRUE, not both.
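To make the difference between the two rules concrete, here is a minimal standalone C sketch. It does not use R's headers: `cplx_t` is a hypothetical stand-in for `Rcomplex`, `isnan` from `math.h` stands in for R's `ISNAN`, and the function names are illustrative only.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

// Hypothetical stand-in for R's Rcomplex (real part r, imaginary part i),
// so the sketch compiles without R headers.
typedef struct { double r; double i; } cplx_t;

// "Either" rule: missing if at least one component is NaN.
// This is the behaviour attributed to is.na() in the comment above.
bool is_na_either(cplx_t x) { return isnan(x.r) || isnan(x.i); }

// "Both" rule: missing only if both components are NaN.
bool is_na_both(cplx_t x) { return isnan(x.r) && isnan(x.i); }

// Self-check: the two rules differ only for half-missing values.
void check_rules(void) {
    cplx_t full = { 1.0, 2.0 };  // no missing component
    cplx_t half = { NAN, 2.0 };  // only the real part is missing
    cplx_t both = { NAN, NAN };  // both parts are missing
    assert(!is_na_either(full) && !is_na_both(full));
    assert(is_na_either(half) && !is_na_both(half));
    assert(is_na_either(both) && is_na_both(both));
}
```

A value such as `{ NAN, 2.0 }` is the case where the `&&` form and the `||` form disagree, which is why the choice of operator matters at each of the call sites listed earlier.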
Question RE: long vector support. Should we be using
@sritchie73 For long vector support
Great PR, thanks! Answers to the various questions above (I tried to reply within each comment but I'd have to edit to do that which'd be messy, so here goes):
Writing out loud ... I always thought that R had defined and used
So, with the rant over and that said, I currently prefer
Further, long vectors aren't 64bit (or 63bit) long. They're up to 2^52 because R doesn't have a 64bit integer type, so a double is used; all integers in the range 1 : 2^52 are precise in a double. [ Give or take a bit, I'm writing quickly. ]
So actually,
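The point about integers being precise in a double can be checked with a small standalone C sketch. It assumes IEEE 754 doubles, whose 53-bit significand makes every integer up to 2^53 exact, consistent with the "give or take a bit" caveat. `exact_in_double` is a hypothetical helper name, not something from data.table or R.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// True if n survives a round trip through double unchanged.
// Assumes 0 <= n <= 2^62 so the cast back to int64_t is always in range.
bool exact_in_double(int64_t n) {
    double d = (double)n;    // may round if n needs more than 53 significant bits
    return (int64_t)d == n;  // equal only when no rounding happened
}

// Self-check around the 2^52 and 2^53 boundaries.
void check_boundaries(void) {
    const int64_t two52 = INT64_C(1) << 52;
    const int64_t two53 = INT64_C(1) << 53;
    assert(exact_in_double(two52));       // 2^52 itself is exact
    assert(exact_in_double(two52 + 1));   // so is its successor
    assert(exact_in_double(two53));       // 2^53 is still exact
    assert(!exact_in_double(two53 + 1));  // first integer that is not representable
}
```

So a row count stored as a plain double stays exact over the whole 1 : 2^52 range mentioned above.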
That makes sense, I'm happy to go through my PRs and change The only gotcha would be that the max value of
Great. Glad. Yes agreed I added to the end of my long comment about 2^52 as you were writing. Welcome everyone's thoughts on whether nrow(DT) should return base-R plain double, or a bit64::integer64. I'm leaning towards a plain double. For consistency with base, and actually if you increase the digits option in R, integers up to 2^52 do print out accurately and nicely. data.table could check and warn if the digits option was too low when it knows it has big nrow. Another thing is that R_xlen_t appears to be defined as
Plain double makes more sense I think. Otherwise we'd have to add bit64 as a dependency instead of suggests. |
Closes #4202
The error was due to the internal allNA function in utils.c not handling CPLXSXP. This PR implements that handling, treating an element as NA where ISNAN is true for either the real or imaginary component.
Demonstration that this is the correct way of handling NAs for complex vectors:
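As a rough illustration of the check this description outlines (a sketch, not the code merged in utils.c), an all-NA test over a complex vector could look like the following. `cplx_t` and `all_na_complex` are hypothetical names standing in for R's `Rcomplex` and the internal allNA, `isnan` stands in for `ISNAN`, and treating an empty vector as all-NA is an assumption.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for R's Rcomplex, so the sketch compiles without R headers.
typedef struct { double r; double i; } cplx_t;

// True if every element has a NaN in at least one component.
// An empty vector is treated as all-NA here (assumption).
bool all_na_complex(const cplx_t *x, size_t n) {
    for (size_t k = 0; k < n; ++k) {
        // An element with both components present is not NA: stop early.
        if (!isnan(x[k].r) && !isnan(x[k].i)) return false;
    }
    return true;
}

// Self-check with one all-NA vector and one containing a complete value.
void check_all_na(void) {
    cplx_t na_only[] = { { NAN, 0.0 }, { 1.0, NAN }, { NAN, NAN } };
    cplx_t mixed[]   = { { NAN, 0.0 }, { 1.0, 2.0 } };
    assert(all_na_complex(na_only, 3));
    assert(!all_na_complex(mixed, 2));
}
```

The early return means the scan stops at the first element whose real and imaginary parts are both present.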