Room for new data.table cheat sheet on the homepage? #3374

Open
epetrovski opened this issue Feb 8, 2019 · 38 comments

Comments

@epetrovski

Some time ago, I published a cheat sheet for data.table and I've been expanding on it quite a bit since then: https://github.com/rstudio/cheatsheets/blob/master/datatable.pdf

It's currently hosted on RStudio's homepage (and GitHub) but I was wondering if there's room for it on the data.table homepage as well?

Just to be clear, I don't think it should replace the existing cheat sheet on the data.table homepage. I'm trying to brand this as a "visual cheat sheet" mostly aimed at people who are new or casual data.table users.

@MichaelChirico
Member

I like this. How easy is it to edit this format? Or would we assign you to add new topics?

@epetrovski
Author

epetrovski commented Feb 17, 2019

There's a powerpoint. Editing is doable but definitely a hassle since it involves a lot of copy/pasting.

I would be happy to update the cheat sheet by assignment if you want new topics covered and I'll accept pull requests as well.

Currently there's no more room for new material, but I'm thinking of a separate fread+fst cheat sheet (fast data import), which would free up space for more.

@jangorecki
Member

Your work on the cheatsheet is highly appreciated. I also think that many users have already had the opportunity to learn from the cheatsheet you made. So thank you for that!

I would avoid using a cheatsheet made in PowerPoint because it will be difficult to maintain. I also think we should aim to provide a single data.table cheatsheet. So ideally we would merge the content from
https://s3.amazonaws.com/assets.datacamp.com/img/blog/data+table+cheat+sheet.pdf
and
https://github.com/rstudio/cheatsheets/blob/master/datatable.pdf
into a single cheatsheet, using some open source editing tool so that it will be easier to maintain.

@epetrovski
Author

Thank you @jangorecki !

I agree that a powerpoint based cheat sheet is too difficult to maintain. My choice of software was solely guided by the fact that I had to conform to the RStudio visual guidelines for cheat sheets in order to get it on their website. They had a powerpoint template ready to go...

But something made in R Markdown or the like would be much better. I'll see if there's anything I can do about it, but others are more than welcome to give it a shot as well :)

@jangorecki
Member

Thanks for the info on that. I filed: https://github.com/rstudio/cheatsheets/issues/97

@jangorecki jangorecki modified the milestones: 1.12.4, 1.13.0 Sep 17, 2019
@mattdowle mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019
@KyleHaynes
Contributor

For each section, is there scope to hyperlink to relevant data.table vignettes?

@mattdowle mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
@jangorecki jangorecki modified the milestones: 1.14.3, 1.14.5 Jul 19, 2022
@jangorecki jangorecki modified the milestones: 1.14.11, 1.15.1 Oct 29, 2023
@jangorecki jangorecki removed this from the 1.16.0 milestone Nov 6, 2023
@tdhock
Member

tdhock commented Jan 30, 2024

hi @epetrovski I was wondering if you could please update the cheat sheet section "RESHAPE TO LONG FORMAT" to use the new features in data.table 1.15.0 released today?

> melt(data.table(id=c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4), measure.vars=measure(value.name, y, sep="_"))
       id      y     a     b
   <char> <char> <num> <num>
1:      A      x     1     3
2:      B      x     1     3
3:      A      z     2     4
4:      B      z     2     4

@epetrovski
Author

Sorry, but it's been ages since I've used data.table or R for that matter. Others should please feel free to update the cheat sheet powerpoint and remove my contact info.

@tdhock
Member

tdhock commented Feb 7, 2024

thanks for the info @epetrovski
Would anybody else like to volunteer to update the cheat sheet?
@Anirban166 @MaraDestefanis

@tdhock tdhock reopened this Feb 7, 2024
@MaraDestefanis

Hi @tdhock, I would like to update the cheat sheet section "RESHAPE TO LONG FORMAT".

@MaraDestefanis

@tdhock The update is done; please check if it is OK. I made two changes (yours, and removing the contact info).
datatable_updated.pptx
datatable_updated.pdf

@ben-schwen
Member

> @tdhock The update is done; please check if it is OK. I made two changes (yours, and removing the contact info). datatable_updated.pptx datatable_updated.pdf

Could you also update the data.table version and date in the footnote?

@tdhock
Member

tdhock commented Feb 14, 2024

thanks @MaraDestefanis that is a great improvement.
I think it would be good to still write something like "Created by Erik Petrovski and Mara Destefanis [email protected]" is that ok with you?
Also for the melt code, I think it would be easier to understand (and more consistent with the other examples) if you change the argument from data.table(id=c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4) to dt, so:
melt(dt, measure.vars=measure(value.name, y, sep="_")) what do you think?
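
To make the proposal concrete, here is a minimal sketch of that call with a named input table. It assumes data.table 1.15.0 or later (for measure()) and that `dt` holds the same example values used earlier in this thread; the result name `long` is only for illustration.

```r
library(data.table)

# Assumed input: the same wide table used in the earlier example.
dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# Each measure column name is split on "_".
# The first part (a, b) names an output value column, because of the
# special value.name keyword; the second part (x, z) is stored in column y.
long <- melt(dt, measure.vars = measure(value.name, y, sep = "_"))
print(long)
```

The result should have one row per combination of id and y, with columns id, y, a and b, as in the output pasted above.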

@tdhock
Member

tdhock commented Feb 14, 2024

Also for the argument docs how about:

Reshape a data.table from wide to long format.
dt: a data.table.
measure.vars: Columns containing values to fill into cells, often using measure() or patterns().
id.vars: character vector of ID column names. (optional)
variable.name, value.name: names for output columns (optional)
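
As a rough illustration of those argument descriptions (not text for the cheat sheet itself), the sketch below uses id.vars, variable.name and value.name for a single value column, and patterns() for two value columns. The table `dt` and the output names `column` and `amount` are assumptions chosen for the example.

```r
library(data.table)

# Assumed input: the same wide table used elsewhere in this thread.
dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# One value column: every column other than the id column is melted.
# variable.name and value.name rename the two output columns.
long_single <- melt(dt, id.vars = "id",
                    variable.name = "column", value.name = "amount")

# Two value columns: each regular expression in patterns() selects the
# input columns that feed one output column; value.name names them.
long_multi <- melt(dt, measure.vars = patterns("^a_", "^b_"),
                   value.name = c("a", "b"))

print(long_single)
print(long_multi)
```

With patterns(), the `variable` column only holds an index (1, 2) rather than the x/z part of the column name, which is one reason measure() can be easier to read.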

@tdhock
Member

tdhock commented Feb 14, 2024

Also if you think it is appropriate, and if there is enough room, could you please add some documentation for measure()?
measure(out_name1, out_name2, sep="_", pattern="([ab])_(.*)")
sep (separator) or pattern (regular expression) are used to specify columns to melt, and parse input column names.
out_name1, out_name2: names for output columns (creates single value column), or value.name (creates value column for each unique value of the corresponding part of the melted column name).
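
A small sketch of the two cases described above, assuming data.table 1.15.0 or later. The output names `letter` and `y` are arbitrary choices for illustration, and `dt` is the example table from this thread.

```r
library(data.table)

# Assumed input: the same wide table used elsewhere in this thread.
dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)

# Ordinary output names only: a single value column, plus one new
# column per part of the split input column name.
single_value <- melt(dt, measure.vars = measure(letter, y, sep = "_"))

# value.name keyword with a regular expression instead of sep: each
# capture group corresponds to one output name, in order.
multi_value <- melt(dt, measure.vars = measure(value.name, y,
                                               pattern = "([ab])_(.*)"))

print(single_value)
print(multi_value)
```

Here `id` is left out of the melt automatically: it does not split into two parts on "_" and does not match the pattern.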

@MaraDestefanis

@ben-schwen done. @tdhock done, please check.
There is not enough room, but I added as much as I could.
Note: please check the code; I'm not sure if it's okay.
In the document, we need to make sure to keep the font size the same throughout, and remember that spacing is important too. That's where we draw the line.
datatable_updated(1).pptx
datatable_updated(1).pdf

@tdhock
Member

tdhock commented Feb 14, 2024

datatable_cheat_sheet_TDH_14_Feb_2024.pdf
datatable_cheat_sheet_TDH_14_Feb_2024.pptx

Hi Mara, Thanks for the quick revisions! I changed a couple of things, what do you think?

  • capitalization X x Z z Sep sep
  • change data.table(id=c("A","B"),a_x=1,a_z=2,b_x=3,b_z=4) to dt
  • make long tables identical (previous version showed the same data but rows in a different order)
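
For the last point, one way to check whether two long tables hold the same data regardless of row order is sketched below. It assumes the example table from this thread; `shuffled` is a hypothetical stand-in for the table shown in the earlier version of the cheat sheet.

```r
library(data.table)

# Assumed input: the same wide table used elsewhere in this thread.
dt <- data.table(id = c("A", "B"), a_x = 1, a_z = 2, b_x = 3, b_z = 4)
long <- melt(dt, measure.vars = measure(value.name, y, sep = "_"))

# Hypothetical second table: same rows, different order.
shuffled <- long[c(2, 1, 4, 3)]

# A plain comparison is sensitive to row order.
same_as_is <- isTRUE(all.equal(long, shuffled))

# data.table's all.equal method can ignore row order.
same_ignoring_order <- isTRUE(all.equal(long, shuffled,
                                        ignore.row.order = TRUE))

# Sorting by y, then id, restores the order that melt() produced.
setorder(shuffled, y, id)
same_after_sort <- isTRUE(all.equal(long, shuffled))
```

Showing both tables in the order that melt() returns avoids the mismatch on the sheet without any extra sorting step.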

@MaraDestefanis

@tdhock I am delighted to collaborate, thanks to you. I'll squeeze in some time this weekend to make those changes and hit you back. (At this moment I trust your judgment until I have better expertise). Let's go forward.

@MaraDestefanis

MaraDestefanis commented Feb 19, 2024

Hi @tdhock, I've added the files for you to check. This is how I did it; tell me if you want me to change something.
data.table_update(2).pdf

datatable_updated(2).pptx

@tdhock
Member

tdhock commented Feb 21, 2024

hi @MaraDestefanis thanks for sharing. Can you please tell me what are the differences/improvements in your version, with respect to my revisions from #3374 (comment) ?

@MaraDestefanis

Hi @tdhock,

  • Capitalize X for a and use lowercase z for b: "X x Z z"
  • Use lowercase "sep" in the text "Reshape a data table from..."
  • Previously, in the code, it was "dt". I changed it to "data.table".

Please point out the details that I may have misunderstood, and I will gladly apply them

@tdhock
Member

tdhock commented Feb 22, 2024

Capitalize X for a and use lowercase z for b: "X x Z z" -> I think it would be better to stay consistent with the reshape to wide/dcast example. (In my version, b_x is the same in both examples, whereas in your version the dcast example has b_x and the melt example has b_X.)

Use lowercase "sep" in the text "Reshape a data table from..." -> my version already had lowercase sep.

Previously, in the code, it was "dt". I changed it to "data.table". -> For consistency with the other examples, in which the first argument is usually dt, I think it would be better to keep it as dt instead of data.table(id=c("A","B"),a_X=1,a_z=2,b_X=3,b_z=4)

So overall I think it would be better to keep the version with the changes I proposed in #3374 (comment) If you agree, then there are no new changes to apply. Or am I missing something?

@MaraDestefanis

@tdhock, back at it again. Would you mind reviewing this? If everything checks out, rename the file. If there are any issues, please highlight them for clarification, and I'll make the adjustments.

datatable_updated(3).pptx
data.table_update(3).pdf

@tdhock
Member

tdhock commented Feb 26, 2024

your new version still has some of the same issues I mentioned, and a white line over the authors at the bottom.
did you see the revised files I uploaded in #3374 (comment) ? I believe that version fixes the issues I mentioned. Can you please look at that version and tell me if you approve?

@MaraDestefanis

Hi @tdhock, thanks for taking the time to review that. I applied what you suggested in comment #3374. I hope I didn't make any mistakes with any issues. However, if I did, tell me again until it's perfect.

data.table_update(4).pdf

@tdhock
Member

tdhock commented Feb 28, 2024

Mara, your version still does not address the issue I mentioned above in this comment: #3374 (comment)
Please do not edit nor make a new version, which I believe is wasting our time due to some mis-communication. Instead, please read these files which I linked above in that comment, and I link again below here for clarity:
datatable_cheat_sheet_TDH_14_Feb_2024.pdf
datatable_cheat_sheet_TDH_14_Feb_2024.pptx
and tell me if you think they are ok (I believe they are OK).

@MaraDestefanis

Hi @tdhock, sorry for the delay. Yes, this version is OK. I'll add it again, with just a small adjustment to the text.
(At some point in the reviews, I got lost with this comment: "capitalization X x Z z Sep sep". Sorry for taking up your time.)

data_table_cheat_sheet.pptx
data_table_cheat_sheet.pdf

Feel free to let me know if there's anything else you'd like to change.
Note: in the coming days I will try to make a Quarto version. If we need translations, I'm available for that too.

@tdhock
Member

tdhock commented Mar 6, 2024

Hi Mara thanks for sharing. Can you please clarify what exactly you changed in your new version? "with just a little adjustment only in the text" #3374 (comment)

Your new version still does not fix the issue I mentioned previously "make long tables identical (previous version showed the same data but rows in a different order)" which is fixed in my previous version #3374 (comment) -- can you please use that version if you want to make future modifications?

In particular the issue can be seen below
[image: tables-not-same]

It is fixed in my version as can be seen below
[image: tables-same]

@tdhock
Member

tdhock commented Mar 6, 2024

Your new version also still has another issue I mentioned, #3374 (comment) "a white line over the authors at the bottom" see below.
[image: measure-revisions]

@MaraDestefanis

MaraDestefanis commented Mar 9, 2024

Thanks, @tdhock for the detailed clarification, I really needed that. Could you please double-check if it's good now? I'm totally cool with reviewing it as many times as we need to get it right.

data_table_cheat_sheet.pdf
data_table_cheat_sheet.pptx

@tdhock
Member

tdhock commented Mar 10, 2024

Hi Mara that is a lot better thanks. A couple of minor suggestions:
please change "and parse input column names." to "and to parse input column names."

please remove space after open parenthesis: change "value.name ( creates" to "value.name (creates"

@MaraDestefanis

Hi @tdhock I'm sending that text again with the review you talked about. Take a look and let me know what's next. Thanks for being patient and explaining stuff.

data_table_cheat_sheet.pptx
data_table_cheat_sheet.pdf

@tdhock
Member

tdhock commented Mar 26, 2024

looks good, what do other people think?

please make a minor correction:

melt(dt,
measure.vars= measure (
value.name, y,sep="_"))

put space before equals sign (measure.vars =)
and space after comma (y, sep)

@MaraDestefanis

Hey @tdhock, great to hear from you. Can you check this out?
data_table_cheat_sheet.pdf
data_table_cheat_sheet.pptx

@tdhock
Member

tdhock commented Apr 10, 2024

Hi Mara, that looks great. Can you please submit a PR to https://github.com/rstudio/cheatsheets that updates the cheat sheet?

@tdhock
Member

tdhock commented Jun 12, 2024

great, the rstudio repo has accepted our updated cheatsheet
so now we just need to update the link on the readme

@tdhock
Member

tdhock commented Jun 12, 2024

or maybe delete the old one? (probably better to avoid confusion)

@MaraDestefanis

Hi @tdhock Toby,

I'm aware of the news, that's great! I'm not sure if I need to do anything now, but I'm keeping an eye on it, and on our feedback.
