data.table homepage #3675

Closed
hadley opened this issue Jul 2, 2019 · 9 comments · Fixed by #3677

@hadley
Contributor

hadley commented Jul 2, 2019

(Context: I have offered to help data.table get set up with a basic pkgdown website and provide some advice on the homepage. Matt suggested that this was the best place to discuss the homepage)

Package websites in the tidyverse follow a pretty standard format, and I would suggest that data.table mimics this structure, unless there are any strong objections. We usually organise the page as follows:

  • Overview: 1-3 paragraph overview of what the package does, often linking to vignettes for more details about particular features. The goal is to help people quickly figure out if they are in the right place.

  • Installation: a couple of lines of code so that people can install the package by copying and pasting code (i.e. as easily as possible).

  • Cheatsheet: If a cheatsheet is available, high-resolution thumbnails linking to the cheatsheet. This provides some visual interest to the page, and provides a quick overview of the features.

  • Usage: 1-2 screens worth of runnable code showing the most important features of the package.

  • Links in the sidebar and navbar to get more details.

(Typically we also echo this material in the readme, so that it's available directly from GitHub.)

Are there any strong feelings about this overall structure?


Much of this content can be drawn from the existing data.table wiki, but there are a couple of pieces where the data.table community needs to guide me:

  • I think the first paragraph could be a bit punchier, focussing more on the features of data.table, and providing a little less social proof. Perhaps some of the headline features could be brought up from the bottom of the page, and there could be a new second paragraph that focussed on the impact that data.table has had in the wider community?

  • I would suggest combining the basic syntax guide (i.e. the two large images) with links to a cheatsheet. Is there a data.table cheatsheet endorsed by the community? I see that we link to one from our cheatsheet page but I'm not sure if this is official.

  • To me, the code samples are a little long. The main goal of the homepage (in my opinion) is to get people interested in the package; the details can go elsewhere, but here you want to demonstrate the key features of the package. I think it's also quite important that the output is shown (particularly since data.table enhances the print method) and I'd recommend using a dataset that at least evokes a possible analysis context.

@MichaelChirico
Member

For the cheatsheet: the one you link (@epetrovski) was brought up in #3374; we've just been trying to figure out the best way to incorporate it and how extensible it'd be. So yes, it's as good as official.

@jangorecki
Member

jangorecki commented Jul 3, 2019

It might be easier to work that out once there is a PR. One thing I need to note is that we don't want to sacrifice the drat repo for the pkgdown page. I believe it shouldn't be a problem to combine both. The drat repo is created after a successful CI run using https://github.com/Rdatatable/data.table/blob/master/deploy.sh, which does git reset --hard on gh-pages every time, so pkgdown artifacts should probably be populated there. We could eventually plug pkgdown into GitLab CI, which is much more flexible. Once it is done for Travis, it will be easy to migrate there.

@hadley
Contributor Author

hadley commented Jul 3, 2019

I can put up something concrete, but rough, in order to get feedback, but someone from the data.table community needs to contribute a new code example, and someone heavily involved in data.table development also needs to be involved so that they can continue to maintain it after my initial contribution.

I don't foresee any problem integrating the pkgdown deployment into your existing script, although you'll need to do it by hand since your CI setup is rather different from our standard.

@MichaelChirico
Member

yes that works

@hadley
Contributor Author

hadley commented Jul 3, 2019

Where can I find a high-resolution version of the data.table logo?

@MichaelChirico
Member

@g3o2

g3o2 commented Jul 14, 2019

Looking at @hadley's pull request, I agree that this is definitely the way to go.

I discovered data.table only two weeks ago, coming from the tidyverse. I love the data.table package, due to its minimal dependencies and its proximity to both base R and SQL syntax. Documentation is unfortunately not yet its strength:

  • which vignette to read first? Obviously the Introduction! So it should be the first article in the list. Articles on developing packages or optimizing with data.table can go last;

  • where is the advertised "Joins and rolling joins" vignette? Joins are hugely important for data processing, yet I had to search through SO and other websites to find anything of note. Where does setkey come in? Maybe even split the future documentation of regular joins from the more advanced rolling joins.

  • data.table advertises a lot in the style of "better than this or that package". I personally think it should advertise its own strengths and main usage scenarios. Let the users do the judgmental advertising. Just provide them with the performance data.

  • Actually, I would be most interested to see where data.table and tidyverse can be used in tandem! I recently had a scenario where I used data.table for the job but was forced to add a dplyr function because the pure data.table syntax would have been too unreadable. There are certainly bridges between both packages, which will eventually benefit both users and developers.

  • finally, consider transferring some beginner FAQ items to a dedicated vignette. For example: "Writing a function based on data.table." It took me hours to find out about get() because I did not have the reflex to read the FAQ first and probably not the required base R knowledge to know. The FAQ should focus on power user items IMO.

@jangorecki jangorecki added this to the 1.12.4 milestone Aug 26, 2019
@jangorecki jangorecki added the www label Aug 26, 2019
@jangorecki jangorecki self-assigned this Aug 26, 2019
@jangorecki jangorecki added the ci label Aug 26, 2019
@jangorecki
Member

jangorecki commented Aug 27, 2019

@g3o2 Thanks for your warm comment. Please find a draft of the pkgdown website at https://jangorecki.gitlab.io/data.table/; more details are in the PR addressing this issue: #3677


  • which vignette to read first? Obviously the Introduction! So it should be the first article in the list. Articles on developing packages or optimizing with data.table can go last;

Agree, CRAN does not allow to order vignettes. On the pkgdown draft website they are already sorted as necessary, and the introduction vignette is also linked directly from the home page.

  • where is the advertised "Joins and rolling joins" vignette? Joins are hugely important for data processing, yet I had to search through SO and other websites to find anything of note. Where does setkey come in? Maybe even split the future documentation of regular joins from the more advanced rolling joins.

I re-opened #2181 so it is easier to track the status of this single vignette, which is highly requested. You can subscribe there to get updates on progress.

  • finally, consider transferring some beginner FAQ items to a dedicated vignette. For example: "Writing a function based on data.table." It took me hours to find out about get() because I did not have the reflex to read the FAQ first and probably not the required base R knowledge to know. The FAQ should focus on power user items IMO.

To be fair, I don't think we should teach our users much about base R itself. Of course I get your point. Personally I don't even use get but more sophisticated base R features, see https://stackoverflow.com/a/54800108/2490497 for details.
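
For anyone landing here with the same question, below is a minimal sketch of the get() approach mentioned above. The table, its column names and the helper sum_by_group() are made up for illustration; they are not taken from the FAQ or the linked answer.

```r
library(data.table)

# Toy data, invented for this example.
dt <- data.table(grp = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# Hypothetical helper: sum a column whose name arrives as a string.
# get(col) looks up the column named by `col` inside the data.table's j expression.
sum_by_group <- function(dt, col) {
  dt[, .(total = sum(get(col))), by = grp]
}

res <- sum_by_group(dt, "x")
print(res)
```

Passing the name through .SDcols instead, as in dt[, lapply(.SD, sum), by = grp, .SDcols = col], is another option that avoids get() altogether.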

@MichaelChirico
Member

Agree, CRAN does not allow to order vignettes.

I recently saw golem uses numbering for this:

https://cran.r-project.org/web/packages/golem/index.html

Problem is we only have partial ordering -- intro comes first, but the rest, not so fixed.
