Settings for controlling if outputs should be saved to disk #124551

Closed
DonJayamanne opened this issue May 25, 2021 · 12 comments
Labels: feature-request, notebook-ipynb, notebook-serialization, *out-of-scope

Comments

@DonJayamanne
Contributor

Originally filed here by @rebornix microsoft/vscode-jupyter#4670

I think we should consider adding this feature into VSCode. I've implemented this same feature in Kusto notebooks.
Feels like something generic for VS Code.

Similar to having a cell toolbar position per viewtype, we can have Save Notebook Outputs per viewtype. This can be achieved easily from VS Code by passing an empty outputs array into the serializer when saving notebooks.
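
As a rough illustration of the idea above, the core could hand the serializer a copy of the notebook in which every cell's outputs are emptied. This is only a sketch, not VS Code's implementation: the interfaces are simplified stand-ins for the real notebook types, and `withoutOutputs`, `wrapSerializer`, and the `saveOutputs` callback are hypothetical names.

```typescript
// Simplified stand-ins for the VS Code notebook types; only the fields used here.
interface CellOutput { items: { mime: string; data: Uint8Array }[] }
interface CellData { kind: "code" | "markup"; value: string; outputs?: CellOutput[] }
interface NotebookData { cells: CellData[] }
interface Serializer { serializeNotebook(data: NotebookData): Uint8Array }

// Hypothetical helper: returns a copy of the notebook with every cell's outputs emptied.
// The input object is left untouched so the editor can keep showing the outputs.
function withoutOutputs(data: NotebookData): NotebookData {
  return { ...data, cells: data.cells.map(cell => ({ ...cell, outputs: [] })) };
}

// Hypothetical wrapper: consults a per-viewtype setting before delegating
// to the extension's own serializer.
function wrapSerializer(inner: Serializer, saveOutputs: () => boolean): Serializer {
  return {
    serializeNotebook(data: NotebookData): Uint8Array {
      return inner.serializeNotebook(saveOutputs() ? data : withoutOutputs(data));
    },
  };
}
```

With this shape the extension's serializer needs no changes; it simply receives cells that have no outputs when the setting is off.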

I personally think this would be useful for REST notebooks as well. One might want the output from the REST and GitHub Issues notebooks to be saved in the notebook, just like Kusto notebooks (some might want the charts/data and some might not).

Thus all notebooks can be considered serializable (including output for sharing), and the user chooses whether the output is persisted on disk or not.

E.g. with the .NET team building their notebook format (*.dib), this would have to be implemented yet again in their extension.

@rebornix /cc

@vscodebot

vscodebot bot commented May 25, 2021

(Experimental duplicate detection)
Thanks for submitting this issue. Please also check if it is already covered by an existing one, like:

@rebornix
Member

E.g. with the .NET team building their notebook format (*.dib), this would have to be implemented yet again in their extension.

I doubt it; dib is a concatenation of cell inputs, so it doesn't save outputs.


if outputs should be saved to disk

If we are talking about serializing the content and saving it to disk, then that is the responsibility of the content serializer. VS Code has no idea how the outputs will be used to generate the final file format. If we ever implement something in the core, there are many open questions:

  • Should the content serializer completely drop outputs info from the file, or save it as outputs: []? How would it know whether the array is empty because the user disabled saving or because the cell simply has no outputs?
  • Should the content serializer save the execution count? Same question as above: how would it know?
  • Is there any other metadata that can be linked with outputs and should have the same persistence story as outputs?

In the Kusto extension, IIRC, if users choose not to save outputs, then we also make the executionCount transient. Maybe we do the same in Jupyter, but we don't know yet if it's generic enough.
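
The ambiguity behind the first two questions can be made concrete with a small sketch. It assumes ipynb-style field names (`outputs`, `execution_count`) and a hypothetical `toSavedCell` helper that also treats the execution count as transient, in the spirit of what is described for the Kusto extension. None of this is existing VS Code or extension code.

```typescript
// Minimal ipynb-like code cell (nbformat 4 field names); not the full schema.
interface IpynbCodeCell {
  cell_type: "code";
  source: string;
  outputs: unknown[];
  execution_count: number | null;
}

// Hypothetical policy: when outputs are not saved, the execution count is dropped too.
function toSavedCell(cell: IpynbCodeCell, saveOutputs: boolean): IpynbCodeCell {
  return saveOutputs ? cell : { ...cell, outputs: [], execution_count: null };
}

// A cell that was never executed.
const neverRun: IpynbCodeCell = {
  cell_type: "code",
  source: "print(1)",
  outputs: [],
  execution_count: null,
};

// The same cell after it was executed and produced output.
const executed: IpynbCodeCell = {
  cell_type: "code",
  source: "print(1)",
  outputs: [{ output_type: "stream", name: "stdout", text: "1\n" }],
  execution_count: 3,
};

// With saving disabled, both cells serialize to the same JSON text, so the file
// alone cannot distinguish "disabled by the user" from "never executed".
const savedNeverRun = JSON.stringify(toSavedCell(neverRun, false));
const savedExecuted = JSON.stringify(toSavedCell(executed, false));
```

A serializer that needed to tell the two cases apart would have to record the user's choice somewhere else, for example in notebook metadata, which is exactly the kind of format-specific decision the comment above leaves to the content serializer.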

cc @jrieken @tanhakabir @roblourens

@jrieken
Member

jrieken commented May 25, 2021

Yeah, I don't believe that it can be done. The closest request we have is support for transient per output-instance, not per notebook type: #120600

@tanhakabir
Contributor

I don't have a case yet where I'm not saving outputs (no request so far), but this might be possible from core by not passing the output cell items in NotebookData when the serialize function is called?

So far, though, I don't have such a feature planned.

@andreaschiappacasse

+1
This would also be super useful for managing version control of .ipynb files, in order to avoid committing large or sensitive outputs that may be generated by the notebook.

@MrNickArcher

I came looking for this. Sad to see it hasn't got much enthusiasm. I also want it, for:

  • Preparing notebooks for teaching without accidentally giving students any pre-run outputs
  • Hiding sensitive output and metadata when committing to git repositories

@PiotrCzapla

It would be best to expose the ability to filter what gets saved. nbdev, which facilitates building Python libraries in notebooks in a literate programming style, has special comment markup to remove only some outputs. I would love to be able to run the same hook that is already implemented for Jupyter in VS Code.
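
A per-cell filter hook of this kind might look roughly like the sketch below. The `OutputFilter` type, `applyOutputFilter`, and the `# strip-output` marker are all hypothetical and exist only for illustration; in particular, the marker is not nbdev's actual directive syntax.

```typescript
// Minimal cell shape for the sketch; real notebook cells carry more fields.
interface Cell { source: string; outputs: unknown[] }

// A filter decides per cell whether its outputs are kept (true) or dropped (false).
type OutputFilter = (cell: Cell) => boolean;

// Hypothetical marker comment, NOT nbdev's real syntax: cells containing it lose their outputs.
const keepUnlessMarked: OutputFilter = cell => !/#\s*strip-output\b/.test(cell.source);

// Applies the filter just before saving; cells that pass are returned unchanged.
function applyOutputFilter(cells: Cell[], keep: OutputFilter): Cell[] {
  return cells.map(cell => (keep(cell) ? cell : { ...cell, outputs: [] }));
}
```

Because the filter is just a function, the same hook could back both an all-or-nothing setting and selective, markup-driven stripping.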

@JotaRata

Does anybody know if this can be made possible by using the new Notebook: Code Actions On Save setting?

@NickCrews

Workaround: add a git filter so that in the middle of committing, the notebook is intercepted and the outputs are removed before they are saved by git. The outputs are still saved to disk locally, so you can still see them, but when you git push the outputs won't be included, or if you git checkout a different commit, you won't see the outputs there.

See this gist (PS, run git add --renormalize . as a 4th step before your first commit, so that all of your existing notebooks get scrubbed, or you will get heinous confusing merge conflicts later.)
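
The gist itself is not reproduced here. As a sketch of the transformation such a filter would apply to each notebook on its way into git, assuming the nbformat 4 layout (a top-level `cells` array whose code cells carry `outputs` and `execution_count`), the hypothetical function below clears both fields. Wiring it to stdin/stdout and registering it as a filter in the git configuration is left out.

```typescript
// Sketch of what a git "clean" filter could do to .ipynb text.
// Assumption: nbformat 4 layout with a top-level `cells` array.
function stripIpynbOutputs(ipynbText: string): string {
  const notebook = JSON.parse(ipynbText) as { cells?: Array<Record<string, unknown>> };
  for (const cell of notebook.cells ?? []) {
    if (cell["cell_type"] === "code") {
      cell["outputs"] = [];            // drop rendered results
      cell["execution_count"] = null;  // drop the run counter as well
    }
  }
  // One-space indent plus trailing newline, assumed here to keep diffs stable.
  return JSON.stringify(notebook, null, 1) + "\n";
}
```

Markdown cells and notebook-level metadata pass through untouched, so only the parts that tend to be large or sensitive are removed from what git stores.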

@ndgayan

ndgayan commented Oct 19, 2023

This is a big issue when you are working over Remote SSH with Jupyter Notebook files. During VS Code's "save" action, a notebook whose output contains variables or data frames with a large amount of data is written to the remote file. Sometimes this can cause VS Code to crash or take a few minutes to finish the save. It would be helpful if you could add an option to disable saving the notebook output to the file and keep it only in memory during the session.

@DonJayamanne
Contributor Author

This issue has been open for over 2 years and we have received very few votes on it.
Given that there are other approaches, such as clearing outputs upon saving notebooks or the like, I think those are sufficient (instead of providing a built-in solution).

@DonJayamanne DonJayamanne added the *out-of-scope Posted issue is not in scope of VS Code label Dec 4, 2023
@aiday-mar aiday-mar added this to the December / January 2024 milestone Feb 6, 2024
@aiday-mar aiday-mar removed this from the December / January 2024 milestone Feb 6, 2024