
tar-stream regression in 13 with pipeline #32954

Closed

mafintosh opened this issue Apr 20, 2020 · 16 comments

@mafintosh
Member

  • Version: 13.13.0
  • Platform: Mac
  • Subsystem: stream

What steps will reproduce the bug?

stream.pipeline is no longer compatible with tar-stream (around 500k GitHub dependents) for tar files with more than one file.

Below is a test case.

echo hello > hello.txt
echo world > world.txt
tar c hello.txt world.txt > test.tar
const tar = require('tar-stream')
const fs = require('fs')
const path = require('path')
const pipeline = require('stream').pipeline

fs.createReadStream('test.tar')
  .pipe(tar.extract())
  .on('entry', function (header, stream, done) {
    // In 13 this will only unpack one file, due to pipeline calling
    // destroy on the substream, causing the entire extract stream to
    // be destroyed and silently fail.
    console.log(header.name)
    pipeline(stream, fs.createWriteStream(path.join('/tmp', header.name)), done)
  })

This seems to be related to changes that force autoDestroy behaviour on existing streams when using pipeline, added in #31940.

This seems to be the same regression that was fixed for HTTP in #32197.

mafintosh changed the title from "Stream regression in 13" to "tar-stream regression in 13 with pipeline" on Apr 20, 2020
@mcollina
Member

cc @ronag

@ronag
Member

ronag commented Apr 21, 2020

I'm unsure how to fix this. tar-stream is conceptually a Duplex but is implemented as a Writable. As far as I can see Node does the correct thing and I can't think of a generic way to detect this scenario.

We are kind of stuck choosing between:

  • Destroying legacy "streams" that don't want to be destroyed (e.g. tar-stream).
  • Not destroying legacy streams that want to be destroyed but don't automatically handle it (this is what the docs say).

The only thing I can think of right now is to fix tar-stream by making it a Duplex. Alternatively, add support for some kind of willEmitClose option/property that implementors can use to override the destroy-on-'finish'/'end' behaviour.
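
One possible reading of that willEmitClose idea, as a sketch; the property here is hypothetical, not an option Node.js actually exposed at the time:

const { Writable } = require('stream')

class Extract extends Writable {
  constructor (opts) {
    super(opts)
    // Hypothetical flag: declares that this stream will emit 'close'
    // on its own when it is fully done, so pipeline should not
    // force-destroy it on 'finish'. Purely illustrative.
    this.willEmitClose = true
  }
}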


@mafintosh
Member Author

@ronag not sure I follow what you mean by making it a Duplex?

It's a writable stream that forwards parts of its input as a series of streaming files as it parses them. The forced auto-destroy in pipeline makes the substream forward the stream destruction to the main input stream.

@mafintosh
Member Author

The tricky thing for tar-stream with this forced behavior is that the substream (the one emitted in entry) doesn't know when it's done being piped, so when pipeline forces the destroy on it, it doesn't know if that destroy is because:

a) a user wants to destroy the main stream
b) a stream in the pipeline it's being used in crashed, and pipeline is taking down all streams, even ones that ended
c) it's a forced auto-destroy by pipeline
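
A minimal sketch of the coupling being described (simplified; not tar-stream's actual source):

const { Writable, PassThrough } = require('stream')

// Stand-in for tar-stream's extract: each parsed entry is surfaced as
// a substream carved out of the parent Writable.
class Extract extends Writable {
  _write (chunk, enc, cb) {
    const entry = new PassThrough()
    entry._destroy = (err, done) => {
      // Cases (a), (b) and (c) all land here and look identical, so the
      // substream's only option is to tear down the whole archive.
      this.destroy(err)
      done(err)
    }
    this.emit('entry', { name: 'file.txt' }, entry, cb)
    entry.end(chunk)
  }
}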

@ronag
Member

ronag commented Apr 21, 2020

@ronag not sure I follow what you mean by making it a Duplex?

Instead of emitting 'entry', you could implement it as a Duplex with readableObjectMode: true, emitting 'data' events that are "entries" and that in turn emit 'entry' for compat.
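
A rough sketch of that shape (illustrative only, not a drop-in patch for tar-stream):

const { Duplex } = require('stream')

// The extractor as a Duplex: the writable side consumes the archive,
// the object-mode readable side yields one object per entry, and the
// old 'entry' event is kept for backwards compatibility.
class Extract extends Duplex {
  constructor () {
    super({ readableObjectMode: true })
  }

  _write (chunk, enc, cb) {
    // ...parse chunk; for each completed entry header found:
    const header = { name: 'file.txt' } // illustrative placeholder
    this.push({ header }) // surfaces as an object-mode 'data' event
    this.emit('entry', header) // compat event
    cb()
  }

  _read () {} // entries are pushed from the write side
}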

@ronag
Member

ronag commented Apr 21, 2020

@mafintosh: I do see the problem but unless it's implemented as a Duplex I'm not sure how we could fix this while still maintaining the contract documented in the pipeline docs:

stream.pipeline() will call stream.destroy(err) on all streams except:

Readable streams which have emitted 'end' or 'close'.
Writable streams which have emitted 'finish' or 'close'.

@mafintosh
Member Author

I don't see how changing it will help. Any module that forwards a substream will have this problem, like HTTP did. Even if we did change it, tar-stream has 5M+ weekly downloads, so that would mean massive breakage.

I'm open to ideas, but I think we're borked.

@ronag
Member

ronag commented Apr 21, 2020

I don't see how changing it will help.

Would you mind if I try to make a PR against tar-stream?

Any module that forwards a substream will have this problem, like HTTP did. Even if we did change it, tar-stream has 5M+ weekly downloads, so that would mean massive breakage.

I'm not sure what to do here. If we break the pipeline destroy behavior, other stuff might break.

@mcollina @lpinca I think we might need some more opinions.

@mcollina
Member

I think the best option is to restore the old behavior.

@mafintosh
Member Author

@ronag go for it

I don't think this case is possible to fix, though:

const tar = require('tar-stream')
const { Writable, pipeline } = require('stream')

const extract = tar.extract()

extract.on('error', function (err) {
  // err === the error thrown in 'final' below
})
extract.on('entry', function (header, stream) {
  pipeline(stream, new Writable({
    write (data, enc, cb) {
      cb(null)
    },
    final (cb) {
      // Crashes instead of finishing; cb is intentionally never called.
      this.destroy(new Error('I am crashing now'))
    }
  }), () => {}) // pipeline requires a callback
})

@ronag
Member

ronag commented Apr 21, 2020

I think the best option is to restore the old behavior.

I'm ok with that.

@mafintosh
Member Author

I'm +1 on restoring the old behaviour also. I think autoDestroy is a better mechanic for this for modules. It also means we can drop the HTTP special-casing.
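
For context, autoDestroy is a per-stream constructor option (opt-in at the time; it later became the default in Node 14), so a module can declare destroy-on-finish itself instead of having pipeline force it:

const { Writable } = require('stream')

// A sink that opts in to being destroyed automatically once it emits
// 'finish', with no need for pipeline to force-destroy it.
const sink = new Writable({
  autoDestroy: true,
  write (chunk, enc, cb) {
    cb()
  }
})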

@ronag
Member

ronag commented Apr 21, 2020

Just as a note, the old behavior was to sometimes destroy. I'm a little unsure at the moment how the old code could have worked with tar-stream.


@mafintosh
Member Author

I'll try to test that commit.

ronag added a commit to nxtedition/node that referenced this issue Apr 21, 2020
This PR logically reverts nodejs#31940 which
has caused lots of unnecessary breakage in the ecosystem.

This PR also aligns better with the actual documented behavior:

`stream.pipeline()` will call `stream.destroy(err)` on all streams except:
  * `Readable` streams which have emitted `'end'` or `'close'`.
  * `Writable` streams which have emitted `'finish'` or `'close'`.

The behavior introduced in nodejs#31940
was much more aggressive in terms of destroying streams. This was
good for avoiding potential resource leaks; however, it breaks some
common assumptions in legacy streams.

Furthermore, it makes the code simpler and removes some hacks.

Fixes: nodejs#32954
Fixes: nodejs#32955
ronag closed this as completed in 6419e59 on Apr 23, 2020
ronag added a commit to nxtedition/node that referenced this issue Apr 23, 2020

PR-URL: nodejs#32968
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Mathias Buus <[email protected]>
Backport-PR-URL: nodejs#32980
BethGriggs pushed a commit that referenced this issue Apr 27, 2020
BethGriggs pushed a commit that referenced this issue Apr 28, 2020