
Pickling/Unpickling a doc doesn't work #606

Closed
halfak opened this issue Nov 4, 2016 · 4 comments
Labels
enhancement Feature requests and improvements 🌙 nightly Discussion and contributions related to nightly builds

Comments

@halfak

halfak commented Nov 4, 2016

spaCy docs can't be pickled and unpickled.

$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26) 
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from spacy.en import English
>>> parse = English()
>>> doc = parse("I'm a little teapot")
>>> pickled_doc = pickle.loads(pickle.dumps(doc))
>>> doc
I'm a little teapot
>>> pickled_doc

>>> doc == pickled_doc
False
>>> repr(pickled_doc)
''
>>> type(pickled_doc)
<class 'spacy.tokens.doc.Doc'>
$ pip freeze | grep spacy
spacy==0.101.0

Your Environment

  • Operating System: Ubuntu
  • Python Version Used: 3.5.1
  • spaCy Version Used: 0.101.0
@halfak
Author

halfak commented Nov 4, 2016

I just noticed that there's a newer version of spaCy available, so I tested with that too.

$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26) 
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> from spacy.en import English

>>> parse = English()
>>> doc = parse("I'm a little teapot")
>>> pickled_doc = pickle.loads(pickle.dumps(doc))
>>> doc
I'm a little teapot
>>> pickled_doc

>>> doc == pickled_doc
False
>>> pickled_doc == parse("")
False
>>> 
$ pip freeze | grep spacy
spacy==1.1.2
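The symptom in both sessions (an object of the right type that prints as empty) is what you would see if a class keeps its real data somewhere `pickle` cannot see, such as C-level arrays in a Cython extension type, and its reduce method only records how to build an empty instance. The minimal sketch below reproduces that failure mode in pure Python. `ToyDoc` and the `_c_storage` side table are hypothetical stand-ins chosen for illustration; this is not how spaCy's `Doc` is actually implemented.

```python
import pickle

# Hypothetical stand-in for C-level memory that pickle cannot inspect:
# each object's words live in this side table, keyed by id(), not in __dict__.
_c_storage = {}


class ToyDoc:
    """Toy document whose data lives outside the Python object (hypothetical)."""

    def __init__(self, words=()):
        _c_storage[id(self)] = list(words)

    @property
    def text(self):
        return " ".join(_c_storage.get(id(self), []))

    def __repr__(self):
        return self.text

    def __reduce__(self):
        # Only records "call ToyDoc() with no arguments", so the words are lost.
        return (ToyDoc, ())


doc = ToyDoc(["I'm", "a", "little", "teapot"])
restored = pickle.loads(pickle.dumps(doc))

print(repr(doc))        # I'm a little teapot
print(repr(restored))   # prints an empty line
print(doc == restored)  # False
print(type(restored))   # still a ToyDoc, just an empty one
```

Because `ToyDoc` defines no `__eq__`, `==` falls back to identity, so in this toy a `False` from `doc == restored` says nothing about data loss on its own; comparing `.text` is what exposes it.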

@honnibal honnibal added the enhancement Feature requests and improvements label Nov 4, 2016
@honnibal
Member

honnibal commented Nov 4, 2016

There's a long thread about this in the tracker. The tl;dr is that this is non-trivial.

I've grumbled about having to support pickle before, because it's pretty painful. There's a lot of C data in spaCy, and a lot of non-local state, because the vocabulary is shared between every document produced by a pipeline, and every model in that pipeline.

I've come around to believing this is necessary, and I'd like to see it fixed. I think this is a good candidate for a sprint, because it touches code across a number of files.
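One way to handle the shared-state problem described above is to pickle a document as its strings and, on load, re-intern them into the process's shared vocabulary instead of restoring a private copy of it. The sketch below assumes a single module-level vocab; `ToyVocab`, `ToyDoc`, `SHARED_VOCAB`, and `_rebuild_doc` are all hypothetical names, and this is not spaCy's actual v2.0 serialization scheme.

```python
import pickle


class ToyVocab:
    """Toy string store shared by every document (hypothetical, not spaCy's Vocab)."""

    def __init__(self):
        self.strings = []  # id -> string
        self.ids = {}      # string -> id

    def intern(self, word):
        # Return the integer id for `word`, adding it on first sight.
        if word not in self.ids:
            self.ids[word] = len(self.strings)
            self.strings.append(word)
        return self.ids[word]


# Hypothetical process-wide vocab that all documents point at.
SHARED_VOCAB = ToyVocab()


def _rebuild_doc(words):
    # Unpickling hook: re-intern the strings into the shared vocab
    # rather than restoring a private copy of the vocab.
    return ToyDoc(SHARED_VOCAB, words)


class ToyDoc:
    """Toy document that stores integer ids into a shared ToyVocab."""

    def __init__(self, vocab, words):
        self.vocab = vocab
        self.ids = [vocab.intern(w) for w in words]

    @property
    def words(self):
        return [self.vocab.strings[i] for i in self.ids]

    @property
    def text(self):
        return " ".join(self.words)

    def __reduce__(self):
        # Serialize strings, not ids: ids only mean something
        # relative to one particular ToyVocab instance.
        return (_rebuild_doc, (self.words,))


doc = ToyDoc(SHARED_VOCAB, ["I'm", "a", "little", "teapot"])
restored = pickle.loads(pickle.dumps(doc))

print(restored.text)                # I'm a little teapot
print(restored.vocab is doc.vocab)  # True: the vocab is shared, not copied
```

The trade-off is that unpickling only works in a process that can reach a compatible shared vocab; in exchange, each pickled document stays small and every document in the process keeps pointing at the same vocabulary object.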

@ines
Member

ines commented May 7, 2017

Closing this and making #1045 the master issue. Work in progress for spaCy v2.0!

@ines ines closed this as completed May 7, 2017
@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018
3 participants