doc.to_bytes() fails with MemoryError: bad allocation #437

Closed
JeremyGibson opened this issue Jun 23, 2016 · 2 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@JeremyGibson

Here is another memory error I'm getting.

def __init__(self, out_file, lst):
    self.fout = codecs.open(out_file, "w", "utf-8-sig")
    self.list_to_write = lst
    for l in self.list_to_write:
        doc = self.tag_element(l.replace('\n', ' '))
        barry = doc.to_bytes()
        print(barry)

    self.fout.close()

Traceback (most recent call last):
  File "C:\Users\jgibson\.IntelliJIdea2016.1\config\plugins\python\helpers\pydev\pydevd.py", line 1531, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Users\jgibson\.IntelliJIdea2016.1\config\plugins\python\helpers\pydev\pydevd.py", line 938, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "E:/Ubuntu/guest_share/xml_parser/xml_elements.py", line 76, in <module>
    wlf = WriteListToFile('test.txt', xce.cleaned_list)
  File "E:/Ubuntu/guest_share/xml_parser/xml_elements.py", line 56, in __init__
    barry = doc.to_bytes()
  File "spacy/tokens/doc.pyx", line 418, in spacy.tokens.doc.Doc.to_bytes (spacy/tokens/doc.cpp:10671)
    byte_string = self.vocab.serializer.pack(self)
  File "spacy/vocab.pyx", line 104, in spacy.vocab.Vocab.serializer.__get__ (spacy/vocab.cpp:4568)
    self._serializer = Packer(self, self.serializer_freqs)
  File "spacy/serialize/packer.pyx", line 89, in spacy.serialize.packer.Packer.__init__ (spacy/serialize/packer.cpp:5153)
    self.orth_codec = HuffmanCodec(_gen_orths(vocab))
  File "spacy/serialize/huffman.pyx", line 34, in spacy.serialize.huffman.HuffmanCodec.__init__ (spacy/serialize/huffman.cpp:2152)
    self.codes.push_back(code)
MemoryError: bad allocation

So this is either a problem with my machine, or I'm missing a step in the flow of the nlp pipeline. Or perhaps I have some bad memory. I've tried the code on both my office Windows machine and an Ubuntu machine.
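
The traceback above fails inside `Packer.__init__` while the `HuffmanCodec` is being built from the vocab, which suggests the error may come from serializer setup rather than from any particular document. One way to check that is to serialize each input separately and record which ones fail. The following is only a sketch under assumptions: `find_failing_inputs` and `FakeDoc` are hypothetical names, `make_doc` stands in for `self.tag_element`, and `FakeDoc` fails on empty text purely to exercise the error path, not because spaCy is known to behave that way.

```python
def find_failing_inputs(texts, make_doc):
    """Serialize each text separately and collect the ones that fail.

    make_doc is a stand-in for the reporter's self.tag_element: any callable
    that takes a string and returns an object with a to_bytes() method.
    """
    failures = []
    for index, text in enumerate(texts):
        # Same newline handling as the loop in the report.
        doc = make_doc(text.replace("\n", " "))
        try:
            doc.to_bytes()
        except MemoryError as err:
            # Record position, input length, and message for the bug report.
            failures.append((index, len(text), str(err)))
    return failures


class FakeDoc:
    """Hypothetical stand-in so the sketch runs without spaCy installed."""

    def __init__(self, text):
        self.text = text

    def to_bytes(self):
        # Arbitrary failure condition, used only to demonstrate the harness.
        if not self.text.strip():
            raise MemoryError("bad allocation")
        return self.text.encode("utf-8")


if __name__ == "__main__":
    samples = ["first element", "", "third\nelement"]
    for index, length, message in find_failing_inputs(samples, FakeDoc):
        print(index, length, message)
```

If every input fails, including a trivial one-word string, the data is probably not the cause and the setup step is the place to look. If only some fail, those inputs are good candidates to attach to the report.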

@honnibal added the bug (Bugs and behaviour differing from documentation) label on Oct 21, 2016
@honnibal
Member

There's possibly a real bug here, but it's hard to action this, since I didn't get to it in time to get more details. Please reopen if you can reproduce the error on the latest version.

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 9, 2018