Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warnings when saving a spacy model during unittest #5230

Closed
lfiedler opened this issue Mar 30, 2020 · 14 comments · Fixed by #5264
Closed

Warnings when saving a spacy model during unittest #5230

lfiedler opened this issue Mar 30, 2020 · 14 comments · Fixed by #5264
Labels
feat / serialize Feature: Serialization, saving and loading help wanted Contributions welcome!

Comments

@lfiedler
Copy link
Contributor

When saving a model during unit testing with unittest I get warnings on unclosed files.
This happens with custom models as well as with models downloaded from public repository.

import spacy
from tempfile import TemporaryDirectory
from unittest import TestCase

class TestSaveSpacyModel(TestCase):
    def test_save(self):
        nlp = spacy.load("de_core_news_sm")
        with TemporaryDirectory() as temp_dir:
            nlp.to_disk(temp_dir)

Running with unittest results in:

Process finished with exit code 0
C:\***\lib\site-packages\spacy\language.py:900: ResourceWarning: unclosed file <_io.TextIOWrapper name='C:\\***\\Temp\\tmpdintxtwz\\meta.json' mode='w' encoding='cp1252'>
  srsly.json_dumps(self.meta)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
C:\***\lib\site-packages\spacy\util.py:645: ResourceWarning: unclosed file <_io.BufferedWriter name='C:\\***\\Temp\\tmpdintxtwz\\tagger\\model'>
  writer(path / key)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
C:\***\lib\site-packages\spacy\util.py:645: ResourceWarning: unclosed file <_io.BufferedWriter name='C:\\***\\Temp\\tmpdintxtwz\\vocab\\vectors'>
  writer(path / key)
ResourceWarning: Enable tracemalloc to get the object allocation traceback


Ran 1 test in 1.457s

OK

This seems to be a bug similar as to #1706, which was related to loading models and has been fixed. However, it seems that saving a model also does not properly close files.

Since this can be muted easily, this is not much of an issue for me. Still though, might as well report it.


## Your Environment
* **spaCy version:** 2.2.4
* **Platform:** Windows-10-10.0.18362-SP0
* **Python version:** 3.7.6
* Environment Information: conda 4.8.2
* models: de, custom

@svlandeg svlandeg added the feat / serialize Feature: Serialization, saving and loading label Mar 30, 2020
@svlandeg
Copy link
Member

svlandeg commented Apr 2, 2020

Thanks for the report! Definitely something to look into. If you feel like helping out with a PR, that would be most welcome :-)

@svlandeg svlandeg added the help wanted Contributions welcome! label Apr 2, 2020
@lfiedler
Copy link
Contributor Author

lfiedler commented Apr 3, 2020

Sure, I'll have a look.

@lfiedler
Copy link
Contributor Author

lfiedler commented Apr 4, 2020

As a side note, the reason this may not have been seen yet appears to be this bug in pytest:
pytest-dev/pytest#5676

@lfiedler
Copy link
Contributor Author

lfiedler commented Apr 4, 2020

As suggested by the warning messages, some of the file descriptors declared in to_disk methods of classes Language, Vectors and Tagger are currently not properly closed after use. As far as I can see this seems to be the case for classes Pipe and EntityLinker, too.
I could try to fix the latter two here as well, or should this be done somewhere else?

@svlandeg
Copy link
Member

svlandeg commented Apr 4, 2020

If it's the same issue, I think it's fine to copy the fix across. The code probably got copied around originally...

@lfiedler
Copy link
Contributor Author

lfiedler commented Apr 6, 2020

Here' s another question: The interface class Pipes requires a Model class to be attached as class attribute. In the implementations present in the same module, this is required to follow the interface class thinc.neural.Model (according to doc strings). Can I therefore assume that this is alwasys the case?

The reason I am asking is the following line in Pipes.to_disk (which seems to raise the warning):

if self.model not in (None, True, False):
    serialize["model"] = lambda p: p.open("wb").write(self.model.to_bytes())

In particular, can I assume that model will always implement the method to_disk?
Afaik this is the case for thinc.model.Model which, I assume, is the same as thinc.neural.Model.

@svlandeg
Copy link
Member

svlandeg commented Apr 6, 2020

Short answer: Yes - you can assume that a pipe's model has a to_bytes() method.

Longer answer: when looking at the Thinc code, things get tricky. You're currently working on spaCy's master, which uses Thinc 7, which is the old version and can still be seen here: https:/explosion/thinc/tree/v7.x/thinc. There, you'll find that Model is indeed in thinc.neural.

Meanwhile, Thinc 8 has been released which has Model in thinc.model. We are adjusting spaCy to this new version on the develop branch. But I would hope/assume that the fix you're creating, will be easily portable between the two.

@lfiedler
Copy link
Contributor Author

lfiedler commented Apr 6, 2020

Thank you for responding so fast. I see, I did not pay attention to the version of Thinc spacy is currently using. Thanks for pointing that out.

Actually, I was hoping to use Model.to_disk instead of Model.to_bytes (which is currently used). My question was not really clear there, sorry.

It seems that for a fix it is only necessary to use to_disk which is implemented in both versions, Thinc 7 and 8, such that it closes the file after writing to it. So this should work for both versions without larger modifications.

@svlandeg
Copy link
Member

svlandeg commented Apr 6, 2020

Yea, I think you're right. The Vocab is also written with its to_disk method so I think that'd make sense!

@DevOpsChris
Copy link

DevOpsChris commented May 13, 2020

I think this is similar to the issue when i'm invoking nlp.to_disk('/path/to/thingie')

Exception ignored in: <_io.FileIO name='/path/to/thingie/meta.json' mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.TextIOWrapper name='/path/to/thingie/meta.json' mode='w' encoding='UTF-8'>
Exception ignored in: <_io.FileIO name='/path/to/thingie/tagger/model' mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.BufferedWriter name='/path/to/thingie/tagger/model'>
Exception ignored in: <_io.FileIO name='/path/to/thingie/vocab/vectors' mode='wb' closefd=True>
ResourceWarning: unclosed file <_io.BufferedWriter name='/path/to/thingie/vocab/vectors'>

Python version: 3.7.7
Spacy versions: 2.2.2, 2.2.3, 2.2.4

Which is weird. The only thing I think changed was minor update to python? I was using 2.2.3 a month ago, and then today it's decided to show this error. Note: I'm using it, not unittesting.

@svlandeg svlandeg linked a pull request May 19, 2020 that will close this issue
3 tasks
@svlandeg
Copy link
Member

@lfiedler : thanks again for your contributions, this will be included in spaCy 2.3 onwards !

@lock
Copy link

lock bot commented Jun 24, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

2 similar comments
@lock
Copy link

lock bot commented Jun 24, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock
Copy link

lock bot commented Jun 24, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feat / serialize Feature: Serialization, saving and loading help wanted Contributions welcome!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants