Question on licensing #1865

Closed
jwijffels opened this issue Jan 19, 2018 · 4 comments
Labels
meta Meta topics, e.g. repo organisation and issue management

Comments

@jwijffels

jwijffels commented Jan 19, 2018

I have a question on licensing, specifically for models trained on data from universaldependencies.org, which is the case for the Spanish, Dutch, French, Portuguese and Italian models.
Am I correct that these were built on the following corpora, or were other corpora used?

I see that spaCy distributes the resulting models under the following licenses, respectively:

  • Spanish: GPL
  • Dutch: CC-BY-SA 4.0
  • French: LGPL
  • Portuguese: CC-BY-SA 4.0
  • Italian: CC BY-SA 4.0

This seems to indicate that you keep the same license as the database each model was trained on, except for Italian.
Shouldn't the Italian model also be licensed under the CC-BY-NC-SA 3.0 Unported license?

I'm copying some comments from the Creative Commons website (https://wiki.creativecommons.org/wiki/data#Can_I_conduct_text.2Fdata_mining_on_a_CC-licensed_database.3F) that point in this direction, although they mostly relate to the CC-BY 4.0 licenses.
Either way, the non-commercial part seems to indicate that redistributing under the less restrictive CC BY-SA 4.0 would not be allowed.

Can I conduct text/data mining on a CC-licensed database?

Yes. However, you should be aware that whether you have to comply with the CC license terms and conditions will depend on whether the type of mining activity you conduct implicates copyright or any applicable sui generis database rights. If you are not exercising an exclusive right held by the database maker, then you do not need to rely on the license to mine. Because there are many different methods for conducting text and data mining, however, there may be some types of mining activities that will implicate the licensed rights.

If and only if your particular use is one that would require permission, you should note the following:

Permission: All six of the 4.0 licenses allow for text and data mining by granting express permission to privately reproduce, extract, and reuse the contents of a licensed database and create adapted databases.
Commercial purposes: If you are conducting text and data mining for commercial purposes, you should not mine NC-licensed databases or other material.
Outputs: If you publicly share the results of your mining activity or the data you mined, you should attribute the rights holder. If what you publicly share qualifies as an adaptation of the licensed material, you should not mine ND-licensed material. If you share an adaptation of material under an SA license, you must apply the same license to the adaptation that results.

If your use is not one that requires permission under the license, you may conduct text and data mining activity without regard to the above considerations.

Or was another corpus used for creating the Italian model?

@honnibal
Member

You're correct --- we made a mistake on the Italian model. Huge thanks for calling our attention to this.

Our process for propagating the license information is to add the license to a metadata file in our model building repository. This attribute is then taken when the model files are built, and shipped with the trained model. The same metadata file is consulted when the table of model files is compiled for the website.
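The flow described above can be sketched roughly as follows. This is only an illustration, not spaCy's actual build code: the JSON layout and the function names `build_model_meta` and `build_website_table` are hypothetical, and the license values are the ones listed earlier in this thread.

```python
import json

# Hypothetical contents of the shared metadata file (schema is an assumption).
METADATA_JSON = """
{
  "nl": {"license": "CC-BY-SA 4.0"},
  "fr": {"license": "LGPL"}
}
"""


def load_metadata(raw):
    """Parse the shared metadata file into a dict keyed by language code."""
    return json.loads(raw)


def build_model_meta(metadata, lang):
    """Metadata shipped with the trained model for one language."""
    return {"lang": lang, "license": metadata[lang]["license"]}


def build_website_table(metadata):
    """Rows (language, license) for the website's table of models."""
    return [(lang, entry["license"]) for lang, entry in sorted(metadata.items())]


metadata = load_metadata(METADATA_JSON)
print(build_model_meta(metadata, "nl"))
print(build_website_table(metadata))
```

Because both outputs read the same entry, a value entered incorrectly in the file shows up in the shipped model and on the website alike.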

In the case of the Italian model, I copied the wrong information into the file. I was doing a number of these models at the same time, and as you note some of the UD corpora are licensed NC, while others are licensed SA.
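A simple automated check could catch this kind of copy error. The sketch below is a rough heuristic, not an existing spaCy tool and not a substitute for reading the licenses: the helper names are hypothetical, and it assumes Creative Commons style identifiers such as `CC-BY-NC-SA 3.0`. For other licenses (GPL, LGPL) it finds no elements and reports nothing.

```python
CC_ELEMENTS = {"BY", "NC", "SA", "ND"}


def license_elements(name):
    """Extract CC license elements from an identifier like 'CC-BY-NC-SA 3.0'."""
    tokens = name.upper().replace("-", " ").split()
    return {token for token in tokens if token in CC_ELEMENTS}


def missing_restrictions(corpus_license, model_license):
    """Elements present in the corpus license but absent from the model license."""
    return license_elements(corpus_license) - license_elements(model_license)


# Licenses as discussed in this thread for the Italian model.
print(missing_restrictions("CC-BY-NC-SA 3.0", "CC BY-SA 4.0"))  # {'NC'}
```

A non-empty result would flag the model entry for manual review before the metadata is shipped.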

The question now is how to go about issuing the correction. I guess we post something to the spaCy list, and update the metadata?

honnibal added the meta label Jan 20, 2018

@jwijffels
Author

I'm not a specialist in such situations, but it seems to me that all distributions that include the Italian model should update its license to incorporate the NC part. Users of the model who used it for commercial purposes or redistributed it under the old license would probably need to be informed of this change.
Alternatively, you could contact the authors of the treebank (https:/UniversalDependencies/UD_Italian/blob/master/LICENSE.txt) to see whether they would allow the NC part to be dropped.

@honnibal
Member

The new models being distributed for the v2.1 release now have updated licenses.

@lock

lock bot commented Oct 12, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Oct 12, 2018