Question on licensing #1865

jwijffels · 2018-01-19T13:43:09Z

I have a question on licensing, specifically for models trained on data from universaldependencies.org, which is the case for the Spanish, Dutch, French, Portuguese and Italian models.
Am I correct that these were built on the following corpora, or were other corpora used?

Spanish: UD_Spanish-AnCora (https:/UniversalDependencies/UD_Spanish-AnCora), license GPL 3.0
Dutch: UD_Dutch (https:/UniversalDependencies/UD_Dutch), license CC-BY-SA 4.0
French: UD_French-Sequoia (https:/UniversalDependencies/UD_French-Sequoia), license LGPL For Linguistic Resources
Portuguese: UD_Portuguese (https:/UniversalDependencies/UD_Portuguese), license CC-BY-SA 4.0
Italian: UD_Italian (https:/UniversalDependencies/UD_Italian), license CC-BY-NC-SA 3.0 Unported

I see that spaCy distributes the resulting models under the following licenses, respectively:

Spanish: GPL
Dutch: CC-BY-SA 4.0
French: LGPL
Portuguese: CC-BY-SA 4.0
Italian: CC BY-SA 4.0

This seems to indicate that you keep the same license as the corpus used to train the model, except for Italian.
Shouldn't the Italian model also be licensed under the CC-BY-NC-SA 3.0 Unported license?
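
The comparison above can be written as a small check. This is only a sketch: the two dictionaries copy the license strings listed in this post, and `license_family` and `find_mismatches` are hypothetical helper names, not part of spaCy. Version numbers are deliberately ignored, so only the license family (for example CC-BY-SA versus CC-BY-NC-SA) is compared.

```python
import re

# License of each training corpus, as listed in this post.
CORPUS_LICENSES = {
    "Spanish": "GPL 3.0",
    "Dutch": "CC-BY-SA 4.0",
    "French": "LGPL For Linguistic Resources",
    "Portuguese": "CC-BY-SA 4.0",
    "Italian": "CC-BY-NC-SA 3.0 Unported",
}

# License each model is distributed under, as listed in this post.
MODEL_LICENSES = {
    "Spanish": "GPL",
    "Dutch": "CC-BY-SA 4.0",
    "French": "LGPL",
    "Portuguese": "CC-BY-SA 4.0",
    "Italian": "CC BY-SA 4.0",
}

CC_ELEMENTS = {"CC", "BY", "NC", "SA", "ND"}


def license_family(name):
    """Reduce a license string to its family, e.g. 'CC-BY-NC-SA' or 'GPL'.

    Hypothetical helper: it drops version numbers and trailing qualifiers.
    """
    tokens = re.split(r"[\s-]+", name.upper())
    if tokens[0] == "CC":
        # Keep only the Creative Commons element codes.
        return "-".join(t for t in tokens if t in CC_ELEMENTS)
    return tokens[0]


def find_mismatches(corpora, models):
    """Return the languages whose model license family differs from the corpus."""
    return sorted(
        lang
        for lang in corpora
        if license_family(corpora[lang]) != license_family(models[lang])
    )


print(find_mismatches(CORPUS_LICENSES, MODEL_LICENSES))
```

With the values from this post, only the Italian entry is flagged.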

I'm copying some comments from the Creative Commons website below (https://wiki.creativecommons.org/wiki/data#Can_I_conduct_text.2Fdata_mining_on_a_CC-licensed_database.3F), although they mostly relate to the CC-BY 4.0 licenses.
Either way, the non-commercial part seems to indicate that redistributing under the less restrictive CC BY-SA 4.0 would not be allowed.

Can I conduct text/data mining on a CC-licensed database?

Yes. However, you should be aware that whether you have to comply with the CC license terms and conditions will depend on whether the type of mining activity you conduct implicates copyright or any applicable sui generis database rights. If you are not exercising an exclusive right held by the database maker, then you do not need to rely on the license to mine. Because there are many different methods for conducting text and data mining, however, there may be some types of mining activities that will implicate the licensed rights.

If and only if your particular use is one that would require permission, you should note the following:
Permission: All six of the 4.0 licenses allow for text and data mining by granting express permission to privately reproduce, extract, and reuse the contents of a licensed database and create adapted databases.
Commercial purposes: If you are conducting text and data mining for commercial purposes, you should not mine NC-licensed databases or other material.
Outputs: If you publicly share the results of your mining activity or the data you mined, you should attribute the rights holder. If what you publicly share qualifies as an adaptation of the licensed material, you should not mine ND-licensed material. If you share an adaptation of material under an SA license, you must apply the same license to the adaptation that results.
If your use is not one that requires permission under the license, you may conduct text and data mining activity without regard to the above considerations.
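
As a rough illustration, the three conditions quoted above can be encoded as a lookup on the license elements. This is a sketch under assumptions: `cc_elements` and `mining_conditions` are hypothetical names, the function assumes the use is one that requires permission under the license, and whether a trained model counts as an adaptation is a legal question the code does not answer.

```python
def cc_elements(license_name):
    """Return the set of CC element codes (BY, NC, SA, ND) in a license name."""
    tokens = license_name.upper().replace("-", " ").split()
    return {t for t in tokens if t in {"BY", "NC", "SA", "ND"}}


def mining_conditions(license_name, commercial, shares_adaptation):
    """List the quoted FAQ conditions that a planned use would run into.

    Assumes the use requires permission under the license in the first place.
    """
    elements = cc_elements(license_name)
    issues = []
    if commercial and "NC" in elements:
        issues.append("NC: do not mine for commercial purposes")
    if shares_adaptation and "ND" in elements:
        issues.append("ND: do not share adaptations")
    if shares_adaptation and "SA" in elements:
        issues.append("SA: license the adaptation under the same license")
    return issues


# The Italian treebank license from this post, for a commercial use that
# shares an adaptation.
for issue in mining_conditions("CC-BY-NC-SA 3.0 Unported", True, True):
    print(issue)
```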

Or was another corpus used for creating the Italian model?


honnibal · 2018-01-20T15:13:02Z

You're correct: we made a mistake on the Italian model. Huge thanks for calling our attention to this.

Our process for propagating the license information is to add the license to a metadata file in our model building repository. This attribute is then taken when the model files are built, and shipped with the trained model. The same metadata file is consulted when the table of model files is compiled for the website.
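
The flow described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the metadata layout, the model names `model_a` and `model_b`, and the functions `build_model_meta` and `website_rows` are hypothetical and do not reflect the actual model building repository.

```python
import json

# Hypothetical contents of the shared metadata file; model names are made up.
METADATA = json.loads("""
{
  "model_a": {"lang": "nl", "license": "CC-BY-SA 4.0"},
  "model_b": {"lang": "it", "license": "CC-BY-NC-SA 3.0"}
}
""")


def build_model_meta(name):
    """Metadata that would be shipped with the trained model files."""
    entry = METADATA[name]
    return {"name": name, "lang": entry["lang"], "license": entry["license"]}


def website_rows():
    """Rows for the table of models shown on the website."""
    return [(name, entry["license"]) for name, entry in sorted(METADATA.items())]


print(build_model_meta("model_b"))
print(website_rows())
```

Because both outputs read the same field, a wrong entry in the metadata file ends up in the shipped model and on the website alike.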

In the case of the Italian model, I copied the wrong information into the file. I was doing a number of these models at the same time and, as you note, some of the UD corpora are licensed NC, while others are licensed SA.

The question now is how to go about issuing the correction. I guess we post something to the spaCy list, and update the metadata?

jwijffels · 2018-01-21T14:25:32Z

I'm not a specialist in such situations, but it seems to me that every distribution that includes the Italian model should update its license to incorporate the NC part. Users who used the model for commercial purposes or redistributed it under the old license would probably need to be informed of this change.
Another option might be to contact the authors of the treebank (https:/UniversalDependencies/UD_Italian/blob/master/LICENSE.txt) to see if they would allow dropping the NC part.

honnibal · 2018-09-12T14:31:51Z

The new models being distributed for the v2.1 release now have updated licenses.

lock · 2018-10-12T14:58:46Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

honnibal added the meta Meta topics, e.g. repo organisation and issue management label Jan 20, 2018

honnibal closed this as completed Sep 12, 2018

lock bot locked as resolved and limited conversation to collaborators Oct 12, 2018
