Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document / Throw Warnings for Non-implemented JTS Constraints #54

Closed
cspencer51 opened this issue May 1, 2015 · 9 comments
Closed

Document / Throw Warnings for Non-implemented JTS Constraints #54

cspencer51 opened this issue May 1, 2015 · 9 comments
Labels
general General improvements

Comments

@cspencer51
Copy link

JSON Table Schema defines several constraints (minLength, maxLength, pattern, minimum, maximum, and oneOf) that are not yet implemented by goodtables. When checking a delimited file against a JTS file with column constraints, goodtables does not throw any warnings that some constraints are unchecked or ignored. Instead, the constraints are implicitly ignored and goodtables returns the "Data is Valid" message. This is a form of silent failure that is misleading to the user; they don't know that their data isn't checked against these constraints.

Even though this software is still being developed, unimplemented constraints should be documented and if unimplemented constraints are used in the JTS file, goodreads should output a warning that constraints are being ignored.

@pwalsh
Copy link
Member

pwalsh commented May 2, 2015

Ok. See my comments on #55. I'm not sure about adding code to warning about unimplemented parts of the JTS spec - I'd rather try to get all of JTS covered in JTSKit ASAP, and then reflect those changes here. There is not too much work to do to get to 100% spec coverage.

Would you be interested in helping with that?

@pwalsh pwalsh added this to the Backlog milestone May 2, 2015
@pwalsh pwalsh added general General improvements help wanted labels May 2, 2015
@cspencer51
Copy link
Author

I agree, staying up to date with JTSKit is preferable to adding warnings and I'm excited to hear that that goal seems to be achievable in the near future.

I've done some very basic projects in Python, but this would be the first time I've worked on a project of this size. So, I may need a little help understanding your setup, but I'd be happy to help out where I can.

Case in point, I forked goodtables and tried implementing the pattern constraint in schema.py. However, I wasn't able to figure out how to get the changes I made to take effect when I run the following from goodtables/goodtables/cli/ :

python main.py schema <path_to_csv_data> --schema <path_to_json_schema>

Goodtables runs and the checks pass. However, in this case, the json schema specifies a pattern constraint that should fail for the passed data. So, the checks should not pass. So, I needed to ensure that I was making changes to the code that was being run.

To do this, I decided to change one of the error messages and then intentionally trigger an error message to be displayed by changing the type of an integer column from integer to boolean. Running with the same command above, the tests do not pass but the error message is not altered. I'm not sure why I can't get the changes I make to take effect.

@pwalsh
Copy link
Member

pwalsh commented May 6, 2015

Hi @cspencer51 great to hear you'd be happy to help.

We can walk through this together. In order to do so, could you please provide links to:

  • An example CSV file
  • An example schema
  • The changes you have made to processors.schema

Thanks.

@cspencer51
Copy link
Author

Here are the csv file and json schema that I was working with:

pattern_test.csv

series_id,year,period,value,footnote_codes
ABC0000000001,2010,2010-12-01,29923,p
ABC0000000001,2010,2010-12-02,30101,p
ABC0000000001,2010,2010-12-03,30280,c
ABC0000000001,2010,2010-12-04,30094,p
ABC0000000001,2010,2010-12-05,30300,c
ABC0000000001,2010,2010-12-06,30502,p
ABC0000000001,2010,2010-12-07,30419,c
ABC0000000001,2010,2010-12-08,30663,c
ABC0000000001,2010,2010-12-09,31032,p
ABC0000000001,2010,2010-12-10,31032,c

pattern_test.json

{
  "fields": [
    {
      "name": "series_id",
      "type": "string",
      "constraints": {
        "pattern": "[A-Z]{4}[0-9]{9}"
      }
    },
    {
      "name": "year",
      "type": "number"
    },
    {
      "name": "period",
      "type": "string"
    },
    {
      "name": "value",
      "type": "number"
    },
    {
      "name": "footnote_codes",
      "type": "string"
    }
  ]
}

You'll notice that the constraint on the series_id column should not pass for those values in the pattern_test.csv file.

I cloned goodtables into a subdirectory of my home folder and made these additions to processors.schema: master...cspencer51:issue_55

I then ran goodtables from the goodtables/goodtables/cli directory on the issue_55 branch of the repo with the following command

$ python main.py schema ../../../../pattern_test.csv --schema ../../../../pattern_test.json

The response I got was:

WELL DONE! THE DATA IS VALID :)

META.
Name: schema

###

RESULTS.
No results were generated.

The problem is, as I mentioned before, that the series_id column does not match the regex pattern. In order to see if any of my changes were taking effect, I decided to modify the error message of an error that I know works and then trigger that error to see if it showed my modified error message. So, I changed the type of the series_id column from string to number and saved it as a new file:

pattern_test_mod.json

{
  "fields": [
    {
      "name": "series_id",
      "type": "number",
      "constraints": {
        "pattern": "[0-1]{3}[a-z]{10}"
      }
    },
    {
      "name": "year",
      "type": "number"
    },
    {
      "name": "period",
      "type": "string"
    },
    {
      "name": "value",
      "type": "number"
    },
    {
      "name": "footnote_codes",
      "type": "string"
    }
  ]
}

I also added "Hey! " to the beginning of the schema_003 result msg in processors.schema:

...
'schema_003': {
        'id': 'schema_003',
        'name': 'Incorrect Type',
        'msg': 'Hey! The value is not a valid {0}.',
        'help': '',
        'help_edit': ''
    }
...

I re-ran with the following arguments:

$ python main.py schema ../../../../pattern_test.csv --schema ../../../../pattern_test_mod.json

This time goodtables caught the invalid type, but the message was not altered as I expected.

OOPS.THE DATA IS INVALID :(

META.
Name: schema

###

RESULTS.
+----------------+-------------+-------------+----------------------------------+
| result_name    | result_id   |   row_index | result_message                   |
+================+=============+=============+==================================+
| Incorrect Type | schema_003  |           0 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           1 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           2 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           3 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           4 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           5 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           6 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           7 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           8 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+
| Incorrect Type | schema_003  |           9 | The value is not a valid Number. |
+----------------+-------------+-------------+----------------------------------+

This leads me to believe that there must be something else I need to do in order for my changes to take effect when I run goodtables.

@pwalsh
Copy link
Member

pwalsh commented May 12, 2015

Hi,

Let's focus on the first part. Can you check if regex here

is None when you expect a value?

@cspencer51
Copy link
Author

I tried adding print(regex) just above if regex is not None and ran the program. It did not print anything out. I then added print "Hello" just below the first print and before if regex is not None and ran the program. Again, nothing printed out.

This leads me to think that the changes that I make to the file are some how not taking effect (i.e. a pre-compiled / .pyc version is being used instead of my altered version). This is why I tried adding "Hey! " to the beginning of the schema_003 result msg in processors.schema (something I knew I could get to print out by specifying an invalid column type in the json schema). I was trying to determine whether any change I made to the code was taking effect. Since the error message that printed out did not include "Hey!" like I thought it would, this seems to confirm my suspicion.

@pwalsh
Copy link
Member

pwalsh commented May 12, 2015

So it sounds like you have some local setup issue there.

@cspencer51
Copy link
Author

I just tried adding print("Hello!") here and it did print out:

Hello!
WELL DONE! THE DATA IS VALID :)

META.
Name: schema

###

RESULTS.
No results were generated.

Is there a trick to getting printed values in included files to show up?

@pwalsh
Copy link
Member

pwalsh commented Mar 15, 2016

WONTFIX here. will be fixed with #55

@pwalsh pwalsh closed this as completed Mar 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
general General improvements
Projects
Archived in project
Development

No branches or pull requests

2 participants