Write script to update all data for Scribe updates #95

Closed
andrewtavis opened this issue Jan 7, 2022 · 6 comments
Labels: -priority- (High priority), data (Relates to data or Wikidata), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

andrewtavis commented Jan 7, 2022

Updating Scribe's data for a release is currently time-consuming, and the process could easily be converted to a single Python script. The goal of this issue is to write a script that uses a Wikidata tool to run all of the query*.sparql scripts in the Scribe-Data/data directory, and then runs all of the format_*.py scripts to put the new information into the respective Data directories for the keyboards. The script would be placed in the root of Scribe-Data/data.
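A minimal sketch of the discovery and formatting steps, not the final script. The function names `find_update_files` and `run_formatters` are hypothetical, and running each formatter by its full path with the current interpreter is an assumption about how the scripts would be launched.

```python
import subprocess
import sys
from pathlib import Path


def find_update_files(data_dir):
    """Return (queries, formatters) found anywhere under data_dir, sorted by path."""
    root = Path(data_dir)
    queries = sorted(root.rglob("query*.sparql"))
    formatters = sorted(root.rglob("format_*.py"))
    return queries, formatters


def run_formatters(formatters):
    """Run each formatting script with the interpreter running this script."""
    for script in formatters:
        # Assumption: scripts are launched by full path; check=True stops
        # the update as soon as one formatter exits with an error.
        subprocess.run([sys.executable, str(script)], check=True)
```

Running the queries themselves would sit between these two steps, since the formatters consume the query results.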

Another benefit is that this feature will provide many good first issues in the future: a newcomer can update the data and, in doing so, be brought into the contribution process.

andrewtavis added the good first issue, help wanted, and data labels on Jan 7, 2022
andrewtavis commented Jan 10, 2022

The script written to complete this issue should also produce a .txt or .json file listing the differences in data from the most recent run. The structure of this file should allow it to be pasted directly into CHANGELOG.md, turning the data update process into two steps (run the script and paste the results).
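One way the difference file could be built, as a sketch only. The function `summarize_changes`, the idea of comparing per-category entry counts, and the bullet format are all assumptions; the issue does not specify what the CHANGELOG.md entry should look like.

```python
def summarize_changes(old_counts, new_counts):
    """Build Markdown bullet lines describing how entry counts changed."""
    lines = []
    for key in sorted(set(old_counts) | set(new_counts)):
        before = old_counts.get(key, 0)
        after = new_counts.get(key, 0)
        if before != after:
            # Unchanged categories are skipped so the output stays short.
            lines.append(f"- {key}: {before:,} -> {after:,} ({after - before:+,})")
    return "\n".join(lines)
```

The returned string can be written to a .txt file as-is and pasted under the relevant release heading.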

@andrewtavis

It looks like WikidataIntegrator would be an appropriate tool to complete the SPARQL query portion of this task, as shown in this Jupyter notebook.

andrewtavis commented Jan 11, 2022

Current progress is a script that finds the necessary queries, picks the first one to run, formats it for use with WikidataIntegrator, runs the query, and prints the results.
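The query step might look roughly like the sketch below. `WDItemEngine.execute_sparql_query` is WikidataIntegrator's SPARQL helper and returns the endpoint's JSON response by default; the wrapper names `run_query` and `flatten_bindings` are hypothetical, and the flattening assumes the standard SPARQL JSON results layout.

```python
def run_query(query_text):
    """Send one SPARQL query to Wikidata and return the raw JSON response."""
    # Imported here so flatten_bindings works without the package installed.
    from wikidataintegrator import wdi_core

    return wdi_core.WDItemEngine.execute_sparql_query(query_text)


def flatten_bindings(response):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    rows = response["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]
```

Flattening the bindings first keeps the formatting scripts independent of the response envelope.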

Next steps:

andrewtavis commented Jan 12, 2022

WikidataIntegrator does not appear to return labels for properties, but the formatting scripts can check for this: an unlabeled value is a string that starts with Q and is followed by an integer. Such values can be translated to their labels before being assigned.

Example: the map_genders function in the format_nouns.py scripts can also check for QIDs and convert them to abbreviations.
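A sketch of what such a check could look like. This is not the actual map_genders implementation: the abbreviations, the label table, and the helper `is_qid` are assumptions, and the QIDs are believed to be Wikidata's masculine, feminine, and neuter items but should be confirmed before use.

```python
import re

# A QID is the letter Q followed by one or more digits and nothing else.
QID_PATTERN = re.compile(r"Q\d+")

# Assumed mapping; confirm each QID on Wikidata before relying on it.
GENDER_QIDS = {"Q499327": "M", "Q1775415": "F", "Q1775461": "N"}
GENDER_LABELS = {"masculine": "M", "feminine": "F", "neuter": "N"}


def is_qid(value):
    """Return True if the whole string has the form of a Wikidata QID."""
    return QID_PATTERN.fullmatch(value) is not None


def map_genders(value):
    """Return an abbreviation for a gender given as a label or a QID."""
    # Query results may carry a full entity URI, so keep only the last segment.
    candidate = value.rsplit("/", 1)[-1]
    if is_qid(candidate):
        return GENDER_QIDS.get(candidate, "")
    return GENDER_LABELS.get(candidate.lower(), "")
```

Unknown values map to an empty string here; raising an error instead would surface unexpected QIDs sooner.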

andrewtavis commented Jan 15, 2022

The final step is to allow the individual formatting files to detect when they're being run by Scribe-Data/data/update_data.py by checking the script path in sys.argv[0].
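The issue only states that sys.argv[0] is checked, so the exact test below is an assumption. This sketch treats a script as launched by the update script when it was invoked through a path ending in its language and part-of-speech folders, rather than by bare file name from its own directory. The name `launched_by_path` and the example folder names are hypothetical.

```python
import sys
from pathlib import PurePath


def launched_by_path(argv0, language, part_of_speech):
    """True if argv0 reaches the script through its language/part-of-speech folders."""
    parts = PurePath(argv0).parts
    # The two components before the file name must match exactly.
    return parts[-3:-1] == (language, part_of_speech)


# Example use at the top of a formatting script (folder names are illustrative).
UPDATE_DATA_IN_USE = launched_by_path(sys.argv[0], "German", "nouns")
```

A formatting script could then pick its input and output locations based on `UPDATE_DATA_IN_USE`.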

andrewtavis commented Apr 8, 2022

Note that the result of this issue is now scribe_data/load/update_data.py, and all referenced files have been moved to Scribe-Data.

andrewtavis self-assigned this on Apr 7, 2023