Write script to update all data for Scribe updates #95

andrewtavis · 2022-01-07T12:57:02Z

Currently, updating Scribe's data for a release is time-consuming, and the process could easily be converted into a single Python script. The goal of this issue is to write a script that uses a Wikidata tool to run all of the query*.sparql files in the Scribe-Data/data directory and then runs all of the format_*.py scripts to write the new data into the respective Data directories of the keyboards. The script would be placed in the root of Scribe-Data/data.
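A minimal sketch of the discovery step, assuming a hypothetical layout in which each query*.sparql file sits in the same directory as the format_*.py script that processes its results. The function name find_update_jobs is illustrative and not part of Scribe-Data:

```python
from pathlib import Path


def find_update_jobs(data_dir):
    """Pair each query*.sparql file with a format_*.py script in the same directory.

    Assumes (hypothetically) that every query file sits next to the
    formatting script that processes its results.
    """
    jobs = []
    # rglob walks all subdirectories, e.g. data/German/Nouns/query_nouns.sparql.
    for query_path in sorted(Path(data_dir).rglob("query*.sparql")):
        format_scripts = sorted(query_path.parent.glob("format_*.py"))
        if not format_scripts:
            # Nothing can process this query's results, so skip it.
            continue
        jobs.append((query_path, format_scripts[0]))
    return jobs
```

The update script could then loop over `find_update_jobs("Scribe-Data/data")`, running each query and handing its results to the paired formatting script.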

For now, it would likely be best to exclude the translation scripts, as these will not be updated until Scribe-Data#17 (Add translation data to Wikidata; now deleted) is finished.

Another benefit is that this feature will create many good first issues in the future, where someone can update the data and thereby be integrated into the contribution process.


andrewtavis · 2022-01-10T12:37:34Z

The script written to complete this issue should also produce a .txt/.json file containing the differences in the data from the most recent run. This file should be structured so that it can be pasted directly into CHANGELOG.md, turning the data update process into two steps: run the script and paste the results.
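One way the differences could be computed, as a sketch: the nested layout of total_data.json assumed here (language, then data type, then count) and the changelog line format are assumptions, and summarize_data_changes is a hypothetical helper.

```python
def summarize_data_changes(old_totals, new_totals):
    """Return CHANGELOG-ready lines describing how counts changed between runs.

    Both arguments are assumed to map language -> data type -> count, e.g.
    {"German": {"nouns": 100}}; this layout is an illustrative assumption.
    """
    lines = []
    for language in sorted(new_totals):
        for data_type in sorted(new_totals[language]):
            new_count = new_totals[language][data_type]
            # Languages or data types missing from the previous run count as 0.
            old_count = old_totals.get(language, {}).get(data_type, 0)
            difference = new_count - old_count
            if difference != 0:
                # Signed, comma-grouped format so additions and removals are both visible.
                lines.append(f"- {language} {data_type}: {difference:+,}")
    return lines
```

Joining the result with `"\n".join(...)` gives a block that could be written to the update file and pasted into CHANGELOG.md as is.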

andrewtavis · 2022-01-10T12:44:47Z

It looks like WikidataIntegrator would be an appropriate tool to complete the SPARQL query portion of this task, as shown in this Jupyter notebook.
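WikidataIntegrator's query call (WDFunctionsEngine.execute_sparql_query, the same method referenced in SuLab/WikidataIntegrator#189 below) is expected to return the standard SPARQL 1.1 JSON results structure. Assuming that, the rows could be flattened into plain dictionaries before formatting. The helper below is hypothetical, and the return type should be checked against the installed WikidataIntegrator version.

```python
def flatten_sparql_results(results):
    """Turn SPARQL 1.1 JSON results into a list of plain {variable: value} dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...};
        # variables that are unbound in a row are simply absent.
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


# Hypothetical usage (requires network access and WikidataIntegrator):
# from wikidataintegrator import wdi_core
# results = wdi_core.WDFunctionsEngine.execute_sparql_query(query_text)
# rows = flatten_sparql_results(results)
```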

andrewtavis · 2022-01-11T13:11:25Z

Current progress is a script that can find the necessary queries, pick the first one to run, format it for use with WikidataIntegrator, run the query and print its results.

Next steps:

Handle errors from WikidataIntegrator when the query doesn't work
Allow Scribe-Data/data to take arguments for certain languages and data types
Save results that are currently being printed into a JSON
Run the corresponding formatting script over the saved file to update Scribe's data
Compare and save results in Scribe-Data/data/_update_files/total_data.json
Update Scribe-Data/data/_update_files/data_updates.txt
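Two of the next steps above (taking arguments for languages and data types, and handling errors when a query fails) can be sketched together. The flag names, the (language, data_type, query) job tuples and the injected execute_query callable are all hypothetical; in the real script the callable would wrap the WikidataIntegrator query.

```python
import argparse


def parse_update_args(argv=None):
    """Parse optional language and data type filters for the update script.

    The flag names here are hypothetical; omitting a flag means "all".
    """
    parser = argparse.ArgumentParser(description="Update Scribe data from Wikidata.")
    parser.add_argument("--languages", nargs="*", default=None)
    parser.add_argument("--data-types", nargs="*", default=None)
    return parser.parse_args(argv)


def run_selected_queries(jobs, args, execute_query):
    """Run each (language, data_type, query) job that matches the filters.

    Failures are collected rather than raised so one broken query does not
    stop the whole update.
    """
    results, failures = {}, {}
    for language, data_type, query in jobs:
        if args.languages and language not in args.languages:
            continue
        if args.data_types and data_type not in args.data_types:
            continue
        try:
            results[(language, data_type)] = execute_query(query)
        except Exception as error:  # e.g. a timeout or a malformed query
            failures[(language, data_type)] = str(error)
    return results, failures
```

Collecting failures instead of stopping lets the script still save whatever succeeded and report the rest at the end of the run.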

andrewtavis · 2022-01-12T11:35:06Z

Labels for properties do not appear to be returned by queries run with WikidataIntegrator, but this can be checked for in the formatting scripts, where such a value will be a string that starts with Q followed by an integer. In this case, these values can be translated to their labels before being assigned.

For example, map_genders in the format_nouns.py scripts can also check for QIDs and convert them to gender abbreviations.
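A sketch of that check, assuming QIDs arrive as bare strings such as "Q499327". The QIDs below are believed to be Wikidata's masculine, feminine and neuter grammatical gender items, but they and the abbreviations are assumptions to verify; map_gender is a hypothetical stand-in for map_genders.

```python
import re

# QIDs believed to correspond to Wikidata's grammatical gender items; verify
# them against Wikidata before relying on this mapping.
GENDER_ABBREVIATIONS = {
    "Q499327": "M",  # masculine
    "Q1775415": "F",  # feminine
    "Q1775461": "N",  # neuter
}

# A QID is "Q" followed by one or more digits and nothing else.
QID_PATTERN = re.compile(r"Q\d+")


def is_qid(value):
    """Return True if value looks like a bare Wikidata QID such as 'Q42'."""
    return isinstance(value, str) and QID_PATTERN.fullmatch(value) is not None


def map_gender(value):
    """Convert a gender QID to an abbreviation, passing other values through."""
    if is_qid(value):
        # Unknown QIDs become "" rather than leaking raw IDs into keyboard data.
        return GENDER_ABBREVIATIONS.get(value, "")
    return value
```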

andrewtavis · 2022-01-15T11:15:39Z

The final step is to allow the individual formatting files to detect when they are being run by Scribe-Data/data/update_data.py, by checking the path of the executing script via sys.argv[0].
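A minimal sketch of that check. It only works when update_data.py imports or exec()s the formatting file in the same Python process; if the formatting file is launched as a separate process, sys.argv[0] will be the formatting file itself. The function name is hypothetical.

```python
import os
import sys


def is_run_by_update_script(argv0=None):
    """Return True when the running entry point is update_data.py.

    Defaults to sys.argv[0], the path of the script Python was started with.
    A formatting file that update_data.py imports or exec()s in the same
    process still sees update_data.py there; a separate subprocess does not.
    """
    if argv0 is None:
        argv0 = sys.argv[0]
    return os.path.basename(argv0) == "update_data.py"
```

A formatting file could branch on this to choose which relative paths to read from and write to, since those differ depending on where the run started.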

andrewtavis · 2022-04-08T07:26:30Z

Note that the result of this issue is now scribe_data/load/update_data.py, and all referenced files have been moved to Scribe-Data.

andrewtavis added the good first issue, help wanted and data labels Jan 7, 2022

andrewtavis added the -priority- label Jan 10, 2022

andrewtavis added a commit that referenced this issue Jan 11, 2022

#95 Baseline data update scripts that can print results of a query

5f63856

andrewtavis added a commit that referenced this issue Jan 12, 2022

#95 Arguments for update data and pass QIDs in formatting scripts

3713d00

andrewtavis added a commit that referenced this issue Jan 12, 2022

#95 Add calls to formatting files in update_data.py

753123a

andrewtavis added a commit that referenced this issue Jan 12, 2022

#95 update data update json totals and text differences

7a2041d

andrewtavis added a commit that referenced this issue Jan 12, 2022

#95 Rename data updates .txt file

6c2ed0d

andrewtavis added a commit that referenced this issue Jan 12, 2022

#95 comment update_data and minor code changes

55b8bb8

andrewtavis mentioned this issue Jan 12, 2022

Unable to handle errors from WDFunctionsEngine.execute_sparql_query SuLab/WikidataIntegrator#189 (closed)

andrewtavis added a commit that referenced this issue Jan 15, 2022

#95 handle exceptions from WikidataIntegrator in data update

8ed67d3

andrewtavis added a commit that referenced this issue Jan 15, 2022

#95 update writing data_updates.txt with a colon per line

3fd4006

andrewtavis added a commit that referenced this issue Jan 15, 2022

#95 update formatting files to work with update_data.py

fe3ec56

andrewtavis closed this as completed Jan 15, 2022

andrewtavis added a commit that referenced this issue Jan 15, 2022

Changelog edits to reflect work in #95

dfe3cbb

andrewtavis added a commit that referenced this issue Jan 15, 2022

Readme edits to reflect work done in #95

a934fc6

andrewtavis added a commit that referenced this issue Jan 16, 2022

#95 add method to add a new language via edit to total_data.json

fbe0088

andrewtavis self-assigned this Apr 7, 2023
