Analyse differences between two versions of a vector geospatial dataset

eurostat/GeoDiff

GeoDiff

GeoDiff allows:

  • Extracting the differences between two versions of a vector geospatial dataset.
  • Applying changes/updates to a vector geospatial dataset.

Both utilisation modes are based on the GeoDiff format.

Difference analysis mode

(Figure: illustration of the difference analysis mode)

Update mode

(Figure: illustration of the update mode)

Quick start

  • Download and unzip geodiff-2.3.zip.

  • To compute the difference between two versions of a dataset, run: java -jar GeoDiff.jar -m diff -v1 pathTo/dataset_v1.gpkg -v2 pathTo/dataset_v2.gpkg -id identCol -o out/. The result is stored in a new out/ folder. identCol is the name of the identifier column in both datasets.

  • To update a dataset with GeoDiff data, run: java -jar GeoDiff.jar -m up -d pathTo/dataset.gpkg -c pathTo/geodiff.gpkg. The updated dataset is stored in a new out.gpkg file.

Alternatively, you can edit and execute geodiff.bat (or geodiff.sh for Linux users).

Requirements

Java 1.8 or higher is required. To check which version of Java is installed, if any, run java -version (the java --version form is only recognised from Java 9 onwards). Recent versions of Java can be installed from here.

Difference analysis mode

This mode analyses differences between two versions of a vector geospatial dataset. It produces a GeoDiff file representing the differences between the two dataset versions, plus some auxiliary data describing these differences.
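As a rough illustration of what an identifier-based comparison involves, the sketch below classifies features as inserted, deleted or changed. It is not GeoDiff's implementation: the feature layout (a dict with 'geometry' and 'properties'), the function name diff_features and the ignore argument are all assumptions made for this example.

```python
def diff_features(v1, v2, ignore=()):
    """Compare two dataset versions, each given as {identifier: feature}.

    A feature is a dict with a 'geometry' value and a 'properties' dict
    (a hypothetical layout chosen for this sketch).
    Returns (inserted, deleted, changed) as sorted lists of identifiers.
    """
    ignored = set(ignore)

    def comparable(feature):
        # Drop the attributes that should not take part in the comparison.
        props = {k: v for k, v in feature["properties"].items() if k not in ignored}
        return feature["geometry"], props

    inserted = sorted(v2.keys() - v1.keys())  # identifiers only in version 2
    deleted = sorted(v1.keys() - v2.keys())   # identifiers only in version 1
    changed = sorted(
        fid for fid in v1.keys() & v2.keys()
        if comparable(v1[fid]) != comparable(v2[fid])
    )
    return inserted, deleted, changed


v1 = {
    "a": {"geometry": (0.0, 0.0), "properties": {"name": "A", "edited": "2020"}},
    "b": {"geometry": (1.0, 1.0), "properties": {"name": "B", "edited": "2020"}},
    "c": {"geometry": (2.0, 2.0), "properties": {"name": "C", "edited": "2020"}},
}
v2 = {
    "a": {"geometry": (0.0, 0.0), "properties": {"name": "A", "edited": "2021"}},
    "b": {"geometry": (1.0, 1.5), "properties": {"name": "B", "edited": "2021"}},
    "d": {"geometry": (3.0, 3.0), "properties": {"name": "D", "edited": "2021"}},
}

inserted, deleted, changed = diff_features(v1, v2, ignore=["edited"])
print(inserted, deleted, changed)
```

Ignoring the 'edited' attribute keeps feature "a" out of the changed list, which is the kind of noise an attribute exclusion list is meant to suppress.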

Input parameters

The help is displayed with the java -jar GeoDiff.jar -h command.

| Parameter | Required | Description | Default value |
|---|---|---|---|
| -h | | Show the help message. | |
| -m | x | Set to 'diff' for difference analysis mode. | |
| -v1 | x | First version of the dataset. The supported formats are GeoJSON (*.geojson extension), SHP (*.shp extension) and GeoPackage (*.gpkg extension). | |
| -v2 | x | Second version of the dataset. The supported formats are GeoJSON (*.geojson extension), SHP (*.shp extension) and GeoPackage (*.gpkg extension). | |
| -id | | Name of the identifier field. | 'id' |
| -res | | The geometrical resolution. Geometrical differences below this value will be ignored. | 0 |
| -ati | | List of attributes to ignore for the comparison, comma separated. | |
| -o | | Output folder. | The current location of the program. |
| -of | | Output format. The supported formats are GeoJSON ('geojson'), SHP ('shp') and GeoPackage ('gpkg'). | 'gpkg' |
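When the tool is called from a script, it can help to assemble the command line from the parameters listed above. The helper below is a hypothetical convenience written for this example (build_diff_command and its argument names are not part of GeoDiff); it only uses the flags documented in the table and does not run anything itself.

```python
def build_diff_command(v1, v2, id_field=None, res=None, ignore=(),
                       out_dir=None, out_format=None, jar="GeoDiff.jar"):
    """Build the argument list for a difference analysis run.

    Optional flags are only added when a value is given, so the
    documented defaults apply otherwise.
    """
    cmd = ["java", "-jar", jar, "-m", "diff", "-v1", v1, "-v2", v2]
    if id_field is not None:
        cmd += ["-id", id_field]
    if res is not None:
        cmd += ["-res", str(res)]
    if ignore:
        # -ati expects a comma separated list of attribute names.
        cmd += ["-ati", ",".join(ignore)]
    if out_dir is not None:
        cmd += ["-o", out_dir]
    if out_format is not None:
        cmd += ["-of", out_format]
    return cmd


command = build_diff_command(
    "pathTo/dataset_v1.gpkg",
    "pathTo/dataset_v2.gpkg",
    id_field="identCol",
    ignore=["edited", "source"],
    out_dir="out/",
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where Java and GeoDiff.jar are available.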

Outputs

The program produces the following datasets:

  • geodiff dataset containing the differences between both versions in GeoDiff format.

  • geomdiff1 dataset containing a set of linear features representing the Hausdorff segments between the two versions of the geometries. Each segment marks the place where the geometrical difference between the two versions is largest. Its length is a good measure of the magnitude of the difference.

(First version in gray - Second version blue outline - Corresponding Hausdorff segment in purple)

  • geomdiff2 dataset containing features representing the spatial gains and losses between the two versions of the geometries. Gains are labeled with the attribute GeoDiff set to I, and losses with the value D.

(Geometry gains in green, losses in red)

  • idstab dataset: The identifier of a feature may change between two versions by mistake, even though the feature itself is the same. This leads to the detection of superfluous (deletion, insertion) pairs for the same feature, which do not reflect genuine differences in the dataset. In general, a (deletion, insertion) pair is not considered pertinent when both feature versions are the same (or have very similar geometries) but their identifiers differ. This dataset contains the difference features representing these superfluous (deletion, insertion) pairs. Such a pair can either be removed, if both feature versions are exactly the same, or replaced with a single difference, if the versions are similar. The -res parameter sets the distance threshold below which two geometries are considered too similar to represent totally different entities.

(Detected stability issues in pink)
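To make the Hausdorff segment and the identifier stability check more concrete, here is a minimal sketch under simplifying assumptions: geometries are plain lists of (x, y) vertices, the Hausdorff distance is approximated on vertices only (points along edges are not considered), and a deleted feature is paired with an inserted one when that distance does not exceed res. The function names are hypothetical and this is not how GeoDiff itself is implemented.

```python
import math


def directed_hausdorff(a, b):
    """Return (distance, point_in_a, nearest_point_in_b) for the vertex of
    `a` that lies farthest from its nearest vertex in `b`."""
    best = (0.0, None, None)
    for p in a:
        d, q = min((math.dist(p, q), q) for q in b)
        if best[1] is None or d > best[0]:
            best = (d, p, q)
    return best


def hausdorff_segment(a, b):
    """Vertex-only Hausdorff distance between `a` and `b`, returned with the
    segment (point on `a`, point on `b`) that realises it."""
    forward = directed_hausdorff(a, b)
    backward = directed_hausdorff(b, a)
    if forward[0] >= backward[0]:
        return forward
    d, q, p = backward
    return (d, p, q)  # keep the point of `a` first


def unstable_id_pairs(deleted, inserted, res):
    """Pair deleted and inserted features whose geometries are within `res`.

    `deleted` and `inserted` map identifiers to vertex lists.
    """
    pairs = []
    for del_id, del_geom in deleted.items():
        for ins_id, ins_geom in inserted.items():
            distance, _, _ = hausdorff_segment(del_geom, ins_geom)
            if distance <= res:
                pairs.append((del_id, ins_id, distance))
    return pairs


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
shifted = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.5), (0.0, 10.0)]
far = [(100.0, 100.0), (110.0, 100.0), (110.0, 110.0), (100.0, 110.0)]

print(hausdorff_segment(square, shifted))
print(unstable_id_pairs({"old-1": square}, {"new-7": shifted, "new-8": far}, res=1.0))
```

In this example the deleted feature "old-1" and the inserted feature "new-7" differ by a single vertex moved by 0.5, so with res=1.0 they are reported as a probable identifier change rather than as two unrelated features.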

Update mode

This mode applies updates to a vector geospatial dataset. The updates are specified in a GeoDiff file.

Input parameters

The help is displayed with the java -jar GeoDiff.jar -h command.

| Parameter | Required | Description | Default value |
|---|---|---|---|
| -h | | Show the help message. | |
| -m | x | Set to 'up' for update mode. | |
| -d | x | Dataset in its initial state. The supported formats are GeoJSON (*.geojson extension), SHP (*.shp extension) and GeoPackage (*.gpkg extension). | |
| -c | x | The changes/updates to apply to the dataset, in GeoDiff format. The supported formats are GeoJSON (*.geojson extension), SHP (*.shp extension) and GeoPackage (*.gpkg extension). | |
| -id | | Name of the identifier field. | 'id' |
| -o | | Output updated dataset. The supported formats are GeoJSON (*.geojson extension), SHP (*.shp extension) and GeoPackage (*.gpkg extension). | out.gpkg |

Output

The output is the dataset updated with the specified updates.
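Conceptually, the update mode replays a list of changes on top of the initial dataset, matching features by identifier. The sketch below illustrates that idea only: the change record layout ('id', 'op', 'feature') and the operation names "insert", "delete" and "update" are assumptions made for this example and do not describe the actual GeoDiff format.

```python
def apply_changes(dataset, changes):
    """Return a new {identifier: feature} mapping with `changes` applied in order.

    The input dataset is left untouched.
    """
    updated = dict(dataset)
    for change in changes:
        fid, op = change["id"], change["op"]
        if op == "insert":
            if fid in updated:
                raise ValueError(f"cannot insert {fid!r}: identifier already present")
            updated[fid] = change["feature"]
        elif op == "delete":
            if fid not in updated:
                raise KeyError(f"cannot delete {fid!r}: identifier not found")
            del updated[fid]
        elif op == "update":
            if fid not in updated:
                raise KeyError(f"cannot update {fid!r}: identifier not found")
            updated[fid] = change["feature"]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return updated


initial = {
    "a": {"geometry": (0.0, 0.0), "name": "A"},
    "b": {"geometry": (1.0, 1.0), "name": "B"},
    "c": {"geometry": (2.0, 2.0), "name": "C"},
}
changes = [
    {"id": "b", "op": "update", "feature": {"geometry": (1.0, 1.5), "name": "B"}},
    {"id": "c", "op": "delete"},
    {"id": "d", "op": "insert", "feature": {"geometry": (3.0, 3.0), "name": "D"}},
]

result = apply_changes(initial, changes)
print(sorted(result))
```

Working on a copy mirrors the tool's behaviour of writing the updated dataset to a separate output file instead of modifying the input.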

For coders

Install JGiscoTools and see the instructions here.

Support and contribution

Feel free to ask for support, fork the project or simply star it (it's always a pleasure). The source code is currently stored as part of the JGiscoTools repository. It is mainly based on GeoTools and JTS Topology Suite.