proposed update to 'Accessing Data through a Service Endpoint' #88

Open
wants to merge 3 commits into
base: develop
257 changes: 206 additions & 51 deletions guides/Dataset.md
@@ -195,7 +195,8 @@ In it's most basic form, the variable as a [schema:PropertyValue](https://schema
}
</pre>
<a id="variables_external-vocab-example"></a>
A fully-fleshed out example that uses a vocabulary to describe the variable can be published as:
If a URI that identifies the variable is available, it should be included as the
[propertyID](https://schema.org/propertyID):

<pre>
{
@@ -209,8 +210,9 @@ A fully-fleshed out example that uses a vocabulary to describe the variable can
...
"variableMeasured": [
{
<strong>"@type": ["PropertyValue", "gsn-quantity:latitude"],</strong>
"@type": "PropertyValue",
"name": "latitude",
<strong>"propertyID":"gsn-quantity:latitude"</strong>,
"url": "https://www.sample-data-repository.org/dataset-parameter/665787",
"description": "Latitude where water samples were collected; north is positive.",
"unitText": "decimal degrees",
@@ -343,13 +345,15 @@ d1:MediaObjectShape

Back to [top](#top)

### Distributions
### Distribution: how to access the data

Where the [schema:url](https://schema.org/url) property of the Dataset should point to a landing page, the way to describe how to download the data in a specific format is through the [schema:distribution](https://schema.org/distribution) property. The "distribution" property describes where to get the data and in what format by using the [schema:DataDownload](https://schema.org/DataDownload) type. If your dataset is not accessible through a direct download URL, but rather through a service URL that may need input parameters jump to the next section [Accessing Data through a Service Endpoint](#dataset-service-endpoint).
The [schema:url](https://schema.org/url) property of the Dataset should point to an authoritative dataset landing page, which will typically include links to download the data. Use the [schema:distribution](https://schema.org/distribution) property, whose expected value type is [schema:DataDownload](https://schema.org/DataDownload), for datasets that have direct data download URLs, that have a web application that helps users obtain subsets of the data for their specific purpose, or that are accessible through a web service (WebAPI) that may need input parameters.

![Distributions](/assets/diagrams/dataset/dataset_distribution.svg "Dataset - Distributions")

For data available in multipe formats, there will be multiple values of the [schema:DataDownload](https://schema.org/DataDownload):
#### Accessing Data through a direct download URL

The DataDownload/contentUrl should be a URL that retrieves the dataset in a particular format. Indicate the data format with the schema:DataDownload/schema:encodingFormat string. The recommended usage is to provide a registered MIME type; if an identifier string for a particular profile of the format is available, it can be included as a type parameter with the MIME type, for example 'application/json;type=WaterML'. Specifying a more specific format can enable automation to connect datasets with applications that work with that particular data format profile. For data available in multiple formats with different URLs for each format, there will be multiple values of the [schema:DataDownload](https://schema.org/DataDownload):

<pre>
{
@@ -368,67 +372,218 @@ For data available in multipe formats, there will be multiple values of the [sch
}
</pre>
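
As a rough illustration of how a consumer might use the encodingFormat convention described above, the following Python sketch splits a value such as 'application/json;type=WaterML' into its MIME type and parameters, then picks the first matching download URL. The function names `parse_encoding_format` and `find_download` and the example.org URLs are hypothetical and are not part of this guide or of schema.org; the sketch assumes each distribution carries `encodingFormat` and `contentUrl` properties.

```python
def parse_encoding_format(value):
    """Split 'type/subtype;key=value' into (mime_type, params)."""
    parts = [p.strip() for p in value.split(";")]
    mime_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return mime_type, params


def find_download(distributions, mime_type, profile=None):
    """Return the contentUrl of the first distribution matching the
    requested MIME type and, optionally, the 'type' profile parameter."""
    for dist in distributions:
        formats = dist.get("encodingFormat", [])
        if isinstance(formats, str):
            formats = [formats]  # encodingFormat may be a string or a list
        for fmt in formats:
            base, params = parse_encoding_format(fmt)
            if base != mime_type.lower():
                continue
            if profile is None or params.get("type") == profile:
                return dist.get("contentUrl")
    return None


# Hypothetical distributions, for illustration only.
example_distributions = [
    {"@type": "DataDownload",
     "contentUrl": "https://example.org/dataset/1.csv",
     "encodingFormat": "text/csv"},
    {"@type": "DataDownload",
     "contentUrl": "https://example.org/dataset/1.json",
     "encodingFormat": "application/json;type=WaterML"},
]
print(find_download(example_distributions, "application/json", "WaterML"))
```

Matching on the profile only when one is requested keeps the lookup usable for clients that understand the base MIME type but not the profile.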

#### Accessing Data through a data selection web application

Some large datasets are accessible via web sites that help the user construct a set of filters to subset the dataset and obtain only the data they need. The URL for the web site is not a direct download URL, so the address of the web site should be placed in the DataDownload/url property. Adding WebSite as an additional type will make the access approach more explicit. Example:

<pre>
"distribution": {
"@type": [ "DataDownload", "WebSite" ],
"name": "ERDDAP Server",
"description": "Web form to select ARGO data and download one of many offered formats. See https://www.ifremer.fr/erddap/tabledap/ArgoFloats.html#DAS for complete list of variables in data structure",
"url": "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.html"
}
</pre>
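
A harvester could tell the access approaches in this section apart by inspecting the distribution's types and properties. The sketch below is one possible heuristic, not a rule defined by this guide: the function name `access_mode` and the returned labels are hypothetical, and the order of the checks is an assumption.

```python
def access_mode(distribution):
    """Classify a distribution as a service, a data selection web
    application, or a direct download, based on its types and properties."""
    types = distribution.get("@type", [])
    if isinstance(types, str):
        types = [types]  # "@type" may be a single string or a list
    if "WebAPI" in types or "potentialAction" in distribution:
        return "service"
    if "WebSite" in types:
        return "web-application"
    if "contentUrl" in distribution:
        return "direct-download"
    return "unknown"


erddap = {
    "@type": ["DataDownload", "WebSite"],
    "name": "ERDDAP Server",
    "url": "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.html",
}
print(access_mode(erddap))
```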

#### Accessing Data through a Service Endpoint

If access to the data requires some input parameters before a download can occur, we can use the [schema:potentialAction](https://schema.org/potentialAction) in this way:
In some cases the data can be accessed via a WebAPI with a request that includes parameters enabling, for example, subsetting, filtering, or selection of different format options. In such cases, we can use the [schema:potentialAction](https://schema.org/potentialAction) property, which the [schema:DataDownload](https://schema.org/DataDownload) object inherits from [schema:Thing](https://schema.org/Thing). The value expected for a [schema:potentialAction](https://schema.org/potentialAction) is a [schema:Action](https://schema.org/Action); this guide uses its subtype [schema:SearchAction](https://schema.org/SearchAction). In the simplest case, the search action target is a [schema:EntryPoint](https://schema.org/EntryPoint) that specifies a urlTemplate (see [IETF RFC-6570](https://tools.ietf.org/html/rfc6570)), and a set of query-input [schema:PropertyValueSpecification](https://schema.org/PropertyValueSpecification) objects that describe the template parameters. The [schema:valueName](https://schema.org/valueName) in each property value specification matches one of the urlTemplate parameters, which are enclosed in curly braces ('{}').

![Service Endpoint](/assets/diagrams/dataset/dataset_service-endpoint.svg "Dataset - Service Endpoint")

The basic pattern looks like this:

<pre>
{
"@context": {
"@vocab": "https://schema.org/",
"datacite": "http://purl.org/spar/datacite/"
"@vocab": "https://schema.org/"
},
"@type": "Dataset",
"name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
"name": "Argo float data and metadata from Global Data Assembly Centre (Argo GDAC)",
...
<strong>"potentialAction": {
"@type": "SearchAction",
"target": {
"@type": "EntryPoint",
"contentType": ["application/x-netcdf", "text/tab-separated-values"],
"urlTemplate": "https://www.sample-data-repository.org/dataset/1234/download?format={format}&startDateTime={start}&endDateTime={end}&bounds={bbox}",
"description": "Download dataset 1234 based on the requested format, start/end dates and bounding box",
"httpMethod": ["GET", "POST"]
},
"query-input": [
{
"@type": "PropertyValueSpecification",
"valueName": "format",
"description": "The desired format requested either 'application/x-netcdf' or 'text/tab-separated-values'",
"valueRequired": true,
"defaultValue": "application/x-netcdf",
"valuePattern": "(application\/x-netcdf|text\/tab-separated-values)"
},
{
"@type": "PropertyValueSpecification",
"valueName": "start",
"description": "A UTC ISO DateTime",
"valueRequired": false,
"valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(.[0-9]+)?(Z)?"
},
{
"@type": "PropertyValueSpecification",
"valueName": "end",
"description": "A UTC ISO DateTime",
"valueRequired": false,
"valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(.[0-9]+)?(Z)?"
},
{
"@type": "PropertyValueSpecification",
"valueName": "bbox",
"description": "Two points in decimal degrees that create a bounding box formatted as 'lon,lat' of the lower-left corner and 'lon,lat' of the upper-right",
"valueRequired": false,
"valuePattern": "(-?[0-9]+(.[0-9]+)?),[ ]*(-?[0-9]+(.[0-9]+)?)[ ]*(-?[0-9]+(.[0-9]+)?),[ ]*(-?[0-9]+(.[0-9]+)?)"
}
]
}</strong>
"distribution": {
"@type": [ "DataDownload","WebAPI" ],
"name": "Argovis WebAPI",
"serviceType": "Argovis API",
"documentation": "https://argovis.colorado.edu/api-docs/#/",
"description": "Access Argo profiles via API, i.e. temperature, salinity, and biogeochemical data by location. Argo metadata, float trajectory forecasts, gridded fields, weather events are also available through API",
"potentialAction": {
"@type": "SearchAction",
"target": {
"@type": "EntryPoint",
"urlTemplate": "https://argovis.colorado.edu/selection/profiles?startDate={start}&endDate={end}&shape={shape}&presRange={presRange}",
"description": "download profiles within a bounding box for specified start/end dates",
"httpMethod": ["GET"]
},
"query-input": [
{
"@type": "PropertyValueSpecification",
"valueName": "shape",
"description": "list of lists containing [lon, lat] coordinates that define a polygon; first and last coordinate pair should be the same point. example: shape = [[[-144.84375,36.031332],[-136.038755,36.210925],[-127.265625,35.746512],[-128.144531,22.755921],[-136.543795,24.835311],[-145.195313,26.431228],[-144.84375,36.031332]]]",
"valueRequired": true
},
{
"@type": "PropertyValueSpecification",
"valueName": "start",
"description": "string formatted as 'YYYY-MM-DD'",
"valueRequired": true
},
{
"@type": "PropertyValueSpecification",
"valueName": "end",
"description": "string formatted as 'YYYY-MM-DD'",
"valueRequired": true
},
{
"@type": "PropertyValueSpecification",
"valueName": "presRange",
"description": "a string of a list formatted as '[minimum pres,maximum pres]' (no spaces)",
"valueRequired": false,
"defaultValue": "None"
}
],
"result":{
"@type":"DataDownload",
"encodingFormat":"application/json"
}
}
}
</pre>

Here, we use the [schema:SearchAction](https://schema.org/SearchAction) type because it lets you define the query parameters and HTTP methods so that machines can build user interfaces to collect those query parameters and actuate a request to provide the user what they are looking for.

Here, we use the [schema:SearchAction](https://schema.org/SearchAction) type because it lets you define the template parameters and HTTP methods, so that machines can build user interfaces to collect those parameters and issue a request that returns what the user is looking for. Adding 'WebAPI' as an additional type makes the access approach more explicit, and also adds properties to specify serviceType and a link to a service description document such as OpenAPI/Swagger or OGC getCapabilities.
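
To make the template mechanics concrete, here is a minimal Python sketch of how a client might turn a SearchAction into a request URL. It covers only the simplest `{name}` expansions from RFC 6570 (no operators or modifiers), percent-encodes every substituted value, and assumes the parameter descriptions sit in a `query-input` list on the action as in the basic pattern above. The function name `build_request_url` and the example.org endpoint are hypothetical.

```python
import re
from urllib.parse import quote


def build_request_url(action, values):
    """Fill the urlTemplate of a SearchAction from user-supplied values.

    Falls back to defaultValue when a value is not supplied, raises
    ValueError for a missing required parameter without a default, and
    expands any other missing parameter to an empty string.
    """
    template = action["target"]["urlTemplate"]
    resolved = {}
    for spec in action.get("query-input", []):
        name = spec["valueName"]
        if name in values:
            resolved[name] = str(values[name])
        elif "defaultValue" in spec:
            resolved[name] = str(spec["defaultValue"])
        elif spec.get("valueRequired"):
            raise ValueError(f"missing required parameter: {name}")
        else:
            resolved[name] = ""

    def substitute(match):
        # Percent-encode everything outside the unreserved character set.
        return quote(resolved.get(match.group(1), ""), safe="")

    return re.sub(r"\{([^{}]+)\}", substitute, template)


# Hypothetical action, shaped like the pattern above.
example_action = {
    "@type": "SearchAction",
    "target": {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.org/profiles?startDate={start}&endDate={end}&presRange={presRange}",
    },
    "query-input": [
        {"@type": "PropertyValueSpecification", "valueName": "start", "valueRequired": True},
        {"@type": "PropertyValueSpecification", "valueName": "end", "valueRequired": True},
        {"@type": "PropertyValueSpecification", "valueName": "presRange",
         "valueRequired": False, "defaultValue": "None"},
    ],
}
print(build_request_url(example_action, {"start": "2020-01-01", "end": "2020-01-31"}))
```

A production client would more likely rely on a full RFC 6570 implementation; the regular expression here is only meant to show how valueName ties each specification to a template variable.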

Note that the schema:SearchAction object also includes a [schema:result](https://schema.org/result) property that can be used to provide information about the encoding format of the WebAPI response, and a [schema:object](https://schema.org/object) property that can be used to provide a more detailed description of the data type for the WebAPI response. A more detailed description of an API would look like this (ellipses ... indicate where some of the template property specifications are omitted for brevity):

<pre>
"distribution": {
"@type": [ "DataDownload","WebAPI" ],
"name": "IRIS DMC FDSNWS event Web Service",
"serviceType": "FDSNWS event API",
"documentation": "http://service.iris.edu/fdsnws/event/1/",
"description": "The fdsnws-event web service returns event (earthquake) information from catalogs originating from the NEIC and the ISC data centers. ",
"potentialAction": {
"@type": "SearchAction",
"name": "Query",
"description": "query service to obtain records of seismic events",
"result":
{
"@type": "DataDownload",
"encodingFormat": [
"application/xml;type=QuakeML",
"text/csv","QuakeML",
"text/csv+geocsv",
"GeoCSV-SeismicEvent"
],
"description": "XML, csv, or csv format for seismic event following EarthCube geoWs conventions."
},
"target": {
"@type": "EntryPoint",
"urlTemplate": "http://service.iris.edu/fdsnws/event/1/query?{geographic-constraints}&{depth-constraints}&{temporal-constraints}&{magnitude-constraints}&{organization-constraints}&{misc-parameters}&{format-option}&{nodata=404}",
"description": "URL with multiple query parameters--geographic location, event depth, time period of event, event magnitude, source network, miscellaneous parameters, format for returned data, and what flag to use for no data. ",
"httpMethod":"GET",
"uriTemplate-input": [
{
"@id": "urn:iris:fsdn.starttime",
"@type": "PropertyValueSpecification",
"valueName": "start",
"defaultValue": "Any",
"description": "allowed: Any valid time. Limit to events on or after the specified start time; use UTC for time zone",
"valueRequired": true,
"valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(.[0-9]+)?",
"xsd:type": "dateTime"
},
{
"@id": "urn:iris:fsdn.endtime",
"@type": "PropertyValueSpecification",
"valueName": "end",
"defaultValue": "Any",
"description": "allowed: Any valid time. Limit to events on or before the specified end time",
"valueRequired": true,
"valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(.[0-9]+)?"
},
{
"@id": "urn:iris:fsdn.minlatitude",
"@type": "PropertyValueSpecification",
"valueName": "minlat",
"defaultValue": "-90.0",
"description": "Limit to events with a latitude larger than or equal to the specified minimum. Value must be less than maxlat",
"valueRequired": true,
"minValue": -90.0,
"maxValue": 90.0,
"xsd:type": "float",
"unitOfMeasure": "degrees"
},
....
{
"@id": "urn:iris:fsdn.maxlongitude",
"@type": "PropertyValueSpecification",
"valueName": "maxlon",
"defaultValue": "180.0",
"description": "Limit to events with a longitude smaller than or equal to the specified maximum.",
"valueRequired": true,
"minValue":-180.0,
"maxValue": 180.0,
"xsd:type": "float",
"unitOfMeasure": "degrees"
},
{
"@id": "urn:iris:fsdn.latitude",
"@type": "PropertyValueSpecification",
"valueName": "lat",
"defaultValue": "0.0",
"description": "Specify the latitude to be used for a radius search.",
"valueRequired": false,
"minValue":-90.0,
"maxValue": 90.0,
"xsd:type": "float",
"unitOfMeasure": "degrees"
},
{
"@id": "urn:iris:fsdn.longitude",
"@type": "PropertyValueSpecification",
"valueName": "lon",
"defaultValue": "0.0",
"description": "Specify the longitude to be used for a radius search.",
"valueRequired": false,
"minValue":-180.0,
"maxValue": 180.0,
"xsd:type": "float",
"unitOfMeasure": "degrees"
},
...
{
"@id": "urn:iris:fsdn.maxradius",
"@type": "PropertyValueSpecification",
"valueName": "maxradius",
"defaultValue": "180.0",
"description": "Limit to events within the specified maximum number of degrees from the geographic point defined by the latitude and longitude parameters.",
"valueRequired": false,
"minValue": 0.0,
"maxValue": 180.0,
"xsd:type": "float",
"unitOfMeasure": "degrees"
},
...
]
},
"object": {
"@type": "DataFeed",
"description": "list of properties that are included in seismic event description in response documents. note this example does not include all the variable descriptions for the output object.",
"variableMeasured": [
{
"@type": "PropertyValue",
"name": "name of the variable",
"description": "example of documentation for a variable provided in the result object",
"propertyID": "URI for the property in some ontology",
"measurementTechnique": "URI for the measurement protocol, or text description of procedure and sensor"
}
]
}
}
</pre>
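
The constraints in each PropertyValueSpecification (valuePattern, minValue, maxValue) let a client check user input before sending a request. The Python sketch below shows one way this could be done; the function name `validate_value` is hypothetical, and treating valuePattern as a whole-string match is an assumption of this sketch rather than something this guide specifies.

```python
import re


def validate_value(spec, raw):
    """Check a raw string value against a PropertyValueSpecification.

    Returns a list of problems; an empty list means the value passed
    every constraint the specification declares.
    """
    errors = []
    pattern = spec.get("valuePattern")
    if pattern is not None and re.fullmatch(pattern, raw) is None:
        errors.append("value does not match valuePattern")
    if "minValue" in spec or "maxValue" in spec:
        try:
            number = float(raw)
        except ValueError:
            errors.append("value is not a number")
            return errors
        if "minValue" in spec and number < spec["minValue"]:
            errors.append("value is below minValue")
        if "maxValue" in spec and number > spec["maxValue"]:
            errors.append("value is above maxValue")
    return errors


# Specifications abbreviated from the example above.
minlat_spec = {
    "@type": "PropertyValueSpecification",
    "valueName": "minlat",
    "minValue": -90.0,
    "maxValue": 90.0,
}
start_spec = {
    "@type": "PropertyValueSpecification",
    "valueName": "start",
    "valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(.[0-9]+)?",
}
print(validate_value(minlat_spec, "120"))
print(validate_value(start_spec, "2020-01-01T00:00:00"))
```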

Back to [top](#top)
