Data View: cross-field metadata and their relationship to data visualization #97278

Closed
Tracked by #184648
monfera opened this issue Apr 15, 2021 · 6 comments
Labels: discuss, Feature:ElasticCharts, 🧊 iceboxed, Team:Visualizations

Comments

@monfera
Contributor

monfera commented Apr 15, 2021

Keywords: metadata, field, recommender, data view, shared visual attributes, datavis best practices
Examples in response to Vijay's request, in the context of moving from index patterns to Data Views.

Cross-field metadata

Not all metadata neatly belong to a specific field or an entire index. Sometimes they describe the relationship between two or more fields within an index, or even across indices. Examples of metadata across fields, and their utility for visual exploration:

Fields whose contents relate to one another

Hierarchical relationship between fields

One field breaks down another. Examples:

  • Country - State/County - City
  • Product group / product / SKU

It's good to know if the subunit can even be used on its own. Eg. "Paris" can be "France/Île-de-France/Paris" or "US/Texas/Paris", so, on its own, it's ambiguous, unless the City field is a unique code.
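
One way to test whether a subunit field is safe to use on its own is to check if any of its values shows up under more than one parent path. A minimal sketch, assuming documents are plain dicts; the function name `ambiguous_subunits` and the sample rows are made up for illustration:

```python
from collections import defaultdict


def ambiguous_subunits(rows, parent_fields, child_field):
    """Return child values that occur under more than one parent path."""
    parents_by_child = defaultdict(set)
    for row in rows:
        parent_path = tuple(row[f] for f in parent_fields)
        parents_by_child[row[child_field]].add(parent_path)
    return {
        child: sorted(paths)
        for child, paths in parents_by_child.items()
        if len(paths) > 1
    }


# Illustrative sample documents, not real index contents.
rows = [
    {"country": "France", "region": "Île-de-France", "city": "Paris"},
    {"country": "US", "region": "Texas", "city": "Paris"},
    {"country": "US", "region": "Texas", "city": "Austin"},
]
ambiguous = ambiguous_subunits(rows, ["country", "region"], "city")
```

If the result is empty, the City field could be offered as a standalone filter or breakdown; otherwise it probably needs its parents alongside it.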

Visualizations that work well across the hierarchy:

  • mosaic plot (partitioning)
  • treemap (partitioning)
  • sunburst (partitioning)
  • tree, dendrogram, partition tree
  • small multiples (eg. horizontal split: higher layer; color: lower layer)
  • various visualizations with drilldown or drill-through interactions to reach other places in the hierarchy, eg. showing US data, then descending to a specific state

image

Styling of hierarchical data might follow a primary breakdown, eg. also projected to color, while the deeper nodes inherit that (or fade out, like the sunburst):
image

Multidimensional variables

Usually, there are several discrete (categorical or ordinal) variables associated with documents. They collectively represent slicing and dicing ability (explorability, drilldown, drill-through etc.). In a given chart, usually only one (very rarely, two) can utilize a color mapping.

Functional dependency: independent variables vs dependent variables

Knowledge or inference of which field(s) determine the value of other field(s).

Examples:

  • a country code plus a zip code fully determines a municipality
  • one field, or multiple fields combined, may act as a unique key (for the document level, or a given aggregation), called a candidate key in database terms (not just SQL!), or independent variable in statistics terms, or dimensions in data exploration

Often, exploratory interaction is about filtering or navigating in the realm of independent variables / dimensions, while the quantities and categories of dependent variables are aggregated (or in contrast, disaggregated) and visualized.
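
Where the dependency is not declared, it could be inferred from a sample. A rough sketch under the assumption that documents are plain dicts; `determines` and `is_candidate_key` are hypothetical helper names and the rows use placeholder values. Note that a dependency holding in a sample only suggests, and does not prove, that it holds in general:

```python
def determines(rows, determinant_fields, dependent_field):
    """True if, within rows, equal determinant values always imply an equal dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in determinant_fields)
        value = row[dependent_field]
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_candidate_key(rows, fields):
    """True if the combination of fields is unique across rows."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    return len(keys) == len(set(keys))


# Placeholder data: the same zip code exists in two countries.
rows = [
    {"doc_id": 1, "country": "AA", "zip": "1000", "municipality": "Alpha"},
    {"doc_id": 2, "country": "BB", "zip": "1000", "municipality": "Beta"},
    {"doc_id": 3, "country": "AA", "zip": "1000", "municipality": "Alpha"},
]
zip_alone = determines(rows, ["zip"], "municipality")
country_and_zip = determines(rows, ["country", "zip"], "municipality")
```

Here the zip code alone does not determine the municipality, while country code plus zip code does, matching the first example above.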

Time and space dependency

Most metrics in an index may vary over time and/or across spatial dimensions, where available. It's useful to default to eg. a time series view or map view (recommender) and offer suitable visualization choices, eg. lines, if the time series is reasonably continuous.
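
What counts as "reasonably continuous" is open to interpretation. One possible heuristic, purely as an assumption for illustration: treat the series as continuous if its largest gap is not much bigger than its typical (median) gap. The threshold `max_gap_factor` and the fallback to bars are arbitrary choices, not an established rule:

```python
from statistics import median


def is_reasonably_continuous(timestamps, max_gap_factor=3.0):
    """Heuristic: the largest gap is at most max_gap_factor times the median gap."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return False  # fewer than two points: nothing to connect with a line
    return max(gaps) <= max_gap_factor * median(gaps)


def recommend_time_series_mark(timestamps):
    """Suggest a line for continuous series, otherwise fall back to bars (assumed default)."""
    return "line" if is_reasonably_continuous(timestamps) else "bar"


# Epoch seconds, made up: one regular series and one with a large hole.
regular = [0, 60, 120, 180, 240]
gappy = [0, 60, 120, 3600]
```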

Explanatory relationship

Key and text field pair:

  • one field is a code (eg. stable, following standards or conventions, unambiguous),
  • the other field is the full text for human consumption

Visualization and data exploration impact:

  • they should be treated as one unit by exploration interactions, eg. there should not be a separate filter dropdown for the code and the text (this still allows incremental search in either of the fields); they represent one variable eg. in a Cartesian or parallel coordinates chart
  • the text should show up in tooltip, legend, annotation
  • the text is possibly available in multiple languages and with multiple lengths, eg. to use whichever fits in a table column or as a categorical axis tick label
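
As a sketch of the last point, a code could carry several label variants per language, and the renderer could pick the longest one that fits the available space, falling back to the code itself. The `CodedValue` structure and `pick_label` function are hypothetical, not an existing API:

```python
from dataclasses import dataclass, field


@dataclass
class CodedValue:
    code: str
    # language -> label variants of differing length (hypothetical structure)
    labels: dict = field(default_factory=dict)


def pick_label(value, language, max_chars):
    """Longest label in the given language that fits; fall back to the code."""
    candidates = value.labels.get(language, [])
    fitting = [label for label in candidates if len(label) <= max_chars]
    return max(fitting, key=len) if fitting else value.code


us = CodedValue("US", {"en": ["United States of America", "United States", "USA"]})
tick_label = pick_label(us, "en", 15)   # room for a medium-length label
narrow_label = pick_label(us, "en", 5)  # narrow table column
```

Filtering and grouping would still operate on `code`, so the pair behaves as a single variable.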

Redundant metrics

Certain metrics may redundantly encode the same information (eg. same phenomenon, different unit) or may contain precomputed values (eg. elapsed time, MB, MB/s).
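
Such redundancy could be declared in metadata, or checked against a sample. A small sketch for the rate case, assuming plain dict documents; the field names and the `is_derived_rate` helper are invented for this example:

```python
import math


def is_derived_rate(rows, amount_field, duration_field, rate_field, rel_tol=1e-6):
    """True if rate == amount / duration for every row with a nonzero duration."""
    checks = [
        math.isclose(row[rate_field], row[amount_field] / row[duration_field], rel_tol=rel_tol)
        for row in rows
        if row[duration_field] != 0
    ]
    # Require at least one checked row, so an empty sample is not reported as redundant.
    return bool(checks) and all(checks)


rows = [
    {"mb": 100, "elapsed_s": 4, "mb_per_s": 25.0},
    {"mb": 30, "elapsed_s": 3, "mb_per_s": 10.0},
]
redundant = is_derived_rate(rows, "mb", "elapsed_s", "mb_per_s")
```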

Physical data representation changes over time

For example, the user name of a given user changes; the name of a country changes; or an upstream logging system gets fixed. The new values may be in another field. A Data View may make the change disappear by abstracting over it. Benefits:

  • avoids the need to reindex a lot of data
  • still, visualization and report building folks don't need to introduce custom logic repeatedly (DRY principle)
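
Conceptually, the abstraction amounts to one logical field that resolves to whichever physical field is present in a document. A minimal sketch of that idea, with invented field names and a hypothetical `coalesce_field` helper (not how Data Views actually implement it):

```python
def coalesce_field(doc, candidate_fields, default=None):
    """Return the first non-missing value among the candidate physical fields."""
    for name in candidate_fields:
        value = doc.get(name)
        if value is not None:
            return value
    return default


# Hypothetical definition: newer documents use "user_name", older ones "user".
USER_NAME_SOURCES = ["user_name", "user"]

old_doc = {"user": "jdoe", "bytes": 120}
new_doc = {"user_name": "jdoe", "bytes": 300}
resolved = [coalesce_field(d, USER_NAME_SOURCES) for d in (old_doc, new_doc)]
```

Defined once at the Data View level, every visualization and report sees a single user name field.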

Independence of metrics

If there's no established relationship among certain fields, they can be assumed independent of one another. This doesn't mean no correlation, and showing correlations is probably a good idea, eg. via scatterplot, SPLOM, parcoords.
image
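
For a first look at correlations among numeric fields, pairwise Pearson coefficients can feed a SPLOM-style overview. A plain-Python sketch with made-up columns; `pearson` and `correlation_matrix` are illustrative helpers, and Pearson only captures linear association:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each ordered pair of column names to its correlation coefficient."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Made-up sample columns.
columns = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
matrix = correlation_matrix(columns)
```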

Shared attributes

Here, multiple fields relate to one another through common properties. This can happen across fields within the same index, or among fields that are in disparate indices.

Shared nominal types (semantic domains)

While field types are present in Elasticsearch, they represent physical domains.

For example, a part to whole ratio may be represented

  • physically, as a float in the index
  • conceptually, as a real number between 0 and 1

A "megabytes transferred" metric may be represented

  • physically, as an integer in the index
  • conceptually, as an additive number, over which summing data transform aggregations, and summing visualizations eg. partition charts work

The physical type doesn't give much useful information for what transforms and visualizations may be even legitimate.
Nominal (semantic) types are required for

  • good data visualization defaults (eg. don't offer partition charts over non-additive metrics; don't allow logarithmic Y scale if the values can be zero or negative)
  • legitimate recommendations, within which the topmost ones are the most compatible ones, based on metadata
  • meaningful visual data transform builders, where compatible pieces fit together

Nominal typing may include these, and more:

  • allowed extent of data (positive numbers, or numbers within a specific range)
  • is it a continuous measure, ie. do the numbers represent a measurement, or are they just numbers that stand for some categorization? Eg. 0 means no error, 404 means page not found, etc. Or even some kind of index number
  • discretized nature (eg. integers only, or increments of 0.2) or even, a limited set of allowed numbers or keywords
  • unit of measure: helps avoid adding an angle in degrees with an angle in radians; agg or report autoconversion may be possible
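
To make this concrete, here is a sketch of what a semantic type descriptor and two of the rules mentioned above might look like. The `SemanticType` class, its attribute names and the sample types are all hypothetical; a real design would need more properties:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticType:
    name: str
    additive: bool      # do sums of values make sense?
    continuous: bool    # a measurement, as opposed to a numeric code
    minimum: Optional[float] = None  # inclusive lower bound, if known
    unit: Optional[str] = None


def allows_partition_chart(t):
    """Partition charts (treemap, sunburst, ...) need additive metrics."""
    return t.additive


def allows_log_scale(t):
    """A log scale needs a measure whose values are known to be strictly positive."""
    return t.continuous and t.minimum is not None and t.minimum > 0


megabytes = SemanticType("megabytes_transferred", additive=True, continuous=True, minimum=0, unit="MB")
http_status = SemanticType("http_status", additive=False, continuous=False, minimum=100)
price = SemanticType("unit_price", additive=False, continuous=True, minimum=0.01)  # made-up example
```

With such descriptors, a recommender could filter out illegitimate chart and scale options before ranking the rest.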

Note: such typing information may eventually enable more compact representation in Elasticsearch.

Several fields that reference a shared semantic type are meaningfully related. Example: both buildings_index and roads_index have a field for occupied land area. They share a unit (eg. square meters) and they share the property of additivity. These two fields may even be linked to a common metadata descriptor (DRY principle in data modeling). Therefore, a report, visualization or data transform may safely add land areas of buildings and roads, to get summarized land occupance.

Even just the knowledge of a shared or convertible unit is useful for dataviz, because the fields can then be projected to a common vertical scale.
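
A sketch of the land area example, assuming each field carries a small descriptor with its unit and additivity. `FieldMeta`, the conversion table and the values are invented for illustration; only the index names come from the example above:

```python
from dataclasses import dataclass

# Conversion factors to a common unit (square meters).
TO_SQUARE_METERS = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


@dataclass(frozen=True)
class FieldMeta:
    index: str
    field: str
    unit: str
    additive: bool


def total_area_m2(values_by_field):
    """Sum values from several fields after converting each to square meters."""
    total = 0.0
    for meta, values in values_by_field:
        if not meta.additive:
            raise ValueError(f"{meta.index}.{meta.field} is not additive")
        total += sum(values) * TO_SQUARE_METERS[meta.unit]
    return total


buildings = FieldMeta("buildings_index", "land_area", unit="m2", additive=True)
roads = FieldMeta("roads_index", "land_area", unit="ha", additive=True)
occupied = total_area_m2([(buildings, [1200, 800]), (roads, [0.5])])
```

The same unit knowledge would let both fields share one Y axis, with or without summing them.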

Shared visual attributes

Due to compatible nominal types,

  • it's possible to meaningfully union the domain, because their units are the same, or reconcilable
  • therefore it's sensible to map both to a common Y axis, or common color gradient

It's desirable that visual recommenders and defaults exploit common value=>aesthetic mapping when possible. Besides compatible nominal types, the default value=>aesthetic mapping can be associated with specific Data View fields, or even, across multiple Data Views.

Therefore, default mappings are first class entities which can be referenced by fields in Data Views (this still allows the implicit creation of mappings, if not shared among Data Views, for the user's convenience; can be made explicit and extracted when needed)
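
A sketch of a mapping as a first class entity: one scale object, built over the union of the field domains, and referenced by more than one field. `LinearScale`, `union_domain` and the registry dict are illustrative names, not part of any existing API, and the domains are made up:

```python
from dataclasses import dataclass


def union_domain(*domains):
    """Smallest (min, max) interval covering all the given (min, max) domains."""
    return (min(d[0] for d in domains), max(d[1] for d in domains))


@dataclass(frozen=True)
class LinearScale:
    domain: tuple  # (min, max) in data units
    range: tuple   # (min, max) in visual units, eg. pixels

    def __call__(self, value):
        d0, d1 = self.domain
        r0, r1 = self.range
        if d1 == d0:
            return r0  # degenerate domain: map everything to the range start
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


# Two fields with compatible units but different observed extents.
shared_y = LinearScale(domain=union_domain((0, 50), (20, 200)), range=(0, 400))

# Both fields reference the same mapping object instead of each creating its own.
field_scales = {
    "buildings_index.land_area": shared_y,
    "roads_index.land_area": shared_y,
}
```

Because both fields point at the same scale, equal values land at equal positions (or colors) in every chart that uses either field.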

image

See also Beyond palettes

Multi-index Data Views

Sometimes data that relate to one another are not in the same index or index* group. Eg.

  • a field in the main index represents codes, while a small auxiliary index associates explanatory name with the code values
  • the indices describe relationships in the real world or in computer infrastructure, and specific types of entities are in their dedicated indices; one index field may reference a field in another (eg. there may be an index that associates road entities with building entities, based on which roads connect which buildings)

A future Data View may reference multiple index (or index*) entities, with metadata in the Data View describing the relationships among indices and their fields (see cross-index fields)
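
The first case is essentially a lookup join. A sketch of what the Data View metadata would enable, with documents as plain dicts; the function, field names and data are invented for illustration:

```python
def attach_names(main_docs, lookup_docs, code_field, lookup_code_field, lookup_name_field, out_field="name"):
    """Add the explanatory name from a small lookup index to each main document."""
    names = {d[lookup_code_field]: d[lookup_name_field] for d in lookup_docs}
    # Codes without a lookup entry get None rather than raising.
    return [{**doc, out_field: names.get(doc[code_field])} for doc in main_docs]


# Made-up contents of a main index and a small auxiliary index.
main_index = [
    {"status_code": 404, "count": 12},
    {"status_code": 200, "count": 950},
    {"status_code": 999, "count": 1},
]
status_names_index = [
    {"code": 200, "text": "OK"},
    {"code": 404, "text": "Not Found"},
]
joined = attach_names(main_index, status_names_index, "status_code", "code", "text")
```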

Derived information in Data Views

Eventually, a Data View should be able to represent an aggregation, filtering or other data transformation of its input (indices, or another, more granular Data View).

Even in this case, field level metadata is useful, per field and across fields, because the ultimate use in visual analytics is the same and requires various kinds of metadata.

So, Data Views may eventually become composable. Example: different parts of the organization may need

  • differing granularity
  • authorization to different slices of the data
  • different default value=>aesthetic mappings eg. color scales

Even if there's a single dashboard, or a set of dashboards that share a bunch of fields, it may be worth creating a common Data View for that, atop a possibly preexisting Data View, so that theming and mappings can be shared:

image
(Vavaliya et al: Online Performance Assessment System for Urban Water Supply and Sanitation Services in India)

A Data View that represents data transformation actually generates metadata. For example, a grouping aggregation will yield unique rows in terms of the values in fields that are part of the grouping dimensions.
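
A sketch of that idea: a grouping transform that returns both the aggregated rows and the metadata it implies, namely that the grouping fields now form a unique key. The `group_sum` function and the shape of the metadata dict are assumptions for illustration:

```python
from collections import defaultdict


def group_sum(rows, group_fields, metric_field):
    """Sum metric_field per group; also return metadata implied by the grouping."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        totals[key] += row[metric_field]

    out_rows = []
    for key, total in totals.items():
        out = dict(zip(group_fields, key))
        out[metric_field] = total
        out_rows.append(out)

    # After grouping, each combination of group field values occurs exactly once.
    metadata = {"unique_key": list(group_fields)}
    return out_rows, metadata


# Made-up input rows.
rows = [
    {"country": "US", "state": "TX", "sales": 10.0},
    {"country": "US", "state": "TX", "sales": 5.0},
    {"country": "US", "state": "CA", "sales": 7.0},
]
grouped, meta = group_sum(rows, ["country", "state"], "sales")
```

A downstream Data View or chart could rely on `unique_key` without having to rediscover it from the data.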

@botelastic bot added the needs-team label Apr 15, 2021
@monfera added the discuss and Team:DataVis labels Apr 15, 2021
@elasticmachine
Contributor

Pinging @elastic/datavis (Team:DataVis)

@botelastic bot removed the needs-team label Apr 15, 2021
@monfera added the needs-team label Apr 15, 2021
@botelastic bot removed the needs-team label Apr 15, 2021
@monfera
Contributor Author

monfera commented Apr 15, 2021

image

@monfera
Contributor Author

monfera commented Apr 21, 2021

Field metadata drives some of the recommendations: https://data.humdata.org/dataviz-guide/dataviz-elements/#/data-visualization/bar-charts ht @maartenzam

@monfera
Contributor Author

monfera commented Apr 21, 2021

Related: #73152

@stratoula added the Feature:ElasticCharts and Team:Visualizations labels and removed the Team:DataVis label Nov 4, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

@markov00
Member

markov00 commented Jun 3, 2024

In order to provide better transparency of priorities, issues that will not be prioritized within the next 24 months are being closed.

Tracking request in Lens general improvements ice box #184648

@markov00 closed this as not planned Jun 3, 2024