Polyglot: Bio-inspired Visual Analysis of Language Embedding Data

Polyglot is a web application for visualizing language embeddings in 3D space. Language embeddings are typically high-dimensional vector representations of the syntactic and semantic content of words. This application lets you examine a particular word embedding dataset, reduced to 3D using UMAP. In addition to 3D navigation of the scatter plot, the application shows the exploration results of the Monte-Carlo Physarum Machine (MCPM), a computational model that simulates the self-organizing behavior of slime mold. MCPM has been shown to discover structures in the underlying data that follow the characteristics of optimal transport networks. Lastly, the application can color the dataset by each word's part-of-speech tag.

For this application, we use Gensim Continuous Skipgram embeddings of the Wikipedia dump of February 2017 (296,630 words). The same dataset is reduced twice using UMAP with the same parameters, so that the persistence of underlying structures can be examined. Use Select Dataset to switch between the two reductions.
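One possible way to put a number on "persistence" between two reductions is to check how many nearest neighbors each word keeps in both layouts. The sketch below is an illustration only and is not part of Polyglot. The function names `knn` and `neighbor_overlap`, the toy coordinates, and the Jaccard-overlap score are all assumptions made for this example.

```python
import math

def knn(layout, word, k):
    """Return the set of the k words closest to `word` in a 3D layout."""
    others = sorted(
        (w for w in layout if w != word),
        key=lambda w: math.dist(layout[word], layout[w]),
    )
    return set(others[:k])

def neighbor_overlap(layout_a, layout_b, k=1):
    """Mean Jaccard overlap of each word's k nearest neighbors in two layouts."""
    scores = []
    for word in layout_a:
        a = knn(layout_a, word, k)
        b = knn(layout_b, word, k)
        scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)

# Hypothetical toy layouts: run_2 is a mirrored, shifted copy of run_1,
# so every word keeps the same nearest neighbor.
run_1 = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (10, 0, 0), "d": (11.5, 0, 0)}
run_2 = {"a": (0, 5, 0), "b": (-1, 5, 0), "c": (-10, 5, 0), "d": (-11.5, 5, 0)}

print(neighbor_overlap(run_1, run_2, k=1))  # 1.0
```

A score near 1 means local neighborhoods are preserved across the two runs, while a score near 0 means they are not. On the full dataset a spatial index would be needed in place of the brute-force sort used here.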

Hover the mouse over a word point to examine its content. The Show More toggle displays all the word tokens under the mouse pointer, not just the one closest to the screen.
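The difference between the default hover and Show More can be sketched as a screen-space picking step. This is a minimal illustration, not Polyglot's actual code. The function `pick_points`, the pixel `radius`, and the tuple layout `(label, screen_x, screen_y, depth)` are hypothetical, and the sketch assumes points have already been projected to screen coordinates.

```python
import math

def pick_points(points, pointer, radius=5.0, show_more=False):
    """Return labels of projected points within `radius` pixels of the pointer.

    Each point is (label, screen_x, screen_y, depth); a smaller depth means
    the point is closer to the screen.
    """
    px, py = pointer
    hits = [p for p in points if math.hypot(p[1] - px, p[2] - py) <= radius]
    hits.sort(key=lambda p: p[3])  # nearest to the screen first
    if not hits:
        return []
    if show_more:
        return [p[0] for p in hits]  # every word under the pointer
    return [hits[0][0]]              # only the word closest to the screen

points = [
    ("river", 100.0, 100.0, 0.8),
    ("stream", 102.0, 101.0, 0.3),
    ("piano", 300.0, 40.0, 0.5),
]
print(pick_points(points, (101.0, 100.0)))                  # ['stream']
print(pick_points(points, (101.0, 100.0), show_more=True))  # ['stream', 'river']
```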

Use Color Mode to switch between the slime exploration result and the part-of-speech distribution. The four sliders (Color Gradient, Lowest Weight, Lowest Connect, Opacity Fading) customize the visualization of the slime results. Lowest Connect is particularly helpful for decluttering the scatter plot view.
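As a rough mental model of how such sliders could act on the data, the sketch below treats Lowest Weight and Lowest Connect as cutoffs and Opacity Fading as an exponent on the weight. These semantics are assumptions made for illustration, since the README does not define them. The function `style_points` and the per-point fields `weight` and `connect` are hypothetical, and Color Gradient is left out.

```python
def style_points(points, lowest_weight=0.0, lowest_connect=0.0, opacity_fading=1.0):
    """Hide points below either threshold and fade the rest by weight.

    Each point is a dict with 'word', 'weight', and 'connect', where both
    numeric fields are assumed to lie in [0, 1].
    """
    visible = []
    for p in points:
        # Assumed behavior: a point is dropped if it falls below either cutoff.
        if p["weight"] < lowest_weight or p["connect"] < lowest_connect:
            continue
        # Assumed behavior: a larger fading exponent dims low-weight points faster.
        opacity = p["weight"] ** opacity_fading
        visible.append({"word": p["word"], "opacity": opacity})
    return visible

points = [
    {"word": "river", "weight": 0.5, "connect": 0.8},
    {"word": "piano", "weight": 0.75, "connect": 0.1},   # hidden by Lowest Connect
    {"word": "teapot", "weight": 0.05, "connect": 0.6},  # hidden by Lowest Weight
]
print(style_points(points, lowest_weight=0.1, lowest_connect=0.3, opacity_fading=2.0))
# [{'word': 'river', 'opacity': 0.25}]
```

Under this reading, raising Lowest Connect removes weakly connected points first, which is consistent with its use for decluttering the view.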

The slime result is generated by placing MCPM probe agents around a single word point, which we call the anchor point, and allowing them to spread out and follow the trace. Anchor points are marked in yellow. Hold Left-Shift to enter anchor point navigation mode. Double-click an anchor point to switch to the slime result for that point.
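To give a feel for agents spreading out from an anchor, the toy sketch below releases random walkers at one word and counts how often each other word is visited. It is a heavily simplified stand-in and not the MCPM algorithm. The function `explore_from_anchor`, the inverse-distance sampling rule, the visit-count "trace", and the sample words are all assumptions for this example.

```python
import math
import random

def explore_from_anchor(points, anchor, n_agents=50, n_steps=20, n_candidates=3, seed=0):
    """Toy Monte Carlo walk: agents start at the anchor and hop between words.

    points maps each word to a 3D coordinate. Returns a visit count per word,
    used here as a crude stand-in for a slime trace.
    """
    rng = random.Random(seed)
    trace = {word: 0 for word in points}
    for _ in range(n_agents):
        current = anchor
        for _ in range(n_steps):
            others = [w for w in points if w != current]
            candidates = rng.sample(others, min(n_candidates, len(others)))
            # Probabilistic choice: closer candidates are more likely to be picked.
            weights = [
                1.0 / (math.dist(points[current], points[c]) + 1e-9)
                for c in candidates
            ]
            current = rng.choices(candidates, weights=weights, k=1)[0]
            trace[current] += 1
    return trace

# Hypothetical coordinates: three nearby words and one distant outlier.
points = {
    "king": (0.0, 0.0, 0.0),
    "queen": (1.0, 0.0, 0.0),
    "prince": (0.0, 1.0, 0.0),
    "teapot": (100.0, 100.0, 100.0),
}
trace = explore_from_anchor(points, anchor="king")
for word, visits in sorted(trace.items(), key=lambda kv: -kv[1]):
    print(word, visits)
```

In this toy model, words near the anchor collect most of the visits while the distant outlier is rarely reached, which is the kind of anchor-dependent weighting the slime mode displays.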

Web Application

You can use the web application by going to: https://creativecodinglab.github.io/Polyglot/index.html

Quick Reference

Mouse: navigate in 3D
Left-Shift: anchor-focus mode

Double click on anchor points (yellow) to view slime data from that anchor point.

Screenshots

Authors

This web visualization tool was created by a team of researchers at the University of California, Santa Cruz, Dept. of Computational Media.

This work was published as Hongwei Zhou's M.S. thesis.

A version of this work was published in the 2020 IEEE 5th Workshop on Visualization for the Digital Humanities (VIS4DH).
