
Tutorial

Wenbo Tao edited this page Jul 25, 2019 · 59 revisions

This tutorial serves as a quick introduction to Kyrix's architecture and declarative model, and as a step-by-step guide to developing a Kyrix application.

System Architecture

Kyrix employs a simple client-server architecture. As a visualization developer, you write a Kyrix specification in JavaScript. The Kyrix compiler compiles your spec and reports any errors. The spec is saved in the database and passed to the backend server, which then precomputes the necessary database indexes offline to ensure interactive response times for online user interactions. The frontend listens for user interactions, sends requests to fetch data from the backend, and renders the visualizations.

One great thing about Kyrix is the declarative model it provides. Declarative design hides complex execution details, such as frontend rendering and backend data-fetching logic, and lets you stay focused on designing your application. The declarative model is described below.

Declarative Model

Kyrix's declarative model has a concise set of abstractions, as shown in the figure above. The most important notion is a canvas, which can be thought of as a "zoom level". Canvases are connected by jumps, which enable zooming in and out between canvases.

A canvas is composed of one or more overlaid layers, which can be thought of as layers in Photoshop. Each layer is associated with a SQL query, the result of which is passed to the rendering function and placement function of this layer.

A Kyrix visualization has one or more views. A view is effectively a "window" through which you can see canvases. Different views can be linked with some type of coordination, which is also defined using the "jump" notion.

View

A view is defined by its size and location on the screen. Optionally, you can specify what canvas to show in the view when the application is first loaded.

Canvas and Layer

When specifying a canvas, you need to indicate its width and height. This sets up a shared rectangular coordinate system for its layers. The top left-hand corner of the canvas is (0, 0), while the bottom right-hand corner of the canvas is (w, h). For any dimension, if the size of the canvas is larger than the viewport size, panning is automatically enabled in that dimension.
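The panning rule can be sketched in a few lines. This is a minimal illustration, not Kyrix code: the function `panningEnabled` and the shape of its arguments are hypothetical.

```javascript
// Hypothetical helper (not part of the Kyrix API): decide, per dimension,
// whether panning is enabled for a canvas shown in a viewport.
function panningEnabled(canvas, viewport) {
    return {
        // Panning is possible only where the canvas exceeds the viewport.
        horizontal: canvas.w > viewport.w,
        vertical: canvas.h > viewport.h
    };
}

// A 5000 x 800 canvas in a 1000 x 1000 viewport pans horizontally only.
const pan = panningEnabled({w: 5000, h: 800}, {w: 1000, h: 1000});
console.log(pan); // { horizontal: true, vertical: false }
```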

Each layer is associated with a data transform, a rendering function and a placement function. These notions are described in detail below.

Data Transform

A data transform object can be seen as the data source for a layer. It has two main components: a SQL query that fetches raw data, and an optional transform function that "cooks" the raw data into the desired format. The result of the data transform is used as input to both the rendering and placement functions.
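To make the two components concrete, here is a minimal sketch in plain JavaScript. It does not use the Kyrix API: the object shape, the names `gameTransform` and `applyTransform`, the `games` table and the sample rows are all hypothetical, and the database fetch is replaced by an in-memory array.

```javascript
// Hypothetical data transform (names and shape are illustrative, not the Kyrix API).
const gameTransform = {
    // SQL query that fetches the raw rows (the table "games" is made up).
    query: "SELECT game_id, home_score, away_score FROM games;",
    // Optional function that "cooks" a raw row into the desired format.
    transformFunc: function (row) {
        const home = Number(row.home_score);
        const away = Number(row.away_score);
        return {id: row.game_id, margin: home - away};
    }
};

// Stand-in for the database: apply the transform to already-fetched rows.
function applyTransform(transform, rawRows) {
    // Without a transform function, raw rows pass through unchanged.
    return transform.transformFunc ? rawRows.map(transform.transformFunc) : rawRows;
}

// Made-up sample rows; values are text here, so the transform converts them.
const rawRows = [
    {game_id: "g1", home_score: "102", away_score: "98"},
    {game_id: "g2", home_score: "87", away_score: "95"}
];
console.log(applyTransform(gameTransform, rawRows));
```

The cooked rows are what the rendering and placement functions of the layer would receive.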

Rendering Function

A rendering function converts data transform results into geometries on the screen. Currently, it is specified as a D3 script.

Placement Function

A placement function specifies, for each row in the data transform result, a bounding rectangle indicating where in the canvas coordinate system the row appears. The backend uses this additional information to fetch data quickly: roughly speaking, it fetches the rows whose bounding boxes intersect the viewport.
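The intersection test can be sketched as follows. This is an illustration under stated assumptions, not Kyrix's implementation: `placement`, `intersectsViewport` and `fetchVisible` are hypothetical names, the rows are made up, and the linear scan stands in for the precomputed database indexes the real backend relies on.

```javascript
// Hypothetical placement function: map a row to its bounding rectangle
// in canvas coordinates (center plus width and height).
function placement(row) {
    return {cx: row.x, cy: row.y, w: 100, h: 100};
}

// True if the row's bounding box overlaps the viewport rectangle.
// The viewport is given by its top-left corner (x, y) and its size (w, h).
function intersectsViewport(box, viewport) {
    const left = box.cx - box.w / 2, right = box.cx + box.w / 2;
    const top = box.cy - box.h / 2, bottom = box.cy + box.h / 2;
    return left <= viewport.x + viewport.w && right >= viewport.x &&
        top <= viewport.y + viewport.h && bottom >= viewport.y;
}

// Keep only the rows whose bounding boxes intersect the viewport.
function fetchVisible(rows, viewport) {
    return rows.filter(row => intersectsViewport(placement(row), viewport));
}

const rows = [
    {id: "a", x: 50, y: 50},     // fully inside the viewport
    {id: "b", x: 3000, y: 50},   // far to the right, not fetched
    {id: "c", x: 1020, y: 500}   // straddles the right edge, still fetched
];
const visible = fetchVisible(rows, {x: 0, y: 0, w: 1000, h: 1000});
console.log(visible.map(r => r.id)); // [ 'a', 'c' ]
```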

Jump

In its simplest form, a jump is constructed from a source canvas and a destination canvas. It can also be customized in many ways.
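As a rough mental model, canvases and jumps form a directed graph. The sketch below uses plain objects only; it is not the Kyrix API, and the canvas ids, sizes and the helper `destinationsFrom` are hypothetical.

```javascript
// Plain-object sketch of the model (illustrative only; not the Kyrix API).
const canvases = [
    {id: "overview", w: 1000, h: 1000},
    {id: "detail", w: 5000, h: 1000}
];
const jumps = [
    // A jump is minimally a source canvas and a destination canvas.
    {source: "overview", destination: "detail"},
    {source: "detail", destination: "overview"}
];

// List the canvases a user could jump to from the given canvas.
function destinationsFrom(canvasId, jumps) {
    return jumps.filter(j => j.source === canvasId).map(j => j.destination);
}

console.log(destinationsFrom("overview", jumps)); // [ 'detail' ]
```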

An example

This example has two views with coordinated loading and selection, and a couple of canvases connected by semantic zooms.

Four files

For this basketball app, we write the spec in four files:

  • nba_cmv.js: this file has definitions of projects, views, canvases, layers and jumps.
  • transforms.js: this file defines all data transform objects.
  • renderers.js: rendering functions of all layers are in this file.
  • placements.js: this file has all placement functions defined.

Examples in the example folder all follow this four-file format, which we believe cleanly separates the concerns involved in writing a Kyrix app. We have provided templates of these four files here.

For detailed information on the APIs, see our API manual.

Loading Data into Docker Container

After you start the docker containers, you will see a working NBA app. To move beyond this app, you need to load custom data into the Postgres container (kyrix_db_1). We provide useful scripts that let you load either a csv file or a Postgres SQL dump using one command.

Loading CSV

./docker-scripts/load-csv.sh CSV_FILE [OPTIONS]

Here, CSV_FILE is the name of a csv file. OPTIONS allow you to specify the Postgres database name and table name that your data is loaded into, as well as the delimiter of your csv file. Without OPTIONS, both the database name and the table name default to the name of your csv file. Make sure that your csv file contains header names that are valid column names in Postgres. An example:

./docker-scripts/load-csv.sh nba_celtics.csv --dbname nba --tablename celtics --delimiter "\t"

will load the data in nba_celtics.csv (which uses tab as delimiter) into table celtics in the database nba.

Note that this script has no type inference and every column is typed TEXT. To do more sophisticated csv loading, consider the following strategy:

  • copy your csv file into docker container using docker cp;
  • create your database tables manually by connecting to the Postgres instance (see "Inspect what's loaded" below);
  • inside kyrix_db_1, load your csv using Postgres COPY.

Loading SQL Dump

./docker-scripts/load-sql.sh SQL_DUMP_FILE [OPTIONS]

Here, SQL_DUMP_FILE is the name of a Postgres sql dump file. OPTIONS allow you to specify the Postgres database name. Without OPTIONS, the database name will default to the name of your sql dump file. An example:

./docker-scripts/load-sql.sh forest_dump.sql --dbname forest

Inspect what's loaded

It is often useful to connect to the Postgres instance to see what has actually been loaded. To do so, run the following commands to start an interactive psql shell in container kyrix_db_1 (see here for more details of Kyrix docker configs):

> docker exec -it kyrix_db_1 /bin/sh                          # a shell session into the db container
> psql postgresql://postgres:kyrixftw@localhost/postgres      # log into the postgres instance
> \list                                                       # see the list of databases
> \c dbname                                                   # connect to one database
> \d                                                          # see the list of tables
> \d tablename                                                # see the schema of a table
> select * from tablename;                                    # see what's in a table

Typical Workflow to Create a Kyrix App

Developing locally with Docker containers running

We recommend using our docker images to start a Kyrix backend, which avoids the errors that can come up with manual installation.

You can write Kyrix specifications inside docker containers, and compile them by running node app.js in the directory of your app. Assuming you have called project.saveProject(), this will send a REST request to the Kyrix container (kyrix_kyrix_1) which will then save the specifications in the database container (kyrix_db_1).

If you don't feel like writing code in a VM, you can develop locally as follows. First, install NodeJS locally, rename config.txt.example in the root folder to config.txt, and run npm install under compiler/. Then you can run node app.js locally. Our compiler will send your project specification to the docker container using a REST request.

When you first start the containers, the Kyrix backend (in kyrix_kyrix_1) serves the example NBA application by default. After you run node app.js, the backend will start serving your app. To see your app, simply go to localhost:8000.

Debugging

Generally, there are three types of errors: compiler errors, frontend errors and backend errors. Compiler errors can be relatively easily fixed with the help of error messages after running node app.js.

Frontend errors typically show up in your browser's console. A good iterative way to debug them is to keep the Kyrix backend running, recompile the project every time you modify the rendering function, and then refresh the browser. Note that modifying only a rendering function will not trigger a recomputation of indexes. Another option is to embed your rendering function in an HTML file and use some sample data to check that it looks correct.

Backend errors are currently trickier to debug. The backend server will stop working if an error occurs during either indexing or online querying. We are working on fixing this (#15). For now, if you see exceptions in the backend, please restart the docker containers.

Skipping reindexing

Every time you run node app.js, the backend uses a simple strategy to decide if the difference between your new spec and the previous one requires recomputing the DB indexes. Things that can trigger a recomputation include modifying the data transforms, adding or deleting layers, etc.
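The kind of comparison described above might look like the following sketch. It is illustrative only and is NOT Kyrix's actual decision logic: the spec shape, the field names `transformQuery` and `rendering`, and the function `needsReindexing` are assumptions, and it covers only the two triggers named in the text.

```javascript
// Illustrative only: NOT Kyrix's actual decision logic. Each spec is modeled
// as a list of layers, each with a data transform and a rendering function.
function needsReindexing(prevSpec, newSpec) {
    // Adding or deleting layers triggers recomputation.
    if (prevSpec.layers.length !== newSpec.layers.length) return true;
    // So does modifying any layer's data transform.
    return newSpec.layers.some((layer, i) =>
        layer.transformQuery !== prevSpec.layers[i].transformQuery);
}

const prev = {layers: [{transformQuery: "SELECT * FROM teams;", rendering: "v1"}]};
// Only the rendering function changed: no reindexing needed.
const renderingOnly = {layers: [{transformQuery: "SELECT * FROM teams;", rendering: "v2"}]};
// The data transform changed: reindexing needed.
const newQuery = {layers: [{transformQuery: "SELECT * FROM players;", rendering: "v1"}]};

console.log(needsReindexing(prev, renderingOnly)); // false
console.log(needsReindexing(prev, newQuery));      // true
```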

For large datasets, recomputation can take a fair amount of time. We are working on ways to reduce this turnaround time. Stay tuned. To skip recomputation, run node app.js -s.

Forcing reindexing

Note that Kyrix is (currently) unable to detect changes in the database. So if the data corresponding to one canvas is updated, you need to force the backend to recompute the index. To do so, run node app.js -f.

A List of Commonly Used Commands

| Command | Description |
| --- | --- |
| docker-compose up | Start Kyrix docker containers. Run under the root directory. |
| ./docker-scripts/load-csv.sh CSV_FILE [OPTIONS] | Load a csv file into the Postgres instance. See descriptions above. |
| node app.js | Compile your app and send it to the Kyrix backend if there is no error. The Kyrix backend will compare the current specification with the previous one to decide if re-indexing is needed. |
| node app.js -s | Compile your app and send it to the Kyrix backend if there is no error. The Kyrix backend will skip re-indexing. |
| node app.js -f | Compile your app and send it to the Kyrix backend if there is no error. The Kyrix backend will force re-indexing. |
| docker-compose build | Rebuild docker containers. Run every time there are changes to Kyrix code (e.g. a git pull to fetch the latest code). Run under the root directory. |
| docker exec -it kyrix_kyrix_1 /bin/sh | Get a shell into the Kyrix backend container. |
| docker exec -it kyrix_db_1 /bin/sh | Get a shell into the Postgres DB container. |
| docker exec -it kyrix_db_1 /bin/sh -c "psql postgresql://postgres:kyrixftw@localhost/postgres" | Get a shell into the Postgres instance. |
| docker container stop $(docker container ls -a -q) && docker system prune -a -f --volumes | Clear everything docker-related. |