Redis Data Structure

Tim Schwab edited this page Apr 18, 2019 · 5 revisions

Data

Snippets

The data that we are storing and searching is essentially a list of snippets. A snippet is an object that contains a list of string keywords, a string problem, and a string solution. So, the data could be represented in JSON like this:

[
	{
		"problem": "Have an HTML link open a new tab",
		"solution": "<a href='foo.html' target='_blank'></a>",
		"keywords": ["html", "new", "tab", "link"]
	},
	{
		"problem": "Terminate code in C#",
		"solution": "Environment.Exit([exit code]);",
		"keywords": ["c#", "exit", "terminate", "stop", "halt", "execution"]
	},
	...
]

This is the core, atomic data structure of CheatSheet. Every operation (getting, viewing, searching, editing, deleting, and so on) works on these snippets.
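As a rough illustration of the snippet shape described above, here is a minimal Python sketch. The `Snippet` class and its `to_json`/`from_json` helpers are hypothetical names chosen for this example; they are not taken from the CheatSheet code.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Snippet:
    """Hypothetical model of one snippet: a problem, a solution, and keywords."""
    problem: str
    solution: str
    keywords: List[str]

    def to_json(self) -> str:
        # sort_keys keeps the field order stable between runs
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Snippet":
        data = json.loads(raw)
        return cls(
            problem=data["problem"],
            solution=data["solution"],
            keywords=list(data["keywords"]),
        )


example = Snippet(
    problem="Have an HTML link open a new tab",
    solution="<a href='foo.html' target='_blank'></a>",
    keywords=["html", "new", "tab", "link"],
)
# Serializing and parsing again should give back an equal snippet
round_tripped = Snippet.from_json(example.to_json())
```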

Redis data

All of our data is stored in Redis. CheatSheet uses several groups of Redis data.

Quick summary of all the keys

  • [snippet id (number)]
  • [snippet id]-problem-tokens
  • [snippet id]-solution-tokens
  • [keyword]-keywords
  • [keyword]-solutions
  • [keyword]-problems
  • [keyword]-scores
  • ~~counter
  • ~~results
  • ~~recently-deleted

Snippet

The simplest piece of data is the snippet itself, stored as a Redis string containing stringified JSON. The key is the id of the snippet, which is just a number. So if the data above were added to CheatSheet, you could query Redis like this:

get 1
"{\"keywords\":[\"html\",\"new\",\"tab\",\"link\"],\"problem\":\"Have an HTML link open a new tab\",\"solution\":\"<a href='foo.html' target='_blank'></a>\"}"
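The same idea can be sketched in Python without a running Redis server. In this sketch a plain dict stands in for Redis, and the function names are hypothetical. It also assumes that new ids come from incrementing the ~~counter key, which is a guess about how the counter is used rather than something stated on this page.

```python
import json

# A plain dict stands in for Redis here: key -> string value
store = {}


def next_snippet_id(store):
    # Assumption: ids come from incrementing ~~counter (like Redis INCR)
    store["~~counter"] = str(int(store.get("~~counter", "0")) + 1)
    return store["~~counter"]


def add_snippet(store, snippet):
    # Store the snippet as stringified JSON under its numeric id
    snippet_id = next_snippet_id(store)
    store[snippet_id] = json.dumps(snippet, sort_keys=True)
    return snippet_id


def get_snippet(store, snippet_id):
    # Mirrors "get 1": returns the parsed snippet, or None if the key is missing
    raw = store.get(snippet_id)
    return None if raw is None else json.loads(raw)


sid = add_snippet(store, {
    "problem": "Have an HTML link open a new tab",
    "solution": "<a href='foo.html' target='_blank'></a>",
    "keywords": ["html", "new", "tab", "link"],
})
```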

Indices

When a snippet is added, it gets tokenized and indexed right away. The result of this indexing is three groups of Redis sets. For every keyword in the snippet, the id of the snippet gets added to a Redis set named [keyword]-keywords. If the set doesn't exist yet, it gets created. For every token from the snippet's problem, the id of the snippet gets added to [token]-problems. The tokens from the solution are handled the same way and go into [token]-solutions.
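The indexing step above can be sketched as follows. This is only an illustration: the tokenizer, the `ALLOWED` character set, and the `IGNORED` word list are assumptions made up for the example (the real rules come from the user's settings), and a `defaultdict` of Python sets stands in for the Redis sets.

```python
from collections import defaultdict

# Hypothetical tokenizer settings, not the project's real defaults
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789#")
IGNORED = {"a", "an", "the", "in"}


def tokenize(text):
    # Lowercase, turn every disallowed character into a space, drop ignored words
    cleaned = "".join(ch if ch in ALLOWED else " " for ch in text.lower())
    return {tok for tok in cleaned.split() if tok not in IGNORED}


def index_snippet(sets, snippet_id, snippet):
    # Each set is created on first use, like a Redis set on its first SADD
    for keyword in snippet["keywords"]:
        sets[f"{keyword.lower()}-keywords"].add(snippet_id)
    for token in tokenize(snippet["problem"]):
        sets[f"{token}-problems"].add(snippet_id)
    for token in tokenize(snippet["solution"]):
        sets[f"{token}-solutions"].add(snippet_id)


sets = defaultdict(set)
index_snippet(sets, "1", {
    "problem": "Have an HTML link open a new tab",
    "solution": "<a href='foo.html' target='_blank'></a>",
    "keywords": ["html", "new", "tab", "link"],
})
```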

Searchable scores

After the snippet's tokens are added, CheatSheet recalculates the scores for every search term it touched. The scores are stored in sorted sets named [token]-scores, and these sorted sets are what the search algorithm actually reads.
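This page does not spell out how a score is computed, so the following is only one plausible sketch. It assumes the score of a snippet for a token is the sum of the keyword, problem, and solution weights for each index set that contains the snippet. The weight values and the `recalculate_scores` name are hypothetical.

```python
# Hypothetical weights; the real values live in the settings
WEIGHTS = {"keywords": 3.0, "problems": 2.0, "solutions": 1.0}


def recalculate_scores(sets, token):
    # Build the contents of "[token]-scores" as a dict of snippet id -> score
    scores = {}
    for group, weight in WEIGHTS.items():
        for snippet_id in sets.get(f"{token}-{group}", set()):
            scores[snippet_id] = scores.get(snippet_id, 0.0) + weight
    return scores


sets = {
    "html-keywords": {"1"},
    "html-problems": {"1", "2"},
    "html-solutions": {"1"},
}
html_scores = recalculate_scores(sets, "html")
```

Here snippet 1 matches "html" in all three groups and snippet 2 only in its problem text, so snippet 1 ends up with the higher score.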

Tokenizing history

The tokenizing process changes depending on the user's settings. So, a snippet might get indexed, then the settings might change, and later the user might want to delete the snippet. This poses a problem: we can't be sure which tokens were created when the snippet was added, so we can't be sure of finding every set that holds the snippet's id.

To address this, there are two other Redis sets for every snippet. They store the tokens that were produced for that snippet's problem and solution. Their keys are [snippet id]-problem-tokens and [snippet id]-solution-tokens.
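A deletion that relies on this token history could look roughly like the sketch below. A dict of Python sets stands in for Redis, and `delete_snippet` is a hypothetical name. Keywords are left out because they can be read straight from the stored snippet.

```python
def delete_snippet(sets, snippet_id):
    # Use the recorded tokens, not a fresh tokenization, to find every index set
    for group in ("problem", "solution"):
        history_key = f"{snippet_id}-{group}-tokens"
        for token in sets.pop(history_key, set()):
            index_key = f"{token}-{group}s"
            members = sets.get(index_key)
            if members is None:
                continue
            members.discard(snippet_id)
            if not members:
                # Redis removes a set key once its last member is gone
                del sets[index_key]


sets = {
    "1-problem-tokens": {"html", "link"},
    "1-solution-tokens": {"href"},
    "html-problems": {"1", "2"},
    "link-problems": {"1"},
    "href-solutions": {"1"},
}
delete_snippet(sets, "1")
```

After the call, only the entry for snippet 2 under html-problems remains, regardless of what the tokenizer settings are at deletion time.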

Special keys

There are several special system keys that are kept in Redis. They all begin with ~~. The first is ~~counter. This is a simple counter, stored as a string, that is used to create snippet IDs.

Next, ~~results. This is created by the search algorithm and is a sorted set of the snippet ids that get returned as the result of a search. The score of a member reflects how likely it is to be what the user searched for.
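How the search algorithm fills ~~results is not described here. One simple possibility, shown purely as an assumption, is to sum each snippet's score across the [token]-scores sets of every search term, which is what Redis's ZUNIONSTORE does with its default SUM aggregation. The `search` function below is a hypothetical stand-in that works on plain dicts.

```python
def search(score_sets, query_tokens):
    # Sum each snippet's score over all query tokens (assumed aggregation)
    results = {}
    for token in query_tokens:
        for snippet_id, score in score_sets.get(f"{token}-scores", {}).items():
            results[snippet_id] = results.get(snippet_id, 0.0) + score
    # Highest score first; ties broken by snippet id for a stable order
    return sorted(results.items(), key=lambda item: (-item[1], item[0]))


score_sets = {
    "html-scores": {"1": 6.0, "2": 2.0},
    "tab-scores": {"1": 5.0},
}
ranked = search(score_sets, ["html", "tab", "missing"])
```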

Finally, ~~recently-deleted is a sorted set that stores recently dropped snippets. The score of each member is the time it was added to the set.
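Because the score is a timestamp, expired entries can be found with a simple range check. The sketch below uses a dict of member -> timestamp in place of the sorted set. It assumes the members are snippet ids and that the retention period is given in seconds; both are assumptions, and the function names are hypothetical.

```python
def mark_deleted(recently_deleted, snippet_id, now):
    # Like adding a member with the deletion time as its score
    recently_deleted[snippet_id] = now


def purge_expired(recently_deleted, now, keep_seconds):
    # Remove every member whose timestamp is older than the retention window
    cutoff = now - keep_seconds
    expired = sorted(sid for sid, ts in recently_deleted.items() if ts < cutoff)
    for sid in expired:
        del recently_deleted[sid]
    return expired


recently_deleted = {}
mark_deleted(recently_deleted, "1", now=1000.0)
mark_deleted(recently_deleted, "2", now=2000.0)
purged = purge_expired(recently_deleted, now=2500.0, keep_seconds=1000.0)
```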

Settings

The settings are stored in settings.json. Here are all the settings so far:

  • list of ignored words in tokens
  • list of allowed characters in tokens
  • keyword/problem/solution weights
  • amount of time to keep a dropped snippet
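The actual layout of settings.json is not shown on this page. As a hedged illustration only, the four settings above could be modelled like this in Python. Every field name, default value, and the sample JSON are invented for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Settings:
    """Hypothetical shape for settings.json; all field names are assumptions."""
    ignored_words: List[str] = field(default_factory=list)
    allowed_characters: str = "abcdefghijklmnopqrstuvwxyz0123456789"
    weights: Dict[str, float] = field(
        default_factory=lambda: {"keywords": 1.0, "problems": 1.0, "solutions": 1.0}
    )
    keep_deleted_seconds: int = 0


def parse_settings(raw):
    # Missing fields fall back to the defaults declared on the dataclass
    return Settings(**json.loads(raw))


sample = '{"ignored_words": ["a", "the"], "keep_deleted_seconds": 86400}'
settings = parse_settings(sample)
```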