fuzzy: Fuzzy string matching and phonetics in SQLite

The sqlean-fuzzy extension provides fuzzy-matching helpers:

  • Measure distance between two strings.
  • Compute the phonetic code of a string.
  • Transliterate a string.

If you want a ready-to-use mechanism to search a large vocabulary for close matches, see the spellfix extension instead.

String distances • Phonetic codes • Transliteration • Acknowledgements • Installation and usage

String distances

These functions measure the distance between two strings.

Only ASCII strings are supported. Use the fuzzy_translit function to convert the input string from UTF-8 to plain ASCII.

damlev • editdist • hamming • jarowin • leven • osadist

fuzzy_damlev

fuzzy_damlev(x, y)

Calculates the Damerau-Levenshtein distance.

select fuzzy_damlev('awesome', 'aewsme');
-- 2

fuzzy_editdist

fuzzy_editdist(x, y)

Calculates the spellcheck edit distance.

select fuzzy_editdist('awesome', 'aewsme');
-- 215

fuzzy_hamming

fuzzy_hamming(x, y)

Calculates the Hamming distance.

select fuzzy_hamming('awesome', 'aewsome');
-- 2
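
For intuition, the Hamming distance counts the positions at which two equal-length strings differ. The following is a minimal Python sketch of that definition, not the extension's own code; the name `hamming` and the choice to raise an error on unequal lengths are assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption for this sketch: unequal lengths are treated as an error.
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("awesome", "aewsome"))  # 2
```

The two mismatches come from the swapped `w` and `e` in the second and third positions.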

fuzzy_jarowin

fuzzy_jarowin(x, y)

Calculates the Jaro-Winkler distance. The result is a similarity score: higher values mean closer strings.

select fuzzy_jarowin('awesome', 'aewsme');
-- 0.907142857142857
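
The value above can be reproduced by hand with the textbook Jaro-Winkler formula. The sketch below is a plain Python rendering of that formula, assuming the usual prefix scale of 0.1 and a prefix cap of 4 characters; it is not taken from the extension, and the name `jaro_winkler` is illustrative.

```python
def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Textbook Jaro-Winkler similarity (higher means more similar)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are within this many positions of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    # Winkler boost: reward a shared prefix of up to 4 characters.
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1 - jaro)


print(round(jaro_winkler("awesome", "aewsme"), 6))  # 0.907143
```

For this pair there are 6 matching characters, 1 transposition, and a shared prefix of 1, which gives a Jaro score of about 0.8968 before the prefix boost.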

fuzzy_leven

fuzzy_leven(x, y)

Calculates the Levenshtein distance.

select fuzzy_leven('awesome', 'aewsme');
-- 3
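
As a reference point, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Below is a minimal Python sketch of the standard dynamic-programming approach; it illustrates the definition only and is not the extension's implementation (the name `levenshtein` is an assumption).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


print(levenshtein("awesome", "aewsme"))  # 3
```

Plain Levenshtein has no transposition operation, so the swapped `we` costs two edits here, plus one for the dropped `o`.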

fuzzy_osadist

fuzzy_osadist(x, y)

Calculates the Optimal String Alignment distance.

select fuzzy_osadist('awesome', 'aewsme');
-- 3

Phonetic codes

These functions compute phonetic string codes.

Only ASCII strings are supported. Use the fuzzy_translit function to convert the input string from UTF-8 to plain ASCII.

caver • phonetic • soundex • rsoundex

fuzzy_caver

fuzzy_caver(x)

Calculates the Caverphone code.

select fuzzy_caver('awesome');
-- AWSM111111

fuzzy_phonetic

fuzzy_phonetic(x)

Calculates the spellcheck phonetic code.

select fuzzy_phonetic('awesome');
-- ABACAMA

fuzzy_soundex

fuzzy_soundex(x)

Calculates the Soundex code.

select fuzzy_soundex('awesome');
-- A250
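
To show where a code like A250 comes from, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent repeats, and pad to four characters. It is an illustration under those assumptions, not the extension's code, and edge cases (empty input, non-letters) may be handled differently there.

```python
# Standard Soundex digit groups; vowels, H, W and Y have no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Classic four-character American Soundex code."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""  # assumption for this sketch: no letters, no code
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate two consonants with the same digit.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]


print(soundex("awesome"))  # A250
```

In 'awesome', only `s` (2) and `m` (5) contribute digits, and the code is padded with a trailing zero.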

fuzzy_rsoundex

fuzzy_rsoundex(x)

Calculates the Refined Soundex code.

select fuzzy_rsoundex('awesome');
-- A03080

Transliteration

fuzzy_translit(str)

Transliterates the input string from UTF-8 into plain ASCII by replacing each non-ASCII character with a combination of ASCII characters.

The distance and phonetic functions are ASCII only, so to work with a Unicode string, you should first transliterate it:

select fuzzy_translit('sí señor');
-- si senor

select fuzzy_translit('привет');
-- privet

Characters with no ASCII equivalent are replaced with a question mark:

select fuzzy_translit('oh my 😅');
-- oh my ?
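
For a rough idea of how accented Latin letters can be reduced to ASCII, the Python sketch below decomposes each character and drops the combining marks. This is a deliberately simplified stand-in and an assumption for illustration: unlike fuzzy_translit, it has no mapping tables, so scripts such as Cyrillic come out as question marks instead of being transliterated. The name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Reduce accented Latin letters to ASCII; anything else becomes '?'."""
    out = []
    # NFKD splits 'í' into 'i' plus a combining acute accent, and so on.
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop the accent mark itself
        out.append(ch if ord(ch) < 128 else "?")
    return "".join(out)


print(strip_accents("sí señor"))  # si senor
print(strip_accents("oh my 😅"))  # oh my ?
```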

Acknowledgements

Adapted from libstrcmp by Ross Bayer and spellfix.c by D. Richard Hipp.

Installation and usage

SQLite command-line interface:

sqlite> .load ./fuzzy
sqlite> select fuzzy_soundex('hello');

See How to install an extension for usage with IDE, Python, etc.
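
From Python, the extension can likely be loaded through the standard `sqlite3` module, as sketched below. This assumes the compiled extension file sits at `./fuzzy` (as in the command-line example above) and that your Python build allows loadable extensions; the helper name `connect_with_fuzzy` is hypothetical.

```python
import sqlite3


def connect_with_fuzzy(db_path: str = ":memory:",
                       extension_path: str = "./fuzzy") -> sqlite3.Connection:
    """Open a connection and load the fuzzy extension into it."""
    conn = sqlite3.connect(db_path)
    # Extension loading is off by default; enable it only for the load call.
    conn.enable_load_extension(True)
    try:
        conn.load_extension(extension_path)
    finally:
        conn.enable_load_extension(False)
    return conn


# Usage, once the extension file is in place:
#   conn = connect_with_fuzzy()
#   print(conn.execute("select fuzzy_soundex('hello')").fetchone()[0])
```

Turning extension loading back off after the call keeps later SQL on the same connection from loading arbitrary libraries.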

Download the extension.

Explore other extensions.

Subscribe to stay on top of new features.