Edit distance functions
Edit distance functions measure how similar two strings are by counting the minimum number of operations needed to transform one string into the other.

| Function | Arguments | Returns | Description |
|---|---|---|---|
| fuzzy_leven(A, B) | A, B TEXT | INTEGER | Levenshtein distance between A and B (insertions, deletions, substitutions) |
| fuzzy_damlev(A, B) | A, B TEXT | INTEGER | Damerau-Levenshtein distance (adds transpositions to Levenshtein) |
| fuzzy_editdist(A, B) | A, B ASCII TEXT | INTEGER | Weighted edit distance; if A ends with `*`, treats it as a prefix of B |
| fuzzy_hamming(A, B) | A, B TEXT (equal length) | INTEGER | Hamming distance; returns -1 if strings differ in length |
| fuzzy_osadist(A, B) | A, B TEXT | INTEGER | Optimal String Alignment distance |
| fuzzy_jarowin(A, B) | A, B TEXT | REAL | Jaro-Winkler similarity score between 0.0 (no match) and 1.0 (identical) |
Examples
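To make the counting concrete, here is a minimal Python sketch of three of the measures in the table. It is an illustration of the textbook algorithms, not the extension's implementation; the names `leven`, `osadist`, and `hamming` are hypothetical stand-ins, and the real functions may differ in edge cases such as non-ASCII input.

```python
def leven(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def osadist(a: str, b: str) -> int:
    """Optimal String Alignment: Levenshtein plus adjacent transpositions,
    where no substring is edited more than once."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Swap of two adjacent characters counts as one operation.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def hamming(a: str, b: str) -> int:
    """Hamming distance; -1 when lengths differ, mirroring the table above."""
    if len(a) != len(b):
        return -1
    return sum(x != y for x, y in zip(a, b))


print(leven("kitten", "sitting"))     # 3
print(leven("ab", "ba"))              # 2 (two substitutions)
print(osadist("ab", "ba"))            # 1 (one transposition)
print(hamming("karolin", "kathrin"))  # 3
print(hamming("abc", "ab"))           # -1
```

The `ab`/`ba` pair shows why the transposition-aware variants exist: a single swapped pair costs 2 under plain Levenshtein but only 1 once transpositions are allowed.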
Phonetic encoding functions
Phonetic encoding functions convert words to codes that represent how they sound, so that words with similar pronunciations match even when spelled differently.

| Function | Arguments | Returns | Description |
|---|---|---|---|
| fuzzy_soundex(X) | X TEXT | TEXT or NULL | Standard Soundex code (e.g. `P532` for `phonetics`) |
| fuzzy_rsoundex(X) | X TEXT | TEXT or NULL | Refined Soundex, a more granular variant of standard Soundex |
| fuzzy_phonetic(X) | X TEXT | TEXT or NULL | Phonetic hash using an alternative phonetic algorithm |
| fuzzy_caver(X) | X TEXT | TEXT or NULL | Caverphone encoding (10-character fixed-length code) |
Examples
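As an illustration of how a phonetic code is derived, the following Python sketch implements the classic American Soundex rules and reproduces the `P532` code for `phonetics` mentioned in the table. The `soundex` function here is a hypothetical reference version written for this example; the extension's `fuzzy_soundex` may treat edge cases (such as `H`/`W` between consonants, or empty input) differently.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str):
    """Classic American Soundex: first letter plus three digits."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return None  # nothing to encode
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Keep a digit only if it differs from the previous coded letter.
        if code and code != prev:
            out.append(code)
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    return "".join(out)[:4].ljust(4, "0")


print(soundex("phonetics"))  # P532
print(soundex("Robert"))     # R163
print(soundex("Rupert"))     # R163 (same code, similar sound)
```

Because `Robert` and `Rupert` share a code, an equality comparison on the encoded values matches them even though their spellings differ.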
Transliteration and script functions
| Function | Arguments | Returns | Description |
|---|---|---|---|
| fuzzy_translit(X) | X TEXT | TEXT | Converts non-ASCII Roman characters in X to their closest ASCII equivalents |
| fuzzy_script(X) | X TEXT | INTEGER | Returns the ISO 15924 numeric code of the dominant script in X |
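The idea behind transliteration can be sketched in Python with Unicode decomposition: split each accented letter into a base letter plus combining marks, then drop the marks. This is only an approximation of the behaviour described for `fuzzy_translit`; the `translit` helper below is hypothetical, and it does not handle letters without a decomposition (for example `ø` or `ß`), which the extension may map through its own table.

```python
import unicodedata


def translit(text: str) -> str:
    """Strip diacritics by decomposing and discarding combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(translit("sí señor"))      # si senor
print(translit("Crème brûlée"))  # Creme brulee
```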
Script codes returned by fuzzy_script
| Code | Script |
|---|---|
| 125 | Hebrew |
| 160 | Arabic |
| 200 | Greek |
| 215 | Latin |
| 220 | Cyrillic |
| 998 | Mixed (two or more scripts detected) |
| 999 | Unknown (no recognized script) |
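The table can be read as a simple classification rule. The Python sketch below is one possible interpretation, assuming that any input containing letters from two or more of the listed scripts is reported as mixed; the `script_code` function is hypothetical, relies on Unicode character names rather than the extension's own logic, and may disagree with `fuzzy_script` on borderline input.

```python
import unicodedata

# ISO 15924 numeric codes from the table above, keyed by the first word
# of the Unicode character name.
SCRIPT_CODES = {"HEBREW": 125, "ARABIC": 160, "GREEK": 200,
                "LATIN": 215, "CYRILLIC": 220}


def script_code(text: str) -> int:
    """Return a script code, 998 for mixed input, 999 if none is recognized."""
    seen = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, spaces, and punctuation carry no script
        words = unicodedata.name(ch, "").split(" ")
        if words[0] in SCRIPT_CODES:
            seen.add(SCRIPT_CODES[words[0]])
    if not seen:
        return 999
    if len(seen) > 1:
        return 998
    return seen.pop()


print(script_code("hello"))         # 215
print(script_code("привет"))        # 220
print(script_code("hello привет"))  # 998
print(script_code("123"))           # 999
```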