summaryrefslogtreecommitdiff
path: root/contrib/pg_trgm/README.pg_trgm
diff options
context:
space:
mode:
Diffstat (limited to 'contrib/pg_trgm/README.pg_trgm')
-rw-r--r--contrib/pg_trgm/README.pg_trgm144
1 files changed, 0 insertions, 144 deletions
diff --git a/contrib/pg_trgm/README.pg_trgm b/contrib/pg_trgm/README.pg_trgm
deleted file mode 100644
index e7ff73e4f1..0000000000
--- a/contrib/pg_trgm/README.pg_trgm
+++ /dev/null
@@ -1,144 +0,0 @@
-trgm - Trigram matching for PostgreSQL
---------------------------------------
-
-Introduction
-
- This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
-
- The pg_trgm contrib module provides functions and index classes
- for determining the similarity of text based on trigram
- matching.
-
-Definitions
-
- Trigram (or Trigraph)
-
- A trigram is a set of three consecutive characters taken
- from a string. A string is considered to have two spaces
- prefixed and one space suffixed when determining the set
- of trigrams that comprise the string.
-
- eg. The set of trigrams in the word "cat" is " c", " ca",
- "at " and "cat".
-
-Public Functions
-
- real similarity(text, text)
-
- Returns a number that indicates how closely matches the two
- arguments are. A zero result indicates that the two words
- are completely dissimilar, and a result of one indicates that
- the two words are identical.
-
- real show_limit()
-
- Returns the current similarity threshold used by the '%'
- operator. This in effect sets the minimum similarity between
- two words in order that they be considered similar enough to
- be misspellings of each other, for example.
-
- real set_limit(real)
-
- Sets the current similarity threshold that is used by the '%'
- operator, and is returned by the show_limit() function.
-
- text[] show_trgm(text)
-
- Returns an array of all the trigrams of the supplied text
- parameter.
-
-Public Operators
-
- text % text (returns boolean)
-
- The '%' operator returns TRUE if its two arguments have a similarity
- that is greater than the similarity threshold set by set_limit(). It
- will return FALSE if the similarity is less than the current
- threshold.
-
-Public Index Operator Classes
-
- gist_trgm_ops
-
- The pg_trgm module comes with an index operator class that allows a
- developer to create an index over a text column for the purpose
- of very fast similarity searches.
-
- To use this index, the '%' operator must be used and an appropriate
- similarity threshold for the application must be set.
-
- eg.
-
- CREATE TABLE test_trgm (t text);
- CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
-
- At this point, you will have an index on the t text column that you
- can use for similarity searching.
-
- eg.
-
- SELECT
- t,
- similarity(t, 'word') AS sml
- FROM
- test_trgm
- WHERE
- t % 'word'
- ORDER BY
- sml DESC, t;
-
- This will return all values in the text column that are sufficiently
- similar to 'word', sorted from best match to worst. The index will
- be used to make this a fast operation over very large data sets.
-
-Tsearch2 Integration
-
- Trigram matching is a very useful tool when used in conjunction
- with a text index created by the Tsearch2 contrib module. (See
- contrib/tsearch2)
-
- The first step is to generate an auxiliary table containing all
- the unique words in the Tsearch2 index:
-
- CREATE TABLE words AS SELECT word FROM
- stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
-
- Where 'documents' is a table that has a text field 'bodytext'
- that TSearch2 is used to search. The use of the 'simple' dictionary
- with the to_tsvector function, instead of just using the already
- existing vector is to avoid creating a list of already stemmed
- words. This way, only the original, unstemmed words are added
- to the word list.
-
- Next, create a trigram index on the word column:
-
- CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
- or
- CREATE INDEX words_idx ON words USING gin(word gist_trgm_ops);
-
- Now, a SELECT query similar to the example above can be used to
- suggest spellings for misspelled words in user search terms. A
- useful extra clause is to ensure that the similar words are also
- of similar length to the misspelled word.
-
- Note: Since the 'words' table has been generated as a separate,
- static table, it will need to be periodically regenerated so that
- it remains up to date with the word list in the Tsearch2 index.
-
-Authors
-
- Oleg Bartunov <oleg@sai.msu.su>, Moscow, Moscow University, Russia
- Teodor Sigaev <teodor@sigaev.ru>, Moscow, Delta-Soft Ltd.,Russia
-
-Contributors
-
- Christopher Kings-Lynne wrote this README file
-
-References
-
- Tsearch2 Development Site
- http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
-
- GiST Development Site
- http://www.sai.msu.su/~megera/postgres/gist/
-