1 files changed, 214 insertions, 0 deletions
diff --git a/doc/src/sgml/trgm.sgml b/doc/src/sgml/trgm.sgml
new file mode 100644
index 0000000000..62a5e30382
--- /dev/null
+++ b/doc/src/sgml/trgm.sgml
@@ -0,0 +1,214 @@
+<sect1 id="pgtrgm">
+ <title>pg_trgm</title>
+ 
+ <indexterm zone="pgtrgm">
+  <primary>pgtrgm</primary>
+ </indexterm>
+
+ <para>
+  The <literal>pg_trgm</literal> module provides functions and index classes
+  for determining the similarity of text based on trigram matching.
+ </para>
+
+ <sect2>
+  <title>Trigram (or Trigraph)</title>
+  <para>
+   A trigram is a set of three consecutive characters taken
+   from a string.  A string is considered to have two spaces
+   prefixed and one space suffixed when determining the set
+   of trigrams that comprise the string.
+  </para>
+  <para>
+   eg. The set of trigrams in the word "cat" is "  c", " ca", 
+   "at " and "cat".
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Public Functions</title>
+  <table>
+   <title><literal>pg_trgm</literal> functions</title>
+   <tgroup cols="2">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><literal>real similarity(text, text)</literal></entry>
+      <entry>
+       <para>
+        Returns a number that indicates how closely matches the two
+        arguments are.  A zero result indicates that the two words
+        are completely dissimilar, and a result of one indicates that
+        the two words are identical.
+       </para>
+      </entry>
+     </row>
+     <row>
+      <entry><literal>real show_limit()</literal></entry>
+      <entry>
+       <para>
+        Returns the current similarity threshold used by the '%'
+        operator.  This in effect sets the minimum similarity between
+        two words in order that they be considered similar enough to
+        be misspellings of each other, for example.
+       </para>
+      </entry>
+     </row>
+     <row>
+      <entry><literal>real set_limit(real)</literal></entry>
+      <entry>
+       <para>
+        Sets the current similarity threshold that is used by the '%'
+        operator, and is returned by the show_limit() function.
+       </para>
+      </entry>
+     </row>
+     <row>
+      <entry><literal>text[] show_trgm(text)</literal></entry>
+      <entry>
+       <para>
+        Returns an array of all the trigrams of the supplied text
+        parameter.
+       </para>
+      </entry>
+     </row>
+     <row>
+      <entry>Operator: <literal>text % text (returns boolean)</literal></entry> 
+      <entry>
+       <para>
+        The '%' operator returns TRUE if its two arguments have a similarity
+        that is greater than the similarity threshold set by set_limit(). It
+        will return FALSE if the similarity is less than the current
+        threshold.
+       </para>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
+ <sect2>
+  <title>Public Index Operator Class</title>
+  <para>
+   The <literal>pg_trgm</literal> module comes with the 
+   <literal>gist_trgm_ops</literal> index operator class that allows a
+   developer to create an index over a text column for the purpose
+   of very fast similarity searches.
+  </para>
+  <para>
+   To use this index, the '%' operator must be used and an appropriate
+   similarity threshold for the application must be set. Example:
+  </para>
+  <programlisting>
+CREATE TABLE test_trgm (t text);
+CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
+  </programlisting>
+  <para>	
+   At this point, you will have an index on the t text column that you
+   can use for similarity searching. Example:
+  </para>
+  <programlisting>
+SELECT
+	t,
+	similarity(t, 'word') AS sml
+FROM
+	test_trgm
+WHERE
+	t % 'word'
+ORDER BY
+	sml DESC, t;
+  </programlisting>
+  <para>
+   This will return all values in the text column that are sufficiently
+   similar to 'word', sorted from best match to worst.  The index will
+   be used to make this a fast operation over very large data sets.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Tsearch2 Integration</title>
+  <para>
+   Trigram matching is a very useful tool when used in conjunction
+   with a text index created by the Tsearch2 contrib module. (See
+   contrib/tsearch2)
+  </para>
+  <para>
+   The first step is to generate an auxiliary table containing all
+   the unique words in the Tsearch2 index:
+  </para>
+  <programlisting>
+CREATE TABLE words AS SELECT word FROM 
+	stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
+  </programlisting>
+  <para>
+   Where 'documents' is a table that has a text field 'bodytext'
+   that TSearch2 is used to search.  The use of the 'simple' dictionary
+   with the to_tsvector function, instead of just using the already
+   existing vector is to avoid creating a list of already stemmed
+   words.  This way, only the original, unstemmed words are added
+   to the word list.
+  </para>
+  <para>
+   Next, create a trigram index on the word column:
+  </para>
+  <programlisting>
+CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
+  </programlisting>
+  <para>
+   or
+  </para>
+  <programlisting>
+CREATE INDEX words_idx ON words USING gin(word gist_trgm_ops);
+  </programlisting>
+  <para>
+   Now, a <literal>SELECT</literal> query similar to the example above can be 
+   used to suggest spellings for misspelled words in user search terms. A
+   useful extra clause is to ensure that the similar words are also
+   of similar length to the misspelled word.
+  </para>
+  <para>
+   <note>
+    <para>
+     Since the 'words' table has been generated as a separate,
+     static table, it will need to be periodically regenerated so that
+     it remains up to date with the word list in the Tsearch2 index.
+    </para>
+   </note>
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>References</title>
+  <para>
+   Tsearch2 Development Site
+   <ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink>
+  </para>
+  <para>
+   GiST Development Site
+   <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Authors</title>
+  <para>
+   Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
+  </para>
+  <para>
+   Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
+  </para>
+  <para>
+   Documentation: Christopher Kings-Lynne 
+  </para>
+  <para>
+   This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
+  </para>
+ </sect2>
+
+</sect1>
+