From 5edb24a8983e4a103e26153853d91141f818227c Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 8 Sep 2011 17:51:23 +0300 Subject: Buffering GiST index build algorithm. When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to the find right leaf page for tuple. Alexander Korotkov --- doc/src/sgml/gist.sgml | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'doc/src/sgml/gist.sgml') diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml index 78171cfa3f..1b6fa1a881 100644 --- a/doc/src/sgml/gist.sgml +++ b/doc/src/sgml/gist.sgml @@ -642,6 +642,40 @@ my_distance(PG_FUNCTION_ARGS) + + GiST buffering build + + Building large GiST indexes by simply inserting all the tuples tends to be + slow, because if the index tuples are scattered across the index and the + index is large enough to not fit in cache, the insertions need to perform + a lot of random I/O. PostgreSQL from version 9.2 supports a more efficient + method to build GiST indexes based on buffering, which can dramatically + reduce number of random I/O needed for non-ordered data sets. For + well-ordered datasets the benefit is smaller or non-existent, because + only a small number of pages receive new tuples at a time, and those pages + fit in cache even if the index as whole does not. + + + + However, buffering index build needs to call the penalty + function more often, which consumes some extra CPU resources. Also, the + buffers used in the buffering build need temporary disk space, up to + the size of the resulting index. Buffering can also infuence the quality + of the produced index, in both positive and negative directions. That + influence depends on various factors, like the distribution of the input + data and operator class implementation. + + + + By default, the index build switches to the buffering method when the + index size reaches . It can + be manually turned on or off by the BUFFERING parameter + to the CREATE INDEX clause. The default behavior is good for most cases, + but turning buffering off might speed up the build somewhat if the input + data is ordered. + + + -- cgit v1.2.1