author     Bruce Momjian <bruce@momjian.us>    2006-11-17 16:38:44 +0000
committer  Bruce Momjian <bruce@momjian.us>    2006-11-17 16:38:44 +0000
commit     e1693e514c197b6ddd27b59678f78b9385ac0539 (patch)
tree       556b481a9bd656ed1e9d066b7f50f5a31dc1bfb0 /doc/src/sgml/high-availability.sgml
parent     cc9698254c9cff7c36ed1e7ea78dad51d87616b1 (diff)
download   postgresql-e1693e514c197b6ddd27b59678f78b9385ac0539.tar.gz
Rename replication section "High Availability and Load Balancing".
Diffstat (limited to 'doc/src/sgml/high-availability.sgml')
-rw-r--r--  doc/src/sgml/high-availability.sgml  249
1 files changed, 249 insertions, 0 deletions
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
new file mode 100644
index 0000000000..56733e38e8
--- /dev/null
+++ b/doc/src/sgml/high-availability.sgml
@@ -0,0 +1,249 @@
+<!-- $PostgreSQL: pgsql/doc/src/sgml/high-availability.sgml,v 1.1 2006/11/17 16:38:44 momjian Exp $ -->
+
+<chapter id="high-availability">
+ <title>High Availability and Load Balancing</title>
+
+ <indexterm><primary>high availability</></>
+ <indexterm><primary>failover</></>
+ <indexterm><primary>replication</></>
+ <indexterm><primary>load balancing</></>
+ <indexterm><primary>clustering</></>
+ <indexterm><primary>data partitioning</></>
+
+ <para>
+ Database servers can work together to allow a second server to
+ take over quickly if the primary server fails (high
+ availability), or to allow several computers to serve the same
+ data (load balancing). Ideally, database servers could work
+ together seamlessly. Web servers serving static web pages can
+ be combined quite easily by merely load-balancing web requests
+ to multiple machines. In fact, read-only database servers can
+ be combined relatively easily too. Unfortunately, most database
+ servers have a read/write mix of requests, and read/write servers
+ are much harder to combine. This is because, although read-only
+ data needs to be placed on each server only once, a write to any
+ server has to be propagated to all servers so that future read
+ requests to those servers return consistent results.
+ </para>
+
+ <para>
+ This synchronization problem is the fundamental difficulty for
+ servers working together. Because there is no single solution
+ that eliminates the impact of the sync problem for all use cases,
+ there are multiple solutions. Each solution addresses this
+ problem in a different way, and minimizes its impact for a specific
+ workload.
+ </para>
+
+ <para>
+ Some solutions deal with synchronization by allowing only one
+ server to modify the data. Servers that can modify data are
+ called read/write or "master" servers. Servers that can reply
+ to read-only queries are called "slave" servers. Servers that
+ cannot be accessed until they are changed to master servers are
+ called "standby" servers.
+ </para>
+
+ <para>
+ Some failover and load balancing solutions are synchronous, meaning that
+ a data-modifying transaction is not considered committed until all
+ servers have committed the transaction. This guarantees that a failover
+ will not lose any data and that all load-balanced servers will return
+ consistent results with no propagation delay. Asynchronous updating has
+ a small delay between the time of commit and its propagation to the
+ other servers, opening the possibility that some transactions might be
+ lost in the switch to a backup server, and that load balanced servers
+ might return slightly stale results. Asynchronous communication is used
+ when synchronous would be too slow.
+ </para>
+
+ <para>
+ Solutions can also be categorized by their granularity. Some solutions
+ can deal only with an entire database server, while others allow control
+ at the per-table or per-database level.
+ </para>
+
+ <para>
+ Performance must be considered in any failover or load balancing
+ choice. There is usually a tradeoff between functionality and
+ performance. For example, a full synchronous solution over a slow
+ network might cut performance by more than half, while an asynchronous
+ one might have a minimal performance impact.
+ </para>
+
+ <para>
+ The remainder of this section outlines various failover, replication,
+ and load balancing solutions.
+ </para>
+
+ <variablelist>
+
+ <varlistentry>
+ <term>Shared Disk Failover</term>
+ <listitem>
+
+ <para>
+ Shared disk failover avoids synchronization overhead by having only one
+ copy of the database. It uses a single disk array that is shared by
+ multiple servers. If the main database server fails, the standby server
+ is able to mount and start the database as though it were recovering from
+ a database crash. This allows rapid failover with no data loss.
+ </para>
+
+ <para>
+ Shared hardware functionality is common in network storage
+ devices. Using a network file system is also possible, though
+ care must be taken that the file system has full POSIX behavior.
+ One significant limitation of this method is that if the shared
+ disk array fails or becomes corrupt, the primary and standby
+ servers are both nonfunctional. Another issue is that the
+ standby server should never access the shared storage while
+ the primary server is running.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Warm Standby Using Point-In-Time Recovery</term>
+ <listitem>
+
+ <para>
+ A warm standby server (see <xref linkend="warm-standby">) can
+ be kept current by reading a stream of write-ahead log (WAL)
+ records. If the main server fails, the warm standby contains
+ almost all of the data of the main server, and can be quickly
+ made the new master database server. This is asynchronous and
+ can only be done for the entire database server.
+ </para>
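+
+ <para>
+ As an illustrative sketch only, WAL files might be shipped to the
+ standby with <varname>archive_command</> on the primary and replayed
+ with <varname>restore_command</> on the standby; the paths below are
+ placeholders, and a production warm standby needs a
+ <varname>restore_command</> that waits for the next WAL file to
+ arrive rather than failing immediately:
+ </para>
+
+ <programlisting>
+ # postgresql.conf on the primary: copy each completed WAL segment
+ # to storage the standby can read (the path is a placeholder)
+ archive_command = 'cp %p /mnt/server/wal_archive/%f'
+
+ # recovery.conf on the standby: fetch and replay archived segments;
+ # a real setup replaces plain cp with a script that waits for the
+ # next segment to appear
+ restore_command = 'cp /mnt/server/wal_archive/%f %p'
+ </programlisting>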
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Master/Slave Replication</term>
+ <listitem>
+
+ <para>
+ A master/slave replication setup sends all data modification
+ queries to the master server. The master server asynchronously
+ sends data changes to the slave server. The slave can answer
+ read-only queries while the master server is running. The
+ slave server is ideal for data warehouse queries.
+ </para>
+
+ <para>
+ Slony-I is an example of this type of replication, with per-table
+ granularity, and support for multiple slaves. Because it
+ updates the slave server asynchronously (in batches), data
+ loss is possible during failover.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Query Broadcasting</term>
+ <listitem>
+
+ <para>
+ In query broadcasting, a program intercepts every SQL query
+ and sends it to all servers. Each server operates independently.
+ Read-only queries can be sent to a single server because there
+ is no need for all servers to process them.
+ </para>
+
+ <para>
+ One limitation of this solution is that functions like
+ <function>random()</>, <function>CURRENT_TIMESTAMP</>, and
+ sequences can have different values on different servers. This
+ is because each server operates independently, and because SQL
+ queries are broadcast (and not actual modified rows). If this
+ is unacceptable, applications must query such values from a
+ single server and then use those values in write queries.
+ Also, care must be taken that all transactions either commit
+ or abort on all servers, perhaps using two-phase commit (<xref
+ linkend="sql-prepare-transaction"
+ endterm="sql-prepare-transaction-title"> and <xref
+ linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">.
+ Pgpool is an example of this type of replication.
+ </para>
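+
+ <para>
+ As a hypothetical sketch (the sequence and table names are only
+ for illustration), an application can fetch such a value once from
+ a single designated server and then embed the result in the
+ statement that is broadcast to all servers:
+ </para>
+
+ <programlisting>
+ -- Run on one designated server only:
+ SELECT nextval('order_id_seq');
+ -- Suppose this returns 1234.
+
+ -- Broadcast to every server, using the fetched value instead of
+ -- calling nextval() or CURRENT_TIMESTAMP on each server:
+ INSERT INTO orders (id, created_at)
+     VALUES (1234, '2006-11-17 16:38:44+00');
+ </programlisting>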
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Multi-Master Replication Using Clustering</term>
+ <listitem>
+
+ <para>
+ In clustering, each server can accept write requests, and
+ modified data is transmitted from the original server to every
+ other server before each transaction commits. Heavy write
+ activity can cause excessive locking, leading to poor performance.
+ In fact, write performance is often worse than that of a single
+ server. Read requests can be sent to any server. Clustering
+ is best for mostly read workloads, though its big advantage
+ is that any server can accept write requests, so there is
+ no need to partition workloads between master and slave servers.
+ Because modified data, rather than queries, is sent from one
+ server to another, non-deterministic functions like
+ <function>random()</> are not a problem.
+ </para>
+
+ <para>
+ Clustering is implemented by <productname>Oracle</> in their
+ <productname><acronym>RAC</></> product. <productname>PostgreSQL</>
+ does not offer this type of load balancing, though
+ <productname>PostgreSQL</> two-phase commit (<xref
+ linkend="sql-prepare-transaction"
+ endterm="sql-prepare-transaction-title"> and <xref
+ linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
+ can be used to implement this in application code or middleware.
+ </para>
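+
+ <para>
+ A minimal sketch of how application code or middleware might use
+ two-phase commit to keep two servers consistent is shown below;
+ the table name and transaction identifier are placeholders:
+ </para>
+
+ <programlisting>
+ -- Run the same modification inside a transaction on every server,
+ -- then prepare it under an agreed-upon identifier:
+ BEGIN;
+ UPDATE accounts SET balance = balance - 100.00 WHERE id = 1;
+ PREPARE TRANSACTION 'txn_42';
+
+ -- Only after every server has prepared successfully:
+ COMMIT PREPARED 'txn_42';
+
+ -- If any server fails to prepare, abort on the others instead:
+ -- ROLLBACK PREPARED 'txn_42';
+ </programlisting>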
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Data Partitioning</term>
+ <listitem>
+
+ <para>
+ Data partitioning splits tables into data sets. Each set can
+ be modified by only one server. For example, data can be
+ partitioned by offices, e.g. London and Paris, with a server
+ in each office. If queries combining London and Paris data
+ are necessary, an application can query both servers, or
+ master/slave replication can be used to keep a read-only copy
+ of the other office's data on each server.
+ </para>
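+
+ <para>
+ As a hypothetical illustration, each office's server might own a
+ table holding only its own rows; the schema below is only a sketch:
+ </para>
+
+ <programlisting>
+ -- On the London server, which owns only London data:
+ CREATE TABLE customers_london (
+     id     integer PRIMARY KEY,
+     name   text NOT NULL,
+     office text NOT NULL CHECK (office = 'London')
+ );
+
+ -- The Paris server holds an analogous customers_paris table; a
+ -- read-only copy of the other office's table can be maintained
+ -- with master/slave replication for combined reporting.
+ </programlisting>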
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Clustering For Parallel Query Execution</term>
+ <listitem>
+
+ <para>
+ This allows multiple servers to work concurrently on a single
+ query. One possible approach is to split the data among the
+ servers, have each server execute its part of the query, and
+ send the partial results to a central server to be combined
+ and returned to the user. There currently is no
+ <productname>PostgreSQL</> open source solution for this.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Commercial Solutions</term>
+ <listitem>
+
+ <para>
+ Because <productname>PostgreSQL</> is open source and easily
+ extended, a number of companies have taken <productname>PostgreSQL</>
+ and created commercial closed-source solutions with unique
+ failover, replication, and load balancing capabilities.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+
+</chapter>