summaryrefslogtreecommitdiff
path: root/src/backend/access/gist/gistxlog.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Make GiST indexes on-disk compatible with 9.2 again.Heikki Linnakangas2013-01-171-3/+3
| | | | | | | | | | | The patch that turned XLogRecPtr into a uint64 inadvertently changed the on-disk format of GiST indexes, because the NSN field in the GiST page opaque is an XLogRecPtr. That breaks pg_upgrade. Revert the format of that field back to the two-field struct that XLogRecPtr was before. This is the same we did to LSNs in the page header to avoid changing on-disk format. Bump catversion, as this invalidates any existing GiST indexes built on 9.3devel.
* Update copyrights for 2013Bruce Momjian2013-01-011-1/+1
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Remove obsolete XLogRecPtr macrosAlvaro Herrera2012-12-281-2/+2
| | | | | | | | | | | | | | | | | This gets rid of XLByteLT, XLByteLE, XLByteEQ and XLByteAdvance. These were useful for brevity when XLogRecPtrs were split in xlogid/xrecoff; but now that they are simple uint64's, they are just clutter. The only downside to making this change would be ease of backporting patches, but that has been negated by other substantive changes to the involved code anyway. The clarity of simpler expressions makes the change worthwhile. Most of the changes are mechanical, but in a couple of places, the patch author chose to invert the operator sense, making the code flow more logical (and more in line with preceding comments). Author: Andres Freund Eyeballed by Dimitri Fontaine and Alvaro Herrera
* Split out rmgr rm_desc functions into their own filesAlvaro Herrera2012-11-281-49/+0
| | | | | This is necessary (but not sufficient) to have them compilable outside of a backend environment.
* Fix multiple problems in WAL replay.Tom Lane2012-11-121-90/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the replay functions for WAL record types that modify more than one page failed to ensure that those pages were locked correctly to ensure that concurrent queries could not see inconsistent page states. This is a hangover from coding decisions made long before Hot Standby was added, when it was hardly necessary to acquire buffer locks during WAL replay at all, let alone hold them for carefully-chosen periods. The key problem was that RestoreBkpBlocks was written to hold lock on each page restored from a full-page image for only as long as it took to update that page. This was guaranteed to break any WAL replay function in which there was any update-ordering constraint between pages, because even if the nominal order of the pages is the right one, any mixture of full-page and non-full-page updates in the same record would result in out-of-order updates. Moreover, it wouldn't work for situations where there's a requirement to maintain lock on one page while updating another. Failure to honor an update ordering constraint in this way is thought to be the cause of bug #7648 from Daniel Farina: what seems to have happened there is that a btree page being split was rewritten from a full-page image before the new right sibling page was written, and because lock on the original page was not maintained it was possible for hot standby queries to try to traverse the page's right-link to the not-yet-existing sibling page. To fix, get rid of RestoreBkpBlocks as such, and instead create a new function RestoreBackupBlock that restores just one full-page image at a time. This function can be invoked by WAL replay functions at the points where they would otherwise perform non-full-page updates; in this way, the physical order of page updates remains the same no matter which pages are replaced by full-page images. We can then further adjust the logic in individual replay functions if it is necessary to hold buffer locks for overlapping periods. A side benefit is that we can simplify the handling of concurrency conflict resolution by moving that code into the record-type-specfic functions; there's no more need to contort the code layout to keep conflict resolution in front of the RestoreBkpBlocks call. In connection with that, standardize on zero-based numbering rather than one-based numbering for referencing the full-page images. In HEAD, I removed the macros XLR_BKP_BLOCK_1 through XLR_BKP_BLOCK_4. They are still there in the header files in previous branches, but are no longer used by the code. In addition, fix some other bugs identified in the course of making these changes: spgRedoAddNode could fail to update the parent downlink at all, if the parent tuple is in the same page as either the old or new split tuple and we're not doing a full-page image: it would get fooled by the LSN having been advanced already. This would result in permanent index corruption, not just transient failure of concurrent queries. Also, ginHeapTupleFastInsert's "merge lists" case failed to mark the old tail page as a candidate for a full-page image; in the worst case this could result in torn-page corruption. heap_xlog_freeze() was inconsistent about using a cleanup lock or plain exclusive lock: it did the former in the normal path but the latter for a full-page image. A plain exclusive lock seems sufficient, so change to that. Also, remove gistRedoPageDeleteRecord(), which has been dead code since VACUUM FULL was rewritten. Back-patch to 9.0, where hot standby was introduced. Note however that 9.0 had a significantly different WAL-logging scheme for GIST index updates, and it doesn't appear possible to make that scheme safe for concurrent hot standby queries, because it can leave inconsistent states in the index even between WAL records. Given the lack of complaints from the field, we won't work too hard on fixing that branch.
* Update copyright notices for year 2012.Bruce Momjian2012-01-011-1/+1
|
* Buffering GiST index build algorithm.Heikki Linnakangas2011-09-081-2/+4
| | | | | | | | | When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to the find right leaf page for tuple. Alexander Korotkov
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-011-3/+0
|
* Spell checking and markup refinementPeter Eisentraut2011-05-191-1/+1
|
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-101-7/+10
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-011-1/+1
|
* Rewrite the GiST insertion logic so that we don't need the post-recoveryHeikki Linnakangas2010-12-231-616/+181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cleanup stage to finish incomplete inserts or splits anymore. There was two reasons for the cleanup step: 1. When a new tuple was inserted to a leaf page, the downlink in the parent needed to be updated to contain (ie. to be consistent with) the new key. Updating the parent in turn might require recursively updating the parent of the parent. We now handle that by updating the parent while traversing down the tree, so that when we insert the leaf tuple, all the parents are already consistent with the new key, and the tree is consistent at every step. 2. When a page is split, we need to insert the downlink for the new right page(s), and update the downlink for the original page to not include keys that moved to the right page(s). We now handle that by setting a new flag, F_FOLLOW_RIGHT, on the non-rightmost pages in the split. When that flag is set, scans always follow the rightlink, regardless of the NSN mechanism used to detect concurrent page splits. That way the tree is consistent right after split, even though the downlink is still missing. This is very similar to the way B-tree splits are handled. When the downlink is inserted in the parent, the flag is cleared. To keep the insertion algorithm simple, when an insertion sees an incomplete split, indicated by the F_FOLLOW_RIGHT flag, it finishes the split before doing anything else. These changes allow removing the whole "invalid tuple" mechanism, but I retained the scan code to still follow invalid tuples correctly. While we don't create any such tuples anymore, we want to handle them gracefully in case you pg_upgrade a GiST index that has them. If we encounter any on an insert, though, we just throw an error saying that you need to REINDEX. The issue that got me into doing this is that if you did a checkpoint while an insert or split was in progress, and the checkpoint finishes quickly so that there is no WAL record related to the insert between RedoRecPtr and the checkpoint record, recovery from that checkpoint would not know to finish the incomplete insert. IOW, we have the same issue we solved with the rm_safe_restartpoint mechanism during normal operation too. It's highly unlikely to happen in practice, and this fix is far too large to backpatch, so we're just going to live with in previous versions, but this refactoring fixes it going forward. With this patch, you don't get the annoying 'index "FOO" needs VACUUM or REINDEX to finish crash recovery' notices anymore if you crash at an unfortunate moment.
* Remove cvs keywords from all files.Magnus Hagander2010-09-201-1/+1
|
* Update copyright for the year 2010.Bruce Momjian2010-01-021-2/+2
|
* Fix wrong WAL info value generated when gistContinueInsert() performs anTom Lane2009-12-241-2/+5
| | | | | | | | index page split. This would result in index corruption, or even more likely an error during WAL replay, if we were unlucky enough to crash during end-of-recovery cleanup after having completed an incomplete GIST insertion. Yoichi Hirai
* Allow read only connections during recovery, known as Hot Standby.Simon Riggs2009-12-191-1/+7
| | | | | | | | | | | | Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
* Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock shouldHeikki Linnakangas2009-01-201-2/+3
| | | | | | | | | be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
* Update copyright for 2009.Bruce Momjian2009-01-011-2/+2
|
* Improve our #include situation by moving pointer types away from theAlvaro Herrera2008-06-191-1/+2
| | | | | | | corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.
* Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relationHeikki Linnakangas2008-06-121-19/+13
| | | | | | | | | | forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.
* Restructure some header files a bit, in particular heapam.h, by removing someAlvaro Herrera2008-05-121-2/+3
| | | | | | | | | | | | unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-011-2/+2
|
* Wording cleanup for error messages. Also change can't -> cannot.Bruce Momjian2007-02-011-2/+2
| | | | | | | | | | | | | | Standard English uses "may", "can", and "might" in different ways: may - permission, "You may borrow my rake." can - ability, "I can lift that log." might - possibility, "It might rain today." Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-051-2/+2
| | | | back-stamped for this.
* pgindent run for 8.2.Bruce Momjian2006-10-041-68/+82
|
* Make recovery from WAL be restartable, by executing a checkpoint-likeTom Lane2006-08-071-1/+9
| | | | | | | | | | operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.
* Remove 576 references of include files that were not needed.Bruce Momjian2006-07-141-5/+1
|
* Add FILLFACTOR to CREATE INDEX.Bruce Momjian2006-07-021-2/+2
| | | | ITAGAKI Takahiro
* Call MarkBufferDirty() before XLogInsert() during completion of insertTeodor Sigaev2006-05-191-4/+6
|
* Simplify gistSplit() and some refactoring related code.Teodor Sigaev2006-05-191-17/+5
|
* Rework completion of incomplete inserts. Now it writesTeodor Sigaev2006-05-191-99/+171
| | | | WAL log during inserts.
* Reduce size of critial section during vacuum full, criticalTeodor Sigaev2006-05-171-49/+82
| | | | | | | | sections now isn't nested. All user-defined functions now is called outside critsections. Small improvements in WAL protocol. TODO: improve XLOG replay
* Reduce size of critical section and remove call of user-defined functions inTeodor Sigaev2006-05-101-2/+2
| | | | | | insertion and deletion, modify gistSplit() to do not use buffers. TODO: gistvacuumcleanup and XLOG
* Fix thinko in gistRedoPageUpdateRecord: if XLR_BKP_BLOCK_1 is set, weTom Lane2006-04-031-13/+15
| | | | | don't have anything to do to the page, but we still have to adjust the incomplete_inserts list that we're maintaining in memory.
* Clean up WAL/buffer interactions as per my recent proposal. Get rid of theTom Lane2006-03-311-15/+13
| | | | | | | | | | | | | | | | misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.
* Improve gist XLOG code to follow the coding rules needed to preventTom Lane2006-03-301-171/+126
| | | | | | | torn-page problems. This introduces some issues of its own, mainly that there are now some critical sections of unreasonably broad scope, but it's a step forward anyway. Further cleanup will require some code refactoring that I'd prefer to get Oleg and Teodor involved in.
* Clean up and document the API for XLogOpenRelation and XLogReadBuffer.Tom Lane2006-03-291-47/+16
| | | | | | | | This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.
* Arrange to emit a description of the current XLOG record as error contextTom Lane2006-03-241-16/+16
| | | | | | | | | when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-051-2/+2
|
* pgindent new GIST index code, per request from Tom.Bruce Momjian2005-09-221-396/+501
|
* Adjust GiST error messages to conform to message style guidelines.Tom Lane2005-09-221-23/+25
|
* Improve error messages and add commentTeodor Sigaev2005-07-011-5/+14
|
* Bug fixes for GiST crash recovery.Teodor Sigaev2005-06-301-37/+77
| | | | | | - add forgotten check of lsn for insert completion - remove level of pages: hard to check in recovery - some cleanups
* Code cleanup. gistfillbuffer accepts InvalidOffsetNumber.Teodor Sigaev2005-06-281-12/+4
|
* Concurrency for GiSTTeodor Sigaev2005-06-271-59/+42
| | | | | | | | | | | | | | | | | | - full concurrency for insert/update/select/vacuum: - select and vacuum never locks more than one page simultaneously - select (gettuple) hasn't any lock across it's calls - insert never locks more than two page simultaneously: - during search of leaf to insert it locks only one page simultaneously - while walk upward to the root it locked only parent (may be non-direct parent) and child. One of them X-lock, another may be S- or X-lock - 'vacuum full' locks index - improve gistgetmulti - simplify XLOG records Fix bug in index_beginscan_internal: LockRelation may clean rd_aminfo structure, so move GET_REL_PROCEDURE after LockRelation
* fix founded hole in recovery after crash, add vacuum_delay_point()Teodor Sigaev2005-06-201-127/+49
|
* 1. full functional WAL for GiSTTeodor Sigaev2005-06-201-134/+406
| | | | | | | | | | | 2. improve vacuum for gist - use FSM - full vacuum: - reforms parent tuple if it's needed ( tuples was deleted on child page or parent tuple remains invalid after crash recovery ) - truncate index file if possible 3. fixes bugs and mistakes
* WAL for GiST. It work for online backup and so on, but onTeodor Sigaev2005-06-141-0/+628
recovery after crash (power loss etc) it may say that it can't restore index and index should be reindexed. Some refactoring code.