summaryrefslogtreecommitdiff
path: root/src/include/access
Commit message (Collapse)AuthorAgeFilesLines
* Implement new 'lightweight lock manager' that's intermediate betweenTom Lane2001-09-292-6/+5
| | | | | | | | | existing lock manager and spinlocks: it understands exclusive vs shared lock but has few other fancy features. Replace most uses of spinlocks with lightweight locks. All remaining uses of spinlocks have very short lock hold times (a few dozen instructions), so tweak spinlock backoff code to work efficiently given this assumption. All per my proposal on pghackers 26-Sep-01.
* Measure the current transaction time to milliseconds.Thomas G. Lockhart2001-09-281-1/+4
| | | | | | | | | | | | | | Define a new function, GetCurrentTransactionStartTimeUsec() to get the time to this precision. Allow now() and timestamp 'now' to use this higher precision result so we now have fractional seconds in this "constant". Add timestamp without time zone type. Move previous timestamp type to timestamp with time zone. Accept another ISO variant for date/time values: yyyy-mm-ddThh:mm:ss (note the "T" separating the day from hours information). Remove 'current' from date/time types; convert to 'now' in input. Separate time and timetz regression tests. Separate timestamp and timestamptz regression test.
* Suppress compiler warning.Tom Lane2001-09-171-2/+2
|
* Apply 7.1.3 changes to the current tree also.Hiroshi Inoue2001-09-081-1/+2
|
* Transaction IDs wrap around, per my proposal of 13-Aug-01. MoreTom Lane2001-08-262-7/+9
| | | | documentation to come, but the code is all here. initdb forced.
* Replace implementation of pg_log as a relation accessed through theTom Lane2001-08-257-50/+67
| | | | | | | | | | | buffer manager with 'pg_clog', a specialized access method modeled on pg_xlog. This simplifies startup (don't need to play games to open pg_log; among other things, OverrideTransactionSystem goes away), should improve performance a little, and opens the door to recycling commit log space by removing no-longer-needed segments of the commit log. Actual recycling is not there yet, but I felt I should commit this part separately since it'd still be useful if we chose not to do transaction ID wraparound.
* Ensure that all TransactionId comparisons are encapsulated in macrosTom Lane2001-08-231-16/+24
| | | | | (TransactionIdPrecedes, TransactionIdFollows, etc). First step on the way to transaction ID wrap solution ...
* Update GiST for new pg_opclass arrangement (finally a clean solutionTom Lane2001-08-221-4/+5
| | | | | | for haskeytype). Update GiST contrib modules too. Add linear-time split algorithm for R-tree GiST opclass. From Oleg Bartunov and Teodor Sigaev.
* Restructure pg_opclass, pg_amop, and pg_amproc per previous discussions inTom Lane2001-08-211-3/+3
| | | | | | | | | | | | | | | | | | | | pgsql-hackers. pg_opclass now has a row for each opclass supported by each index AM, not a row for each opclass name. This allows pg_opclass to show directly whether an AM supports an opclass, and furthermore makes it possible to store additional information about an opclass that might be AM-dependent. pg_opclass and pg_amop now store "lossy" and "haskeytype" information that we previously expected the user to remember to provide in CREATE INDEX commands. Lossiness is no longer an index-level property, but is associated with the use of a particular operator in a particular index opclass. Along the way, IndexSupportInitialize now uses the syscaches to retrieve pg_amop and pg_amproc entries. I find this reduces backend launch time by about ten percent, at the cost of a couple more special cases in catcache.c's IndexScanOK. Initial work by Oleg Bartunov and Teodor Sigaev, further hacking by Tom Lane. initdb forced.
* Make OIDs optional, per discussions in pghackers. WITH OIDS is still theTom Lane2001-08-101-5/+21
| | | | | | | | | | | | default, but OIDS are removed from many system catalogs that don't need them. Some interesting side effects: TOAST pointers are 20 bytes not 32 now; pg_description has a three-column key instead of one. Bugs fixed in passing: BINARY cursors work again; pg_class.relhaspkey has some usefulness; pg_dump dumps comments on indexes, rules, and triggers in a valid order. initdb forced.
* 1. null-safe interface to GiSTBruce Momjian2001-08-101-2/+5
| | | | | | | | | | | | | | (as proposed in http://fts.postgresql.org/db/mw/msg.html?mid=1028327) 2. support for 'pass-by-value' arguments - to test this we used special opclass for int4 with values in range [0-2^15] More testing will be done after resolving problem with index_formtuple and implementation of B-tree using GiST 3. small patch to contrib modules (seg,cube,rtree_gist,intarray) - mark functions as 'isstrict' where needed. Oleg Bartunov
* Arrange to recycle old XLOG log segment files as new segment files,Tom Lane2001-07-191-2/+3
| | | | | | | | | | | | | | rather than deleting them only to have to create more. Steady state is 2*CHECKPOINT_SEGMENTS + WAL_FILES + 1 segment files, which will simply be renamed rather than constantly deleted and recreated. To make this safe, added current XLOG file/offset number to page header of XLOG pages, so that an un-overwritten page from an old incarnation of a logfile can be reliably told from a valid page. This change means that if you try to restart postmaster in a CVS-tip database after installing the change, you'll get a complaint about bad XLOG page magic number. If you don't want to initdb, run contrib/pg_resetxlog (and be sure you shut down the old postmaster cleanly).
* Restructure index AM interface for index building and index tuple deletion,Tom Lane2001-07-156-33/+35
| | | | | | | | | | | | | | | | | | | | | | | | | per previous discussion on pghackers. Most of the duplicate code in different AMs' ambuild routines has been moved out to a common routine in index.c; this means that all index types now do the right things about inserting recently-dead tuples, etc. (I also removed support for EXTEND INDEX in the ambuild routines, since that's about to go away anyway, and it cluttered the code a lot.) The retail indextuple deletion routines have been replaced by a "bulk delete" routine in which the indexscan is inside the access method. I haven't pushed this change as far as it should go yet, but it should allow considerable simplification of the internal bookkeeping for deletions. Also, add flag columns to pg_am to eliminate various hardcoded tests on AM OIDs, and remove unused pg_am columns. Fix rtree and gist index types to not attempt to store NULLs; before this, gist usually crashed, while rtree managed not to crash but computed wacko bounding boxes for NULL entries (which might have had something to do with the performance problems we've heard about occasionally). Add AtEOXact routines to hash, rtree, and gist, all of which have static state that needs to be reset after an error. We discovered this need long ago for btree, but missed the other guys. Oh, one more thing: concurrent VACUUM is now the default.
* Create a new HeapTupleSatisfiesVacuum() routine in tqual.c that embodies theTom Lane2001-07-123-80/+43
| | | | | | validity checking rules for VACUUM. Make some other rearrangements of the VACUUM code to allow more code to be shared between full and lazy VACUUM. Minor code cleanups and added comments for TransactionId manipulations.
* Further work on connecting the free space map (which is still just aTom Lane2001-06-291-2/+2
| | | | | | | | stub) into the rest of the system. Adopt a cleaner approach to preventing deadlock in concurrent heap_updates: allow RelationGetBufferForTuple to select any page of the rel, and put the onus on it to lock both buffers in a consistent order. Remove no-longer-needed isExtend hack from API of ReleaseAndReadBuffer.
* Statistical system views (yet without the config stuff, butJan Wieck2001-06-222-3/+7
| | | | | | | it's hard to keep such massive changes in sync with the tree so I need to get it in and work from there now). Jan
* Clean up various to-do items associated with system indexes:Tom Lane2001-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_database now has unique indexes on oid and on datname. pg_shadow now has unique indexes on usename and on usesysid. pg_am now has unique index on oid. pg_opclass now has unique index on oid. pg_amproc now has unique index on amid+amopclaid+amprocnum. Remove pg_rewrite's unnecessary index on oid, delete unused RULEOID syscache. Remove index on pg_listener and associated syscache for performance reasons (caching rows that are certain to change before you need 'em again is rather pointless). Change pg_attrdef's nonunique index on adrelid into a unique index on adrelid+adnum. Fix various incorrect settings of pg_class.relisshared, make that the primary reference point for whether a relation is shared or not. IsSharedSystemRelationName() is now only consulted to initialize relisshared during initial creation of tables and indexes. In theory we might now support shared user relations, though it's not clear how one would get entries for them into pg_class &etc of multiple databases. Fix recently reported bug that pg_attribute rows created for an index all have the same OID. (Proof that non-unique OID doesn't matter unless it's actually used to do lookups ;-)) There's no need to treat pg_trigger, pg_attrdef, pg_relcheck as bootstrap relations. Convert them into plain system catalogs without hardwired entries in pg_class and friends. Unify global.bki and template1.bki into a single init script postgres.bki, since the alleged distinction between them was misleading and pointless. Not to mention that it didn't work for setting up indexes on shared system relations. Rationalize locking of pg_shadow, pg_group, pg_attrdef (no need to use AccessExclusiveLock where ExclusiveLock or even RowExclusiveLock will do). Also, hold locks until transaction commit where necessary.
* Remove RelationGetBufferWithBuffer(), which is horribly confused aboutTom Lane2001-06-092-30/+18
| | | | | | | | | appropriate pin-count manipulation, and instead use ReleaseAndReadBuffer. Make use of the fact that the passed-in buffer (if there is one) must be pinned to avoid grabbing the bufmgr spinlock when we are able to return this same buffer. Eliminate unnecessary 'previous tuple' and 'next tuple' fields of HeapScanDesc and IndexScanDesc, thereby removing a whole lot of bookkeeping from heap_getnext() and related routines.
* Clean up some minor problems exposed by further thought about Panon's bugTom Lane2001-06-011-7/+7
| | | | | | | | | | | | | | report on old-style functions invoked by RI triggers. We had a number of other places that were being sloppy about which memory context FmgrInfo subsidiary data will be allocated in. Turns out none of them actually cause a problem in 7.1, but this is for arcane reasons such as the fact that old-style triggers aren't supported anyway. To avoid getting burnt later, I've restructured the trigger support so that we don't keep trigger FmgrInfo structs in relcache memory. Some other related cleanups too: it's not really necessary to call fmgr_info at all while setting up the index support info in relcache entries, because those ScanKeyEntry structs are never used to invoke the functions. This should speed up relcache initialization a tiny bit.
* Updates to make GIST work with multi-key indexes (from Oleg BartunovTom Lane2001-05-311-43/+29
| | | | | and Teodor Sigaev). Declare key values as Datum where appropriate, rather than char* (Tom Lane).
* Tweak StrategyEvaluation data structure to eliminate hardwired limit onTom Lane2001-05-304-74/+57
| | | | | number of strategies supported by an index AM. Add missing copyright notices and CVS $Header$ markers to GIST source files.
* Remove unused, redundant header files.Tom Lane2001-05-302-42/+0
|
* Oops, only wanted python change in the last commit. Backing out.Bruce Momjian2001-05-251-2/+1
|
* While changing Cygwin Python to build its core as a DLL (like Win32Bruce Momjian2001-05-251-1/+2
| | | | | | | | | | | | | | | Python) to support shared extension modules, I have learned that Guido prefers the style of the attached patch to solve the above problem. I feel that this solution is particularly appropriate in this case because the following: PglargeType PgType PgQueryType are already being handled in the way that I am proposing for PgSourceType. Jason Tishler
* Repair race condition introduced into heap_update() in 7.1 ---Tom Lane2001-05-161-2/+3
| | | | | | | | | | | | | | | PageGetFreeSpace() was being called while not holding the buffer lock, which not only could yield a garbage answer, but even if it's the right answer there might be less space available after we reacquire the buffer lock. Also repair potential deadlock introduced by my recent performance improvement in RelationGetBufferForTuple(): it was possible for two heap_updates to try to lock two buffers in opposite orders. The fix creates a global rule that buffers of a single heap relation should be locked in decreasing block number order. Currently, this only applies to heap_update; VACUUM can get away with ignoring the rule since it holds exclusive lock on the whole relation anyway. However, if we try to implement a VACUUM that can run in parallel with other transactions, VACUUM will also have to obey the lock order rule.
* Remove unused tables pg_variable, pg_inheritproc, pg_ipl tables. InitdbBruce Momjian2001-05-141-35/+1
| | | | forced.
* Rewrite of planner statistics-gathering code. ANALYZE is now available asTom Lane2001-05-071-4/+8
| | | | | | | | | | | | | | | | | a separate statement (though it can still be invoked as part of VACUUM, too). pg_statistic redesigned to be more flexible about what statistics are stored. ANALYZE now collects a list of several of the most common values, not just one, plus a histogram (not just the min and max values). Random sampling is used to make the process reasonably fast even on very large tables. The number of values and histogram bins collected is now user-settable via an ALTER TABLE command. There is more still to do; the new stats are not being used everywhere they could be in the planner. But the remaining changes for this project should be localized, and the behavior is already better than before. A not-very-related change is that sorting now makes use of btree comparison routines if it can find one, rather than invoking '<' twice.
* Improve comments for xlog item size #defines.Tom Lane2001-03-251-4/+11
|
* Remove dashes in comments that don't need them, rewrap with pgindent.Bruce Momjian2001-03-221-1/+2
|
* pgindent run. Make it all clean.Bruce Momjian2001-03-2217-168/+171
|
* Remove NEXTXID xlog record type to avoid three-way deadlock risk.Tom Lane2001-03-182-6/+4
| | | | | NEXTXID isn't really necessary, per previous discussion in pghackers, but I mulishy insisted we should put it in anyway. Mea culpa.
* Support syncing WAL log to disk using either fsync(), fdatasync(),Tom Lane2001-03-161-1/+13
| | | | | | | | O_SYNC, or O_DSYNC (as available on a given platform). Add GUC parameter to control sync method. Also, add defense to XLogWrite to prevent it from going nuts if passed a target write position that's past the end of the buffers so far filled by XLogInsert.
* Change xlog page-header format to include StartUpID. Use the SUI toTom Lane2001-03-131-3/+9
| | | | | | | | | detect case that next page in log came from an older run than the prior page. This avoids the necessity to re-zero the log after recovery from a crash, which is good because we need not risk destroying valuable log information. This forces another initdb since yesterday :-(. Need to get that log reset utility done...
* XLOG (and related) changes:Tom Lane2001-03-134-72/+168
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record is kept in pg_control for possible use in disaster recovery (ie, complete loss of pg_xlog). Also add a version number for pg_control itself. Remove archdir from pg_control; it ought to be a GUC parameter, not a special case (not that it's implemented yet anyway). * Suppress successive checkpoint records when nothing has been entered in the WAL log since the last one. This is not so much to avoid I/O as to make it actually useful to keep track of the last two checkpoints. If the things are right next to each other then there's not a lot of redundancy gained... * Change CRC scheme to a true 64-bit CRC, not a pair of 32-bit CRCs on alternate bytes. Polynomial borrowed from ECMA DLT1 standard. * Fix XLOG record length handling so that it will work at BLCKSZ = 32k. * Change XID allocation to work more like OID allocation. (This is of dubious necessity, but I think it's a good idea anyway.) * Fix a number of minor bugs, such as off-by-one logic for XLOG file wraparound at the 4 gig mark. * Add documentation and clean up some coding infelicities; move file format declarations out to include files where planned contrib utilities can get at them. * Checkpoint will now occur every CHECKPOINT_SEGMENTS log segments or every CHECKPOINT_TIMEOUT seconds, whichever comes first. It is also possible to force a checkpoint by sending SIGUSR1 to the postmaster (undocumented feature...) * Defend against kill -9 postmaster by storing shmem block's key and ID in postmaster.pid lockfile, and checking at startup to ensure that no processes are still connected to old shmem block (if it still exists). * Switch backends to accept SIGQUIT rather than SIGUSR1 for emergency stop, for symmetry with postmaster and xlog utilities. Clean up signal handling in bootstrap.c so that xlog utilities launched by postmaster will react to signals better. * Standalone bootstrap now grabs lockfile in target directory, as added insurance against running it in parallel with live postmaster.
* Implement COMMIT_SIBLINGS parameter to allow pre-commit delay to occurTom Lane2001-02-261-1/+6
| | | | | | | | | | | only if at least N other backends currently have open transactions. This is not a great deal of intelligence about whether a delay might be profitable ... but it beats no intelligence at all. Note that the default COMMIT_DELAY is still zero --- this new code does nothing unless that setting is changed. Also, mark ENABLEFSYNC as a system-wide setting. It's no longer safe to allow that to be set per-backend, since we may be relying on some other backend's fsync to have synced the WAL log.
* More comment improvements.Bruce Momjian2001-02-221-4/+4
|
* Clean up index/btree comments/macros, as approved.Bruce Momjian2001-02-222-31/+41
|
* Comment improvements.Bruce Momjian2001-02-213-12/+18
|
* Although we can't support out-of-line TOAST storage in indexes (yet),Tom Lane2001-02-151-1/+15
| | | | | | | | compressed storage works perfectly well. Might as well have a coherent strategy for applying it, rather than the haphazard store-what-you-get approach that was in the code before. The strategy I've set up here is to attempt compression of any compressible index value exceeding BLCKSZ/16, or about 500 bytes by default.
* Macro for btree runtime fix.Vadim B. Mikheev2001-02-071-1/+5
|
* Change Copyright from PostgreSQL, Inc to PostgreSQL Global Development Group.Bruce Momjian2001-01-2424-48/+48
|
* Fix all the places that called heap_update() and heap_delete() withoutTom Lane2001-01-231-1/+4
| | | | | | | | | | | bothering to check the return value --- which meant that in case the update or delete failed because of a concurrent update, you'd not find out about it, except by observing later that the transaction produced the wrong outcome. There are now subroutines simple_heap_update and simple_heap_delete that should be used anyplace that you're not prepared to do the full nine yards of coping with concurrent updates. In practice, that seems to mean absolutely everywhere but the executor, because *noplace* else was checking.
* Restructure backend SIGINT/SIGTERM handling so that 'die' interruptsTom Lane2001-01-141-2/+1
| | | | | | | are treated more like 'cancel' interrupts: the signal handler sets a flag that is examined at well-defined spots, rather than trying to cope with an interrupt that might happen anywhere. See pghackers discussion of 1/12/01.
* Add more critical-section calls: all code sections that hold spinlocksTom Lane2001-01-121-2/+2
| | | | | | | | | | | are now critical sections, so as to ensure die() won't interrupt us while we are munging shared-memory data structures. Avoid insecure intermediate states in some code that proc_exit will call, like palloc/pfree. Rename START/END_CRIT_CODE to START/END_CRIT_SECTION, since that seems to be what people tend to call them anyway, and make them be called with () like a function call, in hopes of not confusing pg_indent. I doubt that this is sufficient to make SIGTERM safe anywhere; there's just too much code that could get invoked during proc_exit().
* 1. WAL needs in zero-ed content of newly initialized page.Vadim B. Mikheev2000-12-301-1/+2
| | | | | 2. Log record for PageRepaireFragmentation now keeps array of !LP_USED offnums to redo cleanup properly.
* New WAL version - CRC and data blocks backup.Vadim B. Mikheev2000-12-283-52/+88
|
* Fix portability problems recently exposed by regression tests on Alphas.Tom Lane2000-12-274-80/+151
| | | | | | | | | | 1. Distinguish cases where a Datum representing a tuple datatype is an OID from cases where it is a pointer to TupleTableSlot, and make sure we use the right typlen in each case. 2. Make fetchatt() and related code support 8-byte by-value datatypes on machines where Datum is 8 bytes. Centralize knowledge of the available by-value datatype sizes in two macros in tupmacs.h, so that this will be easier if we ever have to do it again.
* Clean up backend-exit-time cleanup behavior. Use on_shmem_exit callbacksTom Lane2000-12-181-2/+2
| | | | | | to ensure that we have released buffer refcounts and so forth, rather than putting ad-hoc operations before (some of the calls to) proc_exit. Add commentary to discourage future hackers from repeating that mistake.
* Disable elog(ERROR|FATAL) in signal handlers inVadim B. Mikheev2000-12-031-1/+2
| | | | critical sections of code.
* Make tuple receive/print routines TOAST-aware. Formerly, printtup wouldTom Lane2000-12-011-7/+7
| | | | | leak memory when printing a toasted attribute, and printtup_internal didn't work at all...