path: root/src/backend
Commit message | Author | Date | Files | Lines
* Teach heapam code to know the difference between a real seqscan and the pseudo HeapScanDesc created for a bitmap heap scan. This avoids some useless overhead during a bitmap scan startup, in particular invoking the syncscan code. (We might someday want to do that, but right now it's merely useless contention for shared memory, to say nothing of possibly pushing useful entries out of syncscan's small LRU list.) This also allows elimination of ugly pgstat_discount_heap_scan() kluge.
  Tom Lane | 2007-06-09 | 2 files | -23/+40 lines
* Allow numeric_fac() to be interrupted, since it can take quite a while for large inputs. Also cause it to error out immediately if the result will overflow, instead of grinding through a lot of calculation first. Per gripe from Jim Nasby.
  Tom Lane | 2007-06-09 | 1 file | -1/+10 lines
* Disallow the cost balancing code from resulting in a zero cost limit, which causes a division-by-zero error in the vacuum code. This can happen when there are more workers than cost limit units. Per report from Galy Lee in <200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>.
  Alvaro Herrera | 2007-06-08 | 1 file | -2/+6 lines
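The failure mode in the entry above can be sketched in a few lines. This is not the autovacuum code: the function name `balance_cost_limit` and the equal-share policy are assumptions made purely for illustration. The point is only that an integer share computed for more workers than there are limit units comes out as zero, so the sketch clamps it to at least 1.

```python
def balance_cost_limit(total_cost_limit: int, n_workers: int) -> int:
    """Split a cost limit evenly among workers, never returning zero.

    Hypothetical helper: the real balancing logic lives in the backend and
    may weigh workers differently; only the clamp is the point here.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    share = total_cost_limit // n_workers
    # With more workers than limit units the integer share would be 0,
    # and any later division by the limit would fail.
    return max(share, 1)
```

With a limit of 2 and 5 workers, the unclamped share would be 0; the sketch returns 1 instead.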
* Avoid passing zero as a value for vacuum_cost_limit, because it's not a valid value for the vacuum code. Instead, make zero signify getting the value from a higher-level configuration facility, just like -1 in the original coding. We still document that -1 is the value that disables the feature, to avoid confusing the user unnecessarily. Reported by Galy Lee in <200705310914.l4V9E6JA094603@wwwmaster.postgresql.org>; per subsequent discussion.
  Alvaro Herrera | 2007-06-08 | 1 file | -5/+11 lines
* Arrange for large sequential scans to synchronize with each other, so that when multiple backends are scanning the same relation concurrently, each page is (ideally) read only once. Jeff Davis, with review by Heikki and Tom.
  Tom Lane | 2007-06-08 | 6 files | -19/+457 lines
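One way to get the effect described in the entry above is for a newly started scan to begin at the page another scan is currently reading, then wrap around to cover the pages it skipped. The sketch below shows only that page ordering. The function name and the idea of passing in a reported `start_page` are assumptions for illustration, and none of the shared-memory bookkeeping is modelled.

```python
from typing import Iterator


def synchronized_scan_order(n_pages: int, start_page: int) -> Iterator[int]:
    """Yield every page number exactly once, starting at start_page and wrapping.

    start_page stands in for a position reported by a concurrent scan
    (hypothetical input; how it is obtained is not modelled here).
    """
    if n_pages <= 0:
        return
    start = start_page % n_pages
    for offset in range(n_pages):
        yield (start + offset) % n_pages
```

A scan that joins at page 3 of a 5-page relation visits pages 3, 4, 0, 1, 2: it still sees every page once, but its first reads coincide with those of the scan already in progress.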
* Redefine IsTransactionState() to only return true for TRANS_INPROGRESS state, which is the only state in which it's safe to initiate database queries. It turns out that all but two of the callers thought that's what it meant; and the other two were using it as a proxy for "will GetTopTransactionId() return a nonzero XID"? Since it was in fact an unreliable guide to that, make those two just invoke GetTopTransactionId() always, then deal with a zero result if they get one.
  Tom Lane | 2007-06-07 | 3 files | -39/+23 lines
* Rework temp_tablespaces patch so that temp tablespaces are assigned separately for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.
  Tom Lane | 2007-06-07 | 8 files | -147/+254 lines
* Avoid losing track of data for shared tables in pgstats. Report by Michael Fuhr, patch from Tom Lane after a messier suggestion by me.
  Alvaro Herrera | 2007-06-07 | 1 file | -2/+4 lines
* Fix up text concatenation so that it accepts all the reasonable cases that were accepted by prior Postgres releases. This takes care of the loose end left by the preceding patch to downgrade implicit casts-to-text. To avoid breaking desirable behavior for array concatenation, introduce a new polymorphic pseudo-type "anynonarray" --- the added concatenation operators are actually text || anynonarray and anynonarray || text.
  Tom Lane | 2007-06-06 | 8 files | -71/+149 lines
* Minor editorialization: don't flush plan cache without need.
  Tom Lane | 2007-06-05 | 1 file | -22/+17 lines
* Downgrade implicit casts to text to be assignment-only, except for the ones from the other string-category types; this eliminates a lot of surprising interpretations that the parser could formerly make when there was no directly applicable operator.
  Create a general mechanism that supports casts to and from the standard string types (text,varchar,bpchar) for *every* datatype, by invoking the datatype's I/O functions. These new casts are assignment-only in the to-string direction, explicit-only in the other, and therefore should create no surprising behavior. Remove a bunch of thereby-obsoleted datatype-specific casting functions.
  The "general mechanism" is a new expression node type CoerceViaIO that can actually convert between *any* two datatypes if their external text representations are compatible. This is more general than needed for the immediate feature, but might be useful in plpgsql or other places in future.
  This commit does nothing about the issue that applying the concatenation operator || to non-text types will now fail, often with strange error messages due to misinterpreting the operator as array concatenation. Since it often (not always) worked before, we should either make it succeed or at least give a more user-friendly error; but details are still under debate.
  Peter Eisentraut and Tom Lane
  Tom Lane | 2007-06-05 | 27 files | -1024/+551 lines
* The session_replication_role actually can be changed at will during a session regardless of the existence of cached plans. The plancache only needs to be invalidated so that rules affected by the new setting will be reflected in the new query plans.
  Jan
  Jan Wieck | 2007-06-05 | 2 files | -17/+11 lines
* Move call of MarkBufferDirty() before XLogInsert() as required. Many thanks to Heikki Linnakangas <heikki@enterprisedb.com> for his sharp eyes.
  Teodor Sigaev | 2007-06-05 | 3 files | -18/+25 lines
* Remove ill-conceived CRLF translation for Windows in syslogger.
  Andrew Dunstan | 2007-06-04 | 1 file | -47/+3 lines
* Fix a bundle of GIN bugs:
  - Fix a possible deadlock between UPDATE and VACUUM queries. The bug was never observed in 8.2, but it still exists there. HEAD is more sensitive to the bug after the recent "ring" of buffers improvements.
  - Fix WAL creation: if the parent page is stored as is after a split, then the incomplete split isn't removed during replay. This happens rather rarely, only on large tables with a lot of updates/inserts.
  - Fix WAL replay: there was a wrong test of XLR_BKP_BLOCK_* for the left page after deletion of a page. That caused a wrong rightlink field: it pointed to the deleted page.
  - Add checking of match when clearing an incomplete split.
  - Clean up the incomplete split list after processing.
  None of these changes alters on-disk storage, so backpatch... But the second point may be an issue when replaying logs from a previous version.
  Teodor Sigaev | 2007-06-04 | 5 files | -54/+146 lines
* On win32, retry reading when WSARecv returns WSAEWOULDBLOCK. There seem to be cases when at least Windows 2000 can do this even though select just indicated that the socket is readable. Per report and analysis from Cyril VELTER.
  Magnus Hagander | 2007-06-04 | 1 file | -10/+30 lines
* On win32, don't use SO_REUSEADDR for TCP sockets. Per failure on buildfarm member baiji and subsequent discussion.
  Magnus Hagander | 2007-06-04 | 1 file | -1/+12 lines
* Clarify some error messages about duplicate things.
  Peter Eisentraut | 2007-06-03 | 4 files | -11/+11 lines
* Create a GUC parameter temp_tablespaces that allows selection of the tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
  Tom Lane | 2007-06-03 | 12 files | -111/+363 lines
* Minimal message corrections found by spell checker.
  Peter Eisentraut | 2007-06-02 | 3 files | -7/+7 lines
* Fix erroneous error reporting for overlength input in text_date(), text_time(), and text_timetz(). 7.4-vintage bug found by Greg Stark.
  Tom Lane | 2007-06-02 | 1 file | -5/+8 lines
* Improve efficiency of LIKE/ILIKE code, especially for multi-byte charsets, and most especially for UTF8. Remove unnecessary special cases for bytea processing and single-byte charset ILIKE. a ILIKE b is now processed as lower(a) LIKE lower(b) in all cases. The code is now considerably simpler. All comparisons are now performed byte-wise, and the text and pattern are also advanced byte-wise where it is safe to do so - essentially where a wildcard is not being matched. Andrew Dunstan, from an original patch by ITAGAKI Takahiro, with ideas from Tom Lane and Mark Mielke.
  Andrew Dunstan | 2007-06-02 | 2 files | -439/+183 lines
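The rule "a ILIKE b is processed as lower(a) LIKE lower(b)" from the entry above can be shown with a small matcher. This is a sketch, not the backend code: it works on Python characters rather than bytes, ignores escape characters, and the names `like` and `ilike` are chosen here for illustration.

```python
def like(text: str, pattern: str) -> bool:
    """Match text against a SQL LIKE pattern ('%' = any run, '_' = one char).

    Escape characters are deliberately not handled in this sketch.
    """
    t = p = 0
    star_p = star_t = -1  # last '%' seen in the pattern, and the text index tried for it
    while t < len(text):
        if p < len(pattern) and pattern[p] == "%":
            star_p, star_t = p, t
            p += 1
        elif p < len(pattern) and (pattern[p] == "_" or pattern[p] == text[t]):
            p += 1
            t += 1
        elif star_p != -1:
            # Backtrack: let the last '%' absorb one more character.
            star_t += 1
            p, t = star_p + 1, star_t
        else:
            return False
    # Only trailing '%' may remain unmatched in the pattern.
    while p < len(pattern) and pattern[p] == "%":
        p += 1
    return p == len(pattern)


def ilike(text: str, pattern: str) -> bool:
    """Case-insensitive match, expressed as LIKE over lower-cased inputs."""
    return like(text.lower(), pattern.lower())
```

Folding case once up front means the matching loop needs no case-insensitive special cases, which is consistent with the simplification the entry describes.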
* Fix aboriginal bug in BufFileDumpBuffer that would cause it to write the wrong data when dumping a bufferload that crosses a component-file boundary. This probably has not been seen in the wild because (a) component files are normally 1GB apiece and (b) non-block-aligned buffer usage is relatively rare. But it's fairly easy to reproduce a problem if one reduces RELSEG_SIZE in a test build. Kudos to Kurt Harriman for spotting the bug.
  Tom Lane | 2007-06-01 | 1 file | -2/+2 lines
* Allow leading and trailing whitespace in the input to the boolean type. Also, add explicit casts between boolean and text/varchar. Both of these changes are for conformance with SQL:2003. Update the regression tests, bump the catversion.
  Neil Conway | 2007-06-01 | 1 file | -11/+57 lines
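A minimal sketch of whitespace-tolerant boolean input, in the spirit of the entry above. The function name `parse_bool` and the two literal sets are assumptions for illustration. The set of spellings the backend actually accepts is not stated in this log and may differ.

```python
# Illustrative literal sets (assumed, not taken from the backend source).
_TRUE_LITERALS = {"t", "true", "y", "yes", "on", "1"}
_FALSE_LITERALS = {"f", "false", "n", "no", "off", "0"}


def parse_bool(raw: str) -> bool:
    """Parse a boolean literal, ignoring leading and trailing whitespace."""
    token = raw.strip().lower()
    if token in _TRUE_LITERALS:
        return True
    if token in _FALSE_LITERALS:
        return False
    raise ValueError(f"invalid input syntax for type boolean: {raw!r}")
```

Only the surrounding whitespace is dropped; whitespace inside the token, as in "tr ue", still raises an error.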
* Make CREATE/DROP/RENAME DATABASE wait a little bit to see if other backends will exit before failing because of conflicting DB usage. Per discussion, this seems a good idea to help mask the fact that backend exit takes nonzero time. Remove a couple of thereby-obsoleted sleeps in contrib and PL regression test sequences.
  Tom Lane | 2007-06-01 | 2 files | -96/+135 lines
* Buy back some of the cycles spent in more-expensive hash functions by selecting power-of-2, rather than prime, numbers of buckets in hash joins. If the hash functions are doing their jobs properly by making all hash bits equally random, this is good enough, and it saves expensive integer division and modulus operations.
  Tom Lane | 2007-06-01 | 1 file | -29/+27 lines
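The saving described in the entry above comes from replacing a modulus with a bit mask, which is valid only when the bucket count is a power of two. The sketch below illustrates that identity. The helper names are hypothetical, and the hash join code itself is not reproduced.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be at least 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def bucket_for(hash_value: int, n_buckets: int) -> int:
    """Pick a bucket with a mask instead of a modulus.

    Requires n_buckets to be a power of two and hash_value to be
    non-negative; then hash_value & (n_buckets - 1) == hash_value % n_buckets.
    """
    if n_buckets < 1 or n_buckets & (n_buckets - 1):
        raise ValueError("n_buckets must be a power of two")
    return hash_value & (n_buckets - 1)
```

The mask keeps only the low-order bits of the hash, which is why the entry makes the result depend on all hash bits being equally random.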
* Fix several hash functions that were taking chintzy shortcuts instead of delivering a well-randomized hash value. I got religion on this after observing that performance of multi-batch hash join degrades terribly if the higher-order bits of hash values aren't random, as indeed was true for say hashes of small integer values. It's now expected and documented that hash functions should use hash_any or some comparable method to ensure that all bits of their output are about equally random.
  initdb forced because this change invalidates existing hash indexes. For the same reason, this isn't back-patchable; the hash join performance problem will get a band-aid fix in the back branches.
  Tom Lane | 2007-06-01 | 3 files | -38/+63 lines
* The shortcut exit that I recently added to ExecInitIndexScan() for EXPLAIN-only operation was a little too short; it skipped initializing the node's result tuple type, which may be needed depending on what's above the indexscan node. Call ExecAssignResultTypeFromTL before exiting. (For good luck I moved up the ExecAssignScanProjectionInfo call as well, so that everything except indexscan-specific initialization will still be done.) Per example from Grant Finnemore.
  Tom Lane | 2007-05-31 | 1 file | -7/+7 lines
* Change build_index_pathkeys() so that the expressions it builds to represent index key columns always have the type expected by the index's associated operators, ie, we add RelabelType nodes when dealing with binary-compatible index opclasses. This is needed to get varchar indexes to play nicely with the new EquivalenceClass machinery, as per recent gripe from Josh Berkus that CVS HEAD was failing to match a varchar index column to a constant restriction in the query.
  It seems likely that this change will allow removal of a lot of ugly ad-hoc RelabelType-stripping that the planner has traditionally done while matching expressions to other expressions, but I'll worry about that some other day.
  Tom Lane | 2007-05-31 | 2 files | -9/+34 lines
* Make some messages more consistent
  Peter Eisentraut | 2007-05-31 | 2 files | -4/+4 lines
* Replace ReadBuffer with ReadBufferWithStrategy in all vacuum-involved places to implement a limited-size "ring" of buffers for VACUUM for GIN & GIST
  Teodor Sigaev | 2007-05-31 | 2 files | -16/+21 lines
* Downgrade some low-level startup messages to DEBUG1.
  Peter Eisentraut | 2007-05-31 | 1 file | -6/+6 lines
* Fix overly-strict sanity check in BeginInternalSubTransaction that made it fail when used in a deferred trigger. Bug goes back to 8.0; no doubt the reason it hadn't been noticed is that we've been discouraging use of user-defined constraint triggers. Per report from Frank van Vugt.
  Tom Lane | 2007-05-30 | 1 file | -7/+8 lines
* Make large sequential scans and VACUUMs work in a limited-size "ring" of buffers, rather than blowing out the whole shared-buffer arena. Aside from avoiding cache spoliation, this fixes the problem that VACUUM formerly tended to cause a WAL flush for every page it modified, because we had it hacked to use only a single buffer. Those flushes will now occur only once per ring-ful. The exact ring size, and the threshold for seqscans to switch into the ring usage pattern, remain under debate; but the infrastructure seems done. The key bit of infrastructure is a new optional BufferAccessStrategy object that can be passed to ReadBuffer operations; this replaces the former StrategyHintVacuum API.
  This patch also changes the buffer usage-count methodology a bit: we now advance usage_count when first pinning a buffer, rather than when last unpinning it. To preserve the behavior that a buffer's lifetime starts to decrease when it's released, the clock sweep code is modified to not decrement usage_count of pinned buffers.
  Work not done in this commit: teach GiST and GIN indexes to use the vacuum BufferAccessStrategy for vacuum-driven fetches.
  Original patch by Simon, reworked by Heikki and again by Tom.
  Tom Lane | 2007-05-30 | 16 files | -244/+672 lines
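The usage-count change in the entry above can be illustrated with a toy clock sweep. Everything here is a simplified sketch under stated assumptions: the `Buffer` class, the `MAX_USAGE` cap, and the single shared `pin_count` are illustrative stand-ins, not the buffer manager's data structures. The two behaviors shown are the ones the entry names: usage_count advances on the first pin, and the sweep does not decrement usage_count of pinned buffers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_USAGE = 5  # assumed cap, for illustration only


@dataclass
class Buffer:
    usage_count: int = 0
    pin_count: int = 0


def pin(buf: Buffer) -> None:
    """Pin a buffer; usage_count advances when it goes from unpinned to pinned."""
    if buf.pin_count == 0:
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)
    buf.pin_count += 1


def unpin(buf: Buffer) -> None:
    """Release one pin; usage_count is left alone on release."""
    if buf.pin_count == 0:
        raise ValueError("buffer is not pinned")
    buf.pin_count -= 1


def clock_sweep(buffers: List[Buffer], hand: int) -> Optional[Tuple[int, int]]:
    """Return (victim_index, next_hand), or None if every buffer is pinned."""
    n = len(buffers)
    if n == 0 or all(b.pin_count > 0 for b in buffers):
        return None
    while True:
        idx = hand
        buf = buffers[idx]
        hand = (hand + 1) % n
        if buf.pin_count > 0:
            continue  # pinned: skip without decrementing usage_count
        if buf.usage_count == 0:
            return idx, hand
        buf.usage_count -= 1
```

Because pinned buffers are skipped untouched, a buffer's usage_count only starts to decay once it has been released, which is the lifetime behavior the entry says it wants to preserve.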
* Tweak: use memcpy() in text_time(), rather than manually copying bytes in a loop.
  Neil Conway | 2007-05-30 | 1 file | -12/+8 lines
* Fix a bug in input processing for the "interval" type. Previously, "microsecond" and "millisecond" units were not considered valid input by themselves, which caused inputs like "1 millisecond" to be rejected erroneously. Update the docs, add regression tests, and backport to 8.2 and 8.1.
  Neil Conway | 2007-05-29 | 1 file | -2/+14 lines
* mmgr README tweak: "either" is no longer correct. The previous wording compared PortalContext with QueryContext, but the latter no longer exists.
  Neil Conway | 2007-05-29 | 1 file | -2/+2 lines
* Tweak the code in a couple of places to try to deliver more user-friendly error messages when a single COPY line is too long for us to handle. Per example from Johann Spies.
  Tom Lane | 2007-05-28 | 2 files | -17/+50 lines
* Code cleanup: use "bool" for Boolean variables, rather than "int".
  Neil Conway | 2007-05-27 | 1 file | -15/+15 lines
* Ooops, I was too busy worrying about getting the transactional infrastructure right to think carefully about how insert and delete counts map to n_live_tuples. Of course a deletion should reduce n_live_tuples.
  Tom Lane | 2007-05-27 | 1 file | -4/+9 lines
* pgstat's on-proc-exit hook has to execute after the last transaction commit or abort within a backend; rearrange InitPostgres processing to make it so. Revealed by just-added Asserts along with ECPG regression tests (hm, I wonder why the core regression tests didn't expose it?). This possibly is another reason for missing stats updates ...
  Tom Lane | 2007-05-27 | 2 files | -17/+33 lines
* Fix up pgstats counting of live and dead tuples to recognize that committed and aborted transactions have different effects; also teach it not to assume that prepared transactions are always committed.
  Along the way, simplify the pgstats API by tying counting directly to Relations; I cannot detect any redeeming social value in having stats pointers in HeapScanDesc and IndexScanDesc structures. And fix a few corner cases in which counts might be missed because the relation's pgstat_info pointer hadn't been set.
  Tom Lane | 2007-05-27 | 15 files | -316/+662 lines
* Repair two constraint-exclusion corner cases triggered by proving that an inheritance child of an UPDATE/DELETE target relation can be excluded by constraints. I had rearranged some code in set_append_rel_pathlist() to avoid "useless" work when a child is excluded, but overdid it and left the child with no cheapest_path entry, causing possible failure later if the appendrel was involved in a join. Also, it seems that the dummy plan generated by inheritance_planner() when all branches are excluded has to be a bit less dummy now than was required in 8.2. Per report from Jan Wieck. Add his test case to the regression tests.
  Tom Lane | 2007-05-26 | 2 files | -16/+38 lines
* Create hooks to let a loadable plugin monitor (or even replace) the planner and/or create plans for hypothetical situations; in particular, investigate plans that would be generated using hypothetical indexes. This is a heavily-rewritten version of the hooks proposed by Gurjeet Singh for his Index Advisor project. In this formulation, the index advisor can be entirely a loadable module instead of requiring a significant part to be in the core backend, and plans can be generated for hypothetical indexes without requiring the creation and rolling-back of system catalog entries.
  The index advisor patch as-submitted is not compatible with these hooks, but it needs significant work anyway due to other 8.2-to-8.3 planner changes. With these hooks in the core backend, development of the advisor can proceed as a pgfoundry project.
  Tom Lane | 2007-05-25 | 6 files | -57/+135 lines
* Remove ruleutils.c's use of varnoold/varoattno as a shortcut for determining what a Var node refers to. This is no longer necessary because the new flat-range-table representation of plan trees makes it relatively easy to dig down through child plan levels to find the original reference; and to keep doing it that way, we'd have to store joinaliasvars lists in flattened RTEs, as demonstrated by bug report from Leszek Trenkner. This change makes varnoold/varoattno truly just debug aids, which wasn't quite the case before. Perhaps we should drop them, or only have them in assert-enabled builds?
  Tom Lane | 2007-05-24 | 1 file | -16/+20 lines
* Repair planner bug introduced in 8.2 by ability to rearrange outer joins: in cases where a sub-SELECT inserts a WHERE clause between two outer joins, that clause may prevent us from re-ordering the two outer joins. The code was considering only the joins' own ON-conditions in determining reordering safety, which is not good enough. Add a "delay_upper_joins" flag to OuterJoinInfo to flag that we have detected such a clause and higher-level outer joins shouldn't be permitted to commute with this one. (This might seem overly coarse, but given the current rules for OJ reordering, it's sufficient AFAICT.)
  The failure case is actually pretty narrow: it needs a WHERE clause within the RHS of a left join that checks the RHS of a lower left join, but is not strict for that RHS (else we'd have simplified the lower join to a plain join). Even then no failure will be manifest unless the planner chooses to rearrange the join order.
  Per bug report from Adam Terrey.
  Tom Lane | 2007-05-22 | 4 files | -12/+51 lines
* Fix best_inner_indexscan to return both the cheapest-total-cost and cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider both of these (when different) as the inside of a nestloop join. The original design was based on the assumption that indexscan paths always have negligible startup cost, and so total cost is the only important figure of merit; an assumption that's obviously broken by bitmap indexscans. This oversight could lead to choosing poor plans in cases where fast-start behavior is more important than total cost, such as LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from Chuck D.
  Tom Lane | 2007-05-22 | 3 files | -54/+83 lines
* Teach tuplestore.c to throw away data before the "mark" point when the caller is using mark/restore but not rewind or backward-scan capability. Insert a materialize plan node between a mergejoin and its inner child if the inner child is a sort that is expected to spill to disk. The materialize shields the sort from the need to do mark/restore and thereby allows it to perform its final merge pass on-the-fly; while the materialize itself is normally cheap since it won't spill to disk unless the number of tuples with equal key values exceeds work_mem.
  Greg Stark, with some kibitzing from Tom Lane.
  Tom Lane | 2007-05-21 | 5 files | -35/+227 lines
* XPath fixes:
  - Function renamed to "xpath".
  - Function is now strict, per discussion.
  - Return an empty array when the XPath expression matches nothing (previously, NULL was returned in that case), per discussion.
  - (bugfix) Work with fragments with prologue: select xpath('/a', '<?xml version="1.0"?><a /><b />'); // now the XML datum is always wrapped with a dummy <x>...</x>, and the XML prologue (if any) simply goes away.
  - Some cleanup.
  Nikolay Samokhvalov
  Some code cleanup and documentation work by myself.
  Peter Eisentraut | 2007-05-21 | 1 file | -94/+111 lines
* To support external compression of archived WAL data, add a flag bit to WAL records that shows whether it is safe to remove full-page images (ie, whether or not an on-line backup was in progress when the WAL entry was made). Also make provision for an XLOG_NOOP record type that can be used to fill in the extra space when decompressing the data for restore. This is the portion of Koichi Suzuki's "full page writes" patch that has to go into the core database. The remainder of that work is two external compression and decompression programs, which for the time being will undergo separate development on pgfoundry. Per discussion.
  Also, twiddle the handling of BTREE_SPLIT records to ensure it'll be possible to compress them (the previous coding caused essential info to be omitted). The other commonly-used record types seem OK already, with the possible exception of GIN and GIST WAL records, which I don't understand well enough to opine on.
  Tom Lane | 2007-05-20 | 3 files | -11/+38 lines