Remove redundant grouping and DISTINCT columns.

Avoid explicitly grouping by columns that we know are redundant for sorting; for example, we need to group by only one of x and y in

	SELECT ... WHERE x = y GROUP BY x, y

This comes up more often than you might think, as shown by the changes in the regression tests. It's nearly free to detect too, since we are just piggybacking on the existing logic that detects redundant pathkeys. (In some of the existing plans that change, it's visible that a sort step preceding the grouping step already didn't bother to sort by the redundant column, making the old plan a bit silly-looking.)

To do this, build processed_groupClause and processed_distinctClause lists that omit any provably-redundant sort items, and consult those rather than the originals where relevant. This means that within the planner, one should usually consult root->processed_groupClause or root->processed_distinctClause if one wants to know which columns are to be grouped on; but to check whether grouping or distinct-ing is happening at all, check non-NIL-ness of parse->groupClause or parse->distinctClause. This is comparable to longstanding rules about handling the HAVING clause, so I don't think it'll be a huge maintenance problem.

nodeAgg.c also needs minor mods, because it's now possible to generate AGG_PLAIN and AGG_SORTED Agg nodes with zero grouping columns.

Patch by me; thanks to Richard Guo and David Rowley for review.

Discussion: https://postgr.es/m/185315.1672179489@sss.pgh.pa.us
author: Tom Lane <tgl@sss.pgh.pa.us> 2023-01-18 12:37:57 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2023-01-18 12:37:57 -0500
commit: 8d83a5d0a2673174dc478e707de1f502935391a5 (patch)
tree: abfdce0820345a412b8a046ed8832b915094f031 /src/include/nodes/pathnodes.h
parent: d540a02a724b9643205abce8c5644a0f0908f6e3 (diff)
download: postgresql-8d83a5d0a2673174dc478e707de1f502935391a5.tar.gz
1 files changed, 28 insertions, 0 deletions
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c20b7298a3..2d1d8f4bcd 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -407,6 +407,34 @@ struct PlannerInfo
 	struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
 
 	/*
+	 * The fully-processed groupClause is kept here.  It differs from
+	 * parse->groupClause in that we remove any items that we can prove
+	 * redundant, so that only the columns named here actually need to be
+	 * compared to determine grouping.  Note that it's possible for *all* the
+	 * items to be proven redundant, implying that there is only one group
+	 * containing all the query's rows.  Hence, if you want to check whether
+	 * GROUP BY was specified, test for nonempty parse->groupClause, not for
+	 * nonempty processed_groupClause.
+	 *
+	 * Currently, when grouping sets are specified we do not attempt to
+	 * optimize the groupClause, so that processed_groupClause will be
+	 * identical to parse->groupClause.
+	 */
+	List	   *processed_groupClause;
+
+	/*
+	 * The fully-processed distinctClause is kept here.  It differs from
+	 * parse->distinctClause in that we remove any items that we can prove
+	 * redundant, so that only the columns named here actually need to be
+	 * compared to determine uniqueness.  Note that it's possible for *all*
+	 * the items to be proven redundant, implying that there should be only
+	 * one output row.  Hence, if you want to check whether DISTINCT was
+	 * specified, test for nonempty parse->distinctClause, not for nonempty
+	 * processed_distinctClause.
+	 */
+	List	   *processed_distinctClause;
+
+	/*
 	 * The fully-processed targetlist is kept here.  It differs from
 	 * parse->targetList in that (for INSERT) it's been reordered to match the
 	 * target table, and defaults have been filled in.  Also, additional
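
The rule stated in the new comments (test the original clause to learn whether grouping was requested, and the processed clause to learn which columns must be compared) can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the planner's code: ToyGroupItem, ToyPlannerState, the eclass and is_const fields, and all toy_* functions are hypothetical names invented here, and redundancy is modeled simply as "shares an equivalence id with an earlier item, or is pinned to a constant" instead of going through the real pathkey logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ITEMS 8

/* Toy stand-in for one GROUP BY item; NOT PostgreSQL's SortGroupClause. */
typedef struct ToyGroupItem
{
	const char *colname;	/* column named in GROUP BY */
	int			eclass;		/* hypothetical id: columns known equal share it */
	bool		is_const;	/* assumed: column is known equal to a constant */
} ToyGroupItem;

/* Toy stand-in for the two lists the commit distinguishes. */
typedef struct ToyPlannerState
{
	const ToyGroupItem *group_clause;	/* plays parse->groupClause */
	size_t		n_group;
	ToyGroupItem processed[TOY_MAX_ITEMS];	/* plays processed_groupClause */
	size_t		n_processed;
} ToyPlannerState;

/*
 * Build the processed list, keeping only items that are not redundant in
 * this toy model.  The original list is left untouched.
 */
void
toy_preprocess_group_clause(ToyPlannerState *s)
{
	assert(s->n_group <= TOY_MAX_ITEMS);
	s->n_processed = 0;

	for (size_t i = 0; i < s->n_group; i++)
	{
		const ToyGroupItem *item = &s->group_clause[i];
		bool		redundant = item->is_const;

		/* Redundant if an already-kept item is known equal to this one. */
		for (size_t j = 0; !redundant && j < s->n_processed; j++)
		{
			if (s->processed[j].eclass == item->eclass)
				redundant = true;
		}

		if (!redundant)
			s->processed[s->n_processed++] = *item;
	}
}

/* "Was GROUP BY specified?" must look at the ORIGINAL clause. */
bool
toy_has_grouping(const ToyPlannerState *s)
{
	return s->n_group > 0;
}

/* "How many columns must be compared?" looks at the PROCESSED clause. */
size_t
toy_num_group_cols(const ToyPlannerState *s)
{
	return s->n_processed;
}
```

In this model, GROUP BY x, y with x and y in the same equivalence id keeps only x, while a GROUP BY on a single constant-pinned column leaves the processed list empty even though grouping was requested. That second case is why the two questions have to be answered from different lists.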