[FLINK-39202][table] Add new SimplifyCoalesceWithEquiJoinConditionRule#27733
[FLINK-39202][table] Add new SimplifyCoalesceWithEquiJoinConditionRule#27733gustavodemorais wants to merge 4 commits intoapache:masterfrom
Conversation
| simplified = true; | ||
|
|
||
| // Handle potential type mismatch by adding a CAST if needed | ||
| if (call.getType().equals(preservedRef.getType())) { |
There was a problem hiding this comment.
why LogicalTypeCast#supportsImplicitCast is not an option here?
There was a problem hiding this comment.
It's a good option 👍
I think in practice the types in the join conditions are always or almost always the same so the behavior will be the same but I think with implicit casts it's slightly more broad and accurate.
| void testCoalesceOnInnerJoinEquiKey() { | ||
| util.verifyRelPlan( | ||
| "SELECT COALESCE(b.order_id, a.order_id) AS order_id " | ||
| + "FROM orders a INNER JOIN order_details b ON a.order_id = b.order_id"); |
There was a problem hiding this comment.
what about cases with COALESCE in condition?
is it out of scope?
There was a problem hiding this comment.
Yeah, it's out of scope and doesn't apply here: when it's in the join condition, we have to evaluate it for all records. So it happens before the join and there's no preserved side or something like that
| */ | ||
| @Internal | ||
| @Value.Enclosing | ||
| public class SimplifyCoalesceWithEquiJoinConditionRule |
There was a problem hiding this comment.
Do we need a separate rule or we can combine it with RemoveUnreachableCoalesceArgumentsRule?
There was a problem hiding this comment.
I think having different rules is cleaner since they work differently:
- RemoveUnreachableCoalesceArgumentsRule trims arguments based on type nullability metadata (e.g., COALESCE(nonNull, x) -> nonNull), without any join awareness.
- This one is specific to joins and merging both would look less elegant IMO.
…k for type equality
| void testNestedCoalesceInExpression() { | ||
| util.verifyRelPlan( | ||
| "SELECT CAST(COALESCE(b.order_id, a.order_id) AS STRING) AS order_id_str " | ||
| + "FROM orders a LEFT JOIN order_details b ON a.order_id = b.order_id"); | ||
| } |
There was a problem hiding this comment.
I noticed it is not really about nested...
I tried to played locally with nested like I tweaked order_details in a way
util.tableEnv()
.executeSql(
"CREATE TABLE order_details ("
+ " r ROW<order_id BIGINT NOT NULL, order_name STRING NOT NULL> NOT NULL, "
+ " detail STRING,"
+ " PRIMARY KEY (r) NOT ENFORCED"
+ ") WITH ('connector' = 'values')");and then the test
util.verifyRelPlan(
"SELECT CAST(COALESCE(b.r, ROW(a.order_id, 'e')) AS STRING) AS order_id_str "
+ "FROM orders a LEFT JOIN order_details b ON a.order_id = b.r.order_id");and it shows that COALESCE is still present in optimized rel plan
e.g.
Calc(select=[CAST(COALESCE(r, ROW(order_id, 'e')) AS VARCHAR(2147483647)) AS order_id_str])
+- Join(joinType=[LeftOuterJoin], where=[=(order_id, $f2)], select=[order_id, r, $f2], leftInputSpec=[JoinKeyContainsUniqueKey], rightInputSpec=[HasUniqueKey])
:- Exchange(distribution=[hash[order_id]])
: +- TableSourceScan(table=[[default_catalog, default_database, orders, project=[order_id], metadata=[]]], fields=[order_id])
+- Exchange(distribution=[hash[$f2]])
+- Calc(select=[r, r.order_id AS $f2])
+- TableSourceScan(table=[[default_catalog, default_database, order_details, project=[r], metadata=[]]], fields=[r])
There was a problem hiding this comment.
since r is declared as ROW(...) NOT NULL i would expect removal of COALESCE
There was a problem hiding this comment.
The name was misleading 👍 The test is actually about being inside the cast. Renamed it to testCoalesceWrappedInCast.
There was a problem hiding this comment.
Regarding structured type/row support:
- We support structured type field access. Added a new test testCoalesceOnNestedRowScalarField
- We don't support the case you mentioned. 'order_name' and the literal 'e' here are not always equal, so we cannot simplify this
- Additionally, entire rows as join conditions are out of scope - also not a common/recommended practice afaik
…stCoalesceOnNestedRowScalarField
| util.tableEnv() | ||
| .executeSql( | ||
| "CREATE TABLE order_details_row (" | ||
| + " r ROW<order_id BIGINT NOT NULL, order_name STRING NOT NULL> NOT NULL, " | ||
| + " detail STRING," | ||
| + " PRIMARY KEY (r) NOT ENFORCED" | ||
| + ") WITH ('connector' = 'values')"); | ||
|
|
There was a problem hiding this comment.
I guess we should follow same approach and move to the top of test class
| util.verifyRelPlan( | ||
| "SELECT COALESCE(b.order_id, a.order_id, 0) AS order_id " | ||
| + "FROM orders a LEFT JOIN order_details b ON a.order_id = b.order_id"); |
There was a problem hiding this comment.
Looks like I found an issue with COALESCE
it works if we have NOT NULL at the first position and don't have NULLs
however it doesn't seem to work if we have NULL at first position
like query
SELECT COALESCE(NULL, a.order_id, 0) AS order_id
"FROM orders a LEFT JOIN order_details b ON a.order_id = b.order_iddoesn't matter how many args > 1
and even more interesting thing: doesn't matter whether null is at the first position or not
for instance this query also doesn't remove coalesce
SELECT COALESCE(b.order_id, NULL, a.order_id, 0) AS order_id
"FROM orders a LEFT JOIN order_details b ON a.order_id = b.order_idThere was a problem hiding this comment.
btw if I add NULL at the end it works
like
SELECT COALESCE(b.order_id, a.order_id, NULL) AS order_id
"FROM orders a LEFT JOIN order_details b ON a.order_id = b.order_id
What is the purpose of the change
Add a new planner optimization rule that simplifies COALESCE expressions referencing columns from opposite sides of an equi-join condition, replacing them with a direct column reference to the preserved (non-null) side.
Brief change log
Verifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)Documentation