Conversation
```cpp
class InListBoolNode : public TypedNode<BoolExprNode, ExprNode::TYPE_IN_LIST_BOOL>
{
	const static UCHAR blrOp = blr_in_list;
```
Why make a constant instead of using the BLR code directly in the gen function?

It's used in two methods (also in internalPrint), and I don't like duplicating things ;-)
```cpp
	static DmlNode* parse(thread_db* tdbb, MemoryPool& pool, CompilerScratch* csb, const UCHAR blrOp);
```
```cpp
	virtual void getChildren(NodeRefsHolder& holder, bool dsql) const
```

As this is new code, it's worth changing the virtual redeclarations to use `override` instead.
```cpp
	if (!(impure->vlu_flags & VLU_computed))
	{
		delete impure->vlu_misc.vlu_sortedList;
```
It's good to initialize the field to nullptr just after the delete, because if the subsequent allocation/constructor fails, the field could be left pointing at garbage. This is done after `delete impure->vlu_misc.vlu_invariant;` in SubstringSimilarNode::execute.
```cpp
	return MOV_compare(JRD_get_thread_data(), desc1, desc2);
}
```
```cpp
LookupValueList::LookupValueList(MemoryPool& pool, const ValueListNode* values, ULONG impure)
```

Wouldn't it be possible to avoid the const_casts by changing some constructors to not take pointers to const?
Commits:

* WIP
* Original (circa 2022) implementation of the IN LIST optimization, with some post-fixes and minor adjustments
* Make it possible to optimize IN &lt;list&gt; for middle segments in compound indices
* Avoid modifying the retrieval structure at runtime, as it may be shared among concurrent requests
* Simplify the code a little. Better cost calculation. Support both root-based and sibling-based list scans inside the same plan node.
* Removed the unneeded const casts and other changes as suggested by Adriano
This PR implements native IN processing rather than converting it into a binary tree of ORed predicates. Details are described below.
Processing is now linear rather than recursive, so there are no runtime stack limitations. The artificial limit of 1500 items is gone. The current implementation has a hard limit of 64K items, simply because the BLR format needs some limit and this value looks reasonable from a sanity point of view.
Lists that are known to be constant are pre-evaluated as invariants and cached as a binary search tree, making comparisons faster when the condition needs to be checked for many rows or when the value list is long enough.
If the list is very long or the IN predicate is not selective, the index scan supports searching groups using the sibling pointer (i.e. horizontally) rather than searching every group from the root (i.e. vertically). This corresponds to "Improve the index scan to match multiple values at once" [CORE4600] #4915.
Performance results in TPC-R queries containing IN predicates:
Q12: ~ 2x improvement (list scan is used instead of full-scan with bitmap)
Q16: no difference (the plan is the same)
Q19: ~ 10x improvement (list scan is used instead of many range scans)
Q22: ~ 2x improvement (the plan is the same, but expression evaluation is more optimal)
The patch (in its original incarnation, based on the v3 and v4 codebases) has been tested by our customer in production since 2022 and looks stable, although they didn't see such a noticeable performance boost. But they needed longish lists (> 1500 items) more than better performance ;-) While porting this into master, I improved some things I didn't like originally.