Is there an existing issue for the same tech request?
Does this tech request not affect user experience?
What would you like to be added ?
When a fulltext index query has additional WHERE filters on the source table (e.g. WHERE category='news' AND match(content) against('keyword')), push down a BloomFilter of the filtered PKs to the fulltext index table scan, so that irrelevant doc_id rows are skipped at the reader level.
Plan structure (when pushdown is enabled):
outerJoin(scanNode, innerJoin(ft_func_chain, secondScanProject))
- secondScanProject: scans the source table with the non-fulltext filters and outputs only the PK
- BloomFilter runtime filter: secondScan(build) → ft_func(probe), pushes the filtered PKs into the fulltext index table scan
- IN-list runtime filter: innerJoin(build) → scanNode(probe), pushes the fulltext match results back to the source table scan
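To make the plan shape easier to discuss, here is a small Python sketch that models the tree and the two runtime-filter edges as plain data. It is an illustration only, not the planner's actual representation: the `PlanNode` and `RuntimeFilter` classes and the `render` method are hypothetical, and only the node names are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical plan node: a name plus its child nodes."""
    name: str
    children: list = field(default_factory=list)

    def render(self) -> str:
        # Render the subtree in the same functional notation used in this issue.
        if not self.children:
            return self.name
        return f"{self.name}({', '.join(c.render() for c in self.children)})"


@dataclass(frozen=True)
class RuntimeFilter:
    """Hypothetical runtime-filter edge: built on one node, probed on another."""
    kind: str
    build: str
    probe: str


# Leaves and joins, named as in the plan structure above.
scan_node = PlanNode("scanNode")
ft_func_chain = PlanNode("ft_func_chain")
second_scan_project = PlanNode("secondScanProject")
inner_join = PlanNode("innerJoin", [ft_func_chain, second_scan_project])
plan = PlanNode("outerJoin", [scan_node, inner_join])

# The two runtime filters described in this issue.
runtime_filters = [
    RuntimeFilter("BloomFilter", build="secondScan", probe="ft_func"),
    RuntimeFilter("IN-list", build="innerJoin", probe="scanNode"),
]

print(plan.render())
for rf in runtime_filters:
    print(f"{rf.kind}: {rf.build}(build) -> {rf.probe}(probe)")
```

In both edges the build side has to finish before the probe side can use the filter, which is why the source-table pre-scan runs first and the final source-table scan runs last.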
Why is this needed ?
Currently, fulltext index queries scan the entire index table regardless of additional WHERE conditions on the source table. For example:
SELECT * FROM articles
WHERE category = 'news'
AND match(content) against('database' in natural language mode);
The fulltext index scan processes every doc_id entry and only then joins with the source table, where the category='news' filter discards most of the rows. This wastes significant I/O when the WHERE condition is highly selective.
With BloomFilter pre-filter pushdown:
- First scan the source table with category='news' to collect matching PKs
- Build a BloomFilter from these PKs
- Push the BloomFilter down to the fulltext index table reader
- Skip blocks/rows whose doc_id doesn't pass the BloomFilter check
This significantly reduces I/O for fulltext queries with selective non-fulltext filters.
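The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's reader code: the `BloomFilter` class, `build_pk_filter`, `scan_index`, the `id` PK column, and the in-memory row dictionaries are all hypothetical, and the filter sizing (1024 bits, 3 hashes) is arbitrary.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes are arbitrary assumptions for illustration."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        data = repr(key).encode("utf-8")
        for seed in range(self.num_hashes):
            # A different salt per seed gives independent-looking hash functions.
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def build_pk_filter(source_rows, predicate) -> BloomFilter:
    """Steps 1-2: scan the source table with the filter and collect PKs."""
    bf = BloomFilter()
    for row in source_rows:
        if predicate(row):
            bf.add(row["id"])
    return bf


def scan_index(index_rows, pk_filter: BloomFilter):
    """Steps 3-4: skip index rows whose doc_id fails the Bloom filter check."""
    return [row for row in index_rows if pk_filter.might_contain(row["doc_id"])]


source_rows = [
    {"id": 1, "category": "news"},
    {"id": 2, "category": "sports"},
    {"id": 3, "category": "news"},
    {"id": 4, "category": "tech"},
]
index_rows = [
    {"doc_id": 1, "word": "database"},
    {"doc_id": 2, "word": "database"},
    {"doc_id": 3, "word": "database"},
    {"doc_id": 4, "word": "index"},
]

pk_filter = build_pk_filter(source_rows, lambda r: r["category"] == "news")
kept = scan_index(index_rows, pk_filter)
print([row["doc_id"] for row in kept])
```

A Bloom filter never produces false negatives, so every doc_id that belongs to a matching source row survives the scan. It can produce false positives, so a few irrelevant rows may still pass; the later join against the filtered PKs removes them, which keeps the result correct.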