Implement function materialization#588
Conversation
|
Related to #590 |
|
Hi @shachibista! Thanks for the contribution. This is a really interesting functionality and I just started checking the implementation details, so I still don't have a full picture of the work needed. I'm sorry I can't give complete feedback yet, but I wanted to ask a few questions to get your opinion:
|
Ah yes, I missed that.
You are correct, I tried to use the default implementation but did not realise that dbt-adapters needed to be updated too. Should be unnecessary after update.
Unfortunately we cannot use them because ClickHouse functions expect an expression: However the tests defined in the adapters repo expect the function body to be a
What errors are you getting? I did not try to run it from a model but I currently get the following when trying to run from a model. I am not sure what the reason is. |
| {% endmacro %} | ||
|
|
||
| {% macro clickhouse__scalar_function_create_replace_signature_sql(target_relation) %} | ||
| CREATE OR REPLACE FUNCTION {{ target_relation.include(database=false, schema=false) }} {{ on_cluster_clause(this) }} AS ({{ clickhouse__formatted_scalar_function_args_sql() }}) -> |
There was a problem hiding this comment.
Inconsistent relation used for cluster clause
Medium Severity
The clickhouse__scalar_function_create_replace_signature_sql macro receives target_relation as a parameter and uses it for the function name, but then uses the global this variable for on_cluster_clause(this). This is inconsistent with the established pattern throughout the codebase where the same relation is used for both the identifier and the cluster clause. If target_relation and this ever differ, the ON CLUSTER clause may be generated incorrectly or omitted unexpectedly.
2a75b23 to
d90afdc
Compare
|
Hi @shachibista ! Sorry for not continuing giving more feedback. This PR is really interesting for us and I like to review it with care, but we have not prioritised this feature right now. My idea is to work with you to try to merge this PR in near future. I'll give you piece of feedback again as soon as I can. In the meantime, some feedback:About your changes in the extra spaces in the changelog: I'm ok if you want to move the small changelog fixes to a different PR. That way we can reduce the size of this one and we can merge the other asap. About the tests in dbt-tests-adapter:
You should be able to override the definition of the files to make them match the CH expectations. That way we should reuse the rest of the tests. Would you try? Maybe we something like this https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/unit_testing/test_types.py so the tests functions are reused and we just override the part that are different in CH. About the errors when using the created UDFs:
I recall seeing similar ones. For this feature to be merged we need not just to define the UDFs, but also use it in models later. Please, try to check it again to see if we may be missing something 🙏 . About dbt-adapter version change: I see somethings are failing related to the dbt-adapter version change. I'm going to try to fix it in a separated PR so you don't have to deal with it in this PR Also another question: The default UDF materialization in dbt mentions it supports both Python and SQL functions. Did you tested both? Can you also add a test to assert that Python one works? Thanks! |
- Use UDF tests from dbt-adapter - Fire event during SQLQuery (test expectation) - Fix function relation
5b7d70f to
271df27
Compare
| if schema == relation_config.source_name and relation_config.database: | ||
| schema = relation_config.database | ||
| elif relation_config.resource_type == NODE_TYPE_FUNCTION: | ||
| schema = None |
There was a problem hiding this comment.
Functions never get ON CLUSTER clause in clusters
Medium Severity
The elif branch for NODE_TYPE_FUNCTION in create_from sets schema = None but falls through without reading cluster and database_engine from quoting.credentials, since that logic is in the else block. This leaves cluster as "" and can_on_cluster as None, so on_cluster_clause(this) in scalar.sql will never emit the ON CLUSTER clause — even when a cluster is configured. The cluster/database_engine credential reads need to happen for function types too.
Additional Locations (1)
Ah, it was a side-effect of the linter I was using.
I have now changed the tests to re-use the dbt-adapter tests. Although had to add events so that the tests worked as expected, probably a good thing if other tests depend on it and we want to reuse tests in the future.
ClickHouse seems to support python UDFs but they need to be configured on the server. So I deem it outside dbt's responsibility. I have not tested, nor intend to implement it. |


Summary
Adds support for User Defined Functions.
Checklist
Delete items not relevant to your PR:
Note
Medium Risk
Introduces a new relation type and DDL macros for creating ClickHouse functions, plus upgrades to
dbt-core>=1.11.0; these may affect relation rendering/schema handling and overall adapter compatibility. Connection logging now emitsSQLQueryevents, which could change log volume/behavior.Overview
Adds initial support for dbt User Defined Functions on ClickHouse by introducing a
functionrelation type (treated as global by ignoring schema) and new macros toCREATE OR REPLACE FUNCTIONfor scalar UDFs.Upgrades the adapter version to
1.11.0and bumps development dependency todbt-core>=1.11.0, adds an integration test (tests/integration/adapter/udf/test_udf.py) covering basic UDF creation, and updates connection execution to emitSQLQueryevents (with optional SQL log abridging) before running statements.Written by Cursor Bugbot for commit 37472f6. This will update automatically on new commits. Configure here.