feat: add array_sum scalar function #22542

Open
crm26 wants to merge 1 commit into apache:main from crm26:feat/array-sum

crm26 commented May 26, 2026

Which issue does this PR close?

Partial of #21536: array_sum (first of the array aggregates in the series).

Rationale for this change

Continues the per-function split sequence requested by @alamb on #21536. Four sibling PRs already merged: cosine_distance (#21542), inner_product (#21861), array_normalize (#22013), array_scale (#22466). array_add is in flight as #22459 by @SubhamSinghal.

array_sum is the first of the three array-aggregate functions (sum, product, avg). Its semantics set the pattern for the other two aggregates.

What changes are included in this PR?

  • New scalar UDF array_sum(array) in datafusion/functions-nested/src/array_sum.rs
  • Module wire-up + registration in datafusion/functions-nested/src/lib.rs
  • SLT tests at datafusion/sqllogictest/test_files/array_sum.slt
  • Auto-generated docs entry in docs/source/user-guide/sql/scalar_functions.md

Signature: `List/LargeList/FixedSizeList` in, `Float64` out (one scalar per row). Numeric inner types coerced to `Float64`.
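
A minimal sketch of what coercing an integer inner type to `Float64` implies, assuming the coercion behaves like a plain numeric cast. The helper `coerce_i64_to_f64` is hypothetical and is not code from this PR; it only illustrates that `Int64` values above 2^53 may not survive the widening exactly.

```rust
/// Hypothetical helper (not part of this PR): widen Int64 elements to
/// Float64 the way a plain numeric cast would.
fn coerce_i64_to_f64(values: &[i64]) -> Vec<f64> {
    // `as f64` rounds to the nearest representable double, so integers
    // with magnitude above 2^53 can lose their low-order bits.
    values.iter().map(|&v| v as f64).collect()
}
```

Under that assumption, small integers such as `[1, 2, 3]` convert exactly, while `2^53 + 1` converts to the same `f64` as `2^53`.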

NULL semantics — SQL aggregate convention (deliberate divergence from binary-op siblings):

  • NULL row → NULL row out
  • NULL elements are skipped, matching PostgreSQL `array_sum`, DuckDB `list_sum`, Spark `aggregate`. Binary-op siblings (`inner_product`, `array_normalize`) return a NULL row when any element is NULL, because their per-element operation is undefined on NULL; SQL aggregates conventionally skip NULLs.
  • All-NULL row → NULL out (matches `SUM(...)` over an all-NULL column)

Empty array → `0.0` (additive identity; matches DuckDB `list_sum([]) = 0`).
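
The NULL and empty-array rules above can be sketched per row in plain Rust. This is an illustration only, not the implementation in `array_sum.rs`: the `Row` alias and the function `array_sum_row` are hypothetical, and the sketch models a row as nested `Option`s instead of Arrow arrays.

```rust
/// Hypothetical stand-in for one row of a list column:
/// `None` is a NULL row, `Some(vec)` is an array whose elements may be NULL.
type Row = Option<Vec<Option<f64>>>;

/// Sketch of the per-row semantics described in this PR (assumed, not copied
/// from the actual implementation).
fn array_sum_row(row: &Row) -> Option<f64> {
    // NULL row -> NULL out.
    let values = row.as_ref()?;

    // Empty array -> 0.0, the additive identity.
    if values.is_empty() {
        return Some(0.0);
    }

    // Skip NULL elements, remembering whether any non-NULL value was seen.
    let mut sum = 0.0;
    let mut seen_value = false;
    for v in values.iter().flatten() {
        sum += v;
        seen_value = true;
    }

    // All-NULL row -> NULL out, like SUM over an all-NULL column.
    if seen_value { Some(sum) } else { None }
}
```

In this sketch, `array_sum_row(&Some(vec![Some(1.0), None, Some(2.5)]))` yields `Some(3.5)`, an empty array yields `Some(0.0)`, and both a NULL row and an all-NULL array yield `None`. The empty check comes before the all-NULL check because the two cases return different results even though neither contains a non-NULL element.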

Alias: `list_sum` (matches the precedent of `array_normalize`→`list_normalize`, `array_scale`→`list_scale`).

Are these changes tested?

Yes. SLT covers happy paths, empty arrays, NULL row, NULL elements (mix + all-NULL), all list variants (List/LargeList/FixedSizeList), numeric coercion (Float32/Int64/integer literals), multi-row composition, error paths, return type, and the `list_sum` alias.

Are there any user-facing changes?

Yes — new SQL scalar function `array_sum(array)` and its alias `list_sum`.

Adds `array_sum(array)` returning the sum of elements in a numeric array.
Aliased as `list_sum`. Part of the per-function split sequence on
tracking issue apache#21536, following the pattern of the already-merged PRs
in this series (cosine_distance apache#21542, inner_product apache#21861,
array_normalize apache#22013, array_scale apache#22466).

Semantics:
- NULL row in array -> NULL row out
- NULL elements are skipped (SQL aggregate convention; matches
  PostgreSQL array_sum, DuckDB list_sum, Spark aggregate). A row whose
  every element is NULL yields NULL.
- Empty array -> 0.0 (additive identity, matches SQL SUM over no rows
  conceptually, and DuckDB list_sum([]) = 0)

Input is List/LargeList/FixedSizeList of any numeric type; elements
are coerced to Float64. Output is Float64.
github-actions bot added the documentation, sqllogictest, and functions labels May 26, 2026