SQL overview by kbatuigas · Pull Request #573 · redpanda-data/cloud-docs

kbatuigas · 2026-05-04T22:47:06Z

This pull request makes significant improvements to the Redpanda SQL documentation, focusing on restructuring and clarifying key concepts, updating navigation, and enhancing learning objectives and use case explanations. The most important changes are summarized below.

Documentation Restructuring and Navigation Updates:

The main overview for Redpanda SQL has been rewritten and moved to a new file, overview.adoc, which now serves as the entry point for understanding Redpanda SQL, its architecture, and use cases. The previous overview file, what-is-redpanda-sql.adoc, has been deleted, and navigation links have been updated accordingly. [1] [2] [3]

Content and Conceptual Enhancements:

The new overview provides a detailed explanation of Redpanda SQL’s architecture, supported workloads, query patterns, and technical differentiators, including vectorized execution, columnar storage, decoupled storage/compute, and optimized data transfer.
The oltp-vs-olap.adoc page has been updated to clarify the distinction between OLTP and OLAP in the context of streaming data, and now includes explicit learning objectives and personas. [1] [2]

Reference and Comparison Improvements:

The redpanda-sql-vs-postgresql.adoc page has been enhanced to clarify its purpose as a reference, add learning objectives, and include a TODO for further engineering review of compatibility differences. The section on error handling differences has also been clarified. [1] [2]

Catalogs and Querying Workflow Clarification:

The redpanda-catalogs.adoc page has been rewritten to clarify the Redpanda catalog model, its components, and typical usage, including examples and learning objectives. The page topic type is now set to "concept" and personas are specified.

References:
[1] [2] [3] [4] [5] [6] [7] [8]

Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 19 May

Page previews

Redpanda SQL > Get Started > Redpanda SQL Overview
Redpanda SQL > Get Started > Redpanda SQL Overview > OLTP vs OLAP
Redpanda SQL > Get Started > Redpanda SQL Overview > Redpanda SQL vs PostgreSQL
Redpanda SQL > Query Data > Redpanda Catalogs

Checks

New feature
Content gap
Support Follow-up
Small fix (typos, links, copyedits, etc)

netlify · 2026-05-04T22:47:11Z

✅ Deploy Preview for rp-cloud ready!

Name	Link
🔨 Latest commit	`d278b7f`
🔍 Latest deploy log	https://app.netlify.com/projects/rp-cloud/deploys/6a0bd9263111020008a9bf5f
😎 Deploy Preview	https://deploy-preview-573--rp-cloud.netlify.app

coderabbitai · 2026-05-04T22:47:14Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0e53aa4-43b3-4d5a-a5c0-802e5206fdb5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


kbatuigas · 2026-05-12T16:58:27Z

Removed the tables under Functions and Mathematical operators as they didn't seem to describe any actual differences from PostgreSQL, and so may not be worth keeping. Are there any actual known differences w.r.t. functions and operators (other than the one with JSON)?

PeterCorless · 2026-05-14T20:30:54Z

Feedback for: https://deploy-preview-573--rp-cloud.netlify.app/redpanda-cloud/sql/get-started/redpanda-sql-vs-postgresql/

Currently:

Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL.

Suggested edit to above paragraph:

Redpanda SQL aims for close compatibility with PostgreSQL semantics, yet differs significantly in design and function. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL.

For example, PostgreSQL is an online transactional processing (OLTP) database by default, whereas Redpanda SQL is an online analytical processing (OLAP) query engine.

Many transaction processing functions for PostgreSQL are not available in Redpanda SQL, such as the ability to write or upsert data directly.

Instead, Redpanda SQL relies on data being written to Apache Kafka-compatible topics in Redpanda Streaming. Redpanda SQL can then query topics in local storage (a "hot storage" tier), as well as Apache Iceberg-compatible tables written to object storage ("cold storage"). Redpanda SQL performs a federated query that uses the topics as a row store and the Iceberg tables as a column store, performing a seamless, deduplicated join across both.
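The "deduplicated" part of this description can be illustrated with a toy Python model. It assumes that a record in a single topic is identified by its (partition, offset) pair, and that the Iceberg table can still hold offsets that are also retained in the topic. `Record` and `federated_scan` are hypothetical names; this sketches the idea of a deduplicated union across the two tiers, not how Redpanda SQL actually executes queries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape: one message from a single topic."""
    partition: int
    offset: int
    value: str


def federated_scan(hot: Iterable[Record], cold: Iterable[Record]) -> List[Record]:
    """Combine hot-tier (topic) and cold-tier (Iceberg) records.

    A (partition, offset) pair present in both tiers is returned once.
    """
    merged: Dict[Tuple[int, int], Record] = {}
    # Cold tier first: in this model, Iceberg rows hold the older history.
    for rec in cold:
        merged[(rec.partition, rec.offset)] = rec
    # Hot tier second: skip any offset the cold tier already supplied.
    for rec in hot:
        merged.setdefault((rec.partition, rec.offset), rec)
    # Return rows in log order for readability.
    return sorted(merged.values(), key=lambda r: (r.partition, r.offset))


# Offset 2 exists in both tiers (an assumed overlap between topic retention and Iceberg).
cold = [Record(0, 0, "a"), Record(0, 1, "b"), Record(0, 2, "c")]
hot = [Record(0, 2, "c"), Record(0, 3, "d")]

rows = federated_scan(hot, cold)
print([r.offset for r in rows])  # [0, 1, 2, 3]
```

Keying on (partition, offset) is what makes the overlap harmless in this model: whichever tier supplies a record, it appears exactly once in the result.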

Another key thing to note: Redpanda SQL, while semantically compatible, is not code-compatible with PostgreSQL. It cannot use common PostgreSQL extensions such as pgvector, PostGIS, or pg_cron.

… and catalogs

Tightens the PostgreSQL framing in the overview (compatible query engine implementing the Postgres wire protocol and a Postgres-based dialect, not a full Postgres database). Aligns Iceberg references with the v1 product scope: only Iceberg tables created from Iceberg-enabled Redpanda topics are queryable; no external Iceberg lakehouses or REST catalogs. Collapses the overview's "Query Iceberg tables" and "Bridge queries" sections into "Query Iceberg topics".

Rewrites the Redpanda Catalogs page with the named-collection-of-source-data framing, leads with default_redpanda_connection auto-creation, and adds a storage > catalog > tables hierarchy. Replaces the prior CREATE-flow walkthrough with a smaller demo using default_redpanda_connection.

Per PM SME 2026-05-07.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mmaslankaprv · 2026-05-19T07:41:03Z

+:learning-objective-2: Identify the query patterns Redpanda SQL supports
+:learning-objective-3: Describe the architectural characteristics that enable those patterns
+
+Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.


@grzebiel, do we fully support JDBC?

We do, AFAIK, see https://redpandadata.atlassian.net/browse/OXLA-6135

mattschumpert · 2026-05-19T22:12:34Z

 |Quick results for frequently accessed data
 |Consistently fast response to requests

 |Audience


@kbatuigas The rest of this page looks good, but this one feels really weird to me. What does 'market-oriented information' even mean?

Is there a better way to rephrase this, @adam-szymanski @ndrsbl?

mattschumpert · 2026-05-19T22:19:26Z

@@ -0,0 +1,105 @@
+= Redpanda SQL Overview
+:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Redpanda topics with PostgreSQL syntax.


"live topics and their history in Iceberg with PostgreSQL syntax"

mattschumpert · 2026-05-19T22:22:19Z

+:learning-objective-2: Identify the query patterns Redpanda SQL supports
+:learning-objective-3: Describe the architectural characteristics that enable those patterns
+
+Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.


'translation' is an internal/low-level implementation detail.

"Redpanda SQL turns your live Redpanda topics and their history in Apache Iceberg into queryable SQL tables inside your Redpanda Bring Your Own Cloud (BYOC) cluster"

mattschumpert · 2026-05-19T22:36:41Z

+
+Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.
+
+Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems.


"You can power real-time and agentic AI applications, create business intelligence (BI) dashboards, run time-series analytics, and perform exploratory queries over large datasets without moving data, switching tools, learning new APIs or SQL dialects, or maintaining separate systems to house real-time streams and analytic lakehouse data."

mattschumpert · 2026-05-19T22:38:33Z

+
+== Why use Redpanda SQL
+
+Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place.


"Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, managing connector fleets, copying data between systems, and running multiple resource-intensive analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place."

mattschumpert · 2026-05-19T23:11:53Z

+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}

 NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as `INSERT`, `UPDATE`, and `DELETE` are not available. Data is ingested into Redpanda topics and made queryable through catalog mappings.


Isn't "Data Manipulation Language (DML) operations" the more correct term to use here, @adam-szymanski?

mattschumpert · 2026-05-19T23:12:28Z

-  schema_registry_url = 'http://schema-registry:8081'
-);
----
+The Redpanda catalog model has three components, in hierarchy order:


'hierarchical order'?

mattschumpert · 2026-05-19T23:13:33Z

----
+The Redpanda catalog model has three components, in hierarchy order:
+
+* Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage.


'object storage bucket/container'

mattschumpert · 2026-05-19T23:13:49Z

+The Redpanda catalog model has three components, in hierarchy order:
+
+* Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage.
+* Catalog: A named collection of source data, typically your Redpanda cluster.


"* Catalog: A named collection of source data, typically containing your Redpanda cluster's topics"

mattschumpert · 2026-05-19T23:20:41Z

 [source,sql]
 ----
-CREATE TABLE production_redpanda=>user_events
+CREATE TABLE default_redpanda_catalog=>user_events


@kbatuigas I think we also want to explain the layered catalog model here.

Meaning something like this:

"To query topics with history in Apache Iceberg (Iceberg topics), a Redpanda catalog is created USING an existing Iceberg catalog.

<>

In Redpanda BYOC, both catalogs are pre-created for the BYOC cluster, allowing you to immediately query Iceberg topics in the local cluster with a simple `CREATE TABLE` statement."

I say this because they need to understand that this single catalog is a 'layered catalog' (maybe we should use that term) and not a vanilla Redpanda catalog, at least in BYOC. Also, if they run DESCRIBE on the catalog, I think they will see Iceberg information showing that the catalog was created USING the Iceberg catalog (automatically, but the fact is still visible). Without this explanation, it could be confusing.

kbatuigas force-pushed the DOC-2049-redpanda-sql-introduction-and-overview branch 2 times, most recently from 3597669 to 908c8b1 · May 11, 2026 19:54

kbatuigas marked this pull request as ready for review May 11, 2026 23:16

kbatuigas requested a review from a team as a code owner May 11, 2026 23:16

kbatuigas requested a review from takidau May 11, 2026 23:21

kbatuigas commented May 12, 2026


kbatuigas requested review from mattschumpert and wkozlowski-oxla May 12, 2026 17:00

kbatuigas force-pushed the rp-sql branch from 248d62d to 3c582b2 · May 13, 2026 18:10

kbatuigas force-pushed the DOC-2049-redpanda-sql-introduction-and-overview branch from d341c3b to 60a4bec · May 13, 2026 18:46

kbatuigas force-pushed the rp-sql branch from e051360 to 1b9d587 · May 19, 2026 03:26

kbatuigas and others added 14 commits May 18, 2026 20:27

Draft SQL overview rewrite · f91f794
Add TODO to flesh out sql v pg · 7713887
Move why RP SQL up · a775551
Minor edits · 61c70ce
Review pass · f4cf4c6
Change to default_redpanda_catalog · e5783f5
Tweak overview learning objectives · a571e55
Review pass · 6337e50
Intro rephrase · afd0929
Remove tables not describing meaningful differences with Postgres · c6ac004
Clarify Iceberg benefit of querying data outside of topic retention · 03b6e82
Minor edit · 2429016
Apply suggestions from SME feedback · d278b7f

kbatuigas force-pushed the DOC-2049-redpanda-sql-introduction-and-overview branch from b65e870 to d278b7f · May 19, 2026 03:29

mmaslankaprv reviewed May 19, 2026


mattschumpert approved these changes May 19, 2026

5 participants
