SQL overview #573

Open
kbatuigas wants to merge 14 commits into rp-sql from DOC-2049-redpanda-sql-introduction-and-overview
@kbatuigas
Contributor

@kbatuigas kbatuigas commented May 4, 2026

This pull request makes significant improvements to the Redpanda SQL documentation, focusing on restructuring and clarifying key concepts, updating navigation, and enhancing learning objectives and use case explanations. The most important changes are summarized below.

Documentation Restructuring and Navigation Updates:

  • The main overview for Redpanda SQL has been rewritten and moved to a new file, overview.adoc, which now serves as the entry point for understanding Redpanda SQL, its architecture, and use cases. The previous overview file, what-is-redpanda-sql.adoc, has been deleted, and navigation links have been updated accordingly. [1] [2] [3]

Content and Conceptual Enhancements:

  • The new overview provides a detailed explanation of Redpanda SQL’s architecture, supported workloads, query patterns, and technical differentiators, including vectorized execution, columnar storage, decoupled storage/compute, and optimized data transfer.
  • The oltp-vs-olap.adoc page has been updated to clarify the distinction between OLTP and OLAP in the context of streaming data, and now includes explicit learning objectives and personas. [1] [2]

Reference and Comparison Improvements:

  • The redpanda-sql-vs-postgresql.adoc page has been enhanced to clarify its purpose as a reference, add learning objectives, and include a TODO for further engineering review of compatibility differences. The section on error handling differences has also been clarified. [1] [2]

Catalogs and Querying Workflow Clarification:

  • The redpanda-catalogs.adoc page has been rewritten to clarify the Redpanda catalog model, its components, and typical usage, including examples and learning objectives. The page topic type is now set to "concept" and personas are specified.

References:
[1] [2] [3] [4] [5] [6] [7] [8]

Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 19 May

Page previews

Redpanda SQL > Get Started > Redpanda SQL Overview
Redpanda SQL > Get Started > Redpanda SQL Overview > OLTP vs OLAP
Redpanda SQL > Get Started > Redpanda SQL Overview > Redpanda SQL vs PostgreSQL
Redpanda SQL > Query Data > Redpanda Catalogs

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@netlify

netlify Bot commented May 4, 2026

Deploy Preview for rp-cloud ready!

Name Link
🔨 Latest commit d278b7f
🔍 Latest deploy log https://app.netlify.com/projects/rp-cloud/deploys/6a0bd9263111020008a9bf5f
😎 Deploy Preview https://deploy-preview-573--rp-cloud.netlify.app

@coderabbitai
Contributor

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0e53aa4-43b3-4d5a-a5c0-802e5206fdb5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@kbatuigas kbatuigas force-pushed the DOC-2049-redpanda-sql-introduction-and-overview branch 2 times, most recently from 3597669 to 908c8b1 Compare May 11, 2026 19:54
@kbatuigas kbatuigas marked this pull request as ready for review May 11, 2026 23:16
@kbatuigas kbatuigas requested a review from a team as a code owner May 11, 2026 23:16
@kbatuigas kbatuigas requested a review from takidau May 11, 2026 23:21
Contributor Author


Removed the tables under Functions and Mathematical operators as they didn't seem to describe any actual differences from PostgreSQL, and so may not be worth keeping. Are there any actual known differences w.r.t. functions and operators (other than the one with JSON)?

@kbatuigas kbatuigas force-pushed the DOC-2049-redpanda-sql-introduction-and-overview branch from d341c3b to 60a4bec Compare May 13, 2026 18:46
@PeterCorless

Feedback for: https://deploy-preview-573--rp-cloud.netlify.app/redpanda-cloud/sql/get-started/redpanda-sql-vs-postgresql/

Currently:

Redpanda SQL aims for close compatibility with PostgreSQL but differs in some functions, operators, and behaviors. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL.

Suggested edit to above paragraph:

Redpanda SQL aims for close compatibility with PostgreSQL semantics, yet differs significantly in design and function. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL.

For example, PostgreSQL is an online transactional processing (OLTP) database by default, whereas Redpanda SQL is an online analytical processing (OLAP) query engine.

Many transaction processing functions for PostgreSQL are not available in Redpanda SQL, such as the ability to write or upsert data directly.

Instead, Redpanda SQL relies on data being written to Apache Kafka-compatible topics in Redpanda Streaming. Redpanda SQL can then query against topics in local storage (for a "hot storage" tier), as well as Apache Iceberg-compatible tables written to object storage (for "cold storage"). Redpanda SQL performs a federated query, using the topics as a row store and the Iceberg tables as a column store, and performing a seamless, deduplicated join across both.

Another key thing to note: Redpanda SQL, while semantically compatible, is not code compatible with PostgreSQL. It cannot use common PostgreSQL plugins such as pgvector, PostGIS, or pg_cron.
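
To make the "deduplicated join across both" idea above concrete, here is a minimal Python sketch. It is not how Redpanda SQL is implemented. It assumes, purely for illustration, that each record is identified by a (partition, offset) pair and that a record already present in the cold tier should not be returned again from the hot tier. The names `Record` and `federated_scan` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """A single topic record. The (partition, offset) key is an assumption."""
    partition: int
    offset: int
    value: str


def federated_scan(cold: Iterable[Record], hot: Iterable[Record]) -> List[Record]:
    """Merge cold-tier and hot-tier records, keeping one copy per key.

    Cold-tier rows are read first, so when a record exists in both tiers
    the cold-tier copy is the one that is kept.
    """
    seen = set()
    merged = []
    for record in list(cold) + list(hot):
        key = (record.partition, record.offset)
        if key in seen:
            continue  # already returned from the other tier
        seen.add(key)
        merged.append(record)
    # Return rows in a stable order for the caller.
    return sorted(merged, key=lambda r: (r.partition, r.offset))
```

In this sketch, a record at offset 2 that appears in both tiers is returned once, which is the behavior the suggested paragraph describes from the user's point of view.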

kbatuigas and others added 14 commits May 18, 2026 20:27
… and catalogs

Tightens the PostgreSQL framing in the overview (compatible query engine
implementing the Postgres wire protocol and a Postgres-based dialect, not
a full Postgres database). Aligns Iceberg references with the v1 product
scope: only Iceberg tables created from Iceberg-enabled Redpanda topics
are queryable; no external Iceberg lakehouses or REST catalogs. Collapses
the overview's "Query Iceberg tables" and "Bridge queries" sections into
"Query Iceberg topics".

Rewrites the Redpanda Catalogs page with the named-collection-of-source-data
framing, leads with default_redpanda_connection auto-creation, and adds a
storage > catalog > tables hierarchy. Replaces the prior CREATE-flow
walkthrough with a smaller demo using default_redpanda_connection.

Per PM SME 2026-05-07.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kbatuigas kbatuigas force-pushed the DOC-2049-redpanda-sql-introduction-and-overview branch from b65e870 to d278b7f Compare May 19, 2026 03:29
:learning-objective-2: Identify the query patterns Redpanda SQL supports
:learning-objective-3: Describe the architectural characteristics that enable those patterns

Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.
Member


@grzebiel do we fully support JDBC?

|Quick results for frequently accessed data
|Consistently fast response to requests

|Audience

@kbatuigas The rest of this page looks good, but this one feels really weird to me. What does 'market-oriented information' even mean?

Is there a better way to rephrase this, @adam-szymanski @ndrsbl?

@@ -0,0 +1,105 @@
= Redpanda SQL Overview
:description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Redpanda topics with PostgreSQL syntax.

"live topics and their history in Iceberg with PostgreSQL syntax"

:learning-objective-2: Identify the query patterns Redpanda SQL supports
:learning-objective-3: Describe the architectural characteristics that enable those patterns

Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.

'translation' is an internal/low-level implementation detail.

"Redpanda SQL turns your live Redpanda topics and their history in Apache Iceberg into queryable SQL tables inside your Redpanda Bring Your Own Cloud (BYOC) cluster"


Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip.

Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems.

"You can power real-time and agentic AI applications, create business intelligence (BI) dashboards, run time-series analytics, and perform exploratory queries over large datasets without moving data, switching tools, learning new APIs or SQL dialects, or maintaining separate systems to house real-time streams and analytic lakehouse data."


== Why use Redpanda SQL

Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place.

"Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, managing connector fleets, copying data between systems, and running multiple resource-intensive analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place."

* [ ] {learning-objective-1}
* [ ] {learning-objective-2}

NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as `INSERT`, `UPDATE`, and `DELETE` are not available. Data is ingested into Redpanda topics and made queryable through catalog mappings.

Isn't "Data Manipulation Language ('DML') operations" the more correct term to use here, @adam-szymanski?
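
For context on the read-only NOTE under discussion, a minimal Python sketch of a client-side check that flags the mutating statements the NOTE names (`INSERT`, `UPDATE`, `DELETE`). The helper `is_mutation` is hypothetical and is not part of Redpanda SQL or any PostgreSQL client. It only inspects the first keyword, so it would miss a mutation that follows a leading `WITH` clause.

```python
import re

# Assumption: the keyword list mirrors the examples given in the NOTE.
MUTATING_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}


def is_mutation(statement: str) -> bool:
    """Return True if the statement's first keyword is a mutating DML verb."""
    match = re.match(r"\s*([A-Za-z]+)", statement)
    return bool(match) and match.group(1).upper() in MUTATING_KEYWORDS
```

A tool that submits user-written SQL could call `is_mutation` before sending the statement and show a clearer message than the engine's error, for example pointing the user to ingest data through a Redpanda topic instead.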

schema_registry_url = 'http://schema-registry:8081'
);
----
The Redpanda catalog model has three components, in hierarchy order:

'hierarchical order'?

----
The Redpanda catalog model has three components, in hierarchy order:

* Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage.

'object storage bucket/container'

The Redpanda catalog model has three components, in hierarchy order:

* Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage.
* Catalog: A named collection of source data, typically your Redpanda cluster.

"* Catalog: A named collection of source data, typically containing your Redpanda cluster's topics"
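
As a reading aid for the storage connection > catalog > tables hierarchy discussed in these comments, a minimal Python sketch of the three levels. The class names and the storage connection name `example_storage_connection` are hypothetical. The catalog name `default_redpanda_catalog`, the table name `user_events`, and the `=>` separator are taken from the diff excerpt in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StorageConnection:
    """Top level: a named connection to the object storage behind a catalog."""
    name: str


@dataclass
class Catalog:
    """Middle level: a named collection of source data."""
    name: str
    storage: StorageConnection
    tables: List[str] = field(default_factory=list)

    def qualified(self, table: str) -> str:
        """Render a table reference as catalog=>table."""
        if table not in self.tables:
            raise KeyError(f"{table} is not defined in catalog {self.name}")
        return f"{self.name}=>{table}"


catalog = Catalog(
    name="default_redpanda_catalog",
    storage=StorageConnection("example_storage_connection"),  # hypothetical name
    tables=["user_events"],
)
print(catalog.qualified("user_events"))  # default_redpanda_catalog=>user_events
```

The sketch only models naming. It does not capture the layered Redpanda-over-Iceberg catalog relationship raised later in the thread.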

[source,sql]
----
CREATE TABLE production_redpanda=>user_events
CREATE TABLE default_redpanda_catalog=>user_events

@kbatuigas I think we also want to explain the layered catalog model here.

Meaning something like this:

"To query topics with history in Apache Iceberg (Iceberg Topic), a Redpanda catalog is created USING an existing Iceberg catalog.

<>

In Redpanda BYOC, both catalogs are pre-created for the BYOC cluster, allowing you to immediately query Iceberg topics in the local cluster with a simple CREATE TABLE statement."

I say this because they need to understand that this single catalog is a 'layered catalog' (maybe we should use that term), and not a vanilla Redpanda catalog, at least in BYOC. And if they run DESCRIBE on the catalog, I think they will see the Iceberg information showing that the catalog was created USING the Iceberg catalog (albeit automatically, but they can see this fact). Without this explanation, it could be confusing.

5 participants