SQL overview #573
3597669 to 908c8b1
Removed the tables under Functions and Mathematical operators as they didn't seem to describe any actual differences from PostgreSQL, and so may not be worth keeping. Are there any actual known differences w.r.t. functions and operators (other than the one with JSON)?
d341c3b to 60a4bec
Feedback for: https://deploy-preview-573--rp-cloud.netlify.app/redpanda-cloud/sql/get-started/redpanda-sql-vs-postgresql/ Currently:
Suggested edit to above paragraph:
Redpanda SQL aims for close compatibility with PostgreSQL semantics, yet differs significantly in design and function. Use this page to check which features are supported and where Redpanda SQL diverges from PostgreSQL. For example, PostgreSQL is an online transactional processing (OLTP) database by default, whereas Redpanda SQL is an online analytical processing (OLAP) query engine. Many PostgreSQL transaction-processing features are not available in Redpanda SQL, such as the ability to write or upsert data directly. Instead, Redpanda SQL relies on data being written to Apache Kafka-compatible topics in Redpanda Streaming. Redpanda SQL can then query topics in local storage (the "hot storage" tier) as well as Apache Iceberg-compatible tables written to object storage ("cold storage"). Redpanda SQL performs a federated query, using the topics as a row store and the Iceberg tables as a column store, and performs a seamless, deduplicated join across both. Another key point: Redpanda SQL, while semantically compatible, is not code compatible with PostgreSQL. It cannot use common PostgreSQL extensions such as pgvector, PostGIS, or pg_cron.
… and catalogs

Tightens the PostgreSQL framing in the overview (compatible query engine implementing the Postgres wire protocol and a Postgres-based dialect, not a full Postgres database).
Aligns Iceberg references with the v1 product scope: only Iceberg tables created from Iceberg-enabled Redpanda topics are queryable; no external Iceberg lakehouses or REST catalogs.
Collapses the overview's "Query Iceberg tables" and "Bridge queries" sections into "Query Iceberg topics".
Rewrites the Redpanda Catalogs page with the named-collection-of-source-data framing, leads with default_redpanda_connection auto-creation, and adds a storage > catalog > tables hierarchy.
Replaces the prior CREATE-flow walkthrough with a smaller demo using default_redpanda_connection.

Per PM SME 2026-05-07.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b65e870 to d278b7f
| :learning-objective-2: Identify the query patterns Redpanda SQL supports | ||
| :learning-objective-3: Describe the architectural characteristics that enable those patterns | ||
|
| Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. |
We do, AFAIK, see https://redpandadata.atlassian.net/browse/OXLA-6135
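As a rough illustration of what "connect with any PostgreSQL client" in the overview paragraph implies, the sketch below assembles a standard libpq-style connection URI of the kind `psql` and most PostgreSQL drivers accept. The host, port, user, password, and database values are hypothetical placeholders, not values taken from this PR or from the product.

```python
from urllib.parse import quote


def build_postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Return a libpq-style URI that PostgreSQL clients such as psql accept."""
    # Percent-encode credentials so characters like '@' or ':' cannot break the URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Every value here is an assumed placeholder; 5432 is PostgreSQL's default port
# and may not be the port a Redpanda SQL endpoint actually uses.
uri = build_postgres_uri("analyst", "p@ss:word", "sql.example.internal", 5432, "redpanda")
print(uri)
```

The resulting string can be passed to a client as its connection target, for example as the single argument to `psql`.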
| |Quick results for frequently accessed data | ||
| |Consistently fast response to requests | ||
|
| |Audience |
@kbatuigas The rest of this page looks good, but this one feels really weird to me. What does 'market-oriented information' even mean?
Is there a better way to phrase this, @adam-szymanski @ndrsbl?
| @@ -0,0 +1,105 @@ | |||
| = Redpanda SQL Overview | |||
| :description: Redpanda SQL is a column-oriented OLAP query engine in Redpanda Cloud BYOC for querying live and Iceberg-translated Redpanda topics with PostgreSQL syntax. | |||
"live topics and their history in Iceberg with PostgreSQL syntax"
| :learning-objective-2: Identify the query patterns Redpanda SQL supports | ||
| :learning-objective-3: Describe the architectural characteristics that enable those patterns | ||
|
| Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. |
'translation' is an internal/low-level implementation detail.
"Redpanda SQL turns your live Redpanda topics and their history in Apache Iceberg into queryable SQL tables inside your Redpanda Bring Your Own Cloud (BYOC) cluster"
|
| Redpanda SQL turns your Redpanda glossterm:topic[,topics], including their Iceberg-translated history, into queryable SQL surfaces inside your Redpanda Bring Your Own Cloud (BYOC) glossterm:cluster[]. Built as a column-oriented online analytical processing (OLAP) engine, Redpanda SQL runs analytical queries over streaming and historical data without moving or duplicating data. It is a PostgreSQL-compatible query engine that implements the PostgreSQL wire protocol and a PostgreSQL-based SQL dialect, so you can connect with any PostgreSQL client, including `psql`, JDBC, DBeaver, and DataGrip. | ||
|
| Redpanda SQL handles a wide range of analytical workloads in a single system. You can power real-time business intelligence (BI) dashboards, process log data, run time-series analytics, and perform exploratory queries over large datasets without switching tools or maintaining separate systems. |
"You can power real-time and agentic AI applications, create (BI) dashboards, run time-series analytics, and perform exploratory queries over large datasets without moving data, switching tools, learning new APIs or SQL dialects, or maintaining separate systems to house real-time streams and analytic lakehouse data."
|
| == Why use Redpanda SQL | ||
|
| Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, copying data between systems, and running multiple analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place. |
"Querying real-time streaming data alongside historical lakehouse data typically means building ETL pipelines, managing connector fleets, copying data between systems, and running multiple resource-intensive analytical engines. Redpanda SQL eliminates this overhead by querying both live and historical data in place."
| * [ ] {learning-objective-1} | ||
| * [ ] {learning-objective-2} | ||
|
| NOTE: Redpanda SQL operates in read-only mode. Data mutation operations such as `INSERT`, `UPDATE`, and `DELETE` are not available. Data is ingested into Redpanda topics and made queryable through catalog mappings. |
Isn't "Data Manipulation Language (DML) operations" the more correct term to use here, @adam-szymanski?
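To make the read-only note concrete, here is a minimal client-side sketch that flags the three statements the note names (`INSERT`, `UPDATE`, `DELETE`) before they are sent. It is an assumption-laden illustration only: it is not how Redpanda SQL enforces read-only mode, the helper name `is_dml` is hypothetical, and the check is deliberately naive (it looks only at the first keyword and ignores comments and CTEs).

```python
# The three data mutation statements listed in the NOTE under review.
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}


def is_dml(statement: str) -> bool:
    """Return True if the statement's first keyword is INSERT, UPDATE, or DELETE."""
    stripped = statement.lstrip()
    if not stripped:
        return False
    # Compare only the leading keyword, case-insensitively.
    first_word = stripped.split(None, 1)[0].upper()
    return first_word in DML_KEYWORDS


for sql in ("SELECT * FROM user_events", "DELETE FROM user_events"):
    verdict = "not available (read-only)" if is_dml(sql) else "allowed"
    print(f"{sql!r}: {verdict}")
```

A guard like this would only give users an earlier, friendlier error; the engine's own behavior remains the source of truth.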
| schema_registry_url = 'http://schema-registry:8081' | ||
| ); | ||
| ---- | ||
| The Redpanda catalog model has three components, in hierarchy order: |
| ---- | ||
| The Redpanda catalog model has three components, in hierarchy order: | ||
|
| * Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage. |
'object storage bucket/container'
| The Redpanda catalog model has three components, in hierarchy order: | ||
|
| * Storage connection: A named connection to object storage that backs the catalog. The default Redpanda catalog's storage connection is automatically defined using your cluster's object storage. | ||
| * Catalog: A named collection of source data, typically your Redpanda cluster. |
"* Catalog: A named collection of source data, typically containing your Redpanda cluster's topics"
| [source,sql] | ||
| ---- | ||
| CREATE TABLE production_redpanda=>user_events | ||
| CREATE TABLE default_redpanda_catalog=>user_events |
@kbatuigas I think we also want to explain the layered catalog model here.
Meaning something like this:
"To query topics with history in Apache Iceberg (Iceberg topic), a Redpanda catalog is created USING an existing Iceberg catalog
<>
In Redpanda BYOC, both catalogs are pre-created for the BYOC cluster, allowing you to immediately query Iceberg topics in the local cluster with a simple CREATE TABLE statement"
I say this because users need to understand that this single catalog is a 'layered catalog' (maybe we should use that term), and not a vanilla Redpanda catalog, at least in BYOC. If they run DESCRIBE on the catalog, I think they will see Iceberg information showing that the catalog was created USING the Iceberg catalog (albeit automatically, but they can see this fact). Without this explanation, it could be confusing.
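The hierarchy discussed in this thread (storage connection, then catalog, then tables, with the catalog optionally layered on an Iceberg catalog) can be pictured with a small data-model sketch. This is purely illustrative: the class and field names, `cluster_object_storage`, and `example_iceberg_catalog` are hypothetical and do not describe Redpanda SQL's internals. Only `default_redpanda_catalog`, `user_events`, and the `=>` separator come from the diff above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StorageConnection:
    """Named connection to the object storage that backs a catalog."""
    name: str


@dataclass
class Catalog:
    """Named collection of source data, sitting on top of a storage connection."""
    name: str
    storage: StorageConnection
    # Set when the catalog is layered on an existing Iceberg catalog (assumed field).
    using_iceberg_catalog: Optional[str] = None
    tables: List[str] = field(default_factory=list)

    def qualified(self, table: str) -> str:
        # Mirrors the catalog=>table form shown in the CREATE TABLE diff line.
        return f"{self.name}=>{table}"


catalog = Catalog(
    name="default_redpanda_catalog",
    storage=StorageConnection("cluster_object_storage"),  # hypothetical name
    using_iceberg_catalog="example_iceberg_catalog",       # hypothetical name
)
catalog.tables.append("user_events")
print(catalog.qualified("user_events"))
```

Modeling the Iceberg link as an optional field is one way to express the "layered catalog" idea: a vanilla catalog leaves it unset, while the pre-created BYOC catalog would carry it.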
This pull request makes significant improvements to the Redpanda SQL documentation, focusing on restructuring and clarifying key concepts, updating navigation, and enhancing learning objectives and use case explanations. The most important changes are summarized below.

Documentation Restructuring and Navigation Updates:
* overview.adoc, which now serves as the entry point for understanding Redpanda SQL, its architecture, and use cases. The previous overview file, what-is-redpanda-sql.adoc, has been deleted, and navigation links have been updated accordingly. [1] [2] [3]

Content and Conceptual Enhancements:
* oltp-vs-olap.adoc page has been updated to clarify the distinction between OLTP and OLAP in the context of streaming data, and now includes explicit learning objectives and personas. [1] [2]

Reference and Comparison Improvements:
* redpanda-sql-vs-postgresql.adoc page has been enhanced to clarify its purpose as a reference, add learning objectives, and include a TODO for further engineering review of compatibility differences. The section on error handling differences has also been clarified. [1] [2]

Catalogs and Querying Workflow Clarification:
* redpanda-catalogs.adoc page has been rewritten to clarify the Redpanda catalog model, its components, and typical usage, including examples and learning objectives. The page topic type is now set to "concept" and personas are specified.

References:
[1] [2] [3] [4] [5] [6] [7] [8]
Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 19 May
Page previews
Redpanda SQL > Get Started > Redpanda SQL Overview
Redpanda SQL > Get Started > Redpanda SQL Overview > OLTP vs OLAP
Redpanda SQL > Get Started > Redpanda SQL Overview > Redpanda SQL vs PostgreSQL
Redpanda SQL > Query Data > Redpanda Catalogs