Public GCP Data Architecture Baseline: Hybrid Warehouse/Lakehouse with Batch + Streaming
This repository contains architecture guidance, standards, diagram templates, and folder placeholders for GCP data platforms:
- Event-driven ingestion patterns with Cloud Functions + Pub/Sub
- Stream and batch processing patterns with Dataflow (Apache Beam)
- Data organization patterns with Bronze/Silver/Gold conventions on GCS + BigQuery
- Architecture documentation and pattern placeholders for pipeline design decisions
- Alignment with a separate Terraform infrastructure repository for GCP provisioning
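The event-driven ingestion pattern above can be sketched as a small envelope builder: each raw record is wrapped with an event id and ingestion timestamp before being published to Pub/Sub. The field names and topic path below are illustrative assumptions, not conventions defined by this repository.

```python
import uuid
from datetime import datetime, timezone


def build_event_envelope(source: str, payload: dict) -> dict:
    """Wrap a raw source record in a minimal ingestion envelope.

    The event_id can double as an idempotency key downstream.
    """
    return {
        "event_id": str(uuid.uuid4()),
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }


# Inside a Cloud Function, the envelope would then be published with the
# google-cloud-pubsub client (not shown; topic name is hypothetical):
#   publisher.publish("projects/my-project/topics/raw-events",
#                     json.dumps(envelope).encode("utf-8"))
envelope = build_event_envelope("crm", {"customer_id": 42})
```

Keeping the envelope free of source-specific fields makes one Pub/Sub topic reusable across sources, with the `payload` carrying the source schema.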
Scope note:
- This repository does not host production runtime implementations.
- Production runtime code is expected in private runtime repositories.
- Infrastructure provisioning is expected in a separate Terraform repository.
- Production folders exist but are still placeholders: `functions/ingestion/`, `functions/trigger/`, `dataflow/pipelines/`.
- Runtime modules are intentionally not implemented in this public baseline.
- CI currently runs quality gates (lint, type checks via pre-commit, tests).
- This repository is maintained as a public baseline (docs, patterns, templates).
- Concrete production pipelines should live in private runtime repositories per project context.
This project is intentionally hybrid, not pure Kappa or pure Lambda:
- Use warehouse-first patterns when BigQuery native tables are fastest to deliver value.
- Use lakehouse patterns (BigLake + open formats) when interoperability and file-based processing matter.
- Run streaming and batch side by side; choose per source SLA, data shape, and cost profile.
- Treat Data Mesh as an organizational model, not a mandatory runtime pattern for this repo.
- Apply cross-cutting controls: data contracts, schema evolution, DLQ/replay, idempotency, quality gates, observability/SLO, and governance baselines.
See the full decision model in docs/architecture.md.
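Of the cross-cutting controls listed above, idempotency is the easiest to sketch: drop any event whose idempotency key has already been processed. This sketch uses an in-memory set purely for illustration; a real pipeline would back `seen` with a durable store (for example a BigQuery MERGE key or a Firestore/Redis set).

```python
def deduplicate(events, seen=None):
    """Drop events whose idempotency key was already processed.

    `seen` stands in for a durable key store; here it is an
    in-memory set so the sketch stays self-contained.
    """
    seen = set() if seen is None else seen
    unique = []
    for event in events:
        key = event["event_id"]
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

Because Pub/Sub delivers at-least-once, a step like this (or an equivalent MERGE on the idempotency key) is what makes replay from a DLQ safe.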
| Category | Technologies in scope |
|---|---|
| Processing | Cloud Functions, Pub/Sub, Dataflow |
| Storage & query | GCS, BigQuery, BigLake |
| Table formats | Apache Iceberg (primary lakehouse table format), BigQuery native tables |
| File formats | JSON/NDJSON, Avro, Parquet |
| Optional ecosystem | Databricks/Delta interoperability considered when required by source/domain |
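One way the Bronze/Silver/Gold conventions and file formats in scope can combine on GCS is a date-partitioned path scheme per layer. The bucket name, layer-to-format mapping (NDJSON for raw bronze, Parquet for silver/gold), and path shape below are assumptions for illustration, not a layout mandated by this baseline.

```python
from datetime import date

# Assumed mapping: raw bronze as NDJSON, refined layers as Parquet.
LAYER_FORMATS = {"bronze": "ndjson", "silver": "parquet", "gold": "parquet"}


def object_path(bucket, domain, layer, source, run_date, basename):
    """Build a date-partitioned GCS object path for a medallion layer."""
    ext = LAYER_FORMATS[layer]
    return (f"gs://{bucket}/{layer}/{domain}/{source}/"
            f"dt={run_date.isoformat()}/{basename}.{ext}")


path = object_path("acme-data", "sales", "bronze", "crm",
                   date(2024, 5, 1), "part-0001")
# -> gs://acme-data/bronze/sales/crm/dt=2024-05-01/part-0001.ndjson
```

A `dt=` Hive-style partition key keeps the same objects queryable as BigLake external tables without renaming.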
| Directory | Purpose |
|---|---|
| `functions/` | Cloud Function folder structure placeholders (public baseline) |
| `dataflow/` | Beam pipeline folder structure placeholders (public baseline) |
| `shared/common/` | Shared infrastructure utilities (I/O, logging, secrets) |
| `tests/` | Unit/integration tests mirroring source layout |
| `docs/` | Architecture, CI/CD, and engineering guidance |
For medium/large runtimes (for example, ~50 pipelines) in GCP:
- Dataflow: organize by domain -> layer (bronze/silver/gold) -> pipeline module.
- Functions: keep `ingestion/` and `trigger/`, then group by domain and source/event purpose.
- CI/CD: avoid one mega deploy pipeline; use shared CI plus selective CD by changed module in private runtime repos.
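The selective-CD idea can be sketched as a mapping from changed file paths (as produced by `git diff --name-only`) to deployable modules. The directory depths assumed here follow the domain -> layer -> pipeline convention above; treat them as an illustrative assumption, since this repository does not fix the exact layout.

```python
def changed_modules(changed_paths):
    """Map changed file paths to deployable pipeline modules.

    Assumes dataflow/<domain>/<layer>/<pipeline>/... and
    functions/<kind>/<domain>/<function>/... layouts; anything else
    (docs, shared code) is treated as non-deployable for simplicity.
    """
    modules = set()
    for path in changed_paths:
        parts = path.split("/")
        if parts[0] in ("dataflow", "functions") and len(parts) >= 4:
            modules.add("/".join(parts[:4]))
    return sorted(modules)


# Paths as produced by e.g. `git diff --name-only origin/main...HEAD`:
print(changed_modules([
    "dataflow/sales/silver/orders_enrich/pipeline.py",
    "dataflow/sales/silver/orders_enrich/README.md",
    "functions/ingestion/sales/crm_webhook/main.py",
    "docs/architecture.md",
]))
# -> ['dataflow/sales/silver/orders_enrich', 'functions/ingestion/sales/crm_webhook']
```

A CD job can then fan out one deploy per returned module instead of redeploying all ~50 pipelines on every merge.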
Run the local quality gates with `just`:

```
just sync
just pre-commit-install
just lint
just type
just test
```

See details in:

- Architecture Guide
- CI/CD Guide
- Architecture Decisions (ADRs)
- Dataflow Guide
- Cloud Functions Guide
- GCP Project Baseline Guide
- Diagram Catalog
- First E2E Blueprint
- First Runtime Scope (Step 2)
- Step 3 Private Runtime Checklist
MIT License. See LICENSE.