
Day 1 of Windmill launch week. We are shipping two new storage primitives: Data Tables for relational data with managed SQL, and Ducklake for massive datasets backed by S3.
The problem
Workflow engines typically punt on data storage. You end up managing separate databases, connection strings, credential rotation, and permission models. Your orchestration layer knows how to run code but has no opinion about where results go.
For analytics workloads, the gap is wider. Teams default to managed data warehouses (Snowflake, BigQuery) that charge per query and live entirely outside the orchestration layer. The result: two systems, two permission models, and a lot of glue code to move data between them.
We wanted Windmill users to go from "I have a script" to "I have a script that reads and writes data" without leaving the platform.
Data Tables: managed SQL with one line
Data Tables give you a workspace-scoped PostgreSQL layer where credentials are managed by Windmill. Users write SQL; they never see connection strings.
TypeScript:

```typescript
import * as wmill from 'windmill-client';

export async function main(user_id: string) {
  let sql = wmill.datatable();
  // String interpolation is safe: auto-converted to parameterized queries
  let friend = await sql`SELECT * FROM friend WHERE id = ${user_id}`.fetchOne();
  return friend;
}
```

Python:

```python
import wmill

def main(user_id: str):
    db = wmill.datatable()
    # Positional arguments for safe parameterized queries
    friend = db.query('SELECT * FROM friend WHERE id = $1', user_id).fetch_one()
    return friend
```

DuckDB:

```sql
-- $user_id (bigint)
ATTACH 'datatable' AS dt;
USE dt;
SELECT * FROM friend WHERE id = $user_id;
```
That's it. One import, one function call, standard SQL. TypeScript uses tagged template literals that are automatically converted to parameterized queries, so string interpolation is safe by default. Python uses positional arguments ($1, $2). DuckDB uses the native ATTACH syntax.
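To see why interpolation is safe, note what a tagged template actually receives: the literal string parts and the interpolated values arrive separately, so the values never get spliced into the SQL text. Here is a minimal sketch of the idea — illustrative only, not Windmill's actual implementation:

```typescript
// Minimal sketch: a tag function that turns interpolations into $1, $2, ...
// placeholders and keeps the values in a separate params array.
function sql(strings: TemplateStringsArray, ...values: unknown[]) {
  const text = strings.reduce(
    (acc, part, i) => acc + part + (i < values.length ? `$${i + 1}` : ''),
    ''
  );
  return { text, params: values };
}

const userId = '42; DROP TABLE friend';
const q = sql`SELECT * FROM friend WHERE id = ${userId}`;
// q.text contains only "$1"; the malicious input travels in q.params,
// where the database treats it as an inert value, not as SQL.
```

Because the query text is fixed at the call site and the values are bound positionally, the injection attempt above never reaches the SQL parser.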
Why PostgreSQL
We chose PostgreSQL because:
- It is the most widely understood SQL dialect. No new query language to learn.
- Battle-tested ACID guarantees out of the box.
- DuckDB can attach to Postgres natively, so Data Tables and Ducklake share the same query surface.
Organizing with schemas
We recommend using one or a few Data Tables per workspace and organizing data with schemas:
```typescript
let sql = wmill.datatable(':analytics');
await sql`SELECT * FROM events`; // refers to analytics.events
```
This keeps things clean without spinning up separate databases for every project.
Why we built it this way
Three design choices drove the architecture:
Workspace scoping. Data Tables are scoped to a workspace. All members can read and write. This removes the need for database-level user management while keeping workspaces isolated from each other.
Credential opacity. Users never see or manage database connection strings. Windmill handles credentials internally. This eliminates a whole class of credential-rotation bugs and accidental leaks.
Bring your own Postgres. You attach your own Postgres resource to the Data Table, so you keep full control over database hosting while the API stays the same.
Asset tracking and data lineage
When you reference a Data Table in a script, Windmill automatically parses your code and detects which tables you read from and write to.
Assets appear as nodes in flows, giving you a visual data dependency graph. Click any asset node to open it in the Database Studio and inspect the data directly.
Ducklake: S3-backed data lakehouse
Data Tables are great for transactional data. But some workloads produce millions of rows that do not belong in a relational database. For those, we built Ducklake support directly into Windmill.
Ducklake stores data as Parquet files in S3 and keeps a metadata catalog in Postgres. You query it with standard SQL through DuckDB.
The API follows the same pattern as Data Tables:
TypeScript:

```typescript
import * as wmill from 'windmill-client';

export async function main(user_id: string) {
  let sql = wmill.ducklake();
  let friend = await sql`SELECT * FROM friend WHERE id = ${user_id}`.fetchOne();
  return friend;
}
```

Python:

```python
import wmill

def main(user_id: str):
    dl = wmill.ducklake()
    friend = dl.query('SELECT * FROM friend WHERE id = $id', id=user_id).fetch_one()
    return friend
```

DuckDB:

```sql
-- $user_id (bigint)
ATTACH 'ducklake' AS dl;
USE dl;
SELECT * FROM friend WHERE id = $user_id;
```
Real-world example: sentiment analysis pipeline
Here is a DuckDB script that receives analyzed messages (e.g., from an LLM sentiment analysis step in a flow) and inserts them into a Ducklake table. Each insert creates a new Parquet file in S3 and updates the catalog metadata.
```sql
-- $messages (json[])
ATTACH 'ducklake://main' AS dl;
USE dl;

CREATE TABLE IF NOT EXISTS messages (
  content STRING NOT NULL,
  author STRING NOT NULL,
  date STRING NOT NULL,
  sentiment STRING
);

CREATE TEMP TABLE new_messages AS
SELECT
  value->>'content' AS content,
  value->>'author' AS author,
  value->>'date' AS date,
  value->>'sentiment' AS sentiment
FROM json_each($messages);

INSERT INTO messages
SELECT * FROM new_messages;
```
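The `$messages` argument is just a JSON array of objects with `content`, `author`, `date`, and `sentiment` keys. As a hedged sketch, an upstream transform step in the flow might shape its output like this — the input field names (`text`, `user`, `ts`) are hypothetical, chosen only for illustration:

```typescript
// Hypothetical upstream flow step: shape analyzed messages into the JSON
// array the DuckDB step expects as its $messages argument.
type AnalyzedMessage = {
  content: string;
  author: string;
  date: string; // kept as a string; the table column is STRING
  sentiment: string | null;
};

function main(
  raw: { text: string; user: string; ts: string }[],
  sentiments: (string | null)[]
): AnalyzedMessage[] {
  // Pair each raw message with its sentiment label by position.
  return raw.map((m, i) => ({
    content: m.text,
    author: m.user,
    date: m.ts,
    sentiment: sentiments[i] ?? null,
  }));
}

const messages = main(
  [{ text: 'Love the new release!', user: 'ada', ts: '2024-05-01T10:00:00Z' }],
  ['positive']
);
```

In a flow, the result of a step like this would be passed directly as the `$messages` input of the DuckDB step above.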
Under the hood, TypeScript and Python integrations run DuckDB inline within the same worker. No separate job is spawned, so the overhead is minimal.
Why Ducklake over a data warehouse
| | Ducklake | Managed warehouse |
|---|---|---|
| Storage format | Parquet on your S3 | Proprietary |
| Query engine | DuckDB (single-node, in-process) | Managed cluster |
| Catalog | Postgres (already in your stack) | Proprietary |
| Cost model | S3 storage pricing | Per-query compute pricing |
| Lock-in | None: standard Parquet files | High |
Ducklake gives you a data lakehouse without new infrastructure. Your data stays in an open format on storage you control. The catalog metadata lives in the same Postgres that Windmill already uses. And DuckDB handles analytical queries on a single node without cluster management.
Database Studio
Both Data Tables and Ducklake are browsable through the Database Studio, a visual interface for inspecting schemas, editing rows, and running SQL.
Building data pipelines
Data Tables and Ducklake are designed to work with Windmill flows. A typical data pipeline looks like this:
- Extract: a script pulls data from an external source (API, webhook, database).
- Transform: one or more steps clean, enrich, or aggregate the data using Python, TypeScript, or SQL.
- Load: the result is written to a Data Table for operational use or to Ducklake for analytical queries.
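The three stages can be sketched as plain functions. Everything below is hypothetical scaffolding: in a real flow, each function would be a standalone Windmill script, `extract` would call an external API, and `load` would write through `wmill.datatable()` or `wmill.ducklake()`:

```typescript
// Hypothetical extract-transform-load sketch; each function stands in for
// one flow step, with Windmill passing results between them.
type Event = { id: number; amount: number };

// Extract: stands in for an API or database pull.
function extract(): Event[] {
  return [
    { id: 1, amount: 10 },
    { id: 2, amount: 32 },
  ];
}

// Transform: aggregate the raw events.
function transform(events: Event[]): { total: number; count: number } {
  return {
    total: events.reduce((sum, e) => sum + e.amount, 0),
    count: events.length,
  };
}

// Load: stands in for a write to a Data Table or Ducklake.
function load(summary: { total: number; count: number }): string {
  return `loaded ${summary.count} rows, total ${summary.total}`;
}

const result = load(transform(extract()));
```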
Because each step is a standalone script, you can mix languages freely. For example, fetch data with TypeScript, run a sentiment analysis in Python, and insert the results with a DuckDB query into Ducklake. Windmill handles the orchestration, retries, error handling, and data lineage tracking.
You can also schedule pipelines with cron triggers, react to events with webhooks, or chain them with other flows.
Getting started
Data Tables:
- Go to workspace settings, then Data Tables.
- Add your own Postgres resource.
- Use `wmill.datatable()` in any script.
Ducklake:
- Configure a workspace S3 storage.
- Go to workspace settings, then Object storage and set up a Ducklake.
- Use `wmill.ducklake()` in any script.
What's next
Tomorrow is Day 2: Full code apps. Build complete applications with React and Svelte frontends connected to Windmill backend logic. Follow along.
You can self-host Windmill with a `docker compose up`, or go with the cloud app.
