porter module v0.21.0
Published: Mar 30, 2026 License: MIT


Porter

A streaming-first Arrow server for DuckDB — Flight SQL and WebSocket, simple and built for motion.


🧭 Overview

Porter is a DuckDB-backed Arrow server with two transport protocols:

  • Flight SQL — gRPC-based Arrow Flight SQL
  • WebSocket — HTTP-based Arrow streaming

SQL goes in. Arrow streams out. Everything else is detail.

Both transports share the same execution engine, ensuring identical query semantics.


Summary Benchmark Results

Metric        WebSocket      FlightSQL (gRPC)
Ops           12             12
Success       12             12
Errors        0              0
Rows/sec      130,712,427    121,704,008
Throughput    1014.32 MB/s   928.53 MB/s
Latency p50   26 ms          17 ms
Latency p95   41 ms          60 ms
Latency p99   41 ms          60 ms

See the Benchmark Report for details.


⚡ Key Characteristics

  • Streaming-first execution model (Arrow RecordBatch streams)
  • Dual transport support: Flight SQL + WebSocket
  • Bulk Ingest — Arrow RecordBatch → DuckDB with transactional semantics
  • Shared execution engine for semantic parity
  • Native DuckDB execution via ADBC
  • Full prepared statement lifecycle with parameter binding
  • TTL-based handle management with background GC
  • Live status surface with pipeline flow, pressure, and backpressure visibility

πŸ—οΈ Architecture

           +-------------------+
           |   Flight Client   |  <-- ADBC / Flight SQL
           +-------------------+
                     |
               gRPC / Flight
                     |
           +-------------------+
           |   Porter Server   |
           |-------------------|
           | Shared Engine     |  <-- BuildStream()
           +-------------------+
                     |
           +-------------------+
           |      DuckDB       |
           |    (via ADBC)     |
           +-------------------+
                     |
          +---------------------+
          | Arrow RecordBatches |
          +---------------------+

The server is intentionally thin: routing, lifecycle, and streaming glue only. DuckDB does the heavy lifting.


🚀 Getting Started

You have three ways to run Porter:

  • Docker (fastest path)
  • go install (clean local toolchain)
  • Build from source (full control)

🐳 Option 1 — Run with Docker

docker build -t porter .
docker run -p 32010:32010 -p 8080:8080 porter --ws

Run with a persistent database:

docker run -p 32010:32010 -p 8080:8080 -v $(pwd)/data:/data porter --db /data/porter.duckdb --ws

Defaults:

  • Flight SQL: 0.0.0.0:32010
  • WebSocket: 0.0.0.0:8080 (when --ws is enabled)
  • Status: 0.0.0.0:9091 (enabled by default)
  • Database: in-memory (:memory:)

Prerequisites

Install dbc and required ADBC drivers:

curl -LsSf https://dbc.columnar.tech/install.sh | sh
dbc install duckdb
dbc install flightsql

⚙️ Option 2 — Install via go install

1. Install Porter
go install github.com/TFMV/porter/cmd/porter@latest

This installs the porter binary into your $GOBIN (by default, $GOPATH/bin).


🛠 Option 3 — Build from Source

1. Clone
git clone https://github.com/TFMV/porter.git
cd porter
2. Run
go run ./cmd/porter serve

💻 CLI Usage

porter --help

Quick Start

porter              # Start Flight SQL server on :32010
porter serve        # Same as above

With WebSocket

porter --ws                        # Flight SQL + WebSocket
porter serve --ws                  # Same as above
porter serve --ws --ws-port 9090   # Custom WebSocket port
porter serve --status-port 9191    # Custom status surface port

Full Flags

Flag           Description                  Default
--db           DuckDB file path             :memory:
--port         Flight SQL port              32010
--ws           Enable WebSocket             false
--ws-port      WebSocket port               8080
--status       Enable live status surface   true
--status-port  Status server port           9091

Execute a query

porter query "SELECT 1 AS value"

REPL

porter repl

Load Parquet

porter load data.parquet

Inspect schema

porter schema table_name

Environment variables

  • PORTER_DB
  • PORTER_PORT
  • PORTER_WS
  • PORTER_WS_PORT
  • PORTER_STATUS
  • PORTER_STATUS_PORT

Live Status Surface

Porter now exposes a dedicated status server with a living cross-section of the pipeline:

  • /status — live instrument panel UI
  • /status/live — current JSON snapshot
  • /status/stream — SSE stream of snapshots
  • /status/history — rolling snapshot history
  • /status/health — deterministic health status

The flow view tracks:

  • ingress -> transport -> execution -> egress
  • rows/sec and MB/sec per stage
  • queue depth and pressure buildup
  • p50/p95/p99 latency divergence
  • live structured activity feed
  • WebSocket vs FlightSQL vs ingest path comparison



🌐 Wire Contract

Flight SQL

Operation             Behavior
SQL Query             Raw SQL → FlightInfo → DoGet stream
Prepared Statements   Handle-based execution with binding
Schema Introspection  Lightweight probe execution
ExecuteUpdate         DDL/DML via DoPut with CommandStatementUpdate

WebSocket

Send JSON query request:

{"query": "SELECT * FROM table"}

Receive:

  1. Schema message: {"type": "schema", "fields": ["col1", "col2"]}
  2. Binary IPC frames containing Arrow RecordBatches

📥 Bulk Ingest

Porter supports high-throughput Arrow RecordBatch ingestion via Flight SQL's DoPut:

// Engine interface
IngestStream(ctx, table, reader, opts) (int64, error)

Features:

Feature            Description
Transactional      One stream = one DB transaction
Schema validation  Incoming Arrow schema must match the target table
Backpressure       Configurable MaxUncommittedBytes (default 64MB)
Table locking      Per-table mutex prevents concurrent writes to the same table
Auto-commit        Commits on successful ingest, rolls back on failure

IngestOptions:

Option               Description
Catalog              Target catalog name
DBSchema             Target schema name
Temporary            Create as a temporary table
IngestMode           Append, replace, or create
MaxUncommittedBytes  Memory limit before fail-fast (default 64MB)

Flow:

Client → DoPut (Arrow RecordBatch stream) → Engine.IngestStream → SegmentWriter → Commit → DuckDB

The SegmentWriter accumulates RecordBatches in memory, then atomically publishes them on commit. If MaxUncommittedBytes is exceeded, ingestion fails fast with rollback.
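A minimal sketch of that accumulate-then-commit pattern, tracking byte counts instead of real Arrow RecordBatches (the types and names here are illustrative, not Porter's internals):

```go
package main

import (
	"errors"
	"fmt"
)

// segmentWriter sketches the accumulate-then-publish pattern: batches are
// buffered in memory and only become visible on Commit.
type segmentWriter struct {
	maxUncommitted int64 // fail-fast threshold (Porter defaults to 64MB)
	buffered       int64 // bytes accumulated but not yet committed
	committed      int64 // bytes published so far
}

var errOverLimit = errors.New("uncommitted bytes over limit: rolling back")

// Append accounts for one incoming batch, failing fast past the limit.
func (w *segmentWriter) Append(batchBytes int64) error {
	if w.buffered+batchBytes > w.maxUncommitted {
		w.buffered = 0 // rollback: discard everything buffered in this stream
		return errOverLimit
	}
	w.buffered += batchBytes
	return nil
}

// Commit publishes the buffered bytes and returns how many were committed.
func (w *segmentWriter) Commit() int64 {
	n := w.buffered
	w.committed += n
	w.buffered = 0
	return n
}

func main() {
	w := &segmentWriter{maxUncommitted: 64 << 20}
	_ = w.Append(32 << 20)
	_ = w.Append(16 << 20)
	fmt.Println(w.Commit() >> 20)    // 48 MB published atomically
	fmt.Println(w.Append(100 << 20)) // over the limit: fails fast
}
```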


🌊 Streaming Core

Both transports use the same execution primitive:

BuildStream(ctx, sql, params) (*arrow.Schema, <-chan StreamChunk, error)
DuckDB → Arrow RecordReader → Channel → StreamChunk

Backpressure is enforced naturally via the channel boundary.
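A toy illustration of that boundary: a producer goroutine feeding a small buffered channel blocks whenever the consumer falls behind (StreamChunk here is a stand-in for Porter's actual type):

```go
package main

import "fmt"

// StreamChunk stands in for the per-batch unit BuildStream emits.
type StreamChunk struct {
	Rows int
	Err  error
}

// buildStream mimics the producer side: a bounded channel means the producer
// goroutine blocks as soon as the consumer stops draining, so backpressure
// falls out of the channel boundary with no extra machinery.
func buildStream(batches []int) <-chan StreamChunk {
	out := make(chan StreamChunk, 1) // small buffer: natural backpressure
	go func() {
		defer close(out)
		for _, n := range batches {
			out <- StreamChunk{Rows: n} // blocks while the consumer is busy
		}
	}()
	return out
}

func main() {
	total := 0
	for chunk := range buildStream([]int{1024, 2048, 512}) {
		total += chunk.Rows
	}
	fmt.Println(total) // 3584
}
```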


🛣️ Roadmap

  • Streaming Flight SQL execution
  • WebSocket transport
  • Shared execution engine
  • Bulk Ingest (DoPut)
  • Prepared statements
  • TTL-based lifecycle
  • Background GC
  • Session context
  • Improved schema probing
  • Benchmark suite

🤝 Contributing

If you've ever looked at a data system and thought:

"Why is this so complicated?"

You're in the right place.

Build it smaller. Make it clearer. Keep it moving.

Directories

Path Synopsis
bench
    flight (command)
    transport (command)
    ws (command)
cmd
    client/flight (command)
    client/ws (command)
    porter (command)
execution
    adapter/flightsql: Package server provides a production-grade DuckDB-backed Arrow Flight SQL server.
internal
    testutil
