Skip to main content

ClickHouse meets SeaORM: Arrow-powered data pipeline

ยท 5 min read
SeaQL Team
Chris Tsang
SeaORM 2.0 Banner

sea-clickhouse is a ClickHouse client that integrates with the SeaQL ecosystem. It is a soft fork of clickhouse-rs: 100% compatible with all upstream features, with SeaQL's dynamic type and Arrow support added on top. The fork is continually rebased on upstream.

In this blog post we cover:

  • Dynamic rows with try_get: fetch query results without defining any schema struct
  • Arrow RecordBatch streaming: stream query results as RecordBatches and insert them back into ClickHouse
  • SeaORM to ClickHouse: convert SeaORM entities to Arrow and insert into ClickHouse
  • Arrow Schema to ClickHouse DDL: derive CREATE TABLE DDL from an Arrow schema, no hand-written SQL

Quick Installโ€‹

[dependencies]
# a drop-in replacement of clickhouse
sea-clickhouse = { version = "0.14", features = ["arrow", "chrono", "rust_decimal"] }

Ergonomic dynamic column type with try_getโ€‹

The native clickhouse-rs client requires you to define a #[derive(Row)] struct whose field types match the query output exactly:

#[derive(Row, Deserialize)]
struct Reading {
#[serde(with = "special_deserializer")] // or use a custom deserializer?
temperature: f64, // f32? f64? depends on the SQL expression
}

let mut cursor = client.query("SELECT ...").fetch::<Reading>()?;

If the struct says f32 but the server returns Float64, you get a runtime deserialization error. For computed columns, ad-hoc queries, or SELECT * on evolving tables, maintaining these structs is fragile.

sea-clickhouse reads the column types from the response metadata and maps them to sea_query::Value automatically:

let mut cursor = client.query("SELECT 1::UInt32 + 1::Float32 AS value").fetch_rows()?;
let row = cursor.next().await?.expect("one row");

// UInt32 + Float32 -> Float64
assert_eq!(row.try_get::<f64, _>(0)?, 2.0); // by index
assert_eq!(row.try_get::<f32, _>(0)?, 2.0); // narrower type also works
assert_eq!(row.try_get::<Decimal, _>("value")?, 2.into()); // by column name

try_get coerces at runtime: access by index or column name, request the type you need, and it converts where possible.

Arrow RecordBatch Streamingโ€‹

next_arrow_batch(chunk_size) streams query results as Arrow RecordBatches, ready for DataFusion, Polars, Parquet export, or any Arrow consumer.

use clickhouse::Client;
use sea_orm_arrow::arrow::{array::RecordBatch, util::pretty};

let client = Client::default().with_url("http://localhost:18123");

let sql = r#"
SELECT
toUInt64(id) AS id,
toDateTime64(recorded_at, 6) AS recorded_at,
toInt32(sensor_id) AS sensor_id,
toFloat64(temperature) AS temperature,
toDecimal128(voltage, 4) AS voltage
FROM sensor_data
LIMIT 1000
"#;

let mut cursor = client.query(sql).fetch_rows()?;
let mut batches: Vec<RecordBatch> = Vec::new();
while let Some(batch) = cursor.next_arrow_batch(100).await? {
batches.push(batch);
}
pretty::print_batches(&batches).unwrap();
+----+-------------------------+-----------+----------------------+---------+
| id | recorded_at | sensor_id | temperature | voltage |
+----+-------------------------+-----------+----------------------+---------+
| 1 | 2026-01-01T13:35:36.736 | 106 | 36.345616831016436 | 3.2736 |
| 2 | 2026-01-01T10:07:38.458 | 108 | 10.122001773336567 | 3.3458 |
| 3 | 2026-01-01T01:15:18.518 | 108 | 35.21406789966149 | 3.1518 |
| 4 | 2026-01-01T05:36:57.017 | 107 | 22.92828141235666 | 3.2016 |
| 5 | 2026-01-01T13:17:36.056 | 106 | -2.082591477369223 | 3.0056 |
| ... |
+----+-------------------------+-----------+----------------------+---------+

Those batches can be inserted back into ClickHouse directly:

let mut insert = client.insert_arrow("sensor_data_processed", &batches[0].schema()).await?;
for batch in &batches {
insert.write_batch(batch).await?;
}
insert.end().await?;

No #[derive(Row)] macros. The Arrow schema carries the type information end-to-end.

SeaORM to ClickHouseโ€‹

SeaORM started as an OLTP toolkit, but with Arrow and now ClickHouse support, SeaQL bridges data engineering with data science: your OLTP entities become the source of truth for OLAP pipelines too.

use sea_orm::entity::prelude::*;

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)]
pub struct Model {
#[sea_orm(primary_key)]
pub id: i32,
pub recorded_at: ChronoDateTime,
pub sensor_id: i32,
pub temperature: f64,
#[sea_orm(column_type = "Decimal(Some((38, 4)))")]
pub voltage: Decimal,
}

Build models, convert to a RecordBatch, and insert:

use sea_orm::ArrowSchema;

let models: Vec<measurement::ActiveModel> = vec![..];

let schema = measurement::Entity::arrow_schema();
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;

let mut insert = client.insert_arrow("measurement", &schema).await?;
insert.write_batch(&batch).await?;
insert.end().await?;

arrow_schema on the entity derives the Arrow schema at compile time. to_arrow converts a slice of ActiveModels into a RecordBatch. From there, insert_arrow streams the batch into ClickHouse over HTTP. See the full working example.

Arrow Schema to ClickHouse DDLโ€‹

ClickHouseSchema::from_arrow derives a full CREATE TABLE statement from any Arrow schema, and exposes a SeaQuery-like fluent API to configure ClickHouse-specific attributes:

use clickhouse::schema::{ClickHouseSchema, Engine};

let mut schema = ClickHouseSchema::from_arrow(&batch.schema());
schema
.table_name("sensor_readings")
.engine(Engine::ReplacingMergeTree)
.primary_key(["recorded_at", "device"]);
schema.find_column_mut("device").set_low_cardinality(true);

let ddl = schema.to_string();
client.query(&ddl).execute().await?;

Generated DDL:

CREATE TABLE sensor_readings (
id UInt64,
recorded_at DateTime64(6),
device LowCardinality(String),
temperature Nullable(Float64),
voltage Decimal(38, 4)
) ENGINE = ReplacingMergeTree()
PRIMARY KEY (recorded_at, device)

The full workflow: Arrow -> derive DDL -> create table -> insert batches. Zero hand-written ClickHouse SQL. All examples shown here are available as runnable examples in the repository.

๐Ÿฆ€ Rustacean Sticker Packโ€‹

The Rustacean Sticker Pack is the perfect way to express your passion for Rust. Our stickers are made with a premium water-resistant vinyl with a unique matte finish.

Sticker Pack Contents:

  • Logo of SeaQL projects: SeaQL, SeaORM, SeaQuery, Seaography
  • Mascots: Ferris the Crab x 3, Terres the Hermit Crab
  • The Rustacean wordmark

Support SeaQL and get a Sticker Pack!

Rustacean Sticker Pack by SeaQL