
SeaORM now supports Arrow & Parquet

11 min read
SeaQL Team
Chris Tsang

SeaORM 2.0 adds native Apache Arrow and Parquet support. Derive an Arrow schema directly from your SeaORM entity: no redundant schema definitions, no drift.

Motivation

Traditional ORMs are built for OLTP. But Rust backends increasingly need to:

  • Export data snapshots to object storage (S3, GCS)
  • Feed analytical pipelines (DataFusion, Polars, DuckDB)
  • Archive time-series rows efficiently in columnar format
  • Seed or replicate databases from Parquet files

Arrow is the lingua franca of in-memory columnar data, and Parquet is its on-disk counterpart. Both are widely supported across the modern data stack.

The problem: you've already defined your schema as SeaORM entities, and redefining it as an Arrow schema is redundant and error-prone. SeaORM now supports Arrow out of the box!

Getting Started

Enable Arrow support with the with-arrow feature flag:

[dependencies]
sea-orm = { version = "2.0.0-rc", features = ["with-arrow"] }
parquet = { version = "54", features = ["arrow"] }

Suppose you have a sensor data pipeline. You want to archive today's rows to Parquet for downstream analytics.

Arrow Schema Derivation

Add arrow_schema to the #[sea_orm(..)] attribute on your entity:

measurement.rs
use sea_orm::entity::prelude::*;

#[sea_orm::model] // <- new Entity
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)] // <- enable Arrow
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub recorded_at: ChronoDateTimeUtc,
    pub sensor_id: i32,
    pub temperature: f64,
    #[sea_orm(column_type = "Decimal(Some((10, 4)))")]
    pub voltage: Decimal,
}

Or, for the compact entity format:

measurement.rs
#[derive(DeriveEntityModel, DeriveArrowSchema, ..)] // <- extra derive
#[sea_orm(table_name = "measurement")]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    ..
}

This derives the ArrowSchema trait on Entity and ActiveModel, exposing three methods:

use sea_orm::ArrowSchema;

// Get the Arrow Schema matching your entity
let schema = measurement::Entity::arrow_schema();

// Serialize a slice of ActiveModels into an Arrow RecordBatch
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;

// Deserialize an Arrow RecordBatch back into ActiveModels
let models = measurement::ActiveModel::from_arrow(&batch)?;
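Conceptually, to_arrow turns row-oriented values into column-oriented ones: a RecordBatch holds one array per schema field, with nulls tracked per slot. The std-only sketch below illustrates that transposition using a hypothetical Reading struct and plain Vecs standing in for Arrow arrays. It is not how SeaORM implements the conversion.

```rust
/// Hypothetical row type, loosely mirroring the `measurement` entity.
#[derive(Debug, Clone, PartialEq)]
struct Reading {
    id: i32,
    sensor_id: i32,
    temperature: Option<f64>, // nullable column
}

/// Column-oriented layout: one Vec per field, the way a RecordBatch
/// holds one array per schema field. `None` marks a null slot.
#[derive(Debug, Default, PartialEq)]
struct ReadingColumns {
    id: Vec<i32>,
    sensor_id: Vec<i32>,
    temperature: Vec<Option<f64>>,
}

/// Rows -> columns.
fn to_columns(rows: &[Reading]) -> ReadingColumns {
    let mut cols = ReadingColumns::default();
    for r in rows {
        cols.id.push(r.id);
        cols.sensor_id.push(r.sensor_id);
        cols.temperature.push(r.temperature);
    }
    cols
}

/// Columns -> rows. Assumes all columns have the same length, as in a RecordBatch.
fn from_columns(cols: &ReadingColumns) -> Vec<Reading> {
    (0..cols.id.len())
        .map(|i| Reading {
            id: cols.id[i],
            sensor_id: cols.sensor_id[i],
            temperature: cols.temperature[i],
        })
        .collect()
}

fn main() {
    let rows = vec![
        Reading { id: 1, sensor_id: 7, temperature: Some(21.5) },
        Reading { id: 2, sensor_id: 7, temperature: None },
    ];
    let cols = to_columns(&rows);
    assert_eq!(cols.temperature, vec![Some(21.5), None]);
    assert_eq!(from_columns(&cols), rows); // lossless roundtrip
}
```

Deriving both directions from a single field list is what keeps the two layouts from drifting apart. The derive applies the same idea to the Arrow schema.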

Exporting to Parquet

Step 1: convert your ActiveModel slice into a RecordBatch:

use sea_orm::ArrowSchema;

let schema = measurement::Entity::arrow_schema();

let models: Vec<measurement::ActiveModel> = vec![..];
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;

Step 2: write to Parquet using the parquet crate:

let file = std::fs::File::create("measurements.parquet")?;
let mut writer = parquet::arrow::ArrowWriter::try_new(file, schema.into(), None)?;
writer.write(&batch)?; // call write() again for each additional batch
writer.close()?;

The resulting file is readable by any Parquet-compatible tool: DuckDB, Polars, Spark, BigQuery, pandas.
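ArrowWriter accepts repeated write calls before close, so a large export can be streamed in fixed-size chunks instead of building one huge RecordBatch. The sketch below shows only the chunking logic. The write_batch closure is a hypothetical stand-in for calling to_arrow on a chunk and passing the result to writer.write, and the batch size is arbitrary.

```rust
/// Split `rows` into batches of at most `batch_size` and pass each one to `write_batch`.
/// Stops at the first error. Returns the number of batches written.
fn export_in_batches<T, E>(
    rows: &[T],
    batch_size: usize,
    mut write_batch: impl FnMut(&[T]) -> Result<(), E>,
) -> Result<usize, E> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut written = 0;
    for chunk in rows.chunks(batch_size) {
        // In a real export (assumption): build a RecordBatch from `chunk`, then writer.write(&batch)
        write_batch(chunk)?;
        written += 1;
    }
    Ok(written)
}

fn main() {
    let rows: Vec<u32> = (1..=7).collect();
    let mut sizes = Vec::new();
    let n = export_in_batches(&rows, 3, |chunk| -> Result<(), ()> {
        sizes.push(chunk.len());
        Ok(())
    })
    .unwrap();
    assert_eq!(n, 3);
    assert_eq!(sizes, vec![3, 3, 1]); // last batch holds the remainder
}
```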

Importing from Parquet

Read a Parquet file back into ActiveModels and insert into any SeaORM-supported database:

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use sea_orm::ArrowSchema;

let file = std::fs::File::open("measurements.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;

// A Parquet file may yield several batches: convert and insert each one
for batch in reader {
    let restored = measurement::ActiveModel::from_arrow(&batch?)?;
    measurement::Entity::insert_many(restored).exec(&db).await?;
}

from_arrow reconstructs full ActiveModel values: Arrow nulls become Set(None), absent columns become NotSet.
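The distinction matters on insert. In SeaORM, a Set(None) field writes NULL, while a NotSet field is left out of the INSERT, which lets a database default apply. The std-only sketch below models the stated rule with a hypothetical Field enum and a toy column map. It mirrors the described behavior, but it is not SeaORM's ActiveValue type or its implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an ActiveModel field state.
#[derive(Debug, PartialEq)]
enum Field<T> {
    Set(T),
    NotSet,
}

/// Toy "batch": column name -> one value per row, where `None` means an Arrow null.
type Batch = HashMap<&'static str, Vec<Option<f64>>>;

/// Read row `row` of a nullable column:
/// missing column -> NotSet, null slot -> Set(None), value -> Set(Some(v)).
fn read_nullable(batch: &Batch, column: &str, row: usize) -> Field<Option<f64>> {
    match batch.get(column) {
        None => Field::NotSet,
        Some(values) => Field::Set(values[row]),
    }
}

fn main() {
    let mut batch: Batch = HashMap::new();
    batch.insert("temperature", vec![Some(21.5), None]);

    assert_eq!(read_nullable(&batch, "temperature", 0), Field::Set(Some(21.5)));
    assert_eq!(read_nullable(&batch, "temperature", 1), Field::Set(None));
    assert_eq!(read_nullable(&batch, "humidity", 0), Field::NotSet); // column absent
}
```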

Full Example

A complete working example is available in the SeaORM repository under examples/parquet_example. It generates sensor readings, writes them to Parquet, verifies the roundtrip, and then inserts them into SQLite.

As a bonus, you can also use sea-orm-sync and avoid the async runtime entirely if your application is synchronous!

Type Mapping

SeaORM maps Rust/SQL types to Arrow data types as follows:

| Rust Type | SeaORM Column Type | Arrow Type | Notes |
|---|---|---|---|
| i8 | TinyInteger | Int8 | |
| i16 | SmallInteger | Int16 | |
| i32 | Integer | Int32 | |
| i64 | BigInteger | Int64 | |
| u8 | TinyUnsigned | UInt8 | |
| u16 | SmallUnsigned | UInt16 | |
| u32 | Unsigned | UInt32 | |
| u64 | BigUnsigned | UInt64 | |
| f32 | Float | Float32 | |
| f64 | Double | Float64 | |
| bool | Boolean | Boolean | |
| String | Char | Utf8 | |
| String | Text | LargeUtf8 | unbounded strings use LargeUtf8 |
| Vec<u8> | Binary, VarBinary | Binary | |
| Decimal | Decimal(Some((p, s))) | Decimal128(p, s) | precision ≤ 38; use Decimal256 for larger |
| Decimal | Money | Decimal128(19, 4) | default precision/scale |
| Json | Json, JsonBinary | Utf8 | serialized as JSON text |
| Uuid | Uuid | Binary | raw bytes |
| ActiveEnum | Enum | Utf8 | serialized as string |
| NaiveDate | Date | Date32 | days since epoch |
| NaiveTime | Time | Time64(Microsecond) | |
| NaiveDateTime | DateTime, Timestamp | Timestamp(Microsecond, None) | timezone-naive |
| DateTime<Utc> | TimestampWithTimeZone | Timestamp(Microsecond, Some("UTC")) | UTC-annotated |

Key behaviors:

  • String length: String(StringLen::N(n)) with n ≤ 32767 maps to Utf8; Text and unbounded strings map to LargeUtf8.
  • Timestamp resolution: microseconds by default. Override per-field with arrow_timestamp_unit.
  • Timezone annotation: timezone-aware Rust types (DateTime<Utc>, DateTime<FixedOffset>) always produce a Timestamp with timezone. Naive types (NaiveDateTime) produce no annotation. Override with arrow_timezone.
  • Decimal: precision and scale are derived from column_type. If not specified, defaults are Decimal128(38, 10). Override per-field with arrow_precision and arrow_scale.
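The string-width and decimal-default rules above can be written down as a small lookup. The helper below is hypothetical and encodes only the stated rules: the 32767 threshold for Utf8, the Decimal128(38, 10) default, and the 38-digit ceiling of Decimal128. It is not SeaORM's macro code, and the Col enum is invented for the example. Two choices are assumptions because the text does not state them: bounded strings longer than 32767 fall back to LargeUtf8, and precision above 38 switches to Decimal256.

```rust
/// Hypothetical, simplified column-type descriptor (not SeaORM's ColumnType).
enum Col {
    String(Option<u32>),         // Some(n) = String(StringLen::N(n)), None = unbounded
    Text,
    Decimal(Option<(u32, u32)>), // Some((precision, scale)), or None if unspecified
}

/// Arrow type name chosen by the rules described in this section.
fn arrow_type(col: &Col) -> String {
    match col {
        Col::String(Some(n)) if *n <= 32767 => "Utf8".to_string(),
        // Unbounded strings and Text -> LargeUtf8 (n > 32767 landing here is an assumption)
        Col::String(_) | Col::Text => "LargeUtf8".to_string(),
        Col::Decimal(spec) => {
            let (p, s) = spec.unwrap_or((38, 10)); // default when column_type omits it
            if p <= 38 {
                format!("Decimal128({p}, {s})")
            } else {
                format!("Decimal256({p}, {s})") // assumption: automatic switch above 38 digits
            }
        }
    }
}

fn main() {
    assert_eq!(arrow_type(&Col::String(Some(255))), "Utf8");
    assert_eq!(arrow_type(&Col::Text), "LargeUtf8");
    assert_eq!(arrow_type(&Col::Decimal(None)), "Decimal128(38, 10)");
}
```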

Timestamp Types

Timezone and Resolution

Arrow distinguishes timezone-aware and timezone-naive timestamps at the schema level. SeaORM maps them accordingly:

  • ChronoDateTime / NaiveDateTime / PrimitiveDateTime → Timestamp(Microsecond, None): no timezone annotation
  • ChronoDateTimeUtc / DateTime<Utc> / OffsetDateTime → Timestamp(Microsecond, Some("UTC")): UTC-annotated

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "test_chrono", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub created_date: ChronoDate, // -> Date32
    pub created_time: ChronoTime, // -> Time64(Microsecond)
    pub created_at: ChronoDateTime, // -> Timestamp(Microsecond, None)
    pub updated_at: ChronoDateTimeUtc, // -> Timestamp(Microsecond, Some("UTC"))
    pub nullable_ts: Option<ChronoDateTimeUtc>,
}

use sea_orm::ArrowSchema;

let schema = Entity::arrow_schema();
let models = vec![..];

let batch = ActiveModel::to_arrow(&models, &schema)?;
let restored = ActiveModel::from_arrow(&batch)?;

assert_eq!(restored, models);
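Under the hood, these Arrow types are plain integers. Date32 counts days since 1970-01-01, Time64(Microsecond) counts microseconds since midnight, and Timestamp(Microsecond, _) counts microseconds since the epoch. For a timestamp without a timezone, the Arrow format places that epoch at midnight 1970-01-01 in an unspecified local time. So a naive value and a UTC value with the same clock reading are stored as the same integer, and only the schema annotation differs. The std-only sketch below computes these values by hand to make the encoding concrete, using Howard Hinnant's days-from-civil algorithm. The helper names are hypothetical, and the conversion is done for you in practice.

```rust
/// Microseconds in one day (86,400 seconds).
const MICROS_PER_DAY: i64 = 86_400 * 1_000_000;

/// Days since 1970-01-01 for a proleptic Gregorian date (a Date32 value).
/// Howard Hinnant's days_from_civil algorithm.
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    let (m, d) = (month as i64, day as i64);
    let y = if m <= 2 { year - 1 } else { year }; // Jan/Feb count as the end of the previous year
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let mp = if m > 2 { m - 3 } else { m + 9 }; // March = 0 ... February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468 // shift the epoch from 0000-03-01 to 1970-01-01
}

/// Microseconds since midnight (a Time64(Microsecond) value).
fn time_of_day_micros(hour: u32, minute: u32, second: u32, micro: u32) -> i64 {
    ((hour as i64 * 60 + minute as i64) * 60 + second as i64) * 1_000_000 + micro as i64
}

/// Microseconds since the epoch (a Timestamp(Microsecond, _) value).
fn timestamp_micros(days: i64, time_of_day_us: i64) -> i64 {
    days * MICROS_PER_DAY + time_of_day_us
}

fn main() {
    let days = days_from_civil(2000, 1, 1);
    assert_eq!(days, 10_957);
    let t = time_of_day_micros(12, 30, 15, 250_000);
    assert_eq!(t, 45_015_250_000);
    assert_eq!(timestamp_micros(days, t), 946_729_815_250_000);
}
```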

The default resolution is microseconds. Both the time unit and timezone can be overridden per-field using arrow_timestamp_unit and arrow_timezone:

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, Eq, DeriveEntityModel)]
#[sea_orm(table_name = "event", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    #[sea_orm(column_type = "DateTime", arrow_timestamp_unit = "Nanosecond")]
    pub nano_ts: ChronoDateTime, // -> Timestamp(Nanosecond, None)
    #[sea_orm(column_type = "DateTime", arrow_timestamp_unit = "Second")]
    pub second_ts: ChronoDateTime, // -> Timestamp(Second, None)
    #[sea_orm(
        column_type = "DateTime",
        arrow_timestamp_unit = "Nanosecond",
        arrow_timezone = "America/New_York"
    )]
    pub nano_with_tz: ChronoDateTime, // -> Timestamp(Nanosecond, Some("America/New_York"))
}

Valid values for arrow_timestamp_unit: "Second", "Millisecond", "Microsecond", "Nanosecond".
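The unit determines how many ticks per second the stored i64 counts. That also bounds the representable range: nanoseconds in an i64 cover only roughly the years 1677 to 2262. The sketch below encodes one instant, given as whole seconds plus nanoseconds since the epoch, in each unit. Its TimeUnit enum is a local stand-in that only shares variant names with Arrow's. Sub-unit precision is floored and overflow returns None. Whether SeaORM truncates, rounds, or rejects such values is not covered here, so treat both behaviors as assumptions.

```rust
/// Local stand-in for the four arrow_timestamp_unit values.
#[derive(Clone, Copy, Debug)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Ticks per second for each unit.
fn ticks_per_second(unit: TimeUnit) -> i64 {
    match unit {
        TimeUnit::Second => 1,
        TimeUnit::Millisecond => 1_000,
        TimeUnit::Microsecond => 1_000_000,
        TimeUnit::Nanosecond => 1_000_000_000,
    }
}

/// Encode `secs` + `nanos` (0..1_000_000_000) since the epoch as an i64 tick count.
/// Sub-tick precision is floored; returns None if the result overflows i64.
fn encode(secs: i64, nanos: u32, unit: TimeUnit) -> Option<i64> {
    let tps = ticks_per_second(unit);
    let nanos_per_tick = 1_000_000_000 / tps;
    secs.checked_mul(tps)?
        .checked_add(nanos as i64 / nanos_per_tick)
}

fn main() {
    let (s, ns) = (1_700_000_000, 123_456_789);
    assert_eq!(encode(s, ns, TimeUnit::Second), Some(1_700_000_000));
    assert_eq!(encode(s, ns, TimeUnit::Millisecond), Some(1_700_000_000_123));
    assert_eq!(encode(s, ns, TimeUnit::Nanosecond), Some(1_700_000_000_123_456_789));
}
```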

Decimal Types

Each Decimal column is stored as Decimal128 in Arrow, preserving the exact precision and scale declared in column_type. Columns with different precision/scale are handled independently. Values are scaled to fit Arrow's internal i128 representation (value × 10^scale).

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "test_rust_decimal", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    #[sea_orm(column_type = "Decimal(Some((10, 2)))")]
    pub price: Decimal, // -> Decimal128(10, 2)
    #[sea_orm(
        column_type = "Decimal(Some((20, 4)))",
        arrow_precision = 20,
        arrow_scale = 4
    )]
    pub amount: Decimal, // -> Decimal128(20, 4)
}

let price = Decimal::new(1234567, 2); // 12345.67
let amount = Decimal::new(98765432109, 4); // 9876543.2109

let models = vec![
    ActiveModel {
        id: Set(1),
        price: Set(price),
        amount: Set(amount),
    },
];

use arrow::array::{Array, Decimal128Array}; // assumes the arrow crate is a direct dependency
use sea_orm::ArrowSchema;

let schema = Entity::arrow_schema();
let batch = ActiveModel::to_arrow(&models, &schema)?;

// The Arrow column carries the declared precision and scale
let price_arr = batch.column_by_name("price").unwrap()
    .as_any().downcast_ref::<Decimal128Array>().unwrap();
assert_eq!(price_arr.value(0), 1234567); // 12345.67 is stored as the i128 1234567 (12345.67 × 10^2)
assert_eq!(price_arr.precision(), 10);
assert_eq!(price_arr.scale(), 2);

// Full roundtrip
assert_eq!(ActiveModel::from_arrow(&batch)?, models);

BigDecimal is also supported with Decimal256 but not illustrated here.
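The Decimal128 encoding is integer rescaling. A value with mantissa m and scale s is stored as the i128 m × 10^(target_scale − s), and the result must fit within the column's precision. The std-only sketch below implements that arithmetic without rust_decimal, and to_decimal128 is a hypothetical helper. It returns None when rescaling would drop non-zero digits or exceed the precision. That rejection policy is an assumption for the example, not documented SeaORM behavior.

```rust
/// Rescale a decimal given as (mantissa, scale) to `target_scale` and check that it
/// fits in `precision` digits. Returns the i128 that a Decimal128 column would hold.
fn to_decimal128(mantissa: i128, scale: u32, precision: u32, target_scale: u32) -> Option<i128> {
    let stored = if target_scale >= scale {
        // Widen: append zeros, e.g. 12345.67 at scale 4 -> 123456700
        mantissa.checked_mul(10i128.checked_pow(target_scale - scale)?)?
    } else {
        // Narrow: only allowed when the dropped digits are all zero
        let div = 10i128.checked_pow(scale - target_scale)?;
        if mantissa % div != 0 {
            return None;
        }
        mantissa / div
    };
    // |stored| must have at most `precision` digits (10^38 still fits in i128)
    let limit = 10i128.checked_pow(precision)?;
    if stored.unsigned_abs() < limit as u128 {
        Some(stored)
    } else {
        None
    }
}

fn main() {
    // Decimal::new(1234567, 2) into Decimal128(10, 2)
    assert_eq!(to_decimal128(1234567, 2, 10, 2), Some(1234567));
    // The same value into a scale-4 column
    assert_eq!(to_decimal128(1234567, 2, 20, 4), Some(123456700));
    // Too many digits for Decimal128(5, 2)
    assert_eq!(to_decimal128(1234567, 2, 5, 2), None);
}
```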

SeaORM 2.0

SeaORM 2.0 is shaping up to be our most significant release yet, with a few breaking changes, plenty of enhancements, and a clear focus on developer experience.

SeaORM 2.0 has reached its release candidate phase. We'd love for you to try it out and help shape the final release by sharing your feedback.

🌟 Sponsors

Gold Sponsor

QDX pioneers quantum dynamics-powered drug discovery, leveraging AI and supercomputing to accelerate molecular modeling. We're grateful to QDX for sponsoring the development of SeaORM, the SQL toolkit that powers their data-intensive applications.

GitHub Sponsors

If you feel generous, a small donation will be greatly appreciated and will go a long way towards sustaining the organization.

A big shout-out to our GitHub sponsors:

Godwin Effiong
Ryan Swart
OteroRafael
Yuta Hinokuma
wh7f
MS
Numeus
Caido Community
Marcus Buffett
MasakiMiyazaki
KallyDev
Manfred Lee
Afonso Barracha
Dean Sheather

🦀 Rustacean Sticker Pack

The Rustacean Sticker Pack is the perfect way to express your passion for Rust. Our stickers are made with a premium water-resistant vinyl with a unique matte finish.

Sticker Pack Contents:

  • Logos of SeaQL projects: SeaQL, SeaORM, SeaQuery, Seaography
  • Mascots: Ferris the Crab x 3, Terres the Hermit Crab
  • The Rustacean wordmark

Support SeaQL and get a Sticker Pack!
