Version: 2.0.x

Arrow & Parquet

Since 2.0.0

SeaORM derives an Apache Arrow schema directly from your entity definition. This bridges your ORM layer with the columnar data ecosystem: Parquet, DataFusion, Polars, DuckDB, and others.

For a detailed walkthrough, see the blog post.

Setup

Enable Arrow support with the with-arrow feature flag:

[dependencies]
sea-orm = { version = "2.0.0-rc", features = ["with-arrow"] }
parquet = { version = "57", features = ["arrow"] }

Deriving the Arrow Schema

Add arrow_schema to the #[sea_orm(..)] attribute:

use sea_orm::entity::prelude::*;

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub recorded_at: ChronoDateTimeUtc,
    pub sensor_id: i32,
    pub temperature: f64,
    #[sea_orm(column_type = "Decimal(Some((10, 4)))")]
    pub voltage: Decimal,
}

For compact entities, use DeriveArrowSchema as an extra derive:

#[derive(DeriveEntityModel, DeriveArrowSchema, ..)]
#[sea_orm(table_name = "measurement")]
pub struct Model { .. }

This derives the ArrowSchema trait, exposing three methods:

use sea_orm::ArrowSchema;

let schema = measurement::Entity::arrow_schema();
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;
let models = measurement::ActiveModel::from_arrow(&batch)?;

Exporting to Parquet

Convert ActiveModels into a RecordBatch, then write with the parquet crate:

use sea_orm::ArrowSchema;

let schema = measurement::Entity::arrow_schema();
let models: Vec<measurement::ActiveModel> = vec![..];
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;

let file = std::fs::File::create("measurements.parquet")?;
let mut writer = parquet::arrow::ArrowWriter::try_new(file, schema.into(), None)?;
writer.write(&batch)?;
writer.close()?;

The resulting file is readable by any Parquet-compatible tool.

Importing from Parquet

Read a Parquet file back into ActiveModels and insert into any SeaORM-supported database:

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;

let file = std::fs::File::open("measurements.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;

let batches: Vec<_> = reader.collect::<Result<_, _>>()?;
let restored = measurement::ActiveModel::from_arrow(&batches[0])?;

measurement::Entity::insert_many(restored).exec(&db).await?;

Arrow nulls become Set(None); columns absent from the RecordBatch become NotSet.
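To illustrate that rule, here is a minimal sketch using a simplified local stand-in for ActiveValue (not SeaORM's actual type, which lives in sea_orm::ActiveValue) to show how one cell of a nullable column is mapped:

```rust
// Simplified stand-in for SeaORM's ActiveValue, for illustration only.
#[derive(Debug, PartialEq)]
enum ActiveValue<T> {
    Set(T),
    NotSet,
}

// Map one cell of a nullable i32 column: `column_present` is whether the
// column exists in the RecordBatch at all; `value` is the cell (None = null).
fn map_cell(column_present: bool, value: Option<i32>) -> ActiveValue<Option<i32>> {
    if column_present {
        ActiveValue::Set(value) // an Arrow null becomes Set(None)
    } else {
        ActiveValue::NotSet // a column missing from the batch becomes NotSet
    }
}

fn main() {
    assert_eq!(map_cell(true, None), ActiveValue::Set(None));
    assert_eq!(map_cell(true, Some(7)), ActiveValue::Set(Some(7)));
    assert_eq!(map_cell(false, None), ActiveValue::NotSet);
    println!("ok");
}
```

The distinction matters on insert: Set(None) writes a SQL NULL, while NotSet lets the database apply its column default.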

Type Mapping

| Rust Type | SeaORM Column Type | Arrow Type |
| --- | --- | --- |
| i8 | TinyInteger | Int8 |
| i16 | SmallInteger | Int16 |
| i32 | Integer | Int32 |
| i64 | BigInteger | Int64 |
| u8–u64 | TinyUnsigned–BigUnsigned | UInt8–UInt64 |
| f32 | Float | Float32 |
| f64 | Double | Float64 |
| bool | Boolean | Boolean |
| String | Char, String(N) | Utf8 |
| String | Text | LargeUtf8 |
| Vec<u8> | Binary | Binary |
| Decimal | Decimal(Some((p, s))) | Decimal128(p, s) |
| Json | Json, JsonBinary | Utf8 |
| Uuid | Uuid | Binary |
| NaiveDate | Date | Date32 |
| NaiveTime | Time | Time64(Microsecond) |
| NaiveDateTime | DateTime | Timestamp(Microsecond, None) |
| DateTime<Utc> | TimestampWithTimeZone | Timestamp(Microsecond, Some("UTC")) |
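As a concrete example of the Decimal128(p, s) row: Arrow stores a decimal as an unscaled i128 integer plus precision/scale metadata, so a voltage of 3.1415 under Decimal(Some((10, 4))) is stored as the integer 31415. A sketch of that representation (illustrative arithmetic only, not SeaORM or arrow-rs code):

```rust
// Illustrative only: Arrow's Decimal128(p, s) stores the unscaled value
// (value * 10^s) as an i128; precision and scale are kept as metadata.
fn to_unscaled(value: f64, scale: u32) -> i128 {
    (value * 10f64.powi(scale as i32)).round() as i128
}

fn from_unscaled(unscaled: i128, scale: u32) -> f64 {
    unscaled as f64 / 10f64.powi(scale as i32)
}

fn main() {
    // A voltage of 3.1415 with scale 4 -> unscaled 31415
    let unscaled = to_unscaled(3.1415, 4);
    assert_eq!(unscaled, 31415);
    assert_eq!(from_unscaled(unscaled, 4), 3.1415);
    println!("ok");
}
```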

Overriding Timestamp and Decimal Mapping

Override the timestamp resolution or timezone per-field:

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, Eq, DeriveEntityModel)]
#[sea_orm(table_name = "event", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    #[sea_orm(column_type = "DateTime", arrow_timestamp_unit = "Nanosecond")]
    pub nano_ts: ChronoDateTime,
    #[sea_orm(
        column_type = "DateTime",
        arrow_timestamp_unit = "Nanosecond",
        arrow_timezone = "America/New_York"
    )]
    pub nano_with_tz: ChronoDateTime,
}

Valid values for arrow_timestamp_unit: "Second", "Millisecond", "Microsecond", "Nanosecond".

Override decimal precision and scale per-field:

#[sea_orm(
    column_type = "Decimal(Some((20, 4)))",
    arrow_precision = 20,
    arrow_scale = 4
)]
pub amount: Decimal,

Full Example

A complete working example (generate data β†’ write Parquet β†’ roundtrip β†’ insert into SQLite) is available in the SeaORM repository.