Arrow & Parquet
2.0.0SeaORM derives an Apache Arrow schema directly from your entity definition. This bridges your ORM layer with the columnar data ecosystem: Parquet, DataFusion, Polars, DuckDB, and others.
For a detailed walkthrough, see the blog post.
Setupβ
Enable Arrow support with the with-arrow feature flag:
[dependencies]
sea-orm = { version = "2.0.0-rc", features = ["with-arrow"] }
parquet = { version = "57", features = ["arrow"] }
Deriving the Arrow Schemaβ
Add arrow_schema to the #[sea_orm(..)] attribute:
use sea_orm::entity::prelude::*;
#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)]
pub struct Model {
#[sea_orm(primary_key)]
pub id: i32,
pub recorded_at: ChronoDateTimeUtc,
pub sensor_id: i32,
pub temperature: f64,
#[sea_orm(column_type = "Decimal(Some((10, 4)))")]
pub voltage: Decimal,
}
For compact entities, use DeriveArrowSchema as an extra derive:
#[derive(DeriveEntityModel, DeriveArrowSchema, ..)]
#[sea_orm(table_name = "measurement")]
pub struct Model { .. }
This derives the ArrowSchema trait, exposing three methods:
use sea_orm::ArrowSchema;
let schema = measurement::Entity::arrow_schema();
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;
let models = measurement::ActiveModel::from_arrow(&batch)?;
Exporting to Parquetβ
Convert ActiveModels into a RecordBatch, then write with the parquet crate:
use sea_orm::ArrowSchema;
let schema = measurement::Entity::arrow_schema();
let models: Vec<measurement::ActiveModel> = vec![..];
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;
let file = std::fs::File::create("measurements.parquet")?;
let mut writer = parquet::arrow::ArrowWriter::try_new(file, schema.into(), None)?;
writer.write(&batch)?;
writer.close()?;
The resulting file is readable by any Parquet-compatible tool.
Importing from Parquetβ
Read a Parquet file back into ActiveModels and insert into any SeaORM-supported database:
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
let file = std::fs::File::open("measurements.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
let batches: Vec<_> = reader.collect::<Result<_, _>>()?;
let restored = measurement::ActiveModel::from_arrow(&batches[0])?;
measurement::Entity::insert_many(restored).exec(&db).await?;
Arrow nulls become Set(None), absent columns become NotSet.
Type Mappingβ
| Rust Type | SeaORM Column Type | Arrow Type |
|---|---|---|
i8 | TinyInteger | Int8 |
i16 | SmallInteger | Int16 |
i32 | Integer | Int32 |
i64 | BigInteger | Int64 |
u8βu64 | TinyUnsignedβBigUnsigned | UInt8βUInt64 |
f32 | Float | Float32 |
f64 | Double | Float64 |
bool | Boolean | Boolean |
String | Char, String(N) | Utf8 |
String | Text | LargeUtf8 |
Vec<u8> | Binary | Binary |
Decimal | Decimal(Some((p, s))) | Decimal128(p, s) |
Json | Json, JsonBinary | Utf8 |
Uuid | Uuid | Binary |
NaiveDate | Date | Date32 |
NaiveTime | Time | Time64(Microsecond) |
NaiveDateTime | DateTime | Timestamp(Microsecond, None) |
DateTime<Utc> | TimestampWithTimeZone | Timestamp(Microsecond, Some("UTC")) |
Overriding Timestamp and Decimal Mappingβ
Override the timestamp resolution or timezone per-field:
#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, Eq, DeriveEntityModel)]
#[sea_orm(table_name = "event", arrow_schema)]
pub struct Model {
#[sea_orm(primary_key)]
pub id: i32,
#[sea_orm(column_type = "DateTime", arrow_timestamp_unit = "Nanosecond")]
pub nano_ts: ChronoDateTime,
#[sea_orm(
column_type = "DateTime",
arrow_timestamp_unit = "Nanosecond",
arrow_timezone = "America/New_York"
)]
pub nano_with_tz: ChronoDateTime,
}
Valid values for arrow_timestamp_unit: "Second", "Millisecond", "Microsecond", "Nanosecond".
Override decimal precision and scale per-field:
#[sea_orm(
column_type = "Decimal(Some((20, 4)))",
arrow_precision = 20,
arrow_scale = 4
)]
pub amount: Decimal,
Full Exampleβ
A complete working example (generate data β write Parquet β roundtrip β insert into SQLite) is available in the SeaORM repository.