SeaORM now supports Arrow & Parquet
SeaORM 2.0 adds native Apache Arrow and Parquet support. Derive an Arrow schema directly from your SeaORM entity: no redundant schema definitions, no drift.
Motivation
Traditional ORMs are built for OLTP. But Rust backends increasingly need to:
- Export data snapshots to object storage (S3, GCS)
- Feed analytical pipelines (DataFusion, Polars, DuckDB)
- Archive time-series rows efficiently in columnar format
- Seed or replicate databases from Parquet files
Arrow is the lingua franca of in-memory columnar data. Parquet is its on-disk counterpart. Both are supported by the entire modern data stack.
The problem: you've already defined your schema as SeaORM entities. Redefining it as an Arrow schema is redundant and error-prone. SeaORM now comes with Arrow support out-of-the-box!
Getting Started
Enable Arrow support with the with-arrow feature flag:
[dependencies]
sea-orm = { version = "2.0.0-rc", features = ["with-arrow"] }
parquet = { version = "54", features = ["arrow"] }
Suppose you have a sensor data pipeline. You want to archive today's rows to Parquet for downstream analytics.
Arrow Schema Derivation
Add arrow_schema to the #[sea_orm(..)] attribute on your entity:
use sea_orm::entity::prelude::*;
#[sea_orm::model] // <- new Entity
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)] // <- enable Arrow
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub recorded_at: ChronoDateTimeUtc,
    pub sensor_id: i32,
    pub temperature: f64,
    #[sea_orm(column_type = "Decimal(Some((10, 4)))")]
    pub voltage: Decimal,
}
For the compact entity format, add the DeriveArrowSchema derive instead:
#[derive(DeriveEntityModel, DeriveArrowSchema, ..)] // <- extra derive
#[sea_orm(table_name = "measurement")]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    ..
}
This derives the ArrowSchema trait on Entity and ActiveModel, exposing three methods:
use sea_orm::ArrowSchema;
// Get the Arrow Schema matching your entity
let schema = measurement::Entity::arrow_schema();
// Serialize a slice of ActiveModels into an Arrow RecordBatch
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;
// Deserialize an Arrow RecordBatch back into ActiveModels
let models = measurement::ActiveModel::from_arrow(&batch)?;
Exporting to Parquet
Step 1: convert your ActiveModel slice into a RecordBatch:
use sea_orm::ArrowSchema;
let schema = measurement::Entity::arrow_schema();
let models: Vec<measurement::ActiveModel> = vec![..];
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;
Step 2: write to Parquet using the parquet crate:
let file = std::fs::File::create("measurements.parquet")?;
let mut writer = parquet::arrow::ArrowWriter::try_new(file, schema.into(), None)?;
writer.write(&batch)?; // call write() again for each additional batch
writer.close()?;
The resulting file is readable by any Parquet-compatible tool: DuckDB, Polars, Spark, BigQuery, pandas.
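Conceptually, Step 1 transposes row-oriented models into one array per column, which is the layout Arrow and Parquet store. The following is a minimal std-only sketch of that idea, not SeaORM's implementation: the `Reading` and `ReadingColumns` types and both functions are hypothetical names introduced for illustration, and a nullable field is included only to show how nulls ride along.

```rust
/// Hypothetical row type standing in for a SeaORM model (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Reading {
    pub id: i32,
    pub sensor_id: i32,
    /// Nullable on purpose, to show how missing values are carried per column.
    pub temperature: Option<f64>,
}

/// Column-oriented view of the same data: one vector per field.
#[derive(Debug, Default, PartialEq)]
pub struct ReadingColumns {
    pub id: Vec<i32>,
    pub sensor_id: Vec<i32>,
    pub temperature: Vec<Option<f64>>,
}

/// Transpose a slice of rows into per-column vectors.
pub fn to_columns(rows: &[Reading]) -> ReadingColumns {
    let mut cols = ReadingColumns::default();
    for row in rows {
        cols.id.push(row.id);
        cols.sensor_id.push(row.sensor_id);
        cols.temperature.push(row.temperature);
    }
    cols
}

/// Rebuild rows from columns; assumes all columns have the same length.
pub fn to_rows(cols: &ReadingColumns) -> Vec<Reading> {
    (0..cols.id.len())
        .map(|i| Reading {
            id: cols.id[i],
            sensor_id: cols.sensor_id[i],
            temperature: cols.temperature[i],
        })
        .collect()
}
```

Calling `to_columns` on a slice and then `to_rows` on the result should give back the original rows, which is the same roundtrip property the Arrow conversion is meant to preserve.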
Importing from Parquet
Read a Parquet file back into ActiveModels and insert into any SeaORM-supported database:
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
let file = std::fs::File::open("measurements.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
let batches: Vec<_> = reader.collect::<Result<_, _>>()?;
// from_arrow converts one RecordBatch at a time; only the first batch is restored here
let restored = measurement::ActiveModel::from_arrow(&batches[0])?;
measurement::Entity::insert_many(restored).exec(&db).await?;
from_arrow reconstructs full ActiveModel values: Arrow nulls become Set(None), absent columns become NotSet.
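The null versus absent distinction can be sketched with a simplified stand-in type. This is an illustration under stated assumptions, not SeaORM's code: `FieldState` is a hypothetical two-variant substitute for the real ActiveValue, and `restore_nullable_field` is a made-up helper that models a column as an optional slice of optional cells.

```rust
/// Simplified stand-in for SeaORM's ActiveValue (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum FieldState<T> {
    Set(T),
    NotSet,
}

/// Restore one nullable field for the given row.
///
/// `column` is `None` when the batch has no column for this field;
/// inside a present column, a cell is `None` when the Arrow value is null.
/// Panics if `row` is out of bounds for a present column.
pub fn restore_nullable_field(
    column: Option<&[Option<f64>]>,
    row: usize,
) -> FieldState<Option<f64>> {
    match column {
        // Column absent from the batch: the field stays untouched.
        None => FieldState::NotSet,
        // Column present: a null cell becomes Set(None), a value becomes Set(Some(v)).
        Some(cells) => FieldState::Set(cells[row]),
    }
}
```

The practical consequence is that a NotSet field is left to the database default on insert, while Set(None) writes an explicit NULL.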
Full Example
A complete working example that generates sensor readings, writes them to Parquet, verifies the roundtrip, and then inserts the rows into SQLite is available in the SeaORM repository: examples/parquet_example.
As a bonus, you can also use sea-orm-sync and avoid the async runtime entirely if your application is synchronous!
Type Mapping
SeaORM maps Rust/SQL types to Arrow data types as follows:
| Rust Type | SeaORM Column Type | Arrow Type | Notes |
|---|---|---|---|
| i8 | TinyInteger | Int8 | |
| i16 | SmallInteger | Int16 | |
| i32 | Integer | Int32 | |
| i64 | BigInteger | Int64 | |
| u8 | TinyUnsigned | UInt8 | |
| u16 | SmallUnsigned | UInt16 | |
| u32 | Unsigned | UInt32 | |
| u64 | BigUnsigned | UInt64 | |
| f32 | Float | Float32 | |
| f64 | Double | Float64 | |
| bool | Boolean | Boolean | |
| String | Char | Utf8 | |
| String | Text | LargeUtf8 | unbounded strings use LargeUtf8 |
| Vec<u8> | Binary, VarBinary | Binary | |
| Decimal | Decimal(Some((p, s))) | Decimal128(p, s) | precision ≤ 38; use Decimal256 for larger |
| Decimal | Money | Decimal128(19, 4) | default precision/scale |
| Json | Json, JsonBinary | Utf8 | serialized as JSON text |
| Uuid | Uuid | Binary | raw bytes |
| ActiveEnum | Enum | Utf8 | serialized as string |
| NaiveDate | Date | Date32 | days since epoch |
| NaiveTime | Time | Time64(Microsecond) | |
| NaiveDateTime | DateTime, Timestamp | Timestamp(Microsecond, None) | timezone-naive |
| DateTime<Utc> | TimestampWithTimeZone | Timestamp(Microsecond, Some("UTC")) | UTC-annotated |
Key behaviors:
- String length: String(StringLen::N(n)) with n ≤ 32767 maps to Utf8; Text and unbounded strings map to LargeUtf8.
- Timestamp resolution: microseconds by default. Override per-field with arrow_timestamp_unit.
- Timezone annotation: timezone-aware Rust types (DateTime<Utc>, DateTime<FixedOffset>) always produce a Timestamp with timezone. Naive types (NaiveDateTime) produce no annotation. Override with arrow_timezone.
- Decimal: precision and scale are derived from column_type. If not specified, defaults are Decimal128(38, 10). Override per-field with arrow_precision and arrow_scale.
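The table describes Date32 as days since epoch, that is, a signed day count relative to 1970-01-01. As a rough sketch of what that number means (the function name `days_since_epoch` is hypothetical, and this is a standard civil-date calculation rather than anything taken from SeaORM), it can be computed from a calendar date with std-only Rust:

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
/// Assumes `month` is 1..=12 and `day` is valid for that month.
pub fn days_since_epoch(year: i64, month: u32, day: u32) -> i64 {
    // Shift the year so it starts in March; the leap day then falls at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year cycles
    let year_of_era = y.rem_euclid(400); // 0..=399
    // Month index counted from March (March = 0, ..., February = 11).
    let shifted_month = i64::from(if month > 2 { month - 3 } else { month + 9 });
    let day_of_year = (153 * shifted_month + 2) / 5 + i64::from(day) - 1;
    let day_of_era =
        year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    // 146_097 days per 400-year era; 719_468 days from 0000-03-01 to 1970-01-01.
    era * 146_097 + day_of_era - 719_468
}
```

Dates before 1970 give negative values, which is why Date32 is a signed 32-bit integer.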
Timestamp Types
Timezone and Resolution
Arrow distinguishes timezone-aware and timezone-naive timestamps at the schema level. SeaORM maps them accordingly:
- ChronoDateTime / NaiveDateTime / PrimitiveDateTime → Timestamp(Microsecond, None): no timezone annotation
- ChronoDateTimeUtc / DateTime<Utc> / OffsetDateTime → Timestamp(Microsecond, Some("UTC")): UTC annotated
#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "test_chrono", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub created_date: ChronoDate, // -> Date32
    pub created_time: ChronoTime, // -> Time64(Microsecond)
    pub created_at: ChronoDateTime, // -> Timestamp(Microsecond, None)
    pub updated_at: ChronoDateTimeUtc, // -> Timestamp(Microsecond, Some("UTC"))
    pub nullable_ts: Option<ChronoDateTimeUtc>,
}
let schema = Entity::arrow_schema();
let models = vec![..];
let batch = ActiveModel::to_arrow(&models, &schema)?;
let restored = ActiveModel::from_arrow(&batch)?;
assert_eq!(restored, models);
The default resolution is microseconds. Both the time unit and timezone can be overridden per-field using arrow_timestamp_unit and arrow_timezone:
#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, Eq, DeriveEntityModel)]
#[sea_orm(table_name = "event", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    #[sea_orm(column_type = "DateTime", arrow_timestamp_unit = "Nanosecond")]
    pub nano_ts: ChronoDateTime, // -> Timestamp(Nanosecond, None)
    #[sea_orm(column_type = "DateTime", arrow_timestamp_unit = "Second")]
    pub second_ts: ChronoDateTime, // -> Timestamp(Second, None)
    #[sea_orm(
        column_type = "DateTime",
        arrow_timestamp_unit = "Nanosecond",
        arrow_timezone = "America/New_York"
    )]
    pub nano_with_tz: ChronoDateTime, // -> Timestamp(Nanosecond, Some("America/New_York"))
}
Valid values for arrow_timestamp_unit: "Second", "Millisecond", "Microsecond", "Nanosecond".
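Changing the unit changes the integer stored for each timestamp. The sketch below illustrates that trade-off with std-only Rust; it is not SeaORM's conversion code, and `TimeUnit`, `parse_unit`, and `from_micros` are hypothetical names introduced here. It assumes coarser units round toward negative infinity and that an overflow is reported rather than wrapped.

```rust
/// The four unit names accepted by arrow_timestamp_unit (local enum, illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Parse one of the valid attribute values; anything else is rejected.
pub fn parse_unit(name: &str) -> Option<TimeUnit> {
    match name {
        "Second" => Some(TimeUnit::Second),
        "Millisecond" => Some(TimeUnit::Millisecond),
        "Microsecond" => Some(TimeUnit::Microsecond),
        "Nanosecond" => Some(TimeUnit::Nanosecond),
        _ => None,
    }
}

/// Convert microseconds since the epoch into the target unit.
/// Coarser units lose sub-unit precision (floored); nanoseconds can overflow i64.
pub fn from_micros(micros: i64, unit: TimeUnit) -> Option<i64> {
    match unit {
        TimeUnit::Second => Some(micros.div_euclid(1_000_000)),
        TimeUnit::Millisecond => Some(micros.div_euclid(1_000)),
        TimeUnit::Microsecond => Some(micros),
        TimeUnit::Nanosecond => micros.checked_mul(1_000),
    }
}
```

A 64-bit nanosecond timestamp only covers roughly the years 1677 to 2262, so "Nanosecond" buys precision at the cost of range, while "Second" does the opposite.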
Decimal Types
Each Decimal column is stored as Decimal128 in Arrow, preserving the exact precision and scale declared in column_type. Columns with different precision/scale are handled independently. Values are scaled to fit Arrow's internal i128 representation (value × 10^scale).
#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "test_rust_decimal", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    #[sea_orm(column_type = "Decimal(Some((10, 2)))")]
    pub price: Decimal, // -> Decimal128(10, 2)
    #[sea_orm(
        column_type = "Decimal(Some((20, 4)))",
        arrow_precision = 20,
        arrow_scale = 4
    )]
    pub amount: Decimal, // -> Decimal128(20, 4)
}
let price = Decimal::new(1234567, 2); // 12345.67
let amount = Decimal::new(98765432109, 4); // 9876543.2109
let models = vec![
    ActiveModel {
        id: Set(1),
        price: Set(price),
        amount: Set(amount),
    },
];
let schema = Entity::arrow_schema();
let batch = ActiveModel::to_arrow(&models, &schema)?;
// Arrow column carries the declared precision and scale
let price_arr = batch.column_by_name("price").unwrap()
    .as_any().downcast_ref::<Decimal128Array>().unwrap();
assert_eq!(price_arr.value(0), 1234567); // 12345.67 stored as 1234567 (× 10^-2)
assert_eq!(price_arr.precision(), 10);
assert_eq!(price_arr.scale(), 2);
// Full roundtrip
assert_eq!(ActiveModel::from_arrow(&batch)?, models);
BigDecimal is also supported and maps to Decimal256, but it is not illustrated here.
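To make the value × 10^scale representation concrete, here is a small std-only sketch of rescaling an unscaled integer and checking it against a declared precision. It is an illustration, not SeaORM's or Arrow's implementation: `rescale` and `fits_precision` are hypothetical helpers, and refusing lossy downscaling is an assumption made for this example.

```rust
/// Rescale an unscaled decimal integer from one scale to another.
/// Returns None on overflow or when digits would be lost.
pub fn rescale(unscaled: i128, from_scale: u32, to_scale: u32) -> Option<i128> {
    if to_scale >= from_scale {
        // More fractional digits: multiply by a power of ten.
        let factor = 10i128.checked_pow(to_scale - from_scale)?;
        unscaled.checked_mul(factor)
    } else {
        // Fewer fractional digits: only allowed if the dropped digits are zero.
        let factor = 10i128.checked_pow(from_scale - to_scale)?;
        if unscaled % factor != 0 {
            return None;
        }
        Some(unscaled / factor)
    }
}

/// True if the unscaled value has at most `precision` decimal digits.
pub fn fits_precision(unscaled: i128, precision: u32) -> bool {
    match 10i128.checked_pow(precision) {
        Some(limit) => unscaled.unsigned_abs() < limit.unsigned_abs(),
        // 10^precision exceeds i128, so every i128 value fits.
        None => true,
    }
}
```

For example, 12345.67 at scale 2 is the integer 1234567; storing it in a scale 4 column multiplies it by 100, and it fits a precision of 10 because it has only seven digits.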
SeaORM 2.0
SeaORM 2.0 is shaping up to be our most significant release yet - with a few breaking changes, plenty of enhancements, and a clear focus on developer experience.
SeaORM 2.0 has reached its release candidate phase. We'd love for you to try it out and help shape the final release by sharing your feedback.
Sponsors
Gold Sponsor
QDX pioneers quantum dynamics-powered drug discovery, leveraging AI and supercomputing to accelerate molecular modeling. We're grateful to QDX for sponsoring the development of SeaORM, the SQL toolkit that powers their data intensive applications.
GitHub Sponsors
If you feel generous, a small donation will be greatly appreciated, and goes a long way towards sustaining the organization.
A big shout out to our GitHub sponsors:
🦀 Rustacean Sticker Pack
The Rustacean Sticker Pack is the perfect way to express your passion for Rust. Our stickers are made with a premium water-resistant vinyl with a unique matte finish.
Sticker Pack Contents:
- Logo of SeaQL projects: SeaQL, SeaORM, SeaQuery, Seaography
- Mascots: Ferris the Crab x 3, Terres the Hermit Crab
- The Rustacean wordmark
Support SeaQL and get a Sticker Pack!

