DolphinDB RANGE partitioning splits data by continuous value ranges. Its core value lies in improving range query performance and simplifying hot-cold data tiering and historical archiving. This guide covers table creation, querying, capacity expansion, archiving, and composite partition design.
Technical specification snapshot
| Parameter | Description |
|---|---|
| Database | DolphinDB |
| Language | DolphinDB Script |
| Partition Type | RANGE |
| Typical Fields | DATE, TIMESTAMP, INT, MONTH |
| Query Benefits | Partition pruning, optimized range scans |
| Core Capabilities | Table creation, expansion, deletion, archiving, composite partitioning |
| Core Dependencies | distributed file system, partitioned table |
| Protocol/License | Original source marked as CC 4.0 BY-SA |
RANGE partitioning fits data layouts driven by continuous values
At its core, RANGE partitioning splits a table by intervals of a column’s values. Each partition holds one continuous value range, which makes it a natural fit for ordered data such as time, numeric values, and increasing IDs. Unlike VALUE partitioning, which targets discrete enumerations, RANGE partitioning emphasizes ordering and range filtering.
It addresses three common pain points: full-table scans on large datasets are expensive, historical data management is coarse-grained, and separating hot and cold data is difficult. As long as the query condition hits the partition key, the system can skip irrelevant partitions and significantly reduce I/O and compute costs.
RANGE partition boundaries determine how data is written to disk
# Pseudocode: understand the meaning of RANGE partition boundaries
partitions = [1, 100, 200, 300] # Partition boundaries
# Corresponding intervals: [1,100), [100,200), [200,300)
# Core logic: route data to the partition that matches the partition key interval
This example shows that RANGE partitioning relies on a boundary vector to describe continuous intervals. Boundary design directly affects data distribution and partition pruning efficiency.
When creating RANGE partitions, design boundaries around query patterns first
The most basic database creation syntax is database("dfs://db_name", RANGE, partition_vector). Here, partition_vector is not just an arbitrary list. It is the physical representation of your query model. If the workload mainly queries by day, you should not partition only by year.
Time-based data is the best fit for RANGE partitioning, especially for device telemetry, logs, transaction records, and monitoring metrics. The following example creates a partitioned table by date range.
// Create a RANGE-partitioned database by time range
dates = 2024.01.01..2025.01.01 // Daily boundaries; the extra 2025.01.01 endpoint keeps 2024.12.31 inside the last half-open interval
db = database("dfs://time_db", RANGE, dates)
schema = table(1:0, `device_id`timestamp`temperature`humidity,
[INT, TIMESTAMP, DOUBLE, DOUBLE])
db.createPartitionedTable(schema, `sensor_data, `timestamp) // Create the table using timestamp as the partitioning column
This example creates a time-based RANGE-partitioned database and table, which is well suited for time-series detail data.
Numeric interval partitioning fits tiered statistics and metric band management
If the workload aggregates by numeric ranges, such as amount bands, score bands, or device ID ranges, you can define boundaries as explicit intervals. For example, [0,100,200,300,400,500] can route data predictably into multiple statistical shards.
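As a minimal sketch of that idea (the database path, table, and column names here are illustrative, not part of the earlier setup):

// Hypothetical example: RANGE partition an orders table by amount band
bands = [0, 100, 200, 300, 400, 500] // Five half-open intervals: [0,100), [100,200), ..., [400,500)
db = database("dfs://amount_db", RANGE, bands)
schema = table(1:0, `order_id`amount`ts, [INT, INT, TIMESTAMP])
db.createPartitionedTable(schema, `orders, `amount) // amount is the partitioning column
An order with amount 250 would be routed to the [200,300) partition, so band-level aggregations each touch exactly one shard.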
Monthly partitioning is a compromise within time-based RANGE partitioning. It gives up some query precision in exchange for fewer partitions, which makes it suitable for lower-frequency writes and monthly archival scenarios.
// Partition by month
months = 2024.01M..2025.01M // The trailing 2025.01M boundary keeps December 2024 writes inside the scheme
db = database("dfs://monthly_db", RANGE, months)
This example shows how to define month-level boundaries for business tables archived by calendar month.
The real benefits of RANGE partitioning appear during partition pruning
When the filter condition matches the partitioning column, DolphinDB can prune irrelevant partitions before execution. For example, a query for data from January 2024 does not need to scan the full year’s data directories. That is where RANGE partitioning gets its performance advantage.
In development, use the [HINT_EXPLAIN] SQL hint to inspect the execution plan and confirm whether partition pruning occurs. If the where clause applies a function to the partitioning column, the optimizer can no longer map the filter onto partition boundaries and the pruning benefit is lost.
// Range query and execution plan inspection
t = loadTable("dfs://time_db", "sensor_data")
select count(*) from t
where timestamp between 2024.01.01T00:00:00.000 : 2024.01.31T23:59:59.999 // Filter the partitioning column directly so pruning can apply
select [HINT_EXPLAIN] * from t
where timestamp between 2024.01.01T00:00:00.000 : 2024.01.07T23:59:59.999 // The returned plan shows which partitions are actually scanned
This example helps verify whether RANGE partitioning is actually delivering partition pruning.
Time-window aggregation works naturally with RANGE partitioning
In time-series analysis, a common pattern is to filter by date first and then aggregate by hour or minute. RANGE partitioning narrows the scan scope, while bar(timestamp, 1h) performs windowed aggregation. Together, they can significantly improve query stability.
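A sketch of that pattern, reusing the sensor_data table created earlier (same schema assumed):

// Prune by day first, then aggregate into hourly windows
t = loadTable("dfs://time_db", "sensor_data")
select avg(temperature) as avg_temp, max(humidity) as max_hum from t
where timestamp between 2024.03.01T00:00:00.000 : 2024.03.01T23:59:59.999 // Touches only one daily partition
group by device_id, bar(timestamp, 1h) // One row per device per hour
The range filter limits the scan to a single partition before bar() buckets the rows, so the aggregation cost stays proportional to one day of data.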
Dynamic partition management is the key to making RANGE designs sustainable
RANGE partitioning is not a one-time design decision. In real production environments, you continuously add future partitions and remove expired ones. DolphinDB provides addRangePartitions and dropPartition, which makes lifecycle management scriptable.
For systems that write data daily, pre-creating future-window partitions is recommended so that writes do not fail because of missing boundaries. Partition deletion requires caution, because it usually means the underlying data is physically removed.
// Dynamically add and remove partitions
db = database("dfs://time_db")
addRangePartitions(db, [2025.02.01, 2025.03.01], 0) // Append new boundaries after the current last one (level 0 of the partition scheme)
dropPartition(db, 2024.01.01, `sensor_data) // Deleting a partition physically removes all of its data
This example demonstrates online expansion and historical cleanup for RANGE partitioning.
Archive migration should copy first and delete later
A safer strategy is to write old data into an archive database first and then delete the corresponding partitions. This keeps the online database focused on hot data while preserving historical traceability, which is especially useful for IoT and monitoring platforms.
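A sketch of the copy-then-delete flow. The archive database dfs://archive_db and its table sensor_data_archive are assumptions for illustration, not part of the earlier setup:

// 1. Copy cold data into the archive table
src = loadTable("dfs://time_db", "sensor_data")
archive = loadTable("dfs://archive_db", "sensor_data_archive")
cold = select * from src where timestamp < 2024.02.01T00:00:00.000
archive.append!(cold)
// 2. Only after the copy succeeds, drop the cold partition from the online database
db = database("dfs://time_db")
dropPartition(db, 2024.01.01, `sensor_data) // Physically removes that day's data
Keeping the two steps in this order means a failed archive write never costs you the online copy.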
Best practices should focus on granularity, scale, and composite design
For granularity, high-frequency writes usually benefit from daily or even hourly partitioning. Medium-frequency workloads fit weekly or monthly partitions. Low-frequency archival scenarios fit monthly or yearly partitions. The goal is not the finest possible granularity, but a balance between per-partition size, partition count, and query hit rate.
As a rule of thumb, keeping each partition between 1 GB and 10 GB is easier to manage, and the total number of partitions should remain within a maintainable range. If pure time-based partitioning still creates hotspots, use composite partitioning to distribute pressure further.
// Composite partitioning: time RANGE + device HASH
dbDate = database(, RANGE, 2024.01.01..2025.01.01) // Level 1: daily time ranges for pruning
dbHash = database(, HASH, [INT, 10]) // Level 2: hash the device id into 10 buckets to spread hotspots
db = database("dfs://combo_db2", COMPO, [dbDate, dbHash])
This example combines RANGE and HASH partitioning, which is a good fit for high-concurrency device data ingestion.
IoT scenarios best demonstrate the engineering value of RANGE partitioning
A typical design partitions device telemetry tables and alert tables by day, so querying a single day’s data only touches the corresponding daily partition. Combined with scheduled jobs, you can archive data older than 90 days, delete obsolete partitions, and automatically create partitions for the next 30 days.
This design solves three problems at once: it keeps online queries lightweight, controls storage costs, and standardizes operational workflows. For continuously growing time-series systems, that matters more than query speed alone.
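One way to script that maintenance loop with DolphinDB's built-in scheduler. The job id, run time, and retention windows below are illustrative, and a production job would archive a partition (copy first, delete later) before dropping it:

// Daily maintenance: extend the write window, retire expired partitions
def dailyMaintain() {
    db = database("dfs://time_db")
    addRangePartitions(db, [today() + 31], 0) // Keep roughly a 30-day partition horizon ahead of today
    dropPartition(db, today() - 91, `sensor_data) // Retire the partition that fell out of the 90-day hot window
}
scheduleJob(`partMaintain, "daily RANGE partition maintenance", dailyMaintain, 02:00m, today(), today() + 365, 'D')
Run once per day at 02:00, this keeps the partition scheme rolling forward without manual intervention.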
FAQ
Why is RANGE partitioning especially suitable for time-series databases?
Because time-series data naturally increases over time, and queries usually target time ranges. RANGE partitioning lets time filters map directly to partition pruning, which reduces the scan scope.
How do you choose daily, monthly, or yearly partitioning?
It depends on write frequency, daily data volume, and query habits. If writes are frequent and single-day queries are common, choose daily partitioning. If monthly reporting dominates, choose monthly partitioning. If the workload is mainly low-frequency archival, choose yearly partitioning.
When should RANGE be combined with HASH or VALUE partitioning?
If time-only partitioning still causes write hotspots, oversized partitions, or frequent filtering by device dimensions, you should consider composite partitioning. Use RANGE for pruning, and use HASH or VALUE for additional distribution or targeting.
AI Readability Summary: This article systematically explains the principles of DolphinDB RANGE partitioning, table creation patterns, query optimization, dynamic scaling, and composite partitioning practices. It covers time-series workloads, numeric interval use cases, and IoT data scenarios to help developers build scalable and maintainable partitioning strategies.