DolphinDB provides a unified data exchange model for files, columnar storage, and external databases. It efficiently handles CSV, JSON, and Parquet import/export, bulk synchronization, and format conversion, helping solve compatibility, performance, and automation challenges in industrial IoT and time-series data processing. Keywords: DolphinDB, CSV, Parquet.
Technical Specification Snapshot
| Parameter | Description |
|---|---|
| Platform/Language | DolphinDB scripting language |
| Supported Protocols/Interfaces | File system, MySQL plugin, ODBC/PostgreSQL |
| Applicable Scenarios | IoT time-series data, batch ETL, database synchronization |
| Core Dependencies | mysql plugin, odbc plugin, Parquet/JSON/text import and export functions |
| Core Functions | loadText, saveText, loadJSON, saveJSON, loadParquet, saveParquet |
DolphinDB uses a unified function model to cover mainstream data exchange scenarios
DolphinDB import and export capabilities extend far beyond plain text files. They cover structured text, columnar files, and external databases. For developers, the key advantage is a consistent function model: the semantics stay uniform, the learning curve stays low, and ETL pipelines become easier to assemble.
Common input functions include loadText, loadJSON, and loadParquet; the matching output functions are saveText, saveJSON, and saveParquet. If the target is a distributed database table, loadTextEx and loadParquetEx write data directly into storage.
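As a minimal sketch of that uniform model (file paths are illustrative), the same in-memory table can move between formats without changing the surrounding pipeline:
// Read a CSV into an in-memory table, then write it back out in other formats
t = loadText("/data/sensor_data.csv")
saveParquet(t, "/output/sensor_data.parquet")  // columnar copy for analytics
saveJSON(t, "/output/sensor_data.json")        // JSON copy for API-style consumers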
Supported formats and core function mapping
| Data Type | Typical Formats | Read Functions | Write Functions |
|---|---|---|---|
| Text | CSV, TXT | loadText | saveText |
| JSON | JSON, nested JSON | loadJSON / parseJson | saveJSON |
| Columnar | Parquet | loadParquet | saveParquet |
| Distributed Ingestion | Partitioned table import | loadTextEx / loadParquetEx | saveTextEx |
| External Databases | MySQL, PostgreSQL | mysql::loadTable / odbc::query | mysql::saveTable / odbc::execute |
The core value of this mapping is simple: developers can choose functions by data shape instead of redesigning the entire processing pipeline.
CSV remains the most universal entry point for data exchange
CSV works well for cross-system exchange, offline collection, and manual validation. It offers strong compatibility, but weak type information. For that reason, you should explicitly define a schema during import whenever possible to prevent time and numeric columns from being inferred incorrectly.
// Define the schema: column names, types, and an explicit format for the time column
schema = table(
    `device_id`timestamp`temperature`humidity as name,
    `INT`DATETIME`DOUBLE`DOUBLE as type,
    ["", "yyyy-MM-dd HH:mm:ss", "", ""] as format
)
// Import the CSV with an explicit schema to avoid type inference errors;
// skipRows=1 skips the header row, and the format column ensures time strings are parsed correctly
t = loadText("/data/sensor_data.csv", schema=schema, skipRows=1)
This code provides stable structured CSV ingestion, with schema and date format used to control type quality.
For exports, saveText supports delimiters, append mode, and header control. That makes it suitable for downstream scripts, reporting tools, or temporary audit files.
// Build sample data
t = table(
1..100 as device_id,
2024.01.01 + 0..99 as date,
rand(20.0..30.0, 100) as temperature,
rand(40.0..60.0, 100) as humidity
)
// Export to a CSV file
saveText(t, "/output/sensor_data.csv", delimiter=',', header=true)
This code quickly exports an in-memory table to a standard CSV file for exchange and backup.
Large file imports should prioritize parallelism and distributed ingestion
Once a CSV file reaches millions of rows, the main bottleneck is usually not syntax parsing. It is I/O throughput and single-node memory capacity. Two optimization paths address this: ploadText for parallel reading and loadTextEx for direct ingestion into partitioned tables.
// Create a distributed database partitioned by date
db = database("dfs://iot_data", VALUE, 2024.01.01..2024.12.31)
// Import the CSV directly into a distributed table, partitioned on the timestamp column
// (loadTextEx takes the partition columns before the file path)
loadTextEx(db, "sensor_data", `timestamp, "/data/sensor_data.csv")
This code bypasses an intermediate table and writes large files directly into distributed storage, reducing memory pressure.
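For the parallel-read path, a minimal ploadText sketch looks like this (the path is illustrative); ploadText splits the file and loads chunks concurrently, so it keeps the loadText call shape while using multiple workers:
// Parallel CSV load into an in-memory table
t = ploadText("/data/sensor_data.csv")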
JSON is better suited for semi-structured data and API integration
JSON provides stronger structural expressiveness, especially for API responses, nested objects, and device event payloads. The tradeoff is higher parsing cost, so read/write performance is typically weaker than CSV and Parquet.
// Read a JSON file
t = loadJSON("/data/sensor_data.json")
// Parse a JSON string
jsonStr = '{"device_id":1,"temperature":25.5,"humidity":50.0}'
data = parseJson(jsonStr)
This code demonstrates two JSON entry points: file-level import and string-level parsing.
For nested JSON, DolphinDB can directly access values by object path. This works well when device metadata and measurement payloads are wrapped together.
// Parse a nested JSON object
nestedJson = '{"device":{"id":1,"name":"sensor_001"},"data":{"temperature":25.5,"humidity":50.0}}'
data = parseJson(nestedJson)
// Extract nested fields
device_id = data.device.id
temperature = data.data.temperature
This code extracts key fields from nested JSON and is well suited for flattening structures before loading data into a database.
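Continuing the example above, the extracted fields can be assembled into a one-row table ready for ingestion; the flattened column layout shown here is an assumption, not a fixed convention:
// Flatten the parsed dictionary into a one-row table for database ingestion
flat = table(
    [data.device.id] as device_id,
    [data.device.name] as device_name,
    [data.data.temperature] as temperature,
    [data.data.humidity] as humidity
)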
Parquet is the preferred format for analytical workloads
If you need long-term storage, compressed persistence, and column-oriented access, Parquet usually outperforms CSV and JSON. Its strengths are columnar storage, high compression, and cross-platform compatibility.
// Read only the required columns to reduce I/O and memory usage
t = loadParquet(
"/data/sensor_data.parquet",
columns=`device_id`timestamp`temperature
)
// Export Parquet using snappy compression
saveParquet(t, "/output/sensor_data.parquet", compression="snappy")
This code demonstrates two major Parquet advantages: column pruning and compressed storage.
Parquet has clear advantages in analytical pipelines
| Feature | Engineering Value |
|---|---|
| Columnar storage | Reads only required columns and reduces scan cost |
| High compression ratio | Saves more space than CSV, often by several times |
| Query-friendly | Better suited for analytical workloads and batch computation |
| Ecosystem compatibility | Easy to integrate with Spark, Hive, and similar systems |
For log archiving, offline analytics, and lakehouse staging, Parquet should take priority over CSV.
External database synchronization turns DolphinDB into an ETL hub
DolphinDB integrates with MySQL and PostgreSQL through plugins. The core approach is to establish a connection, pull tables or query results, and then write them into DolphinDB or sync the processed data back into the target database.
// Load the MySQL plugin and establish a connection
loadPlugin("/plugins/mysql/libPluginMySQL.so")
conn = mysql::connect("localhost", 3306, "root", "password", "iot_db")
// Query MySQL data
t = mysql::query(conn, "SELECT * FROM sensor_data WHERE date >= '2024-01-01'")
This code pulls relational data into DolphinDB and is suitable for incremental synchronization and analytical preprocessing.
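The PostgreSQL side follows the same pattern through the odbc plugin; the plugin path and DSN name below are assumptions for illustration:
// Load the ODBC plugin and connect to PostgreSQL via a DSN (path and DSN hypothetical)
loadPlugin("/plugins/odbc/libPluginODBC.so")
pgConn = odbc::connect("Dsn=postgres_iot")
// Pull a filtered result set into a DolphinDB in-memory table
pt = odbc::query(pgConn, "SELECT * FROM sensor_data WHERE date >= '2024-01-01'")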
Batch imports and scheduled jobs are ideal for automated pipelines
When data sources deliver files by directory or land data hourly, you can combine directory scanning with scheduled jobs to build ETL automation. The main benefit is reduced manual intervention and more stable synchronization.
// Execute the sync job once per hour
def syncData() {
    // DolphinDB user-defined functions cannot read session globals, so connect inside the job
    conn = mysql::connect("localhost", 3306, "root", "password", "iot_db")
    source = mysql::loadTable(conn, "sensor_data") // Pull data from the source database
    loadTable("dfs://iot_data", "sensor_data").append!(source) // Write into the target table
}
scheduleJob("sync_sensor", "Sensor data synchronization", syncData, 00:00m, 2024.01.01, 2030.12.31, 'H')
This code builds a recurring data synchronization task suitable for production ingestion workflows.
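The directory-scanning half mentioned above can be sketched with the built-in files function; the landing directory and filename pattern here are assumptions:
// Scan a landing directory and append every CSV into the distributed target table
def importDir(dirPath) {
    csvFiles = exec filename from files(dirPath) where filename like "%.csv"
    for (f in csvFiles) {
        t = loadText(dirPath + "/" + f)
        loadTable("dfs://iot_data", "sensor_data").append!(t)
    }
}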
Performance optimization depends on format selection, column pruning, and early cleansing
Import and export speed depends not only on the function itself, but also on data format and processing strategy. CSV is universal but storage-heavy. JSON is flexible but slower to parse. In analytical scenarios, Parquet is often the best overall choice.
In engineering practice, prioritize the following strategies: use bulk ingestion instead of row-by-row writes, use parallel loading instead of serial reads, apply column pruning to avoid unnecessary fields, and push cleansing logic upstream to reduce the downstream cost of dirty data.
Common optimization strategies at a glance
| Optimization Area | Recommendation |
|---|---|
| Large file import | Prefer ploadText or loadTextEx |
| Analytical file export | Prefer Parquet with compression enabled |
| Query performance | Read only required columns |
| Data quality | Parse timestamps and apply range filters during ingestion |
| Automation | Use scheduleJob for recurring synchronization |
FAQ
Q: How should I choose between CSV, JSON, and Parquet?
A: Choose CSV if you want universal exchange and human readability. Choose JSON if you need to represent nested structures or API objects. Choose Parquet first if you care about compression ratio, query efficiency, and analytical performance.
Q: What is the most important optimization for importing very large CSV files into DolphinDB?
A: The key is to avoid full serial loading on a single node. Prefer ploadText for parallel reads, or use loadTextEx to write directly into distributed tables. Combine that with proper partition design to reduce memory pressure.
Q: How can I implement continuous synchronization between DolphinDB and MySQL/PostgreSQL?
A: Establish database connections through plugins, use query functions to pull incremental data, and combine the sync logic with scheduleJob for periodic execution. For production-grade reliability, add checkpoint recovery, idempotent writes, and retry handling.
Summary
This article systematically explains DolphinDB data import and export across CSV, JSON, Parquet, and MySQL/PostgreSQL synchronization, along with batch processing, format conversion, and performance optimization, helping developers build efficient and reliable data exchange pipelines.