[AI Readability Summary] This article explains how to build indexes safely on multi-million-row MySQL order tables. It focuses on six practical principles: choosing highly selective columns, ordering composite indexes correctly, building covering indexes, removing redundant indexes, using online schema change tools, and monitoring index usage continuously. The goal is to improve query performance without causing lock contention, disk growth, or write throughput degradation.
The key technical specifications are summarized in the table below.
| Parameter | Description |
|---|---|
| Technical topic | Index design for multi-million-row MySQL tables |
| Applicable engine | InnoDB |
| Related protocols/mechanisms | B+Tree, Online DDL, Metadata Locks (MDL) |
| Data scale | Multi-million-row order table |
| Core dependencies | MySQL 5.6+, performance_schema, sys, pt-online-schema-change, gh-ost |
| Source format | Interview question and production practice summary |
Building indexes on massive tables is a stability problem, not a syntax problem
Running ALTER TABLE ADD INDEX on a small table often finishes quickly. On a multi-million-row table, however, an index change affects locks, disk usage, I/O, and write throughput at the same time.
The real challenge is not whether you know how to create an index. It is whether you can add that index safely without downtime. That is also one of the clearest differences between an interview answer and real production experience.
Index changes on large tables require evaluating three categories of risk first
The first risk is the interaction between metadata locks and long-running transactions. Even an operation labeled Online DDL must briefly acquire an exclusive metadata lock at the start and end; if a long-running transaction already holds the table's MDL, the DDL queues behind it, and every query that arrives afterward blocks as well. The second risk is temporary space expansion: building an index sorts and rewrites data, which can require substantial extra disk space while the build runs.
The third risk is write amplification. Every additional secondary index forces INSERT, UPDATE, and DELETE to maintain more B+Trees, which permanently increases write cost.
-- Inspect table size and index distribution before adding an index
SHOW TABLE STATUS LIKE 'orders';
SHOW INDEX FROM orders;
This SQL helps confirm table size, index count, and baseline state before making the change.
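It is also worth checking for long-running transactions and pending metadata lock waits before launching any DDL, since these are the usual triggers for Online DDL blocking. A sketch, assuming the 60-second threshold is an arbitrary example and that sys.schema_table_lock_waits is available (MySQL 5.7+):

```sql
-- Transactions open for more than 60 seconds; an Online DDL queued
-- behind one of these will block every later query on the table
SELECT trx_id, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 60 SECOND;

-- MySQL 5.7+: sessions currently blocked on table metadata locks
SELECT * FROM sys.schema_table_lock_waits;
```

If either query returns rows, kill or wait out the offending sessions before starting the index build.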
Choosing highly selective columns is what makes an index actually effective
More indexes do not automatically mean better performance. Prioritize columns that appear frequently in filters and have high selectivity. Common strong candidates include user_id, order_no, and merchant_id.
Columns such as status or gender, which have very few distinct values, usually provide little benefit as standalone indexes. On a multi-million-row table, they may match too many records, and the optimizer may decide not to use the index at all.
-- Low-selectivity columns are poor candidates for standalone indexes
ALTER TABLE orders ADD INDEX idx_status (status);
-- Better: place the low-selectivity column later in a composite index
ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);
This SQL shows that a low-selectivity column should act as a supporting filter, not as the main index entry point.
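Selectivity does not have to be guessed; it can be estimated directly. A minimal sketch using the column names from the examples above (note that this scans the full table, so on a multi-million-row table run it on a replica or against a sample):

```sql
-- Ratio of distinct values to total rows; closer to 1 means more selective.
-- user_id typically scores high, status close to zero.
SELECT
  COUNT(DISTINCT user_id) / COUNT(*) AS user_id_selectivity,
  COUNT(DISTINCT status)  / COUNT(*) AS status_selectivity
FROM orders;
```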
Composite index order must serve the real query path
Composite indexes follow the leftmost prefix rule. If you choose the wrong column order, the index may exist but still remain unusable. Equality predicate columns usually come first, while range predicate columns usually come later.
If you put a range column first, the optimizer’s ability to use subsequent columns drops significantly. That is one of the most common reasons indexes become ineffective on large tables.
-- Recommended: equality column first, range column second
ALTER TABLE orders ADD INDEX idx_user_ctime (user_id, create_time);
-- This query can use the index effectively
EXPLAIN SELECT * FROM orders
WHERE user_id = 123
AND create_time > '2026-01-01';
This SQL is used to verify whether the composite index matches the query predicate pattern.
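For contrast, the anti-pattern looks like this. The index name is hypothetical and exists only to illustrate the problem:

```sql
-- Anti-pattern: range column first
ALTER TABLE orders ADD INDEX idx_ctime_user (create_time, user_id);

-- The range scan on create_time starts first, so user_id can only be
-- filtered row by row inside the scanned range, not used to narrow the seek
EXPLAIN SELECT * FROM orders
WHERE user_id = 123
  AND create_time > '2026-01-01';
```

Comparing the EXPLAIN output of the two column orders (rows examined, key_len) makes the difference concrete.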
Covering indexes can significantly reduce random I/O from table lookups
In large-table queries, the expensive part is often not scanning the index itself. The real cost comes from reading full rows by going back to the clustered primary key. A covering index reduces random disk access by placing all required query columns directly in the index.
If a high-frequency query only needs user_id and status, then (user_id, status) is clearly better than (user_id) alone.
-- Let the index cover as many queried columns as possible
ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);
EXPLAIN SELECT user_id, status
FROM orders
WHERE user_id = 123;
The goal of this SQL is to make the execution plan show Using index, which avoids extra table lookups.
Redundant indexes consume both storage and write performance
If you already have (user_id, create_time), then keeping (user_id) is often redundant. It wastes storage, increases write maintenance cost, and makes optimizer decisions more complex.
Production systems should scan for redundant indexes regularly, especially on core business tables that have gone through multiple iterations.
-- Check for redundant indexes
SELECT *
FROM sys.schema_redundant_indexes
WHERE table_schema = 'your_db'
AND table_name = 'orders';
This SQL identifies duplicate or prefix-redundant indexes so you can clean them up later.
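Once a candidate is confirmed redundant, removing it is a single statement. The index name here is hypothetical, matching the standalone (user_id) example above; on MySQL 8.0+ a safer intermediate step is to make the index invisible first and watch for regressions:

```sql
-- MySQL 8.0+: hide the index from the optimizer without dropping it
ALTER TABLE orders ALTER INDEX idx_user INVISIBLE;

-- After confirming no query regressed, drop it for good
ALTER TABLE orders DROP INDEX idx_user;
```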
Online schema change tools act as the safety valve for index maintenance on large tables
Even though MySQL 5.6+ supports Online DDL, that does not mean you should run index changes directly during peak traffic. A safer approach is to use pt-online-schema-change or gh-ost.
These tools rely on a shadow table, incremental synchronization, and a final cutover step to minimize the blocking window and turn high-risk DDL into a controlled, gradual operation.
pt-online-schema-change \
--alter "ADD INDEX idx_user_ctime (user_id, create_time)" \
D=test,t=orders \
--execute
This command adds a composite index to the order table in an online manner, reducing the impact of direct DDL on the business workload.
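gh-ost achieves the same goal without triggers by replaying changes from the binlog. A command sketch, assuming row-based binlog and a direct connection to the master; host, user, and index name are placeholders:

```shell
gh-ost \
  --host=127.0.0.1 \
  --user=dba --ask-pass \
  --database=test \
  --table=orders \
  --alter="ADD INDEX idx_user_ctime (user_id, create_time)" \
  --allow-on-master \
  --execute
```

Omitting --execute runs a dry run first, which is the recommended way to validate the migration plan.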
Monitoring index usage is a required part of index lifecycle management
Indexes are not one-time assets; they are performance structures that must evolve with the workload. When business queries change, old indexes can turn into zombie indexes that cost every write but serve no reads.
Use performance_schema to observe index read and write behavior, and decide whether to keep, adjust, or remove an index based on load testing and slow query analysis.
-- Inspect index usage statistics and identify indexes that are rarely read
SELECT *
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE object_schema = 'your_db'
AND object_name = 'orders';
This SQL helps identify ineffective indexes whose COUNT_READ has remained very low or even zero for a long time.
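The sys schema wraps this same performance_schema data in a ready-made view, available where sys is installed (MySQL 5.7+). Note that the statistics reset on server restart, so judge an index over a sufficiently long observation window:

```sql
-- Indexes with zero reads since server start (primary keys excluded)
SELECT *
FROM sys.schema_unused_indexes
WHERE object_schema = 'your_db';
```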
Six interview-ready and production-proven rules of thumb
- First decide whether you truly need a new index. Consider hot/cold data separation, archiving, or partitioning first.
- Only index high-selectivity, high-frequency filter conditions.
- Composite indexes should follow the rule: equality first, range later.
- Prioritize covering indexes for high-frequency read queries.
- Regularly remove duplicate and redundant indexes.
- For large-table changes, prefer online DDL tools and monitor the entire process closely.
FAQ
FAQ 1: Why does adding indexes to a multi-million-row table hurt write performance?
Because every write operation must maintain the primary key and all related secondary indexes. The more indexes you add, the more frequently MySQL updates B+Trees, which increases page splits, flush pressure, and lock contention.
FAQ 2: Why should range query columns usually appear last in a composite index?
Because once the optimizer starts a range scan on one column, its ability to keep using subsequent columns efficiently decreases. Putting the range column later maximizes the usefulness of the leftmost prefix match.
FAQ 3: When must you use pt-online-schema-change or gh-ost?
When the table is large, the workload has frequent writes, the business cannot tolerate noticeable blocking, or you need a change process that supports rollback and throttling, you should prefer these online schema change tools over running ALTER TABLE directly.
Core summary: This article breaks down six essential principles for indexing multi-million-row tables: highly selective columns, composite index ordering, covering indexes, redundant index cleanup, online DDL tools, and continuous monitoring. Together, they help developers improve query performance while preserving production stability.