MySQL does not automatically use an index just because one exists. Instead, its Cost-Based Optimizer (CBO) compares the estimated cost of full table scans, index scans, back-to-table lookups, and sorting, then chooses the cheapest execution path. This article explains why indexes are sometimes ignored, how the cost model works, and how to diagnose optimizer decisions. Keywords: MySQL, CBO, Index Optimization
The technical specification snapshot provides quick context
| Parameter | Description |
|---|---|
| Topic Language | SQL / MySQL |
| Storage Engine | InnoDB |
| Optimization Mechanism | CBO (Cost-Based Optimization) |
| Diagnostic Protocols / Tools | EXPLAIN FORMAT=JSON, EXPLAIN ANALYZE |
| Core Dependencies | InnoDB statistics, optimizer cost model |
[Figure: animated illustration in the introduction, emphasizing that index availability and execution plan selection are not linearly related.]
MySQL index usage is fundamentally a cost comparison
Many engineers assume that once an index exists, MySQL will use it. That is not how MySQL works. The optimizer does not care whether an index exists in isolation. It cares whether the total cost of using that index is lower than the cost of a full table scan.
The CBO enumerates candidate execution paths for each SQL statement, then uses statistics to estimate scanned pages, matched rows, back-to-table lookups, and sort cost. It ultimately chooses the plan with the lowest cost. In other words, an index is only a candidate, not a requirement.
The optimizer follows a four-step decision process for index selection
1. Parse the SQL statement and complete syntax and privilege checks.
2. Generate candidate indexes based on WHERE, JOIN, and ORDER BY clauses.
3. Quantify the cost of both full table scans and index access paths.
4. Choose the execution plan with the lowest query cost.
```sql
EXPLAIN FORMAT=JSON
SELECT *
FROM user
WHERE age BETWEEN 20 AND 30;
```
This command shows the optimizer’s estimated execution cost and its final index choice.
The cost model depends on cost constants and statistics
MySQL cost values are not actual milliseconds. They are abstract cost units. Common cost constants include disk page reads, memory page reads, row filtering, and key comparisons. Together, they determine whether a path is theoretically expensive.
The most important idea is this: random I/O is usually much more expensive than CPU-side row filtering. That is why a secondary index can become very costly once it triggers a large number of back-to-table lookups, sometimes even more costly than a sequential full table scan.
Common cost items can be understood this way
| Cost Item | Default Value | Meaning |
|---|---|---|
| io_block_read_cost | 1.0 | Disk I/O cost to read one data page |
| memory_block_read_cost | 0.25 | Cost to read one page from the buffer pool |
| row_evaluate_cost | 0.2 | CPU cost to evaluate whether one row matches the condition |
| key_compare_cost | 0.1 | CPU cost to compare index key values |
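These constants are not hard-coded. In MySQL 5.7 and later they are stored in system tables and can be inspected (and, with care, tuned) at runtime, assuming you have read access to the mysql schema:

```sql
-- Server-level cost constants (row_evaluate_cost, key_compare_cost, ...)
SELECT cost_name, cost_value FROM mysql.server_cost;

-- Engine-level cost constants (io_block_read_cost, memory_block_read_cost, ...)
SELECT cost_name, cost_value FROM mysql.engine_cost;

-- After updating a value, reload the cost model without restarting:
FLUSH OPTIMIZER_COSTS;
```

A NULL cost_value means the built-in default is in effect.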
```sql
ANALYZE TABLE user;
```
This command refreshes statistics and helps prevent optimizer misjudgment caused by inaccurate estimates.
Statistics accuracy directly determines whether the optimizer sees the problem correctly
The optimizer relies on sampled InnoDB statistics rather than recalculating exact values in real time for every query. Common statistics include total table rows, clustered index pages, secondary index pages, and index cardinality. In general, higher cardinality means better index selectivity.
After large batches of inserts, updates, or deletes, statistics such as Rows and Cardinality may drift away from the actual data distribution. In that case, even if the SQL is written correctly, the optimizer may still skip an index because its estimates are wrong.
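To see what numbers the optimizer is actually working with, you can inspect index cardinality and the persisted InnoDB statistics directly. A quick check, assuming the default persistent statistics mode (innodb_stats_persistent=ON):

```sql
-- Per-index cardinality estimates as the optimizer sees them
SHOW INDEX FROM user;

-- Persisted row-count and page-count statistics
SELECT * FROM mysql.innodb_table_stats WHERE table_name = 'user';
SELECT * FROM mysql.innodb_index_stats WHERE table_name = 'user';
```

If these values look far from reality after heavy writes, ANALYZE TABLE is the usual fix.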
A simple example table is enough to demonstrate the issue
```sql
CREATE TABLE user (
  id INT NOT NULL AUTO_INCREMENT COMMENT 'Primary key',
  age INT NOT NULL COMMENT 'Age',
  name VARCHAR(100) NOT NULL COMMENT 'Name',
  create_time DATETIME NOT NULL COMMENT 'Creation time',
  PRIMARY KEY (id),
  KEY idx_age (age) COMMENT 'Secondary index on age'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
This table contains a primary key and a secondary index on age, which is enough to demonstrate three typical access paths: full table scan, back-to-table lookup, and covering index scan.
A full table scan is sometimes cheaper
Assume the user table contains 100,000 rows and the clustered index occupies 1,024 pages. The full table scan cost formula can be simplified as: total clustered index pages × I/O cost + total table rows × row evaluation cost.
Using the sample values: 1024 × 1.0 + 100000 × 0.2 = 21024. This number becomes the optimizer’s baseline for comparing other candidate plans. If an index-based plan costs more than 21024, the optimizer may reject the index.
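The arithmetic can be checked with a plain SELECT, plugging in the sample page and row counts from above:

```sql
-- clustered index pages × io_block_read_cost + table rows × row_evaluate_cost
SELECT 1024 * 1.0 + 100000 * 0.2 AS full_table_scan_cost;  -- 21024.0
```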
The biggest problem with a secondary index scan is the back-to-table lookup
Assume the query age BETWEEN 20 AND 30 matches 20,000 rows. The optimizer must not only scan the range pages in idx_age, but also use the primary key to fetch the full row from the clustered index for each match. That creates a large amount of random I/O.
```sql
SELECT *
FROM user
WHERE age BETWEEN 20 AND 30;
```
For this SQL statement, the key cost is not index positioning. It is the back-to-table read after matching 20,000 rows.
Using the same cost constants, the estimated total cost of the index scan plus back-to-table lookups comes to about 30020, higher than the full table scan cost of 21024. That is why MySQL may still choose a full table scan even when an index exists on age. This is the root cause behind the common claim that an index "stops working" when the query matches too large a fraction of the table.
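You can verify this trade-off yourself by comparing the optimizer's free choice against a forced index plan and reading query_cost from each:

```sql
-- The plan the optimizer prefers (likely a full table scan here)
EXPLAIN FORMAT=JSON
SELECT * FROM user WHERE age BETWEEN 20 AND 30;

-- Force the secondary index and compare query_cost
EXPLAIN FORMAT=JSON
SELECT * FROM user FORCE INDEX (idx_age) WHERE age BETWEEN 20 AND 30;
```

If the statistics match the sample assumptions, the forced plan should report a noticeably higher query_cost (roughly 30020 versus 21024).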
A covering index removes the most expensive cost component directly
If the query only needs id and age, the secondary index leaf nodes already contain enough information to return the result, because InnoDB secondary indexes naturally include the primary key value. In that case, MySQL does not need to perform back-to-table lookups, and the cost drops significantly.
```sql
SELECT id, age
FROM user
WHERE age BETWEEN 20 AND 30;
```
This SQL statement uses a covering index, so MySQL can return the result by scanning only the secondary index.
Based on the sample estimate, the total cost of the covering index path is about 6020, far below the full table scan cost of 21024. The value of a covering index is therefore not abstract. It eliminates random I/O from back-to-table lookups.
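A quick way to confirm that a covering index is actually in use is the Extra column of a plain EXPLAIN: the value "Using index" means the query was answered from the secondary index alone, with no back-to-table lookups.

```sql
EXPLAIN
SELECT id, age FROM user WHERE age BETWEEN 20 AND 30;
-- The Extra column should show: Using index
```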
Sort cost is also part of the total bill
If ORDER BY create_time is not supported by an index, MySQL may execute a full table scan + filesort plan. Even if the query finally returns only 10 rows, MySQL may still need to read and compare a large number of records before the sort completes, making the cost very high.
```sql
SELECT *
FROM user
ORDER BY create_time
LIMIT 10;
```
If this SQL statement lacks an index on create_time, the bottleneck is usually not LIMIT. It is the data read and comparison work before sorting.
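If this query pattern is frequent, an index on the sort column lets MySQL read rows already in create_time order and stop after 10 matches, avoiding the filesort entirely. A sketch (the index name idx_create_time is illustrative, not from the sample schema):

```sql
ALTER TABLE user ADD INDEX idx_create_time (create_time);

-- The plan can now walk the index in order and stop at 10 rows;
-- Extra should typically no longer show "Using filesort"
EXPLAIN
SELECT * FROM user ORDER BY create_time LIMIT 10;
```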
EXPLAIN is the first tool for validating CBO decisions
EXPLAIN FORMAT=JSON lets you inspect query_cost, read_cost, eval_cost, and rows_examined_per_scan. It helps you confirm why the optimizer chose a specific index, or why it skipped indexes entirely.
EXPLAIN ANALYZE goes one step further. It actually executes the SQL statement and reports real execution time, actual scanned rows, and loop counts. If the estimated row count is far from reality, the statistics are usually stale.
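A minimal EXPLAIN ANALYZE run on the sample query looks like this (MySQL 8.0.18 or later; note that it really executes the statement):

```sql
EXPLAIN ANALYZE
SELECT * FROM user WHERE age BETWEEN 20 AND 30;
-- Output shows actual time, actual rows, and loop counts per plan node;
-- compare the estimated rows with actual rows to spot stale statistics.
```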
A typical JSON view of cost details looks like this
```json
{
  "query_block": {
    "cost_info": {
      "query_cost": "30020.00"
    },
    "table": {
      "table_name": "user",
      "access_type": "range",
      "key": "idx_age",
      "rows_examined_per_scan": 20000,
      "cost_info": {
        "read_cost": "26020.00",
        "eval_cost": "4000.00",
        "prefix_cost": "30020.00"
      }
    }
  }
}
```
This output shows how the optimizer combines read cost and evaluation cost into the final query cost.
Indexes are usually ignored in eight common scenarios
First, the indexed column has low selectivity, such as gender or status, so too large a fraction of rows match. Second, the query applies a function, computation, or implicit type conversion to the indexed column. Third, a query against a composite index does not satisfy the leftmost prefix rule. Fourth, a leading wildcard such as LIKE '%abc' defeats prefix matching.
Fifth, the statistics are inaccurate. Sixth, an OR condition includes a column without an index. Seventh, negative predicates such as !=, NOT IN, and IS NOT NULL cover too wide a range. Eighth, the table is so small that a full table scan is cheaper by design.
```sql
-- Anti-pattern 1: applying a function to an indexed column
WHERE YEAR(create_time) = 2026

-- Anti-pattern 2: implicit type conversion
WHERE phone = 13800138000

-- Anti-pattern 3: arithmetic breaks index ordering
WHERE age + 1 = 30
```
These patterns share the same core problem: they prevent the optimizer from using the ordered lookup capability of the B+ tree effectively.
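Each anti-pattern has a sargable rewrite that keeps the comparison on the bare column, so the B+ tree order can be used (assuming phone is stored as a string column):

```sql
-- Rewrite 1: replace YEAR() with an explicit range on the raw column
WHERE create_time >= '2026-01-01' AND create_time < '2027-01-01'

-- Rewrite 2: compare against a string literal to avoid implicit conversion
WHERE phone = '13800138000'

-- Rewrite 3: move the arithmetic to the constant side
WHERE age = 29
```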
Index optimization should be treated as cost governance in production systems
Prefer indexing high-cardinality columns, because they can significantly reduce the number of scanned rows. Design queries to use covering indexes whenever possible, and avoid SELECT *, which amplifies back-to-table lookup costs. If sorting and grouping fields are used frequently, consider index design that supports them together.
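As a concrete illustration of these guidelines (the index name and column choice are hypothetical, not from the original article), a single composite index can serve filtering, sorting, and covering at once:

```sql
-- Filters on age, sorts by create_time within each age value,
-- and covers SELECT id, age, create_time with no back-to-table lookup
ALTER TABLE user ADD INDEX idx_age_ctime (age, create_time);

SELECT id, age, create_time
FROM user
WHERE age = 25
ORDER BY create_time
LIMIT 10;
```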
At the same time, do not assume that more indexes are always better. Indexes increase write cost, and they also increase the optimizer’s complexity when enumerating candidate paths. A truly effective index should serve high-frequency query patterns, not emotional comfort.
FAQ
1. Why does MySQL use an index for the same column sometimes, but not always?
Because different filter ratios change the total cost. When only a few rows match, an index is usually cheaper. When many rows match, random I/O from back-to-table lookups can make a full table scan cheaper.
2. Why does SELECT * make indexes more likely to be ignored?
Because it usually prevents a covering index. After a secondary index match, MySQL still has to perform a back-to-table lookup to fetch all columns. Once that lookup cost becomes high, the optimizer may reject the index.
3. What should I do first when an execution plan looks wrong?
Start with EXPLAIN FORMAT=JSON to inspect query_cost and row estimates. If you suspect stale statistics, run ANALYZE TABLE next. If needed, compare the estimates with real execution behavior using EXPLAIN ANALYZE.
Core takeaway: This article explains why MySQL indexes can exist but still remain unused by analyzing the Cost-Based Optimizer, cost constants, statistics, full table scans, back-to-table lookups, covering indexes, sort cost, and diagnostic tools. It helps developers build a systematic understanding of index selection logic.