This article reconstructs a complete solution framework for 2026 Mathematical Modeling Problem A. Its core objective is to classify regional population patterns, quantify influencing factors, produce long-term population forecasts, and simulate policy scenarios. The framework specifically addresses three pain points: limited census snapshots, unstable long-term forecasting, and the difficulty of quantifying policy effects. Keywords: regional population distribution, Leslie matrix, grey relational analysis.
The technical specification snapshot defines the modeling scope
| Parameter | Details |
|---|---|
| Topic | Diversity in China’s regional population distribution and development support policies |
| Language | Python |
| Data Sources | National Bureau of Statistics census data, statistical yearbooks |
| Time Span | 1990, 2000, 2010, 2020 |
| Forecast Horizon | 2030, 2035, 2055 |
| Core Methods | Entropy Weight Method, Hierarchical Clustering, Grey Relational Analysis, PanelOLS, Leslie Matrix |
| Core Dependencies | numpy, pandas, scikit-learn, scipy, linearmodels, matplotlib |
This problem should be decomposed into a verifiable modeling pipeline
Although the problem appears to be about population policy analysis, it is fundamentally a strongly dependent modeling chain: classify first, explain second, forecast third, and simulate policy last. If the classification logic in Question 1 is unstable, the following three tasks lose their explanatory foundation.
Therefore, the optimal write-up is not to model the four questions in isolation, but to build a unified storyline: cluster provincial population characteristics to obtain regional types, identify key factors within each type, and then feed type-specific parameters into a Leslie matrix for downstream policy intervention simulation.
# Dependency chain across the four questions
pipeline = [
    "Question 1: Regional population classification",  # Derive stable categories first
    "Question 2: Quantify influencing factors",        # Compare differences based on categories
    "Question 3: Population forecasting by category",  # Use category parameters in forecasting
    "Question 4: Policy scenario simulation",          # Add policy adjustments to the forecasting model
]
print(" -> ".join(pipeline))
This code snippet clarifies the main modeling storyline and dependency order of the paper.
Question 1 is best solved by combining the Entropy Weight Method with hierarchical clustering
This problem contains only 31 provincial-level samples, but the indicator space is relatively high-dimensional, and the model must balance objective weighting with interpretable classification. Compared with K-Means, hierarchical clustering is more suitable for small-sample settings, and a dendrogram is also easier to present in a competition paper.
A practical indicator system should use five layers: population size, population structure, population quality, urbanization, and migration. It should also include economic association variables such as GDP per capita and the share of the tertiary industry. This design both satisfies the requirement to “add necessary and reasonable factors” and improves the interpretability of classification results.
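The entropy weighting step itself is straightforward to implement. The sketch below derives objective indicator weights from a normalized matrix; the random data are only a placeholder for the real provincial indicator matrix, and all indicators are assumed to be positive-oriented (negative ones should be inverted beforehand).

```python
import numpy as np

def entropy_weights(X):
    """Entropy Weight Method: objective weights from an (n_samples, n_indicators) matrix."""
    X = np.asarray(X, dtype=float)
    # Min-max normalization to [0, 1]
    P = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    # Column-wise proportions
    P = P / (P.sum(axis=0) + 1e-12)
    n = X.shape[0]
    # Indicator entropy, treating 0 * log(0) as 0
    logP = np.where(P > 0, np.log(P + 1e-300), 0.0)
    e = -(P * logP).sum(axis=0) / np.log(n)
    d = 1 - e                 # degree of divergence per indicator
    return d / d.sum()        # normalized weights

rng = np.random.default_rng(0)
w = entropy_weights(rng.random((31, 6)))  # 31 provinces, 6 placeholder indicators
print(w.round(3))
```

Indicators whose values vary more across provinces receive larger weights, which is exactly the objectivity property the write-up needs before clustering.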
The classification indicator system should cover both static population features and migration dynamics
| Dimension | Representative Indicators | Modeling Role |
|---|---|---|
| Population Size | Total population, population density, growth rate | Identify the intensity of population agglomeration |
| Population Structure | Sex ratio, aging rate, child population share | Capture structural pressure |
| Population Quality | Share of highly educated population, illiteracy rate, years of schooling | Explain development potential |
| Urbanization | Urbanization rate, urban population growth rate | Reflect absorptive capacity |
| Mobility | Share of floating population, net inflow ratio | Identify migration polarization |
| Economic Association | GDP per capita, tertiary industry share | Explain exogenous attractiveness |
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.cluster import AgglomerativeClustering

# X is the indicator matrix before standardization
X_std = StandardScaler().fit_transform(X)  # Standardize to remove scale effects

best_k, best_score = None, -1
for k in range(3, 6):
    model = AgglomerativeClustering(n_clusters=k, linkage="ward")
    labels = model.fit_predict(X_std)        # Hierarchical clustering returns category labels
    score = silhouette_score(X_std, labels)  # Silhouette score evaluates clustering quality
    if score > best_score:
        best_k, best_score = k, score
This code snippet determines the optimal number of clusters and provides a reproducible experimental basis for Question 1.
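For the dendrogram mentioned earlier, scipy's hierarchy module builds the same Ward merge tree used by AgglomerativeClustering. The random matrix below only stands in for the standardized indicator matrix, and the plotting call is left commented out since it requires matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
X_std = rng.standard_normal((31, 6))  # placeholder for the standardized indicator matrix

Z = linkage(X_std, method="ward")                 # Ward merge tree, same criterion as before
labels = fcluster(Z, t=4, criterion="maxclust")   # cut the tree into at most 4 clusters
# from scipy.cluster.hierarchy import dendrogram
# dendrogram(Z)  # the classification figure for the paper
print(sorted(set(labels)))
```

The linkage matrix `Z` records every merge and its distance, so the same object drives both the dendrogram figure and the final flat labels.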
Question 2 is better supported by grey relational analysis plus panel regression
With only four census rounds, the number of time points is extremely limited, so traditional long-horizon time-series methods are not ideal. Grey relational analysis is well suited to small-sample, weak-information systems and can quickly rank the importance of influencing factors across regional types.
However, grey relational analysis only answers which factors are more related; it does not quantify effect size. For that reason, it is advisable to add panel regression and estimate fixed effects for each regional category, thereby quantifying the marginal effects of education, industry, aging, and migration variables.
The factor analysis section should output both rankings and coefficients
First, take total population as the reference sequence for each regional type, compute grey relational degrees between it and each candidate factor, and present the results as a radar chart or heatmap. Then build category-specific PanelOLS models and report the sign and significance of each coefficient.
import numpy as np

def grey_relation(y, x, rho=0.5):
    # Normalize both sequences by their means so they are dimensionless and comparable
    y = np.asarray(y, dtype=float) / np.mean(y)
    x = np.asarray(x, dtype=float) / np.mean(x)
    diff = np.abs(y - x)  # Difference between the reference and comparison sequences
    m, M = diff.min(), diff.max()
    coeff = (m + rho * M) / (diff + rho * M)  # Grey relational coefficients
    return coeff.mean()  # Average relational degree
This code snippet computes the grey relational degree between a single factor and population size.
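PanelOLS from linearmodels is the natural tool for the regression step. As a dependency-light illustration of what its entity fixed-effects (within) estimator actually computes, the sketch below demeans a simulated province-by-census panel and recovers the slope; all data and the true coefficient 2.0 are synthetic assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_prov, n_year = 31, 4                       # 31 provinces, 4 census rounds
entity = np.repeat(np.arange(n_prov), n_year)
x = rng.standard_normal(n_prov * n_year)
alpha = rng.standard_normal(n_prov)          # unobserved province fixed effects
y = 2.0 * x + alpha[entity] + 0.1 * rng.standard_normal(n_prov * n_year)

def demean(v, groups):
    """Subtract the group (province) mean from each observation."""
    sums = np.zeros(groups.max() + 1)
    np.add.at(sums, groups, v)
    counts = np.bincount(groups)
    return v - (sums / counts)[groups]

# Within transformation removes the fixed effects, then plain OLS gives the slope
xd, yd = demean(x, entity), demean(y, entity)
beta = (xd @ yd) / (xd @ xd)
print(round(beta, 2))  # close to the true coefficient 2.0
```

In the actual paper, the same estimate comes from `PanelOLS(y, x, entity_effects=True)` on a MultiIndex DataFrame; this sketch just makes the mechanics transparent.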
Question 3 requires the Leslie matrix as the primary model for long-term forecasting
The problem asks for forecasts through 2055, which makes this a classic long-term population forecasting task. If the model relies only on GM(1,1) or trend fitting, it may work in the short run, but it lacks long-term interpretability and does not naturally support policy simulation.
The Leslie matrix has clear advantages: it models fertility and survival by age group, which makes it inherently appropriate for demographic analysis, and it can also incorporate migration rates. For a competition paper, this approach offers both strong theoretical acceptance and high extensibility.
The Leslie matrix allows forecasting and policy analysis to share one parameter interface
Split the population into 5-year age groups. The first row represents fertility contributions from childbearing-age groups, and the subdiagonal represents age-transition survival rates. By assigning different fertility, mortality, and migration parameters to different regional types, the model can generate differentiated forecasting paths.
import numpy as np

# Build a simplified Leslie matrix example with 7 five-year age groups
fertility = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
survival = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
L = np.zeros((7, 7))
L[0, :] = fertility                # The first row stores fertility rates by age group
for i in range(1, 7):
    L[i, i - 1] = survival[i - 1]  # The subdiagonal stores survival rates

n_t = np.array([80, 75, 90, 88, 76, 60, 30])
n_t1 = L @ n_t  # Forecast the age structure in the next period
This code snippet shows how to perform one-step population age-structure forecasting with a Leslie matrix.
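Extending the one-step update to the 2055 horizon is a loop over 5-year steps. The sketch below reuses the illustrative fertility and survival values from above; the uniform 1% net migration rate per step and the seven-step horizon (2020 to 2055) are assumptions for demonstration, not calibrated values.

```python
import numpy as np

fertility = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
survival = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
L = np.zeros((7, 7))
L[0, :] = fertility
for i in range(1, 7):
    L[i, i - 1] = survival[i - 1]

n = np.array([80.0, 75, 90, 88, 76, 60, 30])
migration_rate = 0.01  # hypothetical uniform net in-migration per 5-year step
totals = []
for step in range(7):              # seven 5-year steps: 2020 -> 2055
    n = L @ n                      # demographic transition
    n *= (1 + migration_rate)      # then apply net migration
    totals.append(n.sum())
print([round(t, 1) for t in totals])
```

In the full model, each regional type gets its own `fertility`, `survival`, and `migration_rate`, which is exactly the shared parameter interface Question 4 intervenes on.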
The key to Question 4 is not proposing policies, but mapping policies into model parameters
Many papers lose points on the final question not because the proposed policies are unreasonable, but because they cannot be quantified. The correct approach here is to rewrite policies as parameter changes inside the Leslie matrix. For example, fertility subsidies map to higher fertility rates in childbearing-age groups, while talent-attraction policies map to improved net migration rates.
At a minimum, define three scenarios: a baseline scenario, a moderate policy scenario, and an aggressive policy scenario. This setup produces clear comparative curves and supports phased conclusions over 5-year, 10-year, and 30-year horizons.
Policy mapping must vary by regional type
| Population Category | Recommended Policy | Parameter Mapping |
|---|---|---|
| Highly aged regions | Fertility subsidies, childcare support, elderly care security | Increase fertility rates and slightly adjust elderly survival rates |
| Population outflow regions | Industrial support, talent return programs, rural entrepreneurship | Improve net migration rate |
| Economically developed regions | Housing support, balanced education access | Slight recovery in fertility rates |
| Population growth regions | Education investment, industrial upgrading | Improve years of schooling and urbanization |
def apply_policy(fertility, migration, level="mild"):
    fertility = fertility.copy()   # fertility and migration are numpy arrays
    migration = migration.copy()
    if level == "mild":
        fertility[2:5] *= 1.08  # Moderate policy: raise childbearing-age fertility by 8%
        migration += 0.01       # Slightly improve the net migration rate
    elif level == "strong":
        fertility[2:5] *= 1.18  # Aggressive policy: raise fertility by 18%
        migration += 0.03
    return fertility, migration
This code snippet converts policy intensity into computable adjustments to population forecasting parameters.
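To make the three-scenario comparison concrete, the self-contained sketch below projects baseline, moderate, and aggressive policy levels side by side. It reuses the illustrative Leslie parameters from above and simplifies migration to a single scalar rate, so the `apply_policy` here is a scalar-migration variant of the function shown earlier.

```python
import numpy as np

def apply_policy(fertility, migration, level="mild"):
    fertility = fertility.copy()
    if level == "mild":
        fertility[2:5] *= 1.08  # +8% childbearing-age fertility
        migration += 0.01
    elif level == "strong":
        fertility[2:5] *= 1.18  # +18% childbearing-age fertility
        migration += 0.03
    return fertility, migration

def project(fertility, migration, n0, survival, steps=6):
    """Project total population over `steps` 5-year periods with a Leslie matrix."""
    L = np.zeros((7, 7))
    L[0, :] = fertility
    for i in range(1, 7):
        L[i, i - 1] = survival[i - 1]
    n, path = n0.copy(), []
    for _ in range(steps):
        n = (L @ n) * (1 + migration)
        path.append(n.sum())
    return path

base_f = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
surv = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
n0 = np.array([80.0, 75, 90, 88, 76, 60, 30])

results = {}
for level in ("baseline", "mild", "strong"):
    f, m = (base_f, 0.0) if level == "baseline" else apply_policy(base_f, 0.0, level)
    results[level] = project(f, m, n0, surv)[-1]
print({k: round(v, 1) for k, v in results.items()})
```

Plotting the three `path` lists on one axis gives exactly the comparative curves the write-up recommends for the 5-, 10-, and 30-year conclusions.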
A high-scoring paper should emphasize unity rather than model stacking
The best strategy for this problem is not “one new model per question,” but “one core framework connecting all four questions.” The recommended storyline is: use entropy-weighted hierarchical clustering for classification, use grey relational analysis and panel regression to explain differences, use the Leslie matrix for forecasting, and use parameter mapping to drive policy scenario simulation.
This structure has three advantages: it creates a strong logical loop, allows parameter reuse and reduces repeated modeling, and makes it easier for judges to see a continuous evidence chain from data and classification to policy implementation.
FAQ
Q: Why not use K-Means directly for Question 1?
A: K-Means is sensitive to initialization and only returns a flat partition for a pre-specified k. Because this problem has a small sample size and many indicators, hierarchical clustering is better suited to showing how categories merge step by step. Combined with silhouette scores and a dendrogram, it provides stronger interpretability.
Q: With only four census snapshots, why is regression and forecasting still feasible?
A: A pure time series would indeed be too short, but the problem also includes a cross-provincial dimension, which allows the data to be structured as panel data. For forecasting, the model uses the demographic Leslie matrix instead of long-series-dependent approaches such as ARIMA.
Q: How can policy effects avoid becoming arbitrary parameter assumptions?
A: You can define parameter ranges using historical population policies, statistical yearbooks, and published literature, and then run sensitivity analysis. The key is not to produce one “correct” number, but to show how changes in policy intensity affect the direction and magnitude of outcomes.
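The sensitivity analysis described here can be sketched as a sweep over a fertility multiplier, holding the other (illustrative) Leslie parameters fixed; the multiplier grid below is an assumption chosen only to show the direction and magnitude of the response.

```python
import numpy as np

def final_population(fert_multiplier, steps=6):
    """Total population after `steps` 5-year periods under a fertility multiplier."""
    fertility = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
    fertility[2:5] *= fert_multiplier       # policy intensity enters here
    survival = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
    L = np.zeros((7, 7))
    L[0, :] = fertility
    for i in range(1, 7):
        L[i, i - 1] = survival[i - 1]
    n = np.array([80.0, 75, 90, 88, 76, 60, 30])
    for _ in range(steps):
        n = L @ n
    return n.sum()

for mult in (0.9, 1.0, 1.1, 1.2):           # hypothetical policy-intensity grid
    print(mult, round(final_population(mult), 1))
```

Reporting the outcome across the grid, rather than a single point estimate, is what turns the policy parameters from arbitrary assumptions into a defensible sensitivity argument.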
Core summary
Based on the original competition material, this article reconstructs a population regional distribution modeling framework suitable for 2026 Mathematical Modeling Problem A. It covers clustering-based classification, factor analysis, Leslie-matrix population forecasting, and policy scenario simulation, while also providing a practical indicator system, model linkage logic, and an implementation-ready code framework.