This article reconstructs a complete solution framework for 2026 Mathematical Modeling Problem A. Its core objective is to classify regional population patterns, quantify influencing factors, produce long-term population forecasts, and simulate policy scenarios. The framework specifically addresses three pain points: limited census snapshots, unstable long-term forecasting, and the difficulty of quantifying policy effects. Keywords: regional population distribution, Leslie matrix, grey relational analysis.
The technical specification snapshot defines the modeling scope
| Parameter | Details |
|---|---|
| Topic | Diversity in China’s regional population distribution and development support policies |
| Language | Python |
| Data Sources | National Bureau of Statistics census data, statistical yearbooks |
| Time Span | 1990, 2000, 2010, 2020 |
| Forecast Horizon | 2030, 2035, 2055 |
| Core Methods | Entropy Weight Method, Hierarchical Clustering, Grey Relational Analysis, PanelOLS, Leslie Matrix |
| Core Dependencies | numpy, pandas, scikit-learn, scipy, linearmodels, matplotlib |
This problem should be decomposed into a verifiable modeling pipeline
Although the problem appears to be about population policy analysis, it is fundamentally a strongly dependent modeling chain: classify first, explain second, forecast third, and simulate policy last. If the classification logic in Question 1 is unstable, the following three tasks lose their explanatory foundation.
Therefore, the optimal write-up is not to model the four questions in isolation, but to build a unified storyline: cluster provincial population characteristics to obtain regional types, identify key factors within each type, and then feed type-specific parameters into a Leslie matrix for downstream policy intervention simulation.
# Dependency chain across the four questions
pipeline = [
    "Question 1: Regional population classification",  # Derive stable categories first
    "Question 2: Quantify influencing factors",        # Compare differences based on categories
    "Question 3: Population forecasting by category",  # Use category parameters in forecasting
    "Question 4: Policy scenario simulation",          # Add policy adjustments to the forecasting model
]
print(" -> ".join(pipeline))
This code snippet clarifies the main modeling storyline and dependency order of the paper.
Question 1 is best solved by combining the Entropy Weight Method with hierarchical clustering
This problem contains only 31 provincial-level samples, but the indicator space is relatively high-dimensional, and the model must balance objective weighting with interpretable classification. Compared with K-Means, hierarchical clustering is more suitable for small-sample settings, and a dendrogram is also easier to present in a competition paper.
A practical indicator system should use five layers: population size, population structure, population quality, urbanization, and migration. It should also include economic association variables such as GDP per capita and the share of the tertiary industry. This design both satisfies the requirement to “add necessary and reasonable factors” and improves the interpretability of classification results.
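The entropy weighting step itself is straightforward to implement. The sketch below derives objective indicator weights from a normalized matrix; the random data are only a placeholder for the real provincial indicator matrix, and all indicators are assumed to be positive-oriented (negative ones should be inverted beforehand).

```python
import numpy as np

def entropy_weights(X):
    """Entropy Weight Method: objective weights from an (n_samples, n_indicators) matrix."""
    X = np.asarray(X, dtype=float)
    # Min-max normalization to [0, 1]
    P = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    # Column-wise proportions
    P = P / (P.sum(axis=0) + 1e-12)
    n = X.shape[0]
    # Indicator entropy, treating 0 * log(0) as 0
    logP = np.where(P > 0, np.log(P + 1e-300), 0.0)
    e = -(P * logP).sum(axis=0) / np.log(n)
    d = 1 - e                 # degree of divergence per indicator
    return d / d.sum()        # normalized weights

rng = np.random.default_rng(0)
w = entropy_weights(rng.random((31, 6)))  # 31 provinces, 6 placeholder indicators
print(w.round(3))
```

Indicators whose values vary more across provinces receive larger weights, which is exactly the objectivity property the write-up needs before clustering.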
The classification indicator system should cover both static population features and migration dynamics
| Dimension | Representative Indicators | Modeling Role |
|---|---|---|
| Population Size | Total population, population density, growth rate | Identify the intensity of population agglomeration |
| Population Structure | Sex ratio, aging rate, child population share | Capture structural pressure |
| Population Quality | Share of highly educated population, illiteracy rate, years of schooling | Explain development potential |
| Urbanization | Urbanization rate, urban population growth rate | Reflect absorptive capacity |
| Mobility | Share of floating population, net inflow ratio | Identify migration polarization |
| Economic Association | GDP per capita, tertiary industry share | Explain exogenous attractiveness |
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.cluster import AgglomerativeClustering

# X is the indicator matrix before standardization
X_std = StandardScaler().fit_transform(X)  # Standardize to remove scale effects

best_k, best_score = None, -1
for k in range(3, 6):
    model = AgglomerativeClustering(n_clusters=k, linkage="ward")
    labels = model.fit_predict(X_std)        # Hierarchical clustering returns category labels
    score = silhouette_score(X_std, labels)  # Silhouette score evaluates clustering quality
    if score > best_score:
        best_k, best_score = k, score
This code snippet determines the optimal number of clusters and provides a reproducible experimental basis for Question 1.
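For the dendrogram mentioned earlier, scipy's hierarchy module builds the same Ward merge tree used by AgglomerativeClustering. The random matrix below only stands in for the standardized indicator matrix, and the plotting call is left commented out since it requires matplotlib.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
X_std = rng.standard_normal((31, 6))  # placeholder for the standardized indicator matrix

Z = linkage(X_std, method="ward")                 # Ward merge tree, same criterion as before
labels = fcluster(Z, t=4, criterion="maxclust")   # cut the tree into at most 4 clusters
# from scipy.cluster.hierarchy import dendrogram
# dendrogram(Z)  # the classification figure for the paper
print(sorted(set(labels)))
```

The linkage matrix `Z` records every merge and its distance, so the same object drives both the dendrogram figure and the final flat labels.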
Question 2 is better supported by grey relational analysis plus panel regression
With only four census rounds, the number of time points is extremely limited, so traditional long-horizon time-series methods are not ideal. Grey relational analysis is well suited to small-sample, weak-information systems and can quickly rank the importance of influencing factors across regional types.
However, grey relational analysis only answers which factors are more related; it does not quantify effect size. For that reason, it is advisable to add panel regression and estimate fixed effects for each regional category, thereby quantifying the marginal effects of education, industry, aging, and migration variables.
The factor analysis section should output both rankings and coefficients
First, take total population as the reference sequence for each regional type, compute grey relational degrees between it and each candidate factor, and present the results as a radar chart or heatmap. Then build category-specific PanelOLS models and report the sign and significance of each coefficient.
import numpy as np

def grey_relation(y, x, rho=0.5):
    # Normalize both sequences by their means so they are dimensionless and comparable
    y = np.asarray(y, dtype=float) / np.mean(y)
    x = np.asarray(x, dtype=float) / np.mean(x)
    diff = np.abs(y - x)  # Difference between the reference and comparison sequences
    m, M = diff.min(), diff.max()
    coeff = (m + rho * M) / (diff + rho * M)  # Grey relational coefficients
    return coeff.mean()  # Average relational degree
This code snippet computes the grey relational degree between a single factor and population size.
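PanelOLS from linearmodels is the natural tool for the regression step. As a dependency-light illustration of what its entity fixed-effects (within) estimator actually computes, the sketch below demeans a simulated province-by-census panel and recovers the slope; all data and the true coefficient 2.0 are synthetic assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_prov, n_year = 31, 4                       # 31 provinces, 4 census rounds
entity = np.repeat(np.arange(n_prov), n_year)
x = rng.standard_normal(n_prov * n_year)
alpha = rng.standard_normal(n_prov)          # unobserved province fixed effects
y = 2.0 * x + alpha[entity] + 0.1 * rng.standard_normal(n_prov * n_year)

def demean(v, groups):
    """Subtract the group (province) mean from each observation."""
    sums = np.zeros(groups.max() + 1)
    np.add.at(sums, groups, v)
    counts = np.bincount(groups)
    return v - (sums / counts)[groups]

# Within transformation removes the fixed effects, then plain OLS gives the slope
xd, yd = demean(x, entity), demean(y, entity)
beta = (xd @ yd) / (xd @ xd)
print(round(beta, 2))  # close to the true coefficient 2.0
```

In the actual paper, the same estimate comes from `PanelOLS(y, x, entity_effects=True)` on a MultiIndex DataFrame; this sketch just makes the mechanics transparent.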
Question 3 requires the Leslie matrix as the primary model for long-term forecasting
The problem asks for forecasts through 2055, which makes this a classic long-term population forecasting task. If the model relies only on GM(1,1) or trend fitting, it may work in the short run, but it lacks long-term interpretability and does not naturally support policy simulation.
The Leslie matrix has clear advantages: it models fertility and survival by age group, which makes it inherently appropriate for demographic analysis, and it can also incorporate migration rates. For a competition paper, this approach offers both strong theoretical acceptance and high extensibility.
The Leslie matrix allows forecasting and policy analysis to share one parameter interface
Split the population into 5-year age groups. The first row represents fertility contributions from childbearing-age groups, and the subdiagonal represents age-transition survival rates. By assigning different fertility, mortality, and migration parameters to different regional types, the model can generate differentiated forecasting paths.
import numpy as np

# Build a simplified Leslie matrix example with 7 five-year age groups
fertility = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
survival = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
L = np.zeros((7, 7))
L[0, :] = fertility                # The first row stores fertility rates by age group
for i in range(1, 7):
    L[i, i - 1] = survival[i - 1]  # The subdiagonal stores survival rates

n_t = np.array([80, 75, 90, 88, 76, 60, 30])
n_t1 = L @ n_t  # Forecast the age structure in the next period
This code snippet shows how to perform one-step population age-structure forecasting with a Leslie matrix.
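Extending the one-step update to the 2055 horizon is a loop over 5-year steps. The sketch below reuses the illustrative fertility and survival values from above; the uniform 1% net migration rate per step and the seven-step horizon (2020 to 2055) are assumptions for demonstration, not calibrated values.

```python
import numpy as np

fertility = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
survival = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
L = np.zeros((7, 7))
L[0, :] = fertility
for i in range(1, 7):
    L[i, i - 1] = survival[i - 1]

n = np.array([80.0, 75, 90, 88, 76, 60, 30])
migration_rate = 0.01  # hypothetical uniform net in-migration per 5-year step
totals = []
for step in range(7):              # seven 5-year steps: 2020 -> 2055
    n = L @ n                      # demographic transition
    n *= (1 + migration_rate)      # then apply net migration
    totals.append(n.sum())
print([round(t, 1) for t in totals])
```

In the full model, each regional type gets its own `fertility`, `survival`, and `migration_rate`, which is exactly the shared parameter interface Question 4 intervenes on.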
The key to Question 4 is not proposing policies, but mapping policies into model parameters
Many papers lose points on the final question not because the proposed policies are unreasonable, but because they cannot be quantified. The correct approach here is to rewrite policies as parameter changes inside the Leslie matrix. For example, fertility subsidies map to higher fertility rates in childbearing-age groups, while talent-attraction policies map to improved net migration rates.
At a minimum, define three scenarios: a baseline scenario, a moderate policy scenario, and an aggressive policy scenario. This setup produces clear comparative curves and supports phased conclusions over 5-year, 10-year, and 30-year horizons.
Policy mapping must vary by regional type
| Population Category | Recommended Policy | Parameter Mapping |
|---|---|---|
| Highly aged regions | Fertility subsidies, childcare support, elderly care security | Increase fertility rates and slightly adjust elderly survival rates |
| Population outflow regions | Industrial support, talent return programs, rural entrepreneurship | Improve net migration rate |
| Economically developed regions | Housing support, balanced education access | Slight recovery in fertility rates |
| Population growth regions | Education investment, industrial upgrading | Improve years of schooling and urbanization |
def apply_policy(fertility, migration, level="mild"):
    fertility = fertility.copy()   # fertility and migration are numpy arrays
    migration = migration.copy()
    if level == "mild":
        fertility[2:5] *= 1.08  # Moderate policy: raise childbearing-age fertility by 8%
        migration += 0.01       # Slightly improve the net migration rate
    elif level == "strong":
        fertility[2:5] *= 1.18  # Aggressive policy: raise fertility by 18%
        migration += 0.03
    return fertility, migration
This code snippet converts policy intensity into computable adjustments to population forecasting parameters.
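To make the three-scenario comparison concrete, the self-contained sketch below projects baseline, moderate, and aggressive policy levels side by side. It reuses the illustrative Leslie parameters from above and simplifies migration to a single scalar rate, so the `apply_policy` here is a scalar-migration variant of the function shown earlier.

```python
import numpy as np

def apply_policy(fertility, migration, level="mild"):
    fertility = fertility.copy()
    if level == "mild":
        fertility[2:5] *= 1.08  # +8% childbearing-age fertility
        migration += 0.01
    elif level == "strong":
        fertility[2:5] *= 1.18  # +18% childbearing-age fertility
        migration += 0.03
    return fertility, migration

def project(fertility, migration, n0, survival, steps=6):
    """Project total population over `steps` 5-year periods with a Leslie matrix."""
    L = np.zeros((7, 7))
    L[0, :] = fertility
    for i in range(1, 7):
        L[i, i - 1] = survival[i - 1]
    n, path = n0.copy(), []
    for _ in range(steps):
        n = (L @ n) * (1 + migration)
        path.append(n.sum())
    return path

base_f = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
surv = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
n0 = np.array([80.0, 75, 90, 88, 76, 60, 30])

results = {}
for level in ("baseline", "mild", "strong"):
    f, m = (base_f, 0.0) if level == "baseline" else apply_policy(base_f, 0.0, level)
    results[level] = project(f, m, n0, surv)[-1]
print({k: round(v, 1) for k, v in results.items()})
```

Plotting the three `path` lists on one axis gives exactly the comparative curves the write-up recommends for the 5-, 10-, and 30-year conclusions.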
A high-scoring paper should emphasize unity rather than model stacking
The best strategy for this problem is not “one new model per question,” but “one core framework connecting all four questions.” The recommended storyline is: use entropy-weighted hierarchical clustering for classification, use grey relational analysis and panel regression to explain differences, use the Leslie matrix for forecasting, and use parameter mapping to drive policy scenario simulation.
This structure has three advantages: it creates a strong logical loop, allows parameter reuse and reduces repeated modeling, and makes it easier for judges to see a continuous evidence chain from data and classification to policy implementation.
FAQ
Q: Why not use K-Means directly for Question 1?
A: K-Means is sensitive to initialization and only returns a flat partition for a pre-specified k. Because this problem has a small sample size and many indicators, hierarchical clustering is better suited to showing how categories merge step by step. Combined with silhouette scores and a dendrogram, it provides stronger interpretability.
Q: With only four census snapshots, why is regression and forecasting still feasible?
A: A pure time series would indeed be too short, but the problem also includes a cross-provincial dimension, which allows the data to be structured as panel data. For forecasting, the model uses the demographic Leslie matrix instead of long-series-dependent approaches such as ARIMA.
Q: How can policy effects avoid becoming arbitrary parameter assumptions?
A: You can define parameter ranges using historical population policies, statistical yearbooks, and published literature, and then run sensitivity analysis. The key is not to produce one “correct” number, but to show how changes in policy intensity affect the direction and magnitude of outcomes.
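The sensitivity analysis described here can be sketched as a sweep over a fertility multiplier, holding the other (illustrative) Leslie parameters fixed; the multiplier grid below is an assumption chosen only to show the direction and magnitude of the response.

```python
import numpy as np

def final_population(fert_multiplier, steps=6):
    """Total population after `steps` 5-year periods under a fertility multiplier."""
    fertility = np.array([0, 0, 0.12, 0.18, 0.10, 0, 0])
    fertility[2:5] *= fert_multiplier       # policy intensity enters here
    survival = np.array([0.96, 0.98, 0.99, 0.98, 0.96, 0.92])
    L = np.zeros((7, 7))
    L[0, :] = fertility
    for i in range(1, 7):
        L[i, i - 1] = survival[i - 1]
    n = np.array([80.0, 75, 90, 88, 76, 60, 30])
    for _ in range(steps):
        n = L @ n
    return n.sum()

for mult in (0.9, 1.0, 1.1, 1.2):           # hypothetical policy-intensity grid
    print(mult, round(final_population(mult), 1))
```

Reporting the outcome across the grid, rather than a single point estimate, is what turns the policy parameters from arbitrary assumptions into a defensible sensitivity argument.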
Core summary
Based on the original competition material, this article reconstructs a population regional distribution modeling framework suitable for 2026 Mathematical Modeling Problem A. It covers clustering-based classification, factor analysis, Leslie-matrix population forecasting, and policy scenario simulation, while also providing a practical indicator system, model linkage logic, and an implementation-ready code framework.