This hands-on guide fully implements an MLP binary classification workflow in Keras for financial up/down prediction, covering feature engineering, Sequential and Functional API design, EarlyStopping, Dropout, and model persistence. The core goal is to upgrade a model from merely “running” to being evaluable, reproducible, and tunable. Keywords: Keras, MLP, Dropout.
This hands-on article fully reviews a Keras MLP project for financial classification
| Parameter | Description |
|---|---|
| Language | Python |
| Framework | TensorFlow 2.20.0 / Keras |
| Task Type | Binary classification for next-trading-day direction prediction |
| Dataset Size | 2,451 samples, 11 features |
| Train/Test Split | 1,715 / 736 |
| Validation Set | 343 |
| License | CC 4.0 BY-SA |
| Core Dependencies | tensorflow, pandas, numpy, scikit-learn, matplotlib, seaborn |
The project goal is to master the minimal engineering loop for MLPs via the shortest possible path
The original project focuses on building a multilayer perceptron with Keras. Its goal is not just to define a few Dense layers, but to establish a complete pipeline from data processing and modeling to training, evaluation, and persistence. For deep learning beginners, this is one of the most valuable steps.
The business scenario is financial time-series direction prediction. The label is defined as whether the next day’s return is greater than 0. In essence, this is a standard binary classification problem, which fits naturally with a sigmoid output layer and the binary_crossentropy loss function.
Data processing determines whether the model has anything worth learning
The code constructs 11 features, including RSI, MACD, MACD Signal, moving-average ratio, volatility, volume ratio, and multi-order momentum. The value of this feature set is that it encodes trend, volatility, price-volume behavior, and short-term inertia at the same time.
```python
def generate_features(df):
    df['return'] = df['close'].pct_change()                  # Compute returns
    df['ma5'] = df['close'].rolling(5).mean()                # Short-term moving average
    df['ma20'] = df['close'].rolling(20).mean()              # Medium-term moving average
    df['ma_ratio'] = df['ma5'] / df['ma20'] - 1              # Moving-average deviation ratio
    df['volatility'] = df['return'].rolling(20).std()        # Volatility
    df['target'] = (df['return'].shift(-1) > 0).astype(int)  # Next-day up/down label
    return df.dropna()
```
This code transforms raw candlestick price series into supervised learning samples suitable for MLP input.
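The excerpt above omits the RSI and MACD features the text mentions. A hedged sketch of how they are typically computed (a simple-rolling-mean RSI and the standard 12/26/9 EMA MACD; the exact windows used in the original project are an assumption):

```python
import numpy as np
import pandas as pd

def add_indicator_features(df, rsi_window=14):
    # RSI from average gains and losses over a rolling window (simple-mean variant)
    delta = df['close'].diff()
    gain = delta.clip(lower=0).rolling(rsi_window).mean()
    loss = (-delta.clip(upper=0)).rolling(rsi_window).mean()
    df['rsi'] = 100 - 100 / (1 + gain / loss)
    # MACD as the difference of fast and slow EMAs, plus its signal line
    ema12 = df['close'].ewm(span=12, adjust=False).mean()
    ema26 = df['close'].ewm(span=26, adjust=False).mean()
    df['macd'] = ema12 - ema26
    df['macd_signal'] = df['macd'].ewm(span=9, adjust=False).mean()
    return df

# Synthetic price path standing in for the real candlestick data
rng = np.random.default_rng(0)
prices = pd.DataFrame({'close': 100 + rng.normal(0, 1, 80).cumsum()})
features = add_indicator_features(prices)
```

Whatever the exact windows, the point is the same as in `generate_features`: every feature at row t must use only information available at t, while the label looks one step ahead.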
The model architecture uses two Keras APIs to cover different complexity levels
The Sequential API is a good fit for linearly stacked networks. Its code is concise, which makes it an ideal teaching starting point. In this experiment, the baseline model uses three hidden layers with sizes 128-64-32 and a total of 11,905 parameters. It is small enough for fast iteration.
The Functional API offers more flexible topology. You can insert BatchNormalization, branching structures, or multi-input networks. For future scenarios that combine technical indicators with fundamental factors, it is more extensible than Sequential.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(11,)),  # First hidden layer
    Dropout(0.3),                                      # Randomly drop units to reduce overfitting
    Dense(64, activation='relu'),                      # Second hidden layer
    Dropout(0.3),
    Dense(32, activation='relu'),                      # Third hidden layer
    Dense(1, activation='sigmoid')                     # Output probability of an upward move
])
```
This code builds a three-layer MLP for binary classification and adds Dropout regularization.
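For comparison, the same kind of model can be expressed with the Functional API, which is where BatchNormalization or branching would slot in. This is a sketch, not the original project's code; the BatchNormalization placement here is an illustrative choice:

```python
from tensorflow.keras import layers, Input, Model

# Same 11-feature input, rebuilt with the Functional API
inputs = Input(shape=(11,))
x = layers.BatchNormalization()(inputs)       # Normalize features batch-wise
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.3)(x)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(1, activation='sigmoid')(x)
func_model = Model(inputs, outputs)
```

Because every layer call is an explicit tensor transformation, adding a second input branch later (say, fundamental factors) only requires a second `Input` and a `Concatenate` before the first Dense layer.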
Training configuration determines whether the model can converge stably
The project uses the Adam optimizer, binary_crossentropy loss, and accuracy as the evaluation metric. That is the standard setup for binary classification. More importantly, it introduces three callbacks: EarlyStopping, ModelCheckpoint, and ReduceLROnPlateau.
EarlyStopping stops training early when validation loss no longer improves. ModelCheckpoint persists the best model. ReduceLROnPlateau automatically lowers the learning rate when the training process reaches a plateau. Together, these three callbacks form a gold-standard template for Keras training control.
```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

callbacks = [
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),        # Stop when validation performance no longer improves
    ModelCheckpoint('mlp_model.keras', monitor='val_accuracy', save_best_only=True),  # Save the best model
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)        # Reduce learning rate automatically
]
```
This code adds early stopping, best-model checkpointing, and adaptive learning-rate scheduling to the training process.
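Wiring the callbacks into `compile` and `fit` looks roughly like the following. The data here is synthetic and the epoch/batch settings are assumptions; ModelCheckpoint is omitted so the sketch writes no files:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Synthetic stand-ins for the real feature matrix and labels
rng = np.random.default_rng(42)
X_train = rng.normal(size=(256, 11)).astype('float32')
y_train = rng.integers(0, 2, size=256).astype('float32')

model = Sequential([Dense(32, activation='relu', input_shape=(11,)),
                    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam',              # Standard binary-classification setup
              loss='binary_crossentropy',
              metrics=['accuracy'])

callbacks = [EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
             ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)]

history = model.fit(X_train, y_train,
                    validation_split=0.2,    # Carve a validation set from the training data
                    epochs=30, batch_size=32,
                    callbacks=callbacks, verbose=0)
```

The `history.history` dict then holds per-epoch `loss`, `val_loss`, `accuracy`, and `val_accuracy`, which is exactly what the training curves below are plotted from.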
The training results show clear generalization pressure in this task
The baseline model reaches a best validation accuracy of about 0.6093, but test accuracy is only 0.5136, with an AUC of 0.5538. This indicates that the model learns some patterns during training, but those patterns transfer poorly to unseen samples.
From the training logs, the model stops early at epoch 11, and the best weights roll back to around epoch 1. This suggests that later training does not improve generalization and may instead begin to overfit or memorize noise.
The training curves directly expose both overfitting and weak signal strength

AI Visual Insight: The left chart shows training loss continuing to decline while validation loss stays mostly flat and rises slightly. The right chart shows training accuracy steadily improving while validation accuracy remains stuck around 0.59 to 0.61. This is a classic pattern of mild to moderate overfitting, indicating that the dataset contains limited predictive signal while the model already has enough capacity to memorize the training set.
Dropout significantly improves generalization performance in this experiment
The comparison experiment shows that the model without Dropout reaches a training accuracy of 0.7726, but validation accuracy is only 0.5860, resulting in an overfitting gap of 0.1866. After adding Dropout, training accuracy falls to 0.6181, while validation accuracy rises to 0.6152, leaving a gap of only 0.0029.
This result is highly representative: in weak-signal financial tasks, higher training accuracy does not mean stronger predictive power. A constrained model is often more robust than a high-capacity one.
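The "overfitting gap" quoted above is simply the train-minus-validation accuracy difference, reproduced here with the reported numbers:

```python
# Overfitting gap = training accuracy minus validation accuracy;
# larger means the model memorizes more than it generalizes.
def overfitting_gap(train_acc, val_acc):
    return train_acc - val_acc

# Numbers from the comparison experiment in the text
gap_without_dropout = round(overfitting_gap(0.7726, 0.5860), 4)
gap_with_dropout = round(overfitting_gap(0.6181, 0.6152), 4)
```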

AI Visual Insight: This heatmap shows the hit distribution of binary predictions across the “up” and “down” classes. It helps identify whether the model is biased toward one class. If the diagonal advantage is not obvious, the model usually lacks sufficient discriminative power and the class boundary remains weak.
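A heatmap of this kind is usually produced with scikit-learn's confusion matrix and seaborn (both are listed dependencies). This is a sketch with hypothetical predictions standing in for the real test set:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # Headless-safe backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Hypothetical probabilities and labels standing in for model.predict(X_test)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.7, 0.4, 0.6, 0.3, 0.5, 0.2, 0.8, 0.6])
y_pred = (y_prob >= 0.5).astype(int)         # Default 0.5 decision threshold

cm = confusion_matrix(y_true, y_pred)        # Rows: actual class, columns: predicted class
sns.heatmap(cm, annot=True, fmt='d',
            xticklabels=['down', 'up'], yticklabels=['down', 'up'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
```

Reading the rows against the columns is what reveals the class bias the caption describes: a weak diagonal means the 0.5 threshold barely separates the two classes.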

AI Visual Insight: The curve stays only slightly above the random diagonal. Combined with an AUC of about 0.55, this suggests that the model captures only weakly separable signal. It is better suited for further feature enhancement, threshold optimization, and ensemble modeling than for direct use in high-confidence trading decisions.
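The ROC curve and AUC behind this reading come from scikit-learn, and the same arrays support the threshold optimization mentioned above. A sketch with hypothetical scores in place of the real `model.predict(X_test)` output:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # Headless-safe backend
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical scores standing in for the model's predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.3, 0.45, 0.55, 0.6, 0.52, 0.48, 0.4, 0.7])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)
best_threshold = thresholds[np.argmax(tpr - fpr)]  # Youden's J: one simple threshold choice

plt.plot(fpr, tpr, label=f'AUC = {auc:.2f}')
plt.plot([0, 1], [0, 1], '--')               # Random-guess diagonal
plt.legend()
```

On the real model, an AUC near 0.55 means the curve hugs that dashed diagonal, which is why the text recommends threshold tuning and ensembling rather than trading directly on the raw probabilities.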
Comparative experiments further show that regularization beats blindly deepening the network

AI Visual Insight: This figure compares validation loss and validation accuracy trajectories with and without Dropout. After adding Dropout, the curves become smoother and validation metrics become more stable. This indicates that random unit dropping reduces neuron co-adaptation and improves robustness on unseen samples.
Hyperparameter search results show that returns depend more on constraint than pure scaling
In the hidden-layer experiment, the larger three-layer architecture (256, 128, 64) achieves the highest validation accuracy of 0.6181, but its AUC is only about 0.5657, so the gain is not substantial. This suggests that widening the network improves fit but does not fundamentally change task difficulty.
The Dropout-rate experiment is more revealing. When Dropout increases to 0.5, AUC reaches about 0.5735, the best among all candidates. This indicates that under the current feature system, controlling overfitting matters more than increasing parameter count.

AI Visual Insight: The left chart shows the relationship between Dropout rate and AUC, while the right chart shows the relationship between Dropout rate and the overfitting gap. The overall trend shows that higher Dropout significantly compresses the train-validation gap and achieves better AUC in the higher range, indicating that this dataset favors stronger regularization.
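A Dropout-rate sweep like the one behind these charts is a plain loop over candidate rates, retraining and recording the train-validation gap each time. This sketch uses synthetic data and small settings purely for illustration:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Synthetic stand-ins for the real features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 11)).astype('float32')
y = rng.integers(0, 2, size=300).astype('float32')

results = {}
for rate in [0.0, 0.3, 0.5]:                 # Candidate Dropout rates
    trial = Sequential([Dense(64, activation='relu', input_shape=(11,)),
                        Dropout(rate),
                        Dense(1, activation='sigmoid')])
    trial.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    h = trial.fit(X, y, validation_split=0.2, epochs=5, batch_size=32, verbose=0)
    # Overfitting gap: final train accuracy minus final validation accuracy
    results[rate] = h.history['accuracy'][-1] - h.history['val_accuracy'][-1]
```

In the real experiment the selection criterion would be validation AUC alongside the gap, and each candidate should be trained with the same callbacks and seed discipline to keep the comparison fair.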
Model saving and loading provide the foundation for reproducibility and deployment
The project saves both the full model file best_mlp.model.keras and the weights file best_mlp.weights.h5. The reloaded model produces the same AUC as the original model, which confirms that the persistence workflow is valid. This step is a prerequisite for moving from notebook experimentation to service deployment.
```python
from tensorflow.keras.models import load_model

best_model.save('best_mlp.model.keras')            # Save the full model
loaded_model = load_model('best_mlp.model.keras')  # Reload the model
pred = loaded_model.predict(X_test)                # Run inference with the reloaded model
```
This code persists a trained Keras model and verifies that loading it reproduces the same results.
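The weights-only route mentioned above works the same way, except the architecture must be rebuilt before loading. A minimal sketch; the tiny network here is illustrative, not the article's 128-64-32 model:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(8, activation='relu', input_shape=(11,)),
                    Dense(1, activation='sigmoid')])
model.save_weights('best_mlp.weights.h5')    # Weights only, no architecture

# Loading requires an identically shaped model to pour the weights into
clone = Sequential([Dense(8, activation='relu', input_shape=(11,)),
                    Dense(1, activation='sigmoid')])
clone.load_weights('best_mlp.weights.h5')

x = np.random.default_rng(1).normal(size=(4, 11)).astype('float32')
same_outputs = bool(np.allclose(model.predict(x, verbose=0),
                                clone.predict(x, verbose=0)))
```

Full-model saving is the safer default for deployment; weights-only files are mainly useful for checkpoint rotation or transferring weights into a modified architecture.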
The core conclusion from this hands-on project is that weak-signal finance demands generalization control
If this entire project can be reduced to one sentence, it is this: building an MLP with Keras is easy, but the real challenge is keeping validation and test performance consistent. In this case, Dropout, early stopping, and disciplined data splitting are more effective than simply stacking more network layers.
For future extension, the best next steps are to add more time-series features, use rolling-window validation, optimize classification thresholds, and build ensembles that combine MLPs with tree-based models instead of relying only on deeper fully connected networks.
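The rolling-window validation suggested here maps directly onto scikit-learn's `TimeSeriesSplit`, which guarantees that every validation block lies strictly after its training window:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)            # Stand-in time-ordered feature matrix
tscv = TimeSeriesSplit(n_splits=5)

folds = []
for train_idx, val_idx in tscv.split(X):
    assert train_idx.max() < val_idx.min()   # No look-ahead leakage between folds
    folds.append((len(train_idx), len(val_idx)))
```

Replacing the single train/test split with this scheme gives several out-of-time evaluations instead of one, which matters most in exactly the weak-signal regime this project operates in.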
FAQ
Why is this MLP’s test accuracy close to random?
Short-term financial direction prediction is inherently a weak-signal problem, and 11 technical indicators are not enough to characterize future return direction reliably. The model can fit the training set to some extent, but its generalization ability on the test set is limited, which is why the AUC stays only slightly above 0.5.
How should you choose between the Sequential API and the Functional API in practice?
If the network is a simple linear stack, use Sequential first because it gives you the shortest and clearest code. If you need multi-input design, skip connections, shared layers, pluggable BatchNormalization, or more complex topology, you should move directly to the Functional API.
Is a higher Dropout rate always better?
No. If Dropout is too low, regularization may be insufficient. If it is too high, the model’s representational power may degrade. In this experiment, 0.5 performs best, but that only means stronger regularization is more suitable for the current data, features, and model setup. You must revalidate this choice on a different dataset.
AI Readability Summary: This article rebuilds a practical MLP workflow for binary financial direction prediction with Keras and TensorFlow, covering data generation, Sequential and Functional API modeling, callback-driven training, Dropout-based overfitting control, hyperparameter experiments, and model persistence, while interpreting performance through real evaluation metrics.