Metalearning in Time Series Ensembles

The Research Question

Can we use deep learning models to identify market conditions where particular statistical models are more effective? If so, we could dynamically weight ensemble members to improve overall performance.

Approach

Traditional time series ensembles use static weights or simple averaging. This project explores metalearning: training a neural network to predict which base model will perform best given recent market features.

Base Models

The ensemble includes classic statistical models:

Python: Base Model Implementations▼

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

class BaseForecaster:
    """Abstract base class for forecasting models."""
def fit(self, y):
    raise NotImplementedError

def predict(self, horizon):
    raise NotImplementedError
class ARIMAForecaster(BaseForecaster):
    def init(self, order=(1, 1, 1)):
        self.order = order
        self.model = None
def fit(self, y):
    self.model = ARIMA(y, order=self.order).fit()
    return self

def predict(self, horizon):
    return self.model.forecast(steps=horizon)
class GARCHForecaster(BaseForecaster):
    def init(self, p=1, q=1):
        self.p = p
        self.q = q
        self.model = None
def fit(self, y):
    returns = np.diff(np.log(y)) * 100
    self.model = arch_model(returns, p=self.p, q=self.q).fit(disp=&#039;off&#039;)
    return self

def predict(self, horizon):
    forecast = self.model.forecast(horizon=horizon)
    return forecast.variance.values[-1]</code></pre>



Metalearning Network
The metalearner observes recent time series features and predicts which base model will have lowest error:

Python: Metalearning Architecture▼

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_metalearner(n_features, n_models, sequence_length=20):
    """
    LSTM-based metalearner that predicts optimal model weights.
    """
    inputs = keras.Input(shape=(sequence_length, n_features))
# Feature extraction with LSTM
x = layers.LSTM(64, return_sequences=True)(inputs)
x = layers.Dropout(0.2)(x)
x = layers.LSTM(32)(x)
x = layers.Dropout(0.2)(x)

# Market regime embedding
regime = layers.Dense(16, activation=&#039;relu&#039;)(x)

# Output: softmax weights for each base model
weights = layers.Dense(n_models, activation=&#039;softmax&#039;, name=&#039;model_weights&#039;)(regime)

model = keras.Model(inputs=inputs, outputs=weights)
model.compile(
    optimizer=keras.optimizers.Adam(0.001),
    loss=&#039;categorical_crossentropy&#039;,  # Best model as one-hot
    metrics=[&#039;accuracy&#039;]
)

return model
def extract_features(window):
    """Extract features from recent time series window."""
    returns = np.diff(np.log(window))
return np.array([
    np.mean(returns),           # Mean return
    np.std(returns),            # Volatility
    np.max(returns) - np.min(returns),  # Range
    np.corrcoef(returns[:-1], returns[1:])[0, 1],  # Autocorrelation
    skew(returns),              # Skewness
    kurtosis(returns),          # Kurtosis
])</code></pre>



Results
The metalearning approach achieved:

12% reduction in mean absolute error compared to equal-weighted ensemble
Improved Sharpe ratio in backtested trading strategy
Successfully identified regime changes where model preferences shifted

Key Insights

Volatility clustering was the strongest predictor of GARCH outperformance
Trend strength indicators helped identify when trend-following models excelled
The metalearner learned meaningful regime representations without explicit labels