Predicting stock prices is a challenging yet exciting task in the field of data science. One of the most popular time series forecasting techniques is SARIMAX (Seasonal Autoregressive Integrated Moving Average with Exogenous Regressors), which extends the ARIMA model with seasonal terms and external (exogenous) regressors. In this blog post, we’ll walk through how to use SARIMAX to forecast stock prices for multiple companies over the next 30 days using Python.
Why SARIMAX for Stock Price Prediction?
Stock prices often exhibit both trends and seasonality over time. SARIMAX is a powerful tool for modeling time series data, especially when there are seasonal components and external variables (like the day of the week) influencing the prices. This technique builds on the ARIMA model by adding two essential features:
- Seasonal Component: Captures the seasonal trends in the data (e.g., weekly cycles).
- Exogenous Variables (Exog): Incorporates external variables like the day of the week, which may affect stock prices.
Code Walkthrough: Predicting Stock Prices Using SARIMAX
In this section, we’ll break down the code that forecasts stock prices for companies like Amazon (AMZN), Meta (META), and Google (GOOG).
Step 1: Data Loading and Preprocessing
The dataset used in this example is a simple stock price dataset for the companies AMZN, META, and GOOG. The dataset is loaded from a CSV file, where the Date column contains the dates, and stock prices for the companies are listed in columns.
import pandas as pd

# Read the dataset from CSV (sample dataset downloaded from Yahoo Finance)
df = pd.read_csv('testdatastock.csv')
# Convert the 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Set 'Date' as index
df.set_index('Date', inplace=True)
# Add a 'day_of_week' column (Monday=0 ... Sunday=6)
df['day_of_week'] = df.index.dayofweek
# Drop rows with missing values in any of the modeled columns
df.dropna(subset=['AMZN', 'META', 'GOOG', 'day_of_week'], inplace=True)
Here, we:
- Convert the Date column to a datetime format for easier manipulation.
- Set the Date column as the index of the dataframe.
- Create a new column, day_of_week, to capture the day of the week as an exogenous variable (which could affect stock prices).
- Drop rows with missing values.
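Because day_of_week is stored as a single integer (0 for Monday through 4 for Friday on trading days), the model fits one linear coefficient to it, which implies that any weekday effect changes steadily from Monday to Friday. A common alternative, sketched here on a small made-up price series rather than the post's CSV, is to one-hot encode the weekday with pandas.get_dummies. The toy values, the dow prefix, and the drop_first choice are assumptions for illustration.

```python
import pandas as pd

# Made-up prices on ten consecutive business days (not the post's data)
idx = pd.bdate_range("2024-01-01", periods=10)
toy = pd.DataFrame({"AMZN": [150.0 + i for i in range(10)]}, index=idx)
toy["day_of_week"] = toy.index.dayofweek  # Monday=0 ... Friday=4

# One indicator column per weekday. drop_first leaves Monday as the baseline,
# so the indicator columns do not sum to a constant.
toy_exog = pd.get_dummies(toy["day_of_week"], prefix="dow",
                          drop_first=True, dtype=float)

print(toy_exog.head())
```

A dataframe like toy_exog could then be passed as exog in place of the single day_of_week column, with matching indicator columns built for the forecast dates.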
Step 2: Differencing to Make Data Stationary
A stationary time series is one where the mean and variance do not change over time. Since stock prices are often non-stationary, we apply differencing to make the data more suitable for forecasting.
target_vars = ['AMZN', 'META', 'GOOG']
for target in target_vars:
    # Apply a 30-period lagged difference: value at t minus value at t-30.
    # The first 30 rows of each new column are NaN.
    df[target + '_diff'] = df[target].diff(periods=30)
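Here, we take a 30-period lagged difference (each value minus the value 30 rows earlier) for each of the target variables (AMZN, META, and GOOG) to remove long-term trends. The first 30 rows of each differenced column have no earlier value to subtract and are therefore NaN.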
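To make the arithmetic concrete, the following sketch applies diff to a short made-up series with a lag of 3 in place of 30. It also shows that chaining dropna() before assigning the result to a dataframe column does not remove the leading NaN rows, because pandas realigns on the index during assignment.

```python
import pandas as pd

toy = pd.DataFrame({"price": [10.0, 12.0, 11.0, 15.0, 18.0, 14.0]})

# Lagged difference: price[t] - price[t-3]; the first 3 rows have no partner
toy["price_diff"] = toy["price"].diff(periods=3)

# dropna() shortens the Series, but column assignment aligns on the index,
# so the missing rows are filled with NaN again
toy["price_diff_dropna"] = toy["price"].diff(periods=3).dropna()

print(toy)
```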
Step 3: Defining and Fitting the SARIMAX Model
Next, we define and fit the SARIMAX model to the differenced data. We use the SARIMAX class from statsmodels, specifying the order (AR, I, MA) and seasonal order (seasonal AR, seasonal differencing, seasonal MA, and the period).
from statsmodels.tsa.statespace.sarimax import SARIMAX

def fit_sarimax_and_predict(target, exog_vars, df, n_periods=30, order=(1, 1, 1), seasonal_order=(1, 1, 0, 7)):
    # Define the SARIMAX model
    sarimax_model = SARIMAX(df[target],
                            exog=df[exog_vars],
                            order=order,
                            seasonal_order=seasonal_order,
                            enforce_stationarity=False,
                            enforce_invertibility=False)
    # Fit the model
    sarimax_results = sarimax_model.fit()
    print(f"Model summary for {target}:")
    print(sarimax_results.summary())
In this function:
- SARIMAX Model: We define the SARIMAX model using the target variable (e.g., AMZN stock prices), exogenous variables (like day_of_week), and specified orders for AR, I, MA, and seasonal components.
- Fit the Model: The model is then trained on the data, and the results are printed out.
Step 4: Forecasting the Next 30 Days
After fitting the model, we generate forecasts for the next 30 days. The exogenous variables (like the day of the week) are used for the forecast period to ensure that the future predictions incorporate known external factors.
    # Still inside fit_sarimax_and_predict:
    # Generate exogenous variables for the future period (next 30 days)
    future_dates = pd.date_range(start=df.index[-1] + pd.Timedelta(days=1), periods=n_periods, freq='D')
    # Create exogenous features for future dates (assuming daily frequency)
    future_exog = pd.DataFrame({
        'day_of_week': future_dates.dayofweek,
    }, index=future_dates)
    # Make forecast for the future period
    forecast = sarimax_results.get_forecast(steps=n_periods, exog=future_exog)
    # Extract predicted values and confidence intervals
    forecast_values = forecast.predicted_mean
    conf_int = forecast.conf_int()
Here:
- Future Dates: We generate the next 30 days for which we want predictions.
- Exogenous Variables: We create a dataframe of exogenous variables for these future dates (i.e., the day_of_week).
- Forecasting: Using the fitted SARIMAX model, we predict stock prices for the next 30 days and also calculate the confidence intervals.
Step 5: Visualizing the Forecasts
Once the forecasts are generated, we visualize them along with the actual historical data. We plot the predicted stock prices along with their confidence intervals for each company.
import os
import matplotlib.pyplot as plt

# Plot the forecasts for the first few target variables
plt.figure(figsize=(15, 12))
for i, target in enumerate(target_vars[:3]):
    plt.subplot(3, 1, i + 1)
    # Plot the differenced history so that it shares a scale with the forecast
    plt.plot(df.index, df[target + '_diff'], label=f'Actual {target}_diff', color='blue')
    plt.plot(future_dates, forecasts[target + '_diff'], label=f'Forecast {target}_diff', color='red')
    plt.fill_between(future_dates, conf_intervals[target + '_diff'].iloc[:, 0], conf_intervals[target + '_diff'].iloc[:, 1], color='pink', alpha=0.3)
    plt.title(f'{target}_diff Forecast using SARIMAX')
    plt.xlabel('Date')
    plt.ylabel(f'{target}_diff Value')
    plt.legend()
plt.tight_layout()
# Save the graph as an image file (create the output folder if needed)
os.makedirs("reports", exist_ok=True)
trends_graph_file = os.path.join("reports", "stock_predictions.png")
plt.savefig(trends_graph_file)
plt.close()
This code generates:
- A plot of the historical series alongside the forecasted values.
- Shaded bands around the forecast that visualize the confidence intervals and hence the uncertainty in the predictions.
Step 6: Saving Forecasts to CSV
Finally, we save the predicted stock prices and their confidence intervals into CSV files for later analysis or reporting.
# Optionally, save the forecasts and confidence intervals to CSV
forecast_df = pd.DataFrame(forecasts)
forecast_df.index = future_dates
forecast_df.to_csv('reports/forecasted_values_future.csv')
conf_int_df = pd.concat(conf_intervals.values(), axis=1)
conf_int_df.index = future_dates
conf_int_df.to_csv('reports/confidence_intervals_future.csv')
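Because the models are fitted to the _diff columns, the saved values are forecasts of the 30-period difference, not price levels. A sketch of how they could be mapped back to prices follows: each forecast step adds the price 30 rows earlier, which for the first 30 steps is still part of the observed history. The helper name undo_lagged_difference is hypothetical and the numbers are made up, with a lag of 3 standing in for 30.

```python
import numpy as np
import pandas as pd

def undo_lagged_difference(history, diff_forecast, lag=30):
    """Rebuild levels from forecasts of x[t] - x[t-lag] (hypothetical helper)."""
    levels = list(history)
    if len(levels) < lag:
        raise ValueError("history must contain at least `lag` observations")
    start = len(levels)
    for d in diff_forecast:
        # x[t] = diff[t] + x[t-lag]; levels[-lag] is the value lag steps back
        levels.append(d + levels[-lag])
    return np.array(levels[start:])

# Round-trip check on made-up prices: hide the last 3 values, then rebuild
# them from their lagged differences and the earlier history
prices = pd.Series([10.0, 12.0, 11.0, 15.0, 18.0, 14.0, 16.0, 19.0])
diffs = prices.diff(periods=3)
recovered = undo_lagged_difference(prices.iloc[:5], diffs.iloc[5:], lag=3)
print(recovered)
```

The same mapping would apply to the lower and upper confidence bounds, since adding a known past price shifts all three by the same amount.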
Conclusion
In this post, we've walked through the steps of using the SARIMAX model for forecasting stock prices over the next 30 days. We demonstrated:
- Data preprocessing techniques like differencing to make the data stationary.
- Building and fitting a SARIMAX model with exogenous variables.
- Forecasting future stock prices and visualizing the results.
- Saving the forecasts and confidence intervals for reporting.
This approach can be applied to other time series forecasting tasks, whether it’s predicting stock prices, sales, or another time-dependent variable.
By modeling seasonality and external factors explicitly, SARIMAX provides a flexible framework for forecasting series in which those effects are present.
Code repo - https://github.com/Digitophile/timeseriesprediction-sarimax