Sunday, February 21, 2016

An Initial Exploration of Electricity Price Forecasting

One month ago, I decided to explore the problem of electricity price forecasting, both to learn about the electricity market and to sharpen my skills in data management and modeling. This post describes what I have done and achieved in this project, covering problem definition, literature review, data collection, exploratory analysis, feature creation, model implementation, model evaluation, and discussion of results. Feature engineering and Spark are the two skills I aimed to gain and improve through this project.

The price of a commodity influences the behavior of all market participants, from suppliers to consumers. Knowledge about future prices therefore plays a decisive role in selling and buying decisions: it helps suppliers make profits and consumers save costs.

Electricity is the commodity traded in the electricity market. In a deregulated electricity market, generating companies (GENCOs) submit production bids one day ahead. When GENCOs decide on their bids, neither the electricity load nor the price for the coming day is known, so those decisions rely on forecasts of both. Electricity load forecasting has reached an advanced stage in both industry and academia, with acceptably low prediction error, while electricity price forecasting is not as mature in terms of tools and algorithms. That is because the components of the electricity price are more complicated than those of the load.

1. Literature Review

"If I have been able to see further, it was only because I stood on the shoulder of giants." -- Newton

The review paper "Electricity Price Forecasting in Deregulated Markets: A Review and Evaluation" served as the main reference. It summarizes both the factors that influence prices and the forecasting models in use.

2. Exploratory Data Analysis

The locational-based marginal price (LBMP) in the day-ahead market, published by the New York Independent System Operator (NYISO), is used. Because of limited computational resources, this project forecasts only the price of the "WEST" zone for the coming day.
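As a minimal sketch of the data-loading step, something like the following could be used. The file name and column names here are assumptions based on NYISO's public day-ahead zonal LBMP CSVs, not the exact code used in this project.

import pandas as pd

# Hypothetical daily file; NYISO publishes day-ahead zonal LBMP as CSVs.
# Column names below are assumptions about the public CSV layout.
df = pd.read_csv("20160101damlbmp_zone.csv", parse_dates=["Time Stamp"])

# Keep only the WEST zone, the forecasting target of this project.
west = df[df["Name"] == "WEST"].set_index("Time Stamp").sort_index()
print(west[["LBMP ($/MWHr)",
            "Marginal Cost Losses ($/MWHr)",
            "Marginal Cost Congestion ($/MWHr)"]].head())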

"Some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used."   -- Pedro Domingos in "A Few Things to Know about Machine Learning"

Based on scatter plots and correlation coefficients, the following variables are used as model inputs (a sketch of how the rolling statistics can be computed follows the list).

- Marginal cost losses
- The square of marginal cost losses
- Marginal cost congestion
- The square of marginal cost congestion
- The average price in the past 1 day
- The average price in the past 7 days
- The average price in the past 28 days
- The average price in the past 364 days
- The standard deviation of price in the past 1 day
- The standard deviation of price in the past 7 days
- The standard deviation of price in the past 28 days
- The standard deviation of price in the past 364 days
- The ratio of average price in the past 1 day over average price in the past 7 days
- The ratio of average price in the past 7 days over average price in the past 364 days
- The ratio of average price in the past 28 days over average price in the past 364 days
- The ratio of standard deviation in the past 1 day over standard deviation in the past 7 days
- The ratio of standard deviation in the past 7 days over standard deviation in the past 364 days
- The ratio of standard deviation in the past 28 days over standard deviation in the past 364 days
- The year 
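The rolling means, standard deviations, and their ratios above can be computed with pandas roughly as follows. This is a sketch under the assumption that the price series is hourly and indexed by timestamp; the function and column names are illustrative, not the project's actual code.

import pandas as pd

def add_rolling_features(west: pd.DataFrame,
                         price_col: str = "LBMP ($/MWHr)") -> pd.DataFrame:
    """Build the rolling mean/std features listed above.

    Assumes hourly data with a DatetimeIndex; window lengths are in
    hours (24 hours = 1 day).
    """
    out = west.copy()
    price = out[price_col]
    for days in (1, 7, 28, 364):
        hours = 24 * days
        # shift(1) so each feature uses only past prices, never the current hour
        out[f"mean_{days}d"] = price.shift(1).rolling(hours).mean()
        out[f"std_{days}d"] = price.shift(1).rolling(hours).std()
    # Ratios of short-horizon to long-horizon statistics
    out["mean_1d_over_7d"] = out["mean_1d"] / out["mean_7d"]
    out["mean_7d_over_364d"] = out["mean_7d"] / out["mean_364d"]
    out["mean_28d_over_364d"] = out["mean_28d"] / out["mean_364d"]
    out["std_1d_over_7d"] = out["std_1d"] / out["std_7d"]
    out["std_7d_over_364d"] = out["std_7d"] / out["std_364d"]
    out["std_28d_over_364d"] = out["std_28d"] / out["std_364d"]
    out["year"] = out.index.year
    return out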

3. Model Development

Linear regression is used as the initial model for this exploration; a minimal Spark sketch follows.
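Since the computation was run on Spark (see Section 4), fitting the model might look like the sketch below. It uses the pyspark.ml API, whose details may differ across Spark versions; the input path and the "price" label column are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("price-forecast").getOrCreate()

# Hypothetical path holding the engineered features and the target price.
data = spark.read.parquet("features.parquet")

# Assemble all non-target columns into a single feature vector.
feature_cols = [c for c in data.columns if c != "price"]
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
train = assembler.transform(data)

lr = LinearRegression(featuresCol="features", labelCol="price")
model = lr.fit(train)
predictions = model.transform(train)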

4. Result Presentation and Analysis 

The mean absolute percentage error (MAPE) is used as the performance metric (a minimal sketch of its computation follows the list below). The MAPE is currently around 53%, which is high. The solution can be improved in the following respects.
  • Create more informative input variables, such as electricity load. NYISO provides load data at 5-minute intervals. The data have been retrieved but are still being processed, e.g., imputing missing values and aggregating to the hourly level (see the resampling sketch after this list).
  • Use other models, such as neural networks.
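For reference, the MAPE can be computed as in the short sketch below; the function name and arguments are illustrative.

import numpy as np

def mape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual price is exactly zero; absolute values are taken
    throughout since day-ahead prices can be negative.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

And the 5-minute load data mentioned in the first bullet could be aggregated to the hourly level roughly as follows. The file and column names are assumptions, and time-based interpolation is just one simple choice for imputing short gaps.

import pandas as pd

# Hypothetical 5-minute load series indexed by timestamp.
load = pd.read_csv("load_5min.csv", parse_dates=["Time Stamp"],
                   index_col="Time Stamp")["Load"]

# Fill short gaps by time interpolation, then aggregate to hourly means.
load = load.interpolate(method="time")
hourly_load = load.resample("H").mean()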
The code developed for this project can be found here. The computation was run on Spark, and special attention was paid to feature engineering. There is still a lot to do in this project to improve forecasting accuracy, and I will try to continue this topic if I have enough time and energy.
