Overview of stock prediction application

The purpose of this example is to demonstrate the prediction capability of a Neural Network. It aims to predict the Google stock price from five years of historical data (four years for training and one year for testing). The Neural Network is designed to read 20 days of historical prices and predict the share price one day ahead, as shown in Figure 4.18. This example uses a multilayer feed-forward Neural Network, which is supported by ANNHub; a Recurrent Neural Network might achieve better prediction results.


Figure 4.18: Overview of the stock prediction application

Prepare stock dataset

The first step is to obtain historical data for the Google stock price. This can be done by visiting the Yahoo Finance website and searching for GOOG. The historical data (1) for a chosen period (2) is then available to download (3), as shown in Figure 4.19.

Figure 4.19: Obtain Google historical data for training dataset and test dataset.


Figure 4.20: The 5 year Google historical data in csv format.


The downloaded five-year historical data is in CSV format, as shown in Figure 4.20. In this example, only the "Open Price" data is used for prediction. The five-year data is separated into two sets: the training set (data from 2013 to 2017) is used to train the Neural Network, and the test set (data from 2017 to 2018) is used to verify that the trained Neural Network can correctly predict the Google stock price.


Since the Neural Network needs 20 days of historical data to predict one day ahead, it has 20 inputs and 1 output. Both the training dataset and the test dataset are segmented so that the first 20 days are the inputs and day 21 is the output/target. Each segment is called one sample. In other words, a window of 21 consecutive data points slides through the historical data, one day at a time, to form the training dataset and the test dataset, so a series of N prices yields N - 20 samples.


The MATLAB code to extract the training dataset and the testing dataset is as follows.

%% 1. Load the Google dataset from the CSV file
clear all;
filename = 'GOOG.csv';
fid = fopen(filename);
% data{1} holds the dates as text; data{2} holds the open prices
data = textscan(fid,'%s %f %f %f %f %f %f','headerlines', 1, 'delimiter', ',');
fclose(fid);

%% 2. Take the open price and separate it: four years (2013-2017) for training, one year (2017-2018) for testing
openpriceData = data{2};
L = round(4*length(openpriceData)/5);   % index of the last training day
trainingSet = openpriceData(1:L,:);
testingSet  = openpriceData(L+1:end,:);

%% 3. Prepare the datasets: the first 20 days are the inputs and day 21 is the output
temp = prepareStockData(trainingSet);
trainingData = shuffleRow(temp);        % shuffleRow() is a helper that is not listed in this section
testingData = prepareStockData(testingSet);

%% 4. Export the training data to a file
input = trainingData(:,1:20);
target = trainingData(:,21:end);
edata = [input target];
[N,ip] = size(input);
[N,op] = size(target);
textHeader = getTextHeader(ip,op);
% write the header to the file, then append the data rows
fid = fopen('GoogleTrainingSet.csv','w');
fprintf(fid,'%s\n',textHeader);
fclose(fid);
dlmwrite('GoogleTrainingSet.csv',edata,'-append');

%% 5. Export the testing data to a file
input = testingData(:,1:20);
target = testingData(:,21:end);
edata = [input target];
[N,ip] = size(input);
[N,op] = size(target);
textHeader = getTextHeader(ip,op);
% write the header to the file, then append the data rows
fid = fopen('GoogleTestingSet.csv','w');
fprintf(fid,'%s\n',textHeader);
fclose(fid);
dlmwrite('GoogleTestingSet.csv',edata,'-append');
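
The script above calls shuffleRow(), which is not listed in this section. The following is only a minimal sketch of what such a helper might do, assuming it randomly reorders the rows (samples) of a matrix while keeping each row intact. The variable names and the toy matrix are hypothetical and are not taken from the ANNHub example folder.

```matlab
% Hypothetical stand-in for shuffleRow(): randomly permute the rows of a matrix.
% Each row (one sample) stays intact; only the order of the rows changes.
samples  = [1 2 3; 4 5 6; 7 8 9; 10 11 12];   % toy data: 4 samples, 3 columns
order    = randperm(size(samples, 1));        % random permutation of the row indices
shuffled = samples(order, :);                 % reorder whole rows
disp(shuffled);
```

Shuffling only the training samples, as the script does, leaves the test samples in chronological order.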


The getTextHeader() function


function tHeader = getTextHeader(inputs,outputs)
% Build a comma-separated header line: Day 1,...,Day N,Output 1,...,Output M.
% Square-bracket concatenation is used because strcat drops the trailing
% space of character inputs such as 'Day '.
textHeader = '';
for i=1:inputs
   textHeader = [textHeader, 'Day ', num2str(i), ','];
end
for j=1:outputs-1
   textHeader = [textHeader, 'Output ', num2str(j), ','];
end
% the last column name has no trailing comma
textHeader = [textHeader, 'Output ', num2str(outputs)];
tHeader = textHeader;
end



The prepareStockData() function

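function StockData = prepareStockData(rawdata)
   % Slide a 21-day window over the price column, one day at a time.
   % Each window becomes one row: columns 1-20 are inputs, column 21 is the target.
   result = [];
   for i=1:length(rawdata)-20
       temp = rawdata(i:20+i);    % 21 consecutive prices (column vector)
       result = [result temp];    % append the window as a new column
   end
   StockData = result';           % transpose so that each sample is a row
end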
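
The windowing performed by prepareStockData() can be checked on a small toy series. The sketch below re-creates the same 21-day window inline rather than calling the function, so it runs on its own. The 25-point series and the variable names are made up for illustration only.

```matlab
% Toy check of the 21-day sliding window on a 25-point series.
series  = (1:25)';                       % stand-in for a column of open prices
window  = 21;                            % 20 inputs + 1 target
nSample = length(series) - window + 1;   % 25 - 21 + 1 = 5 samples
samples = zeros(nSample, window);        % preallocate one row per sample
for i = 1:nSample
    samples(i, :) = series(i:i+window-1)';   % days i..i+20 as one row
end
inputs  = samples(:, 1:20);              % first 20 days of each window
targets = samples(:, 21);                % day 21 of each window
disp(size(samples));                     % 5 rows, 21 columns
disp(targets');                          % 21 22 23 24 25
```

With 25 prices the window fits in 5 positions, which matches the loop bound length(rawdata)-20 used in prepareStockData().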



After being scanned by the 21-day window, the training dataset and the test dataset have the format shown in Figure 4.21. Both datasets are then saved as CSV files in the example folder.


Figure 4.21: The format of training dataset and test dataset.

Load training dataset

As in the handwriting recognition application, the training dataset is loaded in Step 1. This gives ANNHub the data structure information it needs to recommend a Neural Network structure.


Figure 4.22: Loading training dataset into ANNHub.

Configure Neural Network


The Neural Network is configured as shown in Figure 4.23. In this application, a Bayesian Neural Network is used as the predictor.

Figure 4.23: Configure Neural Network structure for stock prediction application.

Train Neural Network

Since a Bayesian Neural Network is used, the early stopping technique is not required to avoid over-fitting. During training, ANNHub records all performance indexes and detects the epoch at which the best performance occurs, which is used to achieve a better prediction result, as shown in Figure 4.24.


Figure 4.24: Train Neural Network for stock prediction application.


Evaluate Neural Network

The regression curve is then used to evaluate how well the trained Bayesian Neural Network performs on the training dataset (with 70% used for training and 30% for testing), as shown in Figure 4.25. This curve shows the relationship between the predicted outputs (outputs) and the known outputs (targets) in the training dataset.
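
How ANNHub partitions the training dataset internally is not described here. As a rough illustration only, a random 70%/30% partition of a sample matrix could be sketched as follows; the placeholder data and all variable names are assumptions.

```matlab
% Sketch of a random 70/30 partition of the rows of a sample matrix.
data      = rand(100, 21);                  % placeholder: 100 samples of 21 values
n         = size(data, 1);
order     = randperm(n);                    % shuffle the row indices
nTrain    = round(0.7 * n);                 % 70 rows for training
trainPart = data(order(1:nTrain), :);       % 70% of the samples
testPart  = data(order(nTrain+1:end), :);   % remaining 30% of the samples
disp([size(trainPart, 1), size(testPart, 1)]);
```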


Figure 4.25: Evaluate Neural Network for stock prediction application.


Test Neural Network with new dataset

To confirm the stability of the trained Neural Network, the one-year test dataset (data from 2017 to 2018, which was not used in the steps above) is used for testing. As shown in Figures 4.26 and 4.27, the trained Neural Network provides excellent prediction results, with an R-squared value over 0.9. An R-squared value of 1 means that the predicted outputs match the targets perfectly.
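
To make the R-squared figure concrete, the sketch below computes the coefficient of determination for a handful of made-up targets and outputs. It uses the common definition 1 - SSres/SStot, which is an assumption: the exact formula ANNHub uses for its reported value is not stated here, and the numbers are hypothetical, not results from the Google dataset.

```matlab
% Coefficient of determination between targets and predicted outputs (toy values).
targets = [10; 12; 14; 16; 18];             % hypothetical known outputs
outputs = [10.5; 11.5; 14; 16.5; 17.5];     % hypothetical predicted outputs
ssRes = sum((targets - outputs).^2);        % residual sum of squares = 1
ssTot = sum((targets - mean(targets)).^2);  % total sum of squares = 40
rSquared = 1 - ssRes/ssTot;                 % 1 - 1/40 = 0.975
fprintf('R-squared = %.4f\n', rSquared);
```

If the outputs equal the targets exactly, ssRes is 0 and the value is 1, which is the perfect-match case described above.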


Figure 4.26: Test Neural Network for stock prediction application with test dataset

Figure 4.27: Fit plot for stock prediction application with test dataset