Overview of stock prediction application

The purpose of this example is to demonstrate the prediction capability of a Neural Network. It aims to predict the Google stock price based on four years of its historical data. The Neural Network is designed to read 20 days of historical prices and predict the share price one day ahead, as shown in Figure 4.18. This example uses a multi-layer feed-forward Neural Network supported by ANNHUB; a Recurrent Neural Network might achieve better prediction results.

 


Figure 4.18: Overview of the stock prediction application

Prepare stock data-set

The first step is to obtain historical data for the Google stock price. Go to the Yahoo Finance website and search for GOOG; the historical data (1) for a chosen period (2) is then available to download (3), as shown in Figure 4.19.

Figure 4.19: Obtain Google historical data for training data-set and test data-set.


Figure 4.20: The five-year Google historical data in CSV format.


The downloaded five-year historical data is in CSV format, as shown in Figure 4.20. In this example, only the "Open Price" data are used for prediction. The five-year data is separated into two sets: the training set (data from 2013-2017) is used to train the Neural Network, and the test set (data from 2017-2018) is used to verify that the trained Neural Network can correctly predict the Google stock price.


Since the Neural Network needs 20 days of historical data to predict one day ahead, it has 20 inputs and 1 output. Both the training data-set and the test data-set are divided into segments in which the first 20 days are the inputs and the 21st day is the output/target. Each segment is one sample. In other words, a window of 21 consecutive prices slides through the historical data, one day at a time, to form the training data-set and the test data-set.
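The sliding window described above can be illustrated on a toy series. The sketch below is only an illustration: it uses made-up prices and a window of 3 inputs plus 1 target instead of 20 plus 1, and the variable names (prices, numInputs, idx, samples) are assumptions, not part of the original example. It shows that a series of N prices yields N minus the number of inputs samples.

```matlab
% Toy illustration of the sliding window (3 inputs + 1 target instead of 20 + 1).
prices     = (101:110)';                     % made-up column of 10 daily prices
numInputs  = 3;                              % the chapter uses 20
numSamples = length(prices) - numInputs;     % 10 - 3 = 7 samples

% Row k of idx holds the positions k, k+1, ..., k+numInputs.
idx     = bsxfun(@plus, (1:numSamples)', 0:numInputs);
samples = prices(idx);                       % numSamples-by-(numInputs+1) matrix

inputs  = samples(:, 1:numInputs);           % first numInputs days of each window
targets = samples(:, end);                   % the day that follows them

disp(samples(1,:));                          % 101 102 103 104
disp(samples(end,:));                        % 107 108 109 110
```

With numInputs set to 20, each row of samples would have the same 21-column layout as one sample in this example.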


The MATLAB code that extracts the training data-set and the test data-set is as follows.

 

%% 1. Load the Google data-set from the CSV file
clear all;
filename = 'GOOG.csv';
fid = fopen(filename);
% Each row is a date string followed by six numeric fields; skip the header row
data = textscan(fid,'%s %f %f %f %f %f %f','headerlines', 1, 'delimiter', ',');
fclose(fid);

 

%% 2. Take the open price; the first four fifths (2013-2017) are training data, the rest (2017-2018) is test data
openpriceData = data{2};                 % second CSV column, the open price
L = round(4*length(openpriceData)/5);    % index of the last training day
trainingSet = openpriceData(1:L,:);
testingSet  = openpriceData(L+1:end,:);

 

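%% 3. Build the samples: the first 20 days are the inputs and the 21st day is the output
temp = prepareStockData(trainingSet);
% shuffleRow is a helper not listed by the author; presumably it randomizes the row order
trainingData = shuffleRow(temp);
testingData = prepareStockData(testingSet);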


%% 4. Export the training data to a file
input = trainingData(:,1:20);
target = trainingData(:,21:end);
edata = [input target];
[N,ip] = size(input);     % ip: number of input columns
[N,op] = size(target);    % op: number of output columns
textHeader = getTextHeader(ip,op);
% Write the header row first, then append the numeric data
fid = fopen('GoogleTrainingSet.csv','w');
fprintf(fid,'%s\n',textHeader);
fclose(fid);
dlmwrite('GoogleTrainingSet.csv',edata,'-append');

 

%% 5. Export the testing data to a file
input = testingData(:,1:20);
target = testingData(:,21:end);
edata = [input target];
[N,ip] = size(input);     % ip: number of input columns
[N,op] = size(target);    % op: number of output columns
textHeader = getTextHeader(ip,op);
% Write the header row first, then append the numeric data
fid = fopen('GoogleTestingSet.csv','w');
fprintf(fid,'%s\n',textHeader);
fclose(fid);
dlmwrite('GoogleTestingSet.csv',edata,'-append');

 


The getTextHeader() function


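function tHeader = getTextHeader(inputs,outputs)
% Build the CSV header row: one label per input column, then one per output column.
textHeader = '';
% Input labels, each followed by a comma.
% Note: strcat drops trailing spaces of char inputs, so the labels come out as 'Day1', 'Day2', ...
for i = 1:inputs
    AddStr1 = strcat('Day ',num2str(i));
    AddStr2 = strcat(AddStr1,',');
    textHeader = strcat(textHeader, AddStr2);
end
% Output labels except the last one, each followed by a comma.
for j = 1:outputs-1
    AddStr1 = strcat('Output ',num2str(j));
    AddStr2 = strcat(AddStr1,',');
    textHeader = strcat(textHeader, AddStr2);
end
% The last output label has no trailing comma.
AddStr1 = strcat('Output ',num2str(outputs));
textHeader = strcat(textHeader, AddStr1);
tHeader = textHeader;
end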



The prepareStockData() function

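function StockData = prepareStockData(rawdata)
% Slide a 21-day window through rawdata; each window becomes one row (sample).
    rawdata = rawdata(:);               % force a column vector so the concatenation below works
    result = [];
    for i = 1:length(rawdata)-20
        temp = rawdata(i:20+i);         % days i..i+19 are the inputs, day i+20 is the target
        result = [result temp];         % append the window as a new column
    end
    StockData = result';                % transpose: one sample per row, 21 columns
end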
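The script above also calls shuffleRow(), which is not listed in this section. The following is a hypothetical sketch of such a helper, assuming its only job is to put the samples (rows) in random order while keeping each 21-day sample intact; the actual implementation used for this example may differ.

```matlab
function shuffled = shuffleRow(data)
% Hypothetical sketch of shuffleRow (the original helper is not shown here).
% Returns the rows of data in random order; the contents of each row are unchanged.
    order    = randperm(size(data,1));   % random permutation of the row indices
    shuffled = data(order,:);
end
```

Calling rng with a fixed seed before shuffleRow makes the shuffled order repeatable between runs.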



After being scanned by the 21-day window, the training data-set and test data-set have the format shown in Figure 4.21. Both data-sets are then saved as CSV files in the example folder.


Figure 4.21: The format of training data-set and test data-set.

Load training data-set

As in the handwriting recognition application, the training data-set is loaded in Step 1. This gives ANNHUB the data structure information it needs to recommend a Neural Network structure.


Figure 4.22: Loading training data-set into ANNHUB.

Configure Neural Network


The Neural Network is configured as shown in Figure 4.23. In this application, a Bayesian Neural Network is used as the predictor.

Figure 4.23: Configure Neural Network structure for stock prediction application.

Train Neural Network

Since a Bayesian Neural Network is used, the early stopping technique is not required to avoid over-fitting. During training, ANNHUB records all performance indexes and detects the epoch with the best performance, which gives a better prediction result, as shown in Figure 4.24.


Figure 4.24: Train Neural Network for stock prediction application.


Evaluate Neural Network

The regression curve is then used to evaluate how well the trained Bayesian Neural Network performs on the training data-set (70% used for training and 30% for testing), as shown in Figure 4.25. This curve shows the relationship between the predicted outputs (outputs) and the known outputs (targets) in the training data-set.


Figure 4.25: Evaluate Neural Network for stock prediction application.


Test Neural Network with new data-set

To confirm the stability of the trained Neural Network, the one-year test data-set (data from 2017-2018, which was not used in the steps above) is used for testing. As shown in Figures 4.26 and 4.27, the trained Neural Network predicts well, with an R-squared value above 0.9. An R-squared value of 1 means that the predicted outputs match the targets perfectly.


Figure 4.26: Test Neural Network for stock prediction application with test data-set

Figure 4.27: Fit plot for stock prediction application with test data-set
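The R-squared value quoted for the test data-set can be understood with a small worked example. The sketch below assumes the common coefficient-of-determination definition, R-squared = 1 - SSres/SStot; ANNHUB may compute its value differently (for example, from the correlation between outputs and targets). The target and output numbers are made up for illustration.

```matlab
% Worked R-squared example with made-up targets and predicted outputs.
targets = [1; 2; 3; 4];                        % known outputs
outputs = [1.1; 1.9; 3.2; 3.8];                % predicted outputs

ssRes = sum((targets - outputs).^2);           % residual sum of squares = 0.10
ssTot = sum((targets - mean(targets)).^2);     % total sum of squares    = 5.00
rSquared = 1 - ssRes/ssTot;

fprintf('R-squared = %.2f\n', rSquared);       % prints R-squared = 0.98
```

If outputs were identical to targets, ssRes would be 0 and the R-squared value would be exactly 1, which is the perfect-match case described above.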