Title: | Time Series Prediction Tools |
---|---|
Description: | Eases time series prediction by automating the process with four main functions: prep(), modl(), pred() and postp(). Offers several preprocessing methods to homogenize variance and to remove trend and seasonality. Can also bring different predictive models together for comparison. Supports ARIMA and data mining regression models (using caret). |
Authors: | Alberto Vico Moreno [aut, cre], Antonio Jesus Rivera Rivas [aut, ths], Maria Dolores Perez Godoy [aut, ths] |
Maintainer: | Alberto Vico Moreno <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2024-10-18 04:30:59 UTC |
Source: | https://github.com/avm00016/predtoolsts |
This function provides the tools to build predictive models for time series.
modl(tserie, method = "arima", algorithm = NULL, formula = NULL, initialWindow = NULL, horizon = NULL, fixedWindow = NULL)
tserie |
A ts or prep object. |
method |
A string. Current methods available are "arima" and "dataMining". Method "arima" is set as default. |
algorithm |
A string. In case |
formula |
An integer vector. Contains the indexes from the time series which indicate how to extract the features. The last value will be the class index. Default value: c(1:16) |
initialWindow |
An integer. The initial number of consecutive values in each training set sample. Default value: 30. |
horizon |
An integer. The number of consecutive values in test set sample. Default value: 15. |
fixedWindow |
A logical: if FALSE, the training set always starts at the first sample and the training set size will vary over data splits. Default value: TRUE. |
Returns an object modl
which stores all the information related to
the final chosen model (errors, parameters, model).
Currently this function covers two different methods: the widely known ARIMA
and data mining, which is less commonly used for prediction. For data mining we make
use of the caret package, which offers plenty of data mining algorithms.
For the data splitting we use a rolling forecasting origin technique, which
works better on time series.
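The rolling forecasting origin idea behind the initialWindow, horizon and fixedWindow arguments can be sketched in base R. This is only an illustration: rolling_slices is a hypothetical helper, not part of this package or of caret, and the package itself delegates the splitting to caret.

```r
# Hypothetical helper: index sets for a rolling forecasting origin.
rolling_slices <- function(n, initialWindow, horizon, fixedWindow = TRUE) {
  stopifnot(n >= initialWindow + horizon)
  # Last training index of every split
  stops <- seq(initialWindow, n - horizon)
  # Fixed window: the training set slides forward;
  # otherwise it always starts at the first sample and grows
  starts <- if (fixedWindow) stops - initialWindow + 1 else rep(1, length(stops))
  list(
    train = Map(seq, starts, stops),
    test  = Map(seq, stops + 1, stops + horizon)
  )
}

slices <- rolling_slices(n = 144, initialWindow = 30, horizon = 15)
length(slices$train)   # number of train/test splits
slices$train[[1]]      # first training window
slices$test[[1]]       # the 15 observations that follow it
```

Every test set lies strictly after its training set, so no future value leaks into training, which is the reason this scheme suits time series better than random resampling.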
A list is returned of class modl
containing:
tserie |
Original time series. |
tserieDF |
Time series converted to data frame. |
method |
Method used to build the model. |
algorithm |
If method is data mining, indicates which algorithm was used. |
horizon |
Horizon for the splitting. |
model |
Model result from |
errors |
Contains three different metrics to evaluate the model. |
Alberto Vico Moreno
http://topepo.github.io/caret/index.html
prep, modl.arima, modl.tsToDataFrame, modl.trControl, modl.dataMining
p <- prep(AirPassengers)
modl(p,method='arima')
modl(p,method='dataMining',algorithm='rpart')
Assuming "tserie" is stationary, returns the best ARIMA model.
modl.arima(tserie)
tserie |
A ts object. |
ARIMA model.
Alberto Vico Moreno
modl.arima(AirPassengers)
Trains on the time series (as a data frame) to build the model.
modl.dataMining(form, tserieDF, algorithm, timeControl, metric = "RMSE", maximize = FALSE)
form |
A formula of the form y ~ x1 + x2 + ... |
tserieDF |
Data frame. |
algorithm |
A string. Algorithm to perform the training. Full list at http://topepo.github.io/caret/train-models-by-tag.html. Only regression types allowed. |
timeControl |
trainControl object. |
metric |
A string. Specifies what summary metric will be used to select the optimal model. Possible values in |
maximize |
A logical. Should the metric be maximized or minimized? Default is FALSE, since that is what makes sense for time series. |
train object
Alberto Vico Moreno
modl.dataMining(form=Class ~ .,
  tserieDF=modl.tsToDataFrame(AirPassengers,formula=c(1:20)),
  algorithm='rpart',
  timeControl=modl.trControl(initialWindow=30,horizon=15,fixedWindow=TRUE))
Creates the needed caret::trainControl
object to control the training
splitting.
modl.trControl(initialWindow, horizon, fixedWindow, givenSummary = FALSE)
initialWindow |
An integer. The initial number of consecutive values in each training set sample. Default value: 30. |
horizon |
An integer. The number of consecutive values in test set sample. Default value: 15. |
fixedWindow |
A logical: if FALSE, the training set always starts at the first sample and the training set size will vary over data splits. Default value: TRUE. |
givenSummary |
A logical. Indicates whether the customized summaryFunction modl.sumFunction should be used (see ?trainControl for more info). Default is FALSE; this will use default |
We always split using method "timeslice", which is the best suited for time series. More information on how this works at http://topepo.github.io/caret/data-splitting.html#data-splitting-for-time-series.
trainControl object
Alberto Vico Moreno
modl.trControl(initialWindow=30,horizon=15,fixedWindow=TRUE,givenSummary=TRUE)
Transform a ts object into a data frame using the given formula.
modl.tsToDataFrame(tserie, formula = NULL)
tserie |
A ts object. |
formula |
An integer vector. Contains the indexes from the |
The time series as a data frame.
Alberto Vico Moreno
modl.tsToDataFrame(AirPassengers,formula=c(1,3,4,5,6,7))
modl.tsToDataFrame(AirPassengers,formula=c(1:20))
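The following base R sketch shows one plausible reading of the formula argument: slide a window over the series, keep the window positions listed in the index vector as features, and use the last listed position as the class. The helper lag_frame, its column names and this interpretation are assumptions made for illustration; the package's actual implementation may differ.

```r
# Hypothetical helper, not the package's code: build a lagged data frame.
lag_frame <- function(x, idx) {
  x <- as.numeric(x)
  width <- max(idx)
  # embed() puts the newest value first, so reverse the columns to make
  # column j hold position j of each sliding window
  windows <- embed(x, width)[, width:1, drop = FALSE]
  df <- as.data.frame(windows[, idx, drop = FALSE])
  # The last selected position plays the role of the class (target)
  names(df) <- c(paste0("V", head(idx, -1)), "Class")
  df
}

df <- lag_frame(AirPassengers, c(1, 3, 4, 5, 6, 7))
head(df)
```

A frame of this shape can be passed to a regression learner with a formula such as Class ~ ., which matches the form used in the modl.dataMining example.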
Plots object pred
## S3 method for class 'pred'
plot(x, ylab = "Values", main = "Predictions", ...)
x |
|
ylab |
ylab |
main |
main |
... |
ignored |
plot(pred(modl(prep(AirPassengers))))
Plots object prep
## S3 method for class 'prep'
plot(x, ylab = "Preprocessed time series", xlab = "", ...)
x |
|
ylab |
ylab |
xlab |
xlab |
... |
ignored |
plot(prep(AirPassengers),ylab="Stationary AirPassengers")
Using the prep
data we undo the changes on a pred
object.
postp(prd, pre)
prd |
A pred object. |
pre |
A prep object. |
A pred
object with reverted transformations.
Alberto Vico Moreno
pred, prep, postp.homogenize.log, postp.homogenize.boxcox, postp.detrend.differencing, postp.detrend.sfsm, postp.deseason.differencing
preprocess <- prep(AirPassengers)
prediction <- pred(modl(preprocess),n.ahead=30)
postp.prediction <- postp(prediction,preprocess)
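To make the idea of reverting transformations on predictions concrete, here is a minimal base R sketch of the whole round trip: log, lag-1 differencing, seasonal differencing, a forecast, and then the inverse steps in reverse order. It does not call this package; the ARIMA orders and the use of stats::arima are arbitrary assumptions chosen only so the example runs on its own.

```r
x <- AirPassengers
s <- frequency(x)
h <- 24                                   # forecast horizon

# Forward transformations: log, then lag-1 and seasonal differencing
z   <- log(x)
d1  <- diff(z)
d12 <- diff(d1, lag = s)

# Fit a simple model on the stationary series and forecast it
# (the orders are an arbitrary choice for this sketch)
fit  <- arima(d12, order = c(0, 0, 1), seasonal = c(0, 0, 1))
fd12 <- as.numeric(predict(fit, n.ahead = h)$pred)

# Undo the seasonal differencing, seeding with the last season of d1
fd1 <- diffinv(fd12, lag = s, xi = as.numeric(tail(d1, s)))
fd1 <- fd1[-seq_len(s)]                   # drop the seed values

# Undo the lag-1 differencing, seeding with the last log value
fz <- diffinv(fd1, xi = as.numeric(tail(z, 1)))[-1]

# Undo the logarithm and rebuild a ts that starts right after the data
forecast_ts <- ts(exp(fz), start = tsp(x)[2] + 1 / s, frequency = s)
forecast_ts
```

The inverse steps are applied in the opposite order to the forward ones, and each inverse difference needs the values that the corresponding forward difference discarded (here taken from the end of the series, since the forecast continues from there).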
Uses inverse seasonal differences to reverse the changes
postp.deseason.differencing(tserie, nsd, firstseasons, frequency)
tserie |
A |
nsd |
Number of seasonal differences. |
firstseasons |
Values lost on the original differences |
frequency |
Frequency of the original time series |
A ts
object.
Alberto Vico Moreno
p <- prep.deseason.differencing(AirPassengers)
postp.deseason.differencing(p$tserie,p$nsd,p$firstseasons,frequency(AirPassengers))
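The inversion of a seasonal difference can be illustrated with base R alone, using stats::diff and stats::diffinv. This sketch does not use the package's functions; it only shows why the first season has to be stored in order to rebuild the series.

```r
x <- log(AirPassengers)
s <- frequency(x)                 # 12 for monthly data

d     <- diff(x, lag = s)         # one seasonal difference
first <- x[seq_len(s)]            # the season lost by differencing

# diffinv() rebuilds the series from the differences and the lost values
restored <- diffinv(d, lag = s, xi = first)
all.equal(as.numeric(restored), as.numeric(x))
```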
Uses inverse differences to revert the changes
postp.detrend.differencing(tserie, nd, firstvalues)
tserie |
A |
nd |
Number of differences. |
firstvalues |
Values lost on the original differences |
A ts
object.
Alberto Vico Moreno
p <- prep.detrend.differencing(AirPassengers)
postp.detrend.differencing(p$tserie,p$nd,p$firstvalues)
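As a base R illustration of the same idea (not the package's own code), the sketch below applies two lag-1 differences and then reverts them with stats::diffinv. With nd differences, the first nd values of the original series are what must be kept.

```r
x  <- log(AirPassengers)
nd <- 2

d     <- diff(x, differences = nd)
first <- x[seq_len(nd)]           # the nd values lost by differencing

# diffinv() needs lag * differences initial values to undo the differences
restored <- diffinv(d, differences = nd, xi = first)
all.equal(as.numeric(restored), as.numeric(x))
```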
Undoes detrending (subtracting full-season means method).
postp.detrend.sfsm(tserie, means, start, frequency)
tserie |
A |
means |
A numeric vector. |
start |
Start of the original time series |
frequency |
Frequency of the original time series |
A ts
object.
Alberto Vico Moreno
p <- prep.detrend.sfsm(AirPassengers)
postp.detrend.sfsm(p$tserie,p$means,start(AirPassengers),frequency(AirPassengers))
Undo Box-Cox transformation
postp.homogenize.boxcox(tserie, lambda)
tserie |
A |
lambda |
A numeric. |
A ts
object.
Alberto Vico Moreno
p <- prep.homogenize.boxcox(AirPassengers)
postp.homogenize.boxcox(p$tserie,p$lambda)
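For reference, the standard Box-Cox transformation and its inverse can be written in a few lines of base R. The helpers boxcox_forward and boxcox_inverse are hypothetical names used only for this sketch, and the lambda value is arbitrary; in the package, lambda comes from the list returned by prep.homogenize.boxcox.

```r
# Standard Box-Cox transform: log(x) when lambda is 0, a power transform otherwise
boxcox_forward <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Inverse transform: solves the expression above for x
boxcox_inverse <- function(y, lambda) {
  if (lambda == 0) exp(y) else (lambda * y + 1)^(1 / lambda)
}

lambda   <- 0.3                   # arbitrary value for illustration
y        <- boxcox_forward(AirPassengers, lambda)
restored <- boxcox_inverse(y, lambda)
all.equal(as.numeric(restored), as.numeric(AirPassengers))
```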
Uses exponent to reverse the logarithm
postp.homogenize.log(tserie)
tserie |
A |
A ts
object.
Alberto Vico Moreno
postp.homogenize.log(prep.homogenize.log(AirPassengers))
Performs predictions over a trained model.
pred(model = NULL, n.ahead = 20, tserie = NULL, predictions = NULL)
model |
A |
n.ahead |
Number of values to predict ahead of the end of the original time series. Default value is 20. Must be lower than 100. |
tserie |
A |
predictions |
A |
Predicts future values over a "modl" object, which can be ARIMA or data mining, and returns the predictions. Data mining predictions start right after the last value contained in the training data, so they overlap with the end of the original series.
The object contains only two time series: the original one and the predictions. You can also set these two series directly.
A list is returned of class pred
containing:
tserie |
Original time series. |
predictions |
Time series with the predictions. |
Alberto Vico Moreno
modl, pred.arima, pred.dataMining, pred.compareModels
prediction <- pred(model=modl(prep(AirPassengers)),n.ahead=25)
pred(tserie=prediction$tserie, predictions=prediction$predictions)
Performs predictions over an ARIMA model using the stats::predict
function.
pred.arima(model, n.ahead)
model |
An ARIMA model. |
n.ahead |
Number of values to predict. |
A ts
object containing the predictions.
Alberto Vico Moreno
pred.arima(forecast::auto.arima(prep(AirPassengers)$tserie),n.ahead=30)
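Since the description above mentions stats::predict, the following base R sketch shows what such a call returns for an ARIMA fit. It uses stats::arima with hand-picked orders as a stand-in for the automatically selected model, which is an assumption made so the example needs no extra packages.

```r
# Fit a seasonal ARIMA on the log series (orders chosen by hand for this sketch)
fit <- arima(log(AirPassengers), order = c(0, 1, 1), seasonal = c(0, 1, 1))

# predict() on an Arima fit returns a list with point forecasts and standard errors
fc <- predict(fit, n.ahead = 30)
fc$pred   # ts of 30 predicted values, continuing after the last observation
fc$se     # ts of the corresponding standard errors
```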
Plots the original time series along with 2-5 predictive models.
pred.compareModels(originalTS, p_1, p_2, p_3 = NULL, p_4 = NULL, p_5 = NULL, legendNames = NULL, colors = NULL, legend = TRUE, legendPosition = NULL, yAxis = "Values", title = "Predictions")
originalTS |
A |
p_1 |
A |
p_2 |
A |
p_3 |
A |
p_4 |
A |
p_5 |
A |
legendNames |
String vector with the names for the legend. Must have the same length as the number of time series being plotted (including the original one). Default is NULL. |
colors |
Vector with the colors. Must have the same length as the number of time series being plotted (including the original one). Default is NULL. |
legend |
A logical. Do we want a legend? Default is TRUE. |
legendPosition |
A string with the position of the legend (bottomright, topright, ...). Default is NULL. |
yAxis |
A string. Name for the y axis. "Values" as default. |
title |
A string. Title for the plot. Default is "Predictions". |
This function aims to ease the comparison between different predictive models by plotting them in the same graphic.
Alberto Vico Moreno
data(AirPassengers)
#pre-processing
p <- prep(AirPassengers)
#modelling
arima.modl <- modl(p)
cart.modl <- modl(p,method='dataMining',algorithm='rpart')
#predicting
arima.pred <- pred(arima.modl,n.ahead=30)
cart.pred <- pred(cart.modl,n.ahead=45)
#post-processing
arima.pred <- postp(arima.pred,p)
cart.pred <- postp(cart.pred,p)
#visual comparison
pred.compareModels(AirPassengers,arima.pred$predictions,cart.pred$predictions,
  legendNames=c('AirPassengers','ARIMA','CART'),yAxis='Passengers',legendPosition = 'topleft')
Performs predictions over a data mining model using the caret::predict.train
function.
pred.dataMining(model, n.ahead)
model |
A |
n.ahead |
Number of values to predict. |
A ts
object containing the predictions.
Alberto Vico Moreno
m <- modl(prep(AirPassengers),method='dataMining',algorithm='rpart')
pred.dataMining(m,n.ahead=15)
This function performs pre-processing on a time series object (ts) to treat heteroscedasticity, trend and seasonality in order to make the series stationary.
prep(tserie, homogenize.method = "log", detrend.method = "differencing", nd = NULL, deseason.method = "differencing", nsd = NULL, detrend.first = TRUE)
tserie |
A ts object. |
homogenize.method |
A string. Current methods available are "log" and "boxcox". Method "log" is set as default. If you don't want to perform this transformation, set method as "none". |
detrend.method |
A string. Current methods available are "differencing" and "sfsm". Method "differencing" is set as default. If you don't want to perform this transformation, set method as "none". |
nd |
A number. Number of differences you want to apply to the "differencing" detrending method. As default its value is NULL, which means nd will be calculated internally. |
deseason.method |
A string. The only method currently available is "differencing", which is set as default. If you don't want to perform this transformation, set method as "none". |
nsd |
A number. Number of seasonal differences you want to apply to the "differencing" deseasoning method. As default its value is NULL, which means nsd will be calculated internally. |
detrend.first |
A boolean. TRUE if detrending method is applied first, then deseasoning. FALSE if deseasoning method is applied first. Default is TRUE. |
Returns an object prep
which stores all data needed to undo the changes later on.
This function provides an automatic way of pre-processing based on unit root tests, but this approach is not infallible. You should always check manually whether the resulting time series is actually stationary, and adjust the parameters accordingly.
A list is returned of class prep
containing:
tserie |
Processed ts object. |
homogenize.method |
Method used for homogenizing. |
detrend.method |
Method used for detrending. |
nd |
Number of differences used on detrending through differencing. |
firstvalues |
First |
deseason.method |
Method used for deseasoning. |
nsd |
Number of seasonal differences used on deseasoning through differencing. |
firstseasons |
First |
detrend.first |
TRUE if detrending was applied before deseasoning, FALSE otherwise. |
means |
Vector of means used in "sfsm" detrending method. |
lambda |
Coefficient used in "boxcox" transformation. |
start |
Start of the original time series. |
length |
Length of the original time series. |
Alberto Vico Moreno
https://www.otexts.org/fpp/8/1
prep.homogenize.log, prep.homogenize.boxcox, prep.detrend.differencing, prep.detrend.sfsm, prep.deseason.differencing, prep.check.acf, prep.check.adf
prep(AirPassengers)
prep(AirPassengers,homogenize.method='boxcox',detrend.method='none')
Plots the autocorrelation function to check stationarity
prep.check.acf(tserie)
tserie |
a |
For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Also, for non-stationary data, the value is often large and positive.
prep.check.acf(AirPassengers)
prep.check.acf(prep(AirPassengers))
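The contrast described above can be seen numerically with stats::acf in base R. The sketch below compares the raw series with a log-transformed, differenced and seasonally differenced version; that particular chain of transformations is an assumption standing in for the package's preprocessing.

```r
# Autocorrelations of the raw series and of a differenced version
raw        <- acf(AirPassengers, plot = FALSE)
stationary <- acf(diff(diff(log(AirPassengers)), lag = 12), plot = FALSE)

# Element 1 is lag 0 (always 1); element 2 is the lag-1 autocorrelation
raw$acf[2]          # large and positive for the trending series
stationary$acf[2]   # much smaller in magnitude after differencing

# Share of the first 12 lags above 0.5: slow decay versus quick drop
mean(raw$acf[2:13] > 0.5)
mean(abs(stationary$acf[2:13]) > 0.5)
```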
Performs ADF test just as another tool to check stationarity.
prep.check.adf(tserie)
tserie |
a |
Shows the results of an ADF test. A p-value<0.05 suggests the data is stationary.
prep.check.adf(AirPassengers)
prep.check.adf(prep(AirPassengers))
Performs differencing with lag=frequency.
prep.deseason.differencing(tserie, nsd = NULL)
tserie |
a |
nsd |
number of seasonal differences to apply. As default its value is NULL; in this case, the function will perform an automatic estimation of |
If no number of differences is specified, the function will make an estimation
of the number of differences needed based on unit root test provided by forecast::nsdiffs
A list is returned containing:
tserie |
Transformed ts object. |
nsd |
Number of seasonal differences applied. |
firstseasons |
Lost values after differencing. |
prep.deseason.differencing(AirPassengers)
prep.deseason.differencing(AirPassengers,nsd=2)
Performs differencing with lag=1.
prep.detrend.differencing(tserie, nd = NULL)
tserie |
a |
nd |
number of differences to apply. As default its value is NULL; in this case, the function will perform an automatic estimation of |
If no number of differences is specified, the function will make an estimation
of the number of differences needed based on unit root test provided by forecast::ndiffs
A list is returned containing:
tserie |
Transformed ts object. |
nd |
Number of differences applied. |
firstvalues |
Lost values after differencing. |
prep.detrend.differencing(AirPassengers)
prep.detrend.differencing(AirPassengers,nd=2)
Performs the "subtracting full-season means" method, a totally automatic approach.
prep.detrend.sfsm(tserie)
tserie |
a |
Under this detrending scheme, a series is first split into segments. The length
of the segments is equal to the length of seasonality (12 for monthly data).
The mean of the historical observations within each of these segments is subtracted
from every historical observation in the segment.
To get the detrended series we compute:
ds_i = x_i - m
where x_i are the actual values of the time series and m is the mean of the segment containing x_i.
A list is returned containing:
tserie |
Transformed ts object. |
means |
Vector containing the historical means. |
prep.detrend.sfsm(AirPassengers)
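The scheme described above can be sketched in base R as follows. This is an illustration under the assumption that segments are consecutive blocks of one full season; it is not the package's implementation, and the variable names are made up for the example.

```r
x <- AirPassengers
s <- frequency(x)                          # segment length: one full season

# Segment id of every observation: consecutive blocks of s values
seg <- (seq_along(x) - 1) %/% s + 1

# Mean of each segment, then subtract it from the observations in that segment
means     <- as.numeric(tapply(as.numeric(x), seg, mean))
detrended <- x - means[seg]

# Adding the same means back reverts the transformation
restored <- detrended + means[seg]
all.equal(as.numeric(restored), as.numeric(x))
```

Keeping the vector of means is what makes the step reversible, which is why the returned list stores it next to the transformed series.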
Performs a Box-Cox transformation on a time series.
prep.homogenize.boxcox(tserie)
tserie |
a |
A list is returned containing:
boxcox |
Transformed ts object. |
lambda |
Lambda value. |
Box-Cox transformation: https://en.wikipedia.org/wiki/Power_transform#Box.E2.80.93Cox_transformation
prep.homogenize.boxcox(AirPassengers)
Performs a logarithmic transformation on a time series.
prep.homogenize.log(tserie)
tserie |
a |
ts
object with the transformed time series
prep.homogenize.log(AirPassengers)
Prints object modl
## S3 method for class 'modl'
print(x, ...)
x |
|
... |
ignored |
print(modl(prep(AirPassengers)))
Prints object pred
## S3 method for class 'pred'
print(x, ...)
x |
|
... |
ignored |
print(pred(modl(prep(AirPassengers))))
Prints object prep
## S3 method for class 'prep'
print(x, ...)
x |
|
... |
ignored |
print(prep(AirPassengers))
Summary of object modl
## S3 method for class 'modl'
summary(object, ...)
object |
|
... |
ignored |
summary(modl(prep(AirPassengers)))
Summary of object pred
## S3 method for class 'pred'
summary(object, ...)
object |
|
... |
ignored |
summary(pred(modl(prep(AirPassengers))))
Summary of object prep
## S3 method for class 'prep'
summary(object, ...)
object |
|
... |
ignored |
summary(prep(AirPassengers))