Multivariate Adaptive Regression Splines - Pros and Cons

Pros and Cons

No regression modeling technique is best for all situations. The guidelines below are intended to give an idea of the pros and cons of MARS, but there will be exceptions to the guidelines. It is useful to compare MARS to recursive partitioning and this is done below. (Recursive partitioning is also commonly called regression trees, decision trees, or CART; see the recursive partitioning article for details).

MARS models are more flexible than linear regression models.

MARS models are simple to understand and interpret. Compare the equation for ozone concentration above to, say, the innards of a trained neural network or a random forest.

MARS can handle both continuous and categorical data. MARS tends to be better than recursive partitioning for numeric data because hinges are more appropriate for numeric variables than the piecewise constant segmentation used by recursive partitioning.

Building MARS models often requires little or no data preparation. The hinge functions automatically partition the input data, so the effect of outliers is contained. In this respect MARS is similar to recursive partitioning which also partitions the data into disjoint regions, although using a different method. (Nevertheless, as with most statistical modeling techniques, known outliers should be considered for removal before training a MARS model.)

MARS (like recursive partitioning) does automatic variable selection (meaning it includes important variables in the model and excludes unimportant ones). However, bear in mind that variable selection is not a clean problem and there is usually some arbitrariness in the selection, especially in the presence of collinearity and 'concurvity'.

MARS models tend to have a good bias-variance trade-off. The models are flexible enough to model non-linearity and variable interactions (thus MARS models have fairly low bias), yet the constrained form of MARS basis functions prevents too much flexibility (thus MARS models have fairly low variance).

MARS is suitable for handling fairly large datasets. It is a routine matter to build a MARS model from an input matrix with, say, 100 predictors and 105 observations. Such a model can be built in about a minute on a 1 GHz machine, assuming the maximum degree of interaction of MARS terms is limited to one (i.e. additive terms only). A degree two model with the same data on the same 1 GHz machine takes longer—about 12 minutes. Be aware that these times are highly data dependent. Recursive partitioning is much faster than MARS.

With MARS models, as with any non-parametric regression, parameter confidence intervals and other checks on the model cannot be calculated directly (unlike linear regression models). Cross-validation and related techniques must be used for validating the model instead.

MARS models do not give as good fits as boosted trees, but can be built much more quickly and are more interpretable. (An 'interpretable' model is in a form that makes it clear what the effect of each predictor is.)

The earth, mda, and polspline implementations do not allow missing values in predictors, but free implementations of regression trees (such as rpart and party) do allow missing values using a technique called surrogate splits.

MARS models can make predictions quickly. The prediction function simply has to evaluate the MARS model formula. Compare that to making a prediction with say a Support Vector Machine, where every variable has to be multiplied by the corresponding element of every support vector. That can be a slow process if there are many variables and many support vectors.

Read more about this topic: Multivariate Adaptive Regression Splines

Famous quotes related to pros and cons:

“Quite generally, the familiar, just because it is familiar, is not cognitively understood. The commonest way in which we deceive either ourselves or others about understanding is by assuming something as familiar, and accepting it on that account; with all its pros and cons, such knowing never gets anywhere, and it knows not why.... The analysis of an idea, as it used to be carried out, was, in fact, nothing else than ridding it of the form in which it had become familiar.”
—Georg Wilhelm Friedrich Hegel (1770–1831)