The svmodt package in R implements recursive oblique decision trees, leveraging linear Support Vector Machines (SVMs) to define oblique splits at each node. While traditional decision trees are valued for their interpretability due to axis-aligned splits, oblique decision trees introduce complexity by using linear combinations of features, making optimal split determination more challenging. SVMs, however, offer a principled approach to splitting by identifying hyperplanes that maximize the margin between classes.
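To make the contrast between the two split types concrete, here is a minimal base-R sketch that does not use svmodt. The data are synthetic, and the weights `w` and offset `b` are hand-picked for illustration; in an SVM-based tree they would be estimated from the data at each node.

```r
# Toy data: the true class boundary is the diagonal line x1 + x2 = 1
set.seed(1)
x <- matrix(runif(400), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- ifelse(x[, "x1"] + x[, "x2"] > 1, "pos", "neg")

# Axis-aligned split: thresholds a single feature
axis_pred <- ifelse(x[, "x1"] > 0.5, "pos", "neg")

# Oblique split: thresholds a linear combination w . x + b
# (w and b are hand-picked here; an SVM would estimate them from data)
w <- c(1, 1)
b <- -1
oblique_pred <- ifelse(drop(x %*% w) + b > 0, "pos", "neg")

axis_acc <- mean(axis_pred == y)
oblique_acc <- mean(oblique_pred == y)
cat("axis-aligned accuracy:", axis_acc, "\n")
cat("oblique accuracy:     ", oblique_acc, "\n")
```

A single oblique split recovers the diagonal boundary, whereas an axis-aligned tree would need a staircase of many splits to approximate it.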
You can install the development version of svmodt from GitHub with:
# install.packages("devtools")
devtools::install_github("AneeshAgarwala/svmodt")
Linear SVM splits for simple decision boundaries
Binary & Multiclass Classification via one-vs-rest SVM splits at each node.
Flexible feature selection (random, mutual information, correlation)
Penalized feature selection applies a penalty to features already used at ancestor nodes, encouraging more diverse feature choices
Dynamic feature selection lets the user either randomize or decrease the number of features in child nodes
Class weight support for imbalanced data (balanced, balanced sub-sample, custom weights)
Node-specific scaling for improved performance
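The penalized and dynamic feature selection options above can be pictured with a small base-R sketch. It is not svmodt's implementation: the scores are made-up numbers, and both the penalty rule (multiplying a used feature's score by `1 - penalty_weight`) and the depth schedule (`floor(max_features * rate^depth)`, never below 1) are assumptions chosen only for illustration.

```r
# Hypothetical relevance scores for five features (made-up numbers)
scores <- c(radius = 0.90, texture = 0.60, area = 0.85,
            smoothness = 0.40, symmetry = 0.30)

# Features already used by ancestor nodes (hypothetical)
used <- c("radius", "area")
penalty_weight <- 0.5

# Assumed penalty rule: shrink the score of every previously used feature
penalized <- scores
penalized[used] <- penalized[used] * (1 - penalty_weight)

# Helper: names of the k highest-scoring features
top_k <- function(s, k) names(sort(s, decreasing = TRUE))[seq_len(k)]

print(top_k(scores, 2))     # selection without the penalty
print(top_k(penalized, 2))  # selection with the penalty

# Assumed "decrease" schedule: the feature budget shrinks geometrically with depth
features_at_depth <- function(max_features, rate, depth) {
  max(1, floor(max_features * rate^depth))
}
schedule <- sapply(0:4, features_at_depth, max_features = 10, rate = 0.8)
print(schedule)
```

Under these assumptions, the penalty pushes a previously unused feature into the selection, and the feature budget shrinks as the tree gets deeper.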
library(svmodt)
# Load data
data(wdbc) # This dataset ships with the package
wdbc$diagnosis <- factor(wdbc$diagnosis)
# Split
set.seed(123)
train_idx <- sample(nrow(wdbc), 0.8 * nrow(wdbc))
train_data <- wdbc[train_idx, ]
test_data <- wdbc[-train_idx, ]
# Train with class weights
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 4,
  max_features = 2,
  feature_method = "mutual",
  class_weights = "balanced",
  verbose = TRUE
)
# Predict
predictions <- predict(tree, test_data)
# Visualize Split Boundary at Individual Node(s)
viz <- plot(
  tree = tree,
  original_data = train_data,
  response_col = "diagnosis",
  plot.type = "boundary"
)
# Visualize Overall Surface Split(s)
viz <- plot_surface(
  tree = tree,
  data = train_data,
  response = "diagnosis",
  plot.type = "surface"
)
# Penalize previously used features to promote diversity
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 4,
  max_features = 3,
  feature_method = "mutual",
  penalize_used_features = TRUE,
  feature_penalty_weight = 0.5
)
set.seed(123)
# Decrease number of features at deeper levels
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_depth = 5,
  max_features = 10,
  max_features_strategy = "decrease",
  max_features_decrease_rate = 0.8
)
# Random feature selection at each node
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  max_features_strategy = "random",
  max_features_random_range = c(0.3, 0.8)
)
# Balanced class weights
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  class_weights = "balanced"
)
set.seed(123)
# Custom class weights
custom_weights <- c("B" = 1, "M" = 3)
tree <- svm_split(
  data = train_data,
  response = "diagnosis",
  class_weights = "custom",
  custom_class_weights = custom_weights
)
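For intuition about what a "balanced" weighting typically means, the sketch below computes weights with the common heuristic `n / (k * n_c)`, the same rule scikit-learn uses for `class_weight = "balanced"`. Whether svmodt uses exactly this formula is an assumption, and the label vector is a hypothetical stand-in built in base R rather than the package's data.

```r
# Hypothetical imbalanced label vector with two classes
y <- factor(c(rep("B", 357), rep("M", 212)))

counts <- table(y)   # samples per class
n <- length(y)       # total number of samples
k <- nlevels(y)      # number of classes

# Assumed "balanced" heuristic: weight_c = n / (k * n_c),
# so rarer classes receive proportionally larger weights
balanced <- setNames(as.numeric(n / (k * counts)), names(counts))
print(round(balanced, 3))
```

A named vector of this shape matches the form of the `custom_weights` vector shown above, so it could serve as a starting point for tuning custom weights by hand.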