This is a quick demo of how to use the trustOptim package. For this example, the objective function is the Rosenbrock function. \[ f(x_{1:N},y_{1:N})=\sum_{i=1}^N \left[100\left(x^2_i-y_i\right)^2+\left(x_i-1\right)^2\right] \]
The parameter vector contains \(2N\) variables, ordered as \(x_1, y_1, x_2, y_2, \dots, x_N, y_N\). The optimum of the function is a vector of ones, and the value at the minimum is zero.
The following functions return the objective, gradient, and Hessian (in sparse format) of this function.
require(trustOptim)
require(Matrix)
f <- function(V) {
  N <- length(V)/2
  x <- V[seq(1,2*N-1,by=2)]
  y <- V[seq(2,2*N,by=2)]
  return(sum(100*(x^2-y)^2+(x-1)^2))
}
df <- function(V) {
  N <- length(V)/2
  x <- V[seq(1,2*N-1,by=2)]
  y <- V[seq(2,2*N,by=2)]
  t <- x^2-y
  dxi <- 400*t*x+2*(x-1)
  dyi <- -200*t
  return(as.vector(rbind(dxi,dyi)))
}
hess <- function(V) {
  N <- length(V)/2
  x <- V[seq(1,2*N-1,by=2)]
  y <- V[seq(2,2*N,by=2)]
  d0 <- rep(200,N*2)                            # main diagonal: d2f/dy_i^2 = 200
  d0[seq(1,(2*N-1),by=2)] <- 1200*x^2-400*y+2   # main diagonal: d2f/dx_i^2
  d1 <- rep(0,2*N-1)
  d1[seq(1,(2*N-1),by=2)] <- -400*x             # off-diagonal: d2f/dx_i dy_i
  H <- bandSparse(2*N,
                  k=c(-1,0,1),
                  diagonals=list(d1,d0,d1),
                  symmetric=FALSE,
                  repr='C')
  return(drop0(H))                              # drop the explicit zeros in the off-diagonal bands
}
For this demo, we start at a random vector.
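The exact starting values used for the run below are not shown in the text; the code that follows is an assumed reconstruction with N = 3 (matching the six-element solution reported later), and the seed and distribution are hypothetical.
set.seed(1234)                        # hypothetical seed; any random start will do
N <- 3                                # 2*N = 6 parameters
start <- rnorm(2*N, mean=-1, sd=3)    # assumed distribution for the random start
f(rep(1, 2*N))                        # sanity check: the objective is exactly 0 at the optimum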
Next, we call trust.optim with the Sparse method and the default control parameters.
opt <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse")
# Beginning optimization
#
# iter f nrm_gr status
# 1 10015.031437 10987.613876 Continuing - TR expand
# 2 10015.031437 10987.613876 Continuing - TR contract
# 3 219.817975 1588.461373 Continuing - TR expand
# 4 219.817975 1588.461373 Continuing - TR contract
# 5 219.817975 1588.461373 Continuing - TR contract
# 6 219.817975 1588.461373 Continuing - TR contract
# 7 82.628847 703.991138 Continuing
# 8 17.092097 196.558177 Continuing - TR expand
# 9 17.092097 196.558177 Continuing - TR contract
# 10 17.092097 196.558177 Continuing - TR contract
# 11 17.092097 196.558177 Continuing - TR contract
# 12 15.903949 87.156911 Continuing
# 13 10.985480 39.443763 Continuing - TR expand
# 14 10.985480 39.443763 Continuing - TR contract
# 15 9.991011 96.253959 Continuing
# 16 7.829903 26.847358 Continuing
# 17 7.829903 26.847358 Continuing - TR contract
# 18 6.689435 33.066635 Continuing - TR expand
# 19 6.689435 33.066635 Continuing - TR contract
# 20 6.221075 58.822744 Continuing
# 21 4.451638 16.401176 Continuing
# 22 4.451638 16.401176 Continuing - TR contract
# 23 3.834185 30.511945 Continuing - TR expand
# 24 2.924576 8.993872 Continuing
# 25 2.924576 8.993872 Continuing - TR contract
#
# iter f nrm_gr status
# 26 2.924576 8.993872 Continuing - TR contract
# 27 2.532653 20.239445 Continuing
# 28 1.786237 5.608209 Continuing
# 29 1.786237 5.608209 Continuing - TR contract
# 30 1.380482 7.023595 Continuing - TR expand
# 31 1.019554 5.780869 Continuing
# 32 0.725310 4.438916 Continuing
# 33 0.502544 4.865620 Continuing
# 34 0.324202 3.268843 Continuing
# 35 0.207341 5.565848 Continuing
# 36 0.111685 1.780737 Continuing
# 37 0.072993 7.082883 Continuing
# 38 0.022568 0.361306 Continuing
# 39 0.022568 0.361306 Continuing - TR contract
# 40 0.022568 0.361306 Continuing - TR contract
# 41 0.009448 2.928907 Continuing - TR expand
# 42 0.001544 0.247652 Continuing
# 43 0.000153 0.498492 Continuing
# 44 0.000001 0.005509 Continuing
# 45 0.000000 0.000359 Continuing
# 46 0.000000 0.000000 Continuing
#
# Iteration has terminated
# 46 0.000000 0.000000 Success
In the above output, f is the objective function, and
nrm_gr is the norm of the gradient. The status
messages illustrate how the underlying trust region algorithm is
progressing, and are useful mainly for debugging purposes. Note that the
objective value is non-increasing from one iteration to the next, but the norm of the
gradient need not decrease monotonically. The algorithm will continue until either the norm of
the gradient is less than the control parameter prec, the
trust region radius is less than stop.trust.radius, or the
iteration count exceeds maxit. See the package manual for
details of the control parameters. We use the default control parameters
for this demo (hence, there is no control list in the call).
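As an illustration only (not part of the original demo), these stopping criteria could be overridden through the control list; the values and the opt.tight name below are hypothetical.
opt.tight <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse",
                         control=list(prec=1e-10,              # gradient-norm tolerance
                                      stop.trust.radius=1e-12, # minimum trust region radius
                                      maxit=200))              # iteration cap
For the demo itself we keep the defaults and continue with the opt object returned above.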
The result contains the objective value, the minimum, the gradient at the minimum (should be numerically zero for all elements), and the Hessian at the minimum.
opt
# $fval
# [1] 2.233e-19
#
# $solution
# [1] 1 1 1 1 1 1
#
# $gradient
# [1] 2.331e-09 -1.631e-09 0.000e+00 0.000e+00 0.000e+00 0.000e+00
#
# $hessian
# 6 x 6 sparse Matrix of class "dsCMatrix"
#
# [1,] 802 -400 . . . .
# [2,] -400 200 . . . .
# [3,] . . 802 -400 . .
# [4,] . . -400 200 . .
# [5,] . . . . 802 -400
# [6,] . . . . -400 200
#
# $iterations
# [1] 46
#
# $status
# [1] "Success"
#
# $trust.radius
# [1] 0.5006
#
# $nnz
# [1] 9
#
# $method
# [1] "Sparse"
Note that opt$fval and all elements of
opt$gradient are zero, within machine precision. The
solution is correct, and the Hessian is returned as a compressed sparse
Matrix object (refer to the Matrix package for details).
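Because opt$hessian is an ordinary Matrix-package object, the usual sparse-matrix tools apply to it. A small illustration (not part of the original output):
class(opt$hessian)           # "dsCMatrix": symmetric sparse, column-compressed storage
Matrix::nnzero(opt$hessian)  # count the non-zero entries
as.matrix(opt$hessian)       # convert to a dense base-R matrix if needed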
One way to potentially speed up convergence (though not necessarily
reduce compute time) is to apply a preconditioner to the algorithm.
Other than the identity matrix (the default), the package currently
supports only a modified Cholesky preconditioner. It is selected by
setting the control parameter preconditioner=1. To save space,
we report the optimizer status only every 10 iterations.
opt1 <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse",
control=list(preconditioner=1, report.freq=10))
# Beginning optimization
#
# iter f nrm_gr status
# 10 13.648174 7.496606 Continuing - TR contract
# 20 7.151988 37.094473 Continuing - TR expand
# 30 3.408749 18.596902 Continuing - TR expand
# 40 0.836703 12.994474 Continuing
# 50 0.128263 6.112045 Continuing
# 60 0.000011 0.142876 Continuing
#
# Iteration has terminated
# 63 0.000000 0.000000 Success
Here, we see that adding the preconditioner actually increases the number of iterations. Sometimes preconditioners help a lot, and sometimes not at all.
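Comparing the two runs directly makes the difference concrete: 46 iterations with the identity preconditioner versus 63 with the modified Cholesky preconditioner.
c(identity=opt$iterations, modified.cholesky=opt1$iterations)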
## Quasi-Newton Methods
The trust.optim function also supports quasi-Newton
approximations to the Hessian. The two options are BFGS and SR1 updates.
See Nocedal and Wright (2006) for details.
You do not need to provide the Hessian for these methods, and they are
often preferred when the Hessian is dense. However, they may take more
iterations to converge, which is why we increase the maxit
control parameter. To save space, we report the status of the optimizer
only every 10 iterations.
opt.bfgs <- trust.optim(start, fn=f, gr=df, method="BFGS", control=list(maxit=5000, report.freq=10))
# Beginning optimization
#
# iter f nrm_gr status
# 10 88.806530 354.018026 Continuing
# 20 1.823163 6.328353 Continuing - TR contract
# 30 1.389553 4.246382 Continuing
# 40 0.802565 3.855513 Continuing - TR expand
# 50 0.571448 2.816390 Continuing - TR expand
# 60 0.208713 10.847588 Continuing - TR contract
# 70 0.007267 1.328120 Continuing - TR contract
# 80 0.005292 0.670026 Continuing - TR expand
# 90 0.000001 0.004486 Continuing
# 100 0.000000 0.000000 Continuing
#
# Iteration has terminated
# 101 0.000000 0.000000 Success
opt.bfgs
# $fval
# [1] 8.829e-23
#
# $solution
# [1] 1 1 1 1 1 1
#
# $gradient
# [1] -1.149e-10 6.506e-11 8.306e-12 -6.639e-12 -7.881e-11 3.622e-11
#
# $iterations
# [1] 101
#
# $status
# [1] "Success"
#
# $trust.radius
# [1] 0.3383
#
# $method
# [1] "BFGS"
#
# $hessian.update.method
# [1] 2
And we can do the same thing with SR1 updates.
opt.sr1 <- trust.optim(start, fn=f, gr=df, method="SR1", control=list(maxit=5000, report.freq=10))
# Beginning optimization
#
# iter f nrm_gr status
# 10 175.256780 287.816322 Continuing - TR contract
# 20 2.931556 2.996846 Continuing
# 30 2.131656 6.411590 Continuing
# 40 1.127632 3.784477 Continuing - TR expand
# 50 0.315880 6.964198 Continuing
# 60 0.208635 1.302632 Continuing - TR expand
# 70 0.132457 5.187552 Continuing - TR contract
# 80 0.108929 2.243054 Continuing - TR contract
# 90 0.103658 0.417444 Continuing - TR expand
# 100 0.039978 2.964965 Continuing - TR contract
# 110 0.007481 2.932641 Continuing - TR contract
# 120 0.006169 1.885494 Continuing - TR contract
# 130 0.000005 0.033300 Continuing - TR contract
# 140 0.000000 0.002521 Continuing
#
# Iteration has terminated
# 144 0.000000 0.000000 Success
opt.sr1
# $fval
# [1] 1.657e-19
#
# $solution
# [1] 1 1 1 1 1 1
#
# $gradient
# [1] -6.302e-10 3.902e-10 -4.991e-11 1.402e-10 -1.377e-08 6.698e-09
#
# $iterations
# [1] 144
#
# $status
# [1] "Success"
#
# $trust.radius
# [1] 0.005085
#
# $method
# [1] "SR1"
#
# $hessian.update.method
# [1] 1
Note that the quasi-Newton updates do not return a Hessian. We do not think that the final approximations from BFGS or SR1 updates are particularly reliable. If you need the Hessian, you can use the sparseHessianFD package.
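One possible route is sketched below: finite-difference the gradient with sparseHessianFD, reusing the analytic hess() from above only to supply the sparsity pattern. This assumes the sparseHessianFD() constructor and the Matrix.to.Coord() helper described in that package's documentation; consult its vignette for the exact interface.
require(sparseHessianFD)
## row/column indices of the non-zeros in the lower triangle of the Hessian
pattern <- Matrix.to.Coord(tril(hess(opt.bfgs$solution)))
obj <- sparseHessianFD(opt.bfgs$solution, fn=f, gr=df,
                       rows=pattern$rows, cols=pattern$cols)
H <- obj$hessian(opt.bfgs$solution)  # sparse finite-difference Hessian at the BFGS solution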