Getting Started with NNS: Clustering and Regression

Fred Viole

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

NNS Partitioning NNS.part

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.

library(NNS)
x = seq(-5, 5, .05); y = x^3

NNS.part(x, y, Voronoi = TRUE)

## $order
## [1] 3
## 
## $dt
##          x         y quadrant prior.quadrant
##   1: -5.00 -125.0000     q444            q44
##   2: -4.95 -121.2874     q444            q44
##   3: -4.90 -117.6490     q444            q44
##   4: -4.85 -114.0841     q444            q44
##   5: -4.80 -110.5920     q444            q44
##  ---                                        
## 197:  4.80  110.5920     q111            q11
## 198:  4.85  114.0841     q111            q11
## 199:  4.90  117.6490     q111            q11
## 200:  4.95  121.2874     q111            q11
## 201:  5.00  125.0000     q111            q11
## 
## $regression.points
##    quadrant      x          y
## 1:      q44 -4.100 -72.426500
## 2:      q42 -2.825 -22.889562
## 3:      q41 -1.225  -3.751562
## 4:      q14  1.275   4.064063
## 5:      q13  2.850  23.448375
## 6:      q11  4.100  72.426500
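The quadrant labels above can be illustrated with a short base R sketch. It assumes each split is made at the means of x and y, that ties go to the lower side, and that the digits follow the pattern visible in the output (1 for points above both means, 4 for points at or below both). quadrant_code is a hypothetical helper, not an NNS function, and the stopping rules NNS.part applies are not modeled.

```r
# Hypothetical helper (not part of NNS): label each point by its position
# relative to the means of x and y.
#   1 = x above,       y above
#   2 = x at or below, y above
#   3 = x above,       y at or below
#   4 = x at or below, y at or below
quadrant_code <- function(x, y) {
  1L + (x <= mean(x)) + 2L * (y <= mean(y))
}

x <- seq(-5, 5, 0.05)
y <- x^3

# First split of the whole sample.
level1 <- quadrant_code(x, y)

# Second split: apply the same rule inside each first-level quadrant.
level2 <- integer(length(x))
for (k in unique(level1)) {
  idx <- level1 == k
  level2[idx] <- quadrant_code(x[idx], y[idx])
}
id <- paste0("q", level1, level2)

# Quadrant means after two splits (the regression points of this sketch).
sketch_points <- aggregate(cbind(x, y) ~ id,
                           data = data.frame(id, x, y),
                           FUN = mean)
sketch_points
```

Under these assumptions the q44 and q11 rows of sketch_points agree with the corresponding rows of $regression.points above (x = -4.1 and 4.1, y = -72.4265 and 72.4265). Rows for cells that touch x = 0 may differ, since the sketch does not try to match how NNS.part treats boundary points.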

X-only Partitioning

NNS.part offers a partitioning based on \(x\) values only, using the entire bandwidth in its regression point derivation, and shares the same limit condition as partitioning via both \(x\) and \(y\) values.

NNS.part(x, y, Voronoi = TRUE, type = "XONLY")

## $order
## [1] 6
## 
## $dt
##          x         y quadrant prior.quadrant
##   1: -5.00 -125.0000  q111111         q11111
##   2: -4.95 -121.2874  q111111         q11111
##   3: -4.90 -117.6490  q111111         q11111
##   4: -4.85 -114.0841  q111111         q11111
##   5: -4.80 -110.5920  q111112         q11111
##  ---                                        
## 197:  4.80  110.5920  q222221         q22222
## 198:  4.85  114.0841  q222221         q22222
## 199:  4.90  117.6490  q222222         q22222
## 200:  4.95  121.2874  q222222         q22222
## 201:  5.00  125.0000  q222222         q22222
## 
## $regression.points
##     quadrant      x            y
##  1:   q11111 -4.850 -114.2296250
##  2:   q11112 -4.525  -92.7511875
##  3:   q11121 -4.200  -74.2140000
##  4:   q11122 -3.875  -58.2703125
##  5:   q11211 -3.575  -45.7689375
##  6:   q11212 -3.275  -35.1980625
##  7:   q11221 -2.950  -25.7608750
##  8:   q11222 -2.625  -18.1453125
##  9:   q12111 -2.325  -12.6189375
## 10:   q12112 -2.025   -8.3480625
## 11:   q12121 -1.700   -4.9640000
## 12:   q12122 -1.375   -2.6296875
## 13:   q12211 -1.075   -1.2658125
## 14:   q12212 -0.775   -0.4824375
## 15:   q12221 -0.450   -0.1046250
## 16:   q12222 -0.125   -0.0046875
## 17:   q21111  0.175    0.0091875
## 18:   q21112  0.500    0.1400000
## 19:   q21121  0.825    0.5795625
## 20:   q21122  1.125    1.4484375
## 21:   q21211  1.425    2.9248125
## 22:   q21212  1.750    5.4118750
## 23:   q21221  2.075    8.9795625
## 24:   q21222  2.375   13.4484375
## 25:   q22111  2.700   19.7640000
## 26:   q22112  3.025   27.7468125
## 27:   q22121  3.325   36.8326875
## 28:   q22122  3.625   47.7140625
## 29:   q22211  3.950   61.7483750
## 30:   q22212  4.275   78.2218125
## 31:   q22221  4.575   95.8576875
## 32:   q22222  4.875  115.9640625
##     quadrant      x            y
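A rough base R sketch of an x-only partition follows. It assumes each cell is split at the mean of its x values, with digit 1 for values at or below the mean and 2 for values above it, which matches the label pattern in the output above. xonly_labels is a hypothetical helper, not an NNS function, and exact cell boundaries may differ from NNS.part.

```r
# Hypothetical helper (not part of NNS): split every current cell at the
# mean of its x values, appending 1 (at or below the mean) or 2 (above it),
# and repeat for the requested number of levels.
xonly_labels <- function(x, levels) {
  id <- rep("q", length(x))
  for (l in seq_len(levels)) {
    for (g in unique(id)) {
      idx <- id == g
      id[idx] <- paste0(g, ifelse(x[idx] <= mean(x[idx]), 1L, 2L))
    }
  }
  id
}

x <- seq(-5, 5, 0.05)
y <- x^3

id <- xonly_labels(x, levels = 5)

# Cell means of x and y, analogous to the regression points above.
cell_means <- data.frame(x = tapply(x, id, mean),
                         y = tapply(y, id, mean))
head(cell_means)
```

Because y plays no part in the splits, every level halves each cell along x, so five levels give up to 32 cells here, the same count as the $regression.points table above.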

Clusters Used in Regression

for (i in 1:3) {
  NNS.part(x, y, order = i, Voronoi = TRUE)
  NNS.reg(x, y, order = i)
}

NNS Regression NNS.reg

NNS.reg can fit any \(f(x)\), for both uni- and multivariate cases. NNS.reg returns a self-evident list of values provided below.

Univariate:

NNS.reg(x,y,order=4,noise.reduction = 'off')

## $R2
## [1] 0.9998899
## 
## $MSE
## [1] 6.291015e-05
## 
## $Prediction.Accuracy
## [1] 0.02985075
## 
## $equation
## NULL
## 
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:    67.09000        -5.000        -4.600
##  2:    58.87750        -4.600        -4.125
##  3:    43.66125        -4.125        -3.625
##  4:    34.04250        -3.625        -3.000
##  5:    24.00250        -3.000        -2.650
##  6:    15.96250        -2.650        -2.025
##  7:     9.48250        -2.025        -1.400
##  8:     2.92000        -1.400        -0.600
##  9:     0.78250        -0.600         0.650
## 10:     3.09250         0.650         1.425
## 11:     9.84250         1.425         2.050
## 12:    16.44250         2.050         2.700
## 13:    24.56250         2.700         3.025
## 14:    34.72250         3.025         3.650
## 15:    44.05000         3.650         4.150
## 16:    59.31250         4.150         4.600
## 17:    67.09000         4.600         5.000
## 
## $Point
## NULL
## 
## $Point.est
## numeric(0)
## 
## $regression.points
##          x           y
##  1: -5.000 -125.000000
##  2: -4.600  -98.164000
##  3: -4.125  -70.197187
##  4: -3.625  -48.366563
##  5: -3.000  -27.090000
##  6: -2.650  -18.689125
##  7: -2.025   -8.712562
##  8: -1.400   -2.786000
##  9: -0.600   -0.450000
## 10:  0.650    0.528125
## 11:  1.425    2.924813
## 12:  2.050    9.076375
## 13:  2.700   19.764000
## 14:  3.025   27.746813
## 15:  3.650   49.448375
## 16:  4.150   71.473375
## 17:  4.600   98.164000
## 18:  5.000  125.000000
## 
## $partition
##              y NNS.ID
##   1: -125.0000  q4444
##   2: -121.2874  q4444
##   3: -117.6490  q4444
##   4: -114.0841  q4444
##   5: -110.5920  q4444
##  ---                 
## 197:  110.5920  q1111
## 198:  114.0841  q1111
## 199:  117.6490  q1111
## 200:  121.2874  q1111
## 201:  125.0000  q1111
## 
## $Fitted
##          y.hat
##   1: -125.0000
##   2: -121.6455
##   3: -118.2910
##   4: -114.9365
##   5: -111.5820
##  ---          
## 197:  111.5820
## 198:  114.9365
## 199:  118.2910
## 200:  121.6455
## 201:  125.0000
## 
## $Fitted.xy
##          x         y     y.hat
##   1: -5.00 -125.0000 -125.0000
##   2: -4.95 -121.2874 -121.6455
##   3: -4.90 -117.6490 -118.2910
##   4: -4.85 -114.0841 -114.9365
##   5: -4.80 -110.5920 -111.5820
##  ---                          
## 197:  4.80  110.5920  111.5820
## 198:  4.85  114.0841  114.9365
## 199:  4.90  117.6490  118.2910
## 200:  4.95  121.2874  121.6455
## 201:  5.00  125.0000  125.0000
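The $derivative, $regression.points and $Fitted tables above are consistent with straight line segments joining consecutive regression points. The base R sketch below checks this for the first three regression points, copied from the output. approx() and diff() are standard R functions, and the sketch illustrates that reading of the output rather than the internals of NNS.reg.

```r
# First three regression points, copied from the output above.
rp_x <- c(-5.000, -4.600, -4.125)
rp_y <- c(-125.000000, -98.164000, -70.197187)

# Slope of each segment between consecutive regression points;
# compare with the first two rows of $derivative.
slopes <- diff(rp_y) / diff(rp_x)
slopes

# Linear interpolation along the first segment;
# compare with rows 2 and 3 of $Fitted.
fit <- approx(rp_x, rp_y, xout = c(-4.95, -4.90))$y
fit
```

The slopes come out at about 67.09 and 58.8775, and the interpolated values at about -121.6455 and -118.2910, matching the printed tables.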

Multivariate:

f= function(x,y) x^3+3*y-y^3-3*x
y=x; z=expand.grid(x,y)
g=f(z[,1],z[,2])
NNS.reg(z,g,order='max')

Inter/Extrapolation

NNS.reg can inter- or extrapolate any point of interest. The point.est=... parameter of NNS.reg(x,y,point.est=...) accepts data of any size with the same dimensions as \(x\), and the estimates are retrieved with $Point.est.

NNS.reg(iris[,1:4],iris[,5],point.est=iris[1:10,1:4])$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression through the parameter type="CLASS", as in NNS.reg(x,y,type="CLASS"). It reduces all regressors to a single dimension using the returned equation, $equation.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS")$equation

## [1] "Synthetic Independent Variable X* = (0.3480*X1  0.3525*X2  0.3769*X3  1.0000*X4)/4"

Threshold

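NNS.reg(x,y,type="CLASS",threshold=...) reduces the regressors further by setting the minimum absolute correlation a regressor must have to be retained. In the example below, the X1 coefficient (0.3480) falls under the 0.35 threshold, so it is set to zero and the divisor drops from 4 to 3.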

NNS.reg(iris[,1:4],iris[,5],type = "CLASS",threshold=.35)$equation

## [1] "Synthetic Independent Variable X* = (0.0000*X1  0.3525*X2  0.3769*X3  1.0000*X4)/3"
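The two printed equations can be evaluated by hand with base R. The sketch below assumes the terms inside the parentheses are added (the printed string shows no operator between them) and that the threshold zeroes any coefficient whose absolute value is below it. The coefficients are copied from the output above, and w, w_thr, x_star and x_star_thr are illustrative names only.

```r
# Coefficients copied from the first printed equation.
w <- c(0.3480, 0.3525, 0.3769, 1.0000)
X <- as.matrix(iris[, 1:4])

# Synthetic variable without a threshold: weighted sum divided by the
# number of regressors (4).
x_star <- as.vector(X %*% w) / length(w)

# With threshold = 0.35: coefficients below the threshold are set to zero
# and the divisor counts only the remaining regressors (3).
w_thr <- ifelse(abs(w) >= 0.35, w, 0)
x_star_thr <- as.vector(X %*% w_thr) / sum(w_thr != 0)

head(cbind(x_star, x_star_thr))
```

Each observation is thereby mapped to a single value, which is the sense in which all regressors are reduced to one dimension.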

The point.est=... parameter operates in the same manner as in the full regression above, and is again called with $Point.est.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS",threshold=.35,point.est=iris[1:10,1:4])$Point.est

##  [1] 1.000000 1.000000 1.000000 1.000000 1.000000 1.227822 1.000000
##  [8] 1.000000 1.000000 1.000000

References

If the user is so motivated, detailed arguments and further examples are provided within the following:

*Nonlinear Nonparametric Statistics: Using Partial Moments

*Deriving Nonlinear Correlation Coefficients from Partial Moments

*New Nonparametric Curve-Fitting Using Partitioning, Regression and Partial Derivative Estimation

*Clustering and Curve Fitting by Line Segments

*Classification Using NNS Clustering Analysis