In many practical situations, information is available on an auxiliary variate \(x_i\), correlated with \(y_i\), for every unit in the population (or at least for every unit in the sample), together with the population mean \(\bar X\). In practice, \(x_i\) is often the value of \(y_i\) at some earlier time, when a complete census was taken. This approach suits situations where the expected value and the variance of \(y_i\) are proportional to \(x_i\), so in the BLE setup we replace some of the hypotheses about the \(y\)’s with hypotheses about the first two moments of the ratio \(y_i/x_i\). To the best of our knowledge, the ratio estimator proposed below is a novel contribution to survey sampling theory.
The new ratio estimator is obtained as a particular case of model (2.4), combined with the exchangeability hypothesis of the Bayes linear approach applied to the ratios \(y_i/x_i\), \(i = 1, \dots, N\):
\[\begin{equation} \tag{3.1} E \left( \frac{y_i}{x_i} \right) = m, \hspace{0.7cm} V \left( \frac{y_i}{x_i} \right) = v \hspace{0.7cm} \text{and} \hspace{0.7cm} Cov \left( \frac{y_i}{x_i},\frac{y_j}{x_j} \right) = c, \hspace{0.5cm} i,j = 1,\dots,N, \; i \neq j, \end{equation}\]
where \(\sigma^2 = v - c\).
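One way to read (3.1) is through the prior covariance matrix of the ratios: \(v\) on the diagonal and the common covariance \(c\) everywhere else, which is the same as \(\sigma^2 I + c\,\mathbf{1}\mathbf{1}^\top\). The sketch below checks this numerically; `exchangeable_cov()` is a hypothetical helper and the values \(N = 4\), \(v = 10\), \(c = 6\) are illustrative choices. It makes explicit that \(\sigma^2 = v - c\) is the part of the variance not shared between units, so \(v \ge c\) is needed for \(\sigma^2 \ge 0\).

```r
# Hypothetical helper: N x N prior covariance matrix of the ratios r_i = y_i / x_i
# implied by (3.1), with v on the diagonal and the common covariance elsewhere.
exchangeable_cov <- function(N, v, covar) {
  S <- matrix(covar, nrow = N, ncol = N)  # every pair i != j shares covar
  diag(S) <- v                            # each ratio has prior variance v
  S
}

# Illustrative values (assumptions, not taken from the text)
N <- 4
v <- 10
covar <- 6
Sigma_r <- exchangeable_cov(N, v, covar)
sigma2 <- v - covar   # sigma^2 = v - c

# The exchangeable matrix splits into sigma^2 * I plus c times a matrix of ones
decomp <- sigma2 * diag(N) + covar * matrix(1, nrow = N, ncol = N)
print(Sigma_r)
print(all.equal(Sigma_r, decomp))
```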
We can apply this with the BLE_Ratio() function, which receives the following parameters:
Letting \(v \to \infty\) and \(c \to \infty\) while keeping \(\sigma^2\) fixed, that is, assuming prior ignorance, we recover the ratio estimator of the design-based approach: \(\hat{T}_{ra} = N \bar{X} (\bar{y}_s / \bar{x}_s)\).
This can be achieved with the BLE_Ratio() function by omitting either the prior mean (m) or the prior variance (v).
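A minimal sketch of this limit, using the toy data from the examples in this section. The helper `ratio_beta()` is hypothetical: its closed form, with \(c = v - \sigma^2\), is our reading of the `est.beta` values that BLE_Ratio() prints for those data (it reproduces 2.010444), not the package source.

```r
# Hypothetical helper: Bayes linear estimate of the common ratio,
# with prior mean m and c = v - sigma^2 (our reading of BLE_Ratio() output).
ratio_beta <- function(ys, xs, m, v, sigma) {
  s2 <- sigma^2
  cv <- v - s2
  (m / cv + sum(ys) / s2) / (1 / cv + sum(xs) / s2)
}

ys <- c(10, 8, 6)
xs <- c(5, 4, 3.1)
x_nots <- c(1, 20, 13, 15, -5)
sigma <- 2

# Design-based ratio estimator of the total: N * Xbar * (ybar_s / xbar_s)
N <- length(xs) + length(x_nots)
X_bar <- mean(c(xs, x_nots))
T_ra <- N * X_bar * mean(ys) / mean(xs)

# Increase v with sigma fixed (so c = v - sigma^2 grows too):
# the estimate moves away from the prior mean towards ybar_s / xbar_s.
for (v in c(10, 1e3, 1e6)) {
  print(ratio_beta(ys, xs, m = 2.5, v = v, sigma = sigma))
}
print(mean(ys) / mean(xs))

# The same total written as observed part plus predicted part
print(T_ra)
print(sum(ys) + mean(ys) / mean(xs) * sum(x_nots))
```

With \(v = 10\) the estimate is pulled towards \(m = 2.5\); as \(v\) grows it approaches \(24/12.1 \approx 1.983\), and the corresponding total tends to \(\hat{T}_{ra} \approx 111.27\).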
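The example below draws a reduced population of 10,000 units from the BigCity data set and then a sample of 10 units from it:
library(BayesSampling)  # assumed: provides BLE_Ratio() and the BigCity data
data(BigCity)
end <- nrow(BigCity)
s <- seq(from = 1, to = end, by = 1)
set.seed(5)
# reduced population: 10,000 units drawn without replacement
samp <- sample(s, size = 10000, replace = FALSE)
ordered_samp <- sort(samp)
BigCity_red <- BigCity[ordered_samp, ]
Expend <- BigCity_red$Expenditure
Income <- BigCity_red$Income
# sample of size 10 from the reduced population
sampl <- sample(seq(1, 10000), size = 10)
ys <- Expend[sampl]
xs <- Income[sampl]
The real ratio between expenditure and income is the quantity we want to estimate. In this example its true value is known, since the whole (reduced) population is available.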
Our design-based estimator of the ratio is the ratio between the sample means.
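Since the BigCity data are not reproduced here, the following sketch uses a simulated population instead (the gamma-distributed incomes and the 0.8 slope are arbitrary assumptions) to show the two quantities just described: the target ratio of population totals and its design-based estimate, the ratio of sample means.

```r
# Simulated population (assumption: not the BigCity data)
set.seed(5)
N <- 10000
income <- rgamma(N, shape = 2, rate = 1 / 300)         # hypothetical auxiliary x
expend <- 0.8 * income + rnorm(N, sd = 0.2 * income)   # hypothetical study variable y

# Target: ratio of population totals
true_ratio <- sum(expend) / sum(income)

# Simple random sample of 10 units and the design-based estimate
idx <- sample(N, size = 10)
ys <- expend[idx]
xs <- income[idx]
est_ratio <- mean(ys) / mean(xs)   # equals sum(ys) / sum(xs)

print(c(true = true_ratio, design_based = est_ratio))
```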
By applying prior information about the ratio, we can obtain a better estimate, especially when only a small sample is available:
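# auxiliary variable (Income) for the units not in the sample
x_nots <- BigCity_red$Income[-sampl]
# prior: mean ratio m = 0.85, v = 0.24 and sigma^2 = 0.23998,
# which imply c = v - sigma^2 = 0.00002
Estimator <- BLE_Ratio(ys, xs, x_nots, m = 0.85, v = 0.24, sigma = sqrt(0.23998))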
Estimator$est.beta
#>        Beta
#> 1 0.7723287
Estimator$Vest.beta
#>             V1
#> 1 1.383985e-05
Estimator$est.mean[1:4,]
#> [1]  104.2644  230.4165  826.3917 1241.5184
Estimator$Vest.mean[1:5,1:5]
#>           V1         V2         V3         V4         V5
#> 1 32.6495313  0.5574125   1.999167   3.003421  0.5217451
#> 2  0.5574125 72.8274736   4.418010   6.637338  1.1530181
#> 3  1.9991667  4.4180104 272.623847  23.804893  4.1353134
#> 4  3.0034210  6.6373380  23.804893 421.530808  6.2126320
#> 5  0.5217451  1.1530181   4.135313   6.212632 68.0936545
Estimator$est.tot
#> [1] 4466282
A smaller example, passing the sample observations directly:
ys <- c(10,8,6)
xs <- c(5,4,3.1)
x_nots <- c(1,20,13,15,-5)
m <- 2.5
v <- 10
sigma <- 2
Estimator <- BLE_Ratio(ys, xs, x_nots, m, v, sigma)
Estimator
#> $est.beta
#>       Beta
#> 1 2.010444
#> 
#> $Vest.beta
#>          V1
#> 1 0.3133159
#> 
#> $est.mean
#>       y_nots
#> 1   2.010444
#> 2  40.208877
#> 3  26.135770
#> 4  30.156658
#> 5 -10.052219
#> 
#> $Vest.mean
#>          V1         V2         V3         V4        V5
#> 1  4.313316   6.266319   4.073107   4.699739  -1.56658
#> 2  6.266319 205.326371  81.462141  93.994778 -31.33159
#> 3  4.073107  81.462141 104.950392  61.096606 -20.36554
#> 4  4.699739  93.994778  61.096606 130.496084 -23.49869
#> 5 -1.566580 -31.331593 -20.365535 -23.498695 -12.16710
#> 
#> $est.tot
#> [1] 112.4595
#> 
#> $Vest.tot
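#> [1] 782.5796
The same results are obtained when only the sample means are supplied, in which case the sample size n (and sigma) must also be given:
ys <- mean(c(10,8,6))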
xs <- mean(c(5,4,3.1))
n <- 3
x_nots <- c(1,20,13,15,-5)
m <- 2.5
v <- 10
sigma <- 2
Estimator <- BLE_Ratio(ys, xs, x_nots, m, v, sigma, n)
#> sample means informed instead of sample observations, parameters 'n' and 'sigma' will be necessary
Estimator
#> $est.beta
#>       Beta
#> 1 2.010444
#> 
#> $Vest.beta
#>          V1
#> 1 0.3133159
#> 
#> $est.mean
#>       y_nots
#> 1   2.010444
#> 2  40.208877
#> 3  26.135770
#> 4  30.156658
#> 5 -10.052219
#> 
#> $Vest.mean
#>          V1         V2         V3         V4        V5
#> 1  4.313316   6.266319   4.073107   4.699739  -1.56658
#> 2  6.266319 205.326371  81.462141  93.994778 -31.33159
#> 3  4.073107  81.462141 104.950392  61.096606 -20.36554
#> 4  4.699739  93.994778  61.096606 130.496084 -23.49869
#> 5 -1.566580 -31.331593 -20.365535 -23.498695 -12.16710
#> 
#> $est.tot
#> [1] 112.4595
#> 
#> $Vest.tot
#> [1] 782.5796
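The printed results of the two toy examples can be reproduced from a few closed-form expressions, sketched below. These expressions are our reading of the output (they match the values printed above), not the package source; the variable names simply mirror the list components returned by BLE_Ratio().

```r
# Toy data and prior from the examples above
ys <- c(10, 8, 6)
xs <- c(5, 4, 3.1)
x_nots <- c(1, 20, 13, 15, -5)
m <- 2.5; v <- 10; sigma <- 2

s2 <- sigma^2
c_r <- v - s2                                      # c = v - sigma^2
Vest.beta <- 1 / (1 / c_r + sum(xs) / s2)          # 0.3133159
est.beta  <- Vest.beta * (m / c_r + sum(ys) / s2)  # 2.010444

# Predictions for the non-sampled units and their covariance matrix
est.mean  <- est.beta * x_nots
Vest.mean <- Vest.beta * outer(x_nots, x_nots) + s2 * diag(x_nots)

# Total: observed part plus predicted part
est.tot  <- sum(ys) + sum(est.mean)   # 112.4595
Vest.tot <- sum(Vest.mean)            # 782.5796
```

The diagonal of `Vest.mean` is \(x_i^2\,\widehat{V}(\beta) + \sigma^2 x_i\), so the negative auxiliary value \(-5\) produces the negative entry \(-12.16710\) printed above; the model implicitly assumes \(x_i > 0\). Passing sample means instead of observations gives the same results because only the sample totals \(n\bar{y}_s\) and \(n\bar{x}_s\) enter these expressions.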