Earlier works on spatial prediction issue often assume that the spatial data are realization of Gaussian random field. However, this assumption is not applicable to the skewed and kurtosis distributed data. The closed skew normal distribution has been used in these circumstances. As another alternative, we apply generalized skew Laplace distributions for defining a skew and heavy tailed random field for Bayesian prediction. Simulation study and a real problem are then applied to evaluate the performance of this model.
Spatial prediction is an important subject in statistical analysis of environmental data which are often spatially correlated. In such cases it is supposed that data come from a Gaussian random field (RF). When spatial data are non-Gaussian but a Gaussian transformation for them exist, Oliveira et al. [1], Oliveira and Ecker [2], Oliveira [3] propose spatial prediction for transformed RFs. However, normalizing transformation is unknown in application and interpretation of the transformed data is also more difficult than original data [4,5]. When distribution of data has an appropriate number of similarities with normal distribution but is asymmetric, Kim and Mallick [6] used a skew Gaussian (SG) RF. Hosseini et al. [7,8] made inference on spatial generalized linear mixed models (SGLMMs) with SG latent variables. Because of some shortage in SG distribution such as nonclosing under conditioning, Dominguez-Molina et al. [9] defined multivariate closed skew normal (CSN) distribution. Karimi and Mohammadzadeh [10,11] and Karimi et al. [12] used CSN RF for Bayesian spatial prediction, Bayesian spatial regression and inversion of seismic data. Hosseini and Mohammadzadeh [13] have done Bayesian prediction for SGLMM with CSG latent variables. However, CSN random variables are not appropriate for modeling data with heavy tails. Since t-distribution is an applicable distribution to real data with heavy tails, Røislien and Omre [14] defined a t-distributed RF, but this RF could not be suitable for modeling the skew data. An appropriate choice for modeling skew and heavy tailed data which is well defined for RF and has some similar characteristics with Gaussian field is multivariate generalized asymmetric Laplace (GAL) distribution introduced by Kozubowski et al. [15]. In this paper, we have applied multivariate GAL distributions for defining the skew and heavy tail RFs. The Bayesian predictions are then achieved by using this RF. A simulation study is performed to check the validity of the model and the proposed prediction method is used on an environmental data set.
A review on the multivariate GAL distribution is presented in Section 2. As in Section 3, the GAL RF is defined, the Bayesian spatial prediction has been considered by GAL RF in Section 4. A simulation study is discussed in Section 5 followed by application of the GAL RF to environmental real data in Section 6.
2. MULTIVARIATE GAL RANDOM VARIABLE
In this section, we review the definition and basic setup of the multivariate GAL distribution introduced by [15].
Definition 2.1
Multivariate GAL law. A continuous p-dimensional random vector X has a GAL distribution, denoted by X~GALpμ,Σ,q, if its characteristic function at t∈Rp is
ϕt=11+12tTΣt−itTμq(1)
where the location μ∈Rp, nonnegative definite dispersion p×p matrix Σ and extension q>0 are its parameters, and T denotes the matrix transpose. When q=1, we deal with asymmetric Laplace (AL) distribution, denoted by X~ALpμ,Σ. By [15], “The significance of AL distribution is partially due to the fact that these arise rather naturally as the only distributional limits for (appropriately normalized) random sums
X1+…+XNg(2)
of independent and identically random vectorXiwith finite second moments asgconverges to zero, where the integer-valued random variableNgis independent ofXiand has a Geometric distribution with mean1g. Sums similar to(2)frequently appear in many applied problems of fields such as Biology, Economics, Insurance mathematics, Reliability and other fields (Kalashnikov [16]; Klebanov et al. [17]). Thus, AL distributions have a wide variety of applications (Kotz et al. [18]). The AL distributions play an analogous role among the heavy tailed Geometric stable laws approximating sums(2)without the restriction of finite second moment [17]. From amongst the stable laws, Gaussian distributions have finite moments of all orders, and their theory is elegant and straight forward. However, in spite of finiteness of moments, their tails are substantially longer than those of the Gaussian laws. This, coupled with the fact that they allow for asymmetry, makes them more flexible in modeling heavy tailed and asymmetrical data.”
If matrix Σ is positive-definite, the distribution is truly p-dimensional and has a probability density function of the following form:
where Kuu is the modified Bessel function or Bessel function of type 3 with index u, Qx,Σ=xTΣ−1x and Ψμ,Σ=2+μTΣ−1μ.
We have the following representation for multivariate GAL random variable
X=μG+GN,(4)
where G has a standard Gamma distribution with shape parameter q and N~Np0,Σ, showing that GAL distributions are location-scale mixtures of normal distributions. By Equation (4), this distribution has some similarities with normal distribution. Therefore, it is appropriate to modeling data which has some similar characteristics with Gaussian data but are skew and heavy tailed. The stochastic representation (4) leads to many further properties of GAL random vectors, including moments, marginal, conditional distributions and linear transformations (Kozubowski et al. [15]). Here, we only review some of useful results for our work.
If X~GALpμ,Σ,q then
EX=μq(5)
CovX=qΣ+μμT(6)
If A be a real matrix l×p. Then,
AX~GALlAμ,AΣAT,q(7)
Also, let X~GALpμ,Σ,q and consider the partition XT=X1T,X2T with dimX1=p1, dimX2=p2=p−p1 and the corresponding partition of the parameters μ,Σ. Then
X1~GALp1μ1,Σ11,q(8)
If q=1, the conditional distribution of X2 given X1=x1 is the generalized p2 dimensional hyperbolic distribution Hp2λ,α,β,δ,m,Δ. By Kotz et al. [18], for conditional mean and variance we have the following results:
First of all, we use the following definition for GAL RF.
Definition 3.1
A RF Z=Zs:s∈D⊆Rd is termed a GAL RF if Z=(Zs1,…,Z(sn))~GALnμ,1qC−μμT,q for all configurations (s1,…,sn)∈D×…×D and all n∈Ν.
The parameters in this RF has been determined
Let Zs1,…,Z(sn) be the observations from a GAL RF Zs:s∈D⊆Rd at n locations (s1,…,sn). For predicting Zs0 at new location s0, based on the observations Z=Zs1,…,Z(sn), we define Z∗=Zs0,ZTT. Then we have Z∗~GALn+1F∗β,Σ∗,q where Σ∗=C∗q−F∗ββTF∗T, fs0=f1s0,…,frs0T, F∗=fs0,FTT, F=fjsin×r are known regression functions (covariates), β is the regression coefficients, C∗=C00cTcC is spatial covariance matrix for all observations and prediction, c=C0in×1, C=VarZ and Cij=CovZsi,Zsj.
Note 3.1.
In some works such as Karimi and Mohammadzadeh [10] that have used CSN models, they used C∗ as scale matrix of Z∗. This choice does not consider spatial structure properly, since VarZ∗≠C∗. Also, there is no another choice for parameters in CSN model which results in VarZ∗=C∗. Here for GAL model, we consider C∗q−F∗ββTF∗T as the scale matrix of Z∗ that by Equation (6) results in VarZ∗=C∗. In Section 5 we will show the efficiency of choosing C∗q−F∗ββTF∗T over C∗ through a simulation study, too.
Now, the best predictor of Zs0 based on the square error loss function LZ^s0,Zs0=EZ^s0−Zs02 is given by E(Zs0|Z), that can be computed by conditional distribution of Zs0|Z. In applications, the regression coefficients and spatial correlation structure are unknown and they have to be estimated. For parameter estimation we can only use the observation vector Z which has n-dimensional GAL distribution, Z~GALnFβ,Cq−FββTFT,q. We assume a stationary GAL RF with stationary spatial covariance function Ch=σ2ρh,θ, where ρ.,θ is a known correlation function, θ is spatial correlation parameter and σ2 is variance of the RF. Let η=βT,σ2,θT and choose the priors β~Nrβ0,Σ0, σ2~IGφ,ζ and also suitable improper prior πθ=1, then the posterior density of η is given by
πη|Z∝f(Z|η)ϕrβ,β0,Σ0ζφΓφ1σ2φ+1e−ζ2σ2πθ.(11)
This posterior density has a complicated form. Therefore we use Markov chain Monte Carlo (MCMC) method to generate sample from the posterior distribution of the parameters. To use Gibbs sampler, the derived full conditional distributions are given by
πβ|Z,σ2,θ∝f(Z|η)ϕrβ,β0,Σ0(12)
πσ2|Z,β,θ∝fZ|ηζφΓφ1σ2φ+1e−ζ2σ2(13)
πθ|Z,β,σ2∝fZ|η(14)
These distributions do not have a closed form solution. For generating data from these densities, we use Metropolis–Hastings (MH) algorithms in Gibbs sampler. The proposal distribution gβy:Nrβ, diagb2 is used for (12), where b is a suitable number that controls efficiency in MH sampling. The proposal distributions for (13) and (14) are gσ2y:Gammaα0,1σ2 and gθy=1, respectively. After generating a random sample η1,…,ηN∗ with size N∗ of the parameter η=βT,σ2,θT, in the second stage if q=1 we compute EZs0|Z,ηi then
EZs0|Z=1N∗∑i=1N∗EZs0|Z,ηi.(15)
The conditional expectations in summation of Equation (15) is provided in following theorem.
Let Z∗=Zs0,ZTT~GALn+1F∗β,Σ∗,q. We use Equation (9) in order to compute EZs0|Z,η. From F∗β=fTs0Fβ it can be easily concluded that μ2=fTs0β and μ1=Fβ. We only need to find suitable matrices Σ12 and Σ11. So we have to partition the matrix Σ∗.
Therefore, Σ12=c−FββTfs0 and Σ11=C−FββTFT that complete proof.
For computing variance of prediction, VarZs0|Z,η, we use Equation (10) with the same replacements of μ1=Fβ, μ2=fTs0β, Σ12=c−FββTfs0, Σ11=C−FββTFT. There is no necessity for prediction in one location at a fix stage. By using Theorem 4.1, we can predict in more than two locations. If q≠1 we generate samples Z1,…,ZM from Zs0|Z,ηi for all i=1,…,N∗ for generated parameters η1,…,ηN∗, respectively. Since for q≠1, Zs0|Z,ηi does not have a closed form, we use MH algorithm in order to generate a sample, then
where Zij is the jth sample generated from distribution of Zs0|Z,ηi. Here, we use proposal distribution gZ0y:NZ0,b12 in MH algorithm.
4. SIMULATION STUDY
In order to study the performance of GAL model, a simulation study is performed with calculations done in R. To generate, say 50 realizations, from Z., we used a stationary GAL RF with exponential covariance function C|h|=σ2exp−|h|/θ on a regular lattice 500×500 with parameters β=4,7, σ2=1, θ=4. Other parameters are β0=5,6, Σ0=diag2, ζ=2, φ=1, b=2 and α0=2. The histogram and P-P plot of simulated data given in Figure 1, shows that data are skewed and not Gaussian. The parameters are firstly estimated by using Gibbs sampler with 5000 iteration. Parameter estimates are β^=3.84,7.32, σ^2=1.76 and θ^=4.11. Plots of convergence for mean of simulated parameters given in Figure 2, show that the Bayes estimators have converged to the real parameters. In the next step, the sample was divided at random into an estimation set of size n and a validation set of size m. We choose m=10 and in this 10 random selected locations (test set), prediction is done by the other 40 locations (training data set).
Comparing the real and estimated values at these locations given in Table 1, shows plausible results. The used sample size N∗ in (19) is 60. From Table 1, we conclude that the estimation for q=1 and Σ∗ is more precise than q≠1 and C∗. To compare the performance of four used methods in Table 1, the leave-one-out method is used. In this method, we choose m=1 and repeat this process n times for n−1 remaining observations. The prediction mean of square errors (PREMS) are computed by
PREMSZ^l=∑i≠lZ^l−Zi2n,(17)
and are illustrated in Table 2, for all 4 methods. From the results in Table 2, we observe that the best results are derived by applying Σ∗ for q=1. As we expected, the estimation in GAL model for q=1 has less error than GAL model for q≠1, since for q=1, the conditional expectation EZs0|Z,η has closed form but for q≠1 this conditional expectation has to be computed by MCMC algorithms. Also, this comes from Table 2 that more precise predictions are given by using Σ∗ instead of C∗. As is mentioned in Note 1, Σ∗ properly considers the spatial covariance structure while C∗ does not completely consider this structure.
Real Value
q=1
Real Value
q≠1
Σ∗
C∗
Σ∗
C∗
30.54
30.50 (1.07)
30.59 (1.13)
183.33
183.34 (0.71)
183.36 (1.40)
14.40
14.25 (0.72)
14.60 (0.96)
34.11
33.99 (1.90)
33.97 (1.81)
15.30
15.34 (0.11)
16.10 (0.74)
16.14
16.30 (2.13)
15.8 (2.30)
57.81
57.80 (1.12)
57.42 (1.06)
17.52
17.91 (1.17)
17.86 (0.90)
65.02
64.98 (0.36)
65.70 (0.60)
34.08
33.92 (0.04)
35.10 (0.11)
19.60
19.54 (0.85)
20.50 (0.76)
9.47
9.63 (1.86)
9.94 (1.64)
4.41
4.23 (0.21)
4.90(0.50)
84.23
84.29 (2.09)
84.72 (1.83)
5.13
5.18 (1.00)
5.63 (1.27)
15.16
15.63 (0.14)
15.87 (0.27)
58.71
58.46 (.06)
59.40 (0.49)
66.74
66.71 (1.91)
67.13 (2.13)
3.71
3.77 (0.13)
3.80 (0.09)
64.98
65.42 (2.47)
65.48 (2.58)
Table 1
Real values in 10 locations and their predictions (prediction error) for different models.
Model
q=1
q≠1
Method
Σ∗
C∗
Σ∗
C∗
PREMS Z^
0.374
0.560
0.547
0.810
Table 2
The prediction mean of square errors (PREMS) in simulated generalized asymmetric Laplace (GAL) model for different models.
In order to have a knowledge about sensitivity of prediction with respect to the estimated parameters, we did predicted Z by assuming parameters, βe=3,9, σe2=2.5 and θe=2.5 which have remarkable difference with real parameters. Results show that this prediction has a very weak sensitivity to the value of estimated parameters. However, when the difference in assumed parameters and their real values is increasing, the required sample size N∗ in (16) has to increase in order to have a good prediction. Sensitivity of prediction on the values of q is very negligible, somewhat conditional expectation in Theorem 4.1 is a very appropriate approximation for the case q≠1.
5. APPLICATION TO REAL DATA
In this section, we analyze a real data set of 45 metals in 811 locations at a region near to Darab city of Iran. The histogram and Normal Q-Q plot of all metals show the skewness, heavy tail and non-Gaussian behavior of almost all data. However due to limited space we do not include all diagrams. The Histogram and Q-Q plot of two metals Sodium (Na) and Magnesium (Mg) have been shown in Figures 3. These two metals are selected because of their similarities to the GAL distribution. Figure 3, shows that GAL density functions has a good fitness to these data, where the parameters are estimated by using the maximum likelihood estimation method. Normal Q-Q plot of data shows that data are non-Gaussian. Also, the small p-values of Kolmogorov–Smirnov test and Shapiro–Wilk test for both two data set confirm that data are not Gaussian. In remainder of this section, we only consider Mg data. The scatter plots of data with respect to its coordinates given in Figure 4, show existence of some outliers in data, which we have ignored them for the time being. The scatter plots of 102 remained data show no trend in data.
We use 100 observations as the training set and 2 observations as the test set. Although we have used spatial covariance function throughout the paper, the variogram function is used for selecting a valid model. This comes from the fact that the variogram is more precise in model selection [19]. The empirical variogram of data is plotted in Figure 5.
The isotropic exponential model γ‖h‖=τ2+σ21−exp−‖h‖θ has been fitted to the empirical variogram. The variogram parameters are estimated by using package geoR. The estimates σ2=3,θ=12521,τ2=4.5,p=7.5 are used as starting values in MH algorithm. The Gibbs sampler with 5000 iterations in order to estimate parameters is also applied. Final estimated parameters by MCMC method are β^=7.2,7.5, σ2^=3.67 and θ^=11719. In order to compare performance of GAL model with a full Gaussian model, we predicted in a test location by two methods. Table 3 shows surprising results of this comparison.
Predicted Value (Standard Deviation)
Real Value
Kriging
GAL
Σ∗
C∗
Σ∗
C∗
15.29
15.52, NaN
16.63 (1.86)
15.29 (1.4)
16.21 (3.4)
18.05
19.45, NaN
17.69 (1.87)
17.93 (1.3)
14.74 (2.8)
NaN, refers to not a number.
Table 3
Comparison between Kriging and generalized asymmetric Laplace (GAL).
This comparison shows the efficiency of GAL model with respect to Kriging. It is remarkable that using Σ∗ insteadof C∗ has more precise in GAL model. However, as expected, there is no benefit in using Σ∗ instead of C∗ for Gaussian model. The negative estimate of prediction variance in Gaussian model based on Σ∗ shows that this model is not basically well defined. Prediction surfaces in Figure 6 show no smoothness for predictions based on C∗, while prediction based on Σ∗, is more smooth.
Contour graph of surface prediction is shown in Figure 7. Because of having an erratic shape of surface prediction based on C∗, the contour graph only has plotted for the left panel of Figure 6. From Figure 7, we can see spatial structure in predicted value of Mg in whole of region. This point comes from this fact that contour lines with near numbers are in a neighborhood. Surface of prediction error in Figure 8, shows less prediction error for inner locations. This result has a logical justification in statistical point of view. There is more information for prediction in an inner location than a marginal location by considering the number of observations that have basic rule in prediction for mentioned location.
6. DISCUSSION AND CONCLUDING REMARKS
In order to define a RF in terms of the multivariate skew distributions, we propose the multivariate GAL distribution which is shown to be an appropriate choice for modeling skew and heavy tailed data. A Bayesian approach was then used for spatial interpolation. The simulation study and the analysis of a real data set show the reliable performance of this model. The proposed RF can be also used for many spatial purposes such as spatial regression and spatial generalized mixed linear models which have been previously studied by SN RF and CSN RF (see, e.g., Karimi and Mohammadzadeh [11] and Hosseini and Mohammadzadeh [13]).
CONFLICTS OF INTEREST
There is no conflicts of interest in this article.
AUTHORS' CONTRIBUTIONS
All authors have effective role in Sections 1–4 and 7. However, two other sections which is simulation and application and are concerned with programming in R, is only work of the first author.
Funding Statement
The work is sponsored by the Higher Education Center of Eghlid, Iran. None of authors have received Funding for this paper.
ACKNOWLEDGMENTS
The authors would like to sincerely thank the Editor-in-Chief and reviewers for their useful comments. The first author am also indebted to Higher Education Center of Eghlid for supporting this research.
16.V. Kalashnikov, Geometric Sums: Bounds for Rare Events with Applications: Risk Analysis, Reliability, Queuing, Springer, Dordrecht, Netherlands, 2010.
17.L.B. Klebanov, T.J. Kozubowski, and S.T. Rachev, Ill-Posed Problems in Probability and Stability of Random Sums, Nova Science Publishers, New York, NY, USA, 2006.
TY - JOUR
AU - Mohammad Mehdi Saber
AU - Alireza Nematollahi
AU - Mohsen Mohammadzadeh
PY - 2021
DA - 2021/01/20
TI - Generalized Skew Laplace Random Fields: Bayesian Spatial Prediction for Skew and Heavy Tailed Data
JO - Journal of Statistical Theory and Applications
SP - 76
EP - 85
VL - 20
IS - 1
SN - 2214-1766
UR - https://doi.org/10.2991/jsta.d.210111.001
DO - 10.2991/jsta.d.210111.001
ID - Saber2021
ER -