Journal of Statistical Theory and Applications

Volume 18, Issue 2, June 2019, Pages 113 - 122

Divergence Measures Estimation and Its Asymptotic Normality Theory Using Wavelets Empirical Processes III

Authors
Amadou Diadié Bâ1, Gane Samb Lo2, *, Diam Bâ3
1Unité de Formation et de Recherche des Sciences Appliquées à la Technologie, Laboratoire d'Etudes et de Recherches en Statistiques et Développement, Gaston Berger University, Saint Louis, Sénégal
2LERSTAD, Gaston Berger University, Saint-Louis, Senegal, Evanston Drive, NW, Calgary, Canada, T3P 0J9, Associate Researcher, LSTA, Pierre et Marie University, Paris, France, Associated Professor, African University of Sciences and Technology, Abuja, Nigeria
3Unité de Formation et de Recherche des Sciences Appliquées à la Technologie, Laboratoire d'Etudes et de Recherches en Statistiques et Développement, Gaston Berger University, Saint Louis, Sénégal
*Corresponding author. Email: gane-samb.lo@ugb.edu.sn
Received 29 October 2018, Accepted 24 February 2019, Available Online 23 May 2019.
DOI
10.2991/jsta.d.190514.002
Keywords
Divergence measures estimation
Abstract

In the two previous papers of this series, the main results on the asymptotic behavior of empirical divergence measures based on wavelets theory were established and particularized for important families of divergence measures, such as the Rényi and Tsallis families and the Kullback-Leibler measure. While the proofs of the results in the second paper may be skipped, the results of the first paper must be thoroughly proved, since they serve as the foundation of the whole structure of results. We prove them in this last paper of the series. We also address the applicability of the results to usual distribution functions.

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION AND RECALL OF THE RESULTS TO BE PROVED

For a general introduction, we refer the reader to the first ten pages of [1], in which the notation and the assumptions are exposed.

Let us recall here the main results we exposed in [1]. The first is related to the empirical process based on wavelets.

Theorem 1.1.

Given the sequence $(X_n)_{n\ge 1}$ defined in Condition (8) with $f\in B_{\infty,\infty}^{t}$, let $f_n$ be defined as in Formula (13) and $\mathbb{G}_{n,X}^{(w)}$ as in Formula (17). Then, under Assumptions 1-3, all in [1], and for any bounded function $h$, defined on $D$ and belonging to $B_{\infty,\infty}^{t}$, we have

$$\sigma_{h,n}^{-1}\,\mathbb{G}_{n,X}^{(w)}(h)\rightsquigarrow N(0,1) \quad \text{as } n\to\infty,$$
where
$$\sigma_{h,n}^{2}=\mathbb{E}_X\big(K_{j_n}h(X)\big)^{2}-\big(\mathbb{E}_X K_{j_n}h(X)\big)^{2}\to \mathrm{Var}\big(h(X)\big) \quad \text{as } n\to\infty.$$

Proof.

Suppose that Assumptions 1 and 3 in [1] are satisfied and that $h\in B_{\infty,\infty}^{t}$.

We have

$$\int_D \big(f_n(x)-f(x)\big)h(x)\,dx=\mathbb{P}_{n,X}(K_{j_n}h)-\mathbb{E}_X h=\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)+\mathbb{E}_X\big(K_{j_n}h(X)-h(X)\big).$$

It follows that

$$\mathbb{G}_{n,X}^{(w)}(h)=\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)+\sqrt{n}\,R_{1,n},$$
where $R_{1,n}=\mathbb{E}_X\big(K_{j_n}h(X)-h(X)\big)$.

To complete the proof, we have to show that (1) $\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)$ converges in distribution to a centered normal distribution and that (2) $\sqrt{n}\,R_{1,n}$ converges to zero in probability as $n\to\infty$. From now on, all limits are meant as $n\to\infty$, unless otherwise specified.

For the first point, we show that

$$\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)\rightsquigarrow N\big(0,\mathrm{Var}(h(X))\big) \quad \text{as } n\to\infty,$$
by applying the central limit theorem for independent random variables.

Let us denote $Z_{i,n}=Z_{i,n}(h)=K_{j_n}h(X_i)$, $\mu_{j_n}=\mathbb{E}Z_{i,n}$, and $\sigma_{i,n}^{2}=\sigma_{i,n}(h)^{2}=\mathrm{Var}(Z_{i,n})<\infty$. Let

$$T_n=\sum_{i=1}^{n}\big(Z_{i,n}-\mu_{j_n}\big), \qquad s_n^{2}=\mathrm{Var}(T_n)=\sum_{i=1}^{n}\sigma_{i,n}^{2}.$$

$T_n/s_n$ has mean $0$ and variance $1$; our goal is to give conditions under which

$$\frac{T_n}{s_n}\rightsquigarrow N(0,1) \quad \text{as } n\to+\infty.$$

Such conditions are the Lindeberg-Feller-Levy conditions (see [4], Point B, p. 292).

We have to check that

$$(L1)\qquad s_n^{-1}\max_{1\le i\le n}\sigma_{i,n}\to 0,$$
and, for any $\varepsilon>0$,
$$(L2)\qquad L_n:=\frac{1}{s_n^{2}}\sum_{i=1}^{n}\int_{\{|Z_{i,n}-\mathbb{E}Z_{i,n}|>\varepsilon s_n\}}\big|Z_{i,n}-\mathbb{E}Z_{i,n}\big|^{2}\,d\mathbb{P}\to 0.$$
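Before proceeding, the Lindeberg-Feller setting above can be illustrated numerically. The following Python sketch is not part of the proof: it assumes, purely for illustration, a Haar scaling function (so that $K_{j_n}h$ is the dyadic cell average of $h$), $h(x)=x^2$, and uniform data on $(0,1)$, and checks by Monte Carlo that $T_n/s_n$ is approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(42)

def haar_cell_average(x, j):
    # Haar projection at level j of h(t) = t^2 on (0, 1): K_j h(x) is the
    # average of h over the dyadic cell [k 2^-j, (k+1) 2^-j) containing x,
    # available in closed form as (hi^3 - lo^3) / (3 (hi - lo)).
    k = np.floor(x * 2.0**j)
    lo, hi = k * 2.0**-j, (k + 1) * 2.0**-j
    return (hi**3 - lo**3) / (3.0 * (hi - lo))

n, reps = 2000, 3000
j_n = max(1, round(np.log2(n) / 4))           # resolution 2^{j_n} ~ n^{1/4}

# Exact mean and variance of Z = K_{j_n} h(X) for X ~ U(0, 1):
# Z takes the cell value c_k with probability 2^{-j_n}.
k = np.arange(2**j_n)
c = haar_cell_average((k + 0.5) * 2.0**-j_n, j_n)
mu_jn = c.mean()                              # E Z, equal to E h(X) = 1/3 here
var_Z = (c**2).mean() - mu_jn**2              # Var Z

stats = np.empty(reps)
for r in range(reps):
    X = rng.random(n)
    Z = haar_cell_average(X, j_n)             # Z_{i,n} = K_{j_n} h(X_i)
    stats[r] = (Z.sum() - n * mu_jn) / np.sqrt(n * var_Z)   # T_n / s_n
```

Over the replications, `stats` has mean close to $0$ and standard deviation close to $1$, as (L1) and (L2) predict for this bounded triangular array.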

To prove this, let us begin by observing that, for all $x$,

$$\big|K_{j_n}h(x)-h(x)\big|=\Big|\,2^{j_n}\int_D K\big(2^{j_n}x,2^{j_n}t\big)\big(h(t)-h(x)\big)\,dt\,\Big|.$$

By Assumption 3 in [1], we have, for any $x\in D$,

$$\big|K_{j_n}h(x)-h(x)\big|\le \int_D \Phi(u)\,\big|h\big(x+2^{-j_n}u\big)-h(x)\big|\,\mathbb{1}_{\{x+2^{-j_n}u\in D\}}\,du.$$

Recall that $h$ is uniformly continuous on $D$ and on the compact $K$ which supports $\Phi$. We have

$$\rho_{h,n}=\sup_{(s,t)\in D^{2},\ |t-s|\le C2^{-j_n}}\big|h(t)-h(s)\big|\to 0.$$

For all $p\ge 1$, for all $x\in D$, and for $c=\|\Phi\|_\infty\,\lambda(K)$,

$$\big|K_{j_n}h(x)-h(x)\big|^{p}f(x)\mathbb{1}_D(x)\le c^{p}\rho_{h,n}^{p}\,f(x)\mathbb{1}_D(x).$$

We get that, for all $p\ge 1$ and any $n\ge 1$,

$$\mathbb{E}\big|K_{j_n}h(X)-h(X)\big|^{p}\le c^{p}\rho_{h,n}^{p}\to 0.$$

Then, for any $1\le i\le n$,

$$\big|\mathbb{E}Z_{i,n}-\mathbb{E}h(X)\big|\le \mathbb{E}\big|Z_{i,n}-h(X)\big|\le c\,\rho_{h,n}\to 0,$$
that is,
$$\max_{1\le i\le n}\big|\mathbb{E}Z_{i,n}-\mathbb{E}h(X)\big|\to 0.$$

We have

$$\Big|\sigma_{i,n}-\mathrm{Var}\big(h(X)\big)^{1/2}\Big|=\Big|\mathrm{Var}\big(K_{j_n}h(X_i)\big)^{1/2}-\mathrm{Var}\big(h(X)\big)^{1/2}\Big|=\Big|\big(\mathbb{E}(Z_{i,n}-\mathbb{E}Z_{i,n})^{2}\big)^{1/2}-\big(\mathbb{E}(h(X)-\mathbb{E}h(X))^{2}\big)^{1/2}\Big|=\Big|\,\big\|Z_{i,n}-\mathbb{E}Z_{i,n}\big\|_{2}-\big\|h(X)-\mathbb{E}h(X)\big\|_{2}\,\Big|.$$

Hence, by the triangle inequality,

$$\Big|\sigma_{i,n}-\mathrm{Var}\big(h(X)\big)^{1/2}\Big|\le \big|\mathbb{E}Z_{i,n}-\mathbb{E}h(X)\big|+\big\|K_{j_n}h(X_i)-h(X_i)\big\|_{2},$$
and using (3) and (5), we get that
$$\max_{1\le i\le n}\Big|\sigma_{i,n}-\mathrm{Var}\big(h(X)\big)^{1/2}\Big|\to 0.$$

We have, from (2),

$$Z_{i,n}^{2}=\big(K_{j_n}h(X_i)\big)^{2}=\big(h(X_i)+K_{j_n}h(X_i)-h(X_i)\big)^{2}\le 2h(X_i)^{2}+2\big|K_{j_n}h(X_i)-h(X_i)\big|^{2}\le 2h(X_i)^{2}+2c^{2}\rho_{h,n}^{2}.$$

Thus,

$$Z_{i,n}^{2}+\big(\mathbb{E}Z_{i,n}\big)^{2}\le 2h(X_i)^{2}+2c^{2}\rho_{h,n}^{2}+\mathbb{E}\big(2h(X)^{2}+2c^{2}\rho_{h,n}^{2}\big)\le 2\big(h(X_i)^{2}+\mathbb{E}h(X)^{2}\big)+4c^{2}\rho_{h,n}^{2}\le \frac{1}{2}\big(Z+\delta_n\big),$$
where
$$Z=4\big(h(X_i)^{2}+\mathbb{E}h(X)^{2}\big) \quad \text{and} \quad \delta_n=8c^{2}\rho_{h,n}^{2}.$$

Besides, the $C_2$-inequality gives

$$\big|Z_{i,n}-\mathbb{E}Z_{i,n}\big|^{2}\le 2\Big(Z_{i,n}^{2}+\big(\mathbb{E}Z_{i,n}\big)^{2}\Big)\le Z+\delta_n.$$

We also have

$$Z+\delta_n\le 8\|h\|_\infty^{2}+\delta_n=:\Delta_n\to 8\|h\|_\infty^{2}.$$

To prove (L1), put $\alpha_n=\max_{1\le i\le n}\big|\sigma_{i,n}-\mathrm{Var}(h(X))^{1/2}\big|$. Then, for any $1\le i\le n$,

$$\big(\mathrm{Var}(h(X))^{1/2}-\alpha_n\big)^{2}\le \sigma_{i,n}^{2}\le \big(\mathrm{Var}(h(X))^{1/2}+\alpha_n\big)^{2}.$$

We get

$$n\big(\mathrm{Var}(h(X))^{1/2}-\alpha_n\big)^{2}\le s_n^{2}\le n\big(\mathrm{Var}(h(X))^{1/2}+\alpha_n\big)^{2},$$
hence
$$\Big(1-\frac{\alpha_n}{\mathrm{Var}(h(X))^{1/2}}\Big)^{2}\le \frac{s_n^{2}}{n\,\mathrm{Var}(h(X))}\le \Big(1+\frac{\alpha_n}{\mathrm{Var}(h(X))^{1/2}}\Big)^{2}.$$

By (6), we have

$$\Big|\frac{s_n^{2}}{n\,\mathrm{Var}(h(X))}-1\Big|\le \max\Big\{\Big|\Big(1-\frac{\alpha_n}{\mathrm{Var}(h(X))^{1/2}}\Big)^{2}-1\Big|,\ \Big(1+\frac{\alpha_n}{\mathrm{Var}(h(X))^{1/2}}\Big)^{2}-1\Big\}\to 0.$$

And then

$$s_n^{2}\sim n\,\mathrm{Var}\big(h(X)\big).$$

Next,

$$s_n^{-1}\max_{1\le i\le n}\sigma_{i,n}\le \frac{\mathrm{Var}(h(X))^{1/2}+\alpha_n}{s_n}\sim \frac{\mathrm{Var}(h(X))^{1/2}+\alpha_n}{\sqrt{n\,\mathrm{Var}(h(X))}}\to 0,$$
which proves (L1).

We have

$$L_n\le \frac{1}{s_n^{2}}\sum_{i=1}^{n}\int_{\{Z+\delta_n>\varepsilon^{2}s_n^{2}\}}\Delta_n\,d\mathbb{P}=\frac{n}{s_n^{2}}\big(8\|h\|_\infty^{2}+\delta_n\big)\,\mathbb{P}\big(Z+\delta_n>\varepsilon^{2}s_n^{2}\big)\le \frac{n}{s_n^{4}}\big(8\|h\|_\infty^{2}+\delta_n\big)\frac{\mathbb{E}\big(Z+\delta_n\big)}{\varepsilon^{2}},$$
by Chebyshev's inequality. So
$$L_n\le \frac{\big(8\|h\|_\infty^{2}+\delta_n\big)\big(\delta_n+\mathbb{E}Z\big)}{n\,\mathrm{Var}\big(h(X)\big)^{2}\,\varepsilon^{2}}\big(1+o(1)\big)\to 0,$$
since $s_n^{4}\sim n^{2}\,\mathrm{Var}\big(h(X)\big)^{2}$ as $n\to+\infty$. This proves (L2).

Now that Conditions (L1) and (L2) have been checked, we have

$$\frac{T_n}{s_n}\rightsquigarrow N(0,1) \quad \text{as } n\to+\infty.$$

But we have

$$\frac{T_n}{s_n}=\frac{1}{s_n}\,n\Big(\frac{1}{n}\sum_{i=1}^{n}\big(Z_{i,n}-\mu_{j_n}\big)\Big)=\frac{\sqrt{n}}{s_n}\,\sqrt{n}\Big(\frac{1}{n}\sum_{i=1}^{n}K_{j_n}h(X_i)-\mathbb{E}K_{j_n}h(X)\Big)=\frac{\sqrt{n}}{s_n}\,\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h).$$

Using (7), we get

$$\frac{T_n}{s_n}\sim \frac{\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)}{\mathrm{Var}\big(h(X)\big)^{1/2}}.$$

Finally, from (8), we obtain

$$\frac{\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)}{\mathrm{Var}\big(h(X)\big)^{1/2}}\rightsquigarrow N(0,1) \quad \text{as } n\to+\infty,$$
that is,
$$\sqrt{n}\,\big(\mathbb{P}_{n,X}-\mathbb{E}_X\big)(K_{j_n}h)\rightsquigarrow N\big(0,\mathrm{Var}(h(X))\big) \quad \text{as } n\to+\infty.$$

This ends the first point.

As to the second point, we apply Theorem 9.3 in [3] to get

$$\big|\mathbb{E}_X\big(K_{j_n}h(X)-h(X)\big)\big|\le \int_D \big|K_{j_n}h(x)-h(x)\big|\,f(x)\,dx\le \big\|K_{j_n}h-h\big\|_\infty\le \kappa_2\,C_3\,2^{-j_n t}.$$

Therefore, we have

$$\sqrt{n}\,\big|R_{1,n}\big|\le \kappa_2\,C_3\,\sqrt{n}\,2^{-j_n t}=\kappa_2\,C_3\,n^{(1-2t)/8}=o(1),$$
for any $1/2<t<T$.
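The bias term $R_{1,n}$ can also be visualized numerically. The sketch below is illustrative only and uses choices that are ours, not the paper's: a Haar (cell-average) projection kernel, the density $f(x)=2x$ on $(0,1)$, and $h(x)=x^2$. It evaluates $\mathbb{E}_X\big(K_jh(X)-h(X)\big)$ on a fine midpoint grid and shows that the bias vanishes as the resolution level $j$ grows.

```python
import numpy as np

def bias(j, N=2**16):
    # E_X[K_j h(X) - h(X)] = integral of (K_j h - h) f over (0,1), midpoint rule,
    # with f(x) = 2x, h(x) = x^2 and K_j the Haar (cell-average) projection.
    x = (np.arange(N) + 0.5) / N
    f = 2.0 * x
    h = x**2
    cells = np.floor(x * 2**j).astype(int)                 # dyadic cell index
    cell_mean = np.bincount(cells, weights=h) / np.bincount(cells)
    Kh = cell_mean[cells]                                  # K_j h on the grid
    return ((Kh - h) * f).mean()

b = [bias(j) for j in (2, 4, 6)]
```

A short hand computation gives $\mathbb{E}_X\big(K_jh(X)-h(X)\big)=-4^{-j}/6$ exactly in this toy case, so the three values shrink by a factor of $16$ at each step, a concrete analogue of the $2^{-j_n t}$ rate used above.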

The two other main results are related to the asymptotics of the class of $\phi$-divergence measures. The first concerns their almost-sure efficiency.

Theorem 1.2.

Under Assumptions 1-3, C-A, C-h, C1-$\phi$, C2-$\phi$, and (BD), all in [1], we have

$$\limsup_{n\to+\infty}\frac{\big|J(f_n,g)-J(f,g)\big|}{a_n}\le A_1 \quad \text{a.s.}, \qquad (10)$$
$$\limsup_{n\to+\infty}\frac{\big|J(f,g_n)-J(f,g)\big|}{b_n}\le A_2 \quad \text{a.s.}, \qquad (11)$$
$$\limsup_{(n,m)\to(+\infty,+\infty)}\frac{\big|J(f_n,g_m)-J(f,g)\big|}{c_{n,m}}\le A_1+A_2 \quad \text{a.s.}, \qquad (12)$$
where $a_n$, $b_n$ and $c_{n,m}$ are as in Formulas (16) in [1].

Proof.

In the proofs, we systematically use the mean value theorem. In the multivariate case, we prefer the Taylor-Lagrange-Cauchy formula as stated in [5], page 230. The assumptions have already been set up to meet these two tools. To keep the notation simple, we write

$$a_n=\big\|\Delta_n f\big\|_\infty \quad \text{and} \quad b_n=\big\|\Delta_n g\big\|_\infty, \quad \text{where } \Delta_n f=f_n-f \text{ and } \Delta_n g=g_n-g.$$

Recall that

$$\mathbb{G}_{n,X}^{(w)}(h)=\sqrt{n}\int_D \Delta_n f(x)\,h(x)\,dx \quad \text{and} \quad \mathbb{G}_{n,Y}^{(w)}(h)=\sqrt{n}\int_D \Delta_n g(x)\,h(x)\,dx.$$

We start by showing that (10) holds.

We have

$$\phi\big(f_n(x),g(x)\big)=\phi\big(f(x)+\Delta_n f(x),\,g(x)\big).$$

So, by applying the mean value theorem to the function $u_1(x)\mapsto \phi\big(u_1(x),g(x)\big)$, we have

$$\phi\big(f_n(x),g(x)\big)=\phi\big(f(x),g(x)\big)+\Delta_n f(x)\,\phi_1^{(1)}\big(f(x)+\theta_1(x)\Delta_n f(x),\,g(x)\big), \qquad (13)$$
where $\theta_1(x)$ is some number lying between $0$ and $1$. In the sequel, any $\theta_i$ satisfies $|\theta_i|<1$. By applying again the mean value theorem to the function $u_2(x)\mapsto \phi_1^{(1)}\big(u_2(x),g(x)\big)$, we have
$$\Delta_n f(x)\,\phi_1^{(1)}\big(f(x)+\theta_1(x)\Delta_n f(x),\,g(x)\big)=\Delta_n f(x)\,\phi_1^{(1)}\big(f(x),g(x)\big)+\theta_1(x)\,\Delta_n f(x)^{2}\,\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big),$$
where $\theta_2(x)$ is some number lying between $0$ and $1$, and $\phi_1^{(1)}$ and $\phi_1^{(2)}$ denote the first and second partial derivatives of $\phi$ with respect to its first coordinate. We can write (13) as
$$\phi\big(f_n(x),g(x)\big)=\phi\big(f(x),g(x)\big)+\Delta_n f(x)\,\phi_1^{(1)}\big(f(x),g(x)\big)+\theta_1(x)\,\Delta_n f(x)^{2}\,\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big).$$

Now we have

$$J(f_n,g)-J(f,g)=\int_D \Delta_n f(x)\,\phi_1^{(1)}\big(f(x),g(x)\big)\,dx+\int_D \theta_1(x)\,\Delta_n f(x)^{2}\,\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\,dx, \qquad (14)$$
and hence,
$$\big|J(f_n,g)-J(f,g)\big|\le a_n\int_D \big|\phi_1^{(1)}\big(f(x),g(x)\big)\big|\,dx+a_n^{2}\int_D \big|\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\big|\,dx.$$

Therefore,

$$\limsup_{n\to\infty}\frac{\big|J(f_n,g)-J(f,g)\big|}{a_n}\le A_1+\limsup_{n\to\infty}\ a_n\int_D \big|\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\big|\,dx.$$

Under the Boundedness Assumption (6) in [1], we know that $A_1<\infty$ and that Condition (19) in [1] is satisfied, that is,

$$\int_D \big|\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\big|\,dx\to \int_D \big|\phi_1^{(2)}\big(f(x),g(x)\big)\big|\,dx<\infty \quad \text{as } n\to\infty.$$

This proves (10).

Formula (11) is obtained in a similar way. We only need to adapt the result concerning the first coordinate to the second.

The proof of (12) comes from splitting $\int_D\big(\phi(f_n(x),g_m(x))-\phi(f(x),g(x))\big)\,dx$ into the following two terms:

$$\int_D \big(\phi(f_n(x),g_m(x))-\phi(f(x),g(x))\big)\,dx=\underbrace{\int_D \big(\phi(f_n(x),g_m(x))-\phi(f(x),g_m(x))\big)\,dx}_{I_{n,1}}+\underbrace{\int_D \big(\phi(f(x),g_m(x))-\phi(f(x),g(x))\big)\,dx}_{I_{n,2}}.$$

We already know how to handle $I_{n,2}$. As to $I_{n,1}$, we may still use the Taylor-Lagrange-Cauchy formula, since we have

$$\big(f_n(x),g_m(x)\big)-\big(f(x),g_m(x)\big)=\big(f_n(x)-f(x),\,0\big), \quad \text{whose norm is at most } a_n\to 0.$$

By the Taylor-Lagrange-Cauchy formula (see [5], page 230), we have

$$I_{n,1}=\int_D \Delta_n f(x)\,\phi_1^{(1)}\big(f(x)+\theta\,\Delta_n f(x),\,g_m(x)\big)\,dx\le a_n\int_D \big|\phi_1^{(1)}\big(f(x)+\theta\,\Delta_n f(x),\,g_m(x)\big)\big|\,dx=a_n\big(A_1+o(1)\big).$$

Combining these bounds leads to the result.
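The first-order bound just established can be checked numerically in a toy case. The sketch below is ours, not the paper's: it takes the Kullback-Leibler integrand $\phi(u,v)=u\log(u/v)$, for which $\phi_1^{(1)}(u,v)=\log(u/v)+1$, two simple densities on $(0,1)$, and perturbations of sup-norm $a_n$, and verifies that $|J(f_n,g)-J(f,g)|\le a_nA_1+O(a_n^2)$.

```python
import numpy as np

# densities discretized on a grid over (0, 1); plain Riemann sums suffice here
x = np.linspace(5e-4, 1.0 - 5e-4, 2000)
dx = x[1] - x[0]
f = np.ones_like(x)                      # uniform pdf
g = 0.5 + x                              # linear pdf, integrates to 1 on (0, 1)

def J(u, v):
    # phi-divergence with the Kullback-Leibler integrand phi(u, v) = u log(u/v)
    return (u * np.log(u / v)).sum() * dx

# A1 = integral of |phi_1^{(1)}(f, g)|, with phi_1^{(1)}(u, v) = log(u/v) + 1
A1 = np.abs(np.log(f / g) + 1.0).sum() * dx

checks = []
for a_n in (1e-1, 1e-2, 1e-3):
    delta = a_n * np.sin(10.0 * np.pi * x)          # perturbation, sup-norm a_n
    lhs = abs(J(f + delta, g) - J(f, g))
    checks.append(lhs <= a_n * A1 + 10.0 * a_n**2)  # first-order bound + slack
```

The second-order remainder is bounded here by $\tfrac12 a_n^2\int 1/(f-a_n)$, well inside the $O(a_n^2)$ slack, so all three checks pass, mirroring the $\limsup$ statement of Theorem 1.2.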

The second main result concerns the asymptotic normality of the $\phi$-divergence measures.

Theorem 1.3.

Under Assumptions 1-3, C-A, C-h, C1-$\phi$, C2-$\phi$, and (BD), all in [1], we have

$$\sqrt{n}\,\big(J(f_n,g)-J(f,g)\big)\rightsquigarrow N\big(0,\mathrm{Var}(h_1(X))\big) \quad \text{as } n\to+\infty, \qquad (15)$$
$$\sqrt{n}\,\big(J(f,g_n)-J(f,g)\big)\rightsquigarrow N\big(0,\mathrm{Var}(h_2(Y))\big) \quad \text{as } n\to+\infty, \qquad (16)$$
and, as $n\to+\infty$ and $m\to+\infty$,
$$\sqrt{\frac{nm}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}\;\big(J(f_n,g_m)-J(f,g)\big)\rightsquigarrow N(0,1). \qquad (17)$$

Proof.

We start by proving (15). Going back to (14), we have

$$\sqrt{n}\,\big(J(f_n,g)-J(f,g)\big)=\sqrt{n}\int_D \Delta_n f(x)\,\phi_1^{(1)}\big(f(x),g(x)\big)\,dx+\sqrt{n}\int_D \theta_1(x)\,\Delta_n f(x)^{2}\,\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\,dx=\mathbb{G}_{n,X}^{(w)}(h_1)+\sqrt{n}\,R_{2,n},$$
where $R_{2,n}=\int_D \theta_1(x)\,\Delta_n f(x)^{2}\,\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\,dx$.

Now, by Theorem 1.1, we know that $\mathbb{G}_{n,X}^{(w)}(h_1)\rightsquigarrow N\big(0,\mathrm{Var}(h_1(X))\big)$ as $n\to\infty$, provided that $h_1\in B_{\infty,\infty}^{t}$. Thus, (15) will be proved if we show that $\sqrt{n}\,R_{2,n}=o_{\mathbb{P}}(1)$. We have

$$\big|\sqrt{n}\,R_{2,n}\big|\le \sqrt{n}\,a_n^{2}\int_D \big|\phi_1^{(2)}\big(f(x)+\theta_2(x)\Delta_n f(x),\,g(x)\big)\big|\,dx. \qquad (18)$$

Let us show that $\sqrt{n}\,a_n^{2}=o_{\mathbb{P}}(1)$. By the Bienaymé-Chebyshev inequality, we have, for any $\epsilon>0$,

$$\mathbb{P}\big(\sqrt{n}\,a_n^{2}>\epsilon\big)=\mathbb{P}\big(a_n>\epsilon^{1/2}n^{-1/4}\big)\le \frac{n^{1/2}}{\epsilon}\,\mathbb{E}_X\big(a_n^{2}\big).$$

From Theorem 3 in [2], we have

$$\mathbb{E}_X\big(a_n^{2}\big)^{1/2}=O\Big(\sqrt{\frac{j_n 2^{j_n}}{n}}+2^{-t j_n}\Big)=O\Big(\Big(\frac{\log n}{4\log 2}\,n^{-3/4}\Big)^{1/2}+n^{-t/4}\Big),$$
where we used the fact that $2^{j_n}\sim n^{1/4}$. Thus,
$$\mathbb{P}\big(\sqrt{n}\,a_n^{2}>\epsilon\big)=O\Big(\frac{\log n}{4\log 2}\,n^{-1/2}+n^{(1-2t)/8}\Big).$$

Finally, $\sqrt{n}\,a_n^{2}=o_{\mathbb{P}}(1)$, since

$$\frac{\log n}{4\log 2}\,n^{-1/2}+n^{(1-2t)/8}\to 0 \quad \text{as } n\to+\infty,$$
for any $t>1/2$. Then, from (18) and using Condition (19) in [1], we get $\sqrt{n}\,R_{2,n}\to 0$ in probability as $n\to+\infty$.

This ends the proof of (15).

The result (16) is obtained by a symmetry argument, swapping the roles of $f$ and $g$.

Now, it remains to prove Formula (17) of the theorem. The bivariate Taylor-Lagrange-Cauchy formula gives

$$J(f_n,g_m)-J(f,g)=\int_D \Delta_n f(x)\,\phi_1^{(1)}\big(f(x),g(x)\big)\,dx+\int_D \Delta_m g(x)\,\phi_2^{(1)}\big(f(x),g(x)\big)\,dx+\frac{1}{2}\int_D \Big(\Delta_n f(x)^{2}\,\phi_1^{(2)}+\Delta_n f(x)\Delta_m g(x)\,\phi_{1,2}^{(2)}+\Delta_m g(x)^{2}\,\phi_2^{(2)}\Big)\big(u_n(x),v_m(x)\big)\,dx.$$

We have

$$\big(u_n(x),v_m(x)\big)=\big(f(x)+\theta\,\Delta_n f(x),\ g(x)+\theta\,\Delta_m g(x)\big).$$

Thus we get

$$J(f_n,g_m)-J(f,g)=\frac{1}{\sqrt{n}}\,\mathbb{G}_{n,X}^{(w)}(h_1)+\frac{1}{\sqrt{m}}\,\mathbb{G}_{m,Y}^{(w)}(h_2)+R_{n,m},$$
where $R_{n,m}$ is given by
$$R_{n,m}=\frac{1}{2}\int_D \Big(\Delta_n f(x)^{2}\,\phi_1^{(2)}+\Delta_n f(x)\Delta_m g(x)\,\phi_{1,2}^{(2)}+\Delta_m g(x)^{2}\,\phi_2^{(2)}\Big)\big(u_n(x),v_m(x)\big)\,dx.$$

But we have

$$\mathbb{G}_{n,X}^{(w)}(h_1)=N_n^{(1)}+o_{\mathbb{P}}(1), \qquad \mathbb{G}_{m,Y}^{(w)}(h_2)=N_m^{(2)}+o_{\mathbb{P}}(1),$$
where $N_n^{(1)}\sim N\big(0,\mathrm{Var}(h_1(X))\big)$, $N_m^{(2)}\sim N\big(0,\mathrm{Var}(h_2(Y))\big)$, and $N_n^{(1)}$ and $N_m^{(2)}$ are independent.

Using this independence, we have

$$\frac{1}{\sqrt{n}}\,\mathbb{G}_{n,X}^{(w)}(h_1)+\frac{1}{\sqrt{m}}\,\mathbb{G}_{m,Y}^{(w)}(h_2)=N\Big(0,\ \frac{\mathrm{Var}(h_1(X))}{n}+\frac{\mathrm{Var}(h_2(Y))}{m}\Big)+o_{\mathbb{P}}\Big(\frac{1}{\sqrt{n}}\Big)+o_{\mathbb{P}}\Big(\frac{1}{\sqrt{m}}\Big).$$

Therefore, we have

$$J(f_n,g_m)-J(f,g)=N\Big(0,\ \frac{\mathrm{Var}(h_1(X))}{n}+\frac{\mathrm{Var}(h_2(Y))}{m}\Big)+o_{\mathbb{P}}\Big(\frac{1}{\sqrt{n}}\Big)+o_{\mathbb{P}}\Big(\frac{1}{\sqrt{m}}\Big)+R_{n,m}.$$

Hence,

$$\Big(\frac{\mathrm{Var}(h_1(X))}{n}+\frac{\mathrm{Var}(h_2(Y))}{m}\Big)^{-1/2}\big(J(f_n,g_m)-J(f,g)\big)=N(0,1)+\frac{o_{\mathbb{P}}\big(n^{-1/2}\big)+o_{\mathbb{P}}\big(m^{-1/2}\big)+R_{n,m}}{\Big(\frac{\mathrm{Var}(h_1(X))}{n}+\frac{\mathrm{Var}(h_2(Y))}{m}\Big)^{1/2}}.$$

That leads to

$$\sqrt{\frac{nm}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}\;\big(J(f_n,g_m)-J(f,g)\big)=N(0,1)+o_{\mathbb{P}}(1)+\sqrt{\frac{nm}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}\;R_{n,m},$$
since $m/\big(m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))\big)$ and $n/\big(m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))\big)$ are bounded, and then
$$o_{\mathbb{P}}\Big(\frac{1}{\sqrt{n}}\Big)\Big(\frac{\mathrm{Var}(h_1(X))}{n}+\frac{\mathrm{Var}(h_2(Y))}{m}\Big)^{-1/2}=o_{\mathbb{P}}\Big(\sqrt{\frac{m}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}\Big)=o_{\mathbb{P}}(1)$$
and
$$o_{\mathbb{P}}\Big(\frac{1}{\sqrt{m}}\Big)\Big(\frac{\mathrm{Var}(h_1(X))}{n}+\frac{\mathrm{Var}(h_2(Y))}{m}\Big)^{-1/2}=o_{\mathbb{P}}\Big(\sqrt{\frac{n}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}\Big)=o_{\mathbb{P}}(1).$$

It remains to prove that $\sqrt{nm/\big(m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))\big)}\;R_{n,m}=o_{\mathbb{P}}(1)$. By the continuity assumptions on $\phi$ and on its partial derivatives, and by the uniform convergence of $\Delta_n f(x)$ and $\Delta_m g(x)$ to zero, we have

$$\sqrt{\frac{nm}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}\;\big|R_{n,m}\big|\le \frac{1}{2}\,\sqrt{n}\,a_n^{2}\Big(\int_D \big|\phi_1^{(2)}\big(f(x),g(x)\big)\big|\,dx+o(1)\Big)\sqrt{\frac{m}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}$$
$$+\ \frac{1}{2}\,\sqrt{m}\,b_m^{2}\Big(\int_D \big|\phi_2^{(2)}\big(f(x),g(x)\big)\big|\,dx+o(1)\Big)\sqrt{\frac{n}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}$$
$$+\ \frac{1}{2}\,\sqrt{n}\,a_n b_m\Big(\int_D \big|\phi_{1,2}^{(2)}\big(f(x),g(x)\big)\big|\,dx+o(1)\Big)\sqrt{\frac{m}{m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y))}}.$$

As previously, we have $\sqrt{n}\,a_n^{2}=o_{\mathbb{P}}(1)$, $\sqrt{m}\,b_m^{2}=o_{\mathbb{P}}(1)$, and $\sqrt{n}\,a_n b_m=o_{\mathbb{P}}(1)$.

From there, the conclusion is immediate.
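The two-sample normalization in (17) can be illustrated by simulation. The following sketch uses our own illustrative choices (uniform samples, $h_1(x)=x$, $h_2(y)=y^2$), not the paper's: the sum of two independent centered empirical means has variance $\mathrm{Var}(h_1(X))/n+\mathrm{Var}(h_2(Y))/m$, so multiplying it by $\sqrt{nm/(m\,\mathrm{Var}(h_1(X))+n\,\mathrm{Var}(h_2(Y)))}$ should produce an approximately standard normal statistic.

```python
import numpy as np

rng = np.random.default_rng(7)

n, m, reps = 1500, 800, 4000
V1 = 1.0 / 12.0        # Var h1(X) with h1(x) = x,   X ~ U(0, 1)
V2 = 4.0 / 45.0        # Var h2(Y) with h2(y) = y^2, Y ~ U(0, 1)
scale = np.sqrt(n * m / (m * V1 + n * V2))

stats = np.empty(reps)
for r in range(reps):
    X, Y = rng.random(n), rng.random(m)
    # centered empirical means play the roles of the two wavelet empirical
    # processes divided by sqrt(n) and sqrt(m)
    S = (X.mean() - 0.5) + ((Y**2).mean() - 1.0 / 3.0)
    stats[r] = scale * S
```

The variance algebra is the whole point: `scale**2 * (V1/n + V2/m)` equals $1$ exactly, so `stats` has mean near $0$ and standard deviation near $1$.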

We close the series with a section on the applicability of our results to usual pdfs.

2. APPLICABILITY OF THE RESULTS FOR USUAL PROBABILITY LAWS

Here, we address the applicability of our results to usual distribution functions. We have seen that we need to avoid infinite and null values. For example, in the integrals defining the Rényi and Tsallis families, we may encounter such problems, as pointed out in the first pages of paper [1]. To avoid them, we already suggested using a modification of the considered divergence measure, in the following way.

First of all, it does not make sense to compare two distributions with different supports. Comparing a pdf with support $\mathbb{R}$, like the Gaussian one, with another with support $(0,1)$, like the standard uniform one, is meaningless. So, we suppose that the pdfs we are comparing have the same support $D$.

Next, for each $\varepsilon>0$, we find a domain $D_\varepsilon$ included in the common support $D$ of $f$ and $g$ such that

$$\int_{D_\varepsilon} f(x)\,dx\ge 1-\varepsilon \quad \text{and} \quad \int_{D_\varepsilon} g(x)\,dx\ge 1-\varepsilon,$$

and there exist two finite numbers $\kappa_1>0$ and $\kappa_2>0$ such that

$$\kappa_1\le f\mathbb{1}_{D_\varepsilon},\ g\mathbb{1}_{D_\varepsilon}\le \kappa_2 \quad \text{on } D_\varepsilon. \qquad (20)$$

Besides, we choose the $D_\varepsilon$'s increasing to $D$ as $\varepsilon$ decreases to zero. We define the modified divergence measure

$$\mathcal{D}_\varepsilon(f,g)=\mathcal{D}\big(f_\varepsilon,g_\varepsilon\big),$$
where
$$f_\varepsilon=D_1^{-1}\,f\mathbb{1}_{D_\varepsilon}, \qquad g_\varepsilon=D_2^{-1}\,g\mathbb{1}_{D_\varepsilon},$$
with $D_1=\int_{D_\varepsilon} f(x)\,dx$ and $D_2=\int_{D_\varepsilon} g(x)\,dx$.

Based on the remarks that the $D_\varepsilon$'s increase to $D$ as $\varepsilon$ decreases to zero and that the equality of $f$ and $g$ implies that of $f_\varepsilon$ and $g_\varepsilon$, we recommend replacing the exact test of $f=g$ by the approximate test $f_\varepsilon=g_\varepsilon$, for $\varepsilon$ as small as possible.
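As a numerical illustration of this modification, the Python sketch below (our own toy example, not taken from the paper) computes the modified Kullback-Leibler divergence $\mathcal{D}_\varepsilon(f,g)$ for two Gaussian densities $N(0,1)$ and $N(0.5,1)$: the common domain $D_\varepsilon$ is a symmetric interval keeping at least $1-\varepsilon$ of the mass of each pdf, the restrictions are renormalized, and the result approaches the exact value $\mathrm{KL}(f,g)=0.125$ as $\varepsilon$ decreases.

```python
import numpy as np
from statistics import NormalDist

def eps_truncated_kl(mu1, mu2, sigma, eps, grid=20001):
    # D_eps(f, g): KL divergence between the renormalized restrictions
    # f_eps, g_eps of N(mu1, sigma^2) and N(mu2, sigma^2) to a common
    # interval D_eps keeping mass >= 1 - eps of each density.
    F, G = NormalDist(mu1, sigma), NormalDist(mu2, sigma)
    lo = min(F.inv_cdf(eps / 2.0), G.inv_cdf(eps / 2.0))
    hi = max(F.inv_cdf(1.0 - eps / 2.0), G.inv_cdf(1.0 - eps / 2.0))
    x = np.linspace(lo, hi, grid)
    dx = x[1] - x[0]
    f = np.exp(-((x - mu1) ** 2) / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    g = np.exp(-((x - mu2) ** 2) / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    D1, D2 = f.sum() * dx, g.sum() * dx      # masses kept on D_eps
    fe, ge = f / D1, g / D2                  # renormalized restrictions
    return (fe * np.log(fe / ge)).sum() * dx

# exact KL between N(0, 1) and N(0.5, 1) is 0.5^2 / 2 = 0.125
vals = [eps_truncated_kl(0.0, 0.5, 1.0, e) for e in (0.1, 0.01, 0.001)]
```

On $D_\varepsilon$ the two truncated Gaussians are bounded away from $0$ and $\infty$, so the constants $\kappa_1,\kappa_2$ of (20) exist, and the approximate test $f_\varepsilon=g_\varepsilon$ converges to the exact one as $\varepsilon$ decreases to zero.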

So each application should begin with a quick look at the common domain $D$ of the two pdfs and the choice of an appropriate subdomain $D_\varepsilon$ on which the tests are applied.

Assumption (20) also ensures that the pdfs $f_\varepsilon$ and $g_\varepsilon$ lie in $B_{\infty,\infty}^{t}$ for almost all the usual laws. Actually, according to [3], page 104, we have $f\in B_{\infty,\infty}^{t}$, for some $t>0$, if and only if

$$\sup_{x}\big|f(x)\big|+\sup_{x}\sup_{h\neq 0}\frac{\big|f^{([t])}(x+h)-2f^{([t])}(x)+f^{([t])}(x-h)\big|}{|h|^{\,t-[t]}}<\infty,$$
where $[t]$ stands for the integer part of the real number $t$, that is, the greatest integer less than or equal to $t$, and $f^{(p)}$ denotes the $p$-th derivative of $f$.

Whenever the functions $f_\varepsilon$ and $g_\varepsilon$ have bounded and non-vanishing $([t]+1)$-th derivatives on $D_\varepsilon$, they belong to $B_{\infty,\infty}^{t}$. Assumption (20) has been set on purpose for this. Once this holds, all the functions that are required to lie in $B_{\infty,\infty}^{t}$ for the validity of the results effectively are in that space. All the examples we use in this section satisfy these conditions, including, to cite a few, the Gaussian, Gamma, and Hyperbolic laws.

3. CONCLUSION

In this last paper of the series, the main results have been proved. Wavelet theory has proved to be a good framework for building estimates of divergence measures. We believe that having the exact values of the scaling function would give better results in our work.

ACKNOWLEDGMENTS

The three authors acknowledge support from the World Bank Excellence Center (CEA-MITIC), which has continuously funded their research activities since 2014.

ISSN (Online)
2214-1766
ISSN (Print)
1538-7887
