Journal of Statistical Theory and Applications

Volume 19, Issue 2, June 2020, Pages 332 - 341

Behavior of OC Curve of Generalized Exponentiated Data

Authors
Anwar Hassan1, *, Mehraj Ahmad2, Najmus Saquib Hassan3
1Department of Statistics, University of Kashmir, Jammu and Kashmir, India
2Department of Economics and Statistics, Jammu and Kashmir, India
3Agriculture Department of Plant Sciences Shambu Campus, Wollega University Ethiopia
*Corresponding author. Email: anwar.hassan2007@gmail.com
Received 9 March 2018, Accepted 17 April 2020, Available Online 21 July 2020.
DOI
10.2991/jsta.d.200714.001
Keywords
Operating characteristics curve; Generalized exponential distribution; Left censored
Abstract

In this paper a generalized exponential distribution is considered for analyzing left-censored lifetime data. Such censoring mechanisms apply when the observations become available in an ordered manner and, in some cases, both the origin and the event occur prior to the start of follow-up. A test procedure is developed that approximates a prescribed operating characteristics (OC) curve. We also carry out hypothesis testing and seek values of r and C such that the OC curve satisfies $L(\alpha_1)=\Pr(\text{accept } \alpha=\alpha_1 \text{ when } \alpha_1 \text{ is the true value})=1-\gamma$ and $L(\alpha_2)=\Pr(\text{accept } \alpha=\alpha_1 \text{ when } \alpha_2 \text{ is the true value})\le\beta$. Simulation is used to show which value of r is suitable for different values of γ and β.

Copyright
© 2020 The Authors. Published by Atlantis Press B.V.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

The two-parameter generalized exponential (GE) distribution was originally introduced by Gupta and Kundu [1] as a skewed distribution and as an alternative to the Weibull, gamma, or log-normal distributions. Because it has both a shape and a scale parameter, the GE distribution can take different shapes and can be used quite effectively to analyze skewed data. Extensive work has been done on the GE distribution by several authors; see, e.g., Gupta and Kundu [2–4], Raqab [5], Raqab and Ahsanullah [6], and Zheng [7]. Mitra and Kundu [8] studied the maximum likelihood estimators of the unknown parameters of the GE distribution for left-censored data.

The GE distribution, which represents time to failure more accurately, is used here instead of the more commonly used exponential distribution. Although incorporating the GE distribution in life-testing models adds to the complexity of modeling and estimation, its flexibility lets it fit life data more accurately than the exponential distribution.

The two-parameter GE distribution has the following density function:

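$$f(x;\alpha,\lambda)=\alpha\lambda\left(1-e^{-\lambda x}\right)^{\alpha-1}e^{-\lambda x},\quad x>0,\ \alpha>0,\ \lambda>0 \tag{1}$$

Cumulative distribution function (cdf)

$$F(x;\alpha,\lambda)=\left(1-e^{-\lambda x}\right)^{\alpha},\quad x>0,\ \alpha>0,\ \lambda>0 \tag{2}$$

Reliability function

$$R(x;\alpha,\lambda)=1-\left(1-e^{-\lambda x}\right)^{\alpha},\quad x>0,\ \alpha>0,\ \lambda>0 \tag{3}$$

Hazard function

$$h(x;\alpha,\lambda)=\frac{\alpha\lambda\left(1-e^{-\lambda x}\right)^{\alpha-1}e^{-\lambda x}}{1-\left(1-e^{-\lambda x}\right)^{\alpha}},\quad x>0,\ \alpha>0,\ \lambda>0 \tag{4}$$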

Here $\alpha>0$ and $\lambda>0$ are the shape and scale parameters, respectively. For different values of the shape parameter, the density function can take different shapes. From now on the GE distribution with shape parameter α and scale parameter λ will be denoted by GE(α, λ).
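As a minimal sketch of the density, cdf, reliability, and hazard functions of the GE(α, λ) distribution defined above, the four functions can be coded directly in Python. The function names and the inverse-cdf helper `ge_quantile` are illustrative assumptions and do not come from the paper.

```python
import math

def ge_pdf(x, alpha, lam):
    """Density of GE(alpha, lam), Equation (1)."""
    u = 1.0 - math.exp(-lam * x)
    return alpha * lam * u ** (alpha - 1.0) * math.exp(-lam * x)

def ge_cdf(x, alpha, lam):
    """Cumulative distribution function, Equation (2)."""
    return (1.0 - math.exp(-lam * x)) ** alpha

def ge_reliability(x, alpha, lam):
    """Reliability (survival) function, Equation (3)."""
    return 1.0 - ge_cdf(x, alpha, lam)

def ge_hazard(x, alpha, lam):
    """Hazard function, Equation (4): density divided by reliability."""
    return ge_pdf(x, alpha, lam) / ge_reliability(x, alpha, lam)

def ge_quantile(u, alpha, lam):
    """Hypothetical helper: inverse of Equation (2), for 0 < u < 1."""
    return -math.log(1.0 - u ** (1.0 / alpha)) / lam

if __name__ == "__main__":
    # With alpha = 1 the GE density reduces to the exponential density.
    print(ge_pdf(1.0, 1.0, 2.0), 2.0 * math.exp(-2.0))
    print(ge_hazard(1.0, 2.5, 0.7))
```

The inverse cdf is convenient for simulation: feeding it uniform random numbers yields GE(α, λ) variates.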

Left censoring is widely used in survival analysis and reliability theory: a subject is left censored if it is known that the event of interest occurred some time before the recorded follow-up period. For example, consider a study investigating factors that influence days to first oestrus in dairy cattle. Observation of the population starts (for argument's sake) at 40 days after calving, but several cows in the group have already had an oestrus event. These cows are said to be left censored at day 40. In medical studies, patients are subject to regular examinations. Discovery of a condition only tells us that the onset of sickness fell in the period since the previous examination and nothing about the exact date of the attack. Thus the time elapsed since onset is left censored. Similarly, we have to handle left-censored data when estimating functions of exact policy duration without knowing the exact date of policy entry, or when estimating functions of exact age without knowing the exact date of birth. A study on the “Patterns of Health Insurance Coverage among Rural and Urban Children” (Coburn et al. [9]) faces this problem because of the higher proportion of rural children whose spells were “left censored” in the sample (i.e., children who entered the sample uninsured) and who remained uninsured throughout the sample. Another study (Danzon et al. [10]) used data on over 900 firms for the period 1988–2000 to estimate the effect on phase-specific (phases 1, 2, and 3) biotech and pharmaceutical R&D success rates of a firm's overall experience, its experience in the relevant therapeutic category, the diversification of its experience across categories, the industry's experience in the category, and alliances with large and small firms; its data also suffered from left censoring. This occurred, e.g., when a phase 2 trial was initiated for a particular indication for which there was no information on the phase 1 trial.

Applications can also be found in econometric models, e.g., for the joint determination of wages and turnover. Here, after the derivation of the corresponding likelihood function, an appropriate dataset is used for estimation. For a model designed for a comprehensive matched employer–employee panel dataset with fairly detailed information on wages, tenure, experience, and a range of other covariates, the raw dataset may contain both completed and uncompleted job spells. A job duration might be incomplete because the beginning of the job spell is not observed, which is an instance of left censoring (Bagger [11]). For further examples, one may refer to Balakrishnan and Varadan [12], Lee et al. [13], etc.

Put in general terms, suppose the variable of interest is measured on n subjects drawn at random from some GE population. The data become available in such a way that the smallest observation comes first, the second smallest second, and so on, with the largest observation last. The first (n − r) subjects are left censored at $X_1$, so that only the remaining r ordered observations $X_1, X_2, \ldots, X_r$ are recorded.

The main aim of this paper is to establish the Cramér–Rao lower bound and the efficiency of the estimates with respect to it. We also derive the minimum-variance biased estimate, develop tests of hypotheses, and study the behavior of the operating characteristics (OC) curve for the GE distribution. From the point of view of acceptance testing, the OC curve based on r out of n ordered observations for left-censored data (acceptance region of the form $\hat{\alpha}_{r,n}>C_1$ or $\hat{\alpha}_{r,n}<C_2$) is identical with that based on all r out of r observations; details are given in the subsequent sections of the paper.

2. MAXIMUM LIKELIHOOD ESTIMATION

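In this section, the maximum likelihood estimator of the shape parameter of GE(α, λ) is derived in the presence of left-censored observations. Let $X_1, X_2, \ldots, X_r$ be the last r order statistics from a random sample of size n following the GE(α, λ) distribution. The joint probability density function of these order statistics is given by

$$f(x_1,x_2,\ldots,x_r;\alpha,\lambda)=\frac{n!}{r!}\left[F(x_1)\right]^{n-r}f(x_1)\cdots f(x_r)=\frac{n!}{r!}\left(1-e^{-\lambda x_1}\right)^{\alpha(n-r)}\alpha^{r}\lambda^{r}e^{-\lambda\sum_{i=1}^{r}x_i}\prod_{i=1}^{r}\left(1-e^{-\lambda x_i}\right)^{\alpha-1}$$

The log likelihood function of the observed sample is

$$L(\alpha,\lambda)=\ln C+r\ln\alpha+r\ln\lambda-\lambda\sum_{i=1}^{r}x_i+(\alpha-1)\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)+\alpha(n-r)\ln\left(1-e^{-\lambda x_1}\right),\quad\text{where } C=\frac{n!}{r!}$$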

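The maximum likelihood estimator (MLE) of α for known λ, say $\hat{\alpha}_{r,n}$, is

$$\hat{\alpha}_{r,n}=\hat{\alpha}=\frac{-r}{\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)+(n-r)\ln\left(1-e^{-\lambda x_1}\right)}=\frac{r}{\sum_{i=1}^{r}T_i+(n-r)T_1} \tag{5}$$
where $T_i=\ln\left(1-e^{-\lambda x_i}\right)^{-1}$ and $T_1=\ln\left(1-e^{-\lambda x_1}\right)^{-1}$.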

The MLE $\hat{\alpha}_{r,n}$ is biased; however, an unbiased estimate can be constructed as

$$\tilde{\alpha}_{r,n}=\tilde{\alpha}=\frac{r-1}{r}\,\hat{\alpha}_{r,n}=\frac{r-1}{\sum_{i=1}^{r}T_i+(n-r)T_1} \tag{6}$$
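The two estimates in Equations (5) and (6) can be computed with a few lines of Python. This is a sketch under the assumption that the r recorded observations are passed in as a list and that λ is known; the function name `ge_left_censored_estimates` is hypothetical.

```python
import math

def ge_left_censored_estimates(xs, n, lam):
    """Return (alpha_hat, alpha_tilde) from the r largest of n observations.

    xs  : the r recorded (uncensored) observations
    n   : total sample size, so n - len(xs) observations are left censored
    lam : known scale parameter
    """
    xs = sorted(xs)
    r = len(xs)
    # T_i = -ln(1 - exp(-lam * x_i)); math.expm1 keeps precision for small x.
    t = [-math.log(-math.expm1(-lam * x)) for x in xs]
    # t[0] belongs to x_1, the smallest recorded observation.
    s = sum(t) + (n - r) * t[0]
    alpha_hat = r / s            # Equation (5)
    alpha_tilde = (r - 1) / s    # Equation (6)
    return alpha_hat, alpha_tilde

if __name__ == "__main__":
    # Toy data: two recorded values out of n = 4, lam = 1.
    print(ge_left_censored_estimates([math.log(2.0), math.log(2.0)], 4, 1.0))
```

In the toy call each $T_i$ equals ln 2, so the denominator is 4 ln 2 and the estimates are 1/(2 ln 2) and 1/(4 ln 2).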

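Result 1: If the $X_i$ are independent and identically distributed GE(α, λ) random variables with λ known, then $T_i=\ln\left(1-e^{-\lambda x_i}\right)^{-1}$ follows the exponential distribution Exp(α).

Result 2: $\hat{\alpha}$, the MLE of α, has the inverted gamma distribution [14] given below:

$$f(\hat{\alpha}\mid\alpha)=\frac{1}{\alpha\,\Gamma(r+1)}\,e^{-r\alpha/\hat{\alpha}}\left(\frac{r\alpha}{\hat{\alpha}}\right)^{r+1};\quad \hat{\alpha}>0,\ r,\alpha>0 \tag{7}$$
$$E(\hat{\alpha})=\frac{r\alpha}{r-1}\quad\text{and}\quad V(\hat{\alpha})=\frac{r^{2}\alpha^{2}}{(r-1)^{2}(r-2)}$$

Result 3: $\tilde{\alpha}$, the unbiased estimate of α, has the inverted gamma distribution [14] given below:

$$f(\tilde{\alpha}\mid\alpha)=\frac{1}{\alpha(r-1)\,\Gamma(r)}\,e^{-(r-1)\alpha/\tilde{\alpha}}\left(\frac{(r-1)\alpha}{\tilde{\alpha}}\right)^{r+1};\quad \tilde{\alpha}>0,\ r>0 \tag{8}$$
$$E(\tilde{\alpha})=\alpha\quad\text{and}\quad V(\tilde{\alpha})=\frac{\alpha^{2}}{r-2}$$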

The density (7) depends only on r and not on n; it is in fact identical with the density of $\hat{\alpha}_{r,r}$, i.e., the MLE of α when r out of r observations are tested.
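The claim that the distribution of the MLE depends on r but not on n can be probed by simulation. The following is a rough Monte Carlo sketch, not the authors' procedure: it draws GE samples by inverting the cdf, keeps the r largest values, and compares the average of $\hat{\alpha}$ with the mean rα/(r − 1) stated in Result 2. The function names, the seed, and the parameter values are assumptions chosen for illustration.

```python
import math
import random

def draw_ge(rng, alpha, lam):
    """One GE(alpha, lam) variate by inverting the cdf (1 - e^{-lam x})^alpha."""
    while True:
        v = rng.random() ** (1.0 / alpha)
        if 0.0 < v < 1.0:               # avoid log(0) at the endpoints
            return -math.log1p(-v) / lam

def mle_from_sample(sample, r, lam):
    """MLE of alpha from the r largest values of the sample (Equation (5))."""
    n = len(sample)
    kept = sorted(sample)[n - r:]
    t = [-math.log(-math.expm1(-lam * x)) for x in kept]
    return r / (sum(t) + (n - r) * t[0])

def mean_mle(alpha, lam, n, r, reps, seed=0):
    """Average of the MLE over `reps` simulated left-censored samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [draw_ge(rng, alpha, lam) for _ in range(n)]
        total += mle_from_sample(sample, r, lam)
    return total / reps

if __name__ == "__main__":
    alpha, lam, r = 2.0, 1.0, 6
    print("theoretical mean:", r * alpha / (r - 1))
    for n in (6, 12):
        print("n =", n, "simulated mean:", mean_mle(alpha, lam, n, r, 10000))
```

Both simulated means should land near rα/(r − 1) = 2.4, whether all observations are recorded (n = r) or half of them are left censored.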

3. CRAMER RAO LOWER BOUND (CRLB)

In this section the Cramér–Rao lower bound is obtained and used to assess the efficiency of the MLE and of the unbiased estimate of the parameter α. An estimate of α that is biased but has minimum variance is also suggested.

The log likelihood function of the observed sample is

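$$L(\alpha,\lambda)=\ln C+r\ln\alpha+r\ln\lambda-\lambda\sum_{i=1}^{r}x_i+(\alpha-1)\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)+\alpha(n-r)\ln\left(1-e^{-\lambda x_1}\right),\quad\text{where } C=\frac{n!}{r!}$$
$$L(\alpha,\lambda)=\ln C+r\ln\alpha+r\ln\lambda-\lambda\sum_{i=1}^{r}x_i+(\alpha-1)\left[\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)+(n-r)\ln\left(1-e^{-\lambda x_1}\right)\right]+(n-r)\ln\left(1-e^{-\lambda x_1}\right) \tag{9}$$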

Using Equation (5) or (6) in Equation (9) we have

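$$L(\alpha,\lambda)=\ln C+r\ln\alpha+r\ln\lambda-\lambda\sum_{i=1}^{r}x_i-(\alpha-1)\frac{r}{\hat{\alpha}}+(n-r)\ln\left(1-e^{-\lambda x_1}\right)$$
$$L(\alpha)=C(\lambda,x)+r\ln\alpha-(\alpha-1)\frac{r}{\hat{\alpha}}$$
$$\frac{\partial L(\alpha)}{\partial\alpha}=\frac{r}{\alpha}-\frac{r}{\hat{\alpha}}$$

Therefore

$$E\left(\frac{\partial L(\alpha)}{\partial\alpha}\right)^{2}=E\left(\frac{r}{\alpha}-\frac{r}{\hat{\alpha}}\right)^{2}=r^{2}E\left(\frac{1}{\alpha}-\frac{1}{\hat{\alpha}}\right)^{2}=r^{2}E\left(\frac{1}{\alpha^{2}}+\frac{1}{\hat{\alpha}^{2}}-\frac{2}{\alpha\hat{\alpha}}\right)$$

$$E\left(\frac{\partial L(\alpha)}{\partial\alpha}\right)^{2}=r^{2}\left(\frac{1}{\alpha^{2}}+\frac{2}{\alpha^{2}}-\frac{2}{\alpha\cdot\alpha}\right)\;\Rightarrow\;E\left(\frac{\partial L(\alpha)}{\partial\alpha}\right)^{2}=\frac{r^{2}}{\alpha^{2}}$$

Therefore the Cramér–Rao lower bound is $\dfrac{\alpha^{2}}{r^{2}}$.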

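Efficiency of the estimate $\hat{\alpha}$ with respect to the MVUE:

$$\text{Efficiency}(\hat{\alpha})=\frac{\alpha^{2}}{r^{2}}\cdot\frac{(r-1)^{2}(r-2)}{r^{2}\alpha^{2}}=\left(1-\frac{1}{r}\right)^{2}\frac{r-2}{r^{2}}$$

Efficiency of the estimate $\tilde{\alpha}$ with respect to the MVUE:

$$\text{Efficiency}(\tilde{\alpha})=\frac{\alpha^{2}}{r^{2}}\cdot\frac{r-2}{\alpha^{2}}=\frac{r-2}{r^{2}}$$

Clearly the efficiency of $\hat{\alpha}$ is less than that of $\tilde{\alpha}$.

The minimum-variance biased estimate of α is $\hat{\alpha}_B=\dfrac{(r-1)\sqrt{r-2}}{r^{2}}\,\hat{\alpha}$; its variance coincides with the Cramér–Rao lower bound (CRLB).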

4. DERIVATION OF TEST BASED ON THE r OUT OF n ORDERED FOLLOW-UP OBSERVATIONS DRAWN FROM GED

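In this section we develop a best test based on the r ordered observations (from a sample of size n) so as to decide between two values of α, namely $\alpha_1$ and $\alpha_2$, i.e., $H_0:\alpha=\alpha_1$ against $H_1:\alpha=\alpha_2$. Case I: $\alpha_1>\alpha_2$; Case II: $\alpha_1<\alpha_2$.

By the best test we mean, in the usual Neyman–Pearson terminology, a test with the property that, among all tests having a fixed probability γ (the size) of rejecting $\alpha=\alpha_1$ when it is true, it has the largest possible chance of rejecting $\alpha=\alpha_1$ when the alternative $\alpha=\alpha_2$ is true.

To derive the best test we use the Neyman–Pearson (NP) lemma, according to which the rejection region of a best test is found from the inequality

$$\frac{L(x_1,x_2,\ldots,x_r;\alpha_2)}{L(x_1,x_2,\ldots,x_r;\alpha_1)}>k$$
$$\frac{\dfrac{n!}{r!}\,\alpha_2^{r}\lambda^{r}\prod_{i=1}^{r}\left(1-e^{-\lambda x_i}\right)^{\alpha_2-1}e^{-\lambda\sum_{i=1}^{r}x_i}\left(1-e^{-\lambda x_1}\right)^{\alpha_2(n-r)}}{\dfrac{n!}{r!}\,\alpha_1^{r}\lambda^{r}\prod_{i=1}^{r}\left(1-e^{-\lambda x_i}\right)^{\alpha_1-1}e^{-\lambda\sum_{i=1}^{r}x_i}\left(1-e^{-\lambda x_1}\right)^{\alpha_1(n-r)}}>k$$
$$(\alpha_2-\alpha_1)\left[\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)+(n-r)\ln\left(1-e^{-\lambda x_1}\right)\right]>\ln k_1$$
where $k_1=k\left(\dfrac{\alpha_1}{\alpha_2}\right)^{r}$.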

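Case I: when $\alpha_1>\alpha_2$

Because $\alpha_1$ and $\alpha_2$ are preassigned constants such that $\alpha_1>\alpha_2$, using Equation (5) we have

$$(\alpha_2-\alpha_1)\left(-\frac{r}{\hat{\alpha}}\right)>\ln k_1\;\Rightarrow\;\frac{r(\alpha_1-\alpha_2)}{\hat{\alpha}}>\ln k_1$$

It follows at once that the rejection (critical) region for $\alpha=\alpha_1$ is

$$\hat{\alpha}<\frac{r(\alpha_1-\alpha_2)}{\ln k_1}\;\Rightarrow\;\hat{\alpha}<C_1,\quad\text{where } C_1=\frac{r(\alpha_1-\alpha_2)}{\ln k_1}$$
i.e., the best critical region is
$$W_1=\{X:\hat{\alpha}<C_1\} \tag{10}$$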

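Case II: when $\alpha_1<\alpha_2$

$$(\alpha_2-\alpha_1)\left(-\frac{r}{\hat{\alpha}}\right)>\ln k_1\;\Rightarrow\;\frac{r(\alpha_2-\alpha_1)}{\hat{\alpha}}<\ln k_2,\quad\text{where } k_2=\frac{1}{k}\left(\frac{\alpha_2}{\alpha_1}\right)^{r}=\frac{1}{k_1}$$

Since $\alpha_1$ and $\alpha_2$ are preassigned constants such that $\alpha_2>\alpha_1$, it follows at once that the rejection (critical) region for $\alpha=\alpha_1$ is

$$\hat{\alpha}>\frac{r(\alpha_2-\alpha_1)}{\ln k_2}\;\Rightarrow\;\hat{\alpha}>C_2,\quad\text{where } C_2=\frac{r(\alpha_2-\alpha_1)}{\ln k_2}$$
i.e., the best critical region is
$$W_2=\{X:\hat{\alpha}>C_2\} \tag{11}$$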

5. DETERMINATION OF CONSTANTS C1 AND C2

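The constants $C_1$ and $C_2$ are chosen so that the probability of each of the relations (10) and (11) equals γ when the null hypothesis $H_0$ is true; in other words, so that the probability of rejecting $\alpha=\alpha_1$ when it is the true value equals γ. We therefore choose $C_1$ and $C_2$ as follows.

Case I when $\alpha_1>\alpha_2$:

$$P(\hat{\alpha}<C_1\mid\alpha=\alpha_1)=\gamma\;\Rightarrow\;P\left(\frac{1}{\hat{\alpha}}>\frac{1}{C_1}\,\Big|\,H_0\right)=\gamma$$

To find $C_1$ explicitly we use the result that $\hat{\alpha}$ has (7) as its probability density function, from which it is easily verified that $\dfrac{r\alpha}{\hat{\alpha}}\sim\text{Gamma}(r)$.

Thus $\dfrac{2r\alpha}{\hat{\alpha}}\sim\text{Gamma}\left(\tfrac{1}{2},r\right)$, i.e., it is a random variable distributed as chi-square with 2r degrees of freedom, $\chi^{2}(2r)$. The above probability relation can therefore be written as

$$P\left(\frac{2r\alpha}{\hat{\alpha}}>\frac{2r\alpha_1}{C_1}\,\Big|\,\alpha=\alpha_1\right)=\gamma\;\Rightarrow\;P\left(\chi^{2}(2r)>\frac{2r\alpha_1}{C_1}\,\Big|\,\alpha=\alpha_1\right)=\gamma \tag{12}$$

Let us denote a chi-square variable with n degrees of freedom by $\chi^{2}(n)$ and define the constant $\chi^{2}_{\gamma}(n)$ by the equality $P\left(\chi^{2}(n)>\chi^{2}_{\gamma}(n)\right)=\gamma$, so that $\chi^{2}_{\gamma}(n)$ is the upper 100γ per cent point.

Then (12) can be written as $\dfrac{2r\alpha_1}{C_1}=\chi^{2}_{\gamma}(2r)$, which gives $C_1=\dfrac{2r\alpha_1}{\chi^{2}_{\gamma}(2r)}$.

Hence (12) will be satisfied if the rejection region for $\alpha=\alpha_1$ is given by

$$W_1=\left\{X:\hat{\alpha}<\frac{2r\alpha_1}{\chi^{2}_{\gamma}(2r)}\right\} \tag{13}$$

It is convenient in what follows to use acceptance rather than rejection regions. Consequently, NP theory tells us that a test of $\alpha=\alpha_1$ against $\alpha<\alpha_1$ with type-I error equal to γ is given by an acceptance region of the form

$$A_1=\left\{X:\hat{\alpha}>\frac{2r\alpha_1}{\chi^{2}_{\gamma}(2r)}\right\} \tag{14}$$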

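Case II when $\alpha_1<\alpha_2$: The constant $C_2$ is obtained from $P(\hat{\alpha}>C_2\mid\alpha=\alpha_1)=\gamma$:

$$P\left(\frac{1}{\hat{\alpha}}<\frac{1}{C_2}\,\Big|\,H_0\right)=\gamma\;\Rightarrow\;P\left(\frac{2r\alpha}{\hat{\alpha}}<\frac{2r\alpha_1}{C_2}\,\Big|\,\alpha=\alpha_1\right)=\gamma$$
$$P\left(\chi^{2}(2r)>\frac{2r\alpha_1}{C_2}\,\Big|\,\alpha=\alpha_1\right)=1-\gamma$$

By analogy with Case I, the above probability equation yields $\dfrac{2r\alpha_1}{C_2}=\chi^{2}_{1-\gamma}(2r)$, which gives $C_2=\dfrac{2r\alpha_1}{\chi^{2}_{1-\gamma}(2r)}$.

The best critical region can be written in the form given below:

$$W_2=\left\{X:\hat{\alpha}>\frac{2r\alpha_1}{\chi^{2}_{1-\gamma}(2r)}\right\} \tag{15}$$

It is convenient in what follows to use acceptance rather than rejection regions. Consequently, NP theory tells us that a test of $\alpha=\alpha_1$ against $\alpha>\alpha_1$ with type-I error equal to γ is given by an acceptance region of the form

$$A_2=\left\{X:\hat{\alpha}<\frac{2r\alpha_1}{\chi^{2}_{1-\gamma}(2r)}\right\} \tag{16}$$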
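The constants $C_1$ and $C_2$ only require upper percentage points of a chi-square variable with 2r degrees of freedom. Because 2r is even, the chi-square tail probability has a closed form (a finite Poisson-type sum), so a percentage point can be found by bisection with the Python standard library alone. The sketch below is one way to do it; the function names are assumptions made for illustration.

```python
import math

def chi2_sf_even(x, r):
    """P(chi-square with 2r d.f. > x) = exp(-x/2) * sum_{k<r} (x/2)^k / k!."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_upper_point(p, r):
    """Upper 100p per cent point: the x with P(chi2(2r) > x) = p."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, r) > p:      # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):                # bisection on the decreasing tail
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, r) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_constants(alpha1, r, gamma):
    """Return (C1, C2) as in the regions (13)-(16)."""
    c1 = 2.0 * r * alpha1 / chi2_upper_point(gamma, r)
    c2 = 2.0 * r * alpha1 / chi2_upper_point(1.0 - gamma, r)
    return c1, c2

if __name__ == "__main__":
    print(critical_constants(1.0, 2, 0.05))
```

With $\alpha_1=1$ the first returned value is the ratio $C_1/\alpha_1$, which depends only on r and γ.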

6. POWER FUNCTION OF THE TEST

Case I when $\alpha_1>\alpha_2$: The power function of the test is

$$\text{Power}=1-\beta=P(X\in W_1\mid H_1)=P\left(\chi^{2}(2r)>\frac{2r\alpha_2}{C_1}\,\Big|\,\alpha=\alpha_2\right)$$
$$\text{Power}=P\left(\chi^{2}(2r)>\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{\gamma}(2r)\right)$$

According to the NP lemma, the rejection region (13) has a greater chance of rejecting $\alpha=\alpha_1$ when $\alpha=\alpha_2$ is true than any other region which assigns probability γ to the rejection of $\alpha=\alpha_1$ when $\alpha_1$ is the true value. Evidently the region (13) does not depend on the particular choice of the alternative α. Therefore the region (13) gives a uniformly most powerful test, in the NP sense, of the hypothesis $\alpha=\alpha_1$ against $\alpha<\alpha_1$.

Case II when $\alpha_1<\alpha_2$: The power function of the test is

$$\text{Power}=1-\beta=P(X\in W_2\mid H_1)=P\left(\chi^{2}(2r)<\frac{2r\alpha_2}{C_2}\,\Big|\,\alpha=\alpha_2\right)$$
$$\text{Power}=P\left(\chi^{2}(2r)<\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{1-\gamma}(2r)\right)$$

According to the NP lemma, the rejection region (15) has a greater chance of rejecting $\alpha=\alpha_1$ when $\alpha=\alpha_2$ is true than any other region which assigns probability γ to the rejection of $\alpha=\alpha_1$ when $\alpha_1$ is the true value. Evidently the region (15) does not depend on the particular choice of the alternative α. Therefore the region (15) gives a uniformly most powerful test, in the NP sense, of the hypothesis $\alpha=\alpha_1$ against $\alpha>\alpha_1$.
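The two power expressions of this section can be evaluated numerically. The sketch below reuses the closed-form chi-square tail for even degrees of freedom and a bisection search for percentage points; all function names are illustrative assumptions. The argument `ratio` stands for $\alpha_2/\alpha_1$.

```python
import math

def chi2_sf_even(x, r):
    """P(chi-square with 2r d.f. > x), closed form for even d.f."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_upper_point(p, r):
    """x such that P(chi2(2r) > x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, r) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, r) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power_case1(ratio, r, gamma):
    """Case I (alpha2 < alpha1): P(chi2(2r) > ratio * upper gamma point)."""
    return chi2_sf_even(ratio * chi2_upper_point(gamma, r), r)

def power_case2(ratio, r, gamma):
    """Case II (alpha2 > alpha1): P(chi2(2r) < ratio * upper (1-gamma) point)."""
    return 1.0 - chi2_sf_even(ratio * chi2_upper_point(1.0 - gamma, r), r)

if __name__ == "__main__":
    for ratio in (1.0, 0.8, 0.5, 0.25):
        print("Case I ", ratio, power_case1(ratio, 5, 0.05))
    for ratio in (1.0, 1.5, 2.0, 4.0):
        print("Case II", ratio, power_case2(ratio, 5, 0.05))
```

At `ratio = 1` both functions return the size γ, and the power grows as $\alpha_2$ moves away from $\alpha_1$ in the direction of the alternative.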

7. OC FUNCTION

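Case I when $\alpha_1>\alpha_2$: Let us now look at the OC curve of the procedure specified by (14), i.e., let us study

$L(\alpha)$ = probability of accepting $\alpha=\alpha_1$ when α is the true value
$$L(\alpha)=P\left(\hat{\alpha}>\frac{2r\alpha_1}{\chi^{2}_{\gamma}(2r)}\right)$$
$$L(\alpha)=1-P\left(\chi^{2}(2r)>\frac{\alpha}{\alpha_1}\,\chi^{2}_{\gamma}(2r)\right) \tag{17}$$

The graph of $L(\alpha)$ for various values of r and of the ratio $\alpha/\alpha_1$ when γ = 0.05 is given in Figure 1.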

Figure 1

OC curves of tests of the form $\hat{\alpha}>C_1$, with $L(\alpha_1)=0.95$

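In the problem just discussed, it was assumed that r and γ are known and $C_1$ is unknown. We shall now consider a problem where both r and $C_1$ are initially unknown. We want to choose these unknowns in such a way that the resulting OC curve has the property that

$$L(\alpha_1)=1-\gamma\quad\text{and}\quad L(\alpha_2)\le\beta \tag{18}$$
where $\alpha_2<\alpha_1$, and γ and β are prescribed in advance. To meet conditions (18) means substituting $\alpha_2$ for α in (17) and requiring that r be such that
$$L(\alpha_2)\le\beta\;\Rightarrow\;P\left(\chi^{2}(2r)<\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{\gamma}(2r)\right)\le\beta$$
$$\text{or}\quad P\left(\chi^{2}(2r)>\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{\gamma}(2r)\right)\ge 1-\beta$$

This implies that

$$\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{\gamma}(2r)\le\chi^{2}_{1-\beta}(2r)\;\Rightarrow\;\frac{\alpha_1}{\alpha_2}\ge\frac{\chi^{2}_{\gamma}(2r)}{\chi^{2}_{1-\beta}(2r)} \tag{19}$$

Knowing (19) makes it an easy matter to find the integer r which ensures that the OC curve passes most nearly through the points $(\alpha_1, L(\alpha_1)=1-\gamma)$ and $(\alpha_2, L(\alpha_2)=\beta)$. It can be verified that as r goes through the values 1, 2, 3, … the ratio $\chi^{2}_{\gamma}(2r)/\chi^{2}_{1-\beta}(2r)$ is strictly decreasing, and it is easy to show that it tends to one. Consequently there is a smallest integer r such that $\dfrac{\alpha_1}{\alpha_2}\ge\dfrac{\chi^{2}_{\gamma}(2r)}{\chi^{2}_{1-\beta}(2r)}$.

This is the value of r which we wish to use. If with this value of r we use an acceptance region for $\alpha=\alpha_1$ of the form $\hat{\alpha}>C_1$, where $C_1=\dfrac{2r\alpha_1}{\chi^{2}_{\gamma}(2r)}$, we shall have a test whose OC curve is such that $L(\alpha_1)=1-\gamma$ and $L(\alpha_2)\le\beta$. Incidentally, a region of acceptance for $\alpha=\alpha_1$ of the form $\hat{\alpha}>C_1^{*}$, where $C_1^{*}=\dfrac{2r\alpha_2}{\chi^{2}_{1-\beta}(2r)}$, will give for the same r an OC curve such that $L(\alpha_1)\ge 1-\gamma$ and $L(\alpha_2)=\beta$.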
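The search for r described for Case I can be sketched as a simple loop over r = 1, 2, 3, … that stops at the first value satisfying inequality (19). This is an illustration under stated assumptions, with hypothetical function names. It applies (19) literally with exact chi-square percentage points, so the integer it returns need not coincide with a value of r obtained by simulation.

```python
import math

def chi2_sf_even(x, r):
    """P(chi-square with 2r d.f. > x), closed form for even d.f."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_upper_point(p, r):
    """x such that P(chi2(2r) > x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, r) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, r) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def oc_case1(ratio, r, gamma):
    """Equation (17): L(alpha) as a function of ratio = alpha / alpha1."""
    return 1.0 - chi2_sf_even(ratio * chi2_upper_point(gamma, r), r)

def smallest_r_case1(a1_over_a2, gamma, beta):
    """Smallest r with chi2_gamma(2r) / chi2_{1-beta}(2r) <= alpha1/alpha2."""
    if a1_over_a2 <= 1.0:
        raise ValueError("Case I requires alpha1 / alpha2 > 1")
    r = 1
    while chi2_upper_point(gamma, r) / chi2_upper_point(1.0 - beta, r) > a1_over_a2:
        r += 1
    return r

if __name__ == "__main__":
    r = smallest_r_case1(3.0, 0.05, 0.05)
    print("r =", r)
    print("L(alpha1) =", oc_case1(1.0, r, 0.05))
    print("L(alpha2) =", oc_case1(1.0 / 3.0, r, 0.05))
```

For the returned r the OC curve passes exactly through $(\alpha_1, 1-\gamma)$ and lies at or below β at $\alpha_2$, which is condition (18).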

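Case II when $\alpha_1<\alpha_2$: Let us now look at the OC curve of the procedure specified by (16), i.e., let us study

$L(\alpha)$ = probability of accepting $\alpha=\alpha_1$ when α is the true value
$$L(\alpha)=P\left(\hat{\alpha}<\frac{2r\alpha_1}{\chi^{2}_{1-\gamma}(2r)}\right)\;\Rightarrow\;L(\alpha)=P\left(\frac{2r\alpha}{\hat{\alpha}}>\frac{\alpha\,\chi^{2}_{1-\gamma}(2r)}{\alpha_1}\right)$$
$$L(\alpha)=P\left(\chi^{2}(2r)>\frac{\alpha}{\alpha_1}\,\chi^{2}_{1-\gamma}(2r)\right) \tag{20}$$

The graph of $L(\alpha)$ for various values of r and of the ratio $\alpha/\alpha_1$ when γ = 0.05 is given in Figure 2.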

Figure 2

Operating characteristics of tests of the form $\hat{\alpha}<C_2$, with $L(\alpha_1)=0.95$

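In the problem just discussed, it was assumed that r and γ are known and $C_2$ is unknown. We shall now consider a problem where both r and $C_2$ are initially unknown. We want to choose these unknowns in such a way that the resulting OC curve has the property that

$$L(\alpha_1)=1-\gamma\quad\text{and}\quad L(\alpha_2)\le\beta \tag{21}$$
where $\alpha_2>\alpha_1$, and γ and β are prescribed in advance. To meet conditions (21) means substituting $\alpha_2$ for α in (20) and requiring that r be such that
$$L(\alpha_2)\le\beta$$
$$P\left(\chi^{2}(2r)>\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{1-\gamma}(2r)\right)\le\beta$$

This implies that

$$\frac{\alpha_2}{\alpha_1}\,\chi^{2}_{1-\gamma}(2r)\ge\chi^{2}_{\beta}(2r)\;\Rightarrow\;\frac{\alpha_2}{\alpha_1}\ge\frac{\chi^{2}_{\beta}(2r)}{\chi^{2}_{1-\gamma}(2r)} \tag{22}$$

Knowing (22) makes it an easy matter to find the integer r which ensures that the OC curve passes most nearly through the points $(\alpha_1, L(\alpha_1)=1-\gamma)$ and $(\alpha_2, L(\alpha_2)=\beta)$. It can be verified that as r goes through the values 1, 2, 3, … the ratio $\chi^{2}_{\beta}(2r)/\chi^{2}_{1-\gamma}(2r)$ is strictly decreasing, and it is easy to show that it tends to one. Consequently there is a smallest integer r such that $\dfrac{\alpha_2}{\alpha_1}\ge\dfrac{\chi^{2}_{\beta}(2r)}{\chi^{2}_{1-\gamma}(2r)}$.

This is the value of r which we wish to use. If with this value of r we use an acceptance region for $\alpha=\alpha_1$ of the form $\hat{\alpha}<C_2$, where $C_2=\dfrac{2r\alpha_1}{\chi^{2}_{1-\gamma}(2r)}$, we shall have a test whose OC curve is such that $L(\alpha_1)=1-\gamma$ and $L(\alpha_2)\le\beta$. Incidentally, a region of acceptance for $\alpha=\alpha_1$ of the form $\hat{\alpha}<C_2^{*}$, where $C_2^{*}=\dfrac{2r\alpha_2}{\chi^{2}_{\beta}(2r)}$, will give for the same r an OC curve such that $L(\alpha_1)\ge 1-\gamma$ and $L(\alpha_2)=\beta$.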
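Points on the Case II OC curve of Equation (20) can be tabulated in the same way. The sketch below prints $L(\alpha)$ over a small grid of ratios $\alpha/\alpha_1$ for a few values of r with γ = 0.05; the grid, the chosen values of r, and the function names are assumptions made for illustration rather than the settings behind Figure 2.

```python
import math

def chi2_sf_even(x, r):
    """P(chi-square with 2r d.f. > x), closed form for even d.f."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_upper_point(p, r):
    """x such that P(chi2(2r) > x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, r) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, r) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def oc_case2(ratio, r, gamma):
    """Equation (20): L(alpha) as a function of ratio = alpha / alpha1."""
    return chi2_sf_even(ratio * chi2_upper_point(1.0 - gamma, r), r)

if __name__ == "__main__":
    ratios = (1.0, 1.5, 2.0, 3.0, 5.0)
    for r in (2, 5, 10):
        row = ", ".join("%.4f" % oc_case2(q, r, 0.05) for q in ratios)
        print("r = %2d: %s" % (r, row))
```

Each row starts at 1 − γ = 0.95 and decreases as the ratio grows; larger r gives a steeper curve.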

8. UNIFORMLY MOST POWERFUL CRITICAL REGION

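Case I: when $\alpha_1>\alpha_2$

The best critical region as given in (13) is $W_1=\left\{X:\hat{\alpha}<\dfrac{2r\alpha_1}{\chi^{2}_{\gamma}(2r)}\right\}$, i.e.,

$$W_1=\left\{X:-\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)-(n-r)\ln\left(1-e^{-\lambda x_1}\right)>\frac{\chi^{2}_{\gamma}(2r)}{2\alpha_1}\right\}$$

It is independent of $\alpha_2$, the alternative value of α; therefore $W_1$ is a uniformly most powerful critical region for testing $H_0:\alpha=\alpha_1$ against $H_1:\alpha=\alpha_2<\alpha_1$. This implies that no choice of $\alpha_2$ can change the size of the critical region for $\alpha_2<\alpha_1$.

Case II: when $\alpha_2>\alpha_1$

The best critical region as given in (15) is $W_2=\left\{X:\hat{\alpha}>\dfrac{2r\alpha_1}{\chi^{2}_{1-\gamma}(2r)}\right\}$, i.e.,

$$W_2=\left\{X:-\sum_{i=1}^{r}\ln\left(1-e^{-\lambda x_i}\right)-(n-r)\ln\left(1-e^{-\lambda x_1}\right)<\frac{\chi^{2}_{1-\gamma}(2r)}{2\alpha_1}\right\}$$

It is independent of $\alpha_2$, the alternative value of α; therefore $W_2$ is also a uniformly most powerful critical region for testing $H_0:\alpha=\alpha_1$ against $H_1:\alpha=\alpha_2>\alpha_1$. This implies that no choice of $\alpha_2$ can change the size of the critical region for $\alpha_2>\alpha_1$.

However, since the two critical regions $W_1$ and $W_2$ are different, i.e., $W_1\cap W_2=\emptyset$, there exists no critical region of size γ which is uniformly most powerful for testing $H_0:\alpha=\alpha_1$ against the two-tailed alternative $H_1:\alpha\ne\alpha_1$.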

If $\alpha_1/\alpha_2=3$ and γ = β = 0.01, it is easy to show that a suitable r to use is 18. If $\alpha_1/\alpha_2=3$ and γ = 0.05, β = 0.01, then the proper r is 14. Similarly, if $\alpha_1/\alpha_2=3$ and γ = 0.1, β = 0.01, then the proper r is 12. This means, for instance, that if we want the test procedure to accept a lot whose life characteristic is $\alpha_2=500$ hours only 5% of the time (here $\alpha_1/\alpha_2=3$ and γ = β = 0.05, for which Table 1 gives r = 9 and $C_1/\alpha_1=0.6235$), then it is possible to draw valid inference while observing only 9 observations $x_{1,n},x_{2,n},\ldots,x_{9,n}$, although the remaining (n − 9) are left censored: if $\hat{\alpha}>935$ hours accept $\alpha=\alpha_1$, and if $\hat{\alpha}<935$ hours accept $\alpha=\alpha_2$. Such a procedure will have an OC curve for which $L(\alpha_1)=0.95$ and $L(\alpha_2)\approx 0.05$. It should be noted that n, the number of items tested, is left arbitrary. If one's object is to reduce testing time, then it is clearly advisable, from Table 1, to make n more than 9.
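The 935-hour cut-off in the worked example can be reproduced numerically. The sketch below assumes $\alpha_1=1500$ hours (implied by $\alpha_1/\alpha_2=3$ and $\alpha_2=500$ hours), r = 9, and γ = 0.05, and evaluates $C_1=2r\alpha_1/\chi^{2}_{\gamma}(2r)$ together with the OC values at $\alpha_1$ and $\alpha_2$ from Equation (17). The function names are illustrative.

```python
import math

def chi2_sf_even(x, r):
    """P(chi-square with 2r d.f. > x), closed form for even d.f."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_upper_point(p, r):
    """x such that P(chi2(2r) > x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, r) > p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, r) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def acceptance_plan(alpha1, alpha2, r, gamma):
    """Return (C1, L(alpha1), L(alpha2)) for the acceptance region alpha_hat > C1."""
    q = chi2_upper_point(gamma, r)
    c1 = 2.0 * r * alpha1 / q
    l1 = 1.0 - chi2_sf_even(q, r)                      # ratio alpha/alpha1 = 1
    l2 = 1.0 - chi2_sf_even((alpha2 / alpha1) * q, r)  # ratio alpha2/alpha1
    return c1, l1, l2

if __name__ == "__main__":
    c1, l1, l2 = acceptance_plan(1500.0, 500.0, 9, 0.05)
    print("C1 = %.1f hours, L(alpha1) = %.4f, L(alpha2) = %.4f" % (c1, l1, l2))
```

The computed cut-off is close to 935 hours, and $L(\alpha_1)$ equals 0.95 by construction.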

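α = 0.01

| α1/α2 or α2/α1 | r (β = 0.01) | C1/α1 or C2*/α2 (β = 0.01) | C1*/α1 or C2/α2 (β = 0.01) | r (β = 0.05) | C1/α1 or C2*/α2 (β = 0.05) | C1*/α1 or C2/α2 (β = 0.05) | r (β = 0.1) | C1/α1 or C2*/α2 (β = 0.1) | C1*/α1 or C2/α2 (β = 0.1) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 132 | 0.82403 | 0.82435 | 95 | 0.797427 | 0.796082 | 76 | 0.777591 | 0.778725 |
| 2 | 45 | 0.725126 | 0.728697 | 32 | 0.686571 | 0.68677 | 25 | 0.656565 | 0.66333 |
| 2.5 | 26 | 0.661445 | 0.665692 | 18 | 0.614133 | 0.61886 | 14 | 0.579971 | 0.591365 |
| 3 | 18 | 0.614133 | 0.623938 | 12 | 0.558402 | 0.577683 | 10 | 0.532393 | 0.535793 |
| 4 | 11 | 0.54605 | 0.576369 | 8 | 0.500001 | 0.502409 | 6 | 0.457719 | 0.475904 |
| 5 | 8 | 0.500001 | 0.550565 | 6 | 0.457719 | 0.45924 | 4 | 0.398203 | 0.458513 |
| 10 | 4 | 0.398203 | 0.48588 | 3 | 0.35689 | 0.366887 | 2 | 0.30128 | 0.376073 |
| 15 | 3 | 0.35689 | 0.458668 | 2 | 0.30128 | 0.375205 | 2 | 0.30128 | 0.250715 |
| 20 | 2 | 0.30128 | 0.673153 | 2 | 0.30128 | 0.281404 | 2 | 0.30128 | 0.188037 |

α = 0.05

| α1/α2 or α2/α1 | r (β = 0.01) | C1/α1 or C2*/α2 (β = 0.01) | C1*/α1 or C2/α2 (β = 0.01) | r (β = 0.05) | C1/α1 or C2*/α2 (β = 0.05) | C1*/α1 or C2/α2 (β = 0.05) | r (β = 0.1) | C1/α1 or C2*/α2 (β = 0.1) | C1*/α1 or C2/α2 (β = 0.1) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 98 | 0.853424 | 0.854591 | 66 | 0.825963 | 0.826612 | 51 | 0.805852 | 0.807819 |
| 2 | 34 | 0.770537 | 0.775582 | 22 | 0.727503 | 0.738565 | 17 | 0.699554 | 0.709745 |
| 2.5 | 20 | 0.71738 | 0.721883 | 13 | 0.668636 | 0.67624 | 10 | 0.636731 | 0.642952 |
| 3 | 14 | 0.677357 | 0.68806 | 9 | 0.6235 | 0.638947 | 7 | 0.591097 | 0.599094 |
| 4 | 9 | 0.6235 | 0.641491 | 6 | 0.57072 | 0.57405 | 4 | 0.515886 | 0.573142 |
| 5 | 7 | 0.591097 | 0.600804 | 4 | 0.515886 | 0.585515 | 3 | 0.476509 | 0.544432 |
| 10 | 3 | 0.476509 | 0.688002 | 2 | 0.421597 | 0.562807 | 2 | 0.421597 | 0.376073 |
| 15 | 2 | 0.421597 | 0.897537 | 2 | 0.421597 | 0.375205 | 2 | 0.421597 | 0.250715 |
| 20 | 2 | 0.421597 | 0.673153 | 2 | 0.421597 | 0.281404 | 2 | 0.421597 | 0.188037 |

α = 0.1

| α1/α2 or α2/α1 | r (β = 0.01) | C1/α1 or C2*/α2 (β = 0.01) | C1*/α1 or C2/α2 (β = 0.01) | r (β = 0.05) | C1/α1 or C2*/α2 (β = 0.05) | C1*/α1 or C2/α2 (β = 0.05) | r (β = 0.1) | C1/α1 or C2*/α2 (β = 0.1) | C1*/α1 or C2/α2 (β = 0.1) |
|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 82 | 0.87422 | 0.875869 | 53 | 0.84776 | 0.539311 | 40 | 0.828344 | 0.829731 |
| 2 | 29 | 0.803771 | 0.807497 | 18 | 0.762515 | 0.352952 | 14 | 0.738476 | 0.739206 |
| 2.5 | 17 | 0.757185 | 0.764511 | 10 | 0.703928 | 0.254692 | 8 | 0.679641 | 0.687268 |
| 3 | 12 | 0.722973 | 0.736895 | 7 | 0.664637 | 0.197032 | 5 | 0.625501 | 0.685141 |
| 4 | 8 | 0.679641 | 0.688206 | 5 | 0.625501 | 0.13656 | 3 | 0.563664 | 0.68054 |
| 5 | 6 | 0.646923 | 0.672162 | 3 | 0.563664 | 0.095302 | 2 | 0.514176 | 0.752146 |
| 10 | 3 | 0.563664 | 0.688002 | 2 | 0.514176 | 0.04216 | 2 | 0.514176 | 0.376073 |
| 15 | 2 | 0.514176 | 0.897537 | 2 | 0.514176 | 0.028106 | 2 | 0.514176 | 0.250715 |
| 20 | 2 | 0.514176 | 0.673153 | 2 | 0.514176 | 0.02108 | 2 | 0.514176 | 0.188037 |

Table 1

Values of r and acceptance regions for fixed γ, β, where α (= γ) is the probability of rejecting α1 when α = α1 and β is the probability of accepting α1 when α = α2, for both cases α1 > α2 and α1 < α2. Acceptance regions are of the form α̂ > C1, α̂ < C2.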

ACKNOWLEDGEMENTS

The authors appreciate and thank the referee and the editor for many helpful comments and suggestions, which substantially simplified the paper.

ISSN (Online)
2214-1766
ISSN (Print)
1538-7887