Journal of Statistical Theory and Applications

Volume 18, Issue 1, March 2019, Pages 46 - 64

On Partially Linear Single-Index Models with Missing Response and Error-in-Variable Predictors

Authors
Tsung-Lin Cheng1, Yin-Ying Lin2, Xuewen Lu3, Radhey Singh4, *
1Department of Mathematics, National Changhua University of Education, Changhua city, Taiwan
2Institute of Statistics and Information, National Changhua University of Education, Changhua city, Taiwan
3Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada
4Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada
*Corresponding author. Email: rssingh@uwaterloo.ca
Received 20 January 2017, Accepted 15 August 2018, Available Online 22 April 2019.
DOI
10.2991/jsta.d.190306.006
Abstract

In this paper, we consider a partially linear single-index model in which the responses may be missing and the nonlinear regressors are subject to measurement error. Utilizing data imputation for the missing values and regression calibration for the error-prone regressors, we estimate not only the parameters in the linear part and the single-index part, but also the nonparametric link function by local linear fitting. After suitable normalization, all the proposed estimators of the regression coefficients and the link function are shown to be asymptotically normal, and illustrative simulations are provided to support our methods.

Copyright
© 2019 The Authors. Published by Atlantis Press SARL.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

1. INTRODUCTION

To avoid the so-called "curse of dimensionality" in nonparametric or semiparametric regression analysis, partially linear single-index models (PLSIM) have emerged as an effective device for dimension reduction; see, for example, Härdle and Stoker [1], Powell et al. [2], Newey and Stoker [3], Ichimura [4], Carroll et al. [5], Xia and Härdle [6], Lu and Cheng [7], and many others. Although the PLSIM serves as an effective way of modelling a nonlinear relationship between several covariates and their response, it may yield biased estimates when the covariates and/or the response are incomplete.

When one collects data (e.g., survival data), many practical problems may produce an incomplete data set, which in turn may lead to biased estimation. Therefore, the handling of missing data becomes more and more important in a data-demanding world. In general, missing data might occur in both the responses and the covariates, while in this paper we mainly focus on the case where only the response is missing. According to the nature of missing data, Little and Rubin [8] first classified the types of missingness into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). In the present paper, we consider the MAR mechanism (see, e.g., Wang et al. [9], Yun et al. [10]), under which the probability that a response is missing does not depend on the unobserved measurements. A very important type of missingness is censoring; in particular, for the censoring case in PLSIM, Lu and Cheng [7] adopted a Kaplan–Meier-like transformation to overcome the bias of the estimators of the coefficients and the link function. Besides, Cheng et al. [11] considered the more difficult problem of estimating the parameters and the nonparametric function in a PLSIM with censored response and covariates subject to measurement error. However, for general missingness of the responses in PLSIM, no existing study addresses the problem.

Another important issue concerning incomplete data is measurement error. Measurement error models have been studied extensively in the literature; see, for example, Fuller [12], Carroll [13], Carroll and Stefanski [14], Carroll and Li [15], Lue [16], and Fan and Truong [17], among others. As indicated by Carroll et al. [18], measurement errors have three effects: first, they cause bias in parameter estimation for statistical models; second, they lead to a loss of power, sometimes profound, for detecting interesting relationships among variables; finally, they mask the features of the data, making graphical model analysis difficult. The bias in the parameter estimates becomes especially severe when the relationship between the covariates and the response is nonlinear.

In this paper, we consider the following PLSIM

$$Y=\beta_0^TV+\lambda_0(\alpha_0^TX)+\sigma(V,X)\varepsilon,\qquad\|\alpha_0\|=1,\tag{1}$$
where $Y$ is the response variable, $X=(X_1,\dots,X_p)^T$ and $V=(V_1,\dots,V_q)^T$ are predictors, $\alpha_0$ and $\beta_0$ are parameters to be estimated, $\lambda_0$ is an unknown smooth function, and $\sigma(\cdot,\cdot)$ denotes the conditional standard deviation representing possible heteroscedasticity. Throughout this paper, $\|\cdot\|$ denotes the Euclidean norm. The restriction $\|\alpha_0\|=1$ ensures identifiability. Suppose that $(V_i,X_i)$ and $\varepsilon_i$ are independent, with $E(\varepsilon_i)=0$ and $\mathrm{Var}(\varepsilon_i)=1$, for $i=1,\dots,n$. Suppose that we obtain a random sample of incomplete data
$$(Y_i,\delta_i,V_i,X_i),\qquad i=1,2,\dots,n,\tag{2}$$
from model Eq. (1), where $\delta_i=0$ if $Y_i$ is missing and $\delta_i=1$ otherwise. The MAR assumption implies that $\delta$ and $Y$ are conditionally independent given $V$ and $X$, that is, $P(\delta=1\mid Y,V,X)=P(\delta=1\mid V,X)$.

Among the wide variety of procedures for handling missing data, data imputation is an important step. By imputing a plausible value for each missing datum, under mild conditions, the problem can be dealt with as if the data were complete. Different categories of imputation can be found in Schulte Nordholt [19]. The first classification, roughly speaking, distinguishes deterministic from stochastic imputations [20]. The second classification is a distinction between naive and principled approaches. The naive imputations, mainly based on analyzing complete cases (listwise or pairwise), are a quick option. For example, the imputation of an unconditional mean is a naive approach; it may lead to biased estimates even if the data are missing at random. Little and Rubin ([8], Chapter 3) indicated that the obvious corrections of this bias yield the same estimates as those found with available-case procedures. The principled approaches adopt models for both the observed and the missing data, on which the imputations are based.

Besides, there is a distinction between imputations based on "explicit" and "implicit" models [21, 22]. Examples include the hot-deck procedures [23], in which missing values are imputed with donor cases from the set of completely observed cases. There are still many other imputation methods, for example, linear regression imputation [24], multiple imputation [20, 25], nonparametric kernel regression imputation [26, 27], nearest neighbor imputation [28], ratio imputation [29], regression calibration [30], and semiparametric regression imputation [9]. Wang and Sun [31] adopted semiparametric imputation, semiparametric regression surrogate, and inverse marginal probability weighted (IMPW) approaches, separately, to estimate $\beta$ and $g$ simultaneously in the partial linear model

$$Y=X^{\tau}\beta+g(T)+\varepsilon,$$
where $Y$ is a scalar response missing at random, $X$ and $T$ are completely observed covariates, $\beta$ is an unknown regression parameter, $g$ is an unknown measurable function, and $\varepsilon$ is the prediction error independent of $X$ and $T$. As mentioned above, Wang and Sun applied three imputation methods to the partial linear model. When we consider the PLSIM, however, the third approach, that is, the IMPW approach, does not work well according to our simulation study. Therefore, we drop the IMPW approach and adopt the other two approaches in the PLSIM setting. In this paper, we consider not only missing responses, but also regressors with measurement errors. Suppose that we cannot observe the true covariate $X$ but only its contaminated version $W$. In a general framework, the relationship between $X$ and $W$ can be described as follows:
$$W=\gamma+\Gamma X+e,\tag{3}$$
where $\Gamma$ is a $q\times p$ matrix, $q\ge p$, which may be known, unknown, or partly known. An important case is when $\Gamma$ equals the identity matrix $I$. No additional assumption is made except that $e$, which has mean zero and constant covariance matrix $\Sigma_e$, is independent of $(X,V,\varepsilon)$. When $X$ is a scalar and $\alpha_0=1$, model Eq. (1) is a partially linear model. Partially linear models have many applications; Engle et al. [32] were the first to consider this kind of model. A more general case than model Eq. (1) was studied by Carroll et al. [5], in which the conditional mean $E(Y\mid X,V)$ is modeled as $g^{-1}(\beta_0^TV+\lambda_0(\alpha_0^TX))$ for a known link function $g$; their model reduces to model Eq. (1) when the link function $g$ is the identity. Recently, a partially linear single-index model with measurement error was studied by Liang and Wang [33]. They assumed the linear predictor $V$ to be subject to measurement errors, while in our setting not only is the response MAR, but also the nonlinear regressor $X$ has measurement errors. The paper is organized as follows: in Section 2, we describe the estimation procedures for model Eq. (1); Section 3 states the results on the asymptotic properties of our estimators; in Section 4, we present some illustrative simulations. All related proofs can be found in Appendix I. The estimation results are presented in Appendix II.

2. PROCEDURES OF ESTIMATIONS

2.1. Carroll and Li’s Transformation

As mentioned in the introduction, how to calibrate the contaminated regressors so that the resulting estimators are unbiased is a very important issue. Carroll and Li's [15] transformation, stated in the following, is nothing more than a simple linear prediction of $X$ by $W$:

$$U^*=LW=\mathrm{cov}(X,W)\,\Sigma_W^{-1}W,$$
where $\Sigma_W$ is the covariance matrix of $W$. Suppose that the individuals in a study are indexed by $i=1,\dots,n$, with the first $m$ individuals forming the validation sample, for which either the true $X$ is observed in addition to the contaminated $W$ or there are replicates of $W$. Conventionally, we refer to the data consisting of the i.i.d. sample $(Y_i,W_i)$, $i=m+1,\dots,n$, as the primary data. Typically, $m$ is much smaller than $n$. In general, $L$ is unknown, and it can be estimated from a validation sample. Suppose that both $X$ and $W$ in Eq. (3) are observed for the validation sample. Then $L$ can be estimated by
$$\hat L=\widehat{\mathrm{cov}}(X,W)\,\hat\Sigma_{2W}^{-1},$$
where $\widehat{\mathrm{cov}}(X,W)$ is the sample covariance matrix between $X$ and $W$ and $\hat\Sigma_{2W}$ denotes the sample covariance matrix of $W$ based on the validation sample $(X_i,W_i)$, $i=1,2,\dots,m$. Each row of $\hat L$ is the usual least squares regression slope of the corresponding coordinate of $X$ against $W$ with intercept included. Set $\hat U_i^*=\hat LW_i$ for $i=m+1,\dots,n$ and define the associated sample covariance matrix $\hat\Sigma_{U^*}=\hat L\hat\Sigma_{1W}\hat L^T$ based on the primary sample, where $\hat\Sigma_{1W}$ is the sample covariance matrix of $W$ from the primary data. Hereafter, $U_i^*$ can be replaced by $\hat U_i^*$ when $L$ is unknown.
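To make the calibration concrete, the following Python sketch (our own illustration; the function name and array layout are assumptions, not code from the paper) computes $\hat L$ from a validation sample in which both $X$ and $W$ are observed, and then calibrates the primary-sample surrogates:

```python
import numpy as np

def estimate_L_validation(X_val, W_val):
    """Estimate L = cov(X, W) Sigma_W^{-1} from a validation sample.

    X_val : (m, p) array of true covariates
    W_val : (m, q) array of contaminated surrogates
    """
    p = X_val.shape[1]
    # Joint sample covariance of (X, W); the off-diagonal block estimates cov(X, W).
    S = np.cov(np.hstack([X_val, W_val]), rowvar=False)
    cov_XW = S[:p, p:]      # p x q block: sample cov(X, W)
    Sigma_2W = S[p:, p:]    # q x q block: validation-based sample cov of W
    return cov_XW @ np.linalg.inv(Sigma_2W)

# Calibrate the primary-sample surrogates: U*_i = L_hat W_i, i.e.,
# U_star = W_primary @ estimate_L_validation(X_val, W_val).T
```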

Suppose, on the other hand, that we have replicated data rather than a validation sample. As in Carroll and Li [15] and Lue [16], we consider an important special case where $\Gamma$ is known and $p=q$. Without loss of generality, we take $\Gamma=I$. Let

$$W_{ij}=\gamma+X_i+e_{ij},\qquad j=1,2,\quad i=1,\dots,m.$$

If $\Sigma_e$ is the covariance matrix of $e_{ij}$, then $L=\mathrm{cov}(X,W)\Sigma_W^{-1}=\mathrm{cov}(X,\gamma+X+e)\Sigma_W^{-1}=\Sigma_X\Sigma_W^{-1}=(\Sigma_W-\Sigma_e)\Sigma_W^{-1}=I-\Sigma_e\Sigma_W^{-1}$. Let $\hat\Sigma_e$ and $\hat\Sigma_W-\frac12\hat\Sigma_e$ be the sample covariance matrices of $(W_{i1}-W_{i2})/\sqrt2$ and $(W_{i1}+W_{i2})/2$, respectively, and let

$$\hat L=I-\hat\Sigma_e\hat\Sigma_W^{-1}.$$

With this choice of $\hat L$, similar results can be obtained.
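Under the replicate design just described, $\Sigma_e$ and $\Sigma_W$ can be recovered from within-pair differences and averages. A minimal sketch under the stated assumptions ($\Gamma=I$, two replicates per subject; the function name is ours):

```python
import numpy as np

def estimate_L_replicates(W1, W2):
    """Estimate L = I - Sigma_e Sigma_W^{-1} from replicates
    W_ij = gamma + X_i + e_ij, j = 1, 2.

    W1, W2 : (m, p) arrays holding the two replicate measurements
    """
    # Var((W1 - W2)/sqrt(2)) = Sigma_e
    Sigma_e = np.cov((W1 - W2) / np.sqrt(2.0), rowvar=False)
    # Var((W1 + W2)/2) = Sigma_W - Sigma_e/2, so add Sigma_e/2 back
    Sigma_W = np.cov((W1 + W2) / 2.0, rowvar=False) + Sigma_e / 2.0
    return np.eye(W1.shape[1]) - Sigma_e @ np.linalg.inv(Sigma_W)
```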

2.2. Estimations of PLSIM with Missing Response and Error-Prone Predictors

Consider the PLSIM defined by Eq. (1). In this section, we assume that we are given a data set with partially missing responses and error-prone regressors in the nonlinear single-index term. In order to remedy the estimation bias caused by missingness and measurement error, we propose a modified quasi log-likelihood estimation procedure via an iterative minimization algorithm.

Let $\theta=(\alpha,\beta)$ be the vector of model parameters. If the data were free of measurement error and missingness, the quasi log-likelihood estimators of $\theta_0=(\alpha_0,\beta_0)$ and $\lambda_0$ would minimize

$$L_n(\theta,\lambda)=\sum_{i=m+1}^n\big\{Y_i-[\beta^TV_i+\lambda(\alpha^TX_i)]\big\}^2\quad\text{with }\|\alpha\|=1.\tag{6}$$

In the case when the data consist of MAR response variables and error-prone regressors, some auxiliary treatment of the data set is necessary. A difficulty common to single-index models is that minimizing Eq. (6) involves the estimation of the nonparametric function $\lambda$. We partition $Y$ into two parts, $Y=(Y_{obs},Y_{mis})$, with $Y_{obs}$ indicating the observed values and the $s\times1$ vector $Y_{mis}$ indicating the values that are missing. Assume that the observations are $\{(Y_i,\delta_i,V_i,W_i): i=m+1,\dots,n\}$, which is a random sample from the population $(Y,\delta,V,W)$ defined by Eqs. (1) and (2).

We denote the transformed $U_i$ by $U_i^*$, $i=m+1,\dots,n$. Note that the transformed regressors are $U_i^*=LW_i$, where $L=\mathrm{cov}(X,W)\Sigma_W^{-1}$ if $L$ is known, and $\hat L=\widehat{\mathrm{cov}}(X,W)\hat\Sigma_W^{-1}$ is used otherwise. First, assume that $Y$ is not missing and can be observed completely. For a fixed $u$ and $v$ in a small neighborhood of $u$, one may approximate the unknown smooth function $\lambda(v)$ by

$$\lambda(v)\approx\lambda(u)+\lambda'(u)(v-u)\equiv a_0+a_1(v-u),$$
which is called a "local linear fit." Thus, finding $\lambda(u)$ is tantamount to finding the intercept $a_0$ of the approximating regression line. Around $u$, model Eq. (1) approximately becomes
$$Y=\beta^TV+\lambda(u)+\lambda'(u)(\alpha^TX-u)+\sigma(V,X)\varepsilon.\tag{8}$$

In order to show that the estimated $\alpha$ is unchanged when we replace $X$ in model Eq. (1) by $LW$, we reduce our problem to the following simple linear case. Consider the model

$$Y^*=a_0+a_1(\alpha^TX-u)+\varepsilon^*,\tag{9}$$
where $Y^*=Y-\beta^TV$, $X$ is the same as that in Eq. (1) and is uncorrelated with $\varepsilon^*=\sigma(V,X)\varepsilon$, and $a_0$ and $a_1$ are constants. Letting $W$ satisfy Eq. (3), we may consider the model
$$Y^*=a_0+a_1(\alpha^TU^*-u)+\varepsilon^*,\tag{10}$$
where $U^*=LW$. It is clear that models (9) and (10) yield the same estimate of $\alpha$ when $L$ is known. Moreover, even if $L$ is unknown, using the validation data to obtain $\hat L=\widehat{\mathrm{cov}}(X,W)\hat\Sigma_W^{-1}$, models (9) and (10) still yield approximately the same estimate of $\alpha$.

Now we return to the case where $Y$ is partially missing. Let $Z=(V,X)$, $\sigma^2(Z)=E(\varepsilon^2\mid Z)$ and $\Delta(z)=P(\delta=1\mid Z=z)$. Motivated by Wang and Sun [31], we define $Y_i^{Im}=\delta_iY_i+(1-\delta_i)[\beta_0^TV_i+\lambda_0(\alpha_0^TX_i)]$; that is, $Y_i^{Im}=Y_i$ if $\delta_i=1$, and $Y_i^{Im}=\beta_0^TV_i+\lambda_0(\alpha_0^TX_i)$ otherwise. By the MAR assumption, we have $E(Y^{Im}\mid Z)=E\{\delta Y+(1-\delta)[\beta_0^TV+\lambda_0(\alpha_0^TX)]\mid Z\}=\beta_0^TV+\lambda_0(\alpha_0^TX)=E(Y\mid Z)$. But $Y_i^{Im}$ contains the unknown $\alpha_0$, $\beta_0$ and $\lambda_0$. Naturally, we replace $Y_i^{Im}$ by

$$Y_i^{I}=\delta_iY_i+(1-\delta_i)\big[\hat\beta_{0,obs}^TV_i+\hat\lambda_{0,obs}(\hat\alpha_{0,obs}^TX_i)\big],$$
where $\hat\alpha_{0,obs}$, $\hat\beta_{0,obs}$ and $\hat\lambda_{0,obs}$ are obtained from our estimation algorithm below with $Y^{**}$ replaced by $Y_{obs}$. Similarly, we may define
$$Y_i^{R}=\hat\beta_{0,obs}^TV_i+\hat\lambda_{0,obs}(\hat\alpha_{0,obs}^TX_i)$$
as the semiparametric regression surrogate. Then we substitute these synthetic data, $Y^I$ and $Y^R$, into Step 1 to estimate both the parametric component $\theta_0$ and the nonparametric function $\lambda_0$ using the local linear fit, and denote the corresponding estimators by $\hat\theta_0^I=(\hat\alpha_0^I,\hat\beta_0^I)$, $\hat\lambda_0^I$ and $\hat\theta_0^R=(\hat\alpha_0^R,\hat\beta_0^R)$, $\hat\lambda_0^R$, respectively. With the local model Eq. (8), we may estimate $\lambda(\tilde u)$ by minimizing the following modified local quasi-likelihood
$$\sum_{i=m+1}^n\big\{Y_i^{**}-[\beta^TV_i+a_0+a_1(\alpha^TU_i^*-\tilde u)]\big\}^2K_h(\alpha^TU_i^*-\tilde u)$$
with respect to $a_0$ and $a_1$, where $K_h(\cdot)=h^{-1}K(\cdot/h)$, $h$ is a suitable bandwidth, $\tilde u$ is a fixed real number, and $Y_i^{**}$ may be $Y_i^I$ or $Y_i^R$ according to which augmentation is used. Fan and Gijbels [34] proposed a nonparametric estimator of $\lambda(\tilde u)$, defined by
$$\hat\lambda(\tilde u)=\sum_{i=m+1}^nw_i(\tilde u)Y_i^{**}\Big/\sum_{i=m+1}^nw_i(\tilde u),$$
with
$$w_i(\tilde u)=K\Big(\frac{\tilde u-\alpha^TU_i^*}{h}\Big)\big[s_{n,2}-(\tilde u-\alpha^TU_i^*)s_{n,1}\big],$$
where
$$s_{n,l}=\sum_{i=m+1}^nK\Big(\frac{\tilde u-\alpha^TU_i^*}{h}\Big)(\tilde u-\alpha^TU_i^*)^l,\qquad l=1,2.$$
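For concreteness, the following sketch (our own illustration, not code from the paper) evaluates the local linear estimator with the weights $w_i(\tilde u)$ above; we apply it to the partial residuals $Y_i^{**}-\beta^TV_i$, in line with the local quasi-likelihood, and use the Epanechnikov kernel of Section 4 as a default:

```python
import numpy as np

def epanechnikov(v):
    # K(v) = 0.75 (1 - v^2) on |v| <= 1, the kernel used in Section 4
    return 0.75 * (1.0 - v ** 2) * (np.abs(v) <= 1.0)

def local_linear_lambda(u_grid, index, resid, h, kernel=epanechnikov):
    """Local linear estimate of the link at each point of u_grid.

    index : (n,) array of alpha^T U*_i for the primary sample
    resid : (n,) array of Y**_i - beta^T V_i
    h     : bandwidth (must be large enough that every window is nonempty)
    """
    lam = np.empty(len(u_grid))
    for k, u in enumerate(u_grid):
        d = u - index                    # u~ - alpha^T U*_i
        Kd = kernel(d / h)
        s1 = np.sum(Kd * d)              # s_{n,1}
        s2 = np.sum(Kd * d ** 2)         # s_{n,2}
        w = Kd * (s2 - d * s1)           # weights w_i(u~)
        lam[k] = np.sum(w * resid) / np.sum(w)
    return lam
```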

Our estimation algorithm consists of the following steps:

  • Step 1: Treat the synthetic data $Y^{**}$ and $U^*$ as complete data and obtain an initial guess of $\theta_0=(\alpha_0,\beta_0)$ by Xia and Härdle's [6] algorithm. Let $\hat\theta=(\hat\alpha,\hat\beta)$ be the initial guess of $\theta_0$, with $\|\hat\alpha\|=1$.

  • Step 2: Find $\hat\lambda(\tilde u;h,\hat\theta)=\hat a_0(\tilde u)$ as a function of $\tilde u$ by minimizing

    $$\sum_{i=m+1}^n\big\{Y_i^{**}-[\hat\beta^TV_i+a_0+a_1(\hat\alpha^TU_i^*-\tilde u)]\big\}^2K_h(\hat\alpha^TU_i^*-\tilde u).$$

  • Step 3: Update $\hat\theta$ by minimizing

    $$\sum_{i=m+1}^n\big\{Y_i^{**}-[\beta^TV_i+\hat\lambda(\alpha^TU_i^*;h,\hat\theta)]\big\}^2$$
    with respect to $\theta=(\alpha,\beta)$.

  • Step 4: Iterate Steps 2 and 3 until convergence is achieved.
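A compact sketch of Steps 2–4 in profile form is given below (our own illustration under stated assumptions: the Step 1 initial guess is taken as given, Nelder–Mead stands in for whichever minimizer is used in practice, and local_linear_lambda is the helper defined above). $Y^{**}$ denotes the synthetic responses $Y^I$ or $Y^R$:

```python
import numpy as np
from scipy.optimize import minimize

def fit_plsim(Y_syn, V, U_star, h, alpha_init, beta_init, n_iter=20, tol=1e-6):
    """Iterate Steps 2-4: profile out the link, then update theta = (alpha, beta)."""
    p = U_star.shape[1]
    theta = np.concatenate([alpha_init / np.linalg.norm(alpha_init), beta_init])

    def profile_loss(t):
        a = t[:p] / np.linalg.norm(t[:p])   # enforce ||alpha|| = 1
        b = t[p:]
        idx = U_star @ a                    # alpha^T U*_i
        # Step 2: local linear fit of the link, evaluated at the data points
        lam_hat = local_linear_lambda(idx, idx, Y_syn - V @ b, h)
        # Step 3 objective: residual sum of squares
        return np.sum((Y_syn - V @ b - lam_hat) ** 2)

    for _ in range(n_iter):                 # Step 4: iterate until convergence
        res = minimize(profile_loss, theta, method="Nelder-Mead")
        new_theta = res.x.copy()
        new_theta[:p] /= np.linalg.norm(new_theta[:p])
        converged = np.linalg.norm(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta[:p], theta[p:]             # (alpha_hat, beta_hat)
```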

3. ASYMPTOTIC THEOREMS FOR THE ESTIMATORS

In this section, we establish the asymptotic normality of the estimators of the parameters in the PLSIM. Condition A below is imposed to ensure that the asymptotic properties of the estimators hold.

Condition A.

  1. The kernel $K$ is a symmetric function supported on $[-1,1]$ and satisfies a uniform Lipschitz condition of order 1 on $\mathbb R$.

  2. The random vectors $V$ and $U^*=L(X+e)$ are bounded.

  3. The marginal density $f(\tilde u)$ of $\tilde U=\alpha_0^TU^*$ is positive and has a continuous second derivative on its compact support $D\subset\mathbb R$.

  4. The random vector $U^*=L(X+e)$ has a compact support in $\mathbb R^p$; $D_{\lambda_0}$ is an open interval containing $\{\alpha^Tu^*:\|\alpha\|=1,\ u^*\in\mathrm{supp}(U^*)\}$. The second derivative of $\lambda_0(\tilde u)$ exists and is continuous and bounded on $D_{\lambda_0}$.

  5. The functions $E(U^*\mid\tilde U=\tilde u)$ and $E(V\mid\tilde U=\tilde u)$ are twice differentiable in $\tilde u\in D$, and their second derivatives satisfy a Lipschitz condition of order 1. On the boundaries, continuity and differentiability mean one-sided continuity and differentiability.

  6. For a given $\hat\lambda$, assume that $\|\hat\alpha-\alpha_0\|=O_p(n^{-1/2})$ and $\|\hat\beta-\beta_0\|=O_p(n^{-1/2})$; that is, the initial estimates are within a $\sqrt n$-neighborhood of the true parameter values in probability.

  7. Let

    $$\Psi=\begin{pmatrix}U^*\lambda_0'(\tilde U)\\V\end{pmatrix},\qquad H=\Psi-E(\Psi\mid\tilde U),\qquad\epsilon^{**}=Y^{**}-\big[\beta_0^TV+\lambda_0(\alpha_0^TU^*)\big];$$
    both $Q=E(HH^T)$ and $\Omega=E\big(HH^T(\epsilon^{**})^2\big)$ are positive definite, where $Y^{**}$ may be $Y^I$ or $Y^R$ and $\epsilon^{**}$ may be $\epsilon^I$ or $\epsilon^R$ according to which augmentation is used.

Theorem 1.

Suppose Condition A and the following conditions on the bandwidth hold: $nh^4\to0$ and $nh^3/\log n\to\infty$ as $n\to\infty$. Then the estimator $\hat\theta_0^I=(\hat\alpha_0^I,\hat\beta_0^I)$ from the iterative algorithm satisfies

$$n^{1/2}\begin{pmatrix}\hat\alpha_0^I-\alpha_0\\\hat\beta_0^I-\beta_0\end{pmatrix}\xrightarrow{D}N\big(0,\;Q^{-1}\Omega Q^{-1}\big),$$
where $Q$ and $\Omega$ are defined in Condition A(vii) and $\xrightarrow{D}$ denotes convergence in distribution.

Theorem 2.

Under the same conditions as in Theorem 1, the estimator $\hat\theta_0^R=(\hat\alpha_0^R,\hat\beta_0^R)$ from the iterative algorithm satisfies

$$n^{1/2}\begin{pmatrix}\hat\alpha_0^R-\alpha_0\\\hat\beta_0^R-\beta_0\end{pmatrix}\xrightarrow{D}N\big(0,\;Q^{-1}\Omega Q^{-1}\big).$$

It is interesting to note that $(\hat\alpha_0^I,\hat\beta_0^I)$ has the same asymptotic variance as $(\hat\alpha_0^R,\hat\beta_0^R)$, a phenomenon also observed by Wang and Sun [31]. The proofs of these theorems are given in Appendix I.

By the root-$n$ consistency of $(\hat\alpha,\hat\beta)$ and the assumptions on the bandwidth $h$ and the kernel function $K$, we may prove that $\hat\lambda(\tilde u;\hat\alpha,\hat\beta)-\hat\lambda(\tilde u;\alpha_0,\beta_0)=O_p(n^{-1/2})$. When $\alpha_0$ and $\beta_0$ are known, the asymptotic normality of $\hat\lambda(\tilde u;\alpha_0,\beta_0)$ follows easily from the results in Fan and Gijbels [34]. Therefore, the asymptotic normality of the local linear estimator $\hat\lambda(\tilde u;\hat\alpha,\hat\beta)$ with estimated parameters $\hat\alpha$ and $\hat\beta$ can be stated as follows:

Theorem 3.

Let $f$ be the density function of $\tilde U=\alpha_0^TU^*$. If $h=O(n^{-1/5})$ and $K$ has a continuous, bounded third-order derivative on $D$, then under Condition A, conditional on the covariates $\tilde U_1,\dots,\tilde U_n$, for any interior point $\tilde u\in D$,

$$\sqrt{nh}\Big(\hat\lambda(\tilde u;\hat\alpha,\hat\beta)-\lambda_0(\tilde u)-\lambda_0''(\tilde u)c_Kh^2/2\Big)\xrightarrow{D}N\big(0,\;d_K\sigma_*^2(\tilde u)/f(\tilde u)\big),$$
where $\sigma_*^2(\tilde u)=\mathrm{Var}(Y^{**}-\beta_0^TV\mid\tilde U=\tilde u)$, $c_K=\int_{-\infty}^{+\infty}v^2K(v)\,dv$ and $d_K=\int_{-\infty}^{+\infty}K^2(v)\,dv$.
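For the Epanechnikov kernel used in Section 4, $c_K=\int v^2K(v)\,dv=1/5$ and $d_K=\int K^2(v)\,dv=3/5$. As a small illustration (our own, not from the paper), Theorem 3 yields a rough pointwise confidence band if the bias term is ignored (i.e., assuming undersmoothing) and estimates of $\sigma_*^2(\tilde u)$ and $f(\tilde u)$ are plugged in:

```python
import numpy as np

C_K, D_K = 0.2, 0.6  # Epanechnikov kernel constants c_K and d_K

def pointwise_band(lam_hat, sigma2_star_hat, f_hat, n, h, z=1.96):
    """Approximate 95% pointwise band for lambda_0 at interior points,
    based on Theorem 3 with the h^2 bias term neglected.

    sigma2_star_hat : estimate of Var(Y** - beta_0^T V | U~ = u~)
    f_hat           : kernel density estimate of f(u~)
    """
    se = np.sqrt(D_K * sigma2_star_hat / (f_hat * n * h))
    return lam_hat - z * se, lam_hat + z * se
```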

4. SIMULATION

Example

In this example, we conduct Monte Carlo simulations to estimate the regression coefficients of a partially linear single-index model with incomplete data, where $q=\dim(V)=2$ and $p=\dim(X)=2$. Let $X=(X_1,X_2)^T$ with $X_1\sim\mathrm{Uniform}(-2,2)$, $X_1'\sim\mathrm{Triangular}(-2,2)$, $X_2=\frac13X_1+\frac23X_1'$, and $V=(V_1,V_2)^T$, where $V_1$ and $V_2\sim\mathrm{Bernoulli}(p=0.5)$ are independent. Assume in addition that the covariates $V$ and $X$ are independent. One may notice that $X_1$ and $X_2$ are dependent. Let the data be generated from the following model:

$$Y=\beta_0^TV+\lambda_0(\alpha_0^TX)+\varepsilon,$$
where $\varepsilon\sim N(0,\sigma_0^2=0.5^2)$, the true parameters are $\beta_0=(-1,2)^T$ and $\alpha_0=(\sqrt2/2,\sqrt2/2)^T$, and the true unknown function is $\lambda_0(\tilde u)=\frac12(\tilde u-\sqrt2/2)^2+6$, $\tilde u=\alpha_0^Tu^*$. First, we consider the case where $Y$ is MAR. We generate 300 replicates of random samples of size $n=60$, $120$, and $240$ under each of the following three missingness mechanisms:

Case 1: $\Delta_1(z)=P(\delta=1\mid V=(v_1,v_2),X=(x_1,x_2))=0.8+0.2(|v_1|+|v_2|+|x_1|+|x_2|)$ if $|v_1|+|v_2|+|x_1|+|x_2|\le1$, and $=0.90$ elsewhere.

Case 2: $\Delta_2(z)=P(\delta=1\mid V=(v_1,v_2),X=(x_1,x_2))=0.9-0.2(|v_1|+|v_2|+|x_1|+|x_2|)$ if $|v_1|+|v_2|+|x_1|+|x_2|\le1.5$, and $=0.80$ elsewhere.

Case 3: $\Delta_3(z)=P(\delta=1\mid V=(v_1,v_2),X=(x_1,x_2))=0.8-0.2(|v_1|+|v_2|+|x_1|+|x_2|)$ if $|v_1|+|v_2|+|x_1|+|x_2|\le1$, and $=0.50$ elsewhere.

By Monte Carlo computation, the mean response rates of the above three cases are $E[\Delta_1(z)]\approx0.90$, $E[\Delta_2(z)]\approx0.78$, and $E[\Delta_3(z)]\approx0.51$, respectively; accordingly, the missing proportions are about 10%, 22%, and 49%. Second, we focus on the case when the response $Y$ is MAR and the covariate $X$ of the nonparametric part has a validation data set linking $X$ to its contaminated version $W$. We assume that the primary sample size is $n$ and the sample size of the validation data is $m$, with $\gamma=0$, $\Gamma=I$, and the distribution of $e$ normal with mean 0 and variance $3/4$.
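A sketch of this data-generating process is given below (our own illustration; the symmetric triangular distribution on $(-2,2)$ and the reconstructed form of $\lambda_0$ are our reading of the design):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_sample(n, case=1, sigma_e2=0.75):
    """One sample from the Section 4 design: returns Y, delta, V, X, W."""
    X1 = rng.uniform(-2, 2, n)
    X1p = rng.triangular(-2, 0, 2, n)               # X1' ~ Triangular(-2, 2)
    X = np.column_stack([X1, X1 / 3 + 2 * X1p / 3]) # X2 = X1/3 + 2 X1'/3
    V = rng.binomial(1, 0.5, (n, 2)).astype(float)  # V1, V2 ~ Bernoulli(0.5)
    alpha0 = np.array([np.sqrt(2) / 2, np.sqrt(2) / 2])
    beta0 = np.array([-1.0, 2.0])
    lam0 = lambda u: 0.5 * (u - np.sqrt(2) / 2) ** 2 + 6
    Y = V @ beta0 + lam0(X @ alpha0) + rng.normal(0, 0.5, n)

    s = np.abs(V).sum(axis=1) + np.abs(X).sum(axis=1)   # |v1|+|v2|+|x1|+|x2|
    if case == 1:
        prob = np.where(s <= 1.0, 0.8 + 0.2 * s, 0.90)
    elif case == 2:
        prob = np.where(s <= 1.5, 0.9 - 0.2 * s, 0.80)
    else:
        prob = np.where(s <= 1.0, 0.8 - 0.2 * s, 0.50)
    delta = rng.binomial(1, prob)                   # delta = 1 means Y observed

    W = X + rng.normal(0, np.sqrt(sigma_e2), X.shape)   # gamma = 0, Gamma = I
    return Y, delta, V, X, W
```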

In Table A.1 (resp. Table A.4), we report the results for $(\hat\alpha_0^{(I)},\hat\beta_0^{(I)})$ (resp. $(\hat\alpha_0^{(R)},\hat\beta_0^{(R)})$) when $Y$ is MAR and $X$ is free of measurement error. In Table A.2 (resp. Table A.5), we consider the case where $X$ has measurement error, with the variance $\sigma_e^2$ of $e$ taken to be $3/4$; after calibrating $W$ into $U^*$, we report the results for $(\hat\alpha_0^{(I)},\hat\beta_0^{(I)})$ (resp. $(\hat\alpha_0^{(R)},\hat\beta_0^{(R)})$). In Table A.3 (resp. Table A.6), the error-prone $W$ is not calibrated, and the other assumptions about missingness are preserved. We conduct 300 simulation runs for each table. In these tables, the sample mean (MEAN), standard deviation (SD), root-mean-square error (RMSE), and median (MED) are reported as functions of the sample size $n$ (or the primary size $n$ and validation size $m$) and the missing proportion $p$. We use the well-known Epanechnikov kernel $K(v)=\frac34(1-v^2)I(|v|\le1)$ for the kernel smoothing. Figures A.1–A.6 display the true nonparametric curve and the fitted curve (dashed curve).

From Tables A.1 and A.4, all the proposed estimates of $(\alpha_0,\beta_0)$ have similar SD and RMSE; $\hat\alpha_0^{(I)}$ and $\hat\alpha_0^{(R)}$ perform similarly, and $\hat\beta_0^{(I)}$ performs slightly better than $\hat\beta_0^{(R)}$. From Tables A.2, A.3, A.5, and A.6, the estimates of $(\alpha_0,\beta_0)$ with $W$ calibrated outperform those with $W$ uncalibrated. From Figures A.1 and A.4, $\hat\lambda_0^{(I)}$ and $\hat\lambda_0^{(R)}$ perform similarly. From Figures A.2, A.3, A.5, and A.6, both approaches mitigate the effects of missingness and measurement error.

APPENDIX I

Proofs of Theorems 1 and 2

The proof of Theorem 2 uses only a part of the arguments in the proof of Theorem 1, so we omit it and give a detailed proof of Theorem 1.

Denote

$$\Psi=\begin{pmatrix}U^*\lambda_0'(\tilde U)\\V\end{pmatrix},\qquad\Lambda=\begin{pmatrix}U^*\lambda_0'(\tilde U)&0\\0&V\end{pmatrix},$$
and
$$\Omega=E\Big[\big(\Psi-E(\Psi\mid\tilde U)\big)\epsilon^I\big(\big(\Psi-E(\Psi\mid\tilde U)\big)\epsilon^I\big)^T\Big],$$
where $\tilde U=\alpha_0^TU^*$ and $\epsilon^I=Y^I-[\lambda_0(\tilde U)+\beta_0^TV]$. Let $Q=A(\alpha_0,\beta_0)-B(\alpha_0,\beta_0)$, with
$$A(\alpha_0,\beta_0)=E(\Psi\Psi^T),\qquad B(\alpha_0,\beta_0)=E\big[E(\Psi\mid\tilde U)E(\Psi^T\mid\tilde U)\big].$$

The proof consists of two steps. The first step is to obtain an expansion for $\hat\lambda$. For simplicity, let $a_0=a_0(\tilde u)=\lambda_0(\tilde u)$, $a_1=a_1(\tilde u)=h\lambda_0'(\tilde u)$, and $\epsilon_i^{I*}=Y_i^I-[a_0+a_1(\tilde U_i-\tilde u)/h+\beta_0^TV_i]$. Without loss of generality, suppose that $D=[c,d]$ for $-\infty<c<d<\infty$, and define $D_0=[c+h,d-h]$ and $D_1=D\setminus D_0$, where $h$ is the bandwidth. Let

$$L_n(\tilde u)=\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{f(\tilde u)}-(\hat\alpha-\alpha_0)^TE\big[U^*\lambda_0'(\tilde U)\mid\tilde U=\tilde u\big]-(\hat\beta-\beta_0)^TE\big[V\mid\tilde U=\tilde u\big].$$

We will show that

$$\sup_{\tilde u\in D_0}\big|\hat\lambda(\tilde u;\hat\alpha,\hat\beta)-\lambda_0(\tilde u)-L_n(\tilde u)\big|=o_p(n^{-1/2})+O_p(h^2),\qquad\sup_{\tilde u\in D_1}\big|\hat\lambda(\tilde u;\hat\alpha,\hat\beta)-\lambda_0(\tilde u)-L_n(\tilde u)\big|=o_p(n^{-1/2})+O_p(h^2)+O_p(h).\tag{A.2}$$

Denote the $k\times k$ identity matrix by $I_k$ and define $P_{\alpha_0}$ by

$$P_{\alpha_0}=\begin{pmatrix}I_p-\alpha_0\alpha_0^T&0\\0&I_q\end{pmatrix}.$$

Then, we will obtain the following representation:

$$P_{\alpha_0}Q\,n^{1/2}\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}=n^{-1/2}\sum_{i=1}^nP_{\alpha_0}\big[\Psi_i-E(\Psi_i\mid\tilde U_i)\big]\epsilon_i^I+o_p(1)\equiv S_n+o_p(1),\tag{A.3}$$
where $\epsilon_i^I=Y_i^I-[\lambda_0(\tilde U_i)+\beta_0^TV_i]$. The second step is to show that the first term on the right-hand side of Eq. (A.3) has asymptotic variance–covariance matrix $P_{\alpha_0}\Omega P_{\alpha_0}$. Therefore,
$$n^{1/2}\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}=(P_{\alpha_0}Q)^-S_n+(P_{\alpha_0}Q)^-o_p(1),$$
where $A^-$ denotes the generalized inverse of a square matrix $A$; $(P_{\alpha_0}Q)^-S_n$ has asymptotic variance–covariance matrix $(P_{\alpha_0}Q)^-P_{\alpha_0}\Omega P_{\alpha_0}\big((P_{\alpha_0}Q)^-\big)^T=Q^-\Omega Q^-=Q^{-1}\Omega Q^{-1}$, and $(P_{\alpha_0}Q)^-o_p(1)=o_p(1)$ since the elements of $(P_{\alpha_0}Q)^-$ are finite. In the end, Theorem 1 is proved by applying the central limit theorem. We now derive the desired results in each step.

Proof of (A.2).

Let $a_0=\lambda_0(\tilde u)$, $a_1=h\lambda_0'(\tilde u)$. The local linear estimates of $a_0$ and $a_1$ are obtained by solving

$$0=n^{-1}\sum_{i=1}^nK_h(\hat{\tilde U}_i-\tilde u)\begin{pmatrix}1\\(\hat{\tilde U}_i-\tilde u)/h\end{pmatrix}\hat\epsilon_i^{I*},\tag{A.4}$$
where $\hat{\tilde U}_i=\hat\alpha^TU_i^*$ and $\hat\epsilon_i^{I*}=Y_i^I-[\hat a_0+\hat a_1(\hat{\tilde U}_i-\tilde u)/h+\hat\beta^TV_i]$; here $\hat{\;}$ indicates the estimated error and $\hat{\;}^*$ indicates a local version of the estimated error. By this convention, we also define $\hat\epsilon_i^{I}=Y_i^I-[\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)+\hat\beta^TV_i]$. Applying a Taylor expansion and eliminating higher-order terms, we get, uniformly for $\tilde u\in D$,
$$0=n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\begin{pmatrix}1\\(\tilde U_i-\tilde u)/h\end{pmatrix}\Big\{\epsilon_i^{I*}-(\hat a_0-a_0)-\frac{\tilde U_i-\tilde u}{h}(\hat a_1-a_1)-\frac{a_1}{h}(\hat\alpha-\alpha_0)^TU_i^*-(\hat\beta-\beta_0)^TV_i\Big\}+o_p(n^{-1/2})+O_p(h^2).$$

Solving the above equation for $\hat a_0-a_0$, we have, uniformly for $\tilde u\in D$,

$$\hat a_0-a_0=\frac{1}{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)}\Big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\Big(\epsilon_i^{I*}-\frac{a_1}{h}(\hat\alpha-\alpha_0)^TU_i^*-(\hat\beta-\beta_0)^TV_i\Big)\Big]+o_p(n^{-1/2})+O_p(h^2).$$

Let $\hat f(\tilde u)=n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)$ be the kernel estimator of $f(\tilde u)$. We have the following results about the kernel estimators (the proofs are given in Sections I.1 and I.2):

$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\frac{a_1}{h}U_i^*/\hat f(\tilde u)-E(U^*\mid\tilde U=\tilde u)\lambda_0'(\tilde u)\Big|=O_p(h),\qquad\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)V_i/\hat f(\tilde u)-E(V\mid\tilde U=\tilde u)\Big|=O_p(h),\tag{A.5}$$
$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}/\hat f(\tilde u)-0\Big|=O_p(h),\tag{A.6}$$
and
$$\sup_{\tilde u\in D_0}|\hat f(\tilde u)-f(\tilde u)|=O_p(h),\qquad\sup_{\tilde u\in D_1}|\hat f(\tilde u)-f(\tilde u)|=O_p(1).\tag{A.7}$$

Since

$$\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)}-\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{f(\tilde u)}=\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{\hat f(\tilde u)}\times\frac{f(\tilde u)-\hat f(\tilde u)}{f(\tilde u)},$$
by Eqs. (A.6) and (A.7), we obtain
$$\sup_{\tilde u\in D_0}\Big|\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)}-\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{f(\tilde u)}\Big|=O_p(h^2)$$
and
$$\sup_{\tilde u\in D_1}\Big|\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)}-\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{f(\tilde u)}\Big|=O_p(h).$$

Substituting the kernel terms in the linearized Eq. (A.4) by their asymptotic counterparts, we obtain Eq. (A.2).

Proof of (A.3).

By a Taylor expansion, we have

$$\begin{aligned}\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)&=\big[\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)-\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)\big]+\big[\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]\\&=\hat\lambda'(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)(\hat\alpha-\alpha_0)^TU_i^*+\big[\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(n^{-1/2})\\&=\lambda_0'(\alpha_0^TU_i^*)(\hat\alpha-\alpha_0)^TU_i^*+\big[\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(n^{-1/2}).\end{aligned}\tag{A.8}$$

With $\xi$ being the Lagrange multiplier for the constraint $\|\alpha\|=1$, we know that $(\hat\alpha,\hat\beta)$ is the solution to

$$0=\xi\begin{pmatrix}\hat\alpha\\0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\hat\Lambda_iD_i,$$
where
$$\hat\Lambda_i=\begin{pmatrix}U_i^*\hat\lambda'(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)&0\\0&V_i\end{pmatrix},\qquad D_i=\begin{pmatrix}\hat\epsilon_i^I\\\hat\epsilon_i^I\end{pmatrix},$$
$$\hat\epsilon_i^I=Y_i^I-\big[\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)+\hat\beta^TV_i\big].$$

Let

$$D_{0i}=\begin{pmatrix}\epsilon_i^I\\\epsilon_i^I\end{pmatrix}.$$

By a Taylor expansion, we obtain

$$D_i=D_{0i}-\begin{pmatrix}1\\1\end{pmatrix}V_i^T(\hat\beta-\beta_0)-\begin{pmatrix}1\\1\end{pmatrix}\big[\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(n^{-1/2}).$$

Since $\hat\Lambda_i=\Lambda_i+o_p(1)$, we have

$$0=\xi\begin{pmatrix}\hat\alpha\\0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\Lambda_iD_{0i}-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}V_i^T(\hat\beta-\beta_0)-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\big[\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(1).\tag{A.9}$$

By Eq. (A.8), we get

$$n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\big[\hat\lambda(\hat\alpha^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]=n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\lambda_0'(\alpha_0^TU_i^*)U_i^{*T}(\hat\alpha-\alpha_0)+n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\big[\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(1).$$

Plugging this into Eq. (A.9) gives

$$0=\xi\begin{pmatrix}\hat\alpha\\0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\Lambda_iD_{0i}-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}V_i^T(\hat\beta-\beta_0)-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\lambda_0'(\alpha_0^TU_i^*)U_i^{*T}(\hat\alpha-\alpha_0)-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\big[\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(1).$$

This leads to

$$0=\xi\begin{pmatrix}\hat\alpha\\0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\Lambda_iD_{0i}-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix}^T\Lambda_i^T\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\big[\hat\lambda(\alpha_0^TU_i^*;\hat\alpha,\hat\beta)-\lambda_0(\alpha_0^TU_i^*)\big]+o_p(1).$$

Note that, in matrix notation, $L_n(\tilde u)$ in Eq. (A.2) can be written as

$$L_n(\tilde u)=\frac{n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}}{f(\tilde u)}-E\Big[\begin{pmatrix}1\\1\end{pmatrix}^T\Lambda^T\Big|\tilde U=\tilde u\Big]\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}.$$

Then, from Eq. (A.2) and the definition of $A(\alpha_0,\beta_0)$, we obtain

$$\begin{aligned}0=\;&\xi\begin{pmatrix}\hat\alpha\\0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\Lambda_iD_{0i}-A(\alpha_0,\beta_0)\,n^{1/2}\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}E\Big[\begin{pmatrix}1\\1\end{pmatrix}^T\Lambda_i^T\Big|\tilde U=\tilde U_i\Big]\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}\\&-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\frac{n^{-1}\sum_{j=1}^nK_h(\tilde U_j-\tilde U_i)\epsilon_j^{I*}}{f(\tilde U_i)}-n^{-1/2}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\big\{O_p(h^2)I(\tilde U_i\in D_0)+O_p(h)I(\tilde U_i\in D_1)+o_p(n^{-1/2})+O_p(h^2)\big\}+o_p(1).\end{aligned}\tag{A.10}$$

It is easy to see that the sixth term in Eq. (A.10) is $O_p(n^{1/2}h^2)+o_p(1)=o_p(1)$, since $nh^4\to0$. The fifth term in Eq. (A.10) is essentially the same as (a proof is given in Section I.3)

$$-n^{-1/2}\sum_{i=1}^nE\Big[\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\Big|\tilde U=\tilde U_i\Big]\epsilon_i^I+o_p(1).\tag{A.11}$$

From

$$n^{-1}\sum_{i=1}^n\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}E\Big[\begin{pmatrix}1\\1\end{pmatrix}^T\Lambda_i^T\Big|\tilde U=\tilde U_i\Big]\xrightarrow{p}E\Big\{E\Big[\Lambda\begin{pmatrix}1\\1\end{pmatrix}\Big|\tilde U\Big]E\Big[\begin{pmatrix}1\\1\end{pmatrix}^T\Lambda^T\Big|\tilde U\Big]\Big\}=B(\alpha_0,\beta_0)$$
and the definition of $Q$, Eq. (A.10) can be written as
$$0=\xi\begin{pmatrix}\hat\alpha\\0\end{pmatrix}+n^{-1/2}\sum_{i=1}^n\Big\{\Lambda_i\begin{pmatrix}\epsilon_i^I\\\epsilon_i^I\end{pmatrix}-E\Big[\Lambda_i\begin{pmatrix}1\\1\end{pmatrix}\Big|\tilde U_i\Big]\epsilon_i^I\Big\}-Q\,n^{1/2}\begin{pmatrix}\hat\alpha-\alpha_0\\\hat\beta-\beta_0\end{pmatrix}+o_p(1).$$

Multiplying both sides by $P_{\alpha_0}$ and noticing that $\Lambda_i(1,1)^T=\Psi_i$, we obtain the first equality in Eq. (A.3). We now turn to the auxiliary results required to establish this equality.

Section I.1. Proofs of (A.5) and (A.7)

Proof of (A.5).

Let $\psi_i^*$ denote the quantity $\psi\big(a_0(\tilde u)+a_1(\tilde u)(\tilde U_i-\tilde u)/h+\beta_0^TV_i\big)$ and let $\psi_i$ denote the similar quantity $\psi\big(a_0(\tilde U_i)+\beta_0^TV_i\big)$ for some differentiable and bounded function $\psi$, or one of the quantities $V_i$ and $U_i^*$ appearing in Eq. (A.5). We will show that

$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i^*/\hat f(\tilde u)-n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i/\hat f(\tilde u)\Big|=O_p(h)\tag{A.12}$$
and
$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i/\hat f(\tilde u)-E(\psi\mid\tilde U=\tilde u)\Big|=O_p(h).\tag{A.13}$$

Equation (A.12) will be used in the proof of Eq. (A.6). First, assuming that Eq. (A.13) holds, we prove Eq. (A.12). Let $\psi'(t)=\partial\psi(t)/\partial t$; then

$$\begin{aligned}\psi_i^*-\psi_i&=\psi\big(\lambda_0(\tilde u)+\lambda_0'(\tilde u)(\tilde U_i-\tilde u)+\beta_0^TV_i\big)-\psi\big(\lambda_0(\tilde U_i)+\beta_0^TV_i\big)\\&=\psi'(\xi_i(\tilde u))\big[\lambda_0(\tilde u)-\lambda_0(\tilde U_i)+\lambda_0'(\tilde u)(\tilde U_i-\tilde u)\big]\\&=-\psi'(\xi_i(\tilde u))\lambda_0''(\xi_i^*(\tilde u))(\tilde U_i-\tilde u)^2/2=O_p\big((\tilde U_i-\tilde u)^2\big),\end{aligned}$$
where $\xi_i(\tilde u)$ is between $\lambda_0(\tilde U_i)+\beta_0^TV_i$ and $\lambda_0(\tilde u)+\lambda_0'(\tilde u)(\tilde U_i-\tilde u)+\beta_0^TV_i$, and $\xi_i^*(\tilde u)$ is between $\tilde U_i$ and $\tilde u$. Therefore,
$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i^*/\hat f(\tilde u)-n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i/\hat f(\tilde u)\Big|\le O_p(1)\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)(\tilde U_i-\tilde u)^2/\hat f(\tilde u)\Big|=O_p(h),$$
using Eq. (A.13) with $\psi_i=(\tilde U_i-\tilde u)^2$ and noticing that $E[(\tilde U_i-\tilde u)^2\mid\tilde U_i=\tilde u]=0$. This proves Eq. (A.12).

Now we prove Eq. (A.13). Let $\hat r_h(\tilde u)=n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i$; then

$$n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_i/\hat f(\tilde u)-E(\psi\mid\tilde U=\tilde u)=\underbrace{\frac{\hat r_h(\tilde u)-E\hat r_h(\tilde u)}{\hat f(\tilde u)}-\frac{E\hat r_h(\tilde u)\big[\hat f(\tilde u)-E\hat f(\tilde u)\big]}{\hat f(\tilde u)E\hat f(\tilde u)}}_{\equiv I_1(\tilde u)}+\underbrace{\frac{E\hat r_h(\tilde u)}{E\hat f(\tilde u)}-E(\psi\mid\tilde U=\tilde u)}_{\equiv I_2(\tilde u)}.$$

We consider $I_2(\tilde u)$ first. Since

$$E\hat r_h(\tilde u)=E\big[K_h(\tilde U_i-\tilde u)\psi_i\big]=E\big[K_h(\tilde U_i-\tilde u)E(\psi_i\mid\tilde U_i)\big]=\frac1h\int_c^dK\Big(\frac{y-\tilde u}{h}\Big)E(\psi\mid\tilde U=y)f(y)\,dy=\int_{(c-\tilde u)/h}^{(d-\tilde u)/h}K(t)E(\psi\mid\tilde U=\tilde u+ht)f(\tilde u+ht)\,dt=\int_{(c-\tilde u)/h}^{(d-\tilde u)/h}K(t)\,dt\;E(\psi\mid\tilde U=\tilde u)f(\tilde u)+O(h)$$
and
$$E\hat f(\tilde u)=\int_{(c-\tilde u)/h}^{(d-\tilde u)/h}K(t)\,dt\;f(\tilde u)+O(h)$$
hold uniformly for $\tilde u\in D$, we have
$$\sup_{\tilde u\in D}|I_2(\tilde u)|=O(h).$$

To finish the proof, it suffices to show

$$\sup_{\tilde u\in D}|\hat r_h(\tilde u)-E\hat r_h(\tilde u)|=O_p(h)\tag{A.16}$$
and
$$\sup_{\tilde u\in D}|\hat f(\tilde u)-E\hat f(\tilde u)|=O_p(h).\tag{A.17}$$

We prove only Eq. (A.16), since Eq. (A.17) is much easier. We consider a more general case where $\psi_i$ may be unbounded but $|\psi_i|\le C_\psi C_TT_i^g$ for some constants $C_\psi,C_T,g>0$ and some i.i.d. random variables $T_i$ for which $\sup_{i,\tilde u\in D}E\big(T_i^{(2s+1)g}\mid\tilde U_i=\tilde u\big)<\infty$ and $\sup_iE\big(T_i^{(2s+1)g}\big)<\infty$ for some $s>1$. Taking $N_n=h^{-1/s}$ and writing

$$\hat r_h(\tilde u)=n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_iI(|\psi_i|\le N_n)+n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\psi_iI(|\psi_i|>N_n)\equiv J_1(\tilde u)+J_2(\tilde u),$$
it suffices to show
$$\sup_{\tilde u\in D}|J_1(\tilde u)-EJ_1(\tilde u)|=O_p(h)\tag{A.18}$$
and
$$\sup_{\tilde u\in D}|J_2(\tilde u)-EJ_2(\tilde u)|=O_p(h).\tag{A.19}$$

When ψi is bounded, Eq. (A.19) is trivial.

Suppose that $M_n$ intervals $\{\tilde u:|\tilde u-\tilde u_l|\le\eta_n\}$, $l=1,2,\dots,M_n$, cover the compact set $D$ and that their union equals $D$. Then, for any $\ell>0$,

$$\begin{aligned}P\Big(\sup_{\tilde u\in D}|J_1(\tilde u)-EJ_1(\tilde u)|>\ell h\Big)&=P\Big(\sup_{l=1,\dots,M_n}\sup_{|\tilde u-\tilde u_l|\le\eta_n}|J_1(\tilde u)-EJ_1(\tilde u)|>\ell h\Big)\\&\le P\Big(\sup_{l=1,\dots,M_n}|J_1(\tilde u_l)-EJ_1(\tilde u_l)|>\frac{\ell h}2\Big)+P\Big(\sup_{l=1,\dots,M_n}\sup_{|\tilde u-\tilde u_l|\le\eta_n}\big|J_1(\tilde u)-J_1(\tilde u_l)-\big[EJ_1(\tilde u)-EJ_1(\tilde u_l)\big]\big|>\frac{\ell h}2\Big).\end{aligned}\tag{A.20}$$

By Condition A(i), there exist constants $C_K>0$ and $C_L>0$ such that $|K(u^*)|\le C_K$ and $|K(u_1^*)-K(u_2^*)|\le C_L|u_1^*-u_2^*|$. Taking $M_n=O(n^2)$ and $\eta_n=O(n^{-2})$, when $|\tilde u-\tilde u_l|\le\eta_n$, we have

$$|J_1(\tilde u)-J_1(\tilde u_l)|=\Big|(nh)^{-1}\sum_{i=1}^n\Big[K\Big(\frac{\tilde U_i-\tilde u}{h}\Big)-K\Big(\frac{\tilde U_i-\tilde u_l}{h}\Big)\Big]\psi_iI(|\psi_i|\le N_n)\Big|\le(nh)^{-1}C_L\frac{|\tilde u-\tilde u_l|}{h}\,nN_n=C_Lh^{-2-1/s}\eta_n=O\big(n^{-2}h^{-2-1/s}\big)=o_p(h).$$

Therefore, $\sup_{l=1,\dots,M_n}\sup_{|\tilde u-\tilde u_l|\le\eta_n}|J_1(\tilde u)-J_1(\tilde u_l)|=o_p(h)$.

Similarly, $\sup_{l=1,\dots,M_n}\sup_{|\tilde u-\tilde u_l|\le\eta_n}|EJ_1(\tilde u)-EJ_1(\tilde u_l)|=o(h)$. Hence, the second probability in Eq. (A.20) is negligible. Let $d_i(\tilde u)=K\big(\frac{\tilde U_i-\tilde u}{h}\big)\psi_iI(|\psi_i|\le N_n)$ and $S_n(\tilde u)=\sum_{i=1}^n[d_i(\tilde u)-Ed_i(\tilde u)]$. Then $|d_i(\tilde u)-Ed_i(\tilde u)|\le2C_KN_n$ and
$$\sigma_n^2=\mathrm{Var}\big(S_n(\tilde u)\big)=n\Big\{E\Big[K^2\Big(\frac{\tilde U_i-\tilde u}{h}\Big)\psi_i^2I(|\psi_i|\le N_n)\Big]-\Big(E\Big[K\Big(\frac{\tilde U_i-\tilde u}{h}\Big)\psi_iI(|\psi_i|\le N_n)\Big]\Big)^2\Big\}=O(nh)-O(nh^2)=O(nh),$$
because $E(\psi_i^2\mid\tilde U_i=\tilde u)\le C_\psi^2C_T^2E(T_i^{2g}\mid\tilde U_i=\tilde u)<M<\infty$ for some constants $C_\psi>0$ and $M>0$ by the preceding assumptions. Without loss of generality, we assume $\sigma_n^2=nh$. By Bernstein's inequality, for any $\omega>0$, we get

$$P\big(|S_n(\tilde u_l)|\ge\omega\sigma_n\big)\le2\exp\Bigg(-\frac{\omega^2}{2+\frac23\cdot\frac{2C_KN_n}{\sigma_n}\,\omega}\Bigg).$$

Taking $\omega=\ell h\sigma_n/2$ and noticing that $\sigma_n=\sqrt{nh}$ and $N_n=h^{-1/s}$, $s>1$, we get

$$P\Big(|S_n(\tilde u_l)|\ge\frac{\ell nh^2}{2}\Big)\le2\exp\Bigg(-\frac{(\ell h\sigma_n/2)^2}{2+\frac23\cdot\frac{2C_KN_n}{\sigma_n}\cdot\frac{\ell h\sigma_n}{2}}\Bigg)=2\exp\Bigg(-\frac{\ell^2nh^3/4}{2+\frac{2\ell}{3}C_Kh^{1-1/s}}\Bigg)\le2\exp\Big(-\frac{3\ell^2nh^3}{32}\Big)=O\big(n^{-3\ell^2/32}\big),$$
where the third step assumes $\ell C_Kh^{1-1/s}<1$ (which holds for all large $n$) and the last step uses $nh^3\ge\log n$ for large $n$, guaranteed by the bandwidth condition $nh^3/\log n\to\infty$.

Since $M_n=O(n^2)$, when $\ell$ is large enough that $3\ell^2/32>2$, we get

$$P\Big(\sup_{l=1,\dots,M_n}\Big|\sum_{i=1}^n\big[d_i(\tilde u_l)-Ed_i(\tilde u_l)\big]\Big|\ge\frac{\ell nh^2}{2}\Big)\le M_n\,O\big(n^{-3\ell^2/32}\big)\xrightarrow{n\to\infty}0.$$

This implies

$$\sup_{l=1,\dots,M_n}\Big|\frac1{nh}\sum_{i=1}^n\big[d_i(\tilde u_l)-Ed_i(\tilde u_l)\big]\Big|=O_p(h).\tag{A.21}$$

Combining Eqs. (A.20) and (A.21) proves Eq. (A.18).

Now we prove Eq. (A.19). By Condition A, $|\psi_i|\le C_\psi C_TT_i^g$ for some constants $C_T,C_\psi>0$. From $|K(u^*)|\le C_K$, we have

$$\sup_{\tilde u\in D}|J_2(\tilde u)-EJ_2(\tilde u)|\le\frac{2C_\psi C_KC_T}{h}\cdot\frac1n\sum_{i=1}^nT_i^gI(T_i^g>N_n),\tag{A.22}$$
since $E\big[T_i^gI(T_i^g>N_n)\big]=\int_{t>N_n^{1/g}}t^g\,dF_T(t)$, where $F_T(t)$ is the c.d.f. of $T$, and $N_n=h^{-1/s}$, $s>1$. Letting $Q_n=N_n^{1/g}=h^{-1/(sg)}$, we get
$$\frac{\int_{t>Q_n}t^g\,dF_T(t)}{h^2}=Q_n^{2sg}\int_{t>Q_n}t^g\,dF_T(t)\le\int_{t>Q_n}t^{(2s+1)g}\,dF_T(t)\xrightarrow{n\to\infty}0,$$
because $E\big(T^{(2s+1)g}\big)<\infty$ by the preceding assumptions. This implies $\frac1{nh}\sum_{i=1}^nT_i^gI(T_i^g>N_n)=O_p(h)$. Therefore, by Eq. (A.22), we obtain Eq. (A.19).

Proof of (A.7).

Since

$$\sup_{\tilde u\in D_0}|\hat f(\tilde u)-f(\tilde u)|\le\sup_{\tilde u\in D_0}|\hat f(\tilde u)-E\hat f(\tilde u)|+\sup_{\tilde u\in D_0}|E\hat f(\tilde u)-f(\tilde u)|,$$
$$\sup_{\tilde u\in D_1}|\hat f(\tilde u)-f(\tilde u)|\le\sup_{\tilde u\in D_1}|\hat f(\tilde u)-E\hat f(\tilde u)|+\sup_{\tilde u\in D_1}|E\hat f(\tilde u)-f(\tilde u)|,$$
$$\sup_{\tilde u\in D_1}|E\hat f(\tilde u)-f(\tilde u)|\le\sup_{\tilde u\in D_1}|E\hat f(\tilde u)|+\sup_{\tilde u\in D_1}|f(\tilde u)|=O(1),$$
using Eq. (A.17) and noticing that $\sup_{\tilde u\in D_0}|E\hat f(\tilde u)-f(\tilde u)|=O(h^2)$, we obtain Eq. (A.7).

Section I.2. Proof of (A.6)

Proof of (A.6).

Since $\epsilon_i^{I*}=Y_i^I-[\lambda_0(\tilde u)+\lambda_0'(\tilde u)(\tilde U_i-\tilde u)+\beta_0^TV_i]$, it suffices to show

$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I*}/\hat f(\tilde u)-n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I}/\hat f(\tilde u)\Big|=O_p(h)\tag{A.23}$$
and
$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^{I}/\hat f(\tilde u)-0\Big|=O_p(h).\tag{A.24}$$

The proof of Eq. (A.23) is similar to that of Eq. (A.12), so we omit it. Now we prove Eq. (A.24). Note that $E\big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^I\big]=0$; by the same arguments used in the proof of Eq. (A.13), it suffices to show

$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^I-E\Big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\epsilon_i^I\Big]\Big|=O_p(h).\tag{A.25}$$

By decomposition, it suffices to show

$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\lambda_0(\tilde U_i)-E\Big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\lambda_0(\tilde U_i)\Big]\Big|=O_p(h),\tag{A.26}$$
$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\beta_0^TV_i-E\Big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)\beta_0^TV_i\Big]\Big|=O_p(h)\tag{A.27}$$
and
$$\sup_{\tilde u\in D}\Big|n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)Y_i^I-E\Big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)Y_i^I\Big]\Big|=o_p(1).\tag{A.28}$$

We apply the techniques used in the proof of Eq. (A.16) to prove the preceding three equalities. By Conditions A(ii) and A(iv), $\lambda_0(\tilde U_i)$ and $\beta_0^TV_i$ are bounded random variables, so the proofs of Eqs. (A.26) and (A.27) are straightforward. We obtain Eq. (A.28) by observing that

$$E\Big[n^{-1}\sum_{i=1}^nK_h(\tilde U_i-\tilde u)Y_i^I-E\big(K_h(\tilde U_i-\tilde u)Y_i^I\big)\Big]^2=n^{-2}\sum_{i=1}^nE\big[K_h(\tilde U_i-\tilde u)Y_i^I-E\big(K_h(\tilde U_i-\tilde u)Y_i^I\big)\big]^2\to0.$$

Section I.3. Proof of (A.11)

Proof of Eq. (A.11).

Proving Eq. (A.11) is equivalent to proving the following equality:

$$\epsilon_i^{I*}=\epsilon_i^I+o_p(1).\tag{A.30}$$

Noting that

$$\epsilon_i^I=Y_i^I-E(Y_i^I\mid V_i,U_i^*)=Y_i^I-\big[\lambda_0(\tilde U_i)+\beta_0^TV_i\big]$$
and
$$\epsilon_i^{I*}=Y_i^I-\big[\lambda_0(\tilde u)+\lambda_0'(\tilde u)(\tilde U_i-\tilde u)+\beta_0^TV_i\big],$$
we have
$$\epsilon_i^I-\epsilon_i^{I*}=\lambda_0(\tilde u)+\lambda_0'(\tilde u)(\tilde U_i-\tilde u)-\lambda_0(\tilde U_i)=O\big((\tilde U_i-\tilde u)^2\big)=O(h^2).$$

This implies that $n^{-1}\sum_{i=1}^n\epsilon_i^{I*}=n^{-1}\sum_{i=1}^n\epsilon_i^I+o_p(1)$, which establishes Eq. (A.30) and hence Eq. (A.11).

Now we return to the proof of Theorem 1. In Eq. (A.3), $\Psi_i-E(\Psi_i\mid\tilde U_i)$ is a vector with $p+q$ elements. Let $H_i=\Psi_i-E(\Psi_i\mid\tilde U_i)$ and suppose its elements are $H_{i,l}=H_{i,l}(U_i^*,V_i,\tilde U_i)$, $l=1,2,\dots,p+q$; then we consider

$$M_{1n}^{(l)}=n^{-1/2}\sum_{i=1}^nH_{i,l}\,\epsilon_i^I,\qquad l=1,2,\dots,p+q.\tag{A.31}$$

Therefore, by Eq. (A.31), we have shown that

$$\lim_{n\to\infty}E\big(M_{1n}M_{1n}^T\big)=\Omega,$$
where $M_{1n}=\big(M_{1n}^{(1)},\dots,M_{1n}^{(p+q)}\big)^T$. Theorem 1 then follows from the central limit theorem for sums of independent random vectors.

APPENDIX II

α0 MEAN SD RMSE MED β0 MEAN SD RMSE MED
p = 10%, n = 60
0.707 0.710 0.061 0.061 0.707 −1 −0.998 0.148 0.147 −0.999
0.707 0.699 0.064 0.065 0.707   2 2.003 0.149 0.148 2.007
p = 22%, n = 120
0.707 0.709 0.044 0.044 0.707 −1 −0.995 0.107 0.107 −0.994
0.707 0.703 0.045 0.045 0.707   2 2.005 0.114 0.114 2.005
p = 49%, n = 240
0.707 0.704 0.038 0.038 0.707 −1 −1.002 0.096 0.096 −0.998
0.707 0.708 0.037 0.037 0.707   2 2.006 0.095 0.095 2.003

MED, median; SD, standard deviation; RMSE, root-mean-square error.

Table A.1

Descriptive statistics of $(\hat\alpha_0^{(I)},\hat\beta_0^{(I)})$ with missing response, as a function of the missing proportion p and the sample size n.

α0 MEAN SD RMSE MED β0 MEAN SD RMSE MED
p = 10%, n=60, m=20
0.707 0.686 0.165 0.166 0.707 −1 −0.992 0.202 0.202 −1.009
0.707 0.682 0.192 0.193 0.707   2 2.011 0.194 0.194 2.011
p = 22%, n=120, m = 40
0.707 0.673 0.127 0.131 0.707 −1 −0.995 0.152 0.152 −0.989
0.707 0.718 0.122 0.122 0.707   2 2.007 0.149 0.149 2.014
p = 49%, n=240, m = 80
0.707 0.665 0.105 0.113 0.695 −1 −1.005 0.137 0.136 −1.001
0.707 0.733 0.095 0.098 0.719   2 1.993 0.127 0.127 1.993

MED, median; SD, standard deviation; RMSE, root-mean-square error.

Table A.2

Descriptive statistics of $(\hat\alpha_0^{(I)},\hat\beta_0^{(I)})$ with missing response and error-prone ($\sigma_e^2=3/4$) predictors when W is calibrated, as a function of the primary size n, validation size m, and missing proportion p.

α0 MEAN SD RMSE MED β0 MEAN SD RMSE MED
p = 10%, n=60, m = 20
0.707 0.741 0.105 0.110 0.715 −1 −1.000 0.206 0.205 −1.003
0.707 0.651 0.129 0.140 0.699   2 1.973 0.203 0.204 1.964
p = 22%, n=120, m = 40
0.707 0.747 0.080 0.089 0.740 −1 −0.977 0.138 0.140 −0.981
0.707 0.653 0.097 0.111 0.673   2 1.995 0.150 0.150 1.987
p = 49%, n=240, m = 80
0.707 0.752 0.065 0.079 0.748 −1 −1.017 0.127 0.128 −1.014
0.707 0.651 0.077 0.095 0.663   2 1.984 0.127 0.128 1.985

MED, median; SD, standard deviation; RMSE, root-mean-square error.

Table A.3

Descriptive statistics of $(\hat\alpha_0^{(I)},\hat\beta_0^{(I)})$ with missing response and error-prone ($\sigma_e^2=3/4$) predictors when W is not calibrated, as a function of the primary size n, validation size m, and missing proportion p.

α0 MEAN SD RMSE MED β0 MEAN SD RMSE MED
p = 10%, n = 60
0.707 0.709 0.046 0.046 0.707 −1 −1.002 0.144 0.144 −1.004
0.707 0.702 0.049 0.049 0.707   2 1.992 0.151 0.151 1.985
p = 22%, n = 120
0.707 0.710 0.040 0.040 0.707 −1 −0.989 0.108 0.108 −0.992
0.707 0.701 0.042 0.042 0.707   2 2.000 0.102 0.102 1.996
p = 49%, n = 240
0.707 0.710 0.038 0.039 0.707 −1 −0.994 0.097 0.097 −0.988
0.707 0.702 0.041 0.041 0.707   2 2.004 0.092 0.092 2.005

MED, median; SD, standard deviation; RMSE, root-mean-square error.

Table A.4

Descriptive statistics of $(\hat\alpha_0^{(R)},\hat\beta_0^{(R)})$ with missing response, as a function of the missing proportion p and the sample size n.

α0 MEAN SD RMSE MED β0 MEAN SD RMSE MED
p = 10%, n=60, m = 20
0.707 0.696 0.136 0.136 0.707 −1 −1.005 0.230 0.230 −0.985
0.707 0.689 0.151 0.151 0.707   2 2.002 0.183 0.183 2.003
p = 22%, n=120, m = 40
0.707 0.672 0.113 0.118 0.707 −1 −1.008 0.168 0.168 −1.001
0.707 0.724 0.109 0.110 0.707   2 1.999 0.154 0.154 2.002
p = 49%, n=240, m = 80
0.707 0.675 0.105 0.109 0.707 −1 −1.014 0.129 0.130 −1.019
0.707 0.724 0.102 0.103 0.707   2 1.988 0.133 0.133 1.995

MED, median; SD, standard deviation; RMSE, root-mean-square error.

Table A.5

Descriptive statistics of $(\hat\alpha_0^{(R)},\hat\beta_0^{(R)})$ with missing response and error-prone ($\sigma_e^2=3/4$) predictors when W is calibrated, as a function of the primary size n, validation size m, and missing proportion p.

α0 MEAN SD RMSE MED β0 MEAN SD RMSE MED
p = 10%, n=60, m = 20
0.707 0.748 0.095 0.104 0.729 −1 −0.977 0.195 0.196 −0.965
0.707 0.645 0.125 0.140 0.684   2 2.021 0.210 0.210 2.021
p = 22%, n=120, m = 40
0.707 0.751 0.070 0.083 0.738 −1 −1.001 0.147 0.146 −0.998
0.707 0.651 0.087 0.103 0.674   2 2.008 0.161 0.161 2.011
p = 49%, n=240, m = 80
0.707 0.754 0.073 0.087 0.750 −1 −1.013 0.136 0.137 −1.016
0.707 0.647 0.087 0.106 0.661   2 1.984 0.134 0.135 1.996

MED, median; SD, standard deviation; RMSE, root-mean-square error.

Table A.6

Descriptive statistics of $(\hat\alpha_0^{(R)},\hat\beta_0^{(R)})$ with missing response and error-prone ($\sigma_e^2=3/4$) predictors when W is not calibrated, as a function of the primary size n, validation size m, and missing proportion p.

Figure A.1

Simulated curves of λ^0(I) with missing response, different sample sizes n and different missing proportions p (the title for the x axis: Single-index α^0(I)TX, solid circle: the response is observed, circle: the response is missing, solid line: the true curve, dashed line: the fitted curve).

Figure A.2

Simulated curves of λ^0(I) with missing response and error-prone σe2=3/4 predictors when W calibrated and primary size n, validation size m and missing proportion p (the title for the x axis: Single-index α^0(I)TU*, solid circle: the response is observed, circle: the response is missing, solid line: the true curve, dashed line: the fitted curve).

Figure A.3

Simulated curves of λ^0(I) with missing response and error-prone σe2=3/4 predictors when W not calibrated and primary size n, validation size m and missing proportion p (the title for the x axis: Single-index α^0(I)TW, solid circle: the response is observed, circle: the response is missing, solid line: the true curve, dashed line: the fitted curve).

Figure A.4

Simulated curves of λ^0(R) with missing response, different sample sizes n and different missing proportions p (the title for the x axis: Single-index α^0(R)TX, solid circle: the response is observed, circle: the response is missing, solid line: the true curve, dashed line: the fitted curve).

Figure A.5

Simulated curves of λ^0(R) with missing response and error-prone σe2=3/4 predictors when W calibrated and primary size n, validation size m and missing proportion p (the title for the x axis: Single-index α^0(R)TU*, solid circle: the response is observed, circle: the response is missing, solid line: the true curve, dashed line: the fitted curve).

Figure A.6

Simulated curves of $\hat\lambda_0^{(R)}$ with missing response and error-prone ($\sigma_e^2=3/4$) predictors when W is not calibrated, with primary size n, validation size m, and missing proportion p (the title for the x axis: Single-index $\hat\alpha_0^{(R)T}W$, solid circle: the response is observed, circle: the response is missing, solid line: the true curve, dashed line: the fitted curve).

REFERENCES

28. T. Nittner, Stat. Methods Appl., Vol. 12, 2003, pp. 195–210.
33. H. Liang and N. Wang, Statistica Sinica, Vol. 15, 2005, pp. 99–116.
