Simultaneous Optimization of Multiple Responses That Involve Correlated Continuous and Ordinal Responses According to the Gaussian Copula Models
- DOI
- 10.2991/jsta.d.190701.001
- Keywords
- Gaussian copula; Mixed outcomes; Multivariate distribution; Simultaneous optimization
- Abstract
This study investigates the simultaneous optimization of multiple correlated responses that involve mixed ordinal and continuous responses. The proposed approach is also applicable when the responses are all ordinal, when they are all continuous but have different marginal distributions, or when a standard multivariate distribution of the responses is not applicable or does not exist. Such multiple responses have rarely been the focus of studies despite their frequent occurrence in experiments. Copula functions are used to construct a multivariate model for the mixed responses. To resolve the computational problems of estimation when the number of responses is large, we estimate the model parameters with a pairwise likelihood estimation method. We adapt the generalized distance approach to determine the factor settings that simultaneously optimize the means of the continuous responses and the desired cumulative categories of the ordinal responses. A simulation study is used to evaluate the performance of the estimators from the pairwise likelihood approach. Finally, we present an application of the proposed method to a real data example from a semiconductor manufacturing process.
- Copyright
- © 2019 The Authors. Published by Atlantis Press SARL.
- Open Access
- This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
1. INTRODUCTION
Improving product or process performance is an important problem in pharmacology, agriculture, and industry, and it is a focus of attention for manufacturers. Of particular interest is the detection of optimal settings of the control factors at which the response exhibits certain desired characteristics. Several publications focus on optimizing a single response (Draper [1]; Hoerl [2]; Peace [3]; Fowlkes and Creveling [4]; Paul and Khuri [5]). However, in numerous situations multiple responses must be optimized simultaneously. For example, in clinical trials it is important to determine the combination of drugs with the maximum therapeutic effect and the lowest level of toxicity. In a semiconductor manufacturing process, minimizing the defect count in the sensitive area and achieving the target amount of ion implanted in a wafer may both need to be considered for an ion implantation process.
Many quality improvement studies have investigated approaches for the simultaneous optimization of multiple responses. The desirability function is one of the most popular methods for optimizing a multi-response system (Harrington [6]; Derringer and Suich [7]; Kim and Lin [8]). This approach turns the several responses into a single response, the combined desirability. The desirability function approach is easy to apply and allows the user to make a subjective judgment on the importance of each response; however, it does not take into account the variance–covariance structure of the responses. Ignoring possible correlations between the responses may be misleading and lead to incorrect optimization decisions. To overcome this difficulty, Elsayed and Chen [9], Ko et al. [10], Pignatiello [11], Tsui [12], and Vining [13] proposed using the loss function approach to optimize multiple correlated responses. Khuri and Conlon [14] introduced an efficient optimization algorithm based on a generalized distance approach. They assumed that all mean responses in the system depend on the same set of controllable variables through a polynomial regression model. The first step of their algorithm is to obtain the individual optima of the estimated responses over the experimental region; the deviation from this ideal optimum is then measured by a distance function expressed in terms of the estimated responses and their variance–covariance structure. Finally, this function is minimized to arrive at a set of suitable operating conditions.
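To make the desirability idea concrete, the following is a minimal Python sketch of Derringer and Suich style individual desirabilities combined through their geometric mean. The bounds, targets, exponents, and response values are hypothetical illustration values, not taken from this paper. As the paragraph above points out, nothing in this calculation uses the covariance between the responses.

```python
import math

def d_target(y, low, target, high, s=1.0, t=1.0):
    """Two-sided (nominal-the-best) desirability: 1 at the target, 0 outside [low, high]."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

def d_larger(y, low, high, r=1.0):
    """One-sided (larger-the-better) desirability: 0 at or below low, 1 at or above high."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** r

def overall_desirability(ds):
    """Combine individual desirabilities by their geometric mean (0 if any is 0)."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

# Hypothetical responses: an ion amount with target 1000 and a proportion to be maximized.
d1 = d_target(950.0, low=800.0, target=1000.0, high=1200.0)
d2 = d_larger(0.85, low=0.5, high=1.0)
D = overall_desirability([d1, d2])
print(round(d1, 3), round(d2, 3), round(D, 4))
```

Because the combined value is a single scalar, any optimizer can be used on it, which is why the approach is easy to apply; the price is that correlated responses are treated as if they were unrelated.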
The following regression models are used in the multiple response optimization procedure mentioned above: ordinary least squares (OLS), generalized least squares (GLS), and multivariate regression (MVR), all under the assumption of normal errors. OLS and GLS regressions model the responses individually, assuming independent responses. OLS regression assumes homogeneous error variances, whereas GLS does not require this assumption. MVR models the responses simultaneously, taking into account the correlation between the errors. The assumptions of normality and homogeneous error variances may be violated when the responses are non-normal or discrete or exhibit heterogeneous variances. Such situations occur frequently in clinical and epidemiological studies. Mukhopadhyay and Khuri [15] recently extended the algorithm of Khuri and Conlon [14] to generalized linear models (GLMs), which can handle multiple discrete responses and accommodate heterogeneous error variances. They assume that all marginal distributions are of the same type and that the joint distribution of the responses is available and belongs to the multivariate exponential family.
A number of other methods have been proposed for optimizing multiple continuous responses. Su and Tong [16] used principal component analysis; Lai and Chang [17], Lu and Antony [18], and Tong and Su [19] used fuzzy methods; and Wu and Chyu [20] suggested a mathematical programming method.
Ordinal responses are observed in a number of experiments because of the nature of the quality characteristic or the convenience and cost-effectiveness of the measurement technique. For optimizing ordinal responses, Taguchi [21–23] primarily employed the accumulation analysis (AA) method. In this method the corresponding cumulative categories are defined, and the researcher then determines the effects of the factor levels from the probability distribution over the categories. Finally, the optimal control factor settings are obtained from the desired cumulative category, taking the important location effects into consideration. Nair [24] proposed two scoring schemes that separately detect location and dispersion effects. Jeng and Guo [25] presented a weighted probability-scoring scheme (WPSS), which also considers location and dispersion effects while avoiding the computational complexity of Nair's scoring scheme. Chipman and Hamada [26] used a GLM with Bayesian estimation techniques to optimize this type of response; their approach is computationally more complex than Nair's scoring scheme.
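The cumulative-category bookkeeping behind accumulation analysis can be sketched as follows: category counts for runs that share a factor level are pooled, accumulated from the best category onward, and the level with the largest share in the desired cumulative category is preferred. This is a simplified illustration with made-up counts for a single two-level factor; it omits the formal assessment of factor effects that a full accumulation analysis would include.

```python
from itertools import accumulate

def cumulative_proportions(counts):
    """Cumulative proportions for categories ordered from best to worst."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]

def accumulation_by_level(levels, counts_per_run):
    """Pool category counts over runs sharing a factor level, then accumulate them."""
    pooled = {}
    for level, counts in zip(levels, counts_per_run):
        acc = pooled.setdefault(level, [0] * len(counts))
        for j, c in enumerate(counts):
            acc[j] += c
    return {level: cumulative_proportions(c) for level, c in pooled.items()}

# Hypothetical 4-run design: one two-level factor, three ordered categories (best first).
levels = [1, 1, 2, 2]
counts = [[8, 2, 0], [6, 3, 1], [3, 4, 3], [2, 5, 3]]
table = accumulation_by_level(levels, counts)

# Prefer the level with the largest share in the desired cumulative category "category 1".
best_level = max(table, key=lambda lv: table[lv][0])
```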
For mixed responses, very few studies have addressed the optimization of ordinal-continuous responses. Hsieh and Tong [27] employed an artificial neural network technique to optimize ordinal-continuous responses, a method that is difficult to apply in industrial settings. Wu [28] presented an approach based on Taguchi's quality loss function [23] in which ordinal responses are treated as continuous responses and a weighted average quality loss is defined for the ordinal responses. This approach is easier to apply than that of Hsieh and Tong [27]; however, it does not account for correlation between the responses.
In this article, we introduce an approach to simultaneously optimize mixed correlated continuous and ordinal responses. Our procedure can also be applied when the responses are all ordinal, all continuous with different types of marginal distributions, or when a standard multivariate distribution of the responses is not applicable or does not exist. For example, when all of the marginal distributions are gamma and the responses are correlated, it is difficult to specify a joint distribution from the multivariate exponential family. In our approach we use the Gaussian copula function. We extend the regression model of De Leon and Wu [29] for bivariate mixed outcomes to multivariate mixed discrete and continuous outcomes through pairwise fitting, following the approach of Fieuws and Verbeke [30] to the joint modeling of multivariate outcomes. After specifying the effects of the factor levels on the means of the continuous responses and on the probability distributions over the categories of the ordinal responses, we adopt the generalized distance approach of Khuri and Conlon [14] to determine the optimal control factor settings in terms of the means of the continuous responses and the desired cumulative categories of the ordinal responses. Copulas, introduced into the statistical literature by Sklar [31], allow one to model the dependence structure separately from the marginal distributions. This provides an alternative and often more flexible representation of a multivariate distribution than traditional approaches such as multivariate normality. Formally, suppose that we have K marginal CDFs, $F_1, \dots, F_K$. By Sklar's theorem, any joint CDF with these margins can be written as $F(y_1, \dots, y_K) = C\{F_1(y_1), \dots, F_K(y_K)\}$ for some copula $C$, that is, a joint CDF on $[0,1]^K$ with uniform margins. The Gaussian copula is $C(u_1, \dots, u_K) = \Phi_K\{\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_K); \mathbf{R}\}$, where $\Phi$ is the standard normal CDF and $\Phi_K(\cdot\,; \mathbf{R})$ is the K-variate standard normal CDF with correlation matrix $\mathbf{R}$.
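To illustrate how a Gaussian copula couples a continuous margin with an ordinal one, the Python sketch below draws correlated latent standard normals, maps the first through the normal CDF and then a quantile function, and cuts the second at fixed thresholds (which is equivalent to an ordinal margin with a probit-type cut-point model). The exponential margin (a gamma with shape 1, used only because its quantile function is closed-form), the thresholds, and the correlation value are assumptions chosen for a standard-library-only example; in the application the continuous margin is gamma with a general shape and both margins depend on the control factors.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def simulate_mixed_pair(n, rho, scale, thresholds, seed=0):
    """Draw (continuous, ordinal) pairs linked by a bivariate Gaussian copula.

    Continuous margin: exponential with the given scale (gamma with shape 1).
    Ordinal margin: category k (1-based) counts how many thresholds the latent value exceeds.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Latent pair with correlation rho between z1 and z2.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Uniform margin; clipped away from 1 so the quantile below stays finite.
        u1 = min(STD_NORMAL.cdf(z1), 1.0 - 1e-12)
        y = -scale * math.log1p(-u1)              # exponential quantile F^{-1}(u1)
        k = 1 + sum(z2 > c for c in thresholds)   # ordinal category from cut points
        out.append((y, k))
    return out

sample = simulate_mixed_pair(5000, rho=0.6, scale=2.0, thresholds=[-0.5, 0.5])
```

The margins keep their own forms (here an exponential and a three-category ordinal variable), while the single parameter `rho` controls the dependence, which is the separation of margins and dependence that motivates the copula approach.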
We consider real data from a semiconductor manufacturing process in which the defect count in the sensitive area (an ordinal response) and the amount of ion implanted (a continuous response) require simultaneous investigation for an ion implantation process (Hsieh and Tong [27]), as discussed in Section 5.
This paper is organized as follows. In Section 2, we define a multivariate model for mixed responses. In Section 2.1, the parameters of the regression models and the variance–covariance matrix of the parameter estimates are estimated simultaneously; the confidence region of the parameters, the estimated means of the continuous responses, and the estimated desired cumulative categories of the ordinal responses are also obtained there, and all of these are used in the optimization algorithm. Section 3 outlines the optimization algorithm based on the generalized distance approach. In Section 4, we conduct a simulation study to compare the performance of estimators from pairwise and full likelihood estimation. An application of the proposed optimization algorithm to a real data example is described in Section 5. Finally, concluding remarks are presented in Section 6.
2. A MULTIVARIATE MODEL FOR MIXED RESPONSES
We consider a mixed multi-response obtained from the ith run of the experiment,
2.1. Estimation
The maximum likelihood estimator (MLE) of
Next, by applying (2), we differentiate function (3) with respect to
In situations where the dimension of the vector variable
Herein
To overcome these computational complications, we can use the pairwise likelihood estimation procedure of Fieuws and Verbeke [30]. In this approach, instead of maximizing the full log-likelihood, each pairwise log-likelihood is maximized separately. Let the parameter vector of all possible pairwise likelihoods be
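The pairwise idea can be sketched in Python under a simplifying assumption: all K responses are treated as normal, so each bivariate fit has a closed-form maximum likelihood solution (means, ML standard deviations, correlation). Every one of the K(K−1)/2 pairs is fitted separately; a marginal parameter appears in K−1 pairs and its estimates are averaged, while each correlation is estimated once. In the mixed ordinal-continuous model each pair would instead be fitted numerically with its copula pair likelihood, and standard errors would require the pseudo-likelihood (sandwich-type) argument of Fieuws and Verbeke, which is not shown. The data columns are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def fit_pair(x, y):
    """Closed-form bivariate-normal MLE for one pair: means, ML SDs, correlation."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return {"mean": (mx, my), "sd": (sxx ** 0.5, syy ** 0.5), "rho": sxy / (sxx * syy) ** 0.5}

def pairwise_fit(columns):
    """Fit every pair separately, then average parameters shared across pairs."""
    K = len(columns)
    mean_est = {j: [] for j in range(K)}
    sd_est = {j: [] for j in range(K)}
    rho = {}
    for j, k in combinations(range(K), 2):
        fit = fit_pair(columns[j], columns[k])
        for idx, var in enumerate((j, k)):
            mean_est[var].append(fit["mean"][idx])
            sd_est[var].append(fit["sd"][idx])
        rho[(j, k)] = fit["rho"]              # each correlation comes from one pair only
    means = {j: fmean(v) for j, v in mean_est.items()}   # margin j appears in K - 1 pairs
    sds = {j: fmean(v) for j, v in sd_est.items()}
    return means, sds, rho

cols = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [0.5, 0.7, 0.2, 0.9]]
means, sds, rho = pairwise_fit(cols)
```

In this normal case the pairwise estimates of the margins coincide with the full MLE, which makes the sketch easy to check; the computational gain comes from never having to evaluate a K-dimensional likelihood.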
For computational convenience in the estimation step, the constraints on
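The exact constraints and transformations used in the paper are not recoverable from this copy. As an assumption, the sketch below shows one common way to remove two typical constraints of such a model, increasing cut points for an ordinal margin and a correlation in (−1, 1), so that an unconstrained optimizer such as R's `optim` can work on free parameters. The cut-point values are hypothetical.

```python
import math

def thresholds_from_free(free):
    """Map unconstrained reals to strictly increasing cut points (first cut, then log-gaps)."""
    cuts = [free[0]]
    for d in free[1:]:
        cuts.append(cuts[-1] + math.exp(d))   # exp(d) > 0 keeps the cuts increasing
    return cuts

def free_from_thresholds(cuts):
    """Inverse map: first cut point, then logs of the gaps between consecutive cuts."""
    return [cuts[0]] + [math.log(b - a) for a, b in zip(cuts, cuts[1:])]

def rho_from_free(z):
    """Map an unconstrained real to a correlation (inverse Fisher z transform)."""
    return math.tanh(z)

def free_from_rho(rho):
    """Fisher z transform of a correlation in (-1, 1)."""
    return math.atanh(rho)

cuts = [-1.0, 0.2, 1.5]                       # hypothetical cut points
free = free_from_thresholds(cuts)
```

Note that in floating point `tanh` rounds to exactly ±1 for very large arguments, so a practical implementation may also bound the free correlation parameter.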
Based on Wald [36], an approximate
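For reference, a Wald-type confidence region has the following standard form, written here in generic notation because the paper's own symbols are lost in this copy. For a p-dimensional parameter $\boldsymbol\theta$ with estimate $\hat{\boldsymbol\theta}$ and estimated covariance matrix $\hat{\mathbf V}$, the set

$$
\left\{ \boldsymbol\theta : (\hat{\boldsymbol\theta} - \boldsymbol\theta)^{\top} \hat{\mathbf V}^{-1} (\hat{\boldsymbol\theta} - \boldsymbol\theta) \le \chi^2_{p,\,1-\alpha} \right\}
$$

is an approximate $100(1-\alpha)\%$ confidence region, where $\chi^2_{p,\,1-\alpha}$ is the $(1-\alpha)$ quantile of the chi-squared distribution with $p$ degrees of freedom. Under pairwise estimation, $\hat{\mathbf V}$ would be the covariance matrix obtained for the combined pairwise estimates.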
In the ordinal-continuous multi-response system, the goal of simultaneous optimization is to determine a point,
Thus an approximation of
Herein
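The "desired cumulative category" of an ordinal response can be sketched for a cumulative-link ordinal margin. In this minimal Python sketch, the sign convention P(Z ≤ c | x) = F(θ_c − η), with linear predictor η, and the probit default for F are assumptions, since the paper's exact parametrization is not recoverable from this copy. The cut points and η value are hypothetical.

```python
from statistics import NormalDist

_PROBIT = NormalDist().cdf   # assumed link CDF; any increasing CDF could be passed instead

def category_probs(eta, cuts, cdf=_PROBIT):
    """Category probabilities when P(Z <= c) = cdf(cuts[c - 1] - eta) and cuts increase."""
    cum = [cdf(t - eta) for t in cuts] + [1.0]
    return [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]

def desired_cumulative(eta, cuts, c, cdf=_PROBIT):
    """Probability of the desired cumulative category 'Z <= c' (categories are 1-based)."""
    return cdf(cuts[c - 1] - eta)

probs = category_probs(0.2, [-1.0, 0.0, 1.5])            # four categories
p_best_two = desired_cumulative(0.2, [-1.0, 0.0, 1.5], c=2)
```

In an optimization, η depends on the control factor settings, so maximizing `desired_cumulative` over the design region plays the role that maximizing a mean plays for a continuous response.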
3. THE SIMULTANEOUS OPTIMIZATION PROCEDURE
First, we individually optimize the estimated mean response of each continuous variable,
Since
If
Therefore, by minimizing the right-hand side of (10) over the region
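A minimal Python sketch of the Khuri and Conlon generalized-distance idea in its min-max form follows. It assumes two responses, a rectangular region of plausible values for the individual optima, and a constant variance–covariance matrix for the predicted responses (in the actual method this matrix depends on the factor settings). The response functions, covariance matrix, and bounds are hypothetical; for an ordinal response, the "response" would be the estimated desired cumulative category probability.

```python
import itertools
import math

def gen_distance(yhat, phi, cov_inv):
    """Generalized distance sqrt((yhat - phi)' cov_inv (yhat - phi))."""
    d = [a - b for a, b in zip(yhat, phi)]
    m = len(d)
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(m) for j in range(m))
    return math.sqrt(max(q, 0.0))   # guard against tiny negative rounding error

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def worst_case_distance(yhat, box, cov_inv):
    """Max distance over a box of candidate optima.

    The squared distance is convex in phi, so its maximum over a box is attained at a vertex.
    """
    return max(gen_distance(yhat, v, cov_inv) for v in itertools.product(*box))

def yhat(x):
    """Hypothetical fitted responses in one coded factor x in [-1, 1]."""
    return [10.0 - 4.0 * (x - 0.6) ** 2, 5.0 - 3.0 * (x + 0.2) ** 2]

cov_inv = inv2([[1.0, 0.3], [0.3, 0.5]])    # assumed constant Var(yhat)
box = [(9.5, 10.0), (4.6, 5.0)]             # hypothetical bounds on the individual optima
grid = [i / 200 - 1.0 for i in range(401)]
x_best = min(grid, key=lambda x: worst_case_distance(yhat(x), box, cov_inv))
```

Here the two individual optima are at x = 0.6 and x = −0.2, and the compromise setting falls between them; a grid search is used only for transparency, and any bounded optimizer could replace it.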
4. SIMULATION STUDY: FULL VERSUS PAIRWISE LIKELIHOOD ESTIMATION
To evaluate and compare the performance of the estimators from the pairwise and full likelihood approaches for mixed ordinal-continuous responses, we considered a simple
Parameter | Truth | Ave |
||||||
---|---|---|---|---|---|---|---|---|
1 | −0.8577 | −25.0141 | 0.1035 | 0.1107 | 0.9354 | 0.0123 | 0.0798 | |
−3 | −0.1182 | −0.2141 | 0.1260 | 0.1559 | 0.8080 | 0.0243 | 0.0254 | |
−2 | −0.2783 | −1.1402 | 0.1259 | 0.1374 | 0.9164 | 0.0189 | 0.0217 | |
−1 | 3.6322 | 16.4251 | 0.1867 | 0.1641 | 1.1380 | 0.0282 | 0.0608 | |
1 | 0.7144 | −0.1973 | 0.2410 | 0.2026 | 1.1893 | 0.0411 | 0.0451 | |
−1 | −1.4872 | −2.2351 | 0.2396 | 0.2007 | 1.1939 | 0.0405 | 0.0454 | |
1 | 11.4930 | −29.3023 | 0.3426 | 0.3477 | 0.9854 | 0.1272 | 0.0890 | |
−1 | 10.1537 | −10.6593 | 0.3854 | 0.3956 | 0.9741 | 0.1513 | 0.0567 | |
2 | 10.6494 | −14.6922 | 0.5231 | 0.4963 | 1.0539 | 0.3493 | 0.0919 | |
−0.25 | 1.6823 | 116.7175 | 0.2153 | 0.1948 | 1.1054 | 0.0380 | 0.0880 | |
1 | 6.0361 | −2.0552 | 0.2547 | 0.2356 | 1.0808 | 0.0592 | 0.0396 | |
1 | 6.4798 | −15.4204 | 0.2652 | 0.2426 | 1.0930 | 0.0631 | 0.0521 | |
−2 | 5.1658 | −14.7951 | 0.3553 | 0.3184 | 1.1161 | 0.1120 | 0.0920 | |
1 | −4.3821 | 29.3046 | 0.0753 | 0.1196 | 0.6298 | 0.0162 | 0.0890 | |
0.4 | 3.5435 | −20.1926 | 0.0534 | 0.0521 | 1.0258 | 0.0029 | 0.0090 | |
0.5 | −1.5025 | 18.1566 | 0.0918 | 0.0851 | 1.0793 | 0.0073 | 0.0098 | |
0.5 | 4.4983 | −40.5024 | 0.1917 | 0.1621 | 1.1827 | 0.0268 | 0.0424 | |
0.5 | 2.7490 | −40.6943 | 0.1427 | 0.1278 | 1.1160 | 0.0165 | 0.0428 | |
0.5 | 5.1281 | −31.6022 | 0.2037 | 0.1746 | 1.1671 | 0.0311 | 0.0331 | |
0.5 | 3.3858 | −33.5907 | 0.1484 | 0.1312 | 1.1306 | 0.0175 | 0.0340 | |
0.5 | 6.8221 | 18.7066 | 0.3100 | 0.2837 | 1.0927 | 0.0817 | 0.0099 |
True values are given, the relative bias (RB = Bias ÷ Parameter) under the pairwise
Result of simulation study for pairwise and full likelihood estimation.
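The summary statistics used in such simulation comparisons can be computed as in the Python sketch below. Only the relative bias is defined in the table note (RB = Bias ÷ Parameter); the average standard error, empirical standard deviation, their ratio, and the mean squared error are common companions in such studies, and whether they match the remaining unlabeled columns is not recoverable from this copy. The replicate values are hypothetical.

```python
from statistics import fmean, pstdev

def summarize(estimates, std_errors, truth):
    """Monte Carlo summaries for one parameter across simulation replicates."""
    mean_est = fmean(estimates)
    bias = mean_est - truth
    emp_sd = pstdev(estimates)                      # spread of the estimates across replicates
    mean_se = fmean(std_errors)                     # average model-based standard error
    mse = fmean([(e - truth) ** 2 for e in estimates])
    return {
        "relative_bias": bias / truth,              # RB = Bias / Parameter (undefined if truth == 0)
        "mean_se": mean_se,
        "emp_sd": emp_sd,
        "se_sd_ratio": mean_se / emp_sd,            # near 1 when standard errors are well calibrated
        "mse": mse,
    }

s = summarize([0.9, 1.1, 1.2, 0.8], [0.15, 0.16, 0.14, 0.15], truth=1.0)
```

With the population standard deviation used here, the MSE decomposes exactly as squared bias plus squared empirical standard deviation.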
5. ILLUSTRATIVE EXAMPLE
Hsieh and Tong [27] conducted a case study on the simultaneous optimization of mixed ordered categorical and continuous responses, based on artificial neural networks, for an ion implantation process at a Taiwanese integrated circuit (IC) fabrication manufacturer. This example contains two quality responses: (i) the amount of ion implanted in a wafer, a continuous response denoted by
Level | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Level 1 | Type 1 | 6 | 50 | 5 | 4 | 25 | 1 | 1 | 1 | 1 | 1 | 1 |
Level 2 | Type 2 | 12 | 100 | 10 | 8 | 50 | 0 | 2 | 2 | 2 | 2 | 2 |
Level 3 | | 18 | 150 | 15 | 12 | 75 | | 3 | 3 | 3 | 3 | 3 |
Control factors and coded factors with their levels.
i | ||||||
---|---|---|---|---|---|---|
1 | 741.4 | 33 | 3 | 0 | 0 | 0 |
2 | 972.1 | 24 | 5 | 6 | 1 | 0 |
3 | 796.1 | 6 | 2 | 20 | 8 | 0 |
4 | 797.8 | 0 | 28 | 4 | 4 | 0 |
5 | 796.6 | 2 | 2 | 4 | 12 | 16 |
6 | 802.1 | 4 | 0 | 20 | 4 | 8 |
7 | 908.2 | 0 | 2 | 6 | 14 | 14 |
8 | 645.7 | 10 | 2 | 8 | 4 | 12 |
9 | 650.3 | 0 | 0 | 0 | 24 | 12 |
10 | 1072.5 | 34 | 0 | 2 | 0 | 0 |
11 | 1316.1 | 30 | 2 | 4 | 0 | 0 |
12 | 890.5 | 10 | 10 | 12 | 0 | 4 |
13 | 886.6 | 14 | 8 | 10 | 4 | 0 |
14 | 826.5 | 8 | 16 | 12 | 0 | 0 |
15 | 800.1 | 0 | 8 | 6 | 4 | 18 |
16 | 816.1 | 18 | 12 | 6 | 0 | 0 |
17 | 824.2 | 10 | 6 | 0 | 4 | 16 |
18 | 735.6 | 0 | 4 | 2 | 6 | 24 |
Experimental data.
The continuous response is a nominal-the-best (NTB) response with a target value of 1000 (after the data were transformed). First, for the continuous response
Therefore, the log-likelihood for the data set presented in Table 4 is
Using this log-likelihood, we estimated the parameters and calculated the likelihood's score functions at the estimated parameters in R, using the “optim” and “fdHess” functions, respectively. Table 5 lists the estimated parameters, their standard errors, z-values, and p-values. In this table
Estimate | Std. Error | z-value | p-value
---|---|---|---
7.1070 | 0.2268 | 31.3394 | 0.0000
−0.1316 | 0.0789 | −1.6682 | 0.0476
−0.1143 | 0.0481 | −2.3778 | 0.0087
−0.0557 | 0.0488 | −1.1416 | 0.1268
−0.0115 | 0.0474 | −0.2426 | 0.4042
−0.0281 | 0.0487 | −0.5772 | 0.2819
0.0566 | 0.0488 | 1.1610 | 0.1228
0.7580 | 0.1561 | 4.8551 | 0.0000
1.4816 | 0.1073 | 13.8052 | 0.0000
1.1839 | 0.1009 | 11.7374 | 0.0000
−0.2319 | 0.0939 | −2.4699 | 0.0068
0.1324 | 0.0970 | 1.3653 | 0.0861
0.3071 | 0.0942 | 3.2607 | 0.0006
4.9282 | 0.4683 | 10.5234 | 0.0000
5.9847 | 0.4880 | 12.2635 | 0.0000
7.2068 | 0.5166 | 13.9513 | 0.0000
8.2651 | 0.5351 | 15.4455 | 0.0000
0.0092 | 0.1125 | 0.0822 | 0.4672
Estimates and standard errors.
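The last two numeric columns of the table can be reproduced from the first two: the z-value is the estimate divided by its standard error, and the reported p-values appear consistent with a one-sided normal tail probability Φ(−|z|). The Python sketch below checks this for a few rows; small discrepancies in the last digit are expected because the estimates and standard errors in the table are rounded.

```python
from statistics import NormalDist

def wald_z_and_p(estimate, std_error):
    """Wald z statistic and one-sided normal tail probability P(Z > |z|)."""
    z = estimate / std_error
    p_one_sided = NormalDist().cdf(-abs(z))
    return z, p_one_sided

# (estimate, standard error) pairs copied from the table above
rows = [(7.1070, 0.2268), (-0.1316, 0.0789), (0.1324, 0.0970), (0.0092, 0.1125)]
for est, se in rows:
    z, p = wald_z_and_p(est, se)
    print(f"{est:8.4f} {se:7.4f} {z:9.4f} {p:7.4f}")
```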
| | Gamma Response | Ordinal Response |
|---|---|---|
| Location | (1, 6.002, 63.848, 6.08, 4.401, 71.88) | (0, 6, 50, 15, 4, 25) |
| Optimum mean response | 1000 | 0.925 |
| Confidence region | (822.078, 1000) | (0.849, 0.965) |
The individual optima and confidence region
| | Taguchi | Neural Network | Gaussian Copula |
|---|---|---|---|
| Location | (1, 12, 50, 15, 8, 25) | (1, 6.06, 46.32, 12.19, 11.63, 52.06) | (0, 7.26, 54.9, 12.55, 11.91, 25.42) |
| Ordinal response | (0.54, 0.23, 0.15, 0.05, 0.03) | (0.75, 0.14, 0.07, 0.02, 0.01) | (0.85, 0.094, 0.04, 0.01, 0.01) |
| Gamma response | 778.113 | 912.682 | 946.277 |
| Max | 0.496 | 0.245 | 0.194 |
Simultaneous optima (Example 1).
In this example, we used the distance measure
For the purpose of comparison, we considered the two points of the design region that Hsieh and Tong [27] obtained for the simultaneous optimization of these responses with the Taguchi and artificial neural network methods. Using these locations, the estimated parameters in Table 5, and the confidence region in Table 6,
6. CONCLUSION
In simultaneous optimization problems, the inherent nature of the data and the convenience of measurement often make it infeasible to record all of the responses as normally distributed continuous variables. One of the most common alternatives is to record the data in ordinal categorical form, so the outputs may involve mixed continuous and ordinal variables. The copula function makes it possible to model various types of correlated responses: mixed continuous and ordinal responses, responses that are all ordinal, continuous responses with different marginal distributions, or responses for which a standard multivariate distribution is not applicable or does not exist. This paper used the pairwise likelihood estimation method to alleviate the computational demands of estimation when the number of responses is large, and the results of the simulation study showed the usefulness of this method. Adopting the generalized distance approach allows such responses to be optimized simultaneously while taking the dependence between them into account. An example demonstrated the effectiveness of the proposed method. Previously published methods cannot be applied directly to the simultaneous optimization of such correlated responses.
REFERENCES
Cite this article
TY - JOUR AU - Fatemeh Jiryaie AU - Ahmad Khodadadi PY - 2019 DA - 2019/09/09 TI - Simultaneous Optimization of Multiple Responses That Involve Correlated Continuous and Ordinal Responses According to the Gaussian Copula Models JO - Journal of Statistical Theory and Applications SP - 212 EP - 221 VL - 18 IS - 3 SN - 2214-1766 UR - https://doi.org/10.2991/jsta.d.190701.001 DO - 10.2991/jsta.d.190701.001 ID - Jiryaie2019 ER -