Missing Value Imputation for RNA-Sequencing Data Using Statistical Models: A Comparative Study
- DOI
- 10.2991/jsta.2016.15.3.3
- Keywords
- Bayesian approach; Clustering analysis; EM algorithm; Missing data analysis; RNA-seq data set.
- Abstract
RNA-seq technology has been widely used as an alternative to traditional microarrays in transcript analysis. Gene expression profiling by sequencing, which generates RNA-seq data sets, sometimes yields missing read counts. These missing values can adversely affect downstream analyses, and most methods for analysing RNA-seq data sets require a complete data matrix. In the past few years, researchers have put a great deal of effort into evaluating imputation algorithms for microarray gene expression data sets. However, there are only limited works for RNA-seq data sets, so a comparative study of the performance of missing value imputation for RNA-seq data is essential. In this paper, we propose the use of several parametric models, namely regression imputation, the Bayesian generalized linear model, the Poisson mixture model, the EM approach, Bayesian Poisson regression, Bayesian quasi-Poisson regression and the bootstrap versions of the latter two, for single imputation of missing values in RNA-seq count data sets. The approaches are also applied to identifying differentially expressed genes in the presence of missing values. Multiple imputation, proposed by Rubin (1978), is also used for the missing RNA-seq counts. This approach allows appropriate assessment of imputation uncertainty for missing values. The performance of the single and multiple imputation approaches is investigated using simulation studies, and some real data sets are analyzed using the proposed approaches.
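To illustrate how multiple imputation quantifies imputation uncertainty, the following is a minimal sketch of the standard Rubin's rules for pooling results across imputed data sets. It is not the authors' code: the function name `pool_rubin` and the example numbers are hypothetical, and it assumes each of the m completed data sets has already produced a point estimate and its variance.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: their estimated variances, in the same order
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical values, for illustration only (e.g. a log fold change for one gene)
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what single imputation cannot provide: when the imputed data sets disagree, the total variance grows accordingly.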
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - JOUR
AU - Taban Baghfalaki
AU - Mojtaba Ganjali
AU - Damon Berridge
PY - 2016
DA - 2016/09/01
TI - Missing Value Imputation for RNA-Sequencing Data Using Statistical Models: A Comparative Study
JO - Journal of Statistical Theory and Applications
SP - 221
EP - 236
VL - 15
IS - 3
SN - 2214-1766
UR - https://doi.org/10.2991/jsta.2016.15.3.3
DO - 10.2991/jsta.2016.15.3.3
ID - Baghfalaki2016
ER -