Pre-processing Approach for Semi-Structured Medical Data
- DOI
- 10.2991/978-94-6463-589-8_17
- Keywords
- Data Preprocessing; Semi-Structured Data; Electronic Health Records (EHR); Automated Machine Learning (AutoML)
- Abstract
The exponential growth of healthcare data, driven by the widespread use of electronic health records (EHRs), has presented both opportunities and challenges in patient care, resource allocation, and medical research. Although advances in automated machine learning (AutoML) have streamlined many processes, especially in data preprocessing, the preprocessing of semi-structured data remains a time-consuming task, particularly when the data does not conform to standard database structures. This paper introduces an approach designed to automate and simplify the preprocessing of semi-structured medical data. The approach specifically addresses the challenges posed by nested data within columns and involves a series of steps, including renaming and removing columns, merging separated rows, and expanding nested data into structured formats. By applying this approach to an actual medical dataset, we demonstrate its effectiveness in automating the data preparation phase. The results indicate a successful transformation of nested data into a structured format, in which each previously nested element is represented by its own row and column, thereby facilitating future data analysis. Integrating this approach into healthcare data management systems has the potential to enhance the efficiency of data preprocessing and improve the quality of subsequent data analysis. Additionally, this preprocessing step can be further refined using other techniques to enhance the data. Future research should focus on expanding the approach's capability to handle a wider variety of data types and structures, and on exploring its applicability to other fields that encounter similar nested data challenges.
- Copyright
- © 2024 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
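The steps named in the abstract (renaming and removing columns, merging separated rows, and expanding nested data into rows and columns) can be illustrated with a minimal sketch. This is not the authors' implementation: the column names (`Pt ID`, `Unnamed`, `Lab Results`), the `name=value; name=value` nesting format, the rule that a row with an empty identifier continues the previous row, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def rename_and_drop(rows: List[Row], renames: Dict[str, str], drop: Set[str]) -> List[Row]:
    """Drop unwanted columns, then rename the remaining ones."""
    return [
        {renames.get(key, key): value for key, value in row.items() if key not in drop}
        for row in rows
    ]


def merge_continuation_rows(rows: List[Row], key: str, nested: str) -> List[Row]:
    """Fold rows with an empty key into the preceding row (assumed convention)."""
    merged: List[Row] = []
    for row in rows:
        if row[key] == "" and merged:
            # Continuation row: append its nested text to the previous record.
            merged[-1][nested] = merged[-1][nested] + "; " + row[nested]
        else:
            merged.append(dict(row))
    return merged


def expand_nested(rows: List[Row], nested: str) -> List[Row]:
    """Emit one output row per 'name=value' element found in the nested column."""
    expanded: List[Row] = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != nested}
        for item in row[nested].split(";"):
            item = item.strip()
            if not item:
                continue
            name, _, value = item.partition("=")
            expanded.append({**base, "test": name.strip(), "value": value.strip()})
    return expanded


def preprocess(rows: List[Row]) -> List[Row]:
    """Run the three hypothetical steps in sequence."""
    rows = rename_and_drop(
        rows,
        renames={"Pt ID": "patient_id", "Lab Results": "labs"},
        drop={"Unnamed"},
    )
    rows = merge_continuation_rows(rows, key="patient_id", nested="labs")
    return expand_nested(rows, nested="labs")


# Made-up example records, not taken from the paper's dataset.
raw = [
    {"Pt ID": "P001", "Unnamed": "", "Lab Results": "HbA1c=6.1; LDL=3.2"},
    {"Pt ID": "", "Unnamed": "", "Lab Results": "HDL=1.1"},
    {"Pt ID": "P002", "Unnamed": "", "Lab Results": "HbA1c=7.4"},
]
structured = preprocess(raw)
for record in structured:
    print(record)
```

In this sketch each nested element ends up in its own row with separate `test` and `value` columns, which mirrors the kind of output the abstract describes. A real dataset would likely need a different nesting parser and a different rule for detecting separated rows.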
Cite this article
TY - CONF
AU - Aiman Haziq Ab Yazik
AU - Azliza Mohd Ali
AU - Sharifalillah Nordin
AU - Sazzli Shahlan Kasim
PY - 2024
DA - 2024/12/01
TI - Pre-processing Approach for Semi-Structured Medical Data
BT - Proceedings of the International Conference on Innovation & Entrepreneurship in Computing, Engineering & Science Education (InvENT 2024)
PB - Atlantis Press
SP - 160
EP - 169
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-589-8_17
DO - 10.2991/978-94-6463-589-8_17
ID - Yazik2024
ER -