Proceedings of the International Conference on Innovation & Entrepreneurship in Computing, Engineering & Science Education (InvENT 2024)

Pre-processing Approach for Semi-Structured Medical Data

Authors
Aiman Haziq Ab Yazik1, Azliza Mohd Ali1, *, Sharifalillah Nordin1, Sazzli Shahlan Kasim2
1School of Computing Sciences, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
2Faculty of Medicine, Sungai Buloh Campus, Universiti Teknologi MARA, Shah Alam, Malaysia
*Corresponding author. Email: azliza@tmsk.uitm.edu.my
Corresponding Author
Azliza Mohd Ali
Available Online 1 December 2024.
DOI
10.2991/978-94-6463-589-8_17
Keywords
Data Preprocessing; Semi-Structured Data; Electronic Health Records (EHR); Automated Machine Learning (AutoML)
Abstract

The exponential growth of healthcare data, driven by the widespread use of electronic health records (EHRs), presents both opportunities and challenges in patient care, resource allocation, and medical research. Although advances in automated machine learning (AutoML) have streamlined many processes, especially data preprocessing, the preprocessing of semi-structured data remains time-consuming, particularly when the data does not conform to standard database structures. This paper introduces a preprocessing approach designed to automate and simplify the preprocessing of semi-structured medical data. The approach specifically addresses the challenges posed by nested data within columns and involves a series of steps, including renaming and removing columns, merging separated rows, and expanding data into structured formats. By applying this approach to an actual medical dataset, we demonstrate its effectiveness in automating the data preparation phase. The results indicate a successful transformation of nested data into a structured format, in which each previously nested element is represented by its own row and column, thereby facilitating future data analysis. Integrating this approach into healthcare data management systems has the potential to make data preprocessing more efficient and to improve the quality of subsequent data analysis. The preprocessing step can also be refined with other techniques to further enhance the data. Future research should focus on extending the approach to handle a wider variety of data types and structures, and on exploring its applicability to other fields that encounter similar nested data challenges.
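The steps named in the abstract (renaming and removing columns, merging separated rows, and expanding nested data) can be illustrated with a minimal sketch. This is not the authors' implementation: the column names (`Pt ID`, `Labs`, `Notes`), the `name=value; name=value` nesting convention, the rule that a row with an empty identifier continues the previous record, and the sample records themselves are all assumptions invented for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def rename_columns(rows: Iterable[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename keys according to `mapping`; unmapped keys are kept as-is."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def drop_columns(rows: Iterable[Row], unwanted: set) -> List[Row]:
    """Remove columns that are not needed for analysis."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def merge_split_rows(rows: Iterable[Row], key: str, nested: str) -> List[Row]:
    """Fold continuation rows (empty `key`) into the preceding record.

    Assumption: a record that spills over several rows repeats only the
    nested column, leaving the identifier column empty.
    """
    merged: List[Row] = []
    for row in rows:
        if row.get(key):            # a new record starts here
            merged.append(dict(row))
        elif merged:                # continuation of the previous record
            prev = merged[-1]
            parts = (prev.get(nested, ""), row.get(nested, ""))
            prev[nested] = "; ".join(p for p in parts if p)
    return merged


def expand_nested(rows: Iterable[Row], nested: str,
                  name_col: str, value_col: str) -> List[Row]:
    """Give every nested `name=value` element its own row and columns."""
    out: List[Row] = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != nested}
        for part in row.get(nested, "").split(";"):
            part = part.strip()
            if not part:
                continue
            name, _, value = part.partition("=")
            out.append({**base, name_col: name.strip(), value_col: value.strip()})
    return out


def preprocess(rows: Iterable[Row]) -> List[Row]:
    """Run the four steps in the order listed in the abstract."""
    rows = rename_columns(rows, {"Pt ID": "patient_id", "Labs": "labs"})
    rows = drop_columns(rows, {"Notes"})
    rows = merge_split_rows(rows, key="patient_id", nested="labs")
    return expand_nested(rows, nested="labs", name_col="test", value_col="result")


# Invented sample records, used only to exercise the sketch.
raw = [
    {"Pt ID": "P001", "Labs": "HbA1c=6.1; LDL=3.2", "Notes": "n/a"},
    {"Pt ID": "", "Labs": "HDL=1.1", "Notes": ""},
    {"Pt ID": "P002", "Labs": "HbA1c=5.4", "Notes": "n/a"},
]
tidy = preprocess(raw)
for record in tidy:
    print(record)
```

With this sample, the three raw rows become four flat rows, one per nested element, each carrying the patient identifier alongside its own `test` and `result` columns. A real dataset would likely need a more robust parser for the nested column than a split on `;` and `=`.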

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Innovation & Entrepreneurship in Computing, Engineering & Science Education (InvENT 2024)
Series
Advances in Computer Science Research
Publication Date
1 December 2024
ISBN
978-94-6463-589-8
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Aiman Haziq Ab Yazik
AU  - Azliza Mohd Ali
AU  - Sharifalillah Nordin
AU  - Sazzli Shahlan Kasim
PY  - 2024
DA  - 2024/12/01
TI  - Pre-processing Approach for Semi-Structured Medical Data
BT  - Proceedings of the International Conference on Innovation & Entrepreneurship in Computing, Engineering & Science Education (InvENT 2024)
PB  - Atlantis Press
SP  - 160
EP  - 169
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-589-8_17
DO  - 10.2991/978-94-6463-589-8_17
ID  - Yazik2024
ER  -