Applying density-based outlier identifications using multiple datasets for validation of stroke clinical outcomes

Ching Heng Lin, Kai Cheng Hsu, Kory R. Johnson, Marie Luby, Yang C. Fann

Research output: Contribution to journal › Article

Abstract

Introduction: Clinicians commonly use the modified Rankin Scale (mRS) and the Barthel Index (BI) to measure clinical outcome after stroke. Both are potential targets for machine learning models that predict stroke outcome, so the quality of these measurements is crucial for training and validating such models. The objective of this study was to apply and evaluate density-based outlier detection methods for identifying potentially incorrect measurements in multiple large stroke datasets, and thereby assess measurement quality.

Method: We applied three density-based outlier detection methods: density-based spatial clustering of applications with noise (DBSCAN), hierarchical DBSCAN (HDBSCAN), and local outlier factor (LOF). Each method was based on a large dataset obtained from a nationwide prospective stroke registry in Taiwan and was tested on four different NINDS-funded stroke datasets.

Result: DBSCAN performed well across all mRS values: the highest average accuracy was 99.2 ± 0.7 at an mRS of 4, and the lowest was 92.0 ± 4.6 at an mRS of 3. LOF achieved similar performance, whereas HDBSCAN with default parameter settings required further tuning.

Conclusion: Density-based outlier detection methods proved promising for validating stroke outcome measures. The outlier detection algorithm developed from a large prospective registry dataset was applied effectively to four different NINDS stroke datasets with high performance. A tool built on this algorithm could be applied to real-world datasets to improve data quality in stroke outcome measures.
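To make the density-based idea concrete, the following is a minimal sketch of DBSCAN-style noise flagging on paired mRS and BI scores. It is not the authors' pipeline: the records, the BI scaling factor, and the `eps` and `min_pts` values are all hypothetical, chosen only so that the example is easy to follow. A point counts as a core point when at least `min_pts` records (itself included) lie within distance `eps`; a record is flagged as noise when it is neither a core point nor within `eps` of one.

```python
from math import dist


def dbscan_noise(points, eps, min_pts):
    """Return indices of points that DBSCAN would label as noise."""
    n = len(points)
    # Neighborhood of each point within radius eps (the point itself included).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts records in their neighborhood.
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    # Noise: not a core point and not within eps of any core point.
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]


# Hypothetical (mRS, BI) records: dense, plausible combinations plus two
# implausible pairs appended at the end.
records = (
    [(0, 100)] * 5 + [(1, 95)] * 4 + [(1, 100)] * 3 + [(2, 85)] * 4
    + [(3, 65)] * 4 + [(4, 40)] * 4 + [(5, 10)] * 4 + [(5, 5)] * 3
    + [(5, 100), (0, 10)]
)

# Assumed scaling: divide BI by 20 so both scores span a similar range.
scaled = [(mrs, bi / 20) for mrs, bi in records]

noise_idx = dbscan_noise(scaled, eps=0.5, min_pts=4)
flagged = [records[i] for i in noise_idx]
print(flagged)  # [(5, 100), (0, 10)]
```

In this toy data, the two flagged pairs are severe disability with full independence (mRS 5, BI 100) and no symptoms with near-total dependence (mRS 0, BI 10), the kind of inconsistent measurement a density-based check is meant to surface. In practice, `eps` and `min_pts` would need tuning on a reference dataset.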

Original language: English (US)
Article number: 103988
Journal: International Journal of Medical Informatics
Volume: 132
DOI: 10.1016/j.ijmedinf.2019.103988
State: Published - Dec 2019

Fingerprint

Stroke
National Institute of Neurological Disorders and Stroke
Cluster Analysis
Outcome Assessment (Health Care)
Registries
Datasets
Taiwan

Keywords

  • Barthel Index
  • Outlier detection
  • Stroke outcome
  • modified Rankin Scale

ASJC Scopus subject areas

  • Health Informatics

Cite this

Applying density-based outlier identifications using multiple datasets for validation of stroke clinical outcomes. / Lin, Ching Heng; Hsu, Kai Cheng; Johnson, Kory R.; Luby, Marie; Fann, Yang C.

In: International Journal of Medical Informatics, Vol. 132, 103988, 12.2019.

@article{132a0ce3285048a6ac31441e136c458a,
title = "Applying density-based outlier identifications using multiple datasets for validation of stroke clinical outcomes",
abstract = "Introduction: Clinicians commonly use the modified Rankin Scale (mRS) and the Barthel Index (BI) to measure clinical outcome after stroke. These are potential targets in machine learning models for stroke outcome prediction. Therefore, the quality of the measurements is crucial for training and validation of these models. The objective of this study was to apply and evaluate density-based outlier detection methods for identifying potentially incorrect measurements in multiple large stroke datasets to assess the measurement quality. Method: We applied three density-based outlier detection methods including density-based spatial clustering of applications (DBSCAN), hierarchical DBSCAN (HDBSCAN) and local outlier factor (LOF) based on a large dataset obtained from a nationwide prospective stroke registry in Taiwan. The testing of each method was done by using four different NINDS funded stroke datasets. Result: The DBSCAN achieved a high performance across all mRS values where the highest average accuracy was 99.2 ± 0.7 at mRS of 4 and the lowest average accuracy was 92.0 ± 4.6 at mRS of 3. The LOF also achieved similar performance, however, the HDBSCAN with default parameters setting required further tuning improvement. Conclusion: The density-based outlier detection methods were proven to be promising for validation of stroke outcome measures. The outlier detection algorithm developed from a large prospective registry dataset was effectively applied in four different NINDS stroke datasets with high performance results. The tool developed from this detection algorithm can be further applied to real world datasets to increase the data quality in stroke outcome measures.",
keywords = "Barthel Index, Outlier detection, Stroke outcome, modified Rankin Scale",
author = "Lin, {Ching Heng} and Hsu, {Kai Cheng} and Johnson, {Kory R.} and Luby, Marie and Fann, {Yang C.}",
year = "2019",
month = "12",
doi = "10.1016/j.ijmedinf.2019.103988",
language = "English (US)",
volume = "132",
journal = "International Journal of Medical Informatics",
issn = "1386-5056",
publisher = "Elsevier Ireland Ltd",
}

TY - JOUR

T1 - Applying density-based outlier identifications using multiple datasets for validation of stroke clinical outcomes

AU - Lin, Ching Heng

AU - Hsu, Kai Cheng

AU - Johnson, Kory R.

AU - Luby, Marie

AU - Fann, Yang C.

PY - 2019/12

Y1 - 2019/12

AB - Introduction: Clinicians commonly use the modified Rankin Scale (mRS) and the Barthel Index (BI) to measure clinical outcome after stroke. These are potential targets in machine learning models for stroke outcome prediction. Therefore, the quality of the measurements is crucial for training and validation of these models. The objective of this study was to apply and evaluate density-based outlier detection methods for identifying potentially incorrect measurements in multiple large stroke datasets to assess the measurement quality. Method: We applied three density-based outlier detection methods including density-based spatial clustering of applications (DBSCAN), hierarchical DBSCAN (HDBSCAN) and local outlier factor (LOF) based on a large dataset obtained from a nationwide prospective stroke registry in Taiwan. The testing of each method was done by using four different NINDS funded stroke datasets. Result: The DBSCAN achieved a high performance across all mRS values where the highest average accuracy was 99.2 ± 0.7 at mRS of 4 and the lowest average accuracy was 92.0 ± 4.6 at mRS of 3. The LOF also achieved similar performance, however, the HDBSCAN with default parameters setting required further tuning improvement. Conclusion: The density-based outlier detection methods were proven to be promising for validation of stroke outcome measures. The outlier detection algorithm developed from a large prospective registry dataset was effectively applied in four different NINDS stroke datasets with high performance results. The tool developed from this detection algorithm can be further applied to real world datasets to increase the data quality in stroke outcome measures.

KW - Barthel Index

KW - Outlier detection

KW - Stroke outcome

KW - modified Rankin Scale

UR - http://www.scopus.com/inward/record.url?scp=85072803799&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072803799&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2019.103988

DO - 10.1016/j.ijmedinf.2019.103988

M3 - Article

AN - SCOPUS:85072803799

VL - 132

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

SN - 1386-5056

M1 - 103988

ER -