Item

On Enabling Video Networks for Reliable 2D Time Echocardiography Analysis

Maani, Fadillah Adamsyah
Department
Computer Vision
Embargo End Date
2026-05-21
Type
Thesis
Date
2024
License
Language
English
Collections
Research Projects
Organizational Units
Journal Issue
Abstract
Echocardiography has become an indispensable clinical imaging modality for general heart health assessment. From calculating biomarkers such as ejection fraction to the probability of a patient s heart failure, accurate segmentation of the heart structures allows doctors to assess the heart s condition and devise treatments with greater precision and accuracy. However, achieving accurate and reliable left ventricle segmentation is time consuming and challenging due to different reasons. Hence, clinicians often rely on segmenting the left ventricular (LV) in two specific echocardiogram frames to make a diagnosis. This limited coverage in manual LV segmentation poses a challenge for developing automatic LV segmentation with high temporal consistency, as the resulting dataset is typically annotated sparsely. In response to this challenge, this work introduces SimLVSeg, a novel paradigm that enables video-based networks for consistent LV segmentation from sparsely annotated echocardiogram videos. SimLVSeg consists of self-supervised pre-training with temporal masking, followed by weakly supervised learning tailored for LV segmentation from sparse annotations. We demonstrate how SimLVSeg outperforms the state-of-the-art solutions by achieving a 93.32% (95% CI 93.21-93.43%) dice score on the largest 2D time echocardiography dataset (EchoNet-Dynamic) while being more efficient. SimLVSeg is compatible with two types of video segmentation networks: 2D super image and 3D segmentation. To show the effectiveness of our approach, we provide extensive ablation studies, including pre-training settings and various deep learning backbones. We further conduct an out-of-distribution test to showcase SimLVSeg s generalizability on unseen distribution (CAMUS dataset). Deep learning (DL) models have been advancing automatic medical image analysis on various modalities, including echocardiography, by offering a comprehensive end-to-end training pipeline. This approach enables DL models to regress ejection fraction (EF) directly from 2D+ time echocardiograms, resulting in superior performance. However, the end-to-end training pipeline makes the learned representations less explainable. The representations may also fail to capture the continuous relation among echocardiogram clips, indicating the existence of spurious correlations, which can negatively affect the generalization. To mitigate this issue, we propose CoReEcho, a novel training framework emphasizing continuous representations tailored for direct EF regression. Our extensive experiments demonstrate that CoReEcho: (1) outperforms the current state-of-the-art (SOTA) on the largest echocardiography dataset (EchoNet-Dynamic) with MAE of 3.90 & R2 of 82.44, and (2) provides robust and generalizable features that transfer more effectively in related downstream tasks.
Citation
F. A. Maani, "On Enabling Video Networks for Reliable 2D Time Echocardiography Analysis", M.S. Thesis, Computer Vision, MBZUAI, Abu Dhabi, UAE, 2024
Source
Conference
Keywords
Subjects
Source
Publisher
DOI
Full-text link