Abstract
To address the difficulty of preserving data for dynamic visualization in fetal ultrasound screening, a novel framework is proposed for generating fetal four-chamber echocardiogram videos through multi-source visual fusion and understanding. The framework uses a spectrogram–ultrasound synchronizer to align ultrasound images in time, ensuring that the generated video matches the actual heartbeat rhythm. It further employs frame interpolation with nonlinear bidirectional motion prediction to synthesize video, and integrates a Transformer model for autoregressive generation of the visual semantic sequence, enabling high-resolution frame generation. Experimental results show a CLIP similarity of 96.23% and a DINOv2 similarity of 99.77%. In addition, a multimodal dataset of fetal echocardiogram examinations has been constructed.
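The reported CLIP and DINOv2 similarity scores are, in the usual setup, cosine similarities between embeddings of generated and reference frames produced by the respective image encoders. A minimal sketch of that metric, assuming precomputed embedding arrays (the encoders themselves, and the function name `embedding_similarity`, are illustrative placeholders, not the paper's implementation):

```python
import numpy as np

def embedding_similarity(gen_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Mean cosine similarity between paired frame embeddings.

    gen_emb, ref_emb: (num_frames, dim) arrays of image embeddings,
    e.g. outputs of a CLIP or DINOv2 image encoder (not included here).
    """
    # L2-normalize each embedding, then average the per-frame dot products.
    gen = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(gen * ref, axis=1)))

# Toy data standing in for encoder outputs of 8 frames, 512-dim each.
rng = np.random.default_rng(0)
gen_frames = rng.normal(size=(8, 512))
score_self = embedding_similarity(gen_frames, gen_frames)  # identical frames give 1.0
```

A similarity near 100%, as reported in the abstract, indicates that generated frames are nearly indistinguishable from real ones in the encoder's feature space.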
Original language | English
---|---
Article number | 102510
Journal | Information Fusion
Volume | 111
DOIs | 
Publication status | Published - Nov 2024
Keywords
- Cross-modal synchronization
- Fetal echocardiogram scenario
- Multi-source visual fusion and understanding
- Transformer model
- Visual data generation
ASJC Scopus subject areas
- Software
- Signal Processing
- Information Systems
- Hardware and Architecture