
Abstract

Multimodal medical imaging plays a pivotal role in clinical diagnostics by integrating complementary anatomical and functional information from modalities such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and Single-Photon Emission Computed Tomography (SPECT). Despite notable progress, existing fusion approaches continue to face persistent challenges. Convolutional Neural Network (CNN)-based methods often suffer from information loss due to convolutional down-sampling, while Transformer architectures, though effective at capturing global dependencies, incur high computational costs and rely on large-scale pretraining. Generative Adversarial Network (GAN)-based fusion models can generate visually realistic outputs but are prone to training instability and limited reproducibility. In addition, prior studies frequently adopt inconsistent evaluation metrics, with insufficient emphasis on clinical interpretability and robustness, hindering real-world deployment across heterogeneous datasets and institutions. To address these limitations, this study proposes a U-shaped Nested Network – Restoration Transformer (U2Net–Restormer) framework with a Dilated Dense Encoder–Decoder architecture for robust multimodal medical image fusion. The framework integrates hierarchical multiscale representation learning with residual global contextual refinement. To enhance discriminative capability, an optimized Haar-based feature selection strategy is introduced to preserve high-gradient structural and functional details while reducing feature redundancy. Furthermore, an attention-driven fusion mechanism adaptively weights modality-specific contributions, enabling effective integration of heterogeneous information. The proposed method is evaluated on the Augmented Alzheimer’s Neuroimaging Library (AANLIB) multimodal brain imaging dataset, covering CT–MRI, PET–MRI, and SPECT–MRI fusion tasks. Experimental results demonstrate consistent performance gains over state-of-the-art CNN-, Transformer-, and GAN-based methods, achieving Structural Similarity Index Measure (SSIM) up to 0.963, Peak Signal-to-Noise Ratio (PSNR) of 42.1 dB, Feature Mutual Information (FMI) of 0.86, and Edge Preservation Index (EPI) of 0.91, with improvements of at least 4%–6% across modalities. Subjective evaluations by radiologists and neurologists report Likert scores up to 4.8/5 for structural visibility, functional fidelity, and diagnostic value. Robustness analysis under Gaussian noise (σ = 15%) further confirms the method’s resilience. Overall, the proposed framework delivers high-fidelity, clinically interpretable multimodal fusion suitable for diverse imaging scenarios.
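
As an illustration of the attention-driven fusion idea described above, the following is a minimal PyTorch sketch, not the authors' implementation: the two-modality setting, the layer widths, and the shared convolutional gate are assumptions made only for the example.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Minimal sketch: a shared convolutional gate scores each modality's
    feature map, a softmax across modalities turns the scores into spatial
    weights, and the fused features are the weighted sum."""

    def __init__(self, channels: int):
        super().__init__()
        # hypothetical gating branch; channel widths are illustrative only
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
        )

    def forward(self, feat_mri: torch.Tensor, feat_pet: torch.Tensor) -> torch.Tensor:
        # per-modality spatial scores, shape (B, 1, H, W)
        scores = torch.cat([self.gate(feat_mri), self.gate(feat_pet)], dim=1)
        w = torch.softmax(scores, dim=1)  # adaptive modality weights summing to 1
        return w[:, :1] * feat_mri + w[:, 1:] * feat_pet

# usage: fuse encoder features of matching shape from two modalities
fuse = AttentionFusion(channels=64)
fused = fuse(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
```

The objective scores quoted in the abstract (SSIM and PSNR) can be reproduced for any fused/reference image pair with standard tooling such as scikit-image; FMI and EPI usually require dedicated implementations and are omitted from this sketch.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def fusion_scores(reference: np.ndarray, fused: np.ndarray) -> dict:
    """SSIM and PSNR between a reference slice and the fused slice,
    both assumed to be 2-D arrays scaled to [0, 1]."""
    return {
        "SSIM": structural_similarity(reference, fused, data_range=1.0),
        "PSNR": peak_signal_noise_ratio(reference, fused, data_range=1.0),
    }
```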



Document information

Published on 22/03/26
Accepted on 15/01/26
Submitted on 10/11/25

Volume Online First, 2026
DOI: 10.23967/j.rimni.2026.10.75903
Licence: CC BY-NC-SA
