MIMTP: Mamba-Driven Interaction-Aware Multi-Modal Trajectory Prediction for Autonomous Driving

Abstract

Accurate prediction of future vehicle trajectories is essential for safe and reliable decision-making in autonomous driving systems. However, existing deep learning-based approaches have several limitations. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) struggle to model long-term temporal dependencies and complex agent interactions, while Transformer-based architectures incur high computational cost because self-attention scales quadratically with sequence length. To overcome these challenges, this paper proposes an efficient Mamba-based feature extraction framework that jointly encodes vehicle trajectories and map information. By leveraging state-space modeling and a selective scanning mechanism, the proposed approach captures long-range dependencies and improves the representation of complex traffic behaviors. Specifically, raw scene data are first normalized and embedded into a unified feature space. A Mamba Encoder then extracts high-level features from historical vehicle trajectories and map elements. Next, Vehicle-Vehicle and Vehicle-Map interaction modules explicitly model the dynamic interactions among traffic participants and between vehicles and the surrounding map. The resulting high-dimensional features are fused by a second Mamba Encoder, and a Global Interaction Module captures scene-level dependencies. Finally, a Gated Recurrent Unit (GRU) decoder generates multi-modal future trajectory predictions. Experimental results on the Argoverse 1 dataset show that the proposed method achieves superior performance in terms of minADE, minFDE, and minMR while maintaining high computational efficiency.

OPEN ACCESS. Received: 28/01/2026. Accepted: 16/04/2026.
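
The abstract says raw scene data are "normalized" before embedding but does not specify how. A common choice in trajectory prediction is an agent-centric frame: translate everything so the target vehicle's last observed position is the origin, then rotate so its current heading points along +x, applying the same transform to map points. The sketch below assumes exactly that scheme; the function names `agent_frame` and `to_agent_frame` and the heading estimate from the last displacement are hypothetical, not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def agent_frame(history: List[Point]) -> Tuple[Point, float]:
    """Hypothetical agent-centric frame: origin at the last observed position,
    heading taken from the last displacement of the trajectory."""
    if len(history) < 2:
        raise ValueError("need at least two observed points to estimate a heading")
    (px, py), (ox, oy) = history[-2], history[-1]
    return (ox, oy), math.atan2(oy - py, ox - px)


def to_agent_frame(points: List[Point], origin: Point, theta: float) -> List[Point]:
    """Translate points by -origin, then rotate by -theta so the heading maps to +x."""
    ox, oy = origin
    c, s = math.cos(theta), math.sin(theta)
    return [((x - ox) * c + (y - oy) * s, -(x - ox) * s + (y - oy) * c) for x, y in points]


# Usage: a vehicle driving north (+y) in global coordinates.
history = [(10.0, 5.0), (10.0, 6.0), (10.0, 7.0)]
lane_points = [(11.0, 7.0), (11.0, 9.0)]  # hypothetical lane polyline to the vehicle's right

origin, theta = agent_frame(history)
print(to_agent_frame(history, origin, theta))      # trajectory now runs along +x
print(to_agent_frame(lane_points, origin, theta))  # lane now lies at y close to -1
```

Sharing one transform between trajectories and map elements keeps their relative geometry intact, which is what later interaction modules would rely on.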
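
The abstract attributes the encoder's handling of long-range dependencies to state-space modeling with selective scanning. To make that mechanism concrete, here is a toy single-channel version of the selective (S6-style) recurrence popularized by Mamba: the step size delta_t and the projections B_t and C_t depend on the current input, so the model decides per step how much past state to keep. It assumes a diagonal, negative A, zero-order-hold discretization for A, and the simplified delta_t * B_t term for the input. It omits the convolution, gating, multi-channel projections, and hardware-aware parallel scan of a real Mamba block, and all parameter names are hypothetical; it is not the MIMTP encoder.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); keeps the step size delta_t positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: Sequence[float],
    a_diag: Sequence[float],
    w_delta: float,
    b_delta: float,
    w_b: Sequence[float],
    w_c: Sequence[float],
) -> List[float]:
    """Toy single-channel selective state-space recurrence.

    For each step t, with input-dependent delta_t, B_t and C_t:
        h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t   (A diagonal, elementwise)
        y_t = sum_n C_t[n] * h_t[n]
    """
    if any(a >= 0.0 for a in a_diag):
        raise ValueError("diagonal entries of A must be negative for a decaying state")
    h = [0.0] * len(a_diag)
    ys: List[float] = []
    for x in xs:
        delta = softplus(w_delta * x + b_delta)  # selective (input-dependent) step size
        b_t = [w * x for w in w_b]               # input-dependent B_t (linear in x_t)
        c_t = [w * x for w in w_c]               # input-dependent C_t (linear in x_t)
        h = [math.exp(delta * a) * h_n + delta * b_n * x
             for a, h_n, b_n in zip(a_diag, h, b_t)]
        ys.append(sum(c_n * h_n for c_n, h_n in zip(c_t, h)))
    return ys


# Usage: a short 1D feature sequence, e.g. one coordinate of a normalized track.
ys = selective_scan(
    xs=[1.0, 0.5, -0.2, 0.8],
    a_diag=[-1.0, -0.1],
    w_delta=1.5,
    b_delta=0.0,
    w_b=[1.0, 0.5],
    w_c=[0.3, 1.0],
)
print(ys)
```

A larger delta_t shrinks exp(delta_t * A), so the state forgets more and weights the current input more heavily; a small delta_t preserves the state across many steps. Because each step is a constant-cost update, the scan runs in time linear in sequence length, unlike quadratic self-attention.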
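
The reported metrics compare a set of K predicted trajectories against the single ground-truth future. On Argoverse 1 the standard setting is K = 6 modes over a 3 s horizon, with a scenario counted as a miss when no predicted endpoint lies within 2.0 m of the ground-truth endpoint. The sketch below uses the common textbook definitions (best ADE over modes, best FDE over modes, miss rate from the best endpoint); the official Argoverse evaluation code may select the best mode differently, so treat these numbers as illustrative. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def _dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average displacement error over all future time steps."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same horizon")
    return sum(_dist(p, q) for p, q in zip(pred, gt)) / len(gt)


def fde(pred: Trajectory, gt: Trajectory) -> float:
    """Final displacement error at the last time step."""
    return _dist(pred[-1], gt[-1])


def evaluate(
    predictions: Sequence[Sequence[Trajectory]],
    ground_truths: Sequence[Trajectory],
    miss_threshold: float = 2.0,
) -> Tuple[float, float, float]:
    """Return (minADE, minFDE, miss rate) averaged over scenarios.

    predictions[i] holds the candidate trajectories for scenario i; the
    scenario is a miss if every candidate endpoint is farther than
    miss_threshold metres from the ground-truth endpoint.
    """
    if not ground_truths or len(predictions) != len(ground_truths):
        raise ValueError("need one set of predictions per ground-truth scenario")
    min_ades: List[float] = []
    min_fdes: List[float] = []
    misses = 0
    for modes, gt in zip(predictions, ground_truths):
        best_fde = min(fde(m, gt) for m in modes)
        min_ades.append(min(ade(m, gt) for m in modes))
        min_fdes.append(best_fde)
        if best_fde > miss_threshold:
            misses += 1
    n = len(ground_truths)
    return sum(min_ades) / n, sum(min_fdes) / n, misses / n


# Usage: two scenarios sharing the same ground-truth future.
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
shifted = [(x, y + 3.0) for x, y in gt]  # a candidate 3 m off at every step
scenario_a = [gt, shifted]               # one candidate matches exactly
scenario_b = [shifted]                   # the only candidate misses
print(evaluate([scenario_a, scenario_b], [gt, gt]))  # (1.5, 1.5, 0.5)
```

Taking the minimum over modes rewards a multi-modal decoder for covering the true outcome with at least one hypothesis, which is why these metrics are paired with a fixed K.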
