MBZUAI Institutional Repository

Recent Submissions

  • Item
    Beamwidth-Adaptive ISAC Beamforming: A Joint Optimization Framework for Detection and Communication
    (IEEE, 2025-09-11) Fu, Zengchang; Yuan, Jide; Yang, Yuli; Guizani, Mohsen
    In this paper, we study sensing beamwidth-adaptive beamforming in an integrated sensing and communication (ISAC) system, aiming to illuminate the communication user so as to assist in detecting adjacent targets. To this end, we consider the beamwidth variability of the sensing beam as a key concept of the beamformer. The ratio of the beampattern gain summed over the desired angular coverage to that summed over the entire angular range, namely the effective beampattern gain ratio (EBGR), is adopted as the objective characterizing the sensing performance. The proposed framework takes the communication signal-to-interference-plus-noise ratio (SINR), the total transmit power limit, and beampattern matching into account. To address the intrinsic non-convexity introduced by the EBGR and the SINR, we employ Dinkelbach's method to transform the fractional objective, and obtain a high-quality beamforming solution by applying a semidefinite relaxation (SDR)-based approach. A regularized zero-forcing (RZF)-based approach is further proposed to reduce complexity. Numerical results validate the effectiveness of our beamwidth-adaptive design in various scenarios. (See the Dinkelbach sketch after this list.)
  • Item
    Easz: An Agile Transformer-based Image Compression Framework for Resource-constrained IoTs
    (IEEE, 2025-09-15) Mao, Yu; Li, Jingzong; Wang, Jun; Xu, Hong; Kuo, Tei-Wei; Guan, Nan; Xue, Chun Jason
    Neural image compression, necessary in various machine-to-machine communication scenarios, suffers from heavy encode-decode structures and inflexibility in switching between different compression levels. Consequently, it is challenging to apply neural image compression, developed for powerful servers with high computational and storage capacities, to edge devices. We take a step toward solving these challenges by proposing a new transformer-based, edge-compute-free image coding framework called Easz. Easz shifts the computational overhead to the server, and hence avoids the heavy encoding and model-switching overhead on the edge. Easz utilizes a patch-erase algorithm to selectively remove image contents using a conditional uniform-based sampler; the erased pixels are reconstructed on the receiver side through a transformer-based framework. To further reduce the computational overhead on the receiver side, we introduce a lightweight transformer-based reconstruction structure. Extensive evaluations conducted on a real-world testbed demonstrate multiple advantages of Easz over existing compression approaches in terms of adaptability to different compression levels, computational efficiency, and image reconstruction quality. (See the patch-erase sketch after this list.)
  • Item
    Optimising Vision Transformer Performance on Limited Datasets: A Multi-Gradient Approach
    (IEEE, 2025-09-15) Ali, Mohsin; Raza, Haider; Gan, John Q.; Khan, Muhammad Haris
    Vision Transformers (ViTs) are well known for capturing the global context of images using Multi-head Self-Attention (MHSA). However, compared to Convolutional Neural Networks (CNNs), ViTs typically exhibit a weaker inductive bias and require a larger volume of training image data to learn local feature representations. While various methods, such as the integration of CNN features or advanced pre-training strategies, have been proposed to introduce this inductive bias, they often require significant architectural modifications or rely heavily on expansive pre-training datasets. This paper introduces a novel approach for training ViTs on limited datasets without altering the ViT architecture. We propose the Multi-Gradient Image Transformer (MGiT), which utilizes a parallel training method with a compact auxiliary ViT to adaptively optimize the weights of the target ViT. This approach yields significant performance improvements across diverse datasets and training scenarios. Our findings demonstrate that MGiT enhances ViT efficiency more effectively than traditional training methods. Furthermore, the application of Jensen-Shannon (JS) divergence validates the convergence and alignment of feature understanding between the primary and auxiliary ViTs, thereby stabilizing the training process. (See the JS-divergence sketch after this list.) The code is available at https://github.com/game-sys/Multi-Gradient-Image-Transformer-MGiT-
  • Item
    Pureformer: Transformer-Based Image Denoising
    (IEEE, 2025-09-15) Gautam, Arnim; Pawar, Aditi; Joshi, Aishwarya; Tazi, Satyanarayan Narayan; Chaudhary, Sachin; Dudhane, Akshay A.; Vipparthi, Santosh Kumar; Murala, Subrahmanyam
    Image denoising is a crucial task in computer vision, with applications in real-world smartphone image processing, remote sensing, and photography. Traditional convolutional neural networks (CNNs) often struggle to reduce noise while preserving fine details due to their limited receptive fields. Transformer-based approaches, such as Restormer, improve long-range feature modeling, while PromptIR enhances local feature refinement. However, existing methods still face challenges in effectively integrating multi-scale features for robust noise reduction. We propose Pureformer, a Transformer-based encoder-decoder architecture specifically designed for image denoising. The model employs a four-level encoder-decoder structure, where each stage utilizes Multi-Dconv Head Transposed Attention (MDTA) and a Gated-Dconv Feed-Forward Network (GDFN) to extract and refine multi-scale features. We also propose a feature enhancer block in the latent space that expands the receptive field using a spatial filter bank, improving feature fusion and texture restoration. Skip connections between the encoder and decoder help retain spatial information, ensuring high-fidelity reconstruction. Pureformer is evaluated on the NTIRE 2025 Image Denoising Challenge dataset, achieving a test PSNR of 29.64 dB and an SSIM of 0.8601. We also validate Pureformer on the existing benchmark datasets BSD68 and Urban100. The results demonstrate that Pureformer surpasses existing methods in both noise reduction and detail preservation, making it a strong choice for real-world image denoising. (See the MDTA sketch after this list.) Access our codes and models at https://github.com/Chapstick53/NTIRE2025_cipher_vision.
  • Item
    FusedVision: A Knowledge-Infusing Approach for Practical Anomaly Detection in Real-World Surveillance Videos
    (IEEE, 2025-09-15) Dawoud, Khaled; Zaheer, Muhammad Zaigham; Khan, Mustaqeem; Nandakumar, Karthik; Saddik, Abdulmotaleb El; Khan, Muhammad Haris
    Object-centric approaches have gained attention as effective one-class classification methods for detecting anomalies in videos. These approaches rely on an object detector to isolate all objects in the frames and subsequently leverage either the objects themselves or their interactions to train a learning system. In this study, we put forth a novel perspective on anomaly detection by proposing a branched network architecture that employs both an object detector and a normalcy learning model, working in tandem to identify anomalies within the data more effectively. Through extensive experimentation, we analyze the optimal fusion mechanism as well as the anomaly scoring proposed in our branched approach. Our approach is more practical for real-world applications of anomaly detection, where infusing knowledge about anticipated anomalies can improve performance while nonetheless maintaining baseline performance. To evaluate the general applicability of our approach, we integrate it with multiple recent anomaly detection methods and assess its efficacy on three widely used anomaly detection datasets: ShanghaiTech, Avenue, and Ped2. Our proposed approach noticeably outperforms existing methods, demonstrating its effectiveness in detecting anomalies across a range of contexts. (See the score-fusion sketch after this list.) The implementation of our method is available at https://github.com/kdawoud91/FusedVision.
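
Sketch for "Beamwidth-Adaptive ISAC Beamforming": the abstract names Dinkelbach's method for the fractional EBGR objective but does not give the exact subproblem. The toy below is a minimal, self-contained illustration of the Dinkelbach iteration that maximizes a ratio of quadratic forms over unit-norm beamformers; the matrices A and B, standing in for gain accumulated over the desired and the entire angular coverage, are our assumption, and each parametric subproblem is solved by an eigenvector computation rather than the paper's constrained SDR.

```python
import numpy as np

def dinkelbach_ratio_max(A, B, tol=1e-9, max_iter=100):
    """Maximize (w^T A w) / (w^T B w) over unit-norm w via Dinkelbach's method.

    A and B are hypothetical stand-ins for beampattern gain accumulated
    over the desired and the entire angular coverage (the EBGR's numerator
    and denominator). Each parametric subproblem
        max_w  w^T (A - lam * B) w   s.t.  ||w|| = 1
    is solved exactly by the leading eigenvector; the paper instead uses
    an SDR-based approach for its SINR/power-constrained problem.
    """
    n = A.shape[0]
    w = np.ones(n) / np.sqrt(n)              # feasible starting beamformer
    lam = (w @ A @ w) / (w @ B @ w)          # current ratio value
    for _ in range(max_iter):
        _, vecs = np.linalg.eigh(A - lam * B)
        w = vecs[:, -1]                      # maximizer of the subproblem
        f = w @ A @ w - lam * (w @ B @ w)    # F(lam); zero at the optimum
        lam = (w @ A @ w) / (w @ B @ w)      # Dinkelbach update of lam
        if abs(f) < tol:
            break
    return w, lam

# Toy example with random positive semidefinite matrices
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8)); A = M @ M.T
M = rng.standard_normal((8, 8)); B = M @ M.T + 8 * np.eye(8)
w_opt, ratio = dinkelbach_ratio_max(A, B)
print("optimal gain ratio:", ratio)
```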
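
Sketch for "Easz": the patch-erase algorithm uses a conditional uniform-based sampler whose exact conditioning is not given in the abstract, so the sketch below simply erases a uniformly sampled subset of non-overlapping patches on the edge device and returns the mask a server-side transformer would need for reconstruction. The erase policy and all names are assumptions.

```python
import numpy as np

def patch_erase(img, patch=16, keep_ratio=0.5, rng=None):
    """Hypothetical rendering of Easz's edge-side patch-erase step: split
    the image into non-overlapping patches and zero out a uniformly
    sampled subset; the erased pixels would be reconstructed server-side
    by the transformer-based framework."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    gh, gw = h // patch, w // patch          # patch-grid dimensions
    n = gh * gw
    keep = rng.choice(n, size=int(keep_ratio * n), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[keep] = True                        # True = patch is transmitted
    out = img.copy()
    for idx in np.flatnonzero(~mask):        # erase the remaining patches
        r, c = divmod(idx, gw)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out, mask.reshape(gh, gw)

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
erased, mask = patch_erase(img, patch=16, keep_ratio=0.5)
```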
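
Sketch for "MGiT": the abstract uses Jensen-Shannon divergence to validate alignment between the target and auxiliary ViTs. Below is a standard JS-divergence computation between the two models' predictive distributions in PyTorch; how MGiT wires this into its multi-gradient update is not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def js_divergence(logits_p, logits_q):
    """JS divergence between the predictive distributions of the target
    and auxiliary ViTs, averaged over the batch."""
    p, q = F.softmax(logits_p, dim=-1), F.softmax(logits_q, dim=-1)
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(1e-12).log()
                            - b.clamp_min(1e-12).log())).sum(-1)
    return (0.5 * kl(p, m) + 0.5 * kl(q, m)).mean()

logits_target = torch.randn(4, 10)  # class logits from the target ViT
logits_aux = torch.randn(4, 10)     # class logits from the auxiliary ViT
print(js_divergence(logits_target, logits_aux))  # low value = aligned
```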
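
Sketch for "Pureformer": a minimal single-head PyTorch rendering of Multi-Dconv Head Transposed Attention (MDTA), the attention block named in the abstract (originally from Restormer). Attention is computed across channels rather than spatial positions, so the cost grows linearly with resolution; the multi-head version, the GDFN, and Pureformer's actual hyperparameters are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDTA(nn.Module):
    """Single-head sketch of Multi-Dconv Head Transposed Attention:
    1x1 then depthwise 3x3 convolutions produce q, k, v, and attention
    is taken across channels (a c-by-c map)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.dwconv = nn.Conv2d(dim * 3, dim * 3, kernel_size=3,
                                padding=1, groups=dim * 3)
        self.temperature = nn.Parameter(torch.ones(1))
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.dwconv(self.qkv(x)).chunk(3, dim=1)
        q = F.normalize(q.flatten(2), dim=-1)                # (b, c, h*w)
        k = F.normalize(k.flatten(2), dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (b, c, c)
        out = attn.softmax(dim=-1) @ v.flatten(2)            # mix channels
        return self.proj(out.reshape(b, c, h, w)) + x        # residual

x = torch.randn(1, 32, 64, 64)
print(MDTA(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```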
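
Sketch for "FusedVision": the paper experimentally searches for the optimal fusion of its two branches. As one illustrative candidate (an assumption, not the paper's chosen mechanism), the sketch below min-max normalizes the per-frame scores of the normalcy branch and the detector branch and blends them with a convex weight alpha.

```python
import numpy as np

def fused_anomaly_score(s_normalcy, s_detector, alpha=0.5):
    """One candidate fusion: min-max normalize each branch's per-frame
    scores and blend with a convex weight. s_normalcy comes from the
    one-class normalcy model, s_detector from the object-detector branch
    carrying knowledge about anticipated anomalies."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-8)
    return alpha * norm(s_normalcy) + (1 - alpha) * norm(s_detector)

# toy per-frame scores for a five-frame clip
s_n = np.array([0.10, 0.20, 0.90, 0.80, 0.15])
s_d = np.array([0.00, 0.10, 0.70, 0.95, 0.05])
print(fused_anomaly_score(s_n, s_d))
```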
