MBZUAI Institutional Repository

Recent Submissions

  • Item | Open Access
    Byzantine-Robust Optimization under (L0, L1)-Smoothness
    (ML Research Press, 2026) Bolatov, Arman; Horvath, Samuel; Takac, Martin; Gorbunov, Eduard
    We consider distributed optimization under Byzantine attacks in the presence of (L0, L1)-smoothness, a generalization of standard L-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by (L0, L1)-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of O(K^{-1/4}) up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic (L0, L1)-smooth optimization, and character-level language modeling with a small GPT model demonstrates the effectiveness of our approach against various Byzantine attack strategies. An ablation study further shows that Byz-NSGDM is robust across a wide range of momentum and learning rate choices.
  • Item | Open Access
    Learning in the Null Space: Small Singular Values for Continual Learning
    (ML Research Press, 2026) Pham, Cuong Anh; Vepakomma, Praneeth; Horvath, Samuel
    Alleviating catastrophic forgetting while enabling further learning is a primary challenge in continual learning (CL). Orthogonal-based training methods have gained attention for their efficiency and strong theoretical properties, and many existing approaches enforce orthogonality through gradient projection. In this paper, we revisit orthogonality and exploit the fact that small singular values correspond to directions that are nearly orthogonal to the input space of previous tasks. Building on this principle, we introduce NESS (Null-space Estimated from Small Singular values), a CL method that applies orthogonality directly in the weight space rather than through gradient manipulation. Specifically, NESS constructs an approximate null space using the smallest singular values of each layer’s input representation and parameterizes task-specific updates via a compact low-rank adaptation (LoRA-style) formulation constrained to this subspace. The subspace basis is fixed to preserve the null-space constraint, and only a single trainable matrix is learned for each task. This design ensures that the resulting updates remain approximately in the null space of previous inputs while enabling adaptation to new tasks. Our theoretical analysis and experiments on three benchmark datasets demonstrate competitive performance, low forgetting, and stable accuracy across tasks, highlighting the role of small singular values in continual learning. The code is available at https://github.com/pacman-ctm/NESS.
  • Item | Open Access
    Selective Collaboration for Robust Federated Learning
    (ML Research Press, 2026) Tupitsa, Nazarii; Horvath, Samuel; Takac, Martin; Gorbunov, Eduard
    Federated Learning (FL) revolutionizes machine learning by enabling model training across decentralized data sources without aggregating sensitive client data. However, the inherent heterogeneity of client data presents unique challenges, as not all client contributions positively impact model performance. In this work, we propose a novel algorithm, Merit-Based Federated Averaging (MeritFed), which dynamically assigns aggregation weights to clients based on their data distribution’s relevance to a target objective. By leveraging stochastic gradients and solving an auxiliary optimization problem, our method adaptively identifies beneficial collaborators, ensuring efficient and robust learning. We establish theoretical convergence guarantees under mild assumptions and demonstrate that MeritFed achieves superior convergence by harnessing the advantages of diverse yet complementary datasets. Empirical evaluations highlight its ability to mitigate the adverse effects of outlier and adversarial clients, paving the way for more effective and resilient FL in heterogeneous environments.
  • Item | Open Access
    What Scalable Second-Order Information Knows for Pruning at Initialization
    (ML Research Press, 2026) Navarrete, Ivo Gollini; Avila, Nicolas Mauricio Cuadrado; Takac, Martin; Horvath, Samuel
    Pruning remains an effective strategy for reducing both the costs and environmental impact associated with deploying large neural networks (NNs) while maintaining performance. Classical methods, such as OBD [1] and OBS [2], demonstrate that utilizing curvature information can significantly enhance the balance between network complexity and performance. However, the computation and storage of the Hessian matrix make it impractical for modern NNs, motivating the use of approximations. Recent research [3, 4] suggests that the top eigenvalues guide optimization in a small subspace, are identifiable early, and remain consistent during training. Motivated by these findings, we revisit pruning at initialization (PaI) to evaluate scalable, unbiased second-order approximations, such as the Empirical Fisher and Hutchinson diagonals. Our experiments show that these methods capture sufficient curvature information to improve the identification of critical parameters compared to first-order baselines, while maintaining linear complexity. Additionally, we empirically demonstrate that updating batch normalization statistics as a warmup phase improves the performance of data-dependent criteria and mitigates the issue of layer collapse. Notably, Hutchinson-based criteria consistently outperformed or matched existing PaI algorithms across various models (including VGG, ResNet, and ViT) and datasets (such as CIFAR-10/100, TinyImageNet, and ImageNet). Our findings suggest that scalable second-order approximations strike an effective balance between computational efficiency and accuracy, making them a valuable addition to the pruning toolkit. We make our code available.
  • Item | Metadata only
    Special Issue: 13th International Conference on Computational Advances in Bio and Medical Sciences
    (SAGE Publications, 2026-05-13) Alser, Mohammed; Bansal, Mukul S; Khudyakov, Yury; Mangul, Serghei; Mandoiu, Ion I; Moussa, Marmar R; Patterson, Murray; Rajasekaran, Sanguthevar; Skums, Pavel; Yooseph, Shibu; Zelikovsky, Alexander
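
The first abstract in the list above describes a normalized stochastic gradient method with momentum combined with Byzantine-robust aggregation. As a rough illustration of that idea only, the sketch below runs a normalized momentum step on a toy quadratic. It is not the authors' Byz-NSGDM: the function names, the toy objective, the attack, and the hyperparameters are all hypothetical, a plain coordinate-wise median stands in for the NNM-enhanced aggregator, and applying momentum after aggregation is a simplifying assumption.

```python
import math
import statistics


def coordinate_median(vectors):
    """Coordinate-wise median: a simple stand-in for a robust aggregator (not NNM)."""
    return [statistics.median(coords) for coords in zip(*vectors)]


def normalized_momentum_step(x, momentum, grad, lr=0.02, beta=0.9):
    """Update the momentum buffer, then move a distance lr along -momentum/||momentum||."""
    momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    norm = math.sqrt(sum(m * m for m in momentum))
    if norm > 0.0:
        x = [xi - lr * m / norm for xi, m in zip(x, momentum)]
    return x, momentum


def run_demo(steps=500):
    # Hypothetical toy objective f(x) = 0.5 * ||x||^2, so the honest gradient is x itself.
    x = [0.6, -0.8]
    momentum = [0.0, 0.0]
    for _ in range(steps):
        honest = [list(x) for _ in range(4)]       # four honest workers report the true gradient
        byzantine = [[-100.0 * xi for xi in x]]     # one worker reports a flipped, scaled gradient
        grad = coordinate_median(honest + byzantine)
        x, momentum = normalized_momentum_step(x, momentum, grad)
    return x


if __name__ == "__main__":
    final = run_demo()
    print("final distance to optimum:", math.sqrt(sum(v * v for v in final)))
```

Because the update direction is normalized, every step has length `lr` regardless of gradient magnitude, which is why such methods are a natural fit when the local smoothness constant grows with the gradient norm. In this toy setup the median ignores the single outlier because the four honest workers agree exactly; real heterogeneous data would not be this forgiving.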

Communities in MBZUAI iRep

Select a community to browse its collections.