Survey on Distributed Techniques Applied to Neural Networks

Abstract

This paper presents a systematic literature review of distributed and parallel techniques for running deep neural networks on multiple machines and GPUs. It is composed of three parts: 1) a review of the available libraries that enable distributed training across GPU clusters, 2) a review of the most popular frameworks for parallelizing the training process on GPUs, and 3) a practical section that demonstrates proof-of-concept implementations of the training process using common frameworks, namely PyTorch DDP and cuDNN.

The review synthesizes research from the past decade, examining various approaches to distributed training, their effectiveness, and their implementation challenges. The work is aimed at students and practitioners, with the goal of providing an introduction to the topic and a general overview of the most common libraries in each domain.

The distributed experiments use data parallelism to accelerate the training process, while the GPU experiments use cuDNN, cuBLAS, and hand-written CUDA kernels to train a small network. The effectiveness of each approach is demonstrated, and Docker environments are provided to aid reproduction and experimentation. These make it possible to simulate a multi-GPU setup on a single NVIDIA GPU, promoting ease of use by not relying on cloud services.
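As a minimal illustration of the data-parallel approach mentioned above, the sketch below spawns two worker processes on a single machine with PyTorch DistributedDataParallel. It is a hypothetical example rather than the paper's actual experiment code, and it uses the "gloo" backend so it runs even without multiple GPUs; the model, data, and hyperparameters are placeholders.

```python
# Minimal data-parallel training sketch with PyTorch DDP (hypothetical example,
# not the paper's code). Two processes on one machine, "gloo" backend.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Small placeholder model; DDP all-reduces gradients across processes
    # during backward(), so every rank ends each step with identical weights.
    model = DDP(torch.nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    # Each rank trains on its own (random) shard of data.
    inputs = torch.randn(32, 10)
    targets = torch.randn(32, 1)
    for _ in range(5):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()          # gradients are synchronized here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```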
