2024-04-09 DLI Training Series - Model Parallelism - Building and Deploying Large Neural Networks (hdli1s24)

Course	DLI Training Series - Model Parallelism - Building and Deploying Large Neural Networks
Number	hdli1s24
Available places	11
Date	09.04.2024 – 09.04.2024
Price	EUR 0.00
Location	Leibniz Rechenzentrum Boltzmannstr. 1 85748 Garching b. München
Room	Kursraum 2
Registration deadline	26.03.2024 23:59
E-mail	education@lrz.de

This is an on-site course at LRZ in Garching near Munich. It is not possible to join remotely via video conference.

Participants are expected to bring their own laptops running the latest version of Chrome or Firefox. There are no PCs installed in the course room!

Large language models (LLMs) and deep neural networks (DNNs), whether applied to natural language processing (e.g., GPT-3), computer vision (e.g., huge Vision Transformers), or speech AI (e.g., wav2vec 2.0), have certain properties that set them apart from their smaller counterparts. As LLMs and DNNs become larger and are trained on progressively larger datasets, they can adapt to new tasks with just a handful of training examples, accelerating the route toward general artificial intelligence. Training models that contain tens to hundreds of billions of parameters on vast datasets isn’t trivial and requires a unique combination of AI, high-performance computing (HPC), and systems knowledge. The goal of this course is to demonstrate how to train the largest of neural networks and deploy them to production.

The course is part of a training series co-organised by LRZ and NVIDIA Deep Learning Institute (DLI). All instructors are NVIDIA certified University Ambassadors.

Learning Objectives

By participating in this workshop, you’ll learn how to:

Scale training and deployment of LLMs and neural networks across multiple nodes.
Use techniques such as activation checkpointing, gradient accumulation, and various forms of model parallelism to overcome the challenges associated with large-model memory footprint.
Capture and understand training performance characteristics to optimize model architecture.
Deploy very large multi-GPU, multi-node models to production using NVIDIA Triton™ Inference Server.
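
As a rough illustration of one technique named above, gradient accumulation, the following minimal sketch shows the underlying idea: gradients computed on several small micro-batches are averaged so that the result matches the gradient of one large batch that might not fit in memory. It is not course material. It uses a toy one-parameter linear model in plain Python rather than PyTorch, and all names (grad_mse, accumulated_grad, micro_batch_size) are hypothetical and chosen for illustration only.

```python
# Gradient accumulation on a toy model y = w * x with mean squared error.
# Pure Python so it runs without any deep learning framework.
# All function and variable names here are illustrative assumptions.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average per-micro-batch gradients to emulate one large batch."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    num_micro = len(xs) // micro_batch_size
    total = 0.0
    for i in range(num_micro):
        start = i * micro_batch_size
        end = start + micro_batch_size
        # Scale each micro-batch gradient by 1/num_micro before adding it,
        # so the sum equals the mean gradient over the full batch.
        total += grad_mse(w, xs[start:end], ys[start:end]) / num_micro
    return total

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]   # data generated by the "true" weight 2.0
w = 0.5                      # current weight estimate

full = grad_mse(w, xs, ys)                               # one large batch
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # four micro-batches
print(abs(full - accum) < 1e-9)
```

In a real training loop, the same effect is typically obtained by calling the backward pass on each micro-batch and applying the optimizer step only after several of them. Only one micro-batch's activations then need to be held in memory at a time.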

Important information

After you are accepted, please create an account under courses.nvidia.com/join.

Ensure your laptop / PC will run smoothly by going to http://websocketstest.com/. Under Environment, check that WebSockets is listed as supported, and under WebSockets (Port 80), check that Data Receive, Send and Echo Test all show Yes. If there are issues with WebSockets, try updating your browser.

NVIDIA Deep Learning Institute

The NVIDIA Deep Learning Institute delivers hands-on training for developers, data scientists, and engineers. The program is designed to help you get started with training, optimising, and deploying neural networks to solve real-world problems across diverse industries such as self-driving cars, healthcare, online services, and robotics.

Prerequisites

Good understanding of PyTorch
Good understanding of deep learning and data parallel training concepts
Experience with natural language processing is useful, but optional

Hands-On

The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.

Language

English

Lecturers

PD Dr. Juan Durillo Barrionuevo (LRZ, NVIDIA certified University Ambassador)

Prices and Eligibility

The course is open and free of charge for people from academia from the Member States of the European Union (EU) and Associated Countries to the Horizon 2020 programme.

Registration

Please register with your official e-mail address to prove your affiliation.

Withdrawal Policy

See Withdrawal

Legal Notices

For registration for LRZ courses and workshops we use the service edoobox from Etzensperger Informatik AG (www.edoobox.com). Etzensperger Informatik AG acts as processor and we have concluded a Data Processing Agreement with them.

See Legal Notices

No.	Date	Time	Leader	Location	Room	Description
1	09.04.2024	10:00 – 17:00	Juan Durillo Barrionuevo LRZ Events	Leibniz Rechenzentrum	Kursraum 2	Lecture
