Distributed Training
Note
About these examples: (Click the arrow below to expand)
This section contains some minimal examples of how to run jobs on the Mila cluster. Each example is self-contained and can be run as-is directly on the cluster without error. Each example has the following structure:
job.sh: SLURMsbatchscript. Can be launched withsbatch job.sh.main.py: Example python script.
Some examples are displayed as a difference with respect to a “base” example. For instance, the multi-gpu example is shown as a difference with respect to the single-gpu example.