Supercomputers at the LRZ
- National Supercomputer SuperMUC Petascale System.
SuperMUC is the name of the new supercomputer at the Leibniz-Rechenzentrum (Leibniz Supercomputing Centre) in Garching near Munich (the MUC suffix is borrowed from the Munich airport code). With more than 155,000 cores and a peak performance of 3 Petaflop/s (= 10^15 floating-point operations per second), SuperMUC is, as of June 2012, one of the fastest supercomputers in the world.
System overview
155,656 processor cores in 9400 compute nodes
>300 TB RAM
Infiniband FDR10 interconnect
4 PB of NAS-based permanent disk storage
10 PB of GPFS-based temporary disk storage
>30 PB of tape archive capacity
Powerful visualization systems
Very high energy efficiency
- C2PAP
A small island of 128 compute nodes with a setup nearly identical to SuperMUC is available for C2PAP. Members of the cluster can apply for access within a well-defined computing project, or as individuals for trial purposes. Contact the C2PAP administrator Aliaksei Krukau and supply your LRZ or LMU account name and your static IP address. More details are given at C2PAP.
- Linux Cluster: HPC system for the Munich and Bavarian universities.
- AMD Opteron-based 2- and 4-way nodes for serial processing
- AMD Opteron-based 8-way nodes with 10GE Myrinet for parallel processing
- Intel Xeon-based 4-way nodes for serial processing
- Intel Nocona-based 2-way nodes with Infiniband interconnect
- Intel Itanium2-based 4-way nodes with Myrinet interconnect
- Intel Itanium2-based 2-way nodes with Gigabit Ethernet interconnect
- Intel Itanium2-based 128-way SGI Altix 3700Bx2 and a 256-way SGI Altix 4700 with ccNUMA shared-memory interconnect
Peak performance of the whole cluster is 10.6 TFlop/s. All systems will be operated as a single batch entity. How to get an account
Update (22.4.2010): A new SGI ICE system with 512 cores based on Intel's Nehalem processors is now generally available to cluster users.
See also: LRZ Overview Computing facilities
CUDA-related projects
E18
E18 contacts: Sebastian Neubert, Florian Böhmer, Bernhard Ketzer, Boris Grube
Project: 5-dimensional Hough-Transform for Helix Track-Finding
Description: A 5-dimensional Hough transform has been implemented for helix track finding. The associated challenge of finding an unknown number of local maxima in the 5-dimensional parameter space is addressed by a tree-search algorithm. This search algorithm features a very favorable ratio of computation to memory access and offers rich potential for parallelization, making it well suited for implementation on a GPU. This project is part of a development aiming at online software pattern recognition in a large particle-physics experiment. With the experience gained during the development so far, a next step would be to examine the scalability on larger arrays of GPUs.
Status: CPU and GPU implementations running; first results show a speed-up of at least a factor of 20.
Memory Usage: On the order of 100 MB
Floating Point Performance: Not yet studied in detail.
Parallelisation: Up to 100,000 completely independent threads at a time (see the sketch below).
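To illustrate the parallelization pattern, here is a minimal, hypothetical CUDA sketch: a plain 2-dimensional straight-line Hough transform in which each thread votes for one accumulator bin by looping over the hit list. This is not the E18 code; the real project works in a 5-dimensional helix parameter space and replaces the brute-force maximum search with a tree search, but the idea of many completely independent, compute-heavy threads is the same. The bin counts, names, and toy data are assumptions for illustration only.

// Hypothetical sketch: 2-D (theta, rho) Hough voting, one thread per accumulator bin.
// The 5-D helix case of the actual project follows the same pattern with more parameters.
#include <cstdio>
#include <cuda_runtime.h>

#define N_THETA 256      // bins in the angle theta
#define N_RHO   256      // bins in the distance rho
#define RHO_MAX 10.0f    // assumed maximum |rho| in the detector

__global__ void houghVote(const float2 *hits, int nHits, int *accumulator)
{
    int bin = blockIdx.x * blockDim.x + threadIdx.x;
    if (bin >= N_THETA * N_RHO) return;

    int iTheta = bin / N_RHO;
    int iRho   = bin % N_RHO;
    float theta = iTheta * 3.14159265f / N_THETA;
    float rhoLo = -RHO_MAX + iRho * (2.0f * RHO_MAX / N_RHO);
    float rhoHi = rhoLo + 2.0f * RHO_MAX / N_RHO;

    // Each thread counts the hits compatible with "its" parameter bin:
    // much arithmetic, little memory traffic, no inter-thread communication.
    int votes = 0;
    for (int i = 0; i < nHits; ++i) {
        float rho = hits[i].x * cosf(theta) + hits[i].y * sinf(theta);
        if (rho >= rhoLo && rho < rhoHi) ++votes;
    }
    accumulator[bin] = votes;
}

int main()
{
    // Toy data: a few hits lying roughly on one straight line.
    const int nHits = 8;
    float2 h_hits[nHits];
    for (int i = 0; i < nHits; ++i) h_hits[i] = make_float2(0.5f * i, 0.25f * i + 1.0f);

    float2 *d_hits;  int *d_acc;
    cudaMalloc(&d_hits, nHits * sizeof(float2));
    cudaMalloc(&d_acc,  N_THETA * N_RHO * sizeof(int));
    cudaMemcpy(d_hits, h_hits, nHits * sizeof(float2), cudaMemcpyHostToDevice);

    int nBins = N_THETA * N_RHO;
    houghVote<<<(nBins + 255) / 256, 256>>>(d_hits, nHits, d_acc);

    // Pick the best bin on the host; the real project replaces this brute-force
    // scan with a tree search over the 5-dimensional parameter space.
    int *h_acc = new int[nBins];
    cudaMemcpy(h_acc, d_acc, nBins * sizeof(int), cudaMemcpyDeviceToHost);
    int best = 0;
    for (int b = 1; b < nBins; ++b) if (h_acc[b] > h_acc[best]) best = b;
    printf("best bin %d with %d votes\n", best, h_acc[best]);

    cudaFree(d_hits); cudaFree(d_acc); delete[] h_acc;
    return 0;
}

Compiled with nvcc, this prints the accumulator bin with the most votes; every bin is handled by a fully independent thread, which is what allows the scheme to scale to very many threads and, potentially, to several GPUs.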
Project: Partial Wave LogLikelihood Fitting in Hadron Spectroscopy.
Description: The models used in state-of-the-art partial-wave decompositions contain up to several hundred fit parameters. To find and refine the best-fitting model and to perform systematic studies on the data, several thousand fits have to be performed for a typical analysis. Modern experiments deliver data samples that exceed current world statistics by an order of magnitude, making fast analysis algorithms mandatory to exploit this additional wealth of information.
Status: Working. Using double precision, speed-ups of more than a factor of 20 have been achieved.
Memory Usage: Up to 1 GB (depending on data sample and fit model); typically around 100 MB.
Parallelization: The evaluation of the log-likelihood (and its gradient) for partial-wave decompositions contains a large sum over events. The calculation of this sum can be parallelized. Heterogeneous programming allows the use of standard minimizers (e.g. Minuit) on the CPU, while the expensive evaluation of the likelihood is delegated to the GPUs. Advanced algorithms with high computational demands, such as Markov-chain Monte Carlo mapping, could also be realized. This project scales well with the number of independent nodes in a cluster. A minimal sketch of the event-sum evaluation is shown below.
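As an illustration of the heterogeneous scheme (minimizer on the CPU, event sum on the GPU), here is a minimal hypothetical CUDA sketch. The two-component toy intensity, the function names (e.g. evalNegLogL), and the generated data are assumptions, not the actual fitter; a real partial-wave fit would substitute the coherent sum of amplitudes and hand the objective function to Minuit.

// Hypothetical sketch: negative log-likelihood summed over events on the GPU.
// One thread computes the log of the model intensity for one event; a device-side
// reduction (Thrust) forms the sum, and only the scalar -logL goes back to the CPU.
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>

__device__ float intensity(float x, float a)
{
    // Toy stand-in for a coherent sum of partial-wave amplitudes:
    // a two-component mixture a*f1(x) + (1-a)*f2(x).
    float f1 = expf(-0.5f * x * x) * 0.3989423f;             // unit Gaussian
    float f2 = (x >= -5.0f && x <= 5.0f) ? 0.1f : 0.0f;      // uniform on [-5, 5]
    return a * f1 + (1.0f - a) * f2;
}

__global__ void logTerms(const float *events, int n, float a, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = logf(intensity(events[i], a));       // one event per thread
}

// Negative log-likelihood for parameter a; a CPU-side minimizer such as Minuit
// would call this function once per parameter set.
float evalNegLogL(const float *d_events, float *d_terms, int n, float a)
{
    logTerms<<<(n + 255) / 256, 256>>>(d_events, n, a, d_terms);
    float sum = thrust::reduce(thrust::device, d_terms, d_terms + n, 0.0f);
    return -sum;
}

int main()
{
    const int n = 1 << 16;                 // toy "data sample"
    float *h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = -5.0f + 10.0f * i / n;

    float *d_events, *d_terms;
    cudaMalloc(&d_events, n * sizeof(float));
    cudaMalloc(&d_terms,  n * sizeof(float));
    cudaMemcpy(d_events, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // A minimizer would scan and refine a; here we simply evaluate a few points.
    for (float a = 0.1f; a < 1.0f; a += 0.2f)
        printf("a = %.1f  -logL = %f\n", a, evalNegLogL(d_events, d_terms, n, a));

    cudaFree(d_events); cudaFree(d_terms); delete[] h;
    return 0;
}

The design point this sketch tries to capture is that the per-event work is embarrassingly parallel, while the (serial) minimization logic stays on the CPU; per minimizer step only a single scalar crosses the PCIe bus.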
Links to other (scientific) groups with expertise on GPU computing
Klaus Schulten, Theoretical and Computational Biophysics Group, University of Illinois
http://www.ks.uiuc.edu/Research/gpu/
GPUcomputing.net is a research and development community that fosters collaborative domain-focused GPU research across disciplines. http://www.gpucomputing.net/
Remote visualization at RZG and LRZ
Just a few links and e-mail contacts to start with:
RZG
Markus Rampp, mjr@rzg.mpg.de Team: visualization@rzg.mpg.de
http://www.rzg.mpg.de/visualisation/remote-visualization
LRZ
Helmut Satzger, helmut.satzger@lrz.de
http://www.lrz-muenchen.de/services/compute/visualisation/index.html