Decommissioned HLRB II SGI Altix
aka: Höchstleistungsrechner in Bayern II, Bundeshöchstleistungsrechner in Bayern II
Hardware
The HLRB II is based on SGI's Altix 4700 platform. The system installed at LRZ was optimized for high application performance and high memory bandwidth.
The following table provides an overview of the hardware and characteristics of the HLRB II.
Characteristic | Phase 1 | Phase 2 |
---|---|---|
Overall characteristics for both installation phases | ||
Total number of cores | 4096 | 9728 |
Peak Performance of the entire system | 26.2 TFlop/s | 62.3 TFlop/s |
Linpack Performance | 24.5 TFlop/s | 56.5 TFlop/s |
Total size of memory for entire system | 17.5 TByte | 39 TByte |
Direct Attached Disks | 300 TByte | 600 TByte |
Network Attached Disks | 40 TByte | 60 TByte |
Granularity | ||
Number of compute partitions | 16 | 19 (13 + 6 with high density blades) |
Number of cores per compute partition | 256 | 512 |
Number of blades (memory channels) per compute partition | 256 | 128 (high density) or 256 |
Number of cores per socket | 1 | 2 |
Number of cores per blade | 1 | 2 or 4 (high density blades) |
Processor | ||
Processor type | Intel Itanium2 Madison 9M | Intel Itanium2 Montecito Dual Core |
Clock rate | 1.6 GHz | 1.6 GHz |
Number of Floating Point Operations per clock | 4 (=2 FMAs) | 4 (=2 FMAs) |
Peak performance of a socket | 6.4 GFlop/s | 12.8 GFlop/s |
Max. number of Instructions per clock tick | 6 | 12 (6 per Core) |
Peak number of instructions per second of a socket (Gip/s) | 9.6 Gip/s | 19.2 Gip/s (9.6 per Core) |
Number of FP Registers | 128 | 256 (128 per core) |
Memory | ||
Memory per core | 4 GByte (8 GByte on interactive node) | 4 GByte per Core |
Clock rate of frontside bus (FSB) | 533 MHz | 533 MHz |
Peak bandwidth to local memory | 8.5 GByte/s per core | 8.5 GByte/s shared between 2 cores or 4 cores (high density blades) |
Total bandwidth to local memory of the entire system | 34816 GByte/s | 34816 GByte/s |
Latency to local memory | approx. 210 cycles | approx. ??? cycles |
Memory Hierarchy | ||
L1 Data Cache (not used for floating point data) | ||
Size | 16 kByte | 16 kByte |
Cacheline size | 64 Byte | 64 Byte |
Associativity | 4-way | 4-way |
Latency | 1 cycle | 1 cycle |
Bandwidth | 25.6 GByte/s | 25.6 GByte/s |
L2 Data Cache (per core) | ||
Size | 256 kByte | 256 kByte |
Cacheline size | 128 Byte | 128 Byte |
Associativity | 8-way | 8-way |
min. Latency | INT: 5 cycles, | INT: 5 cycles, |
Bandwidth | 51.2 GByte/s (FP) (+25.6 GByte/s (INT)) | 51.2 GByte/s (FP) (+25.6 GByte/s (INT)) |
Data banks | 16 Bytes/bank | 16 Bytes/bank |
L2 Instr. Cache (per core) | ||
Size | n/a | 1 MByte |
L3 Cache (per core) | ||
Size | 6 MByte | 9 MByte |
Cacheline size | 128 Byte | 128 Byte |
Associativity | 12-way | 12-way |
min. Latency | 14 cycles | 14 cycles |
Bandwidth | 51.2 GByte/s | 51.2 GByte/s |
Fill Bandwidth | 128 Byte in 4 cycles | 128 Byte in 4 cycles |
L2 Data TLB | ||
Entries | 128 | 128 |
Latency | 30 cycle penalty for TLB miss | 30 cycle penalty for TLB miss |
Internal Interconnect | ||
Connection network type | NUMAlink 4 | NUMAlink 4 |
Number of (bidirectional) links per blade | 2 | 2 |
Bandwidth of one link (bidirectional) | 6.4 GByte/s | 6.4 GByte/s |
MPI latency | 1-5 µs | 1-5 µs |
Disks | ||
Direct attached disks | ||
Characteristics | few, but large files; high bandwidth | few, but large files; high bandwidth; pseudo-temporary files, temporary project files |
Size | 300 TByte | 600 TByte |
Aggregate bandwidth to disks | 20 GByte/s | 40 GByte/s |
Network attached disks (Home Directories) | 30 TByte | 60 TByte |
Characteristics | many, but small files; high transaction rate | many, but small files; high transaction rate |
Size | 40 TByte | 60 TByte |
Bandwidth to disks | 600 MByte/s | 800 MByte/s |
Environment | ||
Footprint | 24 m x 12 m | 24 m x 12 m |
Total weight | 103 metric tons | 103 metric tons |
Total electrical power | ~1000 kVA | ~1100 kVA |
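
Several aggregate figures in the table follow arithmetically from the per-core, per-socket and per-blade values. The Python sketch below recomputes them as a consistency check. All input numbers are copied from the table; the function and variable names are illustrative only and do not come from any LRZ or SGI tooling. Two readings of the table are assumed here: that the 8.5 GByte/s local memory bandwidth applies per blade (memory channel), and that Phase 2 consists of 13 partitions with 256 blades each plus 6 high density partitions with 128 blades each.

```python
# Consistency check for the HLRB II table: derive aggregate figures
# from per-unit values. Inputs are taken from the table; names are illustrative.

CLOCK_GHZ = 1.6           # clock rate, both phases
FLOPS_PER_CLOCK = 4       # 2 FMAs per clock, per core
INSTR_PER_CLOCK = 6       # max. instructions per clock tick, per core
BLADE_BW_GBYTE_S = 8.5    # assumed: peak local memory bandwidth per blade


def peak_gflops(cores: int) -> float:
    """Peak floating point performance in GFlop/s for `cores` cores."""
    return cores * CLOCK_GHZ * FLOPS_PER_CLOCK


def peak_gips(cores: int) -> float:
    """Peak instruction rate in Gip/s for `cores` cores."""
    return cores * CLOCK_GHZ * INSTR_PER_CLOCK


def total_memory_bandwidth(blades: int) -> float:
    """Aggregate bandwidth to local memory in GByte/s for `blades` blades."""
    return blades * BLADE_BW_GBYTE_S


# Cores: partitions x cores per partition.
phase1_cores = 16 * 256
phase2_cores = 19 * 512

# Blades: Phase 2 split into regular and high density partitions (assumption).
phase1_blades = 16 * 256
phase2_blades = 13 * 256 + 6 * 128

summary = {
    "phase1_cores": phase1_cores,
    "phase2_cores": phase2_cores,
    "phase1_peak_tflops": round(peak_gflops(phase1_cores) / 1000, 1),
    "phase2_peak_tflops": round(peak_gflops(phase2_cores) / 1000, 1),
    "socket_peak_gflops_phase1": round(peak_gflops(1), 1),  # 1 core per socket
    "socket_peak_gflops_phase2": round(peak_gflops(2), 1),  # 2 cores per socket
    "core_peak_gips": round(peak_gips(1), 1),
    "phase1_mem_bw_gbyte_s": total_memory_bandwidth(phase1_blades),
    "phase2_mem_bw_gbyte_s": total_memory_bandwidth(phase2_blades),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions both phases have 4096 blades, which would explain why the total bandwidth to local memory is the same in both columns even though the core count more than doubles.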