AMD Optimizing CPU Compiler (AOCC)
Currently, the following AOCC compiler versions are available on the LiCCA cluster:
aocc/4.1.0 (D)
(D) stands for default.
MPI
AOCC does not contain an MPI package. However, a recent OpenMPI version compiled with AOCC can be loaded after loading AOCC:
module load aocc openmpi
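Once both modules are loaded, building and launching an MPI program could look roughly like the following sketch. It relies on the standard OpenMPI wrapper `mpicc` and launcher `mpirun`; the helper name `mpi_build_run`, the `-O2` level, and the file names in the usage line are hypothetical placeholders, not part of the cluster setup.

```shell
# Run after: module load aocc openmpi
# mpi_build_run SRC EXE NPROCS  (hypothetical helper, for illustration only)
mpi_build_run() {
    # mpicc is OpenMPI's C compiler wrapper; it calls the compiler OpenMPI was built with.
    mpicc -O2 -o "$2" "$1" || return 1
    # Launch NPROCS ranks of the freshly built executable.
    mpirun -np "$3" "./$2"
}
```

For example, `mpi_build_run hello_mpi.c hello_mpi 4` would compile `hello_mpi.c` and start it with four ranks.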
AOCC Clang/Clang++
Clang is a C, C++, and Objective-C compiler that encompasses preprocessing, parsing, optimization, code generation, assembly, and linking. Clang supports the -march=znver3 flag to enable the best code generation and tuning for 3rd Gen AMD EPYC Series processors.
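As a minimal sketch of how the -march=znver3 flag might be used in practice, the helper below picks `clang` or `clang++` from the file extension and compiles with tuning for 3rd Gen EPYC. The helper name `build_zen3`, the `-O3` level, and the file names are assumptions chosen for illustration.

```shell
# Run after: module load aocc
# build_zen3 SRC EXE  (hypothetical helper, for illustration only)
build_zen3() {
    # Choose the C or C++ driver based on the source file extension.
    case "$1" in
        *.c)               compiler=clang ;;
        *.cpp|*.cc|*.cxx)  compiler=clang++ ;;
        *) echo "unsupported source file: $1" >&2; return 1 ;;
    esac
    # -march=znver3 targets 3rd Gen AMD EPYC; -O3 is an assumed optimization level.
    "$compiler" -O3 -march=znver3 -o "$2" "$1"
}
```

For example, `build_zen3 saxpy.c saxpy` builds a C program and `build_zen3 solver.cpp solver` a C++ one. A binary built with -march=znver3 may use instructions that older CPUs do not support, so it should be run on matching hardware.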
AOCC Flang
Flang is the Fortran front-end designed for LLVM integration and suitable for interoperability with Clang/LLVM. It supports all clang compiler options plus a few flang-specific options. AMD extends the GitHub version of flang, available from https://github.com/flang-compiler/flang, which in turn is based on the NVIDIA/PGI commercial Fortran compiler.
Clang and Flang Options
Architecture
  Generate instructions that run on 4th Gen AMD EPYC Series CPUs: -march=znver4
  Generate instructions that run on 3rd Gen AMD EPYC Series CPUs: -march=znver3
  Generate instructions for the local machine: -march=native
Optimization Levels
  Disable all optimizations: -O0
  Minimal level speed and code optimization: -O1 / -O
  Moderate level optimization: -O2
  Aggressive optimization: -O3
  Maximize performance: -Ofast
  Enable link time optimizations: -flto
  Enable loop optimizations: -funroll-loops, -enable-licm-vrp, -enable-partial-unswitch, -fuse-tile-inner-loop, -unroll-threshold
  Enable advanced loop optimizations: -unroll-aggressive
  Enable function level optimizations: -fitodcalls, -function-specialize, -finline-aggressive, -inline-recursion=[1..4] (use with -flto), -do-block-reordering={none, normal, aggressive}
  Enable advanced vectorization: -enable-strided-vectorization, -enable-epilog-vectorization
  Enable memory layer optimizations: -fremap-arrays (use with -flto)
  Profile guided optimizations: -fprofile-instr-generate (1st invocation), -fprofile-instr-use (2nd invocation)
  Enable OpenMP: -fopenmp
  Enable non-temporal (streaming) stores for memory bandwidth workloads: -fnt-store
  Enable removal of all unused array computation: -reduce-array-computations=3
Other Options
  Enable faster, less precise math operations (part of -Ofast): -ffast-math, -freciprocal-math
  OpenMP threads and affinity (N = number of cores): export OMP_NUM_THREADS=N and export GOMP_CPU_AFFINITY="0-{N-1}"
  Enabling vector library: -fveclib=AMDLIBM
  Link to AMD library: -L/libm-install-dir/lib -lalm
For Fortran workloads
  Compile free form Fortran: -ffree-form
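The two invocations listed for profile guided optimization fit into a slightly longer workflow. The sketch below follows the standard LLVM sequence: an instrumented build, a training run, a merge of the raw profile with `llvm-profdata`, and a second build that consumes the profile. It assumes `llvm-profdata` is on the PATH after loading the aocc module. The helper name `pgo_build`, the `-O2` level, and the file names are hypothetical.

```shell
# Run after: module load aocc
# pgo_build SRC EXE  (hypothetical helper, for illustration only)
pgo_build() {
    src=$1
    exe=$2
    # 1st invocation: build an instrumented executable.
    clang -O2 -fprofile-instr-generate -o "$exe" "$src" || return 1
    # Training run: the instrumented program writes a raw profile on exit.
    LLVM_PROFILE_FILE="$exe.profraw" "./$exe" || return 1
    # Convert the raw profile into the indexed format the compiler reads.
    llvm-profdata merge -output="$exe.profdata" "$exe.profraw" || return 1
    # 2nd invocation: rebuild using the collected profile.
    clang -O2 -fprofile-instr-use="$exe.profdata" -o "$exe" "$src"
}
```

For example, `pgo_build app.c app` runs all four steps in the current directory. The training run should use input that is representative of production workloads, since the optimizer tunes for the paths it observed.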
AMD Optimizing CPU Libraries (AOCL)
AMD Optimizing CPU Libraries (AOCL) are a set of numerical libraries tuned specifically for the AMD EPYC processor family. They include a simple interface that takes advantage of the latest hardware innovations. To make use of the AOCL libraries, a suitable compiler (gcc or aocc) has to be loaded first. Currently, AOCL is available as a single package including all of its libraries, and you can choose between two integer type lengths: lp64 uses 32-bit integer interfaces (most commonly used), whereas ilp64 uses 64-bit integer interfaces (rare).
Loading AOCL via Lmod
ml load aocc aocl
ml load aocc aocl/ilp64
ml load gcc aocl
ml load gcc aocl/ilp64
ml load gcc/13.2 aocl
ml load gcc/13.2 aocl/ilp64
AOCL consists of the following libraries:
BLIS (BLAS [Basic Linear Algebra Subprograms] Library)
BLIS is a portable open-source software framework for instantiating high-performance BLAS-like dense linear algebra libraries.
libFLAME (LAPACK [Linear Algebra PACKage])
AOCL-libFLAME is a high-performance implementation of the Linear Algebra PACKage (LAPACK). LAPACK provides routines for solving systems of linear equations, least-squares problems, eigenvalue problems, singular value problems, and the associated matrix factorizations.
AMD-FFTW (Fastest Fourier Transform in the West)
AMD-FFTW is the AMD-optimized version of FFTW, an open-source library that offers a comprehensive collection of fast C routines for computing the Discrete Fourier Transform (DFT) and various special cases thereof. It is optimized for AMD EPYC and other AMD “Zen”-based processors and can compute transforms of real and complex valued arrays of arbitrary size and dimension.
LibM (AMD Core Math Library)
AMD LibM is a software library containing a collection of basic math functions optimized for x86-64 processor-based machines. It provides many routines from the list of standard C99 math functions. Applications can link against AMD LibM and invoke its math functions instead of the compiler’s default math functions for better accuracy and performance.
AOCL-Sparse
AOCL-Sparse is a library containing basic linear algebra subroutines for sparse matrices and vectors, optimized for AMD EPYC and other AMD “Zen”-based processors. It is designed to be used with C and C++.
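With a compiler and an AOCL module loaded, linking a C program against one of these libraries might look like the sketch below, which uses BLIS as the example. Several parts are assumptions: `AOCL_ROOT` is a hypothetical variable standing in for the AOCL installation prefix, `blis` is the usual name of the single-threaded BLIS library but should be verified on the system, and the helper name and file names are placeholders. `ml show aocl` lists the paths and variables the module actually sets.

```shell
# Run after e.g.: ml load aocc aocl
# link_blis SRC EXE  (hypothetical helper, for illustration only)
# AOCL_ROOT is an assumed variable pointing at the AOCL installation prefix.
link_blis() {
    # Headers from the AOCL include directory, BLIS plus the math library at link time.
    clang -O3 -march=znver3 -I"$AOCL_ROOT/include" -o "$2" "$1" \
        -L"$AOCL_ROOT/lib" -lblis -lm
}
```

For example, `link_blis dgemm.c dgemm` would build a program that calls BLAS routines through BLIS. If the module already exports the include and library search paths, the -I and -L options are unnecessary.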