AMD Optimizing C/C++ Compiler (AOCC)

Currently, the following AOCC compiler versions are available on the LiCCA cluster:

module av (excerpt)
aocc/4.1.0                     (D)

(D) marks the default version.

MPI

AOCC does not include an MPI package. However, a recent OpenMPI version compiled with AOCC can be loaded after loading AOCC.

Loading AOCC with OpenMPI
module load aocc openmpi
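
A minimal sketch of how this could look in a build script follows. The source file name hello_mpi.c is hypothetical, and the sketch assumes that the OpenMPI module provides the usual mpicc wrapper. Its --showme option prints the underlying compiler command, which is one way to check that the wrapper calls AOCC's clang. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

run module load aocc openmpi
# Show the compiler command line the OpenMPI wrapper would use.
run mpicc --showme
# hello_mpi.c is a hypothetical source file.
run mpicc -O2 -o hello_mpi hello_mpi.c
```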


AOCC Clang/Clang++

Clang is a C, C++, and Objective-C compiler that encompasses preprocessing, parsing, optimization, code generation, assembly, and linking. Clang supports the -march=znver3 flag to enable the best code generation and tuning for 3rd Gen AMD EPYC Series processors.

AOCC Flang

Flang is the Fortran front-end designed for LLVM integration and suitable for interoperability with Clang/LLVM. It supports all Clang compiler options plus a few Flang-specific options. AMD extends the GitHub version of Flang available from https://github.com/flang-compiler/flang, which in turn is based on the NVIDIA/PGI commercial Fortran compiler.
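
As a small illustration of mixing Clang options with a Flang-specific one, the sketch below compiles a hypothetical source file solver.f90. It assumes the compiler driver is invoked as flang once the aocc module is loaded. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

run module load aocc
# -O3 and -march are shared with Clang; -ffree-form is Flang-specific
# and requests free-form parsing of the Fortran source.
run flang -O3 -march=znver3 -ffree-form -o solver solver.f90
```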

Clang and Flang Options

Architecture

Generate instructions that run on 4th Gen AMD EPYC Series CPUs
-march=znver4
Generate instructions that run on 3rd Gen AMD EPYC Series CPUs
-march=znver3
Generate instructions for the local machine
-march=native
Optimization Levels
Disable all optimizations
-O0
Minimal level speed and code optimization
-O1/ -O
Moderate level optimization
-O2
Aggressive optimization
-O3
Maximize performance
-Ofast
Enable link time optimizations
-flto
Enable loop optimizations
-funroll-loops
-enable-licm-vrp
-enable-partial-unswitch
-fuse-tile-inner-loop
-unroll-threshold
Enable advanced loop optimizations
-unroll-aggressive
Enable function level optimizations
-fitodcalls
-function-specialize
-finline-aggressive
-inline-recursion=[1..4] (use with -flto)
-do-block-reordering={none, normal, aggressive}
Enable advanced vectorization
-enable-strided-vectorization
-enable-epilog-vectorization
Enable memory layer optimizations
-fremap-arrays (use with -flto)
Profile guided optimizations
-fprofile-instr-generate (1st invocation)
-fprofile-instr-use (2nd invocation)
OpenMP®
-fopenmp
Enable non-temporal stores (for memory-bandwidth-bound workloads)
-fnt-store
Enable removal of all unused array computation
-reduce-array-computations=3
Other Options
Enable faster, less precise math operations (part of -Ofast)
-ffast-math
-freciprocal-math
OpenMP threads and affinity (N number of cores)
export OMP_NUM_THREADS=N
export GOMP_CPU_AFFINITY="0-{N-1}"
Enabling vector library
-fveclib=AMDLIBM
Link to AMD library
-L/libm-install-dir/lib -lalm
For Fortran workloads
Compile free form Fortran
-ffree-form
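
To illustrate how several of the options above combine, here is a minimal sketch of a profile-guided build of an OpenMP program, followed by a pinned run. The file names (app.c, app.profdata) and the thread count of 8 are hypothetical. The sketch also assumes that llvm-profdata is shipped with the AOCC installation, as it is with stock LLVM: the raw profile written by the instrumented binary has to be merged into an indexed profile before -fprofile-instr-use can read it. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# OpenMP threads and affinity: N threads pinned to cores 0 .. N-1.
N=8                                   # hypothetical number of cores
export OMP_NUM_THREADS=$N
export GOMP_CPU_AFFINITY="0-$((N - 1))"

CFLAGS="-O3 -march=znver3 -fopenmp"

# 1st invocation: build an instrumented binary and run it on typical input.
run clang $CFLAGS -fprofile-instr-generate -o app_instr app.c
run ./app_instr                       # writes default.profraw by default
# Merge the raw profile into the indexed format that clang reads.
run llvm-profdata merge -output=app.profdata default.profraw
# 2nd invocation: rebuild using the collected profile, then run.
run clang $CFLAGS -fprofile-instr-use=app.profdata -o app app.c
run ./app
```

The instrumented run should use input that is representative of production workloads, since the second build optimizes for the paths that were actually exercised.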


AMD Optimizing CPU Libraries (AOCL)

AMD Optimizing CPU Libraries (AOCL) are a set of numerical libraries tuned specifically for the AMD EPYC processor family. They include a simple interface that takes advantage of the latest hardware innovations.

To use the AOCL libraries, a suitable compiler (gcc or aocc) has to be loaded first. Currently, AOCL is available as a single package that includes all of its libraries, and you can choose between two integer type lengths: lp64 uses 32-bit integer interfaces (most commonly used), whereas ilp64 uses 64-bit integer interfaces (rare).

Loading AOCL via Lmod

Choose one of the following lines
ml load aocc aocl
ml load aocc aocl/ilp64
ml load gcc aocl
ml load gcc aocl/ilp64
ml load gcc/13.2 aocl
ml load gcc/13.2 aocl/ilp64


AOCL consists of the following libraries:

BLIS (BLAS [Basic Linear Algebra Subprograms] Library)

BLIS is a portable open-source software framework for instantiating high-performance BLAS-like dense linear algebra libraries.

libFLAME (LAPACK [Linear Algebra PACKage])

AOCL-libFLAME is a high-performance implementation of the Linear Algebra PACKage (LAPACK). LAPACK provides routines for solving systems of linear equations, least-squares problems, eigenvalue problems, singular value problems, and the associated matrix factorizations.

AMD-FFTW (Fastest Fourier Transform in the West)

AMD-FFTW is the AMD-optimized version of FFTW, an open-source library that offers a comprehensive collection of fast C routines for computing the Discrete Fourier Transform (DFT) and various special cases thereof. It is optimized for AMD EPYC and other AMD “Zen”-based processors and can compute transforms of real- and complex-valued arrays of arbitrary size and dimension.

LibM (AMD Core Math Library)

AMD LibM is a software library containing a collection of basic math functions optimized for x86-64 processor-based machines. It provides many routines from the list of standard C99 math functions. Applications can link against AMD LibM and call its math functions instead of the compiler’s default math functions for better accuracy and performance.

AOCL-Sparse

AOCL-Sparse is a library containing basic linear algebra subroutines for sparse matrices and vectors, optimized for AMD EPYC and other AMD “Zen”-based processors. It is designed to be used with C and C++.
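
As a rough sketch of how these libraries might be linked after loading an aocl module: the library names below (libblis, libflame, libfftw3, libaoclsparse) follow the names AOCL packages commonly install, and -lalm is the LibM link flag from the options list. Whether the LiCCA module exports the include and library search paths automatically is an assumption, and the source file names are hypothetical. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Assumed link flags per library (verify against the installed module).
BLAS_LIBS="-lblis"                 # BLIS (BLAS)
LAPACK_LIBS="-lflame $BLAS_LIBS"   # libFLAME (LAPACK) on top of BLIS
FFTW_LIBS="-lfftw3"                # AMD-FFTW, double precision
LIBM_LIBS="-lalm"                  # AMD LibM
SPARSE_LIBS="-laoclsparse"         # AOCL-Sparse

run ml load aocc aocl
run clang -O3 -march=znver3 -o solve solve.c $LAPACK_LIBS -lm
run clang -O3 -march=znver3 -o fft fft.c $FFTW_LIBS -lm
run clang -O3 -march=znver3 -o spmv spmv.c $SPARSE_LIBS
run clang -O3 -march=znver3 -o mathbench mathbench.c $LIBM_LIBS -lm
```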