Clang and Flang Options

Architecture
  Generate instructions that run on 4th Gen AMD EPYC Series CPUs: -march=znver4
  Generate instructions that run on 3rd Gen AMD EPYC Series CPUs: -march=znver3
  Generate instructions for the local machine: -march=native
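
As a small illustration of choosing among the three architecture options, the sketch below maps a build target to its flag. The helper name `march_flag`, its argument values (`zen4`, `zen3`, `native`), and the file name `app.c` are made up for this example. Only the `-march=` values come from the list above. The command is printed, not executed.

```shell
# Hypothetical helper: pick the -march value for a build target.
# The argument names (zen4, zen3, native) are illustrative only.
march_flag() {
  case "$1" in
    zen4)   echo "-march=znver4" ;;  # 4th Gen AMD EPYC
    zen3)   echo "-march=znver3" ;;  # 3rd Gen AMD EPYC
    native) echo "-march=native" ;;  # the CPU of the machine running the compiler
    *)      echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Example: assemble compiler flags for a 4th Gen EPYC target.
CFLAGS="-O3 $(march_flag zen4)"
echo "clang $CFLAGS -c app.c"
```

A binary built with -march=native is tuned to the build host, so it may fail on machines with an older CPU. Naming the target generation explicitly avoids that when the build and run hosts differ.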

Optimization Levels
  Disable all optimizations: -O0
  Minimal level speed and code optimization: -O1 / -O
  Moderate level optimization: -O2
  Aggressive optimization: -O3
  Maximize performance: -Ofast
  Enable link time optimizations: -flto
  Enable loop optimizations:
    -funroll-loops
    -enable-licm-vrp
    -enable-partial-unswitch
    -fuse-tile-inner-loop
    -unroll-threshold
  Enable advanced loop optimizations: -unroll-aggressive
  Enable function level optimizations:
    -fitodcalls
    -function-specialize
    -finline-aggressive
    -inline-recursion=[1..4] (use with -flto)
    -do-block-reordering={none, normal, aggressive}
  Enable advanced vectorization:
    -enable-strided-vectorization
    -enable-epilog-vectorization
  Enable memory layer optimizations: -fremap-arrays (use with -flto)
  Profile guided optimizations:
    -fprofile-instr-generate (1st invocation)
    -fprofile-instr-use (2nd invocation)
  OpenMP®: -fopenmp
  Generate non-temporal stores for memory-bandwidth-bound workloads: -fnt-store
  Remove all unused array computations: -reduce-array-computations=3
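
The two profile guided optimization options listed above are used in separate compiler invocations, with a training run in between. The sketch below prints one plausible sequence. It assumes upstream Clang behavior, which this guide does not spell out: the instrumented binary writes `default.profraw`, and `llvm-profdata merge` converts it into the file passed to `-fprofile-instr-use=`. The names `app.c`, `app`, and `app.profdata` are placeholders.

```shell
# Sketch of the two-invocation PGO flow. File names are placeholders;
# the merge step follows upstream Clang usage and is an assumption here.
pgo_steps() {
  echo "clang -O2 -fprofile-instr-generate app.c -o app"           # 1st invocation: instrumented build
  echo "./app"                                                     # training run, writes default.profraw
  echo "llvm-profdata merge -output=app.profdata default.profraw"  # convert raw profile
  echo "clang -O2 -fprofile-instr-use=app.profdata app.c -o app"   # 2nd invocation: optimized build
}

# Print the steps. Pipe the output to `sh` to run them where the toolchain is installed.
pgo_steps
```

The training run should exercise the program with representative input, since the second build optimizes for the paths that run recorded.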

Other Options
  Enable faster, less precise math operations (part of -Ofast):
    -ffast-math
    -freciprocal-math
  OpenMP threads and affinity (N = number of cores):
    export OMP_NUM_THREADS=N
    export GOMP_CPU_AFFINITY="0-{N-1}"
  Enable vector library: -fveclib=AMDLIBM
  Link to AMD library: -L/libm-install-dir/lib -lalm
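
In the OpenMP affinity entry above, `N` and `{N-1}` are placeholders rather than literal shell syntax. A minimal sketch of filling them in, assuming an arbitrary example value of `N=8`:

```shell
# Sketch: pin N OpenMP threads to cores 0..N-1.
# N=8 is an arbitrary example value; use the number of cores you want to occupy.
N=8
export OMP_NUM_THREADS="$N"
export GOMP_CPU_AFFINITY="0-$((N - 1))"   # shell arithmetic expands this to 0-7

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS GOMP_CPU_AFFINITY=$GOMP_CPU_AFFINITY"
```

Set both variables in the same shell that launches the OpenMP program so that they are inherited by the process.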

For Fortran workloads
  Compile free form Fortran: -ffree-form