Clang and Flang Options

Architecture

Generate instructions that run on 4th Gen AMD EPYC Series CPUs
-march=znver4
Generate instructions that run on 3rd Gen AMD EPYC Series CPUs
-march=znver3
Generate instructions for the local machine
-march=native
Optimization Levels
Disable all optimizations
-O0
Minimal level of speed and code-size optimization
-O1 / -O
Moderate level optimization
-O2
Aggressive optimization
-O3
Maximize performance
-Ofast
Enable link time optimizations
-flto
Enable loop optimizations
-funroll-loops
-enable-licm-vrp
-enable-partial-unswitch
-fuse-tile-inner-loop
-unroll-threshold
Enable advanced loop optimizations
-unroll-aggressive
Enable function level optimizations
-fitodcalls
-function-specialize
-finline-aggressive
-inline-recursion=[1..4] (use with -flto)
-do-block-reordering={none, normal, aggressive}
Enable advanced vectorization
-enable-strided-vectorization
-enable-epilog-vectorization
Enable memory layout optimizations
-fremap-arrays (use with -flto)
Profile-guided optimizations
-fprofile-instr-generate (1st invocation)
-fprofile-instr-use (2nd invocation)
OpenMP®
-fopenmp
Enable non-temporal stores for memory-bandwidth-bound workloads
-fnt-store
Enable removal of all unused array computations
-reduce-array-computations=3
Other Options
Enable faster, less precise math operations (part of -Ofast)
-ffast-math
-freciprocal-math
OpenMP threads and affinity (N = number of cores)
export OMP_NUM_THREADS=N
export GOMP_CPU_AFFINITY="0-{N-1}"
Enable the AMD vector math library
-fveclib=AMDLIBM
Link the AMD math library (AMD LibM)
-L/libm-install-dir/lib -lalm
For Fortran workloads
Compile free-form Fortran
-ffree-form

