Comparison of Compiler Options (Intel vs. PGI vs. GCC)
The following sections discuss the most commonly used compiler switches and extensions of the Intel, PGI, and GNU (gcc/gfortran) compilers, starting with an overview of the available optimization switches. If you run into difficulties, you may have to switch some of these options off again, one at a time.
Option Intel | Option PGI | Option gcc, gfortran | Meaning | Comments |
---|---|---|---|---|
Optimization | ||||
-O[0-3] | -O[0-3] | -O[0-3] | Specifies the code optimization level for applications. | -O0 disables optimization, whereas -O3 specifies the highest optimization level (see the compiler documentation for details). |
-fast | t.b.d. | -Ofast | Maximizes speed across the entire program. | Intel: sets the options -ipo, -O3, -no-prec-div, -static, and -xHost. GCC: -Ofast enables all -O3 optimizations plus -ffast-math, disregarding strict standards compliance. |
-xHost | t.b.d. | -march=cpu-type | Tells the compiler to generate instructions for the highest instruction set available on the compilation host processor. | Intel: -xHost uses the highest instruction set of the compilation host. GCC: cpu-type may be replaced by haswell, skylake, etc., or by native to target the compilation host. |
-opt-streaming-stores [always\|never\|auto], or the source code directive !DEC$ VECTOR NONTEMPORAL | -Mnontemporal | | Controls the use of non-temporal (streaming) stores. Some programs may slow down with PGI's -fastsse because of the prefetches it uses; adding -Mnontemporal selects a different data movement scheme that may improve performance. | Worth a try during code tuning. May be especially useful for memory-bound code, since streaming writes bypass the cache. |
Code Transformations, Aliasing and Interprocedural Optimization | ||||
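-fno-alias (-fno-fnalias) | n/a | n/a | Assume no aliasing in the program (-fno-fnalias: assume no aliasing within functions only) | This may give a considerable performance increase. Beware: the compiler does not verify this assumption, so check your code for pointer aliasing yourself; if aliasing does occur, results may be wrong. |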
-unroll[<number>] | -Munroll[=n:<number>] | -funroll-loops, -funroll-all-loops | Unroll loops | <number> (optional) gives the maximum number of times a loop is unrolled: 0 disables unrolling, and omitting it lets compiler heuristics decide. For the Intel compiler you can instead place the source code directive !DEC$ UNROLL(<number>) directly before a loop such as do i=1,imax, which gives control over individual loops. |
-ip | -Minline[=option[,option,...]] | -finline-functions | Enables interprocedural optimizations for single-file compilation | Performs inline function expansion for calls to functions defined within the current source file. For the Intel compiler, you can disable the full or partial inlining enabled by this option by also specifying -ip_no_inlining or -ip_no_pinlining, respectively. For the PGI compiler, see the man page and the PGI User's Guide for more information on inlining. |
-ipo | -Minline and -Mextract with suboptions | | Enables multifile interprocedural (IP) optimizations (between files). | Performs inline function expansion for calls to functions defined in separate files. For the Intel compiler, the set of source files must be specified as arguments. For the PGI compiler, an inline library must be created explicitly (with -Mextract). |
Linkage Options | ||||
-c | -c | -c | compile only, do not link | This follows conventional usage. |
-Ldir | -Ldir | -Ldir | look for libraries in dir as well | This follows conventional usage. |
-lmylib | -lmylib | -lmylib | link with library libmylib.{a\|so} | This follows conventional usage. |
-[no-]heap-arrays | n/a | n/a | Allocate automatic arrays on the heap (Fortran; by default they are allocated on the stack, which can cause crashes when the stack size limit is low) | |
-auto | | | Make all local variables automatic, i.e. allocated on the stack (Fortran) | |
n/a | -g77libs | n/a | add GNU Fortran libraries | Needed if g77-built objects are to be linked correctly. The Intel Compiler does not support this. |
Source format and Preprocessing | ||||
-FI or -fixed [-72\|-80\|-132] | -Mfixed | | Fixed format source code [with optionally extended line width] | Files with extension .f (Intel: also .for and .ftn) are automatically treated as fixed form. |
-FR or -free | -Mfree | | Free format source code | Files with extension .f90 are automatically treated as free form. |
-fpp [-P] | -F | | Invoke the preprocessor (C-style includes) | Intel compiler: the optional -P switch writes the preprocessing result to an output file instead of compiling it. Open64 compiler: the -o switch is required to write the preprocessing result to an output file. PGI compiler: the source file must have the extension .F; the output is written to a file of the same name with extension .f. |
-Dname[=value] | -Dname[=value] | -Dname[=value] | Define a preprocessor macro | This follows conventional usage. |
-Idir | -Idir | -Idir | Look for include files in dir as well | This follows conventional usage. |
Options for Data and I/O | ||||
-i{2\|4\|8} | | | INTEGER and LOGICAL types of unspecified KIND use the indicated number of bytes | Default value is 4; -i2 is not available for Open64. |
-r{4\|8\|16} | -Mr8 | -fdefault-real-8 | REAL types of unspecified KIND use the indicated number of bytes | Default value is 4. A value of 8 makes default REAL variables equivalent to DOUBLE PRECISION. For the PGI compilers, only promotion from 4-byte to 8-byte REAL is available. |
-convert big_endian, or at run time via an environment variable (see the section on Big Endian I/O in the Troubleshooting document) | -Mbyteswapio or -byteswapio | -fconvert=big-endian | Do unformatted I/O in big endian instead of little endian | PGI compiler: should enable you to read and write data compatible with Sun and SGI platforms. |
Diagnostics, Runtime Checking and Debugging | ||||
-g | -g | -g | Include symbols for debugging | Use DDT, TotalView, gdb, or idb to debug, or pgdbg for PGI-compiled binaries. |
-traceback | | | Generate traceback | Tells the compiler to generate extra information in the object file so that source file traceback information is provided when a severe error occurs at run time. |
-check all (Fortran compilers only) | -C | -fcheck=all (gfortran; g77 had -ffortran-bounds-check) | Run-time checking | Full checking may incur a large performance penalty. Intel Fortran compiler: the argument "all" switches on all available checks; it can be replaced by a list of individual checks (see the compiler documentation). |
-opt-report, -opt-report-level[min\|max] | n/a | | Generate an optimization report | The Intel compiler writes the report to stderr. |
-list | -Mlist | n/a | provide source listing | The Intel compiler writes the source listing to STDOUT, while the PGI compiler produces a file myprog.lst from myprog.f |
Parallelization and Vectorization | ||||
-openmp | -mp | -fopenmp | Generate multithreaded code from OpenMP directives in the source code | If used, this option must also be specified at link time. |
-openmp-stubs | n/a | | Compile OpenMP programs for serial mode; directives are ignored and a stub library for the OpenMP function calls is linked. | If used, this option must also be specified at link time. |
-openmp-report[0\|1\|2] | n/a | | Diagnostic level for OpenMP parallelization | |
-parallel | -Mconcur[=option[,option]] | | Perform (shared-memory) automatic parallelization | If used, this option must also be specified at link time. See the PGI User's Guide, Section 3.1.2, for information on the -Mconcur suboptions. |
-par-report[0\|1\|2] | n/a | | Diagnostic level for automatic parallelization | |
-par-threshold{n} | n/a | | Set threshold for automatic parallelization of loops | -par-threshold0: always parallelize; -par-threshold25: parallelize if the chance of a performance increase is 25%; -par-threshold75: parallelize if the chance of a performance increase is 75% (default); -par-threshold100: only parallelize if a performance increase is certain. For the PGI compiler, the -Mconcur suboptions (q.v.) allow finer control of automatic parallelization. |
-vec | t.b.d. | | Enables (-vec) or disables (-no-vec) vectorization. | |
-simd | t.b.d. | | Enables (-simd) or disables (-no-simd) the SIMD vectorization feature of the compiler. | |
-vec-report[0-5] | t.b.d. | | Controls the diagnostic information reported by the vectorizer. | 0 specifies that no diagnostic information is reported; for the other levels, see the compiler documentation. |
-vec-threshold[n] | t.b.d. | | Sets a threshold for the vectorization of loops. | -vec-threshold0: always vectorize; -vec-threshold75: vectorize if the chance of a performance increase is 75%; -vec-threshold100: only vectorize if a performance increase is certain (default). |
Compiler Directives for the Intel compiler
The following table shows the source code directives as supported by the Intel Fortran compiler to help with tuning or debugging applications. Note that for fixed source format the "!" comment symbol in the first column needs to be replaced with a "c" comment symbol.
Directive | Meaning |
---|---|
| Ignore vector dependencies |
| Software pipelining hint |
| Split large loop |
| Unroll inner loop N times. Compiler heuristics used if N omitted. |
| Do not unroll loop |
| Prefetch Array A |
| Do not prefetch array A |
| Vectorize loop, CLAUSE = { ALWAYS [ASSERT]\|ALIGNED\|UNALIGNED\|TEMPORAL\|NONTEMPORAL [(var1 [, var2]...)] }. For further details, see the compiler documentation. |
| Do not vectorize loop. |