

14 The PGPROF Profiler
This chapter introduces the PGPROF profiler. The profiler is a tool that
analyzes data generated during execution of specially compiled C,
C++, F77, F90 and HPF programs. The PGPROF profiler allows you to discover
which functions and lines were executed, how often they were executed, and
how much of the total time they consumed.
The PGPROF profiler also allows you to view processor information for HPF
programs on multiprocessor systems. You can select combined minimum and
maximum processor data, or select processor data on a processor-by-processor
basis. This information can be used to identify communication patterns and
the portions of a program that will benefit most from performance tuning.
Profiling is a three-step process:
- Compilation
- Compiler switches cause special profiling calls to be inserted in the code
and data collection libraries to be linked in.
- Execution
- The profiled program is invoked normally, but collects call counts and
timing data during execution. When the program terminates, a profile data file
(pgprof.out) is generated.
- Analysis
- The PGPROF tool interprets the pgprof.out file to display
the profile data and associated source files.
The profiler supports function level and line level data collection modes.
The following definitions describe these modes and other terms used in this
chapter.
- Function Level Profiling
- The strategy of collecting call counts and execution times on a per-function
basis.
- Line Level Profiling
- Execution counts and times within each function are collected in
addition to function level data. "Line level" is something of a
misnomer because the granularity ranges from data for individual statements to
data for large blocks of code, depending on the optimization level. At
optimization level 0, the profiling is truly line level.
- Basic Block
- At optimization levels above 0, code is broken into basic blocks, which are
groups of sequential statements without any conditional or looping controls.
Line level profile data is collected on basic blocks rather than individual
statements at these optimization levels.
- Virtual Timer
- A statistical method for collecting time information by directly
reading a timer that is incremented at a known rate on a processor-by-processor
basis.
- Data Set
- A profile data file is considered to be a data set.
- Host
- The system on which the PGPROF tool executes. This will generally be
the system where source and executable files reside, and where compilation is
performed.
- Target Machine
- The system on which a profiled program runs. This may or may not be the
same system as the host.
- GUI
- Graphical User Interface. A set of windows, and associated menus, buttons,
scrollbars, etc., that can be used to control the profiler and display the
profile data.
The
following list shows driver switches which cause profile data collection calls
to be inserted and libraries to be linked in the executable file:
- -Mprof=func
- insert calls to produce a pgprof.out file for function level data.
- -Mprof=lines
- insert calls to produce a pgprof.out file which contains both
function and line level data.
Once
a program is compiled for profiling, it needs to be executed. The profiled
program is invoked normally, but while running it collects call counts and/or
time data. When the program terminates, a profile data file called
pgprof.out is generated.
Running the PGPROF profiler initializes it and lets you analyze the profile
data produced during the execution phase.
The PGPROF profiler is invoked as follows:
% pgprof [options] [-I srcdir] [datafile]
If
invoked without any options or arguments, the PGPROF profiler looks for
the pgprof.out data file and the program source files in the current
directory. The program executable name, as specified when the program was run,
is usually stored in the profile data file. If all program related activity
occurs in a single directory, the PGPROF profiler needs no arguments. If
present, the arguments are interpreted as follows:
- -s
- Read commands from standard input. On hosts that have a GUI, this causes
PGPROF to operate in a non-graphical mode. This is useful if input is
being redirected from a file or if the user is remotely logged in to the host
system.
- -I srcdir
- Add a directory to the source file search path. The PGPROF profiler will
always look for a program source file in the current directory first. The -I
option can be used multiple times to append additional directories to the
search path. Directories will be searched in the order specified. It is
acceptable to leave white space between the -I and the srcdir
argument.
- datafile
- A single data file name may be specified on the command line.
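The search order described for -I can be modeled in a few lines. This is a minimal sketch, not PGPROF's implementation. The function `find_source` and its `exists` parameter are hypothetical, and the file system is faked so the example runs anywhere. It shows only that the current directory is always tried first and that the -I directories follow in the order given.

```python
import os


def find_source(filename, include_dirs, exists=os.path.isfile):
    """Return the first path where `filename` is found, or None.

    Hypothetical model of the documented search order: the current
    directory first, then each -I directory in command-line order.
    """
    for directory in ["."] + list(include_dirs):
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate
    return None


# Fake file system: the same source exists in two -I directories.
fake_files = {os.path.join("src", "solver.f90"), os.path.join("lib", "solver.f90")}
print(find_source("solver.f90", ["src", "lib"], exists=fake_files.__contains__))
```

Because "src" was given before "lib", the copy in src is the one found.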
An initialization file named .pgprofrc may be placed in the
current directory. The data in this file will be interpreted as command line
arguments, with any number of arguments per line. A word beginning with # is a
comment and causes the rest of the line to be ignored. A typical use of this
file would be to specify multiple source directories. The .pgprofrc file
is read after the command line arguments have been processed. Any arguments
provided on the invocation line will override conflicting arguments found in
the .pgprofrc file.
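A minimal sketch of the tokenizing step may help clarify the comment rule for .pgprofrc. The function `parse_pgprofrc` is hypothetical. It models only the stated rules: any number of white-space-separated arguments per line, and a word beginning with # ending the line. It does not model how PGPROF resolves conflicts with the command line.

```python
def parse_pgprofrc(text):
    """Split .pgprofrc contents into a flat list of arguments."""
    args = []
    for line in text.splitlines():
        for word in line.split():
            if word.startswith("#"):
                break  # comment: ignore the rest of this line
            args.append(word)
    return args


rc = """-I ../src   # main sources
-I ../lib
# -I ../old
"""
print(parse_pgprofrc(rc))  # ['-I', '../src', '-I', '../lib']
```

Under the rule as stated, a # inside a word (for example `b#c`) does not start a comment.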
The
PGPROF data collection libraries now employ a single unified method for
gathering time information - Virtual Timer - rather than the three
methods previously supported.
This
data collection method employs a single timer, which starts at zero (0) and is
incremented at a fixed rate while the active program is being profiled. For
multiprocessor programs, there is a timer on each processor, and the profiler's
summary data (minimum, maximum and per processor) is based on each processor's
time to run a function. How the timer is incremented and at what frequency
depends on the target machine. The timer is read from within the data
collection functions and is used to accumulate COST and TIME values for each
line, function, and the total execution time. The line level data is based on
source lines; however, in some cases, there may be multiple statements on a
line and the profiler will show data for each statement.
NOTE: due to the timing mechanism used by the profiler to gather data,
information provided for longer running functions will be more accurate than
for functions that execute for only a short time relative to the timer's
granularity. Refer to the list of Caveats below for more profiler limitations.
The
following statistics are collected and may be displayed by the PGPROF
profiler.
- BYTES
For HPF profiles only. This is the number of message bytes sent and
received by the function or line.
- BYTES RECEIVED
For HPF profiles only. This is the number of bytes received by the
function or line in a data transfer.
- BYTES SENT
For HPF profiles only. This is the number of bytes sent by the
function or line.
- CALLS
This is the number of times a function is called.
- COST
This is the sum of the differences between the timer value entering and
exiting a function. This includes time spent on behalf of the current function
in all children whether profiled or not.
- COUNT
This is the number of times a line or function is executed.
- COVERAGE
This is the percentage of lines in a function that were executed at
least once.
- LINE NUMBER
For line mode, this is the line number for that line. For function
mode, this is the line number of the first line of the function.
- MESSAGES
For HPF profiles only. This is the number of messages sent and
received by the function or line.
- RECEIVES
For HPF profiles only. This is the number of messages received by the
function or line.
- SENDS
For HPF profiles only. This is the number of messages sent by the
function or line.
- STMT ON LINE
For programs with multiple statements on a line, data is collected and
displayed for each statement individually.
- TIME
This is only the time spent within the function or executing the line.
The TIME does not include time spent in functions called from this function or
line. TIME may be displayed in seconds or as a percent of the total time.
- TIME PER CALL
This is the TIME for a function divided by the CALLS to that function.
TIME PER CALL is displayed in milliseconds.
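The sketch below makes the difference between COST and TIME concrete. It derives CALLS, COST, TIME and TIME PER CALL from timer readings taken on function entry and exit. The trace format, the function name `summarize` and the tick rate are assumptions for illustration, not PGPROF's data format. COST for each call is the exit reading minus the entry reading. TIME subtracts the ticks spent in callees. Recursive calls are not treated specially.

```python
from collections import defaultdict


def summarize(events, ticks_per_second):
    """Derive CALLS, COST, TIME and TIME PER CALL from a hypothetical trace.

    `events` is a list of (timer_value, kind, function) tuples in execution
    order, where kind is "enter" or "exit".
    """
    calls = defaultdict(int)
    total_ticks = defaultdict(int)  # COST: function plus everything it calls
    self_ticks = defaultdict(int)   # TIME: function body only
    stack = []  # entries: [function, entry_tick, ticks_spent_in_callees]

    for tick, kind, func in events:
        if kind == "enter":
            calls[func] += 1
            stack.append([func, tick, 0])
        elif kind == "exit":
            name, start, callee_ticks = stack.pop()
            if name != func:
                raise ValueError(f"exit from {func} does not match entry to {name}")
            elapsed = tick - start
            total_ticks[func] += elapsed
            self_ticks[func] += elapsed - callee_ticks
            if stack:
                stack[-1][2] += elapsed  # charge this call to the caller's callees
        else:
            raise ValueError(f"unknown event kind: {kind}")

    summary = {}
    for func, n in calls.items():
        seconds = self_ticks[func] / ticks_per_second
        summary[func] = {
            "CALLS": n,
            "COST": total_ticks[func] / ticks_per_second,
            "TIME": seconds,
            "TIME PER CALL (ms)": 1000.0 * seconds / n,
        }
    return summary


trace = [
    (0, "enter", "main"),
    (10, "enter", "work"),
    (40, "exit", "work"),
    (50, "enter", "work"),
    (70, "exit", "work"),
    (100, "exit", "main"),
]
for name, row in summarize(trace, ticks_per_second=100).items():
    print(name, row)
```

With this trace, main has a COST of 1.0 s but a TIME of only 0.5 s, because half of its interval was spent inside work.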
The data provided by virtual timer based collection allows you to analyze
relationships between functions and between processors.
Collecting
performance data for programs running on high speed processors and parallel
processors is a difficult task. There is no ideal solution. Since programs
running on these processors tend to operate within large internal caches,
external hardware cannot be used to monitor their behavior. The only other way
to collect data is to alter the program itself, which is how this profiling
process works. Unfortunately, it is impossible to do this without affecting the
temporal behavior of the program. Every effort has been made to strike a
balance between intrusion and utility, and to avoid generating misleading or
incomprehensible data. It would, however, be unwise to assume the data is
beyond question.
Many
target machines provide a clock resolution of only 20 to 100 ticks per second.
Under these circumstances a function must consume at least a few seconds of CPU
time to generate meaningful line level times.
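A worked example shows why coarse clocks need long runs. The sketch assumes, as a simplification, that any measured interval may be off by up to one timer tick. Under that assumption, the worst-case relative error is the tick length divided by the interval. The function name `tick_error_bound` is hypothetical.

```python
def tick_error_bound(seconds, ticks_per_second):
    """Approximate worst-case relative error of an interval measured
    with a timer of the given resolution (one tick of slack assumed)."""
    tick = 1.0 / ticks_per_second
    return tick / seconds


for hz in (20, 100):
    for t in (0.05, 1.0, 5.0):
        print(f"{hz:>3} ticks/s, {t:>4} s of CPU time: "
              f"up to {100 * tick_error_bound(t, hz):.1f}% error")
```

At 20 ticks per second, even 1 s of CPU time may be off by about 5%. That time is then divided among the function's lines, so each line's share is even less precise.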
At
higher optimization levels, and especially with highly vectorized code,
significant code reorganization may have occurred within functions. Most line
profilers deal with this problem by disallowing profiling above optimization
level 0. The PGPROF profiler allows line profiling at any optimization
level, and significant effort was expended on associating the line level data
with the source in a rational manner and avoiding unnecessary intrusion.
Despite this effort, the correlation between source and data may at times
appear inconsistent. Compiling at a lower optimization level or examining the
assembly language source may be necessary to interpret the data in these cases.
The
PGPROF X-Windows Graphical User Interface (GUI) is invoked using the
command pgprof. This chapter describes how to use the profiler with
the GUI on systems where it's supported. There may be minor variations in the
GUI from host to host, depending on the type of monitor available, the settings
for various defaults and the window manager used. Some monitors do not support
the color features available with the PGPROF GUI. The basic interface
across all systems remains the same, as described in this chapter, with the
exception of the differences tied to the display characteristics and the window
manager used.
There are two major advantages provided by the PGPROF GUI.
- Source Interaction
The PGPROF GUI allows a user to view the program source for
any known function in the line profiler window whether or not line level
profile data is available simply by selecting the function name. Since
interpreting profile data usually involves correlating the program source and
the data, the source interaction provided by the GUI greatly reduces the time
spent interpreting data. The GUI allows users to easily compare data on a per
processor basis, and identify problem areas of code based on processor
execution time differences for functions or lines.
- Graphical Display of Data
It is often difficult to visualize the relationships between the
various percentages and execution counts. The GUI allows bar graphs to be
displayed which graphically represent these relationships. This makes it much
easier to locate the 'hot spots' while scrolling through the data for a large
program.
PGPROF command line switches, given when the PGPROF profiler is invoked,
may be used to control some features of the GUI:
- -bg <color>
- sets the display background color to color; the default is set by
the Motif libraries. For example: -bg blue
- -fg <color>
- sets the display foreground color to color; the default is set by
the Motif libraries. For example: -fg black
- -bar <num>
- sets the width of bar graphs to num, in number of characters.
- -source <num>
- sets the number of characters of the source program to display for line
level data to num.
- -file <num>
- sets the number of characters of the filename to display to num.
- -high <color>
- -medium <color>
- -low <color>
- -verylow <color>
Bar graphs are divided into four groups by length, at 25%, 50% and 75%
of the longest bar. These bar coloring options (-high,
-medium, -low and -verylow) let you set the color to use
for these four bar groups.
- -title <string>
- sets the window title to string.
Normal X-windows switches
may also be used, such as -display and -geometry.
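The bar grouping can be expressed as a small classification. In this hypothetical sketch, each bar's length as a fraction of the longest bar is compared with the 25%, 50% and 75% boundaries. The text does not specify two details, so both are assumptions here: that the top group is the one colored by -high, and that a bar falling exactly on a boundary goes into the lower group.

```python
def bar_group(length, longest):
    """Return the coloring group ("high", "medium", "low" or "verylow")."""
    fraction = length / longest if longest else 0.0
    if fraction > 0.75:
        return "high"
    if fraction > 0.50:
        return "medium"
    if fraction > 0.25:
        return "low"
    return "verylow"


times = {"solve": 8.0, "assemble": 5.0, "io": 2.5, "init": 0.4}
longest = max(times.values())
for name, t in times.items():
    print(name, bar_group(t, longest))
```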
In addition to normal X-windows resources, PGPROF uses the following
resources, which can be set with the xrdb command.
- pgprof.bar: num
- equivalent to -bar num
- pgprof.source: num
- equivalent to -source num
- pgprof.filename: num
- equivalent to -file num
- pgprof.high: color
- equivalent to -high color
- pgprof.medium: color
- equivalent to -medium color
- pgprof.low: color
- equivalent to -low color
- pgprof.verylow: color
- equivalent to -verylow color
- pgprof.foreground: color
- equivalent to -fg color
- pgprof.background: color
- equivalent to -bg color
- pgprof.browser: path
- sets the path to the web browser used to browse the help page.
- pgprof.helppage: http-address
- the http-address of the PGPROF HTML help page.
- pgprof.browserdirect: command
- a printf format string used to construct the arguments to the browser;
use %s where the http-address should appear.
The
profiler window is divided into five areas from top to bottom, as follows: the
Menu Bar area, the Title area, the Sort/Select area, the Display area and the
Messages area. The illustration in figure 14-1 depicts a function-level profile
window.

Figure 14-1
The Menu Bar contains File, Options, Sort,
Select, Processors, View and Help menus.
Any of these menus can be selected with a mouse click or by a keyboard shortcut;
for example, use Alt+F for File. All menus have tear-off mode enabled. To tear
off a menu, click the dashed line at the top of the menu.
The Title area displays the name of the executable as well as the date
and time the executable was created. To the right, it also displays the total
execution time of the run, the number of processes or processors it used and
the date and time of the run.
The Sort/Select area allows you to re-sort the functions or select
subsets of the functions based on time, cost, coverage, or other properties;
line mode windows do not have a Sort/Select area.
The Messages area contains a scrollable display with informational
messages from the PGPROF profiler.
The
File menu permits the following actions:
- Open <Ctrl+O>
- Opens a file-selection window to allow you to select a new profiler output
file to display. The new data is displayed in a new window.
- Merge
- Opens a file-selection window to allow you to select a profiler output file
to merge with the current file. The merged execution times are added for each
function and line. The merged file must have been created with the same
program and executable.
- Print
- Sends the data in the Display area to a printer.
- Print to File
- Prints the data in the Display area to a file.
- Append to File
- Appends the data in the Display area to the file most recently
created by the Print to File action.
- Close <Ctrl+C>
- Closes the current PGPROF window.
- Quit <Ctrl+Q>
- Closes all PGPROF windows and exits the application.
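The Merge action can be pictured as a per-key sum. In this minimal sketch, each data set is a hypothetical dictionary holding an executable name and per-function times. Real pgprof.out files have their own format, and comparing executable names is only a stand-in for whatever consistency check PGPROF actually performs.

```python
def merge_profiles(current, other):
    """Add the execution times of `other` into a copy of `current`."""
    # Stand-in for the requirement that both files come from the same
    # program and executable.
    if current["executable"] != other["executable"]:
        raise ValueError("merged data must come from the same program and executable")
    merged = dict(current["times"])
    for func, seconds in other["times"].items():
        merged[func] = merged.get(func, 0.0) + seconds
    return {"executable": current["executable"], "times": merged}


run1 = {"executable": "a.out", "times": {"main": 1.5, "solve": 4.0}}
run2 = {"executable": "a.out", "times": {"main": 1.0, "solve": 3.5, "report": 0.25}}
print(merge_profiles(run1, run2)["times"])  # {'main': 2.5, 'solve': 7.5, 'report': 0.25}
```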
The
Options menu controls the following options:
- Printer Options
Allows you to select the printer command used for the Print
action. The default is "lpr".
- Help Options
Allows you to change the path to the browser and help page used for
interactive help.
- Source Directory
Allows you to add another directory to the search path for source files.
The
Sort menu and the Sort option box in the
Sort/Select area allow you to sort the functions by any of several keys. The
Sort menu and Sort option box give the same
functionality. The Sort menu is not available for line-level
windows. The sort keys are:
- Name
- function name (alphabetical)
- File Name
- name of the source file (alphabetical)
- Calls
- number of calls to this function (numerical)
- Time
- execution time spent in this function
- Cost
- execution time spent in this function and all functions called from this
function
- Coverage
- percentage of lines in this function that were executed
- Time/Call
- ratio of Time and Calls
- Messages
- for HPF profiles, the number of messages sent and received
- Messages Sent
- for HPF profiles, the number of messages sent
- Messages Received
- for HPF profiles, the number of messages received
- Bytes
- for HPF profiles, the length of all messages sent and received in bytes
- Bytes Sent
- for HPF profiles, the length of all messages sent in bytes
- Bytes Received
- for HPF profiles, the length of all messages received in bytes
The
Select menu and the Select option box in the
Sort/Select area allow you to select a subset of the functions by any one of
several properties. The Select menu and Select
option box give the same functionality. The Select menu is not
available for line-level windows. The selection options are:
- All
- all functions are displayed
- Calls
- only functions with more than N calls are displayed. Setting the
value N is described at the end of this subsection.
- Time
- only functions taking more than N% of the total execution time are
displayed. Setting the value N is described at the end of this
subsection.
- Coverage
- only functions with less than N% coverage are displayed.
Setting the value N is described at the end of this subsection.
- Executed
- only functions that were actually executed are displayed.
- Unexecuted
- only functions that were never called are displayed.
The value
N used in the description of Calls, Time
and Coverage above can be set by typing into the text window in
the Sort/Select area or by clicking on the up/down
arrows next to that text window.
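The selection options can be summarized as predicates over per-function data. The record layout and the function `select` are hypothetical. The sketch also assumes that the total execution time equals the sum of the functions' TIME values, which holds only when every function is profiled.

```python
def select(functions, key, n=0.0):
    """Return the sorted names of functions kept by a Select option.

    `functions` maps names to records with "calls", "time" (seconds)
    and "coverage" (percent) fields.
    """
    total_time = sum(f["time"] for f in functions.values())

    def pct_time(f):
        return 100.0 * f["time"] / total_time if total_time else 0.0

    predicates = {
        "All": lambda f: True,
        "Calls": lambda f: f["calls"] > n,
        "Time": lambda f: pct_time(f) > n,
        "Coverage": lambda f: f["coverage"] < n,
        "Executed": lambda f: f["calls"] > 0,
        "Unexecuted": lambda f: f["calls"] == 0,
    }
    if key not in predicates:
        raise ValueError(f"unknown selection key: {key}")
    keep = predicates[key]
    return sorted(name for name, f in functions.items() if keep(f))


funcs = {
    "main": {"calls": 1, "time": 0.5, "coverage": 100.0},
    "solve": {"calls": 200, "time": 9.0, "coverage": 80.0},
    "report": {"calls": 1, "time": 0.05, "coverage": 40.0},
    "unused": {"calls": 0, "time": 0.0, "coverage": 0.0},
}
print(select(funcs, "Time", 1))  # ['main', 'solve']
```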
For
HPF profiles, the Processors menu allows you to choose which processor data to
display. One or more options may be selected. The options are:
- Maximum
- displays the maximum value (time, cost, calls, count, etc.) from among all
processors.
- Average
- displays the average value of all processors.
- Minimum
- displays the minimum value from among all processors.
- Sum
- displays the sum of values for all processors.
- All
- displays data for each processor for each function or line displayed, using
one line per processor. Note: if many processors were used, the display can be
quite long.
- Individual
- opens a selection window allowing you to select individual processor data
to display.
- None
- None turns off individual processor displays. Note that only one of All,
Individual and None may be selected.
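For a single statistic, the Maximum, Average, Minimum and Sum rows reduce to ordinary aggregates over the per-processor values, as in this minimal sketch. The function name and sample data are hypothetical. A large gap between the maximum and the average, as in the sample, is the kind of processor imbalance the per-processor displays help expose.

```python
from statistics import mean


def processor_summary(per_processor):
    """Reduce one value per processor (time, cost, calls, count, ...)
    to the max, avg, min and sum rows."""
    return {
        "max": max(per_processor),
        "avg": mean(per_processor),
        "min": min(per_processor),
        "sum": sum(per_processor),
    }


# Hypothetical TIME (seconds) of one function on four processors.
times = [2.0, 2.5, 2.0, 5.5]
print(processor_summary(times))
```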
The
View menu lets you select which data to display. The data that may be viewed
for functions is:
- Filename
- name of the source file containing the function.
- Line Number
- line number where the function starts in the source file.
- Name
- name of the function.
- Processor
- for HPF profiles, the processor number to which this data line corresponds,
or the string "max", "avg", "min" or "sum".
- Calls
- number of calls to the function. This may be displayed numerically or as a
bar chart.
- Time
- time spent in this function. This may be displayed numerically in seconds
or as a percent of total time; or it may be displayed as a bar chart. It may
also be displayed as Time Per Call numerically in milliseconds or as a bar
chart.
- Cost
- time spent in this function and all functions called from this function.
This may be displayed numerically in seconds or as a percent of total time; or
it may be displayed as a bar chart.
- Coverage
- number of lines that were actually executed. This may be displayed
numerically as a line count or as a percent of actual coverage; or it may be
displayed as a bar chart.
- Messages
- for HPF profiles, the number of messages total, or sent, or received; all
either numerically or as a bar chart. Additionally, messages that were
executed on the same processor as copies may be displayed
numerically or as bar charts.
- Bytes
- for HPF profiles, the total length of all messages in bytes, or messages
sent, or messages received; all either numerically or as bar charts.
Additionally, the bytes count for messages that were executed on the same
processor as copies may be displayed.

Figure 14-2
The illustration above shows an individual source line window. Selecting a
function name in the function-level profile window and invoking it, usually
by double-clicking, displays a line-level source window. The
data that may be viewed for individual source lines is:
- Line Number
- line number in the file.
- Stmt on Line
- for programs with multiple statements on one line, data is shown for each
statement individually.
- Source
- the program source text.
- Processor
- for HPF profiles, the processor number to which this data line corresponds,
or the string "max", "avg", "min", "sum".
- Counts
- the number of times this line was executed. This may be displayed
numerically or as a bar chart.
- Time
- the time spent executing this line. This may be displayed numerically in
seconds, as a percent of total time, or as a bar chart. Alternately, Time
per Count may be displayed numerically in milliseconds or as a bar chart.
- Cost
- time spent executing this line and all functions called from this line.
This may be displayed in seconds, as a percent of cost, or as a bar chart.
- Messages
- for HPF profiles, the number of messages total, or sent, or received; all
either numerically or as a bar chart. Additionally, messages that were
executed on the same processor as copies may be displayed
numerically or as bar charts.
- Bytes
- for HPF profiles, the total length of all messages in bytes, or messages
sent, or messages received; all either numerically or as bar charts.
Additionally, the bytes count for messages that were executed on the same
processor as copies may be displayed.
The
Help menu has two options:
- About
- this option opens a window giving version information about PGPROF.
- Index
- this option starts up a WWW browser (Netscape) to interactively browse the
PGPROF help page.
The
interface for non-GUI versions of the PGPROF profiler is a simple
command language. This command language is available in GUI versions of the
profiler using the -s option. The language is composed of commands and
arguments separated by white space. A pgprof> prompt is issued
unless input is being redirected.
This
section describes the profiler's command set. Command names are printed in bold
and may be abbreviated as indicated. Arguments contained in [ and ] are
optional. Separating two or more arguments by | indicates that any one is
acceptable. Argument names in italics are chosen to indicate what kind
of argument is expected. Argument names which are not in italics are
keywords and should be entered as they appear.
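The bracket notation implies a simple matching rule, sketched below using command names from the command set that follows. This is one reading of the notation, not PGPROF's parser: a typed word matches a command when it starts with the required (unbracketed) part and is a prefix of the full name. The table and the function `resolve` are hypothetical.

```python
# (required abbreviation, full name), taken from the command list below.
COMMANDS = [
    ("d", "display"), ("he", "help"), ("h", "history"), ("l", "lines"),
    ("lo", "load"), ("m", "merge"), ("pro", "process"), ("p", "print"),
    ("q", "quit"), ("sel", "select"), ("sh", "shell"), ("so", "sort"),
    ("src", "srcdir"), ("s", "stat"), ("t", "times"),
]


def resolve(word):
    """Return the full command name that `word` abbreviates, or None."""
    if word == "?":
        return "help"  # "?" is an alias for help
    matches = [full for required, full in COMMANDS
               if word.startswith(required) and full.startswith(word)]
    return matches[0] if len(matches) == 1 else None


for word in ["d", "h", "he", "l", "lo", "p", "pro", "s", "sel", "?", "x"]:
    print(f"{word!r:6} -> {resolve(word)}")
```

Under this reading, "l" means lines while "lo" means load, and a bare "s" means stat.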
- d[isplay] [display options] | all | none
- Specify display information. This includes information on minimum
values, maximum values, average values, or per processor data.
- he[lp] [command]
- Provide a brief command synopsis. If the command argument is
present, only information for that command will be displayed. The character "?"
may be used as an alias for help.
- h[istory] [size]
- Display the history list, which stores previous commands in a manner
similar to that available with csh or dbx. The optional size argument
specifies the number of lines to store in the history list.
- l[ines] function [[>] filename]
- Print (display) the line level data together with the source for the
specified function. If the filename argument is present, the
output will be placed in the named file. The '>' means redirect output, and
is optional.
- lo[ad] [datafile]
- Load a new dataset. With no arguments, reloads the current dataset. A
single argument is interpreted as a new data file. With two arguments, the
first is interpreted as the program and the second as the data file.
- m[erge] datafile
- Merge the profile data from the named datafile into the
currently loaded dataset. The datafile must be in standard
pgprof.out format, and must have been generated by the same executable
file as the original dataset. No data files are modified.
- pro[cess] processor_num
- For HPF profiles, specify the processor number of the data to
display.
- p[rint] [[>] filename]
- Print (display) the currently selected function data. If the
filename argument is present, the output will be placed in the named
file. The '>' means redirect output, and is optional.
- q[uit]
- Exit the profiler.
- sel[ect] coverage | covered | uncovered | all [[<] cutoff]
- This is the coverage mode variant of the select command. The
cutoff value is interpreted as a percentage and is only applicable to the
coverage option. The '<' means less than, and is optional. The
default is coverage < 100%.
- sel[ect] calls | time/call | time | cost | all [[>] cutoff]
- You can choose to display data for a selected subset of the functions.
This command allows you to set the selection key and establish a cutoff
percentage or value. The cutoff value must be a positive integer, and for time
related fields is interpreted as a percentage. The '>' means greater than,
and is optional. The default is time > 1%.
- sh[ell] arg1, arg2, argn...
- Fork a shell using the given arguments.
- so[rt] [by] calls | time/call | time | cost | name
- (Profile mode) Function level data is displayed as a sorted list. This
command establishes the basis for sorting. The default is time.
- so[rt] [by] coverage | name
- This is the coverage mode variant of the sort command. The
default is coverage, which causes the functions to be sorted based on
percentage of lines covered, in ascending order.
- src[dir] directory
- Add the named directory to the source file search path. This
is useful if you neglected to specify source directories at invocation.
- s[tat] [[no]min|[no]avg|[no]max|[no]proc|[no]all]
- Set which HPF fields to display; the no versions turn the corresponding
field off.
- t[imes] raw | pct
- Specify whether time related values should be displayed as raw
numbers or as percentages. The default is pct. This command does not
exist in coverage mode.
- !!
- repeat previous command.
- ! num
- repeat the command numbered num in the history list.
- !-num
- repeat the num-th previous command in the history list.
- ! string
- repeat the most recent previous command starting with string from
the history list.
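The four history forms can be modeled as lookups into a list of earlier commands. This is a minimal sketch, not PGPROF's code. The function `expand_history` is hypothetical, and numbering history entries from 1 (oldest first) is an assumption.

```python
def expand_history(line, history):
    """Expand !!, ! num, !-num and ! string against `history`.

    `history` holds previous commands oldest first; command number k is
    assumed to be history[k - 1]. Other lines are returned unchanged.
    """
    line = line.strip()
    if not line.startswith("!"):
        return line  # not a history reference
    ref = line[1:].strip()
    if ref == "!":                                   # !!
        number = len(history)
    elif ref.startswith("-") and ref[1:].isdigit():  # !-num
        number = len(history) - int(ref[1:]) + 1
    elif ref.isdigit():                              # ! num
        number = int(ref)
    else:                                            # ! string
        for cmd in reversed(history):
            if cmd.startswith(ref):
                return cmd
        raise LookupError(f"no command starts with {ref!r}")
    if not 1 <= number <= len(history):
        raise LookupError(f"no such command in history: {line!r}")
    return history[number - 1]


past = ["load pgprof.out", "sort by cost", "print", "select time > 5"]
for ref in ["!!", "! 2", "!-2", "! sort"]:
    print(ref, "->", expand_history(ref, past))
```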

