In software engineering, performance analysis, more commonly profiling, is the investigation of a program's behavior using information gathered as the program runs (i. Software engineering is the application of a systematic disciplined quantifiable approach to the development operation and maintenance of Software. e. it is a form of dynamic program analysis, as opposed to static code analysis). Dynamic program analysis is the analysis of computer software that is performed with executing programs built from that Software on a real or virtual processor (analysis performed Static code analysis is the analysis of computer Software that is performed without actually executing programs built from that software (analysis performed on executing The usual goal of performance analysis is to determine which parts of a program to optimize for speed or memory usage. In Computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources
Contents |
A profiler is a performance analysis tool that measures the behavior of a program as it runs, particularly the frequency and duration of function calls. The output is a stream of recorded events (a trace) or a statistical summary of the events observed (a profile). Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, operating system hooks, and performance counters. For the slang term meaning female prostitution see Prostitution. In Computers hardware performance counters or hardware counters are a set of special-purpose registers built in modern Microprocessors to store the counts of hardware-related The usage of profilers is called out in the performance engineering process. Within Systems engineering, performance engineering encompasses the set of roles skills activities practices tools and deliverables applied at every phase of the Systems
As the summation in a profile often is done related to the source code positions where the events happen, the size of measurement data is linear to the code size of the program. In contrast, the size of a trace is linear to the program's execution time, making it somewhat impractical. For sequential programs, a profile is usually enough, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring the full trace to get an understanding of the problem.
Profiler-driven program analysis on Unix dates back to at least 1979, when Unix systems included a basic tool "prof" that listed each function and how much of program execution time it used. Programming Language Design and Implementation ( PLDI) is the name of one of the ACM SIGPLAN 's most important conferences In 1982, gprof extended the concept to a complete call graph analysis (Gprof: a Call Graph Execution Profiler [1])
In 1994, Amitabh Srivastava and Alan Eustace of Digital Equipment Corporation published a paper describing ATOM [2]. The GNU Binary Utilities, or binutils, is a collection of Programming tools for the manipulation of Object code in various Object file A call graph (also known as a call Multigraph) is a directed graph that represents calling relationships between Subroutines in a Computer program Digital Equipment Corporation was a pioneering American company in the Computer industry ATOM is a platform for converting a program into its own profiler. That is, at compile time, it inserts code into the program to be analyzed. In Computer science, compile time refers to either the operations performed by a Compiler (the "compile-time operations" or Programming language That inserted code outputs analysis data. This technique, modifying a program to analyze itself, is known as "instrumentation".
In 2004, both the Gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers of all time. Programming Language Design and Implementation ( PLDI) is the name of one of the ACM SIGPLAN 's most important conferences [3]
Flat profilers compute the average call times, from the calls, and do not breakdown the call times based on the callee or the context.
Call Graph profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. However context is not preserved.
The programming languages listed here have event-based profilers:
Some profilers operate by sampling. Sampling is that part of Statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern A sampling profiler probes the target program's program counter at regular intervals using operating system interrupts. The program counter, or shorter PC (also called the instruction pointer, part of the instruction sequencer in some Computers is a register in An operating system (commonly abbreviated OS and O/S) is the software component of a Computer system that is responsible for the management and coordination In Computing, an interrupt is an asynchronous signal from hardware indicating the need for attention or a synchronous event in software indicating the need for a change Sampling profiles are typically less accurate and specific, but allow the target program to run at near full speed.
Some profilers instrument the target program with additional instructions to collect the required information. Instrumenting the program can cause changes in the performance of the program, causing inaccurate results and heisenbugs. Unusual software bugs are a class of Software bugs that are considered exceptionally difficult to understand and repair Instrumenting can potentially be very specific but slows down the target program as more specific information is collected.
The resulting data are not exact, but a statistical approximation. The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods. [4]
Some of the most commonly used statistical profilers are GNU's gprof, Oprofile and SGI's Pixie. GNU ( pronounced) is a computer Operating system composed entirely of Free software. The GNU Binary Utilities, or binutils, is a collection of Programming tools for the manipulation of Object code in various Object file Silicon Graphics Inc (commonly initialised to SGI, historically sometimes referred to as Silicon Graphics Computer Systems or SGCS) is a company
When a sequential program has an infinite loop, the simplest way to find the problem is to run it under a debugger, halt it with a "pause" button (not a breakpoint), and examine the Call stack. SIMMON ( SIM ulation MON itor was a Proprietary Software testing system developed in the late 1960s in the IBM Product Test Laboratory In Computer science, a call stack is a dynamic stack data structure which stores information about the active Subroutines of a Computer program Each statement (or instruction) on the call stack is a function call, except for the one at the "bottom" of the stack. One of those statements is in an infinite loop, which can be found by single stepping and examining the context of each statement.
The method works even if the running time is finite. First the program is modified, if necessary, to make it take more than a few seconds, perhaps by adding a temporary outer loop. Then, while the program is doing whatever seems to take too long, it is randomly halted, and a record is made of the call stack. The process is repeated to get additional samples of the call stack. At the same time, the call stacks are compared, so as to find any statements that appear on more than one. Any such statement, if a way can be found to invoke it much less frequently or eliminate it, reduces execution time by the fraction of time it resided on the call stack. Once that is done, the entire process can be repeated, up to several times, usually resulting in significant cumulative speedups. This method is called "random halting" or "deep sampling".
In making these performance-enhancing modifications, one gets the sense that one is fixing bugs of a type that only make a program slow, not wrong. A name for this type of bug is "slug" (slowness bug). Programs as first written generally contain both bugs and slugs. Bugs are usually removed during program testing, while slugs are usually not, unless performance analysis is employed during development.
There are different kinds of slugs. Generally, things that could be done intentionally to make a program run longer can also occur unintentionally. One commonly accepted kind of slug is a "hot spot", which is a tight inner loop where the program counter spends much of its time. For example, if one often finds at the bottom of the call stack a linear search algorithm instead of binary search, this would be a true hot spot slug. However, if another function is called in the search loop, such as string compare, that function would be found at the bottom of the stack, and the call to it in the loop would be at the next level up. In this case, the loop would not be a hot spot, but it would still be a slug. In all but the smallest programs, hot spot slugs are rare, but slugs are quite common.
Data structures that are too general for the problem at hand might also slow software down. For example, if a collection of objects remains small, a simple array with linear search could be much faster than something like a "dictionary" class, complete with hash coding. With this kind of slug, the program counter is most often found in system memory allocation and freeing routines as the collections are being constructed and destructed.
Another common motif is that a powerful function is written to collect a set of useful information (from a database, for example). Then that function is called multiple times, rather than taking the trouble to save the results from a prior call. A possible explanation for this could be that it is beyond a programmer's comprehension that a function call might take a million times as long to execute as an adjacent assignment statement. A contributing factor could be "information hiding", in which external users of a module can be ignorant of what goes on inside it.
At this time, there are certain misconceptions in performance analysis. One is that timing is important. Knowing how much time is spent in functions is good for reporting improvements, but it provides only vague help in finding problems. The information that matters is the fraction of time that individual statements reside on the call stack.
Another misconception is that statistical precision matters. Typical slugs sit on the call stack between 5 and 95 percent of the time. The larger they are, the fewer samples are needed to find them. As in sport fishing, the object is to catch them first, and measure them later, if ever.
As an example, the iteration of slug removal tends to go something like this: Slug X1 could be taking 50% of the time, and X2 could be taking 25% of the time. If X1 is removed, execution time is cut in half, at which point X2 takes 50% of the time. If on the first pass X2 is removed instead, the time is only reduced by 1/4, but then X1 is seen as taking 67% of the time, so it is even more obvious, and can be removed. Either way, removing both X1 and X2 reduces execution time by 75%, so the remaining slugs are four times larger. This "magnification effect" allows the process to continue through X3, X4, and so on until all the easily-removed slugs have been fixed.