Program Analysis and Transformation Survey and Links
Glossary
- Critical edge (of a graph) - An edge whose source vertex has more than one successor and whose target vertex has more than one predecessor.
(wikipedia)
- DCE - Dead Code Elimination (wikipedia)
- Graph - (wikipedia)
- LOLSSA - [ this entry is a joke! due to lolspeak ]
a) SSA as defined in some early papers on the matter, especially
in their treatment of out-of-SSA conversion (see the epigraph to the
SSA Deconstruction section below);
b) a similar version of SSA used in some (oftentimes amateur) projects
decades later.
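As a rough illustration of the critical edge entry above, here is a minimal Python sketch that finds critical edges in a CFG and splits them by inserting an empty block. The successor-map representation and the `u_v` naming of the new blocks are assumptions made for this example only, and parallel edges between the same two blocks are not handled.

```python
from collections import defaultdict

def critical_edges(succ):
    """Return edges (u, v) where u has >1 successor and v has >1 predecessor."""
    pred_count = defaultdict(int)
    for u, vs in succ.items():
        for v in vs:
            pred_count[v] += 1
    return [(u, v) for u, vs in succ.items() for v in vs
            if len(vs) > 1 and pred_count[v] > 1]

def split_critical_edges(succ):
    """Return a new successor map with an empty block on each critical edge."""
    new_succ = {u: list(vs) for u, vs in succ.items()}
    for u, v in critical_edges(succ):
        mid = f"{u}_{v}"  # hypothetical name for the inserted block
        new_succ[u][new_succ[u].index(v)] = mid
        new_succ[mid] = [v]
    return new_succ

# A branches to B and C, and B falls through to C, so A->C is critical.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
print(critical_edges(cfg))        # [('A', 'C')]
print(split_critical_edges(cfg))  # A now reaches C through the new block A_C
```

After splitting, no edge of the resulting graph is critical, which is the precondition some Phi elimination schemes rely on.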
Names
As a dedication:
We study Program Analysis because it is an objective and complex phenomenon of
nature, devoid of the subjectivities of mankind. But we can't separate
it from the work of the great human minds who laid the paths in this area, and whose
steps we now follow.
These are people who contributed to the Program Analysis field of study
(with apologies to the many more who are not listed here). The emphasis
here is on how well known and publicly available their works are:
- Gregory Chaitin
- Jeffrey Ullman
- Alfred Aho
-
Keith Cooper
- thesis: 1983 "Interprocedural Data Flow Analysis in a Programming Environment"
-
Andrew Appel
- thesis: 1985 "Compile-time Evaluation and Code Generation in Semantics-Directed Compilers"
- book: 1998 "Modern Compiler Implementation in ML/Java/C"
- 2000: Optimal Register Coalescing Challenge
-
Preston Briggs
- thesis: 1992 "Register Allocation via Graph Coloring"
-
Clifford Click @cliffclick
- thesis: 1995 "Combining Analyses, Combining Optimizations"
-
John Aycock
- thesis: 2001 "Practical Earley Parsing and the SPARK Toolkit"
- Hacked on Python compilation: [1], [2], [3], [4]
- Now hacks on retrogaming: [1], [2]
-
Sebastian Hack
- thesis: 2006 "Register Allocation for Programs in SSA Form"
-
Matthias Braun @MatzeB
- thesis: 2006 "Heuristisches Auslagern in einem SSA-basierten Registerzuteiler" in German, "Heuristic spilling in an SSA-based register allocator"
-
Sebastian Buchwald
- thesis: 2008 "Befehlsauswahl auf expliziten Abhängigkeitsgraphen" in German, "Instruction selection on explicit dependency graphs"
-
Florent Bouchez
- thesis: 2009 "A Study of Spilling and Coalescing in Register Allocation as Two Separate Phases"
-
Benoit Boissinot
- thesis: 2010 "Towards an SSA based compiler back-end: some interesting properties of SSA and its extensions"
- Quentin Colombet
- thesis: 2012 "Decoupled (SSA-based) Register Allocators: from Theory to Practice, Coping with Just-In-Time Compilation and Embedded Processors Constraints"
Intermediate Representation Forms/Types
- Imperative
- Functional
- Static Single-Assignment (SSA) - As argued by Appel, SSA is a functional representation.
- (Lambda-)Functional
- Continuation-passing Style (CPS)
SSA Form
Put simply, in an SSA program, each variable is (statically, syntactically)
assigned only once.
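To make the definition concrete, here is a minimal sketch of renaming straight-line code (a single basic block, so no Phi functions are needed) into SSA. The statement encoding as `(target, [operand names])` and the `name_N` versioning scheme are assumptions of this example; names read before any assignment are treated as inputs with version 0.

```python
def to_ssa(stmts):
    """Rename a straight-line list of (target, [operands]) into SSA."""
    version = {}
    out = []
    for target, operands in stmts:
        # Uses refer to the latest version; unassigned names are inputs (_0).
        renamed = [f"{name}_{version.get(name, 0)}" for name in operands]
        # Every assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}_{version[target]}", renamed))
    return out

# x = a; x = x + b; y = x   (operators omitted, only names matter here)
prog = [("x", ["a"]), ("x", ["x", "b"]), ("y", ["x"])]
for stmt in to_ssa(prog):
    print(stmt)
# ('x_1', ['a_0'])
# ('x_2', ['x_1', 'b_0'])
# ('y_1', ['x_2'])
```

Each name on the left-hand side now appears exactly once; handling control flow joins is where Phi functions come in.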
Wikipedia: https://en.wikipedia.org/wiki/Static_single_assignment_form
General reference: "SSA Book" aka "Static Single Assignment Book" aka
"SSA-based Compiler Design" is an open, collaborative effort of many SSA
researchers to write a definitive reference for all things SSA.
Classification of SSA types
- Axis 1: Minimality. There are 2 poles: fully minimal vs fully maximal SSA form.
Between those, there's a continuum of intermediate cases.
- Fully maximal
- Optimized maximal
- An obvious optimization: avoid placing phi functions in blocks with
a single predecessor, as they are never needed there. While this cuts the number
of phi functions, it makes the renaming algorithm a bit more complex: for
maximal form, renaming can process blocks in arbitrary order (because
each of the program's variables has a local definition in every basic block),
whereas optimized maximal form requires processing the predecessor first for each
such single-predecessor block.
- Minimal for reducible CFGs
- Some algorithms (e.g. those optimized for simplicity) naturally produce minimal
form only for reducible CFGs. Applied to non-reducible CFGs, they may
generate extra Phi functions. There are usually extensions to such
algorithms to generate minimal form for non-reducible CFGs too (but
such extensions may add noticeable complexity to an otherwise "simple"
algorithm). An example of such an algorithm is 2013 Braun et al.
- Fully minimal
- This is usually what's sought for SSA form: there are no superfluous
phi functions, judged only by graph properties of the CFG (without consulting
the semantics of the underlying program).
- Axis 2: Prunedness. As argued (implied) by 2013 Braun et al., prunedness is
a separate trait from minimality. E.g., their algorithm constructs not fully
minimal, yet pruned form. Between pruned and non-pruned forms, there're
intermediate types again.
- Pruned
- Minimal form can still have dead phi functions, i.e. phi functions whose
results are never actually used in the rest of the program.
Such phi functions are problematic, as their arguments artificially extend the live
ranges of the referenced variables. Likewise, each one defines a new variable which
isn't really live. The pruned SSA form is devoid of dead phi functions.
Two obvious ways to achieve this: a) perform live variable analysis prior to
SSA construction and use it to avoid placing dead phi functions; b) run a
dead code elimination (DCE) pass after the construction (which requires
live variable analysis first, this time on the SSA form of the program).
Due to these additional passes, pruned SSA construction is more expensive
than just the minimal form. Note that if we intend to run a DCE pass on the
program anyway, which often happens, we don't really need to be concerned
with constructing pruned form, as we will get it after the DCE pass "for free".
Except, of course, that minimal and especially maximal form require more
space to store and more time to traverse during DCE.
- Semi-pruned
- Sometimes called "Briggs-Minimal" form. A compromise between fully
pruned and minimal form. From Wikipedia:
Semi-pruned SSA form[6] is an attempt to reduce the number of Φ
functions without incurring the relatively high cost of computing
live variable information. It is based on the following observation:
if a variable is never live upon entry into a basic block, it never
needs a Φ function. During SSA construction, Φ functions for any
"block-local" variables are omitted.
- Not pruned
- Axis 3: Conventional vs Transformed SSA
- Conventional
- Allows for an easy deconstruction algorithm (literally, just drop
SSA variable subscripts and remove Phi functions). Usually,
after construction, SSA is in conventional form (provided no
additional optimizations were performed during construction).
- Transformed
- Some optimizations applied to an SSA program make the simple deconstruction
algorithm outlined above inapplicable (it no longer produces correct
results). This is known as "transformed SSA". There are algorithms
to convert transformed SSA into conventional form.
- Axis 4: Strict vs non-strict SSA
- Non-strict SSA allows some variables to be undefined on some paths
(just like conventional imperative programs).
- Strict form requires each use to be dominated by its definition. This
in turn means that every variable must be explicitly initialized.
A non-strict program can be trivially converted into strict form by
initializing variables with special values, like "undef" for truly
undefined values, "param" for function parameters, etc. Most SSA
algorithms require/assume strict SSA form, so non-strict form is not
considered further.
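The semi-pruned observation quoted in the classification above (a variable that is never live on entry to any block never needs a Phi function) can be sketched as a single pass that collects "non-local" names, i.e. names read in some block before being written in that block. The block encoding as lists of `(target, [operands])` is an assumption of this example, not something prescribed by the cited sources.

```python
def non_local_names(blocks):
    """blocks: {label: [(target, [operands]), ...]}.

    Returns the names that are used in at least one block before any
    definition in that same block. Only these are Phi candidates in
    semi-pruned construction; block-local names are skipped.
    """
    non_locals = set()
    for stmts in blocks.values():
        defined_here = set()
        for target, operands in stmts:
            for name in operands:
                if name not in defined_here:
                    non_locals.add(name)
            defined_here.add(target)
    return non_locals

blocks = {
    "entry": [("i", []), ("t", ["i"])],     # i = ...; t = f(i)
    "loop":  [("t", ["i"]), ("i", ["t"])],  # t = f(i); i = g(t)
}
print(non_local_names(blocks))  # {'i'}
```

Here `t` is always written before it is read within a block, so it is block-local and would get no Phi function, while `i` crosses block boundaries.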
Discussion: There's one true SSA type - the maximal one. It has a
straightforward, easy to understand construction algorithm which does
not depend on any other special algorithms. Running a generic
DCE algorithm on it will remove any redundancies of the maximal form
(oftentimes, together with other dead code). All other types are
optimizations of the maximal form, which generate fewer Phi
functions up front, so that fewer need to be removed later. Optimizations are useful, but
the usual warning about premature optimization applies.
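To show how little machinery maximal form needs, here is a sketch of Phi placement only (renaming is omitted) for the fully maximal and optimized maximal variants described above. The predecessor-map input and the `optimized` flag are assumptions of this example.

```python
def place_phis_maximal(preds, variables, optimized=False):
    """Return {block: [variables that get a Phi at the top of the block]}.

    preds: {block: [predecessor blocks]}; the entry block has none.
    optimized=False: fully maximal form, a Phi per variable in every
                     block that has at least one predecessor.
    optimized=True:  optimized maximal form, blocks with a single
                     predecessor are skipped.
    """
    min_preds = 2 if optimized else 1
    return {block: sorted(variables) if len(ps) >= min_preds else []
            for block, ps in preds.items()}

# if/else diamond: entry -> then, else -> join
preds = {"entry": [], "then": ["entry"], "else": ["entry"],
         "join": ["then", "else"]}
full = place_phis_maximal(preds, {"x", "y"})
opt = place_phis_maximal(preds, {"x", "y"}, optimized=True)
print(sum(map(len, full.values())), sum(map(len, opt.values())))  # prints "6 2"
```

No dominators, liveness, or other analyses are consulted; the cost is paid in the number of Phi functions that later passes must clean up.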
History
According to Aycock/Horspool:
The genesis of SSA form was in the 1960s with the work of Shapiro and
Saint [23,19]. Their conversion algorithm was based upon finding equivalence
classes of variables by walking the control-flow graph.
R. M. Shapiro and H. Saint. The Representation of Algorithms. Rome Air
Development Center TR-69-313, Volume II, September 1969.
Given the possibility of concurrent operation, we might also wish to question the automatic one-one mapping of variable names to equipment locations. Two uses of the same variable name might be entirely unrelated in terms of data dependency and thus potentially concurrent if mapped to different equipment locations.
The discussion continues on p.31 of the paper (p.39 of the PDF) under the title:
VI. Variable-Names and Data Dependency Relations
Then, following Wikipedia, "SSA was proposed by Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck in 1988."
Barry Rosen; Mark N. Wegman; F. Kenneth Zadeck (1988). "Global value numbers and redundant computations"
Construction Algorithms
Based on excerpts from "Simple Generation of Static Single-Assignment Form",
Aycock/Horspool
For Reducible CFGs (i.e. special case)
-
1986 R. Cytron, A. Lowry, K. Zadeck. Code Motion of Control Structures in High-Level
Languages. Proceedings of the Thirteenth Annual ACM Symposium on Principles
of Programming Languages, 1986, pp. 70–85.
Cytron, Lowry, and Zadeck [11] predate the use of φ-functions, and employ
a heuristic placement policy based on the interval structure of the control-flow
graph, similar to that of Rosen, Wegman, and Zadeck [22]. The latter work is
interesting because they look for the same patterns as our algorithm does during
our minimization phase. However, they do so after generating SSA form, and
then only to correct ‘second order effects’ created during redundancy elimination.
-
1994 Single-Pass Generation of Static Single-Assignment Form for Structured Languages, Brandis and Mössenböck
Brandis and Mössenböck [5] generate SSA form in one pass for structured control-
flow graphs, a subset of reducible control-flow graphs, by delicate placement of
φ-functions. They describe how to extend their method to reducible control-flow
graphs, but require the dominator tree to do so.
-
2000 Simple Generation of Static Single-Assignment Form, John Aycock and Nigel Horspool
In this paper we present a new, simple method for converting to SSA
form, which produces correct solutions for nonreducible control-flow
graphs, and produces minimal solutions for reducible ones.
For Non-Reducible CFGs (i.e. general case)
-
1991 R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently
Computing Static Single-Assignment Form and the Control Dependence Graph.
ACM TOPLAS 13, 4 (October 1991), pp. 451–490.
"Canonical" SSA construction algorithm.
-
1994 R. Johnson, D. Pearson, and K. Pingali. The Program Structure Tree: Computing
Control Regions in Linear Time. ACM PLDI ’94, pp. 171–185.
Johnson, Pearson, and Pingali [16] demonstrate conversion to SSA form as
an application of their “program structure tree,” a decomposition of the control-
flow graph into single-entry, single-exit regions. They claim that using this graph
representation allows them to avoid areas in the control-flow graph that do not
contribute to a solution.
-
1995 V. C. Sreedhar and G. R. Gao. A Linear Time Algorithm for Placing φ-Nodes.
Proceedings of the Twenty-Second Annual ACM Symposium on Principles of
Programming Languages, 1995, pp. 62–73.
Sreedhar and Gao [24] devised a linear-time algorithm for φ-function
placement using DJ-graphs, a data structure which combines the dominator tree
with information about where data flow in the program merges.
-
2013 M. Braun, S. Buchwald, S. Hack, R. Leißa, C. Mallon, and
A. Zwinkau. Simple and efficient construction of static single assignment
form. In R. Jhala and K. De Bosschere, editors, Compiler Construction,
volume 7791 of Lecture Notes in Computer Science, pp.
102–122. Springer, 2013. doi: 10.1007/978-3-642-37051-9_6.
Braun et al. present a simple SSA construction algorithm, which allows
direct translation from an abstract syntax tree or bytecode into an
SSA-based intermediate representation. The algorithm requires no prior
analysis and ensures that even during construction the intermediate representation
is in SSA form. This allows the application of SSA-based optimizations
during construction. After completion, the intermediate representation
is in minimal and pruned SSA form. In spite of its simplicity,
the runtime of the algorithm is on par with Cytron et al.’s algorithm.
- 2016 Verified Construction of Static Single Assignment Form, Sebastian Buchwald,
Denis Lohner, Sebastian Ullrich
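For orientation, the general shape of dominance-frontier-based Phi placement (the approach associated with the 1991 Cytron et al. paper listed above) can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: dominators are computed with a naive iterative set algorithm, frontiers with the usual "walk up the dominator tree from each predecessor of a join" rule, and it assumes every block is reachable from an entry block that has no predecessors. All function names are made up for this sketch.

```python
def dominators(succ, entry):
    """Naive iterative dominator sets; also returns the predecessor map."""
    preds = {n: [] for n in succ}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds

def dominance_frontiers(succ, entry):
    dom, preds = dominators(succ, entry)
    # The immediate dominator is the strict dominator with the most dominators.
    idom = {n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
            for n in succ if n != entry}
    df = {n: set() for n in succ}
    for b in succ:
        if len(preds[b]) >= 2:  # only join points appear in frontiers
            for p in preds[b]:
                runner = p
                while runner is not None and runner != idom.get(b):
                    df[runner].add(b)
                    runner = idom.get(runner)
    return df

def place_phis(succ, entry, defs):
    """defs: {var: blocks assigning var}. Returns {block: vars needing a Phi}."""
    df = dominance_frontiers(succ, entry)
    phis = {n: set() for n in succ}
    for var, def_blocks in defs.items():
        work = list(def_blocks)
        while work:
            block = work.pop()
            for f in df[block]:
                if var not in phis[f]:
                    phis[f].add(var)
                    work.append(f)  # the Phi is itself a new definition
    return phis

# entry -> head; head -> body | exit; body -> head (a simple loop)
succ = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
defs = {"i": {"entry", "body"}, "n": {"entry"}}
print(place_phis(succ, entry="entry", defs=defs))
# only "head" gets a Phi, and only for i
```

Production implementations use faster dominator algorithms; the set-based version here is chosen for brevity.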
Deconstruction Algorithms
Epigraph (due to Boissinot, slide 20):
Naively, a k-input Phi-function at entrance to a node X can
be replaced by k ordinary assignments, one at the end of
each control flow predecessor of X. This is always correct...
-- Cytron, Ferrante, Rosen, Wegman, Zadeck (1991)
Efficiently computing static single assignment form and the control
dependence graph.
Cytron et al. (1991): Copies in predecessor basic blocks.
Incorrect!
- Bad understanding of parallel copies
- Bad understanding of critical edges and interference
Briggs et al. (1998)
Both problems identified. General correctness unclear.
Sreedhar et al. (1999)
Correct but:
- handling of complex branching instructions unclear
- interplay with coalescing unclear
- "virtualization" hard to implement
Many SSA optimizations turned off in gcc and Jikes.
TBD. Some papers in the "Construction Algorithms" section also include
information/algorithms on deconstruction.
Converting out of SSA is effectively the elimination (lowering) of Phi
functions. (Note that Phi functions may appear in a program which is
not (purely) SSA, so Phi elimination is formally a more general process
than conversion out of SSA.)
There are 2 general ways to eliminate Phi functions:
-
Requires splitting critical edges, but doesn't introduce new variables
or extra copies:
Treat Phi functions as parallel copies on the incoming edges. This
requires splitting critical edges. Afterwards, the parallel copies are
sequentialized.
-
Does not require splitting critical edges, but introduces new
variables and extra copies to them, which then need to be
coalesced:
For Conventional SSA (CSSA), the result and arguments of a Phi can
simply be renamed to the same name (throughout the program), and the Phi
removed. This works because the arguments and result do not interfere with
each other (CSSA is produced by normal SSA construction algorithms, which
don't perform copy propagation or value numbering during construction).
Arbitrary SSA (or Transformed SSA, TSSA) can be converted to CSSA
by splitting the live ranges of Phi arguments and results: rename them
to new variables, then insert a parallel copy from the old argument variables
to the new ones at the end of each predecessor, and a parallel copy of all Phi
results after all the Phi functions of the current basic block. These
parallel copies can (usually) be sequentialized trivially (so oftentimes
they are not even treated as parallel in the literature). This method does not
require splitting critical edges, but introduces many unnecessary copies
(intuitively, for non-interfering Phi variables), which then need to
be optimized away by coalescing (alternatively, unneeded copies should
not be introduced in the first place).
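The "sequentialize parallel copies" step mentioned in the first approach can be sketched as below. This is one simple way to do it, not the specific algorithm of any paper cited here: copies whose destination is no longer read are emitted first, and when only cycles remain, one value is saved in a temporary. The `{dst: src}` encoding and the temporary name `tmp` (assumed not to clash with program variables) are assumptions of this example.

```python
def sequentialize(pcopy, temp="tmp"):
    """Turn a parallel copy {dst: src} into an ordered list of (dst, src) moves."""
    pending = {d: s for d, s in pcopy.items() if d != s}  # drop self-copies
    moves = []
    while pending:
        # A destination that no pending copy still reads may be overwritten.
        ready = [d for d in pending if d not in pending.values()]
        if ready:
            d = ready[0]
            moves.append((d, pending.pop(d)))
        else:
            # Only cycles remain: save one value and redirect its readers.
            d = next(iter(pending))
            moves.append((temp, d))
            pending = {k: (temp if s == d else s) for k, s in pending.items()}
    return moves

def run(moves, env):
    """Execute moves one after another on a copy of env (for checking)."""
    env = dict(env)
    for d, s in moves:
        env[d] = env[s]
    return env

# Swap a and b, and copy the old a into c, all "at once".
moves = sequentialize({"a": "b", "b": "a", "c": "a"})
print(moves)
print(run(moves, {"a": 1, "b": 2, "c": 3}))
```

The result of `run` matches the parallel semantics: `a` receives the old `b`, while `b` and `c` both receive the old `a`.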
Control Flow Analysis
According to 1997 Muchnick:
- Analysis on "raw" graphs, using dominators and iterative dataflow algorithms
- Interval Analysis, which then allows the use of ad hoc optimized dataflow analysis.
Variants in order of increasing sophistication:
- The simplest form is T1-T2 reduction
- Maximal intervals analysis
- Minimal intervals analysis
- Structural analysis
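As an illustration of the simplest variant in the list above, here is a sketch of T1-T2 reduction used as a reducibility test: T1 removes a self-loop, T2 merges a node with its unique predecessor, and the CFG is reducible exactly when repeated application collapses it to a single node. The successor-map input is an assumption of this example, and every node is assumed reachable from the entry.

```python
def is_reducible(succ, entry):
    """Apply T1/T2 transformations until nothing changes."""
    g = {n: set(vs) for n, vs in succ.items()}  # private, mutable copy
    changed = True
    while changed and len(g) > 1:
        changed = False
        for n in list(g):
            g[n].discard(n)  # T1: drop a self-loop
            if n == entry:
                continue
            preds = [p for p in g if n in g[p]]
            if len(preds) == 1:  # T2: the unique predecessor absorbs n
                p = preds[0]
                g[p].discard(n)
                g[p] |= g.pop(n)
                changed = True
    return len(g) == 1

loop = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}
two_entry_loop = {"entry": ["a", "b"], "a": ["b"], "b": ["a"]}
print(is_reducible(loop, "entry"))            # True
print(is_reducible(two_entry_loop, "entry"))  # False
```

The second graph is the classic irreducible shape: a cycle between `a` and `b` that can be entered at either node.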
Alias Analysis
Register Allocation
Wikipedia: https://en.wikipedia.org/wiki/Register_allocation
Terms:
- Decoupled allocator - In classic register allocation algorithms, the
assignment of variables to registers and the spilling of non-assignable variables are
tightly coupled, interleaved phases of a single algorithm. In a decoupled
allocator, these phases are well separated, with the spilling algorithm first
selecting and rewriting the variables to spill, and the assignment algorithm then
dealing with the remaining variables. Most decoupled register allocators
are SSA-based, though recent developments also include decoupled allocators
for standard imperative programs.
- Chordal graph - A type of graph which can be optimally colored
in polynomial time (whereas optimal coloring of general graphs is NP-hard).
Interference graphs of SSA programs are chordal. (Note that arbitrary
pre-coloring and/or register aliasing support for chordal graphs, as
required for real-world register allocation, may push the complexity back into
NP-hard territory.)
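The polynomial-time coloring of chordal graphs mentioned above can be sketched with maximum cardinality search (MCS) followed by greedy coloring in the visit order, a textbook technique that yields an optimal coloring when the input graph is chordal. The adjacency-set representation and function names are assumptions of this example; spilling, pre-coloring and register aliasing are not modeled.

```python
def mcs_order(adj):
    """Maximum cardinality search: repeatedly pick the unvisited vertex
    with the most already-visited neighbours."""
    weight = {v: 0 for v in adj}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for n in adj[v]:
            if n in weight:
                weight[n] += 1
    return order

def greedy_color(adj, order):
    """Give each vertex the smallest color unused by its colored neighbours."""
    color = {}
    for v in order:
        used = {color[n] for n in adj[v] if n in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

# Interference graph: a, b, c are mutually live (a triangle); d overlaps only c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
coloring = greedy_color(adj, mcs_order(adj))
print(coloring)  # {'a': 0, 'b': 1, 'c': 2, 'd': 0}
```

Three colors (registers) suffice, matching the size of the largest clique, and `d` reuses the register of `a`.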
Conventional Register Allocation
TBD
Projects
Academic projects
-
SUIF1 - 1994, Stanford University
- "The SUIF (Stanford University Intermediate Format) 1.x compiler,
developed by the Stanford Compiler Group, is a free infrastructure
designed to support collaborative research in optimizing and
parallelizing compilers."
-
SUIF2 - 1999, Stanford University
- "A new version of the SUIF compiler system, a free infrastructure
designed to support collaborative research in optimizing and
parallelizing compilers. It is currently in the beta test stage
of development."
- Machine SUIF aka machsuif - "Fork" of SUIF1/SUIF2, Harvard University
-
NCI (National Compiler Infrastructure) (archive.org) - 1998-200x? Collaborative project among US universities
- "the National Compiler Infrastructure project has two components:"
- SUIF
-
Zephyr
This list is compiled and maintained by Paul Sokolovsky, and released under
Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0).