Bentley Rules for Optimizing Work
    
    
Agenda
- Data structures
    - Packing and encoding
- Augmentation
- Precomputation
- Compile-time initialization
- Caching
- Lazy evaluation
- Sparsity
 
- Loops
    - Hoisting
- Sentinels
- Loop unrolling
- Loop fusion
- Eliminating wasted iterations
 
- Logic
    - Constant folding and propagation
- Common-subexpression elimination
- Algebraic identities
- Short-circuiting
- Ordering tests
- Creating a fast path
- Combining tests
 
- Functions
    - Inlining
- Tail-recursion elimination
- Coarsening recursion
 
Data Structures
Packing and Encoding
The idea of packing is to store more than one data value in a machine word. The related idea of encoding is to convert data values into a representation requiring fewer bits.
Augmentation
The idea of data-structure augmentation is to add information to a data structure to make common operations do less work.
Example: Appending singly linked lists
- Appending one list to another requires walking the length of the first list to set its null pointer to the start of the second.
- AUgmenting the list with a tail pointer allows appending to operate in a constant time.
Precomputation
The idea of precomputation is to perform calculations in advance so as to avoid doing them at “mission-critical” times.
Compile-time initialization
The idea of compile-time initalization is to store the values of constants during compilation, saving work at execution time.
Idea: Create large static tables by metaprogramming.
Sparsity
THe idea of exploting sparsity is to avoid storing and computing on zeros. “The fastest way to compute is not to compute at all”.
// TODO: add examples
Logic
Constant Folding and Propagation
The idea of constant folding and propagation is to evaluate constant expressions and substitue the result into further expressions, all during compilation.
void orrery() {
    const double x = 1;
    const doube y = 2 * x;
    const double z = M_PI * y;
    //...
}
With a sufficiently high optimization level, all the expressions are evaluated at compile-time.
Common-Subexpression Elimination
The idea of common-subexpression elimination is to avoid computing the same expression multiple times by evaluating the expression once and storing the result for later use.
Algebraic Identities
The idea of exploiting algebraic identities is to preplace expensive algebraic expressions with algebraic equivalents that require less work.
Short-Circuiting
WHen performing a series of tests, the idea of short-circuiting is to stop evaluating as soon as you know the answer.
Ordering Tests
Consider code that executes a sequence of logical tests. The idea of ordering tests is to perform those that are more often “successful” - a particular alternative is selected by the test - before tests that are rarely sucessful. Similarly, inexpensive tests should precede expensive ones.
// before:
bool is_whitespace(char c) {
  if (c == '\r' || c == '\t' || c == ' ' || c == '\n') {
    return true;
  }
  return false;
}
// after
bool is_whitespace(char c) {
  if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
    return true;
  }
  return false;
}
Loops
Hoisting
The goal of hoisting - also called loop-invariant code motion - is to avoid recomputing loop-invariant code each time through the body of a loop.
// before:
void scale(double* X, double* Y, int N) {
  for (int i=0; i<N; i++) {
    Y[i] = X[i] * exp(sqrt(M_PI/2));
  }
}
// after:
void scale(double* X, double* Y, int N) {
  double factor = exp(sqrt(M_PI/2));
  for (int i=0; i<N; i++) {
    Y[i] = X[i] * factor;
  }
}
Loop Unrolling
Loop unrolling attempts to save work by combining several consecutive iterations of a loop into a single iteration, thereby reducing the total number of iterations of the loop and, consequently, the number of times that the instructions that control the loop must be executed.
- Full loop unrolling - All iterations are unrolled.
- Partial loop unrolling - Several, but not all, of the iterations are unrolled.
int sum = 0;
for(int i=0; i<n; ++i) {
  sum += A[i];
}
// partial loop unrolling
int sum = 0;
int j;
for (j=0; j<n-3; ++j) {
  sum += A[j];
  sum += A[j+1];
  sum += A[j+2];
  sum += A[j+3];
}
for(int i=j; i<n; ++i){
  sum += A[i];
}
Benefits of loop unrolling
- Lower number of instructions in loop control code
- Enable more compiler optimizations
Unrolling too much can cause poor use of instruction cache
Functions
Inlining
The idea of inlining is to avoid the overhead of a function call by replacing a call to the function with the body of the function itself.
double square(doube x) {
    return x*x;
}
double sum_of_squares(double* A, int n) {
    double sum = 0.0f;
    for (int i=0; i<n; ++i){
        sum += square(A[i]);
    }
    return sum;
}
We can inline the square function to reduce the overhead (stack expend). To do that, we can simply mark fucntion using static inline keywords. It turns out for some modern compilers, the functions can be inlined without declaring “static inline”.
Inline functions can be just as effcient as macros, and they are better structured.
Closing Advice
- Avoid premature optimizations. First get correct working code. Then optimize, preserving correctness by regression testing.
- Reducing the work of a program does not necessarily decrease its running time, but it is a good heuristic.
- The compiler automates many low-level optimizations
- To tell if the compiler is acutally performing a particular optimization, look at the assembly code.
