Bentley Rules for Optimizing Work


Agenda

  • Data structures
    • Packing and encoding
    • Augmentation
    • Precomputation
    • Compile-time initialization
    • Caching
    • Lazy evaluation
    • Sparsity
  • Loops
    • Hoisting
    • Sentinels
    • Loop unrolling
    • Loop fusion
    • Eliminating wasted iterations
  • Logic
    • Constant folding and propagation
    • Common-subexpression elimination
    • Algebraic identities
    • Short-circuiting
    • Ordering tests
    • Creating a fast path
    • Combining tests
  • Functions
    • Inlining
    • Tail-recursion elimination
    • Coarsening recursion

Data Structures

Packing and Encoding

The idea of packing is to store more than one data value in a machine word. The related idea of encoding is to convert data values into a representation requiring fewer bits.

Augmentation

The idea of data-structure augmentation is to add information to a data structure to make common operations do less work.

Example: Appending singly linked lists

  • Appending one list to another requires walking the length of the first list to set its null pointer to the start of the second.
  • AUgmenting the list with a tail pointer allows appending to operate in a constant time.

Precomputation

The idea of precomputation is to perform calculations in advance so as to avoid doing them at “mission-critical” times.

Compile-time initialization

The idea of compile-time initalization is to store the values of constants during compilation, saving work at execution time.

Idea: Create large static tables by metaprogramming.

Sparsity

THe idea of exploting sparsity is to avoid storing and computing on zeros. “The fastest way to compute is not to compute at all”.

// TODO: add examples

Logic

Constant Folding and Propagation

The idea of constant folding and propagation is to evaluate constant expressions and substitue the result into further expressions, all during compilation.

void orrery() {
    const double x = 1;
    const doube y = 2 * x;
    const double z = M_PI * y;
    //...
}

With a sufficiently high optimization level, all the expressions are evaluated at compile-time.

Common-Subexpression Elimination

The idea of common-subexpression elimination is to avoid computing the same expression multiple times by evaluating the expression once and storing the result for later use.

Algebraic Identities

The idea of exploiting algebraic identities is to preplace expensive algebraic expressions with algebraic equivalents that require less work.

Short-Circuiting

WHen performing a series of tests, the idea of short-circuiting is to stop evaluating as soon as you know the answer.

Ordering Tests

Consider code that executes a sequence of logical tests. The idea of odering tests is to perform those that are more often “successful” - a particular alternative is selected by the test - before tests that are rarely sucessful. Similarly, inexpensive tests should precede expensive ones.


#include 

bool is_whitespace(char c) {
  if (c == '\r' || c == '\t' || c == ' ' || c == '\n') {
    return true;
  }
  return false;
}
</code>
</pre>
</div>


#include 

bool is_whitespace(char c) {
  if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
    return true;
  }
  return false;
}
</code>
</pre>
</div>

</div>

## Loops

### Hoisting 

The goal of **hoisting** - also called **loop-invariant code motion** - is to avoid recomputing loop-invariant code each time through the body of a loop.



#include 

void scale(double* X, double* Y, int N) {
  for (int i=0; i<N; i++) {
    Y[i] = X[i] * exp(sqrt(M_PI/2));
  }
}

</code>
</pre>
</div>

#include 

void scale(double* X, double* Y, int N) {
  double factor = exp(sqrt(M_PI/2));
  for (int i=0; i<N; i++) {
    Y[i] = X[i] * factor;
  }
}

</pre>
</div>
</div>

### Loop Unrolling

**Loop unrolling** attempts to save work by combining several consecutive iterations of a loop into a single iteration, thereby reducing the total number of iterations of the loop and, consequently, the number of times that the instructions that control the loop must be executed.

- **Full** loop unrolling - All iterations are unrolled.
- **Partial** loop unrolling - Several, but not all, of the iterations are unrolled.

```cpp
int sum = 0;
for(int i=0; i<n; ++i) {
  sum += A[i];
}

// partial loop unrolling
int sum = 0;
int j;
for (j=0; j<n-3; ++j) {
  sum += A[j];
  sum += A[j+1];
  sum += A[j+2];
  sum += A[j+3];
}
for(int i=j; i<n; ++i){
  sum += A[i];
}
```

Benefits of loop unrolling
- Lower number of instructions in loop control code
- Enable more compiler optimizations

Unrolling too much can cause poor use of instruction cache

## Functions

### Inlining

The idea of **inlining** is to avoid the overhead of a function call by replacing a call to the function with the body of the function itself.

```c
double square(doube x) {
    return x*x;
}

double sum_of_squares(double* A, int n) {
    double sum = 0.0f;
    for (int i=0; i<n; ++i){
        sum += square(A[i]);
    }
    return sum;
}
```

We can inline the `square` function to reduce the overhead (stack expend). To do that, we can simply mark fucntion using `static inline` keywords. It turns out for some modern compilers, the functions can be inlined without declaring "static inline".

Inline functions can be just as effcient as macros, and they are better structured.

## Closing Advice

- Avoid premature optimizations. First get correct working code. Then optimize, preserving correctness by regression testing.
- Reducing the work of a program does not necessarily decrease its running time, but it is a good heuristic.
- The compiler automates many low-level optimizations
- To tell if the compiler is acutally performing a particular optimization, look at the assembly code.


## Resources

- [Course Link](https://www.youtube.com/watch?v=H-1-X9bkop8&list=PLUl4u3cNGP63VIBQVWguXxZZi0566y7Wf)
- [Perforamnce Engineering of Software Systems](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2018/index.htm)