# Bentley Rules for Optimizing Work

## Agenda

- Data structures
- Packing and encoding
- Augmentation
- Precomputation
- Compile-time initialization
- Caching
- Lazy evaluation
- Sparsity

- Loops
- Hoisting
- Sentinels
- Loop unrolling
- Loop fusion
- Eliminating wasted iterations

- Logic
- Constant folding and propagation
- Common-subexpression elimination
- Algebraic identities
- Short-circuiting
- Ordering tests
- Creating a fast path
- Combining tests

- Functions
- Inlining
- Tail-recursion elimination
- Coarsening recursion

## Data Structures

### Packing and Encoding

The idea of **packing** is to store more than one data value in a machine word. The related idea of **encoding** is to convert data values into a representation requiring fewer bits.

### Augmentation

The idea of data-structure augmentation is to add information to a data structure to make common operations do less work.

Example: Appending singly linked lists

- Appending one list to another requires walking the length of the first list to set its null pointer to the start of the second.
- AUgmenting the list with a tail pointer allows appending to operate in a constant time.

### Precomputation

The idea of **precomputation** is to perform calculations in advance so as to avoid doing them at “mission-critical” times.

### Compile-time initialization

The idea of **compile-time initalization** is to store the values of constants during compilation, saving work at execution time.

Idea: Create large static tables by metaprogramming.

### Sparsity

THe idea of exploting **sparsity** is to avoid storing and computing on zeros. “The fastest way to compute is not to compute at all”.

// TODO: add examples

## Logic

### Constant Folding and Propagation

The idea of constant folding and propagation is to evaluate constant expressions and substitue the result into further expressions, all during compilation.

```
void orrery() {
const double x = 1;
const doube y = 2 * x;
const double z = M_PI * y;
//...
}
```

With a sufficiently high optimization level, all the expressions are evaluated at compile-time.

### Common-Subexpression Elimination

The idea of **common-subexpression elimination** is to avoid computing the same expression multiple times by evaluating the expression once and storing the result for later use.

### Algebraic Identities

The idea of **exploiting algebraic identities** is to preplace expensive algebraic expressions with algebraic equivalents that require less work.

### Short-Circuiting

WHen performing a series of tests, the idea of **short-circuiting** is to stop evaluating as soon as you know the answer.

### Ordering Tests

Consider code that executes a sequence of logical tests. The idea of **odering tests** is to perform those that are more often “successful” - a particular alternative is selected by the test - before tests that are rarely sucessful. Similarly, inexpensive tests should precede expensive ones.

```
#include
```
bool is_whitespace(char c) {
if (c == '\r' || c == '\t' || c == ' ' || c == '\n') {
return true;
}
return false;
}
</code>
</pre>
</div>
```
#include
```
bool is_whitespace(char c) {
if (c == ' ' || c == '\n' || c == '\t' || c == '\r') {
return true;
}
return false;
}
</code>
</pre>
</div>
</div>
## Loops
### Hoisting
The goal of **hoisting** - also called **loop-invariant code motion** - is to avoid recomputing loop-invariant code each time through the body of a loop.
```
#include
```
void scale(double* X, double* Y, int N) {
for (int i=0; i<N; i++) {
Y[i] = X[i] * exp(sqrt(M_PI/2));
}
}
</code>
</pre>
</div>
```
#include
```
void scale(double* X, double* Y, int N) {
double factor = exp(sqrt(M_PI/2));
for (int i=0; i<N; i++) {
Y[i] = X[i] * factor;
}
}
</pre>
</div>
</div>
### Loop Unrolling
**Loop unrolling** attempts to save work by combining several consecutive iterations of a loop into a single iteration, thereby reducing the total number of iterations of the loop and, consequently, the number of times that the instructions that control the loop must be executed.
- **Full** loop unrolling - All iterations are unrolled.
- **Partial** loop unrolling - Several, but not all, of the iterations are unrolled.
```cpp
int sum = 0;
for(int i=0; i<n; ++i) {
sum += A[i];
}
// partial loop unrolling
int sum = 0;
int j;
for (j=0; j<n-3; ++j) {
sum += A[j];
sum += A[j+1];
sum += A[j+2];
sum += A[j+3];
}
for(int i=j; i<n; ++i){
sum += A[i];
}
```
Benefits of loop unrolling
- Lower number of instructions in loop control code
- Enable more compiler optimizations
Unrolling too much can cause poor use of instruction cache
## Functions
### Inlining
The idea of **inlining** is to avoid the overhead of a function call by replacing a call to the function with the body of the function itself.
```c
double square(doube x) {
return x*x;
}
double sum_of_squares(double* A, int n) {
double sum = 0.0f;
for (int i=0; i<n; ++i){
sum += square(A[i]);
}
return sum;
}
```
We can inline the `square` function to reduce the overhead (stack expend). To do that, we can simply mark fucntion using `static inline` keywords. It turns out for some modern compilers, the functions can be inlined without declaring "static inline".
Inline functions can be just as effcient as macros, and they are better structured.
## Closing Advice
- Avoid premature optimizations. First get correct working code. Then optimize, preserving correctness by regression testing.
- Reducing the work of a program does not necessarily decrease its running time, but it is a good heuristic.
- The compiler automates many low-level optimizations
- To tell if the compiler is acutally performing a particular optimization, look at the assembly code.
## Resources
- [Course Link](https://www.youtube.com/watch?v=H-1-X9bkop8&list=PLUl4u3cNGP63VIBQVWguXxZZi0566y7Wf)
- [Perforamnce Engineering of Software Systems](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2018/index.htm)