Выбрать главу

//: C05:Fibonacci.cpp

#include <iostream>

using namespace std;

template<int n>

struct Fib {

   enum {val = Fib<n-1>::val + Fib<n-2>::val};

};

template<>

struct Fib<1> {

   enum {val = 1};

};

template<>

struct Fib<0> {

   enum {val = 0};

};

int main() {

   cout << Fib<5>::val << endl;   // 6

   cout << Fib<20>::val << endl;  // 6765

} ///:~.

Fibonacci numbers are defined mathematically as:

The first two cases lead to the template specializations above, and the rule in the third line becomes the primary template.

Compile-time looping

To compute any loop in a template metaprogram, it must first be reformulated recursively. For example, to raise the integer n to the power p, instead of using a loop such as in the following lines:.

int val = 1;

while (p--)

  val *= n;

you would have to think of it as a recursive procedure:

int power(int n, int p) {

  return (p == 0) ? 1 : n*power(n, p - 1);

}

This can now be easily rendered as a template metaprogram as follows:

//: C05:Power.cpp

#include <iostream>

using namespace std;

template<int N, int P>

struct Power {

  enum {val = N * Power<N, P-1>::val};

};

template<int N>

struct Power<N, 0> {

  enum {val = 1};

};

int main() {

  cout << Power<2, 5>::val << endl;  // 32

} ///:~

Note that we need to use a partial specialization for the stopping condition, since the value N is still a free template parameter. This program only works for non-negative powers, of course.

The following metaprogram adapted from Czarnecki and Eisenecker[68] is interesting in that it uses a template template parameter, and simulates passing a function as a parameter to another function, which "loops through" the numbers 0..n.

//: C05:Accumulate.cpp

// Passes a "function" as a parameter at compile time

#include <iostream>

using namespace std;

// Accumulates the results of F(0)..F(n)

template<int n, template<int> class F>

struct Accumulate {

   enum {val = Accumulate<n-1, F>::val + F<n>::val};

};

// The stopping criterion (returns the value F(0))

template<template<int> class F>

struct Accumulate<0, F> {

   enum {val = F<0>::val};

};

// Various "functions":

template<int n>

struct Identity {

   enum {val = n};

};

template<int n>

struct Square {

   enum {val = n*n};

};

template<int n>

struct Cube {

   enum {val = n*n*n};

};

int main() {

   cout << Accumulate<4, Identity>::val << endl; // 10

   cout << Accumulate<4, Square>::val << endl;   // 30

   cout << Accumulate<4, Cube>::val << endl;     // 100

} ///:~.

The primary Accumulate template attempts to compute the sum F(n)+F(n‑1)…F(0). The stopping criterion is obtained by a partial specialization, which "returns" F(0). The parameter F is itself a template, and acts like a function as in the previous examples in this section. The templates Identity, Square, and Cube compute the corresponding functions of their template parameter that their names suggest. The first instantiation of Accumulate in main( ) computes the sum 4+3+2+1+0, because the Identity function simply "returns" its template parameter. The second line in main( ) adds the squares of those numbers (16+9+4+1+0), and the last computes the sum of the cubes (64+27+8+1+0).

Loop unrolling

Algorithm designers have always endeavored to optimize their programs. One time-honored optimization, especially for numeric programming, is loop unrolling, a technique that minimizes loop overhead. The quintessential loop-unrolling example is matrix multiplication. The following function multiplies a matrix and a vector. (The constants rows and cols have been previously defined.):.

void mult(int a[rows][cols], int x[cols], int y[cols]) {

  for (int i = 0; i < rows; ++i) {

      y[i] = 0;

      for (int j = 0; j < cols; ++j)

        y[i] += a[i][j]*x[j];

  }

}

If cols is an even number, the overhead of incrementing and comparing the loop control variable j can be cut in half by "unrolling" the computation into pairs in the inner loop:.

void mult(int a[rows][cols], int x[cols], int y[cols]) {

  for (int i = 0; i < rows; ++i) {

      y[i] = 0;

      for (int j = 0; j < cols; j += 2)

        y[i] += a[i][j]*x[j] + a[i][j+1]*x[j+1];

  }

}

In general, if cols is a factor of k, k operations can be performed each time the inner loop iterates, greatly reducing the overhead. The savings is only noticeable on large arrays, but that is precisely the case with industrial-strength mathematical computations.

Function inlining also constitutes a form of loop unrolling. Consider the following approach to computing powers of integers.

//: C05:Unroll.cpp

// Unrolls an implicit loop via inlining

#include <iostream>

using namespace std;

template<int n>

inline int power(int m) {

   return power<n-1>(m) * m;

}

template<>

inline int power<1>(int m) {

   return m;

}

template<>

inline int power<0>(int m) {

   return 1;

}

int main()

{

   int m = 4;

   cout << power<3>(m) << endl;

} ///:~.

Conceptually, the compiler must generate three specializations of power<>, one each for the template parameters 3, 2, and 1. Because the code for each of these functions can be inlined, the actual code that is inserted into main( ) is the single expression m*m*m. Thus, a simple template specialization coupled with inlining here provides a way to totally avoid loop control overhead.[69] This approach to loop unrolling is limited by your compiler’s inlining depth, of course.