1% Latex file containing the documentation of ADOL-C
2%
3% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
4%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
5%
6% This file is part of ADOL-C. This software is provided as open source.
7% Any use, reproduction, or distribution of the software constitutes
8% recipient's acceptance of the terms of the accompanying license file.
9%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
10
11\documentclass[11pt,twoside]{article}
12\usepackage{hyperref}
13\usepackage{amsmath,amsthm,amssymb}
14\usepackage{graphicx}
15\usepackage{datetime}
16
17\newdateformat{monthyear}{\monthname\ \THEYEAR}
18
19\usepackage{color}
20
21\pagestyle{headings} 
22\bibliographystyle{plain}
23\parskip=6pt
24
25\setlength{\textwidth}{433.6pt}
26\setlength{\oddsidemargin}{23pt}
27\setlength{\evensidemargin}{23pt}
28\setlength{\topmargin}{25.0pt}
29\setlength{\textheight}{580pt}
30\setlength{\baselineskip}{8pt}
31
32\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
33\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
34\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
35
36\renewcommand{\sectionautorefname}{Section}
37\renewcommand{\subsectionautorefname}{Section}
38\renewcommand{\figureautorefname}{Figure}
39\renewcommand{\tableautorefname}{Table}
40
41\setcounter{equation}{0}
42
43\input{version.tex}
44
45%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
46\begin{document}
47
48\begin{titlepage}
49\begin{center}
50{\Large {\bf ADOL-C:}} 
51\footnote{The development of earlier versions was supported by the Office of
52  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
53  Forschungsgemeinschaft. During the development of the current
54  version Andrea Walther and Andreas Kowarz were supported by the
55  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} 
56\vspace{0.2in} \\
57%
58{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
59{\Large of Algorithms Written in C/C++}\\
60\vspace{.2in}
61{\large\bf  Version \packageversion, \monthyear\today} \\
62\bigskip
63 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
64   of Paderborn, 33098 Paderborn, Germany} and
65 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
66 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
67\end{center}
68%
69%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
70\begin{abstract}
71The C++ package ADOL-C described here facilitates the evaluation of
72first and higher derivatives of vector functions that are defined
73by computer programs written in C or C++. The resulting derivative
74evaluation routines may be called from C, C++, Fortran, or any other
75language that can be linked with C.
76
77The numerical values of derivative vectors are obtained free
78of truncation errors at a small multiple of the run time and
79random access memory required by the given function evaluation program.
80Derivative matrices are obtained by columns, by rows or in sparse format.
81For solution curves defined by ordinary differential equations,
82special routines are provided that evaluate the Taylor coefficient vectors
83and their Jacobians with respect to the current state vector.
84For explicitly or implicitly defined functions derivative tensors are
85obtained with a complexity that grows only quadratically in their
86degree. The derivative calculations involve a possibly substantial but
87always predictable amount of data. Since the data is accessed strictly sequentially
88it can be automatically paged out to external files.
89\end{abstract}
90%
91{\bf Keywords}: Computational Differentiation, Automatic
92         Differentiation,
93         Chain Rule, Overloading, Taylor Coefficients,
94         Gradients, Hessians, Forward Mode, Reverse Mode,
95         Implicit Function Differentiation, Inverse Function Differentiation
96\medskip
97
98\noindent
99{\bf Abbreviated title}: Automatic differentiation by overloading in C++
100%
101\end{titlepage}
102%
103\tableofcontents       
104%
105
106\newpage
107%
108%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
109%
110%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
111\section{Preparing a Section of C or C++ Code for Differentiation}
112\label{prepar}
113%
114\subsection{Introduction}
115%
116\setcounter{equation}{0}
117The package \mbox{ADOL-C} 
118utilizes overloading in C++, but the
119user has to know only C. The acronym stands for {\bf A}utomatic
120{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
121In contrast to source transformation approaches, overloading does not generate intermediate
122source code.
123As starting points to retrieve further information on techniques and
124application of automatic differentiation, as well as on other AD
125tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
126\verb=http://www.autodiff.org= of the AD community forms a rich source
127of further information and pointers.
128
129
130ADOL-C facilitates the simultaneous
131evaluation of arbitrarily high directional derivatives and the
132gradients of these Taylor coefficients with respect to all independent
133variables. Relative to the cost of evaluating the underlying function,
134the cost for evaluating any such scalar-vector pair grows as the
135square of the degree of the derivative but is still completely
independent of the number $n$ of independent and the number $m$ of dependent variables.
137
138This manual is organized as follows. This section explains the
139modifications required to convert undifferentiated code to code that
140compiles with ADOL-C.
141\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
142evaluate arbitrarily high order derivatives. The discussion includes storage
143requirements and the tailoring of certain tape characteristics to fit specific
144user needs. Descriptions of easy-to-use drivers for a  convenient derivative
145evaluation are contained in \autoref{drivers}.
146\autoref{forw_rev_ad} offers a more mathematical characterization of
147the different modes of AD to compute derivatives. At the same time, the
148corresponding drivers of ADOL-C are explained. 
149The overloaded derivative evaluation routines using the forward and the reverse
150mode of AD are explained in \autoref{forw_rev}.
Advanced differentiation techniques such as optimal checkpointing for
time integrations, the exploitation of fixed point iterations, the use
of externally differentiated functions, and the differentiation of OpenMP
parallel programs are described in \autoref{adv_ad}.
155The traceless forward mode is presented in \autoref{traceless}.
156\autoref{install} details the installation and
157use of the ADOL-C package. Finally, \autoref{example} 
158furnishes some example programs that incorporate the ADOL-C package to
159evaluate first and higher-order
160derivatives.  These and other examples are distributed with the ADOL-C
161source code.
162The user should simply refer to them if the more abstract and general
163descriptions of ADOL-C provided in this document do not suffice.
164%
165%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
166\subsection{Declaring Active Variables}
167%
168\label{DecActVar}
169%
170The key ingredient of automatic differentiation by overloading is the
171concept of an {\em active variable}. All variables that may be
172considered as differentiable quantities at some time
173during the program execution must be of an active
174type. ADOL-C uses one
175active scalar type, called {\sf adouble}, whose real part is of the
176standard type {\sf double}.
177Typically, one will declare the independent variables
178and all quantities that directly or indirectly depend on them as
179{\em active}. Other variables that do not depend on the independent
180variables but enter, for example, as parameters, may remain one of the
181{\em passive} types {\sf double, float}, or {\sf int}. There is no
182implicit type conversion from {\sf adouble} to any of these passive
183types; thus, {\bf failure to declare variables as active when they
184depend on other active variables will result in a compile-time error
185message}. In data flow terminology, the set of active variable names
186must contain all its successors in the dependency graph. All components
187of indexed arrays must have the same activity status.
188
189The real component of an {\sf adouble x} can be extracted as
190{\sf x.value()}. In particular,
191such explicit conversions are needed for the standard output procedure
{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
that the real part of an {\sf adouble} is written to the stream, followed by the string
``{\sf (a)}''. The input stream operator \boldmath $\gg$ \unboldmath can
be used to assign a constant value to an {\sf adouble}.
196Naturally, {\sf adouble}s may be
197components of vectors, matrices, and other arrays, as well as
198members of structures or classes.
199
200The C++ class {\sf adouble}, its member functions, and the overloaded
201versions of all arithmetic operations, comparison operators, and
202most ANSI C functions are contained in the file \verb=adouble.cpp= and its
203header \verb=<adolc/adouble.h>=. The latter must be included for compilation
204of all program files containing {\sf adouble}s and corresponding
205operations.
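
As an illustration of these conventions, the following minimal sketch (the
variable names are ours) declares an active variable, combines it with a
passive parameter, and extracts the real part for {\sf printf}:
\begin{verbatim}
#include <adolc/adouble.h>
#include <cstdio>

void activity_demo() {
    adouble x = 3.0;               // active variable
    double  p = 2.0;               // passive parameter
    adouble y = p * x * x + 1.0;   // active result, depends on x
    printf("y = %f\n", y.value()); // explicit conversion needed for printf
}
\end{verbatim}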
206%
207%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
208\subsection{Marking Active Sections}
209\label{markingActive}
210%
211All calculations involving active variables that occur between
212the void function calls
213\begin{center}
214{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
215{\sf trace\_off(file)}
216\end{center}
217are recorded on a sequential data set called {\em tape}. Pairs of
218these function calls can appear anywhere in a C++ program, but
219they must not overlap. The nonnegative integer argument {\sf tag} identifies the
220particular tape for subsequent function or derivative evaluations.
221Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
222The optional integer arguments {\sf keep} and
223{\sf file} will be discussed in \autoref{tape}. We will refer to the
224sequence of statements executed between a particular call to
225{\sf trace\_on} and the following call to {\sf trace\_off} as an
226{\em active section} of the code. The same active section may be
227entered repeatedly, and one can successively generate several traces
228on distinct tapes by changing the value of {\sf tag}.
229Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
230the header file \verb=<adolc/taputil.h>=, which is included by the header
231\verb=<adolc/adouble.h>= automatically.
232
233Active sections may contain nested or even recursive calls to functions
234provided by the user. Naturally, their formal and actual parameters
235must have matching types. In particular, the functions must be
236compiled with their active variables declared as
237{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. 
238Variables of type {\sf adouble} may be declared outside an active section and need not
239go out of scope before the end of an active section.
240It is not necessary -- though desirable -- that free-store {\sf adouble}s
241allocated within
242an active section be deleted before its completion. The values of all
243{\sf adouble}s that exist at the beginning and end of an active section
244are automatically
245recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
246%
247%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
248\subsection{Selecting Independent and Dependent Variables}
249%
250One or more active variables that are read in or initialized to
251the values of constants or passive variables must be distinguished as
252independent variables. Other active variables that are similarly
253initialized may be considered as temporaries (e.g., a variable that
254accumulates the partial sums of a scalar product after being
255initialized to zero). In order to distinguish an active variable {\sf x} as
256independent, ADOL-C requires an assignment of the form
257\begin{center}
258{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
259\end{center}
260This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
261precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
262other values subsequently. Similarly, one or more active variables {\sf y}
263must be distinguished as dependent by an assignment of the form
264\begin{center}
265{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$ 
266\end{center}
267which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
268by any other assignment to {\sf y}. However, a dependent variable {\sf y} 
269may have been assigned other real values previously, and it could even be an
270independent variable as well.  The derivative values calculated after
271the
272completion of an active section always represent {\bf derivatives of the final
273values of the dependent variables with respect to the initial values of the
274independent variables}.
275
276The order in which the independent and dependent variables are marked
277by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
278derivative evaluations. However, these variables do not have to be
279combined into contiguous vectors. ADOL-C counts the number of
280independent and dependent variable specifications within each active
281section and records them in the header of the tape.
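
A minimal sketch of such a marking (our own example with two independents and
two dependents, taped on tag 1) reads:
\begin{verbatim}
#include <adolc/adouble.h>

// Hypothetical example: tape y0 = x0 + x1, y1 = x0 * x1 on tape 1.
void mark_example(const double *px, double *py) {
    trace_on(1);
    adouble x0, x1, y0, y1;
    x0 <<= px[0];     // first independent
    x1 <<= px[1];     // second independent
    y0 = x0 + x1;
    y1 = x0 * x1;
    y0 >>= py[0];     // first dependent
    y1 >>= py[1];     // second dependent
    trace_off();
}
\end{verbatim}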
282%
283%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
284\subsection{A Subprogram as an Active Section} 
285%
286As a generic example let us consider a C(++) function of the form
287shown in \autoref{code1}.
288%
289\begin{figure}[hbt]
290\framebox[\textwidth]{\parbox{\textwidth}{
291\begin{center}
292\begin{tabbing}
293{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
294\>{\sf  double *x,} \> // independent variable vector \\
295\>{\sf  double *y,} \> // dependent variable vector  \\ 
296\> {\sf int *k, } \> // integer parameters \\ 
297\>{\sf  double *z)}  \> // real parameters \\
298{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
299\>{\sf double t = 0;}  \> // local variable declaration \\
300\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
301\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
302\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
303\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
304{\sf  \} } \>  \> // end of function
305\end{tabbing}
306\end{center}
307}} 
308\caption{Generic example of a subprogram to be activated}
309\label{code1}
310\end{figure}
311%
312
313If {\sf eval} is to be called from within an active C(++)
314section with {\sf x}
315and {\sf y} as vectors of {\sf adouble}s and the other parameters
316passive, then one merely has to change the type declarations of all
317variables that depend on {\sf x} from {\sf double} or {\sf float} to
318{\sf adouble}. Subsequently, the subprogram must be compiled with the
319header file \verb=<adolc/adouble.h>= included as described
320in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
321still to be called with integer and real arguments, possibly from
322a program written in Fortran77, which  does not allow overloading.
323
324To automatically compute derivatives of the dependent
325variables {\sf y} with respect to the independent variables {\sf x}, we
326can make the body of the function into an active section. For
327example, we may modify the previous program segment
328as in \autoref{adolcexam}.
329The renaming and doubling up of the original independent and dependent
330variable vectors by active counterparts may seem at first a bit clumsy.
331However, this transformation has the advantage that the calling
332sequence and the computational part, i.e., where the function is
333really evaluated, of {\sf eval} remain completely
334unaltered. If the temporary variable {\sf t} had remained a {\sf double},
335the code would not compile, because of a type conflict in the assignment
336following the declaration. More detailed example codes are listed in
337\autoref{example}.
338
339\begin{figure}[htb]
340\framebox[\textwidth]{\parbox{\textwidth}{
341\begin{center}
342\begin{tabbing}
343{\sf void eval(} \= {\sf  int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\
344\> {\sf double *px,} \> // independent passive variable vector \\
345\> {\sf double *py,} \> // dependent passive variable vector  \\ 
346\> {\sf int *k,}  \> // integer parameters \\
347\> {\sf double *z)} \> // parameter vector \\
348{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
349\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
350\>{\sf trace\_on(tag);} \> // start tracing  \\
351\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
352\>{\sf x = new adouble[n];}\>// declare active independent variables \\ 
353\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
354\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
355\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
356\>{\sf adouble t = 0;}  \> // local variable declaration \\
357     \>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
358     \>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
359     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
360     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
361     \>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
362     \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
363     \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
364     \>{\sf  delete[] y;} \>// delete dependent active variables \\
365     \>{\sf  delete[] x;} \>// delete independent active variables \\
366     \>{\sf trace\_off();} \> // complete tape \\
367{\sf  \}}   \>\> // end of function
368\end{tabbing} 
369\end{center}}}
370\caption{Activated version of the code listed in \autoref{code1}}
371\label{adolcexam}
372\end{figure}
373%
374%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
375\subsection{Overloaded Operators and Functions}
376\label{OverOper}
377%
378As in the subprogram discussed above, the actual computational
379statements of a C(++) code need not be altered for the purposes of
380automatic differentiation. All arithmetic operations, as well as the
381comparison and assignment operators, are overloaded, so any or all of
382their operands can be an active variable. An {\sf adouble x} occurring
383in a comparison operator is effectively replaced by its real value
384{\sf x.value()}. Most functions contained in the ANSI C standard for
385the math library are overloaded for active arguments. The only
386exceptions are the non-differentiable functions {\sf fmod} and
387{\sf modf}. Otherwise, legitimate C code in active sections can remain
388completely unchanged, provided the direct output of active variables
is avoided. The rest of this subsection may be skipped by first-time
users who are not worried about marginal issues of differentiability
391and efficiency.
392
393The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
394properly differentiable at the origin, which raises the question of
395how this exception ought to be handled. Fortunately, one can easily
396see that {\sf fabs(x)} and all its compositions with smooth
397functions are still directionally differentiable. These
398directional derivatives of arbitrary order can be propagated in the
399forward mode without any ambiguity. In other words, the forward mode as
implemented in ADOL-C computes G\^ateaux derivatives
401in certain directions, which reduce to Fr\'echet derivatives only
402if the dependence on the direction is linear. Otherwise,
403the directional derivatives are merely positively homogeneous with
404respect to the scaling of the directions.
405For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
406the origin somewhat arbitrarily to zero.
407
408We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
409arguments, so that function and derivative values are obtained consistent
410with those of {\sf fabs} according to the identities
\[
 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
 \max(a,b) = [a+b+|a-b|]/2 \quad .
\]
415These relations cannot hold if either $a$ or $b$ is infinite, in which
416case {\sf fmin} or {\sf fmax} and their derivatives may still be well
417defined. It should be noted that the directional differentiation of
{\sf fmin} and {\sf fmax} yields, at ties $a=b$, results different from
those of the corresponding assignment based on the sign of $a-b$. For example,
420the statement
421\begin{center}
422 {\sf if (a $<$ b) c = a; else c = b;}
423\end{center} 
424yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
425the incorrect directional derivative value
426{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
427{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
428should be avoided by use of the function $\sf fmin(a,b)$. There
are also versions of {\sf fmin} and {\sf fmax} for two passive
arguments, and mixed passive/active arguments are handled by
implicit conversion.
432On the function class obtained by composing the modulus with real
433analytic functions, the concept of directional differentiation can be
434extended to the propagation of unique one-sided Taylor expansions.
The branches taken by {\sf fabs}, {\sf fmin}, and {\sf fmax} are recorded
on the tape.
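
For instance, a tie-consistent minimum of two active values can be written as
in the following sketch (names are ours):
\begin{verbatim}
#include <adolc/adouble.h>

// Sketch: prefer fmin over an explicit branch, so that directional
// derivatives at ties a == b agree with the identity based on fabs.
adouble safe_min(const adouble &a, const adouble &b) {
    // if (a < b) c = a; else c = b;   // would give c' = b' at ties
    return fmin(a, b);                 // gives the consistent c' = a' there
}
\end{verbatim}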
437
438The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
439functions have infinite slopes at the boundary points of their domains.
440At these marginal points the derivatives are set by ADOL-C to
441either {\sf $\pm$InfVal}, 0
442or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
443parameters, see \autoref{Customizing}.
444On IEEE machines {\sf InfVal} can be set to the special value
445{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
446For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ 
447of {\sf b}~=~{\sf sqrt(a)} is set to
\[
{\sf b}^\prime = \left\{
\begin{array}{ll}
\sf InfVal & \mbox{if}\;\; {\sf a}^\prime > 0 \\
0 & \mbox{if}\;\; {\sf a}^\prime = 0 \\
\sf NoNum & \mbox{if}\;\; {\sf a}^\prime < 0
\end{array} \right. \enspace .
\]
456In other words, we consider {\sf a} and
457consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
458all computed Taylor coefficients are zero.
459
460The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
461it is defined for the corresponding {\sf double} arguments. If {\sf x} is
462negative, however, the partial derivative with respect to an integral exponent
463is set to zero.
464%Similarly, the partial of {\bf pow} with respect to both arguments
465%is set to zero at the origin, where both arguments vanish.     
466The derivatives of the step functions
467{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
468arguments {\sf x}. The result values of the step functions
469are recorded on the tape and can later be checked to recognize
470whether a step to another level was taken during a forward sweep
471at different arguments than at taping time.
472
473Some C implementations supply other special
474functions, in particular the error function {\sf erf(x)}. For the
latter, we have included an {\sf adouble} version in \verb=adouble.cpp=, which
476has been commented out for systems on which the {\sf double} valued version
477is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
478postfix) are available for {\sf adouble}s.
479%
480% XXX: Vector and matrix class have to be reimplemented !!!
481%
482% and also the
483%active subscripts described in the \autoref{act_subscr}.
484Ambiguous statements like {\sf a += a++;} must be
485avoided because the compiler may sequence the evaluation of the
486overloaded
487expression differently from the original in terms of {\sf double}s.
488
489As we have indicated above, all subroutines called with active arguments
490must be modified or suitably overloaded. The simplest procedure is
491to declare the local variables of the function as active so that
492their internal calculations are also recorded on the tape.
493Unfortunately, this approach is likely to be unnecessarily inefficient
494and inaccurate if the original subroutine evaluates a special function
495that is defined as the solution of a particular mathematical problem.
496The most important examples are implicit functions, quadratures,
497and solutions of ordinary differential equations. Often
498the numerical methods for evaluating such special functions are
499elaborate, and their internal workings are not at all differentiable in
500the data. Rather than differentiating through such an adaptive
501procedure, one can obtain first and higher derivatives directly from
502the mathematical definition of the special function. Currently this
503direct approach has been implemented only for user-supplied quadratures
504as described in \autoref{quadrat}.
505%
506%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
507\subsection{Reusing the Tape for Arbitrary Input Values}
508\label{reuse_tape}
509%
510In some situations it may be desirable to calculate the value and
511derivatives of a function at arbitrary arguments by using a tape of
512the function evaluation at one argument and reevaluating the
513function  and its derivatives using the given ADOL-C
514routines. This approach can
significantly reduce run times, and it
also allows one to port problem functions, in the form of the
517corresponding tape files, into a computing environment that
518does not support C++ but does support C or Fortran. 
519Therefore, the routines provided by ADOL-C for the evaluation of derivatives
can be used at arguments $x$ other than the
521point at which the tape was generated, provided there are
522no user defined quadratures and all comparisons involving
523{\sf adouble}s yield the same result. The last condition
524implies that the control flow is unaltered by the change
of the independent variable values. This sufficient
condition is tested by ADOL-C, and if it is not met,
527the ADOL-C routine called for derivative calculations indicates this
528contingency through its return value. Currently, there are six return values,
529see \autoref{retvalues}.
530\begin{table}[h]
531\center\small
532\begin{tabular}{|r|l|}\hline
533 +3 &
534\begin{minipage}{12.5cm}
535\vspace*{1ex}
536The function is locally analytic.
537\vspace*{1ex}
538\end{minipage} \\ \hline
539 +2 &
540\begin{minipage}{12.5cm}
541\vspace*{1ex}
The function is locally analytic but the sparsity
structure (compared to the situation at the taping point)
may have changed, e.g., while {\sf fmax(a,b)} returned {\sf a}
at the taping arguments, it returns {\sf b} at
the argument currently used.
547\vspace*{1ex}
548\end{minipage} \\ \hline
549 +1 &
550\begin{minipage}{12.5cm}
551\vspace*{1ex}
552At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
553is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
554Lipschitz-continuous but possibly non-differentiable.
555\vspace*{1ex}
556\end{minipage} \\ \hline
557 0 &
558\begin{minipage}{12.5cm}
559\vspace*{1ex}
560Some arithmetic comparison involving {\sf adouble}s yields a tie.
561Hence, the function to be differentiated  may be discontinuous.
562\vspace*{1ex}
563\end{minipage} \\ \hline
564 -1 &
565\begin{minipage}{12.5cm}
566\vspace*{1ex}
567An {\sf adouble} comparison yields different results
568from the evaluation point at which the tape was generated.
569\vspace*{1ex}
570\end{minipage} \\ \hline
571 -2 &
572\begin{minipage}{12.5cm}
573\vspace*{1ex}
574The argument of a user-defined quadrature has changed
575from the evaluation point at which the tape was generated.
576\vspace*{1ex}
577\end{minipage} \\ \hline
578\end{tabular}
579\caption{Description of return values}
580\label{retvalues}
581\end{table}                           
582
583\begin{figure}[h]
584\centering\includegraphics[width=10.0cm]{tap_point}
585\caption{Return values around the taping point}
586\label{fi:tap_point}
587\end{figure}         
588
589In \autoref{fi:tap_point} these return values are illustrated.
If the user finds the return value of an ADOL-C routine to be negative, the
taping process simply has to be repeated by executing the active section again.
592The crux of the problem lies in the fact that the tape records only
593the operations that are executed during one particular evaluation of the
594function.
595It also has no way to evaluate integrals since the corresponding
596quadratures are never recorded on the tape.
Therefore, when there are user-defined quadratures, retaping is necessary at each
598new point. If there are only branches conditioned on {\sf adouble}
599comparisons one may hope that re-taping becomes unnecessary when
600the points settle down in some small neighborhood, as one would
601expect for example in an iterative equation solver.
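
In practice one therefore checks the return value of the derivative driver and
re-tapes only when necessary; a sketch (with a hypothetical user-supplied
routine {\sf retape} that simply executes the active section again) could look
like this:
\begin{verbatim}
#include <adolc/adolc.h>

// Hypothetical: gradient at a new argument x, re-taping only if the
// recorded control flow is no longer valid for this argument.
void safe_gradient(short tag, int n, double *x, double *g,
                   void (*retape)(short tag, int n, const double *x)) {
    int rc = gradient(tag, n, x, g);  // negative: tape invalid at x
    if (rc < 0) {
        retape(tag, n, x);            // re-execute the active section
        gradient(tag, n, x, g);
    }
}
\end{verbatim}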
602%
603%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
604\subsection{Conditional Assignments}
605\label{condassign}
606%
607It appears unsatisfactory that, for example, a simple table lookup
608of some physical property forces the re-recording of a possibly
609much larger calculation. However, the basic philosophy of ADOL-C
610is to overload arithmetic, rather than to generate a new program
611with jumps between ``instructions'', which would destroy the
612strictly sequential tape access and
613require the infusion of substantial compiler technology.
614Therefore, we introduce the two constructs of conditional
615assignments and active integers as partial remedies to the
616branching problem.
617
618In many cases, the functionality of branches
619can be replaced by conditional assignments. 
620For this purpose, we provide a special function called
621{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
622syntax of the conditional assignment
623\begin{center}
624    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} 
625\end{center}
626which C++ inherited from C. However, here the arguments are restricted to be
627active or passive scalar arguments, and all expression arguments
are evaluated before the test on {\sf b}, which is different from
the usual conditional assignment and from the code segment shown below.
630
631Suppose the original program contains the code segment
632\begin{center}
633{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\
634\end{center}
635Here, only one of the expressions (or, more generally, program blocks)
636{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
637for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
638may first execute both branches and then pick either {\sf c}
639or {\sf d} using
640{\sf condassign(a,b,c,d)}. To maintain
641consistency with the original code, one has to make sure
642that the two branches do not have any side effects that can
643interfere with each other or may be important for subsequent
644calculations. Furthermore the test parameter {\sf b} has to be an
645{\sf adouble} or an {\sf adouble} expression. Otherwise the
646test condition {\sf b} is recorded on the tape as a {\em constant} with its
647run time value. Thus the original dependency of {\sf b} on
648active variables gets lost, for instance if {\sf b} is a comparison
649expression, see \autoref{OverOper}.
650If there is no {\sf else} part in a conditional assignment, one may call
651the three argument version
652{\sf condassign(a,b,c)}, which
653is logically equivalent to {\sf condassign(a,b,c,a)} in that
654nothing happens if {\sf b} is non-positive. 
655The header file \verb=<adolc/adouble.h>=
656contains also corresponding definitions of
657{\sf condassign(a,b,c,d)} 
658and {\sf condassign(a,b,c)} for
659passive {\sf double} arguments so that the modified code
660without any differentiation can be tested
661for correctness.
662
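
A minimal sketch contrasting the two formulations (our own variable names):
\begin{verbatim}
#include <adolc/adouble.h>

// Sketch: branch on an active value expressed as a conditional assignment.
void branch_demo(const adouble &b, adouble &a) {
    adouble c = cos(b);      // both candidate values are evaluated ...
    adouble d = sin(b);
    // original code:  if (b > 0) a = c; else a = d;
    condassign(a, b, c, d);  // ... then one of them is selected at run time
}
\end{verbatim}
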
663A generalization of this concept for more than two branches, e.g.,
664akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling
665\textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on
666elements of an \texttt{advector} with active indices.
667%
668%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
669\subsection{Step-by-Step Modification Procedure}
670%
671To prepare a section of given C or C++ code for automatic
672differentiation as described above, one applies the following step-by-step procedure.
673\begin{enumerate}
674\item
675Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
676and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
677beginning and end of the active section.
678\item 
679Select the set of active variables, and change their type from
680{\sf double} or {\sf float} to {\sf adouble}.
681\item
682Select a sequence of independent variables, and initialize them with
683\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
684\item
685Select a sequence of dependent variables among the active variables,
686and pass their final values to passive variable or vectors thereof
687by \boldmath $\gg=$ \unboldmath assignments.
688\item 
689Compile the codes after including the header file \verb=<adolc/adouble.h>=.
690\end{enumerate}
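
Putting these steps together for a simple example of our own,
$y = x_0 x_1 + \sin(x_0)$, a complete sketch might read:
\begin{verbatim}
#include <adolc/adolc.h>
#include <cstdio>

int main() {
    double px[2] = {1.0, 2.0}, py, g[2];

    trace_on(0);                    // step 1: begin the active section
    adouble x0, x1, y;              // step 2: active variables
    x0 <<= px[0];                   // step 3: independents
    x1 <<= px[1];
    y = x0 * x1 + sin(x0);          // computational statements unchanged
    y >>= py;                       // step 4: dependent
    trace_off();

    gradient(0, 2, px, g);          // evaluate the gradient from the tape
    printf("f = %f, df/dx0 = %f, df/dx1 = %f\n", py, g[0], g[1]);
    return 0;
}
\end{verbatim}
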
691Typically, the first compilation will detect several type conflicts
692-- usually attempts to convert from active to passive
693variables or to perform standard I/O of active variables.
694Since all standard
695C programs can be activated by a mechanical application of the
696procedure above, the following section is of importance
697only to advanced users.
698%                                                                 
699%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
700\section{Numbering the Tapes and Controlling the Buffer}
701\label{tape}
702%
703The trace generated by the execution of an active section may stay
704within a triplet of internal arrays or it may be written out
to three corresponding files. We will refer to such a triplet as the
tape array or tape file (in general, the tape), which may subsequently be
used to evaluate the
708underlying function and its derivatives at the original point or at
709alternative arguments. If the active section involves user-defined
710quadratures it must be executed and
711re-taped at each new argument. Similarly, if conditions on
712{\sf adouble} values lead to a different program branch being taken at
713a new argument the evaluation process also needs to be re-taped at the
714new point. Otherwise, direct evaluation from
715the tape by the routine {\sf function} (\autoref{optdrivers}) is
716likely to be
717faster. The use of quadratures and the results of all comparisons on
718{\sf adouble}s are recorded on the tape so that {\sf function} and other
719forward routines stop and  return appropriate flags if their use without
720prior re-taping is unsafe. To avoid any re-taping certain types of
721branches can be recorded on the tape through
722the use of conditional assignments 
723described before in \autoref{condassign}.
724
725Several tapes may be generated and kept simultaneously.
726A tape array is used as a triplet of buffers or a tape file is generated if
727the length of any of the buffers exceeds the maximal array lengths of
728{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
729defined in the header file \verb=<adolc/internal/usrparms.h>=
and may be adjusted by the user in the header file before compiling
the ADOL-C library, or at run time using a file named \verb=.adolcrc=.
732Lines in this file must have the form
\begin{verbatim}
"VARIABLE" = "VALUE"
\end{verbatim}
736where the quotation marks are mandatory.
The filesystem folder to which the tape files may be written can be
changed by redefining {\sf TAPE\_DIR} in
the header file \verb=<adolc/dvlparms.h>= before
compiling the ADOL-C library, or at run time by defining {\sf
  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is
the present working directory (\verb=.=).
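
For example, an \verb=.adolcrc= file along the following lines (the particular
values are only illustrative) enlarges the operations buffer and redirects the
tape files:
\begin{verbatim}
"OBUFSIZE" = "1048576"
"TAPE_DIR" = "./adolc_tapes"
\end{verbatim}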
743
744For simple usage, {\sf trace\_on} may be called with only the tape
745{\sf tag} as argument, and {\sf trace\_off} may be called
746without argument. The optional integer argument {\sf keep} of
747{\sf trace\_on} determines whether the numerical values of all
748active variables are recorded in a buffered temporary array or file
749called the taylor stack.
750This option takes effect if
751{\sf keep} = 1 and prepares the scene for an immediately following
752gradient evaluation by a call to a routine implementing the reverse mode
753as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A
754file is used instead of an array if the size exceeds the maximal array
length of {\sf TBUFSIZE} defined in \verb=<adolc/internal/usrparms.h>= and may
be adjusted in the same way as the other buffer sizes mentioned above.
757Alternatively, gradients may be evaluated by a call
758to {\sf gradient}, which includes a preparatory forward sweep
759for the creation of the temporary file. If omitted, the argument
760{\sf  keep} defaults to 0, so that no temporary
761taylor stack file is generated.
762
763By setting the optional integer argument {\sf file} of
764{\sf  trace\_off} to 1, the user may force a numbered  tape
765file to be written even if the tape array (buffer) does not overflow.
766If the argument {\sf file} is omitted, it
767defaults to 0, so that the tape array is written onto a tape file only
768if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
769
770After the execution of an active section, if a tape file was generated, i.e.,
771if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
772argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
773saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
774the current working directory) under filenames formed by
775the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
776  ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf
777  ADOLC\_TAYLORS\_NAME} defined in
778the header file \verb=<adolc/dvlparms.h>= appended with the number
779given as the {\sf tag} argument to {\sf trace\_on} and have the
780extension {\sf .tap}.
781
782 Later, all problem-independent routines
783like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
784expect as first argument a {\sf tag} to determine
785the tape on which their respective computational task is to be performed.
786By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
787several tapes for various function evaluations and subsequently perform
788function and derivative evaluations on one or more of them.
789
790For example, suppose one wishes to calculate for two smooth functions
791$f_1(x)$ and $f_2(x)$ 
\[
   f(x) = \max \{f_1(x), f_2(x)\}, \qquad \nabla f(x),
\]
795and possibly higher derivatives where the two functions do not tie.
796Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
797one can generate two different tapes by calling {\sf trace\_on} with
798{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
799sections.
800Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
801current argument and then evaluate the gradient $\nabla f(x)$ by calling
802{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
803{\sf tag} = 2.
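
Assuming $f_1$ and $f_2$ have already been taped on tags 1 and 2, a sketch of
this two-tape strategy is:
\begin{verbatim}
#include <adolc/adolc.h>

// Hypothetical: gradient of f(x) = max{f1(x), f2(x)} away from ties,
// with f1 and f2 previously taped on tags 1 and 2.
void max_gradient(int n, double *x, double *g) {
    double y1, y2;
    function(1, 1, n, x, &y1);       // evaluate f1 from tape 1
    function(2, 1, n, x, &y2);       // evaluate f2 from tape 2
    short tag = (y1 >= y2) ? 1 : 2;  // which function attains the max at x
    gradient(tag, n, x, g);          // gradient of the active branch
}
\end{verbatim}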
804%
805%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
806\subsection{Examining the Tape and Predicting Storage Requirements }
807\label{examiningTape}
808%
809At any point in the program, one may call the routine
810\begin{center}
811{\sf void tapestats(unsigned short tag, size\_t* counts)}
812\end{center}
with {\sf counts} being an array of at least eleven integers.
814The first argument {\sf tag} specifies the particular tape of
815interest. The components of {\sf counts} represent
\[
\begin{tabular}{ll}
{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath, \\
{\sf counts[2]}: & the maximal number of live active variables, \\
{\sf counts[3]}: & the size of the taylor stack (number of overwrites), \\
{\sf counts[4]}: & the buffer size (a multiple of eight), \\
{\sf counts[5]}: & the total number of operations recorded, \\
{\sf counts[6-13]}: & other internal information about the tape.
\end{tabular}
\]
831The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} 
832determine the temporary
833storage requirements during calls to the routines
834implementing the forward and the reverse mode.
For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
forward mode involves, apart from the tape buffers, an array of
$(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
838core and, in addition, a sequential data set called the value stack
839of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
840option {\sf keep} $>$ 0. Here
841the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage
842requirement for the sequential data set, which stays in core if
843its length is less than {\sf TBUFSIZE} bytes and is otherwise written
844out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/internal/usrparms.h>=.
845The drawback of the economical
846{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
847yield gradients and other adjoint vectors only in single-precision
848accuracy. This may be acceptable if the adjoint vectors
849represent rows of a Jacobian that is  used for the calculation of
850Newton steps. In its scalar version, the reverse mode implementation involves
851the same number of {\sf double}s and twice as many {\sf revreal}s as the
852forward mode implementation.
853The storage requirements of the vector versions of the forward mode and
854reverse mode implementation are equal to that of the scalar versions multiplied by
855the vector length.
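
A short usage sketch of {\sf tapestats} (the array below is sized generously):
\begin{verbatim}
#include <adolc/adolc.h>
#include <cstdio>

void print_tape_stats(unsigned short tag) {
    size_t counts[32];   // generously sized; at least the first entries are filled
    tapestats(tag, counts);
    printf("independents      : %zu\n", counts[0]);
    printf("dependents        : %zu\n", counts[1]);
    printf("max live adoubles : %zu\n", counts[2]);
    printf("taylor stack size : %zu\n", counts[3]);
    printf("buffer size       : %zu\n", counts[4]);
    printf("operations        : %zu\n", counts[5]);
}
\end{verbatim}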
856%
857%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
858\subsection{Customizing ADOL-C}
859\label{Customizing}
860%
861Based on the information provided by the routine {\sf tapestats}, the user may alter the
862following types and constant dimensions in the header file \verb=<adolc/internal/usrparms.h>=
863to suit his problem and environment.
864
865\begin{description}
\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integers determine the lengths of the
in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all
868required data, any file access is avoided unless {\sf trace\_off}
869is called with a positive argument. This desirable situation can
870be achieved for many problem functions with an execution trace of moderate
size. Primarily these values occur as arguments
to {\sf malloc}, so that setting them unnecessarily large may have no
ill effects, unless the operating system prohibits or penalizes large
array allocations. It is, however, recommended to leave the values in
875\texttt{<adolc/internal/usrparms.h>} unchanged and set them using the
876\texttt{.adolcrc} file in the current working directory at runtime.
877
878\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
879in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288).
880
881\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
882
883\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
884
885\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
886
887\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
888sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
889of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
890the default setting produces the standard {\sf Inf}. On non-IEEE machines
891change these values to produce a small {\sf InfVal} value and compare
892the results of two forward sweeps with different {\sf InfVal} settings
893to detect a ``vertical'' slope.
894
895\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
896
897\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
898sets the mathematically
899undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
900of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
901the default setting produces the standard {\sf NaN}. On non-IEEE machines
902change these values to produce a small {\sf NoNum} value and compare
903the results of two forward sweeps with different {\sf NoNum} settings
904to detect the occurrence of undefined derivative values.
905
906\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
907
908\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
909
910\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
911\end{description}
912
913The following types and options may be set using the command-line
914options of the \texttt{./configure} script.
915
916\begin{description}
917\item[{\sf locint}{\rm :}] The range of the integer type
918{\sf locint} determines how many {\sf adouble}s can be simultaneously
919alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
920alive at any one time, the type {\sf locint} must be changed to
921 {\sf unsigned long}. This can be done by passing
922 \texttt{--enable-ulong} to \texttt{./configure}.
923
924\item[{\sf revreal}{\rm :}] The choice of this floating-point type
925trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions
926and their derivatives are always evaluated in double precision
927during forward sweeps, gradients and other adjoint vectors are obtained
928with the precision determined by the type {\sf revreal}. The less
929accurate choice {\sf revreal} = {\sf float} nearly halves the
930storage requirement during reverse sweeps. This can be done by passing
931\texttt{--disable-double} to \texttt{./configure}.
932
933\item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse
934  hyperbolic functions and the error function are enabled (default:
935  undefined) by passing \texttt{--enable-atrig-erf}
936  to \texttt{./configure}
937
938\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
939  used by ADOL-C. {\sf Malloc} will be used if this variable is
  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
  results caused by uninitialized memory. It can be set undefined by
942  passing \texttt{--disable-use-calloc} to \texttt{./configure}.
943
944\item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines
945  required for automatic branch selection (default: disabled). The
946  boolean valued comparison operators with two \texttt{adouble} type
947  arguments will not return boolean values anymore and may not be used
948  in branch control statements (\texttt{if}, \texttt{while}, \texttt{for}
949  etc.). Instead conditional assignments using \texttt{condassign} or
950  selection operations on elements of \texttt{advector} type should be
951  used. Enabling this option and rewriting the function evaluation
952  using \texttt{condassign} or selections of \texttt{advector}
953  elements will prevent the need for retracing the function at branch
954  switches. This can be enabled by passing
955  \texttt{--enable-advanced-branching} to \texttt{./configure}.
956\end{description}
957%
958%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
959\subsection{Warnings and Suggestions for Improved Efficiency}
960\label{WarSug}
961%
962Since the type {\sf adouble} has a nontrivial constructor,
963the mere declaration of large {\sf adouble} arrays may take up
964considerable run time. The user should be warned against
965the usual Fortran practice of declaring fixed-size arrays
966that can accommodate the largest possible case of an evaluation program
967with variable dimensions. If such programs are converted to or written
968in C, the overloading in combination with ADOL-C will lead to very
969large run time increases for comparatively small values of the
970problem dimension, because the actual computation is completely
971dominated by the construction of the large {\sf adouble} arrays.
972The user is advised to
973create dynamic arrays of
974{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
975using {\sf delete}. For storage efficiency it is desirable that
976dynamic objects are created and destroyed in a last-in-first-out
977fashion.
978
979Whenever an {\sf adouble} is declared, the constructor for the type
980{\sf adouble} assigns it a nominal address, which we will refer to as
981its  {\em location}.  The location is of the type {\sf locint} defined
982in the header file \verb=<adolc/internal/usrparms.h>=. Active vectors occupy
983a range of contiguous locations. As long as the program execution
984never involves more than 65$\,$536 active variables, the type {\sf locint}
985may be defined as {\sf unsigned short}. Otherwise, the range may be
986extended by defining {\sf locint} as {\sf (unsigned) int} or
987{\sf (unsigned) long}, which may nearly double
988the overall mass storage requirement. Sometimes one can avoid exceeding
989the accessible range of {\sf unsigned short}s by using more local variables and deleting
990{\sf adouble}s  created by the new operator in a
991last-in-first-out
992fashion.  When memory for {\sf adouble}s is requested through a call to
993{\sf malloc()} or other related C memory-allocating
994functions, the storage for these {\sf adouble}s is allocated; however, the
995C++ {\sf adouble} constructor is never called.  The newly defined
996{\sf adouble}s are never assigned a location and are not counted in
997the stack of live variables. Thus, any results depending upon these
998pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
999  malloc() and related C memory-allocating
1000functions when declaring adoubles (see the following paragraph).}
1001%
1002% XXX: Vector and matrix class have to be reimplemented !!!
1003%
1004%The same point applies, of course,
1005% for active vectors.
1006
1007When an {\sf adouble}
1008%
1009% XXX: Vector and matrix class have to be reimplemented !!!
1010%
1011% or {\bf adoublev}
1012goes out of
1013scope or is explicitly deleted, the destructor notices that its
1014location(s) may be
1015freed for subsequent (nominal) reallocation. In general, this is not done
1016immediately but is delayed until the locations to be deallocated form a
1017contiguous tail of all locations currently being used. 
1018
1019 As a consequence of this allocation scheme, the currently
1020alive {\sf adouble} locations always form a contiguous range of integers
1021that grows and shrinks like a stack. Newly declared {\sf adouble}s are
1022placed on the top so that vectors of {\sf adouble}s obtain a contiguous
1023range of locations. While the C++ compiler can be expected to construct
1024and destruct automatic variables in a last-in-first-out fashion, the
1025user may upset this desirable pattern by deleting free-store {\sf adouble}s
1026too early or too late. Then the {\sf adouble} stack may grow
1027unnecessarily, but the numerical results will still be
1028correct, unless an exception occurs because the range of {\sf locint}
1029is exceeded. In general, free-store {\sf adouble}s
1030%
1031% XXX: Vector and matrix class have to be reimplemented !!!
1032%
1033%and {\bf adoublev}s
1034should be deleted in a last-in-first-out fashion toward the end of
1035the program block in which they were created.
1036When this pattern is maintained, the maximum number of
1037{\sf adouble}s alive and, as a consequence, the
1038randomly accessed storage space
1039of the derivative evaluation routines is bounded by a
1040small multiple of the memory used in the relevant section of the
original program. Failure to delete dynamically allocated {\sf adouble}s
may cause the maximal number of {\sf adouble}s alive at one time to be exceeded
if the same active section is called repeatedly. The same effect
1044occurs if static {\sf adouble}s are used.
1045
1046To avoid the storage and manipulation of structurally
1047trivial derivative values, one should pay careful attention to
1048the naming of variables. Ideally, the intermediate
1049values generated during the evaluation of a vector function
1050should be assigned to program variables that are
1051consistently either active or passive, in that all their values
1052either are or are not dependent on the independent variables
1053in a nontrivial way. For example, this rule is violated if a temporary
1054variable is successively used to accumulate inner products involving
1055first only passive and later active arrays. Then the first inner
1056product and all its successors in the data dependency graph become
1057artificially active and the derivative evaluation routines
1058described later will waste
1059time allocating and propagating
1060trivial or useless derivatives. Sometimes even values that do
1061depend on the independent variables may be of only transitory
1062importance and may not affect the dependent variables. For example,
1063this is true for multipliers that are used to scale linear
1064equations, but whose values do not influence the dependent
1065variables in a mathematical sense. Such dead-end variables
1066can be deactivated by the use of the {\sf value} function, which
1067converts {\sf adouble}s to {\sf double}s. The deleterious effects
1068of unnecessary activity are partly alleviated by run time
1069activity flags in the derivative routine
1070{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1071
The {\sf adouble} default constructor sets the associated value to zero.
This implies a certain overhead that may seem unnecessary when no initial value
is actually given; however,
the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior.
1076An array instantiation such as
1077\begin{center}
1078\sf double x[3]=\{2.0\};
1079\end{center}
1080will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements
1081{\sf x[1]} and {\sf x[2]}  to {\sf 0.0}. According to the C++ standard the array element  construction of
1082the type changed instantiation
1083\begin{center}
1084\sf adouble x[3]=\{2.0\};
1085\end{center}
will use the constructor {\sf adouble(const double\&);} for {\sf x[0]}, passing in {\sf 2.0}, but
will call the {\sf adouble} default constructor for {\sf x[1]} and {\sf x[2]}, leaving these array
elements uninitialized {\em unless} the default constructor does implement the initialization to
zero.
1090The C++ constructor syntax does not provide a means to  distinguish this implicit initialization from the declaration of any simple uninitialized variable.
If the user can ascertain the absence of array instantiations such as the above, then one can
configure ADOL-C with the \verb=--disable-stdczero= option, see \autoref{genlib}, to
avoid the overhead of these initializations.
1094 
1095%
1096%
1097%
1098%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1099\section{Easy-To-Use Drivers}
1100\label{drivers}
1101%
1102For the convenience of the user, ADOL-C provides several
1103easy-to-use drivers that compute the most frequently required
1104derivative objects. Throughout, we assume that after the execution of an
1105active section, the corresponding tape with the identifier {\sf tag}
1106contains a detailed record of the computational process by which the
1107final values $y$ of the dependent variables were obtained from the
1108values $x$ of the independent variables. We will denote this functional
1109relation between the input variables $x$ and the output variables $y$ by
1110\[
1111F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
1112\]
The return value of all drivers presented in this section
indicates the validity of the tape as explained in \autoref{reuse_tape}.
1115The presented drivers are all C functions and therefore can be used within
1116C and C++ programs. Some Fortran-callable companions can be found
1117in the appropriate header files.
1118%
1119%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1120\subsection{Drivers for Optimization and Nonlinear Equations}
1121%
1122\label{optdrivers}
1123%
1124The drivers provided for solving optimization problems and nonlinear
1125equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1126which is included automatically by the global header file \verb=<adolc/adolc.h>=
1127(see \autoref{ssec:DesIH}).
1128
The routine {\sf function} allows the user to evaluate the desired function from
the tape instead of executing the corresponding source code:
1131%
1132\begin{tabbing}
1133\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1134\>{\sf int function(tag,m,n,x,y)}\\
1135\>{\sf short int tag;}         \> // tape identification \\
1136\>{\sf int m;}                 \> // number of dependent variables $m$\\
1137\>{\sf int n;}                 \> // number of independent variables $n$\\
1138\>{\sf double x[n];}           \> // independent vector $x$ \\
1139\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1140\end{tabbing}
1141%
If the original evaluation program is available, this double version
1143should be used to compute the function value in order to avoid the
1144interpretative overhead. 
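A typical call, assuming that a scalar function of two variables has already
been recorded on the tape with identifier 1, might read:
\begin{verbatim}
double x[2] = {1.0, 2.0};           // argument
double y[1];                        // function value F(x)
int rc = function(1, 1, 2, x, y);   // tape 1, m = 1, n = 2
// rc indicates the validity of the tape (see the section on reusing tapes)
\end{verbatim}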
1145
1146For the calculation of whole derivative vectors and matrices up to order
11472 there are the following procedures:
1148%
1149\begin{tabbing}
1150\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1151\>{\sf int gradient(tag,n,x,g)}\\
1152\>{\sf short int tag;}         \> // tape identification \\
1153\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1154\>{\sf double x[n];}           \> // independent vector $x$ \\
1155\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1156\end{tabbing}
1157%
1158\begin{tabbing}
1159\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1160\>{\sf int jacobian(tag,m,n,x,J)}\\
1161\>{\sf short int tag;}         \> // tape identification \\
1162\>{\sf int m;}                 \> // number of dependent variables $m$\\
1163\>{\sf int n;}                 \> // number of independent variables $n$\\
1164\>{\sf double x[n];}           \> // independent vector $x$ \\
1165\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1166\end{tabbing}
1167%
1168\begin{tabbing}
1169\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1170\>{\sf int hessian(tag,n,x,H)}\\
1171\>{\sf short int tag;}         \> // tape identification \\
1172\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1173\>{\sf double x[n];}           \> // independent vector $x$ \\
1174\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1175\end{tabbing}
1176%
1177The driver routine {\sf hessian} computes only the lower half of
1178$\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ 
1179of {\sf H} allocated as a square array remain untouched during the call
1180of {\sf hessian}. Hence only $i+1$ {\sf double}s  need to be
1181allocated starting at the position {\sf H[i]}.
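For illustration, assuming a scalar function of three variables has been
recorded on tape 1, these drivers may be called as follows; note that only the
lower triangle of {\sf H} needs to be allocated:
\begin{verbatim}
int n = 3;
double x[3] = {1.0, 2.0, 3.0};
double g[3];                        // gradient of F at x
double **H = new double*[n];        // lower triangle of the Hessian
for (int i = 0; i < n; i++)
    H[i] = new double[i+1];         // only i+1 entries per row are needed

gradient(1, n, x, g);               // g = grad F(x), tape 1, m = 1
hessian(1, n, x, H);                // H[i][j] for j <= i holds the lower half
\end{verbatim}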
1182
1183To use the full capability of automatic differentiation when the
1184product of derivatives with certain weight vectors or directions are needed, ADOL-C offers
1185the following four drivers: 
1186%
1187\begin{tabbing}
1188\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1189\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1190\>{\sf short int tag;}         \> // tape identification \\
1191\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1192\>{\sf int n;}                 \> // number of independent variables $n$\\
1193\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1194\>{\sf double x[n];}           \> // independent vector $x$ \\
1195\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1196\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1197\end{tabbing}
A nonzero value of the parameter {\sf repeat} indicates that the
routine {\sf vec\_jac} has been called at the same argument immediately
before. In this case, the internal forward mode evaluation is skipped and only
the reverse mode evaluation with the corresponding arguments is executed,
resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1203%
1204\begin{tabbing}
1205\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1206\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1207\>{\sf short int tag;}         \> // tape identification \\
1208\>{\sf int m;}                 \> // number of dependent variables $m$\\
1209\>{\sf int n;}                 \> // number of independent variables $n$\\
1210\>{\sf double x[n];}           \> // independent vector $x$\\
1211\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1212\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1213\end{tabbing}
1214%
1215\begin{tabbing}
1216\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1217\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1218\>{\sf short int tag;}         \> // tape identification \\
1219\>{\sf int n;}                 \> // number of independent variables $n$\\
1220\>{\sf double x[n];}           \> // independent vector $x$\\
1221\>{\sf double v[n];}           \> // tangent vector $v$\\
1222\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1223\end{tabbing}
1224%
1225\begin{tabbing}
1226\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1227\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1228\>{\sf short int tag;}         \> // tape identification \\
1229\>{\sf int n;}                 \> // number of independent variables $n$\\
1230\>{\sf int p;}                 \> // number of columns in $V$\\
1231\>{\sf double x[n];}           \> // independent vector $x$\\
1232\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1233\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1234\end{tabbing}
1235%
1236\begin{tabbing}
1237\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1238\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1239\>{\sf short int tag;}         \> // tape identification \\
1240\>{\sf int m;}                 \> // number of dependent variables $m$\\
1241\>{\sf int n;}                 \> // number of independent variables $n$\\
1242\>{\sf double x[n];}           \> // independent vector $x$\\
1243\>{\sf double v[n];}           \> // tangent vector $v$\\
1244\>{\sf double u[m];}           \> // range weight vector $u$ \\
1245\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1246\end{tabbing}
1247%
The next procedure allows the user to perform Newton steps having
only the corresponding tape at hand:
1250%
1251\begin{tabbing}
1252\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1253\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1254\>{\sf short int tag;}         \> // tape identification \\
1255\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf double b[n];}           \> // in: right-hand side $b$, out: result $w$ of $F^\prime(x)w = b$\\
1259\>{\sf int mode;}              \> // option to choose different solvers
1260\end{tabbing}
1261%
1262On entry, parameter {\sf b} of the routine {\sf jac\_solv}
contains the right-hand side of the equation $F^\prime(x)w = b$ to be solved. On exit,
1264{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only
1265the Jacobian of the function
1266given by the tape labeled with {\sf tag} is provided internally.
1267The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
1268solution of the equation is calculated if {\sf mode} = 2.
1269Hence, it is possible to compute the
1270LU-factorization only once. Then the equation can be solved for several
1271right-hand sides $b$ without calculating the Jacobian and
1272its factorization again. 
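One plausible calling sequence, which assumes that a {\sf mode}~$=2$ call
reuses the factorization produced by a preceding {\sf mode}~$=1$ call at the
same point, is sketched below for a tape with identifier 1 and $n=2$:
\begin{verbatim}
int n = 2;
double x[2]  = {1.0, 2.0};
double b1[2] = {1.0, 0.0};       // first right-hand side
double b2[2] = {0.0, 1.0};       // second right-hand side

jac_solv(1, n, x, b1, 1);        // mode 1: compute and factorize F'(x)
jac_solv(1, n, x, b1, 2);        // mode 2: b1 now holds w with F'(x) w = b1
jac_solv(1, n, x, b2, 2);        // solve again, reusing the factorization
\end{verbatim}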
1273
1274If the original evaluation code of a function contains neither
1275quadratures nor branches, all drivers described above can be used to
1276evaluate derivatives at any argument in its domain. The same still
1277applies if there are no user defined quadratures and
1278all comparisons  involving {\sf adouble}s have the same result as
during taping. If this assumption is violated, all drivers that internally
call the forward mode evaluation will return the value -1 or -2
as already specified in \autoref{reuse_tape}.
1282
1283%
1284%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1285\subsection{Drivers for Ordinary Differential Equations}
1286\label{odedrivers}
1287%
1288When $F$ is the right-hand side of an (autonomous) ordinary
1289differential equation 
1290\[
1291x^\prime(t) \; = \; F(x(t)) , 
1292\] 
1293we must have $m=n$. Along any solution path $x(t)$ its Taylor
1294coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
1295the relation
1296\[
 x_{j+1} = \frac{1}{1+j}\, y_j
\]
with the $y_j$ being the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
1300\[
1301 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
1302\]
1303defined by an autonomous right-hand side $F$ recorded on the tape.
1304Using this relation, one can generate the Taylor coefficients $x_i$,
1305$i \le deg$,
1306recursively from the current point $x_0$. This task is achieved by the
1307driver routine {\sf forode} defined as follows:
1308%
1309\begin{tabbing}
1310\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1311\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1312\>{\sf short int tag;}         \> // tape identification \\
1313\>{\sf int n;}                 \> // number of state variables $n$\\
1314\>{\sf double tau;}            \> // scaling parameter\\
1315\>{\sf int dol;}               \> // degree on previous call\\
1316\>{\sf int deg;}               \> // degree on current call\\
1317\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1318\end{tabbing}
1319%
1320If {\sf dol} is positive, it is assumed that {\sf forode}
1321has been called before at the same point so that all Taylor coefficient
1322vectors up to the {\sf dol}-th are already correct.
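A typical calling sequence, sketched for an assumed autonomous right-hand side
with two state variables recorded on tape 1, reads:
\begin{verbatim}
int n = 2, deg = 5;
double tau = 1.0;                  // no rescaling of the time variable
double x0[2] = {1.0, 0.5};         // current state
double **X = new double*[n];
for (int i = 0; i < n; i++) {
    X[i] = new double[deg+1];
    X[i][0] = x0[i];               // zero-order Taylor coefficients
}

forode(1, n, tau, 0, 3, X);        // coefficients x_1,...,x_3 from scratch
forode(1, n, tau, 3, deg, X);      // reuse them and extend up to degree 5
\end{verbatim}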
1323
1324Subsequently one may call the driver routine {\sf reverse} or corresponding
1325low level routines as explained in the \autoref{forw_rev} and
1326\autoref{forw_rev_ad}, respectively, to compute
1327the family of square matrices {\sf Z[n][n][deg]} defined by
1328\[
1329Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
1330\]
1331with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}.
1332
1333For the numerical solutions of ordinary differential equations,
1334one may also wish to calculate the Jacobians
1335\begin{equation} 
1336\label{eq:bees}
1337B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1338\end{equation}
1339which exist provided $F$ is sufficiently smooth. These matrices can
1340be obtained from the partial derivatives $\partial y_i/\partial x_0$
1341by an appropriate version of the chain rule.
1342To compute the total derivatives $B = (B_j)_{0\leq j <d}$
1343defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
1344matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
1345corresponding evaluation of the {\sf hov\_reverse} function. The interface of
1346{\sf accode} is defined as follows:
1347%
1348\begin{tabbing}
1349\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1350\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1351\>{\sf int n;}                 \> // number of state variables $n$ \\
1352\>{\sf double tau;}            \> // scaling parameter\\
1353\>{\sf int deg;}               \> // degree on current call\\
1354\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1355\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1356\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1357\end{tabbing}
1358%
1359Sparsity information can be exploited by {\sf accode} using the array {\sf
1360nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1361reverse} or the corresponding basic routines as explained below in
1362\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1363non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1364return
1365\[
1366  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1367\]
1368In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
sparsity pattern that fills in as $k$ grows. Note that there need be no
1370loss in computational efficiency if a time-dependent ordinary differential equation
1371is rewritten in autonomous form.
1372
The prototypes of the ODE-drivers {\sf forode} and {\sf accode} are contained in the header file
1374\verb=<adolc/drivers/odedrivers.h>=. The global header file
1375\verb=<adolc/adolc.h>=
1376includes this file automatically, see \autoref{ssec:DesIH}.
1377
1378An example program using the procedures {\sf forode} and {\sf accode} together
1379with more detailed information about the coding can be found in
1380\autoref{exam:ode}. The corresponding source code
1381\verb=odexam.cpp= is contained in the subdirectory
1382\verb=examples=.
1383%
1384%
1385%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1386\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1387\label{sparse}
1388%
1389Quite often, the Jacobians and Hessians that have to be computed are sparse
1390matrices. Therefore, ADOL-C provides additionally drivers that
1391allow the exploitation of sparsity. The exploitation of sparsity is
1392frequently based on {\em graph coloring} methods, discussed
1393for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
rely on the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
1395ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
1396separately to use the sparse drivers described here. ColPack is available for download at
1397\verb=http://cscapes.cs.purdue.edu/coloringpage/software.htm=. More information about the required
1398installation of ColPack is given in \autoref{install}.
1399%
1400\subsubsection*{Sparse Jacobians and Sparse Hessians}
1401%
1402To compute the entries of sparse Jacobians and sparse Hessians,
1403respectively, in coordinate format one may use the drivers:
1404\begin{tabbing}
1405\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1406\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1407\>{\sf short int tag;}         \> // tape identification \\
1408\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1409\>{\sf int n;}                 \> // number of independent variables $n$\\
1410\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1411\>{\sf double x[n];}           \> // independent vector $x$ \\
1412\>{\sf int nnz;}               \> // number of nonzeros \\ 
1413\>{\sf unsigned int rind[nnz];}\> // row index\\ 
1414\>{\sf unsigned int cind[nnz];}\> // column index\\ 
1415\>{\sf double values[nnz];}    \> // non-zero values\\ 
1416\>{\sf int options[4];}        \> // array of control parameters\\ 
1417\end{tabbing}
1418%
1419\begin{tabbing}
1420\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1421\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1422\>{\sf short int tag;}         \> // tape identification \\
1423\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1424\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1425\>{\sf double x[n];}           \> // independent vector $x$ \\
1426\>{\sf int nnz;}               \> // number of nonzeros \\ 
1427\>{\sf unsigned int rind[nnz];}\> // row indices\\ 
1428\>{\sf unsigned int cind[nnz];}\> // column indices\\ 
1429\>{\sf double values[nnz];}    \> // non-zero values  \\
1430\>{\sf int options[2];}        \> // array of control parameters\\ 
1431\end{tabbing}
1432%
1433Once more, the input variables are the identifier for the internal
1434representation {\sf tag}, if required the number of dependents {\sf m},
1435and the number of independents {\sf n} for a consistency check.
1436Furthermore, the flag {\sf repeat=0} indicates that the functions are called
1437at a point with a new sparsity structure, whereas  {\sf repeat=1} results in
the reuse of the sparsity pattern from the previous call.
1439The current values of the independents are given by the array {\sf x}.
1440The input/output
1441variable {\sf nnz} stores the number of the nonzero entries.
Therefore, {\sf nnz} also denotes the length of the arrays {\sf rind} storing
the row indices, {\sf cind} storing the column indices, and
1444{\sf values} storing the values of the nonzero entries.
1445If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
1446the functions determine the number of nonzeros for the sparsity pattern
defined by the value of {\sf x}, allocate appropriate arrays {\sf rind},
{\sf cind}, and {\sf values}, and store the desired information in these
1449arrays.
1450During the next function call with {\sf repeat=1} the allocated memory
1451is reused such that only the values of the arrays are changed.   
1452Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
  repeat=0}, the user is responsible for the deallocation of the arrays
 {\sf rind}, {\sf cind}, and {\sf values} using the function {\sf
   free()}!
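For illustration, the calling convention just described might be used as
follows; the tape identifier 1 and the dimensions are made up, and the
necessary includes are omitted:
\begin{verbatim}
int m = 2, n = 4;
double x[4] = {1.0, 2.0, 3.0, 4.0};
unsigned int *rind = NULL, *cind = NULL;   // allocated inside sparse_jac
double *values = NULL;
int nnz;
int options[4] = {0, 0, 0, 0};             // defaults, cf. the table below

sparse_jac(1, m, n, 0, x, &nnz, &rind, &cind, &values, options);
/* ... use rind, cind, values ... */
sparse_jac(1, m, n, 1, x, &nnz, &rind, &cind, &values, options);
/* same pattern, updated values */
free(rind); free(cind); free(values);      // before the next repeat=0 call
\end{verbatim}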
1456
For each driver the array {\sf options} can be used to adapt the
computation of the sparse derivative matrices to the special
needs of the application under consideration. Most frequently, the default options
1460will give a reasonable performance. The elements of the array {\sf options} control the action of
1461{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
1462\begin{table}[h]
1463\center
1464\begin{tabular}{|c|c|l|} \hline
1465component & value &  \\ \hline
1466{\sf options[0]} &    &  way of sparsity pattern computation \\
1467                 & 0  &  propagation of index domains (default) \\
1468                 & 1  &  propagation of bit pattern \\ \hline
1469{\sf options[1]} &    &  test the computational graph control flow \\
1470                 & 0  &  safe mode (default) \\
1471                 & 1  &  tight mode \\ \hline
1472{\sf options[2]} &    &  way of bit pattern propagation \\
1473                 & 0  &  automatic detection (default) \\
1474                 & 1  &  forward mode \\ 
1475                 & 2  &  reverse mode \\ \hline
1476{\sf options[3]} &    &  way of compression \\
1477                 & 0  &  column compression (default) \\
1478                 & 1  &  row compression \\ \hline
1479\end{tabular}
1480\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
1481\end{table}           
1482
1483The component {\sf options[1]} determines
1484the usage of the safe or tight mode of sparsity computation.
1485The first, more conservative option is the default. It accounts for all
1486dependences that might occur for any value of the
1487independent variables. For example, the intermediate
1488{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1489always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
1491logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1492In contrast
1493the tight option gives this result only in the unlikely event of an exact
1494tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1495associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1496depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1497Obviously, the sparsity pattern obtained with the tight option may contain
1498more zeros than that obtained with the safe option. On the other hand, it
1499will only be valid at points belonging to an area where the function $F$ is locally
1500analytic and that contains the point at which the internal representation was
1501generated. Since generating the sparsity structure using the safe version does not
1502require any reevaluation, it may thus reduce the overall computational cost
1503despite the fact that it produces more nonzero entries.
1504The value of {\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
1508
1509 The elements of the array {\sf options} control the action of
1510{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
1511\begin{table}[h]
1512\center
1513\begin{tabular}{|c|c|l|} \hline
1514component & value &  \\ \hline
1515{\sf options[0]} &    &  test the computational graph control flow \\
1516                 & 0  &  safe mode (default) \\
1517                 & 1  &  tight mode \\ \hline
1518{\sf options[1]} &    &  way of recovery \\
1519                 & 0  &  indirect recovery (default) \\
1520                 & 1  &  direct recovery \\ \hline
1521\end{tabular}
1522\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
1523\end{table}           
1524
1525The described driver routines for the computation of sparse derivative
1526matrices are prototyped in the header file
1527\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1528global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1529Example codes illustrating the usage of {\sf
1530  sparse\_jac} and {\sf sparse\_hess} can be found in the file
1531\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
1532\verb=examples/additional_examples/sparse=.
1533%
1534%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1535\subsubsection*{Computation of Sparsity Pattern}
1536%
1537ADOL-C offers a convenient way of determining the 
1538sparsity structure of a Jacobian matrix using the function:
1539%
1540\begin{tabbing}
1541\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1542\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
1543\>{\sf short int tag;} \> // tape identification \\
1544\>{\sf int m;} \> // number of dependent variables $m$\\
1545\>{\sf int n;} \> // number of independent variables $n$\\
1546\>{\sf double x[n];} \> // independent variables $x_0$\\
1547\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
\>{\sf int options[3];} \> // array of control parameters
1549\end{tabbing}
1550%
1551The sparsity pattern of the
1552Jacobian is computed in a compressed row format. For this purpose,
1553{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
1554  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
1555During the call of  {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
1556entries in row $i$ of the Jacobian is determined for all $1\le i\le
1557m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
1558points to a block of $\hat{n}_i+1$ {\sf  unsigned int} for all $1\le
1559i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
column indices of the $\hat{n}_i$ nonzero entries in the $i$th row are stored
in the components {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][$\hat{n}_i$]}.
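For illustration, the row pointers may be allocated and the resulting pattern
traversed as sketched below; the tape identifier 1 and the dimensions are made
up, and \verb=<cstdio>= is assumed to be included:
\begin{verbatim}
int m = 3, n = 5;
double x[5] = {1.0, 2.0, 3.0, 4.0, 5.0};
unsigned int **JP = new unsigned int*[m];  // row pointers, filled by jac_pat
int options[3] = {0, 0, 0};                // defaults, cf. the table below

jac_pat(1, m, n, x, JP, options);

for (int i = 0; i < m; i++)                       // traverse the pattern
    for (unsigned int k = 1; k <= JP[i][0]; k++)
        printf("row %d: nonzero in column %u\n", i+1, JP[i][k]);
\end{verbatim}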
1562
1563The elements of the array {\sf options} control the action of
1564{\sf jac\_pat} according to \autoref{options}.
1565\begin{table}[h]
1566\center
1567\begin{tabular}{|c|c|l|} \hline
1568component & value &  \\ \hline
1569{\sf options[0]} &    &  way of sparsity pattern computation \\
1570                 & 0  &  propagation of index domains (default) \\
1571                 & 1  &  propagation of bit pattern \\ \hline
1572{\sf options[1]} &    &  test the computational graph control flow \\
1573                 & 0  &  safe mode (default) \\
1574                 & 1  &  tight mode \\ \hline
1575{\sf options[2]} &    &  way of bit pattern propagation \\
1576                 & 0  &  automatic detection (default) \\
1577                 & 1  &  forward mode \\ 
1578                 & 2  &  reverse mode \\ \hline
1579\end{tabular}
1580\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
1581\end{table}           
1582The value of {\sf options[0]} selects the way to compute the sparsity
1583pattern. The component {\sf options[1]} determines
1584the usage of the safe or tight mode of bit pattern propagation.
1585The first, more conservative option is the default. It accounts for all
1586dependences that might occur for any value of the
1587independent variables. For example, the intermediate
1588{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1589always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
1591logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1592In contrast
1593the tight option gives this result only in the unlikely event of an exact
1594tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1595associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1596depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1597Obviously, the sparsity pattern obtained with the tight option may contain
1598more zeros than that obtained with the safe option. On the other hand, it
1599will only be valid at points belonging to an area where the function $F$ is locally
1600analytic and that contains the point at which the internal representation was
1601generated. Since generating the sparsity structure using the safe version does not
1602require any reevaluation, it may thus reduce the overall computational cost
1603despite the fact that it produces more nonzero entries. The value of
1604{\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
1608
The routine {\sf jac\_pat} may use the propagation of bit patterns to
1610determine the sparsity pattern. Therefore, a kind of ``strip-mining''
1611is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
1612the value of the constant {\sf PQ\_STRIPMINE\_MAX}
1613following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1614
1615The driver routine is prototyped in the header file
1616\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1617global header file \verb=<adolc/adolc.h>= (see
1618\autoref{ssec:DesIH}). The determination of sparsity patterns is
1619illustrated by the examples \verb=sparse_jacobian.cpp=
1620and \verb=jacpatexam.cpp=
1621contained in
1622\verb=examples/additional_examples/sparse=.
1623
1624To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
1625driver
1626\begin{tabbing}
1627\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int hess\_pat(tag, n, x, HP, option)}\\
1629\>{\sf short int tag;}       \> // tape identification \\
1630\>{\sf int n;}               \> // number of independent variables $n$\\
1631\>{\sf double x[n];}         \> // independent variables $x_0$\\
1632\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
1633\>{\sf int option;}          \> // control parameter
1634\end{tabbing}
1635where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
1636 unsigned int}s.
1637After the function call {\sf HP} contains the sparsity pattern,
1638where {\sf HP[j][0]} contains the number of nonzero elements in the
1639 $j$th row for $1 \le j\le n$.
The components {\sf HP[j][i]}, $0<${\sf i}~$\le$~{\sf HP[j][0]}, store the
 indices of these entries. For determining the sparsity pattern, ADOL-C uses
 the algorithm described in \cite{Wa05a}. The parameter {\sf option} determines
1643the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
1644  option = 1}) of the computation of the sparsity pattern as described
1645above.
1646
1647This driver routine is prototyped in the header file
1648\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1649global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1650An example employing the procedure {\sf hess\_pat}  can be found in the file
1651\verb=sparse_hessian.cpp=  contained in
1652\verb=examples/additional_examples/sparse=.
1653%
1654%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1655\subsubsection*{Calculation of Seed Matrices}
1656%
1657To compute a compressed derivative matrix from a given sparsity
1658pattern, one has to calculate an appropriate seed matrix that can be
1659used as input for the derivative calculation. To facilitate the
1660generation of seed matrices for a sparsity pattern given in
1661row compressed form, ADOL-C provides the following two drivers,
1662which are based on the ColPack library:
1663\begin{tabbing}
1664\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1665\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
1666\>{\sf int m;} \> // number of dependent variables $m$\\
1667\>{\sf int n;} \> // number of independent variables $n$\\
1668\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
1669of Jacobian\\
1670\>{\sf double S[n][p];} \> // seed matrix\\
1671\>{\sf int p;} \> // number of columns in $S$
1672\end{tabbing}
1673The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1674number of independent variables {\sf n} and the sparsity pattern {\sf
1675  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1676{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1677pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1678determines the number of columns {\sf p} in the seed
1679matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1680 S} and initializes {\sf S} according to the graph coloring.
1681The coloring algorithm that is applied in {\sf
1682  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1683described earlier.
1684
1685\begin{tabbing}
1686\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1687\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
1688\>{\sf int n;} \> // number of independent variables $n$\\
1689\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
of Hessian\\
1691\>{\sf double S[n][p];} \> // seed matrix\\
1692\>{\sf int p;} \> // number of columns in $S$
1693\end{tabbing}
1694The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1695and the sparsity pattern {\sf HP} of the Hessian computed for example
1696by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1697appropriate coloring of the adjacency graph defined by the sparsity
1698pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its
1699    compressed representation and a star coloring in the case of a direct recovery.
1700 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1701 S} and initializes {\sf S} according to the graph coloring.
1702The coloring algorithm applied in {\sf
1703  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1704described earlier.
1705
1706The specific set of criteria used to define a seed matrix $S$ depends
1707on whether the sparse derivative matrix
1708to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1709It also depends on whether the entries of the derivative matrix  are to be
1710recovered from the compressed representation \emph{directly}
1711(without requiring any further arithmetic) or \emph{indirectly} (for
1712example, by solving for unknowns via successive substitutions).
1713Appropriate recovery routines are provided by ColPack and used
1714in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1715the previous subsection. Examples with a detailed analysis of the
1716employed drivers for the exploitation of sparsity can be found in the
1717papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1718
1719
1720These driver routines are prototyped in
1721\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1722global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1723An example code illustrating the usage of {\sf
1724generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1725\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
1726%
1727%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1728\subsection{Higher Derivative Tensors}
1729\label{higherOrderDeriv}
1730%
1731Many applications in scientific computing need second- and higher-order
1732derivatives. Often, one does not require full derivative tensors but
1733only the derivatives in certain directions $s_i \in \R^{n}$.
1734Suppose a collection of $p$ directions
1735$s_i \in \R^{n}$ is given, which form a matrix
1736\[
1737S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
1738 \R^{n \times p}.
1739\]
1740One possible choice is $S = I_n$ with  $p = n$, which leads to
1741full tensors being evaluated.
1742ADOL-C provides the function {\sf tensor\_eval}
1743to calculate the derivative tensors
1744\begin{eqnarray}
1745\label{eq:tensor}
1746\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
1747     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} 
1748     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
1749\end{eqnarray}
1750simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
1751parameters:
1752%
1753\begin{tabbing}
1754\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1755\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
1756\>{\sf short int tag;}         \> // tape identification \\
1757\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1758\>{\sf int n;}                 \> // number of independent variables $n$\\
1759\>{\sf int d;}                 \> // highest derivative degree $d$\\
1760\>{\sf int p;}                 \> // number of directions $p$\\
1761\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
1762\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
1763\>{\sf double S[n][p];}        \> // seed matrix $S$
1764\end{tabbing}
1765%
1766Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory 
1767requirement can be reduced enormously. The collection of  tensors up to order $d$ comprises 
1768$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
greater than or equal to $\binom{p+d}{d}$.
1770To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor
1771series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
1772approach is described in \cite{Griewank97}.
1773
1774The access of individual entries in symmetric tensors of
1775higher order is a little tricky. We always store the derivative
1776values in the two dimensional array {\sf tensor} and provide two
1777different ways of accessing them. 
1778The leading dimension of the tensor array ranges over
1779the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17800,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1781structure for all $i$. Each of them represents the symmetric tensors up to
1782order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 
1783%
1784The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
1785tensors are linearly ordered according to the tetrahedral
1786scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
1787case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ 
1788as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
1789$l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
1790function value $F_{i+1}$ itself and the gradient
1791$\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$
1792with $j=0$ for $k=1,\ldots,p$.
1793
1794For general $d$ we combine the variable
1795indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
1796where $j_k$ indicates differentiation with respect to variable
1797$x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1798no differentiation so that all lower derivatives are also
1799contained in the same data structure as described above for
1800the quadratic case. The location of the partial derivative specified
1801by $j$ is computed by the function
1802%
1803\begin{tabbing}
1804\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1805\>{\sf int tensor\_address(d,$\,$j)} \\
1806\>{\sf int d;}                 \> // highest derivative degree $d$ \\
1807\>{\sf int j[d];}              \> // multi-index $j$
1808\end{tabbing}       
1809%
1810and it may thus be referenced as {\sf tensor[i][tensor\_address(d,$\,$j)]}.
1811Notice that the address computation does depend on the degree $d$ 
1812but not on the number of directions $p$, which could theoretically be
1813enlarged without the need to reallocate the original tensor.
1814Also, the components of $j$ need to be non-increasing.
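For illustration, the following sketch allocates the compressed array
{\sf tensor} with $\binom{p+d}{d}$ entries per row (the binomial coefficient is
computed by a small loop that is not part of ADOL-C), evaluates the tensors for
made-up dimensions on an assumed tape 1, and accesses one entry via
{\sf tensor\_address}:
\begin{verbatim}
int m = 5, n = 2, p = 2, d = 3;
double x[2] = {1.0, 2.0};

int size = 1;                           // size = binom(p+d, d),
for (int k = 1; k <= d; k++)            // computed by a small helper loop
    size = size * (p + k) / k;

double **S      = new double*[n];       // here simply S = I_2
double **tensor = new double*[m];
for (int i = 0; i < n; i++) S[i] = new double[p]();
S[0][0] = 1.0;  S[1][1] = 1.0;
for (int i = 0; i < m; i++) tensor[i] = new double[size];

tensor_eval(1, m, n, d, p, x, tensor, S);

int j[3] = {2, 1, 1};                   // non-increasing multi-index
double val = tensor[4][tensor_address(d, j)];   // third mixed partial of F_5
\end{verbatim}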
1815%
1816To some C programmers it may appear more natural to access tensor
1817entries by successive dereferencing in the form
1818{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1819We have also provided this mode, albeit with the restriction
1820that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1821In the second order case this means that the Hessian entries must be
1822specified in or below the diagonal. If this restriction is
1823violated the values are almost certain to be wrong and array bounds
1824may be violated. We emphasize that subscripting is not overloaded
1825but that {\sf tensorentry} is a conventional and
1826thus moderately efficient C pointer structure.
1827Such a pointer structure can be allocated and set up completely by the
1828function
1829%
1830\begin{tabbing}
1831\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1832\>{\sf void** tensorsetup(m,p,d,tensor)} \\
\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1834\>{\sf int p;}                 \> // number of directions $p$\\
1835\>{\sf int d;}                 \> // highest derivative degree $d$\\
1836\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
1837\end{tabbing}     
1838%
1839Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1840$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1841 $d-1$ layers of pointers are set up so that the return value
1842allows the direct dereferencing of individual tensor elements.
1843
1844For example, suppose some active section involving  $m \geq 5$ dependents and
1845$n \geq 2$ independents has been executed and taped. We may
1846select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1847columns $s_1$ and $s_2$. Then we are able to execute the code segment
1848\begin{tabbing}
1849\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
1850              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}   
1851\end{tabbing}
1852This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
1853in both directions $s_1$ and
1854$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer
1855structure {\sf tensorentry} has been created. Now, 
1856the value of the mixed partial
1857\[
 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2}
1859\]
1860can be recovered as
1861\begin{center}
1862   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][tensor\_address(d,$\,$j)]},
1863\end{center}
1864where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). 
1865Analogously, the entry
1866\begin{center}   
1867   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][tensor\_address(d,$\,$j)]}
1868\end{center}
1869with {\sf j} = (1,0,0) contains the first derivative of the third dependent
1870variable $F_3$ with respect to the first differentiation parameter $z_1$.
1871
Note that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly affects the values accessed
by {\sf tensorentry}.
1875%
1876When no more derivative evaluations are desired the pointer structure
1877{\sf tensorentry} can be deallocated by a call to the function
1878%
1879\begin{tabbing}
1880\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1881\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
1882\>{\sf int m;}                    \> // number of dependent variables $m$ \\
\>{\sf int p;}                    \> // number of directions $p$\\
1884\>{\sf int d;}                    \> // highest derivative degree $d$\\
1885\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} 
1886\end{tabbing} 
1887%
1888that does not deallocate the array {\sf tensor}.
1889
1890The drivers provided for efficient calculation of higher order
1891derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1892which is included by the global header file \verb=<adolc/adolc.h>= automatically
1893(see \autoref{ssec:DesIH}).
1894Example codes using the above procedures can be found in the files
1895\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
1896\verb=examples/additional_examples/taylor=.
1897%
1898%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1899\subsection{Derivatives of Implicit and Inverse Functions}
1900\label{implicitInverse}
1901%
1902Frequently, one needs derivatives of variables
1903$y \in \R^{m}$ that are implicitly defined as
1904functions of some variables $x \in \R^{n-m}$
1905by an algebraic system of equations
1906\[
1907G(z) \; = \; 0 \in \R^m \quad
1908{\rm with} \quad z = (y, x) \in \R^n .
1909\] 
1910Naturally, the $n$ arguments of $G$ need not be partitioned in
1911this regular fashion and we wish to provide flexibility for a
1912convenient selection of the $n-m$ {\em truly} independent
1913variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
1914that picks out these variables so that it is a column
1915permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
1916Then the nonlinear system
1917\[
1918  G(z) \; = \; 0, \quad P z =  x,                           
1919\] 
1920has a regular Jacobian, wherever the implicit function theorem
1921yields $y$ as a function of $x$. Hence, we may also write
1922\begin{equation}
1923\label{eq:inv_tensor}
1924F(z) = \left(\begin{array}{c}
1925                        G(z) \\
1926                        P z
1927                      \end{array} \right)\; \equiv \;
1928                \left(\begin{array}{c}
1929                        0 \\
1930                        P z
1931                      \end{array} \right)\; \equiv \; S\, x,
1932\end{equation}
1933where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
1934the original implicit functional relation between $x$ and $y$ as an inverse
1935relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
1936by marking $n-m$ of the independents also dependent. 
1937
1938Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1939seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1940with respect to $x \in \R^p$ by calling the following routine:
1941%
1942\begin{tabbing}
1943\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1944\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
1945\>{\sf short int tag;}         \> // tape identification \\
1946\>{\sf int n;}                 \> // number of variables $n$\\
1947\>{\sf int d;}                 \> // highest derivative degree $d$\\
1948\>{\sf int p;}                 \> // number of directions $p$\\
1949\>{\sf double z[n];}          \> // values of independent variables $z$\\
1950\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
1951\>{\sf double S[n][p];}        \> // seed matrix $S$
1952\end{tabbing}                 
1953%                 
1954The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1955{\sf tag} pointing to a tape for the evaluation of the inverse function
1956$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1957to the domain of that function. Individual derivative components can be
1958accessed in tensor exactly as in the explicit case described above.
1959
1960It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1961derivatives of $z$ with respect to $x$ that is defined by the equation
1962$F(z)=F(z_0)+S \, x$. In other words the base point at
1963which the inverse function is differentiated is given by $F(z_0)$.
1964The routine has no capability for inverting $F$ itself as
1965solving systems of nonlinear
1966equations $F(z)=0$ in the first place is not just a differentiation task.
1967However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1968useful for that purpose.
1969
1970As an example consider the following two nonlinear expressions
1971\begin{eqnarray*}
      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^2\\
1973      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
1974\end{eqnarray*}   
1975The equations $G(z)=0$ describe the relation between the Cartesian
1976coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
1977Now, suppose we are interested in the derivatives of the second Cartesian
1978$y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
1979to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
1980could look simply like
1981%
1982\begin{tabbing}
1983\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
1984\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
1985\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
1986\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
1987\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
1988\end{tabbing}     
1989%
1990where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
1991In the last line the two independent variables {\sf z[1]} and
1992{\sf z[3]} are made
1993simultaneously dependent thus generating a square system that can be
1994inverted (at most arguments). The corresponding projection and seed
1995matrix are
1996\begin{eqnarray*}
1997P \;=\; \left( \begin{array}{cccc}
1998               1 & 0 & 0 & 0 \\
1999               0 & 0 & 1 & 0
2000            \end{array}\right) \quad \mbox{and} \quad
2001S^T \; = \; \left( \begin{array}{cccc}
2002               0 & 0 & 1 & 0 \\
2003               0 & 0 & 0 & 1
            \end{array}\right)\enspace .
2005\end{eqnarray*}
2006Provided the vector {\sf zp} is consistent in that its Cartesian and polar
2007components describe the same point in the plane the resulting tuple
2008{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
2009$n=4$, $p=2$ and $d$
2010as desired will yield the implicit derivatives, provided
2011{\sf tensor} has been allocated appropriately of course and $S$ has the value
2012given above.
2013%
2014The example is untypical in that the implicit function could also be
2015obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
2016the subset of $z$ components that are to be considered as truly
2017independent can be selected and altered with next to no effort at all.
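For completeness, the corresponding driver call might be coded as sketched
below, using zero-based indexing, the tape identifier 1 for the active section
above, and a consistent point {\sf zp}:
\begin{verbatim}
int n = 4, p = 2, d = 2;
int size = 1;
for (int k = 1; k <= d; k++)            // size = binom(p+d, d)
    size = size * (p + k) / k;

double zp[4] = {1.0, 0.0, 1.0, 0.0};    // point (1,0) with r = 1, phi = 0
double **S      = new double*[n];
double **tensor = new double*[n];
for (int i = 0; i < n; i++) {
    S[i]      = new double[p]();        // zero-initialized seed matrix
    tensor[i] = new double[size];
}
S[2][0] = 1.0;  S[3][1] = 1.0;          // S as given above

inverse_tensor_eval(1, n, d, p, zp, tensor, S);
\end{verbatim}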
2018
2019The presented drivers are prototyped in the header file
2020\verb=<adolc/drivers/taylor.h>=. As indicated before this header
2021is included by the global header file \verb=<adolc/adolc.h>= automatically
2022(see \autoref{ssec:DesIH}).
2023The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
2024\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
2025show the application of the procedures described here.
2026%
2027%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2028\subsection{Drivers of the Toolbox for Lie Derivatives}
Nonlinear controller and observer design often requires certain types of Lie derivatives. These derivatives also arise in other areas such as classical mechanics and relativity.
2030Lie derivatives are total derivatives of tensor fields along a vector field.
2031The drivers for calculating Lie derivatives in ADOL-C presented in this section
2032were developed by Klaus R{\"o}benack~\cite{Roeb05} and his co-workers Jan Winkler, Siqian Wang and Mirko Franke~\cite{Roeb11}.
2033They are prototyped as C functions in the header file {\verb=<adolc_lie.h>=} and
2034allow the calculation of Lie derivatives of scalar, vector and covector fields. In addition, the calculation of gradients of Lie derivatives of scalar fields is supported.
2035
2036We consider computational problems occurring in controller and observer design for
2037nonlinear control systems
2038\begin{equation*}
2039\dot{\mathbf x}=\mathbf f(\mathbf x)+\mathbf g(\mathbf x)u,
2040\quad \mathbf y=\mathbf h(\mathbf x),\quad \mathbf x(0)=\mathbf x_0 \in \Omega\\
2041\end{equation*}
with the scalar input $u$, the state $\mathbf x$, and the output $\mathbf y$, where the vector
2043fields $\mathbf f : \Omega \to \R^n$, $\mathbf g: \Omega \to \R^n$
2044and the map $\mathbf h: \Omega\to \R^m$ (for $m=1$ this map is the scalar field $h$)
2045are defined on an open subset $\Omega\subseteq\R^n$.
2046
2047\sloppy
2048\subsubsection*{Lie Derivatives of Scalar Fields}
2049Consider the initial value problem
2050\[
2051 \dot{\mathbf x}={\mathbf f}(\mathbf x),\quad y=h(\mathbf x),\quad \mathbf{x}(0)=\mathbf{x}_0\in\Omega
2052\]
with the vector field $\mathbf f : \Omega \to \R^n$. The Lie derivatives of the scalar field $h:\Omega\to\R$ along the vector field~$\mathbf{f}$ are time derivatives of the curve~$y$, i.e.,
2054\[
2055 \dot{y}(0)=L_\mathbf{f}h(\mathbf{x}_0),\;
2056 \ddot{y}(0)=L_\mathbf{f}^2h(\mathbf{x}_0),\;  \ldots,\;
2057 y^{(d)}(0)=L_\mathbf{f}^dh(\mathbf{x}_0).
2058\]
2059Formally, the Lie derivative of the scalar field~$h$ along the vector field~$\mathbf{f}$ is defined by
2060\[
2061 L_{\mathbf f} h(\mathbf x)=\frac{\partial h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x).
2062\]
2063Higher order Lie derivatives are given by the recursion
2064\[
2065 L_{\mathbf f}^{k+1} h(\mathbf x)= \frac{\partial L_{\mathbf f}^k h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x)\quad\text{with}\quad L_{\mathbf f}^0 h(\mathbf x)= h(\mathbf x).
2066\]
2067To compute Lie derivatives
2068\[
2069(L_\mathbf f^0h(\mathbf x_0),\ldots,L_\mathbf f^dh(\mathbf x_0))
2070\]
2071we need the trace numbers of the active sections of the vector field~$\mathbf f$ and the scalar field~$h$,
2072the number of independent variables $n$, the initial value $\mathbf x_0\in\Omega$
2073and the highest derivative degree $d$. The values of the Lie derivatives are
2074then stored in the one dimensional array {\verb=result=} of length $d+1$:
2075
2076\begin{tabbing}
2077\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2078\>{\sf int lie\_scalarc(Tape\_F, Tape\_H, n, x0, d, result)}\\
2079\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2080\>{\sf short Tape\_H;}         \> // trace identification of scalar field $h$\\
2081\>{\sf short n;}               \> // number of independent variables $n$ and $m = 1$\\
2082\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2083\>{\sf short d; }              \> // highest derivative degree $d$\\
2084\>{\sf double result[d+1];}    \> // resulting Lie derivatives of a scalar field\\
2085\end{tabbing}   
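The following sketch records a made-up two-dimensional vector field $\mathbf f$
and a scalar field $h$ on two tapes and then computes the Lie derivatives up to
degree $d=3$:
\begin{verbatim}
double x0[2] = {1.0, 0.5}, dummy;
adouble ax[2], af[2], ah;

trace_on(1);                             // tape 1: vector field f
ax[0] <<= x0[0];  ax[1] <<= x0[1];
af[0] =  ax[1];                          // a made-up damped oscillator
af[1] = -ax[0] - 0.2*ax[1];
af[0] >>= dummy;  af[1] >>= dummy;
trace_off();

trace_on(2);                             // tape 2: scalar field h
ax[0] <<= x0[0];  ax[1] <<= x0[1];
ah = ax[0]*ax[0];
ah >>= dummy;
trace_off();

short d = 3;
double result[4];                        // L_f^0 h(x0), ..., L_f^3 h(x0)
lie_scalarc(1, 2, 2, x0, d, result);
\end{verbatim}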
2086
2087
2088For a smooth vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ with
2089$\mathbf h(\mathbf x)=(h_1(\mathbf x),\ldots,h_m(\mathbf x))^T$
2090we define the Lie derivative
2091\begin{equation}\label{eq:lie_vectorial}
2092L_\mathbf f^k\mathbf h(\mathbf x_0)=
2093(L_\mathbf f^kh_1(\mathbf x_0),\ldots,L_\mathbf f^kh_m(\mathbf x_0))^T\in\R^m
2094 \quad \text{for}\quad k = 0, \cdots, d,
2095\end{equation}
2096component-wise by the Lie derivatives $L_\mathbf f^k h_1(\mathbf x_0),\ldots,L_\mathbf f^k h_m(\mathbf x_0)$ of the $m$ scalar fields $h_1, \ldots, h_m:\Omega\to\R$. Note that~\eqref{eq:lie_vectorial} should not be confused with the Lie derivative of a vector field. To compute Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h$
2097we additionally need the number of dependent variables $m$:
2098
2099\begin{tabbing}
2100\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2101\>{\sf int lie\_scalarcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\
2102\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2103\>{\sf short Tape\_H;}         \> // trace identification of vector-valued map $\mathbf h$\\
2104\>{\sf short n;}               \> // number of independent variables $n$\\
2105\>{\sf short m;}               \> // number of dependent variables $m$\\
2106\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2107\>{\sf short d; }              \> // highest derivative degree $d$\\
2108\>{\sf double result[m][d+1];} \> // resulting Lie derivatives of vectorial scalar fields\\
2109\end{tabbing} 
2110
2111\subsubsection*{Gradients of Lie Derivatives of Scalar Fields}
2112To compute the gradients
2113\[
2114(\mathrm{d}L_\mathbf f^0h(\mathbf x_0),\ldots, \mathrm{d}L_\mathbf f^dh(\mathbf x_0))
2115\]
2116of the Lie derivatives $L_\mathbf f^0h(\mathbf x_0),\ldots, L_\mathbf f^dh(\mathbf x_0)$ of the scalar field $h:\Omega\to\R$ along the vector field $\mathbf f:\Omega\to\R^n$,
2117the following C function can be used:
2118
2119\begin{tabbing}
2120\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2121\>{\sf int lie\_gradientc(Tape\_F, Tape\_H, n, x0, d, result)}\\
2122\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2123\>{\sf short Tape\_H;}         \> // trace identification of scalar field $h$\\
2124\>{\sf short n;}               \> // number of independent variables $n$ and $m=1$\\
2125\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2126\>{\sf short d;}               \> // highest derivative degree $d$\\
2127\>{\sf double result[n][d+1];} \> // resulting gradients of Lie derivatives\\
2128\>{\sf} \> // of a scalar field\\
2129\end{tabbing} 
2130
For calculating the Jacobians $\mathrm{d}L_\mathbf f^k\mathbf h(\mathbf x_0)\in\R^{m\times n}$
of Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ for $k=0,\ldots,d$, one can use the following function, where the $d+1$ matrices are stored in the three-dimensional array {\verb=result=}:
2133
2134\begin{tabbing}
2135\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2136\>{\sf int lie\_gradientcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\
2137\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2138\>{\sf short Tape\_H;}         \> // trace identification of vector-valued map $\mathbf h$\\
2139\>{\sf short n;}               \> // number of independent variables $n$\\
2140\>{\sf short m;}               \> // number of dependent variables $m$\\
2141\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2142\>{\sf short d;}               \> // highest derivative degree $d$\\
\>{\sf double result[m][n][d+1];} \> // resulting Jacobians of Lie derivatives\\
2144\>{\sf} \> // of vectorial scalar fields\\
2145\end{tabbing} 
2146
2147\subsubsection*{Lie Derivatives of Covector Fields}
2148
2149If we consider the elements of the real vector space $\R^n$ as column vectors, the elements of the associated dual space $(\R^n)^*$ can be represented by row vectors. These row vectors are called adjoint vectors in ADOL-C and covectors in differential geometry.
2150In a program the user does not have to distinguish between a vector field
2151$\mathbf f: \Omega \rightarrow \R^n$ and a covector field
2152$\boldsymbol{\omega}: \Omega \rightarrow (\R^n)^*$, since for both fields the dependent variables are stored in one dimensional arrays.
2153The Lie derivative of a covector field~$\boldsymbol{\omega}$ is defined by
2154\[
2155 L_{\mathbf f}\boldsymbol{\omega}(\mathbf x)=\boldsymbol{\omega}(\mathbf x)\,{\mathbf f}^\prime(\mathbf x)
2156 + {\mathbf f}^T(\mathbf x) \left(\frac{\partial \boldsymbol{\omega}^T (\mathbf x)}{\partial \mathbf x}\right)^T.
2157\]
2158The $d+1$ Lie derivatives $L_{\mathbf f}^0\boldsymbol{\omega}(\mathbf x),\ldots,L_{\mathbf f}^d\boldsymbol{\omega}(\mathbf x)$ of the covector field $\boldsymbol{\omega}$ along the vector field~$\mathbf{f}$
2159can be computed by
2160
2161\begin{tabbing}
2162\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2163\>{\sf int lie\_covector(Tape\_F, Tape\_W, n, x0, d, result)}\\
2164\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2165\>{\sf short Tape\_W;}         \> // trace identification of covector field $\boldsymbol{\omega}$\\
2166\>{\sf short n;}               \> // number of independent variables $n$\\
2167\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2168\>{\sf short d;}               \> // highest derivative degree $d$\\
2169\>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a covector field\\
2170\end{tabbing} 
2171
2172\subsubsection*{Lie Derivatives of a Vector Field (Lie Brackets)}
2173
2174Lie derivatives (Lie brackets) of a vector field $\mathbf g: \Omega \rightarrow \R^n$ along a vector field $\mathbf f: \Omega \rightarrow \R^n$ are defined by
2175\[
2176 \mathrm{ad}_\mathbf{f}\mathbf{g}(\mathbf{x})=[\mathbf{f},\mathbf{g}](\mathbf{x})
2177 =\mathbf{g}^{\prime}(\mathbf{x})\mathbf{f}(\mathbf{x})-\mathbf{f}^{\prime}(\mathbf{x})\mathbf{g}(\mathbf{x}).
2178\]
2179We can calculate iterated Lie brackets $\mathrm{ad}_\mathbf{f}^0\mathbf{g}(\mathbf{x}),\ldots,\mathrm{ad}_\mathbf{f}^{d}\mathbf{g}(\mathbf{x})$ at the point $\mathbf{x}_0\in\Omega$ 
2180using the C function listed below:
2181
2182\begin{tabbing}
2183\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2184\>{\sf int lie\_bracket(Tape\_F, Tape\_G, n, x0, d, result)}\\
2185\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2186\>{\sf short Tape\_G;}         \> // trace identification of vector field $\mathbf g$\\
2187\>{\sf short n;}               \> // number of independent variables $n$\\
2188\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2189\>{\sf short d;}               \> // highest derivative degree $d$\\
2190\>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a vector field\\
2191\end{tabbing} 
2192
2193\subsubsection*{Additional C++ Drivers for Lie Derivatives}
2194
2195For Lie derivatives of scalar fields, the C interface offers two different drivers, namely {\verb=lie_scalarc=} for $m=1$ scalar field $h:\Omega\to\R$ and {\verb=lie_scalarcv=} for $m\geq1$ scalar fields $h_1,\ldots,h_m:\Omega\to\R$, which are combined in a vector-valued map $\mathbf{h}:\Omega\to\R^m$. Using the polymorphism in C++, both C drivers are unified by the C++ driver {\verb=lie_scalar=} for the computation of Lie derivatives of scalar fields:
2196
2197\begin{tabbing}
2198\hspace{0.5in}\={\sf int lie\_scalar(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.45in}\= // case $m=1$\\
2199\>{\sf int lie\_scalar(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$
2200\end{tabbing} 
2201
The same situation occurs with the gradients or Jacobians of Lie derivatives of scalar fields. Here, the C drivers {\verb=lie_gradientc=} and {\verb=lie_gradientcv=} are unified by the C++ driver {\verb=lie_gradient=}:
2203
2204\begin{tabbing}
2205\hspace{0.5in}\={\sf int lie\_gradient(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.35in}\= // case $m=1$\\
2206\>{\sf int lie\_gradient(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$
2207\end{tabbing} 
2208
2209The C++ drivers are also prototyped in the header file {\sf $<$adolc\_lie.h$>$}. An example how to use these drivers for the calculation of Lie derivatives can be found in the file {\sf GantryCrane.cpp} in the directory {\verb=additional_examples/lie=}.
2210
2211%
2212%
2213%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
2214\section{Basic Drivers for the Forward and Reverse Mode}
2215\label{forw_rev_ad}
2216%
2217In this section, we present tailored drivers for different
2218variants of the forward mode and the reverse mode, respectively.
2219For a better understanding, we start with a short
2220description of the mathematical background.
2221
2222Provided no arithmetic exception occurs,
2223no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
2224{\sf fabs} does not yield zero,
2225and all special functions were evaluated in the
2226interior of their domains, the functional relation between the input
2227variables $x$
2228and the output variables $y$ denoted by $y=F(x)$ is in
2229fact analytic.  In other words, we can compute arbitrarily high
2230derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
2231by the active section.
2232We find it most convenient to describe and
2233compute derivatives in terms of univariate Taylor expansions, which
2234are truncated after the highest derivative degree $d$ that is desired
2235by the user. Let
2236\begin{equation}
2237\label{eq:x_of_t}
2238x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \;  I\!\!R \; \mapsto \;
2239I\!\!R^n
2240\end{equation} 
2241denote any vector polynomial in the scalar variable $t \in I\!\!R$.
2242In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2243The Taylor coefficient vectors
2244\[ x_j \; = \; 
2245\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
2246\right |_{t=0}
2247\] 
2248are simply the scaled derivatives of $x(t)$ at the parameter
2249origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2250visualized as tangent and curvature at the base point $x_0$,
2251respectively.
2252Provided that $F$ is $d$ times continuously differentiable, it
2253follows from the chain rule that the image path
2254\begin{equation}
2255\label{eq:rela}
2256 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2257\end{equation}
2258is also smooth and has $(d+1)$ Taylor coefficient vectors
2259$y_j \in I\!\!R^m$ at $t=0$, so that
2260\begin{equation}
2261\label{eq:series}
2262y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
2263\end{equation}
2264Also as a consequence of the chain rule, one can observe that
2265each $y_j$ is uniquely and smoothly determined by the coefficient
2266vectors $x_i$ with $i \leq j$.  In particular we have
2267\begin{align}
2268\label{eq:y_0y_1}
2269  y_0 & = F(x_0) \nonumber \\
2270  y_1 & = F'(x_0) x_1 \nonumber\\
2271  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
2272  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
2273          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
2274  & \ldots\nonumber
2275\end{align}
2276In writing down the last equations we have already departed from the
2277usual matrix-vector notation. It is well known that the number of
2278terms that occur in these ``symbolic'' expressions for
2279the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2280the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
2281with $j$. Fortunately, this exponential growth does not occur
2282in automatic differentiation, where the many terms are somehow
2283implicitly combined  so that storage and operations count grow only
2284quadratically in the bound $d$ on $j$.
2285
2286Provided $F$ is analytic, this property is inherited by the functions
2287\[
2288y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
2289\]
2290and their derivatives satisfy the identities
2291\begin{equation}
2292\label{eq:identity}
2293\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
2294{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
2295\end{equation}
2296as established in \cite{Chri91a}. This yields in particular
2297\begin{align*}
2298  \frac{\partial y_0}{\partial x_0} =
2299  \frac{\partial y_1}{\partial x_1} =
2300  \frac{\partial y_2}{\partial x_2} =
2301  \frac{\partial y_3}{\partial x_3} =
2302  A_0 & = F'(x_0) \\
2303  \frac{\partial y_1}{\partial x_0} =
2304  \frac{\partial y_2}{\partial x_1} =
2305  \frac{\partial y_3}{\partial x_2} =
2306  A_1 & = F''(x_0) x_1 \\
2307  \frac{\partial y_2}{\partial x_0} =
2308  \frac{\partial y_3}{\partial x_1} =
2309  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
2310  \frac{\partial y_3}{\partial x_0} =
2311  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
2312          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
2313  & \ldots
2314\end{align*}
2315The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2316coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2317interest primarily in the context of ordinary differential
2318equations and differential algebraic equations.
2319
2320Given the tape of an active section and the coefficients $x_j$,
2321the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2322by appropriate calls to the ADOL-C forward mode implementations and
2323the ADOL-C reverse mode implementations. The scalar versions of the forward
2324mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2325to the $(y_j)_{j\leq d}$. The vector versions of the forward
2326mode propagate families of $p\geq 1$ such truncated Taylor series
2327in order to reduce the relative cost of the overhead incurred
2328in the tape interpretation. In detail, ADOL-C provides
2329\begin{tabbing}
2330\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2331\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
2332\>{\sf short int tag;}         \> // tape identification \\
2333\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2334\>{\sf int n;}                 \> // number of independent variables $n$\\
2335\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2336\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
2337\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
2338\end{tabbing}                 
2339for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2340$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2341flag {\sf keep} plays a similar role as in the call to 
2342{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2343the first Taylor coefficients of all intermediate quantities into a buffered
2344temporary file, i.e., the value stack, in preparation for a subsequent
2345reverse mode evaluation. The value {\sf keep} $=1$
prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as explained below.
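
As a small illustration, the following sketch re-evaluates a previously taped function $F:\R^2\to\R$ and keeps the value stack for a subsequent reverse sweep; the tape number and the dimensions are example values, and the global header \verb=<adolc/adolc.h>= is assumed to be included:
\begin{verbatim}
   // assumption: tape 1 holds an active section with n = 2, m = 1
   double x[2] = {1.5, -0.5};
   double y[1];
   // keep = 1 writes the value stack for a following reverse sweep
   int rc = zos_forward(1, 1, 2, 1, x, y);
   // y[0] now equals F(x)
\end{verbatim}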
2347
2348To compute first-order derivatives, one has
2349\begin{tabbing}
2350\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2351\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
2352\>{\sf short int tag;}         \> // tape identification \\
2353\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2354\>{\sf int n;}                 \> // number of independent variables $n$\\
2355\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2356\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2357\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
2358\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2359\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
2360\end{tabbing}                 
2361for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
2362$0\leq\text{\sf keep}\leq 2$, where
2363\begin{align*}
2364\text{\sf keep} = \left\{\begin{array}{cl}
2365       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2366       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2367       \end{array}\right.
2368\end{align*}
as explained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
2370ADOL-C provides
2371\begin{tabbing}
2372\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2373\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
2374\>{\sf short int tag;}         \> // tape identification \\
2375\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2376\>{\sf int n;}                 \> // number of independent variables $n$\\
2377\>{\sf int p;}                 \> // number of directions\\
2378\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2379\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
2380\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2381\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x)X$
2382\end{tabbing}                 
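
For example, choosing $p=n$ and seeding $X$ with the identity matrix yields the full Jacobian $F^\prime(x_0)$ in $Y$. The following sketch uses the example dimensions $n=3$ and $m=2$ and sets up the two-dimensional arguments as arrays of row pointers:
\begin{verbatim}
   // assumption: tape 1 holds an active section with n = 3, m = 2
   double x0[3] = {1.0, 2.0, 3.0};
   double X0[3] = {1.0, 0.0, 0.0},     // rows of the identity seed
          X1[3] = {0.0, 1.0, 0.0},
          X2[3] = {0.0, 0.0, 1.0};
   double *X[3] = {X0, X1, X2};        // X is n x p with p = n
   double y0[2];
   double Y0[3], Y1[3];
   double *Y[2] = {Y0, Y1};            // Y is m x p
   int rc = fov_forward(1, 2, 3, 3, x0, X, y0, Y);
   // Y[i][j] now contains the partial derivative of F_i w.r.t. x_j
\end{verbatim}
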
For the computation of higher derivatives, ADOL-C provides the driver
2384\begin{tabbing}
2385\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2386\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
2387\>{\sf short int tag;}         \> // tape identification \\
2388\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2389\>{\sf int n;}                 \> // number of independent variables $n$\\
2390\>{\sf int d;}                 \> // highest derivative degree $d$\\
2391\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2392\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2393\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
2394\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2395\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
2396\end{tabbing}
2397implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2398The rows of the matrix $X$ must correspond to the independent variables in the order of their
2399initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2400$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2401\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2402dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2403The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2404Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2405computes the values
2406$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2407$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2408$0\leq\text{\sf keep}\leq d+1$, with
2409\begin{align*}
2410\text{\sf keep}  \left\{\begin{array}{cl}
2411       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2412       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2413       \end{array}\right.
2414\end{align*}
2415Once more, there is also a vector version given by
2416\begin{tabbing}
2417\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2418\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2419\>{\sf short int tag;}         \> // tape identification \\
2420\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2421\>{\sf int n;}                 \> // number of independent variables $n$\\
2422\>{\sf int d;}                 \> // highest derivative degree $d$\\
2423\>{\sf int p;}                 \> // number of directions $p$\\
2424\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2425\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
2426\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2427\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
2428\end{tabbing}
2429for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
2430$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and 
2431$Y=[Y_1,Y_2,\ldots,Y_d]$.
2432
2433There are also overloaded versions providing a general {\sf forward}-call.
2434Details of the appropriate calling sequences are given in \autoref{forw_rev}.
2435
Once the required information has been generated by a forward mode evaluation
with an appropriate value of the parameter {\sf keep}, one may use the
following implementation variants of the reverse mode. To compute first-order derivatives,
one can use
2440\begin{tabbing}
2441\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2442\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
2443\>{\sf short int tag;}         \> // tape identification \\
2444\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2445\>{\sf int n;}                 \> // number of independent variables $n$\\
2446\>{\sf double u[m];}           \> // weight vector $u$\\
2447\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
2448\end{tabbing}                 
2449as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2450the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2451{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2452  o}rder {\bf v}ector reverse mode driver is given by
2453\begin{tabbing}
2454\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2455\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
2456\>{\sf short int tag;}         \> // tape identification \\
2457\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2458\>{\sf int n;}                 \> // number of independent variables $n$\\
2459\>{\sf int q;}                 \> // number of weight vectors $q$\\
2460\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2461\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
2462\end{tabbing}                 
2463that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2464{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2465ADOL-C provides
2466\begin{tabbing}
2467\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2468\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
2469\>{\sf short int tag;}         \> // tape identification \\
2470\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2471\>{\sf int n;}                 \> // number of independent variables $n$\\
2472\>{\sf int d;}                 \> // highest derivative degree $d$\\
2473\>{\sf double u[m];}           \> // weight vector $u$\\
2474\>{\sf double Z[n][d+1];}      \> // resulting adjoints
2475\end{tabbing}                 
2476as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2477the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2478\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2479{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
2480\begin{tabbing}
2481\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2482\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
2483\>{\sf short int tag;}         \> // tape identification \\
2484\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2485\>{\sf int n;}                 \> // number of independent variables $n$\\
2486\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2488\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
2489\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
2490\end{tabbing}                 
2491as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2492the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2493\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2494{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2495After the function call, the last argument of {\sf hov\_reverse} 
2496contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2497has a value that characterizes the functional relation between the
2498$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2499$x_j$ as:
2500\begin{center}
2501\begin{tabular}{ll}
2502 0 & trivial \\
2503 1 & linear
2504\end{tabular} \hspace*{4ex}
2505\begin{tabular}{ll}
2506 2 & polynomial\\
2507 3 & rational
2508\end{tabular} \hspace*{4ex}
2509\begin{tabular}{ll}
2510 4 & transcendental\\
2511 5 & non-smooth
2512\end{tabular}
2513\end{center}
2514Here, ``trivial'' means that there is no dependence at all and ``linear'' means
2515that the partial derivative is a constant that
does not depend on other variables either. ``Non-smooth'' means that one of
2517the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2518where it is not differentiable.  All positive labels
2519$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2520in fact be simpler, for example due to exact cancellations. 
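
The interplay of the forward and reverse sweeps can be sketched as follows for the first-order scalar drivers; the tape number, the dimensions, the point, the tangent, and the weight vector are example values:
\begin{verbatim}
   // assumption: tape 1 holds an active section with n = 2, m = 2
   double x0[2] = {1.0, 2.0};
   double x1[2] = {1.0, 0.0};          // tangent direction
   double y0[2], y1[2];
   // keep = 1 prepares the value stack for fos_reverse
   fos_forward(1, 2, 2, 1, x0, x1, y0, y1);   // y1 = F'(x0) x1
   double u[2] = {1.0, 0.0};           // weight vector
   double z[2];
   fos_reverse(1, 2, 2, u, z);         // z^T = u^T F'(x0)
\end{verbatim}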
2521
2522There are also overloaded versions providing a general {\sf reverse}-call.
2523Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
2524
2525
2526\input{absdrivers}
2527%
2528%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2529\section{Overloaded Forward and Reverse Calls}
2530\label{forw_rev}
2531%
2532In this section, the several versions of the {\sf forward} and
2533{\sf reverse} routines, which utilize the overloading capabilities
2534of C++, are described in detail. With exception of the bit pattern
2535versions all interfaces are prototyped in the header file
2536\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
2537and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
2538C and Fortran-callable versions prototyped in the same header file.
2539The bit pattern versions of {\sf forward} and {\sf reverse} introduced
in \autoref{ProBit} are prototyped in the header file
2541\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2542file \verb=<adolc/interfaces.h>= automatically.
2543%
2544%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2545\subsection{The Scalar Case}
2546%
2547\label{ScaCas}
2548%     
2549Given any correct tape, one may call from within
2550the generating program, or subsequently during another run, the following
2551procedure:
2552%
2553\begin{tabbing}
2554\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2555\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
2556\>{\sf short int tag;}         \> // tape identification \\
2557\>{\sf int m;}                 \> // number of dependent variables $m$\\
2558\>{\sf int n;}                 \> // number of independent variables $n$\\
2559\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2560\>{\sf  int keep;}             \> // flag for reverse sweep \\ 
2561\>{\sf  double X[n][d+1];}     \> // Taylor coefficients $X$ of
2562                                     independent variables \\
2563\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
2564                                     in \eqref{eq:series}
2565\end{tabbing}
2566%
2567The rows of the matrix $X$ must correspond to the independent variables in the order of their
2568initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2569$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2570\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2571correspond to the
2572dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2573The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2574Taylor coefficient vectors as in \eqref{eq:series}.
2575Thus the first column of $Y$ contains the
2576function value $F(x)$ itself, the next column represents the first
2577Taylor coefficient vector of $F$, and the last column the
2578$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2579how many Taylor coefficients of all intermediate quantities are
2580written into the value stack as explained in \autoref{forw_rev_ad}.
2581 If {\sf keep} is omitted, it defaults to 0.
2582
2583The given {\sf tag} value is used by {\sf forward} to determine the
2584name of the file on which the tape was written. If the tape file does
2585not exist, {\sf forward} assumes that the relevant
2586tape is still in core and reads from the buffers.
2587After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2588{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2589the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
2590identifier {\sf tag}. When $u$ is a vector
and $Z$ an $n\times (d+1)$ matrix,
2592{\sf reverse} is executed in the scalar mode by the calling
2593sequence
2594%
2595\begin{tabbing}
2596\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
2597\>{\sf int reverse(tag,m,n,d,u,Z)}\\
2598\>{\sf short int tag;}         \> // tape identification \\
2599\>{\sf int m;}                 \> // number of dependent variables $m$\\
2600\>{\sf int n;}                 \> // number of independent variables $n$\\
2601\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2602\>{\sf  double u[m];}          \> // weighting vector $u$\\
2603\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$ 
2604\end{tabbing}
2605to compute
2606the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2607\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
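
A compact sketch of this calling sequence, with an example tape number, $n=2$, $m=1$, and $d=2$, could read as follows; the two-dimensional arguments are set up as arrays of row pointers:
\begin{verbatim}
   // assumption: tape 1 holds a scalar-valued active section with
   // n = 2 independent variables; derivative degree d = 2
   double X0[3] = {1.0, 1.0, 0.0};     // x_0, x_1, x_2 of variable 1
   double X1[3] = {2.0, 0.0, 0.0};     // x_0, x_1, x_2 of variable 2
   double *X[2] = {X0, X1};
   double Yrow[3];
   double *Y[1] = {Yrow};
   forward(1, 1, 2, 2, 3, X, Y);       // keep = d + 1 = 3
   double u[1] = {1.0};
   double Z0[3], Z1[3];
   double *Z[2] = {Z0, Z1};
   reverse(1, 1, 2, 2, u, Z);          // Z = [z_0, z_1, z_2]
\end{verbatim}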
2608%
2609%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2610\subsection{The Vector Case}
2611%
2612\label{vecCas}
2613%
When $U$ is a matrix, {\sf reverse} is executed in the vector mode by the following calling sequence
2615%
2616\begin{tabbing}
2617\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
2618\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
2619\>{\sf short int tag;}         \> // tape identification \\
2620\>{\sf int m;}                 \> // number of dependent variables $m$\\
2621\>{\sf int n;}                 \> // number of independent variables $n$\\
2622\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2623\>{\sf int q;}                 \> // number of weight vectors $q$\\
2624\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2625\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
2626\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
2627\end{tabbing}
2628%
2629to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2630\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
When the arguments {\sf q} and {\sf U} are omitted, they default to
2632$m$ and the identity matrix of order $m$, respectively. 
2633
2634Through the optional argument {\sf nz} of {\sf reverse} one can compute
2635information about the sparsity pattern of $Z$ as described in detail
2636in the previous \autoref{forw_rev_ad}.
2637
2638The return values of {\sf reverse} calls can be interpreted according
2639to \autoref{retvalues}, but negative return values are not
2640valid, since the corresponding forward sweep would have
stopped without completing the necessary Taylor file.
2642The return value of {\sf reverse} may be higher
2643than that of the preceding {\sf forward} call because some operations
2644that were evaluated  at a critical argument during the forward sweep
2645were found not to impact the dependents during the reverse sweep.
2646
2647In both scalar and vector mode, the degree $d$ must agree with
2648{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2649equal to zero if {\sf reverse} directly follows the taping of an active
2650section. Otherwise, {\sf reverse} will return control with a suitable error
2651message.
2652In order to avoid possible confusion, the first four arguments must always be
2653present in the calling sequence. However, if $m$ or $d$
2654attain their trivial values 1 and 0, respectively, then
2655corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2656{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2657indirection.  For example, we may call
2658{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2659{\sf double g[n]} 
2660to calculate a gradient of a scalar-valued function.
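
Put together, such a gradient computation could be sketched as follows; the tape number, the dimension, and the evaluation point are placeholders:
\begin{verbatim}
   // assumption: tape 1 holds a scalar-valued function of n = 2
   // independent variables
   int n = 2;
   double x[2] = {3.0, 4.0};
   double f;
   double g[2];
   zos_forward(1, 1, n, 1, x, &f);     // keep = 1 prepares reverse
   reverse(1, 1, n, 0, 1.0, g);        // g now holds the gradient
\end{verbatim}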
2661
2662Sometimes it may be useful to perform a forward sweep for families of
2663Taylor series with the same leading term.
2664This vector version of {\sf forward} can be called in the form
2665%
2666\begin{tabbing}
2667\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2668\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2669\>{\sf short int tag;}         \> // tape identification \\
2670\>{\sf int m;}                 \> // number of dependent variables $m$\\
2671\>{\sf int n;}                 \> // number of independent variables $n$\\
2672\>{\sf int d;}                 \> // highest derivative degree $d$\\
2673\>{\sf int p;}                 \> // number of Taylor series $p$\\
2674\>{\sf  double x0[n];}          \> // values of independent variables $x_0$\\
2675\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
2676\>{\sf double y0[m];}           \> // values of dependent variables $y_0$\\
2677\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
2678\end{tabbing}
2679%
2680where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2681and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2682degree 0. There is no option to keep the values of active variables
2683that are going out of scope or that are overwritten. Therefore this
2684function cannot prepare a subsequent reverse sweep.
2685The return integer serves as a flag to indicate quadratures or altered
2686comparisons as described above in \autoref{reuse_tape}.
2687
2688Since the calculation of Jacobians is probably the most important
2689automatic differentia\-tion task, we have provided a specialization
2690of vector {\sf forward} to the case where $d = 1$. This version can be
2691called in the form
2692%
2693\begin{tabbing}
2694\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2695\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
2696\>{\sf short int tag;}         \> // tape identification \\
2697\>{\sf int m;}                 \> // number of dependent variables $m$\\
2698\>{\sf int n;}                 \> // number of independent variables $n$\\
2699\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
2700\>{\sf double x[n];}          \> // values of independent variables $x_0$\\
2701\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
2702\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
2703\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
2704\end{tabbing}
2705%
2706When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2707the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2708one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2709chosen initialization of $X$. In a workstation environment a value
2710of $p$ somewhere between $10$ and $50$
2711appears to be fairly optimal. For smaller $p$ the interpretive
2712overhead is not appropriately amortized, and for larger $p$ the
2713$p$-fold increase in storage causes too many page faults. Therefore,
2714large Jacobians that cannot be compressed via column coloring
2715as could be done for example using the driver {\sf sparse\_jac}
2716should be ``strip-mined'' in the sense that the above
2717first-order-vector version of {\sf forward} is called
2718repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2719a partition of the identity matrix of order $n$.
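
A hedged sketch of this strip-mining strategy is given below; the arrays {\sf x0}, {\sf y0}, {\sf X}, {\sf Y}, and {\sf J} are assumed to be allocated by the user as arrays of row pointers with the indicated shapes, and $p$ is assumed to divide $n$ for simplicity:
\begin{verbatim}
   // assumption: tape "tag" holds F : R^n -> R^m;
   // X is n x p, Y is m x p, J is m x n
   for (int j0 = 0; j0 < n; j0 += p) {
       for (int i = 0; i < n; i++)        // seed columns j0, ...,
           for (int j = 0; j < p; j++)    // j0+p-1 of the identity
               X[i][j] = (i == j0 + j) ? 1.0 : 0.0;
       forward(tag, m, n, p, x0, X, y0, Y);   // Y = F'(x0) X
       for (int i = 0; i < m; i++)
           for (int j = 0; j < p; j++)
               J[i][j0 + j] = Y[i][j];        // copy current strip
   }
\end{verbatim}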
2720%
2721%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2722\subsection{Dependence Analysis}
2723%
2724\label{ProBit}
2725%
2726The sparsity pattern of Jacobians is often needed to set up data structures
2727for their storage and factorization or to allow their economical evaluation
2728by compression \cite{BeKh96}. Compared to the evaluation of the full
Jacobian $F'(x_0)$ in real arithmetic, computing the Boolean matrix
2730$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2731pattern in the obvious way requires a little less run-time and
2732certainly a lot less memory.
2733
2734The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2735of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2736dependence between the $i$-th independent variable $x_{i}$ and
2737the $j$-th dependent variable $y_{j}$. Just like for real arguments
2738one would wish to compute matrix-vector and vector-matrix products
2739of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2740by appropriate {\sf forward} and {\sf reverse} routines where
2741$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
2742Here, multiplication corresponds to logical
2743{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
2744semi-ring.
2745
2746For practical reasons it is assumed that
2747$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2748$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2749$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2750as bit patterns. Moreover $p$ or $q$ such integer vectors may
2751be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2752and $U\in\N^{q \times m}$, which naturally correspond
2753to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
2754and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
bit pattern versions of {\sf forward} and {\sf reverse} allow
the computation of integer matrices $Y\in\N^{m \times p}$ and
$Z\in\N^{q \times n}$ corresponding to
2758\begin{equation}
2759\label{eq:int_forrev}
2760\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad 
2761\tilde{Z} = \tilde{U}\tilde{P} \, ,
2762\end{equation} 
2763respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
and $\tilde{Z}\in\{0,1\}^{\left(sq\right)\times n}$.
2765In general, the application of the bit pattern versions of
2766{\sf forward} or {\sf reverse} can be interpreted as
2767propagating dependences between variables forward or backward, therefore
2768both the propagated integer matrices and the corresponding
2769Boolean matrices are called {\em dependence structures}.
2770 
2771The bit pattern {\sf forward} routine
2772%
2773\begin{tabbing}
2774\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2775\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
2776\>{\sf short int tag;}              \> // tape identification \\
2777\>{\sf int m;}                      \> // number of dependent variables $m$\\
2778\>{\sf int n;}                      \> // number of independent variables $n$\\
2779\>{\sf int p;}                      \> // number of integers propagated $p$\\
2780\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
2781\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
2782\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
2783\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
2784                                     \eqref{eq:int_forrev}\\
2785\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
2786\end{tabbing}
2787%
2788can be used to obtain the dependence structure $Y$ for a given dependence structure
2789$X$. The dependence structures are
2790represented as arrays of {\sf unsigned long int} the entries of which are
2791interpreted as bit patterns as described above.   
2792For example, for $n=3$ the identity matrix $I_3$ should be passed
2793with $p=1$ as the $3 \times 1$ array
2794\begin{eqnarray*}
2795{\sf X} \; = \;
2796\left( \begin{array}{r}
2797         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2798         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2799         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
2800       \end{array} \right)
2801\end{eqnarray*}
2802in the 4-byte long integer format. The parameter {\sf mode} determines
2803the mode of dependence analysis as explained already in \autoref{sparse}.
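
Programmatically, this identity pattern can be generated with bit shifts. The following sketch, with an example tape number and the dimensions $n=3$ and $m=2$, builds the $3\times 1$ dependence structure displayed above and propagates it in safe mode:
\begin{verbatim}
   // assumption: tape 1 holds F : R^3 -> R^2; the n = 3 independent
   // variables fit into a single pattern word, hence p = 1
   int s = 8 * sizeof(unsigned long int);  // bits per pattern word
   unsigned long int Xrow[3][1], Yrow[2][1];
   unsigned long int *X[3] = {Xrow[0], Xrow[1], Xrow[2]};
   unsigned long int *Y[2] = {Yrow[0], Yrow[1]};
   for (int i = 0; i < 3; i++)
       X[i][0] = 1UL << (s - 1 - i);       // identity, as shown above
   double x[3] = {1.0, 2.0, 3.0}, y[2];
   forward(1, 2, 3, 1, x, X, y, Y, 0);     // mode 0 : safe mode
   // the i-th leftmost bit of Y[j][0] is set if y_j may depend on x_i
\end{verbatim}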
2804
2805A call to the corresponding bit pattern {\sf reverse} routine
2806%
2807\begin{tabbing}
2808\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2809\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
2810\>{\sf short int tag;}         \> // tape identification \\
2811\>{\sf int m;}                 \> // number of dependent variables $m$\\
2812\>{\sf int n;}                 \> // number of independent variables $n$\\
2813\>{\sf int q;}                 \> // number of integers propagated q\\
2814\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
2815\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
2816                                     to \eqref{eq:int_forrev}\\
2817\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
2818\end{tabbing}
2819%
2820yields the dependence structure $Z$ for a given dependence structure
2821$U$.
2822
2823To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2824as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2825with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2826corresponding dependence structure $X$ or $U$ must be defined to represent 
2827the identity matrix of the respective dimension.
Since the number of propagated Boolean vectors is always a multiple of $s$,
there may be superfluous vectors, which can be set to zero.
2830
2831The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2832correspond to those described in \autoref{retvalues}.
2833
2834One can control the storage growth by the factor $p$ using
2835``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
groups of columns or rows, respectively, at a time, i.e.~partitioning
2837$X$ or $U$ appropriately as described for the computation of Jacobians
2838in \autoref{vecCas}.
2839%
2840%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2841%
2842\section{Advanced algorithmic differentiation in ADOL-C}
2843\label{adv_ad}
2844%
2845\subsection{Differentiating parameter dependent functions}
2846%
2847Normally the functions to be differentiated using AD are defined as
2848mappings from $\R^n$ to $\R^m$. The derivatives of the $m$ dependents
are then computed w.r.t. all the $n$ independents. However, sometimes
2850the application requires only derivatives w.r.t. some of the
2851independents and the others only occur in the evaluation procedure as
2852parameters. One important such use case is Lagrange Multipliers in a
2853nonlinear optimization problem. Declaring such variables as
2854\verb=adouble= and marking them as independent results in the
2855evaluation of superfluous derivative information and increases
2856runtime. However, if the parameters are not defined as \verb=adouble=
2857type variables they get hard coded as constants onto the ADOL-C
trace. One may want to change these parameters for some evaluations;
however, the overhead of retracing the function may be very high. In
2860such cases these constants may be specially marked as parameters
2861during tracing using the function
2862\begin{tabbing}
2863\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2864\>{\sf mkparam(d)}\\
2865\>{\sf double d;}         \> // constant value to be marked as parameter \\
2866\end{tabbing}
2867This stores the constant in a different place on the trace than the
2868other unmarked constants and enables the user to replace a whole
2869vector of parameters with a new one before computing derivatives later
2870on. All such marked parameter constants can be used in any evaluation
2871expression where the original unmarked constants were used.
2872
2873Sometimes the same parameter may be needed in various different
2874expressions. During trace creation the marked parameters are given a
running index as they are marked. This index may be saved by the user
2876and used to access the same stored value in different expressions.
2877
2878To save the index while marking a parameter use
2879\begin{tabbing}
2880\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2881\>{\sf locint mkparam\_idx(d)}\\
2882\>{\sf double d;} \> // constant value to be marked
2883\end{tabbing}
2884The resulting \verb=locint= can be saved in a variable by the user and
later used to access the saved parameter in the trace in any expression.
2886
2887To access a saved parameter inside any evaluation expression use
2888\begin{tabbing}
2889\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2890\>{\sf getparam(idx)}\\
2891\>{\sf locint idx;} \> // index of the parameter as returned by
2892mkparam\_idx()\\
2893\end{tabbing}
2894
2895The above three functions work only during the creation of the trace,
i.e., between the invocation of \verb=trace_on()= and \verb=trace_off()=.
2897
2898For all following derivative computations using the drivers described
2899in the preceding sections, the saved parameter values will be used,
2900unless the user asks to replace them with a different vector of
2901values.
2902
2903To replace the whole parameter vector one uses the routine
2904\begin{tabbing}
2905\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2906\>{\sf set\_param\_vec(tag, number, parameters)}\\
2907\>{\sf short int tag;}   \> // tape identification\\
2908\>{\sf size\_t number;}  \> // total number of constants marked as
2909parameters\\
2910\>{\sf double* parameters;} \> // array of constants to be used as replacement\\
2911\end{tabbing}
2912All following derivative computations using the drivers described in
2913the preceding sections will now use the new parameters.
2914
After replacing the parameters, however, one cannot directly call a basic
reverse mode driver; a basic forward mode driver must be called first.
The easy-to-use drivers from Section \ref{drivers} do this automatically.
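
The following sketch, in which all values, names, and the tape number are placeholders, marks two constants as parameters while taping a small function and later replaces them before recomputing derivatives with an easy-to-use driver such as {\sf gradient}:
\begin{verbatim}
   locint idx_a, idx_b;
   trace_on(1);
     adouble x, y;
     double f;
     x <<= 2.0;                         // independent variable
     idx_a = mkparam_idx(3.0);          // parameter a
     idx_b = mkparam_idx(5.0);          // parameter b
     y = getparam(idx_a)*x + getparam(idx_b);
     y >>= f;
   trace_off();

   double params[2] = {4.0, 6.0};       // new values for a and b
   set_param_vec(1, 2, params);         // replace parameter vector
   double x0[1] = {2.0}, g[1];
   gradient(1, 1, x0, g);               // g[0] equals the new value of a
\end{verbatim}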
2918%
2919\subsection{Differentiating external functions}
2920%
2921Ideally, AD is applied to a given computation as a whole.
2922This is not always possible because parts of the computation may
be coded in a different programming language or may be a call to
2924an external library.
2925In the former case one may want to differentiate the parts in
2926question with a different AD tool or provide hand written derivatives.
2927In practice, however, sophisticated projects usually evolve over a long period of time.
2928Within this process, a heterogeneous code base for the project
2929develops, which may include the incorporation of external solutions,
2930changes in programming paradigms or even of programming languages.
The computation of derivative values then becomes equally heterogeneous.
2932Hence, different \mbox{AD-tools} may be combined with hand-derived
2933codes based on the same or different programming languages.
2934ADOL-C supports such settings by the concept of externally
differentiated functions, that is, functions
2936not differentiated by ADOL-C itself. The required derivatives
2937have to be provided by the user.
2938
2939For this purpose, it is required that the externally differentiated
function (for example named {\sf\em euler\_step}) has the following signature
2941\smallskip
2942
2943\noindent
2944\hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);} .
2945\medskip
2946
2947\noindent
Note that the formal parameters in the signature have {\sf double} type, that is,
2949they are not active as in the original program before the ADOL-C type change.
2950The externally differentiated function has to
2951be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows
2952\smallskip
2953
2954\noindent
2955\hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);} .
2956\smallskip
2957
2958\noindent
2959This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function.
2960Then, the user has to supply the function pointers for the callback methods (for example here
{\sf zos\_for\_euler\_step} and {\sf fos\_rev\_euler\_step}) that the user implemented
2962to compute the derivatives as follows:
2963\begin{tabbing}
2964\hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\
2965             \> {\sf // function pointer for computing
2966               Zero-Order-Scalar (=zos)}\\
2967             \> {\sf // forward information}\\
2968%             \> {\sf edf-$>$dp\_x = xp;}\\
2969%             \> {\sf edf-$>$dp\_y = yp;}\\
2970%             \> {\sf // double arrays for arguments and results}\\
2971             \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\
2972             \> {\sf // function pointer for computing
2973               First-Order-Scalar (=fos)}\\ 
             \> {\sf // reverse information}
2975\end{tabbing}
2976To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y}
one has to provide (allocate) both variants. That is, if the call to {\sf euler\_step} was originally
2978\smallskip
2979
2980\noindent
2981\hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);}
2982\medskip
2983
2984\noindent
then the ADOL-C type change for the calling context will turn {\sf sOld} and {\sf sNew} into {\sf adouble} pointers.
2986To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace)
2987the original call to the externally differentiated function
2988must be substituted by
2989\smallskip
2990
2991\noindent
2992\hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);} .
2993\medskip
2994
2995\noindent
2996Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively) 
to the active arguments {\sf sOld} and {\sf sNew} of type {\sf adouble}.
2998The usage of the external function facility is illustrated by the
2999example \verb=ext_diff_func= contained in
3000\verb=examples/additional_examples/ext_diff_func=.
There, the externally differentiated function is also C code, but the
handling as an externally differentiated function decreases the
overall required tape size.
3004
3005%
3006\subsection{Advanced algorithmic differentiation of time integration processes}
3007%
3008For many time-dependent applications, the corresponding simulations
3009are based on ordinary or partial differential equations.
Furthermore, there are frequently quantities that influence the
result of the simulation and can be seen as a control of the system.
To compute an approximation of the
simulated process for a time interval $[0,T]$ and to evaluate the
desired target function, one applies an
3015appropriate integration scheme given by
3016\begin{tabbing}
3017\hspace{5mm} \= some initializations yielding $x_0$\\
3018\> for $i=0,\ldots, N-1$\\
3019\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
3020\hspace{5mm} \= evaluation of the target function
3021\end{tabbing}
3022where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
3023time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
3024$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$
3025defines the time step to compute the state at time $t_i$. Note that we
3026do not assume a uniform grid.
3027
3028When computing derivatives of the target function with respect to the
3029control, the consequences for the tape generation using the ``basic''
3030taping approach as implemented in ADOL-C so far are shown in the left part of
3031\autoref{fig:bas_tap}.
3032\begin{figure}[htbp] 
3033\begin{center}
3034\hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill
3035\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
3036\end{center}
3037\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
3038\caption{Different taping approaches}
3039\label{fig:bas_tap}
3040\end{figure} 
3041As can be seen, the iterative process is completely
3042unrolled due to the taping process. That is, the tape contains an internal representation of each
3043time step. Hence, the overall tape comprises a serious amount of redundant
3044information as illustrated by the light grey rectangles in
\autoref{fig:bas_tap}.
3046
3047To overcome the repeated storage of essentially the same information,
3048a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
3049the right-hand side of \autoref{fig:bas_tap}. This new
3050capability allows the encapsulation of the time-stepping procedure
3051such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one
3052representative of the time steps in addition to a function pointer to the
3053evaluation procedure $F$ of the time steps.  The function pointer has
3054to be stored for a possibly necessary retaping during the derivative calculation
3055as explained below.
3056
3057Instead of storing the complete tape, only a very limited number of intermediate
3058states are kept in memory. They serve as checkpoints, such that
3059the required information for the backward integration is generated
3060piecewise during the adjoint calculation.
3061For this modified adjoint computation the optimal checkpointing schedules
3062provided by {\bf revolve} are employed. An adapted version of the
3063software package {\sf revolve} is part of ADOL-C and automatically
3064integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
3065distributed such that computational effort is minimized for the given
3066number of checkpoints and time steps $N$. It is important to note that the overall tape
3067size is drastically reduced due to the advanced taping strategy.  For the
3068implementation of this nested taping we introduced
3069a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
3070handle different internal function representations during the taping
3071procedure and the derivative calculation. This approach allows the generation of a new
3072tape inside the overall tape, where the coupling of the different tapes is based on
3073the {\em external differentiated function} described above.
3074
3075Written under the objective of minimal user effort, the checkpointing routines
3076of \mbox{ADOL-C} need only very limited information. The user must
provide two routines as implementations of the time-stepping function $F$
3078with the signatures
3079\medskip
3080
3081\noindent
3082\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
3083\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
3084\medskip
3085
3086\noindent
3087where the function names can be chosen by the user as long as the names are
unique. It is possible that the result vector of one time step
3089iteration overwrites the argument vector of the same time step. Then, no
3090copy operations are required to prepare the next time step.
3091
3092At first, the {\sf adouble} version of the time step function has to
3093be {\em registered} using the \mbox{ADOL-C} function
3094\medskip
3095
3096\noindent
3097\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
3098\medskip
3099
3100\noindent
3101This function initializes the structure {\sf cpc}. Then,
3102the user has to provide the remaining checkpointing information
3103by the following commands:
3104\begin{tabbing}
3105\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
             \> {\sf // double variant of the time step function}\\
3107             \> {\sf cpc.setNumberOfSteps(N);}\\
3108             \> {\sf // number of time steps to perform}\\
3109             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
             \> {\sf // number of checkpoints} \\
3111             \> {\sf cpc.setDimensionXY(n);}\\
3112             \> {\sf // dimension of input/output}\\
3113             \> {\sf cpc.setInput(y);}\\
3114             \> {\sf // input vector} \\
3115             \> {\sf cpc.setOutput(y);}\\
3116             \> {\sf // output vector }\\
3117             \> {\sf cpc.setTapeNumber(tag\_check);}\\
3118             \> {\sf // subtape number for checkpointing} \\
3119             \> {\sf cpc.setAlwaysRetaping(false);}\\
3120             \> {\sf // always retape or not ?}
3121\end{tabbing}
3122Subsequently, the time loop in the function evaluation can be
3123substituted by a call of the function
3124\medskip
3125
3126\noindent
3127\hspace*{2cm}{\sf int cpc.checkpointing();}
3128\medskip
3129
3130\noindent
3131Then, ADOL-C computes derivative information using the optimal checkpointing
3132strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
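
Putting these pieces together inside the active section, the registration and the replaced time loop might be sketched as follows; the names {\sf time\_step\_function}, {\sf y}, {\sf n}, {\sf N}, and {\sf tag\_check} are the placeholders used in the description above:
\begin{verbatim}
   // assumption: an adouble and a double version of the time step
   // function exist as described above; y is the adouble state
   // vector of length n, N is the number of time steps
   CP_Context cpc(time_step_function);   // register adouble version
   cpc.setDoubleFct(time_step_function); // register double version
   cpc.setNumberOfSteps(N);
   cpc.setNumberOfCheckpoints(10);
   cpc.setDimensionXY(n);                // dimension of input/output
   cpc.setInput(y);
   cpc.setOutput(y);
   cpc.setTapeNumber(tag_check);         // subtape used for checkpointing
   cpc.setAlwaysRetaping(false);
   int rc = cpc.checkpointing();         // replaces the former time loop
\end{verbatim}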
3133
3134The presented driver is prototyped in the header file
3135\verb=<adolc/checkpointing.h>=. This header
3136is included by the global header file \verb=<adolc/adolc.h>= automatically.
3137An example program \verb=checkpointing.cpp= illustrates the
3138checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
3139%
3140%
3141%
3142\subsection{Advanced algorithmic differentiation of fixed point iterations}
3143%
3144Quite often, the state of the considered system denoted by $x\in\R^n$
depends on some design parameters denoted by $u\in\R^m$. One example of this setting
is the flow over an aircraft wing. Here, the shape of the wing that
3147is defined by the design vector $u$ 
3148determines the flow field $x$. The desired quasi-steady state $x_*$
3149fulfills the fixed point equation
3150\begin{align}
3151  \label{eq:fixedpoint}
3152  x_* = F(x_*,u)
3153\end{align}
3154for a given continuously differentiable function
3155$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
3156also exploited by many other applications.
3157
3158Assume that one can apply the iteration 
3159\begin{align}
3160\label{eq:iteration}
3161 x_{k+1} = F(x_k,u)
3162\end{align}
to obtain a linearly convergent sequence $\{x_k\}$ generated
for any given control $u\in\R^m$. Then the limit point $x_*\in\R^n$ fulfills the fixed
point equation~\eqref{eq:fixedpoint}. Moreover,
3166suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
3167$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
3168Hence, there exists a
3169differentiable function $\phi:\R^m \rightarrow \R^n$,
3170such that $\phi(u) = F(\phi(u),u)$, where the state
$\phi(u)$ is a fixed point of $F$ for a given control
3172$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
3173the design vector $u$, derivatives of $\phi$ with respect
3174to $u$ are of particular interest.
3175
To exploit the advanced algorithmic differentiation of such fixed point iterations,
ADOL-C provides the special function {\tt fp\_iteration(...)},
which has the following interface:
3179\begin{tabbing}
3180\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
3181\>{\sf int
3182  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
3183\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
3184\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
3185\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
\>{\sf int *double\_F;}         \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\
\>{\sf int *adouble\_F;}        \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf adouble} arguments\\
\>{\sf int *norm;}              \> // pointer to a function that computes\\
\>                              \> // the norm of a vector\\
\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
\>                              \> // the norm of a vector (used for the adjoint iteration)\\
\>{\sf double eps;}             \> // termination criterion for the fixed point iteration\\
\>{\sf double eps\_deriv;}      \> // termination criterion for the adjoint fixed point iteration\\
\>{\sf int N\_max;}             \> // maximal number of iterations for the state computation\\
\>{\sf int N\_max\_deriv;}      \> // maximal number of iterations for the adjoint computation\\
\>{\sf adouble *x\_0;}          \> // initial state of the fixed point iteration\\
\>{\sf adouble *u;}             \> // value of $u$\\
\>{\sf adouble *x\_fix;}        \> // final state of the fixed point iteration\\
3201\>{\sf int dim\_x;}             \> // dimension of $x$\\
3202\>{\sf int dim\_u;}             \> // dimension of $u$\\
3203\end{tabbing}
3204%
3205Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
3206should be used for the fixed point iteration.
{\tt double\_F} and {\tt adouble\_F} are pointers to functions that compute
a single iteration step $y=F(x,u)$ for given $x$ and $u$. Thereby
3209{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
3210uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
3211{\tt norm\_deriv} are pointers to functions computing the norm
3212of a vector. The latter functions together with {\tt eps},
3213{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
3214the iterations. Thus the following loops are performed:
3215\begin{center}
3216\begin{tabular}{ll}
3217  do                     &   do                           \\
3218  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
3219  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
3220  ~~~~$y = F(x,u)$       &   ~~~
3221  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
3222  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
3223  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
3224  & and $k\leq N_{max,deriv}$
3225\end{tabular}
3226\end{center}
The vectors for the initial iterate and the control are stored
in {\tt x\_0} and {\tt u}, respectively. The vector in which the
3229fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
3230and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
3231corresponding vectors.
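
As an illustration, a hedged sketch of possible user-supplied building blocks
is given below. The iteration step shown (a simple contraction $y = 0.5\,x + u$,
assuming {\tt dim\_x} = {\tt dim\_u}) and the Euclidean norm are purely
hypothetical choices; the exact prototypes expected for the function pointers
should be taken from the example in
\verb=examples/additional_examples/fixpoint_exam=.
\begin{tabbing}
\hspace*{1cm} \= \kill
\> {\sf int iter\_step(double *x, double *u, double *y, int dim\_x, int dim\_u) \{}\\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$dim\_x; i++)}\\
\> {\sf \rule{1cm}{0pt}y[i] = 0.5*x[i] + u[i];\hspace*{1.2cm}// one contraction step y = F(x,u)}\\
\> {\sf \rule{0.5cm}{0pt}return 0;}\\
\> {\sf \}}\\
\> {\sf int iter\_step\_act(adouble *x, adouble *u, adouble *y, int dim\_x, int dim\_u) \{}\\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$dim\_x; i++)}\\
\> {\sf \rule{1cm}{0pt}y[i] = 0.5*x[i] + u[i];\hspace*{1.2cm}// identical step for adouble arguments}\\
\> {\sf \rule{0.5cm}{0pt}return 0;}\\
\> {\sf \}}\\
\> {\sf double norm2(double *x, int n) \{\hspace*{1.3cm}// used as norm and norm\_deriv}\\
\> {\sf \rule{0.5cm}{0pt}double r = 0;}\\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$n; i++) r += x[i]*x[i];}\\
\> {\sf \rule{0.5cm}{0pt}return sqrt(r);}\\
\> {\sf \}}
\end{tabbing}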
3232
3233The presented driver is prototyped in the header file
3234\verb=<adolc/fixpoint.h>=. This header
3235is included by the global header file \verb=<adolc/adolc.h>= automatically.
An example code that also shows the
expected signatures of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
3238%
3239\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
3240%
ADOL-C allows the computation of derivatives in parallel for functions
3242containing OpenMP parallel loops.
3243This implies that an explicit loop-handling approach is applied. A
3244typical situation is shown in \autoref{fig:basic_layout},
3245\begin{figure}[hbt]
3246    \vspace{3ex}
3247    \begin{center}
3248        \includegraphics[height=4cm]{multiplexed} \\
3249        \begin{picture}(0,0)
3250            \put(-48,40){\vdots}
3251            \put(48,40){\vdots}
3252            \put(-48,80){\vdots}
3253            \put(48,80){\vdots}
3254            \put(-83,132){function eval.}
3255            \put(5,132){derivative calcul.}
3256        \end{picture}
3257    \end{center}
3258    \vspace{-5ex}
    \caption{Basic layout of the mixed serial/parallel function evaluation and the corresponding derivative calculation}
3260    \label{fig:basic_layout}
3261\end{figure}
3262where the OpenMP-parallel loop is preceded by a serial startup
3263calculation and followed by a serial finalization phase.
3264
3265Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
3266Two macros are available that only differ in the way the global tape information is handled.
3267Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly, the {\tt ADOLC\_OPENMP\_NC} macro can be used.
3269Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
3270Due to the inserted macro, the OpenMP statement has the following structure:
3271\begin{tabbing}
3272\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
3273              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
3274\end{tabbing}
3275Inside the parallel region, separate tapes may then be created.
3276Each single thread works in its own dedicated AD-environment, and all
3277serial facilities of \mbox{ADOL-C} are applicable as usual. The global
3278derivatives can be computed using the tapes created in the serial and
3279parallel parts of the function evaluation, where user interaction is
3280required for the correct derivative concatenation of the various tapes.
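
The following sketch indicates the intended structure only; it assumes that
OpenMP support has been configured so that the {\tt ADOLC\_OPENMP} macro is
available through the ADOL-C headers, and the loop body as well as the choice
of tape numbers based on {\sf omp\_get\_thread\_num()} are placeholders.
\begin{tabbing}
\hspace*{1cm} \= \kill
\> {\sf \#include $<$omp.h$>$}\\
\> {\sf \#include $<$adolc/adolc.h$>$}\\
\> {\sf ...}\\
\> {\sf \#pragma omp parallel ADOLC\_OPENMP}\\
\> {\sf \{}\\
\> {\sf \rule{0.5cm}{0pt}short tag = 10 + omp\_get\_thread\_num();\hspace*{0.8cm}// one tape per thread}\\
\> {\sf \rule{0.5cm}{0pt}trace\_on(tag);}\\
\> {\sf \rule{0.5cm}{0pt}// ... active section for this thread's share of the work ...}\\
\> {\sf \rule{0.5cm}{0pt}trace\_off();}\\
\> {\sf \}}
\end{tabbing}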
3281
3282For the usage of the parallel facilities, the \verb=configure=-command
3283has to be used with the option \verb?--with-openmp-flag=FLAG?, where
3284\verb=FLAG= stands for the system dependent OpenMP flag.
3285The parallel differentiation of a parallel program is illustrated
3286by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
3287%
3288%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3289%
3290\section{Traceless forward differentiation in ADOL-C}
3291\label{traceless}
3292%
3293Up to version 1.9.0, the development of the ADOL-C software package
3294was based on the decision to store all data necessary for derivative
3295computation on traces (tapes), where large applications require the traces to be
3296written out to corresponding files. In almost all cases this means
3297a considerable drawback in terms of run time due to the excessive
memory accesses. Using these traces enables ADOL-C to offer a broad range
of functionality. However, not every task of derivative
computation requires a trace.
3301
3302Starting with version 1.10.0, ADOL-C now features a traceless forward
3303mode for computing first order derivatives in scalar mode, i.e.,
3304$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
3305This traceless variant coexists with the more universal
3306trace based mode in the package. The following subsections describe
3307the source code modifications required to use the traceless forward mode of
3308ADOL-C. 
3309%
3310\subsection{Modifying the Source Code}
3311%
3312Let us consider the coordinate transformation from Cartesian to spherical
3313polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3314= F(x)$, with
3315\begin{eqnarray*}
3316y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3317y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3318y_3  =  \arctan(x_2/x_1),
3319\end{eqnarray*}
3320as an example. The corresponding source code is shown in \autoref{fig:traceless}.
3321\begin{figure}[htb]
3322\framebox[\textwidth]{\parbox{\textwidth}{
3323%\begin{center}
3324%\begin{flushleft}
3325\begin{tabbing}
3326\= \kill
3327\> {\sf \#include} {\sf $<$iostream$>$}\\
3328\> {\sf using namespace std;}\\
3329\> \\
3330\> {\sf int main() \{}\\
3331\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
3332\> \\
3333\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3334\> {\sf \rule{1cm}{0pt}...}\\
3335\> \\
3336\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3337\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3338\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3339\> \\
3340\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
3341\> \\
3342\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3343\> \}
3344\end{tabbing}
3345%\end{flushleft}
3346%\end{center}
3347}}
3348\caption{Example for traceless forward mode}
3349\label{fig:traceless}
3350\end{figure}
3351%
3352Changes to the source code that are necessary for applying the
3353traceless forward mode of ADOL-C are described in the following two
3354subsections, where the vector mode version is described
as an extension of the scalar mode.
3356%
3357\subsubsection*{The scalar mode}
3358%
3359To use the traceless forward mode, one has to include only
3360the header file \verb#adtl.h# since it does not include the
3361trace based functions defined in other header files.
3362
3363As in the trace based forward version of ADOL-C all derivative
3364calculations are introduced by calls to overloaded
3365operators. Therefore, similar to the trace-based version all
3366independent, intermediate and dependent variables must be declared
with type {\sf adouble}. For run-time reasons, the whole traceless functionality
provided by \verb#adtl.h# is written as code intended to be completely inlined,
where the actual portion of inlined code can be influenced by compiler
switches. Typically, the whole derivative code is inlined by default. Our
experiments with the traceless mode have produced completely inlined code
when using standard (optimization) switches of the GNU and Intel C++
compilers.
3375
To avoid name conflicts
resulting from the inlining, the traceless version has its own namespace
\verb#adtl#. As a result, four possibilities of using the {\sf adouble}
type are available for the traceless version:
3380\begin{itemize}
3381\item Defining a new type
3382      \begin{center}
3383        \begin{tabular}{l}
3384          {\sf typedef adtl::adouble adouble;}\\
3385          ...\\
3386          {\sf adouble tmp;}
3387        \end{tabular}
3388      \end{center}
      This is the preferred way. Remember, you cannot define your own
      {\sf adouble} type/class with a different meaning after this typedef.
3391\item Declaring with namespace prefix
3392      \begin{center}
3393        \begin{tabular}{l}
3394          {\sf adtl::adouble tmp;}
3395        \end{tabular}
3396      \end{center}
      Not the most convenient way in terms of coding effort,
      but without any doubt one of the safest ways. The identifier
      {\sf adouble} is still available for user types/classes.
3400\item Trusting macros
3401      \begin{center}
3402        \begin{tabular}{l}
3403          {\sf \#define adouble adtl::adouble}\\
3404          ...\\
3405          {\sf adouble tmp;}
3406        \end{tabular}
3407      \end{center}
      This approach should be used with care, since preprocessor defines are plain text replacements.
3409  \item Using the complete namespace
3410        \begin{center}
3411          \begin{tabular}{l}
3412            {\sf using namespace adtl;}\\
3413            ...\\
3414            {\sf adouble tmp;}
3415          \end{tabular}
3416        \end{center}
        A very convenient approach with the disadvantage of importing all names of the namespace, so that name conflicts may arise!
3418\end{itemize}
After defining the variables, only two things are left to do. First,
3420one needs to initialize the values of the independent variables for the
3421function evaluation. This can be done by assigning the variables a {\sf
3422double} value. The {\sf ad}-value is set to zero in this case.
3423Additionally, the traceless forward mode variant of ADOL-C
3424offers a function named {\sf setValue} for setting the value without
3425changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
3426variables ADOL-C offers two possibilities:
3427\begin{itemize}
3428  \item Using the constructor
3429        \begin{center}
3430          \begin{tabular}{l}
3431        {\sf double seedOne = 1., seedZero = 0.;}\\
3432            {\sf adouble x1(2,\&seedOne), x2(4,\&seedZero), y;}
3433          \end{tabular}
3434        \end{center}
        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the last one
        remains uninitialized. In terms of function evaluation
3437        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3438        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3439   \item Setting point values directly
3440         \begin{center}
3441           \begin{tabular}{l}
3442         {\sf double seedOne = 1., seedZero = 0.;}\\
3443             {\sf adouble x1=2, x2=4, y;}\\
3444             ...\\
3445             {\sf x1.setADValue(\&seedOne);}\\
3446             {\sf x2.setADValue(\&seedZero);}
3447           \end{tabular}
3448         \end{center}
3449         The same example as above but now using {\sf setADValue}-method for initializing the derivative values. It is important to note that in both cases the derivative values have to be set by passing an address of a variable or a pointer of the {\sf double} type (instead of passing the value directly). The reason for doing this will become clear in the next subsection.
3450\end{itemize}
3451%
The derivatives can be obtained at any time during the evaluation
process by calling the {\sf getADValue}-method, which returns a pointer (instead of a value); the result can therefore be assigned to another pointer or dereferenced immediately in order to read the value:
3454\begin{center}
3455  \begin{tabular}{l}
3456    {\sf adouble y;}\\
3457    ...\\
3458    {\sf cout $<<$ *(y.getADValue());}
3459  \end{tabular}
3460\end{center}
3461\autoref{fig:modcode} shows the resulting source code incorporating
3462all required changes for the example
3463given above.
3464
3465\begin{figure}[htb]
3466\framebox[\textwidth]{\parbox{\textwidth}{
3467%\begin{center}
3468\begin{tabbing}
3469\hspace*{-1cm} \= \kill
3470\> {\sf \#include $<$iostream$>$}\\
3471\> {\sf using namespace std;}\\
3472\> \\
3473\> {\sf \#include $<$adolc/adtl.h$>$}\\
3474\> {\sf typedef adtl::adouble adouble;}\\
3475\\
3476\> {\sf int main() \{}\\
3477\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3478\\
3479\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3480\> {\sf \rule{1cm}{0pt}...}\\
3481\\
3482\> {\sf \rule{0.5cm}{0pt}double seed = 1.;}\\
3483\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(\&seed);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3484\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3485\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3486\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3487\\
3488\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3489\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ *(y[1].getADValue()) $<<$ endl;}\\
3490\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3491\> {\sf \}}
3492\end{tabbing}
3493%\end{center}
3494}}
3495\caption{Example for traceless scalar forward mode}
3496\label{fig:modcode}
3497\end{figure}
3498%
3499\subsubsection*{The vector mode}
3500%
3501In scalar mode only one direction element has to be stored per {\sf
  adouble} whereas an array of $p$ elements is needed in the vector
  mode to cover the computations for the given $p$ directions. The
3504  resulting changes to the source code are described in this section.
3505
3506In order to use the vector mode one has to call
3507\verb#adtl::setNumDir(N)#.  This function takes the maximal number of
3508directions to be used within the resulting vector mode. Changing the
3509number of directions in the middle of executing code containing active
3510variables is undesirable and can lead to segmentation violations,
3511hence such an operation will result in an
3512error. \verb#adtl::setNumDir(N)# must therefore be called before
3513declaring any \verb#adtl::adouble# objects.
3514
3515Setting and getting the derivative values is done in the same manner as in the scalar case, by passing and retrieving the pointers, as illustrated in the following example:
3516\begin{center}
3517  \begin{tabular}{l}
3518    {\sf adtl::setNumDir(10);}\\
3519    {\sf adouble x, y;}\\
3520    {\sf double *ptr1 = new double[10];}\\
3521    {\sf double *ptr2;}\\
3522      ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(ptr1);}\\
3525    ...\\
3526    {\sf ptr2=y.getADValue();}
3527  \end{tabular}
3528\end{center}
Now using pointers in the methods {\sf setADValue} and {\sf getADValue} makes more sense, since in vector mode each {\sf adouble} carries an array of $p$ elements holding its derivative values.
3530Additionally, the traceless vector forward mode of ADOL-C offers two
new methods for setting/getting the derivative values. Setting a single derivative value of the desired vector element is possible by passing its position in the vector ({\sf integer}) together with a {\sf double} value. Getting a single derivative value of the desired vector element requires only its position in the vector as argument. Note that no pointers are used in these cases, as illustrated in the following example:
3532\begin{center}
3533  \begin{tabular}{l}
3534    {\sf adtl::setNumDir(10);}\\
3535    ...\\
3536    {\sf adouble x, y;}\\
3537    ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(5,1);\hspace*{3.7cm}// set the 6th derivative value of x to 1.0}\\
3540      ...\\
3541    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3542  \end{tabular}
3543\end{center}
3544The resulting source code containing all changes that are required is
shown in \autoref{fig:modcode2}.
3546
3547\begin{figure}[!h!t!b]
3548\framebox[\textwidth]{\parbox{\textwidth}{
3549\begin{tabbing}
3550\hspace*{-1cm} \= \kill
3551\> {\sf \#include $<$iostream$>$}\\
3552\> {\sf  using namespace std;}\\
3553\\
3554\> {\sf \#include $<$adolc/adtl.h$>$}\\
3555\> {\sf typedef adtl::adouble adouble;}\\
3556\\
3557\> {\sf int main() \{}\\
3558\> {\sf \rule{0.5cm}{0pt}adtl::setNumDir(3);}\\
3559\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3560\\
3561\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3562\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3563\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3564\> {\sf \rule{0.5cm}{0pt}\}}\\
3565\\
3566\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3567\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3568\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3569\\
3570\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3571\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3572\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3573\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3574\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3575\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3576\> {\sf \rule{0.5cm}{0pt}\}}\\
3577\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3578\> {\sf \}}
3579\end{tabbing}
3580}}
3581\caption{Example for traceless vector forward mode}
3582\label{fig:modcode2}
3583\end{figure}
3584%
3585\subsection{Compiling and Linking the Source Code}
3586%
3587After incorporating the required changes, one has to compile the
3588source code and link the object files to get the executable.
As long as the ADOL-C header files are not referenced by their absolute path,
the compile sequence should be similar to the following example:
3591\begin{center}
3592  \begin{tabular}{l}
3593    {\sf g++ -I/home/username/adolc\_base/include -c traceless\_scalar.cpp}
3594  \end{tabular}
3595\end{center}
3596The \verb#-I# option tells the compiler where to search for the ADOL-C
3597header files. This option can be omitted when the headers are included
3598with absolute path or if ADOL-C is installed in a ``global'' directory.
3599
Although the traceless mode was originally designed to be completely inlined, certain global
variables need to be initialized in the compiled part of the library,
and therefore the ADOL-C library must be linked with the compiled code.
3603The example started above could be finished with the
3604following command:
3605\begin{center}
3606  \begin{tabular}{l}
3607    {\sf g++ -o traceless\_scalar traceless\_scalar.o}\\
3608    {\sf -Wl,-{}-rpath -Wl,/home/username/adolc\_base/lib -L/home/username/adolc\_base/lib -ladolc}
3609  \end{tabular}
3610\end{center}
The mentioned source codes {\sf traceless\_scalar.cpp} and {\sf traceless\_vector.cpp}
illustrating the use of the traceless scalar and vector mode can be found in
the directory {\sf examples}.
3614%
3615\subsection{Concluding Remarks for the Traceless Forward Mode Variant}
3616%
Like many other AD approaches, the traceless forward mode provided by the
ADOL-C package has its own strengths and drawbacks. Please read the
following list carefully to become familiar with its characteristics:
3621\begin{itemize}
3622  \item Advantages:
3623    \begin{itemize}
3624      \item Code speed\\
3625        Increasing computation speed was one of the main aspects in writing
3626        the traceless code. In many cases higher performance can be
        expected this way. However, this is not guaranteed for repeated
        evaluations.
3629      \item Smaller overall memory requirements\\
        Traceless ADOL-C does not write traces anymore, as the name
        implies. Loop ``unrolling'' can be avoided this
        way. Considering main memory plus disk space as the overall memory
        requirement, the traceless version can be
        executed in a more efficient way.
3635    \end{itemize}
3636  \item Drawbacks:
3637    \begin{itemize}
3638    \item Main memory limitations\\
      The ability to compute derivatives of a given function is
      bounded by the size of main memory plus swap when using
      traceless ADOL-C. Computation from swap should be avoided
      as far as possible anyway, since it slows down the computation
      drastically. Therefore, if the program execution is
      terminated without an error message, insufficient memory can be
      one possible reason. The memory requirements $M$ can
      be estimated roughly as follows:
3647      \begin{itemize}
3648        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3649        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3650          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3651      \end{itemize}
      where the storage size of all non-{\sf adouble} based variables is denoted by $M_p$; a small worked example is given after this list.
3653    \item Compile time\\
      As discussed in the previous sections, the traceless forward mode of
      the ADOL-C package is implemented as code intended to be inlined. Using
      this approach results in a larger effective source code size, since every
3657      operation involving at least one {\sf adouble} stands for the
3658      operation itself as well as for the corresponding derivative
3659      code after the inlining process. Therefore, the compilation time
3660      needed for the traceless version may be higher than that of the trace based code.
3661    \item Code Size\\
3662      A second drawback and result of the code inlining is the
3663      increase of code sizes for the binaries. The increase
3664      factor compared to the corresponding trace based program is
3665      difficult to quantify as it is task dependent. Practical results
3666      have shown that values between 1.0 and 2.0 can be
3667      expected. Factors higher than 2.0 are possible too and even
3668      values below 1.0 have been observed.
3669    \end{itemize}
3670\end{itemize}
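
As a small worked example for the rough memory estimate above (assuming
8-byte {\sf double}s and purely for orientation): a vector mode computation
involving $10^7$ {\sf adouble}s with {\sf NUMBER\_DIRECTIONS}$\,=10$ requires
about $10^7\cdot(10+1)=1.1\cdot 10^8$ {\sf double} entries, i.e., roughly
0.9~GB of main memory, plus $M_p$.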
3671
3672
3673\section{Traceless forward differentiation in ADOL-C using Cuda}
3674\label{tracelessCuda}
3675%
One major drawback of using the traceless version of ADOL-C is the fact
that several function evaluations are needed to compute derivatives
at many different points. More precisely, to calculate the Jacobian of
a function $F:\mathbb{R}^n\rightarrow \mathbb{R}^m$ at $M$ points, $M$
function evaluations are needed for the traceless vector mode and even
$M*n$ for the traceless scalar mode. Depending on the size of the function
this can result in a long run time. To achieve better performance one can
use parallelization techniques, since the same operations are
performed during every function evaluation. One possibility is to use GPUs
3685since they are optimized for data parallel computation. Starting with
3686version 2.3.0 ADOL-C now features a traceless forward mode for computing
3687first order derivatives in scalar mode on GPUs using the
3688general purpose parallel computing architecture Cuda.
3689
3690The idea is to include parallel code that executes in many GPU threads across
3691processing elements. This can be done by using kernel functions, that is functions
3692which are executed on GPU as an array of threads in parallel. In general all
3693threads execute the same code. They are grouped into blocks which are then grouped
3694into grids. A kernel is executed as a grid of blocks of threads. For more
3695details see, e.g., the NVIDIA CUDA C Programming Guide which can be downloaded from
3696the web page {\sf www.nvidia.com}. To solve the problem of calculating
3697the Jacobian of $F$ at $M$ points it is possible to let each thread perform a
3698function evaluation and thus the computation of derivatives for one direction
3699at one point. The advantage is that the function is evaluated in different points
3700in parallel which can result in a faster wallclock runtime.
3701
3702The following subsection describes the source code modifications required to use
3703the traceless forward mode of ADOL-C with Cuda. 
3704
3705\subsection{Modifying the source code}
3706
3707Let us again consider the coordinate transformation from Cartesian to spherical
3708polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3709= F(x)$, with
3710\begin{eqnarray*}
3711y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3712y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3713y_3  =  \arctan(x_2/x_1),
3714\end{eqnarray*}
3715as an example. We now calculate the Jacobian at $M=1024$ different points.
3716The source code for one point is shown in \autoref{fig:traceless}. This example
3717has no real application but can still be used to show the combination of the traceless version
3718of ADOL-C and Cuda.
3719
For the use of this mode a Cuda toolkit, which is suitable for the graphics card used,
has to be installed. Furthermore, it is important to check that the graphics card
used supports double precision (for details see, e.g., the NVIDIA CUDA C Programming Guide).
Otherwise the data type employed inside the {\sf adouble} class has to be adapted
to {\sf float}.
3725files \verb#adoublecuda.h# and \verb#cuda.h#. The first one contains the
3726definition of the class {\sf adouble} and the overloaded operators
3727for this version of ADOL-C. As in the other versions all derivative
3728calculations are introduced by calls to overloaded operators. The
3729second header file is needed for the use of Cuda. 
3730
3731One possibility to solve the problem above is the following. First of all
3732three {\sf double} arrays are needed: {\sf x} for the independent variables,
3733{\sf y} for the dependent variables and {\sf deriv} for the values of the
Jacobian matrices. The independent variables have to be initialized; therefore,
the points at which the function is to be evaluated are stored consecutively in the
same array of length $3*M$. For the computation on GPUs one also has to allocate memory
3737on this device. Using the syntax in Cuda one can allocate an array of
3738length $3*M$ (number of independent variables times number of points) for
3739the independent variables as follows:
3740\begin{center}
3741  \begin{tabular}{l}
3742    {\sf double * devx;}\\
3743    {\sf cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
3744  \end{tabular}
3745\end{center}
3746The arrays for the dependent variables and the values of the Jacobian matrices
3747are allocated in the same way. Then the values of the independent variables
3748have to be copied to the GPU using the following command
3749\begin{center}
3750  \begin{tabular}{l}
3751    {\sf cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
3752  \end{tabular}
3753\end{center}   
3754The argument {\sf cudaMemcpyHostToDevice} indicates that the values are copied from
3755the host to the GPU. In this case the values stored in {\sf x} are copied to {\sf devx}.
3756
3757Now all required information has been transferred to the GPU. The changes
3758in the source code made so far are summarized in \autoref{fig:cudacode1}.
3759 
3760\begin{figure}[!h!t!b]
3761\framebox[\textwidth]{\parbox{\textwidth}{
3762%\begin{center}
3763\begin{tabbing}
3764\hspace*{-1cm} \= \kill
3765\> {\sf \#include $<$iostream$>$}\\
3766\> {\sf \#include $<$cuda.h$>$}\\
3767\> {\sf \#include $<$adoublecuda.h$>$}\\
3768\> {\sf using namespace std;}\\
3769\\
3770\> {\sf int main() \{}\\
3771\> {\sf \rule{0.5cm}{0pt}int M=1024;}\\
3772\> {\sf \rule{0.5cm}{0pt}double* deriv = new double[9*M];}\\
3773\> {\sf \rule{0.5cm}{0pt}double* y = new double[3*M];}\\
3774\> {\sf \rule{0.5cm}{0pt}double* x = new double[3*M];}\\
3775\\
3776\> {\sf \rule{0.5cm}{0pt}// Initialize $x_i$}\\
3777\> {\sf \rule{0.5cm}{0pt}for (int k=0; k$<$M; ++k)\{}\\
3778\> {\sf \rule{1cm}{0pt}for (int i=0; i$<$3; ++i)}\\
3779\> {\sf \rule{1.5cm}{0pt}x[k*3+i] =i + 1/(k+1);\}}\\
3780\\
3781\> {\sf \rule{0.5cm}{0pt}// Allocate array for independent and dependent variables}\\
3782\> {\sf \rule{0.5cm}{0pt}// and Jacobian matrices on GPU}\\
3783\> {\sf \rule{0.5cm}{0pt}double * devx;}\\
3784\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
3785\> {\sf \rule{0.5cm}{0pt}double * devy;}\\
3786\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devy, 3*M*sizeof(double));}\\
3787\> {\sf \rule{0.5cm}{0pt}double * devderiv;}\\
3788\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devderiv, 3*3*M*sizeof(double));}\\
3789\\
3790\> {\sf \rule{0.5cm}{0pt}// Copy values of independent variables from host to GPU}\\
3791\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
3792\\
3793\> {\sf \rule{0.5cm}{0pt}// Call function to specify amount of blocks and threads to be used}\\
3794\> {\sf \rule{0.5cm}{0pt}kernellaunch(devx, devy, devderiv,M);}\\
3795\\
3796\> {\sf \rule{0.5cm}{0pt}// Copy values of dependent variables and Jacobian matrices from GPU to host}\\
3797\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\
3798\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(deriv, devderiv, sizeof(double)*3*3*M, cudaMemcpyDeviceToHost);}\\
3799\> {\sf \}}
3800\end{tabbing}
3801%\end{center}
3802}}
3803\caption{Example for traceless scalar forward mode with Cuda}
3804\label{fig:cudacode1}
3805\end{figure}
3806
3807In the next step the user has to specify how many blocks and threads per block
3808will be needed for the function evaluations. In the present example this is done by 
3809the call of the function {\sf kernellaunch}, see \autoref{fig:cudacode2}.
3810In this case the blocks are two dimensional: the x-dimension is determined
3811by the number of points $M$ at which the Jacobian matrix has to be calculated
3812while the y-dimension is given by the number of independent variables, i.e., 3.
3813Since a block cannot contain more than 1024 threads, the
3814x-dimension in the example is 64 instead of $M=1024$. Therefore $1024/64=16$
3815blocks are needed. The described division into blocks is reasonable as each
3816thread has to perform a function evaluation for one point and one direction,
3817hence $M*3$ threads are needed where 3 denotes the number of independent
3818variables corresponding to the number of directions needed for the computation
3819of the Jacobian in one point.
3820
3821\begin{figure}[!h!t!b]
3822\framebox[\textwidth]{\parbox{\textwidth}{
3823\begin{tabbing}
3824\hspace*{-1cm} \= \kill
3825\> {\sf cudaError\_t kernellaunch(double* inx, double* outy, double* outderiv, int M) \{}\\
3826\> {\sf \rule{0.5cm}{0pt}// Create 16 blocks}\\
3827\> {\sf \rule{0.5cm}{0pt}int Blocks=16;}\\
3828\> {\sf \rule{0.5cm}{0pt}// Two dimensional (M/Blocks)$\times$3 blocks}\\
3829\> {\sf \rule{0.5cm}{0pt}dim3 threadsPerBlock(M/Blocks,3);}\\
3830\\
3831\> {\sf \rule{0.5cm}{0pt}// Call kernel function with 16 blocks with (M/Blocks)$\times$3 threads per block}\\
3832\> {\sf \rule{0.5cm}{0pt}kernel $<<<$ Blocks, threadsPerBlock $>>>$(inx, outy, outderiv);}\\
3833\> {\sf \rule{0.5cm}{0pt}cudaError\_t cudaErr = cudaGetLastError();}\\
3834\\
3835\> {\sf \rule{0.5cm}{0pt}return cudaErr;}\\
3836\> {\sf \}}
3837\end{tabbing}
3838}}
3839\caption{Example for traceless scalar forward mode with Cuda}
3840\label{fig:cudacode2}
3841\end{figure}
3842
3843We can now perform the function evaluations together with the calculation
3844of the Jacobians. The corresponding code is illustrated in \autoref{fig:cudacode3}.
3845Since the function evaluations should be performed on a GPU a kernel function is
needed which is defined by using the \mbox{{\sf \_\_global\_\_}} declaration specifier
3847(see \autoref{fig:cudacode3}). Then each thread executes the operations that are
3848defined in the kernel.
3849
3850\begin{figure}[!h!t!b]
3851\framebox[\textwidth]{\parbox{\textwidth}{
3852\begin{tabbing}
3853\hspace*{-1cm} \= \kill
3854\> {\sf \_\_global\_\_ void kernel(double* inx, double* outy, double* outderiv) \{}\\
3855\> {\sf \rule{0.5cm}{0pt}const int index = threadIdx.x ;}\\
3856\> {\sf \rule{0.5cm}{0pt}const int index1 = threadIdx.y;}\\
3857\> {\sf \rule{0.5cm}{0pt}const int index2 = blockIdx.x;}\\
3858\> {\sf \rule{0.5cm}{0pt}const int index3 = blockDim.x;}\\
3859\> {\sf \rule{0.5cm}{0pt}const int index4 = blockDim.x*blockDim.y;}\\
3860\\
3861\> {\sf \rule{0.5cm}{0pt}// Declare dependent and independent variables as {\sf adouble}s}\\
3862\> {\sf \rule{0.5cm}{0pt}adtlc::adouble y[3], x[3];}\\
3863\> {\sf \rule{0.5cm}{0pt}// Read out point for function evaluation}\\
3864\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3865\> {\sf \rule{1cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
3866\> {\sf \rule{0.5cm}{0pt}// Set direction for calculation of derivatives}\\
3867\> {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
3868\\
3869\> {\sf \rule{0.5cm}{0pt}// Function evaluation}\\
3870\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3871\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3872\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3873\\
3874\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3875\> {\sf \rule{1cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
3876\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3877\> {\sf \rule{1cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
3878\> {\sf \}}
3879\end{tabbing}
3880}}
3881\caption{Example for traceless scalar forward mode with Cuda}
3882\label{fig:cudacode3}
3883\end{figure}
3884
In this example each thread is assigned the task of calculating the
derivatives at one point with respect to one independent variable. Therefore
some indices are needed for the implementation. Each thread has a unique thread
ID in a block consisting of an x- and a y-dimension. The blocks have an ID as
well. In the example the following indices are used.
3890 
3891\begin{itemize}
3892 \item {\sf index = threadIdx.x} denotes the x-dimension of a thread (ranges from 0 to 63 in the example)
3893 \item {\sf index1 = threadIdx.y} denotes the y-dimension of a thread (ranges from 0 to 2 in the example)
3894 \item {\sf index2 = blockIdx.x} denotes the block index (ranges from 0 to 15 in the example)
3895 \item {\sf index3 = blockDim.x} denotes the x-dimension of a block (always 64 in the example)
3896 \item {\sf index4 = blockDim.x*blockDim.y} denotes the size of a block (always $64*3=192$ in the example)
3897\end{itemize}
3898
3899For the calculation of derivatives the function evaluation has to be performed
3900with {\sf adouble}s. Therefore the dependent and the independent variables
3901have to be declared as {\sf adouble}s as in the other versions of ADOL-C (see, e.g., \autoref{traceless}).
3902Similar to the traceless version without Cuda the namespace \verb#adtlc# is used.
Now each thread has to read out the point at which the function evaluation is
then performed. This is determined by the block index and the x-dimension of
the thread; therefore, the independent variables in a thread
3906have the values
3907\begin{center}
3908  \begin{tabular}{l}
3909    {\sf \rule{0.5cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
3910  \end{tabular}
3911\end{center} 
3912for {\sf i=0,1,2} where {\sf inx} corresponds to the vector {\sf devx}. The direction for
3913the derivatives is given by {\sf index1}.
3914\begin{center}
3915  \begin{tabular}{l}
3916     {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
3917  \end{tabular}
3918\end{center} 
3919The functions for setting and getting the value and the derivative value of an
3920{\sf adouble} are the same as in the traceless version of ADOL-C for first
3921order derivatives (see \autoref{traceless}).
3922The function evaluation is then performed with {\sf adouble x} on GPU and the results
3923are saved in {\sf adouble y}. The function evaluation itself remains unchanged
3924(see \autoref{fig:cudacode3}). Then we store the values of the function
3925evaluations in each point in a row in one array:
3926\begin{center}
3927  \begin{tabular}{l}
3928   {\sf \rule{0.5cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
3929  \end{tabular}
3930\end{center} 
3931for {\sf i=0,1,2}. The values of a Jacobian are stored column by column in
3932a row:
3933\begin{center}
3934  \begin{tabular}{l}
3935   {\sf \rule{0.5cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
3936  \end{tabular}
3937\end{center} 
3938for {\sf i=0,1,2}.
3939
3940Now there is one thing left to do. The values calculated on the GPU have to be copied
back to the host. For the dependent variables in the example this can be done by the
3942following call:
3943\begin{center}
3944  \begin{tabular}{l}
3945 {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\
3946  \end{tabular}
3947\end{center} 
see \autoref{fig:cudacode1}. The argument {\sf cudaMemcpyDeviceToHost} indicates that
the values are copied from the GPU to the host.
3950
3951%
3952\subsection{Compiling and Linking the Source Code}
3953%
3954After incorporating the required changes, one has to compile the
3955source code and link the object files to get the executable. For
3956the compilation of a Cuda file the {\sf nvcc} compiler is needed.
3957The compile sequence should be similar to the following example:
3958\begin{center}
3959  \begin{tabular}{l}
3960    {\sf nvcc -arch=sm\_20 -o traceless\_cuda traceless\_cuda.cu}
3961  \end{tabular}
3962\end{center}
3963The compiler option \verb#-arch=sm_20# specifies the compute capability that is
3964assumed, in this case one that supports double precision.
3965
The mentioned source code {\sf traceless\_cuda.cu} illustrating the use of the
traceless scalar forward mode with Cuda and a further example {\sf liborgpu.cu}
can be found in the directory {\sf examples}. The second example is an adaptation of
the OpenMP example to the traceless version with Cuda.
3970
3971%
3972%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3973\section{Installing and Using ADOL-C}
3974\label{install}
3975%
3976\subsection{Generating the ADOL-C Library}
3977\label{genlib}
3978%
The current build system is best summarized by the ubiquitous GNU
3980install triplet
3981\begin{center}
3982\verb=configure - make - make install= .
3983\end{center}
Executing these three steps from the package base directory
3985\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3986ADOL-C library with default options and install the package (libraries
3987and headers) into the default installation directory {\tt
3988  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3989the subdirectory \verb=include= will contain all the installed header
3990files that may be included by the user program, the subdirectory
3991\verb=lib= will contain the 32-bit compiled library
3992and the subdirectory \verb=lib64= will contain the 64-bit compiled
3993library. Depending on the compiler only one of \verb=lib= or
3994\verb=lib64= may be created.
3995
3996Before doing so the user may modify the header file \verb=usrparms.h=
in order to tailor the \mbox{ADOL-C} package to the needs of the
3998particular system environment as discussed in
3999\autoref{Customizing}. The configure procedure which creates the necessary
4000\verb=Makefile=s can be customized by use of some switches. Available
4001options and their meaning can be obtained by executing
4002\verb=./configure --help= from the package base directory.
4003
4004All object files and other intermediately generated files can be
4005removed by the call \verb=make clean=. Uninstalling ADOL-C by
4006executing \verb=make uninstall= is only reasonable after a previous
4007called \verb=make install= and will remove all installed package files
4008but will leave the created directories behind.
4009
4010The sparse drivers are included in the ADOL-C libraries if the
4011\verb=./configure= command is executed with the option
4012\verb=--enable-sparse=. The ColPack library available at
4013\verb=http://cscapes.cs.purdue.edu/coloringpage/software.htm= is required to
4014compute the sparse structures, and is searched for in all the default
4015locations.
4016In case the library and its headers are installed in a nonstandard path
4017this may be specified with the \verb?--with-colpack=PATH? option.
4018It is assumed that the library and its header files have the following
4019directory structure: \verb?PATH/include? contains all the header
4020files,
4021\verb?PATH/lib? contains the 32-bit compiled library and
4022\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
4023the compiler used to compile {\sf ADOL-C} one of these libraries will
4024be used for linking.
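
For instance, assuming ColPack has been installed under a user directory (the
path below is purely illustrative), the sparse drivers could be enabled by
\begin{center}
\verb?./configure --enable-sparse --with-colpack=/home/username/ColPack?
\end{center}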
4025 
4026The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default
4027constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see
\autoref{WarSug}.
4029
4030Support for MPI and the AdjoinableMPI API (see\\
4031\verb=http://www.mcs.anl.gov/~utke/AdjoinableMPI/AdjoinableMPIDox/index.html=
4032) may be enabled using the option \verb=--enable-ampi=. This requires
4033the presence of MPI compiler wrappers \verb=mpicc= and \verb=mpicxx=
4034in the \verb=$PATH= and the AdjoinableMPI libraries in the standard
4035locations. If MPI is installed in a nonstandard path one may specify
4036this using the \verb?--with-mpi=PATH? option. Similarly if the
AdjoinableMPI libraries are in a nonstandard path this may be
4038specified using the \verb?--with-ampi=PATH? option.  When MPI and
4039AdjoinableMPI support is compiled into {\sf ADOL-C} the generated
4040library will be unsuitable for linking with non-MPI
4041programs and is called \verb=libadolc_ampi=. Therefore the user
4042programs should use the flag \verb=-ladolc_ampi= instead of
\verb=-ladolc= in this case. An advanced user may in fact specify
4044whatever name the resulting {\sf ADOL-C} library should have using the
4045\verb?--with-soname=SONAME? option.
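
As an illustrative link step for such a build (the program name and the
installation path below are placeholders), one might use
\begin{center}
\verb=mpicxx -o myprog myprog.o -L/home/username/adolc_base/lib64 -ladolc_ampi=
\end{center}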
4046
4047
4048\subsection{Compiling and Linking the Example Programs}
4049%
4050The installation procedure described in \autoref{genlib} also
4051provides the \verb=Makefile=s  to compile the example programs in the
4052directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
4053additional examples in
4054\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
4055one has to execute the
4056\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
4057examples. Available options are:
4058\begin{center}
4059\begin{tabular}[t]{ll}
4060\verb=--enable-docexa=&build all examples discussed in this manual\\
4061&(compare \autoref{example})\\
4062\verb=--enable-addexa=&build all additional examples\\
4063&(See file \verb=README= in the various subdirectories)
4064\end{tabular}
4065\end{center}
4066
Just calling \verb=make= from the package's base directory generates
4068all configured examples and the library if necessary. Compiling from
4069subdirectory \verb=examples= or one of its subfolders is possible
4070too. At least one kind of the ADOL-C library (static or shared) must
4071have been built previously in that case. Hence, building the library
4072is always the first step.
4073
For compiling the library and the documented examples on Windows using
4075Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
4076the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
4077\verb=<ADOL-C/examples/>= subdirectories.
4078%
4079\subsection{Description of Important Header Files}
4080\label{ssec:DesIH}
4081%
4082The application of the facilities of ADOL-C requires the user
4083source code (program or module) to include appropriate
4084header files where the desired data types and routines are
4085prototyped. The new hierarchy of header files enables the user
4086to take one of two possible ways to access the right interfaces.
The first and easy way is recommended for beginners: As indicated in
4088\autoref{globalHeaders} the provided {\em global} header file
4089\verb=<adolc/adolc.h>= can be included by any user code to support all
4090capabilities of ADOL-C depending on the particular programming language
4091of the source.   
4092
4093\begin{table}[h]
4094\center \small
4095\begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline
4096\verb=<adolc/adolc.h>= &
4097\begin{tabular*}{10.5cm}{cp{9.5cm}}
4098  \boldmath $\rightarrow$ \unboldmath
4099                 & global header file available for easy use of ADOL-C; \\
4100  $\bullet$      & includes all ADOL-C header files depending on
4101                   whether the users source is C++ or C code.
4102\end{tabular*}
4103\\ \hline
4104\verb=<adolc/internal/usrparms.h>= &
4105\begin{tabular*}{10.5cm}{cp{9.5cm}}
4106  \boldmath $\rightarrow$ \unboldmath
4107                 & user customization of ADOL-C package (see
4108                   \autoref{Customizing}); \\
4109  $\bullet$      & after a change of
4110                   user options the ADOL-C library \verb=libadolc.*=
4111                   has to be rebuilt (see \autoref{genlib}); \\
4112  $\bullet$      & is included by all ADOL-C header files and thus by all user
4113                   programs.
4114\end{tabular*} \\ \hline
4115\end{tabular}
4116\caption{Global header files}
4117\label{globalHeaders}
4118\end{table} 
4119
4120The second way is meant for the more advanced ADOL-C user: Some source code
4121includes only those interfaces used by the particular application.
4122The respectively needed header files are indicated
4123throughout the manual.
Existing application-determined dependences between the provided
ADOL-C routines are realized by automatic includes of headers in order
to maintain easy use. The header files important to the user are described
in \autoref{importantHeaders1} and \autoref{importantHeaders2}.
4128
4129\begin{table}[h]
4130\center \small
4131\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
4132%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
4133\verb=<adolc/adouble.h>= &
4134\begin{tabular*}{10.5cm}{cp{9.5cm}}
4135  \boldmath $\rightarrow$ \unboldmath
4136                & provides the interface to the basic active
4137                  scalar data type of ADOL-C: {\sf class adouble}
4138                  (see \autoref{prepar});
4139%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
4140\end{tabular*}
4141\\ \hline
4142% \verb=<adolc/avector.h>= &
4143%\begin{tabular*}{10.5cm}{cp{9.5cm}}
4144%  \boldmath $\rightarrow$ \unboldmath
4145%                & provides the interface to the active vector
4146%                  and matrix data types of ADOL-C: {\sf class adoublev}
4147%                  and {\sf class adoublem}, respectively
4148%                   (see \autoref{arrays}); \\
4149%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
4150%\end{tabular*}
4151%\\ \hline
4152 \verb=<adolc/taputil.h>= &
4153\begin{tabular*}{10.5cm}{cp{9.5cm}}
4154  \boldmath $\rightarrow$ \unboldmath
4155                & provides functions to start/stop the tracing of
4156                  active sections (see \autoref{markingActive})
4157                  as well as utilities to obtain
4158                  tape statistics (see \autoref{examiningTape}); \\
4159  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
4160\end{tabular*}
4161\\ \hline
4162\end{tabular}
4163\caption{Important header files: tracing/taping}
4164\label{importantHeaders1}
4165\end{table} 
4166%
4167\begin{table}[h]
4168\center \small
4169\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
4170%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
4171\verb=<adolc/interfaces.h>= &
4172\begin{tabular*}{10.5cm}{cp{9.5cm}}
4173  \boldmath $\rightarrow$ \unboldmath
4174                & provides interfaces to the {\sf forward} and
4175                  {\sf reverse} routines as basic versions of derivative
4176                  evaluation (see \autoref{forw_rev}); \\
4177  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
4178  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
4179  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
4180\end{tabular*}
4181\\ \hline
4182\verb=<adolc/drivers.h>= &
4183\begin{tabular*}{10.5cm}{cp{9.5cm}}
4184  \boldmath $\rightarrow$ \unboldmath
4185                & provides ``easy to use'' drivers for solving
4186                  optimization problems and nonlinear equations
4187                  (see \autoref{optdrivers}); \\
4188  $\bullet$     & comprises C and Fortran-callable versions.
4189\end{tabular*}
4190\\ \hline
4191\begin{minipage}{3cm}
4192\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
4193\end{minipage}  &
4194\begin{tabular*}{10.5cm}{cp{9.5cm}}
4195  \boldmath $\rightarrow$ \unboldmath
4196                & provides the ``easy to use'' sparse drivers
4197                  to exploit the sparsity structure of
4198                  Jacobians (see \autoref{sparse}); \\
4199  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
4200                  of {\sf forward} and {\sf reverse} routines
4201                  propagating bit patterns (see \autoref{ProBit}); \\
4202
4203  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
4204\end{tabular*}
4205\\ \hline
4206\begin{minipage}{3cm}
4207\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
4208\end{minipage}  &
4209\begin{tabular*}{10.5cm}{cp{9.5cm}}
4210  \boldmath $\rightarrow$ \unboldmath
4211                & provides interfaces to the underlying C-callable
4212                  versions of {\sf forward} and {\sf reverse} routines
4213                  propagating bit patterns.
4214\end{tabular*}
4215\\ \hline
4216\begin{minipage}{3cm}
4217\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
4218\end{minipage}  &
4219\begin{tabular*}{10.5cm}{cp{9.5cm}}
4220  \boldmath $\rightarrow$ \unboldmath
4221                & provides ``easy to use'' drivers for numerical
4222                  solution of ordinary differential equations
4223                  (see \autoref{odedrivers}); \\
4224  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
4225  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
4226\end{tabular*}
4227\\ \hline
4228\begin{minipage}{3cm}
4229\verb=<adolc/drivers/=\newline\verb= taylor.h>=
4230\end{minipage}  &
4231\begin{tabular*}{10.5cm}{cp{9.5cm}}
4232  \boldmath $\rightarrow$ \unboldmath
4233                & provides ``easy to use'' drivers for evaluation
4234                  of higher order derivative tensors (see
4235                  \autoref{higherOrderDeriv}) and inverse/implicit function
4236                  differentiation (see \autoref{implicitInverse});\\
4237  $\bullet$     & comprises C++ and C-callable versions.
4238\end{tabular*}
4239\\ \hline
4240\verb=<adolc/adalloc.h>= &
4241\begin{tabular*}{10.5cm}{cp{9.5cm}}
4242  \boldmath $\rightarrow$ \unboldmath
4243                & provides C++ and C functions for allocation of
4244                  vectors, matrices and three dimensional arrays
4245                  of {\sf double}s.
4246\end{tabular*}
4247\\ \hline
4248\end{tabular}
4249\caption{Important header files: evaluation of derivatives}
4250\label{importantHeaders2}
4251\end{table} 
4252%
4253\subsection{Compiling and Linking C/C++ Programs}
4254%
4255To compile a C/C++ program or single module using ADOL-C
4256data types and routines one has to ensure that all necessary
4257header files according to \autoref{ssec:DesIH} are
included. All modules involving {\em active} data types such as
4259{\sf adouble}
4260%, {\bf adoublev} and {\bf adoublem}
4261have to be compiled as C++. Modules that make use of a previously
4262generated tape to evaluate derivatives can either be programmed in ANSI-C
4263(while avoiding all C++ interfaces) or in C++. Depending
4264on the chosen programming language the header files provide
4265the right ADOL-C prototypes.
4266For linking the resulting object codes the library \verb=libadolc.*= 
4267must be used (see \autoref{genlib}).
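
By analogy with the earlier example for the traceless mode, a typical compile
and link sequence might read (the program name and installation path are again
only illustrative)
\begin{center}
  \begin{tabular}{l}
    {\sf g++ -I/home/username/adolc\_base/include -c myprogram.cpp}\\
    {\sf g++ -o myprogram myprogram.o}\\
    {\sf -Wl,-{}-rpath -Wl,/home/username/adolc\_base/lib -L/home/username/adolc\_base/lib -ladolc}
  \end{tabular}
\end{center}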
4268%
4269\subsection{Adding Quadratures as Special Functions}
4270%
4271\label{quadrat}
4272%
4273Suppose an integral
4274\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
4275is evaluated numerically by a user-supplied function
4276\begin{center}
4277{\sf  double  myintegral(double\& x);}
4278\end{center}
4279Similarly, let us suppose that the integrand itself is evaluated by
4280a user-supplied block of C code {\sf integrand}, which computes a
4281variable with the fixed name {\sf val} from a variable with the fixed
4282name {\sf arg}. In many cases of interest, {\sf integrand} will
4283simply be of the form
4284\begin{center}
4285{\sf \{ val = expression(arg) \}}\enspace .
4286\end{center}
4287In general, the final assignment to {\sf val} may be preceded
4288by several intermediate calculations, possibly involving local
4289active variables of type {\sf adouble}, but no external or static
4290variables of that type.  However, {\sf integrand} may involve local
4291or global variables of type {\sf double} or {\sf int}, provided they
do not depend on the value of {\sf arg}. The variables {\sf arg} and
{\sf val} are declared automatically; since {\sf integrand} is a block
rather than a function, it should have no header line.
4295
4296Now the function {\sf myintegral} can be overloaded for {\sf adouble}
4297arguments and thus included in the library of elementary functions
4298by the following modifications:
4299\begin{enumerate}
4300\item
4301At the end of the file \verb=<adouble.cpp>=, include the full code
4302defining \\ {\sf double myintegral(double\& x)}, and add the line
4303\begin{center}
4304{\sf extend\_quad(myintegral, integrand); }
4305\end{center}
This macro expands to the definition of
4307 {\sf adouble myintegral(adouble\& arg)}.
4308Then remake the library \verb=libadolc.*= (see \autoref{genlib}).
4309\item
4310In the definition of the class
4311{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
4312\begin{center}
4313{\sf friend adouble myintegral(adouble\&)}.
4314\end{center}
4315\end{enumerate}
4316In the first modification, {\sf myintegral} represents the name of the
4317{\sf double} function, whereas {\sf integrand} represents the actual block
4318of C code.
4319
4320For example, in case of the inverse hyperbolic cosine, we have
4321{\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as
4322{\sf \{ val = sqrt(arg*arg-1); \}}
4323so that the line
4324\begin{center}
4325{\sf extend\_quad(acosh,val = sqrt(arg*arg-1));}
4326\end{center}
4327can be added to the file \verb=<adouble.cpp>=.
4328A mathematically equivalent but longer representation of
4329{\sf integrand} is
4330\begin{center}
4331\begin{tabbing}
4332{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
4333 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
4334 \> \ \> {\sf  temp = temp*temp; } \\ 
 \> \ \> {\sf  val = sqrt(temp-1); \}}
4336\end{tabbing}
4337\end{center}
4338The code block {\sf integrand} may call on any elementary function that has already
4339been defined in file \verb=<adouble.cpp>=, so that one may also introduce
4340iterated integrals.
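Once the library has been rebuilt, the overloaded quadrature behaves like any
other elementary function inside an active section. The following minimal
sketch, using the {\sf acosh} example from above and an arbitrarily chosen tape
tag, illustrates a possible usage:
\begin{verbatim}
#include <adolc/adouble.h>            // active doubles
#include <adolc/taping.h>             // taping utilities

void trace_acosh(double xp, double& yp) {
  adouble x, y;
  trace_on(17);                       // tag 17 chosen arbitrarily
  x <<= xp;                           // mark independent
  y = acosh(x);                       // overloaded quadrature function
  y >>= yp;                           // mark dependent
  trace_off();
}
\end{verbatim}
Derivatives of the taped function can then be evaluated with the usual drivers,
for example {\sf gradient} or {\sf forward}.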
4341%
4342%
4343\section{Example Codes}
4344\label{example}
4345%
4346The following listings are all simplified versions of codes that
4347are contained in the example subdirectory
4348\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
4349we have left out timings, which are included in the complete codes.
4350%
4351\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
4352%
4353The first example evaluates the gradient and the Hessian of
4354the function
4355\[ 
4356y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
4357\] 
4358using the appropriate drivers {\sf gradient} and {\sf hessian}.
4359
4360\begin{verbatim}
4361#include <adolc/adouble.h>               // use of active doubles and taping
4362#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
4363                                   // gradient(.) and hessian(.)
4364#include <adolc/taping.h>                // use of taping
4365...
int main() {
4367int n,i,j;
4368size_t tape_stats[STAT_SIZE];
4369cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
4370cout << "number of independent variables = ?  \n";
4371cin >> n;
4372double* xp = new double[n];         
4373double  yp = 0.0;
4374adouble* x = new adouble[n];     
4375adouble  y = 1;
4376for(i=0;i<n;i++)
4377  xp[i] = (i+1.0)/(2.0+i);         // some initialization
4378trace_on(1);                       // tag =1, keep=0 by default
4379  for(i=0;i<n;i++) {
4380    x[i] <<= xp[i]; y *= x[i]; }     
4381  y >>= yp;
4382  delete[] x;                     
4383trace_off();
4384tapestats(1,tape_stats);           // reading of tape statistics
4385cout<<"maxlive "<<tape_stats[2]<<"\n";
4386...                                // ..... print other tape stats
4387double* g = new double[n];       
4388gradient(1,n,xp,g);                // gradient evaluation
4389double** H=(double**)malloc(n*sizeof(double*));
4390for(i=0;i<n;i++)
4391  H[i]=(double*)malloc((i+1)*sizeof(double));
4392hessian(1,n,xp,H);                 // H equals (n-1)g since g is
4393double errg = 0;                   // homogeneous of degree n-1.
4394double errh = 0;
4395for(i=0;i<n;i++)
4396  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
4397for(i=0;i<n;i++) {
4398  for(j=0;j<n;j++) {
4399    if (i>j)                       // lower half of hessian
4400      errh += fabs(H[i][j]-g[i]/xp[j]); } }
4401cout << yp-1/(1.0+n) << " error in function \n";
4402cout << errg <<" error in gradient \n";
4403cout << errh <<" consistency check \n";
4404}                                  // end main
4405\end{verbatim}
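For reference, the consistency checks in this listing are based on the analytic
derivatives of the product function,
\[
\frac{\partial f}{\partial x_i} \;=\; \prod_{k \ne i} x_k \;=\; \frac{y}{x_i}
\qquad \mbox{and} \qquad
\frac{\partial^2 f}{\partial x_i \partial x_j} \;=\; \frac{y}{x_i\, x_j}
\;=\; \frac{1}{x_j}\,\frac{\partial f}{\partial x_i}
\quad \mbox{for } i \ne j\, .
\]
With the initialization $x_i = (i+1)/(2+i)$ the function value telescopes to
$y = 1/(n+1)$, which explains the three error expressions printed at the end.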
4406%
4407\subsection{Power Example ({\tt powexam.cpp})}
4408%
4409The second example function evaluates the $n$-th power of a real
4410variable $x$ in
4411$\log_2 n$ multiplications by recursive halving of the exponent. Since
4412there is only one independent variable, the scalar derivative can be
4413computed by
4414using both {\sf forward} and {\sf reverse}, and the
4415results are subsequently compared.
4416\begin{verbatim}
4417#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces
4418
4419adouble power(adouble x, int n) {
4420adouble z=1;
4421if (n>0) {                         // recursion and branches
4422  int nh =n/2;                     // that do not depend on
4423  z = power(x,nh);                 // adoubles are fine !!!!
4424  z *= z;
4425  if (2*nh != n) 
4426    z *= x;
4427  return z; }                      // end if
4428else {
4429  if (n==0)                        // the local adouble z dies
4430    return z;                      // as it goes out of scope.
4431  else
4432    return 1/power(x,-n); }        // end else
4433} // end power
4434\end{verbatim}
4435The function {\sf power} above was obtained from the original
4436undifferentiated version by simply changing the type of all
4437{\sf double}s including the return variable to {\sf adouble}s. The new version
4438can now be called from within any active section, as in the following
4439main program.
4440\begin{verbatim}
4441#include ...                       // as above
4442int main() {
4443int i,n,tag=1;
4444cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
4445cout<<"monomial degree=? \n";      // input the desired degree
4446cin >> n;
4447                                   // allocations and initializations
4448double* Y[1];
4449*Y = new double[n+2];
4450double* X[1];                      // allocate passive variables with
4451*X = new double[n+4];              // extra dimension for derivatives
4452X[0][0] = 0.5;                     // function value = 0. coefficient
4453X[0][1] = 1.0;                     // first derivative = 1. coefficient
4454for(i=0;i<n+2;i++)
4455  X[0][i+2]=0;                     // further coefficients
4456double* Z[1];                      // used for checking consistency
4457*Z = new double[n+2];              // between forward and reverse
4458adouble y,x;                       // declare active variables
4459                                   // beginning of active section
4460trace_on(1);                       // tag = 1 and keep = 0
4461x <<= X[0][0];                     // only one independent var
4462y = power(x,n);                    // actual function call
4463y >>= Y[0][0];                     // only one dependent adouble
4464trace_off();                       // no global adouble has died
4465                                   // end of active section
4466double u[1];                       // weighting vector
4467u[0]=1;                            // for reverse call
4468for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-th derivative
4470  if (i==0)
4471    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value() 
4472    << " (should be 0)\n";
4473  else
4474    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i] 
4475    << " (should be 0)\n";
4476  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylor coeff.
4478return 1;
4479}                                  // end main
4480\end{verbatim}
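The scaling step {\sf Z[0][i+1]=Z[0][i]/(i+1)} in the loop above reflects the
relation between the scalar adjoint returned by {\sf reverse} and the Taylor
coefficients propagated by {\sf forward}: writing $y(t) = f(x_0 + t)$ with
Taylor coefficients $y_i = f^{(i)}(x_0)/i!$, one obtains
\[
\frac{\partial y_i}{\partial x_0} \;=\; \frac{f^{(i+1)}(x_0)}{i!}
\;=\; (i+1)\, y_{i+1}\, ,
\]
so dividing the derivative of the $i$-th coefficient by $i+1$ yields the
$(i+1)$-st Taylor coefficient, against which the next {\sf forward} call is
checked.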
4481Since this example has only one independent and one dependent variable,
4482{\sf forward} and {\sf reverse} have the same complexity and calculate
4483the same scalar derivatives, albeit with a slightly different scaling.
4484By replacing the function {\sf power} with any other univariate test function,
4485one can check that {\sf forward} and {\sf reverse} are at least consistent.
4486In the following example the number of independents is much larger
4487than the number of dependents, which makes the reverse mode preferable.
4488%
4489\subsection{Determinant Example ({\tt detexam.cpp})}
4490%
4491Now let us consider an exponentially expensive calculation,
4492namely, the evaluation of a determinant by recursive expansion
4493along rows. The gradient of the determinant with respect to the
4494matrix elements is simply the adjoint, i.e. the matrix of cofactors.
4495Hence the correctness of the numerical result is easily checked by
4496matrix-vector multiplication. The example illustrates the use
4497of {\sf adouble} arrays and pointers. 
4498
4499\begin{verbatim}
4500#include <adolc/adouble.h>               // use of active doubles and taping
4501#include <adolc/interfaces.h>            // use of basic forward/reverse
4502                                   // interfaces of ADOL-C
4503adouble** A;                       // A is an n x n matrix
int n;                             // k <= n is the order
4505adouble det(int k, int m) {        // of the sub-matrix
4506if (m == 0) return 1.0 ;           // its column indices
4507else {                             // are encoded in m
4508  adouble* pt = A[k-1];
4509  adouble t = zero;                // zero is predefined
4510  int s, p =1;
4511  if (k%2) s = 1; else s = -1;
  for(int i=0;i<n;i++) {
4513    int p1 = 2*p;
4514    if (m%p1 >= p) {
4515      if (m == p) {
4516        if (s>0) t += *pt; else t -= *pt; }
4517      else {
4518        if (s>0)
4519          t += *pt*det(k-1,m-p);   // recursive call to det
4520        else
4521          t -= *pt*det(k-1,m-p); } // recursive call to det
4522      s = -s;}
4523    ++pt;
4524    p = p1;}
4525  return t; }
4526}                                  // end det
4527\end{verbatim}
As one can see, the overloading mechanism has no problem with pointers,
and the code looks exactly the same as the original undifferentiated function
4530except for the change of type from {\sf double} to {\sf adouble}.
4531If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
4532a compile time error would have resulted. Now consider a corresponding
4533calling program.
4534
4535\begin{verbatim}
4536#include ...                       // as above
4537int main() {
4538int i,j, m=1,tag=1,keep=1;
4539cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
4540cout << "order of matrix = ? \n";  // select matrix size
4541cin >> n;
4542A = new adouble*[n];             
4543trace_on(tag,keep);                // tag=1=keep
4544  double detout=0.0, diag = 1.0;   // here keep the intermediates for
4545  for(i=0;i<n;i++) {               // the subsequent call to reverse
4546    m *=2;
4547    A[i] = new adouble[n];         // not needed for adoublem
4548    adouble* pt = A[i];
4549    for(j=0;j<n;j++)
4550      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
4551    diag += A[i][i].value();        // value() converts to double
4552    A[i][i] += 1.0; }
4553  det(n,m-1) >>= detout;           // actual function call
4554  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
4555trace_off();
4556double u[1];
4557u[0] = 1.0;
4558double* B = new double[n*n];
4559reverse(tag,1,n*n,1,u,B);
4560cout <<" \n first base? : ";
4561for (i=0;i<n;i++) {
4562  adouble sum = 0;
4563  for (j=0;j<n;j++)                // the matrix A times the first n
4564    sum += A[i][j]*B[j];           // components of the gradient B
4565  cout<<sum.value()<<" "; }         // must be a Cartesian basis vector
4566return 1;
4567}                                  // end main
4568\end{verbatim}
4569The variable {\sf diag} should be mathematically
4570equal to the determinant, because the
4571matrix {\sf A} is defined as a rank 1 perturbation of the identity.
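Indeed, writing the matrix generated in the listing as $A = I + uv^T$ with
$u_i = 1/(1+i)$ and $v_j = j$ for $i,j = 0,\ldots,n-1$, the matrix determinant
lemma yields
\[
\det(I + u v^{T}) \;=\; 1 + v^{T} u
\;=\; 1 + \sum_{i=0}^{n-1} \frac{i}{1+i}\, ,
\]
which is exactly the value accumulated in {\sf diag}.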
4572%
4573\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
4574\label{exam:ode}
4575%
4576Here, we consider a nonlinear ordinary differential equation that
4577is a slight modification of the Robertson test problem
4578given in Hairer and Wanner's book on the numerical solution of
4579ODEs \cite{HW}. The following source code computes the corresponding
4580values of $y^{\prime} \in \R^3$:
4581\begin{verbatim}
4582#include <adolc/adouble.h>                  // use of active doubles and taping
4583#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
4584#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities
4585
4586void tracerhs(short int tag, double* py, double* pyprime) {
4587adouble y[3];                         // this time we left the parameters
adouble yprime[3];                    // passive and use adouble arrays
4589trace_on(tag);
4590for (int i=0; i<3; i++)
4591     y[i] <<= py[i];                  // initialize and mark independents
4592yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
4593yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
4594yprime[2] = -yprime[0] - yprime[1];
for (int i=0; i<3; i++)
     yprime[i] >>= pyprime[i];        // mark and pass dependents
4596trace_off(tag);
4597}                                     // end tracerhs
4598\end{verbatim}
4599The Jacobian of the right-hand side has large
4600negative eigenvalues, which make the ODE quite stiff. We  have added
4601some numerically benign transcendentals to make the differentiation
4602more interesting.
4603The following main program uses {\sf forode} to calculate the Taylor series
4604defined by the ODE at the given point $y_0$ and {\sf reverse} as well
4605as {\sf accode} to compute the Jacobians of the coefficient vectors
with respect to $y_0$.
4607\begin{verbatim}
4608#include .......                   // as above
4609int main() {
4610int i,j,deg; 
4611int n=3;
4612double py[3];
4613double pyp[3];
4614cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
4615cout << "degree of Taylor series =?\n";
4616cin >> deg;
4617double **X;
4618X=(double**)malloc(n*sizeof(double*));
4619for(i=0;i<n;i++)
4620  X[i]=(double*)malloc((deg+1)*sizeof(double));
4621double*** Z=new double**[n];
4622double*** B=new double**[n];
4623short** nz = new short*[n];
4624for(i=0;i<n;i++) {
4625  Z[i]=new double*[n];
4626  B[i]=new double*[n];
4627  for(j=0;j<n;j++) {
4628    Z[i][j]=new double[deg];
4629    B[i][j]=new double[deg]; }     // end for
4630}                                  // end for
4631for(i=0;i<n;i++) {
4632  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
4633  X[i][0] = py[i];                 // and the Taylor coefficient;
4634  nz[i] = new short[n]; }          // set up sparsity array
4635tracerhs(1,py,pyp);                // trace RHS with tag = 1
4636forode(1,n,deg,X);                 // compute deg coefficients
4637reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
4638accode(n,deg-1,Z,B,nz);
4639cout << "nonzero pattern:\n";
4640for(i=0;i<n;i++) {
4641  for(j=0;j<n;j++)
4642    cout << nz[i][j]<<"\t";
4643  cout <<"\n"; }                   // end for
4644return 1;
4645}                                  // end main
4646\end{verbatim}
4647\noindent The pattern {\sf nz} returned by {\sf accode} is
4648\begin{verbatim}
4649              3  -1   4
4650              1   2   2
4651              3   2   4
4652\end{verbatim}
4653The original pattern {\sf nz} returned by {\sf reverse} is the same
4654except that the negative entry $-1$ was zero.
4655%
4656%\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})}
4657%\label{gaussexam}
4658%
4659%The following example uses conditional assignments to show the usage of a once produced tape
4660%for evaluation at new arguments. The elimination is performed with
4661%column pivoting.
4662%\begin{verbatim}
4663%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces
4664
4665%void gausselim(int n, adoublem& A, adoublev& bv) {
4666%along i;                           // active integer declaration
4667%adoublev temp(n);                  // active vector declaration
4668%adouble r,rj,temps;
4669%int j,k;
4670%for(k=0;k<n;k++) {                 // elimination loop
4671%  i = k;
4672%  r = fabs(A[k][k]);               // initial pivot size
4673%  for(j=k+1;j<n;j++) {
4674%    rj = fabs(A[j][k]);             
4675%    condassign(i,rj-r,j);          // look for a larger element in the same
4676%    condassign(r,rj-r,rj); }       // column with conditional assignments
4677%  temp = A[i];                     // switch rows using active subscripting
4678%  A[i] = A[k];                     // necessary even if i happens to equal
4679%  A[k] = temp;                     // k during taping
4680%  temps = bv[i];
4681%  bv[i]=bv[k];
4682%  bv[k]=temps;
4683%  if (!value(A[k][k]))             // passive subscripting
4684%    exit(1);                       // matrix singular!
4685%  temps= A[k][k];
4686%  A[k] /= temps;
4687%  bv[k] /= temps;
4688%  for(j=k+1;j<n;j++) {
4689%    temps= A[j][k];
4690%    A[j] -= temps*A[k];            // vector operations
4691%    bv[j] -= temps*bv[k]; }        // endfor
4692%}                                  // end elimination loop
4693%temp=0.0;
4694%for(k=n-1;k>=0;k--)                // backsubstitution
4695%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];
4696%bv=temp;
4697%}                                  // end gausselim
4698%\end{verbatim}
4699%\noindent This function can be called from any program
4700%that suitably initializes
4701%the components of {\sf A} and {\sf bv}
4702%as independents. The resulting tape can be
4703%used to solve any nonsingular linear system of the same size and
4704%to get the sensitivities of the solution with respect to the
4705%system matrix and the right hand side.
4706%\vspace*{-4mm}
4707%
4708\section*{Acknowledgements}
4709%
4710Parts of the ADOL-C source were developed by Andreas
4711Kowarz, Hristo Mitev, Sebastian Schlenkrich,  and Olaf
4712Vogel. We are also indebted to George Corliss,
4713Tom Epperly, Bruce Christianson, David Gay,  David Juedes,
4714Brad Karp, Koichi Kubota, Bob Olson,  Marcela Rosemblun, Dima
4715Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
4716various ways with the development and documentation of ADOL-C.
4717%
4718\begin{thebibliography}{10}
4719
4720\bibitem{BeKh96}
4721Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle.
4722\newblock {\em Efficient computation of gradients and Jacobians by dynamic
4723  exploitation of sparsity in automatic differentiation}.
\newblock Optimization Methods and Software 7(1):1--39, 1996.
4725
4726\bibitem{Chri91a}
4727Bruce Christianson.
4728\newblock {\em Reverse accumulation and accurate rounding error estimates for
4729Taylor series}.
4730\newblock  Optimization Methods and Software 1:81--94, 1992.
4731
4732\bibitem{GeMaPo05}
4733Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
4734\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
4735  derivatives}.
4736\newblock SIAM Review 47(4):629--705, 2005.
4737
4738\bibitem{GePoTaWa06}
4739Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther.
4740{\em Efficient Computation of Sparse Hessians: An Experimental Study
4741  using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing.
4742
4743\bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea
4744  Walther.
4745{\em Exploiting  Sparsity  in Jacobian Computation via Coloring and Automatic Differentiation:
4746a Case Study in a Simulated Moving Bed Process}.
4747In Chr. Bischof et al., eds.,  {\em Proceedings AD 2008 conference}, LNCSE 64, pp. 327 -- 338, Springer (2008).
4748
4749\bibitem{GeTaMaPo07}
4750Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen,
4751{\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}.
4752SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.
4753
4754\bibitem{Griewank13}
4755Andreas Griewank,
4756{\em On stable piecewise linearization and generalized algorithmic differentiation}.
4757Optimization Methods and Software 28(6):1139--1178, 2013.
4758
4759\bibitem{GrWa08}
4760Andreas Griewank and Andrea Walther: {\em Evaluating Derivatives, Principles and Techniques of
4761  Algorithmic Differentiation. Second edition}. SIAM, 2008.
4762
4763
4764\bibitem{Griewank97}
4765Andreas Griewank, Jean Utke, and Andrea Walther.
4766\newblock {\em Evaluating higher derivative tensors by forward propagation
4767          of univariate Taylor series}.
4768\newblock Mathematics of Computation, 69:1117--1130, 2000.
4769
4770\bibitem{GrWa00}
4771Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse
4772                 or Adjoint Mode of Computational Differentiation},
                 ACM Transactions on Mathematical Software 26:19--45, 2000.
4774
4775\bibitem{HW}
4776    Ernst Hairer and Gerhard Wanner.
4777    {\it Solving Ordinary Differential Equations II.\/}
4778    Springer-Verlag, Berlin, 1991.
4779
4780\bibitem{Knuth73}
4781Donald~E. Knuth.
4782\newblock {\em The Art of Computer Programming. Second edition.}
4783\newblock Addison-Wesley, Reading, 1973.
4784
4785\bibitem{Roeb05}
4786Klaus R\"{o}benack.
4787\newblock{\em Computation of Lie Derivatives of Tensor Fields Required for
4788         Nonlinear Controller and Observer Design Employing Automatic Differentiation. }
\newblock PAMM, 5(1):181--184, 2005.
4790
4791\bibitem{Roeb11}
4792Klaus R\"{o}benack, Jan Winkler and Siqian Wang.
4793\newblock {\em LIEDRIVERS - A Toolbox for the Efficient Computation of Lie
4794          Derivatives Based on the Object-Oriented Algorithmic Differentiation
4795          Package ADOL-C. }
4796\newblock Proc. of the 4th International Workshop on Equation-Based Object-Oriented
4797          Modeling Languages and Tools, F. E. Cellier and D. Broman and P. Fritzson
4798          and E. A. Lee, editors, volume 56 of Link\"oping Electronic Conference Proceedings,
          pages 57--66, Zurich, 2011.
4800\bibitem{Wa05a}
4801Andrea Walther.
4802\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
\newblock ACM Transactions on Mathematical Software, 34(1), Article 3, 2008.
4804\end{thebibliography}
4805\end{document}
4806