source: trunk/ADOL-C/doc/adolc-manual.tex @ 127

Last change on this file since 127 was 127, checked in by kulshres, 12 years ago

Squashed merge of branch 'master' of 'gitclone' into svn

  • 'master' of 'gitclone': (5 commits) execute perms for script add script to update versions in the documentation remove deleted files from dist remove old scripts make a mechanism to update versions all over the place

Details of the commits:

commit 64edab830585174ef8f0b202e69becd20d56d34b
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 22 22:40:54 2010 +0200

execute perms for script

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 181366b2473c74e676367693e5dbd78253f64d31
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 22 22:37:12 2010 +0200

add script to update versions in the documentation

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 6fe63e6a17039b960d55134fbd2c3bdd55d1a515
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 22 22:36:04 2010 +0200

remove deleted files from dist

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit dd0d8aae0514f60d781b1ab28e8be8f72371aca6
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 22 22:34:27 2010 +0200

remove old scripts

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit ce16a4d6adc04012c95cd5e814616627a886f34c
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 22 22:33:35 2010 +0200

make a mechanism to update versions all over the place

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

File size: 191.6 KB
Line 
1% Latex file containing the documentation of ADOL-C
2%
3% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
4%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
5%
6% This file is part of ADOL-C. This software is provided as open source.
7% Any use, reproduction, or distribution of the software constitutes
8% recipient's acceptance of the terms of the accompanying license file.
9%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
10
11\documentclass[11pt,twoside]{article}
12\usepackage{hyperref}
13\usepackage{amsmath,amsthm,amssymb}
14\usepackage{graphicx}
15\usepackage{datetime}
16
17\newdateformat{monthyear}{\monthname\ \THEYEAR}
18
19\usepackage{color}
20
21\pagestyle{headings} 
22\bibliographystyle{plain}
23\parskip=6pt
24
25\setlength{\textwidth}{433.6pt}
26\setlength{\oddsidemargin}{23pt}
27\setlength{\evensidemargin}{23pt}
28\setlength{\topmargin}{25.0pt}
29\setlength{\textheight}{580pt}
30\setlength{\baselineskip}{8pt}
31
32\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
33\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
34\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
35
36\renewcommand{\sectionautorefname}{Section}
37\renewcommand{\subsectionautorefname}{Section}
38\renewcommand{\figureautorefname}{Figure}
39\renewcommand{\tableautorefname}{Table}
40
41\setcounter{equation}{0}
42
43\input{version.tex}
44
45%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
46\begin{document}
47
48\begin{titlepage}
49\begin{center}
50{\Large {\bf ADOL-C:}} 
51\footnote{The development of earlier versions was supported by the Office of
52  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
53  Forschungsgemeinschaft. During the development of the current
54  version Andrea Walther and Andreas Kowarz were supported by the
55  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} 
56\vspace{0.2in} \\
57%
58{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
59{\Large of Algorithms Written in C/C++}\\
60\vspace{.2in}
61{\large\bf  Version \packageversion, \monthyear\today} \\
62\bigskip
63 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
64   of Paderborn, 33098 Paderborn, Germany} and
65 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
66 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
67\end{center}
68%
69%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
70\begin{abstract}
71The C++ package ADOL-C described here facilitates the evaluation of
72first and higher derivatives of vector functions that are defined
73by computer programs written in C or C++. The resulting derivative
74evaluation routines may be called from C, C++, Fortran, or any other
75language that can be linked with C.
76
77The numerical values of derivative vectors are obtained free
78of truncation errors at a small multiple of the run time and
79random access memory required by the given function evaluation program.
80Derivative matrices are obtained by columns, by rows or in sparse format.
81For solution curves defined by ordinary differential equations,
82special routines are provided that evaluate the Taylor coefficient vectors
83and their Jacobians with respect to the current state vector.
84For explicitly or implicitly defined functions derivative tensors are
85obtained with a complexity that grows only quadratically in their
86degree. The derivative calculations involve a possibly substantial but
87always predictable amount of data. Since the data is accessed strictly sequentially
88it can be automatically paged out to external files.
89\end{abstract}
90%
91{\bf Keywords}: Computational Differentiation, Automatic
92         Differentiation,
93         Chain Rule, Overloading, Taylor Coefficients,
94         Gradients, Hessians, Forward Mode, Reverse Mode,
95         Implicit Function Differentiation, Inverse Function Differentiation
96\medskip
97
98\noindent
99{\bf Abbreviated title}: Automatic differentiation by overloading in C++
100%
101\end{titlepage}
102%
103\tableofcontents       
104%
105
106\newpage
107%
108%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
109%
110%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
111\section{Preparing a Section of C or C++ Code for Differentiation}
112\label{prepar}
113%
114\subsection{Introduction}
115%
116\setcounter{equation}{0}
117The package \mbox{ADOL-C} 
118utilizes overloading in C++, but the
119user has to know only C. The acronym stands for {\bf A}utomatic
120{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
121In contrast to source transformation approaches, overloading does not generate intermediate
122source code.
123As starting points to retrieve further information on techniques and
124application of automatic differentiation, as well as on other AD
125tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
126\verb=http://www.autodiff.org= of the AD community forms a rich source
127of further information and pointers.
128
129
130ADOL-C facilitates the simultaneous
131evaluation of arbitrarily high directional derivatives and the
132gradients of these Taylor coefficients with respect to all independent
133variables. Relative to the cost of evaluating the underlying function,
134the cost for evaluating any such scalar-vector pair grows as the
135square of the degree of the derivative but is still completely
136independent of the numbers $m$ and $n$.
137
138This manual is organized as follows. This section explains the
139modifications required to convert undifferentiated code to code that
140compiles with ADOL-C.
141\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
142evaluate arbitrarily high order derivatives. The discussion includes storage
143requirements and the tailoring of certain tape characteristics to fit specific
144user needs. Descriptions of easy-to-use drivers for a  convenient derivative
145evaluation are contained in \autoref{drivers}.
146\autoref{forw_rev_ad} offers a more mathematical characterization of
147the different modes of AD to compute derivatives. At the same time, the
148corresponding drivers of ADOL-C are explained. 
149The overloaded derivative evaluation routines using the forward and the reverse
150mode of AD are explained in \autoref{forw_rev}.
151Advanced differentiation techniques as the optimal checkpointing for
152time integrations, the exploitation of fixed point iterations, the usages
153of external differentiated functions and the differentiation of OpenMP
154parallel programs are described in \autoref{adv_ad}.
155The tapeless forward mode is presented in \autoref{tapeless}.
156\autoref{install} details the installation and
157use of the ADOL-C package. Finally, \autoref{example} 
158furnishes some example programs that incorporate the ADOL-C package to
159evaluate first and higher-order
160derivatives.  These and other examples are distributed with the ADOL-C
161source code.
162The user should simply refer to them if the more abstract and general
163descriptions of ADOL-C provided in this document do not suffice.
164%
165%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
166\subsection{Declaring Active Variables}
167%
168\label{DecActVar}
169%
170The key ingredient of automatic differentiation by overloading is the
171concept of an {\em active variable}. All variables that may be
172considered as differentiable quantities at some time
173during the program execution must be of an active
174type. ADOL-C uses one
175active scalar type, called {\sf adouble}, whose real part is of the
176standard type {\sf double}.
177Typically, one will declare the independent variables
178and all quantities that directly or indirectly depend on them as
179{\em active}. Other variables that do not depend on the independent
180variables but enter, for example, as parameters, may remain one of the
181{\em passive} types {\sf double, float}, or {\sf int}. There is no
182implicit type conversion from {\sf adouble} to any of these passive
183types; thus, {\bf failure to declare variables as active when they
184depend on other active variables will result in a compile-time error
185message}. In data flow terminology, the set of active variable names
186must contain all its successors in the dependency graph. All components
187of indexed arrays must have the same activity status.
188
189The real component of an {\sf adouble x} can be extracted as
190{\sf x.value()}. In particular,
191such explicit conversions are needed for the standard output procedure
192{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
193that first the real part of an {\sf adouble} and then the string
194``{\sf (a)}" is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath  can
195be used to assign a constant value to an {\sf adouble}.
196Naturally, {\sf adouble}s may be
197components of vectors, matrices, and other arrays, as well as
198members of structures or classes.
199
200The C++ class {\sf adouble}, its member functions, and the overloaded
201versions of all arithmetic operations, comparison operators, and
202most ANSI C functions are contained in the file \verb=adouble.cpp= and its
203header \verb=<adolc/adouble.h>=. The latter must be included for compilation
204of all program files containing {\sf adouble}s and corresponding
205operations.
206%
207%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
208\subsection{Marking Active Sections}
209\label{markingActive}
210%
211All calculations involving active variables that occur between
212the void function calls
213\begin{center}
214{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
215{\sf trace\_off(file)}
216\end{center}
217are recorded on a sequential data set called {\em tape}. Pairs of
218these function calls can appear anywhere in a C++ program, but
219they must not overlap. The nonnegative integer argument {\sf tag} identifies the
220particular tape for subsequent function or derivative evaluations.
221Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
222The optional integer arguments {\sf keep} and
223{\sf file} will be discussed in \autoref{tape}. We will refer to the
224sequence of statements executed between a particular call to
225{\sf trace\_on} and the following call to {\sf trace\_off} as an
226{\em active section} of the code. The same active section may be
227entered repeatedly, and one can successively generate several traces
228on distinct tapes by changing the value of {\sf tag}.
229Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
230the header file \verb=<adolc/taputil.h>=, which is included by the header
231\verb=<adolc/adouble.h>= automatically.
232
233Active sections may contain nested or even recursive calls to functions
234provided by the user. Naturally, their formal and actual parameters
235must have matching types. In particular, the functions must be
236compiled with their active variables declared as
237{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. 
238Variables of type {\sf adouble} may be declared outside an active section and need not
239go out of scope before the end of an active section.
240It is not necessary -- though desirable -- that free-store {\sf adouble}s
241allocated within
242an active section be deleted before its completion. The values of all
243{\sf adouble}s that exist at the beginning and end of an active section
244are automatically
245recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
246%
247%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
248\subsection{Selecting Independent and Dependent Variables}
249%
250One or more active variables that are read in or initialized to
251the values of constants or passive variables must be distinguished as
252independent variables. Other active variables that are similarly
253initialized may be considered as temporaries (e.g., a variable that
254accumulates the partial sums of a scalar product after being
255initialized to zero). In order to distinguish an active variable {\sf x} as
256independent, ADOL-C requires an assignment of the form
257\begin{center}
258{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
259\end{center}
260This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
261precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
262other values subsequently. Similarly, one or more active variables {\sf y}
263must be distinguished as dependent by an assignment of the form
264\begin{center}
265{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$ 
266\end{center}
267which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
268by any other assignment to {\sf y}. However, a dependent variable {\sf y} 
269may have been assigned other real values previously, and it could even be an
270independent variable as well.  The derivative values calculated after
271the
272completion of an active section always represent {\bf derivatives of the final
273values of the dependent variables with respect to the initial values of the
274independent variables}.
275
276The order in which the independent and dependent variables are marked
277by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
278derivative evaluations. However, these variables do not have to be
279combined into contiguous vectors. ADOL-C counts the number of
280independent and dependent variable specifications within each active
281section and records them in the header of the tape.
282%
283%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
284\subsection{A Subprogram as an Active Section} 
285%
286As a generic example let us consider a C(++) function of the form
287shown in \autoref{code1}.
288%
289\begin{figure}[hbt]
290\framebox[\textwidth]{\parbox{\textwidth}{
291\begin{center}
292\begin{tabbing}
293{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
294\>{\sf  double *x,} \> // independent variable vector \\
295\>{\sf  double *y,} \> // dependent variable vector  \\ 
296\> {\sf int *k, } \> // integer parameters \\ 
297\>{\sf  double *z)}  \> // real parameters \\
298{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
299\>{\sf double t = 0;}  \> // local variable declaration \\
300\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
301\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
302\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
303\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
304{\sf  \} } \>  \> // end of function
305\end{tabbing}
306\end{center}
307}} 
308\caption{Generic example of a subprogram to be activated}
309\label{code1}
310\end{figure}
311%
312
313If {\sf eval} is to be called from within an active C(++)
314section with {\sf x}
315and {\sf y} as vectors of {\sf adouble}s and the other parameters
316passive, then one merely has to change the type declarations of all
317variables that depend on {\sf x} from {\sf double} or {\sf float} to
318{\sf adouble}. Subsequently, the subprogram must be compiled with the
319header file \verb=<adolc/adouble.h>= included as described
320in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
321still to be called with integer and real arguments, possibly from
322a program written in Fortran77, which  does not allow overloading.
323
324To automatically compute derivatives of the dependent
325variables {\sf y} with respect to the independent variables {\sf x}, we
326can make the body of the function into an active section. For
327example, we may modify the previous program segment
328as in \autoref{adolcexam}.
329The renaming and doubling up of the original independent and dependent
330variable vectors by active counterparts may seem at first a bit clumsy.
331However, this transformation has the advantage that the calling
332sequence and the computational part, i.e., where the function is
333really evaluated, of {\sf eval} remain completely
334unaltered. If the temporary variable {\sf t} had remained a {\sf double},
335the code would not compile, because of a type conflict in the assignment
336following the declaration. More detailed example codes are listed in
337\autoref{example}.
338
339\begin{figure}[htb]
340\framebox[\textwidth]{\parbox{\textwidth}{
341\begin{center}
342\begin{tabbing}
343{\sf void eval(} \= {\sf  int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\
344\> {\sf double *px,} \> // independent passive variable vector \\
345\> {\sf double *py,} \> // dependent passive variable vector  \\ 
346\> {\sf int *k,}  \> // integer parameters \\
347\> {\sf double *z)} \> // parameter vector \\
348{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
349\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
350\>{\sf trace\_on(tag);} \> // start tracing  \\
351\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
352\>{\sf x = new adouble[n];}\>// declare active independent variables \\ 
353\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
354\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
355\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
356\>{\sf adouble t = 0;}  \> // local variable declaration \\
357     \>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
358     \>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
359     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
360     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
361     \>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
362     \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
363     \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
364     \>{\sf  delete[] y;} \>// delete dependent active variables \\
365     \>{\sf  delete[] x;} \>// delete independent active variables \\
366     \>{\sf trace\_off();} \> // complete tape \\
367{\sf  \}}   \>\> // end of function
368\end{tabbing} 
369\end{center}}}
370\caption{Activated version of the code listed in \autoref{code1}}
371\label{adolcexam}
372\end{figure}
373%
374%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
375\subsection{Overloaded Operators and Functions}
376\label{OverOper}
377%
378As in the subprogram discussed above, the actual computational
379statements of a C(++) code need not be altered for the purposes of
380automatic differentiation. All arithmetic operations, as well as the
381comparison and assignment operators, are overloaded, so any or all of
382their operands can be an active variable. An {\sf adouble x} occurring
383in a comparison operator is effectively replaced by its real value
384{\sf x.value()}. Most functions contained in the ANSI C standard for
385the math library are overloaded for active arguments. The only
386exceptions are the non-differentiable functions {\sf fmod} and
387{\sf modf}. Otherwise, legitimate C code in active sections can remain
388completely unchanged, provided the direct output of active variables
389is avoided. The rest of this subsection may be skipped by first time
390users who are not worried about marginal issues of differentiability
391and efficiency.
392
393The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
394properly differentiable at the origin, which raises the question of
395how this exception ought to be handled. Fortunately, one can easily
396see that {\sf fabs(x)} and all its compositions with smooth
397functions are still directionally differentiable. These
398directional derivatives of arbitrary order can be propagated in the
399forward mode without any ambiguity. In other words, the forward mode as
400implemented in ADOL-C  computes Gateaux derivatives
401in certain directions, which reduce to Fr\'echet derivatives only
402if the dependence on the direction is linear. Otherwise,
403the directional derivatives are merely positively homogeneous with
404respect to the scaling of the directions.
405For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
406the origin somewhat arbitrarily to zero.
407
408We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
409arguments, so that function and derivative values are obtained consistent
410with those of {\sf fabs} according to the identities
411\[
412 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
413 \max(a,b) = [a+b+|a-b|]/2 \quad .
414\]
415These relations cannot hold if either $a$ or $b$ is infinite, in which
416case {\sf fmin} or {\sf fmax} and their derivatives may still be well
417defined. It should be noted that the directional differentiation of
418{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
419the corresponding assignment based on the sign of $a-b$. For example,
420the statement
421\begin{center}
422 {\sf if (a $<$ b) c = a; else c = b;}
423\end{center} 
424yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
425the incorrect directional derivative value
426{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
427{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
428should be avoided by use of the function $\sf fmin(a,b)$. There
429are also versions of {\sf fmin} and {\sf fmax} for two passive
430arguments and mixed passive/active arguments are handled by
431implicit conversion.
432On the function class obtained by composing the modulus with real
433analytic functions, the concept of directional differentiation can be
434extended to the propagation of unique one-sided Taylor expansions.
435The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded
436on the tape.
437
438The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
439functions have infinite slopes at the boundary points of their domains.
440At these marginal points the derivatives are set by ADOL-C to
441either {\sf $\pm$InfVal}, 0
442or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
443parameters, see \autoref{Customizing}.
444On IEEE machines {\sf InfVal} can be set to the special value
445{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
446For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ 
447of {\sf b}~=~{\sf sqrt(a)} is set to
448\[
449{\sf b}^\prime = \left\{
450\begin{array}{ll}
451\sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0  \\
4520&\mbox{if}\;\;{\sf a}^\prime =0 \\
453\sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\
454\end{array} \right\enspace .
455\]
456In other words, we consider {\sf a} and
457consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
458all computed Taylor coefficients are zero.
459
460The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
461it is defined for the corresponding {\sf double} arguments. If {\sf x} is
462negative, however, the partial derivative with respect to an integral exponent
463is set to zero.
464%Similarly, the partial of {\bf pow} with respect to both arguments
465%is set to zero at the origin, where both arguments vanish.     
466The derivatives of the step functions
467{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
468arguments {\sf x}. The result values of the step functions
469are recorded on the tape and can later be checked to recognize
470whether a step to another level was taken during a forward sweep
471at different arguments than at taping time.
472
473Some C implementations supply other special
474functions, in particular the error function {\sf erf(x)}. For the
475latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which
476has been commented out for systems on which the {\sf double} valued version
477is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
478postfix) are available for {\sf adouble}s.
479%
480% XXX: Vector and matrix class have to be reimplemented !!!
481%
482% and also the
483%active subscripts described in the \autoref{act_subscr}.
484Ambiguous statements like {\sf a += a++;} must be
485avoided because the compiler may sequence the evaluation of the
486overloaded
487expression differently from the original in terms of {\sf double}s.
488
489As we have indicated above, all subroutines called with active arguments
490must be modified or suitably overloaded. The simplest procedure is
491to declare the local variables of the function as active so that
492their internal calculations are also recorded on the tape.
493Unfortunately, this approach is likely to be unnecessarily inefficient
494and inaccurate if the original subroutine evaluates a special function
495that is defined as the solution of a particular mathematical problem.
496The most important examples are implicit functions, quadratures,
497and solutions of ordinary differential equations. Often
498the numerical methods for evaluating such special functions are
499elaborate, and their internal workings are not at all differentiable in
500the data. Rather than differentiating through such an adaptive
501procedure, one can obtain first and higher derivatives directly from
502the mathematical definition of the special function. Currently this
503direct approach has been implemented only for user-supplied quadratures
504as described in \autoref{quadrat}.
505%
506%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
507\subsection{Reusing the Tape for Arbitrary Input Values}
508\label{reuse_tape}
509%
510In some situations it may be desirable to calculate the value and
511derivatives of a function at arbitrary arguments by using a tape of
512the function evaluation at one argument and reevaluating the
513function  and its derivatives using the given ADOL-C
514routines. This approach can
515significantly reduce run times, and it
516also allows to port problem functions, in the form of the 
517corresponding tape files, into a computing environment that
518does not support C++ but does support C or Fortran. 
519Therefore, the routines provided by ADOL-C for the evaluation of derivatives
520can be used to at arguments $x$ other than the
521point at which the tape was generated, provided there are
522no user defined quadratures and all comparisons involving
523{\sf adouble}s yield the same result. The last condition
524implies that the control flow is unaltered by the change
525of the independent variable values. Therefore, this sufficient
526condition is tested by ADOL-C and if it is not met
527the ADOL-C routine called for derivative calculations indicates this
528contingency through its return value. Currently, there are six return values,
529see \autoref{retvalues}.
530\begin{table}[h]
531\center\small
532\begin{tabular}{|r|l|}\hline
533 +3 &
534\begin{minipage}{12.5cm}
535\vspace*{1ex}
536The function is locally analytic.
537\vspace*{1ex}
538\end{minipage} \\ \hline
539 +2 &
540\begin{minipage}{12.5cm}
541\vspace*{1ex}
542The function is locally analytic but the sparsity
543structure (compared to the situation at the  taping point)
544may have changed, e.g. while at taping arguments
545{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
546the argument currently used.
547\vspace*{1ex}
548\end{minipage} \\ \hline
549 +1 &
550\begin{minipage}{12.5cm}
551\vspace*{1ex}
552At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
553is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
554Lipschitz-continuous but possibly non-differentiable.
555\vspace*{1ex}
556\end{minipage} \\ \hline
557 0 &
558\begin{minipage}{12.5cm}
559\vspace*{1ex}
560Some arithmetic comparison involving {\sf adouble}s yields a tie.
561Hence, the function to be differentiated  may be discontinuous.
562\vspace*{1ex}
563\end{minipage} \\ \hline
564 -1 &
565\begin{minipage}{12.5cm}
566\vspace*{1ex}
567An {\sf adouble} comparison yields different results
568from the evaluation point at which the tape was generated.
569\vspace*{1ex}
570\end{minipage} \\ \hline
571 -2 &
572\begin{minipage}{12.5cm}
573\vspace*{1ex}
574The argument of a user-defined quadrature has changed
575from the evaluation point at which the tape was generated.
576\vspace*{1ex}
577\end{minipage} \\ \hline
578\end{tabular}
579\caption{Description of return values}
580\label{retvalues}
581\end{table}                           
582
583\begin{figure}[h]
584\centering\includegraphics[width=10.0cm]{tap_point}
585\caption{Return values around the taping point}
586\label{fi:tap_point}
587\end{figure}         
588
589In \autoref{fi:tap_point} these return values are illustrated.
590If the user finds the return value of an ADOL-C routine to be negative the
591taping process simply has to be repeated by executing the active section again.
592The crux of the problem lies in the fact that the tape records only
593the operations that are executed during one particular evaluation of the
594function.
595It also has no way to evaluate integrals since the corresponding
596quadratures are never recorded on the tape.
597Therefore, when there are user-defined quadratures the retaping is necessary at each
598new point. If there are only branches conditioned on {\sf adouble}
599comparisons one may hope that re-taping becomes unnecessary when
600the points settle down in some small neighborhood, as one would
601expect for example in an iterative equation solver.
602%
603%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
604\subsection{Conditional Assignments}
605\label{condassign}
606%
607It appears unsatisfactory that, for example, a simple table lookup
608of some physical property forces the re-recording of a possibly
609much larger calculation. However, the basic philosophy of ADOL-C
610is to overload arithmetic, rather than to generate a new program
611with jumps between ``instructions'', which would destroy the
612strictly sequential tape access and
613require the infusion of substantial compiler technology.
614Therefore, we introduce the two constructs of conditional
615assignments and active integers as partial remedies to the
616branching problem.
617
618In many cases, the functionality of branches
619can be replaced by conditional assignments. 
620For this purpose, we provide a special function called
621{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
622syntax of the conditional assignment
623\begin{center}
624    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} 
625\end{center}
626which C++ inherited from C. However, here the arguments are restricted to be
627active or passive scalar arguments, and all expression arguments
628are evaluated before the test on {\sf  b}, which is different from
629the usual conditional assignment or the code segment.
630
631Suppose the original program contains the code segment
632\begin{center}
633{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\
634\end{center}
635Here, only one of the expressions (or, more generally, program blocks)
636{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
637for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
638may first execute both branches and then pick either {\sf c}
639or {\sf d} using
640{\sf condassign(a,b,c,d)}. To maintain
641consistency with the original code, one has to make sure
642that the two branches do not have any side effects that can
643interfere with each other or may be important for subsequent
644calculations. Furthermore the test parameter {\sf b} has to be an
645{\sf adouble} or an {\sf adouble} expression. Otherwise the
646test condition {\sf b} is recorded on the tape as a {\em constant} with its
647run time value. Thus the original dependency of {\sf b} on
648active variables gets lost, for instance if {\sf b} is a comparison
649expression, see \autoref{OverOper}.
650If there is no {\sf else} part in a conditional assignment, one may call
651the three argument version
652{\sf condassign(a,b,c)}, which
653is logically equivalent to {\sf condassign(a,b,c,a)} in that
654nothing happens if {\sf b} is non-positive. 
655The header file \verb=<adolc/adouble.h>=
656contains also corresponding definitions of
657{\sf condassign(a,b,c,d)} 
658and {\sf condassign(a,b,c)} for
659passive {\sf double} arguments so that the modified code
660without any differentiation can be tested
661for correctness.
662%
663%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
664\subsection{Step-by-Step Modification Procedure}
665%
666To prepare a section of given C or C++ code for automatic
667differentiation as described above, one applies the following step-by-step procedure.
668\begin{enumerate}
669\item
670Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
671and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
672beginning and end of the active section.
673\item 
674Select the set of active variables, and change their type from
675{\sf double} or {\sf float} to {\sf adouble}.
676\item
677Select a sequence of independent variables, and initialize them with
678\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
679\item
680Select a sequence of dependent variables among the active variables,
681and pass their final values to passive variable or vectors thereof
682by \boldmath $\gg=$ \unboldmath assignments.
683\item 
684Compile the codes after including the header file \verb=<adolc/adouble.h>=.
685\end{enumerate}
686Typically, the first compilation will detect several type conflicts
687-- usually attempts to convert from active to passive
688variables or to perform standard I/O of active variables.
689Since all standard
690C programs can be activated by a mechanical application of the
691procedure above, the following section is of importance
692only to advanced users.
693%                                                                 
694%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
695\section{Numbering the Tapes and Controlling the Buffer}
696\label{tape}
697%
698The trace generated by the execution of an active section may stay
699within a triplet of internal arrays or it may be written out
700to three corresponding files. We will refer to these triplets as the
701tape array or tape file, in general tape, which may subsequently be
702used to evaluate the
703underlying function and its derivatives at the original point or at
704alternative arguments. If the active section involves user-defined
705quadratures it must be executed and
706re-taped at each new argument. Similarly, if conditions on
707{\sf adouble} values lead to a different program branch being taken at
708a new argument the evaluation process also needs to be re-taped at the
709new point. Otherwise, direct evaluation from
710the tape by the routine {\sf function} (\autoref{optdrivers}) is
711likely to be
712faster. The use of quadratures and the results of all comparisons on
713{\sf adouble}s are recorded on the tape so that {\sf function} and other
714forward routines stop and  return appropriate flags if their use without
715prior re-taping is unsafe. To avoid any re-taping certain types of
716branches can be recorded on the tape through
717the use of conditional assignments 
718described before in \autoref{condassign}.
719
720Several tapes may be generated and kept simultaneously.
721A tape array is used as a triplet of buffers or a tape file is generated if
722the length of any of the buffers exceeds the maximal array lengths of
723{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
724defined in the header file \verb=<adolc/usrparms.h>=
725and may be adjusted by the user in the header file before compiling
726the ADOL-C library, or on runtime using a file named \verb=.adolcrc=.
727
728For simple usage, {\sf trace\_on} may be called with only the tape
729{\sf tag} as argument, and {\sf trace\_off} may be called
730without argument. The optional integer argument {\sf keep} of
731{\sf trace\_on} determines whether the numerical values of all
732active variables are recorded in a buffered temporary array or file
733called the taylor stack.
734This option takes effect if
735{\sf keep} = 1 and prepares the scene for an immediately following
736gradient evaluation by a call to a routine implementing the reverse mode
737as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A
738file is used instead of an array if the size exceeds the maximal array
739length of {\sf TBUFSIZE} defined in \verb=<adolc/usrparms.h>= and may
740be adjusted in the same way like the other buffer sizes mentioned above.
741Alternatively, gradients may be evaluated by a call
742to {\sf gradient}, which includes a preparatory forward sweep
743for the creation of the temporary file. If omitted, the argument
744{\sf  keep} defaults to 0, so that no temporary
745taylor stack file is generated.
746
747By setting the optional integer argument {\sf file} of
748{\sf  trace\_off} to 1, the user may force a numbered  tape
749file to be written even if the tape array (buffer) does not overflow.
750If the argument {\sf file} is omitted, it
751defaults to 0, so that the tape array is written onto a tape file only
752if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
753
754After the execution of an active section, if a tape file was generated, i.e.,
755if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
756argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
757saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
758the current working directory) under filenames formed by
759the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
760  ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf
761  ADOLC\_TAYLORS\_NAME} defined in
762the header file \verb=<adolc/dvlparms.h>= appended with the number
763given as the {\sf tag} argument to {\sf trace\_on} and have the
764extension {\sf .tap}.
765
766 Later, all problem-independent routines
767like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
768expect as first argument a {\sf tag} to determine
769the tape on which their respective computational task is to be performed.
770By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
771several tapes for various function evaluations and subsequently perform
772function and derivative evaluations on one or more of them.
773
774For example, suppose one wishes to calculate for two smooth functions
775$f_1(x)$ and $f_2(x)$ 
776\[
777   f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x),
778\]
779and possibly higher derivatives where the two functions do not tie.
780Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
781one can generate two different tapes by calling {\sf trace\_on} with
782{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
783sections.
784Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
785current argument and then evaluate the gradient $\nabla f(x)$ by calling
786{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
787{\sf tag} = 2.
788%
789%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
790\subsection{Examining the Tape and Predicting Storage Requirements }
791\label{examiningTape}
792%
793At any point in the program, one may call the routine
794\begin{center}
795{\sf void tapestats(unsigned short tag, int* counts)}
796\end{center}
797with {\sf counts} beeing an array of at least eleven integers.
798The first argument {\sf tag} specifies the particular tape of
799interest. The components of {\sf counts} represent
800\[
801\begin{tabular}{ll}
802{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
803{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ 
804{\sf counts[2]}: & the maximal number of live active variables,\\
805{\sf counts[3]}: & the size of value stack (number of overwrites),\\
806{\sf counts[4]}: & the buffer size (a multiple of eight),
807\end{tabular}
808\]
809\[
810\begin{tabular}{ll}
811{\sf counts[5]}: & the total number of operations recorded,\\
812{\sf counts[6-10]}: & other internal information about the tape.
813\end{tabular}
814\]
815The values {\sf maxlive} = {\sf counts[2]} and {\sf vssize} = {\sf counts[3]} 
816determine the temporary
817storage requirements during calls to the routines
818implementing the forward and the reverse mode.
819For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
820forward mode involves apart from the tape buffers an array of
821 $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
822core and, in addition, a sequential data set called the value stack
823of {\sf vssize}$*${\sf keep} {\sf revreal}s if called with the
824option {\sf keep} $>$ 0. Here
825the type {\sf revreal} is defined as {\sf double} or {\sf float} in
826the header file \verb=<adolc/usrparms.h>=. The latter choice halves the storage
827requirement for the sequential data set, which stays in core if
828its length is less than {\sf TBUFSIZE} bytes and is otherwise written
829out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=.
830The drawback of the economical
831{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
832yield gradients and other adjoint vectors only in single-precision
833accuracy. This may be acceptable if the adjoint vectors
834represent rows of a Jacobian that is  used for the calculation of
835Newton steps. In its scalar version, the reverse mode implementation involves
836the same number of {\sf double}s and twice as many {\sf revreal}s as the
837forward mode implementation.
838The storage requirements of the vector versions of the forward mode and
839reverse mode implementation are equal to that of the scalar versions multiplied by
840the vector length.
841%
842%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
843\subsection{Customizing ADOL-C}
844\label{Customizing}
845%
846Based on the information provided by the routine {\sf tapestats}, the user may alter the
847following types and constant dimensions in the header file \verb=<adolc/usrparms.h>=
848to suit his problem and environment.
849
850\begin{description}
851\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integer determines the length of
852in\-ter\-nal buf\-fers (default: 65$\,$536). If the buffers are large enough to accommodate all
853required data, any file access is avoided unless {\sf trace\_off}
854is called with a positive argument. This desirable situation can
855be achieved for many problem functions with an execution trace of moderate
856size. Primarily these values occur as an argument
857to {\sf malloc}, so that setting it unnecessarily large may have no
858ill effects, unless the operating system prohibits or penalizes large
859array allocations.
860
861\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
862in\-ter\-nal buf\-fer for a taylor stack (default: 65$\,$536).
863
864\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
865
866\item[{\sf locint}{\rm :}] The range of the integer type
867{\sf locint} determines how many {\sf adouble}s can be simultaneously
868alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
869alive at any one time, the type {\sf locint} must be changed to
870 {\sf unsigned long}.
871
872\item[{\sf revreal}{\rm :}] The choice of this floating-point type
873trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions
874and their derivatives are always evaluated in double precision
875during forward sweeps, gradients and other adjoint vectors are obtained
876with the precision determined by the type {\sf revreal}. The less
877accurate choice {\sf revreal} = {\sf float} nearly halves the
878storage requirement during reverse sweeps.
879
880\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
881
882\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
883
884\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
885sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
886of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
887the default setting produces the standard {\sf Inf}. On non-IEEE machines
888change these values to produce a small {\sf InfVal} value and compare
889the results of two forward sweeps with different {\sf InfVal} settings
890to detect a ``vertical'' slope.
891
892\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
893
894\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
895sets the mathematically
896undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
897of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
898the default setting produces the standard {\sf NaN}. On non-IEEE machines
899change these values to produce a small {\sf NoNum} value and compare
900the results of two forward sweeps with different {\sf NoNum} settings
901to detect the occurrence of undefined derivative values.
902
903\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
904
905\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
906
907\item[{\sf ATRIG\_ERF}{\rm :}] By removing the comment signs
908the overloaded versions of the inverse hyperbolic functions and
909the error function are enabled (default: undefined).
910
911\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
912
913\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
914  used by ADOL-C. {\sf Malloc} will be used if this variable is
915  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
916  result caused by uninitialized memory.
917\end{description}
918%
919%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
920\subsection{Warnings and Suggestions for Improved Efficiency}
921\label{WarSug}
922%
923Since the type {\sf adouble} has a nontrivial constructor,
924the mere declaration of large {\sf adouble} arrays may take up
925considerable run time. The user should be warned against
926the usual Fortran practice of declaring fixed-size arrays
927that can accommodate the largest possible case of an evaluation program
928with variable dimensions. If such programs are converted to or written
929in C, the overloading in combination with ADOL-C will lead to very
930large run time increases for comparatively small values of the
931problem dimension, because the actual computation is completely
932dominated by the construction of the large {\sf adouble} arrays.
933The user is advised to
934create dynamic arrays of
935{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
936using {\sf delete}. For storage efficiency it is desirable that
937dynamic objects are created and destroyed in a last-in-first-out
938fashion.
939
940Whenever an {\sf adouble} is declared, the constructor for the type
941{\sf adouble} assigns it a nominal address, which we will refer to as
942its  {\em location}.  The location is of the type {\sf locint} defined
943in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy
944a range of contiguous locations. As long as the program execution
945never involves more than 65$\,$536 active variables, the type {\sf locint}
946may be defined as {\sf unsigned short}. Otherwise, the range may be
947extended by defining {\sf locint} as {\sf (unsigned) int} or
948{\sf (unsigned) long}, which may nearly double
949the overall mass storage requirement. Sometimes one can avoid exceeding
950the accessible range of {\sf unsigned short}s by using more local variables and deleting
951{\sf adouble}s  created by the new operator in a
952last-in-first-out
953fashion.  When memory for {\sf adouble}s is requested through a call to
954{\sf malloc()} or other related C memory-allocating
955functions, the storage for these {\sf adouble}s is allocated; however, the
956C++ {\sf adouble} constructor is never called.  The newly defined
957{\sf adouble}s are never assigned a location and are not counted in
958the stack of live variables. Thus, any results depending upon these
959pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
960  malloc() and related C memory-allocating
961functions when declaring adoubles (see the following paragraph).}
962%
963% XXX: Vector and matrix class have to be reimplemented !!!
964%
965%The same point applies, of course,
966% for active vectors.
967
968When an {\sf adouble}
969%
970% XXX: Vector and matrix class have to be reimplemented !!!
971%
972% or {\bf adoublev}
973goes out of
974scope or is explicitly deleted, the destructor notices that its
975location(s) may be
976freed for subsequent (nominal) reallocation. In general, this is not done
977immediately but is delayed until the locations to be deallocated form a
978contiguous tail of all locations currently being used. 
979
980 As a consequence of this allocation scheme, the currently
981alive {\sf adouble} locations always form a contiguous range of integers
982that grows and shrinks like a stack. Newly declared {\sf adouble}s are
983placed on the top so that vectors of {\sf adouble}s obtain a contiguous
984range of locations. While the C++ compiler can be expected to construct
985and destruct automatic variables in a last-in-first-out fashion, the
986user may upset this desirable pattern by deleting free-store {\sf adouble}s
987too early or too late. Then the {\sf adouble} stack may grow
988unnecessarily, but the numerical results will still be
989correct, unless an exception occurs because the range of {\sf locint}
990is exceeded. In general, free-store {\sf adouble}s
991%
992% XXX: Vector and matrix class have to be reimplemented !!!
993%
994%and {\bf adoublev}s
995should be deleted in a last-in-first-out fashion toward the end of
996the program block in which they were created.
997When this pattern is maintained, the maximum number of
998{\sf adouble}s alive and, as a consequence, the
999randomly accessed storage space
1000of the derivative evaluation routines is bounded by a
1001small multiple of the memory used in the relevant section of the
1002original program. Failure to delete dynamically allocated {\sf adouble}s
1003may cause that the  maximal number of {\sf adouble}s alive at one time will be exceeded
1004if the same active section is called repeatedly. The same effect
1005occurs if static {\sf adouble}s are used.
1006
1007To avoid the storage and manipulation of structurally
1008trivial derivative values, one should pay careful attention to
1009the naming of variables. Ideally, the intermediate
1010values generated during the evaluation of a vector function
1011should be assigned to program variables that are
1012consistently either active or passive, in that all their values
1013either are or are not dependent on the independent variables
1014in a nontrivial way. For example, this rule is violated if a temporary
1015variable is successively used to accumulate inner products involving
1016first only passive and later active arrays. Then the first inner
1017product and all its successors in the data dependency graph become
1018artificially active and the derivative evaluation routines
1019described later will waste
1020time allocating and propagating
1021trivial or useless derivatives. Sometimes even values that do
1022depend on the independent variables may be of only transitory
1023importance and may not affect the dependent variables. For example,
1024this is true for multipliers that are used to scale linear
1025equations, but whose values do not influence the dependent
1026variables in a mathematical sense. Such dead-end variables
1027can be deactivated by the use of the {\sf value} function, which
1028converts {\sf adouble}s to {\sf double}s. The deleterious effects
1029of unnecessary activity are partly alleviated by run time
1030activity flags in the derivative routine
1031{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1032%
1033%
1034%
1035%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1036\section{Easy-To-Use Drivers}
1037\label{drivers}
1038%
1039For the convenience of the user, ADOL-C provides several
1040easy-to-use drivers that compute the most frequently required
1041derivative objects. Throughout, we assume that after the execution of an
1042active section, the corresponding tape with the identifier {\sf tag}
1043contains a detailed record of the computational process by which the
1044final values $y$ of the dependent variables were obtained from the
1045values $x$ of the independent variables. We will denote this functional
1046relation between the input variables $x$ and the output variables $y$ by
1047\[
1048F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
1049\]
1050The return value of all drivers presented in this section
1051indicate the validity of the tape as explained in \autoref{reuse_tape}.
1052The presented drivers are all C functions and therefore can be used within
1053C and C++ programs. Some Fortran-callable companions can be found
1054in the appropriate header files.
1055%
1056%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1057\subsection{Drivers for Optimization and Nonlinear Equations}
1058%
1059\label{optdrivers}
1060%
1061The drivers provided for solving optimization problems and nonlinear
1062equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1063which is included automatically by the global header file \verb=<adolc/adolc.h>=
1064(see \autoref{ssec:DesIH}).
1065
1066The routine {\sf function} allows to evaluate the desired function from
1067the tape instead of executing the corresponding source code:
1068%
1069\begin{tabbing}
1070\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1071\>{\sf int function(tag,m,n,x,y)}\\
1072\>{\sf short int tag;}         \> // tape identification \\
1073\>{\sf int m;}                 \> // number of dependent variables $m$\\
1074\>{\sf int n;}                 \> // number of independent variables $n$\\
1075\>{\sf double x[n];}           \> // independent vector $x$ \\
1076\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1077\end{tabbing}
1078%
1079If the original evaluation program is available this double version
1080should be used to compute the function value in order to avoid the
1081interpretative overhead. 
1082
1083For the calculation of whole derivative vectors and matrices up to order
10842 there are the following procedures:
1085%
1086\begin{tabbing}
1087\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1088\>{\sf int gradient(tag,n,x,g)}\\
1089\>{\sf short int tag;}         \> // tape identification \\
1090\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1091\>{\sf double x[n];}           \> // independent vector $x$ \\
1092\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1093\end{tabbing}
1094%
1095\begin{tabbing}
1096\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1097\>{\sf int jacobian(tag,m,n,x,J)}\\
1098\>{\sf short int tag;}         \> // tape identification \\
1099\>{\sf int m;}                 \> // number of dependent variables $m$\\
1100\>{\sf int n;}                 \> // number of independent variables $n$\\
1101\>{\sf double x[n];}           \> // independent vector $x$ \\
1102\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1103\end{tabbing}
1104%
1105\begin{tabbing}
1106\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1107\>{\sf int hessian(tag,n,x,H)}\\
1108\>{\sf short int tag;}         \> // tape identification \\
1109\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1110\>{\sf double x[n];}           \> // independent vector $x$ \\
1111\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1112\end{tabbing}
1113%
1114The driver routine {\sf hessian} computes only the lower half of
1115$\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ 
1116of {\sf H} allocated as a square array remain untouched during the call
1117of {\sf hessian}. Hence only $i+1$ {\sf double}s  need to be
1118allocated starting at the position {\sf H[i]}.
1119
1120To use the full capability of automatic differentiation when the
1121product of derivatives with certain weight vectors or directions are needed, ADOL-C offers
1122the following four drivers: 
1123%
1124\begin{tabbing}
1125\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1126\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1127\>{\sf short int tag;}         \> // tape identification \\
1128\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1129\>{\sf int n;}                 \> // number of independent variables $n$\\
1130\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1131\>{\sf double x[n];}           \> // independent vector $x$ \\
1132\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1133\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1134\end{tabbing}
1135If a nonzero value of the parameter {\sf repeat} indicates that the
1136routine {\sf vec\_jac} has been called at the same argument immediately
1137before, the internal forward mode evaluation will be skipped and only
1138reverse mode evaluation with the corresponding arguments is executed
1139resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1140%
1141\begin{tabbing}
1142\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1143\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1144\>{\sf short int tag;}         \> // tape identification \\
1145\>{\sf int m;}                 \> // number of dependent variables $m$\\
1146\>{\sf int n;}                 \> // number of independent variables $n$\\
1147\>{\sf double x[n];}           \> // independent vector $x$\\
1148\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1149\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1150\end{tabbing}
1151%
1152\begin{tabbing}
1153\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1154\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1155\>{\sf short int tag;}         \> // tape identification \\
1156\>{\sf int n;}                 \> // number of independent variables $n$\\
1157\>{\sf double x[n];}           \> // independent vector $x$\\
1158\>{\sf double v[n];}           \> // tangent vector $v$\\
1159\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1160\end{tabbing}
1161%
1162\begin{tabbing}
1163\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1164\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1165\>{\sf short int tag;}         \> // tape identification \\
1166\>{\sf int n;}                 \> // number of independent variables $n$\\
1167\>{\sf int p;}                 \> // number of columns in $V$\\
1168\>{\sf double x[n];}           \> // independent vector $x$\\
1169\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1170\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1171\end{tabbing}
1172%
1173\begin{tabbing}
1174\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1175\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1176\>{\sf short int tag;}         \> // tape identification \\
1177\>{\sf int m;}                 \> // number of dependent variables $m$\\
1178\>{\sf int n;}                 \> // number of independent variables $n$\\
1179\>{\sf double x[n];}           \> // independent vector $x$\\
1180\>{\sf double v[n];}           \> // tangent vector $v$\\
1181\>{\sf double u[m];}           \> // range weight vector $u$ \\
1182\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1183\end{tabbing}
1184%
1185The next procedure allows the user to perform Newton steps only
1186having the corresponding tape at hand:
1187%
1188\begin{tabbing}
1189\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1190\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1191\>{\sf short int tag;}         \> // tape identification \\
1192\>{\sf int n;}                 \> // number of independent variables $n$\\
1193\>{\sf double x[n];}           \> // independent vector $x$ as\\
1194\>{\sf double b[n];}           \> // in: right-hand side b, out: result $w$ of
1195$F(x)w = b$\\
1196\>{\sf int mode;}              \> // option to choose different solvers
1197\end{tabbing}
1198%
1199On entry, parameter {\sf b} of the routine {\sf jac\_solv}
1200contains the right-hand side of the equation $F(x)w = b$ to be solved. On exit,
1201{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only
1202the Jacobian of the function
1203given by the tape labeled with {\sf tag} is provided internally.
1204The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
1205solution of the equation is calculated if {\sf mode} = 2.
1206Hence, it is possible to compute the
1207LU-factorization only once. Then the equation can be solved for several
1208right-hand sides $b$ without calculating the Jacobian and
1209its factorization again. 
1210
1211If the original evaluation code of a function contains neither
1212quadratures nor branches, all drivers described above can be used to
1213evaluate derivatives at any argument in its domain. The same still
1214applies if there are no user defined quadratures and
1215all comparisons  involving {\sf adouble}s have the same result as
1216during taping. If this assumption is falsely made all drivers
1217while internally calling the forward mode evaluation will return the value -1 or -2
1218as already specified in \autoref{reuse_tape}
1219%
1220%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1221\subsection{Drivers for Ordinary Differential Equations}
1222\label{odedrivers}
1223%
1224When $F$ is the right-hand side of an (autonomous) ordinary
1225differential equation 
1226\[
1227x^\prime(t) \; = \; F(x(t)) , 
1228\] 
1229we must have $m=n$. Along any solution path $x(t)$ its Taylor
1230coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
1231the relation
1232\[
1233 x_{i+1} = \frac{1}{1+i} y_i.
1234\]
1235with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
1236\[
1237 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
1238\]
1239defined by an autonomous right-hand side $F$ recorded on the tape.
1240Using this relation, one can generate the Taylor coefficients $x_i$,
1241$i \le deg$,
1242recursively from the current point $x_0$. This task is achieved by the
1243driver routine {\sf forode} defined as follows:
1244%
1245\begin{tabbing}
1246\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1247\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1248\>{\sf short int tag;}         \> // tape identification \\
1249\>{\sf int n;}                 \> // number of state variables $n$\\
1250\>{\sf double tau;}            \> // scaling parameter\\
1251\>{\sf int dol;}               \> // degree on previous call\\
1252\>{\sf int deg;}               \> // degree on current call\\
1253\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1254\end{tabbing}
1255%
1256If {\sf dol} is positive, it is assumed that {\sf forode}
1257has been called before at the same point so that all Taylor coefficient
1258vectors up to the {\sf dol}-th are already correct.
1259
1260Subsequently one may call the driver routine {\sf reverse} or corresponding
1261low level routines as explained in the \autoref{forw_rev} and
1262\autoref{forw_rev_ad}, respectively, to compute
1263the family of square matrices {\sf Z[n][n][deg]} defined by
1264\[
1265Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
1266\]
1267with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}.
1268
1269For the numerical solutions of ordinary differential equations,
1270one may also wish to calculate the Jacobians
1271\begin{equation} 
1272\label{eq:bees}
1273B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1274\end{equation}
1275which exist provided $F$ is sufficiently smooth. These matrices can
1276be obtained from the partial derivatives $\partial y_i/\partial x_0$
1277by an appropriate version of the chain rule.
1278To compute the total derivatives $B = (B_j)_{0\leq j <d}$
1279defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
1280matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
1281corresponding evaluation of the {\sf hov\_reverse} function. The interface of
1282{\sf accode} is defined as follows:
1283%
1284\begin{tabbing}
1285\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1286\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1287\>{\sf int n;}                 \> // number of state variables $n$ \\
1288\>{\sf double tau;}            \> // scaling parameter\\
1289\>{\sf int deg;}               \> // degree on current call\\
1290\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1291\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1292\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1293\end{tabbing}
1294%
1295Sparsity information can be exploited by {\sf accode} using the array {\sf
1296nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1297reverse} or the corresponding basic routines as explained below in
1298\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1299non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1300return
1301\[
1302  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1303\]
1304In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
1305sparsity pattern that fills in as $k$ grows. Note, that there need to be no
1306loss in computational efficiency if a time-dependent ordinary differential equation
1307is rewritten in autonomous form.
1308
1309The prototype of the ODE-drivers {\sf forode} and {\sf accode} is contained in the header file
1310\verb=<adolc/drivers/odedrivers.h>=. The global header file
1311\verb=<adolc/adolc.h>=
1312includes this file automatically, see \autoref{ssec:DesIH}.
1313
1314An example program using the procedures {\sf forode} and {\sf accode} together
1315with more detailed information about the coding can be found in
1316\autoref{exam:ode}. The corresponding source code
1317\verb=odexam.cpp= is contained in the subdirectory
1318\verb=examples=.
1319%
1320%
1321%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1322\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1323\label{sparse}
1324%
1325Quite often, the Jacobians and Hessians that have to be computed are sparse
1326matrices. Therefore, ADOL-C provides additionally drivers that
1327allow the exploitation of sparsity. The exploitation of sparsity is
1328frequently based on {\em graph coloring} methods, discussed
1329for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
1330rely on the the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
1331ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
1332separately to use the sparse drivers described here. ColPack is available for download at
1333\verb=http://www.cscapes.org/coloringpage/software.htm=. More information about the required
1334installation of ColPack is given in \autoref{install}.
1335%
1336\subsubsection*{Sparse Jacobians and Sparse Hessians}
1337%
1338To compute the entries of sparse Jacobians and sparse Hessians,
1339respectively, in coordinate format one may use the drivers:
1340\begin{tabbing}
1341\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1342\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1343\>{\sf short int tag;}         \> // tape identification \\
1344\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1345\>{\sf int n;}                 \> // number of independent variables $n$\\
1346\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1347\>{\sf double x[n];}           \> // independent vector $x$ \\
1348\>{\sf int nnz;}               \> // number of nonzeros \\ 
1349\>{\sf unsigned int rind[nnz];}\> // row index\\ 
1350\>{\sf unsigned int cind[nnz];}\> // column index\\ 
1351\>{\sf double values[nnz];}    \> // non-zero values\\ 
1352\>{\sf int options[4];}        \> // array of control parameters\\ 
1353\end{tabbing}
1354%
1355\begin{tabbing}
1356\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1357\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1358\>{\sf short int tag;}         \> // tape identification \\
1359\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1360\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1361\>{\sf double x[n];}           \> // independent vector $x$ \\
1362\>{\sf int nnz;}               \> // number of nonzeros \\ 
1363\>{\sf unsigned int rind[nnz];}\> // row indices\\ 
1364\>{\sf unsigned int cind[nnz];}\> // column indices\\ 
1365\>{\sf double values[nnz];}    \> // non-zero values  \\
1366\>{\sf int options[2];}        \> // array of control parameters\\ 
1367\end{tabbing}
1368%
1369Once more, the input variables are the identifier for the internal
1370representation {\sf tag}, if required the number of dependents {\sf m},
1371and the number of independents {\sf n} for a consistency check.
1372Furthermore, the flag {\sf repeat=0} indicates that the functions are called
1373at a point with a new sparsity structure, whereas  {\sf repeat=1} results in
1374the re-usage of the sparsity pattern from the previous call.
1375The current values of the independents are given by the array {\sf x}.
1376The input/output
1377variable {\sf nnz} stores the number of the nonzero entries.
1378Therefore, {\sf nnz} denotes also the length of the arrays {\sf r\_ind} storing
1379the row indices, {\sf c\_ind} storing the column indices, and
1380{\sf values} storing the values of the nonzero entries.
1381If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
1382the functions determine the number of nonzeros for the sparsity pattern
1383defined by the value of {\sf x}, allocate appropriate arrays {\sf r\_ind},
1384{\sf c\_ind}, and {\sf values} and store the desired information in these
1385arrays.
1386During the next function call with {\sf repeat=1} the allocated memory
1387is reused such that only the values of the arrays are changed.   
1388Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
1389  repeat=0} the user is responsible for the deallocation of the array
1390 {\sf r\_ind}, {\sf c\_ind}, and {\sf values} using the function {\sf
1391   delete[]}!
1392
1393For each driver the array {\sf options} can be used to adapted the
1394computation of the sparse derivative matrices to the special
1395needs of application under consideration. Most frequently, the default options
1396will give a reasonable performance. The elements of the array {\sf options} control the action of
1397{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
1398\begin{table}[h]
1399\center
1400\begin{tabular}{|c|c|l|} \hline
1401component & value &  \\ \hline
1402{\sf options[0]} &    &  way of sparsity pattern computation \\
1403                 & 0  &  propagation of index domains (default) \\
1404                 & 1  &  propagation of bit pattern \\ \hline
1405{\sf options[1]} &    &  test the computational graph control flow \\
1406                 & 0  &  safe mode (default) \\
1407                 & 1  &  tight mode \\ \hline
1408{\sf options[2]} &    &  way of bit pattern propagation \\
1409                 & 0  &  automatic detection (default) \\
1410                 & 1  &  forward mode \\ 
1411                 & 2  &  reverse mode \\ \hline
1412{\sf options[3]} &    &  way of compression \\
1413                 & 0  &  column compression (default) \\
1414                 & 1  &  row compression \\ \hline
1415\end{tabular}
1416\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
1417\end{table}           
1418
1419The component {\sf options[1]} determines
1420the usage of the safe or tight mode of sparsity computation.
1421The first, more conservative option is the default. It accounts for all
1422dependences that might occur for any value of the
1423independent variables. For example, the intermediate
1424{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1425always assumed to depend on all independent variables that {\sf a} or {\sf b}
1426dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1427logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1428In contrast
1429the tight option gives this result only in the unlikely event of an exact
1430tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1431associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1432depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1433Obviously, the sparsity pattern obtained with the tight option may contain
1434more zeros than that obtained with the safe option. On the other hand, it
1435will only be valid at points belonging to an area where the function $F$ is locally
1436analytic and that contains the point at which the internal representation was
1437generated. Since generating the sparsity structure using the safe version does not
1438require any reevaluation, it may thus reduce the overall computational cost
1439despite the fact that it produces more nonzero entries.
1440The value of {\sf options[2]} selects the direction of bit pattern propagation.
1441Depending on the number of independent $n$ and of dependent variables $m$ 
1442one would prefer the forward mode if $n$ is significant smaller than $m$ and
1443would otherwise use the reverse mode.
1444
1445 The elements of the array {\sf options} control the action of
1446{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
1447\begin{table}[h]
1448\center
1449\begin{tabular}{|c|c|l|} \hline
1450component & value &  \\ \hline
1451{\sf options[0]} &    &  test the computational graph control flow \\
1452                 & 0  &  safe mode (default) \\
1453                 & 1  &  tight mode \\ \hline
1454{\sf options[1]} &    &  way of recovery \\
1455                 & 0  &  indirect recovery (default) \\
1456                 & 1  &  direct recovery \\ \hline
1457\end{tabular}
1458\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
1459\end{table}           
1460
1461The described driver routines for the computation of sparse derivative
1462matrices are prototyped in the header file
1463\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1464global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1465Example codes illustrating the usage of {\sf
1466  sparse\_jac} and {\sf sparse\_hess} can be found in the file
1467\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
1468\verb=examples/additional_examples/sparse=.
1469%
1470%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1471\subsubsection*{Computation of Sparsity Pattern}
1472%
1473ADOL-C offers a convenient way of determining the 
1474sparsity structure of a Jacobian matrix using the function:
1475%
1476\begin{tabbing}
1477\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1478\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
1479\>{\sf short int tag;} \> // tape identification \\
1480\>{\sf int m;} \> // number of dependent variables $m$\\
1481\>{\sf int n;} \> // number of independent variables $n$\\
1482\>{\sf double x[n];} \> // independent variables $x_0$\\
1483\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
1484\>{\sf int options[2];} \> // array of control parameters
1485\end{tabbing}
1486%
1487The sparsity pattern of the
1488Jacobian is computed in a compressed row format. For this purpose,
1489{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
1490  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
1491During the call of  {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
1492entries in row $i$ of the Jacobian is determined for all $1\le i\le
1493m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
1494points to a block of $\hat{n}_i+1$ {\sf  unsigned int} for all $1\le
1495i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
1496column indices of the $j$ nonzero entries in the $i$th row are stored
1497in the components  {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][j]}.
1498
1499The elements of the array {\sf options} control the action of
1500{\sf jac\_pat} according to \autoref{options}.
1501\begin{table}[h]
1502\center
1503\begin{tabular}{|c|c|l|} \hline
1504component & value &  \\ \hline
1505{\sf options[0]} &    &  way of sparsity pattern computation \\
1506                 & 0  &  propagation of index domains (default) \\
1507                 & 1  &  propagation of bit pattern \\ \hline
1508{\sf options[1]} &    &  test the computational graph control flow \\
1509                 & 0  &  safe mode (default) \\
1510                 & 1  &  tight mode \\ \hline
1511{\sf options[2]} &    &  way of bit pattern propagation \\
1512                 & 0  &  automatic detection (default) \\
1513                 & 1  &  forward mode \\ 
1514                 & 2  &  reverse mode \\ \hline
1515\end{tabular}
1516\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
1517\end{table}           
1518The value of {\sf options[0]} selects the way to compute the sparsity
1519pattern. The component {\sf options[1]} determines
1520the usage of the safe or tight mode of bit pattern propagation.
1521The first, more conservative option is the default. It accounts for all
1522dependences that might occur for any value of the
1523independent variables. For example, the intermediate
1524{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1525always assumed to depend on all independent variables that {\sf a} or {\sf b}
1526dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1527logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1528In contrast
1529the tight option gives this result only in the unlikely event of an exact
1530tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1531associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1532depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1533Obviously, the sparsity pattern obtained with the tight option may contain
1534more zeros than that obtained with the safe option. On the other hand, it
1535will only be valid at points belonging to an area where the function $F$ is locally
1536analytic and that contains the point at which the internal representation was
1537generated. Since generating the sparsity structure using the safe version does not
1538require any reevaluation, it may thus reduce the overall computational cost
1539despite the fact that it produces more nonzero entries. The value of
1540{\sf options[2]} selects the direction of bit pattern propagation.
1541Depending on the number of independent $n$ and of dependent variables $m$ 
1542one would prefer the forward mode if $n$ is significant smaller than $m$ and
1543would otherwise use the reverse mode.
1544
1545The routine {\sf jac\_pat} may use the propagation of bitpattern to
1546determine the sparsity pattern. Therefore, a kind of ``strip-mining''
1547is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
1548the value of the constant {\sf PQ\_STRIPMINE\_MAX}
1549following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1550
1551The driver routine is prototyped in the header file
1552\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1553global header file \verb=<adolc/adolc.h>= (see
1554\autoref{ssec:DesIH}). The determination of sparsity patterns is
1555illustrated by the examples \verb=sparse_jacobian.cpp=
1556and \verb=jacpatexam.cpp=
1557contained in
1558\verb=examples/additional_examples/sparse=.
1559
1560To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
1561driver
1562\begin{tabbing}
1563\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1564\>{\sf int hess\_pat(tag, n, x, HP, options)}\\
1565\>{\sf short int tag;}       \> // tape identification \\
1566\>{\sf int n;}               \> // number of independent variables $n$\\
1567\>{\sf double x[n];}         \> // independent variables $x_0$\\
1568\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
1569\>{\sf int option;}          \> // control parameter
1570\end{tabbing}
1571where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
1572 unsigned int}s.
1573After the function call {\sf HP} contains the sparsity pattern,
1574where {\sf HP[j][0]} contains the number of nonzero elements in the
1575 $j$th row for $1 \le j\le n$.
1576The components {\sf P[j][i]}, $0<${\sf i}~$\le$~{\sf P[j][0]} store the
1577 indices of these entries. For determining the sparsity pattern, ADOL-C uses
1578 the algorithm described in \cite{Wa05a}.  The parameter{\sf option} determines
1579the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
1580  option = 1}) of the computation of the sparsity pattern as described
1581above.
1582
1583This driver routine is prototyped in the header file
1584\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1585global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1586An example employing the procedure {\sf hess\_pat}  can be found in the file
1587\verb=sparse_hessian.cpp=  contained in
1588\verb=examples/additional_examples/sparse=.
1589%
1590%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1591\subsubsection*{Calculation of Seed Matrices}
1592%
1593To compute a compressed derivative matrix from a given sparsity
1594pattern, one has to calculate an appropriate seed matrix that can be
1595used as input for the derivative calculation. To facilitate the
1596generation of seed matrices for a sparsity pattern given in
1597row compressed form, ADOL-C provides the following two drivers,
1598which are based on the ColPack library:
1599\begin{tabbing}
1600\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1601\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
1602\>{\sf int m;} \> // number of dependent variables $m$\\
1603\>{\sf int n;} \> // number of independent variables $n$\\
1604\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
1605of Jacobian\\
1606\>{\sf double S[n][p];} \> // seed matrix\\
1607\>{\sf int p;} \> // number of columns in $S$
1608\end{tabbing}
1609The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1610number of independent variables {\sf n} and the sparsity pattern {\sf
1611  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1612{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1613pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1614determines the number of columns {\sf p} in the seed
1615matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1616 S} and initializes {\sf S} according to the graph coloring.
1617The coloring algorithm that is applied in {\sf
1618  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1619described earlier.
1620
1621\begin{tabbing}
1622\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1623\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
1624\>{\sf int n;} \> // number of independent variables $n$\\
1625\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
1626of Jacobian\\
1627\>{\sf double S[n][p];} \> // seed matrix\\
1628\>{\sf int p;} \> // number of columns in $S$
1629\end{tabbing}
1630The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1631and the sparsity pattern {\sf HP} of the Hessian computed for example
1632by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1633appropriate coloring of the adjacency graph defined by the sparsity
1634pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its
1635    compressed representation and a star coloring in the case of a direct recovery.
1636 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1637 S} and initializes {\sf S} according to the graph coloring.
1638The coloring algorithm applied in {\sf
1639  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1640described earlier.
1641
1642The specific set of criteria used to define a seed matrix $S$ depends
1643on whether the sparse derivative matrix
1644to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1645It also depends on whether the entries of the derivative matrix  are to be
1646recovered from the compressed representation \emph{directly}
1647(without requiring any further arithmetic) or \emph{indirectly} (for
1648example, by solving for unknowns via successive substitutions).
1649Appropriate recovery routines are provided by ColPack and used
1650in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1651the previous subsection. Examples with a detailed analysis of the
1652employed drivers for the exploitation of sparsity can be found in the
1653papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1654
1655
1656These driver routines are prototyped in
1657\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1658global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1659An example code illustrating the usage of {\sf
1660generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1661\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
1662%
1663%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1664\subsection{Higher Derivative Tensors}
1665\label{higherOrderDeriv}
1666%
1667Many applications in scientific computing need second- and higher-order
1668derivatives. Often, one does not require full derivative tensors but
1669only the derivatives in certain directions $s_i \in \R^{n}$.
1670Suppose a collection of $p$ directions
1671$s_i \in \R^{n}$ is given, which form a matrix
1672\[
1673S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
1674 \R^{n \times p}.
1675\]
1676One possible choice is $S = I_n$ with  $p = n$, which leads to
1677full tensors being evaluated.
1678ADOL-C provides the function {\sf tensor\_eval}
1679to calculate the derivative tensors
1680\begin{eqnarray}
1681\label{eq:tensor}
1682\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
1683     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} 
1684     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
1685\end{eqnarray}
1686simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
1687parameters:
1688%
1689\begin{tabbing}
1690\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1691\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
1692\>{\sf short int tag;}         \> // tape identification \\
1693\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1694\>{\sf int n;}                 \> // number of independent variables $n$\\
1695\>{\sf int d;}                 \> // highest derivative degree $d$\\
1696\>{\sf int p;}                 \> // number of directions $p$\\
1697\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
1698\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
1699\>{\sf double S[n][p];}        \> // seed matrix $S$
1700\end{tabbing}
1701%
1702Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory 
1703requirement can be reduced enormously. The collection of  tensors up to order $d$ comprises 
1704$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
1705greater or equal to $\binom{p+d}{d}$.
1706To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor
1707series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
1708approach is described in \cite{Griewank97}.
1709
1710The access of individual entries in symmetric tensors of
1711higher order is a little tricky. We always store the derivative
1712values in the two dimensional array {\sf tensor} and provide two
1713different ways of accessing them. 
1714The leading dimension of the tensor array ranges over
1715the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17160,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1717structure for all $i$. Each of them represents the symmetric tensors up to
1718order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 
1719%
1720The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
1721tensors are linearly ordered according to the tetrahedral
1722scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
1723case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ 
1724as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
1725$l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
1726function value $F_{i+1}$ itself and the gradient
1727$\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$
1728with $j=0$ for $k=1,\ldots,p$.
1729
1730For general $d$ we combine the variable
1731indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
1732where $j_k$ indicates differentiation with respect to variable
1733$x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1734no differentiation so that all lower derivatives are also
1735contained in the same data structure as described above for
1736the quadratic case. The location of the partial derivative specified
1737by $j$ is computed by the function
1738%
1739\begin{tabbing}
1740\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1741\>{\sf int address(d,$\,$j)} \\
1742\>{\sf int d;}                 \> // highest derivative degree $d$ \\
1743\>{\sf int j[d];}              \> // multi-index $j$
1744\end{tabbing}       
1745%
1746and it may thus be referenced as {\sf tensor[i][address(d,$\,$j)]}.
1747Notice that the address computation does depend on the degree $d$ 
1748but not on the number of directions $p$, which could theoretically be
1749enlarged without the need to reallocate the original tensor.
1750Also, the components of $j$ need to be non-increasing.
1751%
1752To some C programmers it may appear more natural to access tensor
1753entries by successive dereferencing in the form
1754{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1755We have also provided this mode, albeit with the restriction
1756that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1757In the second order case this means that the Hessian entries must be
1758specified in or below the diagonal. If this restriction is
1759violated the values are almost certain to be wrong and array bounds
1760may be violated. We emphasize that subscripting is not overloaded
1761but that {\sf tensorentry} is a conventional and
1762thus moderately efficient C pointer structure.
1763Such a pointer structure can be allocated and set up completely by the
1764function
1765%
1766\begin{tabbing}
1767\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1768\>{\sf void** tensorsetup(m,p,d,tensor)} \\
1769\>{\sf int m;}                 \> // number of dependent variables $n$ \\
1770\>{\sf int p;}                 \> // number of directions $p$\\
1771\>{\sf int d;}                 \> // highest derivative degree $d$\\
1772\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
1773\end{tabbing}     
1774%
1775Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1776$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1777 $d-1$ layers of pointers are set up so that the return value
1778allows the direct dereferencing of individual tensor elements.
1779
1780For example, suppose some active section involving  $m \geq 5$ dependents and
1781$n \geq 2$ independents has been executed and taped. We may
1782select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1783columns $s_1$ and $s_2$. Then we are able to execute the code segment
1784\begin{tabbing}
1785\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
1786              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}   
1787\end{tabbing}
1788This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
1789in both directions $s_1$ and
1790$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer
1791structure {\sf tensorentry} has been created. Now, 
1792the value of the mixed partial
1793\[
1794 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2
1795\]
1796can be recovered as
1797\begin{center}
1798   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][address(d,$\,$j)]},
1799\end{center}
1800where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). 
1801Analogously, the entry
1802\begin{center}   
1803   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][address(d,$\,$j)]}
1804\end{center}
1805with {\sf j} = (1,0,0) contains the first derivative of the third dependent
1806variable $F_3$ with respect to the first differentiation parameter $z_1$.
1807
1808Note, that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
1809array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly effects the values accessed
1810by {\sf tensorentry}.
1811%
1812When no more derivative evaluations are desired the pointer structure
1813{\sf tensorentry} can be deallocated by a call to the function
1814%
1815\begin{tabbing}
1816\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1817\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
1818\>{\sf int m;}                    \> // number of dependent variables $m$ \\
1819\>{\sf int p;}                    \> // number of independent variables $p$\\
1820\>{\sf int d;}                    \> // highest derivative degree $d$\\
1821\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} 
1822\end{tabbing} 
1823%
1824that does not deallocate the array {\sf tensor}.
1825
1826The drivers provided for efficient calculation of higher order
1827derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1828which is included by the global header file \verb=<adolc/adolc.h>= automatically
1829(see \autoref{ssec:DesIH}).
1830Example codes using the above procedures can be found in the files
1831\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
1832\verb=examples/additional_examples/taylor=.
1833%
1834%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1835\subsection{Derivatives of Implicit and Inverse Functions}
1836\label{implicitInverse}
1837%
1838Frequently, one needs derivatives of variables
1839$y \in \R^{m}$ that are implicitly defined as
1840functions of some variables $x \in \R^{n-m}$
1841by an algebraic system of equations
1842\[
1843G(z) \; = \; 0 \in \R^m \quad
1844{\rm with} \quad z = (y, x) \in \R^n .
1845\] 
1846Naturally, the $n$ arguments of $G$ need not be partitioned in
1847this regular fashion and we wish to provide flexibility for a
1848convenient selection of the $n-m$ {\em truly} independent
1849variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
1850that picks out these variables so that it is a column
1851permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
1852Then the nonlinear system
1853\[
1854  G(z) \; = \; 0, \quad P z =  x,                           
1855\] 
1856has a regular Jacobian, wherever the implicit function theorem
1857yields $y$ as a function of $x$. Hence, we may also write
1858\begin{equation}
1859\label{eq:inv_tensor}
1860F(z) = \left(\begin{array}{c}
1861                        G(z) \\
1862                        P z
1863                      \end{array} \right)\; \equiv \;
1864                \left(\begin{array}{c}
1865                        0 \\
1866                        P z
1867                      \end{array} \right)\; \equiv \; S\, x,
1868\end{equation}
1869where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
1870the original implicit functional relation between $x$ and $y$ as an inverse
1871relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
1872by marking $n-m$ of the independents also dependent. 
1873
1874Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1875seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1876with respect to $x \in \R^p$ by calling the following routine:
1877%
1878\begin{tabbing}
1879\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1880\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
1881\>{\sf short int tag;}         \> // tape identification \\
1882\>{\sf int n;}                 \> // number of variables $n$\\
1883\>{\sf int d;}                 \> // highest derivative degree $d$\\
1884\>{\sf int p;}                 \> // number of directions $p$\\
1885\>{\sf double z[n];}          \> // values of independent variables $z$\\
1886\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
1887\>{\sf double S[n][p];}        \> // seed matrix $S$
1888\end{tabbing}                 
1889%                 
1890The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1891{\sf tag} pointing to a tape for the evaluation of the inverse function
1892$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1893to the domain of that function. Individual derivative components can be
1894accessed in tensor exactly as in the explicit case described above.
1895
1896It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1897derivatives of $z$ with respect to $x$ that is defined by the equation
1898$F(z)=F(z_0)+S \, x$. In other words the base point at
1899which the inverse function is differentiated is given by $F(z_0)$.
1900The routine has no capability for inverting $F$ itself as
1901solving systems of nonlinear
1902equations $F(z)=0$ in the first place is not just a differentiation task.
1903However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1904useful for that purpose.
1905
1906As an example consider the following two nonlinear expressions
1907\begin{eqnarray*}
1908      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^\\
1909      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
1910\end{eqnarray*}   
1911The equations $G(z)=0$ describe the relation between the Cartesian
1912coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
1913Now, suppose we are interested in the derivatives of the second Cartesian
1914$y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
1915to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
1916could look simply like
1917%
1918\begin{tabbing}
1919\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
1920\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
1921\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
1922\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
1923\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
1924\end{tabbing}     
1925%
1926where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
1927In the last line the two independent variables {\sf z[1]} and
1928{\sf z[3]} are made
1929simultaneously dependent thus generating a square system that can be
1930inverted (at most arguments). The corresponding projection and seed
1931matrix are
1932\begin{eqnarray*}
1933P \;=\; \left( \begin{array}{cccc}
1934               1 & 0 & 0 & 0 \\
1935               0 & 0 & 1 & 0
1936            \end{array}\right) \quad \mbox{and} \quad
1937S^T \; = \; \left( \begin{array}{cccc}
1938               0 & 0 & 1 & 0 \\
1939               0 & 0 & 0 & 1
1940            \end{array}\right\enspace .
1941\end{eqnarray*}
1942Provided the vector {\sf zp} is consistent in that its Cartesian and polar
1943components describe the same point in the plane the resulting tuple
1944{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
1945$n=4$, $p=2$ and $d$
1946as desired will yield the implicit derivatives, provided
1947{\sf tensor} has been allocated appropriately of course and $S$ has the value
1948given above.
1949%
1950The example is untypical in that the implicit function could also be
1951obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
1952the subset of $z$ components that are to be considered as truly
1953independent can be selected and altered with next to no effort at all.
1954
1955The presented drivers are prototyped in the header file
1956\verb=<adolc/drivers/taylor.h>=. As indicated before this header
1957is included by the global header file \verb=<adolc/adolc.h>= automatically
1958(see \autoref{ssec:DesIH}).
1959The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
1960\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
1961show the application of the procedures described here.
1962%
1963%
1964%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
1965\section{Basic Drivers for the Forward and Reverse Mode}
1966\label{forw_rev_ad}
1967%
1968In this section, we present tailored drivers for different
1969variants of the forward mode and the reverse mode, respectively.
1970For a better understanding, we start with a short
1971description of the mathematical background.
1972
1973Provided no arithmetic exception occurs,
1974no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
1975{\sf fabs} does not yield zero,
1976and all special functions were evaluated in the
1977interior of their domains, the functional relation between the input
1978variables $x$
1979and the output variables $y$ denoted by $y=F(x)$ is in
1980fact analytic.  In other words, we can compute arbitrarily high
1981derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
1982by the active section.
1983We find it most convenient to describe and
1984compute derivatives in terms of univariate Taylor expansions, which
1985are truncated after the highest derivative degree $d$ that is desired
1986by the user. Let
1987\begin{equation}
1988\label{eq:x_of_t}
1989x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \;  I\!\!R \; \mapsto \;
1990I\!\!R^n
1991\end{equation} 
1992denote any vector polynomial in the scalar variable $t \in I\!\!R$.
1993In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
1994The Taylor coefficient vectors
1995\[ x_j \; = \; 
1996\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
1997\right |_{t=0}
1998\] 
1999are simply the scaled derivatives of $x(t)$ at the parameter
2000origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2001visualized as tangent and curvature at the base point $x_0$,
2002respectively.
2003Provided that $F$ is $d$ times continuously differentiable, it
2004follows from the chain rule that the image path
2005\begin{equation}
2006\label{eq:rela}
2007 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2008\end{equation}
2009is also smooth and has $(d+1)$ Taylor coefficient vectors
2010$y_j \in I\!\!R^m$ at $t=0$, so that
2011\begin{equation}
2012\label{eq:series}
2013y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
2014\end{equation}
2015Also as a consequence of the chain rule, one can observe that
2016each $y_j$ is uniquely and smoothly determined by the coefficient
2017vectors $x_i$ with $i \leq j$.  In particular we have
2018\begin{align}
2019\label{eq:y_0y_1}
2020  y_0 & = F(x_0) \nonumber \\
2021  y_1 & = F'(x_0) x_1 \nonumber\\
2022  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
2023  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
2024          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
2025  & \ldots\nonumber
2026\end{align}
2027In writing down the last equations we have already departed from the
2028usual matrix-vector notation. It is well known that the number of
2029terms that occur in these ``symbolic'' expressions for
2030the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2031the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
2032with $j$. Fortunately, this exponential growth does not occur
2033in automatic differentiation, where the many terms are somehow
2034implicitly combined  so that storage and operations count grow only
2035quadratically in the bound $d$ on $j$.
2036
2037Provided $F$ is analytic, this property is inherited by the functions
2038\[
2039y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
2040\]
2041and their derivatives satisfy the identities
2042\begin{equation}
2043\label{eq:identity}
2044\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
2045{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
2046\end{equation}
2047as established in \cite{Chri91a}. This yields in particular
2048\begin{align*}
2049  \frac{\partial y_0}{\partial x_0} =
2050  \frac{\partial y_1}{\partial x_1} =
2051  \frac{\partial y_2}{\partial x_2} =
2052  \frac{\partial y_3}{\partial x_3} =
2053  A_0 & = F'(x_0) \\
2054  \frac{\partial y_1}{\partial x_0} =
2055  \frac{\partial y_2}{\partial x_1} =
2056  \frac{\partial y_3}{\partial x_2} =
2057  A_1 & = F''(x_0) x_1 \\
2058  \frac{\partial y_2}{\partial x_0} =
2059  \frac{\partial y_3}{\partial x_1} =
2060  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
2061  \frac{\partial y_3}{\partial x_0} =
2062  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
2063          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
2064  & \ldots
2065\end{align*}
2066The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2067coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2068interest primarily in the context of ordinary differential
2069equations and differential algebraic equations.
2070
2071Given the tape of an active section and the coefficients $x_j$,
2072the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2073by appropriate calls to the ADOL-C forward mode implementations and
2074the ADOL-C reverse mode implementations. The scalar versions of the forward
2075mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2076to the $(y_j)_{j\leq d}$. The vector versions of the forward
2077mode propagate families of $p\geq 1$ such truncated Taylor series
2078in order to reduce the relative cost of the overhead incurred
2079in the tape interpretation. In detail, ADOL-C provides
2080\begin{tabbing}
2081\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2082\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
2083\>{\sf short int tag;}         \> // tape identification \\
2084\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2085\>{\sf int n;}                 \> // number of independent variables $n$\\
2086\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2087\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
2088\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
2089\end{tabbing}                 
2090for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2091$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2092flag {\sf keep} plays a similar role as in the call to 
2093{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2094the first Taylor coefficients of all intermediate quantities into a buffered
2095temporary file, i.e., the value stack, in preparation for a subsequent
2096reverse mode evaluation. The value {\sf keep} $=1$
2097prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as exlained below.
2098
2099To compute first-order derivatives, one has
2100\begin{tabbing}
2101\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2102\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
2103\>{\sf short int tag;}         \> // tape identification \\
2104\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2105\>{\sf int n;}                 \> // number of independent variables $n$\\
2106\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2107\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2108\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
2109\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2110\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
2111\end{tabbing}                 
2112for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
2113$0\leq\text{\sf keep}\leq 2$, where
2114\begin{align*}
2115\text{\sf keep} = \left\{\begin{array}{cl}
2116       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2117       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2118       \end{array}\right.
2119\end{align*}
2120as exlained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
2121ADOL-C provides
2122\begin{tabbing}
2123\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2124\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
2125\>{\sf short int tag;}         \> // tape identification \\
2126\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2127\>{\sf int n;}                 \> // number of independent variables $n$\\
2128\>{\sf int p;}                 \> // number of directions\\
2129\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2130\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
2131\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2132\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x)X$
2133\end{tabbing}                 
2134For the computation of higher derivative, the driver
2135\begin{tabbing}
2136\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2137\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
2138\>{\sf short int tag;}         \> // tape identification \\
2139\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2140\>{\sf int n;}                 \> // number of independent variables $n$\\
2141\>{\sf int d;}                 \> // highest derivative degree $d$\\
2142\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2143\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2144\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
2145\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2146\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
2147\end{tabbing}
2148implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2149The rows of the matrix $X$ must correspond to the independent variables in the order of their
2150initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2151$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2152\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2153dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2154The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2155Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2156computes the values
2157$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2158$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2159$0\leq\text{\sf keep}\leq d+1$, with
2160\begin{align*}
2161\text{\sf keep}  \left\{\begin{array}{cl}
2162       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2163       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2164       \end{array}\right.
2165\end{align*}
2166Once more, there is also a vector version given by
2167\begin{tabbing}
2168\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2169\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2170\>{\sf short int tag;}         \> // tape identification \\
2171\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2172\>{\sf int n;}                 \> // number of independent variables $n$\\
2173\>{\sf int d;}                 \> // highest derivative degree $d$\\
2174\>{\sf int p;}                 \> // number of directions $p$\\
2175\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2176\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
2177\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2178\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
2179\end{tabbing}
2180for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
2181$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and 
2182$Y=[Y_1,Y_2,\ldots,Y_d]$.
2183
2184There are also overloaded versions providing a general {\sf forward}-call.
2185Details of the appropriate calling sequences are given in \autoref{forw_rev}.
2186
2187Once, the required information is generated due to a forward mode evaluation
2188with an approriate value of the parameter {\sf keep}, one may use the
2189following implementation variants of the reverse mode. To compute first-order derivatives
2190one can use
2191\begin{tabbing}
2192\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2193\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
2194\>{\sf short int tag;}         \> // tape identification \\
2195\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2196\>{\sf int n;}                 \> // number of independent variables $n$\\
2197\>{\sf double u[m];}           \> // weight vector $u$\\
2198\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
2199\end{tabbing}                 
2200as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2201the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2202{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2203  o}rder {\bf v}ector reverse mode driver is given by
2204\begin{tabbing}
2205\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2206\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
2207\>{\sf short int tag;}         \> // tape identification \\
2208\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2209\>{\sf int n;}                 \> // number of independent variables $n$\\
2210\>{\sf int q;}                 \> // number of weight vectors $q$\\
2211\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2212\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
2213\end{tabbing}                 
2214that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2215{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2216ADOL-C provides
2217\begin{tabbing}
2218\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2219\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
2220\>{\sf short int tag;}         \> // tape identification \\
2221\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2222\>{\sf int n;}                 \> // number of independent variables $n$\\
2223\>{\sf int d;}                 \> // highest derivative degree $d$\\
2224\>{\sf double u[m];}           \> // weight vector $u$\\
2225\>{\sf double Z[n][d+1];}      \> // resulting adjoints
2226\end{tabbing}                 
2227as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2228the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2229\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2230{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
2231\begin{tabbing}
2232\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2233\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
2234\>{\sf short int tag;}         \> // tape identification \\
2235\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2236\>{\sf int n;}                 \> // number of independent variables $n$\\
2237\>{\sf int d;}                 \> // highest derivative degree $d$\\
2238\>{\sf double U[q][m];}        \> // weight vector $u$\\
2239\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
2240\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
2241\end{tabbing}                 
2242as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2243the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2244\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2245{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2246After the function call, the last argument of {\sf hov\_reverse} 
2247contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2248has a value that characterizes the functional relation between the
2249$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2250$x_j$ as:
2251\begin{center}
2252\begin{tabular}{ll}
2253 0 & trivial \\
2254 1 & linear
2255\end{tabular} \hspace*{4ex}
2256\begin{tabular}{ll}
2257 2 & polynomial\\
2258 3 & rational
2259\end{tabular} \hspace*{4ex}
2260\begin{tabular}{ll}
2261 4 & transcendental\\
2262 5 & non-smooth
2263\end{tabular}
2264\end{center}
2265Here, ``trivial'' means that there is no dependence at all and ``linear'' means
2266that the partial derivative is a constant that
2267does not dependent on other variables either. ``Non-smooth'' means that one of
2268the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2269where it is not differentiable.  All positive labels
2270$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2271in fact be simpler, for example due to exact cancellations. 
2272
2273There are also overloaded versions providing a general {\sf reverse}-call.
2274Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
2275%
2276%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2277\section{Overloaded Forward and Reverse Calls}
2278\label{forw_rev}
2279%
2280In this section, the several versions of the {\sf forward} and
2281{\sf reverse} routines, which utilize the overloading capabilities
2282of C++, are described in detail. With exception of the bit pattern
2283versions all interfaces are prototyped in the header file
2284\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
2285and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
2286C and Fortran-callable versions prototyped in the same header file.
2287The bit pattern versions of {\sf forward} and {\sf reverse} introduced
2288in the \autoref{ProBit} are prototyped in the header file
2289\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2290file \verb=<adolc/interfaces.h>= automatically.
2291%
2292%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2293\subsection{The Scalar Case}
2294%
2295\label{ScaCas}
2296%     
2297Given any correct tape, one may call from within
2298the generating program, or subsequently during another run, the following
2299procedure:
2300%
2301\begin{tabbing}
2302\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2303\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
2304\>{\sf short int tag;}         \> // tape identification \\
2305\>{\sf int m;}                 \> // number of dependent variables $m$\\
2306\>{\sf int n;}                 \> // number of independent variables $n$\\
2307\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2308\>{\sf  int keep;}             \> // flag for reverse sweep \\ 
2309\>{\sf  double X[n][d+1];}     \> // Taylor coefficients $X$ of
2310                                     independent variables \\
2311\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
2312                                     in \eqref{eq:series}
2313\end{tabbing}
2314%
2315The rows of the matrix $X$ must correspond to the independent variables in the order of their
2316initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2317$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2318\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2319correspond to the
2320dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2321The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2322Taylor coefficient vectors as in \eqref{eq:series}.
2323Thus the first column of $Y$ contains the
2324function value $F(x)$ itself, the next column represents the first
2325Taylor coefficient vector of $F$, and the last column the
2326$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2327how many Taylor coefficients of all intermediate quantities are
2328written into the value stack as explained in \autoref{forw_rev_ad}.
2329 If {\sf keep} is omitted, it defaults to 0.
2330
2331The given {\sf tag} value is used by {\sf forward} to determine the
2332name of the file on which the tape was written. If the tape file does
2333not exist, {\sf forward} assumes that the relevant
2334tape is still in core and reads from the buffers.
2335After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2336{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2337the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
2338identifier {\sf tag}. When $u$ is a vector
2339and $Z$ an $n\times (d+1)$ matrix
2340{\sf reverse} is executed in the scalar mode by the calling
2341sequence
2342%
2343\begin{tabbing}
2344\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
2345\>{\sf int reverse(tag,m,n,d,u,Z)}\\
2346\>{\sf short int tag;}         \> // tape identification \\
2347\>{\sf int m;}                 \> // number of dependent variables $m$\\
2348\>{\sf int n;}                 \> // number of independent variables $n$\\
2349\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2350\>{\sf  double u[m];}          \> // weighting vector $u$\\
2351\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$ 
2352\end{tabbing}
2353to compute
2354the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2355\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
2356%
2357%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2358\subsection{The Vector Case}
2359%
2360\label{vecCas}
2361%
2362When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence
2363%
2364\begin{tabbing}
2365\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
2366\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
2367\>{\sf short int tag;}         \> // tape identification \\
2368\>{\sf int m;}                 \> // number of dependent variables $m$\\
2369\>{\sf int n;}                 \> // number of independent variables $n$\\
2370\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2371\>{\sf int q;}                 \> // number of weight vectors $q$\\
2372\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2373\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
2374\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
2375\end{tabbing}
2376%
2377to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2378\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
2379When the arguments {\sf p} and {\sf U} are omitted, they default to
2380$m$ and the identity matrix of order $m$, respectively. 
2381
2382Through the optional argument {\sf nz} of {\sf reverse} one can compute
2383information about the sparsity pattern of $Z$ as described in detail
2384in the previous \autoref{forw_rev_ad}.
2385
2386The return values of {\sf reverse} calls can be interpreted according
2387to \autoref{retvalues}, but negative return values are not
2388valid, since the corresponding forward sweep would have
2389stopped without completing the necessary taylor file.
2390The return value of {\sf reverse} may be higher
2391than that of the preceding {\sf forward} call because some operations
2392that were evaluated  at a critical argument during the forward sweep
2393were found not to impact the dependents during the reverse sweep.
2394
2395In both scalar and vector mode, the degree $d$ must agree with
2396{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2397equal to zero if {\sf reverse} directly follows the taping of an active
2398section. Otherwise, {\sf reverse} will return control with a suitable error
2399message.
2400In order to avoid possible confusion, the first four arguments must always be
2401present in the calling sequence. However, if $m$ or $d$
2402attain their trivial values 1 and 0, respectively, then
2403corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2404{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2405indirection.  For example, we may call
2406{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2407{\sf double g[n]} 
2408to calculate a gradient of a scalar-valued function.
2409
2410Sometimes it may be useful to perform a forward sweep for families of
2411Taylor series with the same leading term.
2412This vector version of {\sf forward} can be called in the form
2413%
2414\begin{tabbing}
2415\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2416\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2417\>{\sf short int tag;}         \> // tape identification \\
2418\>{\sf int m;}                 \> // number of dependent variables $m$\\
2419\>{\sf int n;}                 \> // number of independent variables $n$\\
2420\>{\sf int d;}                 \> // highest derivative degree $d$\\
2421\>{\sf int p;}                 \> // number of Taylor series $p$\\
2422\>{\sf  double x0[n];}          \> // values of independent variables $x_0$\\
2423\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
2424\>{\sf double y0[m];}           \> // values of dependent variables $y_0$\\
2425\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
2426\end{tabbing}
2427%
2428where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2429and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2430degree 0. There is no option to keep the values of active variables
2431that are going out of scope or that are overwritten. Therefore this
2432function cannot prepare a subsequent reverse sweep.
2433The return integer serves as a flag to indicate quadratures or altered
2434comparisons as described above in \autoref{reuse_tape}.
2435
2436Since the calculation of Jacobians is probably the most important
2437automatic differentia\-tion task, we have provided a specialization
2438of vector {\sf forward} to the case where $d = 1$. This version can be
2439called in the form
2440%
2441\begin{tabbing}
2442\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2443\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
2444\>{\sf short int tag;}         \> // tape identification \\
2445\>{\sf int m;}                 \> // number of dependent variables $m$\\
2446\>{\sf int n;}                 \> // number of independent variables $n$\\
2447\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
2448\>{\sf double x[n];}          \> // values of independent variables $x_0$\\
2449\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
2450\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
2451\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
2452\end{tabbing}
2453%
2454When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2455the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2456one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2457chosen initialization of $X$. In a workstation environment a value
2458of $p$ somewhere between $10$ and $50$
2459appears to be fairly optimal. For smaller $p$ the interpretive
2460overhead is not appropriately amortized, and for larger $p$ the
2461$p$-fold increase in storage causes too many page faults. Therefore,
2462large Jacobians that cannot be compressed via column coloring
2463as could be done for example using the driver {\sf sparse\_jac}
2464should be ``strip-mined'' in the sense that the above
2465first-order-vector version of {\sf forward} is called
2466repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2467a partition of the identity matrix of order $n$.
2468%
2469%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2470\subsection{Dependence Analysis}
2471%
2472\label{ProBit}
2473%
2474The sparsity pattern of Jacobians is often needed to set up data structures
2475for their storage and factorization or to allow their economical evaluation
2476by compression \cite{BeKh96}. Compared to the evaluation of the full
2477Jacobian $F'(x_0)$ in real arithmetic computing the Boolean matrix
2478$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2479pattern in the obvious way requires a little less run-time and
2480certainly a lot less memory.
2481
2482The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2483of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2484dependence between the $i$-th independent variable $x_{i}$ and
2485the $j$-th dependent variable $y_{j}$. Just like for real arguments
2486one would wish to compute matrix-vector and vector-matrix products
2487of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2488by appropriate {\sf forward} and {\sf reverse} routines where
2489$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
2490Here, multiplication corresponds to logical
2491{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
2492semi-ring.
2493
2494For practical reasons it is assumed that
2495$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2496$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2497$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2498as bit patterns. Moreover $p$ or $q$ such integer vectors may
2499be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2500and $U\in\N^{q \times m}$, which naturally correspond
2501to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
2502and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
2503bit pattern versions of {\sf forward} and {\sf reverse} allow
2504to compute integer matrices $Y\in\N^{m \times p}$ and
2505$Z\in\N^{q \times m}$ corresponding to
2506\begin{equation}
2507\label{eq:int_forrev}
2508\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad 
2509\tilde{Z} = \tilde{U}\tilde{P} \, ,
2510\end{equation} 
2511respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
2512and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times n}$.
2513In general, the application of the bit pattern versions of
2514{\sf forward} or {\sf reverse} can be interpreted as
2515propagating dependences between variables forward or backward, therefore
2516both the propagated integer matrices and the corresponding
2517Boolean matrices are called {\em dependence structures}.
2518 
2519The bit pattern {\sf forward} routine
2520%
2521\begin{tabbing}
2522\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2523\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
2524\>{\sf short int tag;}              \> // tape identification \\
2525\>{\sf int m;}                      \> // number of dependent variables $m$\\
2526\>{\sf int n;}                      \> // number of independent variables $n$\\
2527\>{\sf int p;}                      \> // number of integers propagated $p$\\
2528\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
2529\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
2530\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
2531\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
2532                                     \eqref{eq:int_forrev}\\
2533\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
2534\end{tabbing}
2535%
2536can be used to obtain the dependence structure $Y$ for a given dependence structure
2537$X$. The dependence structures are
2538represented as arrays of {\sf unsigned long int} the entries of which are
2539interpreted as bit patterns as described above.   
2540For example, for $n=3$ the identity matrix $I_3$ should be passed
2541with $p=1$ as the $3 \times 1$ array
2542\begin{eqnarray*}
2543{\sf X} \; = \;
2544\left( \begin{array}{r}
2545         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2546         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2547         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
2548       \end{array} \right)
2549\end{eqnarray*}
2550in the 4-byte long integer format. The parameter {\sf mode} determines
2551the mode of dependence analysis as explained already in \autoref{sparse}.
2552
2553A call to the corresponding bit pattern {\sf reverse} routine
2554%
2555\begin{tabbing}
2556\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2557\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
2558\>{\sf short int tag;}         \> // tape identification \\
2559\>{\sf int m;}                 \> // number of dependent variables $m$\\
2560\>{\sf int n;}                 \> // number of independent variables $n$\\
2561\>{\sf int q;}                 \> // number of integers propagated q\\
2562\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
2563\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
2564                                     to \eqref{eq:int_forrev}\\
2565\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
2566\end{tabbing}
2567%
2568yields the dependence structure $Z$ for a given dependence structure
2569$U$.
2570
2571To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2572as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2573with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2574corresponding dependence structure $X$ or $U$ must be defined to represent 
2575the identity matrix of the respective dimension.
2576Due to the fact that always a multiple of $s$ Boolean vectors are propagated
2577there may be superfluous vectors, which can be set to zero.
2578
2579The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2580correspond to those described in \autoref{retvalues}.
2581
2582One can control the storage growth by the factor $p$ using
2583``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
2584groups of columns or respectively rows at a time, i.e.~partitioning
2585$X$ or $U$ appropriately as described for the computation of Jacobians
2586in \autoref{vecCas}.
2587%
2588%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2589%
2590\section{Advance algorithmic differentiation in ADOL-C}
2591\label{adv_ad}
2592%
2593\subsection{External differentiated functions}
2594%
2595Ideally, AD is applied to a given function as a whole.
2596In practice, however, sophisticated projects usually evolve over a long period of time.
2597Within this process, a heterogeneous code base for the project
2598develops, which may include the incorporation of external solutions,
2599changes in programming paradigms or even of programming languages.
2600Equally heterogeneous, the computation of derivative values appears.
2601Hence, different \mbox{AD-tools} may be combined with hand-derived
2602codes based on the same or different programming languages.
2603ADOL-C support such settings  by the concept of external
2604differentiated functions. Hence, a external differentiated function
2605itself is not differentiated by ADOL-C. The required derivative
2606information have to be provided by the user.
2607
2608For this purpose, it is assumed that the external differentiated
2609function has the signature
2610\smallskip
2611
2612\noindent
2613\hspace*{2cm}{\sf int ext\_func(int n, double *yin, int m, double  *yout);}
2614\medskip
2615
2616\noindent
2617where the function names can be chosen by the user as long as the names are
2618unique. This {\sf double} version of the external differentiated function has to
2619be {\em registered} using the \mbox{ADOL-C} function
2620\smallskip
2621
2622\noindent
2623\hspace*{2cm}{\sf edf = reg\_ext\_fct(ext\_func);}.
2624\smallskip
2625
2626\noindent
2627This function initializes the structure {\sf edf}. Then,
2628the user has to provide the remaining  information
2629by the following commands:
2630\begin{tabbing}
2631\hspace*{2cm}\= {\sf edf-$>$zos\_forward = zos\_for\_ext\_func;}\\
2632             \> {\sf // function pointer for computing
2633               Zero-Order-Scalar (=zos)}\\
2634             \> {\sf // forward information}\\
2635             \> {\sf edf-$>$dp\_x = xp;}\\
2636             \> {\sf edf-$>$dp\_y = yp;}\\
2637             \> {\sf // double arrays for arguments and results}\\
2638             \> {\sf edf-$>$fos\_reverse = fos\_rev\_ext\_func;} \\
2639             \> {\sf // function pointer for computing
2640               First-Order-Scalar (=fos)}\\ 
2641             \> {\sf reverse information}
2642\end{tabbing}
2643Subsequently, the call to the external differentiated function  in the function evaluation can be
2644substituted by the call of
2645\medskip
2646
2647\noindent
2648\hspace*{2cm}{\sf int call\_ext\_fct(edf, n, xp, x, m, yp, y);}
2649\medskip
2650
2651\noindent
2652The usage of the external function facility is illustrated by the
2653example \verb=ext_diff_func= contained in
2654\verb=examples/additional_examples/ext_diff_func=.
2655Here,the external differentiated function is also a C code, but the
2656handling as external differentiated functions also a decrease of the
2657overall required tape size.
2658
2659%
2660\subsection{Advance algorithmic differentiation of time integration processes}
2661%
2662For many time-dependent applications, the corresponding simulations
2663are based on ordinary or partial differential equations.
2664Furthermore, frequently there are quantities that influence the
2665result of the simulation and can be seen as  control of the systems.
2666To compute an approximation of the
2667simulated process for a time interval $[0,T]$ and evaluated the
2668desired target function, one applies an
2669appropriate integration scheme given by
2670\begin{tabbing}
2671\hspace{5mm} \= some initializations yielding $x_0$\\
2672\> for $i=0,\ldots, N-1$\\
2673\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
2674\hspace{5mm} \= evaluation of the target function
2675\end{tabbing}
2676where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
2677time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
2678$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$
2679defines the time step to compute the state at time $t_i$. Note that we
2680do not assume a uniform grid.
2681
2682When computing derivatives of the target function with respect to the
2683control, the consequences for the tape generation using the ``basic''
2684taping approach as implemented in ADOL-C so far are shown in the left part of
2685\autoref{fig:bas_tap}.
2686\begin{figure}[htbp] 
2687\begin{center}
2688\hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill
2689\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
2690\end{center}
2691\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
2692\caption{Different taping approaches}
2693\label{fig:bas_tap}
2694\end{figure} 
2695As can be seen, the iterative process is completely
2696unrolled due to the taping process. That is, the tape contains an internal representation of each
2697time step. Hence, the overall tape comprises a serious amount of redundant
2698information as illustrated by the light grey rectangles in
2699\autoref{fig:bas_tap}
2700
2701To overcome the repeated storage of essentially the same information,
2702a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
2703the right-hand side of \autoref{fig:bas_tap}. This new
2704capability allows the encapsulation of the time-stepping procedure
2705such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one
2706representative of the time steps in addition to a function pointer to the
2707evaluation procedure $F$ of the time steps.  The function pointer has
2708to be stored for a possibly necessary retaping during the derivative calculation
2709as explained below.
2710
2711Instead of storing the complete tape, only a very limited number of intermediate
2712states are kept in memory. They serve as checkpoints, such that
2713the required information for the backward integration is generated
2714piecewise during the adjoint calculation.
2715For this modified adjoint computation the optimal checkpointing schedules
2716provided by {\bf revolve} are employed. An adapted version of the
2717software package {\sf revolve} is part of ADOL-C and automatically
2718integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
2719distributed such that computational effort is minimized for the given
2720number of checkpoints and time steps $N$. It is important to note that the overall tape
2721size is drastically reduced due to the advanced taping strategy.  For the
2722implementation of this nested taping we introduced
2723a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
2724handle different internal function representations during the taping
2725procedure and the derivative calculation. This approach allows the generation of a new
2726tape inside the overall tape, where the coupling of the different tapes is based on
2727the {\em external differentiated function} described above.
2728
2729Written under the objective of minimal user effort, the checkpointing routines
2730of \mbox{ADOL-C} need only very limited information. The user must
2731provide two routines as implementation of the time-stepping function $F$ 
2732with the signatures
2733\medskip
2734
2735\noindent
2736\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
2737\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
2738\medskip
2739
2740\noindent
2741where the function names can be chosen by the user as long as the names are
2742unique.It is possible that the result vector of one time step
2743iteration overwrites the argument vector of the same time step. Then, no
2744copy operations are required to prepare the next time step.
2745
2746At first, the {\sf adouble} version of the time step function has to
2747be {\em registered} using the \mbox{ADOL-C} function
2748\medskip
2749
2750\noindent
2751\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
2752\medskip
2753
2754\noindent
2755This function initializes the structure {\sf cpc}. Then,
2756the user has to provide the remaining checkpointing information
2757by the following commands:
2758\begin{tabbing}
2759\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
2760             \> {\sf // double variante of the time step function}\\
2761             \> {\sf cpc.setNumberOfSteps(N);}\\
2762             \> {\sf // number of time steps to perform}\\
2763             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
2764             \> {\sf // number of checkpoint} \\
2765             \> {\sf cpc.setDimensionXY(n);}\\
2766             \> {\sf // dimension of input/output}\\
2767             \> {\sf cpc.setInput(y);}\\
2768             \> {\sf // input vector} \\
2769             \> {\sf cpc.setOutput(y);}\\
2770             \> {\sf // output vector }\\
2771             \> {\sf cpc.setTapeNumber(tag\_check);}\\
2772             \> {\sf // subtape number for checkpointing} \\
2773             \> {\sf cpc.setAlwaysRetaping(false);}\\
2774             \> {\sf // always retape or not ?}
2775\end{tabbing}
2776Subsequently, the time loop in the function evaluation can be
2777substituted by a call of the function
2778\medskip
2779
2780\noindent
2781\hspace*{2cm}{\sf int cpc.checkpointing();}
2782\medskip
2783
2784\noindent
2785Then, ADOL-C computes derivative information using the optimal checkpointing
2786strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
2787
2788The presented driver is prototyped in the header file
2789\verb=<adolc/checkpointing.h>=. This header
2790is included by the global header file \verb=<adolc/adolc.h>= automatically.
2791An example program \verb=checkpointing.cpp= illustrates the
2792checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
2793%
2794%
2795%
2796\subsection{Advance algorithmic differentiation of fixed point iterations}
2797%
2798Quite often, the state of the considered system denoted by $x\in\R^n$
2799depends on some design parameters denoted by $u\in\R^m$. One example for this setting
2800forms the flow over an aircraft wing. Here, the shape of the wing that
2801is defined by the design vector $u$ 
2802determines the flow field $x$. The desired quasi-steady state $x_*$
2803fulfills the fixed point equation
2804\begin{align}
2805  \label{eq:fixedpoint}
2806  x_* = F(x_*,u)
2807\end{align}
2808for a given continuously differentiable function
2809$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
2810also exploited by many other applications.
2811
2812Assume that one can apply the iteration 
2813\begin{align}
2814\label{eq:iteration}
2815 x_{k+1} = F(x_k,u)
2816\end{align}
2817to obtain a linear converging sequence $\{x_k\}$ generated
2818for any given control $u\in\R^n$. Then the limit point $x_*\in\R^n$ fulfils the fixed
2819point equation~\eqref{eq:fixedpoint}. Moreover,
2820suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
2821$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
2822Hence, there exists a
2823differentiable function $\phi:\R^m \rightarrow \R^n$,
2824such that $\phi(u) = F(\phi(u),u)$, where the state
2825$\phi(u)$ is a fixed point of $F$ according to a control
2826$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
2827the design vector $u$, derivatives of $\phi$ with respect
2828to $u$ are of particular interest.
2829
2830To exploit the advanced algorithmic differentiation  of such fixed point iterations
2831ADOL-C provides the special functions {\tt fp\_iteration(...)}.
2832It has the following interface:
2833\begin{tabbing}
2834\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2835\>{\sf int
2836  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
2837\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
2838\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
2839\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
2840\>{\sf int *double\_F;}         \> // pointer to a function that compute for $x$ and $u$ \\
2841\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
2842\>{\sf int *adouble\_F;}        \> // pointer to a function that compute for $x$ and $u$ \\
2843\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
2844\>{\sf int *norm;}              \> // pointer to a function that computes\\
2845\>                              \> // the norm of a vector\\
2846\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
2847\>                              \> // the norm of a vector\\
2848\>{\sf double eps;}             \> // termination criterion for fixed point iteration\\
2849\>{\sf double eps\_deriv;}      \> // termination criterion for adjoint fixed point iteration\\
2850\>{\sf N\_max;}                 \> // maximal number of itertions for state computation\\
2851\>{\sf N\_max\_deriv;}          \> // maximal number of itertions for adjoint computation\\
2852\>{\sf adouble *x\_0;}          \> // inital state of fixed point iteration\\
2853\>{\sf adouble *u;}             \> // value of $u$\\
2854\>{\sf adouble *x\_fic;}        \> // final state of fixed point iteration\\
2855\>{\sf int dim\_x;}             \> // dimension of $x$\\
2856\>{\sf int dim\_u;}             \> // dimension of $u$\\
2857\end{tabbing}
2858%
2859Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
2860should be used for the fixed point iteration.
2861{\tt double\_F} and {\tt adouble\_F} are pointers to functions, that
2862compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby
2863{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
2864uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
2865{\tt norm\_deriv} are pointers to functions computing the norm
2866of a vector. The latter functions together with {\tt eps},
2867{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
2868the iterations. Thus the following loops are performed:
2869\begin{center}
2870\begin{tabular}{ll}
2871  do                     &   do                           \\
2872  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
2873  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
2874  ~~~~$y = F(x,u)$       &   ~~~
2875  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
2876  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
2877  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
2878  & and $k\leq N_{max,deriv}$
2879\end{tabular}
2880\end{center}
2881The vector for the initial iterate and the control is stored
2882in {\tt x\_0} and {\tt u} respectively. The vector in which the
2883fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
2884and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
2885corresponding vectors.
2886
2887The presented driver is prototyped in the header file
2888\verb=<adolc/fixpoint.h>=. This header
2889is included by the global header file \verb=<adolc/adolc.h>= automatically.
2890An example code that shows also the
2891expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
2892%
2893\subsection{Advance algorithmic differentiation of OpenMP parallel programs}
2894%
2895ADOL-C allows to compute derivatives in parallel for functions
2896containing OpenMP parallel loops.
2897This implies that an explicit loop-handling approach is applied. A
2898typical situation is shown in \autoref{fig:basic_layout},
2899\begin{figure}[hbt]
2900    \vspace{3ex}
2901    \begin{center}
2902        \includegraphics[height=4cm]{multiplexed} \\
2903        \begin{picture}(0,0)
2904            \put(-48,40){\vdots}
2905            \put(48,40){\vdots}
2906            \put(-48,80){\vdots}
2907            \put(48,80){\vdots}
2908            \put(-83,132){function eval.}
2909            \put(5,132){derivative calcul.}
2910        \end{picture}
2911    \end{center}
2912    \vspace{-5ex}
2913    \caption{Basic layout of mixed function and the corresponding derivation process}
2914    \label{fig:basic_layout}
2915\end{figure}
2916where the OpenMP-parallel loop is preceded by a serial startup
2917calculation and followed by a serial finalization phase.
2918
2919Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
2920Two macros are available that only differ in the way the global tape information is handled.
2921Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
2922For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used.
2923Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
2924Due to the inserted macro, the OpenMP statement has the following structure:
2925\begin{tabbing}
2926\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
2927              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
2928\end{tabbing}
2929Inside the parallel region, separate tapes may then be created.
2930Each single thread works in its own dedicated AD-environment, and all
2931serial facilities of \mbox{ADOL-C} are applicable as usual. The global
2932derivatives can be computed using the tapes created in the serial and
2933parallel parts of the function evaluation, where user interaction is
2934required for the correct derivative concatenation of the various tapes.
2935
2936For the usage of the parallel facilities, the \verb=configure=-command
2937has to be used with the option \verb?--with-openmp-flag=FLAG?, where
2938\verb=FLAG= stands for the system dependent OpenMP flag.
2939The parallel differentiation of a parallel program is illustrated
2940by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
2941%
2942%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2943%
2944\section{Tapeless forward differentiation in ADOL-C}
2945\label{tapeless}
2946%
2947Up to version 1.9.0, the development of the ADOL-C software package
2948was based on the decision to store all data necessary for derivative
2949computation on tapes, where large applications require the tapes to be
2950written out to corresponding files. In almost all cases this means
2951a considerable drawback in terms of run time due to the excessive
2952memory accesses. Using these tapes enables ADOL-C to offer multiple
2953functions. However, it is not necessary for all tasks of derivative
2954computation to do that.
2955
2956Starting with version 1.10.0, ADOL-C now features a tapeless forward
2957mode for computing first order derivatives in scalar mode, i.e.,
2958$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
2959This tapeless variant coexists with the more universal
2960tape based mode in the package. The following subsections describe
2961the source code modifications required to use the tapeless forward mode of
2962ADOL-C. 
2963%
2964\subsection{Modifying the Source Code}
2965%
2966Let us consider the coordinate transformation from Cartesian to spherical
2967polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
2968= F(x)$, with
2969\begin{eqnarray*}
2970y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
2971y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
2972y_3  =  \arctan(x_2/x_1),
2973\end{eqnarray*}
2974as an example. The corresponding source code is shown in \autoref{fig:tapeless}.
2975\begin{figure}[htb]
2976\framebox[\textwidth]{\parbox{\textwidth}{
2977%\begin{center}
2978%\begin{flushleft}
2979\begin{tabbing}
2980\= \kill
2981\> {\sf \#include} {\sf $<$iostream$>$}\\
2982\> {\sf using namespace std;}\\
2983\> \\
2984\> {\sf int main() \{}\\
2985\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
2986\> \\
2987\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
2988\> {\sf \rule{1cm}{0pt}...}\\
2989\> \\
2990\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
2991\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
2992\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
2993\> \\
2994\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
2995\> \\
2996\> {\sf \rule{0.5cm}{0pt}return 0;}\\
2997\> \}
2998\end{tabbing}
2999%\end{flushleft}
3000%\end{center}
3001}}
3002\caption{Example for tapeless forward mode}
3003\label{fig:tapeless}
3004\end{figure}
3005%
3006Changes to the source code that are necessary for applying the
3007tapeless forward ADOL-C are described in the following two
3008subsections, where the vector mode version is described
3009as extension of the scalar mode.
3010%
3011\subsubsection*{The scalar mode}
3012%
3013To use the tapeless forward mode, one has to include one
3014of the header files \verb#adolc.h# or \verb#adouble.h#
3015where the latter should be preferred since it does not include the
3016tape based functions defined in other header files. Hence, including
3017\verb#adouble.h# avoids mode mixtures, since
3018\verb#adolc.h# is just a wrapper for including all public
3019  headers of the ADOL-C package and does not offer own functions.
3020Since the two ADOL-C forward mode variants tape-based and tapeless,
3021are prototyped in the same header file, the compiler needs to know if a
3022tapeless version is intended. This can be done by defining a
3023preprocessor macro named {\sf ADOLC\_TAPELESS}. Note that it is
3024important to define this macro before the header file is included.
3025Otherwise, the tape-based version of ADOL-C will be used.
3026
3027As in the tape based forward version of ADOL-C all derivative
3028calculations are introduced by calls to overloaded
3029operators. Therefore, similar to the tape-based version all
3030independent, intermediate and dependent variables must be declared
3031with type {\sf adouble}. The whole tapeless functionality provided by
3032\verb#adolc.h# was written as complete inline intended code
3033due to run time aspects, where the real portion of inlined code can
3034be influenced by switches for many compilers. Likely, the whole
3035derivative code is inlined by default. Our experiments
3036with the tapeless mode have produced complete inlined code by using
3037standard switches (optimization) for GNU and Intel C++
3038compiler.
3039
3040To avoid name conflicts
3041resulting from the inlining the tapeless version has its own namespace
3042\verb#adtl#. As a result four possibilities of using the {\sf adouble}
3043type are available for the tapeless version:
3044\begin{itemize}
3045\item Defining a new type
3046      \begin{center}
3047        \begin{tabular}{l}
3048          {\sf typedef adtl::adouble adouble;}\\
3049          ...\\
3050          {\sf adouble tmp;}
3051        \end{tabular}
3052      \end{center}
3053      This is the preferred way. Remember, you can not write an own
3054      {\sf adouble} type/class with different meaning after doing the typedef.
3055\item Declaring with namespace prefix
3056      \begin{center}
3057        \begin{tabular}{l}
3058          {\sf adtl::adouble tmp;}
3059        \end{tabular}
3060      \end{center}
3061      Not the most handsome and efficient way with respect to coding
3062      but without any doubt one of the safest ways. The identifier
3063      {\sf adouble} is still available for user types/classes.
3064\item Trusting macros
3065      \begin{center}
3066        \begin{tabular}{l}
3067          {\sf \#define adouble adtl::adouble}\\
3068          ...\\
3069          {\sf adouble tmp;}
3070        \end{tabular}
3071      \end{center}
3072      This approach should be used with care, since standard defines are text replacements.
3073  \item Using the complete namespace
3074        \begin{center}
3075          \begin{tabular}{l}
3076            {\sf \#using namespace adtl;}\\
3077            ...\\
3078            {\sf adouble tmp;}
3079          \end{tabular}
3080        \end{center}
3081        A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise!
3082\end{itemize}
3083After defining the variables only two things are left to do. First
3084one needs to initialize the values of the independent variables for the
3085function evaluation. This can be done by assigning the variables a {\sf
3086double} value. The {\sf ad}-value is set to zero in this case.
3087Additionally, the tapeless forward mode variant of ADOL-C
3088offers a function named {\sf setValue} for setting the value without
3089changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
3090variables ADOL-C offers two possibilities:
3091\begin{itemize}
3092  \item Using the constructor
3093        \begin{center}
3094          \begin{tabular}{l}
3095            {\sf adouble x1(2,1), x2(4,0), y;}
3096          \end{tabular}
3097        \end{center}
3098        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
3099        remains uninitialized. In terms of function evaluation
3100        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3101        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3102   \item Setting point values directly
3103         \begin{center}
3104           \begin{tabular}{l}
3105             {\sf adouble x1=2, x2=4, y;}\\
3106             ...\\
3107             {\sf x1.setADValue(1);}\\
3108             {\sf x2.setADValue(0);}
3109           \end{tabular}
3110         \end{center}
3111         The same example as above but now using {\sf setADValue}-method for initializing the derivative values.
3112\end{itemize}
3113%
3114The derivatives can be obtained at any time during the evaluation
3115process by calling the {\sf getADValue}-method
3116\begin{center}
3117  \begin{tabular}{l}
3118    {\sf adouble y;}\\
3119    ...\\
3120    {\sf cout $<<$ y.getADValue();}
3121  \end{tabular}
3122\end{center}
3123\autoref{fig:modcode} shows the resulting source code incorporating
3124all required changes for the example
3125given above.
3126
3127\begin{figure}[htb]
3128\framebox[\textwidth]{\parbox{\textwidth}{
3129%\begin{center}
3130\begin{tabbing}
3131\hspace*{-1cm} \= \kill
3132\> {\sf \#include $<$iostream$>$}\\
3133\> {\sf using namespace std;}\\
3134\> \\
3135\> {\sf \#define ADOLC\_TAPELESS}\\
3136\> {\sf \#include $<$adouble.h$>$}\\
3137\> {\sf typedef adtl::adouble adouble;}\\
3138\\
3139\> {\sf int main() \{}\\
3140\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3141\\
3142\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3143\> {\sf \rule{1cm}{0pt}...}\\
3144\\
3145\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(1);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3146\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3147\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3148\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3149\\
3150\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3151\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ y[1].getADValue() $<<$ endl;}\\
3152\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3153\> {\sf \}}
3154\end{tabbing}
3155%\end{center}
3156}}
3157\caption{Example for tapeless scalar forward mode}
3158\label{fig:modcode}
3159\end{figure}
3160%
3161\subsubsection*{The vector mode}
3162%
3163In scalar mode only one direction element has to be stored per {\sf
3164  adouble} whereas a field of $p$ elements is needed in the vector
3165  mode to cover the computations for the given $p$ directions. The
3166  resulting changes to the source code are described in this section.
3167
3168Similar to tapeless scalar forward mode, the tapeless vector forward
3169mode is used by defining {\sf ADOLC\_TAPELESS}. Furthermore, one has to define
3170an additional preprocessor macro named {\sf NUMBER\_DIRECTIONS}. This
3171macro takes the maximal number of directions to be used within the
3172resulting vector mode. Just as {\sf ADOLC\_TAPELESS} the new macro
3173must be defined before including the \verb#<adolc.h/adouble.h>#
3174header file since it is ignored otherwise.
3175
3176In many situations recompiling the source code to get a new number of
3177directions is at least undesirable. ADOL-C offers a function named
3178{\sf setNumDir} to work around this problem partially. Calling this
3179function, ADOL-C does not take the number of directions
3180from the macro {\sf NUMBER\_DIRECTIONS} but from the argument of
3181{\sf setNumDir}. A corresponding source code would contain the following lines: 
3182\begin{center}
3183  \begin{tabular}{l}
3184    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3185    ...\\
3186    {\sf adtl::setNumDir(5);}
3187  \end{tabular}
3188\end{center}
3189Note that using this function does not
3190change memory requirements that can be roughly determined by
3191({\sf NUMBER\_DIRECTIONS}$+1$)*(number of {\sf adouble}s).
3192
3193Compared to the scalar case setting and getting the derivative
3194values, i.e. the directions, is more involved. Instead of
3195working with single {\sf double} values, pointer to fields of {\sf
3196double}s are used as illustrated by the following example:
3197\begin{center}
3198  \begin{tabular}{l}
3199    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3200    ...\\
3201    {\sf adouble x, y;}\\
3202    {\sf double *ptr=new double[NUMBER\_DIRECTIONS];}\\
3203      ...\\
3204    {\sf x1=2;}\\
3205    {\sf x1.setADValue(ptr);}\\
3206    ...\\
3207    {\sf ptr=y.getADValue();}
3208  \end{tabular}
3209\end{center}
3210Additionally, the tapeless vector forward mode of ADOL-C offers two
3211new methods for setting/getting the derivative values. Similar
3212to the scalar case, {\sf double} values are used but due to the vector
3213mode the position of the desired vector element must be supplied in
3214the argument list:
3215\begin{center}
3216  \begin{tabular}{l}
3217    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3218    ...\\
3219    {\sf adouble x, y;}\\
3220    ...\\
3221    {\sf x1=2;}\\
3222    {\sf x1.setADValue(5,1);\hspace*{3.7cm}// set the 6th point value of x to 1.0}\\
3223      ...\\
3224    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3225  \end{tabular}
3226\end{center}
3227The resulting source code containing all changes that are required is
3228shown in \autoref{fig:modcode2}
3229
3230\begin{figure}[!h!t!b]
3231\framebox[\textwidth]{\parbox{\textwidth}{
3232\begin{tabbing}
3233\hspace*{-1cm} \= \kill
3234\> {\sf \#include $<$iostream$>$}\\
3235\> {\sf  using namespace std;}\\
3236\\
3237\> {\sf \#define ADOLC\_TAPELESS}\\
3238\> {\sf \#define NUMBER\_DIRECTIONS 3}\\
3239\> {\sf \#include $<$adouble.h$>$}\\
3240\> {\sf typedef adtl::adouble adouble;}\\
3241\\
3242\> {\sf ADOLC\_TAPELESS\_UNIQUE\_INTERNALS;}\\
3243\\
3244\> {\sf int main() \{}\\
3245\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3246\\
3247\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3248\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3249\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3250\> {\sf \rule{0.5cm}{0pt}\}}\\
3251\\
3252\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3253\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3254\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3255\\
3256\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3257\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3258\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3259\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3260\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3261\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3262\> {\sf \rule{0.5cm}{0pt}\}}\\
3263\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3264\> {\sf \}}
3265\end{tabbing}
3266}}
3267\caption{Example for tapeless vector forward mode}
3268\label{fig:modcode2}
3269\end{figure}
3270%
3271\subsection{Compiling and Linking the Source Code}
3272%
3273After incorporating the required changes, one has to compile the
3274source code and link the object files to get the executable.
3275As long as the ADOL-C header files are not included in the absolute path
3276the compile sequence should be similar to the following example:
3277\begin{center}
3278  \begin{tabular}{l}
3279    {\sf g++ -I/home/username/adolc\_base/include -c tapeless\_scalar.cpp}
3280  \end{tabular}
3281\end{center}
3282The \verb#-I# option tells the compiler where to search for the ADOL-C
3283header files. This option can be omitted when the headers are included
3284with absolute path or if ADOL-C is installed in a ``global'' directory.
3285
3286Since the tapeless forward version of ADOL-C is implemented in the
3287header \verb#adouble.h# as complete inline intended version,
3288the object files do not need to be linked against any external ADOL-C
3289code or the ADOL-C library. Therefore, the example started above could be finished with the
3290following command:
3291\begin{center}
3292  \begin{tabular}{l}
3293    {\sf g++ -o tapeless\_scalar tapeless\_scalar.o}
3294  \end{tabular}
3295\end{center}
3296The mentioned source codes {\sf tapeless\_scalar.c} and {\sf tapeless\_vector.c} 
3297illustrating the use of the for tapeless scalar and vector mode can be found in
3298the directory {\sf examples}.
3299%
3300\subsection{Concluding Remarks for the Tapeless Forward Mode Variant}
3301%
3302As many other AD methods the tapeless forward mode provided by the
3303ADOL-C package has its own strengths and drawbacks. Please read the
3304following section carefully to become familiar with the things that
3305can occur:
3306\begin{itemize}
3307  \item Advantages:
3308    \begin{itemize}
3309      \item Code speed\\
3310        Increasing computation speed was one of the main aspects in writing
3311        the tapeless code. In many cases higher performance can be
3312        expected this way.
3313      \item Easier linking process\\
3314        As another result from the code inlining the object code does
3315        not need to be linked against an ADOL-C library.
3316      \item Smaller overall memory requirements\\
3317        Tapeless ADOL-C does not write tapes anymore, as the name
3318        implies. Loop ''unrolling'' can be avoided this
3319        way. Considered main memory plus disk space as overall memory
3320        requirements the tapeless version can be
3321        executed in a more efficient way.
3322    \end{itemize}
3323  \item Drawbacks:
3324    \begin{itemize}
3325    \item Main memory limitations\\
3326      The ability to compute derivatives to a given function is
3327      bounded by the main memory plus swap size  when using
3328      tapeless ADOL-C. Computation from swap should be avoided anyway
3329      as far as possible since it slows down the computing time
3330      drastically. Therefore, if the program execution is 
3331      terminated without error message insufficient memory size can be
3332      the reason among other things. The memory requirements $M$ can
3333      be determined roughly as followed:
3334      \begin{itemize}
3335        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3336        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3337          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3338      \end{itemize}
3339      where the storage size of all non {\sf adouble} based variables is described by $M_p$.
3340    \item Compile time\\
3341      As discussed in the previous sections, the tapeless forward mode of
3342      the ADOL-C package is implemented as inline intended version. Using
3343      this approach results in a higher source code size, since every
3344      operation involving at least one {\sf adouble} stands for the
3345      operation itself as well as for the corresponding derivative
3346      code after the inlining process. Therefore, the compilation time
3347      needed for the tapeless version may be higher than that of the tape based code.
3348    \item Code Size\\
3349      A second drawback and result of the code inlining is the
3350      increase of code sizes for the binaries. The increase
3351      factor compared to the corresponding tape based program is
3352      difficult to quantify as it is task dependent. Practical results
3353      have shown that values between 1.0 and 2.0 can be
3354      expected. Factors higher than 2.0 are possible too and even
3355      values below 1.0 have been observed.
3356    \end{itemize}
3357\end{itemize}
3358%
3359%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3360\section{Installing and Using ADOL-C}
3361\label{install}
3362%
3363\subsection{Generating the ADOL-C Library}
3364\label{genlib}
3365%
3366The currently built system is best summarized by the ubiquitous gnu
3367install triplet
3368\begin{center}
3369\verb=configure - make - make install= .
3370\end{center}
3371Executing this three steps from the package base directory
3372\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3373ADOL-C library with default options and install the package (libraries
3374and headers) into the default installation directory {\tt
3375  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3376the subdirectory \verb=include= will contain all the installed header
3377files that may be included by the user program, the subdirectory
3378\verb=lib= will contain the 32-bit compiled library
3379and the subdirectory \verb=lib64= will contain the 64-bit compiled
3380library. Depending on the compiler only one of \verb=lib= or
3381\verb=lib64= may be created.
3382
3383Before doing so the user may modify the header file \verb=usrparms.h=
3384in order to tailor the \mbox{ADOL-C} package to the needs in the
3385particular system environment as discussed in
3386\autoref{Customizing}. The configure procedure which creates the necessary
3387\verb=Makefile=s can be customized by use of some switches. Available
3388options and their meaning can be obtained by executing
3389\verb=./configure --help= from the package base directory.
3390
3391All object files and other intermediately generated files can be
3392removed by the call \verb=make clean=. Uninstalling ADOL-C by
3393executing \verb=make uninstall= is only reasonable after a previous
3394called \verb=make install= and will remove all installed package files
3395but will leave the created directories behind.
3396
3397The sparse drivers are included in the ADOL-C libraries if the
3398\verb=./configure= command is executed with the option
3399\verb=--enable-sparse=. The ColPack library available at
3400\verb=http://www.cscapes.org/coloringpage/software.htm= is required to
3401compute the sparse structures, and is searched for in all the default
3402locations as well as in the subdirectory \verb=<ThirdParty/ColPack/>=.
3403In case the library and its headers are installed in a nonstandard path
3404this may be specified with the \verb?--with-colpack=PATH? option.
3405It is assumed that the library and its header files have the following
3406directory structure: \verb?PATH/include? contains all the header
3407files,
3408\verb?PATH/lib? contains the 32-bit compiled library and
3409\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
3410the compiler used to compile {\sf ADOL-C} one of these libraries will
3411be used for linking.
3412 
3413\subsection{Compiling and Linking the Example Programs}
3414%
3415The installation procedure described in \autoref{genlib} also
3416provides the \verb=Makefile=s  to compile the example programs in the
3417directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
3418additional examples in
3419\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
3420one has to execute the
3421\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
3422examples. Available options are:
3423\begin{center}
3424\begin{tabular}[t]{ll}
3425\verb=--enable-docexa=&build all examples discussed in this manual\\
3426&(compare \autoref{example})\\
3427\verb=--enable-addexa=&build all additional examples\\
3428&(See file \verb=README= in the various subdirectories)
3429\end{tabular}
3430\end{center}
3431
3432Just calling \verb=make= from the packages base directory generates
3433all configured examples and the library if necessary. Compiling from
3434subdirectory \verb=examples= or one of its subfolders is possible
3435too. At least one kind of the ADOL-C library (static or shared) must
3436have been built previously in that case. Hence, building the library
3437is always the first step.
3438
3439For Compiling the library and the documented examples on Windows using
3440Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
3441the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
3442\verb=<ADOL-C/examples/>= subdirectories.
3443%
3444\subsection{Description of Important Header Files}
3445\label{ssec:DesIH}
3446%
3447The application of the facilities of ADOL-C requires the user
3448source code (program or module) to include appropriate
3449header files where the desired data types and routines are
3450prototyped. The new hierarchy of header files enables the user
3451to take one of two possible ways to access the right interfaces.
3452The first and easy way is recommended to beginners: As indicated in
3453\autoref{globalHeaders} the provided {\em global} header file
3454\verb=<adolc/adolc.h>= can be included by any user code to support all
3455capabilities of ADOL-C depending on the particular programming language
3456of the source.   
3457
3458\begin{table}[h]
3459\center \small
3460\begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline
3461\verb=<adolc/adolc.h>= & 
3462\begin{tabular*}{10.5cm}{cp{9.5cm}}
3463  \boldmath $\rightarrow$ \unboldmath
3464                 & global header file available for easy use of ADOL-C; \\
3465  $\bullet$      & includes all ADOL-C header files depending on
3466                   whether the users source is C++ or C code.
3467\end{tabular*}
3468\\ \hline
3469\verb=<adolc/usrparms.h>= &
3470\begin{tabular*}{10.5cm}{cp{9.5cm}}
3471  \boldmath $\rightarrow$ \unboldmath
3472                 & user customization of ADOL-C package (see
3473                   \autoref{Customizing}); \\
3474  $\bullet$      & after a change of
3475                   user options the ADOL-C library \verb=libadolc.*=
3476                   has to be rebuilt (see \autoref{genlib}); \\
3477  $\bullet$      & is included by all ADOL-C header files and thus by all user
3478                   programs.
3479\end{tabular*} \\ \hline
3480\end{tabular}
3481\caption{Global header files}
3482\label{globalHeaders}
3483\end{table} 
3484
3485The second way is meant for the more advanced ADOL-C user: Some source code
3486includes only those interfaces used by the particular application.
3487The respectively needed header files are indicated
3488throughout the manual.
3489Existing application determined dependences between the provided
3490ADOL-C routines are realized by automatic includes of headers in order
3491to maintain easy use. The header files important to the user are described
3492in the \autoref{importantHeaders1} and \autoref{importantHeaders2}.
3493
3494\begin{table}[h]
3495\center \small
3496\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
3497%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
3498\verb=<adolc/adouble.h>= & 
3499\begin{tabular*}{10.5cm}{cp{9.5cm}}
3500  \boldmath $\rightarrow$ \unboldmath
3501                & provides the interface to the basic active
3502                  scalar data type of ADOL-C: {\sf class adouble} 
3503                  (see \autoref{prepar});
3504%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
3505\end{tabular*}
3506\\ \hline
3507% \verb=<adolc/avector.h>= &
3508%\begin{tabular*}{10.5cm}{cp{9.5cm}}
3509%  \boldmath $\rightarrow$ \unboldmath
3510%                & provides the interface to the active vector
3511%                  and matrix data types of ADOL-C: {\sf class adoublev}
3512%                  and {\sf class adoublem}, respectively
3513%                   (see \autoref{arrays}); \\
3514%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3515%\end{tabular*}
3516%\\ \hline
3517 \verb=<adolc/taputil.h>= & 
3518\begin{tabular*}{10.5cm}{cp{9.5cm}}
3519  \boldmath $\rightarrow$ \unboldmath
3520                & provides functions to start/stop the tracing of
3521                  active sections (see \autoref{markingActive})
3522                  as well as utilities to obtain
3523                  tape statistics (see \autoref{examiningTape}); \\
3524  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3525\end{tabular*}
3526\\ \hline
3527\end{tabular}
3528\caption{Important header files: tracing/taping}
3529\label{importantHeaders1}
3530\end{table} 
3531%
3532\begin{table}[h]
3533\center \small
3534\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
3535%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
3536\verb=<adolc/interfaces.h>= & 
3537\begin{tabular*}{10.5cm}{cp{9.5cm}}
3538  \boldmath $\rightarrow$ \unboldmath
3539                & provides interfaces to the {\sf forward} and
3540                  {\sf reverse} routines as basic versions of derivative
3541                  evaluation (see \autoref{forw_rev}); \\
3542  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3543  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
3544  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
3545\end{tabular*}
3546\\ \hline
3547\verb=<adolc/drivers.h>= & 
3548\begin{tabular*}{10.5cm}{cp{9.5cm}}
3549  \boldmath $\rightarrow$ \unboldmath
3550                & provides ``easy to use'' drivers for solving
3551                  optimization problems and nonlinear equations
3552                  (see \autoref{optdrivers}); \\
3553  $\bullet$     & comprises C and Fortran-callable versions.
3554\end{tabular*}
3555\\ \hline
3556\begin{minipage}{3cm}
3557\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
3558\end{minipage}  & 
3559\begin{tabular*}{10.5cm}{cp{9.5cm}}
3560  \boldmath $\rightarrow$ \unboldmath
3561                & provides the ``easy to use'' sparse drivers
3562                  to exploit the sparsity structure of
3563                  Jacobians (see \autoref{sparse}); \\
3564  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
3565                  of {\sf forward} and {\sf reverse} routines
3566                  propagating bit patterns (see \autoref{ProBit}); \\
3567
3568  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
3569\end{tabular*}
3570\\ \hline
3571\begin{minipage}{3cm}
3572\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
3573\end{minipage}  & 
3574\begin{tabular*}{10.5cm}{cp{9.5cm}}
3575  \boldmath $\rightarrow$ \unboldmath
3576                & provides interfaces to the underlying C-callable
3577                  versions of {\sf forward} and {\sf reverse} routines
3578                  propagating bit patterns.
3579\end{tabular*}
3580\\ \hline
3581\begin{minipage}{3cm}
3582\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
3583\end{minipage}  &
3584\begin{tabular*}{10.5cm}{cp{9.5cm}}
3585  \boldmath $\rightarrow$ \unboldmath
3586                & provides ``easy to use'' drivers for numerical
3587                  solution of ordinary differential equations
3588                  (see \autoref{odedrivers}); \\
3589  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3590  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
3591\end{tabular*}
3592\\ \hline
3593\begin{minipage}{3cm}
3594\verb=<adolc/drivers/=\newline\verb= taylor.h>=
3595\end{minipage}  &
3596\begin{tabular*}{10.5cm}{cp{9.5cm}}
3597  \boldmath $\rightarrow$ \unboldmath
3598                & provides ``easy to use'' drivers for evaluation
3599                  of higher order derivative tensors (see
3600                  \autoref{higherOrderDeriv}) and inverse/implicit function
3601                  differentiation (see \autoref{implicitInverse});\\
3602  $\bullet$     & comprises C++ and C-callable versions.
3603\end{tabular*} 
3604\\ \hline
3605\verb=<adolc/adalloc.h>= &
3606\begin{tabular*}{10.5cm}{cp{9.5cm}}
3607  \boldmath $\rightarrow$ \unboldmath
3608                & provides C++ and C functions for allocation of
3609                  vectors, matrices and three dimensional arrays
3610                  of {\sf double}s.
3611\end{tabular*}
3612\\ \hline
3613\end{tabular}
3614\caption{Important header files: evaluation of derivatives}
3615\label{importantHeaders2}
3616\end{table} 
3617%
3618\subsection{Compiling and Linking C/C++ Programs}
3619%
3620To compile a C/C++ program or single module using ADOL-C
3621data types and routines one has to ensure that all necessary
3622header files according to \autoref{ssec:DesIH} are
3623included. All modules involving {\em active} data types as
3624{\sf adouble}
3625%, {\bf adoublev} and {\bf adoublem}
3626have to be compiled as C++. Modules that make use of a previously
3627generated tape to evaluate derivatives can either be programmed in ANSI-C
3628(while avoiding all C++ interfaces) or in C++. Depending
3629on the chosen programming language the header files provide
3630the right ADOL-C prototypes.
3631For linking the resulting object codes the library \verb=libadolc.*=
3632must be used (see \autoref{genlib}).
3633%
3634\subsection{Adding Quadratures as Special Functions}
3635%
3636\label{quadrat}
3637%
3638Suppose an integral
3639\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
3640is evaluated numerically by a user-supplied function
3641\begin{center}
3642{\sf  double  myintegral(double\& x);}
3643\end{center}
3644Similarly, let us suppose that the integrand itself is evaluated by
3645a user-supplied block of C code {\sf integrand}, which computes a
3646variable with the fixed name {\sf val} from a variable with the fixed
3647name {\sf arg}. In many cases of interest, {\sf integrand} will
3648simply be of the form
3649\begin{center}
3650{\sf \{ val = expression(arg) \}}\enspace .
3651\end{center}
3652In general, the final assignment to {\sf val} may be preceded
3653by several intermediate calculations, possibly involving local
3654active variables of type {\sf adouble}, but no external or static
3655variables of that type.  However, {\sf integrand} may involve local
3656or global variables of type {\sf double} or {\sf int}, provided they
3657do not depend on the value of {\sf arg}. The variables {\sf arg} and
3658{\sf val} are declared automatically; and as {\sf integrand} is a block
3659rather than a function, {\sf integrand} should have no header line. 
3660
3661Now the function {\sf myintegral} can be overloaded for {\sf adouble}
3662arguments and thus included in the library of elementary functions
3663by the following modifications:
3664\begin{enumerate}
3665\item
3666At the end of the file \verb=<adouble.cpp>=, include the full code
3667defining \\ {\sf double myintegral(double\& x)}, and add the line
3668\begin{center}
3669{\sf extend\_quad(myintegral, integrand); }
3670\end{center}
3671This macro is extended to the definition of
3672 {\sf adouble myintegral(adouble\& arg)}.
3673Then remake the library \verb=libadolc.*= (see \autoref{genlib}).
3674\item
3675In the definition of the class
3676{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
3677\begin{center}
3678{\sf friend adouble myintegral(adouble\&)}.
3679\end{center}
3680\end{enumerate}
3681In the first modification, {\sf myintegral} represents the name of the
3682{\sf double} function, whereas {\sf integrand} represents the actual block
3683of C code.
3684
3685For example, in case of the inverse hyperbolic cosine, we have
3686{\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as
3687{\sf \{ val = sqrt(arg*arg-1); \}} 
3688so that the line
3689\begin{center}
3690{\sf extend\_quad(acosh,val = sqrt(arg*arg-1));} 
3691\end{center}
3692can be added to the file \verb=<adouble.cpp>=.
3693A mathematically equivalent but longer representation of
3694{\sf integrand} is
3695\begin{center}
3696\begin{tabbing}
3697{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
3698 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
3699 \> \ \> {\sf  temp = temp*temp; } \\ 
3700 \> \ \> {\sf  val = sqrt(temp-1); \}} 
3701\end{tabbing}
3702\end{center} 
3703The code block {\sf integrand} may call on any elementary function that has already
3704been defined in file \verb=<adouble.cpp>=, so that one may also introduce
3705iterated integrals.
3706%
3707%
3708\section{Example Codes}
3709\label{example}
3710%
3711The following listings are all simplified versions of codes that
3712are contained in the example subdirectory
3713\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
3714we have left out timings, which are included in the complete codes.
3715%
3716\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
3717%
3718The first example evaluates the gradient and the Hessian of
3719the function
3720\[ 
3721y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
3722\] 
3723using the appropriate drivers {\sf gradient} and {\sf hessian}.
3724
3725\begin{verbatim}
3726#include <adolc/adouble.h>               // use of active doubles and taping
3727#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
3728                                   // gradient(.) and hessian(.)
3729#include <adolc/taping.h>                // use of taping
3730...
3731void main() {
3732int n,i,j,tape_stats[11];
3733cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
3734cout << "number of independent variables = ?  \n";
3735cin >> n;
3736double* xp = new double[n];         
3737double  yp = 0.0;
3738adouble* x = new adouble[n];     
3739adouble  y = 1;
3740for(i=0;i<n;i++)
3741  xp[i] = (i+1.0)/(2.0+i);         // some initialization
3742trace_on(1);                       // tag =1, keep=0 by default
3743  for(i=0;i<n;i++) {
3744    x[i] <<= xp[i]; y *= x[i]; }     
3745  y >>= yp;
3746  delete[] x;                     
3747trace_off();
3748tapestats(1,tape_stats);           // reading of tape statistics
3749cout<<"maxlive "<<tape_stats[2]<<"\n";
3750...                                // ..... print other tape stats
3751double* g = new double[n];       
3752gradient(1,n,xp,g);                // gradient evaluation
3753double** H=(double**)malloc(n*sizeof(double*));
3754for(i=0;i<n;i++)
3755  H[i]=(double*)malloc((i+1)*sizeof(double));
3756hessian(1,n,xp,H);                 // H equals (n-1)g since g is
3757double errg = 0;                   // homogeneous of degree n-1.
3758double errh = 0;
3759for(i=0;i<n;i++)
3760  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
3761for(i=0;i<n;i++) {
3762  for(j=0;j<n;j++) {
3763    if (i>j)                       // lower half of hessian
3764      errh += fabs(H[i][j]-g[i]/xp[j]); } }
3765cout << yp-1/(1.0+n) << " error in function \n";
3766cout << errg <<" error in gradient \n";
3767cout << errh <<" consistency check \n";
3768}                                  // end main
3769\end{verbatim}
3770%
3771\subsection{Power Example ({\tt powexam.cpp})}
3772%
3773The second example function evaluates the $n$-th power of a real
3774variable $x$ in
3775$\log_2 n$ multiplications by recursive halving of the exponent. Since
3776there is only one independent variable, the scalar derivative can be
3777computed by
3778using both {\sf forward} and {\sf reverse}, and the
3779results are subsequently compared.
3780\begin{verbatim}
3781#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces
3782
3783adouble power(adouble x, int n) {
3784adouble z=1;
3785if (n>0) {                         // recursion and branches
3786  int nh =n/2;                     // that do not depend on
3787  z = power(x,nh);                 // adoubles are fine !!!!
3788  z *= z;
3789  if (2*nh != n)
3790    z *= x;
3791  return z; }                      // end if
3792else {
3793  if (n==0)                        // the local adouble z dies
3794    return z;                      // as it goes out of scope.
3795  else
3796    return 1/power(x,-n); }        // end else
3797} // end power
3798\end{verbatim}
3799The function {\sf power} above was obtained from the original
3800undifferentiated version by simply changing the type of all
3801{\sf double}s including the return variable to {\sf adouble}s. The new version
3802can now be called from within any active section, as in the following
3803main program.
3804\begin{verbatim}
3805#include ...                       // as above
3806int main() {
3807int i,n,tag=1;
3808cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
3809cout<<"monomial degree=? \n";      // input the desired degree
3810cin >> n;
3811                                   // allocations and initializations
3812double* Y[1];
3813*Y = new double[n+2];
3814double* X[1];                      // allocate passive variables with
3815*X = new double[n+4];              // extra dimension for derivatives
3816X[0][0] = 0.5;                     // function value = 0. coefficient
3817X[0][1] = 1.0;                     // first derivative = 1. coefficient
3818for(i=0;i<n+2;i++)
3819  X[0][i+2]=0;                     // further coefficients
3820double* Z[1];                      // used for checking consistency
3821*Z = new double[n+2];              // between forward and reverse
3822adouble y,x;                       // declare active variables
3823                                   // beginning of active section
3824trace_on(1);                       // tag = 1 and keep = 0
3825x <<= X[0][0];                     // only one independent var
3826y = power(x,n);                    // actual function call
3827y >>= Y[0][0];                     // only one dependent adouble
3828trace_off();                       // no global adouble has died
3829                                   // end of active section
3830double u[1];                       // weighting vector
3831u[0]=1;                            // for reverse call
3832for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
3833  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-the derivative
3834  if (i==0)
3835    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value()
3836    << " (should be 0)\n";
3837  else
3838    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i]
3839    << " (should be 0)\n";
3840  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
3841  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylorcoeff.
3842return 1;
3843}                                  // end main
3844\end{verbatim}
3845Since this example has only one independent and one dependent variable,
3846{\sf forward} and {\sf reverse} have the same complexity and calculate
3847the same scalar derivatives, albeit with a slightly different scaling.
3848By replacing the function {\sf power} with any other univariate test function,
3849one can check that {\sf forward} and {\sf reverse} are at least consistent.
3850In the following example the number of independents is much larger
3851than the number of dependents, which makes the reverse mode preferable.
3852%
3853\subsection{Determinant Example ({\tt detexam.cpp})}
3854%
3855Now let us consider an exponentially expensive calculation,
3856namely, the evaluation of a determinant by recursive expansion
3857along rows. The gradient of the determinant with respect to the
3858matrix elements is simply the adjoint, i.e. the matrix of cofactors.
3859Hence the correctness of the numerical result is easily checked by
3860matrix-vector multiplication. The example illustrates the use
3861of {\sf adouble} arrays and pointers. 
3862
3863\begin{verbatim}
3864#include <adolc/adouble.h>               // use of active doubles and taping
3865#include <adolc/interfaces.h>            // use of basic forward/reverse
3866                                   // interfaces of ADOL-C
3867adouble** A;                       // A is an n x n matrix
3868int i,n;                           // k <= n is the order
3869adouble det(int k, int m) {        // of the sub-matrix
3870if (m == 0) return 1.0 ;           // its column indices
3871else {                             // are encoded in m
3872  adouble* pt = A[k-1];
3873  adouble t = zero;                // zero is predefined
3874  int s, p =1;
3875  if (k%2) s = 1; else s = -1;
3876  for(i=0;i<n;i++) {
3877    int p1 = 2*p;
3878    if (m%p1 >= p) {
3879      if (m == p) {
3880        if (s>0) t += *pt; else t -= *pt; }
3881      else {
3882        if (s>0)
3883          t += *pt*det(k-1,m-p);   // recursive call to det
3884        else
3885          t -= *pt*det(k-1,m-p); } // recursive call to det
3886      s = -s;}
3887    ++pt;
3888    p = p1;}
3889  return t; }
3890}                                  // end det
3891\end{verbatim}
3892As one can see, the overloading mechanism has no problem with pointers
3893and looks exactly the same as the original undifferentiated function
3894except for the change of type from {\sf double} to {\sf adouble}.
3895If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
3896a compile time error would have resulted. Now consider a corresponding
3897calling program.
3898
3899\begin{verbatim}
3900#include ...                       // as above
3901int main() {
3902int i,j, m=1,tag=1,keep=1;
3903cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
3904cout << "order of matrix = ? \n";  // select matrix size
3905cin >> n;
3906A = new adouble*[n];             
3907trace_on(tag,keep);                // tag=1=keep
3908  double detout=0.0, diag = 1.0;   // here keep the intermediates for
3909  for(i=0;i<n;i++) {               // the subsequent call to reverse
3910    m *=2;
3911    A[i] = new adouble[n];         // not needed for adoublem
3912    adouble* pt = A[i];
3913    for(j=0;j<n;j++)
3914      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
3915    diag += A[i][i].value();        // value() converts to double
3916    A[i][i] += 1.0; }
3917  det(n,m-1) >>= detout;           // actual function call
3918  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
3919trace_off();
3920double u[1];
3921u[0] = 1.0;
3922double* B = new double[n*n];
3923reverse(tag,1,n*n,1,u,B);
3924cout <<" \n first base? : ";
3925for (i=0;i<n;i++) {
3926  adouble sum = 0;
3927  for (j=0;j<n;j++)                // the matrix A times the first n
3928    sum += A[i][j]*B[j];           // components of the gradient B
3929  cout<<sum.value()<<" "; }         // must be a Cartesian basis vector
3930return 1;
3931}                                  // end main
3932\end{verbatim}
3933The variable {\sf diag} should be mathematically
3934equal to the determinant, because the
3935matrix {\sf A} is defined as a rank 1 perturbation of the identity.
3936%
3937\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
3938\label{exam:ode}
3939%
3940Here, we consider a nonlinear ordinary differential equation that
3941is a slight modification of the Robertson test problem
3942given in Hairer and Wanner's book on the numerical solution of
3943ODEs \cite{HW}. The following source code computes the corresponding
3944values of $y^{\prime} \in \R^3$:
3945\begin{verbatim}
3946#include <adolc/adouble.h>                  // use of active doubles and taping
3947#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
3948#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities
3949
3950void tracerhs(short int tag, double* py, double* pyprime) {
3951adoublev y(3);                        // this time we left the parameters
3952adoublev yprime(3);                   // passive and use the vector types
3953trace_on(tag);
3954y <<= py;                             // initialize and mark independents
3955yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
3956yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
3957yprime[2] = -yprime[0] - yprime[1];
3958yprime >>= pyprime;                   // mark and pass dependents
3959trace_off(tag);
3960}                                     // end tracerhs
3961\end{verbatim}
3962The Jacobian of the right-hand side has large
3963negative eigenvalues, which make the ODE quite stiff. We  have added
3964some numerically benign transcendentals to make the differentiation
3965more interesting.
3966The following main program uses {\sf forode} to calculate the Taylor series
3967defined by the ODE at the given point $y_0$ and {\sf reverse} as well
3968as {\sf accode} to compute the Jacobians of the coefficient vectors
3969with respect to $x_0$.
3970\begin{verbatim}
3971#include .......                   // as above
3972int main() {
3973int i,j,deg; 
3974int n=3;
3975double py[3];
3976double pyp[3];
3977cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
3978cout << "degree of Taylor series =?\n";
3979cin >> deg;
3980double **X;
3981X=(double**)malloc(n*sizeof(double*));
3982for(i=0;i<n;i++)
3983  X[i]=(double*)malloc((deg+1)*sizeof(double));
3984double*** Z=new double**[n];
3985double*** B=new double**[n];
3986short** nz = new short*[n];
3987for(i=0;i<n;i++) {
3988  Z[i]=new double*[n];
3989  B[i]=new double*[n];
3990  for(j=0;j<n;j++) {
3991    Z[i][j]=new double[deg];
3992    B[i][j]=new double[deg]; }     // end for
3993}                                  // end for
3994for(i=0;i<n;i++) {
3995  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
3996  X[i][0] = py[i];                 // and the Taylor coefficient;
3997  nz[i] = new short[n]; }          // set up sparsity array
3998tracerhs(1,py,pyp);                // trace RHS with tag = 1
3999forode(1,n,deg,X);                 // compute deg coefficients
4000reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
4001accode(n,deg-1,Z,B,nz);
4002cout << "nonzero pattern:\n";
4003for(i=0;i<n;i++) {
4004  for(j=0;j<n;j++)
4005    cout << nz[i][j]<<"\t";
4006  cout <<"\n"; }                   // end for
4007return 1;
4008}                                  // end main
4009\end{verbatim}
4010\noindent The pattern {\sf nz} returned by {\sf accode} is
4011\begin{verbatim}
4012              3  -1   4
4013              1   2   2
4014              3   2   4
4015\end{verbatim}
4016The original pattern {\sf nz} returned by {\sf reverse} is the same
4017except that the negative entry $-1$ was zero.
4018%
4019%\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})}
4020%\label{gaussexam}
4021%
4022%The following example uses conditional assignments to show the usage of a once produced tape
4023%for evaluation at new arguments. The elimination is performed with
4024%column pivoting.
4025%\begin{verbatim}
4026%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces
4027
4028%void gausselim(int n, adoublem& A, adoublev& bv) {
4029%along i;                           // active integer declaration
4030%adoublev temp(n);                  // active vector declaration
4031%adouble r,rj,temps;
4032%int j,k;
4033%for(k=0;k<n;k++) {                 // elimination loop
4034%  i = k;
4035%  r = fabs(A[k][k]);               // initial pivot size
4036%  for(j=k+1;j<n;j++) {
4037%    rj = fabs(A[j][k]);             
4038%    condassign(i,rj-r,j);          // look for a larger element in the same
4039%    condassign(r,rj-r,rj); }       // column with conditional assignments
4040%  temp = A[i];                     // switch rows using active subscripting
4041%  A[i] = A[k];                     // necessary even if i happens to equal
4042%  A[k] = temp;                     // k during taping
4043%  temps = bv[i];
4044%  bv[i]=bv[k];
4045%  bv[k]=temps;
4046%  if (!value(A[k][k]))             // passive subscripting
4047%    exit(1);                       // matrix singular!
4048%  temps= A[k][k];
4049%  A[k] /= temps;
4050%  bv[k] /= temps;
4051%  for(j=k+1;j<n;j++) {
4052%    temps= A[j][k];
4053%    A[j] -= temps*A[k];            // vector operations
4054%    bv[j] -= temps*bv[k]; }        // endfor
4055%}                                  // end elimination loop
4056%temp=0.0;
4057%for(k=n-1;k>=0;k--)                // backsubstitution
4058%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];
4059%bv=temp;
4060%}                                  // end gausselim
4061%\end{verbatim}
4062%\noindent This function can be called from any program
4063%that suitably initializes
4064%the components of {\sf A} and {\sf bv}
4065%as independents. The resulting tape can be
4066%used to solve any nonsingular linear system of the same size and
4067%to get the sensitivities of the solution with respect to the
4068%system matrix and the right hand side.
4069%\vspace*{-4mm}
4070%
4071\section*{Acknowledgements}
4072%
4073Parts of the ADOL-C source were developed by Andreas
4074Kowarz, Hristo Mitev, Sebastian Schlenkrich,  and Olaf
4075Vogel. We are also indebted to George Corliss,
4076Tom Epperly, Bruce Christianson, David Gay,  David Juedes,
4077Brad Karp, Koichi Kubota, Bob Olson,  Marcela Rosemblun, Dima
4078Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
4079various ways with the development and documentation of ADOL-C.
4080%
4081\begin{thebibliography}{10}
4082
4083\bibitem{BeKh96}
4084Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle.
4085\newblock {\em Efficient computation of gradients and Jacobians by dynamic
4086  exploitation of sparsity in automatic differentiation}.
4087\newblock Optimization Methods and Software 7(1):1-39, 1996.
4088
4089\bibitem{Chri91a}
4090Bruce Christianson.
4091\newblock {\em Reverse accumulation and accurate rounding error estimates for
4092Taylor series}.
4093\newblock  Optimization Methods and Software 1:81--94, 1992.
4094
4095\bibitem{GeMaPo05}
4096Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
4097\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
4098  derivatives}.
4099\newblock SIAM Review 47(4):629--705, 2005.
4100
4101\bibitem{GePoTaWa06}
4102Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther.
4103{\em Efficient Computation of Sparse Hessians: An Experimental Study
4104  using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing.
4105
4106\bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea
4107  Walther.
4108{\em Exploiting  Sparsity  in Jacobian Computation via Coloring and Automatic Differentiation:
4109a Case Study in a Simulated Moving Bed Process}.
4110In Chr. Bischof et al., eds.,  {\em Proceedings AD 2008 conference}, LNCSE 64, pp. 327 -- 338, Springer (2008).
4111
4112\bibitem{GeTaMaPo07}
4113Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen,
4114{\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}.
4115SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.
4116
4117
4118\bibitem{GrWa08}
4119Andreas Griewank and Andrea Walther: {\em Evaluating Derivatives, Principles and Techniques of
4120  Algorithmic Differentiation. Second edition}. SIAM, 2008.
4121
4122
4123\bibitem{Griewank97}
4124Andreas Griewank, Jean Utke, and Andrea Walther.
4125\newblock {\em Evaluating higher derivative tensors by forward propagation
4126          of univariate Taylor series}.
4127\newblock Mathematics of Computation, 69:1117--1130, 2000.
4128
4129\bibitem{GrWa00} 
4130Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse
4131                 or Adjoint Mode of Computational Differentiation},
4132                 ACM Transaction on Mathematical Software 26:19--45, 2000.
4133
4134\bibitem{HW}
4135    Ernst Hairer and Gerhard Wanner.
4136    {\it Solving Ordinary Differential Equations II.\/}
4137    Springer-Verlag, Berlin, 1991.
4138
4139\bibitem{Knuth73}
4140Donald~E. Knuth.
4141\newblock {\em The Art of Computer Programming. Second edition.}
4142\newblock Addison-Wesley, Reading, 1973.
4143
4144\bibitem{Wa05a}
4145Andrea Walther.
4146\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
4147\newblock Transaction on Mathematical Software, 34(1), Artikel 3 (2008).
4148\end{thebibliography}
4149\end{document}
4150
Note: See TracBrowser for help on using the repository browser.