source: stable/2.4/ADOL-C/doc/adolc-manual.tex @ 417

Last change on this file since 417 was 379, checked in by kulshres, 6 years ago

include advance branching in manual and --help text for configure

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

File size: 196.5 KB
Line 
1% Latex file containing the documentation of ADOL-C
2%
3% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
4%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
5%
6% This file is part of ADOL-C. This software is provided as open source.
7% Any use, reproduction, or distribution of the software constitutes
8% recipient's acceptance of the terms of the accompanying license file.
9%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
10
11\documentclass[11pt,twoside]{article}
12\usepackage{hyperref}
13\usepackage{amsmath,amsthm,amssymb}
14\usepackage{graphicx}
15\usepackage{datetime}
16
17\newdateformat{monthyear}{\monthname\ \THEYEAR}
18
19\usepackage{color}
20
21\pagestyle{headings} 
22\bibliographystyle{plain}
23\parskip=6pt
24
25\setlength{\textwidth}{433.6pt}
26\setlength{\oddsidemargin}{23pt}
27\setlength{\evensidemargin}{23pt}
28\setlength{\topmargin}{25.0pt}
29\setlength{\textheight}{580pt}
30\setlength{\baselineskip}{8pt}
31
32\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
33\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
34\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
35
36\renewcommand{\sectionautorefname}{Section}
37\renewcommand{\subsectionautorefname}{Section}
38\renewcommand{\figureautorefname}{Figure}
39\renewcommand{\tableautorefname}{Table}
40
41\setcounter{equation}{0}
42
43\input{version.tex}
44
45%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
46\begin{document}
47
48\begin{titlepage}
49\begin{center}
50{\Large {\bf ADOL-C:}} 
51\footnote{The development of earlier versions was supported by the Office of
52  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
53  Forschungsgemeinschaft. During the development of the current
54  version Andrea Walther and Andreas Kowarz were supported by the
55  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} 
56\vspace{0.2in} \\
57%
58{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
59{\Large of Algorithms Written in C/C++}\\
60\vspace{.2in}
61{\large\bf  Version \packageversion, \monthyear\today} \\
62\bigskip
63 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
64   of Paderborn, 33098 Paderborn, Germany} and
65 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
66 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
67\end{center}
68%
69%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
70\begin{abstract}
71The C++ package ADOL-C described here facilitates the evaluation of
72first and higher derivatives of vector functions that are defined
73by computer programs written in C or C++. The resulting derivative
74evaluation routines may be called from C, C++, Fortran, or any other
75language that can be linked with C.
76
77The numerical values of derivative vectors are obtained free
78of truncation errors at a small multiple of the run time and
79random access memory required by the given function evaluation program.
80Derivative matrices are obtained by columns, by rows or in sparse format.
81For solution curves defined by ordinary differential equations,
82special routines are provided that evaluate the Taylor coefficient vectors
83and their Jacobians with respect to the current state vector.
84For explicitly or implicitly defined functions derivative tensors are
85obtained with a complexity that grows only quadratically in their
86degree. The derivative calculations involve a possibly substantial but
87always predictable amount of data. Since the data is accessed strictly sequentially
88it can be automatically paged out to external files.
89\end{abstract}
90%
91{\bf Keywords}: Computational Differentiation, Automatic
92         Differentiation,
93         Chain Rule, Overloading, Taylor Coefficients,
94         Gradients, Hessians, Forward Mode, Reverse Mode,
95         Implicit Function Differentiation, Inverse Function Differentiation
96\medskip
97
98\noindent
99{\bf Abbreviated title}: Automatic differentiation by overloading in C++
100%
101\end{titlepage}
102%
103\tableofcontents       
104%
105
106\newpage
107%
108%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
109%
110%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
111\section{Preparing a Section of C or C++ Code for Differentiation}
112\label{prepar}
113%
114\subsection{Introduction}
115%
116\setcounter{equation}{0}
117The package \mbox{ADOL-C} 
118utilizes overloading in C++, but the
119user has to know only C. The acronym stands for {\bf A}utomatic
120{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
121In contrast to source transformation approaches, overloading does not generate intermediate
122source code.
123As starting points to retrieve further information on techniques and
124application of automatic differentiation, as well as on other AD
125tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
126\verb=http://www.autodiff.org= of the AD community forms a rich source
127of further information and pointers.
128
129
130ADOL-C facilitates the simultaneous
131evaluation of arbitrarily high directional derivatives and the
132gradients of these Taylor coefficients with respect to all independent
133variables. Relative to the cost of evaluating the underlying function,
134the cost for evaluating any such scalar-vector pair grows as the
135square of the degree of the derivative but is still completely
136independent of the numbers $m$ and $n$.
137
138This manual is organized as follows. This section explains the
139modifications required to convert undifferentiated code to code that
140compiles with ADOL-C.
141\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
142evaluate arbitrarily high order derivatives. The discussion includes storage
143requirements and the tailoring of certain tape characteristics to fit specific
144user needs. Descriptions of easy-to-use drivers for a  convenient derivative
145evaluation are contained in \autoref{drivers}.
146\autoref{forw_rev_ad} offers a more mathematical characterization of
147the different modes of AD to compute derivatives. At the same time, the
148corresponding drivers of ADOL-C are explained. 
149The overloaded derivative evaluation routines using the forward and the reverse
150mode of AD are explained in \autoref{forw_rev}.
151Advanced differentiation techniques as the optimal checkpointing for
152time integrations, the exploitation of fixed point iterations, the usages
153of external differentiated functions and the differentiation of OpenMP
154parallel programs are described in \autoref{adv_ad}.
155The tapeless forward mode is presented in \autoref{tapeless}.
156\autoref{install} details the installation and
157use of the ADOL-C package. Finally, \autoref{example} 
158furnishes some example programs that incorporate the ADOL-C package to
159evaluate first and higher-order
160derivatives.  These and other examples are distributed with the ADOL-C
161source code.
162The user should simply refer to them if the more abstract and general
163descriptions of ADOL-C provided in this document do not suffice.
164%
165%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
166\subsection{Declaring Active Variables}
167%
168\label{DecActVar}
169%
170The key ingredient of automatic differentiation by overloading is the
171concept of an {\em active variable}. All variables that may be
172considered as differentiable quantities at some time
173during the program execution must be of an active
174type. ADOL-C uses one
175active scalar type, called {\sf adouble}, whose real part is of the
176standard type {\sf double}.
177Typically, one will declare the independent variables
178and all quantities that directly or indirectly depend on them as
179{\em active}. Other variables that do not depend on the independent
180variables but enter, for example, as parameters, may remain one of the
181{\em passive} types {\sf double, float}, or {\sf int}. There is no
182implicit type conversion from {\sf adouble} to any of these passive
183types; thus, {\bf failure to declare variables as active when they
184depend on other active variables will result in a compile-time error
185message}. In data flow terminology, the set of active variable names
186must contain all its successors in the dependency graph. All components
187of indexed arrays must have the same activity status.
188
189The real component of an {\sf adouble x} can be extracted as
190{\sf x.value()}. In particular,
191such explicit conversions are needed for the standard output procedure
192{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
193that first the real part of an {\sf adouble} and then the string
194``{\sf (a)}" is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath  can
195be used to assign a constant value to an {\sf adouble}.
196Naturally, {\sf adouble}s may be
197components of vectors, matrices, and other arrays, as well as
198members of structures or classes.
199
200The C++ class {\sf adouble}, its member functions, and the overloaded
201versions of all arithmetic operations, comparison operators, and
202most ANSI C functions are contained in the file \verb=adouble.cpp= and its
203header \verb=<adolc/adouble.h>=. The latter must be included for compilation
204of all program files containing {\sf adouble}s and corresponding
205operations.
206%
207%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
208\subsection{Marking Active Sections}
209\label{markingActive}
210%
211All calculations involving active variables that occur between
212the void function calls
213\begin{center}
214{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
215{\sf trace\_off(file)}
216\end{center}
217are recorded on a sequential data set called {\em tape}. Pairs of
218these function calls can appear anywhere in a C++ program, but
219they must not overlap. The nonnegative integer argument {\sf tag} identifies the
220particular tape for subsequent function or derivative evaluations.
221Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
222The optional integer arguments {\sf keep} and
223{\sf file} will be discussed in \autoref{tape}. We will refer to the
224sequence of statements executed between a particular call to
225{\sf trace\_on} and the following call to {\sf trace\_off} as an
226{\em active section} of the code. The same active section may be
227entered repeatedly, and one can successively generate several traces
228on distinct tapes by changing the value of {\sf tag}.
229Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
230the header file \verb=<adolc/taputil.h>=, which is included by the header
231\verb=<adolc/adouble.h>= automatically.
232
233Active sections may contain nested or even recursive calls to functions
234provided by the user. Naturally, their formal and actual parameters
235must have matching types. In particular, the functions must be
236compiled with their active variables declared as
237{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. 
238Variables of type {\sf adouble} may be declared outside an active section and need not
239go out of scope before the end of an active section.
240It is not necessary -- though desirable -- that free-store {\sf adouble}s
241allocated within
242an active section be deleted before its completion. The values of all
243{\sf adouble}s that exist at the beginning and end of an active section
244are automatically
245recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
246%
247%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
248\subsection{Selecting Independent and Dependent Variables}
249%
250One or more active variables that are read in or initialized to
251the values of constants or passive variables must be distinguished as
252independent variables. Other active variables that are similarly
253initialized may be considered as temporaries (e.g., a variable that
254accumulates the partial sums of a scalar product after being
255initialized to zero). In order to distinguish an active variable {\sf x} as
256independent, ADOL-C requires an assignment of the form
257\begin{center}
258{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
259\end{center}
260This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
261precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
262other values subsequently. Similarly, one or more active variables {\sf y}
263must be distinguished as dependent by an assignment of the form
264\begin{center}
265{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$ 
266\end{center}
267which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
268by any other assignment to {\sf y}. However, a dependent variable {\sf y} 
269may have been assigned other real values previously, and it could even be an
270independent variable as well.  The derivative values calculated after
271the
272completion of an active section always represent {\bf derivatives of the final
273values of the dependent variables with respect to the initial values of the
274independent variables}.
275
276The order in which the independent and dependent variables are marked
277by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
278derivative evaluations. However, these variables do not have to be
279combined into contiguous vectors. ADOL-C counts the number of
280independent and dependent variable specifications within each active
281section and records them in the header of the tape.
282%
283%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
284\subsection{A Subprogram as an Active Section} 
285%
286As a generic example let us consider a C(++) function of the form
287shown in \autoref{code1}.
288%
289\begin{figure}[hbt]
290\framebox[\textwidth]{\parbox{\textwidth}{
291\begin{center}
292\begin{tabbing}
293{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
294\>{\sf  double *x,} \> // independent variable vector \\
295\>{\sf  double *y,} \> // dependent variable vector  \\ 
296\> {\sf int *k, } \> // integer parameters \\ 
297\>{\sf  double *z)}  \> // real parameters \\
298{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
299\>{\sf double t = 0;}  \> // local variable declaration \\
300\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
301\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
302\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
303\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
304{\sf  \} } \>  \> // end of function
305\end{tabbing}
306\end{center}
307}} 
308\caption{Generic example of a subprogram to be activated}
309\label{code1}
310\end{figure}
311%
312
313If {\sf eval} is to be called from within an active C(++)
314section with {\sf x}
315and {\sf y} as vectors of {\sf adouble}s and the other parameters
316passive, then one merely has to change the type declarations of all
317variables that depend on {\sf x} from {\sf double} or {\sf float} to
318{\sf adouble}. Subsequently, the subprogram must be compiled with the
319header file \verb=<adolc/adouble.h>= included as described
320in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
321still to be called with integer and real arguments, possibly from
322a program written in Fortran77, which  does not allow overloading.
323
324To automatically compute derivatives of the dependent
325variables {\sf y} with respect to the independent variables {\sf x}, we
326can make the body of the function into an active section. For
327example, we may modify the previous program segment
328as in \autoref{adolcexam}.
329The renaming and doubling up of the original independent and dependent
330variable vectors by active counterparts may seem at first a bit clumsy.
331However, this transformation has the advantage that the calling
332sequence and the computational part, i.e., where the function is
333really evaluated, of {\sf eval} remain completely
334unaltered. If the temporary variable {\sf t} had remained a {\sf double},
335the code would not compile, because of a type conflict in the assignment
336following the declaration. More detailed example codes are listed in
337\autoref{example}.
338
339\begin{figure}[htb]
340\framebox[\textwidth]{\parbox{\textwidth}{
341\begin{center}
342\begin{tabbing}
343{\sf void eval(} \= {\sf  int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\
344\> {\sf double *px,} \> // independent passive variable vector \\
345\> {\sf double *py,} \> // dependent passive variable vector  \\ 
346\> {\sf int *k,}  \> // integer parameters \\
347\> {\sf double *z)} \> // parameter vector \\
348{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
349\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
350\>{\sf trace\_on(tag);} \> // start tracing  \\
351\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
352\>{\sf x = new adouble[n];}\>// declare active independent variables \\ 
353\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
354\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
355\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
356\>{\sf adouble t = 0;}  \> // local variable declaration \\
357     \>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
358     \>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
359     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
360     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
361     \>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
362     \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
363     \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
364     \>{\sf  delete[] y;} \>// delete dependent active variables \\
365     \>{\sf  delete[] x;} \>// delete independent active variables \\
366     \>{\sf trace\_off();} \> // complete tape \\
367{\sf  \}}   \>\> // end of function
368\end{tabbing} 
369\end{center}}}
370\caption{Activated version of the code listed in \autoref{code1}}
371\label{adolcexam}
372\end{figure}
373%
374%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
375\subsection{Overloaded Operators and Functions}
376\label{OverOper}
377%
378As in the subprogram discussed above, the actual computational
379statements of a C(++) code need not be altered for the purposes of
380automatic differentiation. All arithmetic operations, as well as the
381comparison and assignment operators, are overloaded, so any or all of
382their operands can be an active variable. An {\sf adouble x} occurring
383in a comparison operator is effectively replaced by its real value
384{\sf x.value()}. Most functions contained in the ANSI C standard for
385the math library are overloaded for active arguments. The only
386exceptions are the non-differentiable functions {\sf fmod} and
387{\sf modf}. Otherwise, legitimate C code in active sections can remain
388completely unchanged, provided the direct output of active variables
389is avoided. The rest of this subsection may be skipped by first time
390users who are not worried about marginal issues of differentiability
391and efficiency.
392
393The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
394properly differentiable at the origin, which raises the question of
395how this exception ought to be handled. Fortunately, one can easily
396see that {\sf fabs(x)} and all its compositions with smooth
397functions are still directionally differentiable. These
398directional derivatives of arbitrary order can be propagated in the
399forward mode without any ambiguity. In other words, the forward mode as
400implemented in ADOL-C  computes Gateaux derivatives
401in certain directions, which reduce to Fr\'echet derivatives only
402if the dependence on the direction is linear. Otherwise,
403the directional derivatives are merely positively homogeneous with
404respect to the scaling of the directions.
405For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
406the origin somewhat arbitrarily to zero.
407
408We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
409arguments, so that function and derivative values are obtained consistent
410with those of {\sf fabs} according to the identities
411\[
412 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
413 \max(a,b) = [a+b+|a-b|]/2 \quad .
414\]
415These relations cannot hold if either $a$ or $b$ is infinite, in which
416case {\sf fmin} or {\sf fmax} and their derivatives may still be well
417defined. It should be noted that the directional differentiation of
418{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
419the corresponding assignment based on the sign of $a-b$. For example,
420the statement
421\begin{center}
422 {\sf if (a $<$ b) c = a; else c = b;}
423\end{center} 
424yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
425the incorrect directional derivative value
426{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
427{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
428should be avoided by use of the function $\sf fmin(a,b)$. There
429are also versions of {\sf fmin} and {\sf fmax} for two passive
430arguments and mixed passive/active arguments are handled by
431implicit conversion.
432On the function class obtained by composing the modulus with real
433analytic functions, the concept of directional differentiation can be
434extended to the propagation of unique one-sided Taylor expansions.
435The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded
436on the tape.
437
438The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
439functions have infinite slopes at the boundary points of their domains.
440At these marginal points the derivatives are set by ADOL-C to
441either {\sf $\pm$InfVal}, 0
442or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
443parameters, see \autoref{Customizing}.
444On IEEE machines {\sf InfVal} can be set to the special value
445{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
446For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ 
447of {\sf b}~=~{\sf sqrt(a)} is set to
448\[
449{\sf b}^\prime = \left\{
450\begin{array}{ll}
451\sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0  \\
4520&\mbox{if}\;\;{\sf a}^\prime =0 \\
453\sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\
454\end{array} \right\enspace .
455\]
456In other words, we consider {\sf a} and
457consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
458all computed Taylor coefficients are zero.
459
460The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
461it is defined for the corresponding {\sf double} arguments. If {\sf x} is
462negative, however, the partial derivative with respect to an integral exponent
463is set to zero.
464%Similarly, the partial of {\bf pow} with respect to both arguments
465%is set to zero at the origin, where both arguments vanish.     
466The derivatives of the step functions
467{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
468arguments {\sf x}. The result values of the step functions
469are recorded on the tape and can later be checked to recognize
470whether a step to another level was taken during a forward sweep
471at different arguments than at taping time.
472
473Some C implementations supply other special
474functions, in particular the error function {\sf erf(x)}. For the
475latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which
476has been commented out for systems on which the {\sf double} valued version
477is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
478postfix) are available for {\sf adouble}s.
479%
480% XXX: Vector and matrix class have to be reimplemented !!!
481%
482% and also the
483%active subscripts described in the \autoref{act_subscr}.
484Ambiguous statements like {\sf a += a++;} must be
485avoided because the compiler may sequence the evaluation of the
486overloaded
487expression differently from the original in terms of {\sf double}s.
488
489As we have indicated above, all subroutines called with active arguments
490must be modified or suitably overloaded. The simplest procedure is
491to declare the local variables of the function as active so that
492their internal calculations are also recorded on the tape.
493Unfortunately, this approach is likely to be unnecessarily inefficient
494and inaccurate if the original subroutine evaluates a special function
495that is defined as the solution of a particular mathematical problem.
496The most important examples are implicit functions, quadratures,
497and solutions of ordinary differential equations. Often
498the numerical methods for evaluating such special functions are
499elaborate, and their internal workings are not at all differentiable in
500the data. Rather than differentiating through such an adaptive
501procedure, one can obtain first and higher derivatives directly from
502the mathematical definition of the special function. Currently this
503direct approach has been implemented only for user-supplied quadratures
504as described in \autoref{quadrat}.
505%
506%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
507\subsection{Reusing the Tape for Arbitrary Input Values}
508\label{reuse_tape}
509%
510In some situations it may be desirable to calculate the value and
511derivatives of a function at arbitrary arguments by using a tape of
512the function evaluation at one argument and reevaluating the
513function  and its derivatives using the given ADOL-C
514routines. This approach can
515significantly reduce run times, and it
516also allows to port problem functions, in the form of the 
517corresponding tape files, into a computing environment that
518does not support C++ but does support C or Fortran. 
519Therefore, the routines provided by ADOL-C for the evaluation of derivatives
520can be used to at arguments $x$ other than the
521point at which the tape was generated, provided there are
522no user defined quadratures and all comparisons involving
523{\sf adouble}s yield the same result. The last condition
524implies that the control flow is unaltered by the change
525of the independent variable values. Therefore, this sufficient
526condition is tested by ADOL-C and if it is not met
527the ADOL-C routine called for derivative calculations indicates this
528contingency through its return value. Currently, there are six return values,
529see \autoref{retvalues}.
530\begin{table}[h]
531\center\small
532\begin{tabular}{|r|l|}\hline
533 +3 &
534\begin{minipage}{12.5cm}
535\vspace*{1ex}
536The function is locally analytic.
537\vspace*{1ex}
538\end{minipage} \\ \hline
539 +2 &
540\begin{minipage}{12.5cm}
541\vspace*{1ex}
542The function is locally analytic but the sparsity
543structure (compared to the situation at the  taping point)
544may have changed, e.g. while at taping arguments
545{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
546the argument currently used.
547\vspace*{1ex}
548\end{minipage} \\ \hline
549 +1 &
550\begin{minipage}{12.5cm}
551\vspace*{1ex}
552At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
553is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
554Lipschitz-continuous but possibly non-differentiable.
555\vspace*{1ex}
556\end{minipage} \\ \hline
557 0 &
558\begin{minipage}{12.5cm}
559\vspace*{1ex}
560Some arithmetic comparison involving {\sf adouble}s yields a tie.
561Hence, the function to be differentiated  may be discontinuous.
562\vspace*{1ex}
563\end{minipage} \\ \hline
564 -1 &
565\begin{minipage}{12.5cm}
566\vspace*{1ex}
567An {\sf adouble} comparison yields different results
568from the evaluation point at which the tape was generated.
569\vspace*{1ex}
570\end{minipage} \\ \hline
571 -2 &
572\begin{minipage}{12.5cm}
573\vspace*{1ex}
574The argument of a user-defined quadrature has changed
575from the evaluation point at which the tape was generated.
576\vspace*{1ex}
577\end{minipage} \\ \hline
578\end{tabular}
579\caption{Description of return values}
580\label{retvalues}
581\end{table}                           
582
583\begin{figure}[h]
584\centering\includegraphics[width=10.0cm]{tap_point}
585\caption{Return values around the taping point}
586\label{fi:tap_point}
587\end{figure}         
588
589In \autoref{fi:tap_point} these return values are illustrated.
590If the user finds the return value of an ADOL-C routine to be negative the
591taping process simply has to be repeated by executing the active section again.
592The crux of the problem lies in the fact that the tape records only
593the operations that are executed during one particular evaluation of the
594function.
595It also has no way to evaluate integrals since the corresponding
596quadratures are never recorded on the tape.
597Therefore, when there are user-defined quadratures the retaping is necessary at each
598new point. If there are only branches conditioned on {\sf adouble}
599comparisons one may hope that re-taping becomes unnecessary when
600the points settle down in some small neighborhood, as one would
601expect for example in an iterative equation solver.
602%
603%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
604\subsection{Conditional Assignments}
605\label{condassign}
606%
607It appears unsatisfactory that, for example, a simple table lookup
608of some physical property forces the re-recording of a possibly
609much larger calculation. However, the basic philosophy of ADOL-C
610is to overload arithmetic, rather than to generate a new program
611with jumps between ``instructions'', which would destroy the
612strictly sequential tape access and
613require the infusion of substantial compiler technology.
614Therefore, we introduce the two constructs of conditional
615assignments and active integers as partial remedies to the
616branching problem.
617
618In many cases, the functionality of branches
619can be replaced by conditional assignments. 
620For this purpose, we provide a special function called
621{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
622syntax of the conditional assignment
623\begin{center}
624    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} 
625\end{center}
626which C++ inherited from C. However, here the arguments are restricted to be
627active or passive scalar arguments, and all expression arguments
628are evaluated before the test on {\sf  b}, which is different from
629the usual conditional assignment or the code segment.
630
631Suppose the original program contains the code segment
632\begin{center}
633{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\
634\end{center}
635Here, only one of the expressions (or, more generally, program blocks)
636{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
637for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
638may first execute both branches and then pick either {\sf c}
639or {\sf d} using
640{\sf condassign(a,b,c,d)}. To maintain
641consistency with the original code, one has to make sure
642that the two branches do not have any side effects that can
643interfere with each other or may be important for subsequent
644calculations. Furthermore the test parameter {\sf b} has to be an
645{\sf adouble} or an {\sf adouble} expression. Otherwise the
646test condition {\sf b} is recorded on the tape as a {\em constant} with its
647run time value. Thus the original dependency of {\sf b} on
648active variables gets lost, for instance if {\sf b} is a comparison
649expression, see \autoref{OverOper}.
650If there is no {\sf else} part in a conditional assignment, one may call
651the three argument version
652{\sf condassign(a,b,c)}, which
653is logically equivalent to {\sf condassign(a,b,c,a)} in that
654nothing happens if {\sf b} is non-positive. 
655The header file \verb=<adolc/adouble.h>=
656contains also corresponding definitions of
657{\sf condassign(a,b,c,d)} 
658and {\sf condassign(a,b,c)} for
659passive {\sf double} arguments so that the modified code
660without any differentiation can be tested
661for correctness.
662
663A generalization of this concept for more than two branches, e.g.,
664akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling
665\textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on
666elements of an \texttt{advector} with active indices.
667%
668%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
669\subsection{Step-by-Step Modification Procedure}
670%
671To prepare a section of given C or C++ code for automatic
672differentiation as described above, one applies the following step-by-step procedure.
673\begin{enumerate}
674\item
675Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
676and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
677beginning and end of the active section.
678\item 
679Select the set of active variables, and change their type from
680{\sf double} or {\sf float} to {\sf adouble}.
681\item
682Select a sequence of independent variables, and initialize them with
683\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
684\item
685Select a sequence of dependent variables among the active variables,
686and pass their final values to passive variable or vectors thereof
687by \boldmath $\gg=$ \unboldmath assignments.
688\item 
689Compile the codes after including the header file \verb=<adolc/adouble.h>=.
690\end{enumerate}
691Typically, the first compilation will detect several type conflicts
692-- usually attempts to convert from active to passive
693variables or to perform standard I/O of active variables.
694Since all standard
695C programs can be activated by a mechanical application of the
696procedure above, the following section is of importance
697only to advanced users.
698%                                                                 
699%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
700\section{Numbering the Tapes and Controlling the Buffer}
701\label{tape}
702%
703The trace generated by the execution of an active section may stay
704within a triplet of internal arrays or it may be written out
705to three corresponding files. We will refer to these triplets as the
706tape array or tape file, in general tape, which may subsequently be
707used to evaluate the
708underlying function and its derivatives at the original point or at
709alternative arguments. If the active section involves user-defined
710quadratures it must be executed and
711re-taped at each new argument. Similarly, if conditions on
712{\sf adouble} values lead to a different program branch being taken at
713a new argument the evaluation process also needs to be re-taped at the
714new point. Otherwise, direct evaluation from
715the tape by the routine {\sf function} (\autoref{optdrivers}) is
716likely to be
717faster. The use of quadratures and the results of all comparisons on
718{\sf adouble}s are recorded on the tape so that {\sf function} and other
719forward routines stop and  return appropriate flags if their use without
720prior re-taping is unsafe. To avoid any re-taping certain types of
721branches can be recorded on the tape through
722the use of conditional assignments 
723described before in \autoref{condassign}.
724
725Several tapes may be generated and kept simultaneously.
726A tape array is used as a triplet of buffers or a tape file is generated if
727the length of any of the buffers exceeds the maximal array lengths of
728{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
729defined in the header file \verb=<adolc/usrparms.h>=
730and may be adjusted by the user in the header file before compiling
731the ADOL-C library, or on runtime using a file named \verb=.adolcrc=.
732The filesystem folder, where the tapes files may be written to disk,
733can be changed by changing the definition of {\sf TAPE\_DIR} in
734the header file \verb=<adolc/dvlparms.h>= before
735compiling the ADOL-C library, or on runtime by defining {\sf
736  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined
737to be the present working directory (\verb=.=).
738
739For simple usage, {\sf trace\_on} may be called with only the tape
740{\sf tag} as argument, and {\sf trace\_off} may be called
741without argument. The optional integer argument {\sf keep} of
742{\sf trace\_on} determines whether the numerical values of all
743active variables are recorded in a buffered temporary array or file
744called the taylor stack.
745This option takes effect if
746{\sf keep} = 1 and prepares the scene for an immediately following
747gradient evaluation by a call to a routine implementing the reverse mode
748as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A
749file is used instead of an array if the size exceeds the maximal array
750length of {\sf TBUFSIZE} defined in \verb=<adolc/usrparms.h>= and may
751be adjusted in the same way like the other buffer sizes mentioned above.
752Alternatively, gradients may be evaluated by a call
753to {\sf gradient}, which includes a preparatory forward sweep
754for the creation of the temporary file. If omitted, the argument
755{\sf  keep} defaults to 0, so that no temporary
756taylor stack file is generated.
757
758By setting the optional integer argument {\sf file} of
759{\sf  trace\_off} to 1, the user may force a numbered  tape
760file to be written even if the tape array (buffer) does not overflow.
761If the argument {\sf file} is omitted, it
762defaults to 0, so that the tape array is written onto a tape file only
763if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
764
765After the execution of an active section, if a tape file was generated, i.e.,
766if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
767argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
768saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
769the current working directory) under filenames formed by
770the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
771  ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf
772  ADOLC\_TAYLORS\_NAME} defined in
773the header file \verb=<adolc/dvlparms.h>= appended with the number
774given as the {\sf tag} argument to {\sf trace\_on} and have the
775extension {\sf .tap}.
776
777 Later, all problem-independent routines
778like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
779expect as first argument a {\sf tag} to determine
780the tape on which their respective computational task is to be performed.
781By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
782several tapes for various function evaluations and subsequently perform
783function and derivative evaluations on one or more of them.
784
785For example, suppose one wishes to calculate for two smooth functions
786$f_1(x)$ and $f_2(x)$ 
787\[
788   f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x),
789\]
790and possibly higher derivatives where the two functions do not tie.
791Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
792one can generate two different tapes by calling {\sf trace\_on} with
793{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
794sections.
795Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
796current argument and then evaluate the gradient $\nabla f(x)$ by calling
797{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
798{\sf tag} = 2.
799%
800%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
801\subsection{Examining the Tape and Predicting Storage Requirements }
802\label{examiningTape}
803%
804At any point in the program, one may call the routine
805\begin{center}
806{\sf void tapestats(unsigned short tag, size\_t* counts)}
807\end{center}
808with {\sf counts} beeing an array of at least eleven integers.
809The first argument {\sf tag} specifies the particular tape of
810interest. The components of {\sf counts} represent
811\[
812\begin{tabular}{ll}
813{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
814{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ 
815{\sf counts[2]}: & the maximal number of live active variables,\\
816{\sf counts[3]}: & the size of taylor stack (number of overwrites),\\
817{\sf counts[4]}: & the buffer size (a multiple of eight),
818\end{tabular}
819\]
820\[
821\begin{tabular}{ll}
822{\sf counts[5]}: & the total number of operations recorded,\\
823{\sf counts[6-13]}: & other internal information about the tape.
824\end{tabular}
825\]
826The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} 
827determine the temporary
828storage requirements during calls to the routines
829implementing the forward and the reverse mode.
830For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
831forward mode involves apart from the tape buffers an array of
832 $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
833core and, in addition, a sequential data set called the value stack
834of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
835option {\sf keep} $>$ 0. Here
836the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage
837requirement for the sequential data set, which stays in core if
838its length is less than {\sf TBUFSIZE} bytes and is otherwise written
839out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=.
840The drawback of the economical
841{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
842yield gradients and other adjoint vectors only in single-precision
843accuracy. This may be acceptable if the adjoint vectors
844represent rows of a Jacobian that is  used for the calculation of
845Newton steps. In its scalar version, the reverse mode implementation involves
846the same number of {\sf double}s and twice as many {\sf revreal}s as the
847forward mode implementation.
848The storage requirements of the vector versions of the forward mode and
849reverse mode implementation are equal to that of the scalar versions multiplied by
850the vector length.
851%
852%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
853\subsection{Customizing ADOL-C}
854\label{Customizing}
855%
856Based on the information provided by the routine {\sf tapestats}, the user may alter the
857following types and constant dimensions in the header file \verb=<adolc/usrparms.h>=
858to suit his problem and environment.
859
860\begin{description}
861\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integer determines the length of
862in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all
863required data, any file access is avoided unless {\sf trace\_off}
864is called with a positive argument. This desirable situation can
865be achieved for many problem functions with an execution trace of moderate
866size. Primarily these values occur as an argument
867to {\sf malloc}, so that setting it unnecessarily large may have no
868ill effects, unless the operating system prohibits or penalizes large
869array allocations. It is however recommended to leave the values in
870\texttt{<adolc/usrparms.h>} unchanged and set them using the
871\texttt{.adolcrc} file in the current working directory at runtime.
872
873\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
874in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288).
875
876\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
877
878\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
879
880\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
881
882\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
883sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
884of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
885the default setting produces the standard {\sf Inf}. On non-IEEE machines
886change these values to produce a small {\sf InfVal} value and compare
887the results of two forward sweeps with different {\sf InfVal} settings
888to detect a ``vertical'' slope.
889
890\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
891
892\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
893sets the mathematically
894undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
895of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
896the default setting produces the standard {\sf NaN}. On non-IEEE machines
897change these values to produce a small {\sf NoNum} value and compare
898the results of two forward sweeps with different {\sf NoNum} settings
899to detect the occurrence of undefined derivative values.
900
901\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
902
903\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
904
905\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
906\end{description}
907
908The following types and options may be set using the command-line
909options of the \texttt{./configure} script.
910
911\begin{description}
912\item[{\sf locint}{\rm :}] The range of the integer type
913{\sf locint} determines how many {\sf adouble}s can be simultaneously
914alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
915alive at any one time, the type {\sf locint} must be changed to
916 {\sf unsigned long}. This can be done by passing
917 \texttt{--enable-ulong} to \texttt{./configure}.
918
919\item[{\sf revreal}{\rm :}] The choice of this floating-point type
920trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions
921and their derivatives are always evaluated in double precision
922during forward sweeps, gradients and other adjoint vectors are obtained
923with the precision determined by the type {\sf revreal}. The less
924accurate choice {\sf revreal} = {\sf float} nearly halves the
925storage requirement during reverse sweeps. This can be done by passing
926\texttt{--disable-double} to \texttt{./configure}.
927
928\item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse
929  hyperbolic functions and the error function are enabled (default:
930  undefined) by passing \texttt{--enable-atrig-erf}
931  to \texttt{./configure}
932
933\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
934  used by ADOL-C. {\sf Malloc} will be used if this variable is
935  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
936  result caused by uninitialized memory. It can be set undefined by
937  passing \texttt{--disable-use-calloc} to \texttt{./configure}.
938
939\item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines
940  required for automatic branch selection (default: disabled). The
941  boolean valued comparison operators with two \texttt{adouble} type
942  arguments will not return boolean values anymore and may not be used
943  in branch control statements (\texttt{if}, \texttt{while}, \texttt{for}
944  etc.). Instead conditional assignments using \texttt{condassign} or
945  selection operations on elements of \texttt{advector} type should be
946  used. Enabling this option and rewriting the function evaluation
947  using \texttt{condassign} or selections of \texttt{advector}
948  elements will prevent the need for retracing the function at branch
949  switches. This can be enabled by passing
950  \texttt{--enable-advanced-branching} to \texttt{./configure}.
951\end{description}
952%
953%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
954\subsection{Warnings and Suggestions for Improved Efficiency}
955\label{WarSug}
956%
957Since the type {\sf adouble} has a nontrivial constructor,
958the mere declaration of large {\sf adouble} arrays may take up
959considerable run time. The user should be warned against
960the usual Fortran practice of declaring fixed-size arrays
961that can accommodate the largest possible case of an evaluation program
962with variable dimensions. If such programs are converted to or written
963in C, the overloading in combination with ADOL-C will lead to very
964large run time increases for comparatively small values of the
965problem dimension, because the actual computation is completely
966dominated by the construction of the large {\sf adouble} arrays.
967The user is advised to
968create dynamic arrays of
969{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
970using {\sf delete}. For storage efficiency it is desirable that
971dynamic objects are created and destroyed in a last-in-first-out
972fashion.
973
974Whenever an {\sf adouble} is declared, the constructor for the type
975{\sf adouble} assigns it a nominal address, which we will refer to as
976its  {\em location}.  The location is of the type {\sf locint} defined
977in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy
978a range of contiguous locations. As long as the program execution
979never involves more than 65$\,$536 active variables, the type {\sf locint}
980may be defined as {\sf unsigned short}. Otherwise, the range may be
981extended by defining {\sf locint} as {\sf (unsigned) int} or
982{\sf (unsigned) long}, which may nearly double
983the overall mass storage requirement. Sometimes one can avoid exceeding
984the accessible range of {\sf unsigned short}s by using more local variables and deleting
985{\sf adouble}s  created by the new operator in a
986last-in-first-out
987fashion.  When memory for {\sf adouble}s is requested through a call to
988{\sf malloc()} or other related C memory-allocating
989functions, the storage for these {\sf adouble}s is allocated; however, the
990C++ {\sf adouble} constructor is never called.  The newly defined
991{\sf adouble}s are never assigned a location and are not counted in
992the stack of live variables. Thus, any results depending upon these
993pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
994  malloc() and related C memory-allocating
995functions when declaring adoubles (see the following paragraph).}
996%
997% XXX: Vector and matrix class have to be reimplemented !!!
998%
999%The same point applies, of course,
1000% for active vectors.
1001
1002When an {\sf adouble}
1003%
1004% XXX: Vector and matrix class have to be reimplemented !!!
1005%
1006% or {\bf adoublev}
1007goes out of
1008scope or is explicitly deleted, the destructor notices that its
1009location(s) may be
1010freed for subsequent (nominal) reallocation. In general, this is not done
1011immediately but is delayed until the locations to be deallocated form a
1012contiguous tail of all locations currently being used. 
1013
1014 As a consequence of this allocation scheme, the currently
1015alive {\sf adouble} locations always form a contiguous range of integers
1016that grows and shrinks like a stack. Newly declared {\sf adouble}s are
1017placed on the top so that vectors of {\sf adouble}s obtain a contiguous
1018range of locations. While the C++ compiler can be expected to construct
1019and destruct automatic variables in a last-in-first-out fashion, the
1020user may upset this desirable pattern by deleting free-store {\sf adouble}s
1021too early or too late. Then the {\sf adouble} stack may grow
1022unnecessarily, but the numerical results will still be
1023correct, unless an exception occurs because the range of {\sf locint}
1024is exceeded. In general, free-store {\sf adouble}s
1025%
1026% XXX: Vector and matrix class have to be reimplemented !!!
1027%
1028%and {\bf adoublev}s
1029should be deleted in a last-in-first-out fashion toward the end of
1030the program block in which they were created.
1031When this pattern is maintained, the maximum number of
1032{\sf adouble}s alive and, as a consequence, the
1033randomly accessed storage space
1034of the derivative evaluation routines is bounded by a
1035small multiple of the memory used in the relevant section of the
1036original program. Failure to delete dynamically allocated {\sf adouble}s
1037may cause that the  maximal number of {\sf adouble}s alive at one time will be exceeded
1038if the same active section is called repeatedly. The same effect
1039occurs if static {\sf adouble}s are used.
1040
1041To avoid the storage and manipulation of structurally
1042trivial derivative values, one should pay careful attention to
1043the naming of variables. Ideally, the intermediate
1044values generated during the evaluation of a vector function
1045should be assigned to program variables that are
1046consistently either active or passive, in that all their values
1047either are or are not dependent on the independent variables
1048in a nontrivial way. For example, this rule is violated if a temporary
1049variable is successively used to accumulate inner products involving
1050first only passive and later active arrays. Then the first inner
1051product and all its successors in the data dependency graph become
1052artificially active and the derivative evaluation routines
1053described later will waste
1054time allocating and propagating
1055trivial or useless derivatives. Sometimes even values that do
1056depend on the independent variables may be of only transitory
1057importance and may not affect the dependent variables. For example,
1058this is true for multipliers that are used to scale linear
1059equations, but whose values do not influence the dependent
1060variables in a mathematical sense. Such dead-end variables
1061can be deactivated by the use of the {\sf value} function, which
1062converts {\sf adouble}s to {\sf double}s. The deleterious effects
1063of unnecessary activity are partly alleviated by run time
1064activity flags in the derivative routine
1065{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1066
1067The {\sf adouble} default constructor sets to zero the associated value.
1068This implies a certain overhead that may seem unnecessary when no initial value
1069is actually given, however,
1070the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior.
1071An array instantiation such as
1072\begin{center}
1073\sf double x[3]=\{2.0\};
1074\end{center}
1075will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements
1076{\sf x[1]} and {\sf x[2]}  to {\sf 0.0}. According to the C++ standard the array element  construction of
1077the type changed instantiation
1078\begin{center}
1079\sf adouble x[3]=\{2.0\};
1080\end{center}
1081will use the constructor {\sf adouble(const double\&);} for {\sf x[0]} passing in {\sf 2.0} but
1082will call the {\sf adouble} default constructor {\sf x[1]} and {\sf x[2]} leaving these array
1083elements uninitialized {\em unless} the default constructor does implement the initialization to
1084zero.
1085The C++ constructor syntax does not provide a means to  distinguish this implicit initialization from the declaration of any simple uninitialized variable.
1086If the user can ascertain the absence of array instantiations such as the above then one can 
1087configure ADOL-C with the \verb=--disable-stdczero= option , see \autoref{genlib}, to
1088avoid the overhead of these initializations. 
1089 
1090%
1091%
1092%
1093%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1094\section{Easy-To-Use Drivers}
1095\label{drivers}
1096%
1097For the convenience of the user, ADOL-C provides several
1098easy-to-use drivers that compute the most frequently required
1099derivative objects. Throughout, we assume that after the execution of an
1100active section, the corresponding tape with the identifier {\sf tag}
1101contains a detailed record of the computational process by which the
1102final values $y$ of the dependent variables were obtained from the
1103values $x$ of the independent variables. We will denote this functional
1104relation between the input variables $x$ and the output variables $y$ by
1105\[
1106F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
1107\]
1108The return value of all drivers presented in this section
1109indicate the validity of the tape as explained in \autoref{reuse_tape}.
1110The presented drivers are all C functions and therefore can be used within
1111C and C++ programs. Some Fortran-callable companions can be found
1112in the appropriate header files.
1113%
1114%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1115\subsection{Drivers for Optimization and Nonlinear Equations}
1116%
1117\label{optdrivers}
1118%
1119The drivers provided for solving optimization problems and nonlinear
1120equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1121which is included automatically by the global header file \verb=<adolc/adolc.h>=
1122(see \autoref{ssec:DesIH}).
1123
1124The routine {\sf function} allows to evaluate the desired function from
1125the tape instead of executing the corresponding source code:
1126%
1127\begin{tabbing}
1128\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1129\>{\sf int function(tag,m,n,x,y)}\\
1130\>{\sf short int tag;}         \> // tape identification \\
1131\>{\sf int m;}                 \> // number of dependent variables $m$\\
1132\>{\sf int n;}                 \> // number of independent variables $n$\\
1133\>{\sf double x[n];}           \> // independent vector $x$ \\
1134\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1135\end{tabbing}
1136%
1137If the original evaluation program is available this double version
1138should be used to compute the function value in order to avoid the
1139interpretative overhead. 
1140
1141For the calculation of whole derivative vectors and matrices up to order
11422 there are the following procedures:
1143%
1144\begin{tabbing}
1145\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1146\>{\sf int gradient(tag,n,x,g)}\\
1147\>{\sf short int tag;}         \> // tape identification \\
1148\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1149\>{\sf double x[n];}           \> // independent vector $x$ \\
1150\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1151\end{tabbing}
1152%
1153\begin{tabbing}
1154\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1155\>{\sf int jacobian(tag,m,n,x,J)}\\
1156\>{\sf short int tag;}         \> // tape identification \\
1157\>{\sf int m;}                 \> // number of dependent variables $m$\\
1158\>{\sf int n;}                 \> // number of independent variables $n$\\
1159\>{\sf double x[n];}           \> // independent vector $x$ \\
1160\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1161\end{tabbing}
1162%
1163\begin{tabbing}
1164\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1165\>{\sf int hessian(tag,n,x,H)}\\
1166\>{\sf short int tag;}         \> // tape identification \\
1167\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1168\>{\sf double x[n];}           \> // independent vector $x$ \\
1169\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1170\end{tabbing}
1171%
1172The driver routine {\sf hessian} computes only the lower half of
1173$\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ 
1174of {\sf H} allocated as a square array remain untouched during the call
1175of {\sf hessian}. Hence only $i+1$ {\sf double}s  need to be
1176allocated starting at the position {\sf H[i]}.
1177
1178To use the full capability of automatic differentiation when the
1179product of derivatives with certain weight vectors or directions are needed, ADOL-C offers
1180the following four drivers: 
1181%
1182\begin{tabbing}
1183\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1184\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1185\>{\sf short int tag;}         \> // tape identification \\
1186\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1187\>{\sf int n;}                 \> // number of independent variables $n$\\
1188\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1189\>{\sf double x[n];}           \> // independent vector $x$ \\
1190\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1191\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1192\end{tabbing}
1193If a nonzero value of the parameter {\sf repeat} indicates that the
1194routine {\sf vec\_jac} has been called at the same argument immediately
1195before, the internal forward mode evaluation will be skipped and only
1196reverse mode evaluation with the corresponding arguments is executed
1197resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1198%
1199\begin{tabbing}
1200\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1201\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1202\>{\sf short int tag;}         \> // tape identification \\
1203\>{\sf int m;}                 \> // number of dependent variables $m$\\
1204\>{\sf int n;}                 \> // number of independent variables $n$\\
1205\>{\sf double x[n];}           \> // independent vector $x$\\
1206\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1207\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1208\end{tabbing}
1209%
1210\begin{tabbing}
1211\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1212\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1213\>{\sf short int tag;}         \> // tape identification \\
1214\>{\sf int n;}                 \> // number of independent variables $n$\\
1215\>{\sf double x[n];}           \> // independent vector $x$\\
1216\>{\sf double v[n];}           \> // tangent vector $v$\\
1217\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1218\end{tabbing}
1219%
1220\begin{tabbing}
1221\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1222\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1223\>{\sf short int tag;}         \> // tape identification \\
1224\>{\sf int n;}                 \> // number of independent variables $n$\\
1225\>{\sf int p;}                 \> // number of columns in $V$\\
1226\>{\sf double x[n];}           \> // independent vector $x$\\
1227\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1228\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1229\end{tabbing}
1230%
1231\begin{tabbing}
1232\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1233\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1234\>{\sf short int tag;}         \> // tape identification \\
1235\>{\sf int m;}                 \> // number of dependent variables $m$\\
1236\>{\sf int n;}                 \> // number of independent variables $n$\\
1237\>{\sf double x[n];}           \> // independent vector $x$\\
1238\>{\sf double v[n];}           \> // tangent vector $v$\\
1239\>{\sf double u[m];}           \> // range weight vector $u$ \\
1240\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1241\end{tabbing}
1242%
1243The next procedure allows the user to perform Newton steps only
1244having the corresponding tape at hand:
1245%
1246\begin{tabbing}
1247\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1248\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1249\>{\sf short int tag;}         \> // tape identification \\
1250\>{\sf int n;}                 \> // number of independent variables $n$\\
1251\>{\sf double x[n];}           \> // independent vector $x$ as\\
1252\>{\sf double b[n];}           \> // in: right-hand side b, out: result $w$ of
1253$F(x)w = b$\\
1254\>{\sf int mode;}              \> // option to choose different solvers
1255\end{tabbing}
1256%
1257On entry, parameter {\sf b} of the routine {\sf jac\_solv}
1258contains the right-hand side of the equation $F(x)w = b$ to be solved. On exit,
1259{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only
1260the Jacobian of the function
1261given by the tape labeled with {\sf tag} is provided internally.
1262The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
1263solution of the equation is calculated if {\sf mode} = 2.
1264Hence, it is possible to compute the
1265LU-factorization only once. Then the equation can be solved for several
1266right-hand sides $b$ without calculating the Jacobian and
1267its factorization again. 
1268
1269If the original evaluation code of a function contains neither
1270quadratures nor branches, all drivers described above can be used to
1271evaluate derivatives at any argument in its domain. The same still
1272applies if there are no user defined quadratures and
1273all comparisons  involving {\sf adouble}s have the same result as
1274during taping. If this assumption is falsely made all drivers
1275while internally calling the forward mode evaluation will return the value -1 or -2
1276as already specified in \autoref{reuse_tape}
1277%
1278%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1279\subsection{Drivers for Ordinary Differential Equations}
1280\label{odedrivers}
1281%
1282When $F$ is the right-hand side of an (autonomous) ordinary
1283differential equation 
1284\[
1285x^\prime(t) \; = \; F(x(t)) , 
1286\] 
1287we must have $m=n$. Along any solution path $x(t)$ its Taylor
1288coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
1289the relation
1290\[
1291 x_{i+1} = \frac{1}{1+i} y_i.
1292\]
1293with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
1294\[
1295 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
1296\]
1297defined by an autonomous right-hand side $F$ recorded on the tape.
1298Using this relation, one can generate the Taylor coefficients $x_i$,
1299$i \le deg$,
1300recursively from the current point $x_0$. This task is achieved by the
1301driver routine {\sf forode} defined as follows:
1302%
1303\begin{tabbing}
1304\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1305\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1306\>{\sf short int tag;}         \> // tape identification \\
1307\>{\sf int n;}                 \> // number of state variables $n$\\
1308\>{\sf double tau;}            \> // scaling parameter\\
1309\>{\sf int dol;}               \> // degree on previous call\\
1310\>{\sf int deg;}               \> // degree on current call\\
1311\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1312\end{tabbing}
1313%
1314If {\sf dol} is positive, it is assumed that {\sf forode}
1315has been called before at the same point so that all Taylor coefficient
1316vectors up to the {\sf dol}-th are already correct.
1317
1318Subsequently one may call the driver routine {\sf reverse} or corresponding
1319low level routines as explained in the \autoref{forw_rev} and
1320\autoref{forw_rev_ad}, respectively, to compute
1321the family of square matrices {\sf Z[n][n][deg]} defined by
1322\[
1323Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
1324\]
1325with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}.
1326
1327For the numerical solutions of ordinary differential equations,
1328one may also wish to calculate the Jacobians
1329\begin{equation} 
1330\label{eq:bees}
1331B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1332\end{equation}
1333which exist provided $F$ is sufficiently smooth. These matrices can
1334be obtained from the partial derivatives $\partial y_i/\partial x_0$
1335by an appropriate version of the chain rule.
1336To compute the total derivatives $B = (B_j)_{0\leq j <d}$
1337defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
1338matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
1339corresponding evaluation of the {\sf hov\_reverse} function. The interface of
1340{\sf accode} is defined as follows:
1341%
1342\begin{tabbing}
1343\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1344\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1345\>{\sf int n;}                 \> // number of state variables $n$ \\
1346\>{\sf double tau;}            \> // scaling parameter\\
1347\>{\sf int deg;}               \> // degree on current call\\
1348\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1349\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1350\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1351\end{tabbing}
1352%
1353Sparsity information can be exploited by {\sf accode} using the array {\sf
1354nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1355reverse} or the corresponding basic routines as explained below in
1356\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1357non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1358return
1359\[
1360  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1361\]
1362In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
1363sparsity pattern that fills in as $k$ grows. Note, that there need to be no
1364loss in computational efficiency if a time-dependent ordinary differential equation
1365is rewritten in autonomous form.
1366
1367The prototype of the ODE-drivers {\sf forode} and {\sf accode} is contained in the header file
1368\verb=<adolc/drivers/odedrivers.h>=. The global header file
1369\verb=<adolc/adolc.h>=
1370includes this file automatically, see \autoref{ssec:DesIH}.
1371
1372An example program using the procedures {\sf forode} and {\sf accode} together
1373with more detailed information about the coding can be found in
1374\autoref{exam:ode}. The corresponding source code
1375\verb=odexam.cpp= is contained in the subdirectory
1376\verb=examples=.
1377%
1378%
1379%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1380\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1381\label{sparse}
1382%
1383Quite often, the Jacobians and Hessians that have to be computed are sparse
1384matrices. Therefore, ADOL-C provides additionally drivers that
1385allow the exploitation of sparsity. The exploitation of sparsity is
1386frequently based on {\em graph coloring} methods, discussed
1387for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
1388rely on the the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
1389ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
1390separately to use the sparse drivers described here. ColPack is available for download at
1391\verb=http://www.cscapes.org/coloringpage/software.htm=. More information about the required
1392installation of ColPack is given in \autoref{install}.
1393%
1394\subsubsection*{Sparse Jacobians and Sparse Hessians}
1395%
1396To compute the entries of sparse Jacobians and sparse Hessians,
1397respectively, in coordinate format one may use the drivers:
1398\begin{tabbing}
1399\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1400\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1401\>{\sf short int tag;}         \> // tape identification \\
1402\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1403\>{\sf int n;}                 \> // number of independent variables $n$\\
1404\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1405\>{\sf double x[n];}           \> // independent vector $x$ \\
1406\>{\sf int nnz;}               \> // number of nonzeros \\ 
1407\>{\sf unsigned int rind[nnz];}\> // row index\\ 
1408\>{\sf unsigned int cind[nnz];}\> // column index\\ 
1409\>{\sf double values[nnz];}    \> // non-zero values\\ 
1410\>{\sf int options[4];}        \> // array of control parameters\\ 
1411\end{tabbing}
1412%
1413\begin{tabbing}
1414\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1415\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1416\>{\sf short int tag;}         \> // tape identification \\
1417\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1418\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1419\>{\sf double x[n];}           \> // independent vector $x$ \\
1420\>{\sf int nnz;}               \> // number of nonzeros \\ 
1421\>{\sf unsigned int rind[nnz];}\> // row indices\\ 
1422\>{\sf unsigned int cind[nnz];}\> // column indices\\ 
1423\>{\sf double values[nnz];}    \> // non-zero values  \\
1424\>{\sf int options[2];}        \> // array of control parameters\\ 
1425\end{tabbing}
1426%
1427Once more, the input variables are the identifier for the internal
1428representation {\sf tag}, if required the number of dependents {\sf m},
1429and the number of independents {\sf n} for a consistency check.
1430Furthermore, the flag {\sf repeat=0} indicates that the functions are called
1431at a point with a new sparsity structure, whereas  {\sf repeat=1} results in
1432the re-usage of the sparsity pattern from the previous call.
1433The current values of the independents are given by the array {\sf x}.
1434The input/output
1435variable {\sf nnz} stores the number of the nonzero entries.
1436Therefore, {\sf nnz} denotes also the length of the arrays {\sf r\_ind} storing
1437the row indices, {\sf c\_ind} storing the column indices, and
1438{\sf values} storing the values of the nonzero entries.
1439If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
1440the functions determine the number of nonzeros for the sparsity pattern
1441defined by the value of {\sf x}, allocate appropriate arrays {\sf r\_ind},
1442{\sf c\_ind}, and {\sf values} and store the desired information in these
1443arrays.
1444During the next function call with {\sf repeat=1} the allocated memory
1445is reused such that only the values of the arrays are changed.   
1446Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
1447  repeat=0} the user is responsible for the deallocation of the array
1448 {\sf r\_ind}, {\sf c\_ind}, and {\sf values} using the function {\sf
1449   delete[]}!
1450
1451For each driver the array {\sf options} can be used to adapted the
1452computation of the sparse derivative matrices to the special
1453needs of application under consideration. Most frequently, the default options
1454will give a reasonable performance. The elements of the array {\sf options} control the action of
1455{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
1456\begin{table}[h]
1457\center
1458\begin{tabular}{|c|c|l|} \hline
1459component & value &  \\ \hline
1460{\sf options[0]} &    &  way of sparsity pattern computation \\
1461                 & 0  &  propagation of index domains (default) \\
1462                 & 1  &  propagation of bit pattern \\ \hline
1463{\sf options[1]} &    &  test the computational graph control flow \\
1464                 & 0  &  safe mode (default) \\
1465                 & 1  &  tight mode \\ \hline
1466{\sf options[2]} &    &  way of bit pattern propagation \\
1467                 & 0  &  automatic detection (default) \\
1468                 & 1  &  forward mode \\ 
1469                 & 2  &  reverse mode \\ \hline
1470{\sf options[3]} &    &  way of compression \\
1471                 & 0  &  column compression (default) \\
1472                 & 1  &  row compression \\ \hline
1473\end{tabular}
1474\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
1475\end{table}           
1476
1477The component {\sf options[1]} determines
1478the usage of the safe or tight mode of sparsity computation.
1479The first, more conservative option is the default. It accounts for all
1480dependences that might occur for any value of the
1481independent variables. For example, the intermediate
1482{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1483always assumed to depend on all independent variables that {\sf a} or {\sf b}
1484dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1485logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1486In contrast
1487the tight option gives this result only in the unlikely event of an exact
1488tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1489associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1490depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1491Obviously, the sparsity pattern obtained with the tight option may contain
1492more zeros than that obtained with the safe option. On the other hand, it
1493will only be valid at points belonging to an area where the function $F$ is locally
1494analytic and that contains the point at which the internal representation was
1495generated. Since generating the sparsity structure using the safe version does not
1496require any reevaluation, it may thus reduce the overall computational cost
1497despite the fact that it produces more nonzero entries.
1498The value of {\sf options[2]} selects the direction of bit pattern propagation.
1499Depending on the number of independent $n$ and of dependent variables $m$ 
1500one would prefer the forward mode if $n$ is significant smaller than $m$ and
1501would otherwise use the reverse mode.
1502
1503 The elements of the array {\sf options} control the action of
1504{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
1505\begin{table}[h]
1506\center
1507\begin{tabular}{|c|c|l|} \hline
1508component & value &  \\ \hline
1509{\sf options[0]} &    &  test the computational graph control flow \\
1510                 & 0  &  safe mode (default) \\
1511                 & 1  &  tight mode \\ \hline
1512{\sf options[1]} &    &  way of recovery \\
1513                 & 0  &  indirect recovery (default) \\
1514                 & 1  &  direct recovery \\ \hline
1515\end{tabular}
1516\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
1517\end{table}           
1518
1519The described driver routines for the computation of sparse derivative
1520matrices are prototyped in the header file
1521\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1522global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1523Example codes illustrating the usage of {\sf
1524  sparse\_jac} and {\sf sparse\_hess} can be found in the file
1525\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
1526\verb=examples/additional_examples/sparse=.
1527%
1528%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1529\subsubsection*{Computation of Sparsity Pattern}
1530%
1531ADOL-C offers a convenient way of determining the 
1532sparsity structure of a Jacobian matrix using the function:
1533%
1534\begin{tabbing}
1535\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1536\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
1537\>{\sf short int tag;} \> // tape identification \\
1538\>{\sf int m;} \> // number of dependent variables $m$\\
1539\>{\sf int n;} \> // number of independent variables $n$\\
1540\>{\sf double x[n];} \> // independent variables $x_0$\\
1541\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
1542\>{\sf int options[2];} \> // array of control parameters
1543\end{tabbing}
1544%
1545The sparsity pattern of the
1546Jacobian is computed in a compressed row format. For this purpose,
1547{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
1548  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
1549During the call of  {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
1550entries in row $i$ of the Jacobian is determined for all $1\le i\le
1551m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
1552points to a block of $\hat{n}_i+1$ {\sf  unsigned int} for all $1\le
1553i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
1554column indices of the $j$ nonzero entries in the $i$th row are stored
1555in the components  {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][j]}.
1556
1557The elements of the array {\sf options} control the action of
1558{\sf jac\_pat} according to \autoref{options}.
1559\begin{table}[h]
1560\center
1561\begin{tabular}{|c|c|l|} \hline
1562component & value &  \\ \hline
1563{\sf options[0]} &    &  way of sparsity pattern computation \\
1564                 & 0  &  propagation of index domains (default) \\
1565                 & 1  &  propagation of bit pattern \\ \hline
1566{\sf options[1]} &    &  test the computational graph control flow \\
1567                 & 0  &  safe mode (default) \\
1568                 & 1  &  tight mode \\ \hline
1569{\sf options[2]} &    &  way of bit pattern propagation \\
1570                 & 0  &  automatic detection (default) \\
1571                 & 1  &  forward mode \\ 
1572                 & 2  &  reverse mode \\ \hline
1573\end{tabular}
1574\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
1575\end{table}           
1576The value of {\sf options[0]} selects the way to compute the sparsity
1577pattern. The component {\sf options[1]} determines
1578the usage of the safe or tight mode of bit pattern propagation.
1579The first, more conservative option is the default. It accounts for all
1580dependences that might occur for any value of the
1581independent variables. For example, the intermediate
1582{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1583always assumed to depend on all independent variables that {\sf a} or {\sf b}
1584dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1585logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1586In contrast
1587the tight option gives this result only in the unlikely event of an exact
1588tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1589associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1590depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1591Obviously, the sparsity pattern obtained with the tight option may contain
1592more zeros than that obtained with the safe option. On the other hand, it
1593will only be valid at points belonging to an area where the function $F$ is locally
1594analytic and that contains the point at which the internal representation was
1595generated. Since generating the sparsity structure using the safe version does not
1596require any reevaluation, it may thus reduce the overall computational cost
1597despite the fact that it produces more nonzero entries. The value of
1598{\sf options[2]} selects the direction of bit pattern propagation.
1599Depending on the number of independent $n$ and of dependent variables $m$ 
1600one would prefer the forward mode if $n$ is significant smaller than $m$ and
1601would otherwise use the reverse mode.
1602
1603The routine {\sf jac\_pat} may use the propagation of bitpattern to
1604determine the sparsity pattern. Therefore, a kind of ``strip-mining''
1605is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
1606the value of the constant {\sf PQ\_STRIPMINE\_MAX}
1607following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1608
1609The driver routine is prototyped in the header file
1610\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1611global header file \verb=<adolc/adolc.h>= (see
1612\autoref{ssec:DesIH}). The determination of sparsity patterns is
1613illustrated by the examples \verb=sparse_jacobian.cpp=
1614and \verb=jacpatexam.cpp=
1615contained in
1616\verb=examples/additional_examples/sparse=.
1617
1618To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
1619driver
1620\begin{tabbing}
1621\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1622\>{\sf int hess\_pat(tag, n, x, HP, options)}\\
1623\>{\sf short int tag;}       \> // tape identification \\
1624\>{\sf int n;}               \> // number of independent variables $n$\\
1625\>{\sf double x[n];}         \> // independent variables $x_0$\\
1626\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
1627\>{\sf int option;}          \> // control parameter
1628\end{tabbing}
1629where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
1630 unsigned int}s.
1631After the function call {\sf HP} contains the sparsity pattern,
1632where {\sf HP[j][0]} contains the number of nonzero elements in the
1633 $j$th row for $1 \le j\le n$.
1634The components {\sf P[j][i]}, $0<${\sf i}~$\le$~{\sf P[j][0]} store the
1635 indices of these entries. For determining the sparsity pattern, ADOL-C uses
1636 the algorithm described in \cite{Wa05a}.  The parameter{\sf option} determines
1637the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
1638  option = 1}) of the computation of the sparsity pattern as described
1639above.
1640
1641This driver routine is prototyped in the header file
1642\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1643global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1644An example employing the procedure {\sf hess\_pat}  can be found in the file
1645\verb=sparse_hessian.cpp=  contained in
1646\verb=examples/additional_examples/sparse=.
1647%
1648%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1649\subsubsection*{Calculation of Seed Matrices}
1650%
1651To compute a compressed derivative matrix from a given sparsity
1652pattern, one has to calculate an appropriate seed matrix that can be
1653used as input for the derivative calculation. To facilitate the
1654generation of seed matrices for a sparsity pattern given in
1655row compressed form, ADOL-C provides the following two drivers,
1656which are based on the ColPack library:
1657\begin{tabbing}
1658\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1659\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
1660\>{\sf int m;} \> // number of dependent variables $m$\\
1661\>{\sf int n;} \> // number of independent variables $n$\\
1662\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
1663of Jacobian\\
1664\>{\sf double S[n][p];} \> // seed matrix\\
1665\>{\sf int p;} \> // number of columns in $S$
1666\end{tabbing}
1667The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1668number of independent variables {\sf n} and the sparsity pattern {\sf
1669  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1670{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1671pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1672determines the number of columns {\sf p} in the seed
1673matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1674 S} and initializes {\sf S} according to the graph coloring.
1675The coloring algorithm that is applied in {\sf
1676  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1677described earlier.
1678
1679\begin{tabbing}
1680\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1681\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
1682\>{\sf int n;} \> // number of independent variables $n$\\
1683\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
1684of Jacobian\\
1685\>{\sf double S[n][p];} \> // seed matrix\\
1686\>{\sf int p;} \> // number of columns in $S$
1687\end{tabbing}
1688The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1689and the sparsity pattern {\sf HP} of the Hessian computed for example
1690by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1691appropriate coloring of the adjacency graph defined by the sparsity
1692pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its
1693    compressed representation and a star coloring in the case of a direct recovery.
1694 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1695 S} and initializes {\sf S} according to the graph coloring.
1696The coloring algorithm applied in {\sf
1697  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1698described earlier.
1699
1700The specific set of criteria used to define a seed matrix $S$ depends
1701on whether the sparse derivative matrix
1702to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1703It also depends on whether the entries of the derivative matrix  are to be
1704recovered from the compressed representation \emph{directly}
1705(without requiring any further arithmetic) or \emph{indirectly} (for
1706example, by solving for unknowns via successive substitutions).
1707Appropriate recovery routines are provided by ColPack and used
1708in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1709the previous subsection. Examples with a detailed analysis of the
1710employed drivers for the exploitation of sparsity can be found in the
1711papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1712
1713
1714These driver routines are prototyped in
1715\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1716global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1717An example code illustrating the usage of {\sf
1718generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1719\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
1720%
1721%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1722\subsection{Higher Derivative Tensors}
1723\label{higherOrderDeriv}
1724%
1725Many applications in scientific computing need second- and higher-order
1726derivatives. Often, one does not require full derivative tensors but
1727only the derivatives in certain directions $s_i \in \R^{n}$.
1728Suppose a collection of $p$ directions
1729$s_i \in \R^{n}$ is given, which form a matrix
1730\[
1731S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
1732 \R^{n \times p}.
1733\]
1734One possible choice is $S = I_n$ with  $p = n$, which leads to
1735full tensors being evaluated.
1736ADOL-C provides the function {\sf tensor\_eval}
1737to calculate the derivative tensors
1738\begin{eqnarray}
1739\label{eq:tensor}
1740\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
1741     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} 
1742     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
1743\end{eqnarray}
1744simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
1745parameters:
1746%
1747\begin{tabbing}
1748\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1749\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
1750\>{\sf short int tag;}         \> // tape identification \\
1751\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1752\>{\sf int n;}                 \> // number of independent variables $n$\\
1753\>{\sf int d;}                 \> // highest derivative degree $d$\\
1754\>{\sf int p;}                 \> // number of directions $p$\\
1755\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
1756\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
1757\>{\sf double S[n][p];}        \> // seed matrix $S$
1758\end{tabbing}
1759%
1760Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory 
1761requirement can be reduced enormously. The collection of  tensors up to order $d$ comprises 
1762$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
1763greater or equal to $\binom{p+d}{d}$.
1764To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor
1765series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
1766approach is described in \cite{Griewank97}.
1767
1768The access of individual entries in symmetric tensors of
1769higher order is a little tricky. We always store the derivative
1770values in the two dimensional array {\sf tensor} and provide two
1771different ways of accessing them. 
1772The leading dimension of the tensor array ranges over
1773the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17740,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1775structure for all $i$. Each of them represents the symmetric tensors up to
1776order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 
1777%
1778The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
1779tensors are linearly ordered according to the tetrahedral
1780scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
1781case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ 
1782as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
1783$l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
1784function value $F_{i+1}$ itself and the gradient
1785$\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$
1786with $j=0$ for $k=1,\ldots,p$.
1787
1788For general $d$ we combine the variable
1789indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
1790where $j_k$ indicates differentiation with respect to variable
1791$x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1792no differentiation so that all lower derivatives are also
1793contained in the same data structure as described above for
1794the quadratic case. The location of the partial derivative specified
1795by $j$ is computed by the function
1796%
1797\begin{tabbing}
1798\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1799\>{\sf int address(d,$\,$j)} \\
1800\>{\sf int d;}                 \> // highest derivative degree $d$ \\
1801\>{\sf int j[d];}              \> // multi-index $j$
1802\end{tabbing}       
1803%
1804and it may thus be referenced as {\sf tensor[i][address(d,$\,$j)]}.
1805Notice that the address computation does depend on the degree $d$ 
1806but not on the number of directions $p$, which could theoretically be
1807enlarged without the need to reallocate the original tensor.
1808Also, the components of $j$ need to be non-increasing.
1809%
1810To some C programmers it may appear more natural to access tensor
1811entries by successive dereferencing in the form
1812{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1813We have also provided this mode, albeit with the restriction
1814that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1815In the second order case this means that the Hessian entries must be
1816specified in or below the diagonal. If this restriction is
1817violated the values are almost certain to be wrong and array bounds
1818may be violated. We emphasize that subscripting is not overloaded
1819but that {\sf tensorentry} is a conventional and
1820thus moderately efficient C pointer structure.
1821Such a pointer structure can be allocated and set up completely by the
1822function
1823%
1824\begin{tabbing}
1825\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1826\>{\sf void** tensorsetup(m,p,d,tensor)} \\
1827\>{\sf int m;}                 \> // number of dependent variables $n$ \\
1828\>{\sf int p;}                 \> // number of directions $p$\\
1829\>{\sf int d;}                 \> // highest derivative degree $d$\\
1830\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
1831\end{tabbing}     
1832%
1833Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1834$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1835 $d-1$ layers of pointers are set up so that the return value
1836allows the direct dereferencing of individual tensor elements.
1837
1838For example, suppose some active section involving  $m \geq 5$ dependents and
1839$n \geq 2$ independents has been executed and taped. We may
1840select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1841columns $s_1$ and $s_2$. Then we are able to execute the code segment
1842\begin{tabbing}
1843\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
1844              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}   
1845\end{tabbing}
1846This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
1847in both directions $s_1$ and
1848$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer
1849structure {\sf tensorentry} has been created. Now, 
1850the value of the mixed partial
1851\[
1852 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2
1853\]
1854can be recovered as
1855\begin{center}
1856   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][address(d,$\,$j)]},
1857\end{center}
1858where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). 
1859Analogously, the entry
1860\begin{center}   
1861   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][address(d,$\,$j)]}
1862\end{center}
1863with {\sf j} = (1,0,0) contains the first derivative of the third dependent
1864variable $F_3$ with respect to the first differentiation parameter $z_1$.
1865
1866Note, that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
1867array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly effects the values accessed
1868by {\sf tensorentry}.
1869%
1870When no more derivative evaluations are desired the pointer structure
1871{\sf tensorentry} can be deallocated by a call to the function
1872%
1873\begin{tabbing}
1874\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1875\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
1876\>{\sf int m;}                    \> // number of dependent variables $m$ \\
1877\>{\sf int p;}                    \> // number of independent variables $p$\\
1878\>{\sf int d;}                    \> // highest derivative degree $d$\\
1879\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} 
1880\end{tabbing} 
1881%
1882that does not deallocate the array {\sf tensor}.
1883
1884The drivers provided for efficient calculation of higher order
1885derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1886which is included by the global header file \verb=<adolc/adolc.h>= automatically
1887(see \autoref{ssec:DesIH}).
1888Example codes using the above procedures can be found in the files
1889\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
1890\verb=examples/additional_examples/taylor=.
1891%
1892%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1893\subsection{Derivatives of Implicit and Inverse Functions}
1894\label{implicitInverse}
1895%
1896Frequently, one needs derivatives of variables
1897$y \in \R^{m}$ that are implicitly defined as
1898functions of some variables $x \in \R^{n-m}$
1899by an algebraic system of equations
1900\[
1901G(z) \; = \; 0 \in \R^m \quad
1902{\rm with} \quad z = (y, x) \in \R^n .
1903\] 
1904Naturally, the $n$ arguments of $G$ need not be partitioned in
1905this regular fashion and we wish to provide flexibility for a
1906convenient selection of the $n-m$ {\em truly} independent
1907variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
1908that picks out these variables so that it is a column
1909permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
1910Then the nonlinear system
1911\[
1912  G(z) \; = \; 0, \quad P z =  x,                           
1913\] 
1914has a regular Jacobian, wherever the implicit function theorem
1915yields $y$ as a function of $x$. Hence, we may also write
1916\begin{equation}
1917\label{eq:inv_tensor}
1918F(z) = \left(\begin{array}{c}
1919                        G(z) \\
1920                        P z
1921                      \end{array} \right)\; \equiv \;
1922                \left(\begin{array}{c}
1923                        0 \\
1924                        P z
1925                      \end{array} \right)\; \equiv \; S\, x,
1926\end{equation}
1927where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
1928the original implicit functional relation between $x$ and $y$ as an inverse
1929relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
1930by marking $n-m$ of the independents also dependent. 
1931
1932Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1933seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1934with respect to $x \in \R^p$ by calling the following routine:
1935%
1936\begin{tabbing}
1937\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1938\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
1939\>{\sf short int tag;}         \> // tape identification \\
1940\>{\sf int n;}                 \> // number of variables $n$\\
1941\>{\sf int d;}                 \> // highest derivative degree $d$\\
1942\>{\sf int p;}                 \> // number of directions $p$\\
1943\>{\sf double z[n];}          \> // values of independent variables $z$\\
1944\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
1945\>{\sf double S[n][p];}        \> // seed matrix $S$
1946\end{tabbing}                 
1947%                 
1948The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1949{\sf tag} pointing to a tape for the evaluation of the inverse function
1950$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1951to the domain of that function. Individual derivative components can be
1952accessed in tensor exactly as in the explicit case described above.
1953
1954It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1955derivatives of $z$ with respect to $x$ that is defined by the equation
1956$F(z)=F(z_0)+S \, x$. In other words the base point at
1957which the inverse function is differentiated is given by $F(z_0)$.
1958The routine has no capability for inverting $F$ itself as
1959solving systems of nonlinear
1960equations $F(z)=0$ in the first place is not just a differentiation task.
1961However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1962useful for that purpose.
1963
1964As an example consider the following two nonlinear expressions
1965\begin{eqnarray*}
1966      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^\\
1967      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
1968\end{eqnarray*}   
1969The equations $G(z)=0$ describe the relation between the Cartesian
1970coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
1971Now, suppose we are interested in the derivatives of the second Cartesian
1972$y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
1973to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
1974could look simply like
1975%
1976\begin{tabbing}
1977\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
1978\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
1979\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
1980\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
1981\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
1982\end{tabbing}     
1983%
1984where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
1985In the last line the two independent variables {\sf z[1]} and
1986{\sf z[3]} are made
1987simultaneously dependent thus generating a square system that can be
1988inverted (at most arguments). The corresponding projection and seed
1989matrix are
1990\begin{eqnarray*}
1991P \;=\; \left( \begin{array}{cccc}
1992               1 & 0 & 0 & 0 \\
1993               0 & 0 & 1 & 0
1994            \end{array}\right) \quad \mbox{and} \quad
1995S^T \; = \; \left( \begin{array}{cccc}
1996               0 & 0 & 1 & 0 \\
1997               0 & 0 & 0 & 1
1998            \end{array}\right\enspace .
1999\end{eqnarray*}
2000Provided the vector {\sf zp} is consistent in that its Cartesian and polar
2001components describe the same point in the plane the resulting tuple
2002{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
2003$n=4$, $p=2$ and $d$
2004as desired will yield the implicit derivatives, provided
2005{\sf tensor} has been allocated appropriately of course and $S$ has the value
2006given above.
2007%
2008The example is untypical in that the implicit function could also be
2009obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
2010the subset of $z$ components that are to be considered as truly
2011independent can be selected and altered with next to no effort at all.
2012
2013The presented drivers are prototyped in the header file
2014\verb=<adolc/drivers/taylor.h>=. As indicated before this header
2015is included by the global header file \verb=<adolc/adolc.h>= automatically
2016(see \autoref{ssec:DesIH}).
2017The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
2018\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
2019show the application of the procedures described here.
2020%
2021%
2022%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
2023\section{Basic Drivers for the Forward and Reverse Mode}
2024\label{forw_rev_ad}
2025%
2026In this section, we present tailored drivers for different
2027variants of the forward mode and the reverse mode, respectively.
2028For a better understanding, we start with a short
2029description of the mathematical background.
2030
2031Provided no arithmetic exception occurs,
2032no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
2033{\sf fabs} does not yield zero,
2034and all special functions were evaluated in the
2035interior of their domains, the functional relation between the input
2036variables $x$
2037and the output variables $y$ denoted by $y=F(x)$ is in
2038fact analytic.  In other words, we can compute arbitrarily high
2039derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
2040by the active section.
2041We find it most convenient to describe and
2042compute derivatives in terms of univariate Taylor expansions, which
2043are truncated after the highest derivative degree $d$ that is desired
2044by the user. Let
2045\begin{equation}
2046\label{eq:x_of_t}
2047x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \;  I\!\!R \; \mapsto \;
2048I\!\!R^n
2049\end{equation} 
2050denote any vector polynomial in the scalar variable $t \in I\!\!R$.
2051In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2052The Taylor coefficient vectors
2053\[ x_j \; = \; 
2054\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
2055\right |_{t=0}
2056\] 
2057are simply the scaled derivatives of $x(t)$ at the parameter
2058origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2059visualized as tangent and curvature at the base point $x_0$,
2060respectively.
2061Provided that $F$ is $d$ times continuously differentiable, it
2062follows from the chain rule that the image path
2063\begin{equation}
2064\label{eq:rela}
2065 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2066\end{equation}
2067is also smooth and has $(d+1)$ Taylor coefficient vectors
2068$y_j \in I\!\!R^m$ at $t=0$, so that
2069\begin{equation}
2070\label{eq:series}
2071y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
2072\end{equation}
2073Also as a consequence of the chain rule, one can observe that
2074each $y_j$ is uniquely and smoothly determined by the coefficient
2075vectors $x_i$ with $i \leq j$.  In particular we have
2076\begin{align}
2077\label{eq:y_0y_1}
2078  y_0 & = F(x_0) \nonumber \\
2079  y_1 & = F'(x_0) x_1 \nonumber\\
2080  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
2081  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
2082          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
2083  & \ldots\nonumber
2084\end{align}
2085In writing down the last equations we have already departed from the
2086usual matrix-vector notation. It is well known that the number of
2087terms that occur in these ``symbolic'' expressions for
2088the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2089the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
2090with $j$. Fortunately, this exponential growth does not occur
2091in automatic differentiation, where the many terms are somehow
2092implicitly combined  so that storage and operations count grow only
2093quadratically in the bound $d$ on $j$.
2094
2095Provided $F$ is analytic, this property is inherited by the functions
2096\[
2097y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
2098\]
2099and their derivatives satisfy the identities
2100\begin{equation}
2101\label{eq:identity}
2102\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
2103{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
2104\end{equation}
2105as established in \cite{Chri91a}. This yields in particular
2106\begin{align*}
2107  \frac{\partial y_0}{\partial x_0} =
2108  \frac{\partial y_1}{\partial x_1} =
2109  \frac{\partial y_2}{\partial x_2} =
2110  \frac{\partial y_3}{\partial x_3} =
2111  A_0 & = F'(x_0) \\
2112  \frac{\partial y_1}{\partial x_0} =
2113  \frac{\partial y_2}{\partial x_1} =
2114  \frac{\partial y_3}{\partial x_2} =
2115  A_1 & = F''(x_0) x_1 \\
2116  \frac{\partial y_2}{\partial x_0} =
2117  \frac{\partial y_3}{\partial x_1} =
2118  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
2119  \frac{\partial y_3}{\partial x_0} =
2120  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
2121          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
2122  & \ldots
2123\end{align*}
2124The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2125coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2126interest primarily in the context of ordinary differential
2127equations and differential algebraic equations.
2128
2129Given the tape of an active section and the coefficients $x_j$,
2130the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2131by appropriate calls to the ADOL-C forward mode implementations and
2132the ADOL-C reverse mode implementations. The scalar versions of the forward
2133mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2134to the $(y_j)_{j\leq d}$. The vector versions of the forward
2135mode propagate families of $p\geq 1$ such truncated Taylor series
2136in order to reduce the relative cost of the overhead incurred
2137in the tape interpretation. In detail, ADOL-C provides
2138\begin{tabbing}
2139\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2140\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
2141\>{\sf short int tag;}         \> // tape identification \\
2142\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2143\>{\sf int n;}                 \> // number of independent variables $n$\\
2144\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2145\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
2146\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
2147\end{tabbing}                 
2148for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2149$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2150flag {\sf keep} plays a similar role as in the call to 
2151{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2152the first Taylor coefficients of all intermediate quantities into a buffered
2153temporary file, i.e., the value stack, in preparation for a subsequent
2154reverse mode evaluation. The value {\sf keep} $=1$
2155prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as exlained below.
2156
2157To compute first-order derivatives, one has
2158\begin{tabbing}
2159\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2160\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
2161\>{\sf short int tag;}         \> // tape identification \\
2162\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2163\>{\sf int n;}                 \> // number of independent variables $n$\\
2164\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2165\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2166\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
2167\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2168\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
2169\end{tabbing}                 
2170for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
2171$0\leq\text{\sf keep}\leq 2$, where
2172\begin{align*}
2173\text{\sf keep} = \left\{\begin{array}{cl}
2174       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2175       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2176       \end{array}\right.
2177\end{align*}
2178as exlained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
2179ADOL-C provides
2180\begin{tabbing}
2181\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2182\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
2183\>{\sf short int tag;}         \> // tape identification \\
2184\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2185\>{\sf int n;}                 \> // number of independent variables $n$\\
2186\>{\sf int p;}                 \> // number of directions\\
2187\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2188\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
2189\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2190\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x)X$
2191\end{tabbing}                 
2192For the computation of higher derivative, the driver
2193\begin{tabbing}
2194\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2195\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
2196\>{\sf short int tag;}         \> // tape identification \\
2197\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2198\>{\sf int n;}                 \> // number of independent variables $n$\\
2199\>{\sf int d;}                 \> // highest derivative degree $d$\\
2200\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2201\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2202\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
2203\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2204\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
2205\end{tabbing}
2206implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2207The rows of the matrix $X$ must correspond to the independent variables in the order of their
2208initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2209$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2210\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2211dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2212The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2213Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2214computes the values
2215$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2216$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2217$0\leq\text{\sf keep}\leq d+1$, with
2218\begin{align*}
2219\text{\sf keep}  \left\{\begin{array}{cl}
2220       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2221       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2222       \end{array}\right.
2223\end{align*}
2224Once more, there is also a vector version given by
2225\begin{tabbing}
2226\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2227\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2228\>{\sf short int tag;}         \> // tape identification \\
2229\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2230\>{\sf int n;}                 \> // number of independent variables $n$\\
2231\>{\sf int d;}                 \> // highest derivative degree $d$\\
2232\>{\sf int p;}                 \> // number of directions $p$\\
2233\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2234\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
2235\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2236\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
2237\end{tabbing}
2238for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
2239$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and 
2240$Y=[Y_1,Y_2,\ldots,Y_d]$.
2241
2242There are also overloaded versions providing a general {\sf forward}-call.
2243Details of the appropriate calling sequences are given in \autoref{forw_rev}.
2244
2245Once, the required information is generated due to a forward mode evaluation
2246with an approriate value of the parameter {\sf keep}, one may use the
2247following implementation variants of the reverse mode. To compute first-order derivatives
2248one can use
2249\begin{tabbing}
2250\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2251\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
2252\>{\sf short int tag;}         \> // tape identification \\
2253\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2254\>{\sf int n;}                 \> // number of independent variables $n$\\
2255\>{\sf double u[m];}           \> // weight vector $u$\\
2256\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
2257\end{tabbing}                 
2258as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2259the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2260{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2261  o}rder {\bf v}ector reverse mode driver is given by
2262\begin{tabbing}
2263\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2264\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
2265\>{\sf short int tag;}         \> // tape identification \\
2266\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2267\>{\sf int n;}                 \> // number of independent variables $n$\\
2268\>{\sf int q;}                 \> // number of weight vectors $q$\\
2269\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2270\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
2271\end{tabbing}                 
2272that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2273{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2274ADOL-C provides
2275\begin{tabbing}
2276\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2277\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
2278\>{\sf short int tag;}         \> // tape identification \\
2279\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2280\>{\sf int n;}                 \> // number of independent variables $n$\\
2281\>{\sf int d;}                 \> // highest derivative degree $d$\\
2282\>{\sf double u[m];}           \> // weight vector $u$\\
2283\>{\sf double Z[n][d+1];}      \> // resulting adjoints
2284\end{tabbing}                 
2285as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2286the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2287\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2288{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
2289\begin{tabbing}
2290\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2291\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
2292\>{\sf short int tag;}         \> // tape identification \\
2293\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2294\>{\sf int n;}                 \> // number of independent variables $n$\\
2295\>{\sf int d;}                 \> // highest derivative degree $d$\\
2296\>{\sf double U[q][m];}        \> // weight vector $u$\\
2297\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
2298\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
2299\end{tabbing}                 
2300as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2301the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2302\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2303{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2304After the function call, the last argument of {\sf hov\_reverse} 
2305contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2306has a value that characterizes the functional relation between the
2307$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2308$x_j$ as:
2309\begin{center}
2310\begin{tabular}{ll}
2311 0 & trivial \\
2312 1 & linear
2313\end{tabular} \hspace*{4ex}
2314\begin{tabular}{ll}
2315 2 & polynomial\\
2316 3 & rational
2317\end{tabular} \hspace*{4ex}
2318\begin{tabular}{ll}
2319 4 & transcendental\\
2320 5 & non-smooth
2321\end{tabular}
2322\end{center}
2323Here, ``trivial'' means that there is no dependence at all and ``linear'' means
2324that the partial derivative is a constant that
2325does not dependent on other variables either. ``Non-smooth'' means that one of
2326the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2327where it is not differentiable.  All positive labels
2328$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2329in fact be simpler, for example due to exact cancellations. 
2330
2331There are also overloaded versions providing a general {\sf reverse}-call.
2332Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
2333%
2334%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2335\section{Overloaded Forward and Reverse Calls}
2336\label{forw_rev}
2337%
2338In this section, the several versions of the {\sf forward} and
2339{\sf reverse} routines, which utilize the overloading capabilities
2340of C++, are described in detail. With exception of the bit pattern
2341versions all interfaces are prototyped in the header file
2342\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
2343and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
2344C and Fortran-callable versions prototyped in the same header file.
2345The bit pattern versions of {\sf forward} and {\sf reverse} introduced
2346in the \autoref{ProBit} are prototyped in the header file
2347\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2348file \verb=<adolc/interfaces.h>= automatically.
2349%
2350%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2351\subsection{The Scalar Case}
2352%
2353\label{ScaCas}
2354%     
2355Given any correct tape, one may call from within
2356the generating program, or subsequently during another run, the following
2357procedure:
2358%
2359\begin{tabbing}
2360\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2361\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
2362\>{\sf short int tag;}         \> // tape identification \\
2363\>{\sf int m;}                 \> // number of dependent variables $m$\\
2364\>{\sf int n;}                 \> // number of independent variables $n$\\
2365\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2366\>{\sf  int keep;}             \> // flag for reverse sweep \\ 
2367\>{\sf  double X[n][d+1];}     \> // Taylor coefficients $X$ of
2368                                     independent variables \\
2369\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
2370                                     in \eqref{eq:series}
2371\end{tabbing}
2372%
2373The rows of the matrix $X$ must correspond to the independent variables in the order of their
2374initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2375$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2376\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2377correspond to the
2378dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2379The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2380Taylor coefficient vectors as in \eqref{eq:series}.
2381Thus the first column of $Y$ contains the
2382function value $F(x)$ itself, the next column represents the first
2383Taylor coefficient vector of $F$, and the last column the
2384$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2385how many Taylor coefficients of all intermediate quantities are
2386written into the value stack as explained in \autoref{forw_rev_ad}.
2387 If {\sf keep} is omitted, it defaults to 0.
2388
2389The given {\sf tag} value is used by {\sf forward} to determine the
2390name of the file on which the tape was written. If the tape file does
2391not exist, {\sf forward} assumes that the relevant
2392tape is still in core and reads from the buffers.
2393After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2394{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2395the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
2396identifier {\sf tag}. When $u$ is a vector
2397and $Z$ an $n\times (d+1)$ matrix
2398{\sf reverse} is executed in the scalar mode by the calling
2399sequence
2400%
2401\begin{tabbing}
2402\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
2403\>{\sf int reverse(tag,m,n,d,u,Z)}\\
2404\>{\sf short int tag;}         \> // tape identification \\
2405\>{\sf int m;}                 \> // number of dependent variables $m$\\
2406\>{\sf int n;}                 \> // number of independent variables $n$\\
2407\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2408\>{\sf  double u[m];}          \> // weighting vector $u$\\
2409\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$ 
2410\end{tabbing}
2411to compute
2412the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2413\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
2414%
2415%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2416\subsection{The Vector Case}
2417%
2418\label{vecCas}
2419%
2420When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence
2421%
2422\begin{tabbing}
2423\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
2424\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
2425\>{\sf short int tag;}         \> // tape identification \\
2426\>{\sf int m;}                 \> // number of dependent variables $m$\\
2427\>{\sf int n;}                 \> // number of independent variables $n$\\
2428\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2429\>{\sf int q;}                 \> // number of weight vectors $q$\\
2430\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2431\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
2432\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
2433\end{tabbing}
2434%
2435to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2436\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
2437When the arguments {\sf p} and {\sf U} are omitted, they default to
2438$m$ and the identity matrix of order $m$, respectively. 
2439
2440Through the optional argument {\sf nz} of {\sf reverse} one can compute
2441information about the sparsity pattern of $Z$ as described in detail
2442in the previous \autoref{forw_rev_ad}.
2443
2444The return values of {\sf reverse} calls can be interpreted according
2445to \autoref{retvalues}, but negative return values are not
2446valid, since the corresponding forward sweep would have
2447stopped without completing the necessary taylor file.
2448The return value of {\sf reverse} may be higher
2449than that of the preceding {\sf forward} call because some operations
2450that were evaluated  at a critical argument during the forward sweep
2451were found not to impact the dependents during the reverse sweep.
2452
2453In both scalar and vector mode, the degree $d$ must agree with
2454{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2455equal to zero if {\sf reverse} directly follows the taping of an active
2456section. Otherwise, {\sf reverse} will return control with a suitable error
2457message.
2458In order to avoid possible confusion, the first four arguments must always be
2459present in the calling sequence. However, if $m$ or $d$
2460attain their trivial values 1 and 0, respectively, then
2461corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2462{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2463indirection.  For example, we may call
2464{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2465{\sf double g[n]} 
2466to calculate a gradient of a scalar-valued function.
2467
2468Sometimes it may be useful to perform a forward sweep for families of
2469Taylor series with the same leading term.
2470This vector version of {\sf forward} can be called in the form
2471%
2472\begin{tabbing}
2473\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2474\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2475\>{\sf short int tag;}         \> // tape identification \\
2476\>{\sf int m;}                 \> // number of dependent variables $m$\\
2477\>{\sf int n;}                 \> // number of independent variables $n$\\
2478\>{\sf int d;}                 \> // highest derivative degree $d$\\
2479\>{\sf int p;}                 \> // number of Taylor series $p$\\
2480\>{\sf  double x0[n];}          \> // values of independent variables $x_0$\\
2481\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
2482\>{\sf double y0[m];}           \> // values of dependent variables $y_0$\\
2483\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
2484\end{tabbing}
2485%
2486where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2487and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2488degree 0. There is no option to keep the values of active variables
2489that are going out of scope or that are overwritten. Therefore this
2490function cannot prepare a subsequent reverse sweep.
2491The return integer serves as a flag to indicate quadratures or altered
2492comparisons as described above in \autoref{reuse_tape}.
2493
2494Since the calculation of Jacobians is probably the most important
2495automatic differentia\-tion task, we have provided a specialization
2496of vector {\sf forward} to the case where $d = 1$. This version can be
2497called in the form
2498%
2499\begin{tabbing}
2500\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2501\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
2502\>{\sf short int tag;}         \> // tape identification \\
2503\>{\sf int m;}                 \> // number of dependent variables $m$\\
2504\>{\sf int n;}                 \> // number of independent variables $n$\\
2505\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
2506\>{\sf double x[n];}          \> // values of independent variables $x_0$\\
2507\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
2508\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
2509\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
2510\end{tabbing}
2511%
2512When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2513the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2514one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2515chosen initialization of $X$. In a workstation environment a value
2516of $p$ somewhere between $10$ and $50$
2517appears to be fairly optimal. For smaller $p$ the interpretive
2518overhead is not appropriately amortized, and for larger $p$ the
2519$p$-fold increase in storage causes too many page faults. Therefore,
2520large Jacobians that cannot be compressed via column coloring
2521as could be done for example using the driver {\sf sparse\_jac}
2522should be ``strip-mined'' in the sense that the above
2523first-order-vector version of {\sf forward} is called
2524repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2525a partition of the identity matrix of order $n$.
2526%
2527%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2528\subsection{Dependence Analysis}
2529%
2530\label{ProBit}
2531%
2532The sparsity pattern of Jacobians is often needed to set up data structures
2533for their storage and factorization or to allow their economical evaluation
2534by compression \cite{BeKh96}. Compared to the evaluation of the full
2535Jacobian $F'(x_0)$ in real arithmetic computing the Boolean matrix
2536$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2537pattern in the obvious way requires a little less run-time and
2538certainly a lot less memory.
2539
2540The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2541of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2542dependence between the $i$-th independent variable $x_{i}$ and
2543the $j$-th dependent variable $y_{j}$. Just like for real arguments
2544one would wish to compute matrix-vector and vector-matrix products
2545of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2546by appropriate {\sf forward} and {\sf reverse} routines where
2547$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
2548Here, multiplication corresponds to logical
2549{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
2550semi-ring.
2551
2552For practical reasons it is assumed that
2553$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2554$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2555$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2556as bit patterns. Moreover $p$ or $q$ such integer vectors may
2557be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2558and $U\in\N^{q \times m}$, which naturally correspond
2559to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
2560and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
2561bit pattern versions of {\sf forward} and {\sf reverse} allow
2562to compute integer matrices $Y\in\N^{m \times p}$ and
2563$Z\in\N^{q \times m}$ corresponding to
2564\begin{equation}
2565\label{eq:int_forrev}
2566\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad 
2567\tilde{Z} = \tilde{U}\tilde{P} \, ,
2568\end{equation} 
2569respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
2570and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times n}$.
2571In general, the application of the bit pattern versions of
2572{\sf forward} or {\sf reverse} can be interpreted as
2573propagating dependences between variables forward or backward, therefore
2574both the propagated integer matrices and the corresponding
2575Boolean matrices are called {\em dependence structures}.
2576 
2577The bit pattern {\sf forward} routine
2578%
2579\begin{tabbing}
2580\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2581\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
2582\>{\sf short int tag;}              \> // tape identification \\
2583\>{\sf int m;}                      \> // number of dependent variables $m$\\
2584\>{\sf int n;}                      \> // number of independent variables $n$\\
2585\>{\sf int p;}                      \> // number of integers propagated $p$\\
2586\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
2587\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
2588\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
2589\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
2590                                     \eqref{eq:int_forrev}\\
2591\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
2592\end{tabbing}
2593%
2594can be used to obtain the dependence structure $Y$ for a given dependence structure
2595$X$. The dependence structures are
2596represented as arrays of {\sf unsigned long int} the entries of which are
2597interpreted as bit patterns as described above.   
2598For example, for $n=3$ the identity matrix $I_3$ should be passed
2599with $p=1$ as the $3 \times 1$ array
2600\begin{eqnarray*}
2601{\sf X} \; = \;
2602\left( \begin{array}{r}
2603         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2604         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2605         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
2606       \end{array} \right)
2607\end{eqnarray*}
2608in the 4-byte long integer format. The parameter {\sf mode} determines
2609the mode of dependence analysis as explained already in \autoref{sparse}.
2610
2611A call to the corresponding bit pattern {\sf reverse} routine
2612%
2613\begin{tabbing}
2614\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2615\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
2616\>{\sf short int tag;}         \> // tape identification \\
2617\>{\sf int m;}                 \> // number of dependent variables $m$\\
2618\>{\sf int n;}                 \> // number of independent variables $n$\\
2619\>{\sf int q;}                 \> // number of integers propagated q\\
2620\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
2621\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
2622                                     to \eqref{eq:int_forrev}\\
2623\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
2624\end{tabbing}
2625%
2626yields the dependence structure $Z$ for a given dependence structure
2627$U$.
2628
2629To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2630as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2631with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2632corresponding dependence structure $X$ or $U$ must be defined to represent 
2633the identity matrix of the respective dimension.
2634Due to the fact that always a multiple of $s$ Boolean vectors are propagated
2635there may be superfluous vectors, which can be set to zero.
2636
2637The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2638correspond to those described in \autoref{retvalues}.
2639
2640One can control the storage growth by the factor $p$ using
2641``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
2642groups of columns or respectively rows at a time, i.e.~partitioning
2643$X$ or $U$ appropriately as described for the computation of Jacobians
2644in \autoref{vecCas}.
2645%
2646%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2647%
2648\section{Advanced algorithmic differentiation in ADOL-C}
2649\label{adv_ad}
2650%
2651\subsection{Differentiating external functions}
2652%
2653Ideally, AD is applied to a given computation as a whole.
2654This is not always possible because parts of the computation may
2655be coded in a different programming language or may a call to
2656an external library.
2657In the former case one may want to differentiate the parts in
2658question with a different AD tool or provide hand written derivatives.
2659To integrate these
2660In practice, however, sophisticated projects usually evolve over a long period of time.
2661Within this process, a heterogeneous code base for the project
2662develops, which may include the incorporation of external solutions,
2663changes in programming paradigms or even of programming languages.
2664Equally heterogeneous, the computation of derivative values appears.
2665Hence, different \mbox{AD-tools} may be combined with hand-derived
2666codes based on the same or different programming languages.
2667ADOL-C supports such settings  by  the concept of externally
2668differentiated functions, that is, a function
2669not differentiated by ADOL-C itself. The required derivatives
2670have to be provided by the user.
2671
2672For this purpose, it is required that the externally differentiated
2673function (for example named {\sf\em euler\_step} ) has the following signature.
2674\smallskip
2675
2676\noindent
2677\hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);}
2678\medskip
2679
2680\noindent
2681Note that the formal paraemters in the signature have {\sf double} type, that is,
2682they are not active as in the original program before the ADOL-C type change.
2683The externally differentiated function has to
2684be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows.
2685\smallskip
2686
2687\noindent
2688\hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);}.
2689\smallskip
2690
2691\noindent
2692This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function.
2693Then, the user has to supply the function pointers for the call back methods (for example here
2694{\sf zos\_for\_euler\_step} and {\sf  fos\_rev\_euler\_step}) the user implemented
2695to compute the derivatives as follows.
2696\begin{tabbing}
2697\hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\
2698             \> {\sf // function pointer for computing
2699               Zero-Order-Scalar (=zos)}\\
2700             \> {\sf // forward information}\\
2701%             \> {\sf edf-$>$dp\_x = xp;}\\
2702%             \> {\sf edf-$>$dp\_y = yp;}\\
2703%             \> {\sf // double arrays for arguments and results}\\
2704             \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\
2705             \> {\sf // function pointer for computing
2706               First-Order-Scalar (=fos)}\\ 
2707             \> {\sf reverse information}
2708\end{tabbing}
2709To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y}
2710one has to provide (allocate) both variants. I.e. if the call to {\sf euler\_step} was originally
2711\noindent
2712\hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);}
2713\medskip
2714then the ADOL-C typechange for the calling context will turn  {\sf sOld} and {\sf sNew} in {\sf adouble} pointers.
2715To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace)
2716the original call to the externally differentiated function
2717must be substituted by
2718\medskip
2719
2720\noindent
2721\hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);}
2722\medskip
2723
2724\noindent
2725Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively) 
2726to the active arguments {\sf sNew} in {\sf adouble}.
2727The usage of the external function facility is illustrated by the
2728example \verb=ext_diff_func= contained in
2729\verb=examples/additional_examples/ext_diff_func=.
2730There,the external differentiated function is also a C code, but the
2731handling as external differentiated functions also a decrease of the
2732overall required tape size.
2733
2734%
2735\subsection{Advanced algorithmic differentiation of time integration processes}
2736%
2737For many time-dependent applications, the corresponding simulations
2738are based on ordinary or partial differential equations.
2739Furthermore, frequently there are quantities that influence the
2740result of the simulation and can be seen as  control of the systems.
2741To compute an approximation of the
2742simulated process for a time interval $[0,T]$ and evaluated the
2743desired target function, one applies an
2744appropriate integration scheme given by
2745\begin{tabbing}
2746\hspace{5mm} \= some initializations yielding $x_0$\\
2747\> for $i=0,\ldots, N-1$\\
2748\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
2749\hspace{5mm} \= evaluation of the target function
2750\end{tabbing}
2751where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
2752time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
2753$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$
2754defines the time step to compute the state at time $t_i$. Note that we
2755do not assume a uniform grid.
2756
2757When computing derivatives of the target function with respect to the
2758control, the consequences for the tape generation using the ``basic''
2759taping approach as implemented in ADOL-C so far are shown in the left part of
2760\autoref{fig:bas_tap}.
2761\begin{figure}[htbp] 
2762\begin{center}
2763\hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill
2764\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
2765\end{center}
2766\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
2767\caption{Different taping approaches}
2768\label{fig:bas_tap}
2769\end{figure} 
2770As can be seen, the iterative process is completely
2771unrolled due to the taping process. That is, the tape contains an internal representation of each
2772time step. Hence, the overall tape comprises a serious amount of redundant
2773information as illustrated by the light grey rectangles in
2774\autoref{fig:bas_tap}
2775
2776To overcome the repeated storage of essentially the same information,
2777a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
2778the right-hand side of \autoref{fig:bas_tap}. This new
2779capability allows the encapsulation of the time-stepping procedure
2780such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one
2781representative of the time steps in addition to a function pointer to the
2782evaluation procedure $F$ of the time steps.  The function pointer has
2783to be stored for a possibly necessary retaping during the derivative calculation
2784as explained below.
2785
2786Instead of storing the complete tape, only a very limited number of intermediate
2787states are kept in memory. They serve as checkpoints, such that
2788the required information for the backward integration is generated
2789piecewise during the adjoint calculation.
2790For this modified adjoint computation the optimal checkpointing schedules
2791provided by {\bf revolve} are employed. An adapted version of the
2792software package {\sf revolve} is part of ADOL-C and automatically
2793integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
2794distributed such that computational effort is minimized for the given
2795number of checkpoints and time steps $N$. It is important to note that the overall tape
2796size is drastically reduced due to the advanced taping strategy.  For the
2797implementation of this nested taping we introduced
2798a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
2799handle different internal function representations during the taping
2800procedure and the derivative calculation. This approach allows the generation of a new
2801tape inside the overall tape, where the coupling of the different tapes is based on
2802the {\em external differentiated function} described above.
2803
2804Written under the objective of minimal user effort, the checkpointing routines
2805of \mbox{ADOL-C} need only very limited information. The user must
2806provide two routines as implementation of the time-stepping function $F$ 
2807with the signatures
2808\medskip
2809
2810\noindent
2811\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
2812\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
2813\medskip
2814
2815\noindent
2816where the function names can be chosen by the user as long as the names are
2817unique.It is possible that the result vector of one time step
2818iteration overwrites the argument vector of the same time step. Then, no
2819copy operations are required to prepare the next time step.
2820
2821At first, the {\sf adouble} version of the time step function has to
2822be {\em registered} using the \mbox{ADOL-C} function
2823\medskip
2824
2825\noindent
2826\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
2827\medskip
2828
2829\noindent
2830This function initializes the structure {\sf cpc}. Then,
2831the user has to provide the remaining checkpointing information
2832by the following commands:
2833\begin{tabbing}
2834\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
2835             \> {\sf // double variante of the time step function}\\
2836             \> {\sf cpc.setNumberOfSteps(N);}\\
2837             \> {\sf // number of time steps to perform}\\
2838             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
2839             \> {\sf // number of checkpoint} \\
2840             \> {\sf cpc.setDimensionXY(n);}\\
2841             \> {\sf // dimension of input/output}\\
2842             \> {\sf cpc.setInput(y);}\\
2843             \> {\sf // input vector} \\
2844             \> {\sf cpc.setOutput(y);}\\
2845             \> {\sf // output vector }\\
2846             \> {\sf cpc.setTapeNumber(tag\_check);}\\
2847             \> {\sf // subtape number for checkpointing} \\
2848             \> {\sf cpc.setAlwaysRetaping(false);}\\
2849             \> {\sf // always retape or not ?}
2850\end{tabbing}
2851Subsequently, the time loop in the function evaluation can be
2852substituted by a call of the function
2853\medskip
2854
2855\noindent
2856\hspace*{2cm}{\sf int cpc.checkpointing();}
2857\medskip
2858
2859\noindent
2860Then, ADOL-C computes derivative information using the optimal checkpointing
2861strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
2862
2863The presented driver is prototyped in the header file
2864\verb=<adolc/checkpointing.h>=. This header
2865is included by the global header file \verb=<adolc/adolc.h>= automatically.
2866An example program \verb=checkpointing.cpp= illustrates the
2867checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
2868%
2869%
2870%
2871\subsection{Advanced algorithmic differentiation of fixed point iterations}
2872%
2873Quite often, the state of the considered system denoted by $x\in\R^n$
2874depends on some design parameters denoted by $u\in\R^m$. One example for this setting
2875forms the flow over an aircraft wing. Here, the shape of the wing that
2876is defined by the design vector $u$ 
2877determines the flow field $x$. The desired quasi-steady state $x_*$
2878fulfills the fixed point equation
2879\begin{align}
2880  \label{eq:fixedpoint}
2881  x_* = F(x_*,u)
2882\end{align}
2883for a given continuously differentiable function
2884$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
2885also exploited by many other applications.
2886
2887Assume that one can apply the iteration 
2888\begin{align}
2889\label{eq:iteration}
2890 x_{k+1} = F(x_k,u)
2891\end{align}
2892to obtain a linear converging sequence $\{x_k\}$ generated
2893for any given control $u\in\R^n$. Then the limit point $x_*\in\R^n$ fulfils the fixed
2894point equation~\eqref{eq:fixedpoint}. Moreover,
2895suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
2896$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
2897Hence, there exists a
2898differentiable function $\phi:\R^m \rightarrow \R^n$,
2899such that $\phi(u) = F(\phi(u),u)$, where the state
2900$\phi(u)$ is a fixed point of $F$ according to a control
2901$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
2902the design vector $u$, derivatives of $\phi$ with respect
2903to $u$ are of particular interest.
2904
2905To exploit the advanced algorithmic differentiation  of such fixed point iterations
2906ADOL-C provides the special functions {\tt fp\_iteration(...)}.
2907It has the following interface:
2908\begin{tabbing}
2909\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2910\>{\sf int
2911  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
2912\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
2913\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
2914\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
2915\>{\sf int *double\_F;}         \> // pointer to a function that compute for $x$ and $u$ \\
2916\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
2917\>{\sf int *adouble\_F;}        \> // pointer to a function that compute for $x$ and $u$ \\
2918\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
2919\>{\sf int *norm;}              \> // pointer to a function that computes\\
2920\>                              \> // the norm of a vector\\
2921\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
2922\>                              \> // the norm of a vector\\
2923\>{\sf double eps;}             \> // termination criterion for fixed point iteration\\
2924\>{\sf double eps\_deriv;}      \> // termination criterion for adjoint fixed point iteration\\
2925\>{\sf N\_max;}                 \> // maximal number of itertions for state computation\\
2926\>{\sf N\_max\_deriv;}          \> // maximal number of itertions for adjoint computation\\
2927\>{\sf adouble *x\_0;}          \> // inital state of fixed point iteration\\
2928\>{\sf adouble *u;}             \> // value of $u$\\
2929\>{\sf adouble *x\_fic;}        \> // final state of fixed point iteration\\
2930\>{\sf int dim\_x;}             \> // dimension of $x$\\
2931\>{\sf int dim\_u;}             \> // dimension of $u$\\
2932\end{tabbing}
2933%
2934Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
2935should be used for the fixed point iteration.
2936{\tt double\_F} and {\tt adouble\_F} are pointers to functions, that
2937compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby
2938{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
2939uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
2940{\tt norm\_deriv} are pointers to functions computing the norm
2941of a vector. The latter functions together with {\tt eps},
2942{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
2943the iterations. Thus the following loops are performed:
2944\begin{center}
2945\begin{tabular}{ll}
2946  do                     &   do                           \\
2947  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
2948  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
2949  ~~~~$y = F(x,u)$       &   ~~~
2950  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
2951  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
2952  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
2953  & and $k\leq N_{max,deriv}$
2954\end{tabular}
2955\end{center}
2956The vector for the initial iterate and the control is stored
2957in {\tt x\_0} and {\tt u} respectively. The vector in which the
2958fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
2959and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
2960corresponding vectors.
2961
2962The presented driver is prototyped in the header file
2963\verb=<adolc/fixpoint.h>=. This header
2964is included by the global header file \verb=<adolc/adolc.h>= automatically.
2965An example code that shows also the
2966expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
2967%
2968\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
2969%
2970ADOL-C allows to compute derivatives in parallel for functions
2971containing OpenMP parallel loops.
2972This implies that an explicit loop-handling approach is applied. A
2973typical situation is shown in \autoref{fig:basic_layout},
2974\begin{figure}[hbt]
2975    \vspace{3ex}
2976    \begin{center}
2977        \includegraphics[height=4cm]{multiplexed} \\
2978        \begin{picture}(0,0)
2979            \put(-48,40){\vdots}
2980            \put(48,40){\vdots}
2981            \put(-48,80){\vdots}
2982            \put(48,80){\vdots}
2983            \put(-83,132){function eval.}
2984            \put(5,132){derivative calcul.}
2985        \end{picture}
2986    \end{center}
2987    \vspace{-5ex}
2988    \caption{Basic layout of mixed function and the corresponding derivation process}
2989    \label{fig:basic_layout}
2990\end{figure}
2991where the OpenMP-parallel loop is preceded by a serial startup
2992calculation and followed by a serial finalization phase.
2993
2994Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
2995Two macros are available that only differ in the way the global tape information is handled.
2996Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
2997For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used.
2998Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
2999Due to the inserted macro, the OpenMP statement has the following structure:
3000\begin{tabbing}
3001\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
3002              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
3003\end{tabbing}
3004Inside the parallel region, separate tapes may then be created.
3005Each single thread works in its own dedicated AD-environment, and all
3006serial facilities of \mbox{ADOL-C} are applicable as usual. The global
3007derivatives can be computed using the tapes created in the serial and
3008parallel parts of the function evaluation, where user interaction is
3009required for the correct derivative concatenation of the various tapes.
3010
3011For the usage of the parallel facilities, the \verb=configure=-command
3012has to be used with the option \verb?--with-openmp-flag=FLAG?, where
3013\verb=FLAG= stands for the system dependent OpenMP flag.
3014The parallel differentiation of a parallel program is illustrated
3015by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
3016%
3017%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3018%
3019\section{Tapeless forward differentiation in ADOL-C}
3020\label{tapeless}
3021%
3022Up to version 1.9.0, the development of the ADOL-C software package
3023was based on the decision to store all data necessary for derivative
3024computation on tapes, where large applications require the tapes to be
3025written out to corresponding files. In almost all cases this means
3026a considerable drawback in terms of run time due to the excessive
3027memory accesses. Using these tapes enables ADOL-C to offer multiple
3028functions. However, it is not necessary for all tasks of derivative
3029computation to do that.
3030
3031Starting with version 1.10.0, ADOL-C now features a tapeless forward
3032mode for computing first order derivatives in scalar mode, i.e.,
3033$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
3034This tapeless variant coexists with the more universal
3035tape based mode in the package. The following subsections describe
3036the source code modifications required to use the tapeless forward mode of
3037ADOL-C. 
3038%
3039\subsection{Modifying the Source Code}
3040%
3041Let us consider the coordinate transformation from Cartesian to spherical
3042polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3043= F(x)$, with
3044\begin{eqnarray*}
3045y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3046y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3047y_3  =  \arctan(x_2/x_1),
3048\end{eqnarray*}
3049as an example. The corresponding source code is shown in \autoref{fig:tapeless}.
3050\begin{figure}[htb]
3051\framebox[\textwidth]{\parbox{\textwidth}{
3052%\begin{center}
3053%\begin{flushleft}
3054\begin{tabbing}
3055\= \kill
3056\> {\sf \#include} {\sf $<$iostream$>$}\\
3057\> {\sf using namespace std;}\\
3058\> \\
3059\> {\sf int main() \{}\\
3060\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
3061\> \\
3062\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3063\> {\sf \rule{1cm}{0pt}...}\\
3064\> \\
3065\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3066\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3067\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3068\> \\
3069\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
3070\> \\
3071\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3072\> \}
3073\end{tabbing}
3074%\end{flushleft}
3075%\end{center}
3076}}
3077\caption{Example for tapeless forward mode}
3078\label{fig:tapeless}
3079\end{figure}
3080%
3081Changes to the source code that are necessary for applying the
3082tapeless forward ADOL-C are described in the following two
3083subsections, where the vector mode version is described
3084as extension of the scalar mode.
3085%
3086\subsubsection*{The scalar mode}
3087%
3088To use the tapeless forward mode, one has to include one
3089of the header files \verb#adolc.h# or \verb#adouble.h#
3090where the latter should be preferred since it does not include the
3091tape based functions defined in other header files. Hence, including
3092\verb#adouble.h# avoids mode mixtures, since
3093\verb#adolc.h# is just a wrapper for including all public
3094  headers of the ADOL-C package and does not offer own functions.
3095Since the two ADOL-C forward mode variants tape-based and tapeless,
3096are prototyped in the same header file, the compiler needs to know if a
3097tapeless version is intended. This can be done by defining a
3098preprocessor macro named {\sf ADOLC\_TAPELESS}. Note that it is
3099important to define this macro before the header file is included.
3100Otherwise, the tape-based version of ADOL-C will be used.
3101
3102As in the tape based forward version of ADOL-C all derivative
3103calculations are introduced by calls to overloaded
3104operators. Therefore, similar to the tape-based version all
3105independent, intermediate and dependent variables must be declared
3106with type {\sf adouble}. The whole tapeless functionality provided by
3107\verb#adolc.h# was written as complete inline intended code
3108due to run time aspects, where the real portion of inlined code can
3109be influenced by switches for many compilers. Likely, the whole
3110derivative code is inlined by default. Our experiments
3111with the tapeless mode have produced complete inlined code by using
3112standard switches (optimization) for GNU and Intel C++
3113compiler.
3114
3115To avoid name conflicts
3116resulting from the inlining the tapeless version has its own namespace
3117\verb#adtl#. As a result four possibilities of using the {\sf adouble}
3118type are available for the tapeless version:
3119\begin{itemize}
3120\item Defining a new type
3121      \begin{center}
3122        \begin{tabular}{l}
3123          {\sf typedef adtl::adouble adouble;}\\
3124          ...\\
3125          {\sf adouble tmp;}
3126        \end{tabular}
3127      \end{center}
3128      This is the preferred way. Remember, you can not write an own
3129      {\sf adouble} type/class with different meaning after doing the typedef.
3130\item Declaring with namespace prefix
3131      \begin{center}
3132        \begin{tabular}{l}
3133          {\sf adtl::adouble tmp;}
3134        \end{tabular}
3135      \end{center}
3136      Not the most handsome and efficient way with respect to coding
3137      but without any doubt one of the safest ways. The identifier
3138      {\sf adouble} is still available for user types/classes.
3139\item Trusting macros
3140      \begin{center}
3141        \begin{tabular}{l}
3142          {\sf \#define adouble adtl::adouble}\\
3143          ...\\
3144          {\sf adouble tmp;}
3145        \end{tabular}
3146      \end{center}
3147      This approach should be used with care, since standard defines are text replacements.
3148  \item Using the complete namespace
3149        \begin{center}
3150          \begin{tabular}{l}
3151            {\sf \#using namespace adtl;}\\
3152            ...\\
3153            {\sf adouble tmp;}
3154          \end{tabular}
3155        \end{center}
3156        A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise!
3157\end{itemize}
3158After defining the variables only two things are left to do. First
3159one needs to initialize the values of the independent variables for the
3160function evaluation. This can be done by assigning the variables a {\sf
3161double} value. The {\sf ad}-value is set to zero in this case.
3162Additionally, the tapeless forward mode variant of ADOL-C
3163offers a function named {\sf setValue} for setting the value without
3164changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
3165variables ADOL-C offers two possibilities:
3166\begin{itemize}
3167  \item Using the constructor
3168        \begin{center}
3169          \begin{tabular}{l}
3170            {\sf adouble x1(2,1), x2(4,0), y;}
3171          \end{tabular}
3172        \end{center}
3173        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
3174        remains uninitialized. In terms of function evaluation
3175        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3176        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3177   \item Setting point values directly
3178         \begin{center}
3179           \begin{tabular}{l}
3180             {\sf adouble x1=2, x2=4, y;}\\
3181             ...\\
3182             {\sf x1.setADValue(1);}\\
3183             {\sf x2.setADValue(0);}
3184           \end{tabular}
3185         \end{center}
3186         The same example as above but now using {\sf setADValue}-method for initializing the derivative values.
3187\end{itemize}
3188%
3189The derivatives can be obtained at any time during the evaluation
3190process by calling the {\sf getADValue}-method
3191\begin{center}
3192  \begin{tabular}{l}
3193    {\sf adouble y;}\\
3194    ...\\
3195    {\sf cout $<<$ y.getADValue();}
3196  \end{tabular}
3197\end{center}
3198\autoref{fig:modcode} shows the resulting source code incorporating
3199all required changes for the example
3200given above.
3201
3202\begin{figure}[htb]
3203\framebox[\textwidth]{\parbox{\textwidth}{
3204%\begin{center}
3205\begin{tabbing}
3206\hspace*{-1cm} \= \kill
3207\> {\sf \#include $<$iostream$>$}\\
3208\> {\sf using namespace std;}\\
3209\> \\
3210\> {\sf \#define ADOLC\_TAPELESS}\\
3211\> {\sf \#include $<$adouble.h$>$}\\
3212\> {\sf typedef adtl::adouble adouble;}\\
3213\\
3214\> {\sf int main() \{}\\
3215\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3216\\
3217\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3218\> {\sf \rule{1cm}{0pt}...}\\
3219\\
3220\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(1);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3221\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3222\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3223\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3224\\
3225\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3226\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ y[1].getADValue() $<<$ endl;}\\
3227\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3228\> {\sf \}}
3229\end{tabbing}
3230%\end{center}
3231}}
3232\caption{Example for tapeless scalar forward mode}
3233\label{fig:modcode}
3234\end{figure}
3235%
3236\subsubsection*{The vector mode}
3237%
3238In scalar mode only one direction element has to be stored per {\sf
3239  adouble} whereas a field of $p$ elements is needed in the vector
3240  mode to cover the computations for the given $p$ directions. The
3241  resulting changes to the source code are described in this section.
3242
3243Similar to tapeless scalar forward mode, the tapeless vector forward
3244mode is used by defining {\sf ADOLC\_TAPELESS}. Furthermore, one has to define
3245an additional preprocessor macro named {\sf NUMBER\_DIRECTIONS}. This
3246macro takes the maximal number of directions to be used within the
3247resulting vector mode. Just as {\sf ADOLC\_TAPELESS} the new macro
3248must be defined before including the \verb#<adolc.h/adouble.h>#
3249header file since it is ignored otherwise.
3250
3251In many situations recompiling the source code to get a new number of
3252directions is at least undesirable. ADOL-C offers a function named
3253{\sf setNumDir} to work around this problem partially. Calling this
3254function, ADOL-C does not take the number of directions
3255from the macro {\sf NUMBER\_DIRECTIONS} but from the argument of
3256{\sf setNumDir}. A corresponding source code would contain the following lines: 
3257\begin{center}
3258  \begin{tabular}{l}
3259    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3260    ...\\
3261    {\sf adtl::setNumDir(5);}
3262  \end{tabular}
3263\end{center}
3264Note that using this function does not
3265change memory requirements that can be roughly determined by
3266({\sf NUMBER\_DIRECTIONS}$+1$)*(number of {\sf adouble}s).
3267
3268Compared to the scalar case setting and getting the derivative
3269values, i.e. the directions, is more involved. Instead of
3270working with single {\sf double} values, pointer to fields of {\sf
3271double}s are used as illustrated by the following example:
3272\begin{center}
3273  \begin{tabular}{l}
3274    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3275    ...\\
3276    {\sf adouble x, y;}\\
3277    {\sf double *ptr=new double[NUMBER\_DIRECTIONS];}\\
3278      ...\\
3279    {\sf x1=2;}\\
3280    {\sf x1.setADValue(ptr);}\\
3281    ...\\
3282    {\sf ptr=y.getADValue();}
3283  \end{tabular}
3284\end{center}
3285Additionally, the tapeless vector forward mode of ADOL-C offers two
3286new methods for setting/getting the derivative values. Similar
3287to the scalar case, {\sf double} values are used but due to the vector
3288mode the position of the desired vector element must be supplied in
3289the argument list:
3290\begin{center}
3291  \begin{tabular}{l}
3292    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3293    ...\\
3294    {\sf adouble x, y;}\\
3295    ...\\
3296    {\sf x1=2;}\\
3297    {\sf x1.setADValue(5,1);\hspace*{3.7cm}// set the 6th point value of x to 1.0}\\
3298      ...\\
3299    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3300  \end{tabular}
3301\end{center}
3302The resulting source code containing all changes that are required is
3303shown in \autoref{fig:modcode2}
3304
3305\begin{figure}[!h!t!b]
3306\framebox[\textwidth]{\parbox{\textwidth}{
3307\begin{tabbing}
3308\hspace*{-1cm} \= \kill
3309\> {\sf \#include $<$iostream$>$}\\
3310\> {\sf  using namespace std;}\\
3311\\
3312\> {\sf \#define ADOLC\_TAPELESS}\\
3313\> {\sf \#define NUMBER\_DIRECTIONS 3}\\
3314\> {\sf \#include $<$adouble.h$>$}\\
3315\> {\sf typedef adtl::adouble adouble;}\\
3316\\
3317\> {\sf ADOLC\_TAPELESS\_UNIQUE\_INTERNALS;}\\
3318\\
3319\> {\sf int main() \{}\\
3320\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3321\\
3322\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3323\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3324\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3325\> {\sf \rule{0.5cm}{0pt}\}}\\
3326\\
3327\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3328\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3329\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3330\\
3331\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3332\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3333\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3334\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3335\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3336\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3337\> {\sf \rule{0.5cm}{0pt}\}}\\
3338\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3339\> {\sf \}}
3340\end{tabbing}
3341}}
3342\caption{Example for tapeless vector forward mode}
3343\label{fig:modcode2}
3344\end{figure}
3345%
3346\subsection{Compiling and Linking the Source Code}
3347%
3348After incorporating the required changes, one has to compile the
3349source code and link the object files to get the executable.
3350As long as the ADOL-C header files are not included in the absolute path
3351the compile sequence should be similar to the following example:
3352\begin{center}
3353  \begin{tabular}{l}
3354    {\sf g++ -I/home/username/adolc\_base/include -c tapeless\_scalar.cpp}
3355  \end{tabular}
3356\end{center}
3357The \verb#-I# option tells the compiler where to search for the ADOL-C
3358header files. This option can be omitted when the headers are included
3359with absolute path or if ADOL-C is installed in a ``global'' directory.
3360
3361Since the tapeless forward version of ADOL-C is implemented in the
3362header \verb#adouble.h# as complete inline intended version,
3363the object files do not need to be linked against any external ADOL-C
3364code or the ADOL-C library. Therefore, the example started above could be finished with the
3365following command:
3366\begin{center}
3367  \begin{tabular}{l}
3368    {\sf g++ -o tapeless\_scalar tapeless\_scalar.o}
3369  \end{tabular}
3370\end{center}
3371The mentioned source codes {\sf tapeless\_scalar.c} and {\sf tapeless\_vector.c} 
3372illustrating the use of the for tapeless scalar and vector mode can be found in
3373the directory {\sf examples}.
3374%
3375\subsection{Concluding Remarks for the Tapeless Forward Mode Variant}
3376%
3377As many other AD methods the tapeless forward mode provided by the
3378ADOL-C package has its own strengths and drawbacks. Please read the
3379following section carefully to become familiar with the things that
3380can occur:
3381\begin{itemize}
3382  \item Advantages:
3383    \begin{itemize}
3384      \item Code speed\\
3385        Increasing computation speed was one of the main aspects in writing
3386        the tapeless code. In many cases higher performance can be
3387        expected this way.
3388      \item Easier linking process\\
3389        As another result from the code inlining the object code does
3390        not need to be linked against an ADOL-C library.
3391      \item Smaller overall memory requirements\\
3392        Tapeless ADOL-C does not write tapes anymore, as the name
3393        implies. Loop ''unrolling'' can be avoided this
3394        way. Considered main memory plus disk space as overall memory
3395        requirements the tapeless version can be
3396        executed in a more efficient way.
3397    \end{itemize}
3398  \item Drawbacks:
3399    \begin{itemize}
3400    \item Main memory limitations\\
3401      The ability to compute derivatives to a given function is
3402      bounded by the main memory plus swap size  when using
3403      tapeless ADOL-C. Computation from swap should be avoided anyway
3404      as far as possible since it slows down the computing time
3405      drastically. Therefore, if the program execution is 
3406      terminated without error message insufficient memory size can be
3407      the reason among other things. The memory requirements $M$ can
3408      be determined roughly as followed:
3409      \begin{itemize}
3410        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3411        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3412          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3413      \end{itemize}
3414      where the storage size of all non {\sf adouble} based variables is described by $M_p$.
3415    \item Compile time\\
3416      As discussed in the previous sections, the tapeless forward mode of
3417      the ADOL-C package is implemented as inline intended version. Using
3418      this approach results in a higher source code size, since every
3419      operation involving at least one {\sf adouble} stands for the
3420      operation itself as well as for the corresponding derivative
3421      code after the inlining process. Therefore, the compilation time
3422      needed for the tapeless version may be higher than that of the tape based code.
3423    \item Code Size\\
3424      A second drawback and result of the code inlining is the
3425      increase of code sizes for the binaries. The increase
3426      factor compared to the corresponding tape based program is
3427      difficult to quantify as it is task dependent. Practical results
3428      have shown that values between 1.0 and 2.0 can be
3429      expected. Factors higher than 2.0 are possible too and even
3430      values below 1.0 have been observed.
3431    \end{itemize}
3432\end{itemize}
3433%
3434%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3435\section{Installing and Using ADOL-C}
3436\label{install}
3437%
3438\subsection{Generating the ADOL-C Library}
3439\label{genlib}
3440%
3441The currently built system is best summarized by the ubiquitous gnu
3442install triplet
3443\begin{center}
3444\verb=configure - make - make install= .
3445\end{center}
3446Executing this three steps from the package base directory
3447\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3448ADOL-C library with default options and install the package (libraries
3449and headers) into the default installation directory {\tt
3450  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3451the subdirectory \verb=include= will contain all the installed header
3452files that may be included by the user program, the subdirectory
3453\verb=lib= will contain the 32-bit compiled library
3454and the subdirectory \verb=lib64= will contain the 64-bit compiled
3455library. Depending on the compiler only one of \verb=lib= or
3456\verb=lib64= may be created.
3457
3458Before doing so the user may modify the header file \verb=usrparms.h=
3459in order to tailor the \mbox{ADOL-C} package to the needs in the
3460particular system environment as discussed in
3461\autoref{Customizing}. The configure procedure which creates the necessary
3462\verb=Makefile=s can be customized by use of some switches. Available
3463options and their meaning can be obtained by executing
3464\verb=./configure --help= from the package base directory.
3465
3466All object files and other intermediately generated files can be
3467removed by the call \verb=make clean=. Uninstalling ADOL-C by
3468executing \verb=make uninstall= is only reasonable after a previous
3469called \verb=make install= and will remove all installed package files
3470but will leave the created directories behind.
3471
3472The sparse drivers are included in the ADOL-C libraries if the
3473\verb=./configure= command is executed with the option
3474\verb=--enable-sparse=. The ColPack library available at
3475\verb=http://www.cscapes.org/coloringpage/software.htm= is required to
3476compute the sparse structures, and is searched for in all the default
3477locations as well as in the subdirectory \verb=<ThirdParty/ColPack/>=.
3478In case the library and its headers are installed in a nonstandard path
3479this may be specified with the \verb?--with-colpack=PATH? option.
3480It is assumed that the library and its header files have the following
3481directory structure: \verb?PATH/include? contains all the header
3482files,
3483\verb?PATH/lib? contains the 32-bit compiled library and
3484\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
3485the compiler used to compile {\sf ADOL-C} one of these libraries will
3486be used for linking.
3487 
3488The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default
3489constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see
3490\autoref{WarSug}
3491
3492
3493\subsection{Compiling and Linking the Example Programs}
3494%
3495The installation procedure described in \autoref{genlib} also
3496provides the \verb=Makefile=s  to compile the example programs in the
3497directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
3498additional examples in
3499\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
3500one has to execute the
3501\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
3502examples. Available options are:
3503\begin{center}
3504\begin{tabular}[t]{ll}
3505\verb=--enable-docexa=&build all examples discussed in this manual\\
3506&(compare \autoref{example})\\
3507\verb=--enable-addexa=&build all additional examples\\
3508&(See file \verb=README= in the various subdirectories)
3509\end{tabular}
3510\end{center}
3511
3512Just calling \verb=make= from the packages base directory generates
3513all configured examples and the library if necessary. Compiling from
3514subdirectory \verb=examples= or one of its subfolders is possible
3515too. At least one kind of the ADOL-C library (static or shared) must
3516have been built previously in that case. Hence, building the library
3517is always the first step.
3518
3519For Compiling the library and the documented examples on Windows using
3520Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
3521the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
3522\verb=<ADOL-C/examples/>= subdirectories.
3523%
3524\subsection{Description of Important Header Files}
3525\label{ssec:DesIH}
3526%
3527The application of the facilities of ADOL-C requires the user
3528source code (program or module) to include appropriate
3529header files where the desired data types and routines are
3530prototyped. The new hierarchy of header files enables the user
3531to take one of two possible ways to access the right interfaces.
3532The first and easy way is recommended to beginners: As indicated in
3533\autoref{globalHeaders} the provided {\em global} header file
3534\verb=<adolc/adolc.h>= can be included by any user code to support all
3535capabilities of ADOL-C depending on the particular programming language
3536of the source.   
3537
3538\begin{table}[h]
3539\center \small
3540\begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline
3541\verb=<adolc/adolc.h>= & 
3542\begin{tabular*}{10.5cm}{cp{9.5cm}}
3543  \boldmath $\rightarrow$ \unboldmath
3544                 & global header file available for easy use of ADOL-C; \\
3545  $\bullet$      & includes all ADOL-C header files depending on
3546                   whether the users source is C++ or C code.
3547\end{tabular*}
3548\\ \hline
3549\verb=<adolc/usrparms.h>= &
3550\begin{tabular*}{10.5cm}{cp{9.5cm}}
3551  \boldmath $\rightarrow$ \unboldmath
3552                 & user customization of ADOL-C package (see
3553                   \autoref{Customizing}); \\
3554  $\bullet$      & after a change of
3555                   user options the ADOL-C library \verb=libadolc.*=
3556                   has to be rebuilt (see \autoref{genlib}); \\
3557  $\bullet$      & is included by all ADOL-C header files and thus by all user
3558                   programs.
3559\end{tabular*} \\ \hline
3560\end{tabular}
3561\caption{Global header files}
3562\label{globalHeaders}
3563\end{table} 
3564
3565The second way is meant for the more advanced ADOL-C user: Some source code
3566includes only those interfaces used by the particular application.
3567The respectively needed header files are indicated
3568throughout the manual.
3569Existing application determined dependences between the provided
3570ADOL-C routines are realized by automatic includes of headers in order
3571to maintain easy use. The header files important to the user are described
3572in the \autoref{importantHeaders1} and \autoref{importantHeaders2}.
3573
3574\begin{table}[h]
3575\center \small
3576\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
3577%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
3578\verb=<adolc/adouble.h>= & 
3579\begin{tabular*}{10.5cm}{cp{9.5cm}}
3580  \boldmath $\rightarrow$ \unboldmath
3581                & provides the interface to the basic active
3582                  scalar data type of ADOL-C: {\sf class adouble} 
3583                  (see \autoref{prepar});
3584%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
3585\end{tabular*}
3586\\ \hline
3587% \verb=<adolc/avector.h>= &
3588%\begin{tabular*}{10.5cm}{cp{9.5cm}}
3589%  \boldmath $\rightarrow$ \unboldmath
3590%                & provides the interface to the active vector
3591%                  and matrix data types of ADOL-C: {\sf class adoublev}
3592%                  and {\sf class adoublem}, respectively
3593%                   (see \autoref{arrays}); \\
3594%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3595%\end{tabular*}
3596%\\ \hline
3597 \verb=<adolc/taputil.h>= & 
3598\begin{tabular*}{10.5cm}{cp{9.5cm}}
3599  \boldmath $\rightarrow$ \unboldmath
3600                & provides functions to start/stop the tracing of
3601                  active sections (see \autoref{markingActive})
3602                  as well as utilities to obtain
3603                  tape statistics (see \autoref{examiningTape}); \\
3604  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3605\end{tabular*}
3606\\ \hline
3607\end{tabular}
3608\caption{Important header files: tracing/taping}
3609\label{importantHeaders1}
3610\end{table} 
3611%
3612\begin{table}[h]
3613\center \small
3614\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
3615%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
3616\verb=<adolc/interfaces.h>= & 
3617\begin{tabular*}{10.5cm}{cp{9.5cm}}
3618  \boldmath $\rightarrow$ \unboldmath
3619                & provides interfaces to the {\sf forward} and
3620                  {\sf reverse} routines as basic versions of derivative
3621                  evaluation (see \autoref{forw_rev}); \\
3622  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3623  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
3624  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
3625\end{tabular*}
3626\\ \hline
3627\verb=<adolc/drivers.h>= & 
3628\begin{tabular*}{10.5cm}{cp{9.5cm}}
3629  \boldmath $\rightarrow$ \unboldmath
3630                & provides ``easy to use'' drivers for solving
3631                  optimization problems and nonlinear equations
3632                  (see \autoref{optdrivers}); \\
3633  $\bullet$     & comprises C and Fortran-callable versions.
3634\end{tabular*}
3635\\ \hline
3636\begin{minipage}{3cm}
3637\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
3638\end{minipage}  & 
3639\begin{tabular*}{10.5cm}{cp{9.5cm}}
3640  \boldmath $\rightarrow$ \unboldmath
3641                & provides the ``easy to use'' sparse drivers
3642                  to exploit the sparsity structure of
3643                  Jacobians (see \autoref{sparse}); \\
3644  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
3645                  of {\sf forward} and {\sf reverse} routines
3646                  propagating bit patterns (see \autoref{ProBit}); \\
3647
3648  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
3649\end{tabular*}
3650\\ \hline
3651\begin{minipage}{3cm}
3652\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
3653\end{minipage}  & 
3654\begin{tabular*}{10.5cm}{cp{9.5cm}}
3655  \boldmath $\rightarrow$ \unboldmath
3656                & provides interfaces to the underlying C-callable
3657                  versions of {\sf forward} and {\sf reverse} routines
3658                  propagating bit patterns.
3659\end{tabular*}
3660\\ \hline
3661\begin{minipage}{3cm}
3662\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
3663\end{minipage}  &
3664\begin{tabular*}{10.5cm}{cp{9.5cm}}
3665  \boldmath $\rightarrow$ \unboldmath
3666                & provides ``easy to use'' drivers for numerical
3667                  solution of ordinary differential equations
3668                  (see \autoref{odedrivers}); \\
3669  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3670  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
3671\end{tabular*}
3672\\ \hline
3673\begin{minipage}{3cm}
3674\verb=<adolc/drivers/=\newline\verb= taylor.h>=
3675\end{minipage}  &
3676\begin{tabular*}{10.5cm}{cp{9.5cm}}
3677  \boldmath $\rightarrow$ \unboldmath
3678                & provides ``easy to use'' drivers for evaluation
3679                  of higher order derivative tensors (see
3680                  \autoref{higherOrderDeriv}) and inverse/implicit function
3681                  differentiation (see \autoref{implicitInverse});\\
3682  $\bullet$     & comprises C++ and C-callable versions.
3683\end{tabular*} 
3684\\ \hline
3685\verb=<adolc/adalloc.h>= &
3686\begin{tabular*}{10.5cm}{cp{9.5cm}}
3687  \boldmath $\rightarrow$ \unboldmath
3688                & provides C++ and C functions for allocation of
3689                  vectors, matrices and three dimensional arrays
3690                  of {\sf double}s.
3691\end{tabular*}
3692\\ \hline
3693\end{tabular}
3694\caption{Important header files: evaluation of derivatives}
3695\label{importantHeaders2}
3696\end{table} 
3697%
3698\subsection{Compiling and Linking C/C++ Programs}
3699%
3700To compile a C/C++ program or single module using ADOL-C
3701data types and routines one has to ensure that all necessary
3702header files according to \autoref{ssec:DesIH} are
3703included. All modules involving {\em active} data types as
3704{\sf adouble}
3705%, {\bf adoublev} and {\bf adoublem}
3706have to be compiled as C++. Modules that make use of a previously
3707generated tape to evaluate derivatives can either be programmed in ANSI-C
3708(while avoiding all C++ interfaces) or in C++. Depending
3709on the chosen programming language the header files provide
3710the right ADOL-C prototypes.
3711For linking the resulting object codes the library \verb=libadolc.*=
3712must be used (see \autoref{genlib}).
3713%
3714\subsection{Adding Quadratures as Special Functions}
3715%
3716\label{quadrat}
3717%
3718Suppose an integral
3719\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
3720is evaluated numerically by a user-supplied function
3721\begin{center}
3722{\sf  double  myintegral(double\& x);}
3723\end{center}
3724Similarly, let us suppose that the integrand itself is evaluated by
3725a user-supplied block of C code {\sf integrand}, which computes a
3726variable with the fixed name {\sf val} from a variable with the fixed
3727name {\sf arg}. In many cases of interest, {\sf integrand} will
3728simply be of the form
3729\begin{center}
3730{\sf \{ val = expression(arg) \}}\enspace .
3731\end{center}
3732In general, the final assignment to {\sf val} may be preceded
3733by several intermediate calculations, possibly involving local
3734active variables of type {\sf adouble}, but no external or static
3735variables of that type.  However, {\sf integrand} may involve local
3736or global variables of type {\sf double} or {\sf int}, provided they
3737do not depend on the value of {\sf arg}. The variables {\sf arg} and
3738{\sf val} are declared automatically; and as {\sf integrand} is a block
3739rather than a function, {\sf integrand} should have no header line. 
3740
3741Now the function {\sf myintegral} can be overloaded for {\sf adouble}
3742arguments and thus included in the library of elementary functions
3743by the following modifications:
3744\begin{enumerate}
3745\item
3746At the end of the file \verb=<adouble.cpp>=, include the full code
3747defining \\ {\sf double myintegral(double\& x)}, and add the line
3748\begin{center}
3749{\sf extend\_quad(myintegral, integrand); }
3750\end{center}
3751This macro is extended to the definition of
3752 {\sf adouble myintegral(adouble\& arg)}.
3753Then remake the library \verb=libadolc.*= (see \autoref{genlib}).
3754\item
3755In the definition of the class
3756{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
3757\begin{center}
3758{\sf friend adouble myintegral(adouble\&)}.
3759\end{center}
3760\end{enumerate}
3761In the first modification, {\sf myintegral} represents the name of the
3762{\sf double} function, whereas {\sf integrand} represents the actual block
3763of C code.
3764
3765For example, in case of the inverse hyperbolic cosine, we have
3766{\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as
3767{\sf \{ val = sqrt(arg*arg-1); \}} 
3768so that the line
3769\begin{center}
3770{\sf extend\_quad(acosh,val = sqrt(arg*arg-1));} 
3771\end{center}
3772can be added to the file \verb=<adouble.cpp>=.
3773A mathematically equivalent but longer representation of
3774{\sf integrand} is
3775\begin{center}
3776\begin{tabbing}
3777{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
3778 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
3779 \> \ \> {\sf  temp = temp*temp; } \\ 
3780 \> \ \> {\sf  val = sqrt(temp-1); \}} 
3781\end{tabbing}
3782\end{center} 
3783The code block {\sf integrand} may call on any elementary function that has already
3784been defined in file \verb=<adouble.cpp>=, so that one may also introduce
3785iterated integrals.
3786%
3787%
3788\section{Example Codes}
3789\label{example}
3790%
3791The following listings are all simplified versions of codes that
3792are contained in the example subdirectory
3793\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
3794we have left out timings, which are included in the complete codes.
3795%
3796\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
3797%
3798The first example evaluates the gradient and the Hessian of
3799the function
3800\[ 
3801y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
3802\] 
3803using the appropriate drivers {\sf gradient} and {\sf hessian}.
3804
3805\begin{verbatim}
3806#include <adolc/adouble.h>               // use of active doubles and taping
3807#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
3808                                   // gradient(.) and hessian(.)
3809#include <adolc/taping.h>                // use of taping
3810...
3811void main() {
3812int n,i,j;
3813size_t tape_stats[STAT_SIZE];
3814cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
3815cout << "number of independent variables = ?  \n";
3816cin >> n;
3817double* xp = new double[n];         
3818double  yp = 0.0;
3819adouble* x = new adouble[n];     
3820adouble  y = 1;
3821for(i=0;i<n;i++)
3822  xp[i] = (i+1.0)/(2.0+i);         // some initialization
3823trace_on(1);                       // tag =1, keep=0 by default
3824  for(i=0;i<n;i++) {
3825    x[i] <<= xp[i]; y *= x[i]; }     
3826  y >>= yp;
3827  delete[] x;                     
3828trace_off();
3829tapestats(1,tape_stats);           // reading of tape statistics
3830cout<<"maxlive "<<tape_stats[2]<<"\n";
3831...                                // ..... print other tape stats
3832double* g = new double[n];       
3833gradient(1,n,xp,g);                // gradient evaluation
3834double** H=(double**)malloc(n*sizeof(double*));
3835for(i=0;i<n;i++)
3836  H[i]=(double*)malloc((i+1)*sizeof(double));
3837hessian(1,n,xp,H);                 // H equals (n-1)g since g is
3838double errg = 0;                   // homogeneous of degree n-1.
3839double errh = 0;
3840for(i=0;i<n;i++)
3841  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
3842for(i=0;i<n;i++) {
3843  for(j=0;j<n;j++) {
3844    if (i>j)                       // lower half of hessian
3845      errh += fabs(H[i][j]-g[i]/xp[j]); } }
3846cout << yp-1/(1.0+n) << " error in function \n";
3847cout << errg <<" error in gradient \n";
3848cout << errh <<" consistency check \n";
3849}                                  // end main
3850\end{verbatim}
3851%
3852\subsection{Power Example ({\tt powexam.cpp})}
3853%
3854The second example function evaluates the $n$-th power of a real
3855variable $x$ in
3856$\log_2 n$ multiplications by recursive halving of the exponent. Since
3857there is only one independent variable, the scalar derivative can be
3858computed by
3859using both {\sf forward} and {\sf reverse}, and the
3860results are subsequently compared.
3861\begin{verbatim}
3862#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces
3863
3864adouble power(adouble x, int n) {
3865adouble z=1;
3866if (n>0) {                         // recursion and branches
3867  int nh =n/2;                     // that do not depend on
3868  z = power(x,nh);                 // adoubles are fine !!!!
3869  z *= z;
3870  if (2*nh != n)
3871    z *= x;
3872  return z; }                      // end if
3873else {
3874  if (n==0)                        // the local adouble z dies
3875    return z;                      // as it goes out of scope.
3876  else
3877    return 1/power(x,-n); }        // end else
3878} // end power
3879\end{verbatim}
3880The function {\sf power} above was obtained from the original
3881undifferentiated version by simply changing the type of all
3882{\sf double}s including the return variable to {\sf adouble}s. The new version
3883can now be called from within any active section, as in the following
3884main program.
3885\begin{verbatim}
3886#include ...                       // as above
3887int main() {
3888int i,n,tag=1;
3889cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
3890cout<<"monomial degree=? \n";      // input the desired degree
3891cin >> n;
3892                                   // allocations and initializations
3893double* Y[1];
3894*Y = new double[n+2];
3895double* X[1];                      // allocate passive variables with
3896*X = new double[n+4];              // extra dimension for derivatives
3897X[0][0] = 0.5;                     // function value = 0. coefficient
3898X[0][1] = 1.0;                     // first derivative = 1. coefficient
3899for(i=0;i<n+2;i++)
3900  X[0][i+2]=0;                     // further coefficients
3901double* Z[1];                      // used for checking consistency
3902*Z = new double[n+2];              // between forward and reverse
3903adouble y,x;                       // declare active variables
3904                                   // beginning of active section
3905trace_on(1);                       // tag = 1 and keep = 0
3906x <<= X[0][0];                     // only one independent var
3907y = power(x,n);                    // actual function call
3908y >>= Y[0][0];                     // only one dependent adouble
3909trace_off();                       // no global adouble has died
3910                                   // end of active section
3911double u[1];                       // weighting vector
3912u[0]=1;                            // for reverse call
3913for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
3914  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-the derivative
3915  if (i==0)
3916    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value()
3917    << " (should be 0)\n";
3918  else
3919    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i]
3920    << " (should be 0)\n";
3921  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
3922  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylorcoeff.
3923return 1;
3924}                                  // end main
3925\end{verbatim}
3926Since this example has only one independent and one dependent variable,
3927{\sf forward} and {\sf reverse} have the same complexity and calculate
3928the same scalar derivatives, albeit with a slightly different scaling.
3929By replacing the function {\sf power} with any other univariate test function,
3930one can check that {\sf forward} and {\sf reverse} are at least consistent.
3931In the following example the number of independents is much larger
3932than the number of dependents, which makes the reverse mode preferable.
3933%
3934\subsection{Determinant Example ({\tt detexam.cpp})}
3935%
3936Now let us consider an exponentially expensive calculation,
3937namely, the evaluation of a determinant by recursive expansion
3938along rows. The gradient of the determinant with respect to the
3939matrix elements is simply the adjoint, i.e. the matrix of cofactors.
3940Hence the correctness of the numerical result is easily checked by
3941matrix-vector multiplication. The example illustrates the use
3942of {\sf adouble} arrays and pointers. 
3943
3944\begin{verbatim}
3945#include <adolc/adouble.h>               // use of active doubles and taping
3946#include <adolc/interfaces.h>            // use of basic forward/reverse
3947                                   // interfaces of ADOL-C
3948adouble** A;                       // A is an n x n matrix
3949int i,n;                           // k <= n is the order
3950adouble det(int k, int m) {        // of the sub-matrix
3951if (m == 0) return 1.0 ;           // its column indices
3952else {                             // are encoded in m
3953  adouble* pt = A[k-1];
3954  adouble t = zero;                // zero is predefined
3955  int s, p =1;
3956  if (k%2) s = 1; else s = -1;
3957  for(i=0;i<n;i++) {
3958    int p1 = 2*p;
3959    if (m%p1 >= p) {
3960      if (m == p) {
3961        if (s>0) t += *pt; else t -= *pt; }
3962      else {
3963        if (s>0)
3964          t += *pt*det(k-1,m-p);   // recursive call to det
3965        else
3966          t -= *pt*det(k-1,m-p); } // recursive call to det
3967      s = -s;}
3968    ++pt;
3969    p = p1;}
3970  return t; }
3971}                                  // end det
3972\end{verbatim}
3973As one can see, the overloading mechanism has no problem with pointers
3974and looks exactly the same as the original undifferentiated function
3975except for the change of type from {\sf double} to {\sf adouble}.
3976If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
3977a compile time error would have resulted. Now consider a corresponding
3978calling program.
3979
3980\begin{verbatim}
3981#include ...                       // as above
3982int main() {
3983int i,j, m=1,tag=1,keep=1;
3984cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
3985cout << "order of matrix = ? \n";  // select matrix size
3986cin >> n;
3987A = new adouble*[n];             
3988trace_on(tag,keep);                // tag=1=keep
3989  double detout=0.0, diag = 1.0;   // here keep the intermediates for
3990  for(i=0;i<n;i++) {               // the subsequent call to reverse
3991    m *=2;
3992    A[i] = new adouble[n];         // not needed for adoublem
3993    adouble* pt = A[i];
3994    for(j=0;j<n;j++)
3995      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
3996    diag += A[i][i].value();        // value() converts to double
3997    A[i][i] += 1.0; }
3998  det(n,m-1) >>= detout;           // actual function call
3999  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
4000trace_off();
4001double u[1];
4002u[0] = 1.0;
4003double* B = new double[n*n];
4004reverse(tag,1,n*n,1,u,B);
4005cout <<" \n first base? : ";
4006for (i=0;i<n;i++) {
4007  adouble sum = 0;
4008  for (j=0;j<n;j++)                // the matrix A times the first n
4009    sum += A[i][j]*B[j];           // components of the gradient B
4010  cout<<sum.value()<<" "; }         // must be a Cartesian basis vector
4011return 1;
4012}                                  // end main
4013\end{verbatim}
4014The variable {\sf diag} should be mathematically
4015equal to the determinant, because the
4016matrix {\sf A} is defined as a rank 1 perturbation of the identity.
4017%
4018\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
4019\label{exam:ode}
4020%
4021Here, we consider a nonlinear ordinary differential equation that
4022is a slight modification of the Robertson test problem
4023given in Hairer and Wanner's book on the numerical solution of
4024ODEs \cite{HW}. The following source code computes the corresponding
4025values of $y^{\prime} \in \R^3$:
4026\begin{verbatim}
4027#include <adolc/adouble.h>                  // use of active doubles and taping
4028#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
4029#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities
4030
4031void tracerhs(short int tag, double* py, double* pyprime) {
4032adouble y[3];                         // this time we left the parameters
4033adouble yprime[3];                    // passive and use the vector types
4034trace_on(tag);
4035for (int i=0; i<3; i++)
4036     y[i] <<= py[i];                  // initialize and mark independents
4037yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
4038yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
4039yprime[2] = -yprime[0] - yprime[1];
4040yprime >>= pyprime;                   // mark and pass dependents
4041trace_off(tag);
4042}                                     // end tracerhs
4043\end{verbatim}
4044The Jacobian of the right-hand side has large
4045negative eigenvalues, which make the ODE quite stiff. We  have added
4046some numerically benign transcendentals to make the differentiation
4047more interesting.
4048The following main program uses {\sf forode} to calculate the Taylor series
4049defined by the ODE at the given point $y_0$ and {\sf reverse} as well
4050as {\sf accode} to compute the Jacobians of the coefficient vectors
4051with respect to $x_0$.
4052\begin{verbatim}
4053#include .......                   // as above
4054int main() {
4055int i,j,deg; 
4056int n=3;
4057double py[3];
4058double pyp[3];
4059cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
4060cout << "degree of Taylor series =?\n";
4061cin >> deg;
4062double **X;
4063X=(double**)malloc(n*sizeof(double*));
4064for(i=0;i<n;i++)
4065  X[i]=(double*)malloc((deg+1)*sizeof(double));
4066double*** Z=new double**[n];
4067double*** B=new double**[n];
4068short** nz = new short*[n];
4069for(i=0;i<n;i++) {
4070  Z[i]=new double*[n];
4071  B[i]=new double*[n];
4072  for(j=0;j<n;j++) {
4073    Z[i][j]=new double[deg];
4074    B[i][j]=new double[deg]; }     // end for
4075}                                  // end for
4076for(i=0;i<n;i++) {
4077  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
4078  X[i][0] = py[i];                 // and the Taylor coefficient;
4079  nz[i] = new short[n]; }          // set up sparsity array
4080tracerhs(1,py,pyp);                // trace RHS with tag = 1
4081forode(1,n,deg,X);                 // compute deg coefficients
4082reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
4083accode(n,deg-1,Z,B,nz);
4084cout << "nonzero pattern:\n";
4085for(i=0;i<n;i++) {
4086  for(j=0;j<n;j++)
4087    cout << nz[i][j]<<"\t";
4088  cout <<"\n"; }                   // end for
4089return 1;
4090}                                  // end main
4091\end{verbatim}
4092\noindent The pattern {\sf nz} returned by {\sf accode} is
4093\begin{verbatim}
4094              3  -1   4
4095              1   2   2
4096              3   2   4
4097\end{verbatim}
4098The original pattern {\sf nz} returned by {\sf reverse} is the same
4099except that the negative entry $-1$ was zero.
4100%
4101%\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})}
4102%\label{gaussexam}
4103%
4104%The following example uses conditional assignments to show the usage of a once produced tape
4105%for evaluation at new arguments. The elimination is performed with
4106%column pivoting.
4107%\begin{verbatim}
4108%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces
4109
4110%void gausselim(int n, adoublem& A, adoublev& bv) {
4111%along i;                           // active integer declaration
4112%adoublev temp(n);                  // active vector declaration
4113%adouble r,rj,temps;
4114%int j,k;
4115%for(k=0;k<n;k++) {                 // elimination loop
4116%  i = k;
4117%  r = fabs(A[k][k]);               // initial pivot size
4118%  for(j=k+1;j<n;j++) {
4119%    rj = fabs(A[j][k]);             
4120%    condassign(i,rj-r,j);          // look for a larger element in the same
4121%    condassign(r,rj-r,rj); }       // column with conditional assignments
4122%  temp = A[i];                     // switch rows using active subscripting
4123%  A[i] = A[k];                     // necessary even if i happens to equal
4124%  A[k] = temp;                     // k during taping
4125%  temps = bv[i];
4126%  bv[i]=bv[k];
4127%  bv[k]=temps;
4128%  if (!value(A[k][k]))             // passive subscripting
4129%    exit(1);                       // matrix singular!
4130%  temps= A[k][k];
4131%  A[k] /= temps;
4132%  bv[k] /= temps;
4133%  for(j=k+1;j<n;j++) {
4134%    temps= A[j][k];
4135%    A[j] -= temps*A[k];            // vector operations
4136%    bv[j] -= temps*bv[k]; }        // endfor
4137%}                                  // end elimination loop
4138%temp=0.0;
4139%for(k=n-1;k>=0;k--)                // backsubstitution
4140%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];
4141%bv=temp;
4142%}                                  // end gausselim
4143%\end{verbatim}
4144%\noindent This function can be called from any program
4145%that suitably initializes
4146%the components of {\sf A} and {\sf bv}
4147%as independents. The resulting tape can be
4148%used to solve any nonsingular linear system of the same size and
4149%to get the sensitivities of the solution with respect to the
4150%system matrix and the right hand side.
4151%\vspace*{-4mm}
4152%
4153\section*{Acknowledgements}
4154%
4155Parts of the ADOL-C source were developed by Andreas
4156Kowarz, Hristo Mitev, Sebastian Schlenkrich,  and Olaf
4157Vogel. We are also indebted to George Corliss,
4158Tom Epperly, Bruce Christianson, David Gay,  David Juedes,
4159Brad Karp, Koichi Kubota, Bob Olson,  Marcela Rosemblun, Dima
4160Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
4161various ways with the development and documentation of ADOL-C.
4162%
4163\begin{thebibliography}{10}
4164
4165\bibitem{BeKh96}
4166Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle.
4167\newblock {\em Efficient computation of gradients and Jacobians by dynamic
4168  exploitation of sparsity in automatic differentiation}.
4169\newblock Optimization Methods and Software 7(1):1-39, 1996.
4170
4171\bibitem{Chri91a}
4172Bruce Christianson.
4173\newblock {\em Reverse accumulation and accurate rounding error estimates for
4174Taylor series}.
4175\newblock  Optimization Methods and Software 1:81--94, 1992.
4176
4177\bibitem{GeMaPo05}
4178Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
4179\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
4180  derivatives}.
4181\newblock SIAM Review 47(4):629--705, 2005.
4182
4183\bibitem{GePoTaWa06}
4184Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther.
4185{\em Efficient Computation of Sparse Hessians: An Experimental Study
4186  using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing.
4187
4188\bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea
4189  Walther.
4190{\em Exploiting  Sparsity  in Jacobian Computation via Coloring and Automatic Differentiation:
4191a Case Study in a Simulated Moving Bed Process}.
4192In Chr. Bischof et al., eds.,  {\em Proceedings AD 2008 conference}, LNCSE 64, pp. 327 -- 338, Springer (2008).
4193
4194\bibitem{GeTaMaPo07}
4195Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen,
4196{\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}.
4197SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.
4198
4199
4200\bibitem{GrWa08}
4201Andreas Griewank and Andrea Walther: {\em Evaluating Derivatives, Principles and Techniques of
4202  Algorithmic Differentiation. Second edition}. SIAM, 2008.
4203
4204
4205\bibitem{Griewank97}
4206Andreas Griewank, Jean Utke, and Andrea Walther.
4207\newblock {\em Evaluating higher derivative tensors by forward propagation
4208          of univariate Taylor series}.
4209\newblock Mathematics of Computation, 69:1117--1130, 2000.
4210
4211\bibitem{GrWa00} 
4212Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse
4213                 or Adjoint Mode of Computational Differentiation},
4214                 ACM Transaction on Mathematical Software 26:19--45, 2000.
4215
4216\bibitem{HW}
4217    Ernst Hairer and Gerhard Wanner.
4218    {\it Solving Ordinary Differential Equations II.\/}
4219    Springer-Verlag, Berlin, 1991.
4220
4221\bibitem{Knuth73}
4222Donald~E. Knuth.
4223\newblock {\em The Art of Computer Programming. Second edition.}
4224\newblock Addison-Wesley, Reading, 1973.
4225
4226\bibitem{Wa05a}
4227Andrea Walther.
4228\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
4229\newblock Transaction on Mathematical Software, 34(1), Artikel 3 (2008).
4230\end{thebibliography}
4231\end{document}
4232
Note: See TracBrowser for help on using the repository browser.