1% Latex file containing the documentation of ADOL-C
2%
3% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
4%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
5%
6% This file is part of ADOL-C. This software is provided as open source.
7% Any use, reproduction, or distribution of the software constitutes
8% recipient's acceptance of the terms of the accompanying license file.
9%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
10
11\documentclass[11pt,twoside]{article}
12\usepackage{hyperref}
13\usepackage{amsmath,amsthm,amssymb}
14\usepackage{graphicx}
15\usepackage{datetime}
16
17\newdateformat{monthyear}{\monthname\ \THEYEAR}
18
19\usepackage{color}
20
21\pagestyle{headings} 
22\bibliographystyle{plain}
23\parskip=6pt
24
25\setlength{\textwidth}{433.6pt}
26\setlength{\oddsidemargin}{23pt}
27\setlength{\evensidemargin}{23pt}
28\setlength{\topmargin}{25.0pt}
29\setlength{\textheight}{580pt}
30\setlength{\baselineskip}{8pt}
31
32\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
33\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
34\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
35
36\renewcommand{\sectionautorefname}{Section}
37\renewcommand{\subsectionautorefname}{Section}
38\renewcommand{\figureautorefname}{Figure}
39\renewcommand{\tableautorefname}{Table}
40
41\setcounter{equation}{0}
42
43\input{version.tex}
44
45%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
46\begin{document}
47
48\begin{titlepage}
49\begin{center}
50{\Large {\bf ADOL-C:}} 
51\footnote{The development of earlier versions was supported by the Office of
52  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
53  Forschungsgemeinschaft. During the development of the current
54  version Andrea Walther and Andreas Kowarz were supported by the
55  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} 
56\vspace{0.2in} \\
57%
58{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
59{\Large of Algorithms Written in C/C++}\\
60\vspace{.2in}
61{\large\bf  Version \packageversion, \monthyear\today} \\
62\bigskip
63 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
64   of Paderborn, 33098 Paderborn, Germany} and
65 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
66 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
67\end{center}
68%
69%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
70\begin{abstract}
71The C++ package ADOL-C described here facilitates the evaluation of
72first and higher derivatives of vector functions that are defined
73by computer programs written in C or C++. The resulting derivative
74evaluation routines may be called from C, C++, Fortran, or any other
75language that can be linked with C.
76
77The numerical values of derivative vectors are obtained free
78of truncation errors at a small multiple of the run time and
79random access memory required by the given function evaluation program.
80Derivative matrices are obtained by columns, by rows or in sparse format.
81For solution curves defined by ordinary differential equations,
82special routines are provided that evaluate the Taylor coefficient vectors
83and their Jacobians with respect to the current state vector.
84For explicitly or implicitly defined functions derivative tensors are
85obtained with a complexity that grows only quadratically in their
86degree. The derivative calculations involve a possibly substantial but
87always predictable amount of data. Since the data is accessed strictly sequentially
88it can be automatically paged out to external files.
89\end{abstract}
90%
91{\bf Keywords}: Computational Differentiation, Automatic
92         Differentiation,
93         Chain Rule, Overloading, Taylor Coefficients,
94         Gradients, Hessians, Forward Mode, Reverse Mode,
95         Implicit Function Differentiation, Inverse Function Differentiation
96\medskip
97
98\noindent
99{\bf Abbreviated title}: Automatic differentiation by overloading in C++
100%
101\end{titlepage}
102%
103\tableofcontents       
104%
105
106\newpage
107%
108%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
109%
110%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
111\section{Preparing a Section of C or C++ Code for Differentiation}
112\label{prepar}
113%
114\subsection{Introduction}
115%
116\setcounter{equation}{0}
117The package \mbox{ADOL-C} 
118utilizes overloading in C++, but the
119user has to know only C. The acronym stands for {\bf A}utomatic
120{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
121In contrast to source transformation approaches, overloading does not generate intermediate
122source code.
123As starting points to retrieve further information on techniques and
124application of automatic differentiation, as well as on other AD
125tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
126\verb=http://www.autodiff.org= of the AD community forms a rich source
127of further information and pointers.
128
129
ADOL-C facilitates the simultaneous
evaluation of arbitrarily high directional derivatives and the
gradients of these Taylor coefficients with respect to all independent
variables. Relative to the cost of evaluating the underlying function,
the cost for evaluating any such scalar-vector pair grows as the
square of the degree of the derivative but is still completely
independent of the number $n$ of independent variables and the number
$m$ of dependent variables.
137
138This manual is organized as follows. This section explains the
139modifications required to convert undifferentiated code to code that
140compiles with ADOL-C.
141\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
142evaluate arbitrarily high order derivatives. The discussion includes storage
143requirements and the tailoring of certain tape characteristics to fit specific
user needs. Descriptions of easy-to-use drivers for convenient derivative
evaluation are contained in \autoref{drivers}.
146\autoref{forw_rev_ad} offers a more mathematical characterization of
147the different modes of AD to compute derivatives. At the same time, the
148corresponding drivers of ADOL-C are explained. 
149The overloaded derivative evaluation routines using the forward and the reverse
150mode of AD are explained in \autoref{forw_rev}.
Advanced differentiation techniques, such as optimal checkpointing for
time integrations, the exploitation of fixed point iterations, the use
of externally differentiated functions, and the differentiation of OpenMP
parallel programs, are described in \autoref{adv_ad}.
155The tapeless forward mode is presented in \autoref{tapeless}.
156\autoref{install} details the installation and
157use of the ADOL-C package. Finally, \autoref{example} 
158furnishes some example programs that incorporate the ADOL-C package to
159evaluate first and higher-order
160derivatives.  These and other examples are distributed with the ADOL-C
161source code.
162The user should simply refer to them if the more abstract and general
163descriptions of ADOL-C provided in this document do not suffice.
164%
165%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
166\subsection{Declaring Active Variables}
167%
168\label{DecActVar}
169%
170The key ingredient of automatic differentiation by overloading is the
171concept of an {\em active variable}. All variables that may be
172considered as differentiable quantities at some time
173during the program execution must be of an active
174type. ADOL-C uses one
175active scalar type, called {\sf adouble}, whose real part is of the
176standard type {\sf double}.
177Typically, one will declare the independent variables
178and all quantities that directly or indirectly depend on them as
179{\em active}. Other variables that do not depend on the independent
180variables but enter, for example, as parameters, may remain one of the
181{\em passive} types {\sf double, float}, or {\sf int}. There is no
182implicit type conversion from {\sf adouble} to any of these passive
183types; thus, {\bf failure to declare variables as active when they
184depend on other active variables will result in a compile-time error
185message}. In data flow terminology, the set of active variable names
186must contain all its successors in the dependency graph. All components
187of indexed arrays must have the same activity status.
188
189The real component of an {\sf adouble x} can be extracted as
190{\sf x.value()}. In particular,
191such explicit conversions are needed for the standard output procedure
192{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
193that first the real part of an {\sf adouble} and then the string
``{\sf (a)}'' is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath can
195be used to assign a constant value to an {\sf adouble}.
196Naturally, {\sf adouble}s may be
197components of vectors, matrices, and other arrays, as well as
198members of structures or classes.
199
200The C++ class {\sf adouble}, its member functions, and the overloaded
201versions of all arithmetic operations, comparison operators, and
202most ANSI C functions are contained in the file \verb=adouble.cpp= and its
203header \verb=<adolc/adouble.h>=. The latter must be included for compilation
204of all program files containing {\sf adouble}s and corresponding
205operations.
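
The compile-time check described above rests on a one-way conversion
rule. The following toy class is only a sketch of that rule and is not
ADOL-C's {\sf adouble}; the names {\sf Active} and {\sf scaled} are
hypothetical. A passive value converts implicitly to the active type,
whereas the way back requires an explicit call to {\sf value()}.

\begin{verbatim}
// Toy stand-in for an active scalar; NOT the ADOL-C adouble class.
struct Active {
    double val;
    Active(double v = 0.0) : val(v) {}    // passive -> active: implicit
    double value() const { return val; }  // active -> passive: explicit only
};

// Overloaded product; a passive operand is promoted to Active.
inline Active operator*(const Active& a, const Active& b) {
    return Active(a.value() * b.value());
}

// z is a passive parameter, x is active, so the result t must be active.
double scaled(double z, const Active& x) {
    Active t = z * x;     // fine: z is promoted
    // double u = z * x;  // would not compile: no conversion Active -> double
    return t.value();     // explicit extraction of the real part
}
\end{verbatim}

If the commented line is enabled, the compiler rejects the program,
which is the kind of error message one relies on to find variables that
were mistakenly left passive.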
206%
207%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
208\subsection{Marking Active Sections}
209\label{markingActive}
210%
211All calculations involving active variables that occur between
212the void function calls
213\begin{center}
214{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
215{\sf trace\_off(file)}
216\end{center}
217are recorded on a sequential data set called {\em tape}. Pairs of
218these function calls can appear anywhere in a C++ program, but
219they must not overlap. The nonnegative integer argument {\sf tag} identifies the
220particular tape for subsequent function or derivative evaluations.
221Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
222The optional integer arguments {\sf keep} and
223{\sf file} will be discussed in \autoref{tape}. We will refer to the
224sequence of statements executed between a particular call to
225{\sf trace\_on} and the following call to {\sf trace\_off} as an
226{\em active section} of the code. The same active section may be
227entered repeatedly, and one can successively generate several traces
228on distinct tapes by changing the value of {\sf tag}.
229Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
230the header file \verb=<adolc/taputil.h>=, which is included by the header
231\verb=<adolc/adouble.h>= automatically.
232
233Active sections may contain nested or even recursive calls to functions
234provided by the user. Naturally, their formal and actual parameters
235must have matching types. In particular, the functions must be
236compiled with their active variables declared as
237{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. 
238Variables of type {\sf adouble} may be declared outside an active section and need not
239go out of scope before the end of an active section.
240It is not necessary -- though desirable -- that free-store {\sf adouble}s
241allocated within
242an active section be deleted before its completion. The values of all
243{\sf adouble}s that exist at the beginning and end of an active section
244are automatically
245recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
246%
247%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
248\subsection{Selecting Independent and Dependent Variables}
249%
250One or more active variables that are read in or initialized to
251the values of constants or passive variables must be distinguished as
252independent variables. Other active variables that are similarly
253initialized may be considered as temporaries (e.g., a variable that
254accumulates the partial sums of a scalar product after being
255initialized to zero). In order to distinguish an active variable {\sf x} as
256independent, ADOL-C requires an assignment of the form
257\begin{center}
258{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
259\end{center}
260This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
261precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
262other values subsequently. Similarly, one or more active variables {\sf y}
263must be distinguished as dependent by an assignment of the form
264\begin{center}
265{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$ 
266\end{center}
267which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
268by any other assignment to {\sf y}. However, a dependent variable {\sf y} 
269may have been assigned other real values previously, and it could even be an
270independent variable as well.  The derivative values calculated after
271the
272completion of an active section always represent {\bf derivatives of the final
273values of the dependent variables with respect to the initial values of the
274independent variables}.
275
276The order in which the independent and dependent variables are marked
277by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
278derivative evaluations. However, these variables do not have to be
279combined into contiguous vectors. ADOL-C counts the number of
280independent and dependent variable specifications within each active
281section and records them in the header of the tape.
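
In formulas, and only as a summary of the conventions just stated: if
$n$ variables are marked with \boldmath $\ll=$ \unboldmath and $m$
variables with \boldmath $\gg=$ \unboldmath, the active section defines
a vector function
\[
 y = F(x), \qquad F : \R^n \to \R^m \enspace ,
\]
where $x_i$ is the initial value of the $i$-th variable marked as
independent and $y_j$ is the final value of the $j$-th variable marked
as dependent, both counted in the order of marking. The Jacobian then
has the entries
\[
 \left[ F^\prime(x) \right]_{ji} = \frac{\partial y_j}{\partial x_i} \enspace ,
\]
so exchanging two marking statements permutes the corresponding
columns or rows. The symbols $F$, $x_i$, and $y_j$ are notation
introduced here for illustration.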
282%
283%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
284\subsection{A Subprogram as an Active Section} 
285%
286As a generic example let us consider a C(++) function of the form
287shown in \autoref{code1}.
288%
\begin{figure}[hbt]
\framebox[\textwidth]{\parbox{\textwidth}{
\begin{center}
\begin{tabbing}
{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
\>{\sf  double *x,} \> // independent variable vector \\
\>{\sf  double *y,} \> // dependent variable vector  \\ 
\> {\sf int *k, } \> // integer parameters \\ 
\>{\sf  double *z)}  \> // real parameters \\
{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
\>{\sf double t = 0;}  \> // local variable declaration \\
\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
{\sf  \} } \>  \> // end of function
\end{tabbing}
\end{center}
}} 
\caption{Generic example of a subprogram to be activated}
\label{code1}
\end{figure}
311%
312
313If {\sf eval} is to be called from within an active C(++)
314section with {\sf x}
315and {\sf y} as vectors of {\sf adouble}s and the other parameters
316passive, then one merely has to change the type declarations of all
317variables that depend on {\sf x} from {\sf double} or {\sf float} to
318{\sf adouble}. Subsequently, the subprogram must be compiled with the
319header file \verb=<adolc/adouble.h>= included as described
320in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
321still to be called with integer and real arguments, possibly from
322a program written in Fortran77, which  does not allow overloading.
323
324To automatically compute derivatives of the dependent
325variables {\sf y} with respect to the independent variables {\sf x}, we
326can make the body of the function into an active section. For
327example, we may modify the previous program segment
328as in \autoref{adolcexam}.
329The renaming and doubling up of the original independent and dependent
330variable vectors by active counterparts may seem at first a bit clumsy.
331However, this transformation has the advantage that the calling
332sequence and the computational part, i.e., where the function is
333really evaluated, of {\sf eval} remain completely
334unaltered. If the temporary variable {\sf t} had remained a {\sf double},
335the code would not compile, because of a type conflict in the assignment
336following the declaration. More detailed example codes are listed in
337\autoref{example}.
338
\begin{figure}[htb]
\framebox[\textwidth]{\parbox{\textwidth}{
\begin{center}
\begin{tabbing}
{\sf void eval(} \= {\sf  int n, int m,} \hspace{1.0 in}\= // number of independents and dependents\\
\> {\sf double *px,} \> // independent passive variable vector \\
\> {\sf double *py,} \> // dependent passive variable vector  \\ 
\> {\sf int *k,}  \> // integer parameters \\
\> {\sf double *z)} \> // parameter vector \\
{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
\>{\sf trace\_on(tag);} \> // start tracing  \\
\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
\>{\sf x = new adouble[n];}\>// declare active independent variables \\ 
\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
\>{\sf adouble t = 0;}  \> // local variable declaration \\
     \>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
     \>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
     \>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
     \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
     \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
     \>{\sf  delete[] y;} \>// delete dependent active variables \\
     \>{\sf  delete[] x;} \>// delete independent active variables \\
     \>{\sf trace\_off();} \> // complete tape \\
{\sf  \}}   \>\> // end of function
\end{tabbing} 
\end{center}}}
\caption{Activated version of the code listed in \autoref{code1}}
\label{adolcexam}
\end{figure}
373%
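
As a plausibility check for the activated code in \autoref{adolcexam},
assume (this is not stated in the listing) that the elided statements
leave {\sf t} unchanged. Then the last dependent variable is
\[
 y_{m-1} = \frac{1}{m} \sum_{i=0}^{n-1} z_i \, x_i \enspace ,
\]
so derivative values obtained from the tape for this component should
satisfy
\[
 \frac{\partial y_{m-1}}{\partial x_i} = \frac{z_i}{m} \enspace ,
 \qquad i = 0, \ldots, n-1 \enspace .
\]
The parameters $z_i$ remain passive and therefore enter only as
constants.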
374%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
375\subsection{Overloaded Operators and Functions}
376\label{OverOper}
377%
378As in the subprogram discussed above, the actual computational
379statements of a C(++) code need not be altered for the purposes of
380automatic differentiation. All arithmetic operations, as well as the
381comparison and assignment operators, are overloaded, so any or all of
382their operands can be an active variable. An {\sf adouble x} occurring
383in a comparison operator is effectively replaced by its real value
384{\sf x.value()}. Most functions contained in the ANSI C standard for
385the math library are overloaded for active arguments. The only
386exceptions are the non-differentiable functions {\sf fmod} and
387{\sf modf}. Otherwise, legitimate C code in active sections can remain
388completely unchanged, provided the direct output of active variables
is avoided. The rest of this subsection may be skipped by first-time
users who are not worried about marginal issues of differentiability
391and efficiency.
392
393The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
394properly differentiable at the origin, which raises the question of
395how this exception ought to be handled. Fortunately, one can easily
396see that {\sf fabs(x)} and all its compositions with smooth
397functions are still directionally differentiable. These
398directional derivatives of arbitrary order can be propagated in the
399forward mode without any ambiguity. In other words, the forward mode as
implemented in ADOL-C computes G\^ateaux derivatives
401in certain directions, which reduce to Fr\'echet derivatives only
402if the dependence on the direction is linear. Otherwise,
403the directional derivatives are merely positively homogeneous with
404respect to the scaling of the directions.
405For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
406the origin somewhat arbitrarily to zero.
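
For the modulus itself this can be verified directly. With
$v(x) = |x|$ and a direction $d$, the one-sided directional derivative
is
\[
 v^\prime(x;d) = \lim_{t \downarrow 0} \frac{|x + t\,d| - |x|}{t}
 = \left\{
 \begin{array}{ll}
 {\rm sign}(x)\, d & \mbox{if}\;\; x \neq 0 \\
 |d| & \mbox{if}\;\; x = 0
 \end{array} \right. \enspace .
\]
At the origin, $v^\prime(0;\alpha d) = \alpha\, v^\prime(0;d)$ holds
for all $\alpha \geq 0$, but $v^\prime(0;-d) = v^\prime(0;d)$, so the
dependence on $d$ is positively homogeneous without being linear. The
symbols $v$, $d$, and $\alpha$ are notation introduced here for
illustration.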
407
408We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
409arguments, so that function and derivative values are obtained consistent
410with those of {\sf fabs} according to the identities
\[
 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
 \max(a,b) = [a+b+|a-b|]/2 \quad .
\]
415These relations cannot hold if either $a$ or $b$ is infinite, in which
416case {\sf fmin} or {\sf fmax} and their derivatives may still be well
417defined. It should be noted that the directional differentiation of
418{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
419the corresponding assignment based on the sign of $a-b$. For example,
420the statement
421\begin{center}
422 {\sf if (a $<$ b) c = a; else c = b;}
423\end{center} 
424yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
425the incorrect directional derivative value
426{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
427{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
should be avoided in favor of the function {\sf fmin(a,b)}. There
are also versions of {\sf fmin} and {\sf fmax} for two passive
arguments; mixed passive/active arguments are handled by
implicit conversion.
432On the function class obtained by composing the modulus with real
433analytic functions, the concept of directional differentiation can be
434extended to the propagation of unique one-sided Taylor expansions.
The branches taken by {\sf fabs}, {\sf fmin}, and {\sf fmax} are recorded
436on the tape.
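
The difference between the two formulations at a tie can be reproduced
with a few lines of plain C++. The sketch below does not use ADOL-C;
the type {\sf Dir} and the functions {\sf abs\_dir}, {\sf min\_dir},
and {\sf min\_branch} are hypothetical names that carry a value
together with one first directional derivative.

\begin{verbatim}
#include <cmath>

// Value and first directional derivative of a scalar quantity.
struct Dir { double val; double dot; };

// |u|: sign(u)*u' away from zero, |u'| at u = 0 (one-sided derivative).
Dir abs_dir(Dir u) {
    if (u.val > 0) return {u.val, u.dot};
    if (u.val < 0) return {-u.val, -u.dot};
    return {0.0, std::fabs(u.dot)};
}

// min(a,b) = [a + b - |a - b|]/2, differentiated term by term.
Dir min_dir(Dir a, Dir b) {
    Dir d = abs_dir({a.val - b.val, a.dot - b.dot});
    return {0.5 * (a.val + b.val - d.val), 0.5 * (a.dot + b.dot - d.dot)};
}

// Branch formulation: if (a < b) c = a; else c = b;
Dir min_branch(Dir a, Dir b) {
    return (a.val < b.val) ? a : b;
}
\end{verbatim}

For $a = b = 1$ with $a^\prime = -1$ and $b^\prime = 2$, {\sf min\_dir}
returns the derivative $-1 = a^\prime$, whereas {\sf min\_branch}
selects {\sf b} and returns $2 = b^\prime$.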
437
438The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
439functions have infinite slopes at the boundary points of their domains.
440At these marginal points the derivatives are set by ADOL-C to
441either {\sf $\pm$InfVal}, 0
442or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
443parameters, see \autoref{Customizing}.
444On IEEE machines {\sf InfVal} can be set to the special value
445{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
446For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ 
447of {\sf b}~=~{\sf sqrt(a)} is set to
\[
{\sf b}^\prime = \left\{
\begin{array}{ll}
\sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0  \\
0&\mbox{if}\;\;{\sf a}^\prime =0 \\
\sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\
\end{array} \right. \enspace .
\]
456In other words, we consider {\sf a} and
457consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
458all computed Taylor coefficients are zero.
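
The case distinction can be written down as a small plain-C++ function.
This is only a sketch of the rule stated above, not ADOL-C's
implementation: the function name {\sf sqrt\_dot} is hypothetical, and
{\sf InfVal} and {\sf NoNum} are fixed here to the IEEE values
mentioned in the text, whereas in ADOL-C they are user-defined
parameters. The treatment of negative {\sf a} is an assumption of the
sketch.

\begin{verbatim}
#include <cmath>
#include <limits>

// Choices made for this sketch only.
const double InfVal = std::numeric_limits<double>::infinity();
const double NoNum  = std::numeric_limits<double>::quiet_NaN();

// First derivative b' of b = sqrt(a), given a and its derivative a'.
double sqrt_dot(double a, double adot) {
    if (a < 0) return NoNum;                      // outside the domain
    if (a > 0) return adot / (2.0 * std::sqrt(a)); // regular case
    // Boundary point a == 0:
    if (adot > 0)  return InfVal;   // moving into the domain
    if (adot == 0) return 0.0;      // a is treated as a constant
    return NoNum;                   // moving out of the domain
}
\end{verbatim}

At {\sf a}~=~4 with {\sf a}$^\prime = 1$ the regular formula gives
$1/(2\sqrt{4}) = 0.25$; at {\sf a}~=~0 the three boundary cases apply.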
459
460The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
461it is defined for the corresponding {\sf double} arguments. If {\sf x} is
462negative, however, the partial derivative with respect to an integral exponent
463is set to zero.
464%Similarly, the partial of {\bf pow} with respect to both arguments
465%is set to zero at the origin, where both arguments vanish.     
466The derivatives of the step functions
467{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
468arguments {\sf x}. The result values of the step functions
469are recorded on the tape and can later be checked to recognize
470whether a step to another level was taken during a forward sweep
471at different arguments than at taping time.
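
The convention for negative bases can be motivated by the two partial
derivatives of the power function,
\[
 \frac{\partial}{\partial x}\, x^y = y\, x^{y-1}
 \quad {\rm and} \quad
 \frac{\partial}{\partial y}\, x^y = x^y \ln x \quad .
\]
For $x < 0$ and integral $y$, the value $x^y$ and the first partial
derivative are real, but $\ln x$ is not, so the second expression has
no real value. This is presumably the reason why the partial derivative
with respect to the exponent is set to zero in that case.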
472
473Some C implementations supply other special
474functions, in particular the error function {\sf erf(x)}. For the
latter, we have included an {\sf adouble} version in \verb=adouble.cpp=, which
476has been commented out for systems on which the {\sf double} valued version
477is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
478postfix) are available for {\sf adouble}s.
479%
480% XXX: Vector and matrix class have to be reimplemented !!!
481%
482% and also the
483%active subscripts described in the \autoref{act_subscr}.
484Ambiguous statements like {\sf a += a++;} must be
485avoided because the compiler may sequence the evaluation of the
486overloaded
487expression differently from the original in terms of {\sf double}s.
488
489As we have indicated above, all subroutines called with active arguments
490must be modified or suitably overloaded. The simplest procedure is
491to declare the local variables of the function as active so that
492their internal calculations are also recorded on the tape.
493Unfortunately, this approach is likely to be unnecessarily inefficient
494and inaccurate if the original subroutine evaluates a special function
495that is defined as the solution of a particular mathematical problem.
496The most important examples are implicit functions, quadratures,
497and solutions of ordinary differential equations. Often
498the numerical methods for evaluating such special functions are
499elaborate, and their internal workings are not at all differentiable in
500the data. Rather than differentiating through such an adaptive
501procedure, one can obtain first and higher derivatives directly from
502the mathematical definition of the special function. Currently this
503direct approach has been implemented only for user-supplied quadratures
504as described in \autoref{quadrat}.
505%
506%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
507\subsection{Reusing the Tape for Arbitrary Input Values}
508\label{reuse_tape}
509%
In some situations it may be desirable to calculate the value and
derivatives of a function at arbitrary arguments by using a tape of
the function evaluation at one argument and reevaluating the
function and its derivatives using the given ADOL-C
routines. This approach can
significantly reduce run times, and it
also allows problem functions to be ported, in the form of the
corresponding tape files, into a computing environment that
does not support C++ but does support C or Fortran.
Therefore, the routines provided by ADOL-C for the evaluation of derivatives
can be used at arguments $x$ other than the
point at which the tape was generated, provided there are
no user-defined quadratures and all comparisons involving
{\sf adouble}s yield the same result. The last condition
implies that the control flow is unaltered by the change
of the independent variable values. This sufficient
condition is tested by ADOL-C, and if it is not met,
the ADOL-C routine called for derivative calculations indicates this
contingency through its return value. Currently, there are six return values,
see \autoref{retvalues}.
530\begin{table}[h]
\centering\small
532\begin{tabular}{|r|l|}\hline
533 +3 &
534\begin{minipage}{12.5cm}
535\vspace*{1ex}
536The function is locally analytic.
537\vspace*{1ex}
538\end{minipage} \\ \hline
539 +2 &
540\begin{minipage}{12.5cm}
541\vspace*{1ex}
542The function is locally analytic but the sparsity
543structure (compared to the situation at the  taping point)
544may have changed, e.g. while at taping arguments
545{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
546the argument currently used.
547\vspace*{1ex}
548\end{minipage} \\ \hline
549 +1 &
550\begin{minipage}{12.5cm}
551\vspace*{1ex}
552At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
553is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
554Lipschitz-continuous but possibly non-differentiable.
555\vspace*{1ex}
556\end{minipage} \\ \hline
557 0 &
558\begin{minipage}{12.5cm}
559\vspace*{1ex}
560Some arithmetic comparison involving {\sf adouble}s yields a tie.
561Hence, the function to be differentiated  may be discontinuous.
562\vspace*{1ex}
563\end{minipage} \\ \hline
564 -1 &
565\begin{minipage}{12.5cm}
566\vspace*{1ex}
567An {\sf adouble} comparison yields different results
568from the evaluation point at which the tape was generated.
569\vspace*{1ex}
570\end{minipage} \\ \hline
571 -2 &
572\begin{minipage}{12.5cm}
573\vspace*{1ex}
574The argument of a user-defined quadrature has changed
575from the evaluation point at which the tape was generated.
576\vspace*{1ex}
577\end{minipage} \\ \hline
578\end{tabular}
579\caption{Description of return values}
580\label{retvalues}
581\end{table}                           
582
583\begin{figure}[h]
584\centering\includegraphics[width=10.0cm]{tap_point}
585\caption{Return values around the taping point}
586\label{fi:tap_point}
587\end{figure}         
588
589In \autoref{fi:tap_point} these return values are illustrated.
If the return value of an ADOL-C routine is negative, the
taping process simply has to be repeated by executing the active section again.
The crux of the problem lies in the fact that the tape records only
the operations that are executed during one particular evaluation of the
function.
It also has no way to evaluate integrals since the corresponding
quadratures are never recorded on the tape.
Therefore, when there are user-defined quadratures, retaping is necessary at each
new point. If there are only branches conditioned on {\sf adouble}
599comparisons one may hope that re-taping becomes unnecessary when
600the points settle down in some small neighborhood, as one would
601expect for example in an iterative equation solver.
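
The role of the negative return values can be mimicked without ADOL-C.
The following toy model is only an illustration under simplifying
assumptions: the ``tape'' stores nothing but the outcome of one
comparison, ties are ignored, and the names {\sf ToyTape},
{\sf record}, and {\sf replay} are hypothetical.

\begin{verbatim}
// Toy model of tape reuse; it does NOT reflect ADOL-C's tape format.
// Function considered: f(x) = x*x if x < 1, else 3*x.
struct ToyTape {
    bool took_less;   // outcome of the comparison x < 1 at taping time
};

// "Taping": remember which branch was executed at the point x0.
ToyTape record(double x0) {
    return ToyTape{ x0 < 1.0 };
}

// "Reevaluation": follow the recorded branch at a new point x.
// Returns 3 if the comparison agrees with the tape and -1 otherwise,
// mirroring two of the return values in the table of return values.
int replay(const ToyTape& tape, double x, double& y) {
    y = tape.took_less ? x * x : 3.0 * x;  // only the taped branch is known
    return ((x < 1.0) == tape.took_less) ? 3 : -1;
}
\end{verbatim}

With a tape recorded at $x_0 = 0.5$, replaying at $x = 0.25$ yields the
return value 3 and $y = 0.0625$. Replaying at $x = 2$ yields $-1$; the
value of {\sf y} then stems from the wrong branch and must be
discarded, which corresponds to executing the active section again.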
602%
603%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
604\subsection{Conditional Assignments}
605\label{condassign}
606%
607It appears unsatisfactory that, for example, a simple table lookup
608of some physical property forces the re-recording of a possibly
609much larger calculation. However, the basic philosophy of ADOL-C
610is to overload arithmetic, rather than to generate a new program
611with jumps between ``instructions'', which would destroy the
612strictly sequential tape access and
613require the infusion of substantial compiler technology.
614Therefore, we introduce the two constructs of conditional
615assignments and active integers as partial remedies to the
616branching problem.
617
618In many cases, the functionality of branches
619can be replaced by conditional assignments. 
620For this purpose, we provide a special function called
621{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
622syntax of the conditional assignment
623\begin{center}
624    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} 
625\end{center}
which C++ inherited from C. However, here the arguments are restricted to be
active or passive scalars, and all expression arguments
are evaluated before the test on {\sf b}. This differs from
the usual conditional assignment and from an equivalent {\sf if}-{\sf else}
code segment, where only the selected expression is evaluated.
630
Suppose the original program contains the code segment
\begin{center}
{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}
\end{center}
Here, only one of the expressions (or, more generally, program blocks)
{\sf c} and {\sf d} is evaluated, which is exactly what causes the problem
for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
may first execute both branches and then pick either {\sf c}
or {\sf d} using
{\sf condassign(a,b,c,d)}. To maintain
consistency with the original code, one has to make sure
that the two branches do not have any side effects that can
interfere with each other or may be important for subsequent
calculations. Furthermore, the test parameter {\sf b} has to be an
{\sf adouble} or an {\sf adouble} expression. Otherwise, the
test condition {\sf b} is recorded on the tape as a {\em constant} with its
run-time value, and the original dependency of {\sf b} on
active variables is lost. This happens, for instance, if {\sf b} is a comparison
expression; see \autoref{OverOper}.
If there is no {\sf else} part in a conditional assignment, one may call
the three-argument version
{\sf condassign(a,b,c)}, which
is logically equivalent to {\sf condassign(a,b,c,a)} in that
nothing happens if {\sf b} is non-positive.
The header file \verb=<adolc/adouble.h>=
also contains corresponding definitions of
{\sf condassign(a,b,c,d)}
and {\sf condassign(a,b,c)} for
passive {\sf double} arguments, so that the modified code
can be tested for correctness without any differentiation.
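
The rewriting of a branch into a conditional assignment can be sketched in
plain C++ as follows. The helper {\sf condassign\_ref} is a hypothetical
stand-in that only mirrors the semantics stated above for passive
{\sf double} arguments; it is not part of ADOL-C. The names
{\sf branch\_version} and {\sf condassign\_version} are likewise chosen
for illustration only.
\begin{verbatim}
#include <cmath>

// Hypothetical stand-in with the documented semantics
//   a = (b > 0) ? c : d
// for passive doubles; c and d are evaluated by the caller.
inline void condassign_ref(double& a, double b, double c, double d) {
    a = (b > 0.0) ? c : d;
}

// Original code: only one of the two branches is executed.
double branch_version(double x) {
    double a;
    if (x - 1.0 > 0.0)
        a = std::sin(x);
    else
        a = x * x;
    return a;
}

// Rewritten code: both branch values are computed first, then one
// of them is selected. Neither expression may have side effects.
double condassign_version(double x) {
    double c = std::sin(x);  // value of the "then" branch
    double d = x * x;        // value of the "else" branch
    double a = 0.0;
    condassign_ref(a, x - 1.0, c, d);
    return a;
}
\end{verbatim}
Both functions return the same value for every {\sf x}, including the
boundary case {\sf x} = 1, where the test value is zero and the
{\sf else} value is selected.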
662%
663%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
664\subsection{Step-by-Step Modification Procedure}
665%
666To prepare a section of given C or C++ code for automatic
667differentiation as described above, one applies the following step-by-step procedure.
668\begin{enumerate}
669\item
670Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
671and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
672beginning and end of the active section.
673\item 
674Select the set of active variables, and change their type from
675{\sf double} or {\sf float} to {\sf adouble}.
676\item
677Select a sequence of independent variables, and initialize them with
678\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
679\item
Select a sequence of dependent variables among the active variables,
and pass their final values to passive variables or vectors thereof
by \boldmath $\gg=$ \unboldmath assignments.
\item
Compile the code after including the header file \verb=<adolc/adouble.h>=.
685\end{enumerate}
686Typically, the first compilation will detect several type conflicts
687-- usually attempts to convert from active to passive
688variables or to perform standard I/O of active variables.
689Since all standard
690C programs can be activated by a mechanical application of the
691procedure above, the following section is of importance
692only to advanced users.
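
The procedure above can be sketched for a small function as follows.
This is only an illustration under stated assumptions: the names
{\sf sum\_squares}, {\sf sum\_squares\_taped} and {\sf HAVE\_ADOLC} are
hypothetical, and the preprocessor guard merely allows the listing to
compile on a machine where the ADOL-C headers are not installed.
\begin{verbatim}
#ifdef __has_include
#  if __has_include(<adolc/adolc.h>)
#    include <adolc/adolc.h>     // step 5: include the ADOL-C header
#    define HAVE_ADOLC 1
#  endif
#endif

// Original passive code: y = sum_i x[i]^2.
double sum_squares(const double* x, int n) {
    double y = 0.0;
    for (int i = 0; i < n; ++i)
        y += x[i] * x[i];
    return y;
}

#ifdef HAVE_ADOLC
// Activated version; the step numbers refer to the list above.
double sum_squares_taped(short tag, const double* x, int n) {
    double y = 0.0;
    trace_on(tag);                 // step 1: begin active section
    adouble* ax = new adouble[n];  // step 2: active variables
    adouble ay = 0.0;
    for (int i = 0; i < n; ++i)
        ax[i] <<= x[i];            // step 3: mark independents
    for (int i = 0; i < n; ++i)
        ay += ax[i] * ax[i];
    ay >>= y;                      // step 4: mark the dependent
    delete[] ax;
    trace_off();                   // step 1: end active section
    return y;
}
#endif
\end{verbatim}
The passive and the activated version are expected to return the same
function value; the activated one additionally records a tape under the
given {\sf tag}.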
693%                                                                 
694%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
695\section{Numbering the Tapes and Controlling the Buffer}
696\label{tape}
697%
The trace generated by the execution of an active section may stay
within a triplet of internal arrays or it may be written out
to three corresponding files. We will refer to these triplets as the
tape array or tape file, or simply the tape, which may subsequently be
used to evaluate the
underlying function and its derivatives at the original point or at
alternative arguments. If the active section involves user-defined
quadratures, it must be executed and
re-taped at each new argument. Similarly, if conditions on
{\sf adouble} values lead to a different program branch being taken at
a new argument, the evaluation process also needs to be re-taped at the
new point. Otherwise, direct evaluation from
the tape by the routine {\sf function} (\autoref{optdrivers}) is
likely to be
faster. The use of quadratures and the results of all comparisons on
{\sf adouble}s are recorded on the tape so that {\sf function} and other
forward routines stop and return appropriate flags if their use without
prior re-taping is unsafe. To avoid re-taping, certain types of
branches can be recorded on the tape through
the use of conditional assignments,
as described in \autoref{condassign}.
719
Several tapes may be generated and kept simultaneously.
A tape array is used as a triplet of buffers; a tape file is generated if
the length of any of the buffers exceeds the maximal array lengths of
{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
defined in the header file \verb=<adolc/usrparms.h>=
and may be adjusted by the user in the header file before compiling
the ADOL-C library, or at run time using a file named \verb=.adolcrc=.
The directory to which tape files are written
can be changed by changing the definition of {\sf TAPE\_DIR} in
the header file \verb=<adolc/dvlparms.h>= before
compiling the ADOL-C library, or at run time by defining {\sf
  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined
to be the present working directory (\verb=.=).
733
734For simple usage, {\sf trace\_on} may be called with only the tape
735{\sf tag} as argument, and {\sf trace\_off} may be called
736without argument. The optional integer argument {\sf keep} of
737{\sf trace\_on} determines whether the numerical values of all
738active variables are recorded in a buffered temporary array or file
739called the taylor stack.
This option takes effect if
{\sf keep} = 1 and prepares the scene for an immediately following
gradient evaluation by a call to a routine implementing the reverse mode
as described in \autoref{forw_rev_ad} and \autoref{forw_rev}. A
file is used instead of an array if the size exceeds the maximal array
length of {\sf TBUFSIZE}, which is defined in \verb=<adolc/usrparms.h>= and may
be adjusted in the same way as the other buffer sizes mentioned above.
747Alternatively, gradients may be evaluated by a call
748to {\sf gradient}, which includes a preparatory forward sweep
749for the creation of the temporary file. If omitted, the argument
750{\sf  keep} defaults to 0, so that no temporary
751taylor stack file is generated.
752
753By setting the optional integer argument {\sf file} of
754{\sf  trace\_off} to 1, the user may force a numbered  tape
755file to be written even if the tape array (buffer) does not overflow.
756If the argument {\sf file} is omitted, it
757defaults to 0, so that the tape array is written onto a tape file only
758if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
759
After the execution of an active section, if a tape file was generated, i.e.,
if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
argument {\sf file} of {\sf trace\_off} was set to 1, the files are
saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
the current working directory). Their filenames are formed from
the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
  ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf
  ADOLC\_TAYLORS\_NAME} defined in
the header file \verb=<adolc/dvlparms.h>=, followed by the number
given as the {\sf tag} argument to {\sf trace\_on} and the
extension {\sf .tap}.
771
772 Later, all problem-independent routines
773like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
774expect as first argument a {\sf tag} to determine
775the tape on which their respective computational task is to be performed.
776By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
777several tapes for various function evaluations and subsequently perform
778function and derivative evaluations on one or more of them.
779
780For example, suppose one wishes to calculate for two smooth functions
781$f_1(x)$ and $f_2(x)$ 
782\[
783   f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x),
784\]
785and possibly higher derivatives where the two functions do not tie.
786Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
787one can generate two different tapes by calling {\sf trace\_on} with
788{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
789sections.
790Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
791current argument and then evaluate the gradient $\nabla f(x)$ by calling
792{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
793{\sf tag} = 2.
794%
795%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
\subsection{Examining the Tape and Predicting Storage Requirements}
797\label{examiningTape}
798%
799At any point in the program, one may call the routine
800\begin{center}
801{\sf void tapestats(unsigned short tag, size\_t* counts)}
802\end{center}
with {\sf counts} being an array of at least eleven integers.
804The first argument {\sf tag} specifies the particular tape of
805interest. The components of {\sf counts} represent
806\[
807\begin{tabular}{ll}
808{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
809{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ 
810{\sf counts[2]}: & the maximal number of live active variables,\\
811{\sf counts[3]}: & the size of taylor stack (number of overwrites),\\
812{\sf counts[4]}: & the buffer size (a multiple of eight),
813\end{tabular}
814\]
815\[
816\begin{tabular}{ll}
817{\sf counts[5]}: & the total number of operations recorded,\\
818{\sf counts[6-13]}: & other internal information about the tape.
819\end{tabular}
820\]
821The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} 
822determine the temporary
823storage requirements during calls to the routines
824implementing the forward and the reverse mode.
825For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
826forward mode involves apart from the tape buffers an array of
827 $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
828core and, in addition, a sequential data set called the value stack
829of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
830option {\sf keep} $>$ 0. Here
831the type {\sf revreal} is defined as {\sf double} or {\sf float} in
832the header file \verb=<adolc/usrparms.h>=. The latter choice halves the storage
833requirement for the sequential data set, which stays in core if
834its length is less than {\sf TBUFSIZE} bytes and is otherwise written
835out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=.
836The drawback of the economical
837{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
838yield gradients and other adjoint vectors only in single-precision
839accuracy. This may be acceptable if the adjoint vectors
840represent rows of a Jacobian that is  used for the calculation of
841Newton steps. In its scalar version, the reverse mode implementation involves
842the same number of {\sf double}s and twice as many {\sf revreal}s as the
843forward mode implementation.
844The storage requirements of the vector versions of the forward mode and
845reverse mode implementation are equal to that of the scalar versions multiplied by
846the vector length.
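
The storage rules stated above can be turned into a small estimate, as in
the following sketch. The type {\sf SweepStorage} and the functions
{\sf forward\_storage} and {\sf reverse\_storage} are hypothetical helpers
written for this illustration; they only restate the counting rules of
the text for the scalar sweeps and are not part of ADOL-C.
\begin{verbatim}
#include <cstddef>

// Element counts for the scalar sweeps, following the text above.
struct SweepStorage {
    std::size_t doubles;   // randomly accessed, held in core
    std::size_t revreals;  // sequentially accessed value stack
};

// maxlive = counts[2] and tssize = counts[3] from tapestats.
SweepStorage forward_storage(std::size_t maxlive, std::size_t tssize,
                             int deg, int keep) {
    SweepStorage s;
    s.doubles = static_cast<std::size_t>(deg + 1) * maxlive;
    s.revreals = keep > 0 ? tssize * static_cast<std::size_t>(keep)
                          : 0;
    return s;
}

// The scalar reverse mode needs the same number of doubles and
// twice as many revreals as the forward mode.
SweepStorage reverse_storage(const SweepStorage& fwd) {
    SweepStorage s;
    s.doubles = fwd.doubles;
    s.revreals = 2 * fwd.revreals;
    return s;
}
\end{verbatim}
For instance, with {\sf maxlive} = 1000, {\sf tssize} = 50000,
{\sf deg} = 2 and {\sf keep} = 1, the forward sweep involves 3000
{\sf double}s and 50000 {\sf revreal}s. Multiplying the number of
{\sf revreal}s by the size of the chosen {\sf revreal} type gives a
byte count that can be compared with {\sf TBUFSIZE}.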
847%
848%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
849\subsection{Customizing ADOL-C}
850\label{Customizing}
851%
Based on the information provided by the routine {\sf tapestats}, the user may alter the
following types and constant dimensions in the header file \verb=<adolc/usrparms.h>=
to suit the problem and environment at hand.
855
856\begin{description}
\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integers determine the lengths of the
in\-ter\-nal buf\-fers (default: 65$\,$536). If the buffers are large enough to accommodate all
required data, any file access is avoided unless {\sf trace\_off}
is called with a positive argument. This desirable situation can
be achieved for many problem functions with an execution trace of moderate
size. Primarily these values occur as arguments
to {\sf malloc}, so that setting them unnecessarily large may have no
ill effects, unless the operating system prohibits or penalizes large
array allocations.
866
867\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
868in\-ter\-nal buf\-fer for a taylor stack (default: 65$\,$536).
869
870\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
871
872\item[{\sf locint}{\rm :}] The range of the integer type
873{\sf locint} determines how many {\sf adouble}s can be simultaneously
874alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
875alive at any one time, the type {\sf locint} must be changed to
876 {\sf unsigned long}.
877
\item[{\sf revreal}{\rm :}] The choice of this floating-point type
trades accuracy against storage for reverse sweeps (default: {\sf double}). While functions
880and their derivatives are always evaluated in double precision
881during forward sweeps, gradients and other adjoint vectors are obtained
882with the precision determined by the type {\sf revreal}. The less
883accurate choice {\sf revreal} = {\sf float} nearly halves the
884storage requirement during reverse sweeps.
885
886\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
887
888\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
889
890\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
891sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
892of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
893the default setting produces the standard {\sf Inf}. On non-IEEE machines
894change these values to produce a small {\sf InfVal} value and compare
895the results of two forward sweeps with different {\sf InfVal} settings
896to detect a ``vertical'' slope.
897
898\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
899
900\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
901sets the mathematically
902undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
903of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
904the default setting produces the standard {\sf NaN}. On non-IEEE machines
905change these values to produce a small {\sf NoNum} value and compare
906the results of two forward sweeps with different {\sf NoNum} settings
907to detect the occurrence of undefined derivative values.
908
909\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
910
911\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
912
913\item[{\sf ATRIG\_ERF}{\rm :}] By removing the comment signs
914the overloaded versions of the inverse hyperbolic functions and
915the error function are enabled (default: undefined).
916
917\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
918
\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
  used by ADOL-C. {\sf malloc} will be used if this variable is
  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
  results caused by uninitialized memory.
923\end{description}
924%
925%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
926\subsection{Warnings and Suggestions for Improved Efficiency}
927\label{WarSug}
928%
929Since the type {\sf adouble} has a nontrivial constructor,
930the mere declaration of large {\sf adouble} arrays may take up
931considerable run time. The user should be warned against
932the usual Fortran practice of declaring fixed-size arrays
933that can accommodate the largest possible case of an evaluation program
934with variable dimensions. If such programs are converted to or written
935in C, the overloading in combination with ADOL-C will lead to very
936large run time increases for comparatively small values of the
937problem dimension, because the actual computation is completely
938dominated by the construction of the large {\sf adouble} arrays.
939The user is advised to
940create dynamic arrays of
941{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
942using {\sf delete}. For storage efficiency it is desirable that
943dynamic objects are created and destroyed in a last-in-first-out
944fashion.
945
946Whenever an {\sf adouble} is declared, the constructor for the type
947{\sf adouble} assigns it a nominal address, which we will refer to as
948its  {\em location}.  The location is of the type {\sf locint} defined
949in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy
950a range of contiguous locations. As long as the program execution
951never involves more than 65$\,$536 active variables, the type {\sf locint}
952may be defined as {\sf unsigned short}. Otherwise, the range may be
953extended by defining {\sf locint} as {\sf (unsigned) int} or
954{\sf (unsigned) long}, which may nearly double
955the overall mass storage requirement. Sometimes one can avoid exceeding
956the accessible range of {\sf unsigned short}s by using more local variables and deleting
957{\sf adouble}s  created by the new operator in a
958last-in-first-out
959fashion.  When memory for {\sf adouble}s is requested through a call to
960{\sf malloc()} or other related C memory-allocating
961functions, the storage for these {\sf adouble}s is allocated; however, the
962C++ {\sf adouble} constructor is never called.  The newly defined
963{\sf adouble}s are never assigned a location and are not counted in
964the stack of live variables. Thus, any results depending upon these
965pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
966  malloc() and related C memory-allocating
967functions when declaring adoubles (see the following paragraph).}
968%
969% XXX: Vector and matrix class have to be reimplemented !!!
970%
971%The same point applies, of course,
972% for active vectors.
973
974When an {\sf adouble}
975%
976% XXX: Vector and matrix class have to be reimplemented !!!
977%
978% or {\bf adoublev}
979goes out of
980scope or is explicitly deleted, the destructor notices that its
981location(s) may be
982freed for subsequent (nominal) reallocation. In general, this is not done
983immediately but is delayed until the locations to be deallocated form a
984contiguous tail of all locations currently being used. 
985
986 As a consequence of this allocation scheme, the currently
987alive {\sf adouble} locations always form a contiguous range of integers
988that grows and shrinks like a stack. Newly declared {\sf adouble}s are
989placed on the top so that vectors of {\sf adouble}s obtain a contiguous
990range of locations. While the C++ compiler can be expected to construct
991and destruct automatic variables in a last-in-first-out fashion, the
992user may upset this desirable pattern by deleting free-store {\sf adouble}s
993too early or too late. Then the {\sf adouble} stack may grow
994unnecessarily, but the numerical results will still be
995correct, unless an exception occurs because the range of {\sf locint}
996is exceeded. In general, free-store {\sf adouble}s
997%
998% XXX: Vector and matrix class have to be reimplemented !!!
999%
1000%and {\bf adoublev}s
1001should be deleted in a last-in-first-out fashion toward the end of
1002the program block in which they were created.
1003When this pattern is maintained, the maximum number of
1004{\sf adouble}s alive and, as a consequence, the
1005randomly accessed storage space
1006of the derivative evaluation routines is bounded by a
1007small multiple of the memory used in the relevant section of the
original program. Failure to delete dynamically allocated {\sf adouble}s
may cause the maximal number of {\sf adouble}s alive at one time to be exceeded
if the same active section is called repeatedly. The same effect
occurs if static {\sf adouble}s are used.
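
The effect of missing deletions on the range of locations can be
illustrated with a toy model of the bookkeeping described above. The
class {\sf LocationModel} and the function {\sf peak\_locations} are
hypothetical and greatly simplified; the actual ADOL-C implementation
differs in detail.
\begin{verbatim}
#include <cstddef>
#include <vector>

// Toy model: live locations form a stack of consecutive integers.
class LocationModel {
public:
    LocationModel() : max_top_(0) {}

    // A new variable receives the location on top of the stack.
    std::size_t create() {
        live_.push_back(true);
        if (live_.size() > max_top_) max_top_ = live_.size();
        return live_.size() - 1;
    }

    // A destroyed variable is only marked; locations are released
    // once the freed ones form a contiguous tail.
    void destroy(std::size_t loc) {
        live_[loc] = false;
        while (!live_.empty() && !live_.back()) live_.pop_back();
    }

    std::size_t max_top() const { return max_top_; }

private:
    std::vector<bool> live_;
    std::size_t max_top_;
};

// Runs an "active section" with two variables several times and
// returns the largest number of locations ever in use.
std::size_t peak_locations(bool delete_all, int repetitions) {
    LocationModel m;
    for (int r = 0; r < repetitions; ++r) {
        std::size_t a = m.create();
        std::size_t b = m.create();
        m.destroy(b);                  // last in, first out
        if (delete_all) m.destroy(a);  // omitted: a is never freed
    }
    return m.max_top();
}
\end{verbatim}
In this model the peak stays at two locations when every variable is
destroyed, whereas it grows by one location per repetition when one
variable is left undeleted.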
1012
1013To avoid the storage and manipulation of structurally
1014trivial derivative values, one should pay careful attention to
1015the naming of variables. Ideally, the intermediate
1016values generated during the evaluation of a vector function
1017should be assigned to program variables that are
1018consistently either active or passive, in that all their values
1019either are or are not dependent on the independent variables
1020in a nontrivial way. For example, this rule is violated if a temporary
1021variable is successively used to accumulate inner products involving
1022first only passive and later active arrays. Then the first inner
1023product and all its successors in the data dependency graph become
1024artificially active and the derivative evaluation routines
1025described later will waste
1026time allocating and propagating
1027trivial or useless derivatives. Sometimes even values that do
1028depend on the independent variables may be of only transitory
1029importance and may not affect the dependent variables. For example,
1030this is true for multipliers that are used to scale linear
1031equations, but whose values do not influence the dependent
1032variables in a mathematical sense. Such dead-end variables
1033can be deactivated by the use of the {\sf value} function, which
1034converts {\sf adouble}s to {\sf double}s. The deleterious effects
1035of unnecessary activity are partly alleviated by run time
1036activity flags in the derivative routine
1037{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1038
The {\sf adouble} default constructor sets the associated value to zero.
This implies a certain overhead that may seem unnecessary when no initial value
is actually given. However,
the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior.
An array instantiation such as
\begin{center}
\sf double x[3]=\{2.0\};
\end{center}
will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements
{\sf x[1]} and {\sf x[2]} to {\sf 0.0}. According to the C++ standard, the array element construction of
the type-changed instantiation
\begin{center}
\sf adouble x[3]=\{2.0\};
\end{center}
will use the constructor {\sf adouble(const double\&);} for {\sf x[0]}, passing in {\sf 2.0}, but
will call the {\sf adouble} default constructor for {\sf x[1]} and {\sf x[2]}, leaving these array
elements uninitialized {\em unless} the default constructor implements the initialization to
zero.
The C++ constructor syntax does not provide a means to distinguish this implicit initialization from the declaration of any simple uninitialized variable.
If the user can ascertain the absence of array instantiations such as the above, then one can
configure ADOL-C with the \verb=--disable-stdczero= option, see \autoref{genlib}, to
avoid the overhead of these initializations.
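
The constructor selection described above can be observed with a small
stand-in class. The class {\sf Probe} below is hypothetical; it imitates
only the two {\sf adouble} constructors under discussion and records
which of them built each array element.
\begin{verbatim}
// Stand-in class that records which constructor was used.
struct Probe {
    double value;
    bool default_constructed;
    Probe() : value(0.0), default_constructed(true) {}
    Probe(const double& v) : value(v), default_constructed(false) {}
};

// Counts the elements of  Probe x[3] = {2.0};  that were built by
// the default constructor.
int count_default_constructed() {
    Probe x[3] = {2.0};
    int n = 0;
    for (int i = 0; i < 3; ++i)
        if (x[i].default_constructed) ++n;
    return n;
}
\end{verbatim}
Here {\sf x[0]} is built from {\sf 2.0}, while {\sf x[1]} and {\sf x[2]}
are built by the default constructor, so their values are zero only
because that constructor sets them.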
1061 
1062%
1063%
1064%
1065%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1066\section{Easy-To-Use Drivers}
1067\label{drivers}
1068%
1069For the convenience of the user, ADOL-C provides several
1070easy-to-use drivers that compute the most frequently required
1071derivative objects. Throughout, we assume that after the execution of an
1072active section, the corresponding tape with the identifier {\sf tag}
1073contains a detailed record of the computational process by which the
1074final values $y$ of the dependent variables were obtained from the
1075values $x$ of the independent variables. We will denote this functional
1076relation between the input variables $x$ and the output variables $y$ by
1077\[
F : \R^n \rightarrow \R^m, \qquad x \mapsto F(x) \equiv y.
1079\]
The return values of all drivers presented in this section
indicate the validity of the tape as explained in \autoref{reuse_tape}.
1082The presented drivers are all C functions and therefore can be used within
1083C and C++ programs. Some Fortran-callable companions can be found
1084in the appropriate header files.
1085%
1086%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1087\subsection{Drivers for Optimization and Nonlinear Equations}
1088%
1089\label{optdrivers}
1090%
1091The drivers provided for solving optimization problems and nonlinear
1092equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1093which is included automatically by the global header file \verb=<adolc/adolc.h>=
1094(see \autoref{ssec:DesIH}).
1095
The routine {\sf function} evaluates the desired function from
the tape instead of executing the corresponding source code:
1098%
1099\begin{tabbing}
1100\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1101\>{\sf int function(tag,m,n,x,y)}\\
1102\>{\sf short int tag;}         \> // tape identification \\
1103\>{\sf int m;}                 \> // number of dependent variables $m$\\
1104\>{\sf int n;}                 \> // number of independent variables $n$\\
1105\>{\sf double x[n];}           \> // independent vector $x$ \\
1106\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1107\end{tabbing}
1108%
1109If the original evaluation program is available this double version
1110should be used to compute the function value in order to avoid the
1111interpretative overhead. 
1112
1113For the calculation of whole derivative vectors and matrices up to order
11142 there are the following procedures:
1115%
1116\begin{tabbing}
1117\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1118\>{\sf int gradient(tag,n,x,g)}\\
1119\>{\sf short int tag;}         \> // tape identification \\
1120\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1121\>{\sf double x[n];}           \> // independent vector $x$ \\
1122\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1123\end{tabbing}
1124%
1125\begin{tabbing}
1126\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1127\>{\sf int jacobian(tag,m,n,x,J)}\\
1128\>{\sf short int tag;}         \> // tape identification \\
1129\>{\sf int m;}                 \> // number of dependent variables $m$\\
1130\>{\sf int n;}                 \> // number of independent variables $n$\\
1131\>{\sf double x[n];}           \> // independent vector $x$ \\
1132\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1133\end{tabbing}
1134%
1135\begin{tabbing}
1136\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1137\>{\sf int hessian(tag,n,x,H)}\\
1138\>{\sf short int tag;}         \> // tape identification \\
1139\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1140\>{\sf double x[n];}           \> // independent vector $x$ \\
1141\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1142\end{tabbing}
1143%
The driver routine {\sf hessian} computes only the lower half of
$\nabla^2F(x)$ so that all values {\sf H[i][j]} with $j>i$
of {\sf H} allocated as a square array remain untouched during the call
of {\sf hessian}. Hence only $i+1$ {\sf double}s need to be
allocated starting at the position {\sf H[i]}.
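
A storage layout that allocates exactly $i+1$ entries for row $i$ can be
sketched as follows. The type {\sf LowerTriangle} and the function
{\sf hessian\_entry} are hypothetical helpers for this illustration and
are not provided by ADOL-C.
\begin{verbatim}
#include <cstddef>
#include <vector>

// Row-pointer view of a lower triangle: row i holds i+1 doubles.
struct LowerTriangle {
    std::vector<double> storage;  // n*(n+1)/2 entries, row by row
    std::vector<double*> rows;    // rows[i] points to H[i][0]

    explicit LowerTriangle(int n)
        : storage(static_cast<std::size_t>(n) * (n + 1) / 2, 0.0),
          rows(n) {
        std::size_t offset = 0;
        for (int i = 0; i < n; ++i) {
            rows[i] = storage.data() + offset;
            offset += static_cast<std::size_t>(i) + 1;
        }
    }

    // Pointer in the double** form used for the argument H.
    double** data() { return rows.data(); }
};

// Symmetric read access: entries above the diagonal are taken
// from the stored lower half.
inline double hessian_entry(LowerTriangle& H, int i, int j) {
    return i >= j ? H.rows[i][j] : H.rows[j][i];
}
\end{verbatim}
With such a layout, the pointer returned by {\sf data()} could be passed
as the argument {\sf H} of {\sf hessian}, assuming the driver writes
only the entries with $j \le i$ as stated above.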
1149
To use the full capability of automatic differentiation when
products of derivatives with certain weight vectors or directions are needed, ADOL-C offers
the following five drivers:
1153%
1154\begin{tabbing}
1155\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1156\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1157\>{\sf short int tag;}         \> // tape identification \\
1158\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1159\>{\sf int n;}                 \> // number of independent variables $n$\\
1160\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1161\>{\sf double x[n];}           \> // independent vector $x$ \\
1162\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1163\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1164\end{tabbing}
1165If a nonzero value of the parameter {\sf repeat} indicates that the
1166routine {\sf vec\_jac} has been called at the same argument immediately
1167before, the internal forward mode evaluation will be skipped and only
1168reverse mode evaluation with the corresponding arguments is executed
1169resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1170%
1171\begin{tabbing}
1172\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1173\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1174\>{\sf short int tag;}         \> // tape identification \\
1175\>{\sf int m;}                 \> // number of dependent variables $m$\\
1176\>{\sf int n;}                 \> // number of independent variables $n$\\
1177\>{\sf double x[n];}           \> // independent vector $x$\\
1178\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1179\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1180\end{tabbing}
1181%
1182\begin{tabbing}
1183\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1184\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1185\>{\sf short int tag;}         \> // tape identification \\
1186\>{\sf int n;}                 \> // number of independent variables $n$\\
1187\>{\sf double x[n];}           \> // independent vector $x$\\
1188\>{\sf double v[n];}           \> // tangent vector $v$\\
1189\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1190\end{tabbing}
1191%
1192\begin{tabbing}
1193\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1194\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1195\>{\sf short int tag;}         \> // tape identification \\
1196\>{\sf int n;}                 \> // number of independent variables $n$\\
1197\>{\sf int p;}                 \> // number of columns in $V$\\
1198\>{\sf double x[n];}           \> // independent vector $x$\\
1199\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1200\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1201\end{tabbing}
1202%
1203\begin{tabbing}
1204\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1205\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1206\>{\sf short int tag;}         \> // tape identification \\
1207\>{\sf int m;}                 \> // number of dependent variables $m$\\
1208\>{\sf int n;}                 \> // number of independent variables $n$\\
1209\>{\sf double x[n];}           \> // independent vector $x$\\
1210\>{\sf double v[n];}           \> // tangent vector $v$\\
1211\>{\sf double u[m];}           \> // range weight vector $u$ \\
1212\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1213\end{tabbing}
1214%
The next procedure allows the user to perform Newton steps
with only the corresponding tape at hand:
1217%
1218\begin{tabbing}
1219\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1220\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1221\>{\sf short int tag;}         \> // tape identification \\
1222\>{\sf int n;}                 \> // number of independent variables $n$\\
1223\>{\sf double x[n];}           \> // independent vector $x$ as\\
1224\>{\sf double b[n];}           \> // in: right-hand side b, out: result $w$ of
1225$F(x)w = b$\\
1226\>{\sf int mode;}              \> // option to choose different solvers
1227\end{tabbing}
1228%
On entry, parameter {\sf b} of the routine {\sf jac\_solv}
contains the right-hand side of the equation $F'(x)w = b$ to be solved. On exit,
{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0, only
the Jacobian of the function
given by the tape labeled with {\sf tag} is provided internally.
The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
solution of the equation is calculated if {\sf mode} = 2.
Hence, it is possible to compute the
LU-factorization only once. Then the equation can be solved for several
right-hand sides $b$ without calculating the Jacobian and
its factorization again.
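
As a hedged illustration of the Newton use case mentioned above (the iterate
notation $x^{(k)}$ and the separate evaluation of $F(x^{(k)})$ are assumptions
of this sketch, not part of the driver interface), one Newton step for
$F(x)=0$ with $m=n$ solves a linear system with the Jacobian and then updates
the iterate:
\[
 F'(x^{(k)})\, w \; = \; F(x^{(k)}), \qquad x^{(k+1)} \; = \; x^{(k)} - w .
\]
In terms of {\sf jac\_solv}, the array {\sf b} would be filled with
$F(x^{(k)})$ on entry and would hold the Newton correction $w$ on exit.
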
1240
If the original evaluation code of a function contains neither
quadratures nor branches, all drivers described above can be used to
evaluate derivatives at any argument in its domain. The same still
applies if there are no user-defined quadratures and
all comparisons involving {\sf adouble}s have the same result as
during taping. If this assumption is violated, all drivers
that internally call the forward mode evaluation will return the value $-1$ or $-2$,
as already specified in \autoref{reuse_tape}.
1249%
1250%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1251\subsection{Drivers for Ordinary Differential Equations}
1252\label{odedrivers}
1253%
1254When $F$ is the right-hand side of an (autonomous) ordinary
1255differential equation 
1256\[
1257x^\prime(t) \; = \; F(x(t)) , 
1258\] 
we must have $m=n$. Along any solution path $x(t)$, its Taylor
coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
the relation
\[
 x_{i+1} = \frac{1}{1+i} y_i
\]
with $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
\[
 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
\]
defined by an autonomous right-hand side $F$ recorded on the tape.
1270Using this relation, one can generate the Taylor coefficients $x_i$,
1271$i \le deg$,
1272recursively from the current point $x_0$. This task is achieved by the
1273driver routine {\sf forode} defined as follows:
1274%
1275\begin{tabbing}
1276\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1277\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1278\>{\sf short int tag;}         \> // tape identification \\
1279\>{\sf int n;}                 \> // number of state variables $n$\\
1280\>{\sf double tau;}            \> // scaling parameter\\
1281\>{\sf int dol;}               \> // degree on previous call\\
1282\>{\sf int deg;}               \> // degree on current call\\
1283\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1284\end{tabbing}
1285%
1286If {\sf dol} is positive, it is assumed that {\sf forode}
1287has been called before at the same point so that all Taylor coefficient
1288vectors up to the {\sf dol}-th are already correct.
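
The recursion above can be made concrete with a small stand-alone sketch. It
does not call ADOL-C. Instead, it hard-codes the scalar right-hand side
$F(x)=x^2$ (an example chosen here for illustration), assumes the scaling
$\tau=1$, and uses the hypothetical helper name
{\sf taylor\_coefficients\_x\_squared}. For this $F$, the coefficient $y_i$ is
the Cauchy product $\sum_{k=0}^{i} x_k x_{i-k}$.
\begin{verbatim}
#include <cstddef>
#include <vector>

// Taylor coefficients x_0..x_deg of the solution of the scalar ODE
// x'(t) = x(t)^2, generated by the recursion x_{i+1} = y_i / (i+1).
// Illustrative stand-in for what forode does on a taped F; not ADOL-C code.
std::vector<double> taylor_coefficients_x_squared(double x0, std::size_t deg) {
    std::vector<double> x(deg + 1, 0.0);
    x[0] = x0;
    for (std::size_t i = 0; i < deg; ++i) {
        // y_i is the i-th Taylor coefficient of y(t) = x(t)^2,
        // i.e. the Cauchy product sum_{k=0}^{i} x_k * x_{i-k}.
        double y_i = 0.0;
        for (std::size_t k = 0; k <= i; ++k) {
            y_i += x[k] * x[i - k];
        }
        x[i + 1] = y_i / static_cast<double>(i + 1);
    }
    return x;
}
\end{verbatim}
For $x_0 = 1$ the exact solution is $x(t) = 1/(1-t)$, so all coefficients
returned by the sketch equal one.
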
1289
Subsequently, one may call the driver routine {\sf reverse} or the corresponding
low-level routines as explained in \autoref{forw_rev} and
\autoref{forw_rev_ad}, respectively, to compute
the family of square matrices {\sf Z[n][n][deg]} defined by
\[
Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
\]
with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}, so that $q = n$.
1298
1299For the numerical solutions of ordinary differential equations,
1300one may also wish to calculate the Jacobians
1301\begin{equation} 
1302\label{eq:bees}
1303B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1304\end{equation}
1305which exist provided $F$ is sufficiently smooth. These matrices can
1306be obtained from the partial derivatives $\partial y_i/\partial x_0$
1307by an appropriate version of the chain rule.
To compute the total derivatives $B = (B_j)_{0\leq j <d}$
defined in \eqref{eq:bees}, where $d$ = {\sf deg}, one has to evaluate $\frac{1}{2}d(d-1)$
matrix-matrix products. This can be done by calling the routine {\sf accode} after the
corresponding evaluation of the {\sf hov\_reverse} function. The interface of
{\sf accode} is defined as follows:
1313%
1314\begin{tabbing}
1315\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1316\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1317\>{\sf int n;}                 \> // number of state variables $n$ \\
1318\>{\sf double tau;}            \> // scaling parameter\\
1319\>{\sf int deg;}               \> // degree on current call\\
1320\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1321\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1322\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1323\end{tabbing}
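
The chain rule referred to above can be written out explicitly. The following
is a sketch under the assumptions $\tau = 1$ and $U = I_n$, writing
$A_j \equiv \partial y_j/\partial x_0$ (a symbol introduced here for
illustration) and using the identity
$\partial y_j/\partial x_i = A_{j-i}$ for $i \le j$, which is a standard
property of Taylor coefficient propagation:
\[
B_j \; = \; \frac{1}{j+1}\Big[ A_j + \sum_{i=1}^{j} A_{j-i}\, B_{i-1} \Big],
\qquad j = 0,\ldots,d-1 .
\]
Computing $B_j$ thus takes $j$ matrix-matrix products, which adds up to
$\sum_{j=0}^{d-1} j = \frac{1}{2}d(d-1)$ products in total.
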
1324%
1325Sparsity information can be exploited by {\sf accode} using the array {\sf
1326nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1327reverse} or the corresponding basic routines as explained below in
1328\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1329non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1330return
1331\[
1332  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1333\]
In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
sparsity pattern that fills in as $k$ grows. Note that there need not be any
loss in computational efficiency if a time-dependent ordinary differential equation
is rewritten in autonomous form.
1338
The prototypes of the ODE drivers {\sf forode} and {\sf accode} are contained in the header file
1340\verb=<adolc/drivers/odedrivers.h>=. The global header file
1341\verb=<adolc/adolc.h>=
1342includes this file automatically, see \autoref{ssec:DesIH}.
1343
1344An example program using the procedures {\sf forode} and {\sf accode} together
1345with more detailed information about the coding can be found in
1346\autoref{exam:ode}. The corresponding source code
1347\verb=odexam.cpp= is contained in the subdirectory
1348\verb=examples=.
1349%
1350%
1351%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1352\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1353\label{sparse}
1354%
Quite often, the Jacobians and Hessians that have to be computed are sparse
matrices. Therefore, ADOL-C additionally provides drivers that
allow the exploitation of sparsity. The exploitation of sparsity is
frequently based on {\em graph coloring} methods, discussed
for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
rely on the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
ColPack is not directly incorporated in ADOL-C and therefore needs to be installed
separately to use the sparse drivers described here. ColPack is available for download at
1363\verb=http://www.cscapes.org/coloringpage/software.htm=. More information about the required
1364installation of ColPack is given in \autoref{install}.
1365%
1366\subsubsection*{Sparse Jacobians and Sparse Hessians}
1367%
1368To compute the entries of sparse Jacobians and sparse Hessians,
1369respectively, in coordinate format one may use the drivers:
1370\begin{tabbing}
1371\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,options)}\\
1373\>{\sf short int tag;}         \> // tape identification \\
1374\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1375\>{\sf int n;}                 \> // number of independent variables $n$\\
1376\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1377\>{\sf double x[n];}           \> // independent vector $x$ \\
1378\>{\sf int nnz;}               \> // number of nonzeros \\ 
1379\>{\sf unsigned int rind[nnz];}\> // row index\\ 
1380\>{\sf unsigned int cind[nnz];}\> // column index\\ 
1381\>{\sf double values[nnz];}    \> // non-zero values\\ 
1382\>{\sf int options[4];}        \> // array of control parameters\\ 
1383\end{tabbing}
1384%
1385\begin{tabbing}
1386\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,options)}\\
1388\>{\sf short int tag;}         \> // tape identification \\
1389\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1390\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1391\>{\sf double x[n];}           \> // independent vector $x$ \\
1392\>{\sf int nnz;}               \> // number of nonzeros \\ 
1393\>{\sf unsigned int rind[nnz];}\> // row indices\\ 
1394\>{\sf unsigned int cind[nnz];}\> // column indices\\ 
1395\>{\sf double values[nnz];}    \> // non-zero values  \\
1396\>{\sf int options[2];}        \> // array of control parameters\\ 
1397\end{tabbing}
1398%
Once more, the input variables are the identifier for the internal
representation {\sf tag}, if required the number of dependents {\sf m},
and the number of independents {\sf n} for a consistency check.
Furthermore, the flag {\sf repeat=0} indicates that the functions are called
at a point with a new sparsity structure, whereas {\sf repeat=1} results in
the reuse of the sparsity pattern from the previous call.
The current values of the independents are given by the array {\sf x}.
The input/output
variable {\sf nnz} stores the number of nonzero entries.
Therefore, {\sf nnz} also denotes the length of the arrays {\sf rind} storing
the row indices, {\sf cind} storing the column indices, and
{\sf values} storing the values of the nonzero entries.
If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
the functions determine the number of nonzeros for the sparsity pattern
defined by the value of {\sf x}, allocate appropriate arrays {\sf rind},
{\sf cind}, and {\sf values}, and store the desired information in these
arrays.
During the next function call with {\sf repeat=1}, the allocated memory
is reused such that only the values of the arrays are changed.
Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
  repeat=0}, the user is responsible for the deallocation of the arrays
 {\sf rind}, {\sf cind}, and {\sf values} using {\sf
   delete[]}.
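
To illustrate the coordinate format, consider the following made-up Jacobian
with $m=2$ and $n=3$. The matrix, its values, the zero-based indices, and the
row-by-row ordering of the entries are assumptions chosen for illustration;
the ordering actually produced by the drivers is not specified here.
\[
F'(x) \; = \; \begin{pmatrix} 4 & 0 & 7 \\ 0 & 5 & 0 \end{pmatrix}
\]
Under these assumptions the three output arrays would read
\[
\mbox{\sf nnz} = 3, \quad
\mbox{\sf rind} = (0,0,1), \quad
\mbox{\sf cind} = (0,2,1), \quad
\mbox{\sf values} = (4,7,5),
\]
so that the $k$th nonzero is the entry in row {\sf rind[k]} and column
{\sf cind[k]} with value {\sf values[k]}.
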
1422
For each driver, the array {\sf options} can be used to adapt the
computation of the sparse derivative matrices to the special
needs of the application under consideration. In most cases, the default options
will give reasonable performance. The elements of the array {\sf options} control the action of
{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
1428\begin{table}[h]
1429\center
1430\begin{tabular}{|c|c|l|} \hline
1431component & value &  \\ \hline
1432{\sf options[0]} &    &  way of sparsity pattern computation \\
1433                 & 0  &  propagation of index domains (default) \\
1434                 & 1  &  propagation of bit pattern \\ \hline
1435{\sf options[1]} &    &  test the computational graph control flow \\
1436                 & 0  &  safe mode (default) \\
1437                 & 1  &  tight mode \\ \hline
1438{\sf options[2]} &    &  way of bit pattern propagation \\
1439                 & 0  &  automatic detection (default) \\
1440                 & 1  &  forward mode \\ 
1441                 & 2  &  reverse mode \\ \hline
1442{\sf options[3]} &    &  way of compression \\
1443                 & 0  &  column compression (default) \\
1444                 & 1  &  row compression \\ \hline
1445\end{tabular}
1446\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
1447\end{table}           
1448
1449The component {\sf options[1]} determines
1450the usage of the safe or tight mode of sparsity computation.
1451The first, more conservative option is the default. It accounts for all
1452dependences that might occur for any value of the
1453independent variables. For example, the intermediate
1454{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
logical {\sf OR} of those associated with {\sf a} and {\sf b}.
In contrast,
1459the tight option gives this result only in the unlikely event of an exact
1460tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1461associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1462depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1463Obviously, the sparsity pattern obtained with the tight option may contain
1464more zeros than that obtained with the safe option. On the other hand, it
1465will only be valid at points belonging to an area where the function $F$ is locally
1466analytic and that contains the point at which the internal representation was
1467generated. Since generating the sparsity structure using the safe version does not
1468require any reevaluation, it may thus reduce the overall computational cost
1469despite the fact that it produces more nonzero entries.
1470The value of {\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and of dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
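
As a small worked example of the safe and tight modes (the function is made up
for illustration), consider $n=2$ and the single dependent variable
\[
 y \; = \; \max(a,b) \quad \mbox{with} \quad a = x_1^2, \;\; b = x_2 .
\]
In safe mode, the Jacobian pattern of $y$ contains both columns, $\{1,2\}$,
regardless of the argument. In tight mode at $x=(2,1)$ one has $a = 4 > b = 1$,
so the pattern contains column $1$ only. This smaller pattern is valid as long
as $x_1^2 > x_2$ and has to be recomputed once the argument leaves that region.
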
1474
1475 The elements of the array {\sf options} control the action of
1476{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
1477\begin{table}[h]
1478\center
1479\begin{tabular}{|c|c|l|} \hline
1480component & value &  \\ \hline
1481{\sf options[0]} &    &  test the computational graph control flow \\
1482                 & 0  &  safe mode (default) \\
1483                 & 1  &  tight mode \\ \hline
1484{\sf options[1]} &    &  way of recovery \\
1485                 & 0  &  indirect recovery (default) \\
1486                 & 1  &  direct recovery \\ \hline
1487\end{tabular}
1488\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
1489\end{table}           
1490
1491The described driver routines for the computation of sparse derivative
1492matrices are prototyped in the header file
1493\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1494global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
Example codes illustrating the usage of {\sf
  sparse\_jac} and {\sf sparse\_hess} can be found in the files
\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
1498\verb=examples/additional_examples/sparse=.
1499%
1500%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1501\subsubsection*{Computation of Sparsity Pattern}
1502%
1503ADOL-C offers a convenient way of determining the 
1504sparsity structure of a Jacobian matrix using the function:
1505%
1506\begin{tabbing}
1507\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1508\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
1509\>{\sf short int tag;} \> // tape identification \\
1510\>{\sf int m;} \> // number of dependent variables $m$\\
1511\>{\sf int n;} \> // number of independent variables $n$\\
1512\>{\sf double x[n];} \> // independent variables $x_0$\\
1513\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
\>{\sf int options[3];} \> // array of control parameters
1515\end{tabbing}
1516%
1517The sparsity pattern of the
1518Jacobian is computed in a compressed row format. For this purpose,
1519{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
1520  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
During the call of {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
entries in row $i$ of the Jacobian is determined for all $1\le i\le
m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
points to a block of $\hat{n}_i+1$ {\sf unsigned int}s for all $1\le
i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
column indices of the $\hat{n}_i$ nonzero entries in the $i$th row are stored
in the components {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][$\hat{n}_i$]}.
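
The layout described above can be traversed with two nested loops. The
following stand-alone sketch is not part of ADOL-C: the type {\sf Entry} and
the function {\sf pattern\_to\_entries} are hypothetical names introduced for
illustration, and the column indices are copied exactly as they are stored in
{\sf JP}.
\begin{verbatim}
#include <cstddef>
#include <vector>

// One nonzero position: row number and column index as stored in JP.
struct Entry {
    std::size_t row;
    unsigned int col;
};

// Walks a row-compressed pattern laid out as described above:
// JP[i][0] holds the number of nonzeros in row i, and
// JP[i][1..JP[i][0]] hold their column indices.
// Returns the nonzero positions row by row.
std::vector<Entry> pattern_to_entries(const unsigned int* const* JP,
                                      std::size_t m) {
    std::vector<Entry> entries;
    for (std::size_t i = 0; i < m; ++i) {
        const unsigned int count = JP[i][0];
        for (unsigned int k = 1; k <= count; ++k) {
            entries.push_back(Entry{i, JP[i][k]});
        }
    }
    return entries;
}
\end{verbatim}
Passing the array filled by {\sf jac\_pat} together with {\sf m} should then
yield one entry per structural nonzero of the Jacobian.
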
1528
1529The elements of the array {\sf options} control the action of
1530{\sf jac\_pat} according to \autoref{options}.
1531\begin{table}[h]
1532\center
1533\begin{tabular}{|c|c|l|} \hline
1534component & value &  \\ \hline
1535{\sf options[0]} &    &  way of sparsity pattern computation \\
1536                 & 0  &  propagation of index domains (default) \\
1537                 & 1  &  propagation of bit pattern \\ \hline
1538{\sf options[1]} &    &  test the computational graph control flow \\
1539                 & 0  &  safe mode (default) \\
1540                 & 1  &  tight mode \\ \hline
1541{\sf options[2]} &    &  way of bit pattern propagation \\
1542                 & 0  &  automatic detection (default) \\
1543                 & 1  &  forward mode \\ 
1544                 & 2  &  reverse mode \\ \hline
1545\end{tabular}
1546\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
1547\end{table}           
1548The value of {\sf options[0]} selects the way to compute the sparsity
1549pattern. The component {\sf options[1]} determines
1550the usage of the safe or tight mode of bit pattern propagation.
1551The first, more conservative option is the default. It accounts for all
1552dependences that might occur for any value of the
1553independent variables. For example, the intermediate
1554{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
logical {\sf OR} of those associated with {\sf a} and {\sf b}.
In contrast,
1559the tight option gives this result only in the unlikely event of an exact
1560tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1561associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1562depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1563Obviously, the sparsity pattern obtained with the tight option may contain
1564more zeros than that obtained with the safe option. On the other hand, it
1565will only be valid at points belonging to an area where the function $F$ is locally
1566analytic and that contains the point at which the internal representation was
1567generated. Since generating the sparsity structure using the safe version does not
1568require any reevaluation, it may thus reduce the overall computational cost
1569despite the fact that it produces more nonzero entries. The value of
1570{\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and of dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
1574
The routine {\sf jac\_pat} may use the propagation of bit patterns to
determine the sparsity pattern. In this case, a kind of ``strip-mining''
is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
1578the value of the constant {\sf PQ\_STRIPMINE\_MAX}
1579following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1580
1581The driver routine is prototyped in the header file
1582\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1583global header file \verb=<adolc/adolc.h>= (see
1584\autoref{ssec:DesIH}). The determination of sparsity patterns is
1585illustrated by the examples \verb=sparse_jacobian.cpp=
1586and \verb=jacpatexam.cpp=
1587contained in
1588\verb=examples/additional_examples/sparse=.
1589
1590To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
1591driver
1592\begin{tabbing}
1593\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int hess\_pat(tag, n, x, HP, option)}\\
1595\>{\sf short int tag;}       \> // tape identification \\
1596\>{\sf int n;}               \> // number of independent variables $n$\\
1597\>{\sf double x[n];}         \> // independent variables $x_0$\\
1598\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
1599\>{\sf int option;}          \> // control parameter
1600\end{tabbing}
where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
 unsigned int}s.
After the function call, {\sf HP} contains the sparsity pattern,
where {\sf HP[j-1][0]} contains the number of nonzero elements in the
 $j$th row for $1 \le j\le n$.
The components {\sf HP[j-1][i]}, $0<${\sf i}~$\le$~{\sf HP[j-1][0]}, store the
 indices of these entries. For determining the sparsity pattern, ADOL-C uses
 the algorithm described in \cite{Wa05a}. The parameter {\sf option} determines
the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
  option = 1}) of the computation of the sparsity pattern as described
above.
1612
1613This driver routine is prototyped in the header file
1614\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1615global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1616An example employing the procedure {\sf hess\_pat}  can be found in the file
1617\verb=sparse_hessian.cpp=  contained in
1618\verb=examples/additional_examples/sparse=.
1619%
1620%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1621\subsubsection*{Calculation of Seed Matrices}
1622%
1623To compute a compressed derivative matrix from a given sparsity
1624pattern, one has to calculate an appropriate seed matrix that can be
1625used as input for the derivative calculation. To facilitate the
1626generation of seed matrices for a sparsity pattern given in
1627row compressed form, ADOL-C provides the following two drivers,
1628which are based on the ColPack library:
1629\begin{tabbing}
1630\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1631\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
1632\>{\sf int m;} \> // number of dependent variables $m$\\
1633\>{\sf int n;} \> // number of independent variables $n$\\
1634\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
1635of Jacobian\\
1636\>{\sf double S[n][p];} \> // seed matrix\\
1637\>{\sf int p;} \> // number of columns in $S$
1638\end{tabbing}
1639The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1640number of independent variables {\sf n} and the sparsity pattern {\sf
1641  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1642{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1643pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1644determines the number of columns {\sf p} in the seed
1645matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1646 S} and initializes {\sf S} according to the graph coloring.
1647The coloring algorithm that is applied in {\sf
1648  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1649described earlier.
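
A small worked example may help to see what a seed matrix achieves. The
pattern below and the particular grouping of columns are made up for
illustration; the coloring actually chosen by ColPack may differ. Suppose
$m=2$, $n=3$ and
\[
F'(x) \; = \; \begin{pmatrix} a & 0 & c \\ 0 & b & d \end{pmatrix} .
\]
Columns 1 and 2 have no nonzero in a common row and may therefore share one
color, whereas column 3 overlaps with both and needs its own color. With
$p=2$ colors one obtains
\[
S \; = \; \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad
F'(x)\,S \; = \; \begin{pmatrix} a & c \\ b & d \end{pmatrix} .
\]
Every nonzero of $F'(x)$ appears unchanged in the compressed Jacobian, so it
can be read off from $p=2$ instead of $n=3$ directional derivatives.
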
1650
1651\begin{tabbing}
1652\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1653\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
1654\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
of Hessian\\
1657\>{\sf double S[n][p];} \> // seed matrix\\
1658\>{\sf int p;} \> // number of columns in $S$
1659\end{tabbing}
1660The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1661and the sparsity pattern {\sf HP} of the Hessian computed for example
1662by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1663appropriate coloring of the adjacency graph defined by the sparsity
pattern {\sf HP}: an acyclic coloring in the case of an indirect recovery of the Hessian from its
    compressed representation and a star coloring in the case of a direct recovery.
1666 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1667 S} and initializes {\sf S} according to the graph coloring.
1668The coloring algorithm applied in {\sf
1669  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1670described earlier.
1671
1672The specific set of criteria used to define a seed matrix $S$ depends
1673on whether the sparse derivative matrix
1674to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1675It also depends on whether the entries of the derivative matrix  are to be
1676recovered from the compressed representation \emph{directly}
1677(without requiring any further arithmetic) or \emph{indirectly} (for
1678example, by solving for unknowns via successive substitutions).
1679Appropriate recovery routines are provided by ColPack and used
1680in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1681the previous subsection. Examples with a detailed analysis of the
1682employed drivers for the exploitation of sparsity can be found in the
1683papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1684
1685
1686These driver routines are prototyped in
1687\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1688global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1689An example code illustrating the usage of {\sf
1690generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1691\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
1692%
1693%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1694\subsection{Higher Derivative Tensors}
1695\label{higherOrderDeriv}
1696%
1697Many applications in scientific computing need second- and higher-order
1698derivatives. Often, one does not require full derivative tensors but
1699only the derivatives in certain directions $s_i \in \R^{n}$.
1700Suppose a collection of $p$ directions
1701$s_i \in \R^{n}$ is given, which form a matrix
1702\[
1703S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
1704 \R^{n \times p}.
1705\]
1706One possible choice is $S = I_n$ with  $p = n$, which leads to
1707full tensors being evaluated.
1708ADOL-C provides the function {\sf tensor\_eval}
1709to calculate the derivative tensors
1710\begin{eqnarray}
1711\label{eq:tensor}
1712\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
1713     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} 
1714     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
1715\end{eqnarray}
1716simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
1717parameters:
1718%
1719\begin{tabbing}
1720\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1721\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
1722\>{\sf short int tag;}         \> // tape identification \\
1723\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1724\>{\sf int n;}                 \> // number of independent variables $n$\\
1725\>{\sf int d;}                 \> // highest derivative degree $d$\\
1726\>{\sf int p;}                 \> // number of directions $p$\\
1727\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
1728\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
1729\>{\sf double S[n][p];}        \> // seed matrix $S$
1730\end{tabbing}
1731%
Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory
requirement can be reduced enormously. The collection of tensors up to order $d$ comprises
$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
greater than or equal to $\binom{p+d}{d}$.
To compute the derivatives, {\sf tensor\_eval} internally propagates univariate Taylor
series along $\binom{p+d-1}{d}$ directions. Then the desired values are interpolated. This
approach is described in \cite{Griewank97}.
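
As a quick check of the storage count (the numbers are chosen for
illustration), take $p=2$ and $d=3$. Stored without exploiting symmetry, the
tensors of order $k=0,\ldots,3$ would have
$\sum_{k=0}^{3} p^k = 1+2+4+8 = 15$ entries per dependent variable, whereas
the symmetric storage needs only
\[
\binom{p+d}{d} \; = \; \binom{5}{3} \; = \; 10
\]
entries, namely $1+2+3+4$ distinct derivatives of orders $0$ to $3$.
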
1739
1740The access of individual entries in symmetric tensors of
1741higher order is a little tricky. We always store the derivative
1742values in the two dimensional array {\sf tensor} and provide two
1743different ways of accessing them. 
1744The leading dimension of the tensor array ranges over
1745the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17460,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1747structure for all $i$. Each of them represents the symmetric tensors up to
1748order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 
1749%
The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
tensors are linearly ordered according to the tetrahedral
scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
case $d=2$, the derivative with respect to $z_j$ and $z_k$ with $z$
as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
$l = k(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$, we find the
function value $F_{i+1}$ itself, and the gradient component
$\partial F_{i+1}/\partial z_k$ is stored at $l=k(k+1)/2$
with $j=0$ for $k=1,\ldots,p$.
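
The quadratic-case layout can be sketched as a one-line index computation. The
helper {\sf quadratic\_index} below is a hypothetical name introduced for
illustration. It only restates the formula $l = k(k+1)/2 + j$ given above and
is not the ADOL-C routine for general degree $d$.
\begin{verbatim}
#include <cassert>
#include <cstddef>

// Position l of the entry for the index pair (j, k), j <= k, in the
// quadratic case d = 2, following l = k(k+1)/2 + j from the text.
// Index 0 stands for "no differentiation".
std::size_t quadratic_index(std::size_t j, std::size_t k) {
    assert(j <= k);  // the formula only covers ordered pairs
    return k * (k + 1) / 2 + j;
}
\end{verbatim}
For $p=2$ the pairs $(0,0)$, $(0,1)$, $(1,1)$, $(0,2)$, $(1,2)$, $(2,2)$ map to
the positions $0,\ldots,5$, which fills exactly the $\binom{4}{2}=6$ slots of
one row of {\sf tensor}.
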
1759
For general $d$ we combine the variable
indices into a multi-index $j = (j_1,j_2,\ldots,j_d)$,
where $j_k$ indicates differentiation with respect to variable
$z_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1764no differentiation so that all lower derivatives are also
1765contained in the same data structure as described above for
1766the quadratic case. The location of the partial derivative specified
1767by $j$ is computed by the function
1768%
1769\begin{tabbing}
1770\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1771\>{\sf int address(d,$\,$j)} \\
1772\>{\sf int d;}                 \> // highest derivative degree $d$ \\
1773\>{\sf int j[d];}              \> // multi-index $j$
1774\end{tabbing}       
1775%
1776and it may thus be referenced as {\sf tensor[i][address(d,$\,$j)]}.
1777Notice that the address computation does depend on the degree $d$ 
1778but not on the number of directions $p$, which could theoretically be
1779enlarged without the need to reallocate the original tensor.
1780Also, the components of $j$ need to be non-increasing.
1781%
1782To some C programmers it may appear more natural to access tensor
1783entries by successive dereferencing in the form
1784{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1785We have also provided this mode, albeit with the restriction
1786that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1787In the second order case this means that the Hessian entries must be
1788specified in or below the diagonal. If this restriction is
1789violated the values are almost certain to be wrong and array bounds
1790may be violated. We emphasize that subscripting is not overloaded
1791but that {\sf tensorentry} is a conventional and
1792thus moderately efficient C pointer structure.
1793Such a pointer structure can be allocated and set up completely by the
1794function
1795%
1796\begin{tabbing}
1797\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1798\>{\sf void** tensorsetup(m,p,d,tensor)} \\
\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1800\>{\sf int p;}                 \> // number of directions $p$\\
1801\>{\sf int d;}                 \> // highest derivative degree $d$\\
1802\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
1803\end{tabbing}     
1804%
1805Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1806$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1807 $d-1$ layers of pointers are set up so that the return value
1808allows the direct dereferencing of individual tensor elements.
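
As a small worked check of the size requirement (our own illustration, using
the values $p=2$ and $d=3$), each row of {\sf tensor} must hold at least
\[
\binom{p+d}{d} \; = \; \binom{5}{3} \; = \; 10
\]
entries. This agrees with the number of non-increasing index triples
$(j_1,j_2,j_3)$ with $2 \geq j_1 \geq j_2 \geq j_3 \geq 0$, namely
\begin{align*}
 & (0,0,0),\;(1,0,0),\;(1,1,0),\;(1,1,1),\;(2,0,0), \\
 & (2,1,0),\;(2,1,1),\;(2,2,0),\;(2,2,1),\;(2,2,2) .
\end{align*}
The listing order is chosen for readability only and is not meant to reflect
the positions computed by {\sf address}.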
1809
1810For example, suppose some active section involving  $m \geq 5$ dependents and
1811$n \geq 2$ independents has been executed and taped. We may
1812select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1813columns $s_1$ and $s_2$. Then we are able to execute the code segment
1814\begin{tabbing}
1815\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
1816              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}   
1817\end{tabbing}
1818This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
1819in both directions $s_1$ and
$s_2$ at some argument $x$. To allow access to tensor entries by dereferencing, the pointer
structure {\sf tensorentry} has been created. Now,
1822the value of the mixed partial
1823\[
 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2}
1825\]
1826can be recovered as
1827\begin{center}
1828   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][address(d,$\,$j)]},
1829\end{center}
1830where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). 
1831Analogously, the entry
1832\begin{center}   
1833   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][address(d,$\,$j)]}
1834\end{center}
1835with {\sf j} = (1,0,0) contains the first derivative of the third dependent
1836variable $F_3$ with respect to the first differentiation parameter $z_1$.
1837
Note that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly affects the values accessed
by {\sf tensorentry}.
1841%
1842When no more derivative evaluations are desired the pointer structure
1843{\sf tensorentry} can be deallocated by a call to the function
1844%
1845\begin{tabbing}
1846\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1847\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
1848\>{\sf int m;}                    \> // number of dependent variables $m$ \\
\>{\sf int p;}                    \> // number of directions $p$\\
1850\>{\sf int d;}                    \> // highest derivative degree $d$\\
1851\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} 
1852\end{tabbing} 
1853%
1854that does not deallocate the array {\sf tensor}.
1855
1856The drivers provided for efficient calculation of higher order
1857derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1858which is included by the global header file \verb=<adolc/adolc.h>= automatically
1859(see \autoref{ssec:DesIH}).
1860Example codes using the above procedures can be found in the files
1861\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
1862\verb=examples/additional_examples/taylor=.
1863%
1864%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1865\subsection{Derivatives of Implicit and Inverse Functions}
1866\label{implicitInverse}
1867%
1868Frequently, one needs derivatives of variables
1869$y \in \R^{m}$ that are implicitly defined as
1870functions of some variables $x \in \R^{n-m}$
1871by an algebraic system of equations
1872\[
1873G(z) \; = \; 0 \in \R^m \quad
1874{\rm with} \quad z = (y, x) \in \R^n .
1875\] 
1876Naturally, the $n$ arguments of $G$ need not be partitioned in
1877this regular fashion and we wish to provide flexibility for a
1878convenient selection of the $n-m$ {\em truly} independent
1879variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
1880that picks out these variables so that it is a column
1881permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
1882Then the nonlinear system
1883\[
1884  G(z) \; = \; 0, \quad P z =  x,                           
1885\] 
1886has a regular Jacobian, wherever the implicit function theorem
1887yields $y$ as a function of $x$. Hence, we may also write
1888\begin{equation}
1889\label{eq:inv_tensor}
1890F(z) = \left(\begin{array}{c}
1891                        G(z) \\
1892                        P z
1893                      \end{array} \right)\; \equiv \;
1894                \left(\begin{array}{c}
1895                        0 \\
1896                        P z
1897                      \end{array} \right)\; \equiv \; S\, x,
1898\end{equation}
1899where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
1900the original implicit functional relation between $x$ and $y$ as an inverse
1901relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
1902by marking $n-m$ of the independents also dependent. 
1903
1904Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1905seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1906with respect to $x \in \R^p$ by calling the following routine:
1907%
1908\begin{tabbing}
1909\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1910\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
1911\>{\sf short int tag;}         \> // tape identification \\
1912\>{\sf int n;}                 \> // number of variables $n$\\
1913\>{\sf int d;}                 \> // highest derivative degree $d$\\
1914\>{\sf int p;}                 \> // number of directions $p$\\
1915\>{\sf double z[n];}          \> // values of independent variables $z$\\
1916\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
1917\>{\sf double S[n][p];}        \> // seed matrix $S$
1918\end{tabbing}                 
1919%                 
1920The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1921{\sf tag} pointing to a tape for the evaluation of the inverse function
1922$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1923to the domain of that function. Individual derivative components can be
1924accessed in tensor exactly as in the explicit case described above.
1925
1926It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1927derivatives of $z$ with respect to $x$ that is defined by the equation
1928$F(z)=F(z_0)+S \, x$. In other words the base point at
1929which the inverse function is differentiated is given by $F(z_0)$.
1930The routine has no capability for inverting $F$ itself as
1931solving systems of nonlinear
1932equations $F(z)=0$ in the first place is not just a differentiation task.
1933However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1934useful for that purpose.
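
To make the defining relation concrete, the following first-order calculation
may help. It is only a sketch under the assumption that $F'(z_0)$ is
nonsingular, and the notation $z(x)$ for the locally defined solution is
introduced here purely for illustration. Differentiating
$F(z(x)) = F(z_0) + S\,x$ with respect to $x$ at $x=0$ gives
\[
F'(z_0)\, \left. \frac{\partial z}{\partial x} \right|_{x=0} \; = \; S
\quad \Longrightarrow \quad
\left. \frac{\partial z}{\partial x} \right|_{x=0} \; = \; F'(z_0)^{-1} S
\; \in \; \R^{n \times p} ,
\]
so the first-order part of the result solves $p$ linear systems with the
Jacobian of $F$. Higher-order coefficients follow by differentiating the same
relation repeatedly.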
1935
1936As an example consider the following two nonlinear expressions
1937\begin{eqnarray*}
      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^2 \\
1939      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
1940\end{eqnarray*}   
1941The equations $G(z)=0$ describe the relation between the Cartesian
1942coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
1943Now, suppose we are interested in the derivatives of the second Cartesian
1944$y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
1945to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
1946could look simply like
1947%
1948\begin{tabbing}
1949\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
1950\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
1951\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
1952\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
1953\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
1954\end{tabbing}     
1955%
1956where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
1957In the last line the two independent variables {\sf z[1]} and
1958{\sf z[3]} are made
1959simultaneously dependent thus generating a square system that can be
1960inverted (at most arguments). The corresponding projection and seed
1961matrix are
1962\begin{eqnarray*}
1963P \;=\; \left( \begin{array}{cccc}
1964               1 & 0 & 0 & 0 \\
1965               0 & 0 & 1 & 0
1966            \end{array}\right) \quad \mbox{and} \quad
1967S^T \; = \; \left( \begin{array}{cccc}
1968               0 & 0 & 1 & 0 \\
1969               0 & 0 & 0 & 1
            \end{array}\right) \enspace .
1971\end{eqnarray*}
1972Provided the vector {\sf zp} is consistent in that its Cartesian and polar
1973components describe the same point in the plane the resulting tuple
1974{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
1975$n=4$, $p=2$ and $d$
1976as desired will yield the implicit derivatives, provided
1977{\sf tensor} has been allocated appropriately of course and $S$ has the value
1978given above.
1979%
The example is atypical in that the implicit function could also be
1981obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
1982the subset of $z$ components that are to be considered as truly
1983independent can be selected and altered with next to no effort at all.
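
For this example the first-order implicit derivatives can be verified by
hand. The following calculation is our own check, assuming $z_2 \neq 0$,
$z_3 \neq 0$ and $\sin(z_4) \neq 0$. With
$F(z) = (G_1(z),\, G_2(z),\, z_1,\, z_3)^T$ the Jacobian reads
\[
F'(z) \; = \; \left( \begin{array}{cccc}
               2 z_1  & 2 z_2 & -2 z_3     & 0 \\
               -1/z_3 & 0     & z_1/z_3^2  & -\sin(z_4) \\
               1      & 0     & 0          & 0 \\
               0      & 0     & 1          & 0
            \end{array}\right) \enspace .
\]
Solving $F'(z)\, \partial z/\partial x = S$ column by column yields
\begin{align*}
  \frac{\partial z_2}{\partial x_1} & = -\frac{z_1}{z_2} , &
  \frac{\partial z_4}{\partial x_1} & = -\frac{1}{z_3 \sin(z_4)} , \\
  \frac{\partial z_2}{\partial x_2} & = \frac{z_3}{z_2} , &
  \frac{\partial z_4}{\partial x_2} & = \frac{z_1}{z_3^2 \sin(z_4)} ,
\end{align*}
together with the trivial entries
$\partial z_1/\partial x_1 = \partial z_3/\partial x_2 = 1$ and
$\partial z_1/\partial x_2 = \partial z_3/\partial x_1 = 0$.
For $z_2 > 0$ and $z_3 > 0$ these values agree with differentiating the
explicit formulas $z_2 = \sqrt{z_3^2 - z_1^2}$ and $z_4 = \arccos(z_1/z_3)$.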
1984
1985The presented drivers are prototyped in the header file
1986\verb=<adolc/drivers/taylor.h>=. As indicated before this header
1987is included by the global header file \verb=<adolc/adolc.h>= automatically
1988(see \autoref{ssec:DesIH}).
1989The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
1990\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
1991show the application of the procedures described here.
1992%
1993%
1994%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
1995\section{Basic Drivers for the Forward and Reverse Mode}
1996\label{forw_rev_ad}
1997%
1998In this section, we present tailored drivers for different
1999variants of the forward mode and the reverse mode, respectively.
2000For a better understanding, we start with a short
2001description of the mathematical background.
2002
2003Provided no arithmetic exception occurs,
2004no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
2005{\sf fabs} does not yield zero,
2006and all special functions were evaluated in the
2007interior of their domains, the functional relation between the input
2008variables $x$
2009and the output variables $y$ denoted by $y=F(x)$ is in
2010fact analytic.  In other words, we can compute arbitrarily high
2011derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
2012by the active section.
2013We find it most convenient to describe and
2014compute derivatives in terms of univariate Taylor expansions, which
2015are truncated after the highest derivative degree $d$ that is desired
2016by the user. Let
2017\begin{equation}
2018\label{eq:x_of_t}
2019x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \;  I\!\!R \; \mapsto \;
2020I\!\!R^n
2021\end{equation} 
2022denote any vector polynomial in the scalar variable $t \in I\!\!R$.
2023In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2024The Taylor coefficient vectors
2025\[ x_j \; = \; 
2026\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
2027\right |_{t=0}
2028\] 
2029are simply the scaled derivatives of $x(t)$ at the parameter
2030origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2031visualized as tangent and curvature at the base point $x_0$,
2032respectively.
2033Provided that $F$ is $d$ times continuously differentiable, it
2034follows from the chain rule that the image path
2035\begin{equation}
2036\label{eq:rela}
2037 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2038\end{equation}
2039is also smooth and has $(d+1)$ Taylor coefficient vectors
2040$y_j \in I\!\!R^m$ at $t=0$, so that
2041\begin{equation}
2042\label{eq:series}
2043y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
2044\end{equation}
2045Also as a consequence of the chain rule, one can observe that
2046each $y_j$ is uniquely and smoothly determined by the coefficient
2047vectors $x_i$ with $i \leq j$.  In particular we have
2048\begin{align}
2049\label{eq:y_0y_1}
2050  y_0 & = F(x_0) \nonumber \\
2051  y_1 & = F'(x_0) x_1 \nonumber\\
2052  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
2053  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
2054          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
2055  & \ldots\nonumber
2056\end{align}
2057In writing down the last equations we have already departed from the
2058usual matrix-vector notation. It is well known that the number of
2059terms that occur in these ``symbolic'' expressions for
2060the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2061the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
with $j$. Fortunately, this exponential growth does not occur
in automatic differentiation, where the many terms are
implicitly combined so that storage and operation count grow only
quadratically in the bound $d$ on $j$.
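
As a small check of \eqref{eq:y_0y_1}, consider the scalar function
$F(x)=x^2$ with $n=m=1$ and $d=3$, which is our own choice of illustration.
Squaring the path $x(t)$ gives
\[
y(t) \; = \; x(t)^2 \; = \; x_0^2 + 2x_0x_1\, t + (2x_0x_2 + x_1^2)\, t^2
         + (2x_0x_3 + 2x_1x_2)\, t^3 + O(t^4) ,
\]
in agreement with $F'(x_0)=2x_0$, $F''(x_0)=2$ and $F'''(x_0)=0$.
The same coefficients follow from the convolution rule for a product
$v(t) = u(t)\,w(t)$ of two truncated series,
\[
v_k \; = \; \sum_{j=0}^{k} u_j\, w_{k-j} , \qquad k = 0,\ldots,d ,
\]
which needs $k+1$ multiplications for the $k$-th coefficient and hence on the
order of $d^2$ operations in total. This is one instance of how an elementary
operation can be propagated at quadratic rather than exponential cost.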
2066
2067Provided $F$ is analytic, this property is inherited by the functions
2068\[
2069y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
2070\]
2071and their derivatives satisfy the identities
2072\begin{equation}
2073\label{eq:identity}
2074\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
2075{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
2076\end{equation}
2077as established in \cite{Chri91a}. This yields in particular
2078\begin{align*}
2079  \frac{\partial y_0}{\partial x_0} =
2080  \frac{\partial y_1}{\partial x_1} =
2081  \frac{\partial y_2}{\partial x_2} =
2082  \frac{\partial y_3}{\partial x_3} =
2083  A_0 & = F'(x_0) \\
2084  \frac{\partial y_1}{\partial x_0} =
2085  \frac{\partial y_2}{\partial x_1} =
2086  \frac{\partial y_3}{\partial x_2} =
2087  A_1 & = F''(x_0) x_1 \\
2088  \frac{\partial y_2}{\partial x_0} =
2089  \frac{\partial y_3}{\partial x_1} =
2090  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
2091  \frac{\partial y_3}{\partial x_0} =
2092  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
2093          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
2094  & \ldots
2095\end{align*}
2096The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2097coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2098interest primarily in the context of ordinary differential
2099equations and differential algebraic equations.
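
The identities \eqref{eq:identity} can be checked on a simple case. Consider,
purely as an illustration of ours, the scalar function $F(x)=x^2$, for which
\eqref{eq:y_0y_1} gives $y_0 = x_0^2$, $y_1 = 2x_0x_1$ and
$y_2 = 2x_0x_2 + x_1^2$. Then
\begin{align*}
  \frac{\partial y_2}{\partial x_2} & = 2x_0 =
  \frac{\partial y_1}{\partial x_1} =
  \frac{\partial y_0}{\partial x_0} = A_0 = F'(x_0) , \\
  \frac{\partial y_2}{\partial x_1} & = 2x_1 =
  \frac{\partial y_1}{\partial x_0} = A_1 = F''(x_0) x_1 , \\
  \frac{\partial y_2}{\partial x_0} & = 2x_2 = A_2
  = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 .
\end{align*}
Moreover, $F'(x(t)) = 2x(t) = 2x_0 + 2x_1 t + 2x_2 t^2 + \cdots$, so that
$A_0$, $A_1$, $A_2$ are indeed the leading Taylor coefficients of the
Jacobian path.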
2100
2101Given the tape of an active section and the coefficients $x_j$,
2102the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2103by appropriate calls to the ADOL-C forward mode implementations and
2104the ADOL-C reverse mode implementations. The scalar versions of the forward
2105mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2106to the $(y_j)_{j\leq d}$. The vector versions of the forward
2107mode propagate families of $p\geq 1$ such truncated Taylor series
2108in order to reduce the relative cost of the overhead incurred
2109in the tape interpretation. In detail, ADOL-C provides
2110\begin{tabbing}
2111\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2112\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
2113\>{\sf short int tag;}         \> // tape identification \\
2114\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2115\>{\sf int n;}                 \> // number of independent variables $n$\\
2116\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2117\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
2118\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
2119\end{tabbing}                 
2120for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2121$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2122flag {\sf keep} plays a similar role as in the call to 
2123{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2124the first Taylor coefficients of all intermediate quantities into a buffered
2125temporary file, i.e., the value stack, in preparation for a subsequent
2126reverse mode evaluation. The value {\sf keep} $=1$
prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as explained below.
2128
2129To compute first-order derivatives, one has
2130\begin{tabbing}
2131\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2132\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
2133\>{\sf short int tag;}         \> // tape identification \\
2134\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2135\>{\sf int n;}                 \> // number of independent variables $n$\\
2136\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2137\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2138\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
2139\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2140\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
2141\end{tabbing}                 
2142for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
2143$0\leq\text{\sf keep}\leq 2$, where
2144\begin{align*}
2145\text{\sf keep} = \left\{\begin{array}{cl}
2146       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2147       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2148       \end{array}\right.
2149\end{align*}
as explained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
2151ADOL-C provides
2152\begin{tabbing}
2153\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2154\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
2155\>{\sf short int tag;}         \> // tape identification \\
2156\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2157\>{\sf int n;}                 \> // number of independent variables $n$\\
2158\>{\sf int p;}                 \> // number of directions\\
2159\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2160\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
2161\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x_0)X$
2163\end{tabbing}                 
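
One common use of this driver, stated here as a hedged illustration rather
than a prescription, is the accumulation of the full Jacobian. Choosing
$p=n$ and the identity as tangent matrix gives
\[
X = I_n \quad \Longrightarrow \quad
Y \; = \; F'(x_0)\, I_n \; = \; F'(x_0) \; \in \; I\!\!R^{m \times n} ,
\]
whereas a general $X \in I\!\!R^{n \times p}$ with columns $X e_k$ yields only
the $p$ directional derivatives $Y e_k = F'(x_0)\, X e_k$, $k=1,\ldots,p$.
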
For the computation of higher derivatives, ADOL-C provides the driver
2165\begin{tabbing}
2166\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2167\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
2168\>{\sf short int tag;}         \> // tape identification \\
2169\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2170\>{\sf int n;}                 \> // number of independent variables $n$\\
2171\>{\sf int d;}                 \> // highest derivative degree $d$\\
2172\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2173\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2174\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
2175\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2176\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
2177\end{tabbing}
2178implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2179The rows of the matrix $X$ must correspond to the independent variables in the order of their
2180initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2181$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2182\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2183dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2184The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2185Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2186computes the values
2187$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2188$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2189$0\leq\text{\sf keep}\leq d+1$, with
2190\begin{align*}
2191\text{\sf keep}  \left\{\begin{array}{cl}
2192       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2193       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2194       \end{array}\right.
2195\end{align*}
2196Once more, there is also a vector version given by
2197\begin{tabbing}
2198\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2199\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2200\>{\sf short int tag;}         \> // tape identification \\
2201\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2202\>{\sf int n;}                 \> // number of independent variables $n$\\
2203\>{\sf int d;}                 \> // highest derivative degree $d$\\
2204\>{\sf int p;}                 \> // number of directions $p$\\
2205\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2206\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
2207\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2208\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
2209\end{tabbing}
2210for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
2211$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and 
2212$Y=[Y_1,Y_2,\ldots,Y_d]$.
2213
2214There are also overloaded versions providing a general {\sf forward}-call.
2215Details of the appropriate calling sequences are given in \autoref{forw_rev}.
2216
Once the required information has been generated by a forward mode evaluation
with an appropriate value of the parameter {\sf keep}, one may use the
following implementation variants of the reverse mode. To compute first-order derivatives,
one can use
2221\begin{tabbing}
2222\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2223\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
2224\>{\sf short int tag;}         \> // tape identification \\
2225\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2226\>{\sf int n;}                 \> // number of independent variables $n$\\
2227\>{\sf double u[m];}           \> // weight vector $u$\\
2228\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
2229\end{tabbing}                 
2230as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2231the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2232{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2233  o}rder {\bf v}ector reverse mode driver is given by
2234\begin{tabbing}
2235\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2236\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
2237\>{\sf short int tag;}         \> // tape identification \\
2238\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2239\>{\sf int n;}                 \> // number of independent variables $n$\\
2240\>{\sf int q;}                 \> // number of weight vectors $q$\\
2241\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2242\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
2243\end{tabbing}                 
2244that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2245{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2246ADOL-C provides
2247\begin{tabbing}
2248\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2249\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
2250\>{\sf short int tag;}         \> // tape identification \\
2251\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2252\>{\sf int n;}                 \> // number of independent variables $n$\\
2253\>{\sf int d;}                 \> // highest derivative degree $d$\\
2254\>{\sf double u[m];}           \> // weight vector $u$\\
2255\>{\sf double Z[n][d+1];}      \> // resulting adjoints
2256\end{tabbing}                 
2257as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2258the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2259\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2260{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
2261\begin{tabbing}
2262\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2263\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
2264\>{\sf short int tag;}         \> // tape identification \\
2265\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2266\>{\sf int n;}                 \> // number of independent variables $n$\\
2267\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2269\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
2270\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
2271\end{tabbing}                 
2272as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2273the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2274\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2275{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2276After the function call, the last argument of {\sf hov\_reverse} 
2277contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2278has a value that characterizes the functional relation between the
2279$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2280$x_j$ as:
2281\begin{center}
2282\begin{tabular}{ll}
2283 0 & trivial \\
2284 1 & linear
2285\end{tabular} \hspace*{4ex}
2286\begin{tabular}{ll}
2287 2 & polynomial\\
2288 3 & rational
2289\end{tabular} \hspace*{4ex}
2290\begin{tabular}{ll}
2291 4 & transcendental\\
2292 5 & non-smooth
2293\end{tabular}
2294\end{center}
2295Here, ``trivial'' means that there is no dependence at all and ``linear'' means
2296that the partial derivative is a constant that
does not depend on other variables either. ``Non-smooth'' means that one of
2298the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2299where it is not differentiable.  All positive labels
2300$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2301in fact be simpler, for example due to exact cancellations. 
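
To illustrate the labels, consider the following small function of our own
choosing with $n=2$, $m=3$ and $U=I_3$, so that $q=3$:
\[
F(x) \; = \; \left( \begin{array}{c}
               x_1 + 2\,x_2 \\
               x_1\, x_2 \\
               \sin(x_1)
            \end{array}\right) .
\]
From the definitions above one would expect the pattern
\[
{\sf nz} \; = \; \left( \begin{array}{cc}
               1 & 1 \\
               2 & 2 \\
               4 & 0
            \end{array}\right) ,
\]
since the first component is linear in both variables, the second is
polynomial in both, and the third is transcendental in the first variable and
independent of the second. This is an idealized pattern derived by hand.
Because the labels are pessimistic, values reported in practice may be larger.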
2302
2303There are also overloaded versions providing a general {\sf reverse}-call.
2304Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
2305%
2306%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2307\section{Overloaded Forward and Reverse Calls}
2308\label{forw_rev}
2309%
2310In this section, the several versions of the {\sf forward} and
2311{\sf reverse} routines, which utilize the overloading capabilities
of C++, are described in detail. With the exception of the bit pattern
2313versions all interfaces are prototyped in the header file
2314\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
2315and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
2316C and Fortran-callable versions prototyped in the same header file.
2317The bit pattern versions of {\sf forward} and {\sf reverse} introduced
in \autoref{ProBit} are prototyped in the header file
2319\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2320file \verb=<adolc/interfaces.h>= automatically.
2321%
2322%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2323\subsection{The Scalar Case}
2324%
2325\label{ScaCas}
2326%     
2327Given any correct tape, one may call from within
2328the generating program, or subsequently during another run, the following
2329procedure:
2330%
2331\begin{tabbing}
2332\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2333\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
2334\>{\sf short int tag;}         \> // tape identification \\
2335\>{\sf int m;}                 \> // number of dependent variables $m$\\
2336\>{\sf int n;}                 \> // number of independent variables $n$\\
2337\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2338\>{\sf  int keep;}             \> // flag for reverse sweep \\ 
2339\>{\sf  double X[n][d+1];}     \> // Taylor coefficients $X$ of
2340                                     independent variables \\
2341\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
2342                                     in \eqref{eq:series}
2343\end{tabbing}
2344%
2345The rows of the matrix $X$ must correspond to the independent variables in the order of their
2346initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2347$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2348\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2349correspond to the
2350dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2351The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2352Taylor coefficient vectors as in \eqref{eq:series}.
2353Thus the first column of $Y$ contains the
2354function value $F(x)$ itself, the next column represents the first
2355Taylor coefficient vector of $F$, and the last column the
2356$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2357how many Taylor coefficients of all intermediate quantities are
2358written into the value stack as explained in \autoref{forw_rev_ad}.
2359 If {\sf keep} is omitted, it defaults to 0.
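
As an illustration of this layout (our own example with $n=2$ and $d=2$,
where the symbols $\xi_1,\xi_2$ and $e_1$ are introduced only for this
purpose), the path $x(t) = x_0 + t\, e_1$ through a base point
$x_0 = (\xi_1,\xi_2)^T$ along the first coordinate direction $e_1$
corresponds to
\[
X \; = \; \left( \begin{array}{ccc}
               \xi_1 & 1 & 0 \\
               \xi_2 & 0 & 0
            \end{array}\right) .
\]
By \eqref{eq:y_0y_1}, the columns of $Y$ then contain
\[
y_0 = F(x_0), \qquad
y_1 = \frac{\partial F}{\partial \xi_1}(x_0), \qquad
y_2 = \frac{1}{2}\, \frac{\partial^2 F}{\partial \xi_1^2}(x_0) ,
\]
so the second derivative is recovered from the last column after
multiplication by $2! = 2$.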
2360
2361The given {\sf tag} value is used by {\sf forward} to determine the
2362name of the file on which the tape was written. If the tape file does
2363not exist, {\sf forward} assumes that the relevant
2364tape is still in core and reads from the buffers.
2365After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2366{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2367the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
2368identifier {\sf tag}. When $u$ is a vector
2369and $Z$ an $n\times (d+1)$ matrix
2370{\sf reverse} is executed in the scalar mode by the calling
2371sequence
2372%
2373\begin{tabbing}
2374\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
2375\>{\sf int reverse(tag,m,n,d,u,Z)}\\
2376\>{\sf short int tag;}         \> // tape identification \\
2377\>{\sf int m;}                 \> // number of dependent variables $m$\\
2378\>{\sf int n;}                 \> // number of independent variables $n$\\
2379\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2380\>{\sf  double u[m];}          \> // weighting vector $u$\\
2381\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$ 
2382\end{tabbing}
2383to compute
2384the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2385\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
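
For a scalar illustration of our own, take $F(x)=x^2$ with $m=n=1$, $d=1$ and
the weight $u=1$. After a forward sweep with {\sf keep} $=d+1=2$ along the
path $x(t) = x_0 + x_1 t$, the formulas above give
\[
z_0 \; = \; u\, F'(x_0) \; = \; 2x_0 , \qquad
z_1 \; = \; u\, F''(x_0)\, x_1 \; = \; 2x_1 ,
\]
that is, $Z = [2x_0,\, 2x_1]$, which are the first two Taylor coefficients of
$u\, F'(x(t)) = 2\,x(t)$.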
2386%
2387%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2388\subsection{The Vector Case}
2389%
2390\label{vecCas}
2391%
2392When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence
2393%
2394\begin{tabbing}
2395\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
2396\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
2397\>{\sf short int tag;}         \> // tape identification \\
2398\>{\sf int m;}                 \> // number of dependent variables $m$\\
2399\>{\sf int n;}                 \> // number of independent variables $n$\\
2400\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2401\>{\sf int q;}                 \> // number of weight vectors $q$\\
2402\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2403\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
2404\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
2405\end{tabbing}
2406%
2407to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2408\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
When the arguments {\sf q} and {\sf U} are omitted, they default to
$m$ and the identity matrix of order $m$, respectively. 
2411
2412Through the optional argument {\sf nz} of {\sf reverse} one can compute
2413information about the sparsity pattern of $Z$ as described in detail
2414in the previous \autoref{forw_rev_ad}.
2415
2416The return values of {\sf reverse} calls can be interpreted according
2417to \autoref{retvalues}, but negative return values are not
2418valid, since the corresponding forward sweep would have
stopped without completing the necessary Taylor file.
2420The return value of {\sf reverse} may be higher
2421than that of the preceding {\sf forward} call because some operations
2422that were evaluated  at a critical argument during the forward sweep
2423were found not to impact the dependents during the reverse sweep.
2424
2425In both scalar and vector mode, the degree $d$ must agree with
2426{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2427equal to zero if {\sf reverse} directly follows the taping of an active
2428section. Otherwise, {\sf reverse} will return control with a suitable error
2429message.
2430In order to avoid possible confusion, the first four arguments must always be
2431present in the calling sequence. However, if $m$ or $d$
2432attain their trivial values 1 and 0, respectively, then
2433corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2434{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2435indirection.  For example, we may call
2436{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2437{\sf double g[n]} 
2438to calculate a gradient of a scalar-valued function.
2439
2440Sometimes it may be useful to perform a forward sweep for families of
2441Taylor series with the same leading term.
2442This vector version of {\sf forward} can be called in the form
2443%
2444\begin{tabbing}
2445\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2446\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2447\>{\sf short int tag;}         \> // tape identification \\
2448\>{\sf int m;}                 \> // number of dependent variables $m$\\
2449\>{\sf int n;}                 \> // number of independent variables $n$\\
2450\>{\sf int d;}                 \> // highest derivative degree $d$\\
2451\>{\sf int p;}                 \> // number of Taylor series $p$\\
2452\>{\sf  double x0[n];}          \> // values of independent variables $x_0$\\
2453\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
2454\>{\sf double y0[m];}           \> // values of dependent variables $y_0$\\
2455\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
2456\end{tabbing}
2457%
2458where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2459and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2460degree 0. There is no option to keep the values of active variables
2461that are going out of scope or that are overwritten. Therefore this
2462function cannot prepare a subsequent reverse sweep.
2463The return integer serves as a flag to indicate quadratures or altered
2464comparisons as described above in \autoref{reuse_tape}.
2465
2466Since the calculation of Jacobians is probably the most important
2467automatic differentia\-tion task, we have provided a specialization
2468of vector {\sf forward} to the case where $d = 1$. This version can be
2469called in the form
2470%
2471\begin{tabbing}
2472\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2473\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
2474\>{\sf short int tag;}         \> // tape identification \\
2475\>{\sf int m;}                 \> // number of dependent variables $m$\\
2476\>{\sf int n;}                 \> // number of independent variables $n$\\
2477\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
2478\>{\sf double x[n];}          \> // values of independent variables $x_0$\\
2479\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
2480\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
2481\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
2482\end{tabbing}
2483%
2484When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2485the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2486one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2487chosen initialization of $X$. In a workstation environment a value
2488of $p$ somewhere between $10$ and $50$
2489appears to be fairly optimal. For smaller $p$ the interpretive
2490overhead is not appropriately amortized, and for larger $p$ the
2491$p$-fold increase in storage causes too many page faults. Therefore,
large Jacobians that cannot be compressed via column coloring,
as could be done for example using the driver {\sf sparse\_jac},
should be ``strip-mined'' in the sense that the above
2495first-order-vector version of {\sf forward} is called
2496repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2497a partition of the identity matrix of order $n$.
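
The strip-mining idea can be sketched in plain C++ without any ADOL-C calls. In the sketch below, the callback type {\sf JacobianTimesSeed} is a hypothetical stand-in for the first-order vector version of {\sf forward}: it maps an $n\times p$ seed matrix $X$ to $Y=F^\prime(x_0)\,X$. The names {\sf strip\_mined\_jacobian} and {\sf multiply} are assumptions made for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for the first-order vector forward sweep:
// maps an n x p seed matrix X to the m x p product Y = F'(x0) X.
using JacobianTimesSeed = std::function<Matrix(const Matrix&)>;

// Dense product A * X, used here only to emulate Y = F'(x0) X.
Matrix multiply(const Matrix& A, const Matrix& X) {
    const std::size_t rows = A.size();
    const std::size_t inner = X.size();
    const std::size_t cols = X.empty() ? 0 : X[0].size();
    Matrix Y(rows, std::vector<double>(cols, 0.0));
    for (std::size_t j = 0; j < rows; ++j)
        for (std::size_t i = 0; i < inner; ++i)
            for (std::size_t k = 0; k < cols; ++k)
                Y[j][k] += A[j][i] * X[i][k];
    return Y;
}

// Assemble the m x n Jacobian from successive column strips of width <= p.
Matrix strip_mined_jacobian(std::size_t m, std::size_t n, std::size_t p,
                            const JacobianTimesSeed& jac_times_seed) {
    assert(p > 0);
    Matrix J(m, std::vector<double>(n, 0.0));
    for (std::size_t first = 0; first < n; first += p) {
        // The last strip may be narrower than p.
        const std::size_t width = std::min(p, n - first);
        // Seed: columns first .. first+width-1 of the identity of order n.
        Matrix X(n, std::vector<double>(width, 0.0));
        for (std::size_t k = 0; k < width; ++k) X[first + k][k] = 1.0;
        const Matrix Y = jac_times_seed(X);
        // Copy the strip into the matching Jacobian columns.
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t k = 0; k < width; ++k) J[j][first + k] = Y[j][k];
    }
    return J;
}
```

Only one $n\times p$ seed and one $m\times p$ result are alive at any time, which is the point of the technique: the storage growth is bounded by the chosen $p$ rather than by $n$.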
2498%
2499%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2500\subsection{Dependence Analysis}
2501%
2502\label{ProBit}
2503%
2504The sparsity pattern of Jacobians is often needed to set up data structures
2505for their storage and factorization or to allow their economical evaluation
by compression \cite{BeKh96}. Compared to the evaluation of the full
Jacobian $F'(x_0)$ in real arithmetic, computing the Boolean matrix
2508$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2509pattern in the obvious way requires a little less run-time and
2510certainly a lot less memory.
2511
2512The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2513of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2514dependence between the $i$-th independent variable $x_{i}$ and
2515the $j$-th dependent variable $y_{j}$. Just like for real arguments
2516one would wish to compute matrix-vector and vector-matrix products
2517of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2518by appropriate {\sf forward} and {\sf reverse} routines where
2519$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
2520Here, multiplication corresponds to logical
2521{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
2522semi-ring.
2523
2524For practical reasons it is assumed that
2525$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2526$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2527$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2528as bit patterns. Moreover $p$ or $q$ such integer vectors may
2529be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2530and $U\in\N^{q \times m}$, which naturally correspond
2531to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
2532and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
bit pattern versions of {\sf forward} and {\sf reverse} make it possible
to compute integer matrices $Y\in\N^{m \times p}$ and
$Z\in\N^{q \times n}$ corresponding to
2536\begin{equation}
2537\label{eq:int_forrev}
2538\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad 
2539\tilde{Z} = \tilde{U}\tilde{P} \, ,
2540\end{equation} 
2541respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
and $\tilde{Z}\in\{0,1\}^{\left(sq\right)\times n}$.
2543In general, the application of the bit pattern versions of
2544{\sf forward} or {\sf reverse} can be interpreted as
2545propagating dependences between variables forward or backward, therefore
2546both the propagated integer matrices and the corresponding
2547Boolean matrices are called {\em dependence structures}.
2548 
2549The bit pattern {\sf forward} routine
2550%
2551\begin{tabbing}
2552\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2553\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
2554\>{\sf short int tag;}              \> // tape identification \\
2555\>{\sf int m;}                      \> // number of dependent variables $m$\\
2556\>{\sf int n;}                      \> // number of independent variables $n$\\
2557\>{\sf int p;}                      \> // number of integers propagated $p$\\
2558\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
2559\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
2560\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
2561\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
2562                                     \eqref{eq:int_forrev}\\
2563\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
2564\end{tabbing}
2565%
2566can be used to obtain the dependence structure $Y$ for a given dependence structure
2567$X$. The dependence structures are
2568represented as arrays of {\sf unsigned long int} the entries of which are
2569interpreted as bit patterns as described above.   
2570For example, for $n=3$ the identity matrix $I_3$ should be passed
2571with $p=1$ as the $3 \times 1$ array
2572\begin{eqnarray*}
2573{\sf X} \; = \;
2574\left( \begin{array}{r}
2575         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2576         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2577         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
2578       \end{array} \right)
2579\end{eqnarray*}
2580in the 4-byte long integer format. The parameter {\sf mode} determines
2581the mode of dependence analysis as explained already in \autoref{sparse}.
2582
2583A call to the corresponding bit pattern {\sf reverse} routine
2584%
2585\begin{tabbing}
2586\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2587\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
2588\>{\sf short int tag;}         \> // tape identification \\
2589\>{\sf int m;}                 \> // number of dependent variables $m$\\
2590\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int q;}                 \> // number of integers propagated $q$\\
2592\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
2593\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
2594                                     to \eqref{eq:int_forrev}\\
2595\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
2596\end{tabbing}
2597%
2598yields the dependence structure $Z$ for a given dependence structure
2599$U$.
2600
2601To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2602as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2603with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2604corresponding dependence structure $X$ or $U$ must be defined to represent 
2605the identity matrix of the respective dimension.
Since a multiple of $s$ Boolean vectors is always propagated,
there may be superfluous vectors, which can be set to zero.
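
The packing of the identity into $p=\lceil n/s\rceil$ words per row, and the Boolean product $\tilde{Y}=\tilde{P}\tilde{X}$ with {\sf AND} as multiplication and {\sf OR} as addition, can be sketched in plain C++. This is an illustration under stated assumptions, not the ADOL-C implementation: the matrix $\tilde{P}$ is given explicitly instead of being read from a tape, the leading bit of a word is taken as the first Boolean column as in the $I_3$ example above, and all function names are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Word = unsigned long;
using WordMatrix = std::vector<std::vector<Word>>;

// s = number of Boolean vectors packed into one unsigned long.
constexpr std::size_t kBitsPerWord = CHAR_BIT * sizeof(Word);

// Smallest p with s * p >= n.
std::size_t words_needed(std::size_t n) {
    return (n + kBitsPerWord - 1) / kBitsPerWord;
}

// n x p seed representing the identity of order n; superfluous bits stay zero.
WordMatrix identity_seed(std::size_t n) {
    WordMatrix X(n, std::vector<Word>(words_needed(n), 0UL));
    for (std::size_t i = 0; i < n; ++i)
        X[i][i / kBitsPerWord] = Word{1} << (kBitsPerWord - 1 - i % kBitsPerWord);
    return X;
}

// Y = P X in the Boolean semi-ring: a row of Y is the OR of all rows of X
// whose independent variable influences that dependent variable.
WordMatrix propagate(const std::vector<std::vector<bool>>& P, const WordMatrix& X) {
    const std::size_t m = P.size();
    const std::size_t p = X.empty() ? 0 : X[0].size();
    WordMatrix Y(m, std::vector<Word>(p, 0UL));
    for (std::size_t j = 0; j < m; ++j)
        for (std::size_t i = 0; i < X.size(); ++i)
            if (P[j][i])
                for (std::size_t k = 0; k < p; ++k) Y[j][k] |= X[i][k];
    return Y;
}

// True if dependent j depends on independent i according to Y.
bool depends(const WordMatrix& Y, std::size_t j, std::size_t i) {
    const std::size_t shift = kBitsPerWord - 1 - i % kBitsPerWord;
    return ((Y[j][i / kBitsPerWord] >> shift) & Word{1}) != 0;
}
```

With the identity as seed, the propagated matrix $Y$ is simply the packed sparsity pattern, so {\sf depends(Y,j,i)} recovers the entry $\tilde{P}_{ji}$.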
2608
2609The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2610correspond to those described in \autoref{retvalues}.
2611
2612One can control the storage growth by the factor $p$ using
2613``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
2614groups of columns or respectively rows at a time, i.e.~partitioning
2615$X$ or $U$ appropriately as described for the computation of Jacobians
2616in \autoref{vecCas}.
2617%
2618%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2619%
2620\section{Advanced algorithmic differentiation in ADOL-C}
2621\label{adv_ad}
2622%
2623\subsection{Differentiating external functions}
2624%
2625Ideally, AD is applied to a given computation as a whole.
2626This is not always possible because parts of the computation may
be coded in a different programming language or may be a call to
an external library.
In the former case one may want to differentiate the parts in
question with a different AD tool or provide hand-written derivatives.
In practice, sophisticated projects usually evolve over a long period of time.
Within this process, a heterogeneous code base for the project
develops, which may include the incorporation of external solutions,
changes in programming paradigms or even of programming languages.
The computation of derivative values becomes equally heterogeneous.
Hence, different \mbox{AD-tools} may be combined with hand-derived
codes based on the same or different programming languages.
ADOL-C supports such settings by the concept of externally
differentiated functions, that is, functions
2641not differentiated by ADOL-C itself. The required derivatives
2642have to be provided by the user.
2643
2644For this purpose, it is required that the externally differentiated
function (for example named {\sf\em euler\_step}) has the following signature.
2646\smallskip
2647
2648\noindent
2649\hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);}
2650\medskip
2651
2652\noindent
Note that the formal parameters in the signature have {\sf double} type, that is,
2654they are not active as in the original program before the ADOL-C type change.
2655The externally differentiated function has to
2656be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows.
2657\smallskip
2658
2659\noindent
2660\hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);}.
2661\smallskip
2662
2663\noindent
2664This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function.
Then, the user has to supply the function pointers for the callback methods (here, for example,
{\sf zos\_for\_euler\_step} and {\sf  fos\_rev\_euler\_step}) that the user implemented
to compute the derivatives as follows.
2668\begin{tabbing}
2669\hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\
2670             \> {\sf // function pointer for computing
2671               Zero-Order-Scalar (=zos)}\\
2672             \> {\sf // forward information}\\
2673%             \> {\sf edf-$>$dp\_x = xp;}\\
2674%             \> {\sf edf-$>$dp\_y = yp;}\\
2675%             \> {\sf // double arrays for arguments and results}\\
2676             \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\
2677             \> {\sf // function pointer for computing
2678               First-Order-Scalar (=fos)}\\ 
             \> {\sf // reverse information}
2680\end{tabbing}
To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y},
one has to provide (allocate) both variants. That is, if the call to {\sf euler\_step} was originally
\medskip

\noindent
\hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);}
\medskip

\noindent
then the ADOL-C type change for the calling context will turn {\sf sOld} and {\sf sNew} into {\sf adouble} pointers.
To trigger the appropriate action for the derivative computation (i.e., creating an external differentiation entry on the trace),
2688the original call to the externally differentiated function
2689must be substituted by
2690\medskip
2691
2692\noindent
2693\hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);}
2694\medskip
2695
2696\noindent
Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively)
to the active arguments {\sf sOld} and {\sf sNew} of type {\sf adouble}.
The usage of the external function facility is illustrated by the
example \verb=ext_diff_func= contained in
\verb=examples/additional_examples/ext_diff_func=.
There, the externally differentiated function is also C code, but
handling it as an externally differentiated function also decreases the
overall required tape size.
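
To make the roles of the callbacks concrete, the following plain C++ sketch shows a hand-written pair for one explicit Euler step of the linear system $\dot{x}=(x_2,-x_1)$. It does not call ADOL-C. The step size {\sf kStep} and the right-hand side are hypothetical, and the argument order of {\sf fos\_rev\_euler\_step} is an assumption that should be checked against the comments in \verb=ADOL-C/include/adolc/externfcts.h= before it is assigned to {\sf edf-$>$fos\_reverse}.

```cpp
#include <cassert>

// Hypothetical step size h of the explicit Euler scheme.
const double kStep = 0.1;

// Passive function with the signature required above:
// y = x + h * f(x) for f(x) = (x[1], -x[0]).
int euler_step(int n, double *x, int m, double *y) {
    if (n != 2 || m != 2) return -1;
    y[0] = x[0] + kStep * x[1];
    y[1] = x[1] - kStep * x[0];
    return 0;
}

// Zero-order scalar forward callback: only the function value is needed.
int zos_for_euler_step(int n, double *x, int m, double *y) {
    return euler_step(n, x, m, y);
}

// First-order scalar reverse callback (assumed argument order):
// computes z = J^T u for the Jacobian J = [[1, h], [-h, 1]] of the step.
int fos_rev_euler_step(int m, double *u, int n, double *z,
                       double *x, double *y) {
    (void)x;  // the step is linear, so J does not depend on the state
    (void)y;
    if (n != 2 || m != 2) return -1;
    z[0] = u[0] - kStep * u[1];
    z[1] = kStep * u[0] + u[1];
    return 0;
}
```

For a nonlinear step the reverse callback would have to evaluate the Jacobian at the passive argument {\sf x}, which is why both the passive and the active variants of the arguments are kept available.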
2705
2706%
2707\subsection{Advanced algorithmic differentiation of time integration processes}
2708%
2709For many time-dependent applications, the corresponding simulations
2710are based on ordinary or partial differential equations.
Furthermore, frequently there are quantities that influence the
result of the simulation and can be seen as a control of the system.
To compute an approximation of the
simulated process for a time interval $[0,T]$ and evaluate the
desired target function, one applies an
appropriate integration scheme given by
2717\begin{tabbing}
2718\hspace{5mm} \= some initializations yielding $x_0$\\
2719\> for $i=0,\ldots, N-1$\\
2720\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
2721\hspace{5mm} \= evaluation of the target function
2722\end{tabbing}
2723where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
2724time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \rightarrow {\bf R}^n$
defines the time step to compute the state at time $t_{i+1}$. Note that we
2727do not assume a uniform grid.
2728
2729When computing derivatives of the target function with respect to the
2730control, the consequences for the tape generation using the ``basic''
2731taping approach as implemented in ADOL-C so far are shown in the left part of
2732\autoref{fig:bas_tap}.
2733\begin{figure}[htbp] 
2734\begin{center}
2735\hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill
2736\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
2737\end{center}
2738\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
2739\caption{Different taping approaches}
2740\label{fig:bas_tap}
2741\end{figure} 
2742As can be seen, the iterative process is completely
2743unrolled due to the taping process. That is, the tape contains an internal representation of each
time step. Hence, the overall tape comprises a considerable amount of redundant
information as illustrated by the light grey rectangles in
\autoref{fig:bas_tap}.
2747
2748To overcome the repeated storage of essentially the same information,
2749a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
2750the right-hand side of \autoref{fig:bas_tap}. This new
2751capability allows the encapsulation of the time-stepping procedure
such that only the last time step $x_{N} = F(x_{N-1},u_{N-1},t_{N-1})$ is taped as one
2753representative of the time steps in addition to a function pointer to the
2754evaluation procedure $F$ of the time steps.  The function pointer has
2755to be stored for a possibly necessary retaping during the derivative calculation
2756as explained below.
2757
2758Instead of storing the complete tape, only a very limited number of intermediate
2759states are kept in memory. They serve as checkpoints, such that
2760the required information for the backward integration is generated
2761piecewise during the adjoint calculation.
2762For this modified adjoint computation the optimal checkpointing schedules
provided by {\sf revolve} are employed. An adapted version of the
2764software package {\sf revolve} is part of ADOL-C and automatically
2765integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
2766distributed such that computational effort is minimized for the given
2767number of checkpoints and time steps $N$. It is important to note that the overall tape
2768size is drastically reduced due to the advanced taping strategy.  For the
2769implementation of this nested taping we introduced
2770a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
2771handle different internal function representations during the taping
2772procedure and the derivative calculation. This approach allows the generation of a new
2773tape inside the overall tape, where the coupling of the different tapes is based on
2774the {\em external differentiated function} described above.
2775
2776Written under the objective of minimal user effort, the checkpointing routines
2777of \mbox{ADOL-C} need only very limited information. The user must
2778provide two routines as implementation of the time-stepping function $F$ 
2779with the signatures
2780\medskip
2781
2782\noindent
2783\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
2784\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
2785\medskip
2786
2787\noindent
2788where the function names can be chosen by the user as long as the names are
unique. It is possible that the result vector of one time step
2790iteration overwrites the argument vector of the same time step. Then, no
2791copy operations are required to prepare the next time step.
2792
2793At first, the {\sf adouble} version of the time step function has to
2794be {\em registered} using the \mbox{ADOL-C} function
2795\medskip
2796
2797\noindent
2798\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
2799\medskip
2800
2801\noindent
2802This function initializes the structure {\sf cpc}. Then,
2803the user has to provide the remaining checkpointing information
2804by the following commands:
2805\begin{tabbing}
2806\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
             \> {\sf // double variant of the time step function}\\
2808             \> {\sf cpc.setNumberOfSteps(N);}\\
2809             \> {\sf // number of time steps to perform}\\
2810             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
             \> {\sf // number of checkpoints} \\
2812             \> {\sf cpc.setDimensionXY(n);}\\
2813             \> {\sf // dimension of input/output}\\
2814             \> {\sf cpc.setInput(y);}\\
2815             \> {\sf // input vector} \\
2816             \> {\sf cpc.setOutput(y);}\\
2817             \> {\sf // output vector }\\
2818             \> {\sf cpc.setTapeNumber(tag\_check);}\\
2819             \> {\sf // subtape number for checkpointing} \\
2820             \> {\sf cpc.setAlwaysRetaping(false);}\\
2821             \> {\sf // always retape or not ?}
2822\end{tabbing}
2823Subsequently, the time loop in the function evaluation can be
2824substituted by a call of the function
2825\medskip
2826
2827\noindent
2828\hspace*{2cm}{\sf int cpc.checkpointing();}
2829\medskip
2830
2831\noindent
2832Then, ADOL-C computes derivative information using the optimal checkpointing
2833strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
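
The trade-off behind checkpointing can be illustrated by a small self-contained C++ sketch. It is deliberately simpler than what ADOL-C does: the checkpoints are placed equidistantly instead of following the binomial schedule of {\sf revolve}, the state is a scalar, the target function is taken to be $x_N$, and the adjoint of a step is coded by hand instead of being obtained from a subtape. The time step and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kH = 0.01;  // hypothetical uniform step size

// One time step x_{i+1} = F(x_i, u) and its partial derivatives.
double step(double x, double u) { return x + kH * (u - std::sin(x)); }
double step_dx(double x) { return 1.0 - kH * std::cos(x); }
double step_du() { return kH; }

// Reference adjoint: stores all N+1 states, returns d x_N / d u.
double adjoint_full(double x0, double u, std::size_t N) {
    std::vector<double> xs(N + 1);
    xs[0] = x0;
    for (std::size_t i = 0; i < N; ++i) xs[i + 1] = step(xs[i], u);
    double xbar = 1.0, ubar = 0.0;
    for (std::size_t i = N; i-- > 0;) {
        ubar += xbar * step_du();
        xbar *= step_dx(xs[i]);
    }
    return ubar;
}

// Checkpointed adjoint: keeps only every stride-th state and recomputes
// the states of one segment at a time during the reverse sweep.
double adjoint_checkpointed(double x0, double u, std::size_t N,
                            std::size_t stride) {
    assert(stride > 0);
    std::vector<double> checkpoints;  // x_0, x_stride, x_{2 stride}, ...
    double x = x0;
    for (std::size_t i = 0; i < N; ++i) {
        if (i % stride == 0) checkpoints.push_back(x);
        x = step(x, u);
    }
    double xbar = 1.0, ubar = 0.0;
    std::vector<double> segment;
    for (std::size_t c = checkpoints.size(); c-- > 0;) {
        const std::size_t first = c * stride;
        const std::size_t last = std::min(first + stride, N);
        // Recompute x_first .. x_{last-1} from the checkpoint.
        segment.assign(1, checkpoints[c]);
        for (std::size_t i = first; i + 1 < last; ++i)
            segment.push_back(step(segment.back(), u));
        // Reverse sweep over this segment only.
        for (std::size_t i = last; i-- > first;) {
            ubar += xbar * step_du();
            xbar *= step_dx(segment[i - first]);
        }
    }
    return ubar;
}
```

Both routines return the same derivative, but the second one holds at most $\lceil N/\mbox{\sf stride}\rceil$ checkpoints plus one segment of {\sf stride} states at a time, at the price of roughly one additional forward evaluation of the time loop.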
2834
2835The presented driver is prototyped in the header file
2836\verb=<adolc/checkpointing.h>=. This header
2837is included by the global header file \verb=<adolc/adolc.h>= automatically.
2838An example program \verb=checkpointing.cpp= illustrates the
2839checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
2840%
2841%
2842%
2843\subsection{Advanced algorithmic differentiation of fixed point iterations}
2844%
2845Quite often, the state of the considered system denoted by $x\in\R^n$
depends on some design parameters denoted by $u\in\R^m$. One example of this setting
is the flow over an aircraft wing. Here, the shape of the wing that
2848is defined by the design vector $u$ 
2849determines the flow field $x$. The desired quasi-steady state $x_*$
2850fulfills the fixed point equation
2851\begin{align}
2852  \label{eq:fixedpoint}
2853  x_* = F(x_*,u)
2854\end{align}
2855for a given continuously differentiable function
2856$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
2857also exploited by many other applications.
2858
2859Assume that one can apply the iteration 
2860\begin{align}
2861\label{eq:iteration}
2862 x_{k+1} = F(x_k,u)
2863\end{align}
to obtain a linearly converging sequence $\{x_k\}$ generated
for any given control $u\in\R^m$. Then the limit point $x_*\in\R^n$ fulfils the fixed
2866point equation~\eqref{eq:fixedpoint}. Moreover,
2867suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
2868$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
2869Hence, there exists a
2870differentiable function $\phi:\R^m \rightarrow \R^n$,
2871such that $\phi(u) = F(\phi(u),u)$, where the state
2872$\phi(u)$ is a fixed point of $F$ according to a control
2873$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
2874the design vector $u$, derivatives of $\phi$ with respect
2875to $u$ are of particular interest.
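
The structure of these derivatives follows from the implicit function theorem. The short derivation below is a sketch under the assumptions stated above; the abbreviations $F_x=\frac{\partial F}{\partial x}(x_*,u)$ and $F_u=\frac{\partial F}{\partial u}(x_*,u)$ are introduced only here. Differentiating $\phi(u)=F(\phi(u),u)$ with respect to $u$ gives
\begin{align*}
\phi'(u) = F_x\,\phi'(u) + F_u
\quad\Longrightarrow\quad
\phi'(u) = (I-F_x)^{-1}F_u ,
\end{align*}
where $I-F_x$ is invertible because $\|F_x\|<1$. For a given adjoint vector $\bar x\in\R^n$ of the state, the adjoint of the control is therefore
\begin{align*}
\bar u^T = \bar x^T\phi'(u) = \zeta^T F_u
\qquad\mbox{with}\qquad
\zeta^T = \zeta^T F_x + \bar x^T .
\end{align*}
Hence $\zeta$ is itself the fixed point of a contraction with the same rate $\|F_x\|$, so it can be computed by an iteration of the same type as \eqref{eq:iteration} without forming or inverting $I-F_x$.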
2876
To exploit the advanced algorithmic differentiation of such fixed point iterations,
ADOL-C provides the special function {\tt fp\_iteration(...)}.
2879It has the following interface:
2880\begin{tabbing}
2881\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2882\>{\sf int
2883  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
2884\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
2885\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
2886\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
\>{\sf int *double\_F;}         \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\
\>{\sf int *adouble\_F;}        \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf adouble} arguments\\
2891\>{\sf int *norm;}              \> // pointer to a function that computes\\
2892\>                              \> // the norm of a vector\\
2893\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
2894\>                              \> // the norm of a vector\\
2895\>{\sf double eps;}             \> // termination criterion for fixed point iteration\\
2896\>{\sf double eps\_deriv;}      \> // termination criterion for adjoint fixed point iteration\\
\>{\sf int N\_max;}             \> // maximal number of iterations for state computation\\
\>{\sf int N\_max\_deriv;}      \> // maximal number of iterations for adjoint computation\\
\>{\sf adouble *x\_0;}          \> // initial state of fixed point iteration\\
\>{\sf adouble *u;}             \> // value of $u$\\
\>{\sf adouble *x\_fix;}        \> // final state of fixed point iteration\\
2902\>{\sf int dim\_x;}             \> // dimension of $x$\\
2903\>{\sf int dim\_u;}             \> // dimension of $u$\\
2904\end{tabbing}
2905%
2906Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
2907should be used for the fixed point iteration.
2908{\tt double\_F} and {\tt adouble\_F} are pointers to functions, that
2909compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby
2910{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
2911uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
2912{\tt norm\_deriv} are pointers to functions computing the norm
2913of a vector. The latter functions together with {\tt eps},
2914{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
2915the iterations. Thus the following loops are performed:
2916\begin{center}
2917\begin{tabular}{ll}
2918  do                     &   do                           \\
2919  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
2920  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
2921  ~~~~$y = F(x,u)$       &   ~~~
2922  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
2923  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
2924  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
2925  & and $k\leq N_{max,deriv}$
2926\end{tabular}
2927\end{center}
2928The vector for the initial iterate and the control is stored
2929in {\tt x\_0} and {\tt u} respectively. The vector in which the
2930fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
2931and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
2932corresponding vectors.
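
The two loops can be tried out on a scalar model problem in plain C++. The sketch below is independent of ADOL-C: the contraction $F(x,u)=\frac{1}{2}\cos(x)+u$, its hand-coded partial derivatives, and the function names are assumptions made for this illustration, and the absolute value plays the role of both norms.

```cpp
#include <cmath>

// Hypothetical contraction: |dF/dx| <= 0.5 < 1 for all x and u.
double F(double x, double u) { return 0.5 * std::cos(x) + u; }
double F_x(double x) { return -0.5 * std::sin(x); }
double F_u() { return 1.0; }

// Left loop: state iteration x = y, y = F(x, u) until |y - x| < eps.
double solve_state(double x0, double u, double eps, int n_max) {
    double x = x0, y = x0;
    int k = 0;
    do {
        ++k;
        x = y;
        y = F(x, u);
    } while (std::fabs(y - x) >= eps && k <= n_max);
    return y;
}

// Right loop: adjoint iteration zeta = xi,
// (xi, u_bar) = zeta * F'(x_star, u) + (x_bar, 0).
double solve_adjoint(double x_star, double x_bar, double eps_deriv,
                     int n_max_deriv) {
    double zeta = 0.0, xi = 0.0, u_bar = 0.0;
    int k = 0;
    do {
        ++k;
        zeta = xi;
        xi = zeta * F_x(x_star) + x_bar;
        u_bar = zeta * F_u();
    } while (std::fabs(xi - zeta) >= eps_deriv && k <= n_max_deriv);
    return u_bar;
}
```

For this scalar case the limit of the adjoint loop can be compared with the closed form $\bar u=\bar x\,F_u/(1-F_x)$ evaluated at the computed fixed point, which is a convenient check of a hand-written {\tt adouble\_F} before it is passed to {\tt fp\_iteration}.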
2933
2934The presented driver is prototyped in the header file
2935\verb=<adolc/fixpoint.h>=. This header
2936is included by the global header file \verb=<adolc/adolc.h>= automatically.
Example code that also shows the
expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
2939%
2940\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
2941%
ADOL-C allows the computation of derivatives in parallel for functions
2943containing OpenMP parallel loops.
2944This implies that an explicit loop-handling approach is applied. A
2945typical situation is shown in \autoref{fig:basic_layout},
2946\begin{figure}[hbt]
2947    \vspace{3ex}
2948    \begin{center}
2949        \includegraphics[height=4cm]{multiplexed} \\
2950        \begin{picture}(0,0)
2951            \put(-48,40){\vdots}
2952            \put(48,40){\vdots}
2953            \put(-48,80){\vdots}
2954            \put(48,80){\vdots}
2955            \put(-83,132){function eval.}
2956            \put(5,132){derivative calcul.}
2957        \end{picture}
2958    \end{center}
2959    \vspace{-5ex}
2960    \caption{Basic layout of mixed function and the corresponding derivation process}
2961    \label{fig:basic_layout}
2962\end{figure}
2963where the OpenMP-parallel loop is preceded by a serial startup
2964calculation and followed by a serial finalization phase.
2965
2966Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
2967Two macros are available that only differ in the way the global tape information is handled.
2968Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
2969For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used.
2970Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
2971Due to the inserted macro, the OpenMP statement has the following structure:
2972\begin{tabbing}
2973\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
2974              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
2975\end{tabbing}
2976Inside the parallel region, separate tapes may then be created.
2977Each single thread works in its own dedicated AD-environment, and all
2978serial facilities of \mbox{ADOL-C} are applicable as usual. The global
2979derivatives can be computed using the tapes created in the serial and
2980parallel parts of the function evaluation, where user interaction is
2981required for the correct derivative concatenation of the various tapes.
2982
2983For the usage of the parallel facilities, the \verb=configure=-command
2984has to be used with the option \verb?--with-openmp-flag=FLAG?, where
2985\verb=FLAG= stands for the system dependent OpenMP flag.
2986The parallel differentiation of a parallel program is illustrated
2987by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
2988%
2989%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2990%
2991\section{Tapeless forward differentiation in ADOL-C}
2992\label{tapeless}
2993%
Up to version 1.9.0, the development of the ADOL-C software package
was based on the decision to store all data necessary for derivative
computation on tapes, where large applications require the tapes to be
written out to corresponding files. In almost all cases this incurs
a considerable run-time penalty due to the large number of
memory accesses. The tapes are what enables ADOL-C to offer its wide
range of functionality. However, not every task of derivative
computation requires them.
3002
3003Starting with version 1.10.0, ADOL-C now features a tapeless forward
3004mode for computing first order derivatives in scalar mode, i.e.,
3005$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
3006This tapeless variant coexists with the more universal
3007tape based mode in the package. The following subsections describe
3008the source code modifications required to use the tapeless forward mode of
3009ADOL-C. 
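
As a brief illustration of what is propagated, and assuming generic
intermediate quantities $u$, $w$ and an elementary function $\varphi$
(these symbols are illustrative only and not part of the ADOL-C interface),
each operation carries a directional derivative along with its value:
\begin{eqnarray*}
v = u + w        & \Rightarrow & \dot{v} = \dot{u} + \dot{w}, \\
v = u \, w       & \Rightarrow & \dot{v} = \dot{u} \, w + u \, \dot{w}, \\
v = \varphi(u)   & \Rightarrow & \dot{v} = \varphi'(u) \, \dot{u}.
\end{eqnarray*}
In vector mode the same rules are applied to $p$ directions at once, i.e.,
$\dot{X} \in \mathbb{R}^{n \times p}$ and $\dot{Y} \in \mathbb{R}^{m \times p}$
for $F: \mathbb{R}^n \to \mathbb{R}^m$. Choosing $\dot{X}$ as the
$n \times n$ identity therefore yields the complete Jacobian $F'(x)$ in one
evaluation.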
3010%
3011\subsection{Modifying the Source Code}
3012%
3013Let us consider the coordinate transformation from Cartesian to spherical
3014polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3015= F(x)$, with
3016\begin{eqnarray*}
3017y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3018y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3019y_3  =  \arctan(x_2/x_1),
3020\end{eqnarray*}
3021as an example. The corresponding source code is shown in \autoref{fig:tapeless}.
3022\begin{figure}[htb]
3023\framebox[\textwidth]{\parbox{\textwidth}{
3024%\begin{center}
3025%\begin{flushleft}
3026\begin{tabbing}
3027\= \kill
3028\> {\sf \#include} {\sf $<$iostream$>$}\\
3029\> {\sf using namespace std;}\\
3030\> \\
3031\> {\sf int main() \{}\\
3032\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
3033\> \\
3034\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3035\> {\sf \rule{1cm}{0pt}...}\\
3036\> \\
3037\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3038\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3039\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3040\> \\
3041\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
3042\> \\
3043\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3044\> \}
3045\end{tabbing}
3046%\end{flushleft}
3047%\end{center}
3048}}
3049\caption{Example for tapeless forward mode}
3050\label{fig:tapeless}
3051\end{figure}
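
For later comparison with computed derivatives, the Jacobian of this
transformation can be written down in closed form. The abbreviations
$\rho$ and $r$ are introduced here only for readability and do not appear in
the example programs:
\[
F'(x) =
\begin{pmatrix}
x_1/r & x_2/r & x_3/r \\
x_1 x_3/(r^2 \rho) & x_2 x_3/(r^2 \rho) & -\rho/r^2 \\
-x_2/\rho^2 & x_1/\rho^2 & 0
\end{pmatrix},
\qquad
\rho = \sqrt{x_1^2 + x_2^2}, \quad r = \sqrt{x_1^2 + x_2^2 + x_3^2}.
\]
The formula holds wherever $F$ is differentiable, in particular for
$x_1 \neq 0$ and $x_3 \neq 0$.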
3052%
The changes to the source code that are necessary for applying the
tapeless forward mode of ADOL-C are described in the following two
subsections, where the vector mode version is described
as an extension of the scalar mode.
3057%
3058\subsubsection*{The scalar mode}
3059%
To use the tapeless forward mode, one has to include one
of the header files \verb#adolc.h# or \verb#adouble.h#.
The latter should be preferred since it does not include the
tape-based functions defined in other header files. Hence, including
\verb#adouble.h# avoids mixing the two modes, whereas
\verb#adolc.h# is just a wrapper that includes all public
headers of the ADOL-C package and provides no functions of its own.
Since the two ADOL-C forward mode variants, tape-based and tapeless,
are prototyped in the same header file, the compiler needs to know if a
tapeless version is intended. This can be done by defining a
preprocessor macro named {\sf ADOLC\_TAPELESS}. Note that it is
important to define this macro before the header file is included.
Otherwise, the tape-based version of ADOL-C will be used.
3073
As in the tape-based forward version of ADOL-C, all derivative
calculations are introduced by calls to overloaded
operators. Therefore, as in the tape-based version, all
independent, intermediate and dependent variables must be declared
with type {\sf adouble}. For run-time reasons, the whole tapeless
functionality provided by \verb#adolc.h# is written as code intended
to be inlined completely. How much of it is actually inlined can
be influenced by switches for many compilers; most likely, the whole
derivative code is inlined by default. In our experiments
with the tapeless mode, the standard optimization switches of the
GNU and Intel C++ compilers produced completely inlined code.
3086
To avoid name conflicts
resulting from the inlining, the tapeless version has its own namespace
\verb#adtl#. As a result, four possibilities of using the {\sf adouble}
type are available for the tapeless version:
3091\begin{itemize}
3092\item Defining a new type
3093      \begin{center}
3094        \begin{tabular}{l}
3095          {\sf typedef adtl::adouble adouble;}\\
3096          ...\\
3097          {\sf adouble tmp;}
3098        \end{tabular}
3099      \end{center}
      This is the preferred way. Remember that you cannot define your own
      {\sf adouble} type/class with a different meaning after the typedef.
3102\item Declaring with namespace prefix
3103      \begin{center}
3104        \begin{tabular}{l}
3105          {\sf adtl::adouble tmp;}
3106        \end{tabular}
3107      \end{center}
      Not the most elegant or efficient way with respect to coding,
      but without any doubt one of the safest. The identifier
      {\sf adouble} is still available for user types/classes.
3111\item Trusting macros
3112      \begin{center}
3113        \begin{tabular}{l}
3114          {\sf \#define adouble adtl::adouble}\\
3115          ...\\
3116          {\sf adouble tmp;}
3117        \end{tabular}
3118      \end{center}
      This approach should be used with care, since preprocessor defines are plain text replacements.
3120  \item Using the complete namespace
3121        \begin{center}
3122          \begin{tabular}{l}
            {\sf using namespace adtl;}\\
3124            ...\\
3125            {\sf adouble tmp;}
3126          \end{tabular}
3127        \end{center}
        A very clear approach, with the disadvantage that every name in the namespace becomes visible. Name conflicts may arise!
3129\end{itemize}
After defining the variables, only two things are left to do. First,
one needs to initialize the values of the independent variables for the
function evaluation. This can be done by assigning the variables a {\sf
double} value. The {\sf ad}-value is set to zero in this case.
Additionally, the tapeless forward mode variant of ADOL-C
offers a function named {\sf setValue} for setting the value without
changing the {\sf ad}-value. Second, the {\sf ad}-values of the independent
variables have to be set, for which ADOL-C offers two possibilities:
3138\begin{itemize}
3139  \item Using the constructor
3140        \begin{center}
3141          \begin{tabular}{l}
3142            {\sf adouble x1(2,1), x2(4,0), y;}
3143          \end{tabular}
3144        \end{center}
3145        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
3146        remains uninitialized. In terms of function evaluation
3147        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3148        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3149   \item Setting point values directly
3150         \begin{center}
3151           \begin{tabular}{l}
3152             {\sf adouble x1=2, x2=4, y;}\\
3153             ...\\
3154             {\sf x1.setADValue(1);}\\
3155             {\sf x2.setADValue(0);}
3156           \end{tabular}
3157         \end{center}
         The same example as above, but now using the {\sf setADValue} method for initializing the derivative values.
3159\end{itemize}
3160%
3161The derivatives can be obtained at any time during the evaluation
3162process by calling the {\sf getADValue}-method
3163\begin{center}
3164  \begin{tabular}{l}
3165    {\sf adouble y;}\\
3166    ...\\
3167    {\sf cout $<<$ y.getADValue();}
3168  \end{tabular}
3169\end{center}
3170\autoref{fig:modcode} shows the resulting source code incorporating
3171all required changes for the example
3172given above.
3173
3174\begin{figure}[htb]
3175\framebox[\textwidth]{\parbox{\textwidth}{
3176%\begin{center}
3177\begin{tabbing}
3178\hspace*{-1cm} \= \kill
3179\> {\sf \#include $<$iostream$>$}\\
3180\> {\sf using namespace std;}\\
3181\> \\
3182\> {\sf \#define ADOLC\_TAPELESS}\\
3183\> {\sf \#include $<$adouble.h$>$}\\
3184\> {\sf typedef adtl::adouble adouble;}\\
3185\\
3186\> {\sf int main() \{}\\
3187\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3188\\
3189\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3190\> {\sf \rule{1cm}{0pt}...}\\
3191\\
3192\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(1);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3193\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3194\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3195\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3196\\
3197\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3198\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ y[1].getADValue() $<<$ endl;}\\
3199\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3200\> {\sf \}}
3201\end{tabbing}
3202%\end{center}
3203}}
3204\caption{Example for tapeless scalar forward mode}
3205\label{fig:modcode}
3206\end{figure}
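
The mechanism behind \autoref{fig:modcode} can be mimicked without ADOL-C by
a few lines of hand-written operator overloading. The following sketch is
not the implementation found in the ADOL-C headers; the namespace
{\sf sketch}, the type {\sf Dual} and the functions {\sf dy2\_dx1} and
{\sf dy2\_dx1\_exact} are hypothetical names chosen for illustration. It
propagates one directional derivative through the expression for $y_2$ and
offers the closed form
$\partial y_2/\partial x_1 = x_1 x_3/(r^2\rho)$, with
$\rho = \sqrt{x_1^2+x_2^2}$ and $r^2 = x_1^2+x_2^2+x_3^2$, as a cross-check.

```cpp
#include <cassert>
#include <cmath>

namespace sketch {

// Hypothetical stand-in for a scalar forward-mode type (NOT adtl::adouble):
// each variable carries its value and one directional derivative.
struct Dual {
    double val;
    double dot;
    Dual(double v = 0.0, double d = 0.0) : val(v), dot(d) {}
};

inline Dual operator+(const Dual& a, const Dual& b) {
    return Dual(a.val + b.val, a.dot + b.dot);
}
inline Dual operator*(const Dual& a, const Dual& b) {
    return Dual(a.val * b.val, a.dot * b.val + a.val * b.dot);  // product rule
}
inline Dual operator/(const Dual& a, const Dual& b) {
    return Dual(a.val / b.val,
                (a.dot * b.val - a.val * b.dot) / (b.val * b.val));  // quotient rule
}
inline Dual sqrt(const Dual& a) {
    const double s = std::sqrt(a.val);
    return Dual(s, a.dot / (2.0 * s));
}
inline Dual atan(const Dual& a) {
    return Dual(std::atan(a.val), a.dot / (1.0 + a.val * a.val));
}

// dy2/dx1 of the example function, obtained by seeding the direction (1,0,0).
inline double dy2_dx1(double x1, double x2, double x3) {
    assert(x3 != 0.0 && (x1 != 0.0 || x2 != 0.0));  // stay away from singular points
    const Dual x[3] = {Dual(x1, 1.0), Dual(x2, 0.0), Dual(x3, 0.0)};
    const Dual y2 = atan(sqrt(x[0] * x[0] + x[1] * x[1]) / x[2]);
    return y2.dot;
}

// Closed-form value of the same derivative, used only as a cross-check.
inline double dy2_dx1_exact(double x1, double x2, double x3) {
    const double rho = std::sqrt(x1 * x1 + x2 * x2);
    return x1 * x3 / ((x1 * x1 + x2 * x2 + x3 * x3) * rho);
}

}  // namespace sketch
```

For instance, at the point $(1,2,3)$ both {\sf sketch::dy2\_dx1} and
{\sf sketch::dy2\_dx1\_exact} evaluate to $3/(14\sqrt{5}) \approx 0.0958$.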
3207%
3208\subsubsection*{The vector mode}
3209%
3210In scalar mode only one direction element has to be stored per {\sf
3211  adouble} whereas a field of $p$ elements is needed in the vector
3212  mode to cover the computations for the given $p$ directions. The
3213  resulting changes to the source code are described in this section.
3214
3215Similar to tapeless scalar forward mode, the tapeless vector forward
3216mode is used by defining {\sf ADOLC\_TAPELESS}. Furthermore, one has to define
3217an additional preprocessor macro named {\sf NUMBER\_DIRECTIONS}. This
3218macro takes the maximal number of directions to be used within the
resulting vector mode. Just like {\sf ADOLC\_TAPELESS}, the new macro
must be defined before including the header file \verb#adolc.h# or
\verb#adouble.h#, since it is ignored otherwise.
3222
In many situations recompiling the source code to get a new number of
directions is at least undesirable. ADOL-C offers a function named
{\sf setNumDir} to work around this problem partially. After this
function has been called, ADOL-C takes the number of directions
not from the macro {\sf NUMBER\_DIRECTIONS} but from the argument of
{\sf setNumDir}. A corresponding source code would contain the following lines:
3229\begin{center}
3230  \begin{tabular}{l}
3231    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3232    ...\\
3233    {\sf adtl::setNumDir(5);}
3234  \end{tabular}
3235\end{center}
Note that using this function does not
change the memory requirements, which can be roughly determined by
({\sf NUMBER\_DIRECTIONS}$+1$)*(number of {\sf adouble}s).
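
As a rough worked example, assume a program with $1000$ {\sf adouble}
variables compiled with {\sf NUMBER\_DIRECTIONS} set to 10, and assume further
that one {\sf double} occupies 8 bytes (both figures are illustrative):
\[
(10 + 1) \cdot 1000 = 11\,000 \mbox{ {\sf double}s} = 88\,000 \mbox{ bytes}.
\]
Calling {\sf adtl::setNumDir(5)} afterwards leaves this estimate unchanged,
since the storage is still sized by {\sf NUMBER\_DIRECTIONS}.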
3239
Compared to the scalar case, setting and getting the derivative
values, i.e., the directions, is more involved. Instead of
working with single {\sf double} values, pointers to arrays of {\sf
double}s are used, as illustrated by the following example:
3244\begin{center}
3245  \begin{tabular}{l}
3246    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3247    ...\\
3248    {\sf adouble x, y;}\\
3249    {\sf double *ptr=new double[NUMBER\_DIRECTIONS];}\\
3250      ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(ptr);}\\
3253    ...\\
3254    {\sf ptr=y.getADValue();}
3255  \end{tabular}
3256\end{center}
3257Additionally, the tapeless vector forward mode of ADOL-C offers two
3258new methods for setting/getting the derivative values. Similar
3259to the scalar case, {\sf double} values are used but due to the vector
3260mode the position of the desired vector element must be supplied in
3261the argument list:
3262\begin{center}
3263  \begin{tabular}{l}
3264    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3265    ...\\
3266    {\sf adouble x, y;}\\
3267    ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(5,1);\hspace*{3.7cm}// set the 6th derivative value of x to 1.0}\\
3270      ...\\
3271    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3272  \end{tabular}
3273\end{center}
The resulting source code containing all changes that are required is
shown in \autoref{fig:modcode2}.
3276
3277\begin{figure}[!h!t!b]
3278\framebox[\textwidth]{\parbox{\textwidth}{
3279\begin{tabbing}
3280\hspace*{-1cm} \= \kill
3281\> {\sf \#include $<$iostream$>$}\\
3282\> {\sf  using namespace std;}\\
3283\\
3284\> {\sf \#define ADOLC\_TAPELESS}\\
3285\> {\sf \#define NUMBER\_DIRECTIONS 3}\\
3286\> {\sf \#include $<$adouble.h$>$}\\
3287\> {\sf typedef adtl::adouble adouble;}\\
3288\\
3289\> {\sf ADOLC\_TAPELESS\_UNIQUE\_INTERNALS;}\\
3290\\
3291\> {\sf int main() \{}\\
3292\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3293\\
3294\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3295\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3296\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3297\> {\sf \rule{0.5cm}{0pt}\}}\\
3298\\
3299\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3300\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3301\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3302\\
3303\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3304\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3305\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3306\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3307\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3308\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3309\> {\sf \rule{0.5cm}{0pt}\}}\\
3310\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3311\> {\sf \}}
3312\end{tabbing}
3313}}
3314\caption{Example for tapeless vector forward mode}
3315\label{fig:modcode2}
3316\end{figure}
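
What the vector mode in \autoref{fig:modcode2} computes can again be
sketched without ADOL-C. The code below is an illustration under stated
assumptions, not the library implementation: the namespace {\sf sketch},
the type {\sf VDual}, the constant {\sf P} (playing the role of
{\sf NUMBER\_DIRECTIONS}) and the function {\sf jacobian} are hypothetical
names. Each variable stores one value and {\sf P} directional derivatives;
seeding the $i$-th direction of $x_i$ with 1 yields the full Jacobian in a
single evaluation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

namespace sketch {

const std::size_t P = 3;  // number of directions; plays the role of NUMBER_DIRECTIONS

// Hypothetical stand-in for a vector forward-mode type (NOT adtl::adouble):
// one value plus P directional derivatives.
struct VDual {
    double val;
    std::array<double, P> dot;
    VDual() : val(0.0), dot() {}  // value and all directions start at zero
};

inline VDual operator+(const VDual& a, const VDual& b) {
    VDual r;
    r.val = a.val + b.val;
    for (std::size_t k = 0; k < P; ++k) r.dot[k] = a.dot[k] + b.dot[k];
    return r;
}
inline VDual operator*(const VDual& a, const VDual& b) {
    VDual r;
    r.val = a.val * b.val;
    for (std::size_t k = 0; k < P; ++k) r.dot[k] = a.dot[k] * b.val + a.val * b.dot[k];
    return r;
}
inline VDual operator/(const VDual& a, const VDual& b) {
    VDual r;
    r.val = a.val / b.val;
    for (std::size_t k = 0; k < P; ++k)
        r.dot[k] = (a.dot[k] * b.val - a.val * b.dot[k]) / (b.val * b.val);
    return r;
}
inline VDual sqrt(const VDual& a) {
    VDual r;
    r.val = std::sqrt(a.val);
    for (std::size_t k = 0; k < P; ++k) r.dot[k] = a.dot[k] / (2.0 * r.val);
    return r;
}
inline VDual atan(const VDual& a) {
    VDual r;
    r.val = std::atan(a.val);
    for (std::size_t k = 0; k < P; ++k) r.dot[k] = a.dot[k] / (1.0 + a.val * a.val);
    return r;
}

typedef std::array<std::array<double, P>, 3> Jacobian;

// Jacobian of the spherical-coordinate example; entry [i][j] is dy_{i+1}/dx_{j+1}.
inline Jacobian jacobian(double x1, double x2, double x3) {
    assert(x1 != 0.0 && x3 != 0.0);  // stay away from singular points
    const double point[3] = {x1, x2, x3};
    VDual x[3], y[3];
    for (std::size_t i = 0; i < 3; ++i) {
        x[i].val = point[i];
        x[i].dot[i] = 1.0;  // i-th direction is the i-th unit vector
    }
    y[0] = sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    y[1] = atan(sqrt(x[0] * x[0] + x[1] * x[1]) / x[2]);
    y[2] = atan(x[1] / x[0]);

    Jacobian J;
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < P; ++j) J[i][j] = y[i].dot[j];
    return J;
}

}  // namespace sketch
```

At the point $(1,2,2)$, for example, the first row of the result is
$(1/3,\, 2/3,\, 2/3)$ and the last row is $(-0.4,\, 0.2,\, 0)$.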
3317%
3318\subsection{Compiling and Linking the Source Code}
3319%
3320After incorporating the required changes, one has to compile the
3321source code and link the object files to get the executable.
As long as the ADOL-C header files are not included with absolute paths,
3323the compile sequence should be similar to the following example:
3324\begin{center}
3325  \begin{tabular}{l}
3326    {\sf g++ -I/home/username/adolc\_base/include -c tapeless\_scalar.cpp}
3327  \end{tabular}
3328\end{center}
3329The \verb#-I# option tells the compiler where to search for the ADOL-C
3330header files. This option can be omitted when the headers are included
3331with absolute path or if ADOL-C is installed in a ``global'' directory.
3332
Since the tapeless forward version of ADOL-C is implemented in the
header \verb#adouble.h# as code intended to be inlined completely,
the object files do not need to be linked against any external ADOL-C
code or the ADOL-C library. Therefore, the example started above can be completed with the
following command:
3338\begin{center}
3339  \begin{tabular}{l}
3340    {\sf g++ -o tapeless\_scalar tapeless\_scalar.o}
3341  \end{tabular}
3342\end{center}
The source codes {\sf tapeless\_scalar.cpp} and {\sf tapeless\_vector.cpp}
mentioned above, which illustrate the use of the tapeless scalar and vector mode, can be found in
the directory {\sf examples}.
3346%
3347\subsection{Concluding Remarks for the Tapeless Forward Mode Variant}
3348%
Like many other AD methods, the tapeless forward mode provided by the
ADOL-C package has its own strengths and drawbacks. Please read the
following list carefully to become familiar with the issues that
can occur:
3353\begin{itemize}
3354  \item Advantages:
3355    \begin{itemize}
3356      \item Code speed\\
3357        Increasing computation speed was one of the main aspects in writing
3358        the tapeless code. In many cases higher performance can be
3359        expected this way.
3360      \item Easier linking process\\
        As a further consequence of the code inlining, the object code does
        not need to be linked against an ADOL-C library.
      \item Smaller overall memory requirements\\
        As the name implies, tapeless ADOL-C does not write tapes.
        Loop ``unrolling'' can be avoided this
        way. If main memory plus disk space is taken as the overall memory
        requirement, the tapeless version can be
        executed in a more efficient way.
3369    \end{itemize}
3370  \item Drawbacks:
3371    \begin{itemize}
3372    \item Main memory limitations\\
      With tapeless ADOL-C, the ability to compute derivatives of a given
      function is bounded by the size of main memory plus swap.
      Computation from swap should be avoided as far as possible anyway,
      since it increases the computing time
      drastically. Therefore, if the program execution is
      terminated without an error message, insufficient memory may be
      the reason, among other things. The memory requirements $M$ can
      be determined roughly as follows:
3381      \begin{itemize}
3382        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3383        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3384          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3385      \end{itemize}
3386      where the storage size of all non {\sf adouble} based variables is described by $M_p$.
3387    \item Compile time\\
      As discussed in the previous sections, the tapeless forward mode of
      the ADOL-C package is implemented as code intended to be inlined. This
      approach increases the size of the code to be compiled, since after
      inlining every
      operation involving at least one {\sf adouble} stands for the
      operation itself as well as for the corresponding derivative
      code. Therefore, the compilation time
      needed for the tapeless version may be higher than that of the tape-based code.
3395    \item Code Size\\
3396      A second drawback and result of the code inlining is the
3397      increase of code sizes for the binaries. The increase
3398      factor compared to the corresponding tape based program is
3399      difficult to quantify as it is task dependent. Practical results
3400      have shown that values between 1.0 and 2.0 can be
3401      expected. Factors higher than 2.0 are possible too and even
3402      values below 1.0 have been observed.
3403    \end{itemize}
3404\end{itemize}
3405%
3406%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3407\section{Installing and Using ADOL-C}
3408\label{install}
3409%
3410\subsection{Generating the ADOL-C Library}
3411\label{genlib}
3412%
The current build system is best summarized by the ubiquitous GNU
install triplet
3415\begin{center}
3416\verb=configure - make - make install= .
3417\end{center}
Executing these three steps from the package base directory
3419\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3420ADOL-C library with default options and install the package (libraries
3421and headers) into the default installation directory {\tt
3422  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3423the subdirectory \verb=include= will contain all the installed header
3424files that may be included by the user program, the subdirectory
3425\verb=lib= will contain the 32-bit compiled library
3426and the subdirectory \verb=lib64= will contain the 64-bit compiled
3427library. Depending on the compiler only one of \verb=lib= or
3428\verb=lib64= may be created.
3429
3430Before doing so the user may modify the header file \verb=usrparms.h=
3431in order to tailor the \mbox{ADOL-C} package to the needs in the
3432particular system environment as discussed in
3433\autoref{Customizing}. The configure procedure which creates the necessary
3434\verb=Makefile=s can be customized by use of some switches. Available
3435options and their meaning can be obtained by executing
3436\verb=./configure --help= from the package base directory.
3437
3438All object files and other intermediately generated files can be
removed by the call \verb=make clean=. Uninstalling ADOL-C by
executing \verb=make uninstall= is only reasonable after a
previous call of \verb=make install=. It removes all installed package files
but leaves the created directories behind.
3443
3444The sparse drivers are included in the ADOL-C libraries if the
3445\verb=./configure= command is executed with the option
3446\verb=--enable-sparse=. The ColPack library available at
3447\verb=http://www.cscapes.org/coloringpage/software.htm= is required to
3448compute the sparse structures, and is searched for in all the default
3449locations as well as in the subdirectory \verb=<ThirdParty/ColPack/>=.
3450In case the library and its headers are installed in a nonstandard path
3451this may be specified with the \verb?--with-colpack=PATH? option.
3452It is assumed that the library and its header files have the following
3453directory structure: \verb?PATH/include? contains all the header
3454files,
3455\verb?PATH/lib? contains the 32-bit compiled library and
3456\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
3457the compiler used to compile {\sf ADOL-C} one of these libraries will
3458be used for linking.
3459 
3460The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default
3461constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see
\autoref{WarSug}.
3463
3464
3465\subsection{Compiling and Linking the Example Programs}
3466%
3467The installation procedure described in \autoref{genlib} also
3468provides the \verb=Makefile=s  to compile the example programs in the
3469directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
3470additional examples in
3471\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
3472one has to execute the
3473\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
3474examples. Available options are:
3475\begin{center}
3476\begin{tabular}[t]{ll}
3477\verb=--enable-docexa=&build all examples discussed in this manual\\
3478&(compare \autoref{example})\\
3479\verb=--enable-addexa=&build all additional examples\\
3480&(See file \verb=README= in the various subdirectories)
3481\end{tabular}
3482\end{center}
3483
Just calling \verb=make= from the package base directory generates
3485all configured examples and the library if necessary. Compiling from
3486subdirectory \verb=examples= or one of its subfolders is possible
3487too. At least one kind of the ADOL-C library (static or shared) must
3488have been built previously in that case. Hence, building the library
3489is always the first step.
3490
For compiling the library and the documented examples on Windows using
3492Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
3493the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
3494\verb=<ADOL-C/examples/>= subdirectories.
3495%
3496\subsection{Description of Important Header Files}
3497\label{ssec:DesIH}
3498%
3499The application of the facilities of ADOL-C requires the user
3500source code (program or module) to include appropriate
3501header files where the desired data types and routines are
3502prototyped. The new hierarchy of header files enables the user
3503to take one of two possible ways to access the right interfaces.
3504The first and easy way is recommended to beginners: As indicated in
3505\autoref{globalHeaders} the provided {\em global} header file
3506\verb=<adolc/adolc.h>= can be included by any user code to support all
3507capabilities of ADOL-C depending on the particular programming language
3508of the source.   
3509
3510\begin{table}[h]
3511\center \small
3512\begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline
3513\verb=<adolc/adolc.h>= & 
3514\begin{tabular*}{10.5cm}{cp{9.5cm}}
3515  \boldmath $\rightarrow$ \unboldmath
3516                 & global header file available for easy use of ADOL-C; \\
3517  $\bullet$      & includes all ADOL-C header files depending on
3518                   whether the users source is C++ or C code.
3519\end{tabular*}
3520\\ \hline
3521\verb=<adolc/usrparms.h>= &
3522\begin{tabular*}{10.5cm}{cp{9.5cm}}
3523  \boldmath $\rightarrow$ \unboldmath
3524                 & user customization of ADOL-C package (see
3525                   \autoref{Customizing}); \\
3526  $\bullet$      & after a change of
3527                   user options the ADOL-C library \verb=libadolc.*=
3528                   has to be rebuilt (see \autoref{genlib}); \\
3529  $\bullet$      & is included by all ADOL-C header files and thus by all user
3530                   programs.
3531\end{tabular*} \\ \hline
3532\end{tabular}
3533\caption{Global header files}
3534\label{globalHeaders}
3535\end{table} 
3536
The second way is meant for the more advanced ADOL-C user: the source code
includes only those interfaces used by the particular application.
The header files needed in each case are indicated
throughout the manual.
Dependences between the provided ADOL-C routines that arise from their
use are resolved by automatic includes of headers in order
to maintain ease of use. The header files important to the user are described
in \autoref{importantHeaders1} and \autoref{importantHeaders2}.
3545
3546\begin{table}[h]
3547\center \small
3548\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
3549%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
3550\verb=<adolc/adouble.h>= & 
3551\begin{tabular*}{10.5cm}{cp{9.5cm}}
3552  \boldmath $\rightarrow$ \unboldmath
3553                & provides the interface to the basic active
3554                  scalar data type of ADOL-C: {\sf class adouble} 
3555                  (see \autoref{prepar});
3556%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
3557\end{tabular*}
3558\\ \hline
3559% \verb=<adolc/avector.h>= &
3560%\begin{tabular*}{10.5cm}{cp{9.5cm}}
3561%  \boldmath $\rightarrow$ \unboldmath
3562%                & provides the interface to the active vector
3563%                  and matrix data types of ADOL-C: {\sf class adoublev}
3564%                  and {\sf class adoublem}, respectively
3565%                   (see \autoref{arrays}); \\
3566%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3567%\end{tabular*}
3568%\\ \hline
3569 \verb=<adolc/taputil.h>= & 
3570\begin{tabular*}{10.5cm}{cp{9.5cm}}
3571  \boldmath $\rightarrow$ \unboldmath
3572                & provides functions to start/stop the tracing of
3573                  active sections (see \autoref{markingActive})
3574                  as well as utilities to obtain
3575                  tape statistics (see \autoref{examiningTape}); \\
3576  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3577\end{tabular*}
3578\\ \hline
3579\end{tabular}
3580\caption{Important header files: tracing/taping}
3581\label{importantHeaders1}
3582\end{table} 
3583%
3584\begin{table}[h]
3585\center \small
3586\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
3587%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
3588\verb=<adolc/interfaces.h>= & 
3589\begin{tabular*}{10.5cm}{cp{9.5cm}}
3590  \boldmath $\rightarrow$ \unboldmath
3591                & provides interfaces to the {\sf forward} and
3592                  {\sf reverse} routines as basic versions of derivative
3593                  evaluation (see \autoref{forw_rev}); \\
3594  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3595  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
3596  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
3597\end{tabular*}
3598\\ \hline
3599\verb=<adolc/drivers.h>= & 
3600\begin{tabular*}{10.5cm}{cp{9.5cm}}
3601  \boldmath $\rightarrow$ \unboldmath
3602                & provides ``easy to use'' drivers for solving
3603                  optimization problems and nonlinear equations
3604                  (see \autoref{optdrivers}); \\
3605  $\bullet$     & comprises C and Fortran-callable versions.
3606\end{tabular*}
3607\\ \hline
3608\begin{minipage}{3cm}
3609\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
3610\end{minipage}  & 
3611\begin{tabular*}{10.5cm}{cp{9.5cm}}
3612  \boldmath $\rightarrow$ \unboldmath
3613                & provides the ``easy to use'' sparse drivers
3614                  to exploit the sparsity structure of
3615                  Jacobians (see \autoref{sparse}); \\
3616  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
3617                  of {\sf forward} and {\sf reverse} routines
3618                  propagating bit patterns (see \autoref{ProBit}); \\
3619
3620  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
3621\end{tabular*}
3622\\ \hline
3623\begin{minipage}{3cm}
3624\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
3625\end{minipage}  & 
3626\begin{tabular*}{10.5cm}{cp{9.5cm}}
3627  \boldmath $\rightarrow$ \unboldmath
3628                & provides interfaces to the underlying C-callable
3629                  versions of {\sf forward} and {\sf reverse} routines
3630                  propagating bit patterns.
3631\end{tabular*}
3632\\ \hline
3633\begin{minipage}{3cm}
3634\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
3635\end{minipage}  &
3636\begin{tabular*}{10.5cm}{cp{9.5cm}}
3637  \boldmath $\rightarrow$ \unboldmath
3638                & provides ``easy to use'' drivers for numerical
3639                  solution of ordinary differential equations
3640                  (see \autoref{odedrivers}); \\
3641  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3642  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
3643\end{tabular*}
3644\\ \hline
3645\begin{minipage}{3cm}
3646\verb=<adolc/drivers/=\newline\verb= taylor.h>=
3647\end{minipage}  &
3648\begin{tabular*}{10.5cm}{cp{9.5cm}}
3649  \boldmath $\rightarrow$ \unboldmath
3650                & provides ``easy to use'' drivers for evaluation
3651                  of higher order derivative tensors (see
3652                  \autoref{higherOrderDeriv}) and inverse/implicit function
3653                  differentiation (see \autoref{implicitInverse});\\
3654  $\bullet$     & comprises C++ and C-callable versions.
3655\end{tabular*} 
3656\\ \hline
3657\verb=<adolc/adalloc.h>= &
3658\begin{tabular*}{10.5cm}{cp{9.5cm}}
3659  \boldmath $\rightarrow$ \unboldmath
3660                & provides C++ and C functions for allocation of
3661                  vectors, matrices and three dimensional arrays
3662                  of {\sf double}s.
3663\end{tabular*}
3664\\ \hline
3665\end{tabular}
3666\caption{Important header files: evaluation of derivatives}
3667\label{importantHeaders2}
3668\end{table} 
3669%
3670\subsection{Compiling and Linking C/C++ Programs}
3671%
3672To compile a C/C++ program or single module using ADOL-C
3673data types and routines one has to ensure that all necessary
3674header files according to \autoref{ssec:DesIH} are
included. All modules involving {\em active} data types such as
3676{\sf adouble}
3677%, {\bf adoublev} and {\bf adoublem}
3678have to be compiled as C++. Modules that make use of a previously
3679generated tape to evaluate derivatives can either be programmed in ANSI-C
3680(while avoiding all C++ interfaces) or in C++. Depending
3681on the chosen programming language the header files provide
3682the right ADOL-C prototypes.
3683For linking the resulting object codes the library \verb=libadolc.*=
3684must be used (see \autoref{genlib}).
3685%
\subsection{Adding Quadratures as Special Functions}
%
\label{quadrat}
%
Suppose an integral
\[ f(x) = \int\limits^{x}_{0} g(t) \, dt \]
is evaluated numerically by a user-supplied function
\begin{center}
{\sf  double  myintegral(double\& x);}
\end{center}
Similarly, let us suppose that the integrand itself is evaluated by
a user-supplied block of C code {\sf integrand}, which computes a
variable with the fixed name {\sf val} from a variable with the fixed
name {\sf arg}. In many cases of interest, {\sf integrand} will
simply be of the form
\begin{center}
{\sf \{ val = expression(arg) \}}\enspace .
\end{center}
In general, the final assignment to {\sf val} may be preceded
by several intermediate calculations, possibly involving local
active variables of type {\sf adouble}, but no external or static
variables of that type.  However, {\sf integrand} may involve local
or global variables of type {\sf double} or {\sf int}, provided they
do not depend on the value of {\sf arg}. The variables {\sf arg} and
{\sf val} are declared automatically, and since {\sf integrand} is a block
rather than a function, it should have no header line.

Now the function {\sf myintegral} can be overloaded for {\sf adouble}
arguments and thus included in the library of elementary functions
by the following modifications:
\begin{enumerate}
\item
At the end of the file \verb=<adouble.cpp>=, include the full code
defining \\ {\sf double myintegral(double\& x)}, and add the line
\begin{center}
{\sf extend\_quad(myintegral, integrand); }
\end{center}
This macro expands to the definition of
 {\sf adouble myintegral(adouble\& arg)}.
Then rebuild the library \verb=libadolc.*= (see \autoref{genlib}).
\item
In the definition of the class
{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
\begin{center}
{\sf friend adouble myintegral(adouble\&)}.
\end{center}
\end{enumerate}
In the first modification, {\sf myintegral} represents the name of the
{\sf double} function, whereas {\sf integrand} represents the actual block
of C code.

For example, in case of the inverse hyperbolic cosine, we have
{\sf myintegral} = {\sf acosh}. Since the derivative of
$\mathrm{acosh}(x)$ is $1/\sqrt{x^2-1}$, {\sf integrand} can be written as
{\sf \{ val = 1/sqrt(arg*arg-1); \}}
so that the line
\begin{center}
{\sf extend\_quad(acosh,val = 1/sqrt(arg*arg-1));}
\end{center}
can be added to the file \verb=<adouble.cpp>=.
A mathematically equivalent but longer representation of
{\sf integrand} is
\begin{center}
\begin{tabbing}
{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
 \> \ \> {\sf  temp = temp*temp; } \\
 \> \ \> {\sf  val = 1/sqrt(temp-1); \}}
\end{tabbing}
\end{center}
The code block {\sf integrand} may call on any elementary function that has already
been defined in file \verb=<adouble.cpp>=, so that one may also introduce
iterated integrals.
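
As a hedged illustration of what such an overloaded quadrature has to
deliver, the following self-contained C++ sketch treats the inverse
hyperbolic cosine: the value is taken from the passive {\sf double}
routine, and the derivative is obtained from the integrand by the chain rule.
The type {\sf Dual} and the function form of {\sf integrand} are assumptions
made for this sketch only; they are not part of ADOL-C and say nothing about
how the macro {\sf extend\_quad} is implemented.
\begin{verbatim}
#include <cassert>
#include <cmath>

// Hypothetical minimal forward-mode number: value and first derivative.
struct Dual {
    double val;   // function value
    double dot;   // derivative with respect to the independent variable
};

// Passive routine: stands in for the user-supplied double function.
double myintegral(double x) { return std::acosh(x); }

// Integrand g(t) = 1/sqrt(t*t - 1), i.e. the derivative of acosh.
double integrand(double arg) { return 1.0 / std::sqrt(arg * arg - 1.0); }

// Active overload: value from the quadrature, derivative by the chain rule
// f'(x) * x' = g(x) * x'.
Dual myintegral(const Dual& arg) {
    assert(arg.val > 1.0);  // acosh is differentiable only for x > 1
    return Dual{ myintegral(arg.val), integrand(arg.val) * arg.dot };
}
\end{verbatim}
Calling {\sf myintegral} with the value $2$ and the derivative seed $1$
returns $\mathrm{acosh}(2)$ together with the derivative $1/\sqrt{3}$.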
%
%
\section{Example Codes}
\label{example}
%
The following listings are all simplified versions of codes that
are contained in the example subdirectory
\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
we have left out timings, which are included in the complete codes.
%
\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
%
The first example evaluates the gradient and the Hessian of
the function
\[
y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
\]
using the appropriate drivers {\sf gradient} and {\sf hessian}.

\begin{verbatim}
#include <adolc/adouble.h>               // use of active doubles and taping
#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
                                   // gradient(.) and hessian(.)
#include <adolc/taping.h>                // use of taping
...
int main() {
int n,i,j;
size_t tape_stats[STAT_SIZE];
cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
cout << "number of independent variables = ?  \n";
cin >> n;
double* xp = new double[n];
double  yp = 0.0;
adouble* x = new adouble[n];
adouble  y = 1;
for(i=0;i<n;i++)
  xp[i] = (i+1.0)/(2.0+i);         // some initialization
trace_on(1);                       // tag =1, keep=0 by default
  for(i=0;i<n;i++) {
    x[i] <<= xp[i]; y *= x[i]; }
  y >>= yp;
  delete[] x;
trace_off();
tapestats(1,tape_stats);           // reading of tape statistics
cout<<"maxlive "<<tape_stats[2]<<"\n";
...                                // ..... print other tape stats
double* g = new double[n];
gradient(1,n,xp,g);                // gradient evaluation
double** H=(double**)malloc(n*sizeof(double*));
for(i=0;i<n;i++)
  H[i]=(double*)malloc((i+1)*sizeof(double));
hessian(1,n,xp,H);                 // H*x equals (n-1)g since g is
double errg = 0;                   // homogeneous of degree n-1.
double errh = 0;
for(i=0;i<n;i++)
  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
for(i=0;i<n;i++) {
  for(j=0;j<n;j++) {
    if (i>j)                       // lower half of hessian
      errh += fabs(H[i][j]-g[i]/xp[j]); } }
cout << yp-1/(1.0+n) << " error in function \n";
cout << errg <<" error in gradient \n";
cout << errh <<" consistency check \n";
return 0;
}                                  // end main
\end{verbatim}
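
The gradient check used for {\sf errg} can be reproduced without ADOL-C.
The following sketch, which uses only the C++ standard library, computes the
gradient of the product by prefix and suffix products and compares it with
the analytic value $y/x_i$. The function names {\sf speelpenning},
{\sf speelpenning\_gradient} and {\sf gradient\_error} are hypothetical and
chosen for this illustration only.
\begin{verbatim}
#include <cassert>
#include <cmath>
#include <vector>

// Speelpenning product y = x[0]*x[1]*...*x[n-1].
double speelpenning(const std::vector<double>& x) {
    double y = 1.0;
    for (double xi : x) y *= xi;
    return y;
}

// Gradient by prefix and suffix products: g[i] = prod_{j != i} x[j].
// Unlike the formula y/x[i], this also works when some x[i] is zero.
std::vector<double> speelpenning_gradient(const std::vector<double>& x) {
    const std::size_t n = x.size();
    std::vector<double> g(n, 1.0);
    double prefix = 1.0;
    for (std::size_t i = 0; i < n; ++i) {   // g[i] = x[0]*...*x[i-1]
        g[i] = prefix;
        prefix *= x[i];
    }
    double suffix = 1.0;
    for (std::size_t i = n; i-- > 0; ) {    // times x[i+1]*...*x[n-1]
        g[i] *= suffix;
        suffix *= x[i];
    }
    return g;
}

// Largest deviation |g[i] - y/x[i]|; requires all x[i] to be nonzero.
double gradient_error(const std::vector<double>& x) {
    assert(!x.empty());
    const double y = speelpenning(x);
    const std::vector<double> g = speelpenning_gradient(x);
    double err = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        err = std::fmax(err, std::fabs(g[i] - y / x[i]));
    return err;
}
\end{verbatim}
With the initialization $x_i = (i+1)/(i+2)$ from the listing above, the
product equals $1/(n+1)$ and {\sf gradient\_error} should vanish up to
rounding.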
%
\subsection{Power Example ({\tt powexam.cpp})}
%
The second example function evaluates the $n$-th power of a real
variable $x$ in
$O(\log_2 n)$ multiplications by recursive halving of the exponent. Since
there is only one independent variable, the scalar derivative can be
computed with both {\sf forward} and {\sf reverse}, and the
results are subsequently compared.
\begin{verbatim}
#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces

adouble power(adouble x, int n) {
adouble z=1;
if (n>0) {                         // recursion and branches
  int nh =n/2;                     // that do not depend on
  z = power(x,nh);                 // adoubles are fine !!!!
  z *= z;
  if (2*nh != n)
    z *= x;
  return z; }                      // end if
else {
  if (n==0)                        // the local adouble z dies
    return z;                      // as it goes out of scope.
  else
    return 1/power(x,-n); }        // end else
} // end power
\end{verbatim}
The function {\sf power} above was obtained from the original
undifferentiated version by simply changing the type of all
{\sf double}s including the return variable to {\sf adouble}s. The new version
can now be called from within any active section, as in the following
main program.
\begin{verbatim}
#include ...                       // as above
int main() {
int i,n,tag=1;
cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
cout<<"monomial degree=? \n";      // input the desired degree
cin >> n;
                                   // allocations and initializations
double* Y[1];
*Y = new double[n+2];
double* X[1];                      // allocate passive variables with
*X = new double[n+4];              // extra dimension for derivatives
X[0][0] = 0.5;                     // function value = 0. coefficient
X[0][1] = 1.0;                     // first derivative = 1. coefficient
for(i=0;i<n+2;i++)
  X[0][i+2]=0;                     // further coefficients
double* Z[1];                      // used for checking consistency
*Z = new double[n+3];              // between forward and reverse;
                                   // Z[0][n+2] is written in the last pass
adouble y,x;                       // declare active variables
                                   // beginning of active section
trace_on(tag);                     // tag = 1 and keep = 0
x <<= X[0][0];                     // only one independent var
y = power(x,n);                    // actual function call
y >>= Y[0][0];                     // only one dependent adouble
trace_off();                       // no global adouble has died
                                   // end of active section
double u[1];                       // weighting vector
u[0]=1;                            // for reverse call
for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-th derivative
  if (i==0)
    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value()
    << " (should be 0)\n";
  else
    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i]
    << " (should be 0)\n";
  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylorcoeff.
return 0;
}                                  // end main
\end{verbatim}
Since this example has only one independent and one dependent variable,
{\sf forward} and {\sf reverse} have the same complexity and calculate
the same scalar derivatives, albeit with a slightly different scaling.
By replacing the function {\sf power} with any other univariate test function,
one can check that {\sf forward} and {\sf reverse} are at least consistent.
In the following example the number of independents is much larger
than the number of dependents, which makes the reverse mode preferable.
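
To see in isolation how a change of type carries derivatives through the
recursion of the power example, the following sketch replaces {\sf adouble} by a hand-written
dual number and checks the result against $n x^{n-1}$. It uses only the C++
standard library; the type {\sf Dual} and the helper {\sf reciprocal} are
assumptions of this sketch and not ADOL-C names, and no tape is involved.
\begin{verbatim}
#include <cassert>
#include <cmath>

// Hypothetical dual number carrying a value and one derivative.
struct Dual {
    double val;
    double dot;
};

Dual operator*(const Dual& a, const Dual& b) {      // product rule
    return Dual{ a.val * b.val, a.dot * b.val + a.val * b.dot };
}

Dual reciprocal(const Dual& a) {                    // d(1/a) = -a'/a^2
    assert(a.val != 0.0);
    return Dual{ 1.0 / a.val, -a.dot / (a.val * a.val) };
}

// Same recursion as in the power example: halve the exponent, square the
// result, and multiply once more by x if n is odd.
Dual power(const Dual& x, int n) {
    if (n < 0) return reciprocal(power(x, -n));
    if (n == 0) return Dual{ 1.0, 0.0 };
    Dual z = power(x, n / 2);
    z = z * z;
    if (n % 2 != 0) z = z * x;
    return z;
}
\end{verbatim}
Seeding the argument with the value $0.5$ and the derivative $1$, as in the
main program above, {\sf power(x,5)} should return $0.5^5$ and the derivative
$5 \cdot 0.5^4$.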
%
\subsection{Determinant Example ({\tt detexam.cpp})}
%
Now let us consider an exponentially expensive calculation,
namely, the evaluation of a determinant by recursive expansion
along rows. The gradient of the determinant with respect to the
matrix elements is simply the adjoint, i.e. the matrix of cofactors.
Hence the correctness of the numerical result is easily checked by
matrix-vector multiplication. The example illustrates the use
of {\sf adouble} arrays and pointers.

\begin{verbatim}
#include <adolc/adouble.h>               // use of active doubles and taping
#include <adolc/interfaces.h>            // use of basic forward/reverse
                                   // interfaces of ADOL-C
adouble** A;                       // A is an n x n matrix
int n;                             // k <= n is the order
adouble det(int k, int m) {        // of the sub-matrix
if (m == 0) return 1.0 ;           // its column indices
else {                             // are encoded in m
  adouble* pt = A[k-1];
  adouble t = 0.0;                 // accumulates the expansion
  int s, p =1;
  if (k%2) s = 1; else s = -1;
  for(int i=0;i<n;i++) {           // i must be local: det is recursive
    int p1 = 2*p;
    if (m%p1 >= p) {
      if (m == p) {
        if (s>0) t += *pt; else t -= *pt; }
      else {
        if (s>0)
          t += *pt*det(k-1,m-p);   // recursive call to det
        else
          t -= *pt*det(k-1,m-p); } // recursive call to det
      s = -s;}
    ++pt;
    p = p1;}
  return t; }
}                                  // end det
\end{verbatim}
As one can see, the overloading mechanism has no problem with pointers,
and the function looks exactly the same as the original undifferentiated version
except for the change of type from {\sf double} to {\sf adouble}.
If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
a compile time error would have resulted. Now consider a corresponding
calling program.

\begin{verbatim}
#include ...                       // as above
int main() {
int i,j, m=1,tag=1,keep=1;
cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
cout << "order of matrix = ? \n";  // select matrix size
cin >> n;
A = new adouble*[n];
trace_on(tag,keep);                // tag=1=keep
  double detout=0.0, diag = 1.0;   // here keep the intermediates for
  for(i=0;i<n;i++) {               // the subsequent call to reverse
    m *=2;
    A[i] = new adouble[n];         // allocate row i
    for(j=0;j<n;j++)
      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
    diag += A[i][i].value();       // value() converts to double
    A[i][i] += 1.0; }
  det(n,m-1) >>= detout;           // actual function call
  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
trace_off();
double u[1];
u[0] = 1.0;
double* B = new double[n*n];
reverse(tag,1,n*n,1,u,B);
cout <<" \n first base? : ";
for (i=0;i<n;i++) {
  adouble sum = 0;
  for (j=0;j<n;j++)                // the matrix A times the first n
    sum += A[i][j]*B[j];           // components of the gradient B is
  cout<<sum.value()<<" "; }        // det(A) times the first unit vector
return 0;
}                                  // end main
\end{verbatim}
The variable {\sf diag} should be mathematically
equal to the determinant, because the
matrix {\sf A} is defined as a rank-one perturbation $I + u v^T$ of the
identity, whose determinant is $1 + v^T u$.
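
The cofactor check described for this example can also be carried out in
plain {\sf double} arithmetic. The sketch below repeats the bit-mask
expansion and obtains the gradient from the fact that the determinant is
linear in each single matrix entry, so a difference with step $1$ is exact up
to rounding. The names {\sf Matrix} and {\sf det\_gradient} are hypothetical,
and the finite difference merely stands in for the reverse sweep; it is not
what ADOL-C does internally.
\begin{verbatim}
#include <cassert>
#include <cmath>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Determinant of the first k rows restricted to the columns whose bits
// are set in m, by recursive expansion along row k-1.
double det(const Matrix& A, int k, unsigned m) {
    if (m == 0) return 1.0;
    const int n = static_cast<int>(A.size());
    double t = 0.0;
    int s = (k % 2) ? 1 : -1;
    unsigned p = 1;
    for (int i = 0; i < n; ++i, p *= 2) {
        if (m & p) {                   // column i belongs to the sub-matrix
            t += s * A[k - 1][i] * det(A, k - 1, m - p);
            s = -s;
        }
    }
    return t;
}

// G[r][c] = d det / d A[r][c], i.e. the cofactor of entry (r,c).
Matrix det_gradient(Matrix A) {
    const int n = static_cast<int>(A.size());
    assert(n > 0 && n < 32);           // column set must fit into the mask
    const unsigned full = (1u << n) - 1u;
    const double d0 = det(A, n, full);
    Matrix G(n, std::vector<double>(n, 0.0));
    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c) {
            const double old = A[r][c];
            A[r][c] = old + 1.0;       // unit perturbation of one entry
            G[r][c] = det(A, n, full) - d0;
            A[r][c] = old;             // restore the entry exactly
        }
    return G;
}
\end{verbatim}
Multiplying the matrix by the first row of the returned gradient should give
the determinant in the first component and zero in all others.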
%
\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
\label{exam:ode}
%
Here, we consider a nonlinear ordinary differential equation that
is a slight modification of the Robertson test problem
given in Hairer and Wanner's book on the numerical solution of
ODEs \cite{HW}. The following source code computes the corresponding
values of $y^{\prime} \in \R^3$:
\begin{verbatim}
#include <adolc/adouble.h>                  // use of active doubles and taping
#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities

void tracerhs(short int tag, double* py, double* pyprime) {
adouble y[3];                         // this time we left the parameters
adouble yprime[3];                    // passive and use adouble arrays
trace_on(tag);
for (int i=0; i<3; i++)
     y[i] <<= py[i];                  // initialize and mark independents
yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
yprime[2] = -yprime[0] - yprime[1];
for (int i=0; i<3; i++)
     yprime[i] >>= pyprime[i];        // mark and pass dependents
trace_off(tag);
}                                     // end tracerhs
\end{verbatim}
The Jacobian of the right-hand side has large
negative eigenvalues, which make the ODE quite stiff. We  have added
some numerically benign transcendentals to make the differentiation
more interesting.
The following main program uses {\sf forode} to calculate the Taylor series
defined by the ODE at the given point $y_0$ and {\sf reverse} as well
as {\sf accode} to compute the Jacobians of the coefficient vectors
with respect to $y_0$.
\begin{verbatim}
#include .......                   // as above
int main() {
int i,j,deg;
int n=3;
double py[3];
double pyp[3];
cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
cout << "degree of Taylor series =?\n";
cin >> deg;
double **X;
X=(double**)malloc(n*sizeof(double*));
for(i=0;i<n;i++)
  X[i]=(double*)malloc((deg+1)*sizeof(double));
double*** Z=new double**[n];
double*** B=new double**[n];
short** nz = new short*[n];
for(i=0;i<n;i++) {
  Z[i]=new double*[n];
  B[i]=new double*[n];
  for(j=0;j<n;j++) {
    Z[i][j]=new double[deg];
    B[i][j]=new double[deg]; }     // end for
}                                  // end for
for(i=0;i<n;i++) {
  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
  X[i][0] = py[i];                 // and the Taylor coefficient;
  nz[i] = new short[n]; }          // set up sparsity array
tracerhs(1,py,pyp);                // trace RHS with tag = 1
forode(1,n,deg,X);                 // compute deg coefficients
reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
accode(n,deg-1,Z,B,nz);
cout << "nonzero pattern:\n";
for(i=0;i<n;i++) {
  for(j=0;j<n;j++)
    cout << nz[i][j]<<"\t";
  cout <<"\n"; }                   // end for
return 0;
}                                  // end main
\end{verbatim}
\noindent The pattern {\sf nz} returned by {\sf accode} is
\begin{verbatim}
              3  -1   4
              1   2   2
              3   2   4
\end{verbatim}
The original pattern {\sf nz} returned by {\sf reverse} is the same
except that the negative entry $-1$ was zero.
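
The Taylor series of an ODE solution satisfies a simple recurrence: if $F_k$
denotes the $k$-th Taylor coefficient of the right-hand side along the
solution, then $y_{k+1} = F_k/(k+1)$. The following sketch illustrates this
recurrence for the scalar model problem $y^{\prime} = y^2$, which is an
assumption chosen here because its coefficients are known in closed form; it
is not the Robertson problem and makes no claim about the internals of
{\sf forode}. The function name {\sf taylor\_coefficients} is hypothetical.
\begin{verbatim}
#include <cassert>
#include <vector>

// Taylor coefficients y_0..y_deg of the solution of y' = y*y, y(0) = y0.
// Coefficient k of the right-hand side is the Cauchy product
// F_k = sum_{j=0}^{k} y_j*y_{k-j}, and the ODE gives y_{k+1} = F_k/(k+1).
std::vector<double> taylor_coefficients(double y0, int deg) {
    assert(deg >= 0);
    std::vector<double> y(deg + 1, 0.0);
    y[0] = y0;
    for (int k = 0; k < deg; ++k) {
        double Fk = 0.0;
        for (int j = 0; j <= k; ++j)
            Fk += y[j] * y[k - j];
        y[k + 1] = Fk / (k + 1);
    }
    return y;
}
\end{verbatim}
For $y_0 = 1$ the exact solution is $1/(1-t)$, so all coefficients returned
by the sketch should equal $1$.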
%
%\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})}
%\label{gaussexam}
%
%The following example uses conditional assignments to show the usage of a once produced tape
%for evaluation at new arguments. The elimination is performed with
%column pivoting.
%\begin{verbatim}
%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces

%void gausselim(int n, adoublem& A, adoublev& bv) {
%along i;                           // active integer declaration
%adoublev temp(n);                  // active vector declaration
%adouble r,rj,temps;
%int j,k;
%for(k=0;k<n;k++) {                 // elimination loop
%  i = k;
%  r = fabs(A[k][k]);               // initial pivot size
%  for(j=k+1;j<n;j++) {
%    rj = fabs(A[j][k]);
%    condassign(i,rj-r,j);          // look for a larger element in the same
%    condassign(r,rj-r,rj); }       // column with conditional assignments
%  temp = A[i];                     // switch rows using active subscripting
%  A[i] = A[k];                     // necessary even if i happens to equal
%  A[k] = temp;                     // k during taping
%  temps = bv[i];
%  bv[i]=bv[k];
%  bv[k]=temps;
%  if (!value(A[k][k]))             // passive subscripting
%    exit(1);                       // matrix singular!
%  temps= A[k][k];
%  A[k] /= temps;
%  bv[k] /= temps;
%  for(j=k+1;j<n;j++) {
%    temps= A[j][k];
%    A[j] -= temps*A[k];            // vector operations
%    bv[j] -= temps*bv[k]; }        // endfor
%}                                  // end elimination loop
%temp=0.0;
%for(k=n-1;k>=0;k--)                // backsubstitution
%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];
%bv=temp;
%}                                  // end gausselim
%\end{verbatim}
%\noindent This function can be called from any program
%that suitably initializes
%the components of {\sf A} and {\sf bv}
%as independents. The resulting tape can be
%used to solve any nonsingular linear system of the same size and
%to get the sensitivities of the solution with respect to the
%system matrix and the right hand side.
%\vspace*{-4mm}
%
\section*{Acknowledgements}
%
Parts of the ADOL-C source were developed by Andreas
Kowarz, Hristo Mitev, Sebastian Schlenkrich,  and Olaf
Vogel. We are also indebted to George Corliss,
Tom Epperly, Bruce Christianson, David Gay,  David Juedes,
Brad Karp, Koichi Kubota, Bob Olson,  Marcela Rosemblun, Dima
Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
various ways with the development and documentation of ADOL-C.
%
\begin{thebibliography}{10}

\bibitem{BeKh96}
Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle.
\newblock {\em Efficient computation of gradients and Jacobians by dynamic
  exploitation of sparsity in automatic differentiation}.
\newblock Optimization Methods and Software 7(1):1--39, 1996.

\bibitem{Chri91a}
Bruce Christianson.
\newblock {\em Reverse accumulation and accurate rounding error estimates for
Taylor series}.
\newblock Optimization Methods and Software 1:81--94, 1992.

\bibitem{GeMaPo05}
Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
  derivatives}.
\newblock SIAM Review 47(4):629--705, 2005.

\bibitem{GePoTaWa06}
Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther.
\newblock {\em Efficient Computation of Sparse Hessians: An Experimental Study
  using ADOL-C}.
\newblock Tech. Rep. (2006). To appear in INFORMS Journal on Computing.

\bibitem{GePoWa08}
Assefaw Gebremedhin, Alex Pothen, and Andrea Walther.
\newblock {\em Exploiting Sparsity in Jacobian Computation via Coloring and
  Automatic Differentiation: a Case Study in a Simulated Moving Bed Process}.
\newblock In Chr. Bischof et al., eds., {\em Proceedings AD 2008 conference},
  LNCSE 64, pp.~327--338, Springer, 2008.

\bibitem{GeTaMaPo07}
Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen.
\newblock {\em New Acyclic and Star Coloring Algorithms with Applications to
  Hessian Computation}.
\newblock SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.

\bibitem{GrWa08}
Andreas Griewank and Andrea Walther.
\newblock {\em Evaluating Derivatives, Principles and Techniques of
  Algorithmic Differentiation. Second edition}.
\newblock SIAM, 2008.

\bibitem{Griewank97}
Andreas Griewank, Jean Utke, and Andrea Walther.
\newblock {\em Evaluating higher derivative tensors by forward propagation
          of univariate Taylor series}.
\newblock Mathematics of Computation 69:1117--1130, 2000.

\bibitem{GrWa00}
Andreas Griewank and Andrea Walther.
\newblock {\em Revolve: An Implementation of Checkpointing for the Reverse
  or Adjoint Mode of Computational Differentiation}.
\newblock ACM Transactions on Mathematical Software 26:19--45, 2000.

\bibitem{HW}
Ernst Hairer and Gerhard Wanner.
\newblock {\em Solving Ordinary Differential Equations II}.
\newblock Springer-Verlag, Berlin, 1991.

\bibitem{Knuth73}
Donald~E. Knuth.
\newblock {\em The Art of Computer Programming. Second edition}.
\newblock Addison-Wesley, Reading, 1973.

\bibitem{Wa05a}
Andrea Walther.
\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
\newblock ACM Transactions on Mathematical Software 34(1), Article 3, 2008.
\end{thebibliography}
\end{document}