% Latex file containing the documentation of ADOL-C
% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
% This file is part of ADOL-C. This software is provided as open source.
% Any use, reproduction, or distribution of the software constitutes
% recipient's acceptance of the terms of the accompanying license file.
\newdateformat{monthyear}{\monthname\ \THEYEAR}
\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
{\Large {\bf ADOL-C:}}
\footnote{The development of earlier versions was supported by the Office of
  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
  Forschungsgemeinschaft. During the development of the current
  version Andrea Walther and Andreas Kowarz were supported by the
  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft.}
\vspace{0.2in} \\
{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
{\Large of Algorithms Written in C/C++}\\
{\large\bf  Version \packageversion, \monthyear\today} \\
 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
   of Paderborn, 33098 Paderborn, Germany} and
 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}

The C++ package ADOL-C described here facilitates the evaluation of
first and higher derivatives of vector functions that are defined
by computer programs written in C or C++. The resulting derivative
evaluation routines may be called from C, C++, Fortran, or any other
language that can be linked with C.

The numerical values of derivative vectors are obtained free
of truncation errors at a small multiple of the run time and
random access memory required by the given function evaluation program.
Derivative matrices are obtained by columns, by rows or in sparse format.
For solution curves defined by ordinary differential equations,
special routines are provided that evaluate the Taylor coefficient vectors
and their Jacobians with respect to the current state vector.
For explicitly or implicitly defined functions derivative tensors are
obtained with a complexity that grows only quadratically in their
degree. The derivative calculations involve a possibly substantial but
always predictable amount of data. Since the data is accessed strictly
sequentially, it can be automatically paged out to external files.

{\bf Keywords}: Computational Differentiation, Automatic
         Differentiation,
         Chain Rule, Overloading, Taylor Coefficients,
         Gradients, Hessians, Forward Mode, Reverse Mode,
         Implicit Function Differentiation, Inverse Function Differentiation

{\bf Abbreviated title}: Automatic differentiation by overloading in C++

\section{Preparing a Section of C or C++ Code for Differentiation}

The package \mbox{ADOL-C}
utilizes overloading in C++, but the
user has to know only C. The acronym stands for {\bf A}utomatic
{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
In contrast to source transformation approaches, overloading does not generate intermediate
source code.
As starting points to retrieve further information on techniques and
application of automatic differentiation, as well as on other AD
tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
\verb= of the AD community forms a rich source
of further information and pointers.

ADOL-C facilitates the simultaneous
evaluation of arbitrarily high directional derivatives and the
gradients of these Taylor coefficients with respect to all independent
variables. Relative to the cost of evaluating the underlying function,
the cost for evaluating any such scalar-vector pair grows as the
square of the degree of the derivative but is still completely
independent of the numbers $m$ of dependent and $n$ of independent variables.

This manual is organized as follows. This section explains the
modifications required to convert undifferentiated code to code that
compiles with ADOL-C.
\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
evaluate arbitrarily high order derivatives. The discussion includes storage
requirements and the tailoring of certain tape characteristics to fit specific
user needs. Descriptions of easy-to-use drivers for convenient derivative
evaluation are contained in \autoref{drivers}.
\autoref{forw_rev_ad} offers a more mathematical characterization of
the different modes of AD to compute derivatives. At the same time, the
corresponding drivers of ADOL-C are explained.
The overloaded derivative evaluation routines using the forward and the reverse
mode of AD are explained in \autoref{forw_rev}.
Advanced differentiation techniques such as optimal checkpointing for
time integrations, the exploitation of fixed point iterations, the use
of externally differentiated functions, and the differentiation of OpenMP
parallel programs are described in \autoref{adv_ad}.
The tapeless forward mode is presented in \autoref{tapeless}.
\autoref{install} details the installation and
use of the ADOL-C package. Finally, \autoref{example}
furnishes some example programs that incorporate the ADOL-C package to
evaluate first and higher-order
derivatives.  These and other examples are distributed with the ADOL-C
source code.
The user should simply refer to them if the more abstract and general
descriptions of ADOL-C provided in this document do not suffice.

\subsection{Declaring Active Variables}
\label{DecActVar}

The key ingredient of automatic differentiation by overloading is the
concept of an {\em active variable}. All variables that may be
considered as differentiable quantities at some time
during the program execution must be of an active
type. ADOL-C uses one
active scalar type, called {\sf adouble}, whose real part is of the
standard type {\sf double}.
Typically, one will declare the independent variables
and all quantities that directly or indirectly depend on them as
{\em active}. Other variables that do not depend on the independent
variables but enter, for example, as parameters, may remain one of the
{\em passive} types {\sf double, float}, or {\sf int}. There is no
implicit type conversion from {\sf adouble} to any of these passive
types; thus, {\bf failure to declare variables as active when they
depend on other active variables will result in a compile-time error
message}. In data flow terminology, the set of active variable names
must contain all its successors in the dependency graph. All components
of indexed arrays must have the same activity status.

The real component of an {\sf adouble x} can be extracted as
{\sf x.value()}. In particular,
such explicit conversions are needed for the standard output procedure
{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
that first the real part of an {\sf adouble} and then the string
``{\sf (a)}'' is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath  can
be used to assign a constant value to an {\sf adouble}.
Naturally, {\sf adouble}s may be
components of vectors, matrices, and other arrays, as well as
members of structures or classes.
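
To illustrate what an active type does, the following minimal sketch
propagates a value together with one directional derivative through
overloaded operators. The class {\sf ToyActive} and the function {\sf f} are
hypothetical names invented for this illustration; this is not how
{\sf adouble} is implemented, since ADOL-C records operations on a tape
instead of carrying derivatives along.

```cpp
#include <cassert>

// Hypothetical toy type, NOT ADOL-C's adouble: it carries a value and one
// directional derivative and propagates both through overloaded operators.
class ToyActive {
public:
    // Non-explicit constructor: passive doubles convert to ToyActive
    // implicitly (with zero derivative), but there is deliberately no
    // conversion back from ToyActive to double.
    ToyActive(double v = 0.0, double d = 0.0) : val_(v), dot_(d) {}
    double value() const { return val_; }  // real part, in the spirit of x.value()
    double dot() const { return dot_; }    // derivative part
private:
    double val_;
    double dot_;
};

// Sum rule.
ToyActive operator+(const ToyActive& a, const ToyActive& b) {
    return ToyActive(a.value() + b.value(), a.dot() + b.dot());
}

// Product rule.
ToyActive operator*(const ToyActive& a, const ToyActive& b) {
    return ToyActive(a.value() * b.value(),
                     a.dot() * b.value() + a.value() * b.dot());
}

// The function body is ordinary C++; only the types are active.
// The passive parameter p stays a double.
ToyActive f(const ToyActive& x, double p) {
    return x * x + p * x;
}
```

Calling {\sf f(ToyActive(2.0, 1.0), 3.0)} yields the value $2^2 + 3\cdot 2 = 10$
and the derivative $2\cdot 2 + 3 = 7$. An assignment such as
{\sf double t = x * x;} would be rejected by the compiler in this sketch,
which mirrors the compile-time error described above for undeclared active
variables.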

The C++ class {\sf adouble}, its member functions, and the overloaded
versions of all arithmetic operations, comparison operators, and
most ANSI C functions are contained in the file \verb=adouble.cpp= and its
header \verb=<adolc/adouble.h>=. The latter must be included for compilation
of all program files containing {\sf adouble}s and corresponding
operations.

\subsection{Marking Active Sections}

All calculations involving active variables that occur between
the void function calls
\begin{center}
{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
{\sf trace\_off(file)}
\end{center}
are recorded on a sequential data set called {\em tape}. Pairs of
these function calls can appear anywhere in a C++ program, but
they must not overlap. The nonnegative integer argument {\sf tag} identifies the
particular tape for subsequent function or derivative evaluations.
Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
The optional integer arguments {\sf keep} and
{\sf file} will be discussed in \autoref{tape}. We will refer to the
sequence of statements executed between a particular call to
{\sf trace\_on} and the following call to {\sf trace\_off} as an
{\em active section} of the code. The same active section may be
entered repeatedly, and one can successively generate several traces
on distinct tapes by changing the value of {\sf tag}.
Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
the header file \verb=<adolc/taputil.h>=, which is included by the header
\verb=<adolc/adouble.h>= automatically.

Active sections may contain nested or even recursive calls to functions
provided by the user. Naturally, their formal and actual parameters
must have matching types. In particular, the functions must be
compiled with their active variables declared as
{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included.
Variables of type {\sf adouble} may be declared outside an active section and need not
go out of scope before the end of an active section.
It is not necessary -- though desirable -- that free-store {\sf adouble}s
allocated within
an active section be deleted before its completion. The values of all
{\sf adouble}s that exist at the beginning and end of an active section
are automatically
recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.

\subsection{Selecting Independent and Dependent Variables}

One or more active variables that are read in or initialized to
the values of constants or passive variables must be distinguished as
independent variables. Other active variables that are similarly
initialized may be considered as temporaries (e.g., a variable that
accumulates the partial sums of a scalar product after being
initialized to zero). In order to distinguish an active variable {\sf x} as
independent, ADOL-C requires an assignment of the form
\begin{center}
{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
\end{center}
This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
other values subsequently. Similarly, one or more active variables {\sf y}
must be distinguished as dependent by an assignment of the form
\begin{center}
{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$
\end{center}
which ensures that {\sf py} = {\sf y.value()} and should not be followed
by any other assignment to {\sf y}. However, a dependent variable {\sf y}
may have been assigned other real values previously, and it could even be an
independent variable as well.  The derivative values calculated after
completion of an active section always represent {\bf derivatives of the final
values of the dependent variables with respect to the initial values of the
independent variables}.

The order in which the independent and dependent variables are marked
by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
derivative evaluations. However, these variables do not have to be
combined into contiguous vectors. ADOL-C counts the number of
independent and dependent variable specifications within each active
section and records them in the header of the tape.

\subsection{A Subprogram as an Active Section}

As a generic example let us consider a C(++) function of the form
shown in \autoref{code1}.

\begin{figure}[hbt]
\begin{tabbing}
{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
\>{\sf  double *x,} \> // independent variable vector \\
\>{\sf  double *y,} \> // dependent variable vector  \\
\> {\sf int *k, } \> // integer parameters \\
\>{\sf  double *z)}  \> // real parameters \\
{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
\>{\sf double t = 0;}  \> // local variable declaration \\
\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
{\sf  \} } \>  \> // end of function
\end{tabbing}
\caption{Generic example of a subprogram to be activated}
\label{code1}
\end{figure}

If {\sf eval} is to be called from within an active C(++)
section with {\sf x}
and {\sf y} as vectors of {\sf adouble}s and the other parameters
passive, then one merely has to change the type declarations of all
variables that depend on {\sf x} from {\sf double} or {\sf float} to
{\sf adouble}. Subsequently, the subprogram must be compiled with the
header file \verb=<adolc/adouble.h>= included as described
in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
still to be called with integer and real arguments, possibly from
a program written in Fortran77, which does not allow overloading.

To automatically compute derivatives of the dependent
variables {\sf y} with respect to the independent variables {\sf x}, we
can make the body of the function into an active section. For
example, we may modify the previous program segment
as in \autoref{adolcexam}.
The renaming and doubling up of the original independent and dependent
variable vectors by active counterparts may seem at first a bit clumsy.
However, this transformation has the advantage that the calling
sequence and the computational part, i.e., where the function is
really evaluated, of {\sf eval} remain completely
unaltered. If the temporary variable {\sf t} had remained a {\sf double},
the code would not compile, because of a type conflict in the assignment
following the declaration. More detailed example codes are listed in
\autoref{example}.

\begin{figure}[hbt]
\begin{tabbing}
{\sf void eval(} \= {\sf  int n, int m,} \hspace{1.0 in}\= // number of independents and dependents\\
\> {\sf double *px,} \> // independent passive variable vector \\
\> {\sf double *py,} \> // dependent passive variable vector  \\
\> {\sf int *k,}  \> // integer parameters \\
\> {\sf double *z)} \> // parameter vector \\
{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
\>{\sf trace\_on(tag);} \> // start tracing  \\
\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
\>{\sf x = new adouble[n];}\>// declare active independent variables \\
\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
\>{\sf adouble t = 0;}  \> // local variable declaration \\
\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
\>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
\>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
\>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
\>{\sf  delete[] y;} \>// delete dependent active variables \\
\>{\sf  delete[] x;} \>// delete independent active variables \\
\>{\sf trace\_off();} \> // complete tape \\
{\sf  \}}   \>\> // end of function
\end{tabbing}
\caption{Activated version of the code listed in \autoref{code1}}
\label{adolcexam}
\end{figure}
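
As a plausibility check on what the activated code should deliver, suppose,
purely for illustration, that the elided statements leave {\sf t} unchanged.
Under this assumption the only dependent variable that receives a computed
value is
\[
 y_{m-1} \; = \; \frac{1}{m} \sum_{i=0}^{n-1} z_i \, x_i ,
 \qquad \mbox{so that} \qquad
 \frac{\partial y_{m-1}}{\partial x_i} \; = \; \frac{z_i}{m} ,
 \quad i = 0, \ldots, n-1 \enspace .
\]
Since this function is linear in $x$, a gradient evaluated from the tape
should reproduce the passive vector $z/m$ at every argument $x$, which makes
the example a convenient first test of a taping setup.
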

\subsection{Overloaded Operators and Functions}
\label{OverOper}

As in the subprogram discussed above, the actual computational
statements of a C(++) code need not be altered for the purposes of
automatic differentiation. All arithmetic operations, as well as the
comparison and assignment operators, are overloaded, so any or all of
their operands can be an active variable. An {\sf adouble x} occurring
in a comparison operator is effectively replaced by its real value
{\sf x.value()}. Most functions contained in the ANSI C standard for
the math library are overloaded for active arguments. The only
exceptions are the non-differentiable functions {\sf fmod} and
{\sf modf}. Otherwise, legitimate C code in active sections can remain
completely unchanged, provided the direct output of active variables
is avoided. The rest of this subsection may be skipped by first time
users who are not worried about marginal issues of differentiability
and efficiency.

The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
properly differentiable at the origin, which raises the question of
how this exception ought to be handled. Fortunately, one can easily
see that {\sf fabs(x)} and all its compositions with smooth
functions are still directionally differentiable. These
directional derivatives of arbitrary order can be propagated in the
forward mode without any ambiguity. In other words, the forward mode as
implemented in ADOL-C  computes Gateaux derivatives
in certain directions, which reduce to Fr\'echet derivatives only
if the dependence on the direction is linear. Otherwise,
the directional derivatives are merely positively homogeneous with
respect to the scaling of the directions.
For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
the origin somewhat arbitrarily to zero.
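
To make the distinction concrete, consider $f(x) = |x|$ at $x = 0$. For a
direction $x^\prime$ the one-sided limit
\[
 f^\prime(0; x^\prime) \; = \; \lim_{t \downarrow 0} \frac{|t\,x^\prime| - |0|}{t}
 \; = \; |x^\prime|
\]
exists for every $x^\prime$. It satisfies
$f^\prime(0; \lambda x^\prime) = \lambda f^\prime(0; x^\prime)$ for all
$\lambda \geq 0$, but
$f^\prime(0; -x^\prime) = f^\prime(0; x^\prime) \neq -f^\prime(0; x^\prime)$
whenever $x^\prime \neq 0$. Hence the directional derivative is positively
homogeneous but not linear in the direction, and no Fr\'echet derivative
exists at the origin.
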

We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
arguments, so that function and derivative values are obtained consistent
with those of {\sf fabs} according to the identities
\[
 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
 \max(a,b) = [a+b+|a-b|]/2 \quad .
\]
These relations cannot hold if either $a$ or $b$ is infinite, in which
case {\sf fmin} or {\sf fmax} and their derivatives may still be well
defined. It should be noted that the directional differentiation of
{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
the corresponding assignment based on the sign of $a-b$. For example,
the statement
\begin{center}
 {\sf if (a $<$ b) c = a; else c = b;}
\end{center}
yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
the incorrect directional derivative value
{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
should be avoided by use of the function {\sf fmin(a,b)}. There
are also versions of {\sf fmin} and {\sf fmax} for two passive
arguments and mixed passive/active arguments are handled by
implicit conversion.
On the function class obtained by composing the modulus with real
analytic functions, the concept of directional differentiation can be
extended to the propagation of unique one-sided Taylor expansions.
The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded
on the tape.
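
The behaviour at ties can be reproduced with a small first-order sketch.
The struct {\sf Dir} and the functions {\sf abs\_dir}, {\sf min\_identity} and
{\sf min\_branch} are hypothetical helpers written for this illustration and
are not part of ADOL-C; they only apply the identity for $\min(a,b)$ stated
above together with the directional derivative $|x^\prime|$ of the modulus at
zero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical value / first-directional-derivative pair (not an ADOL-C type).
struct Dir {
    double val;  // function value
    double dot;  // directional derivative
};

// Directional derivative of |x|: sign(x) * x' away from zero, |x'| at zero.
Dir abs_dir(Dir x) {
    if (x.val > 0.0) return {x.val, x.dot};
    if (x.val < 0.0) return {-x.val, -x.dot};
    return {0.0, std::fabs(x.dot)};
}

// min(a,b) = [a + b - |a - b|] / 2, differentiated term by term.
Dir min_identity(Dir a, Dir b) {
    Dir d = abs_dir({a.val - b.val, a.dot - b.dot});
    return {(a.val + b.val - d.val) / 2.0, (a.dot + b.dot - d.dot) / 2.0};
}

// The branching form "if (a < b) c = a; else c = b;".
Dir min_branch(Dir a, Dir b) {
    return (a.val < b.val) ? a : b;
}
```

With {\sf a} = \{1, $-2$\} and {\sf b} = \{1, 3\}, that is at a tie with
{\sf a}$^\prime <$ {\sf b}$^\prime$, {\sf min\_identity} returns the derivative
$-2 =$ {\sf a}$^\prime$, whereas {\sf min\_branch} returns $3 =$ {\sf b}$^\prime$.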

The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
functions have infinite slopes at the boundary points of their domains.
At these marginal points the derivatives are set by ADOL-C to
either {\sf $\pm$InfVal}, 0
or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
parameters, see \autoref{Customizing}.
On IEEE machines {\sf InfVal} can be set to the special value
{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$
of {\sf b}~=~{\sf sqrt(a)} is set to
\[
{\sf b}^\prime = \left\{
\begin{array}{ll}
\sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0  \\
0&\mbox{if}\;\;{\sf a}^\prime =0 \\
\sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\
\end{array} \right. \enspace .
\]
In other words, we consider {\sf a} and
consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
all computed Taylor coefficients are zero.
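
The case distinction above can be written down directly. In the following
sketch the constants {\sf InfVal} and {\sf NoNum} are local stand-ins that
assume the IEEE choices {\sf Inf} and {\sf NaN}, and {\sf sqrt\_dot} is a
hypothetical helper, not an ADOL-C routine.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Local stand-ins for the user-defined parameters InfVal and NoNum;
// the IEEE values Inf and NaN are assumed here.
const double InfVal = std::numeric_limits<double>::infinity();
const double NoNum  = std::numeric_limits<double>::quiet_NaN();

// First derivative b' of b = sqrt(a) for a >= 0, given the input
// derivative adot = a'. Hypothetical helper for illustration only.
double sqrt_dot(double a, double adot) {
    if (a > 0.0) return adot / (2.0 * std::sqrt(a));  // regular chain rule
    // Marginal point a = 0: follow the three cases of the formula above.
    if (adot > 0.0) return InfVal;
    if (adot == 0.0) return 0.0;   // a is treated as a constant
    return NoNum;                  // direction leaves the domain of sqrt
}
```

For instance, {\sf sqrt\_dot(4.0, 1.0)} gives $1/(2\sqrt{4}) = 0.25$, while at
{\sf a} = 0 the three directions {\sf a}$^\prime = 1, 0, -1$ give
{\sf Inf}, 0 and {\sf NaN}, respectively.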

The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
it is defined for the corresponding {\sf double} arguments. If {\sf x} is
negative, however, the partial derivative with respect to an integral exponent
is set to zero.
%Similarly, the partial of {\bf pow} with respect to both arguments
%is set to zero at the origin, where both arguments vanish.
The derivatives of the step functions
{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
arguments {\sf x}. The result values of the step functions
are recorded on the tape and can later be checked to recognize
whether a step to another level was taken during a forward sweep
at different arguments than at taping time.

Some C implementations supply other special
functions, in particular the error function {\sf erf(x)}. For the
latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which
has been commented out for systems on which the {\sf double} valued version
is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
postfix) are available for {\sf adouble}s.

% XXX: Vector and matrix class have to be reimplemented !!!
% and also the
%active subscripts described in the \autoref{act_subscr}.
Ambiguous statements like {\sf a += a++;} must be
avoided because the compiler may sequence the evaluation of the
expression differently from the original in terms of {\sf double}s.

As we have indicated above, all subroutines called with active arguments
must be modified or suitably overloaded. The simplest procedure is
to declare the local variables of the function as active so that
their internal calculations are also recorded on the tape.
Unfortunately, this approach is likely to be unnecessarily inefficient
and inaccurate if the original subroutine evaluates a special function
that is defined as the solution of a particular mathematical problem.
The most important examples are implicit functions, quadratures,
and solutions of ordinary differential equations. Often
the numerical methods for evaluating such special functions are
elaborate, and their internal workings are not at all differentiable in
the data. Rather than differentiating through such an adaptive
procedure, one can obtain first and higher derivatives directly from
the mathematical definition of the special function. Currently this
direct approach has been implemented only for user-supplied quadratures
as described in \autoref{quadrat}.

\subsection{Reusing the Tape for Arbitrary Input Values}

In some situations it may be desirable to calculate the value and
derivatives of a function at arbitrary arguments by using a tape of
the function evaluation at one argument and reevaluating the
function and its derivatives using the given ADOL-C
routines. This approach can
significantly reduce run times, and it
also allows problem functions to be ported, in the form of the
corresponding tape files, into a computing environment that
does not support C++ but does support C or Fortran.
Therefore, the routines provided by ADOL-C for the evaluation of derivatives
can be used at arguments $x$ other than the
point at which the tape was generated, provided there are
no user-defined quadratures and all comparisons involving
{\sf adouble}s yield the same result. The last condition
implies that the control flow is unaltered by the change
of the independent variable values. Therefore, this sufficient
condition is tested by ADOL-C and if it is not met
the ADOL-C routine called for derivative calculations indicates this
contingency through its return value. Currently, there are six return values,
see \autoref{retvalues}.

\begin{table}[h]
\centering
\begin{tabular}{|r|p{12.5cm}|}\hline
 +3 &
The function is locally analytic.
\\ \hline
 +2 &
The function is locally analytic but the sparsity
structure (compared to the situation at the  taping point)
may have changed, e.g. while at taping arguments
{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
the argument currently used.
\\ \hline
 +1 &
At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
Lipschitz-continuous but possibly non-differentiable.
\\ \hline
 0 &
Some arithmetic comparison involving {\sf adouble}s yields a tie.
Hence, the function to be differentiated  may be discontinuous.
\\ \hline
 -1 &
An {\sf adouble} comparison yields different results
from the evaluation point at which the tape was generated.
\\ \hline
 -2 &
The argument of a user-defined quadrature has changed
from the evaluation point at which the tape was generated.
\\ \hline
\end{tabular}
\caption{Description of return values}
\label{retvalues}
\end{table}

\begin{figure}[h]
\caption{Return values around the taping point}
\label{fi:tap_point}
\end{figure}

In \autoref{fi:tap_point} these return values are illustrated.
If the user finds the return value of an ADOL-C routine to be negative, the
taping process simply has to be repeated by executing the active section again.
The crux of the problem lies in the fact that the tape records only
the operations that are executed during one particular evaluation of the
function.
It also has no way to evaluate integrals since the corresponding
quadratures are never recorded on the tape.
Therefore, when there are user-defined quadratures, retaping is necessary at each
new point. If there are only branches conditioned on {\sf adouble}
comparisons one may hope that re-taping becomes unnecessary when
the points settle down in some small neighborhood, as one would
expect for example in an iterative equation solver.
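
The interplay between recorded comparisons, return values and retaping can
be mimicked by a deliberately tiny model. Everything in the following sketch
({\sf ToyTape}, {\sf tape\_f}, {\sf replay\_f}, {\sf eval\_with\_retaping}) is
hypothetical and only imitates the return-value convention of
\autoref{retvalues} for one comparison; it does not reflect the tape format
or the drivers of ADOL-C.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature "tape": it stores only the outcomes of the
// comparisons seen while taping. Illustration only, not ADOL-C's format.
struct ToyTape {
    std::vector<bool> branches;
};

// Piecewise function f(x) = x*x if x > 1, else 2*x.
// Taping pass: evaluate f and record which branch was taken.
double tape_f(double x, ToyTape& tape) {
    bool taken = (x > 1.0);
    tape.branches.clear();
    tape.branches.push_back(taken);
    return taken ? x * x : 2.0 * x;
}

// Re-evaluation from the tape at a new argument x.
// Returns -1 if the comparison now yields a different result (y is left
// untouched), 0 if the recorded branch is still valid but the comparison
// is at a tie, and +3 otherwise.
int replay_f(const ToyTape& tape, double x, double& y) {
    bool recorded = tape.branches.at(0);
    bool now = (x > 1.0);
    if (now != recorded) return -1;
    y = recorded ? x * x : 2.0 * x;
    return (x == 1.0) ? 0 : 3;
}

// Driver: reuse the tape when possible and retape on a negative return value.
double eval_with_retaping(ToyTape& tape, double x) {
    double y = 0.0;
    if (replay_f(tape, x, y) < 0) {
        y = tape_f(x, tape);  // execute the "active section" again
    }
    return y;
}
```

After taping at $x = 2$, a replay at $x = 3$ succeeds, whereas a replay at
$x = 0.5$ reports $-1$ and triggers retaping; subsequent points below $1$
can then be evaluated from the new tape without further retaping.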

\subsection{Conditional Assignments}
\label{condassign}

It appears unsatisfactory that, for example, a simple table lookup
of some physical property forces the re-recording of a possibly
much larger calculation. However, the basic philosophy of ADOL-C
is to overload arithmetic, rather than to generate a new program
with jumps between ``instructions'', which would destroy the
strictly sequential tape access and
require the infusion of substantial compiler technology.
Therefore, we introduce the two constructs of conditional
assignments and active integers as partial remedies to the
branching problem.

In many cases, the functionality of branches
can be replaced by conditional assignments.
For this purpose, we provide a special function called
{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
syntax of the conditional assignment
\begin{center}
    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;}
\end{center}
which C++ inherited from C. However, here the arguments are restricted to be
active or passive scalar arguments, and all expression arguments
are evaluated before the test on {\sf  b}, which is different from
the usual conditional assignment or the code segment.

Suppose the original program contains the code segment
\begin{center}
{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}
\end{center}
Here, only one of the expressions (or, more generally, program blocks)
{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
may first execute both branches and then pick either {\sf c}
or {\sf d} using
{\sf condassign(a,b,c,d)}. To maintain
consistency with the original code, one has to make sure
that the two branches do not have any side effects that can
interfere with each other or may be important for subsequent
calculations. Furthermore, the test parameter {\sf b} has to be an
{\sf adouble} or an {\sf adouble} expression. Otherwise the
test condition {\sf b} is recorded on the tape as a {\em constant} with its
run time value. Thus the original dependency of {\sf b} on
active variables gets lost, for instance if {\sf b} is a comparison
expression, see \autoref{OverOper}.
If there is no {\sf else} part in a conditional assignment, one may call
the three argument version
{\sf condassign(a,b,c)}, which
is logically equivalent to {\sf condassign(a,b,c,a)} in that
nothing happens if {\sf b} is non-positive.
The header file \verb=<adolc/adouble.h>=
also contains corresponding definitions of
{\sf condassign(a,b,c,d)}
and {\sf condassign(a,b,c)} for
passive {\sf double} arguments so that the modified code
without any differentiation can be tested
for correctness.
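
The rewriting of a branch into a conditional assignment can be tried out on
passive {\sf double}s alone. The sketch below uses hypothetical stand-ins
named {\sf condassign\_sketch}, so that it compiles without ADOL-C and cannot
be mistaken for the definitions in \verb=<adolc/adouble.h>=; it only encodes
the semantics stated above, namely selection of {\sf c} if {\sf b} is positive.

```cpp
#include <cassert>

// Hypothetical stand-ins for passive doubles, written from the semantics
// described in the text; not the definitions shipped with ADOL-C.
inline void condassign_sketch(double& a, double b, double c, double d) {
    a = (b > 0.0) ? c : d;  // c and d were both evaluated by the caller
}

inline void condassign_sketch(double& a, double b, double c) {
    if (b > 0.0) a = c;     // a is left untouched when b is non-positive
}

// Original branching code:  if (x > 0) y = x * x; else y = -x;
// Rewritten so that both branch values are computed before the selection.
double piecewise(double x) {
    double y = 0.0;
    double c = x * x;  // value of the "then" branch
    double d = -x;     // value of the "else" branch
    condassign_sketch(y, x, c, d);
    return y;
}
```

Because both {\sf c} and {\sf d} are always computed, this rewriting is only
safe when neither branch has side effects and both expressions can be
evaluated at every argument.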

\subsection{Step-by-Step Modification Procedure}

To prepare a section of given C or C++ code for automatic
differentiation as described above, one applies the following step-by-step procedure.
\begin{enumerate}
\item
Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
beginning and end of the active section.
\item
Select the set of active variables, and change their type from
{\sf double} or {\sf float} to {\sf adouble}.
\item
Select a sequence of independent variables, and initialize them with
\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
\item
Select a sequence of dependent variables among the active variables,
and pass their final values to passive variables or vectors thereof
by \boldmath $\gg=$ \unboldmath assignments.
\item
Compile the codes after including the header file \verb=<adolc/adouble.h>=.
\end{enumerate}
Typically, the first compilation will detect several type conflicts
-- usually attempts to convert from active to passive
variables or to perform standard I/O of active variables.
Since all standard
C programs can be activated by a mechanical application of the
procedure above, the following section is of importance
only to advanced users.

695\section{Numbering the Tapes and Controlling the Buffer}
698The trace generated by the execution of an active section may stay
699within a triplet of internal arrays or it may be written out
700to three corresponding files. We will refer to these triplets as the
701tape array or tape file, in general tape, which may subsequently be
702used to evaluate the
703underlying function and its derivatives at the original point or at
704alternative arguments. If the active section involves user-defined
705quadratures it must be executed and
706re-taped at each new argument. Similarly, if conditions on
707{\sf adouble} values lead to a different program branch being taken at
708a new argument the evaluation process also needs to be re-taped at the
709new point. Otherwise, direct evaluation from
710the tape by the routine {\sf function} (\autoref{optdrivers}) is
711likely to be
712faster. The use of quadratures and the results of all comparisons on
713{\sf adouble}s are recorded on the tape so that {\sf function} and other
714forward routines stop and  return appropriate flags if their use without
715prior re-taping is unsafe. To avoid any re-taping certain types of
716branches can be recorded on the tape through
717the use of conditional assignments 
718described before in \autoref{condassign}.
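
The re-taping logic described above can be sketched as follows. This is
only a minimal illustration: the helper {\sf evaluate\_or\_retape} and its
two callbacks are hypothetical and not part of ADOL-C. They stand in for a
tape-based evaluation (such as a call to {\sf function}) and for
re-executing the active section. The only assumption taken from the text is
that a negative return value signals that the tape is not valid at the new
argument.

```cpp
#include <functional>

// Hypothetical helper, not an ADOL-C routine.
// evaluate_from_tape: stands in for a tape-based evaluation and returns
//                     a negative value if the tape cannot be reused.
// retape:             stands in for re-executing the active section.
int evaluate_or_retape(const std::function<int()>& evaluate_from_tape,
                       const std::function<void()>& retape)
{
    int rc = evaluate_from_tape();
    if (rc < 0) {
        // A branch or quadrature behaved differently than during taping:
        // record a fresh tape at the new argument and evaluate once more.
        retape();
        rc = evaluate_from_tape();
    }
    return rc;
}
```

In this sketch the cheaper evaluation from the tape is always tried first,
and the more expensive re-taping happens only when it fails.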
720Several tapes may be generated and kept simultaneously.
A tape array is used as a triplet of buffers; a tape file is generated instead if
722the length of any of the buffers exceeds the maximal array lengths of
723{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
724defined in the header file \verb=<adolc/usrparms.h>=
725and may be adjusted by the user in the header file before compiling
the ADOL-C library, or at runtime using a file named \verb=.adolcrc=.
The directory to which tape files are written
can be changed by modifying the definition of {\sf TAPE\_DIR} in
the header file \verb=<adolc/dvlparms.h>= before
compiling the ADOL-C library, or at runtime by defining {\sf
  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined
to be the present working directory (\verb=.=).
734For simple usage, {\sf trace\_on} may be called with only the tape
735{\sf tag} as argument, and {\sf trace\_off} may be called
736without argument. The optional integer argument {\sf keep} of
737{\sf trace\_on} determines whether the numerical values of all
738active variables are recorded in a buffered temporary array or file
739called the taylor stack.
740This option takes effect if
741{\sf keep} = 1 and prepares the scene for an immediately following
742gradient evaluation by a call to a routine implementing the reverse mode
as described in \autoref{forw_rev_ad} and \autoref{forw_rev}. A
file is used instead of an array if the size exceeds the maximal array
length {\sf TBUFSIZE}, which is defined in \verb=<adolc/usrparms.h>= and may
be adjusted in the same way as the other buffer sizes mentioned above.
747Alternatively, gradients may be evaluated by a call
748to {\sf gradient}, which includes a preparatory forward sweep
749for the creation of the temporary file. If omitted, the argument
750{\sf  keep} defaults to 0, so that no temporary
751taylor stack file is generated.
753By setting the optional integer argument {\sf file} of
754{\sf  trace\_off} to 1, the user may force a numbered  tape
755file to be written even if the tape array (buffer) does not overflow.
756If the argument {\sf file} is omitted, it
757defaults to 0, so that the tape array is written onto a tape file only
758if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
760After the execution of an active section, if a tape file was generated, i.e.,
761if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
762argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
763saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
764the current working directory) under filenames formed by
765the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
767  ADOLC\_TAYLORS\_NAME} defined in
768the header file \verb=<adolc/dvlparms.h>= appended with the number
769given as the {\sf tag} argument to {\sf trace\_on} and have the
770extension {\sf .tap}.
772 Later, all problem-independent routines
773like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
774expect as first argument a {\sf tag} to determine
775the tape on which their respective computational task is to be performed.
776By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
777several tapes for various function evaluations and subsequently perform
778function and derivative evaluations on one or more of them.
780For example, suppose one wishes to calculate for two smooth functions
781$f_1(x)$ and $f_2(x)$ 
783   f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x),
785and possibly higher derivatives where the two functions do not tie.
786Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
787one can generate two different tapes by calling {\sf trace\_on} with
{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
sections.
790Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
791current argument and then evaluate the gradient $\nabla f(x)$ by calling
792{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
793{\sf tag} = 2.
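
The selection between the two tapes might be organized as in the following
sketch. The type {\sf GradientDriver}, the struct {\sf MaxResult} and the
function {\sf max\_with\_gradient} are hypothetical names introduced for
illustration only. The callback stands in for a tape-based gradient routine
and is passed in, so that the sketch does not depend on ADOL-C itself.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a tape-based gradient routine:
// fills g[0..n-1] with the gradient of the function recorded under `tag`.
using GradientDriver =
    std::function<void(short tag, int n, const double* x, double* g)>;

struct MaxResult {
    double value;              // f(x) = max{f1(x), f2(x)}
    short tag;                 // tape that realizes the maximum (1 or 2)
    std::vector<double> grad;  // gradient taken from that tape
};

// f1 and f2 are the already computed values f1(x) and f2(x).
MaxResult max_with_gradient(double f1, double f2, int n, const double* x,
                            const GradientDriver& gradient)
{
    MaxResult r;
    // At a tie the gradient of the maximum need not exist;
    // this sketch then arbitrarily picks tape 1.
    r.tag = (f1 >= f2) ? 1 : 2;
    r.value = (r.tag == 1) ? f1 : f2;
    r.grad.assign(n, 0.0);
    gradient(r.tag, n, x, r.grad.data());
    return r;
}
```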
796\subsection{Examining the Tape and Predicting Storage Requirements }
799At any point in the program, one may call the routine
801{\sf void tapestats(unsigned short tag, int* counts)}
with {\sf counts} being an array of at least eleven integers.
804The first argument {\sf tag} specifies the particular tape of
805interest. The components of {\sf counts} represent
808{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
809{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ 
810{\sf counts[2]}: & the maximal number of live active variables,\\
811{\sf counts[3]}: & the size of taylor stack (number of overwrites),\\
812{\sf counts[4]}: & the buffer size (a multiple of eight),
817{\sf counts[5]}: & the total number of operations recorded,\\
818{\sf counts[6-13]}: & other internal information about the tape.
821The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} 
822determine the temporary
823storage requirements during calls to the routines
824implementing the forward and the reverse mode.
825For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
826forward mode involves apart from the tape buffers an array of
827 $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
828core and, in addition, a sequential data set called the value stack
829of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
830option {\sf keep} $>$ 0. Here
831the type {\sf revreal} is defined as {\sf double} or {\sf float} in
832the header file \verb=<adolc/usrparms.h>=. The latter choice halves the storage
833requirement for the sequential data set, which stays in core if
834its length is less than {\sf TBUFSIZE} bytes and is otherwise written
835out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=.
836The drawback of the economical
837{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
838yield gradients and other adjoint vectors only in single-precision
839accuracy. This may be acceptable if the adjoint vectors
840represent rows of a Jacobian that is  used for the calculation of
841Newton steps. In its scalar version, the reverse mode implementation involves
842the same number of {\sf double}s and twice as many {\sf revreal}s as the
843forward mode implementation.
844The storage requirements of the vector versions of the forward mode and
845reverse mode implementation are equal to that of the scalar versions multiplied by
846the vector length.
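
The storage rules stated above can be turned into a small estimator. The
following sketch is not part of ADOL-C; the names {\sf StorageEstimate},
{\sf forward\_storage} and {\sf reverse\_storage} are hypothetical. It only
encodes the counts given in the text, with {\sf p} denoting the vector
length ({\sf p} = 1 for the scalar versions) and the tape buffers
themselves left out.

```cpp
#include <cstddef>

// Element counts, not bytes: multiply by sizeof(double) and by the size
// of the chosen revreal type to obtain memory figures.
struct StorageEstimate {
    std::size_t doubles;   // randomly accessed storage
    std::size_t revreals;  // sequential data set (value stack)
};

// Forward mode: (deg+1)*maxlive doubles and tssize*keep revreals,
// both multiplied by the vector length p.
StorageEstimate forward_storage(std::size_t maxlive, std::size_t tssize,
                                int deg, int keep, std::size_t p = 1)
{
    StorageEstimate s;
    s.doubles  = p * static_cast<std::size_t>(deg + 1) * maxlive;
    s.revreals = p * tssize * static_cast<std::size_t>(keep);
    return s;
}

// Reverse mode: the same number of doubles and twice as many revreals.
StorageEstimate reverse_storage(std::size_t maxlive, std::size_t tssize,
                                int deg, int keep, std::size_t p = 1)
{
    StorageEstimate s = forward_storage(maxlive, tssize, deg, keep, p);
    s.revreals *= 2;
    return s;
}
```

The arguments {\sf maxlive} and {\sf tssize} would be taken from
{\sf counts[2]} and {\sf counts[3]} as returned by {\sf tapestats}.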
849\subsection{Customizing ADOL-C}
852Based on the information provided by the routine {\sf tapestats}, the user may alter the
853following types and constant dimensions in the header file \verb=<adolc/usrparms.h>=
854to suit his problem and environment.
\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integers determine the length of
858in\-ter\-nal buf\-fers (default: 65$\,$536). If the buffers are large enough to accommodate all
859required data, any file access is avoided unless {\sf trace\_off}
860is called with a positive argument. This desirable situation can
861be achieved for many problem functions with an execution trace of moderate
862size. Primarily these values occur as an argument
to {\sf malloc}, so that setting them unnecessarily large may have no
864ill effects, unless the operating system prohibits or penalizes large
865array allocations.
867\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
868in\-ter\-nal buf\-fer for a taylor stack (default: 65$\,$536).
870\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
872\item[{\sf locint}{\rm :}] The range of the integer type
873{\sf locint} determines how many {\sf adouble}s can be simultaneously
874alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
875alive at any one time, the type {\sf locint} must be changed to
876 {\sf unsigned long}.
878\item[{\sf revreal}{\rm :}] The choice of this floating-point type
trades accuracy against storage for reverse sweeps (default: {\sf double}). While functions
880and their derivatives are always evaluated in double precision
881during forward sweeps, gradients and other adjoint vectors are obtained
882with the precision determined by the type {\sf revreal}. The less
883accurate choice {\sf revreal} = {\sf float} nearly halves the
884storage requirement during reverse sweeps.
886\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
888\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
890\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
891sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
892of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
893the default setting produces the standard {\sf Inf}. On non-IEEE machines
894change these values to produce a small {\sf InfVal} value and compare
895the results of two forward sweeps with different {\sf InfVal} settings
896to detect a ``vertical'' slope.
898\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
900\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
901sets the mathematically
902undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
903of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
904the default setting produces the standard {\sf NaN}. On non-IEEE machines
905change these values to produce a small {\sf NoNum} value and compare
906the results of two forward sweeps with different {\sf NoNum} settings
907to detect the occurrence of undefined derivative values.
909\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
911\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
913\item[{\sf ATRIG\_ERF}{\rm :}] By removing the comment signs
914the overloaded versions of the inverse hyperbolic functions and
915the error function are enabled (default: undefined).
917\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
919\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
  used by ADOL-C. {\sf malloc} will be used if this variable is
  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
  results caused by uninitialized memory.
926\subsection{Warnings and Suggestions for Improved Efficiency}
929Since the type {\sf adouble} has a nontrivial constructor,
930the mere declaration of large {\sf adouble} arrays may take up
931considerable run time. The user should be warned against
932the usual Fortran practice of declaring fixed-size arrays
933that can accommodate the largest possible case of an evaluation program
934with variable dimensions. If such programs are converted to or written
935in C, the overloading in combination with ADOL-C will lead to very
936large run time increases for comparatively small values of the
937problem dimension, because the actual computation is completely
938dominated by the construction of the large {\sf adouble} arrays.
939The user is advised to
940create dynamic arrays of
941{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
942using {\sf delete}. For storage efficiency it is desirable that
dynamic objects are created and destroyed in a last-in-first-out
fashion.

946Whenever an {\sf adouble} is declared, the constructor for the type
947{\sf adouble} assigns it a nominal address, which we will refer to as
948its  {\em location}.  The location is of the type {\sf locint} defined
949in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy
950a range of contiguous locations. As long as the program execution
951never involves more than 65$\,$536 active variables, the type {\sf locint}
952may be defined as {\sf unsigned short}. Otherwise, the range may be
953extended by defining {\sf locint} as {\sf (unsigned) int} or
954{\sf (unsigned) long}, which may nearly double
955the overall mass storage requirement. Sometimes one can avoid exceeding
956the accessible range of {\sf unsigned short}s by using more local variables and deleting
{\sf adouble}s  created by the {\sf new} operator in a
last-in-first-out
959fashion.  When memory for {\sf adouble}s is requested through a call to
960{\sf malloc()} or other related C memory-allocating
961functions, the storage for these {\sf adouble}s is allocated; however, the
962C++ {\sf adouble} constructor is never called.  The newly defined
963{\sf adouble}s are never assigned a location and are not counted in
964the stack of live variables. Thus, any results depending upon these
965pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
966  malloc() and related C memory-allocating
967functions when declaring adoubles (see the following paragraph).}
969% XXX: Vector and matrix class have to be reimplemented !!!
971%The same point applies, of course,
972% for active vectors.
974When an {\sf adouble}
976% XXX: Vector and matrix class have to be reimplemented !!!
978% or {\bf adoublev}
979goes out of
980scope or is explicitly deleted, the destructor notices that its
981location(s) may be
982freed for subsequent (nominal) reallocation. In general, this is not done
983immediately but is delayed until the locations to be deallocated form a
984contiguous tail of all locations currently being used. 
986 As a consequence of this allocation scheme, the currently
987alive {\sf adouble} locations always form a contiguous range of integers
988that grows and shrinks like a stack. Newly declared {\sf adouble}s are
989placed on the top so that vectors of {\sf adouble}s obtain a contiguous
990range of locations. While the C++ compiler can be expected to construct
991and destruct automatic variables in a last-in-first-out fashion, the
992user may upset this desirable pattern by deleting free-store {\sf adouble}s
993too early or too late. Then the {\sf adouble} stack may grow
994unnecessarily, but the numerical results will still be
995correct, unless an exception occurs because the range of {\sf locint}
996is exceeded. In general, free-store {\sf adouble}s
998% XXX: Vector and matrix class have to be reimplemented !!!
1000%and {\bf adoublev}s
1001should be deleted in a last-in-first-out fashion toward the end of
1002the program block in which they were created.
1003When this pattern is maintained, the maximum number of
1004{\sf adouble}s alive and, as a consequence, the
1005randomly accessed storage space
1006of the derivative evaluation routines is bounded by a
1007small multiple of the memory used in the relevant section of the
1008original program. Failure to delete dynamically allocated {\sf adouble}s
may cause the maximal number of {\sf adouble}s alive at one time to be exceeded
if the same active section is called repeatedly. The same effect
1011occurs if static {\sf adouble}s are used.
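
The stack-like behaviour of the locations can be illustrated by a small
model. The class {\sf LocationStack} below is a hypothetical sketch written
for this purpose and does not reproduce the actual ADOL-C implementation. It
only mimics the rule stated above: a released location is reclaimed once it
belongs to a contiguous tail of free locations.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of the nominal location scheme.
class LocationStack {
public:
    // A newly declared variable receives the location on top of the stack.
    std::size_t allocate() {
        alive_.push_back(true);
        high_water_ = std::max(high_water_, alive_.size());
        return alive_.size() - 1;
    }

    // Mark a location as free; the stack shrinks only while its top
    // entries are free, i.e. while the free locations form a tail.
    void release(std::size_t loc) {
        alive_[loc] = false;
        while (!alive_.empty() && !alive_.back()) alive_.pop_back();
    }

    std::size_t in_use() const { return alive_.size(); }       // current range
    std::size_t high_water() const { return high_water_; }     // largest range seen

private:
    std::vector<bool> alive_;
    std::size_t high_water_ = 0;
};
```

In this model, releasing a variable out of order leaves a hole, and the
occupied range keeps growing until all later locations are released as well.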
1013To avoid the storage and manipulation of structurally
1014trivial derivative values, one should pay careful attention to
1015the naming of variables. Ideally, the intermediate
1016values generated during the evaluation of a vector function
1017should be assigned to program variables that are
1018consistently either active or passive, in that all their values
1019either are or are not dependent on the independent variables
1020in a nontrivial way. For example, this rule is violated if a temporary
1021variable is successively used to accumulate inner products involving
1022first only passive and later active arrays. Then the first inner
1023product and all its successors in the data dependency graph become
1024artificially active and the derivative evaluation routines
1025described later will waste
1026time allocating and propagating
1027trivial or useless derivatives. Sometimes even values that do
1028depend on the independent variables may be of only transitory
1029importance and may not affect the dependent variables. For example,
1030this is true for multipliers that are used to scale linear
1031equations, but whose values do not influence the dependent
1032variables in a mathematical sense. Such dead-end variables
1033can be deactivated by the use of the {\sf value} function, which
1034converts {\sf adouble}s to {\sf double}s. The deleterious effects
1035of unnecessary activity are partly alleviated by run time
1036activity flags in the derivative routine
1037{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1042\section{Easy-To-Use Drivers}
1045For the convenience of the user, ADOL-C provides several
1046easy-to-use drivers that compute the most frequently required
1047derivative objects. Throughout, we assume that after the execution of an
1048active section, the corresponding tape with the identifier {\sf tag}
1049contains a detailed record of the computational process by which the
1050final values $y$ of the dependent variables were obtained from the
1051values $x$ of the independent variables. We will denote this functional
1052relation between the input variables $x$ and the output variables $y$ by
1054F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
The return values of all drivers presented in this section
indicate the validity of the tape as explained in \autoref{reuse_tape}.
1058The presented drivers are all C functions and therefore can be used within
1059C and C++ programs. Some Fortran-callable companions can be found
1060in the appropriate header files.
1063\subsection{Drivers for Optimization and Nonlinear Equations}
1067The drivers provided for solving optimization problems and nonlinear
1068equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1069which is included automatically by the global header file \verb=<adolc/adolc.h>=
1070(see \autoref{ssec:DesIH}).
The routine {\sf function} allows one to evaluate the desired function from
1073the tape instead of executing the corresponding source code:
1076\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1077\>{\sf int function(tag,m,n,x,y)}\\
1078\>{\sf short int tag;}         \> // tape identification \\
1079\>{\sf int m;}                 \> // number of dependent variables $m$\\
1080\>{\sf int n;}                 \> // number of independent variables $n$\\
1081\>{\sf double x[n];}           \> // independent vector $x$ \\
1082\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1085If the original evaluation program is available this double version
1086should be used to compute the function value in order to avoid the
1087interpretative overhead. 
1089For the calculation of whole derivative vectors and matrices up to order
10902 there are the following procedures:
1093\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1094\>{\sf int gradient(tag,n,x,g)}\\
1095\>{\sf short int tag;}         \> // tape identification \\
1096\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1097\>{\sf double x[n];}           \> // independent vector $x$ \\
1098\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1102\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1103\>{\sf int jacobian(tag,m,n,x,J)}\\
1104\>{\sf short int tag;}         \> // tape identification \\
1105\>{\sf int m;}                 \> // number of dependent variables $m$\\
1106\>{\sf int n;}                 \> // number of independent variables $n$\\
1107\>{\sf double x[n];}           \> // independent vector $x$ \\
1108\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1112\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1113\>{\sf int hessian(tag,n,x,H)}\\
1114\>{\sf short int tag;}         \> // tape identification \\
1115\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1116\>{\sf double x[n];}           \> // independent vector $x$ \\
1117\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1120The driver routine {\sf hessian} computes only the lower half of
$\nabla^2F(x)$ so that all values {\sf H[i][j]} with $j>i$ 
1122of {\sf H} allocated as a square array remain untouched during the call
1123of {\sf hessian}. Hence only $i+1$ {\sf double}s  need to be
1124allocated starting at the position {\sf H[i]}.
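
A storage layout that exploits this property might look as follows. The
helper {\sf LowerTriangle} is a hypothetical illustration and not part of
ADOL-C. It allocates $i+1$ entries for row $i$ and exposes the rows as a
{\sf double**}, which is the assumed form in which a matrix argument such as
{\sf H} is handed to a driver.

```cpp
#include <vector>

// Hypothetical helper: row i holds only the entries H[i][0..i].
struct LowerTriangle {
    std::vector<std::vector<double>> rows;  // owns the storage
    std::vector<double*> ptrs;              // row pointers into `rows`

    explicit LowerTriangle(int n) : rows(n), ptrs(n) {
        for (int i = 0; i < n; ++i) {
            rows[i].assign(i + 1, 0.0);
            ptrs[i] = rows[i].data();
        }
    }
    // Copying would leave `ptrs` pointing into the source object.
    LowerTriangle(const LowerTriangle&) = delete;
    LowerTriangle& operator=(const LowerTriangle&) = delete;

    double** data() { return ptrs.data(); }

    // Symmetric access: entries with j > i are read from the lower half.
    double at(int i, int j) const {
        return (i >= j) ? rows[i][j] : rows[j][i];
    }
};
```

Compared with a full square array this stores $n(n+1)/2$ instead of $n^2$
values.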
To use the full capability of automatic differentiation when
products of derivatives with certain weight vectors or directions are needed, ADOL-C offers
the following drivers: 
1131\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1132\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1133\>{\sf short int tag;}         \> // tape identification \\
1134\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1135\>{\sf int n;}                 \> // number of independent variables $n$\\
1136\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1137\>{\sf double x[n];}           \> // independent vector $x$ \\
1138\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1139\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1141If a nonzero value of the parameter {\sf repeat} indicates that the
1142routine {\sf vec\_jac} has been called at the same argument immediately
1143before, the internal forward mode evaluation will be skipped and only
1144reverse mode evaluation with the corresponding arguments is executed
1145resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1148\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1149\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1150\>{\sf short int tag;}         \> // tape identification \\
1151\>{\sf int m;}                 \> // number of dependent variables $m$\\
1152\>{\sf int n;}                 \> // number of independent variables $n$\\
1153\>{\sf double x[n];}           \> // independent vector $x$\\
1154\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1155\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1159\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1160\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1161\>{\sf short int tag;}         \> // tape identification \\
1162\>{\sf int n;}                 \> // number of independent variables $n$\\
1163\>{\sf double x[n];}           \> // independent vector $x$\\
1164\>{\sf double v[n];}           \> // tangent vector $v$\\
1165\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1169\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1170\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1171\>{\sf short int tag;}         \> // tape identification \\
1172\>{\sf int n;}                 \> // number of independent variables $n$\\
1173\>{\sf int p;}                 \> // number of columns in $V$\\
1174\>{\sf double x[n];}           \> // independent vector $x$\\
1175\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1176\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1180\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1181\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1182\>{\sf short int tag;}         \> // tape identification \\
1183\>{\sf int m;}                 \> // number of dependent variables $m$\\
1184\>{\sf int n;}                 \> // number of independent variables $n$\\
1185\>{\sf double x[n];}           \> // independent vector $x$\\
1186\>{\sf double v[n];}           \> // tangent vector $v$\\
1187\>{\sf double u[m];}           \> // range weight vector $u$ \\
1188\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1191The next procedure allows the user to perform Newton steps only
1192having the corresponding tape at hand:
1195\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1196\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1197\>{\sf short int tag;}         \> // tape identification \\
1198\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$\\
\>{\sf double b[n];}           \> // in: right-hand side $b$, out: result $w$ of
$F^\prime(x)w = b$\\
1202\>{\sf int mode;}              \> // option to choose different solvers
1205On entry, parameter {\sf b} of the routine {\sf jac\_solv}
contains the right-hand side of the equation $F^\prime(x)w = b$ to be solved. On exit,
1207{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only
1208the Jacobian of the function
1209given by the tape labeled with {\sf tag} is provided internally.
1210The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
1211solution of the equation is calculated if {\sf mode} = 2.
1212Hence, it is possible to compute the
1213LU-factorization only once. Then the equation can be solved for several
1214right-hand sides $b$ without calculating the Jacobian and
1215its factorization again. 
1217If the original evaluation code of a function contains neither
1218quadratures nor branches, all drivers described above can be used to
1219evaluate derivatives at any argument in its domain. The same still
1220applies if there are no user defined quadratures and
1221all comparisons  involving {\sf adouble}s have the same result as
during taping. If this assumption is falsely made, all drivers,
while internally calling the forward mode evaluation, will return the value -1 or -2,
as already specified in \autoref{reuse_tape}.
1227\subsection{Drivers for Ordinary Differential Equations}
1230When $F$ is the right-hand side of an (autonomous) ordinary
1231differential equation 
1233x^\prime(t) \; = \; F(x(t)) , 
1235we must have $m=n$. Along any solution path $x(t)$ its Taylor
1236coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
1237the relation
1239 x_{i+1} = \frac{1}{1+i} y_i.
1241with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
1243 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
1245defined by an autonomous right-hand side $F$ recorded on the tape.
1246Using this relation, one can generate the Taylor coefficients $x_i$,
1247$i \le deg$,
1248recursively from the current point $x_0$. This task is achieved by the
1249driver routine {\sf forode} defined as follows:
1252\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1253\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1254\>{\sf short int tag;}         \> // tape identification \\
1255\>{\sf int n;}                 \> // number of state variables $n$\\
1256\>{\sf double tau;}            \> // scaling parameter\\
1257\>{\sf int dol;}               \> // degree on previous call\\
1258\>{\sf int deg;}               \> // degree on current call\\
1259\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1262If {\sf dol} is positive, it is assumed that {\sf forode}
1263has been called before at the same point so that all Taylor coefficient
1264vectors up to the {\sf dol}-th are already correct.
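
To make the recursion $x_{i+1} = y_i/(1+i)$ concrete, the following sketch
applies it by hand to the scalar example $F(x) = x^2$. It does not call
{\sf forode}. The function name {\sf taylor\_coefficients\_x\_squared} is
hypothetical, and the scaling parameter is assumed to be {\sf tau} = 1. For
this $F$ the Taylor coefficients of $y(t) = x(t)^2$ are given by the
convolution $y_i = \sum_{j=0}^{i} x_j x_{i-j}$.

```cpp
#include <vector>

// Taylor coefficients x_0, ..., x_deg of the solution of x'(t) = x(t)^2
// with x(0) = x0, generated by the recursion x_{i+1} = y_i / (1 + i).
std::vector<double> taylor_coefficients_x_squared(double x0, int deg)
{
    std::vector<double> x(deg + 1, 0.0);
    x[0] = x0;
    for (int i = 0; i < deg; ++i) {
        // y_i: i-th Taylor coefficient of F(x(t)) = x(t)^2; it depends
        // only on the coefficients x_0, ..., x_i computed so far.
        double y_i = 0.0;
        for (int j = 0; j <= i; ++j) y_i += x[j] * x[i - j];
        x[i + 1] = y_i / (1.0 + i);
    }
    return x;
}
```

For $x_0 = 1$ the exact solution is $x(t) = 1/(1-t)$, so that all
coefficients equal one, which gives a simple check of the recursion.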
1266Subsequently one may call the driver routine {\sf reverse} or corresponding
low level routines as explained in \autoref{forw_rev} and
1268\autoref{forw_rev_ad}, respectively, to compute
1269the family of square matrices {\sf Z[n][n][deg]} defined by
1271Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
1273with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}.
1275For the numerical solutions of ordinary differential equations,
1276one may also wish to calculate the Jacobians
1279B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1281which exist provided $F$ is sufficiently smooth. These matrices can
1282be obtained from the partial derivatives $\partial y_i/\partial x_0$
1283by an appropriate version of the chain rule.
1284To compute the total derivatives $B = (B_j)_{0\leq j <d}$
1285defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
1286matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
1287corresponding evaluation of the {\sf hov\_reverse} function. The interface of
1288{\sf accode} is defined as follows:
1291\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1292\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1293\>{\sf int n;}                 \> // number of state variables $n$ \\
1294\>{\sf double tau;}            \> // scaling parameter\\
1295\>{\sf int deg;}               \> // degree on current call\\
1296\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1297\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1298\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1301Sparsity information can be exploited by {\sf accode} using the array {\sf
1302nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1303reverse} or the corresponding basic routines as explained below in
1304\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1305non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1308  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1310In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
sparsity pattern that fills in as $k$ grows. Note that there need be no
1312loss in computational efficiency if a time-dependent ordinary differential equation
1313is rewritten in autonomous form.
The prototypes of the ODE drivers {\sf forode} and {\sf accode} are contained in the header file
1316\verb=<adolc/drivers/odedrivers.h>=. The global header file
1318includes this file automatically, see \autoref{ssec:DesIH}.
1320An example program using the procedures {\sf forode} and {\sf accode} together
1321with more detailed information about the coding can be found in
1322\autoref{exam:ode}. The corresponding source code
1323\verb=odexam.cpp= is contained in the subdirectory
1328\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1331Quite often, the Jacobians and Hessians that have to be computed are sparse
matrices. Therefore, ADOL-C additionally provides drivers that
1333allow the exploitation of sparsity. The exploitation of sparsity is
1334frequently based on {\em graph coloring} methods, discussed
1335for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
rely on the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
1337ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
1338separately to use the sparse drivers described here. ColPack is available for download at
1339\verb= More information about the required
1340installation of ColPack is given in \autoref{install}.
\subsubsection*{Sparse Jacobians and Sparse Hessians}

To compute the entries of sparse Jacobians and sparse Hessians,
respectively, in coordinate format one may use the drivers:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,options)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf int nnz;}               \> // number of nonzeros \\
\>{\sf unsigned int rind[nnz];}\> // row indices\\
\>{\sf unsigned int cind[nnz];}\> // column indices\\
\>{\sf double values[nnz];}    \> // non-zero values\\
\>{\sf int options[4];}        \> // array of control parameters
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,options)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf int nnz;}               \> // number of nonzeros \\
\>{\sf unsigned int rind[nnz];}\> // row indices\\
\>{\sf unsigned int cind[nnz];}\> // column indices\\
\>{\sf double values[nnz];}    \> // non-zero values  \\
\>{\sf int options[2];}        \> // array of control parameters
\end{tabbing}

Once more, the input variables are the identifier for the internal
representation {\sf tag}, if required the number of dependents {\sf m},
and the number of independents {\sf n} for a consistency check.
Furthermore, the flag {\sf repeat=0} indicates that the functions are called
at a point with a new sparsity structure, whereas  {\sf repeat=1} results in
the reuse of the sparsity pattern from the previous call.
The current values of the independents are given by the array {\sf x}.
The input/output
variable {\sf nnz} stores the number of nonzero entries.
Therefore, {\sf nnz} also denotes the length of the arrays {\sf rind} storing
the row indices, {\sf cind} storing the column indices, and
{\sf values} storing the values of the nonzero entries.
If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
the functions determine the number of nonzeros for the sparsity pattern
defined by the value of {\sf x}, allocate appropriate arrays {\sf rind},
{\sf cind}, and {\sf values} and store the desired information in these
arrays.
During the next function call with {\sf repeat=1} the allocated memory
is reused such that only the values of the arrays are changed.
Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
  repeat=0} the user is responsible for the deallocation of the arrays
 {\sf rind}, {\sf cind}, and {\sf values} using the operator {\sf
   delete[]}!
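
The coordinate format itself can be illustrated without calling ADOL-C.
The following C++ sketch is not part of the library: the helper name
{\sf coordinate\_to\_dense} is hypothetical, and it assumes that
{\sf rind} and {\sf cind} hold zero-based indices. It expands the
triplets ({\sf rind[k]}, {\sf cind[k]}, {\sf values[k]}) into a dense
row-major $m \times n$ array, which is sometimes convenient for checking
small test problems.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an ADOL-C routine): expand a matrix given in
// coordinate format into a dense row-major m x n array.
// Assumption: rind and cind hold zero-based indices.
std::vector<double> coordinate_to_dense(int m, int n, int nnz,
                                        const unsigned int* rind,
                                        const unsigned int* cind,
                                        const double* values) {
    std::vector<double> dense(static_cast<std::size_t>(m) * n, 0.0);
    for (int k = 0; k < nnz; ++k) {
        // entry k is the value at position (rind[k], cind[k])
        assert(rind[k] < static_cast<unsigned int>(m));
        assert(cind[k] < static_cast<unsigned int>(n));
        dense[static_cast<std::size_t>(rind[k]) * n + cind[k]] = values[k];
    }
    return dense;
}
```

All positions that do not occur among the triplets remain zero in the dense array.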

For each driver the array {\sf options} can be used to adapt the
computation of the sparse derivative matrices to the special
needs of the application under consideration. In most cases, the default options
will give reasonable performance. The elements of the array {\sf options} control the action of
{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|l|} \hline
component & value &  \\ \hline
{\sf options[0]} &    &  way of sparsity pattern computation \\
                 & 0  &  propagation of index domains (default) \\
                 & 1  &  propagation of bit pattern \\ \hline
{\sf options[1]} &    &  test the computational graph control flow \\
                 & 0  &  safe mode (default) \\
                 & 1  &  tight mode \\ \hline
{\sf options[2]} &    &  way of bit pattern propagation \\
                 & 0  &  automatic detection (default) \\
                 & 1  &  forward mode \\
                 & 2  &  reverse mode \\ \hline
{\sf options[3]} &    &  way of compression \\
                 & 0  &  column compression (default) \\
                 & 1  &  row compression \\ \hline
\end{tabular}
\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
\end{table}

The component {\sf options[1]} determines
the usage of the safe or tight mode of sparsity computation.
The first, more conservative option is the default. It accounts for all
dependences that might occur for any value of the
independent variables. For example, the intermediate
{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
logical {\sf OR} of those associated with {\sf a} and {\sf b}.
In contrast,
the tight option gives this result only in the unlikely event of an exact
tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
Obviously, the sparsity pattern obtained with the tight option may contain
more zeros than that obtained with the safe option. On the other hand, it
will only be valid at points belonging to an area where the function $F$ is locally
analytic and that contains the point at which the internal representation was
generated. Since generating the sparsity structure using the safe version does not
require any reevaluation, it may thus reduce the overall computational cost
despite the fact that it produces more nonzero entries.
The value of {\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and of dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
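
The difference between the safe and the tight mode for
{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ can be written down in a few
lines. The following C++ sketch only illustrates the rule described above.
It is not taken from the ADOL-C sources, and the names {\sf BitPattern},
{\sf max\_pattern\_safe} and {\sf max\_pattern\_tight} are hypothetical.
One bit per independent variable is assumed, so the sketch covers at most
64 independents.

```cpp
#include <cassert>
#include <cstdint>

// One bit per independent variable: bit i set means "depends on x_i".
// (Illustrative type, limited to 64 independents.)
using BitPattern = std::uint64_t;

// Safe mode: c = max(a,b) is assumed to depend on every independent
// that a or b depends on, regardless of the current values.
BitPattern max_pattern_safe(BitPattern pa, BitPattern pb) {
    return pa | pb;
}

// Tight mode: only the branch taken at the current point contributes;
// an exact tie a == b falls back to the union of both patterns.
BitPattern max_pattern_tight(double a, BitPattern pa,
                             double b, BitPattern pb) {
    if (a > b) return pa;
    if (b > a) return pb;
    return pa | pb;
}
```

With {\sf a} depending only on $x_0$ and {\sf b} only on $x_1$, the safe rule
always reports both bits, whereas the tight rule reports a single bit unless
the two values coincide.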

The elements of the array {\sf options} control the action of
{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|l|} \hline
component & value &  \\ \hline
{\sf options[0]} &    &  test the computational graph control flow \\
                 & 0  &  safe mode (default) \\
                 & 1  &  tight mode \\ \hline
{\sf options[1]} &    &  way of recovery \\
                 & 0  &  indirect recovery (default) \\
                 & 1  &  direct recovery \\ \hline
\end{tabular}
\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
\end{table}
1467The described driver routines for the computation of sparse derivative
1468matrices are prototyped in the header file
1469\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1470global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
Example codes illustrating the usage of {\sf
  sparse\_jac} and {\sf sparse\_hess} can be found in the files
1473\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
\subsubsection*{Computation of Sparsity Pattern}

ADOL-C offers a convenient way of determining the
sparsity structure of a Jacobian matrix using the function:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
\>{\sf short int tag;} \> // tape identification \\
\>{\sf int m;} \> // number of dependent variables $m$\\
\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf double x[n];} \> // independent variables $x_0$\\
\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
\>{\sf int options[3];} \> // array of control parameters
\end{tabbing}

The sparsity pattern of the
Jacobian is computed in a compressed row format. For this purpose,
{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
During the call of  {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
entries in row $i$ of the Jacobian is determined for all $1\le i\le
m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
points to a block of $\hat{n}_i+1$ {\sf  unsigned int}s for all $1\le
i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
column indices of the $\hat{n}_i$ nonzero entries in the $i$th row are stored
in the components  {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][$\hat{n}_i$]}.
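
The row compressed layout can be reproduced for a small dense $0$-$1$ matrix
as follows. This C++ sketch is purely illustrative: the helper name
{\sf row\_compressed} is hypothetical, {\sf std::vector} replaces the raw
pointer arrays used by {\sf jac\_pat}, and zero-based column indices are an
assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build the row compressed pattern of a dense
// 0/1 matrix. Row i of the result holds the number of nonzeros at
// position 0, followed by their column indices.
// Assumption: column indices are zero-based.
std::vector<std::vector<unsigned int>>
row_compressed(const std::vector<std::vector<bool>>& dense) {
    std::vector<std::vector<unsigned int>> JP(dense.size());
    for (std::size_t i = 0; i < dense.size(); ++i) {
        JP[i].push_back(0u);  // placeholder for the nonzero count
        for (std::size_t j = 0; j < dense[i].size(); ++j) {
            if (dense[i][j]) JP[i].push_back(static_cast<unsigned int>(j));
        }
        JP[i][0] = static_cast<unsigned int>(JP[i].size() - 1);
    }
    return JP;
}
```

A row without nonzero entries is thus represented by a block of length one that contains only the count $0$.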

The elements of the array {\sf options} control the action of
{\sf jac\_pat} according to \autoref{options}.
\begin{table}[h]
\centering
\begin{tabular}{|c|c|l|} \hline
component & value &  \\ \hline
{\sf options[0]} &    &  way of sparsity pattern computation \\
                 & 0  &  propagation of index domains (default) \\
                 & 1  &  propagation of bit pattern \\ \hline
{\sf options[1]} &    &  test the computational graph control flow \\
                 & 0  &  safe mode (default) \\
                 & 1  &  tight mode \\ \hline
{\sf options[2]} &    &  way of bit pattern propagation \\
                 & 0  &  automatic detection (default) \\
                 & 1  &  forward mode \\
                 & 2  &  reverse mode \\ \hline
\end{tabular}
\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
\end{table}
The value of {\sf options[0]} selects the way to compute the sparsity
pattern. The component {\sf options[1]} determines
the usage of the safe or tight mode of bit pattern propagation.
The first, more conservative option is the default. It accounts for all
dependences that might occur for any value of the
independent variables. For example, the intermediate
{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
logical {\sf OR} of those associated with {\sf a} and {\sf b}.
In contrast,
the tight option gives this result only in the unlikely event of an exact
tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
Obviously, the sparsity pattern obtained with the tight option may contain
more zeros than that obtained with the safe option. On the other hand, it
will only be valid at points belonging to an area where the function $F$ is locally
analytic and that contains the point at which the internal representation was
generated. Since generating the sparsity structure using the safe version does not
require any reevaluation, it may thus reduce the overall computational cost
despite the fact that it produces more nonzero entries. The value of
{\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and of dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.

The routine {\sf jac\_pat} may use the propagation of bit patterns to
determine the sparsity pattern. In that case, a kind of ``strip-mining''
is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
the value of the constant {\sf PQ\_STRIPMINE\_MAX}
following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1557The driver routine is prototyped in the header file
1558\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1559global header file \verb=<adolc/adolc.h>= (see
1560\autoref{ssec:DesIH}). The determination of sparsity patterns is
1561illustrated by the examples \verb=sparse_jacobian.cpp=
1562and \verb=jacpatexam.cpp=
1563contained in

To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
driver
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int hess\_pat(tag, n, x, HP, options)}\\
\>{\sf short int tag;}       \> // tape identification \\
\>{\sf int n;}               \> // number of independent variables $n$\\
\>{\sf double x[n];}         \> // independent variables $x_0$\\
\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
\>{\sf int option;}          \> // control parameter
\end{tabbing}
where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
 unsigned int}s.
After the function call {\sf HP} contains the sparsity pattern,
where {\sf HP[j-1][0]} contains the number of nonzero elements in the
 $j$th row for $1 \le j\le n$.
The components {\sf HP[j-1][i]}, $0<${\sf i}~$\le$~{\sf HP[j-1][0]} store the
 indices of these entries. For determining the sparsity pattern, ADOL-C uses
 the algorithm described in \cite{Wa05a}.  The parameter {\sf option} determines
the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
  option = 1}) of the computation of the sparsity pattern as described
above.
1589This driver routine is prototyped in the header file
1590\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1591global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1592An example employing the procedure {\sf hess\_pat}  can be found in the file
1593\verb=sparse_hessian.cpp=  contained in
\subsubsection*{Calculation of Seed Matrices}

To compute a compressed derivative matrix from a given sparsity
pattern, one has to calculate an appropriate seed matrix that can be
used as input for the derivative calculation. To facilitate the
generation of seed matrices for a sparsity pattern given in
row compressed form, ADOL-C provides the following two drivers,
which are based on the ColPack library:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
\>{\sf int m;} \> // number of dependent variables $m$\\
\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
of Jacobian\\
\>{\sf double S[n][p];} \> // seed matrix\\
\>{\sf int p;} \> // number of columns in $S$
\end{tabbing}
1615The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1616number of independent variables {\sf n} and the sparsity pattern {\sf
1617  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1618{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1619pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1620determines the number of columns {\sf p} in the seed
1621matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1622 S} and initializes {\sf S} according to the graph coloring.
1623The coloring algorithm that is applied in {\sf
1624  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1625described earlier.
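
To make the connection between a coloring and the seed matrix concrete, the
following C++ sketch colors the columns of a row compressed pattern greedily
and builds the corresponding $0$-$1$ seed matrix. It is a simplified
stand-in, not ColPack's algorithm: the natural column order is used, the
function names {\sf color\_columns} and {\sf seed\_matrix} are hypothetical,
and zero-based column indices are assumed. Two columns receive different
colors whenever they have a nonzero in a common row, which is the condition
that allows the entries of $J$ to be read off directly from the compressed
product $J S$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row compressed pattern: JP[i] = {count, col_1, ..., col_count}.
using Pattern = std::vector<std::vector<unsigned int>>;

// Greedy coloring in natural order (illustrative, not ColPack):
// columns that share a row must receive different colors.
std::vector<int> color_columns(const Pattern& JP, int n) {
    // rows_of[j] lists the rows that have a nonzero in column j
    std::vector<std::vector<std::size_t>> rows_of(n);
    for (std::size_t i = 0; i < JP.size(); ++i)
        for (unsigned int k = 1; k <= JP[i][0]; ++k)
            rows_of[JP[i][k]].push_back(i);

    std::vector<int> color(n, -1);  // -1 means "not colored yet"
    for (int j = 0; j < n; ++j) {
        std::vector<bool> forbidden(n, false);
        for (std::size_t i : rows_of[j])
            for (unsigned int k = 1; k <= JP[i][0]; ++k) {
                int c = color[JP[i][k]];
                if (c >= 0) forbidden[c] = true;
            }
        int c = 0;
        while (forbidden[c]) ++c;  // smallest color not used by a neighbor
        color[j] = c;
    }
    return color;
}

// Seed matrix S in R^{n x p}: S[j][c] = 1 if column j has color c.
std::vector<std::vector<double>> seed_matrix(const std::vector<int>& color) {
    int p = 0;
    for (int c : color)
        if (c + 1 > p) p = c + 1;
    std::vector<std::vector<double>> S(color.size(),
                                       std::vector<double>(p, 0.0));
    for (std::size_t j = 0; j < color.size(); ++j) S[j][color[j]] = 1.0;
    return S;
}
```

For a $2 \times 3$ pattern with nonzeros in columns $\{0,1\}$ of the first row
and $\{1,2\}$ of the second row, columns $0$ and $2$ can share a color, so two
seed columns suffice instead of three.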
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
of Hessian\\
\>{\sf double S[n][p];} \> // seed matrix\\
\>{\sf int p;} \> // number of columns in $S$
\end{tabbing}
1636The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1637and the sparsity pattern {\sf HP} of the Hessian computed for example
1638by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1639appropriate coloring of the adjacency graph defined by the sparsity
1640pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its
1641    compressed representation and a star coloring in the case of a direct recovery.
1642 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1643 S} and initializes {\sf S} according to the graph coloring.
1644The coloring algorithm applied in {\sf
1645  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1646described earlier.
1648The specific set of criteria used to define a seed matrix $S$ depends
1649on whether the sparse derivative matrix
1650to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1651It also depends on whether the entries of the derivative matrix  are to be
1652recovered from the compressed representation \emph{directly}
1653(without requiring any further arithmetic) or \emph{indirectly} (for
1654example, by solving for unknowns via successive substitutions).
1655Appropriate recovery routines are provided by ColPack and used
1656in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1657the previous subsection. Examples with a detailed analysis of the
1658employed drivers for the exploitation of sparsity can be found in the
1659papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1662These driver routines are prototyped in
1663\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1664global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1665An example code illustrating the usage of {\sf
1666generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1667\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
\subsection{Higher Derivative Tensors}

Many applications in scientific computing need second- and higher-order
derivatives. Often, one does not require full derivative tensors but
only the derivatives in certain directions $s_i \in \R^{n}$.
Suppose a collection of $p$ directions
$s_i \in \R^{n}$ is given, which form a matrix
\[
S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
 \R^{n \times p}.
\]
One possible choice is $S = I_n$ with  $p = n$, which leads to
full tensors being evaluated.
ADOL-C provides the function {\sf tensor\_eval}
to calculate the derivative tensors
\begin{equation}
\label{eq:tensor}
\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0}
     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
\end{equation}
simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
parameters:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$ \\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
\>{\sf double S[n][p];}        \> // seed matrix $S$
\end{tabbing}
1708Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory 
1709requirement can be reduced enormously. The collection of  tensors up to order $d$ comprises 
$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
greater than or equal to $\binom{p+d}{d}$.
1712To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor
1713series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
1714approach is described in \cite{Griewank97}.
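
As a quick check of the storage requirement, take $p=2$ directions and
degree $d=3$. Derivatives of order $k$ in $p$ variables have
$\binom{p+k-1}{k}$ distinct entries, so that
\[
\sum_{k=0}^{3} \binom{2+k-1}{k} \; = \; 1 + 2 + 3 + 4 \; = \; 10
\; = \; \binom{5}{3} \; = \; \binom{p+d}{d} .
\]
Hence, in this case each row {\sf tensor[i]} needs room for at least $10$
values: the function value, two first, three second and four third
derivatives.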
1716The access of individual entries in symmetric tensors of
1717higher order is a little tricky. We always store the derivative
1718values in the two dimensional array {\sf tensor} and provide two
1719different ways of accessing them. 
1720The leading dimension of the tensor array ranges over
1721the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17220,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1723structure for all $i$. Each of them represents the symmetric tensors up to
1724order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 

The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
tensors are linearly ordered according to the tetrahedral
scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$
as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
$l = k(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
function value $F_{i+1}$ itself, and the gradient component
$\partial F_{i+1}/\partial z_k$ is stored at $l=k(k+1)/2$
with $j=0$ for $k=1,\ldots,p$.
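
The index formula for the quadratic case can be checked with a few lines of
C++. The function name {\sf address2} is hypothetical and covers only
$d=2$; it is not the general {\sf address} routine of ADOL-C.

```cpp
#include <cassert>
#include <utility>

// Position within tensor[i] of the derivative with respect to z_j and
// z_k for d = 2. Index 0 stands for "no differentiation", so (0,0) is
// the function value and (0,k) a first derivative.
// Hypothetical helper covering only the quadratic case.
int address2(int j, int k) {
    assert(j >= 0 && k >= 0);
    if (j > k) std::swap(j, k);  // the formula assumes j <= k
    return k * (k + 1) / 2 + j;
}
```

For $p=2$ this yields the layout $l=0$: $F_{i+1}$, $l=1$: $\partial/\partial z_1$,
$l=2$: $\partial^2/\partial z_1^2$, $l=3$: $\partial/\partial z_2$,
$l=4$: $\partial^2/\partial z_1\partial z_2$ and
$l=5$: $\partial^2/\partial z_2^2$, i.e.\ $\binom{4}{2}=6$ entries in total.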
1736For general $d$ we combine the variable
1737indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
1738where $j_k$ indicates differentiation with respect to variable
1739$x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1740no differentiation so that all lower derivatives are also
1741contained in the same data structure as described above for
1742the quadratic case. The location of the partial derivative specified
1743by $j$ is computed by the function
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int address(d,$\,$j)} \\
\>{\sf int d;}                 \> // highest derivative degree $d$ \\
\>{\sf int j[d];}              \> // multi-index $j$
\end{tabbing}
1752and it may thus be referenced as {\sf tensor[i][address(d,$\,$j)]}.
1753Notice that the address computation does depend on the degree $d$ 
1754but not on the number of directions $p$, which could theoretically be
1755enlarged without the need to reallocate the original tensor.
1756Also, the components of $j$ need to be non-increasing.
1758To some C programmers it may appear more natural to access tensor
1759entries by successive dereferencing in the form
1760{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1761We have also provided this mode, albeit with the restriction
1762that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1763In the second order case this means that the Hessian entries must be
1764specified in or below the diagonal. If this restriction is
1765violated the values are almost certain to be wrong and array bounds
1766may be violated. We emphasize that subscripting is not overloaded
1767but that {\sf tensorentry} is a conventional and
1768thus moderately efficient C pointer structure.
Such a pointer structure can be allocated and set up completely by the
function
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf void** tensorsetup(m,p,d,tensor)} \\
\>{\sf int m;}                 \> // number of dependent variables $m$ \\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
\end{tabbing}
1781Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1782$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1783 $d-1$ layers of pointers are set up so that the return value
1784allows the direct dereferencing of individual tensor elements.
1786For example, suppose some active section involving  $m \geq 5$ dependents and
1787$n \geq 2$ independents has been executed and taped. We may
1788select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1789columns $s_1$ and $s_2$. Then we are able to execute the code segment
\begin{tabbing}
\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}
\end{tabbing}
This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
in both directions $s_1$ and
$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing, the pointer
structure {\sf tensorentry} has been created. Now,
the value of the mixed partial
\[
 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2}
\]
can be recovered as
\begin{center}
   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][address(d,$\,$j)]},
\end{center}
where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1).
Analogously, the entry
\begin{center}
   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][address(d,$\,$j)]}
\end{center}
with {\sf j} = (1,0,0) contains the first derivative of the third dependent
variable $F_3$ with respect to the first differentiation parameter $z_1$.

Note that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly affects the values accessed
by {\sf tensorentry}.

When no more derivative evaluations are desired the pointer structure
{\sf tensorentry} can be deallocated by a call to the function
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
\>{\sf int m;}                    \> // number of dependent variables $m$ \\
\>{\sf int p;}                    \> // number of directions $p$\\
\>{\sf int d;}                    \> // highest derivative degree $d$\\
\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup}
\end{tabbing}
that does not deallocate the array {\sf tensor}.
1832The drivers provided for efficient calculation of higher order
1833derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1834which is included by the global header file \verb=<adolc/adolc.h>= automatically
1835(see \autoref{ssec:DesIH}).
1836Example codes using the above procedures can be found in the files
1837\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
\subsection{Derivatives of Implicit and Inverse Functions}

Frequently, one needs derivatives of variables
$y \in \R^{m}$ that are implicitly defined as
functions of some variables $x \in \R^{n-m}$
by an algebraic system of equations
\[
G(z) \; = \; 0 \in \R^m \quad
{\rm with} \quad z = (y, x) \in \R^n .
\]
Naturally, the $n$ arguments of $G$ need not be partitioned in
this regular fashion and we wish to provide flexibility for a
convenient selection of the $n-m$ {\em truly} independent
variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
that picks out these variables so that it is a column
permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
Then the nonlinear system
\[
  G(z) \; = \; 0, \quad P z =  x,
\]
has a regular Jacobian, wherever the implicit function theorem
yields $y$ as a function of $x$. Hence, we may also write
\[
F(z) = \left(\begin{array}{c}
                        G(z) \\
                        P z
                      \end{array} \right)\; \equiv \;
                \left(\begin{array}{c}
                        0 \\
                        P z
                      \end{array} \right)\; \equiv \; S\, x,
\]
where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
the original implicit functional relation between $x$ and $y$ as an inverse
relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
by marking $n-m$ of the independents also dependent.
1880Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1881seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1882with respect to $x \in \R^p$ by calling the following routine:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf double z[n];}          \> // values of independent variables $z$\\
\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
\>{\sf double S[n][p];}        \> // seed matrix $S$
\end{tabbing}
1896The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1897{\sf tag} pointing to a tape for the evaluation of the inverse function
1898$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1899to the domain of that function. Individual derivative components can be
1900accessed in tensor exactly as in the explicit case described above.
1902It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1903derivatives of $z$ with respect to $x$ that is defined by the equation
1904$F(z)=F(z_0)+S \, x$. In other words the base point at
1905which the inverse function is differentiated is given by $F(z_0)$.
1906The routine has no capability for inverting $F$ itself as
1907solving systems of nonlinear
1908equations $F(z)=0$ in the first place is not just a differentiation task.
1909However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1910useful for that purpose.
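
For first derivatives the statement above can be made explicit. Writing
$z(x)$ for the solution of $F(z)=F(z_0)+S\,x$ near $x=0$ and differentiating
both sides with respect to $x$ gives
\[
F'(z_0)\, \left. \frac{\partial z}{\partial x} \right |_{x=0} \; = \; S
\quad \Longrightarrow \quad
\left. \frac{\partial z}{\partial x} \right |_{x=0} \; = \; F'(z_0)^{-1} S
\; \in \; \R^{n \times p} ,
\]
which requires the Jacobian $F'(z_0)$ to be nonsingular, i.e., local
invertibility of $F$. Higher derivatives follow by differentiating this
identity repeatedly. This is only a sketch of the underlying relation and
not a description of how the routine is implemented.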

As an example consider the following two nonlinear expressions
\begin{eqnarray*}
      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^2 \\
      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
\end{eqnarray*}
The equations $G(z)=0$ describe the relation between the Cartesian
coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
Now, suppose we are interested in the derivatives of the second Cartesian
coordinate $y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
could look simply like
\begin{tabbing}
\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
\end{tabbing}
where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
In the last line the two independent variables {\sf z[1]} and
{\sf z[3]} are made
simultaneously dependent thus generating a square system that can be
inverted (at most arguments). The corresponding projection and seed
matrix are
\[
P \;=\; \left( \begin{array}{cccc}
               1 & 0 & 0 & 0 \\
               0 & 0 & 1 & 0
            \end{array}\right) \quad \mbox{and} \quad
S^T \; = \; \left( \begin{array}{cccc}
               0 & 0 & 1 & 0 \\
               0 & 0 & 0 & 1
            \end{array}\right) \enspace .
\]
1948Provided the vector {\sf zp} is consistent in that its Cartesian and polar
1949components describe the same point in the plane the resulting tuple
1950{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
1951$n=4$, $p=2$ and $d$
1952as desired will yield the implicit derivatives, provided
1953{\sf tensor} has been allocated appropriately of course and $S$ has the value
1954given above.
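
For this small example the first-order implicit derivatives can also be
worked out by hand, which gives a reference for checking the result.
Differentiating $G_1=0$ and $G_2=0$ with respect to $x_1=z_1$ and $x_2=z_3$
at a consistent point with $z_2 \neq 0$, $z_3 \neq 0$ and
$\sin(z_4) \neq 0$ yields
\[
\begin{array}{ll}
\displaystyle \frac{\partial z_2}{\partial z_1} \; = \; -\frac{z_1}{z_2} , &
\displaystyle \frac{\partial z_2}{\partial z_3} \; = \; \frac{z_3}{z_2} , \\[2ex]
\displaystyle \frac{\partial z_4}{\partial z_1} \; = \; -\frac{1}{z_3 \sin(z_4)} , &
\displaystyle \frac{\partial z_4}{\partial z_3} \; = \; \frac{z_1}{z_3^2 \sin(z_4)} .
\end{array}
\]
The first-order entries of {\sf tensor} belonging to $z_2$ and $z_4$ should
agree with these expressions up to rounding.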
1956The example is untypical in that the implicit function could also be
1957obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
1958the subset of $z$ components that are to be considered as truly
1959independent can be selected and altered with next to no effort at all.
1961The presented drivers are prototyped in the header file
1962\verb=<adolc/drivers/taylor.h>=. As indicated before this header
1963is included by the global header file \verb=<adolc/adolc.h>= automatically
1964(see \autoref{ssec:DesIH}).
1965The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
1966\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
1967show the application of the procedures described here.
1971\section{Basic Drivers for the Forward and Reverse Mode}
1974In this section, we present tailored drivers for different
1975variants of the forward mode and the reverse mode, respectively.
1976For a better understanding, we start with a short
1977description of the mathematical background.
1979Provided no arithmetic exception occurs,
1980no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
1981{\sf fabs} does not yield zero,
1982and all special functions were evaluated in the
1983interior of their domains, the functional relation between the input
1984variables $x$
1985and the output variables $y$ denoted by $y=F(x)$ is in
1986fact analytic.  In other words, we can compute arbitrarily high
1987derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
1988by the active section.
1989We find it most convenient to describe and
1990compute derivatives in terms of univariate Taylor expansions, which
1991are truncated after the highest derivative degree $d$ that is desired
1992by the user. Let
\begin{equation}
\label{eq:x_of_t}
x(t) \; \equiv \; \sum_{j=0}^d x_j t^j \; : \;  I\!\!R \; \mapsto \; I\!\!R^n
\end{equation}
1998denote any vector polynomial in the scalar variable $t \in I\!\!R$.
1999In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2000The Taylor coefficient vectors
\[ x_j \; = \;
\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
\right |_{t=0}
\]
are simply the scaled derivatives of $x(t)$ at the parameter
origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
visualized as tangent and curvature at the base point $x_0$,
respectively.
Provided that $F$ is $d$ times continuously differentiable, it
follows from the chain rule that the image path
\begin{equation}
 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
\end{equation}
is also smooth and has $(d+1)$ Taylor coefficient vectors
$y_j \in I\!\!R^m$ at $t=0$, so that
\begin{equation}
\label{eq:series}
y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
\end{equation}
2021Also as a consequence of the chain rule, one can observe that
2022each $y_j$ is uniquely and smoothly determined by the coefficient
2023vectors $x_i$ with $i \leq j$.  In particular we have
\begin{align}
  y_0 & = F(x_0) \nonumber \\
  y_1 & = F'(x_0) x_1 \nonumber\\
  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
  & \ldots\nonumber
\end{align}
2033In writing down the last equations we have already departed from the
2034usual matrix-vector notation. It is well known that the number of
2035terms that occur in these ``symbolic'' expressions for
2036the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2037the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
2038with $j$. Fortunately, this exponential growth does not occur
2039in automatic differentiation, where the many terms are somehow
2040implicitly combined  so that storage and operations count grow only
2041quadratically in the bound $d$ on $j$.
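The quadratic growth can be made plausible by a minimal sketch in plain
C++ that does not use ADOL-C. The type {\sf Taylor} and the function
{\sf taylor\_mul} are hypothetical names introduced only for this
illustration. The product of two truncated Taylor series is formed by the
Cauchy product rule, which requires $(d+1)(d+2)/2$ multiplications on
$d+1$ stored coefficients per operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Truncated Taylor series: entry j is the coefficient of t^j, j = 0..d.
// (Hypothetical type for illustration, not part of ADOL-C.)
using Taylor = std::vector<double>;

// Product of two truncated series of the same degree d (Cauchy product).
// The double loop performs (d+1)(d+2)/2 multiplications, i.e. O(d^2) work.
Taylor taylor_mul(const Taylor& a, const Taylor& b) {
    assert(a.size() == b.size());
    Taylor c(a.size(), 0.0);
    for (std::size_t j = 0; j < a.size(); ++j) {
        for (std::size_t i = 0; i <= j; ++i) {
            c[j] += a[i] * b[j - i];
        }
    }
    return c;
}
```

For $F(x)=x^2$ and the path $x(t)=3+t$, the product of the coefficient
vector $(3,1,0)$ with itself yields the coefficients $9$, $6$ and $1$, in
agreement with the formulas for $y_0$, $y_1$ and $y_2$ given above.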
Provided $F$ is analytic, this property is inherited by the functions
\[
y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
\]
and their derivatives satisfy the identities
\begin{equation}
\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
\end{equation}
2053as established in \cite{Chri91a}. This yields in particular
\begin{align*}
  \frac{\partial y_0}{\partial x_0} =
  \frac{\partial y_1}{\partial x_1} =
  \frac{\partial y_2}{\partial x_2} =
  \frac{\partial y_3}{\partial x_3} =
  A_0 & = F'(x_0) \\
  \frac{\partial y_1}{\partial x_0} =
  \frac{\partial y_2}{\partial x_1} =
  \frac{\partial y_3}{\partial x_2} =
  A_1 & = F''(x_0) x_1 \\
  \frac{\partial y_2}{\partial x_0} =
  \frac{\partial y_3}{\partial x_1} =
  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
  \frac{\partial y_3}{\partial x_0} =
  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
  & \ldots
\end{align*}
2072The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2073coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2074interest primarily in the context of ordinary differential
2075equations and differential algebraic equations.
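As a simple check of these identities, consider the scalar function
$F(x)=x^3$ with $n=m=1$. Expanding $(x_0+x_1t+x_2t^2)^3$ gives
\[
y_2 \;=\; 3x_0^2x_2 + 3x_0x_1^2 ,
\]
and hence
\[
\frac{\partial y_2}{\partial x_2} \;=\; 3x_0^2 \;=\; A_0, \qquad
\frac{\partial y_2}{\partial x_1} \;=\; 6x_0x_1 \;=\; A_1, \qquad
\frac{\partial y_2}{\partial x_0} \;=\; 6x_0x_2 + 3x_1^2 \;=\; A_2 .
\]
The same three quantities are the coefficients of $t^0$, $t^1$ and $t^2$
in $F'(x(t)) = 3\,(x_0+x_1t+x_2t^2)^2$.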
2077Given the tape of an active section and the coefficients $x_j$,
2078the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2079by appropriate calls to the ADOL-C forward mode implementations and
2080the ADOL-C reverse mode implementations. The scalar versions of the forward
2081mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2082to the $(y_j)_{j\leq d}$. The vector versions of the forward
2083mode propagate families of $p\geq 1$ such truncated Taylor series
2084in order to reduce the relative cost of the overhead incurred
2085in the tape interpretation. In detail, ADOL-C provides
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int keep;}              \> // flag for reverse mode preparation\\
\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
\end{tabbing}
2096for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2097$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2098flag {\sf keep} plays a similar role as in the call to 
2099{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2100the first Taylor coefficients of all intermediate quantities into a buffered
2101temporary file, i.e., the value stack, in preparation for a subsequent
reverse mode evaluation. The value {\sf keep} $=1$
prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as explained below.
2105To compute first-order derivatives, one has
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int keep;}              \> // flag for reverse mode preparation\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
\end{tabbing}
for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
$0\leq\text{\sf keep}\leq 2$, where
\[
\text{\sf keep} = \left\{\begin{array}{cl}
       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
       \end{array}\right.
\]
as explained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
ADOL-C provides
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int p;}                 \> // number of directions\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x_0)X$
\end{tabbing}
For the computation of higher derivatives, ADOL-C provides the driver
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int keep;}              \> // flag for reverse mode preparation\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
\end{tabbing}
2154implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2155The rows of the matrix $X$ must correspond to the independent variables in the order of their
2156initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2157$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2158\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2159dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2160The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2161Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2162computes the values
2163$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2164$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2165$0\leq\text{\sf keep}\leq d+1$, with
\[
\text{\sf keep}  \left\{\begin{array}{cl}
       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
       \end{array}\right.
\]
2172Once more, there is also a vector version given by
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
\end{tabbing}
for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and
$Y=[Y_1,Y_2,\ldots,Y_d]$.
2190There are also overloaded versions providing a general {\sf forward}-call.
2191Details of the appropriate calling sequences are given in \autoref{forw_rev}.
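What the first-order scalar forward mode computes can be illustrated
without ADOL-C by a minimal sketch in plain C++. The type {\sf Dual} and
the functions {\sf add}, {\sf mul}, {\sf sine} and
{\sf first\_order\_forward} are hypothetical names chosen for this
illustration only. They propagate the pair of value and first Taylor
coefficient through the example function $F(a,b)=\sin(ab)+a$, so that the
result holds $y_0=F(x_0)$ and $y_1=F'(x_0)x_1$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Value v0 and first Taylor coefficient v1 of one scalar quantity.
// (Hypothetical type for illustration, not part of ADOL-C.)
struct Dual {
    double v0;
    double v1;
};

// Elementary operations propagate both components by the chain rule.
Dual add(Dual a, Dual b) { return {a.v0 + b.v0, a.v1 + b.v1}; }
Dual mul(Dual a, Dual b) { return {a.v0 * b.v0, a.v1 * b.v0 + a.v0 * b.v1}; }
Dual sine(Dual a) { return {std::sin(a.v0), std::cos(a.v0) * a.v1}; }

// F(a,b) = sin(a*b) + a with n = 2 and m = 1.
// x0 is the base point and x1 the tangent; the result holds y0 and y1.
Dual first_order_forward(const std::array<double, 2>& x0,
                         const std::array<double, 2>& x1) {
    const Dual a{x0[0], x1[0]};
    const Dual b{x0[1], x1[1]};
    return add(sine(mul(a, b)), a);
}
```

Choosing the tangent $x_1$ as the first or the second Cartesian basis
vector yields the two partial derivatives of $F$ one at a time, which is
the situation that the vector mode handles in a single sweep.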
Once the required information has been generated by a forward mode evaluation
with an appropriate value of the parameter {\sf keep}, one may use the
following implementation variants of the reverse mode. To compute first-order derivatives
one can use
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double u[m];}           \> // weight vector $u$\\
\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
\end{tabbing}
2206as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2207the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2208{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2209  o}rder {\bf v}ector reverse mode driver is given by
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
\end{tabbing}
2220that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2221{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2222ADOL-C provides
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf double u[m];}           \> // weight vector $u$\\
\>{\sf double Z[n][d+1];}      \> // resulting adjoints
\end{tabbing}
2233as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2234the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2235\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2236{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
\end{tabbing}
2248as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2249the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2250\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2251{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2252After the function call, the last argument of {\sf hov\_reverse} 
2253contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2254has a value that characterizes the functional relation between the
2255$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2256$x_j$ as:
\begin{center}
\begin{tabular}{cl}
 0 & trivial \\
 1 & linear
\end{tabular} \hspace*{4ex}
\begin{tabular}{cl}
 2 & polynomial\\
 3 & rational
\end{tabular} \hspace*{4ex}
\begin{tabular}{cl}
 4 & transcendental\\
 5 & non-smooth
\end{tabular}
\end{center}
2271Here, ``trivial'' means that there is no dependence at all and ``linear'' means
that the partial derivative is a constant that
does not depend on other variables either. ``Non-smooth'' means that one of
2274the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2275where it is not differentiable.  All positive labels
2276$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2277in fact be simpler, for example due to exact cancellations. 
2279There are also overloaded versions providing a general {\sf reverse}-call.
2280Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
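The interplay between the flag {\sf keep} and the reverse drivers
described in this section can be sketched in plain C++ without ADOL-C.
All names below ({\sf ValueStack}, {\sf forward\_sweep},
{\sf reverse\_sweep}, {\sf gradient}) are hypothetical and serve only as
an illustration for the example function $F(a,b)=\sin(ab)+a$. The forward
sweep saves the values that the reverse sweep needs, in the spirit of
{\sf keep} $=1$, and the reverse sweep then forms $z^T=u^TF'(x)$ from the
saved values alone.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the value stack written by a forward sweep.
using ValueStack = std::vector<double>;

// Forward sweep for F(a,b) = sin(a*b) + a. For keep != 0 the values needed
// by the reverse sweep are pushed onto the value stack.
double forward_sweep(double a, double b, int keep, ValueStack& stack) {
    const double v = a * b;
    if (keep != 0) {
        stack.push_back(a);
        stack.push_back(b);
        stack.push_back(v);
    }
    return std::sin(v) + a;
}

// Reverse sweep returning z = u * F'(a,b). It reads only the value stack.
std::array<double, 2> reverse_sweep(double u, ValueStack& stack) {
    assert(stack.size() >= 3);  // violated if the forward sweep used keep = 0
    const double v = stack.back(); stack.pop_back();
    const double b = stack.back(); stack.pop_back();
    const double a = stack.back(); stack.pop_back();
    const double vbar = u * std::cos(v);  // adjoint of v = a*b
    return {{vbar * b + u, vbar * a}};    // adjoints of a and b
}

// Gradient of F from one forward sweep with keep = 1 and one reverse sweep.
std::array<double, 2> gradient(double a, double b) {
    ValueStack stack;
    forward_sweep(a, b, 1, stack);
    return reverse_sweep(1.0, stack);
}
```

Calling {\sf reverse\_sweep} after a forward sweep with {\sf keep} $=0$
trips the assertion, which mirrors the requirement that a reverse driver
must be preceded by a forward evaluation with a suitable {\sf keep}.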
2283\section{Overloaded Forward and Reverse Calls}
2286In this section, the several versions of the {\sf forward} and
2287{\sf reverse} routines, which utilize the overloading capabilities
of C++, are described in detail. With the exception of the bit pattern
versions all interfaces are prototyped in the header file
\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
C and Fortran-callable versions prototyped in the same header file.
The bit pattern versions of {\sf forward} and {\sf reverse} introduced
in \autoref{ProBit} are prototyped in the header file
2295\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2296file \verb=<adolc/interfaces.h>= automatically.
2299\subsection{The Scalar Case}
Given any correct tape, one may call from within
the generating program, or subsequently during another run, the following
procedure:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int keep;}              \> // flag for reverse sweep \\
\>{\sf double X[n][d+1];}      \> // Taylor coefficients $X$ of
                                     independent variables \\
\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
                                     in \eqref{eq:series}
\end{tabbing}
2321The rows of the matrix $X$ must correspond to the independent variables in the order of their
2322initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2323$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2324\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2325correspond to the
2326dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2327The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2328Taylor coefficient vectors as in \eqref{eq:series}.
2329Thus the first column of $Y$ contains the
2330function value $F(x)$ itself, the next column represents the first
2331Taylor coefficient vector of $F$, and the last column the
2332$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2333how many Taylor coefficients of all intermediate quantities are
2334written into the value stack as explained in \autoref{forw_rev_ad}.
2335 If {\sf keep} is omitted, it defaults to 0.
2337The given {\sf tag} value is used by {\sf forward} to determine the
2338name of the file on which the tape was written. If the tape file does
2339not exist, {\sf forward} assumes that the relevant
2340tape is still in core and reads from the buffers.
2341After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2342{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2343the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
identifier {\sf tag}. When $u$ is a vector
and $Z$ an $n\times (d+1)$ matrix,
{\sf reverse} is executed in the scalar mode by the calling sequence
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int reverse(tag,m,n,d,u,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf double u[m];}           \> // weighting vector $u$\\
\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$
\end{tabbing}
2359to compute
2360the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2361\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
2364\subsection{The Vector Case}
When $U$ is a matrix, {\sf reverse} is executed in the vector mode by the following calling sequence
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
\end{tabbing}
2383to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2384\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
When the arguments {\sf q} and {\sf U} are omitted, they default to
$m$ and the identity matrix of order $m$, respectively.
2388Through the optional argument {\sf nz} of {\sf reverse} one can compute
2389information about the sparsity pattern of $Z$ as described in detail
2390in the previous \autoref{forw_rev_ad}.
2392The return values of {\sf reverse} calls can be interpreted according
2393to \autoref{retvalues}, but negative return values are not
2394valid, since the corresponding forward sweep would have
stopped without completing the necessary Taylor file.
2396The return value of {\sf reverse} may be higher
2397than that of the preceding {\sf forward} call because some operations
2398that were evaluated  at a critical argument during the forward sweep
2399were found not to impact the dependents during the reverse sweep.
2401In both scalar and vector mode, the degree $d$ must agree with
2402{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2403equal to zero if {\sf reverse} directly follows the taping of an active
section. Otherwise, {\sf reverse} will return control with a suitable error
message.
2406In order to avoid possible confusion, the first four arguments must always be
2407present in the calling sequence. However, if $m$ or $d$
2408attain their trivial values 1 and 0, respectively, then
2409corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2410{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2411indirection.  For example, we may call
2412{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2413{\sf double g[n]} 
2414to calculate a gradient of a scalar-valued function.
2416Sometimes it may be useful to perform a forward sweep for families of
2417Taylor series with the same leading term.
2418This vector version of {\sf forward} can be called in the form
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of Taylor series $p$\\
\>{\sf double x0[n];}          \> // values of independent variables $x_0$\\
\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
\>{\sf double y0[m];}          \> // values of dependent variables $y_0$\\
\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
\end{tabbing}
2434where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2435and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2436degree 0. There is no option to keep the values of active variables
2437that are going out of scope or that are overwritten. Therefore this
2438function cannot prepare a subsequent reverse sweep.
2439The return integer serves as a flag to indicate quadratures or altered
2440comparisons as described above in \autoref{reuse_tape}.
2442Since the calculation of Jacobians is probably the most important
2443automatic differentia\-tion task, we have provided a specialization
2444of vector {\sf forward} to the case where $d = 1$. This version can be
2445called in the form
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
\end{tabbing}
2460When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2461the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2462one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2463chosen initialization of $X$. In a workstation environment a value
2464of $p$ somewhere between $10$ and $50$
2465appears to be fairly optimal. For smaller $p$ the interpretive
2466overhead is not appropriately amortized, and for larger $p$ the
2467$p$-fold increase in storage causes too many page faults. Therefore,
2468large Jacobians that cannot be compressed via column coloring
2469as could be done for example using the driver {\sf sparse\_jac}
2470should be ``strip-mined'' in the sense that the above
2471first-order-vector version of {\sf forward} is called
2472repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2473a partition of the identity matrix of order $n$.
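The strip-mining strategy may be sketched as follows in plain C++. The
sketch assumes a user-supplied callable, here given the hypothetical name
{\sf VectorForward}, that returns $Y=F'(x_0)X$ for a given seed matrix
$X$; in an ADOL-C program this role would be played by the
first-order-vector version of {\sf forward}. The names
{\sf strip\_mined\_jacobian}, {\sf example\_jacobian} and
{\sf example\_sweep} are likewise chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Stand-in for a first-order vector forward sweep: for an n x p seed
// matrix X it must return the m x p matrix Y = F'(x0) X.
using VectorForward = std::function<Matrix(const Matrix&)>;

// Assemble the m x n Jacobian from sweeps with at most p directions each.
Matrix strip_mined_jacobian(std::size_t m, std::size_t n, std::size_t p,
                            const VectorForward& sweep) {
    assert(p > 0);
    Matrix J(m, std::vector<double>(n, 0.0));
    for (std::size_t first = 0; first < n; first += p) {
        const std::size_t width = std::min(p, n - first);
        // Seed: columns first .. first+width-1 of the identity of order n.
        Matrix X(n, std::vector<double>(width, 0.0));
        for (std::size_t k = 0; k < width; ++k) X[first + k][k] = 1.0;
        const Matrix Y = sweep(X);
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t k = 0; k < width; ++k)
                J[i][first + k] = Y[i][k];
    }
    return J;
}

// Example stand-in: F(x) = A x with a fixed 2 x 3 matrix A, so F'(x0) = A.
Matrix example_jacobian() { return {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}}; }

Matrix example_sweep(const Matrix& X) {
    const Matrix A = example_jacobian();
    Matrix Y(A.size(), std::vector<double>(X[0].size(), 0.0));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < X.size(); ++j)
            for (std::size_t k = 0; k < X[j].size(); ++k)
                Y[i][k] += A[i][j] * X[j][k];
    return Y;
}
```

Only $p$ directions are held in memory at any time, at the price of
$\lceil n/p \rceil$ sweeps instead of one.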
2476\subsection{Dependence Analysis}
2480The sparsity pattern of Jacobians is often needed to set up data structures
2481for their storage and factorization or to allow their economical evaluation
by compression \cite{BeKh96}. Compared to the evaluation of the full
Jacobian $F'(x_0)$ in real arithmetic, computing the Boolean matrix
2484$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2485pattern in the obvious way requires a little less run-time and
2486certainly a lot less memory.
2488The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2489of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2490dependence between the $i$-th independent variable $x_{i}$ and
2491the $j$-th dependent variable $y_{j}$. Just like for real arguments
2492one would wish to compute matrix-vector and vector-matrix products
2493of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2494by appropriate {\sf forward} and {\sf reverse} routines where
2495$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
Here, multiplication corresponds to logical
{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
semi-ring.
2500For practical reasons it is assumed that
2501$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2502$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2503$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2504as bit patterns. Moreover $p$ or $q$ such integer vectors may
2505be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2506and $U\in\N^{q \times m}$, which naturally correspond
2507to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
bit pattern versions of {\sf forward} and {\sf reverse} allow
one to compute integer matrices $Y\in\N^{m \times p}$ and
$Z\in\N^{q \times n}$ corresponding to
\begin{equation}
\label{eq:int_forrev}
\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad
\tilde{Z} = \tilde{U}\tilde{P} \, ,
\end{equation}
respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
and $\tilde{Z}\in\{0,1\}^{\left(sq\right)\times n}$.
2519In general, the application of the bit pattern versions of
2520{\sf forward} or {\sf reverse} can be interpreted as
2521propagating dependences between variables forward or backward, therefore
2522both the propagated integer matrices and the corresponding
2523Boolean matrices are called {\em dependence structures}.
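The meaning of the product $\tilde{Y}=\tilde{P}\tilde{X}$ on packed bit
patterns can be sketched in plain C++ as follows. In this sketch the
pattern $\tilde{P}$ is assumed to be known explicitly, whereas ADOL-C
arrives at the result by propagating the bit patterns through the tape.
The names {\sf Word}, {\sf BitMatrix}, {\sf Pattern} and
{\sf propagate\_forward} are hypothetical and used for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Word = unsigned long int;
using BitMatrix = std::vector<std::vector<Word>>;  // rows of p packed words
using Pattern = std::vector<std::vector<bool>>;    // Boolean m x n matrix

// Computes Y = P X over the Boolean semi-ring with packed columns:
// row j of Y is the bitwise OR of those rows i of X for which P[j][i] holds.
BitMatrix propagate_forward(const Pattern& P, const BitMatrix& X) {
    const std::size_t p = X.empty() ? 0 : X[0].size();
    BitMatrix Y(P.size(), std::vector<Word>(p, 0UL));
    for (std::size_t j = 0; j < P.size(); ++j) {
        assert(P[j].size() == X.size());
        for (std::size_t i = 0; i < X.size(); ++i) {
            if (!P[j][i]) continue;      // AND with a zero entry
            for (std::size_t k = 0; k < p; ++k)
                Y[j][k] |= X[i][k];      // OR accumulates dependences
        }
    }
    return Y;
}
```

A single bitwise OR thus handles $s$ Boolean columns at once, which is
the reason for packing the Boolean vectors into integers.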
2525The bit pattern {\sf forward} routine
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
\>{\sf short int tag;}              \> // tape identification \\
\>{\sf int m;}                      \> // number of dependent variables $m$\\
\>{\sf int n;}                      \> // number of independent variables $n$\\
\>{\sf int p;}                      \> // number of integers propagated $p$\\
\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
                                     \eqref{eq:int_forrev}\\
\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
\end{tabbing}
2542can be used to obtain the dependence structure $Y$ for a given dependence structure
2543$X$. The dependence structures are
2544represented as arrays of {\sf unsigned long int} the entries of which are
2545interpreted as bit patterns as described above.   
2546For example, for $n=3$ the identity matrix $I_3$ should be passed
2547with $p=1$ as the $3 \times 1$ array
\[
{\sf X} \; = \;
\left( \begin{array}{r}
         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
       \end{array} \right)
\]
2556in the 4-byte long integer format. The parameter {\sf mode} determines
2557the mode of dependence analysis as explained already in \autoref{sparse}.
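Setting up such an identity seed for general $n$ may be sketched in plain
C++ as follows. The layout, with column $i$ stored in word $i/s$ and
counted from the most significant bit, is an assumption taken from the
example above, and the names {\sf Word}, {\sf kBitsPerWord},
{\sf words\_needed} and {\sf identity\_seed} are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

using Word = unsigned long int;

// The quantity s of the text: number of bits in one unsigned long int.
const std::size_t kBitsPerWord = CHAR_BIT * sizeof(Word);

// Smallest number p of words per row that holds n Boolean columns.
std::size_t words_needed(std::size_t n) {
    return (n + kBitsPerWord - 1) / kBitsPerWord;
}

// Seed X with n rows and p words per row encoding the identity of order n.
// Row i has exactly one bit set: bit (i mod s) of word i/s, counted from
// the most significant bit. Unused trailing bits remain zero.
std::vector<std::vector<Word>> identity_seed(std::size_t n) {
    const std::size_t p = words_needed(n);
    std::vector<std::vector<Word>> X(n, std::vector<Word>(p, 0UL));
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t bit = i % kBitsPerWord;
        X[i][i / kBitsPerWord] = Word(1) << (kBitsPerWord - 1 - bit);
    }
    return X;
}
```

For $n=3$ this reproduces the three rows shown above, with the word
length given by the platform rather than fixed to 4 bytes.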
2559A call to the corresponding bit pattern {\sf reverse} routine
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int q;}                 \> // number of integers propagated $q$\\
\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
                                     to \eqref{eq:int_forrev}\\
\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
\end{tabbing}
yields the dependence structure $Z$ for a given dependence structure
$U$.
2577To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2578as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2579with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2580corresponding dependence structure $X$ or $U$ must be defined to represent 
2581the identity matrix of the respective dimension.
Because a multiple of $s$ Boolean vectors is always propagated,
there may be superfluous vectors, which can be set to zero.
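As an illustration, on a platform where {\sf unsigned long int} occupies
8 bytes one has $s=64$. For $n=100$ independent variables the
{\sf forward} variant then requires
\[
p \;=\; \left\lceil \frac{n}{s} \right\rceil
  \;=\; \left\lceil \frac{100}{64} \right\rceil \;=\; 2
\]
integers per row of $X$, which provide $sp=128$ Boolean columns. The
first $100$ of them carry the identity matrix, and the remaining
$sp-n=28$ columns are superfluous and are set to zero.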
2585The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2586correspond to those described in \autoref{retvalues}.
2588One can control the storage growth by the factor $p$ using
2589``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
2590groups of columns or respectively rows at a time, i.e.~partitioning
2591$X$ or $U$ appropriately as described for the computation of Jacobians
2592in \autoref{vecCas}.
\section{Advanced algorithmic differentiation in ADOL-C}
2599\subsection{External differentiated functions}
2601Ideally, AD is applied to a given function as a whole.
2602In practice, however, sophisticated projects usually evolve over a long period of time.
2603Within this process, a heterogeneous code base for the project
2604develops, which may include the incorporation of external solutions,
2605changes in programming paradigms or even of programming languages.
The computation of derivative values then becomes equally heterogeneous.
Hence, different \mbox{AD-tools} may be combined with hand-derived
codes based on the same or different programming languages.
ADOL-C supports such settings by the concept of external
differentiated functions. An external differentiated function
itself is not differentiated by ADOL-C. Instead, the required derivative
information has to be provided by the user.
2614For this purpose, it is assumed that the external differentiated
2615function has the signature
2619\hspace*{2cm}{\sf int ext\_func(int n, double *yin, int m, double  *yout);}
2623where the function names can be chosen by the user as long as the names are
2624unique. This {\sf double} version of the external differentiated function has to
2625be {\em registered} using the \mbox{ADOL-C} function
2629\hspace*{2cm}{\sf edf = reg\_ext\_fct(ext\_func);}.
2633This function initializes the structure {\sf edf}. Then,
2634the user has to provide the remaining  information
2635by the following commands:
\begin{tabbing}
\hspace*{2cm}\= {\sf edf-$>$zos\_forward = zos\_for\_ext\_func;}\\
             \> {\sf // function pointer for computing
               Zero-Order-Scalar (=zos)}\\
             \> {\sf // forward information}\\
             \> {\sf edf-$>$dp\_x = xp;}\\
             \> {\sf edf-$>$dp\_y = yp;}\\
             \> {\sf // double arrays for arguments and results}\\
             \> {\sf edf-$>$fos\_reverse = fos\_rev\_ext\_func;} \\
             \> {\sf // function pointer for computing
               First-Order-Scalar (=fos)}\\
             \> {\sf // reverse information}
\end{tabbing}
2649Subsequently, the call to the external differentiated function  in the function evaluation can be
2650substituted by the call of
2654\hspace*{2cm}{\sf int call\_ext\_fct(edf, n, xp, x, m, yp, y);}
2658The usage of the external function facility is illustrated by the
2659example \verb=ext_diff_func= contained in
Here, the external differentiated function is itself C code, but
handling it as an external differentiated function still decreases the
overall required tape size.
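
To make the division of labour concrete, the following self-contained C++
sketch pairs a function having the signature of {\sf ext\_func} with a
hand-derived first-order reverse sweep. It does not use ADOL-C at all. The
test function, the name {\sf fos\_rev\_ext\_func\_sketch} and in particular its
parameter list are assumptions chosen for illustration; the parameter list
that ADOL-C actually expects for {\sf fos\_reverse} is defined by the ADOL-C
headers and may differ.

```cpp
#include <cassert>
#include <cmath>

// External function with the signature given in the text:
// yout[0] = sum_i yin[i] * sin(yin[i]), i.e. n inputs and m = 1 output.
int ext_func(int n, double *yin, int m, double *yout) {
    assert(m == 1);
    yout[0] = 0.0;
    for (int i = 0; i < n; ++i)
        yout[0] += yin[i] * std::sin(yin[i]);
    return 0;
}

// Hand-derived reverse sweep z = u^T F'(x) for the function above.
// ASSUMPTION: this parameter list is illustrative only and is not taken
// from the ADOL-C headers.
int fos_rev_ext_func_sketch(int m, const double *u, int n, double *z,
                            const double *x) {
    assert(m == 1);
    for (int i = 0; i < n; ++i)
        z[i] = u[0] * (std::sin(x[i]) + x[i] * std::cos(x[i]));
    return 0;
}
```

A finite-difference comparison of {\sf ext\_func} with the result of the
reverse sweep for $u=1$ is a cheap consistency check before such a pair is
registered.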
\subsection{Advanced algorithmic differentiation of time integration processes}
2668For many time-dependent applications, the corresponding simulations
2669are based on ordinary or partial differential equations.
Furthermore, there are frequently quantities that influence the
result of the simulation and can be seen as controls of the system.
To compute an approximation of the
simulated process for a time interval $[0,T]$ and evaluate the
desired target function, one applies an
appropriate integration scheme given by
2677\hspace{5mm} \= some initializations yielding $x_0$\\
2678\> for $i=0,\ldots, N-1$\\
2679\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
2680\hspace{5mm} \= evaluation of the target function
where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \rightarrow {\bf R}^n$
defines the time step that computes the state at time $t_{i+1}$. Note that we
do not assume a uniform grid.
2688When computing derivatives of the target function with respect to the
2689control, the consequences for the tape generation using the ``basic''
2690taping approach as implemented in ADOL-C so far are shown in the left part of
2695\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
2697\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
2698\caption{Different taping approaches}
2701As can be seen, the iterative process is completely
2702unrolled due to the taping process. That is, the tape contains an internal representation of each
2703time step. Hence, the overall tape comprises a serious amount of redundant
2704information as illustrated by the light grey rectangles in
2707To overcome the repeated storage of essentially the same information,
2708a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
2709the right-hand side of \autoref{fig:bas_tap}. This new
2710capability allows the encapsulation of the time-stepping procedure
such that only the last time step $x_{N} = F(x_{N-1},u_{N-1},t_{N-1})$ is taped as one
2712representative of the time steps in addition to a function pointer to the
2713evaluation procedure $F$ of the time steps.  The function pointer has
2714to be stored for a possibly necessary retaping during the derivative calculation
2715as explained below.
2717Instead of storing the complete tape, only a very limited number of intermediate
2718states are kept in memory. They serve as checkpoints, such that
2719the required information for the backward integration is generated
2720piecewise during the adjoint calculation.
2721For this modified adjoint computation the optimal checkpointing schedules
provided by {\sf revolve} are employed. An adapted version of the
2723software package {\sf revolve} is part of ADOL-C and automatically
2724integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
2725distributed such that computational effort is minimized for the given
2726number of checkpoints and time steps $N$. It is important to note that the overall tape
2727size is drastically reduced due to the advanced taping strategy.  For the
2728implementation of this nested taping we introduced
2729a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
2730handle different internal function representations during the taping
2731procedure and the derivative calculation. This approach allows the generation of a new
2732tape inside the overall tape, where the coupling of the different tapes is based on
2733the {\em external differentiated function} described above.
2735Written under the objective of minimal user effort, the checkpointing routines
2736of \mbox{ADOL-C} need only very limited information. The user must
2737provide two routines as implementation of the time-stepping function $F$ 
2738with the signatures
2742\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
2743\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
where the function names can be chosen by the user as long as the names are
unique. The result vector of one time step
iteration may overwrite the argument vector of the same time step. Then, no
copy operations are required to prepare the next time step.
First, the {\sf adouble} version of the time step function has to
2753be {\em registered} using the \mbox{ADOL-C} function
2757\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
2761This function initializes the structure {\sf cpc}. Then,
2762the user has to provide the remaining checkpointing information
2763by the following commands:
\begin{tabbing}
\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
             \> {\sf // double variant of the time step function}\\
             \> {\sf cpc.setNumberOfSteps(N);}\\
             \> {\sf // number of time steps to perform}\\
             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
             \> {\sf // number of checkpoints} \\
             \> {\sf cpc.setDimensionXY(n);}\\
             \> {\sf // dimension of input/output}\\
             \> {\sf cpc.setInput(y);}\\
             \> {\sf // input vector} \\
             \> {\sf cpc.setOutput(y);}\\
             \> {\sf // output vector }\\
             \> {\sf cpc.setTapeNumber(tag\_check);}\\
             \> {\sf // subtape number for checkpointing} \\
             \> {\sf cpc.setAlwaysRetaping(false);}\\
             \> {\sf // always retape or not ?}
\end{tabbing}
2782Subsequently, the time loop in the function evaluation can be
2783substituted by a call of the function
2787\hspace*{2cm}{\sf int cpc.checkpointing();}
2791Then, ADOL-C computes derivative information using the optimal checkpointing
2792strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
2794The presented driver is prototyped in the header file
2795\verb=<adolc/checkpointing.h>=. This header
2796is included by the global header file \verb=<adolc/adolc.h>= automatically.
2797An example program \verb=checkpointing.cpp= illustrates the
2798checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
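
The principle behind checkpointed adjoints can be sketched without ADOL-C.
The following C++ example uses an in-place time step with the {\sf double}
signature given above, stores only every {\sf stride}-th state during the
forward sweep, and recomputes the missing states piecewise during the
reverse sweep. Several simplifications are assumptions of this sketch: the
time step $x_i \leftarrow x_i + h\sin(x_i)$ and the step size are invented
for illustration, the adjoint of the time step is written by hand instead
of being obtained from a tape, and the checkpoints are spaced uniformly,
which is simpler than but not equivalent to the optimal schedules of
{\sf revolve}.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double STEP_SIZE = 0.1;  // hypothetical step size h

// In-place time step x <- F(x) with F(x)_i = x_i + h * sin(x_i).
int time_step_function(int n, double *u) {
    for (int i = 0; i < n; ++i) u[i] += STEP_SIZE * std::sin(u[i]);
    return 0;
}

// Hand-written adjoint of one step: xbar <- F'(x)^T xbar,
// where x is the state BEFORE the step was taken.
void time_step_adjoint(int n, const double *x, double *xbar) {
    for (int i = 0; i < n; ++i) xbar[i] *= 1.0 + STEP_SIZE * std::cos(x[i]);
}

// Gradient of J = sum_i x_N[i] with respect to x_0. Only every
// 'stride'-th state is kept as a checkpoint; the states in between are
// recomputed block by block during the reverse sweep.
std::vector<double> gradient_checkpointed(std::vector<double> x, int steps, int stride) {
    assert(steps >= 0 && stride > 0);
    const int n = static_cast<int>(x.size());
    std::vector<std::vector<double> > checkpoints;
    for (int i = 0; i < steps; ++i) {  // forward sweep
        if (i % stride == 0) checkpoints.push_back(x);
        time_step_function(n, x.data());
    }
    std::vector<double> xbar(n, 1.0);  // dJ/dx_N
    for (int c = static_cast<int>(checkpoints.size()) - 1; c >= 0; --c) {
        const int first = c * stride;
        const int last = std::min(first + stride, steps);
        // Recompute x_first, ..., x_{last-1} starting from checkpoint c.
        std::vector<std::vector<double> > states;
        std::vector<double> s = checkpoints[c];
        for (int i = first; i < last; ++i) {
            states.push_back(s);
            time_step_function(n, s.data());
        }
        // Reverse the block using the recomputed states.
        for (int i = last - 1; i >= first; --i)
            time_step_adjoint(n, states[i - first].data(), xbar.data());
    }
    return xbar;
}
```

With {\sf stride} equal to the number of steps the sketch degenerates to
storing every state, while smaller values trade memory for one additional
forward recomputation per block.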
\subsection{Advanced algorithmic differentiation of fixed point iterations}
2804Quite often, the state of the considered system denoted by $x\in\R^n$
depends on some design parameters denoted by $u\in\R^m$. One example of this setting
is the flow over an aircraft wing. Here, the shape of the wing that
2807is defined by the design vector $u$ 
2808determines the flow field $x$. The desired quasi-steady state $x_*$
2809fulfills the fixed point equation
\begin{equation}
  \label{eq:fixedpoint}
  x_* = F(x_*,u)
\end{equation}
2814for a given continuously differentiable function
2815$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
2816also exploited by many other applications.
Assume that one can apply the iteration
\begin{equation}
 x_{k+1} = F(x_k,u)
\end{equation}
to obtain a linearly converging sequence $\{x_k\}$ generated
for any given control $u\in\R^m$. Then the limit point $x_*\in\R^n$ fulfills the fixed
point equation~\eqref{eq:fixedpoint}. Moreover,
suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
2828Hence, there exists a
2829differentiable function $\phi:\R^m \rightarrow \R^n$,
2830such that $\phi(u) = F(\phi(u),u)$, where the state
2831$\phi(u)$ is a fixed point of $F$ according to a control
2832$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
2833the design vector $u$, derivatives of $\phi$ with respect
2834to $u$ are of particular interest.
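
The structure of these derivatives can be made explicit by differentiating
the identity $\phi(u) = F(\phi(u),u)$ with respect to $u$. As a shorthand
introduced only for this sketch, let $F_x = \frac{dF}{dx}(x_*,u)$ and
$F_u = \frac{dF}{du}(x_*,u)$ with $x_*=\phi(u)$. Then
\[
  \phi'(u) = F_x\,\phi'(u) + F_u
  \quad\Longrightarrow\quad
  \phi'(u) = (I - F_x)^{-1} F_u ,
\]
where $I - F_x$ is invertible because $\|F_x\|<1$. For a weight vector
$\bar x\in\R^n$ the adjoint $\bar u^T = \bar x^T \phi'(u)$ can therefore be
written as
\[
  \bar u^T = \zeta^T F_u
  \qquad\mbox{with}\qquad
  \zeta^T = \zeta^T F_x + \bar x^T .
\]
Hence $\zeta$ is itself characterized by a fixed point equation, and the
corresponding iteration converges since the spectral radius of $F_x$ is
smaller than one.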
To exploit the advanced algorithmic differentiation of such fixed point iterations,
ADOL-C provides the special function {\tt fp\_iteration(...)}.
It has the following interface:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int
  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
\>{\sf int *double\_F;}         \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\
\>{\sf int *adouble\_F;}        \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf adouble} arguments\\
\>{\sf int *norm;}              \> // pointer to a function that computes\\
\>                              \> // the norm of a vector in the state iteration\\
\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
\>                              \> // the norm of a vector in the adjoint iteration\\
\>{\sf double eps;}             \> // termination criterion for fixed point iteration\\
\>{\sf double eps\_deriv;}      \> // termination criterion for adjoint fixed point iteration\\
\>{\sf int N\_max;}             \> // maximal number of iterations for state computation\\
\>{\sf int N\_max\_deriv;}      \> // maximal number of iterations for adjoint computation\\
\>{\sf adouble *x\_0;}          \> // initial state of fixed point iteration\\
\>{\sf adouble *u;}             \> // value of $u$\\
\>{\sf adouble *x\_fix;}        \> // final state of fixed point iteration\\
\>{\sf int dim\_x;}             \> // dimension of $x$\\
\>{\sf int dim\_u;}             \> // dimension of $u$
\end{tabbing}
2865Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
2866should be used for the fixed point iteration.
{\tt double\_F} and {\tt adouble\_F} are pointers to functions that
compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Here,
2869{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
2870uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
2871{\tt norm\_deriv} are pointers to functions computing the norm
2872of a vector. The latter functions together with {\tt eps},
2873{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
2874the iterations. Thus the following loops are performed:
2877  do                     &   do                           \\
2878  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
2879  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
2880  ~~~~$y = F(x,u)$       &   ~~~
2881  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
2882  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
2883  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
2884  & and $k\leq N_{max,deriv}$
2887The vector for the initial iterate and the control is stored
2888in {\tt x\_0} and {\tt u} respectively. The vector in which the
2889fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
2890and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
2891corresponding vectors.
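
The two loops can be tried out on a scalar model problem without ADOL-C. In
the following C++ sketch the contraction $F(x,u)=\frac{1}{2}\cos(x)+u$, its
hand-coded partial derivatives and all function names are assumptions made
for illustration. In particular, {\tt fp\_iteration} obtains the product
$\zeta^T F'(x_*,u)$ from a tape rather than from hand-written derivatives.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical contraction with |dF/dx| <= 0.5 < 1.
double F(double x, double u) { return 0.5 * std::cos(x) + u; }
double F_x(double x)         { return -0.5 * std::sin(x); }  // dF/dx
double F_u()                 { return 1.0; }                 // dF/du

// State loop: iterate y = F(x,u) until |y - x| < eps or the
// iteration limit n_max is exceeded.
double fixed_point(double x0, double u, double eps, int n_max) {
    assert(eps > 0.0 && n_max > 0);
    double x, y = x0;
    int k = 0;
    do {
        ++k;
        x = y;
        y = F(x, u);
    } while (std::fabs(y - x) >= eps && k <= n_max);
    return y;
}

// Adjoint loop at the fixed point x_star for a given weight xbar:
// (xi, ubar) = zeta * F'(x_star, u) + (xbar, 0).
double adjoint_fixed_point(double x_star, double xbar, double eps_deriv,
                           int n_max_deriv) {
    assert(eps_deriv > 0.0 && n_max_deriv > 0);
    double zeta, xi = 0.0, ubar = 0.0;
    int k = 0;
    do {
        ++k;
        zeta = xi;
        xi   = zeta * F_x(x_star) + xbar;
        ubar = zeta * F_u();
    } while (std::fabs(xi - zeta) >= eps_deriv && k <= n_max_deriv);
    return ubar;
}
```

For $\bar x = 1$ the value returned by the adjoint loop approximates
$d\phi/du = F_u/(1-F_x)$, which can be compared with a difference quotient
of two fixed points computed for neighbouring values of $u$.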
2893The presented driver is prototyped in the header file
2894\verb=<adolc/fixpoint.h>=. This header
2895is included by the global header file \verb=<adolc/adolc.h>= automatically.
An example code that also shows the
expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
ADOL-C allows derivatives to be computed in parallel for functions
containing OpenMP parallel loops.
2903This implies that an explicit loop-handling approach is applied. A
2904typical situation is shown in \autoref{fig:basic_layout},
2906    \vspace{3ex}
2907    \begin{center}
2908        \includegraphics[height=4cm]{multiplexed} \\
2909        \begin{picture}(0,0)
2910            \put(-48,40){\vdots}
2911            \put(48,40){\vdots}
2912            \put(-48,80){\vdots}
2913            \put(48,80){\vdots}
2914            \put(-83,132){function eval.}
2915            \put(5,132){derivative calcul.}
2916        \end{picture}
2917    \end{center}
2918    \vspace{-5ex}
2919    \caption{Basic layout of mixed function and the corresponding derivation process}
2920    \label{fig:basic_layout}
2922where the OpenMP-parallel loop is preceded by a serial startup
2923calculation and followed by a serial finalization phase.
2925Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
2926Two macros are available that only differ in the way the global tape information is handled.
2927Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
2928For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used.
2929Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
2930Due to the inserted macro, the OpenMP statement has the following structure:
2932\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
2933              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
2935Inside the parallel region, separate tapes may then be created.
2936Each single thread works in its own dedicated AD-environment, and all
2937serial facilities of \mbox{ADOL-C} are applicable as usual. The global
2938derivatives can be computed using the tapes created in the serial and
2939parallel parts of the function evaluation, where user interaction is
2940required for the correct derivative concatenation of the various tapes.
2942For the usage of the parallel facilities, the \verb=configure=-command
2943has to be used with the option \verb?--with-openmp-flag=FLAG?, where
2944\verb=FLAG= stands for the system dependent OpenMP flag.
2945The parallel differentiation of a parallel program is illustrated
2946by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
2950\section{Tapeless forward differentiation in ADOL-C}
2953Up to version 1.9.0, the development of the ADOL-C software package
2954was based on the decision to store all data necessary for derivative
2955computation on tapes, where large applications require the tapes to be
written out to corresponding files. In almost all cases this causes
a considerable run-time penalty due to the excessive
memory accesses. The tapes enable ADOL-C to offer a wide range of
functionality. However, not every task of derivative
computation requires them.
2962Starting with version 1.10.0, ADOL-C now features a tapeless forward
2963mode for computing first order derivatives in scalar mode, i.e.,
2964$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
2965This tapeless variant coexists with the more universal
2966tape based mode in the package. The following subsections describe
2967the source code modifications required to use the tapeless forward mode of
2970\subsection{Modifying the Source Code}
2972Let us consider the coordinate transformation from Cartesian to spherical
2973polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
2974= F(x)$, with
\[
y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
y_3  =  \arctan(x_2/x_1),
\]
2980as an example. The corresponding source code is shown in \autoref{fig:tapeless}.
2986\= \kill
2987\> {\sf \#include} {\sf $<$iostream$>$}\\
2988\> {\sf using namespace std;}\\
2989\> \\
2990\> {\sf int main() \{}\\
2991\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
2992\> \\
2993\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
2994\> {\sf \rule{1cm}{0pt}...}\\
2995\> \\
2996\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
2997\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
2998\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
2999\> \\
3000\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
3001\> \\
3002\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3003\> \}
3008\caption{Example for tapeless forward mode}
3012Changes to the source code that are necessary for applying the
3013tapeless forward ADOL-C are described in the following two
3014subsections, where the vector mode version is described
3015as extension of the scalar mode.
3017\subsubsection*{The scalar mode}
To use the tapeless forward mode, one has to include one
of the header files \verb#adolc.h# or \verb#adouble.h#.
The latter should be preferred since it does not include the
tape-based functions defined in other header files. Hence, including
\verb#adouble.h# avoids mode mixtures, since
\verb#adolc.h# is just a wrapper for including all public
headers of the ADOL-C package and offers no functions of its own.
Since the two ADOL-C forward mode variants, tape-based and tapeless,
are prototyped in the same header file, the compiler needs to know if a
tapeless version is intended. This can be done by defining a
preprocessor macro named {\sf ADOLC\_TAPELESS}. Note that it is
important to define this macro before the header file is included.
Otherwise, the tape-based version of ADOL-C will be used.
As in the tape-based forward version of ADOL-C, all derivative
calculations are carried out by calls to overloaded
operators. Therefore, as in the tape-based version, all
independent, intermediate and dependent variables must be declared
with type {\sf adouble}. For run-time reasons, the whole tapeless
functionality provided by \verb#adolc.h# is written as code intended
for inlining; the portion of code that is actually inlined can
be influenced by switches for many compilers. Most likely, the whole
derivative code is inlined by default. Our experiments
3042with the tapeless mode have produced complete inlined code by using
3043standard switches (optimization) for GNU and Intel C++
3046To avoid name conflicts
3047resulting from the inlining the tapeless version has its own namespace
3048\verb#adtl#. As a result four possibilities of using the {\sf adouble}
3049type are available for the tapeless version:
3051\item Defining a new type
3052      \begin{center}
3053        \begin{tabular}{l}
3054          {\sf typedef adtl::adouble adouble;}\\
3055          ...\\
3056          {\sf adouble tmp;}
3057        \end{tabular}
3058      \end{center}
      This is the preferred way. Remember that you cannot define your own
      {\sf adouble} type/class with a different meaning after doing the typedef.
3061\item Declaring with namespace prefix
3062      \begin{center}
3063        \begin{tabular}{l}
3064          {\sf adtl::adouble tmp;}
3065        \end{tabular}
3066      \end{center}
      Not the most elegant or convenient way with respect to coding,
      but without any doubt one of the safest ways. The identifier
      {\sf adouble} is still available for user types/classes.
3070\item Trusting macros
3071      \begin{center}
3072        \begin{tabular}{l}
3073          {\sf \#define adouble adtl::adouble}\\
3074          ...\\
3075          {\sf adouble tmp;}
3076        \end{tabular}
3077      \end{center}
3078      This approach should be used with care, since standard defines are text replacements.
3079  \item Using the complete namespace
3080        \begin{center}
3081          \begin{tabular}{l}
            {\sf using namespace adtl;}\\
3083            ...\\
3084            {\sf adouble tmp;}
3085          \end{tabular}
3086        \end{center}
3087        A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise!
3089After defining the variables only two things are left to do. First
3090one needs to initialize the values of the independent variables for the
3091function evaluation. This can be done by assigning the variables a {\sf
3092double} value. The {\sf ad}-value is set to zero in this case.
3093Additionally, the tapeless forward mode variant of ADOL-C
3094offers a function named {\sf setValue} for setting the value without
3095changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
3096variables ADOL-C offers two possibilities:
3098  \item Using the constructor
3099        \begin{center}
3100          \begin{tabular}{l}
3101            {\sf adouble x1(2,1), x2(4,0), y;}
3102          \end{tabular}
3103        \end{center}
3104        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
3105        remains uninitialized. In terms of function evaluation
3106        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3107        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3108   \item Setting point values directly
3109         \begin{center}
3110           \begin{tabular}{l}
3111             {\sf adouble x1=2, x2=4, y;}\\
3112             ...\\
3113             {\sf x1.setADValue(1);}\\
3114             {\sf x2.setADValue(0);}
3115           \end{tabular}
3116         \end{center}
3117         The same example as above but now using {\sf setADValue}-method for initializing the derivative values.
3120The derivatives can be obtained at any time during the evaluation
3121process by calling the {\sf getADValue}-method
3123  \begin{tabular}{l}
3124    {\sf adouble y;}\\
3125    ...\\
3126    {\sf cout $<<$ y.getADValue();}
3127  \end{tabular}
3129\autoref{fig:modcode} shows the resulting source code incorporating
3130all required changes for the example
3131given above.
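
What the overloaded operators compute in scalar mode can be illustrated by
a small stand-alone class that carries a value and one derivative value
through every operation. The type {\sf Dual} below is a hypothetical
stand-in written for this sketch. It is not the implementation of
{\sf adtl::adouble}, and it overloads only the operations needed for the
coordinate transformation.

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-in for a tapeless scalar-mode active type.
// NOT the ADOL-C implementation; only a few operations are overloaded.
struct Dual {
    double val;  // function value
    double ad;   // directional derivative
    Dual(double v = 0.0, double d = 0.0) : val(v), ad(d) {}
};

inline Dual operator+(const Dual &a, const Dual &b) {
    return Dual(a.val + b.val, a.ad + b.ad);
}
inline Dual operator*(const Dual &a, const Dual &b) {
    return Dual(a.val * b.val, a.ad * b.val + a.val * b.ad);
}
inline Dual operator/(const Dual &a, const Dual &b) {
    return Dual(a.val / b.val, (a.ad * b.val - a.val * b.ad) / (b.val * b.val));
}
inline Dual sqrt(const Dual &a) {
    assert(a.val > 0.0);  // derivative is undefined at zero
    const double r = std::sqrt(a.val);
    return Dual(r, a.ad / (2.0 * r));
}
inline Dual atan(const Dual &a) {
    return Dual(std::atan(a.val), a.ad / (1.0 + a.val * a.val));
}

// Cartesian-to-spherical transformation from the running example.
void spherical(const Dual x[3], Dual y[3]) {
    y[0] = sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    y[1] = atan(sqrt(x[0] * x[0] + x[1] * x[1]) / x[2]);
    y[2] = atan(x[1] / x[0]);
}
```

Seeding {\sf x[0]} with the derivative value 1 and the other two inputs with
0 makes every {\sf y[i].ad} equal to $\partial y_{i+1}/\partial x_1$, which
mirrors the role of {\sf setADValue} and {\sf getADValue}.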
3137\hspace*{-1cm} \= \kill
3138\> {\sf \#include $<$iostream$>$}\\
3139\> {\sf using namespace std;}\\
3140\> \\
3141\> {\sf \#define ADOLC\_TAPELESS}\\
3142\> {\sf \#include $<$adouble.h$>$}\\
3143\> {\sf typedef adtl::adouble adouble;}\\
3145\> {\sf int main() \{}\\
3146\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3148\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3149\> {\sf \rule{1cm}{0pt}...}\\
3151\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(1);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3152\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3153\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3154\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3156\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3157\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ y[1].getADValue() $<<$ endl;}\\
3158\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3159\> {\sf \}}
3163\caption{Example for tapeless scalar forward mode}
3167\subsubsection*{The vector mode}
3169In scalar mode only one direction element has to be stored per {\sf
3170  adouble} whereas a field of $p$ elements is needed in the vector
3171  mode to cover the computations for the given $p$ directions. The
3172  resulting changes to the source code are described in this section.
3174Similar to tapeless scalar forward mode, the tapeless vector forward
3175mode is used by defining {\sf ADOLC\_TAPELESS}. Furthermore, one has to define
3176an additional preprocessor macro named {\sf NUMBER\_DIRECTIONS}. This
3177macro takes the maximal number of directions to be used within the
3178resulting vector mode. Just as {\sf ADOLC\_TAPELESS} the new macro
3179must be defined before including the \verb#<adolc.h/adouble.h>#
3180header file since it is ignored otherwise.
3182In many situations recompiling the source code to get a new number of
3183directions is at least undesirable. ADOL-C offers a function named
{\sf setNumDir} to work around this problem partially. After this
function has been called, ADOL-C takes the number of directions not
from the macro {\sf NUMBER\_DIRECTIONS} but from the argument of
{\sf setNumDir}. A corresponding source code would contain the following lines:
3189  \begin{tabular}{l}
3190    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3191    ...\\
3192    {\sf adtl::setNumDir(5);}
3193  \end{tabular}
3195Note that using this function does not
3196change memory requirements that can be roughly determined by
3197({\sf NUMBER\_DIRECTIONS}$+1$)*(number of {\sf adouble}s).
Compared to the scalar case, setting and getting the derivative
values, i.e.~the directions, is more involved. Instead of
working with single {\sf double} values, pointers to arrays of {\sf
double}s are used as illustrated by the following example:
3204  \begin{tabular}{l}
3205    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3206    ...\\
3207    {\sf adouble x, y;}\\
3208    {\sf double *ptr=new double[NUMBER\_DIRECTIONS];}\\
3209      ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(ptr);}\\
3212    ...\\
3213    {\sf ptr=y.getADValue();}
3214  \end{tabular}
3216Additionally, the tapeless vector forward mode of ADOL-C offers two
3217new methods for setting/getting the derivative values. Similar
3218to the scalar case, {\sf double} values are used but due to the vector
3219mode the position of the desired vector element must be supplied in
3220the argument list:
3222  \begin{tabular}{l}
3223    {\sf \#define NUMBER\_DIRECTIONS 10}\\
3224    ...\\
3225    {\sf adouble x, y;}\\
3226    ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(5,1);\hspace*{3.7cm}// set the 6th derivative value of x to 1.0}\\
3229      ...\\
3230    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3231  \end{tabular}
The resulting source code containing all required changes is
shown in \autoref{fig:modcode2}.
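
The vector mode can be pictured in the same way as the scalar mode, with a
fixed-length array of derivative values replacing the single one. The type
{\sf VDual}, the helper {\sf combine} and the constant {\sf NUM\_DIR} in the
following C++ sketch are hypothetical names chosen for illustration and do
not reflect the internals of {\sf adtl::adouble}. The constant
{\sf NUM\_DIR} plays the role of {\sf NUMBER\_DIRECTIONS}.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

const std::size_t NUM_DIR = 3;  // plays the role of NUMBER_DIRECTIONS

// Stand-in for a vector-mode active type (NOT the ADOL-C class):
// one value plus NUM_DIR derivative values per variable.
struct VDual {
    double val;
    std::array<double, NUM_DIR> ad;
    VDual(double v = 0.0) : val(v) { ad.fill(0.0); }
};

// Chain rule applied to every direction: r.ad = da * a.ad + db * b.ad.
inline VDual combine(double v, double da, const VDual &a, double db, const VDual &b) {
    VDual r(v);
    for (std::size_t k = 0; k < NUM_DIR; ++k) r.ad[k] = da * a.ad[k] + db * b.ad[k];
    return r;
}
inline VDual operator+(const VDual &a, const VDual &b) {
    return combine(a.val + b.val, 1.0, a, 1.0, b);
}
inline VDual operator*(const VDual &a, const VDual &b) {
    return combine(a.val * b.val, b.val, a, a.val, b);
}
inline VDual operator/(const VDual &a, const VDual &b) {
    return combine(a.val / b.val, 1.0 / b.val, a, -a.val / (b.val * b.val), b);
}
inline VDual sqrt(const VDual &a) {
    const double r = std::sqrt(a.val);
    return combine(r, 0.5 / r, a, 0.0, a);
}
inline VDual atan(const VDual &a) {
    return combine(std::atan(a.val), 1.0 / (1.0 + a.val * a.val), a, 0.0, a);
}

// Jacobian of the spherical transformation: J[i][j] = dy_i / dx_j.
void spherical_jacobian(const double x_in[3], double J[3][3]) {
    assert(x_in[0] != 0.0 && x_in[2] != 0.0);
    VDual x[3], y[3];
    for (std::size_t i = 0; i < 3; ++i) {
        x[i] = VDual(x_in[i]);
        x[i].ad[i] = 1.0;  // seed the directions with the identity
    }
    y[0] = sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
    y[1] = atan(sqrt(x[0] * x[0] + x[1] * x[1]) / x[2]);
    y[2] = atan(x[1] / x[0]);
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j) J[i][j] = y[i].ad[j];
}
```

Each variable of this type stores {\sf NUM\_DIR}$+1$ {\sf double} values,
which matches the rough memory estimate given above for the vector mode.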
3239\hspace*{-1cm} \= \kill
3240\> {\sf \#include $<$iostream$>$}\\
3241\> {\sf  using namespace std;}\\
3243\> {\sf \#define ADOLC\_TAPELESS}\\
3244\> {\sf \#define NUMBER\_DIRECTIONS 3}\\
3245\> {\sf \#include $<$adouble.h$>$}\\
3246\> {\sf typedef adtl::adouble adouble;}\\
3250\> {\sf int main() \{}\\
3251\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3253\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3254\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3255\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3256\> {\sf \rule{0.5cm}{0pt}\}}\\
3258\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3259\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3260\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3262\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3263\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3264\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3265\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3266\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3267\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3268\> {\sf \rule{0.5cm}{0pt}\}}\\
3269\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3270\> {\sf \}}
3273\caption{Example for tapeless vector forward mode}
3277\subsection{Compiling and Linking the Source Code}
3279After incorporating the required changes, one has to compile the
3280source code and link the object files to get the executable.
As long as the ADOL-C header files are not included with an absolute path,
the compile sequence should be similar to the following example:
3284  \begin{tabular}{l}
3285    {\sf g++ -I/home/username/adolc\_base/include -c tapeless\_scalar.cpp}
3286  \end{tabular}
3288The \verb#-I# option tells the compiler where to search for the ADOL-C
3289header files. This option can be omitted when the headers are included
3290with absolute path or if ADOL-C is installed in a ``global'' directory.
Since the tapeless forward version of ADOL-C is implemented in the
header \verb#adouble.h# as code intended to be inlined completely,
the object files do not need to be linked against any external ADOL-C
code or the ADOL-C library. Therefore, the example started above could be finished with the
following command:
3298  \begin{tabular}{l}
3299    {\sf g++ -o tapeless\_scalar tapeless\_scalar.o}
3300  \end{tabular}
The mentioned source codes {\sf tapeless\_scalar.cpp} and {\sf tapeless\_vector.cpp}
illustrating the use of the tapeless scalar and vector modes can be found in
the directory {\sf examples}.
3306\subsection{Concluding Remarks for the Tapeless Forward Mode Variant}
Like many other AD methods, the tapeless forward mode provided by the
ADOL-C package has its own strengths and drawbacks. Please read the
following list carefully to become familiar with the issues that
can occur:
3313  \item Advantages:
3314    \begin{itemize}
3315      \item Code speed\\
        Increasing computation speed was one of the main goals in writing
        the tapeless code. In many cases higher performance can be
        expected this way.
      \item Easier linking process\\
        As a further consequence of code inlining, the object code does
        not need to be linked against an ADOL-C library.
      \item Smaller overall memory requirements\\
        As the name implies, tapeless ADOL-C does not write tapes.
        Loop ``unrolling'' can be avoided this
        way. If main memory plus disk space is counted as the overall memory
        requirement, the tapeless version can be
        executed more efficiently.
3328    \end{itemize}
3329  \item Drawbacks:
3330    \begin{itemize}
3331    \item Main memory limitations\\
      The ability to compute derivatives of a given function is
      bounded by the main memory plus swap size when using
      tapeless ADOL-C. Computation from swap should be avoided
      as far as possible, since it slows down the computation
      drastically. Therefore, if the program execution is
      terminated without an error message, insufficient memory may be
      the reason, among other things. The memory requirements $M$ can
      be estimated roughly as follows:
3340      \begin{itemize}
3341        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3342        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3343          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3344      \end{itemize}
3345      where the storage size of all non {\sf adouble} based variables is described by $M_p$.
3346    \item Compile time\\
      As discussed in the previous sections, the tapeless forward mode of
      the ADOL-C package is implemented with the intention that all of its
      code is inlined. This approach results in a larger source code size, since every
      operation involving at least one {\sf adouble} stands for the
      operation itself as well as for the corresponding derivative
      code after the inlining process. Therefore, the compilation time
      needed for the tapeless version may be higher than that of the tape based code.
    \item Code size\\
      A second drawback resulting from the code inlining is the
      increase in the size of the binaries. The increase
      factor compared to the corresponding tape based program is
      difficult to quantify as it is task dependent. Practical results
      have shown that values between 1.0 and 2.0 can be
      expected. Factors higher than 2.0 are possible too, and even
      values below 1.0 have been observed.
3362    \end{itemize}
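The two memory estimates given under ``Main memory limitations'' can be
turned into a quick back-of-the-envelope check. The following sketch is not
part of ADOL-C: the names {\sf ActiveSketch} and
{\sf estimate\_tapeless\_bytes} are hypothetical, and it assumes that $M$ is
counted in units of one {\sf double}, which the formulas above leave implicit.

\begin{verbatim}
#include <cstddef>

// Hypothetical stand-in for a tapeless active variable: one value plus
// NumDirections directional derivatives (NumDirections = 1 in scalar mode).
template <std::size_t NumDirections>
struct ActiveSketch {
    double val;
    double adval[NumDirections];
};

// Rough memory estimate in bytes: every active variable stores
// (num_directions + 1) doubles; passive_bytes plays the role of M_p.
inline std::size_t estimate_tapeless_bytes(std::size_t num_adoubles,
                                           std::size_t num_directions,
                                           std::size_t passive_bytes) {
    return num_adoubles * (num_directions + 1) * sizeof(double)
           + passive_bytes;
}
\end{verbatim}

With {\sf num\_directions} set to 1 the estimate reduces to the scalar-mode
formula; larger values correspond to {\sf NUMBER\_DIRECTIONS} in vector mode.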
3366\section{Installing and Using ADOL-C}
3369\subsection{Generating the ADOL-C Library}
The current build system is best summarized by the ubiquitous GNU
install triplet
3375\verb=configure - make - make install= .
Executing these three steps from the package base directory
3378\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3379ADOL-C library with default options and install the package (libraries
3380and headers) into the default installation directory {\tt
3381  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3382the subdirectory \verb=include= will contain all the installed header
3383files that may be included by the user program, the subdirectory
3384\verb=lib= will contain the 32-bit compiled library
3385and the subdirectory \verb=lib64= will contain the 64-bit compiled
3386library. Depending on the compiler only one of \verb=lib= or
3387\verb=lib64= may be created.
3389Before doing so the user may modify the header file \verb=usrparms.h=
3390in order to tailor the \mbox{ADOL-C} package to the needs in the
3391particular system environment as discussed in
3392\autoref{Customizing}. The configure procedure which creates the necessary
3393\verb=Makefile=s can be customized by use of some switches. Available
3394options and their meaning can be obtained by executing
3395\verb=./configure --help= from the package base directory.
3397All object files and other intermediately generated files can be
removed by the call \verb=make clean=. Uninstalling ADOL-C by
executing \verb=make uninstall= is only meaningful after a previous
call of \verb=make install= and will remove all installed package files
but will leave the created directories behind.
3403The sparse drivers are included in the ADOL-C libraries if the
3404\verb=./configure= command is executed with the option
3405\verb=--enable-sparse=. The ColPack library available at
3406\verb= is required to
3407compute the sparse structures, and is searched for in all the default
3408locations as well as in the subdirectory \verb=<ThirdParty/ColPack/>=.
3409In case the library and its headers are installed in a nonstandard path
3410this may be specified with the \verb?--with-colpack=PATH? option.
3411It is assumed that the library and its header files have the following
directory structure: \verb?PATH/include? contains all the header files,
3414\verb?PATH/lib? contains the 32-bit compiled library and
3415\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
3416the compiler used to compile {\sf ADOL-C} one of these libraries will
3417be used for linking.
3419\subsection{Compiling and Linking the Example Programs}
3421The installation procedure described in \autoref{genlib} also
3422provides the \verb=Makefile=s  to compile the example programs in the
3423directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
3424additional examples in
3425\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
3426one has to execute the
3427\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
3428examples. Available options are:
3431\verb=--enable-docexa=&build all examples discussed in this manual\\
3432&(compare \autoref{example})\\
3433\verb=--enable-addexa=&build all additional examples\\
3434&(See file \verb=README= in the various subdirectories)
Just calling \verb=make= from the package's base directory generates
3439all configured examples and the library if necessary. Compiling from
3440subdirectory \verb=examples= or one of its subfolders is possible
3441too. At least one kind of the ADOL-C library (static or shared) must
3442have been built previously in that case. Hence, building the library
3443is always the first step.
For compiling the library and the documented examples on Windows using
3446Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
3447the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
3448\verb=<ADOL-C/examples/>= subdirectories.
3450\subsection{Description of Important Header Files}
3453The application of the facilities of ADOL-C requires the user
3454source code (program or module) to include appropriate
3455header files where the desired data types and routines are
3456prototyped. The new hierarchy of header files enables the user
3457to take one of two possible ways to access the right interfaces.
3458The first and easy way is recommended to beginners: As indicated in
3459\autoref{globalHeaders} the provided {\em global} header file
3460\verb=<adolc/adolc.h>= can be included by any user code to support all
3461capabilities of ADOL-C depending on the particular programming language
3462of the source.   
3465\center \small
3467\verb=<adolc/adolc.h>= & 
3469  \boldmath $\rightarrow$ \unboldmath
3470                 & global header file available for easy use of ADOL-C; \\
  $\bullet$      & includes all ADOL-C header files depending on
                   whether the user's source is C++ or C code.
3474\\ \hline
3475\verb=<adolc/usrparms.h>= &
3477  \boldmath $\rightarrow$ \unboldmath
3478                 & user customization of ADOL-C package (see
3479                   \autoref{Customizing}); \\
3480  $\bullet$      & after a change of
3481                   user options the ADOL-C library \verb=libadolc.*=
3482                   has to be rebuilt (see \autoref{genlib}); \\
3483  $\bullet$      & is included by all ADOL-C header files and thus by all user
3484                   programs.
3485\end{tabular*} \\ \hline
3487\caption{Global header files}
The second way is meant for the more advanced ADOL-C user: The source code
includes only those interfaces used by the particular application.
The header files needed in each case are indicated
throughout the manual.
Dependences between the provided
ADOL-C routines are resolved by automatic inclusion of the required headers in order
to maintain easy use. The header files important to the user are described
in \autoref{importantHeaders1} and \autoref{importantHeaders2}.
3501\center \small
3503%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
3504\verb=<adolc/adouble.h>= & 
3506  \boldmath $\rightarrow$ \unboldmath
3507                & provides the interface to the basic active
3508                  scalar data type of ADOL-C: {\sf class adouble} 
3509                  (see \autoref{prepar});
3510%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
3512\\ \hline
3513% \verb=<adolc/avector.h>= &
3515%  \boldmath $\rightarrow$ \unboldmath
3516%                & provides the interface to the active vector
3517%                  and matrix data types of ADOL-C: {\sf class adoublev}
3518%                  and {\sf class adoublem}, respectively
3519%                   (see \autoref{arrays}); \\
3520%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3522%\\ \hline
3523 \verb=<adolc/taputil.h>= & 
3525  \boldmath $\rightarrow$ \unboldmath
3526                & provides functions to start/stop the tracing of
3527                  active sections (see \autoref{markingActive})
3528                  as well as utilities to obtain
3529                  tape statistics (see \autoref{examiningTape}); \\
3530  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
3532\\ \hline
3534\caption{Important header files: tracing/taping}
3539\center \small
3541%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
3542\verb=<adolc/interfaces.h>= & 
3544  \boldmath $\rightarrow$ \unboldmath
3545                & provides interfaces to the {\sf forward} and
3546                  {\sf reverse} routines as basic versions of derivative
3547                  evaluation (see \autoref{forw_rev}); \\
3548  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3549  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
3550  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
3552\\ \hline
3553\verb=<adolc/drivers.h>= & 
3555  \boldmath $\rightarrow$ \unboldmath
3556                & provides ``easy to use'' drivers for solving
3557                  optimization problems and nonlinear equations
3558                  (see \autoref{optdrivers}); \\
3559  $\bullet$     & comprises C and Fortran-callable versions.
3561\\ \hline
3563\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
3564\end{minipage}  & 
3566  \boldmath $\rightarrow$ \unboldmath
3567                & provides the ``easy to use'' sparse drivers
3568                  to exploit the sparsity structure of
3569                  Jacobians (see \autoref{sparse}); \\
3570  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
3571                  of {\sf forward} and {\sf reverse} routines
3572                  propagating bit patterns (see \autoref{ProBit}); \\
3574  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
3576\\ \hline
3578\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
3579\end{minipage}  & 
3581  \boldmath $\rightarrow$ \unboldmath
3582                & provides interfaces to the underlying C-callable
3583                  versions of {\sf forward} and {\sf reverse} routines
3584                  propagating bit patterns.
3586\\ \hline
3588\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
3589\end{minipage}  &
3591  \boldmath $\rightarrow$ \unboldmath
3592                & provides ``easy to use'' drivers for numerical
3593                  solution of ordinary differential equations
3594                  (see \autoref{odedrivers}); \\
3595  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
3596  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
3598\\ \hline
3600\verb=<adolc/drivers/=\newline\verb= taylor.h>=
3601\end{minipage}  &
3603  \boldmath $\rightarrow$ \unboldmath
3604                & provides ``easy to use'' drivers for evaluation
3605                  of higher order derivative tensors (see
3606                  \autoref{higherOrderDeriv}) and inverse/implicit function
3607                  differentiation (see \autoref{implicitInverse});\\
3608  $\bullet$     & comprises C++ and C-callable versions.
3610\\ \hline
3611\verb=<adolc/adalloc.h>= &
3613  \boldmath $\rightarrow$ \unboldmath
3614                & provides C++ and C functions for allocation of
3615                  vectors, matrices and three dimensional arrays
3616                  of {\sf double}s.
3618\\ \hline
3620\caption{Important header files: evaluation of derivatives}
3624\subsection{Compiling and Linking C/C++ Programs}
3626To compile a C/C++ program or single module using ADOL-C
3627data types and routines one has to ensure that all necessary
3628header files according to \autoref{ssec:DesIH} are
included. All modules involving {\em active} data types such as
{\sf adouble}
3631%, {\bf adoublev} and {\bf adoublem}
3632have to be compiled as C++. Modules that make use of a previously
3633generated tape to evaluate derivatives can either be programmed in ANSI-C
3634(while avoiding all C++ interfaces) or in C++. Depending
3635on the chosen programming language the header files provide
3636the right ADOL-C prototypes.
3637For linking the resulting object codes the library \verb=libadolc.*=
3638must be used (see \autoref{genlib}).
3640\subsection{Adding Quadratures as Special Functions}
3644Suppose an integral
3645\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
3646is evaluated numerically by a user-supplied function
3648{\sf  double  myintegral(double\& x);}
3650Similarly, let us suppose that the integrand itself is evaluated by
3651a user-supplied block of C code {\sf integrand}, which computes a
3652variable with the fixed name {\sf val} from a variable with the fixed
3653name {\sf arg}. In many cases of interest, {\sf integrand} will
3654simply be of the form
3656{\sf \{ val = expression(arg) \}}\enspace .
3658In general, the final assignment to {\sf val} may be preceded
3659by several intermediate calculations, possibly involving local
3660active variables of type {\sf adouble}, but no external or static
3661variables of that type.  However, {\sf integrand} may involve local
3662or global variables of type {\sf double} or {\sf int}, provided they
3663do not depend on the value of {\sf arg}. The variables {\sf arg} and
3664{\sf val} are declared automatically; and as {\sf integrand} is a block
3665rather than a function, {\sf integrand} should have no header line. 
3667Now the function {\sf myintegral} can be overloaded for {\sf adouble}
3668arguments and thus included in the library of elementary functions
3669by the following modifications:
3672At the end of the file \verb=<adouble.cpp>=, include the full code
3673defining \\ {\sf double myintegral(double\& x)}, and add the line
3675{\sf extend\_quad(myintegral, integrand); }
This macro expands to the definition of
3678 {\sf adouble myintegral(adouble\& arg)}.
3679Then remake the library \verb=libadolc.*= (see \autoref{genlib}).
3681In the definition of the class
3682{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
3684{\sf friend adouble myintegral(adouble\&)}.
3687In the first modification, {\sf myintegral} represents the name of the
3688{\sf double} function, whereas {\sf integrand} represents the actual block
3689of C code.
For example, in case of the inverse hyperbolic cosine, we have
{\sf myintegral} = {\sf acosh}. Since the derivative of {\sf acosh} is
$1/\sqrt{x^2-1}$, {\sf integrand} can be written as
{\sf \{ val = 1/sqrt(arg*arg-1); \}}
so that the line
{\sf extend\_quad(acosh,val = 1/sqrt(arg*arg-1));}
can be added to the file \verb=<adouble.cpp>=.
A mathematically equivalent but longer representation of
{\sf integrand} is
{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
 \> \ \> {\sf  temp = temp*temp; } \\
 \> \ \> {\sf  val = 1/sqrt(temp-1); \}}
3709The code block {\sf integrand} may call on any elementary function that has already
3710been defined in file \verb=<adouble.cpp>=, so that one may also introduce
3711iterated integrals.
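The effect of such an extension can be illustrated without ADOL-C. The
sketch below uses a hypothetical pair type {\sf ValDot} (value and first
derivative) and a composite Simpson rule as a stand-in for the user-supplied
quadrature; neither is part of the package, and the code is not what the
macro {\sf extend\_quad} generates. It only shows the underlying rule: the
value comes from {\sf myintegral}, while the derivative is the integrand
evaluated at the argument, multiplied by the derivative of the argument. The
integrand $g(t) = 1/(1+t^2)$, for which $f = \arctan$, is chosen purely for
illustration.

\begin{verbatim}
#include <cmath>

// Integrand g(t); chosen for illustration so that f(x) = atan(x).
inline double integrand(double t) { return 1.0 / (1.0 + t * t); }

// Passive quadrature: composite Simpson rule for the integral of g
// over [0, x].
inline double myintegral(double x) {
    const int steps = 1000;              // even number of subintervals
    const double h = x / steps;
    double sum = integrand(0.0) + integrand(x);
    for (int k = 1; k < steps; ++k)
        sum += (k % 2 ? 4.0 : 2.0) * integrand(k * h);
    return sum * h / 3.0;
}

// Hypothetical active type: value and derivative w.r.t. one independent.
struct ValDot { double val; double dot; };

// Overload for the active type: f(x) from the quadrature, f'(x) = g(x).
inline ValDot myintegral(const ValDot& x) {
    return { myintegral(x.val), integrand(x.val) * x.dot };
}
\end{verbatim}

Calling the active overload with value 1 and derivative 1 returns
approximately $\pi/4$ as the value and $g(1) = 0.5$ as the derivative.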
3714\section{Example Codes}
3717The following listings are all simplified versions of codes that
3718are contained in the example subdirectory
3719\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
3720we have left out timings, which are included in the complete codes.
3722\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
3724The first example evaluates the gradient and the Hessian of
3725the function
3727y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
3729using the appropriate drivers {\sf gradient} and {\sf hessian}.
\begin{verbatim}
#include <adolc/adouble.h>               // use of active doubles and taping
#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
                                         // gradient(.) and hessian(.)
#include <adolc/taping.h>                // use of taping
#include <cmath>                         // fabs
#include <cstdlib>                       // malloc
#include <iostream>                      // cin, cout
using namespace std;

int main() {
int n,i,j,tape_stats[STAT_SIZE];
cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
cout << "number of independent variables = ?  \n";
cin >> n;
double* xp = new double[n];
double  yp = 0.0;
adouble* x = new adouble[n];
adouble  y = 1;
for(i=0;i<n;i++)
  xp[i] = (i+1.0)/(2.0+i);         // some initialization
trace_on(1);                       // tag =1, keep=0 by default
  for(i=0;i<n;i++) {
    x[i] <<= xp[i]; y *= x[i]; }
  y >>= yp;
  delete[] x;
trace_off();
tapestats(1,tape_stats);           // reading of tape statistics
cout<<"maxlive "<<tape_stats[2]<<"\n";
// ...                             // ..... print other tape stats
double* g = new double[n];
gradient(1,n,xp,g);                // gradient evaluation
double** H=(double**)malloc(n*sizeof(double*));
for(i=0;i<n;i++)
  H[i]=(double*)malloc((i+1)*sizeof(double));
hessian(1,n,xp,H);                 // H*xp equals (n-1)g since g is
double errg = 0;                   // homogeneous of degree n-1.
double errh = 0;
for(i=0;i<n;i++)
  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
for(i=0;i<n;i++) {
  for(j=0;j<n;j++) {
    if (i>j)                       // lower half of hessian
      errh += fabs(H[i][j]-g[i]/xp[j]); } }
cout << yp-1/(1.0+n) << " error in function \n";
cout << errg <<" error in gradient \n";
cout << errh <<" consistency check \n";
return 0;
}                                  // end main
\end{verbatim}
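The error measures printed at the end of the listing rest on closed-form
identities for the product function: $\partial y/\partial x_i = y/x_i$ and,
for $i \neq j$, $\partial^2 y/\partial x_i \partial x_j = g_i/x_j$. The
following sketch does not use ADOL-C, and its function names are chosen here
for illustration only. It evaluates the function and the analytic gradient
with plain {\sf double}s, assuming that all $x_i$ are nonzero, and reproduces
the initialization of the listing, for which the product telescopes to
$1/(n+1)$.

\begin{verbatim}
#include <cstddef>
#include <vector>

// Speelpenning product y = x_0 * x_1 * ... * x_{n-1}.
inline double speelpenning(const std::vector<double>& x) {
    double y = 1.0;
    for (double xi : x) y *= xi;
    return y;
}

// Analytic gradient g_i = y / x_i (requires all x_i != 0).
inline std::vector<double>
speelpenning_gradient(const std::vector<double>& x) {
    const double y = speelpenning(x);
    std::vector<double> g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) g[i] = y / x[i];
    return g;
}

// Initialization used in the listing: x_i = (i+1)/(i+2).
inline std::vector<double> initial_point(std::size_t n) {
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = (i + 1.0) / (2.0 + i);
    return x;
}
\end{verbatim}

Values of this kind are what the results of {\sf gradient} and
{\sf hessian} are compared against in {\sf errg} and {\sf errh}.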
3777\subsection{Power Example ({\tt powexam.cpp})}
3779The second example function evaluates the $n$-th power of a real
3780variable $x$ in
3781$\log_2 n$ multiplications by recursive halving of the exponent. Since
3782there is only one independent variable, the scalar derivative can be
3783computed by
3784using both {\sf forward} and {\sf reverse}, and the
3785results are subsequently compared.
\begin{verbatim}
#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces

adouble power(adouble x, int n) {
adouble z=1;
if (n>0) {                         // recursion and branches
  int nh =n/2;                     // that do not depend on
  z = power(x,nh);                 // adoubles are fine !!!!
  z *= z;
  if (2*nh != n)
    z *= x;
  return z; }                      // end if
else {
  if (n==0)                        // the local adouble z dies
    return z;                      // as it goes out of scope.
  else
    return 1/power(x,-n); }        // end else
} // end power
\end{verbatim}
3805The function {\sf power} above was obtained from the original
3806undifferentiated version by simply changing the type of all
3807{\sf double}s including the return variable to {\sf adouble}s. The new version
3808can now be called from within any active section, as in the following
3809main program.
\begin{verbatim}
// ADOL-C include and power(.) as above
#include <iostream>                // cin, cout
using namespace std;
int main() {
int i,n,tag=1;
cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
cout<<"monomial degree=? \n";      // input the desired degree
cin >> n;
                                   // allocations and initializations
double* Y[1];
*Y = new double[n+2];
double* X[1];                      // allocate passive variables with
*X = new double[n+4];              // extra dimension for derivatives
X[0][0] = 0.5;                     // function value = 0th coefficient
X[0][1] = 1.0;                     // first derivative = 1st coefficient
for(i=0;i<n;i++)
  X[0][i+2]=0;                     // further coefficients
double* Z[1];                      // used for checking consistency
*Z = new double[n+3];              // between forward and reverse;
                                   // Z[0][i+1] is written for i<=n+1
adouble y,x;                       // declare active variables
                                   // beginning of active section
trace_on(1);                       // tag = 1 and keep = 0
x <<= X[0][0];                     // only one independent var
y = power(x,n);                    // actual function call
y >>= Y[0][0];                     // only one dependent adouble
trace_off();                       // no global adouble has died
                                   // end of active section
double u[1];                       // weighting vector
u[0]=1;                            // for reverse call
for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-th derivative
  if (i==0)
    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value()
    << " (should be 0)\n";
  else
    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i]
    << " (should be 0)\n";
  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylorcoeff.
return 0;
}                                  // end main
\end{verbatim}
3851Since this example has only one independent and one dependent variable,
3852{\sf forward} and {\sf reverse} have the same complexity and calculate
3853the same scalar derivatives, albeit with a slightly different scaling.
3854By replacing the function {\sf power} with any other univariate test function,
3855one can check that {\sf forward} and {\sf reverse} are at least consistent.
3856In the following example the number of independents is much larger
3857than the number of dependents, which makes the reverse mode preferable.
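Returning to the function {\sf power} itself, its recursion can be checked
independently of ADOL-C. The sketch below is not the code shipped with the
package: the template formulation and the multiplication counter
{\sf mults} are additions made here for illustration, and the template is
only exercised with {\sf double}. It makes visible that the number of
multiplications grows logarithmically with the exponent.

\begin{verbatim}
// Recursive halving: x^n for integer n.
// mults counts the multiplications actually performed,
// including the trivial product 1*1 at the bottom of the recursion.
template <typename T>
T power(const T& x, int n, int& mults) {
    if (n < 0) return T(1) / power(x, -n, mults);
    if (n == 0) return T(1);
    T z = power(x, n / 2, mults);         // x^(n/2)
    z *= z; ++mults;                      // square
    if (n % 2 != 0) { z *= x; ++mults; }  // odd exponent: one more factor
    return z;
}
\end{verbatim}

For example, the exponent $n = 10$ leads to the chain
$10 \to 5 \to 2 \to 1 \to 0$ and to six counted multiplications.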
3859\subsection{Determinant Example ({\tt detexam.cpp})}
3861Now let us consider an exponentially expensive calculation,
3862namely, the evaluation of a determinant by recursive expansion
3863along rows. The gradient of the determinant with respect to the
3864matrix elements is simply the adjoint, i.e. the matrix of cofactors.
3865Hence the correctness of the numerical result is easily checked by
3866matrix-vector multiplication. The example illustrates the use
3867of {\sf adouble} arrays and pointers. 
\begin{verbatim}
#include <adolc/adouble.h>               // use of active doubles and taping
#include <adolc/interfaces.h>            // use of basic forward/reverse
                                         // interfaces of ADOL-C
adouble** A;                       // A is an n x n matrix
int n;                             // k <= n is the order
adouble det(int k, int m) {        // of the sub-matrix
if (m == 0) return 1.0 ;           // its column indices
else {                             // are encoded in m
  adouble* pt = A[k-1];
  adouble t = zero;                // zero is predefined
  int s, p =1;
  if (k%2) s = 1; else s = -1;
  for(int i=0;i<n;i++) {           // i must be local: det is recursive
    int p1 = 2*p;
    if (m%p1 >= p) {
      if (m == p) {
        if (s>0) t += *pt; else t -= *pt; }
      else {
        if (s>0)
          t += *pt*det(k-1,m-p);   // recursive call to det
        else
          t -= *pt*det(k-1,m-p); } // recursive call to det
      s = -s;}
    ++pt;
    p = p1;}
  return t; }
}                                  // end det
\end{verbatim}
3898As one can see, the overloading mechanism has no problem with pointers
3899and looks exactly the same as the original undifferentiated function
3900except for the change of type from {\sf double} to {\sf adouble}.
3901If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
3902a compile time error would have resulted. Now consider a corresponding
3903calling program.
\begin{verbatim}
// ADOL-C includes, A, n and det(.) as above
#include <cstdio>                  // printf
#include <iostream>                // cin, cout
using namespace std;
int main() {
int i,j, m=1,tag=1,keep=1;
cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
cout << "order of matrix = ? \n";  // select matrix size
cin >> n;
A = new adouble*[n];
trace_on(tag,keep);                // tag=1=keep
  double detout=0.0, diag = 1.0;   // here keep the intermediates for
  for(i=0;i<n;i++) {               // the subsequent call to reverse
    m *=2;
    A[i] = new adouble[n];
    for(j=0;j<n;j++)
      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
    diag += A[i][i].value();       // value() converts to double
    A[i][i] += 1.0; }
  det(n,m-1) >>= detout;           // actual function call
  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
trace_off();
double u[1];
u[0] = 1.0;
double* B = new double[n*n];
reverse(tag,1,n*n,0,u,B);          // gradient B of the determinant
cout <<" \n first base? : ";
for (i=0;i<n;i++) {
  adouble sum = 0;
  for (j=0;j<n;j++)                // the matrix A times the first n
    sum += A[i][j]*B[j];           // components of the gradient B
  cout<<sum.value()<<" "; }        // must be a Cartesian basis vector
return 0;
}                                  // end main
\end{verbatim}
3939The variable {\sf diag} should be mathematically
3940equal to the determinant, because the
3941matrix {\sf A} is defined as a rank 1 perturbation of the identity.
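The claim about {\sf diag} can be verified with passive arithmetic. The
following sketch is a plain {\sf double} re-implementation of the recursive
expansion, written for illustration and not taken from the ADOL-C sources;
the names {\sf Matrix}, {\sf test\_matrix} and {\sf expected\_det} are
hypothetical. For the matrix with entries $j/(1+i)$ plus the identity, the
determinant equals $1 + \sum_i i/(1+i)$, which is the value accumulated in
{\sf diag}.

\begin{verbatim}
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Determinant of the k x k sub-matrix formed by the first k rows and the
// columns whose bits are set in mask, by expansion along row k-1.
inline double det(const Matrix& A, int k, unsigned mask) {
    if (mask == 0) return 1.0;
    double t = 0.0;
    int sign = (k % 2) ? 1 : -1;
    unsigned bit = 1;
    for (std::size_t i = 0; i < A.size(); ++i, bit <<= 1) {
        if (mask & bit) {
            t += sign * A[k - 1][i] * det(A, k - 1, mask & ~bit);
            sign = -sign;
        }
    }
    return t;
}

// Test matrix of the listing: identity plus the rank-one matrix j/(1+i).
inline Matrix test_matrix(int n) {
    Matrix A(n, std::vector<double>(n));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) A[i][j] = j / (1.0 + i);
        A[i][i] += 1.0;
    }
    return A;
}

// Expected determinant 1 + sum_i i/(1+i), the quantity called diag.
inline double expected_det(int n) {
    double d = 1.0;
    for (int i = 0; i < n; ++i) d += i / (1.0 + i);
    return d;
}
\end{verbatim}

The full determinant of an $n \times n$ matrix is obtained with $k = n$ and
all $n$ low bits of {\sf mask} set, which corresponds to the call
{\sf det(n,m-1)} in the listing.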
3943\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
3946Here, we consider a nonlinear ordinary differential equation that
3947is a slight modification of the Robertson test problem
3948given in Hairer and Wanner's book on the numerical solution of
3949ODEs \cite{HW}. The following source code computes the corresponding
3950values of $y^{\prime} \in \R^3$:
\begin{verbatim}
#include <adolc/adouble.h>                  // use of active doubles and taping
#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities

void tracerhs(short int tag, double* py, double* pyprime) {
adouble y[3];                         // this time we left the parameters
adouble yprime[3];                    // passive and use local active arrays
trace_on(tag);
for (int i=0;i<3;i++)
  y[i] <<= py[i];                     // initialize and mark independents
yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
yprime[2] = -yprime[0] - yprime[1];
for (int i=0;i<3;i++)
  yprime[i] >>= pyprime[i];           // mark and pass dependents
trace_off();
}                                     // end tracerhs
\end{verbatim}
3968The Jacobian of the right-hand side has large
3969negative eigenvalues, which make the ODE quite stiff. We  have added
3970some numerically benign transcendentals to make the differentiation
3971more interesting.
The following main program uses {\sf forode} to calculate the Taylor series
defined by the ODE at the given point $y_0$ and {\sf reverse} as well
as {\sf accode} to compute the Jacobians of the coefficient vectors
with respect to $y_0$.
\begin{verbatim}
// ADOL-C includes and tracerhs(.) as above
#include <cstdlib>                 // malloc
#include <iostream>                // cin, cout
using namespace std;
int main() {
int i,j,deg;
int n=3;
double py[3];
double pyp[3];
cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
cout << "degree of Taylor series =?\n";
cin >> deg;
double **X;
X=(double**)malloc(n*sizeof(double*));
for(i=0;i<n;i++)
  X[i]=(double*)malloc((deg+1)*sizeof(double));
double*** Z=new double**[n];
double*** B=new double**[n];
short** nz = new short*[n];
for(i=0;i<n;i++) {
  Z[i]=new double*[n];
  B[i]=new double*[n];
  for(j=0;j<n;j++) {
    Z[i][j]=new double[deg];
    B[i][j]=new double[deg]; }     // end for
}                                  // end for
for(i=0;i<n;i++) {
  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
  X[i][0] = py[i];                 // and the Taylor coefficient;
  nz[i] = new short[n]; }          // set up sparsity array
tracerhs(1,py,pyp);                // trace RHS with tag = 1
forode(1,n,deg,X);                 // compute deg coefficients
reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
accode(n,deg-1,Z,B,nz);            // Jacobians of the coefficients
cout << "nonzero pattern:\n";
for(i=0;i<n;i++) {
  for(j=0;j<n;j++)
    cout << nz[i][j]<<"\t";
  cout <<"\n"; }                   // end for
return 0;
}                                  // end main
\end{verbatim}
\noindent The pattern {\sf nz} returned by {\sf accode} is
\begin{verbatim}
              3  -1   4
              1   2   2
              3   2   4
\end{verbatim}
The original pattern {\sf nz} returned by {\sf reverse} is the same
except that the negative entry $-1$ was zero.
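At the base point $(1,0,0)$ used in the main program, the right-hand side
can be evaluated by hand, which gives a quick plausibility check of the
traced function. The sketch below repeats the right-hand side with passive
{\sf double}s; the function name {\sf rhs} is chosen here for illustration
and does not appear in the example sources. At the base point it yields
$y^{\prime} = (0,-10,10)$, and the three components sum to zero by
construction of the third equation.

\begin{verbatim}
#include <cmath>

// Right-hand side of the modified Robertson problem, passive version.
inline void rhs(const double y[3], double yprime[3]) {
    yprime[0] = -std::sin(y[2]) + 1e8 * y[2] * (1 - 1 / y[0]);
    yprime[1] = -10 * y[0] + 3e7 * y[2] * (1 - y[1]);
    yprime[2] = -yprime[0] - yprime[1];
}
\end{verbatim}

Note that the expression requires a nonzero first component, since it
divides by {\sf y[0]}.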
%\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})}
%The following example uses conditional assignments to show the usage of a once produced tape
%for evaluation at new arguments. The elimination is performed with
%column pivoting.
%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces
%void gausselim(int n, adoublem& A, adoublev& bv) {
%along i;                           // active integer declaration
%adoublev temp(n);                  // active vector declaration
%adouble r,rj,temps;
%int j,k;
%for(k=0;k<n;k++) {                 // elimination loop
%  i = k;
%  r = fabs(A[k][k]);               // initial pivot size
%  for(j=k+1;j<n;j++) {
%    rj = fabs(A[j][k]);
%    condassign(i,rj-r,j);          // look for a larger element in the same
%    condassign(r,rj-r,rj); }       // column with conditional assignments
%  temp = A[i];                     // switch rows using active subscripting
%  A[i] = A[k];                     // necessary even if i happens to equal
%  A[k] = temp;                     // k during taping
%  temps = bv[i];
%  bv[i]=bv[k];
%  bv[k]=temps;
%  if (!value(A[k][k]))             // passive subscripting
%    exit(1);                       // matrix singular!
%  temps= A[k][k];
%  A[k] /= temps;
%  bv[k] /= temps;
%  for(j=k+1;j<n;j++) {
%    temps= A[j][k];
%    A[j] -= temps*A[k];            // vector operations
%    bv[j] -= temps*bv[k]; }        // endfor
%}                                  // end elimination loop
%for(k=n-1;k>=0;k--)                // backsubstitution
%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];
%}                                  // end gausselim
%\noindent This function can be called from any program
%that suitably initializes
%the components of {\sf A} and {\sf bv}
%as independents. The resulting tape can be
%used to solve any nonsingular linear system of the same size and
%to get the sensitivities of the solution with respect to the
%system matrix and the right hand side.
Parts of the ADOL-C source were developed by Andreas
Kowarz, Hristo Mitev, Sebastian Schlenkrich, and Olaf
Vogel. We are also indebted to George Corliss,
Tom Epperly, Bruce Christianson, David Gay, David Juedes,
Brad Karp, Koichi Kubota, Bob Olson, Marcela Rosemblun, Dima
Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
various ways with the development and documentation of ADOL-C.
Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha, and Alan Carle.
\newblock {\em Efficient computation of gradients and Jacobians by dynamic
  exploitation of sparsity in automatic differentiation}.
\newblock Optimization Methods and Software 7(1):1--39, 1996.

Bruce Christianson.
\newblock {\em Reverse accumulation and accurate rounding error estimates for
  Taylor series}.
\newblock Optimization Methods and Software 1:81--94, 1992.

Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
  derivatives}.
\newblock SIAM Review 47(4):629--705, 2005.

Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar, and Andrea Walther.
\newblock {\em Efficient Computation of Sparse Hessians: An Experimental Study
  using ADOL-C}.
\newblock Tech. Rep. (2006). To appear in INFORMS Journal on Computing.

\bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea Walther.
\newblock {\em Exploiting Sparsity in Jacobian Computation via Coloring and
  Automatic Differentiation: a Case Study in a Simulated Moving Bed Process}.
\newblock In Chr. Bischof et al., eds., {\em Proceedings AD 2008 conference},
  LNCSE 64, pp. 327--338, Springer (2008).

Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen.
\newblock {\em New Acyclic and Star Coloring Algorithms with Applications to
  Hessian Computation}.
\newblock SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.

Andreas Griewank and Andrea Walther.
\newblock {\em Evaluating Derivatives, Principles and Techniques of
  Algorithmic Differentiation. Second edition}.
\newblock SIAM, 2008.

Andreas Griewank, Jean Utke, and Andrea Walther.
\newblock {\em Evaluating higher derivative tensors by forward propagation
  of univariate Taylor series}.
\newblock Mathematics of Computation 69:1117--1130, 2000.

Andreas Griewank and Andrea Walther.
\newblock {\em Revolve: An Implementation of Checkpointing for the Reverse
  or Adjoint Mode of Computational Differentiation}.
\newblock ACM Transactions on Mathematical Software 26:19--45, 2000.

Ernst Hairer and Gerhard Wanner.
\newblock {\em Solving Ordinary Differential Equations II}.
\newblock Springer-Verlag, Berlin, 1991.

Donald~E. Knuth.
\newblock {\em The Art of Computer Programming. Second edition}.
\newblock Addison-Wesley, Reading, 1973.

Andrea Walther.
\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
\newblock ACM Transactions on Mathematical Software 34(1), Article 3, 2008.