% source: trunk/ADOL-C/doc/adolc-manual.tex @ 406
% Latex file containing the documentation of ADOL-C
% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
% This file is part of ADOL-C. This software is provided as open source.
% Any use, reproduction, or distribution of the software constitutes
% recipient's acceptance of the terms of the accompanying license file.
\newdateformat{monthyear}{\monthname\ \THEYEAR}
\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
{\Large {\bf ADOL-C:}}
\footnote{The development of earlier versions was supported by the Office of
  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
  Forschungsgemeinschaft. During the development of the current
  version Andrea Walther and Andreas Kowarz were supported by the
  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft}
\vspace{0.2in} \\
{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
{\Large of Algorithms Written in C/C++}\\
{\large\bf Version \packageversion, \monthyear\today} \\
 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
   of Paderborn, 33098 Paderborn, Germany} and
 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
The C++ package ADOL-C described here facilitates the evaluation of
first and higher derivatives of vector functions that are defined
by computer programs written in C or C++. The resulting derivative
evaluation routines may be called from C, C++, Fortran, or any other
language that can be linked with C.
The numerical values of derivative vectors are obtained free
of truncation errors at a small multiple of the run time and
random access memory required by the given function evaluation program.
Derivative matrices are obtained by columns, by rows, or in sparse format.
For solution curves defined by ordinary differential equations,
special routines are provided that evaluate the Taylor coefficient vectors
and their Jacobians with respect to the current state vector.
For explicitly or implicitly defined functions, derivative tensors are
obtained with a complexity that grows only quadratically in their
degree. The derivative calculations involve a possibly substantial but
always predictable amount of data. Since the data is accessed strictly sequentially,
it can be automatically paged out to external files.

{\bf Keywords}: Computational Differentiation, Automatic
         Differentiation,
         Chain Rule, Overloading, Taylor Coefficients,
         Gradients, Hessians, Forward Mode, Reverse Mode,
         Implicit Function Differentiation, Inverse Function Differentiation

{\bf Abbreviated title}: Automatic differentiation by overloading in C++
\section{Preparing a Section of C or C++ Code for Differentiation}

The package \mbox{ADOL-C}
utilizes overloading in C++, but the
user has to know only C. The acronym stands for {\bf A}utomatic
{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
In contrast to source transformation approaches, overloading does not generate intermediate
source code.
As starting points to retrieve further information on techniques and
applications of automatic differentiation, as well as on other AD
tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
of the AD community forms a rich source
of further information and pointers.

ADOL-C facilitates the simultaneous
evaluation of arbitrarily high directional derivatives and the
gradients of these Taylor coefficients with respect to all independent
variables. Relative to the cost of evaluating the underlying function,
the cost for evaluating any such scalar-vector pair grows as the
square of the degree of the derivative but is still completely
independent of the numbers $m$ and $n$.
This manual is organized as follows. This section explains the
modifications required to convert undifferentiated code to code that
compiles with ADOL-C.
\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
evaluate arbitrarily high order derivatives. The discussion includes storage
requirements and the tailoring of certain tape characteristics to fit specific
user needs. Descriptions of easy-to-use drivers for a convenient derivative
evaluation are contained in \autoref{drivers}.
\autoref{forw_rev_ad} offers a more mathematical characterization of
the different modes of AD to compute derivatives. At the same time, the
corresponding drivers of ADOL-C are explained.
The overloaded derivative evaluation routines using the forward and the reverse
mode of AD are explained in \autoref{forw_rev}.
Advanced differentiation techniques such as the optimal checkpointing for
time integrations, the exploitation of fixed point iterations, the usage
of externally differentiated functions, and the differentiation of OpenMP
parallel programs are described in \autoref{adv_ad}.
The tapeless forward mode is presented in \autoref{tapeless}.
\autoref{install} details the installation and
use of the ADOL-C package. Finally, \autoref{example}
furnishes some example programs that incorporate the ADOL-C package to
evaluate first and higher-order
derivatives. These and other examples are distributed with the ADOL-C
source code.
The user should simply refer to them if the more abstract and general
descriptions of ADOL-C provided in this document do not suffice.
\subsection{Declaring Active Variables}

The key ingredient of automatic differentiation by overloading is the
concept of an {\em active variable}. All variables that may be
considered as differentiable quantities at some time
during the program execution must be of an active
type. ADOL-C uses one
active scalar type, called {\sf adouble}, whose real part is of the
standard type {\sf double}.
Typically, one will declare the independent variables
and all quantities that directly or indirectly depend on them as
{\em active}. Other variables that do not depend on the independent
variables but enter, for example, as parameters, may remain one of the
{\em passive} types {\sf double, float}, or {\sf int}. There is no
implicit type conversion from {\sf adouble} to any of these passive
types; thus, {\bf failure to declare variables as active when they
depend on other active variables will result in a compile-time error
message}. In data flow terminology, the set of active variable names
must contain all its successors in the dependency graph. All components
of indexed arrays must have the same activity status.

The real component of an {\sf adouble x} can be extracted as
{\sf x.value()}. In particular,
such explicit conversions are needed for the standard output procedure
{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
that first the real part of an {\sf adouble} and then the string
``{\sf (a)}'' is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath can
be used to assign a constant value to an {\sf adouble}.
Naturally, {\sf adouble}s may be
components of vectors, matrices, and other arrays, as well as
members of structures or classes.
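The deliberate absence of an implicit active-to-passive conversion can be mimicked with an ordinary C++ class. The following is a minimal sketch of the idea only; the class {\sf adouble\_like} and its members are illustrative assumptions, not the real ADOL-C interface:

```cpp
#include <cassert>

// Minimal sketch of an "active" scalar type -- illustrative only,
// NOT the actual ADOL-C adouble class.
class adouble_like {
    double val;  // real part, of the standard type double
public:
    adouble_like(double x) : val(x) {}    // passive -> active is allowed
    double value() const { return val; }  // explicit extraction of the real part
    // No "operator double()": an implicit active -> passive conversion
    // would silently deactivate variables, so it is deliberately absent,
    // and e.g. "double d = x;" fails at compile time.
    adouble_like operator+(const adouble_like &b) const {
        return adouble_like(val + b.val);
    }
};
```

With such a class, any attempt to assign an active value to a passive {\sf double} is rejected by the compiler, which is exactly the compile-time error message referred to above.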
The C++ class {\sf adouble}, its member functions, and the overloaded
versions of all arithmetic operations, comparison operators, and
most ANSI C functions are contained in the file \verb=adouble.cpp= and its
header \verb=<adolc/adouble.h>=. The latter must be included for compilation
of all program files containing {\sf adouble}s and corresponding
operations.

\subsection{Marking Active Sections}

All calculations involving active variables that occur between
the void function calls

{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
{\sf trace\_off(file)}

are recorded on a sequential data set called {\em tape}. Pairs of
these function calls can appear anywhere in a C++ program, but
they must not overlap. The nonnegative integer argument {\sf tag} identifies the
particular tape for subsequent function or derivative evaluations.
Unless several tapes need to be kept, ${\sf tag} = 0$ may be used throughout.
The optional integer arguments {\sf keep} and
{\sf file} will be discussed in \autoref{tape}. We will refer to the
sequence of statements executed between a particular call to
{\sf trace\_on} and the following call to {\sf trace\_off} as an
{\em active section} of the code. The same active section may be
entered repeatedly, and one can successively generate several traces
on distinct tapes by changing the value of {\sf tag}.
Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
the header file \verb=<adolc/taputil.h>=, which is included by the header
\verb=<adolc/adouble.h>= automatically.

Active sections may contain nested or even recursive calls to functions
provided by the user. Naturally, their formal and actual parameters
must have matching types. In particular, the functions must be
compiled with their active variables declared as
{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included.
Variables of type {\sf adouble} may be declared outside an active section and need not
go out of scope before the end of an active section.
It is not necessary -- though desirable -- that free-store {\sf adouble}s
allocated within
an active section be deleted before its completion. The values of all
{\sf adouble}s that exist at the beginning and end of an active section
are automatically
recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
\subsection{Selecting Independent and Dependent Variables}

One or more active variables that are read in or initialized to
the values of constants or passive variables must be distinguished as
independent variables. Other active variables that are similarly
initialized may be considered as temporaries (e.g., a variable that
accumulates the partial sums of a scalar product after being
initialized to zero). In order to distinguish an active variable {\sf x} as
independent, ADOL-C requires an assignment of the form

{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$

This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
other values subsequently. Similarly, one or more active variables {\sf y}
must be distinguished as dependent by an assignment of the form

{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any passive type $\enspace ,$

which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
by any other assignment to {\sf y}. However, a dependent variable {\sf y}
may have been assigned other real values previously, and it could even be an
independent variable as well. The derivative values calculated after
completion of an active section always represent {\bf derivatives of the final
values of the dependent variables with respect to the initial values of the
independent variables}.

The order in which the independent and dependent variables are marked
by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
derivative evaluations. However, these variables do not have to be
combined into contiguous vectors. ADOL-C counts the number of
independent and dependent variable specifications within each active
section and records them in the header of the tape.
\subsection{A Subprogram as an Active Section}

As a generic example let us consider a C(++) function of the form
shown in \autoref{code1}.

{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \= // number of independents and dependents\\
\>{\sf double *x,} \> // independent variable vector \\
\>{\sf double *y,} \> // dependent variable vector \\
\>{\sf int *k, } \> // integer parameters \\
\>{\sf double *z)} \> // real parameters \\
{\sf \{ }\hspace{0.1 in } \= \> // beginning of function body \\
\>{\sf double t = 0;} \> // local variable declaration \\
\>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
\>\hspace{0.2in}{\sf t += z[i]*x[i];} \> // continue \\
\>{\sf $\cdots \cdots \cdots \cdots $} \> // continue \\
\>{\sf y[m-1] = t/m; } \> // end of computation \\
{\sf \} } \> \> // end of function

\caption{Generic example of a subprogram to be activated}
If {\sf eval} is to be called from within an active C(++)
section with {\sf x}
and {\sf y} as vectors of {\sf adouble}s and the other parameters
passive, then one merely has to change the type declarations of all
variables that depend on {\sf x} from {\sf double} or {\sf float} to
{\sf adouble}. Subsequently, the subprogram must be compiled with the
header file \verb=<adolc/adouble.h>= included as described
in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
still to be called with integer and real arguments, possibly from
a program written in Fortran77, which does not allow overloading.

To automatically compute derivatives of the dependent
variables {\sf y} with respect to the independent variables {\sf x}, we
can make the body of the function into an active section. For
example, we may modify the previous program segment
as in \autoref{adolcexam}.
The renaming and doubling up of the original independent and dependent
variable vectors by active counterparts may seem at first a bit clumsy.
However, this transformation has the advantage that the calling
sequence and the computational part, i.e., where the function is
really evaluated, of {\sf eval} remain completely
unaltered. If the temporary variable {\sf t} had remained a {\sf double},
the code would not compile, because of a type conflict in the assignment
following the declaration. More detailed example codes are listed in
\autoref{example}.
{\sf void eval(} \= {\sf int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\
\> {\sf double *px,} \> // independent passive variable vector \\
\> {\sf double *py,} \> // dependent passive variable vector \\
\> {\sf int *k,} \> // integer parameters \\
\> {\sf double *z)} \> // parameter vector \\
{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
\>{\sf short int tag = 0;} \> // tape array and/or tape file specifier\\
\>{\sf trace\_on(tag);} \> // start tracing \\
\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
\>{\sf x = new adouble[n];}\> // declare active independent variables \\
\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
\>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath px[i];} \> // select independent variables \\
\>{\sf adouble t = 0;} \> // local variable declaration \\
\>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin crunch \\
\>\hspace{0.2in}{\sf t += z[i]*x[i];} \> // continue crunch \\
\>{\sf $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
\>{\sf $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
\>{\sf y[m-1] = t/m; } \> // end crunch as before\\
\>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
\>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
\>{\sf delete[] y;} \> // delete dependent active variables \\
\>{\sf delete[] x;} \> // delete independent active variables \\
\>{\sf trace\_off();} \> // complete tape \\
{\sf \}} \>\> // end of function

\caption{Activated version of the code listed in \autoref{code1}}
\subsection{Overloaded Operators and Functions}

As in the subprogram discussed above, the actual computational
statements of a C(++) code need not be altered for the purposes of
automatic differentiation. All arithmetic operations, as well as the
comparison and assignment operators, are overloaded, so any or all of
their operands can be an active variable. An {\sf adouble x} occurring
in a comparison operator is effectively replaced by its real value
{\sf x.value()}. Most functions contained in the ANSI C standard for
the math library are overloaded for active arguments. The only
exceptions are the non-differentiable functions {\sf fmod} and
{\sf modf}. Otherwise, legitimate C code in active sections can remain
completely unchanged, provided the direct output of active variables
is avoided. The rest of this subsection may be skipped by first-time
users who are not worried about marginal issues of differentiability
and efficiency.

The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
properly differentiable at the origin, which raises the question of
how this exception ought to be handled. Fortunately, one can easily
see that {\sf fabs(x)} and all its compositions with smooth
functions are still directionally differentiable. These
directional derivatives of arbitrary order can be propagated in the
forward mode without any ambiguity. In other words, the forward mode as
implemented in ADOL-C computes Gateaux derivatives
in certain directions, which reduce to Fr\'echet derivatives only
if the dependence on the direction is linear. Otherwise,
the directional derivatives are merely positively homogeneous with
respect to the scaling of the directions.
For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
the origin somewhat arbitrarily to zero.
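The first-order propagation rule for the modulus can be stated directly: at $x \neq 0$ the directional derivative of $|x|$ in direction $x'$ is ${\rm sign}(x)\,x'$, and at the origin it is $|x'|$. The following standalone sketch (the helper name {\sf dfabs} is illustrative, not part of the ADOL-C API) makes the positive homogeneity visible:

```cpp
#include <cassert>
#include <cmath>

// First-order directional derivative of fabs at x in direction xdot.
// Illustrative helper only; not part of ADOL-C.
double dfabs(double x, double xdot) {
    if (x > 0) return xdot;    // fabs is smooth for x > 0
    if (x < 0) return -xdot;   // ... and for x < 0
    // Origin: the one-sided limit gives |xdot|, which is positively
    // homogeneous in the direction but not linear (not a Frechet derivative).
    return std::fabs(xdot);
}
```

Note that at the origin $\mathsf{dfabs}(0, -v) = \mathsf{dfabs}(0, v)$, so the result cannot come from a linear map; this is exactly the Gateaux-but-not-Fr\'echet situation described above.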
We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
arguments, so that function and derivative values are obtained consistent
with those of {\sf fabs} according to the identities
\[
 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
 \max(a,b) = [a+b+|a-b|]/2 \quad .
\]
These relations cannot hold if either $a$ or $b$ is infinite, in which
case {\sf fmin} or {\sf fmax} and their derivatives may still be well
defined. It should be noted that the directional differentiation of
{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
the corresponding assignment based on the sign of $a-b$. For example,
the statement

{\sf if (a $<$ b) c = a; else c = b;}

yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
the incorrect directional derivative value
{\sf c}$^\prime = $~{\sf b}$^\prime$ rather than the correct
{\sf c}$^\prime = $~{\sf a}$^\prime$. Therefore this form of conditional assignment
should be avoided by use of the function $\sf fmin(a,b)$. There
are also versions of {\sf fmin} and {\sf fmax} for two passive
arguments, and mixed passive/active arguments are handled by
implicit conversion.
On the function class obtained by composing the modulus with real
analytic functions, the concept of directional differentiation can be
extended to the propagation of unique one-sided Taylor expansions.
The branches taken by {\sf fabs}, {\sf fmin}, and {\sf fmax} are recorded
on the tape.
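The two identities can be checked numerically with plain {\sf double}s; the sketch below is standalone and does not use ADOL-C itself (the helper names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// min(a,b) = [a+b-|a-b|]/2 and max(a,b) = [a+b+|a-b|]/2,
// valid for finite arguments.
double min_via_fabs(double a, double b) { return (a + b - std::fabs(a - b)) / 2.0; }
double max_via_fabs(double a, double b) { return (a + b + std::fabs(a - b)) / 2.0; }
```

Differentiating the first identity directionally at a tie $a=b$ and using the rule for the modulus at the origin yields $[a'+b'-|a'-b'|]/2 = \min(a',b')$, which is the correct directional derivative, whereas the sign-based branch picks one of $a'$, $b'$ regardless of their order.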
The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
functions have infinite slopes at the boundary points of their domains.
At these marginal points the derivatives are set by ADOL-C to
either {\sf $\pm$InfVal}, 0,
or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
parameters, see \autoref{Customizing}.
On IEEE machines {\sf InfVal} can be set to the special value
{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$
of {\sf b}~=~{\sf sqrt(a)} is set to
\[
{\sf b}^\prime = \left\{
\begin{array}{cl}
\sf InfVal & \mbox{if}\;\; {\sf a}^\prime > 0 \\
0 & \mbox{if}\;\; {\sf a}^\prime = 0 \\
\sf NoNum & \mbox{if}\;\; {\sf a}^\prime < 0
\end{array} \right. \enspace .
\]
In other words, we consider {\sf a} and
consequently {\sf b} as a constant when {\sf a}$^\prime$ or more generally
all computed Taylor coefficients are zero.
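On an IEEE machine with {\sf InfVal} and {\sf NoNum} taken as {\sf Inf} and {\sf NaN}, the stated rule can be written out as follows (a sketch of the convention as described, not code from the ADOL-C sources; the function name is hypothetical):

```cpp
#include <cmath>

// First derivative b' of b = sqrt(a) at the marginal point a = 0,
// following the convention described in the text, with InfVal = Inf
// and NoNum = NaN as on IEEE machines.
double dsqrt_at_zero(double aprime) {
    if (aprime > 0) return INFINITY;  // InfVal: infinite slope is entered
    if (aprime == 0) return 0.0;      // treat a, and hence b, as constant
    return NAN;                       // NoNum: leaving the domain of sqrt
}
```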
The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
it is defined for the corresponding {\sf double} arguments. If {\sf x} is
negative, however, the partial derivative with respect to an integral exponent
is set to zero.
%Similarly, the partial of {\bf pow} with respect to both arguments
%is set to zero at the origin, where both arguments vanish.
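The convention for the exponent partial can be made explicit: for $x > 0$ one has $\partial/\partial y\, x^y = x^y \ln x$, and for negative $x$ (where only integral exponents are defined) the partial is set to zero. A sketch of this rule with a hypothetical helper name:

```cpp
#include <cmath>

// Partial derivative of pow(x, y) with respect to the exponent y.
// For x > 0 this is x^y * ln(x); for x <= 0 the convention described
// in the text sets the partial w.r.t. an integral exponent to zero.
// Illustrative helper, not part of ADOL-C.
double dpow_dexponent(double x, double y) {
    if (x > 0) return std::pow(x, y) * std::log(x);
    return 0.0;
}
```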
The derivatives of the step functions
{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
arguments {\sf x}. The result values of the step functions
are recorded on the tape and can later be checked to recognize
whether a step to another level was taken during a forward sweep
at different arguments than at taping time.

Some C implementations supply other special
functions, in particular the error function {\sf erf(x)}. For the
latter, we have included an {\sf adouble} version in \verb=adouble.cpp=, which
has been commented out for systems on which the {\sf double} valued version
is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
postfix) are available for {\sf adouble}s.
% XXX: Vector and matrix class have to be reimplemented !!!
% and also the
%active subscripts described in \autoref{act_subscr}.
Ambiguous statements like {\sf a += a++;} must be
avoided because the compiler may sequence the evaluation of the
expression differently from the original in terms of {\sf double}s.

As we have indicated above, all subroutines called with active arguments
must be modified or suitably overloaded. The simplest procedure is
to declare the local variables of the function as active so that
their internal calculations are also recorded on the tape.
Unfortunately, this approach is likely to be unnecessarily inefficient
and inaccurate if the original subroutine evaluates a special function
that is defined as the solution of a particular mathematical problem.
The most important examples are implicit functions, quadratures,
and solutions of ordinary differential equations. Often
the numerical methods for evaluating such special functions are
elaborate, and their internal workings are not at all differentiable in
the data. Rather than differentiating through such an adaptive
procedure, one can obtain first and higher derivatives directly from
the mathematical definition of the special function. Currently this
direct approach has been implemented only for user-supplied quadratures
as described in \autoref{quadrat}.
\subsection{Reusing the Tape for Arbitrary Input Values}

In some situations it may be desirable to calculate the value and
derivatives of a function at arbitrary arguments by using a tape of
the function evaluation at one argument and reevaluating the
function and its derivatives using the given ADOL-C
routines. This approach can
significantly reduce run times, and it
also allows one to port problem functions, in the form of the
corresponding tape files, into a computing environment that
does not support C++ but does support C or Fortran.
Therefore, the routines provided by ADOL-C for the evaluation of derivatives
can be used at arguments $x$ other than the
point at which the tape was generated, provided there are
no user-defined quadratures and all comparisons involving
{\sf adouble}s yield the same result. The last condition
implies that the control flow is unaltered by the change
of the independent variable values. Therefore, this sufficient
condition is tested by ADOL-C, and if it is not met,
the ADOL-C routine called for derivative calculations indicates this
contingency through its return value. Currently, there are six return values,
see \autoref{retvalues}.
\begin{table}[htb]
\center
\begin{tabular}{|r|l|} \hline
 +3 &
\begin{minipage}{10cm}
The function is locally analytic.
\end{minipage} \\ \hline
 +2 &
\begin{minipage}{10cm}
The function is locally analytic but the sparsity
structure (compared to the situation at the taping point)
may have changed, e.g. while at taping arguments
{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
the argument currently used.
\end{minipage} \\ \hline
 +1 &
\begin{minipage}{10cm}
At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
is evaluated at a tie or zero, respectively. Hence, the function to be differentiated is
Lipschitz-continuous but possibly non-differentiable.
\end{minipage} \\ \hline
 0 &
\begin{minipage}{10cm}
Some arithmetic comparison involving {\sf adouble}s yields a tie.
Hence, the function to be differentiated may be discontinuous.
\end{minipage} \\ \hline
 -1 &
\begin{minipage}{10cm}
An {\sf adouble} comparison yields different results
from the evaluation point at which the tape was generated.
\end{minipage} \\ \hline
 -2 &
\begin{minipage}{10cm}
The argument of a user-defined quadrature has changed
from the evaluation point at which the tape was generated.
\end{minipage} \\ \hline
\end{tabular}
\caption{Description of return values}
\label{retvalues}
\end{table}

\begin{figure}[htb]
\center
\caption{Return values around the taping point}
\label{fi:tap_point}
\end{figure}
In \autoref{fi:tap_point} these return values are illustrated.
If the user finds the return value of an ADOL-C routine to be negative, the
taping process simply has to be repeated by executing the active section again.
The crux of the problem lies in the fact that the tape records only
the operations that are executed during one particular evaluation of the
function.
It also has no way to evaluate integrals since the corresponding
quadratures are never recorded on the tape.
Therefore, when there are user-defined quadratures, retaping is necessary at each
new point. If there are only branches conditioned on {\sf adouble}
comparisons, one may hope that re-taping becomes unnecessary when
the points settle down in some small neighborhood, as one would
expect for example in an iterative equation solver.
\subsection{Conditional Assignments}

It appears unsatisfactory that, for example, a simple table lookup
of some physical property forces the re-recording of a possibly
much larger calculation. However, the basic philosophy of ADOL-C
is to overload arithmetic, rather than to generate a new program
with jumps between ``instructions'', which would destroy the
strictly sequential tape access and
require the infusion of substantial compiler technology.
Therefore, we introduce the two constructs of conditional
assignments and active integers as partial remedies to the
branching problem.

In many cases, the functionality of branches
can be replaced by conditional assignments.
For this purpose, we provide a special function called
{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
syntax of the conditional assignment

{\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;}

which C++ inherited from C. However, here the arguments are restricted to be
active or passive scalar arguments, and all expression arguments
are evaluated before the test on {\sf b}, which is different from
the usual conditional assignment or the code segment.

Suppose the original program contains the code segment

{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\

Here, only one of the expressions (or, more generally, program blocks)
{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
may first execute both branches and then pick either {\sf c}
or {\sf d} using
{\sf condassign(a,b,c,d)}. To maintain
consistency with the original code, one has to make sure
that the two branches do not have any side effects that can
interfere with each other or may be important for subsequent
calculations. Furthermore, the test parameter {\sf b} has to be an
{\sf adouble} or an {\sf adouble} expression. Otherwise the
test condition {\sf b} is recorded on the tape as a {\em constant} with its
run time value. Thus the original dependency of {\sf b} on
active variables gets lost, for instance if {\sf b} is a comparison
expression, see \autoref{OverOper}.
If there is no {\sf else} part in a conditional assignment, one may call
the three argument version
{\sf condassign(a,b,c)}, which
is logically equivalent to {\sf condassign(a,b,c,a)} in that
nothing happens if {\sf b} is non-positive.
The header file \verb=<adolc/adouble.h>=
also contains corresponding definitions of
{\sf condassign(a,b,c,d)}
and {\sf condassign(a,b,c)} for
passive {\sf double} arguments so that the modified code
without any differentiation can be tested
for correctness.
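The semantics of the passive {\sf double} versions can be sketched as follows; this mirrors the behavior just described (the caller evaluates both {\sf c} and {\sf d} before the test on {\sf b}), but it is a sketch of the stated semantics rather than the actual header implementation:

```cpp
#include <cassert>

// Passive sketch of condassign: a = (b > 0) ? c : d.
// Both c and d arrive fully evaluated, matching the semantics
// described in the text; not the actual <adolc/adouble.h> code.
void condassign(double &a, double b, double c, double d) {
    a = (b > 0) ? c : d;
}

// Three-argument version: logically condassign(a, b, c, a),
// i.e. nothing happens if b is non-positive.
void condassign(double &a, double b, double c) {
    if (b > 0) a = c;
}
```

Replacing {\sf if (b $>$ 0) a = c; else a = d;} by a call to these functions lets the modified code be tested passively before the active types are introduced.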
A generalization of this concept for more than two branches, e.g.,
akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling
\textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on
elements of an \texttt{advector} with active indices.
669\subsection{Step-by-Step Modification Procedure}
To prepare a section of given C or C++ code for automatic
differentiation as described above, one applies the following step-by-step procedure.
\begin{enumerate}
\item Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
beginning and end of the active section.
\item Select the set of active variables, and change their type from
{\sf double} or {\sf float} to {\sf adouble}.
\item Select a sequence of independent variables, and initialize them with
\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
\item Select a sequence of dependent variables among the active variables,
and pass their final values to passive variables or vectors thereof
by \boldmath $\gg=$ \unboldmath assignments.
\item Compile the code after including the header file \verb=<adolc/adouble.h>=.
\end{enumerate}
Typically, the first compilation will detect several type conflicts
-- usually attempts to convert from active to passive
variables or to perform standard I/O of active variables.
Since all standard
C programs can be activated by a mechanical application of the
procedure above, the following section is of importance
only to advanced users.
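The steps above can be sketched on a tiny function (an illustrative sketch; the function and variable names are not part of the ADOL-C API):
\begin{verbatim}
#include <adolc/adouble.h>

// Tape y = sum of squares of x on tape 1.
void tape_sumsq(int n, const double* xp, double* yp) {
    trace_on(1);                   // step 1: begin active section, tag 1
    adouble* x = new adouble[n];   // step 2: active variables
    adouble  y = 0.0;
    for (int i = 0; i < n; i++)
        x[i] <<= xp[i];            // step 3: mark independents
    for (int i = 0; i < n; i++)
        y += x[i]*x[i];
    y >>= *yp;                     // step 4: mark dependent
    delete[] x;                    // last-in-first-out deletion
    trace_off();                   // step 1: end active section
}
\end{verbatim}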
\section{Numbering the Tapes and Controlling the Buffer}
The trace generated by the execution of an active section may stay
within a triplet of internal arrays or it may be written out
to three corresponding files. We will refer to these triplets as the
tape array or the tape file, in general the tape, which may subsequently be
used to evaluate the
underlying function and its derivatives at the original point or at
alternative arguments. If the active section involves user-defined
quadratures, it must be executed and
re-taped at each new argument. Similarly, if conditions on
{\sf adouble} values lead to a different program branch being taken at
a new argument, the evaluation process also needs to be re-taped at the
new point. Otherwise, direct evaluation from
the tape by the routine {\sf function} (\autoref{optdrivers}) is
likely to be
faster. The use of quadratures and the results of all comparisons on
{\sf adouble}s are recorded on the tape so that {\sf function} and other
forward routines stop and return appropriate flags if their use without
prior re-taping is unsafe. To avoid any re-taping, certain types of
branches can be recorded on the tape through
the use of conditional assignments
described before in \autoref{condassign}.
Several tapes may be generated and kept simultaneously.
A tape array is used as a triplet of buffers, or a tape file is generated if
the length of any of the buffers exceeds the maximal array lengths
{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
defined in the header file \verb=<adolc/usrparms.h>=
and may be adjusted by the user in the header file before compiling
the ADOL-C library, or at runtime using a file named \verb=.adolcrc=.
The filesystem folder where the tape files may be written to disk
can be changed by changing the definition of {\sf TAPE\_DIR} in
the header file \verb=<adolc/dvlparms.h>= before
compiling the ADOL-C library, or at runtime by defining {\sf
  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined
to be the present working directory (\verb=.=).
For simple usage, {\sf trace\_on} may be called with only the tape
{\sf tag} as argument, and {\sf trace\_off} may be called
without argument. The optional integer argument {\sf keep} of
{\sf trace\_on} determines whether the numerical values of all
active variables are recorded in a buffered temporary array or file
called the taylor stack.
This option takes effect if
{\sf keep} = 1 and prepares the scene for an immediately following
gradient evaluation by a call to a routine implementing the reverse mode
as described in \autoref{forw_rev_ad} and \autoref{forw_rev}. A
file is used instead of an array if the size exceeds the maximal array
length {\sf TBUFSIZE} defined in \verb=<adolc/usrparms.h>=, which may
be adjusted in the same way as the other buffer sizes mentioned above.
Alternatively, gradients may be evaluated by a call
to {\sf gradient}, which includes a preparatory forward sweep
for the creation of the temporary file. If omitted, the argument
{\sf keep} defaults to 0, so that no temporary
taylor stack file is generated.

By setting the optional integer argument {\sf file} of
{\sf trace\_off} to 1, the user may force a numbered tape
file to be written even if the tape array (buffer) does not overflow.
If the argument {\sf file} is omitted, it
defaults to 0, so that the tape array is written onto a tape file only
if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
After the execution of an active section, if a tape file was generated, i.e.,
if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
the current working directory) under filenames formed by
the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
  ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf
  ADOLC\_TAYLORS\_NAME} defined in
the header file \verb=<adolc/dvlparms.h>=, appended with the number
given as the {\sf tag} argument to {\sf trace\_on}, and the
extension {\sf .tap}.
Later, all problem-independent routines
like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
expect as first argument a {\sf tag} to determine
the tape on which their respective computational task is to be performed.
By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
several tapes for various function evaluations and subsequently perform
function and derivative evaluations on one or more of them.
For example, suppose one wishes to calculate for two smooth functions
$f_1(x)$ and $f_2(x)$
\[
   f(x) = \max \{f_1(x), f_2(x)\}, \qquad \nabla f(x),
\]
and possibly higher derivatives where the two functions do not tie.
Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
one can generate two different tapes by calling {\sf trace\_on} with
{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
sections.
Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
current argument and then evaluate the gradient $\nabla f(x)$ by calling
{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
{\sf tag} = 2.
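In code, the selection might look as follows (a sketch; {\sf function} and {\sf gradient} are the drivers described in \autoref{optdrivers}, and {\sf n}, {\sf x}, {\sf g} are illustrative names):
\begin{verbatim}
double y1, y2;
function(1, 1, n, x, &y1);       // f1(x) from tape 1
function(2, 1, n, x, &y2);       // f2(x) from tape 2
short tag = (y1 >= y2) ? 1 : 2;  // branch attaining the maximum
gradient(tag, n, x, g);          // g = gradient of f at x
\end{verbatim}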
\subsection{Examining the Tape and Predicting Storage Requirements}
At any point in the program, one may call the routine
\begin{center}
{\sf void tapestats(unsigned short tag, size\_t* counts)}
\end{center}
with {\sf counts} being an array of at least fourteen integers.
The first argument {\sf tag} specifies the particular tape of
interest. The components of {\sf counts} represent
\begin{center}
\begin{tabular}{ll}
{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\
{\sf counts[2]}: & the maximal number of live active variables,\\
{\sf counts[3]}: & the size of the taylor stack (number of overwrites),\\
{\sf counts[4]}: & the buffer size (a multiple of eight),\\
{\sf counts[5]}: & the total number of operations recorded,\\
{\sf counts[6-13]}: & other internal information about the tape.
\end{tabular}
\end{center}
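For example, the statistics of tape 1 could be inspected as follows (a sketch; the array is sized to cover all entries listed above):
\begin{verbatim}
size_t counts[14];
tapestats(1, counts);            // statistics of the tape with tag 1
printf("independents : %zu\n", counts[0]);
printf("dependents   : %zu\n", counts[1]);
printf("max live     : %zu\n", counts[2]);
printf("taylor stack : %zu\n", counts[3]);
\end{verbatim}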
The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]}
determine the temporary
storage requirements during calls to the routines
implementing the forward and the reverse mode.
For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
forward mode involves, apart from the tape buffers, an array of
$(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
core and, in addition, a sequential data set called the value stack
of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
option {\sf keep} $>$ 0. Here
the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage
requirement for the sequential data set, which stays in core if
its length is less than {\sf TBUFSIZE} bytes and is otherwise written
out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=.
The drawback of the economical
{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
yield gradients and other adjoint vectors only with single-precision
accuracy. This may be acceptable if the adjoint vectors
represent rows of a Jacobian that is used for the calculation of
Newton steps. In its scalar version, the reverse mode implementation involves
the same number of {\sf double}s and twice as many {\sf revreal}s as the
forward mode implementation.
The storage requirements of the vector versions of the forward mode and
reverse mode implementations are equal to those of the scalar versions multiplied by
the vector length.
\subsection{Customizing ADOL-C}
Based on the information provided by the routine {\sf tapestats}, the user may alter the
following types and constant dimensions in the header file \verb=<adolc/usrparms.h>=
to suit his problem and environment.
\begin{description}
\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integers determine the length of
in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all
required data, any file access is avoided unless {\sf trace\_off}
is called with a positive argument. This desirable situation can
be achieved for many problem functions with an execution trace of moderate
size. Primarily these values occur as arguments
to {\sf malloc}, so that setting them unnecessarily large may have no
ill effects, unless the operating system prohibits or penalizes large
array allocations. It is however recommended to leave the values in
\texttt{<adolc/usrparms.h>} unchanged and to set them using the
\texttt{.adolcrc} file in the current working directory at runtime.

\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288).

\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).

\item[{\sf fint}{\rm :}] The integer data type used by Fortran-callable versions of functions.

\item[{\sf fdouble}{\rm :}] The floating-point data type used by Fortran-callable versions of functions.

\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den}
of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
the default setting produces the standard {\sf Inf}. On non-IEEE machines
change these values to produce a small {\sf InfVal} value and compare
the results of two forward sweeps with different {\sf InfVal} settings
to detect a ``vertical'' slope.

\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).

\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den}
sets the mathematically
undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
the default setting produces the standard {\sf NaN}. On non-IEEE machines
change these values to produce a small {\sf NoNum} value and compare
the results of two forward sweeps with different {\sf NoNum} settings
to detect the occurrence of undefined derivative values.

\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).

\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).

\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
\end{description}
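A runtime override of the buffer sizes via the \verb=.adolcrc= file might look as follows (a sketch; each line pairs a quoted parameter name with a quoted value -- consult the sample file shipped with the distribution for the exact syntax):
\begin{verbatim}
"OBUFSIZE" "1048576"
"LBUFSIZE" "1048576"
"VBUFSIZE" "1048576"
"TBUFSIZE" "1048576"
\end{verbatim}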
The following types and options may be set using the command-line
options of the \texttt{./configure} script.
\begin{description}
\item[{\sf locint}{\rm :}] The range of the integer type
{\sf locint} determines how many {\sf adouble}s can be simultaneously
alive (default: {\sf unsigned int}). In extreme cases when there are more than $2^{32}$ {\sf adouble}s
alive at any one time, the type {\sf locint} must be changed to
{\sf unsigned long}. This can be done by passing
\texttt{--enable-ulong} to \texttt{./configure}.

\item[{\sf revreal}{\rm :}] The choice of this floating-point type
trades accuracy for storage during reverse sweeps (default: {\sf double}). While functions
and their derivatives are always evaluated in double precision
during forward sweeps, gradients and other adjoint vectors are obtained
with the precision determined by the type {\sf revreal}. The less
accurate choice {\sf revreal} = {\sf float} nearly halves the
storage requirement during reverse sweeps. This can be done by passing
\texttt{--disable-double} to \texttt{./configure}.

\item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse
  hyperbolic functions and the error function are enabled (default:
  undefined) by passing \texttt{--enable-atrig-erf}
  to \texttt{./configure}.

\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
  used by ADOL-C. {\sf Malloc} will be used if this variable is
  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
  results caused by uninitialized memory. It can be set undefined by
  passing \texttt{--disable-use-calloc} to \texttt{./configure}.

\item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines
  required for automatic branch selection (default: disabled). The
  boolean-valued comparison operators with two \texttt{adouble} type
  arguments will no longer return boolean values and may not be used
  in branch control statements (\texttt{if}, \texttt{while}, \texttt{for},
  etc.). Instead, conditional assignments using \texttt{condassign} or
  selection operations on elements of \texttt{advector} type should be
  used. Enabling this option and rewriting the function evaluation
  using \texttt{condassign} or selections of \texttt{advector}
  elements removes the need to retrace the function at branch
  switches. This can be enabled by passing
  \texttt{--enable-advanced-branching} to \texttt{./configure}.
\end{description}
\subsection{Warnings and Suggestions for Improved Efficiency}
Since the type {\sf adouble} has a nontrivial constructor,
the mere declaration of large {\sf adouble} arrays may take up
considerable run time. The user should be warned against
the usual Fortran practice of declaring fixed-size arrays
that can accommodate the largest possible case of an evaluation program
with variable dimensions. If such programs are converted to or written
in C, the overloading in combination with ADOL-C will lead to very
large run time increases for comparatively small values of the
problem dimension, because the actual computation is completely
dominated by the construction of the large {\sf adouble} arrays.
The user is advised to
create dynamic arrays of
{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
using {\sf delete}. For storage efficiency it is desirable that
dynamic objects are created and destroyed in a last-in-first-out
fashion.
Whenever an {\sf adouble} is declared, the constructor for the type
{\sf adouble} assigns it a nominal address, which we will refer to as
its {\em location}. The location is of the type {\sf locint} defined
in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy
a range of contiguous locations. As long as the program execution
never involves more than 65$\,$536 active variables, the type {\sf locint}
may be defined as {\sf unsigned short}. Otherwise, the range may be
extended by defining {\sf locint} as {\sf (unsigned) int} or
{\sf (unsigned) long}, which may nearly double
the overall mass storage requirement. Sometimes one can avoid exceeding
the accessible range of {\sf unsigned short}s by using more local variables and deleting
{\sf adouble}s created by the {\sf new} operator in a last-in-first-out
fashion. When memory for {\sf adouble}s is requested through a call to
{\sf malloc()} or other related C memory-allocating
functions, the storage for these {\sf adouble}s is allocated; however, the
C++ {\sf adouble} constructor is never called. The newly defined
{\sf adouble}s are never assigned a location and are not counted in
the stack of live variables. Thus, any results depending upon these
pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
  malloc() and related C memory-allocating
functions when declaring adoubles (see the following paragraph).}
% XXX: Vector and matrix class have to be reimplemented !!!
%The same point applies, of course,
% for active vectors.

When an {\sf adouble}
% XXX: Vector and matrix class have to be reimplemented !!!
% or {\bf adoublev}
goes out of
scope or is explicitly deleted, the destructor notices that its
location(s) may be
freed for subsequent (nominal) reallocation. In general, this is not done
immediately but is delayed until the locations to be deallocated form a
contiguous tail of all locations currently being used.

As a consequence of this allocation scheme, the currently
alive {\sf adouble} locations always form a contiguous range of integers
that grows and shrinks like a stack. Newly declared {\sf adouble}s are
placed on the top so that vectors of {\sf adouble}s obtain a contiguous
range of locations. While the C++ compiler can be expected to construct
and destruct automatic variables in a last-in-first-out fashion, the
user may upset this desirable pattern by deleting free-store {\sf adouble}s
too early or too late. Then the {\sf adouble} stack may grow
unnecessarily, but the numerical results will still be
correct, unless an exception occurs because the range of {\sf locint}
is exceeded. In general, free-store {\sf adouble}s
% XXX: Vector and matrix class have to be reimplemented !!!
%and {\bf adoublev}s
should be deleted in a last-in-first-out fashion toward the end of
the program block in which they were created.
When this pattern is maintained, the maximum number of
{\sf adouble}s alive and, as a consequence, the
randomly accessed storage space
of the derivative evaluation routines is bounded by a
small multiple of the memory used in the relevant section of the
original program. Failure to delete dynamically allocated {\sf adouble}s
may cause the maximal number of {\sf adouble}s alive at one time to be exceeded
if the same active section is called repeatedly. The same effect
occurs if static {\sf adouble}s are used.
To avoid the storage and manipulation of structurally
trivial derivative values, one should pay careful attention to
the naming of variables. Ideally, the intermediate
values generated during the evaluation of a vector function
should be assigned to program variables that are
consistently either active or passive, in that all their values
either are or are not dependent on the independent variables
in a nontrivial way. For example, this rule is violated if a temporary
variable is successively used to accumulate inner products involving
first only passive and later active arrays. Then the first inner
product and all its successors in the data dependency graph become
artificially active and the derivative evaluation routines
described later will waste
time allocating and propagating
trivial or useless derivatives. Sometimes even values that do
depend on the independent variables may be of only transitory
importance and may not affect the dependent variables. For example,
this is true for multipliers that are used to scale linear
equations, but whose values do not influence the dependent
variables in a mathematical sense. Such dead-end variables
can be deactivated by the use of the {\sf value} function, which
converts {\sf adouble}s to {\sf double}s. The deleterious effects
of unnecessary activity are partly alleviated by run time
activity flags in the derivative routine
{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
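For instance, a scaling multiplier whose value should not be treated as active might be deactivated as follows (a sketch using the {\sf value} function mentioned above; the variable names are illustrative):
\begin{verbatim}
adouble pivot = A[k][k];      // technically active ...
double  p     = value(pivot); // ... deactivated here
for (int j = 0; j < n; j++)
    b[j] = b[j] / p;          // no derivatives propagated through p
\end{verbatim}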
The {\sf adouble} default constructor sets the associated value to zero.
This implies a certain overhead that may seem unnecessary when no initial value
is actually given; however,
the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior.
An array instantiation such as
\begin{center}
{\sf double x[3]=\{2.0\};}
\end{center}
will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements
{\sf x[1]} and {\sf x[2]} to {\sf 0.0}. According to the C++ standard the array element construction of
the type-changed instantiation
\begin{center}
{\sf adouble x[3]=\{2.0\};}
\end{center}
will use the constructor {\sf adouble(const double\&);} for {\sf x[0]}, passing in {\sf 2.0}, but
will call the {\sf adouble} default constructor for {\sf x[1]} and {\sf x[2]}, leaving these array
elements uninitialized {\em unless} the default constructor implements the initialization to
{\sf 0.0}.
The C++ constructor syntax does not provide a means to distinguish this implicit initialization from the declaration of any simple uninitialized variable.
If the user can ascertain the absence of array instantiations such as the above, then one can
configure ADOL-C with the \verb=--disable-stdczero= option, see \autoref{genlib}, to
avoid the overhead of these initializations.
\section{Easy-To-Use Drivers}
For the convenience of the user, ADOL-C provides several
easy-to-use drivers that compute the most frequently required
derivative objects. Throughout, we assume that after the execution of an
active section, the corresponding tape with the identifier {\sf tag}
contains a detailed record of the computational process by which the
final values $y$ of the dependent variables were obtained from the
values $x$ of the independent variables. We will denote this functional
relation between the input variables $x$ and the output variables $y$ by
\[
F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
\]
The return values of all drivers presented in this section
indicate the validity of the tape as explained in \autoref{reuse_tape}.
The presented drivers are all C functions and therefore can be used within
C and C++ programs. Some Fortran-callable companions can be found
in the appropriate header files.
\subsection{Drivers for Optimization and Nonlinear Equations}
The drivers provided for solving optimization problems and nonlinear
equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
which is included automatically by the global header file \verb=<adolc/adolc.h>=
(see \autoref{ssec:DesIH}).

The routine {\sf function} allows one to evaluate the desired function from
the tape instead of executing the corresponding source code:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int function(tag,m,n,x,y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf double y[m];}           \> // dependent vector $y=F(x)$
\end{tabbing}
If the original evaluation program is available, that {\sf double} version
should be used to compute the function value in order to avoid the
interpretative overhead.
For the calculation of whole derivative vectors and matrices up to order
2, there are the following procedures:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int gradient(tag,n,x,g)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int jacobian(tag,m,n,x,J)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hessian(tag,n,x,H)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$
\end{tabbing}
The driver routine {\sf hessian} computes only the lower half of
$\nabla^2F(x)$ so that all values {\sf H[i][j]} with $j>i$
of {\sf H} allocated as a square array remain untouched during the call
of {\sf hessian}. Hence only $i+1$ {\sf double}s need to be
allocated starting at the position {\sf H[i]}.
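Assuming a scalar function has been taped on tape 1 with {\sf n} independents, these drivers might be invoked as follows (a sketch; note that only the lower triangle of {\sf H} needs to be allocated):
\begin{verbatim}
double*  g = new double[n];
double** H = new double*[n];
for (int i = 0; i < n; i++)
    H[i] = new double[i+1];   // lower triangle suffices
gradient(1, n, x, g);         // g = gradient at x
hessian(1, n, x, H);          // fills H[i][j] for j <= i
\end{verbatim}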
To use the full capability of automatic differentiation when
products of derivatives with certain weight vectors or directions are needed, ADOL-C offers
the following drivers:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf double u[m];}           \> // range weight vector $u$ \\
\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
\end{tabbing}
A nonzero value of the parameter {\sf repeat} indicates that the
routine {\sf vec\_jac} has been called at the same argument immediately
before; in that case the internal forward mode evaluation is skipped and only the
reverse mode evaluation with the corresponding arguments is executed,
which reduces the computational complexity of {\sf vec\_jac}.
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$\\
\>{\sf double v[n];}           \> // tangent vector $v$\\
\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hess\_vec(tag,n,x,v,z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$\\
\>{\sf double v[n];}           \> // tangent vector $v$\\
\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int p;}                 \> // number of columns in $V$\\
\>{\sf double x[n];}           \> // independent vector $x$\\
\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$\\
\>{\sf double v[n];}           \> // tangent vector $v$\\
\>{\sf double u[m];}           \> // range weight vector $u$ \\
\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
\end{tabbing}
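For a vector function taped on tape 1, Jacobian-vector and vector-Jacobian products might be formed as follows (a sketch; {\sf m}, {\sf n}, and the vectors are illustrative):
\begin{verbatim}
jac_vec(1, m, n, x, v, z1);     // z1 = F'(x) v,   z1 has m entries
vec_jac(1, m, n, 0, x, u, z2);  // z2 = u^T F'(x), z2 has n entries
vec_jac(1, m, n, 1, x, w, z3);  // repeat = 1: reuse forward sweep at x
\end{verbatim}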
The next procedure allows the user to perform Newton steps having only
the corresponding tape at hand:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double x[n];}           \> // independent vector $x$\\
\>{\sf double b[n];}           \> // in: right-hand side $b$, out: result $w$ of
$F^\prime(x)w = b$\\
\>{\sf int mode;}              \> // option to choose different solvers
\end{tabbing}
On entry, the parameter {\sf b} of the routine {\sf jac\_solv}
contains the right-hand side of the equation $F^\prime(x)w = b$ to be solved. On exit,
{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0, only
the Jacobian of the function
given by the tape labeled with {\sf tag} is provided internally.
The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
solution of the equation is calculated if {\sf mode} = 2.
Hence, it is possible to compute the
LU-factorization only once. Then the equation can be solved for several
right-hand sides $b$ without calculating the Jacobian and
its factorization again.
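One plausible calling sequence, assuming the modes build on one another as described, is the following sketch:
\begin{verbatim}
jac_solv(1, n, x, b1, 2);  // Jacobian, LU-factorization, and solve:
                           // b1 is overwritten with w, F'(x) w = b1
jac_solv(1, n, x, b2, 2);  // further right-hand side at the same x,
                           // reusing the existing factorization
\end{verbatim}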
If the original evaluation code of a function contains neither
quadratures nor branches, all drivers described above can be used to
evaluate derivatives at any argument in its domain. The same still
applies if there are no user-defined quadratures and
all comparisons involving {\sf adouble}s have the same result as
during taping. If this assumption is made falsely, the drivers
will return the value -1 or -2 from the internally called forward mode
evaluation, as specified in \autoref{reuse_tape}.
\subsection{Drivers for Ordinary Differential Equations}
When $F$ is the right-hand side of an (autonomous) ordinary
differential equation
\[
x^\prime(t) \; = \; F(x(t)),
\]
we must have $m=n$. Along any solution path $x(t)$ its Taylor
coefficients $x_i$ at some time, e.g., $t=0$, must satisfy
the relation
\[
 x_{i+1} = \frac{1}{1+i}\, y_i,
\]
with $y_i$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
\[
 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
\]
defined by an autonomous right-hand side $F$ recorded on the tape.
Using this relation, one can generate the Taylor coefficients $x_i$,
$i \le deg$,
recursively from the current point $x_0$. This task is achieved by the
driver routine {\sf forode} defined as follows:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of state variables $n$\\
\>{\sf double tau;}            \> // scaling parameter\\
\>{\sf int dol;}               \> // degree on previous call\\
\>{\sf int deg;}               \> // degree on current call\\
\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
\end{tabbing}
If {\sf dol} is positive, it is assumed that {\sf forode}
has been called before at the same point so that all Taylor coefficient
vectors up to the {\sf dol}-th are already correct.
Subsequently one may call the driver routine {\sf reverse} or corresponding
low level routines as explained in \autoref{forw_rev} and
\autoref{forw_rev_ad}, respectively, to compute
the family of square matrices {\sf Z[n][n][deg]} defined by
\[
Z_j \; \equiv \; U\,\frac{\partial y_j}{\partial x_0} \; \in \; {I\!\!R}^{n \times n} ,
\]
with {\sf double** U}$\,=I_n$ the identity matrix of order {\sf n}.
For the numerical solution of ordinary differential equations,
one may also wish to calculate the Jacobians
\begin{equation}
B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
\label{eq:bees}
\end{equation}
which exist provided $F$ is sufficiently smooth. These matrices can
be obtained from the partial derivatives $\partial y_i/\partial x_0$
by an appropriate version of the chain rule.
To compute the total derivatives $B = (B_j)_{0\leq j <d}$
defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
corresponding evaluation of the {\sf hov\_reverse} function. The interface of
{\sf accode} is defined as follows:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
\>{\sf int n;}                 \> // number of state variables $n$ \\
\>{\sf double tau;}            \> // scaling parameter\\
\>{\sf int deg;}               \> // degree on current call\\
\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
\>{\sf short nz[n][n];}        \> // optional nonzero pattern
\end{tabbing}
Sparsity information can be exploited by {\sf accode} using the array {\sf
nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
reverse} or the corresponding basic routines as explained below in
\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
return
\[
  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
\]
In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
sparsity pattern that fills in as $k$ grows. Note that there need be no
loss in computational efficiency if a time-dependent ordinary differential equation
is rewritten in autonomous form.
The prototypes of the ODE-drivers {\sf forode} and {\sf accode} are contained in the header file
\verb=<adolc/drivers/odedrivers.h>=. The global header file
\verb=<adolc/adolc.h>=
includes this file automatically, see \autoref{ssec:DesIH}.
An example program using the procedures {\sf forode} and {\sf accode} together
with more detailed information about the coding can be found in
\autoref{exam:ode}. The corresponding source code is contained in the file
\verb=odexam.cpp=.
\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
Quite often, the Jacobians and Hessians that have to be computed are sparse
matrices. Therefore, ADOL-C additionally provides drivers that
allow the exploitation of sparsity. The exploitation of sparsity is
frequently based on {\em graph coloring} methods, discussed
for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
rely on the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
separately to use the sparse drivers described here. More information about the required
installation of ColPack is given in \autoref{install}.
\subsubsection*{Sparse Jacobians and Sparse Hessians}
To compute the entries of sparse Jacobians and sparse Hessians,
respectively, in coordinate format one may use the drivers:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf int nnz;}               \> // number of nonzeros \\
\>{\sf unsigned int rind[nnz];}\> // row indices\\
\>{\sf unsigned int cind[nnz];}\> // column indices\\
\>{\sf double values[nnz];}    \> // non-zero values\\
\>{\sf int options[4];}        \> // array of control parameters
\end{tabbing}
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
\>{\sf double x[n];}           \> // independent vector $x$ \\
\>{\sf int nnz;}               \> // number of nonzeros \\
\>{\sf unsigned int rind[nnz];}\> // row indices\\
\>{\sf unsigned int cind[nnz];}\> // column indices\\
\>{\sf double values[nnz];}    \> // non-zero values  \\
\>{\sf int options[2];}        \> // array of control parameters
\end{tabbing}
Once more, the input variables are the identifier for the internal
representation {\sf tag}, if required the number of dependents {\sf m},
and the number of independents {\sf n} for a consistency check.
Furthermore, the flag {\sf repeat=0} indicates that the functions are called
at a point with a new sparsity structure, whereas {\sf repeat=1} results in
the re-usage of the sparsity pattern from the previous call.
The current values of the independents are given by the array {\sf x}.
The input/output
variable {\sf nnz} stores the number of the nonzero entries.
Therefore, {\sf nnz} also denotes the length of the arrays {\sf rind} storing
the row indices, {\sf cind} storing the column indices, and
{\sf values} storing the values of the nonzero entries.
If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
the functions determine the number of nonzeros for the sparsity pattern
defined by the value of {\sf x}, allocate appropriate arrays {\sf rind},
{\sf cind}, and {\sf values}, and store the desired information in these
arrays.
During the next function call with {\sf repeat=1} the allocated memory
is reused such that only the values of the arrays are changed.
Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
  repeat=0}, the user is responsible for the deallocation of the arrays
 {\sf rind}, {\sf cind}, and {\sf values} using the function {\sf
   delete[]}!
For each driver the array {\sf options} can be used to adapt the
computation of the sparse derivative matrices to the special
needs of the application under consideration. Most frequently, the default options
will give a reasonable performance. The elements of the array {\sf options} control the action of
{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
\begin{table}[htb]
\begin{center}
\begin{tabular}{|c|c|l|} \hline
component & value &  \\ \hline
{\sf options[0]} &    &  way of sparsity pattern computation \\
                 & 0  &  propagation of index domains (default) \\
                 & 1  &  propagation of bit pattern \\ \hline
{\sf options[1]} &    &  test the computational graph control flow \\
                 & 0  &  safe mode (default) \\
                 & 1  &  tight mode \\ \hline
{\sf options[2]} &    &  way of bit pattern propagation \\
                 & 0  &  automatic detection (default) \\
                 & 1  &  forward mode \\
                 & 2  &  reverse mode \\ \hline
{\sf options[3]} &    &  way of compression \\
                 & 0  &  column compression (default) \\
                 & 1  &  row compression \\ \hline
\end{tabular}
\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
\end{center}
\end{table}
The component {\sf options[1]} determines
the usage of the safe or tight mode of sparsity computation.
The first, more conservative option is the default. It accounts for all
dependences that might occur for any value of the
independent variables. For example, the intermediate
{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
logical {\sf OR} of those associated with {\sf a} and {\sf b}.
In contrast,
the tight option gives this result only in the unlikely event of an exact
tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
Obviously, the sparsity pattern obtained with the tight option may contain
more zeros than that obtained with the safe option. On the other hand, it
will only be valid at points belonging to an area where the function $F$ is locally
analytic and that contains the point at which the internal representation was
generated. Since generating the sparsity structure using the safe version does not
require any reevaluation, it may thus reduce the overall computational cost
despite the fact that it produces more nonzero entries.
The value of {\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and of dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
The elements of the array {\sf options} control the action of
{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
\begin{table}[htb]
\begin{center}
\begin{tabular}{|c|c|l|} \hline
component & value &  \\ \hline
{\sf options[0]} &    &  test the computational graph control flow \\
                 & 0  &  safe mode (default) \\
                 & 1  &  tight mode \\ \hline
{\sf options[1]} &    &  way of recovery \\
                 & 0  &  indirect recovery (default) \\
                 & 1  &  direct recovery \\ \hline
\end{tabular}
\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
\end{center}
\end{table}
The described driver routines for the computation of sparse derivative
matrices are prototyped in the header file
\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
Example codes illustrating the usage of {\sf
  sparse\_jac} and {\sf sparse\_hess} can be found in the files
\verb=sparse_jacobian.cpp= and \verb=sparse_hessian.cpp=.
\subsubsection*{Computation of Sparsity Pattern}
ADOL-C offers a convenient way of determining the
sparsity structure of a Jacobian matrix using the function:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
\>{\sf short int tag;} \> // tape identification \\
\>{\sf int m;} \> // number of dependent variables $m$\\
\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf double x[n];} \> // independent variables $x_0$\\
\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
\>{\sf int options[3];} \> // array of control parameters
\end{tabbing}
The sparsity pattern of the
Jacobian is computed in a compressed row format. For this purpose,
{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
During the call of {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
entries in row $i$ of the Jacobian is determined for all $1\le i\le
m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
points to a block of $\hat{n}_i+1$ {\sf unsigned int}s for all $1\le
i\le m$, and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
column indices of the $\hat{n}_i$ nonzero entries in the $i$th row are stored
in the components {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][$\hat{n}_i$]}.
The elements of the array {\sf options} control the action of
{\sf jac\_pat} according to \autoref{options}.
\begin{table}[htb]
\begin{center}
\begin{tabular}{|c|c|l|} \hline
component & value &  \\ \hline
{\sf options[0]} &    &  way of sparsity pattern computation \\
                 & 0  &  propagation of index domains (default) \\
                 & 1  &  propagation of bit pattern \\ \hline
{\sf options[1]} &    &  test the computational graph control flow \\
                 & 0  &  safe mode (default) \\
                 & 1  &  tight mode \\ \hline
{\sf options[2]} &    &  way of bit pattern propagation \\
                 & 0  &  automatic detection (default) \\
                 & 1  &  forward mode \\
                 & 2  &  reverse mode \\ \hline
\end{tabular}
\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
\end{center}
\end{table}
The value of {\sf options[0]} selects the way to compute the sparsity
pattern. The component {\sf options[1]} determines
the usage of the safe or tight mode of bit pattern propagation.
The first, more conservative option is the default. It accounts for all
dependences that might occur for any value of the
independent variables. For example, the intermediate
{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
always assumed to depend on all independent variables that {\sf a} or {\sf b}
depend on, i.e.\ the bit pattern associated with {\sf c} is set to the
logical {\sf OR} of those associated with {\sf a} and {\sf b}.
In contrast,
the tight option gives this result only in the unlikely event of an exact
tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
Obviously, the sparsity pattern obtained with the tight option may contain
more zeros than that obtained with the safe option. On the other hand, it
will only be valid at points belonging to an area where the function $F$ is locally
analytic and that contains the point at which the internal representation was
generated. Since generating the sparsity structure using the safe version does not
require any reevaluation, it may thus reduce the overall computational cost
despite the fact that it produces more nonzero entries. The value of
{\sf options[2]} selects the direction of bit pattern propagation.
Depending on the number of independent variables $n$ and of dependent variables $m$,
one would prefer the forward mode if $n$ is significantly smaller than $m$ and
would otherwise use the reverse mode.
The routine {\sf jac\_pat} may use the propagation of bit patterns to
determine the sparsity pattern. In this case, a kind of ``strip-mining''
is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
the value of the constant {\sf PQ\_STRIPMINE\_MAX}
following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
The driver routine is prototyped in the header file
\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
global header file \verb=<adolc/adolc.h>= (see
\autoref{ssec:DesIH}). The determination of sparsity patterns is
illustrated by the examples \verb=sparse_jacobian.cpp=
and \verb=jacpatexam.cpp=.
To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
function:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int hess\_pat(tag, n, x, HP, option)}\\
\>{\sf short int tag;}       \> // tape identification \\
\>{\sf int n;}               \> // number of independent variables $n$\\
\>{\sf double x[n];}         \> // independent variables $x_0$\\
\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
\>{\sf int option;}          \> // control parameter
\end{tabbing}
where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
 unsigned int}s.
After the function call {\sf HP} contains the sparsity pattern,
where {\sf HP[j][0]} contains the number of nonzero elements in the
$j$th row for $1 \le j\le n$.
The components {\sf HP[j][i]}, $0<${\sf i}~$\le$~{\sf HP[j][0]}, store the
indices of these entries. For determining the sparsity pattern, ADOL-C uses
the algorithm described in \cite{Wa05a}. The parameter {\sf option} determines
the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
  option = 1}) of the computation of the sparsity pattern as described
above.
This driver routine is prototyped in the header file
\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
An example employing the procedure {\sf hess\_pat} can be found in the file
\verb=sparse_hessian.cpp=.
\subsubsection*{Calculation of Seed Matrices}
To compute a compressed derivative matrix from a given sparsity
pattern, one has to calculate an appropriate seed matrix that can be
used as input for the derivative calculation. To facilitate the
generation of seed matrices for a sparsity pattern given in
row compressed form, ADOL-C provides the following two drivers,
which are based on the ColPack library:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
\>{\sf int m;} \> // number of dependent variables $m$\\
\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure of Jacobian\\
\>{\sf double S[n][p];} \> // seed matrix\\
\>{\sf int p;} \> // number of columns in $S$
\end{tabbing}
The input variables to {\sf generate\_seed\_jac} are the number of dependent variables {\sf m}, the
number of independent variables {\sf n}, and the sparsity pattern {\sf
  JP} of the Jacobian computed, for example, by {\sf jac\_pat}. First,
{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
determines the number of columns {\sf p} in the seed
matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
 S} and initializes {\sf S} according to the graph coloring.
The coloring algorithm that is applied in {\sf
  generate\_seed\_jac} is also used by the driver {\sf sparse\_jac}
described earlier.
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
\>{\sf int n;} \> // number of independent variables $n$\\
\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure of Hessian\\
\>{\sf double S[n][p];} \> // seed matrix\\
\>{\sf int p;} \> // number of columns in $S$
\end{tabbing}
The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
and the sparsity pattern {\sf HP} of the Hessian computed, for example,
by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
appropriate coloring of the adjacency graph defined by the sparsity
pattern {\sf HP}: an acyclic coloring in the case of an indirect recovery of the Hessian from its
compressed representation and a star coloring in the case of a direct recovery.
Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
 S} and initializes {\sf S} according to the graph coloring.
The coloring algorithm applied in {\sf
  generate\_seed\_hess} is also used by the driver {\sf sparse\_hess}
described earlier.
The specific set of criteria used to define a seed matrix $S$ depends
on whether the sparse derivative matrix
to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric).
It also depends on whether the entries of the derivative matrix are to be
recovered from the compressed representation \emph{directly}
(without requiring any further arithmetic) or \emph{indirectly} (for
example, by solving for unknowns via successive substitutions).
Appropriate recovery routines are provided by ColPack and used
in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
the previous subsection. Examples with a detailed analysis of the
employed drivers for the exploitation of sparsity can be found in the
papers \cite{GePoTaWa06} and \cite{GePoWa08}.
These driver routines are prototyped in
\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
An example code illustrating the usage of {\sf
generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
\subsection{Higher Derivative Tensors}
Many applications in scientific computing need second- and higher-order
derivatives. Often, one does not require full derivative tensors but
only the derivatives in certain directions $s_i \in \R^{n}$.
Suppose a collection of $p$ directions
$s_i \in \R^{n}$ is given, which form a matrix
\[
S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
 \R^{n \times p}.
\]
One possible choice is $S = I_n$ with $p = n$, which leads to
full tensors being evaluated.
ADOL-C provides the function {\sf tensor\_eval}
to calculate the derivative tensors
\begin{equation}
\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0}
     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
\label{eq:tensor}
\end{equation}
simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
parameters:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$ \\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
\>{\sf double S[n][p];}        \> // seed matrix $S$
\end{tabbing}
Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory
requirement can be reduced enormously. The collection of tensors up to order $d$ comprises
$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
greater than or equal to $\binom{p+d}{d}$.
To compute the derivatives, {\sf tensor\_eval} internally propagates univariate Taylor
series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
approach is described in \cite{Griewank97}.
The access of individual entries in symmetric tensors of
higher order is a little tricky. We always store the derivative
values in the two dimensional array {\sf tensor} and provide two
different ways of accessing them.
The leading dimension of the tensor array ranges over
the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
0,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
structure for all $i$. Each of them represents the symmetric tensors up to
order $d$ of the scalar function $F_{i+1}$ in $p$ variables.

The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
tensors are linearly ordered according to the tetrahedral
scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$
as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
$l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
function value $F_{i+1}$ itself, and the gradient component
$\partial F_{i+1}/\partial z_k$ is stored at $l=k(k+1)/2$
with $j=0$ for $k=1,\ldots,p$.
For general $d$ we combine the variable
indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
where $j_k$ indicates differentiation with respect to variable
$z_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
no differentiation, so that all lower derivatives are also
contained in the same data structure as described above for
the quadratic case. The location of the partial derivative specified
by $j$ is computed by the function
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int address(d,$\,$j)} \\
\>{\sf int d;}                 \> // highest derivative degree $d$ \\
\>{\sf int j[d];}              \> // multi-index $j$
\end{tabbing}
and it may thus be referenced as {\sf tensor[i][address(d,$\,$j)]}.
Notice that the address computation does depend on the degree $d$
but not on the number of directions $p$, which could theoretically be
enlarged without the need to reallocate the original tensor.
Also, the components of $j$ need to be non-increasing.
To some C programmers it may appear more natural to access tensor
entries by successive dereferencing in the form
{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
We have also provided this mode, albeit with the restriction
that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
In the second order case this means that the Hessian entries must be
specified on or below the diagonal. If this restriction is
violated, the values are almost certain to be wrong and array bounds
may be violated. We emphasize that subscripting is not overloaded
but that {\sf tensorentry} is a conventional and
thus moderately efficient C pointer structure.
Such a pointer structure can be allocated and set up completely by the
function
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf void** tensorsetup(m,p,d,tensor)} \\
\>{\sf int m;}                 \> // number of dependent variables $m$ \\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
\end{tabbing}
Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
$\geq \binom{p+d}{d}$ allocated by the user beforehand. During the execution of {\sf tensorsetup},
$d-1$ layers of pointers are set up so that the return value
allows the direct dereferencing of individual tensor elements.

For example, suppose some active section involving $m \geq 5$ dependents and
$n \geq 2$ independents has been executed and taped. We may
select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
columns $s_1$ and $s_2$. Then we are able to execute the code segment
\begin{tabbing}
\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}
\end{tabbing}
This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
in both directions $s_1$ and
$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing, the pointer
structure {\sf tensorentry} has been created. Now,
the value of the mixed partial derivative
\[
 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2\, \partial z_2}   \right |_{z_1=0=z_2}
\]
can be recovered as
\begin{center}
   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][address(d,$\,$j)]},
\end{center}
where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1).
Analogously, the entry
\begin{center}
   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][address(d,$\,$j)]}
\end{center}
with {\sf j} = (1,0,0) contains the first derivative of the third dependent
variable $F_3$ with respect to the first differentiation parameter $z_1$.
Note that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly affects the values accessed
by {\sf tensorentry}.
When no more derivative evaluations are desired, the pointer structure
{\sf tensorentry} can be deallocated by a call to the function
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
\>{\sf int m;}                    \> // number of dependent variables $m$ \\
\>{\sf int p;}                    \> // number of directions $p$\\
\>{\sf int d;}                    \> // highest derivative degree $d$\\
\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup}
\end{tabbing}
that does not deallocate the array {\sf tensor}.
The drivers provided for the efficient calculation of higher order
derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
which is included by the global header file \verb=<adolc/adolc.h>= automatically
(see \autoref{ssec:DesIH}).
Example codes using the above procedures can be found in the files
\verb=taylorexam.C= and \verb=accessexam.C=.
\subsection{Derivatives of Implicit and Inverse Functions}
Frequently, one needs derivatives of variables
$y \in \R^{m}$ that are implicitly defined as
functions of some variables $x \in \R^{n-m}$
by an algebraic system of equations
\[
G(z) \; = \; 0 \in \R^m \quad
{\rm with} \quad z = (y, x) \in \R^n .
\]
Naturally, the $n$ arguments of $G$ need not be partitioned in
this regular fashion and we wish to provide flexibility for a
convenient selection of the $n-m$ {\em truly} independent
variables. Let $P \in \R^{(n-m)\times n}$ be a $0$--$1$ matrix
that picks out these variables so that it is a column
permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
Then the nonlinear system
\[
  G(z) \; = \; 0, \quad P z =  x,
\]
has a regular Jacobian, wherever the implicit function theorem
yields $y$ as a function of $x$. Hence, we may also write
\[
F(z) = \left(\begin{array}{c}
                        G(z) \\
                        P z
                      \end{array} \right)\; \equiv \;
                \left(\begin{array}{c}
                        0 \\
                        P z
                      \end{array} \right)\; \equiv \; S\, x,
\]
where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
the original implicit functional relation between $x$ and $y$ as an inverse
relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
by marking $n-m$ of the independents also as dependent.
1932Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1933seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1934with respect to $x \in \R^p$ by calling the following routine:
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int n;}                 \> // number of variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf double z[n];}           \> // values of independent variables $z$\\
\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
\>{\sf double S[n][p];}        \> // seed matrix $S$
\end{tabbing}

The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
{\sf tag} pointing to a tape for the evaluation of the inverse function
$z=F^{-1}(y)$, for which naturally $n=m$. Note that the columns of $S$ belong
to the domain of that function. Individual derivative components can be
accessed in {\sf tensor} exactly as in the explicit case described above.

It must be understood that {\sf inverse\_tensor\_eval} actually computes the
derivatives of $z$ with respect to $x$, where $z$ is defined by the equation
$F(z)=F(z_0)+S \, x$. In other words, the base point at
which the inverse function is differentiated is given by $F(z_0)$.
The routine has no capability for inverting $F$ itself, as
solving systems of nonlinear
equations $F(z)=0$ in the first place is not just a differentiation task.
However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
useful for that purpose.

As an example consider the following two nonlinear expressions
\begin{eqnarray*}
      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^2 \\
      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace .
\end{eqnarray*}
The equations $G(z)=0$ describe the relation between the Cartesian
coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
Now, suppose we are interested in the derivatives of the second Cartesian
coordinate $y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
could look simply like

\begin{tabbing}
\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
\end{tabbing}

where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
In the last line the two independent variables {\sf z[1]} and
{\sf z[3]} are made
simultaneously dependent, thus generating a square system that can be
inverted (at most arguments). The corresponding projection and seed
matrix are
\[
P \;=\; \left( \begin{array}{cccc}
               1 & 0 & 0 & 0 \\
               0 & 0 & 1 & 0
            \end{array}\right) \quad \mbox{and} \quad
S^T \; = \; \left( \begin{array}{cccc}
               0 & 0 & 1 & 0 \\
               0 & 0 & 0 & 1
            \end{array}\right)\enspace .
\]
Provided the vector {\sf zp} is consistent in that its Cartesian and polar
components describe the same point in the plane, the resulting tuple
{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
$n=4$, $p=2$ and $d$
as desired will yield the implicit derivatives, provided, of course, that
{\sf tensor} has been allocated appropriately and $S$ has the value
given above.

The example is untypical in that the implicit function could also be
obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
the subset of $z$ components that are to be considered as truly
independent can be selected and altered with next to no effort at all.

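
For this particular example the symbolic solution is indeed easy to write down; assuming $z_3>0$
and picking the branch with $z_2\geq 0$ and $z_4\in[0,\pi]$, solving $G(z)=0$ for the
chosen dependents gives
\[
y_1 \;=\; z_2 \;=\; \sqrt{z_3^2-z_1^2}
\quad \mbox{and} \quad
y_2 \;=\; z_4 \;=\; \arccos(z_1/z_3) \enspace ,
\]
so that, for instance, $\partial y_1/\partial x_1 = -z_1/z_2$ provides a convenient spot
check of the first-order entries returned in {\sf tensor}.
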
The presented drivers are prototyped in the header file
\verb=<adolc/drivers/taylor.h>=. As indicated before, this header
is included by the global header file \verb=<adolc/adolc.h>= automatically
(see \autoref{ssec:DesIH}).
The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
\verb=trigger.cpp= in the directory \verb=examples/additional_examples/taylor=
show the application of the procedures described here.

\section{Basic Drivers for the Forward and Reverse Mode}

In this section, we present tailored drivers for different
variants of the forward mode and the reverse mode, respectively.
For a better understanding, we start with a short
description of the mathematical background.

Provided no arithmetic exception occurs,
no comparison including {\sf fmax} or {\sf fmin} yields a tie,
{\sf fabs} does not yield zero,
and all special functions are evaluated in the
interior of their domains, the functional relation between the input
variables $x$
and the output variables $y$ denoted by $y=F(x)$ is in
fact analytic.  In other words, we can compute arbitrarily high
derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
by the active section.
We find it most convenient to describe and
compute derivatives in terms of univariate Taylor expansions, which
are truncated after the highest derivative degree $d$ that is desired
by the user. Let
\begin{equation}
x(t) \; \equiv \; \sum_{j=0}^d x_j t^j \; : \;  I\!\!R \; \mapsto \; I\!\!R^n
\label{eq:x_of_t}
\end{equation}
2051In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2052The Taylor coefficient vectors
2053\[ x_j \; = \; 
2054\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
2055\right |_{t=0}
2057are simply the scaled derivatives of $x(t)$ at the parameter
2058origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2059visualized as tangent and curvature at the base point $x_0$,
2061Provided that $F$ is $d$ times continuously differentiable, it
2062follows from the chain rule that the image path
2065 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2067is also smooth and has $(d+1)$ Taylor coefficient vectors
2068$y_j \in I\!\!R^m$ at $t=0$, so that
2071y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
Also as a consequence of the chain rule, one can observe that
each $y_j$ is uniquely and smoothly determined by the coefficient
vectors $x_i$ with $i \leq j$.  In particular we have
\begin{align}
  y_0 & = F(x_0) \nonumber \\
  y_1 & = F'(x_0) x_1 \nonumber\\
  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
  & \ldots\nonumber
\end{align}
In writing down the last equations we have already departed from the
usual matrix-vector notation. It is well known that the number of
terms that occur in these ``symbolic'' expressions for
the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
with $j$. Fortunately, this exponential growth does not occur
in automatic differentiation, where the many terms are somehow
implicitly combined  so that storage and operations count grow only
quadratically in the bound $d$ on $j$.

Provided $F$ is analytic, this property is inherited by the functions
\[
y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
\]
and their derivatives satisfy the identities
\[
\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
\]
as established in \cite{Chri91a}. This yields in particular
\begin{align*}
  \frac{\partial y_0}{\partial x_0} =
  \frac{\partial y_1}{\partial x_1} =
  \frac{\partial y_2}{\partial x_2} =
  \frac{\partial y_3}{\partial x_3} =
  A_0 & = F'(x_0) \\
  \frac{\partial y_1}{\partial x_0} =
  \frac{\partial y_2}{\partial x_1} =
  \frac{\partial y_3}{\partial x_2} =
  A_1 & = F''(x_0) x_1 \\
  \frac{\partial y_2}{\partial x_0} =
  \frac{\partial y_3}{\partial x_1} =
  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
  \frac{\partial y_3}{\partial x_0} =
  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
  & \ldots
\end{align*}
The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
interest primarily in the context of ordinary differential
equations and differential algebraic equations.

Given the tape of an active section and the coefficients $x_j$,
the resulting $y_j$ and their derivatives $A_j$ can be evaluated
by appropriate calls to the ADOL-C forward mode implementations and
the ADOL-C reverse mode implementations. The scalar versions of the forward
mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
to the $(y_j)_{j\leq d}$. The vector versions of the forward
mode propagate families of $p\geq 1$ such truncated Taylor series
in order to reduce the relative cost of the overhead incurred
in the tape interpretation. In detail, ADOL-C provides
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int keep;}              \> // flag for reverse mode preparation\\
\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
\end{tabbing}
for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
flag {\sf keep} plays a similar role as in the call to 
{\sf trace\_on}: It determines if {\sf zos\_forward} writes
the first Taylor coefficients of all intermediate quantities into a buffered
temporary file, i.e., the value stack, in preparation for a subsequent
reverse mode evaluation. The value {\sf keep} $=1$
prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as explained below.

To compute first-order derivatives, one can use

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int keep;}              \> // flag for reverse mode preparation\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
\end{tabbing}
for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
$0\leq\text{\sf keep}\leq 2$, where
\[
\text{\sf keep} = \left\{\begin{array}{cl}
       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
       \end{array}\right.
\]
as explained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
ADOL-C provides
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int p;}                 \> // number of directions\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x)X$
\end{tabbing}
For the computation of higher derivatives, ADOL-C provides the driver

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int keep;}              \> // flag for reverse mode preparation\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
\end{tabbing}
implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
The rows of the matrix $X$ must correspond to the independent variables in the order of their
initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
computes the values
$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
$0\leq\text{\sf keep}\leq d+1$, with
\[
\text{\sf keep}  \left\{\begin{array}{cl}
       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
       \end{array}\right.
\]
Once more, there is also a vector version given by

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of directions $p$\\
\>{\sf double x0[n];}          \> // independent vector $x_0$\\
\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
\end{tabbing}
for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and
$Y=[Y_1,Y_2,\ldots,Y_d]$.

There are also overloaded versions providing a general {\sf forward}-call.
Details of the appropriate calling sequences are given in \autoref{forw_rev}.

Once the required information has been generated by a forward mode evaluation
with an appropriate value of the parameter {\sf keep}, one may use the
following implementation variants of the reverse mode. To compute first-order derivatives
one can use
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf double u[m];}           \> // weight vector $u$\\
\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
\end{tabbing}
as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
the product $z^T=u^T F'(x)$ after calling {\sf zos\_forward}, {\sf fos\_forward}, or
{\sf hos\_forward} with {\sf keep}=1. The corresponding
{\bf f}irst-{\bf o}rder {\bf v}ector reverse mode driver is given by
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
\end{tabbing}
that can be used after calling {\sf zos\_forward}, {\sf fos\_forward}, or
{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
ADOL-C provides
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf double u[m];}           \> // weight vector $u$\\
\>{\sf double Z[n][d+1];}      \> // resulting adjoints
\end{tabbing}
as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling {\sf fos\_forward} or
{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of  dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
\end{tabbing}
as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling {\sf fos\_forward} or
{\sf hos\_forward} with {\sf keep} $=d+1>1$.
After the function call, the last argument of {\sf hov\_reverse} 
contains information about the sparsity pattern, i.e., each {\sf nz[i][j]}
has a value that characterizes the functional relation between the
$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
$x_j$ as:
\begin{center}
\begin{tabular}{cl}
 0 & trivial \\
 1 & linear
\end{tabular} \hspace*{4ex}
\begin{tabular}{cl}
 2 & polynomial\\
 3 & rational
\end{tabular} \hspace*{4ex}
\begin{tabular}{cl}
 4 & transcendental\\
 5 & non-smooth
\end{tabular}
\end{center}
Here, ``trivial'' means that there is no dependence at all and ``linear'' means
that the partial derivative is a constant that
does not depend on other variables either. ``Non-smooth'' means that one of
the functions on the path between $x_j$ and the $i$-th component of
$UF^\prime(x)$ was evaluated at a point
where it is not differentiable.  All positive labels
$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
in fact be simpler, for example due to exact cancellations.

There are also overloaded versions providing a general {\sf reverse}-call.
Details of the appropriate calling sequences are given in \autoref{forw_rev}.

\section{Overloaded Forward and Reverse Calls}

In this section, the several versions of the {\sf forward} and
{\sf reverse} routines, which utilize the overloading capabilities
of C++, are described in detail. With the exception of the bit pattern
versions, all interfaces are prototyped in the header file
\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
C and Fortran-callable versions prototyped in the same header file.
The bit pattern versions of {\sf forward} and {\sf reverse} introduced
in \autoref{ProBit} are prototyped in the header file
\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
file \verb=<adolc/interfaces.h>= automatically.

\subsection{The Scalar Case}

Given any correct tape, one may call from within
the generating program, or subsequently during another run, the following
procedure:

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\ 
\>{\sf int keep;}              \> // flag for reverse sweep \\ 
\>{\sf double X[n][d+1];}      \> // Taylor coefficients $X$ of
                                     independent variables \\
\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
                                     in \eqref{eq:series}
\end{tabbing}

The rows of the matrix $X$ must correspond to the independent variables in the order of their
initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
correspond to the
dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
Taylor coefficient vectors as in \eqref{eq:series}.
Thus the first column of $Y$ contains the
function value $F(x)$ itself, the next column represents the first
Taylor coefficient vector of $F$, and the last column the
$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
how many Taylor coefficients of all intermediate quantities are
written into the value stack as explained in \autoref{forw_rev_ad}.
If {\sf keep} is omitted, it defaults to 0.

The given {\sf tag} value is used by {\sf forward} to determine the
name of the file on which the tape was written. If the tape file does
not exist, {\sf forward} assumes that the relevant
tape is still in core and reads from the buffers.
After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
identifier {\sf tag}. When $u$ is a vector
and $Z$ an $n\times (d+1)$ matrix,
{\sf reverse} is executed in the scalar mode by the calling sequence

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
\>{\sf int reverse(tag,m,n,d,u,Z)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\ 
\>{\sf double u[m];}           \> // weighting vector $u$\\
\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$
\end{tabbing}
to compute
the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.

\subsection{The Vector Case}

When $U$ is a matrix, {\sf reverse} is executed in the vector mode by the following calling sequence

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\ 
\>{\sf int q;}                 \> // number of weight vectors $q$\\
\>{\sf double U[q][m];}        \> // weight matrix $U$\\
\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
\end{tabbing}
to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
When the arguments {\sf q} and {\sf U} are omitted, they default to
$m$ and the identity matrix of order $m$, respectively. 

Through the optional argument {\sf nz} of {\sf reverse} one can compute
information about the sparsity pattern of $Z$ as described in detail
in \autoref{forw_rev_ad}.

The return values of {\sf reverse} calls can be interpreted according
to \autoref{retvalues}, but negative return values are not
valid, since the corresponding forward sweep would have
stopped without completing the necessary Taylor file.
The return value of {\sf reverse} may be higher
than that of the preceding {\sf forward} call because some operations
that were evaluated at a critical argument during the forward sweep
were found not to impact the dependents during the reverse sweep.

In both scalar and vector mode, the degree $d$ must agree with
{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
equal to zero if {\sf reverse} directly follows the taping of an active
section. Otherwise, {\sf reverse} will return control with a suitable error
message.

In order to avoid possible confusion, the first four arguments must always be
present in the calling sequence. However, if $m$ or $d$
attain their trivial values 1 and 0, respectively, then
corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
indirection.  For example, we may call
{\sf reverse(tag,1,n,0,1.0,g)} after declaring
{\sf double g[n]} 
to calculate a gradient of a scalar-valued function.

Sometimes it may be useful to perform a forward sweep for families of
Taylor series with the same leading term.
This vector version of {\sf forward} can be called in the form

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int d;}                 \> // highest derivative degree $d$\\
\>{\sf int p;}                 \> // number of Taylor series $p$\\
\>{\sf double x0[n];}          \> // values of independent variables $x_0$\\
\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
\>{\sf double y0[m];}          \> // values of dependent variables $y_0$\\
\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
\end{tabbing}

where {\sf X} and {\sf Y} hold the Taylor coefficients of first
and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
degree 0. There is no option to keep the values of active variables
that are going out of scope or that are overwritten. Therefore this
function cannot prepare a subsequent reverse sweep.
The return integer serves as a flag to indicate quadratures or altered
comparisons as described above in \autoref{reuse_tape}.

Since the calculation of Jacobians is probably the most important
automatic differentia\-tion task, we have provided a specialization
of vector {\sf forward} to the case where $d = 1$. This version can be
called in the form

\begin{tabbing}
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
\end{tabbing}

When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
chosen initialization of $X$. In a workstation environment a value
of $p$ somewhere between $10$ and $50$
appears to be fairly optimal. For smaller $p$ the interpretive
overhead is not appropriately amortized, and for larger $p$ the
$p$-fold increase in storage causes too many page faults. Therefore,
large Jacobians that cannot be compressed via column coloring
as could be done for example using the driver {\sf sparse\_jac}
should be ``strip-mined'' in the sense that the above
first-order-vector version of {\sf forward} is called
repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
a partition of the identity matrix of order $n$.

\subsection{Dependence Analysis}

The sparsity pattern of Jacobians is often needed to set up data structures
for their storage and factorization or to allow their economical evaluation
by compression \cite{BeKh96}. Compared to the evaluation of the full
Jacobian $F'(x_0)$ in real arithmetic, computing the Boolean matrix
$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
pattern in the obvious way requires a little less run-time and
certainly a lot less memory.

The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
of $\tilde{P}$ should be $1 = true$ exactly when there is a data
dependence between the $i$-th independent variable $x_{i}$ and
the $j$-th dependent variable $y_{j}$. Just like for real arguments
one would wish to compute matrix-vector and vector-matrix products
of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
by appropriate {\sf forward} and {\sf reverse} routines, where
$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
Here, multiplication corresponds to logical
{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
Boolean semi-ring.

For practical reasons it is assumed that
$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
$\tilde{v}$ and $\tilde{u}$ are combined into integer vectors
$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
as bit patterns. Moreover, $p$ or $q$ such integer vectors may
be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$
and $U\in\N^{q \times m}$, which naturally correspond
to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
bit pattern versions of {\sf forward} and {\sf reverse} allow
one to compute integer matrices $Y\in\N^{m \times p}$ and
$Z\in\N^{q \times n}$ corresponding to
\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad
\tilde{Z} = \tilde{U}\tilde{P} \, ,
respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
and $\tilde{Z}\in\{0,1\}^{\left(sq\right)\times n}$.
In general, the application of the bit pattern versions of
{\sf forward} or {\sf reverse} can be interpreted as
propagating dependences between variables forward or backward; therefore
both the propagated integer matrices and the corresponding
Boolean matrices are called {\em dependence structures}.
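The Boolean semi-ring arithmetic can be sketched in a few lines of C++: with one {\sf unsigned long} per row holding the bit pattern, the product $\tilde{P}\tilde{v}$ reduces to a bitwise {\sf AND} followed by an {\sf OR}-reduction. The $3\times 3$ pattern below is a made-up example, not ADOL-C internals.

```cpp
// Hypothetical 3x3 sparsity pattern P, one row per dependent variable;
// bit i of row j is set iff y_j depends on x_i:
//   y_0 <- x_0        y_1 <- x_0, x_2        y_2 <- x_1

// Product P*v over the Boolean semi-ring: multiplication is bitwise AND,
// addition is OR, so row j contributes iff it shares a set bit with v.
unsigned long patTimes(const unsigned long P[3], unsigned long v) {
    unsigned long result = 0;
    for (int j = 0; j < 3; ++j)
        if (P[j] & v)                // AND of patterns, OR-reduced to one bit
            result |= 1ul << j;      // y_j depends on some x_i selected by v
    return result;
}
```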
The bit pattern {\sf forward} routine
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
\>{\sf short int tag;}              \> // tape identification \\
\>{\sf int m;}                      \> // number of dependent variables $m$\\
\>{\sf int n;}                      \> // number of independent variables $n$\\
\>{\sf int p;}                      \> // number of integers propagated $p$\\
\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
                                     \eqref{eq:int_forrev}\\
\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
can be used to obtain the dependence structure $Y$ for a given dependence structure
$X$. The dependence structures are
represented as arrays of {\sf unsigned long int}, the entries of which are
interpreted as bit patterns as described above.
For example, for $n=3$ the identity matrix $I_3$ should be passed
with $p=1$ as the $3 \times 1$ array
{\sf X} \; = \;
\left( \begin{array}{r}
         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
       \end{array} \right)
in the 4-byte long integer format. The parameter {\sf mode} determines
the mode of dependence analysis as explained already in \autoref{sparse}.
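The seed construction generalizes to arbitrary $n$ and $p$ as in the following sketch, which fills an $n\times p$ array row by row with single set bits, counting from the most significant bit as in the example above. The helper name is ours, not ADOL-C's, and the code works for 4-byte and 8-byte words alike.

```cpp
#include <climits>

// Fill the n x p seed array X (row-major) with the identity bit pattern:
// row i carries exactly one set bit, the i-th counted from the most
// significant bit of the first word onwards.
void identityPattern(unsigned long* X, int n, int p) {
    const int s = CHAR_BIT * sizeof(unsigned long);   // bits per word
    for (int i = 0; i < n * p; ++i) X[i] = 0ul;
    for (int i = 0; i < n; ++i)
        X[i * p + i / s] = 1ul << (s - 1 - i % s);
}
```

For the whole sparsity pattern one would choose $p \ge n/s$ so that every independent variable receives its own bit.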
A call to the corresponding bit pattern {\sf reverse} routine
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
\>{\sf short int tag;}         \> // tape identification \\
\>{\sf int m;}                 \> // number of dependent variables $m$\\
\>{\sf int n;}                 \> // number of independent variables $n$\\
\>{\sf int q;}                 \> // number of integers propagated $q$\\
\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
                                     to \eqref{eq:int_forrev}\\
\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
yields the dependence structure $Z$ for a given dependence structure $U$.
To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
as an integer matrix $P$ one may call {\sf forward} or {\sf reverse}
with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
corresponding dependence structure $X$ or $U$ must be defined to represent
the identity matrix of the respective dimension.
Because always a multiple of $s$ Boolean vectors is propagated,
there may be superfluous vectors, which can simply be set to zero.
The return values of the bit pattern {\sf forward} and {\sf reverse} routines
correspond to those described in \autoref{retvalues}.
One can control the storage growth by the factor $p$ using
``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
groups of columns or rows, respectively, at a time, i.e.~partitioning
$X$ or $U$ appropriately as described for the computation of Jacobians
in \autoref{vecCas}.
\section{Advanced algorithmic differentiation in ADOL-C}

\subsection{Differentiating external functions}
Ideally, AD is applied to a given computation as a whole.
This is not always possible because parts of the computation may
be coded in a different programming language or may be a call to
an external library.
In the former case one may want to differentiate the parts in
question with a different AD tool or provide hand-written derivatives.
In practice, sophisticated projects usually evolve over a long period of time.
Within this process, a heterogeneous code base for the project
develops, which may include the incorporation of external solutions,
changes in programming paradigms or even of programming languages.
Equally heterogeneous, the computation of derivative values appears.
Hence, different \mbox{AD-tools} may be combined with hand-derived
codes based on the same or different programming languages.
ADOL-C supports such settings by the concept of externally
differentiated functions, that is, functions
not differentiated by ADOL-C itself. The required derivatives
have to be provided by the user.
For this purpose, it is required that the externally differentiated
function (for example named {\sf\em euler\_step}) has the following signature.

\hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);}
Note that the formal parameters in the signature have {\sf double} type, that is,
they are not active as in the original program before the ADOL-C type change.
The externally differentiated function has to
be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows.
\hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);}

This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function.
Then, the user has to supply the function pointers for the callback functions (for example here
{\sf zos\_for\_euler\_step} and {\sf fos\_rev\_euler\_step}) that the user implemented
to compute the derivatives as follows.
\hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\
             \> {\sf // function pointer for computing
               Zero-Order-Scalar (=zos)}\\
             \> {\sf // forward information}\\
%             \> {\sf edf-$>$dp\_x = xp;}\\
%             \> {\sf edf-$>$dp\_y = yp;}\\
%             \> {\sf // double arrays for arguments and results}\\
             \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\
             \> {\sf // function pointer for computing
               First-Order-Scalar (=fos)}\\
             \> {\sf // reverse information}
To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y}
one has to provide (allocate) both variants. That is, if the call to {\sf euler\_step} was originally

\hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);}

then the ADOL-C type change for the calling context will turn {\sf sOld} and {\sf sNew} into {\sf adouble} pointers.
To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace)
the original call to the externally differentiated function
must be substituted by

\hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);}

Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively)
to the active arguments {\sf sOld} and {\sf sNew}.
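To make the division of labor concrete, the following self-contained sketch shows the kind of code a user supplies: a passive Euler step for $\dot x=-2x$ and a hand-written first-order-scalar reverse routine returning $\bar x=(\partial y/\partial x)^T\bar y$. The parameter lists of the real callbacks are fixed by ADOL-C's external function interface; the signatures below are simplified for illustration only.

```cpp
const double h = 0.1;   // step size, assumed fixed for this sketch

// Externally differentiated function: one explicit Euler step for the
// linear ODE x' = -2x, i.e. y = x + h*(-2x) = (1 - 2h)*x.
int euler_step(int n, double* x, int m, double* y) {
    for (int i = 0; i < n && i < m; ++i)
        y[i] = (1.0 - 2.0 * h) * x[i];
    return 0;
}

// Hand-written reverse callback (simplified, hypothetical signature):
// given the output adjoint u, return the input adjoint z = (dy/dx)^T u.
// Here dy/dx is (1 - 2h) times the identity.
int fos_rev_euler_step(int m, double* u, int n, double* z) {
    for (int i = 0; i < n && i < m; ++i)
        z[i] = (1.0 - 2.0 * h) * u[i];
    return 0;
}
```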
The usage of the external function facility is illustrated by the
example \verb=ext_diff_func= contained in
\verb=examples/additional_examples/ext_diff_func=.
There, the externally differentiated function is also C code, but its
handling as an externally differentiated function also yields a decrease of the
overall required tape size.
\subsection{Advanced algorithmic differentiation of time integration processes}
For many time-dependent applications, the corresponding simulations
are based on ordinary or partial differential equations.
Furthermore, frequently there are quantities that influence the
result of the simulation and can be seen as controls of the system.
To compute an approximation of the
simulated process for a time interval $[0,T]$ and evaluate the
desired target function, one applies an
appropriate integration scheme given by
\hspace{5mm} \= some initializations yielding $x_0$\\
\> for $i=0,\ldots, N-1$\\
\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
\hspace{5mm} \= evaluation of the target function
where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$
defines the time step that computes the state at time $t_{i+1}$. Note that we
do not assume a uniform grid.
When computing derivatives of the target function with respect to the
control, the consequences for the tape generation using the ``basic''
taping approach as implemented in ADOL-C so far are shown in the left part of
\autoref{fig:bas_tap}.
\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
\caption{Different taping approaches}
As can be seen, the iterative process is completely
unrolled due to the taping process. That is, the tape contains an internal representation of each
time step. Hence, the overall tape comprises a serious amount of redundant
information as illustrated by the light grey rectangles in
\autoref{fig:bas_tap}.
To overcome the repeated storage of essentially the same information,
a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
the right-hand side of \autoref{fig:bas_tap}. This new
capability allows the encapsulation of the time-stepping procedure
such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one
representative of the time steps in addition to a function pointer to the
evaluation procedure $F$ of the time steps. The function pointer has
to be stored for a possibly necessary retaping during the derivative calculation
as explained below.
Instead of storing the complete tape, only a very limited number of intermediate
states are kept in memory. They serve as checkpoints, such that
the required information for the backward integration is generated
piecewise during the adjoint calculation.
For this modified adjoint computation the optimal checkpointing schedules
provided by {\sf revolve} are employed. An adapted version of the
software package {\sf revolve} is part of ADOL-C and automatically
integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
distributed such that the computational effort is minimized for the given
number of checkpoints $c$ and time steps $N$. It is important to note that the overall tape
size is drastically reduced due to the advanced taping strategy. For the
implementation of this nested taping we introduced
a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
handle different internal function representations during the taping
procedure and the derivative calculation. This approach allows the generation of a new
tape inside the overall tape, where the coupling of the different tapes is based on
the {\em externally differentiated function} facility described above.
Written under the objective of minimal user effort, the checkpointing routines
of \mbox{ADOL-C} need only very limited information. The user must
provide two routines as implementation of the time-stepping function $F$
with the signatures

\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}

where the function names can be chosen by the user as long as the names are
unique. It is possible that the result vector of one time step
iteration overwrites the argument vector of the same time step. Then, no
copy operations are required to prepare the next time step.
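A minimal example of such a pair, assuming explicit Euler for the scalar decay equation $\dot u=-u$, is sketched below; the {\sf adouble} version would be textually identical up to the argument type. The result overwrites the argument, so no copies are needed between steps.

```cpp
// Passive (double) version of the time step function for x' = -x with
// explicit Euler and a fixed step size h; the result overwrites the
// argument vector in place, as allowed by the checkpointing interface.
int time_step_function(int n, double* u) {
    const double h = 0.01;
    for (int i = 0; i < n; ++i)
        u[i] = u[i] - h * u[i];     // u <- (1 - h) u
    return 0;
}
```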
At first, the {\sf adouble} version of the time step function has to
be {\em registered} using the \mbox{ADOL-C} function

\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}

This function initializes the structure {\sf cpc}. Then,
the user has to provide the remaining checkpointing information
by the following commands:
\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
             \> {\sf // double variant of the time step function}\\
             \> {\sf cpc.setNumberOfSteps(N);}\\
             \> {\sf // number of time steps to perform}\\
             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
             \> {\sf // number of checkpoints} \\
             \> {\sf cpc.setDimensionXY(n);}\\
             \> {\sf // dimension of input/output}\\
             \> {\sf cpc.setInput(y);}\\
             \> {\sf // input vector} \\
             \> {\sf cpc.setOutput(y);}\\
             \> {\sf // output vector }\\
             \> {\sf cpc.setTapeNumber(tag\_check);}\\
             \> {\sf // subtape number for checkpointing} \\
             \> {\sf cpc.setAlwaysRetaping(false);}\\
             \> {\sf // always retape or not?}
Subsequently, the time loop in the function evaluation can be
substituted by a call of the function

\hspace*{2cm}{\sf int cpc.checkpointing();}

Then, ADOL-C computes derivative information using the optimal checkpointing
strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.

The presented driver is prototyped in the header file
\verb=<adolc/checkpointing.h>=. This header
is included by the global header file \verb=<adolc/adolc.h>= automatically.
An example program \verb=checkpointing.cpp= illustrates the
checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
\subsection{Advanced algorithmic differentiation of fixed point iterations}
Quite often, the state of the considered system denoted by $x\in\R^n$
depends on some design parameters denoted by $u\in\R^m$. One example of such a setting
is the flow over an aircraft wing. Here, the shape of the wing that
is defined by the design vector $u$
determines the flow field $x$. The desired quasi-steady state $x_*$
fulfills the fixed point equation
  \label{eq:fixedpoint}
  x_* = F(x_*,u)
for a given continuously differentiable function
$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
also exploited by many other applications.
Assume that one can apply the iteration
 x_{k+1} = F(x_k,u)
to obtain a linearly convergent sequence $\{x_k\}$ generated
for any given control $u\in\R^m$. Then the limit point $x_*\in\R^n$ fulfills the fixed
point equation~\eqref{eq:fixedpoint}. Moreover,
suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
Hence, there exists a
differentiable function $\phi:\R^m \rightarrow \R^n$,
such that $\phi(u) = F(\phi(u),u)$, where the state
$\phi(u)$ is a fixed point of $F$ according to the control
$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
the design vector $u$, derivatives of $\phi$ with respect
to $u$ are of particular interest.
To exploit the advanced algorithmic differentiation of such fixed point iterations,
ADOL-C provides the special function {\tt fp\_iteration(...)}
with the following interface:
\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
\>{\sf int
  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
\>{\sf short int sub\_tape\_num;}         \> // tape identification for the subtape \\
\>{\sf int *double\_F;}         \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\
\>{\sf int *adouble\_F;}        \> // pointer to a function that computes for $x$ and $u$ \\
\>                              \> // the value $y=F(x,u)$ for {\sf adouble} arguments\\
\>{\sf int *norm;}              \> // pointer to a function that computes\\
\>                              \> // the norm of a vector\\
\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
\>                              \> // the norm of a vector in the adjoint iteration\\
\>{\sf double eps;}             \> // termination criterion for the fixed point iteration\\
\>{\sf double eps\_deriv;}      \> // termination criterion for the adjoint fixed point iteration\\
\>{\sf int N\_max;}             \> // maximal number of iterations for the state computation\\
\>{\sf int N\_max\_deriv;}      \> // maximal number of iterations for the adjoint computation\\
\>{\sf adouble *x\_0;}          \> // initial state of the fixed point iteration\\
\>{\sf adouble *u;}             \> // value of $u$\\
\>{\sf adouble *x\_fix;}        \> // final state of the fixed point iteration\\
\>{\sf int dim\_x;}             \> // dimension of $x$\\
\>{\sf int dim\_u;}             \> // dimension of $u$\\
Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
should be used for the fixed point iteration.
{\tt double\_F} and {\tt adouble\_F} are pointers to functions that
compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby
{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
{\tt norm\_deriv} are pointers to functions computing the norm
of a vector. The latter functions together with {\tt eps},
{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
the iterations. Thus the following loops are performed:
  do                     &   do                           \\
  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
  ~~~~$y = F(x,u)$       &   ~~~
  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
  & and $k\leq N_{max,deriv}$
The vectors for the initial iterate and the control are stored
in {\tt x\_0} and {\tt u}, respectively. The vector in which the
fixed point is stored is {\tt x\_fix}. Finally, {\tt dim\_x}
and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
corresponding vectors.
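The two coupled loops can be mimicked for a scalar toy problem. Below, $F(x,u)=x/2+u$ has the fixed point $\phi(u)=2u$ with $F_x=1/2$ and $F_u=1$, so the adjoint loop must return $\bar u = 2\bar x$. All names are illustrative and independent of the ADOL-C interface.

```cpp
#include <cmath>

// State loop: iterate y = F(x,u) = x/2 + u until ||y - x|| < eps.
double state_loop(double u, double eps, int N_max) {
    double x = 0.0, y = u;                 // x_0 = 0
    for (int k = 0; k < N_max && std::fabs(y - x) >= eps; ++k) {
        x = y;
        y = 0.5 * x + u;                   // y = F(x,u)
    }
    return y;                              // converges to phi(u) = 2u
}

// Adjoint loop: iterate (xi, ubar) = zeta * (F_x, F_u) + (xbar, 0)
// until ||xi - zeta|| < eps, as in the right-hand loop above.
double adjoint_loop(double xbar, double eps, int N_max) {
    double zeta = 0.0, xi = xbar, ubar = 0.0;
    for (int k = 0; k < N_max && std::fabs(xi - zeta) >= eps; ++k) {
        zeta = xi;
        xi   = 0.5 * zeta + xbar;          // xi = zeta*F_x + xbar
        ubar = 1.0 * zeta;                 // ubar = zeta*F_u
    }
    return ubar;                           // converges to xbar * d(phi)/du
}
```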
The presented driver is prototyped in the header file
\verb=<adolc/fixpoint.h>=. This header
is included by the global header file \verb=<adolc/adolc.h>= automatically.
An example code that also shows the
expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
ADOL-C allows one to compute derivatives in parallel for functions
containing OpenMP parallel loops.
This implies that an explicit loop-handling approach is applied. A
typical situation is shown in \autoref{fig:basic_layout},
    \vspace{3ex}
    \begin{center}
        \includegraphics[height=4cm]{multiplexed} \\
        \begin{picture}(0,0)
            \put(-48,40){\vdots}
            \put(48,40){\vdots}
            \put(-48,80){\vdots}
            \put(48,80){\vdots}
            \put(-83,132){function eval.}
            \put(5,132){derivative calcul.}
        \end{picture}
    \end{center}
    \vspace{-5ex}
    \caption{Basic layout of the mixed function and the corresponding differentiation process}
    \label{fig:basic_layout}
where the OpenMP-parallel loop is preceded by a serial startup
calculation and followed by a serial finalization phase.
Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
Two macros are available that only differ in the way the global tape information is handled.
Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly, the {\tt ADOLC\_OPENMP\_NC} macro can be used.
Then, the information transfer is performed only once within the iterative process, upon encounter of the first parallel region, through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
Due to the inserted macro, the OpenMP statement has the following structure:
\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
Inside the parallel region, separate tapes may then be created.
Each single thread works in its own dedicated AD-environment, and all
serial facilities of \mbox{ADOL-C} are applicable as usual. The global
derivatives can be computed using the tapes created in the serial and
parallel parts of the function evaluation, where user interaction is
required for the correct derivative concatenation of the various tapes.
For the usage of the parallel facilities, the \verb=configure=-command
has to be used with the option \verb?--with-openmp-flag=FLAG?, where
\verb=FLAG= stands for the system-dependent OpenMP flag.
The parallel differentiation of a parallel program is illustrated
by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
\section{Tapeless forward differentiation in ADOL-C}
Up to version 1.9.0, the development of the ADOL-C software package
was based on the decision to store all data necessary for derivative
computation on tapes, where large applications require the tapes to be
written out to corresponding files. In almost all cases this means
a considerable drawback in terms of run time due to the excessive
memory accesses. Using these tapes enables ADOL-C to offer its full range of
functionality, but not all derivative computation tasks require tapes.
Starting with version 1.10.0, ADOL-C features a tapeless forward
mode for computing first order derivatives in scalar mode, i.e.,
$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
This tapeless variant coexists with the more universal
tape based mode in the package. The following subsections describe
the source code modifications required to use the tapeless forward mode of
ADOL-C.

\subsection{Modifying the Source Code}
Let us consider the coordinate transformation from Cartesian to spherical
polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
= F(x)$, with
y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
y_3  =  \arctan(x_2/x_1),
as an example. The corresponding source code is shown in \autoref{fig:tapeless}.
\= \kill
\> {\sf \#include} {\sf $<$iostream$>$}\\
\> {\sf using namespace std;}\\
\> \\
\> {\sf int main() \{}\\
\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
\> {\sf \rule{1cm}{0pt}...}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}return 0;}\\
\> \}
\caption{Example for tapeless forward mode}
Changes to the source code that are necessary for applying the
tapeless forward mode of ADOL-C are described in the following two
subsections, where the vector mode version is described
as an extension of the scalar mode.
\subsubsection*{The scalar mode}
To use the tapeless forward mode, one has to include one
of the header files \verb#adolc.h# or \verb#adouble.h#,
where the latter should be preferred since it does not include the
tape based functions defined in other header files. Hence, including
\verb#adouble.h# avoids mode mixtures, since
\verb#adolc.h# is just a wrapper for including all public
headers of the ADOL-C package and does not offer functions of its own.
Since the two ADOL-C forward mode variants, tape-based and tapeless,
are prototyped in the same header file, the compiler needs to know if a
tapeless version is intended. This can be done by defining a
preprocessor macro named {\sf ADOLC\_TAPELESS}. Note that it is
important to define this macro before the header file is included.
Otherwise, the tape-based version of ADOL-C will be used.
As in the tape based forward version of ADOL-C, all derivative
calculations are introduced by calls to overloaded
operators. Therefore, similar to the tape-based version, all
independent, intermediate and dependent variables must be declared
with type {\sf adouble}. For run-time reasons, the whole tapeless
functionality provided by \verb#adolc.h# is written as code intended
for complete inlining, where the actual portion of inlined code can
be influenced by switches for many compilers. Our experiments
with the tapeless mode have produced completely inlined code by using
standard (optimization) switches for the GNU and Intel C++
compilers.
To avoid name conflicts
resulting from the inlining, the tapeless version has its own namespace
\verb#adtl#. As a result, four possibilities of using the {\sf adouble}
type are available for the tapeless version:
\item Defining a new type
      \begin{center}
        \begin{tabular}{l}
          {\sf typedef adtl::adouble adouble;}\\
          ...\\
          {\sf adouble tmp;}
        \end{tabular}
      \end{center}
      This is the preferred way. Remember that you cannot define your own
      {\sf adouble} type/class with a different meaning after this typedef.
\item Declaring with namespace prefix
      \begin{center}
        \begin{tabular}{l}
          {\sf adtl::adouble tmp;}
        \end{tabular}
      \end{center}
      Not the most handsome and efficient way with respect to coding
      but without any doubt one of the safest ways. The identifier
      {\sf adouble} is still available for user types/classes.
\item Trusting macros
      \begin{center}
        \begin{tabular}{l}
          {\sf \#define adouble adtl::adouble}\\
          ...\\
          {\sf adouble tmp;}
        \end{tabular}
      \end{center}
      This approach should be used with care, since standard defines are text replacements.
  \item Using the complete namespace
        \begin{center}
          \begin{tabular}{l}
            {\sf using namespace adtl;}\\
            ...\\
            {\sf adouble tmp;}
          \end{tabular}
        \end{center}
        A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise!
After defining the variables, only two things are left to do. First,
one needs to initialize the values of the independent variables for the
function evaluation. This can be done by assigning the variables a {\sf
double} value. The {\sf ad}-value is set to zero in this case.
Additionally, the tapeless forward mode variant of ADOL-C
offers a function named {\sf setValue} for setting the value without
changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
variables ADOL-C offers two possibilities:
  \item Using the constructor
        \begin{center}
          \begin{tabular}{l}
            {\sf adouble x1(2,1), x2(4,0), y;}
          \end{tabular}
        \end{center}
        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
        remains uninitialized. In terms of function evaluation
        $x_1$ holds the value 2 and $x_2$ the value 4, whereas the derivative values
        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
   \item Setting point values directly
         \begin{center}
           \begin{tabular}{l}
             {\sf adouble x1=2, x2=4, y;}\\
             ...\\
             {\sf x1.setADValue(1);}\\
             {\sf x2.setADValue(0);}
           \end{tabular}
         \end{center}
         The same example as above but now using the {\sf setADValue}-method for initializing the derivative values.
The derivatives can be obtained at any time during the evaluation
process by calling the {\sf getADValue}-method:
  \begin{tabular}{l}
    {\sf adouble y;}\\
    ...\\
    {\sf cout $<<$ y.getADValue();}
  \end{tabular}
\autoref{fig:modcode} shows the resulting source code incorporating
all required changes for the example
given above.
\hspace*{-1cm} \= \kill
\> {\sf \#include $<$iostream$>$}\\
\> {\sf using namespace std;}\\
\> \\
\> {\sf \#define ADOLC\_TAPELESS}\\
\> {\sf \#include $<$adouble.h$>$}\\
\> {\sf typedef adtl::adouble adouble;}\\
\> {\sf int main() \{}\\
\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
\> {\sf \rule{1cm}{0pt}...}\\
\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(1);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ y[1].getADValue() $<<$ endl;}\\
\> {\sf \rule{0.5cm}{0pt}return 0;}\\
\> {\sf \}}
\caption{Example for tapeless scalar forward mode}
\subsubsection*{The vector mode}

In scalar mode only one direction element has to be stored per {\sf
  adouble}, whereas an array of $p$ elements is needed in the vector
mode to cover the computations for the given $p$ directions. The
resulting changes to the source code are described in this section.
Similar to the tapeless scalar forward mode, the tapeless vector forward
mode is used by defining {\sf ADOLC\_TAPELESS}. Furthermore, one has to define
an additional preprocessor macro named {\sf NUMBER\_DIRECTIONS}. This
macro specifies the maximal number of directions to be used within the
resulting vector mode. Just as {\sf ADOLC\_TAPELESS}, the new macro
must be defined before including the \verb#<adouble.h>#
header file since it is ignored otherwise.
In many situations recompiling the source code to get a new number of
directions is at least undesirable. ADOL-C offers a function named
{\sf setNumDir} to work around this problem partially. If this
function is called, ADOL-C does not take the number of directions
from the macro {\sf NUMBER\_DIRECTIONS} but from the argument of
{\sf setNumDir}. A corresponding source code would contain the following lines:
\begin{center}
  \begin{tabular}{l}
    {\sf \#define NUMBER\_DIRECTIONS 10}\\
    ...\\
    {\sf adtl::setNumDir(5);}
  \end{tabular}
\end{center}
Note that using this function does not
change the memory requirements, which can be roughly determined as
({\sf NUMBER\_DIRECTIONS}$+1$)*(number of {\sf adouble}s).
Compared to the scalar case, setting and getting the derivative
values, i.e.~the directions, is more involved. Instead of
working with single {\sf double} values, pointers to arrays of {\sf
double}s are used, as illustrated by the following example:
\begin{center}
  \begin{tabular}{l}
    {\sf \#define NUMBER\_DIRECTIONS 10}\\
    ...\\
    {\sf adouble x, y;}\\
    {\sf double *ptr=new double[NUMBER\_DIRECTIONS];}\\
    ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(ptr);}\\
    ...\\
    {\sf ptr=y.getADValue();}
  \end{tabular}
\end{center}
Additionally, the tapeless vector forward mode of ADOL-C offers two
further methods for setting/getting the derivative values. Similar
to the scalar case, {\sf double} values are used, but due to the vector
mode the position of the desired vector element must be supplied in
the argument list:
\begin{center}
  \begin{tabular}{l}
    {\sf \#define NUMBER\_DIRECTIONS 10}\\
    ...\\
    {\sf adouble x, y;}\\
    ...\\
    {\sf x=2;}\\
    {\sf x.setADValue(5,1);\hspace*{3.7cm}// set the 6th derivative value of x to 1.0}\\
    ...\\
    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
  \end{tabular}
\end{center}
The resulting source code containing all changes that are required is
shown in \autoref{fig:modcode2}.

\begin{figure}[hbt]
\begin{center}
\begin{tabbing}
\hspace*{-1cm} \= \kill
\> {\sf \#include $<$iostream$>$}\\
\> {\sf using namespace std;}\\
\> \\
\> {\sf \#define ADOLC\_TAPELESS}\\
\> {\sf \#define NUMBER\_DIRECTIONS 3}\\
\> {\sf \#include $<$adouble.h$>$}\\
\> {\sf typedef adtl::adouble adouble;}\\
\> \\
\> {\sf int main() \{}\\
\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
\> {\sf \rule{0.5cm}{0pt}\}}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
\> {\sf \rule{0.5cm}{0pt}\}}\\
\> {\sf \rule{0.5cm}{0pt}return 0;}\\
\> {\sf \}}
\end{tabbing}
\end{center}
\caption{Example for tapeless vector forward mode}
\label{fig:modcode2}
\end{figure}
\subsection{Compiling and Linking the Source Code}

After incorporating the required changes, one has to compile the
source code and link the object files to get the executable.
As long as the ADOL-C header files are not included with an absolute path,
the compile sequence should be similar to the following example:
\begin{center}
  \begin{tabular}{l}
    {\sf g++ -I/home/username/adolc\_base/include -c tapeless\_scalar.cpp}
  \end{tabular}
\end{center}
The \verb#-I# option tells the compiler where to search for the ADOL-C
header files. This option can be omitted when the headers are included
with an absolute path or if ADOL-C is installed in a ``global'' directory.
Since the tapeless forward version of ADOL-C is implemented in the
header \verb#adouble.h# as a completely inlined version,
the object files do not need to be linked against any external ADOL-C
code or the ADOL-C library. Therefore, the example started above could be finished with the
following command:
\begin{center}
  \begin{tabular}{l}
    {\sf g++ -o tapeless\_scalar tapeless\_scalar.o}
  \end{tabular}
\end{center}
The source codes {\sf tapeless\_scalar.cpp} and {\sf tapeless\_vector.cpp}
illustrating the use of the tapeless scalar and vector mode can be found in
the directory {\sf examples}.
\subsection{Concluding Remarks for the Tapeless Forward Mode Variant}

Like many other AD methods, the tapeless forward mode provided by the
ADOL-C package has its own strengths and drawbacks. Please read the
following section carefully to become familiar with the effects that
can occur:
\begin{itemize}
  \item Advantages:
    \begin{itemize}
      \item Code speed\\
        Increasing computation speed was one of the main aspects in writing
        the tapeless code. In many cases higher performance can be
        expected this way.
      \item Easier linking process\\
        As another result of the code inlining, the object code does
        not need to be linked against an ADOL-C library.
      \item Smaller overall memory requirements\\
        Tapeless ADOL-C does not write tapes anymore, as the name
        implies. Loop ``unrolling'' can be avoided this
        way. Considering main memory plus disk space as the overall memory
        requirement, the tapeless version can be
        executed in a more efficient way.
    \end{itemize}
  \item Drawbacks:
    \begin{itemize}
    \item Main memory limitations\\
      The ability to compute derivatives of a given function is
      bounded by the main memory plus swap size when using
      tapeless ADOL-C. Computation from swap should be avoided anyway
      as far as possible since it slows down the computing time
      drastically. Therefore, if the program execution is
      terminated without an error message, insufficient memory can be
      the reason, among other things. The memory requirements $M$ can
      be determined roughly as follows:
      \begin{itemize}
        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
          NUMBER\_DIRECTIONS}$+1) + M_p$
      \end{itemize}
      where the storage size of all non-{\sf adouble} based variables is described by $M_p$.
    \item Compile time\\
      As discussed in the previous sections, the tapeless forward mode of
      the ADOL-C package is implemented as an inlined version. Using
      this approach results in a larger source code size, since every
      operation involving at least one {\sf adouble} stands for the
      operation itself as well as for the corresponding derivative
      code after the inlining process. Therefore, the compilation time
      needed for the tapeless version may be higher than that of the tape based code.
    \item Code Size\\
      A second drawback and result of the code inlining is the
      increase of the code size of the binaries. The increase
      factor compared to the corresponding tape based program is
      difficult to quantify as it is task dependent. Practical results
      have shown that factors between 1.0 and 2.0 can be
      expected. Factors higher than 2.0 are possible too, and even
      values below 1.0 have been observed.
    \end{itemize}
\end{itemize}
\section{Traceless forward differentiation in ADOL-C using Cuda}

One major drawback of using the traceless version of ADOL-C is the fact
that several function evaluations are needed to compute derivatives
at many different points. More precisely, to calculate the Jacobian of
a function $F:\mathbb{R}^n\rightarrow \mathbb{R}^m$ at $M$ points, $M$
function evaluations are needed for the traceless vector mode and even
$M*n$ for the traceless scalar mode. Depending on the size of the function
this can result in a long runtime. To achieve a better performance one can
use parallelization techniques, as the same operations are
performed during each function evaluation. One possibility is to use GPUs
since they are optimized for data parallel computation. Starting with
version 2.3.0, ADOL-C features a traceless forward mode for computing
first order derivatives in scalar mode on GPUs using the
general purpose parallel computing architecture Cuda.
The idea is to include parallel code that executes in many GPU threads across
processing elements. This can be done by using kernel functions, that is, functions
which are executed on the GPU as an array of threads in parallel. In general all
threads execute the same code. They are grouped into blocks, which are then grouped
into grids. A kernel is executed as a grid of blocks of threads. For more
details see, e.g., the NVIDIA CUDA C Programming Guide. To solve the problem of calculating
the Jacobian of $F$ at $M$ points it is possible to let each thread perform a
function evaluation and thus the computation of derivatives for one direction
at one point. The advantage is that the function is evaluated at different points
in parallel, which can result in a faster wallclock runtime.

The following subsection describes the source code modifications required to use
the traceless forward mode of ADOL-C with Cuda.
\subsection{Modifying the source code}

Let us again consider the coordinate transformation from Cartesian to spherical
polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
= F(x)$, with
\begin{displaymath}
y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
y_3  =  \arctan(x_2/x_1),
\end{displaymath}
as an example. We now calculate the Jacobian at $M=1024$ different points.
The source code for one point is shown in \autoref{fig:tapeless}. This example
has no real application but can still be used to show the combination of the traceless version
of ADOL-C and Cuda.
To use this mode, a Cuda toolkit suitable for the graphics card in use
has to be installed. Furthermore, it is important to check that the graphics card
supports double precision (for details see, e.g., the NVIDIA CUDA C Programming Guide).
Otherwise the data type employed inside the {\sf adouble} class has to be changed
to {\sf float}. To use the traceless forward mode with Cuda, one has to include the header
files \verb#adoublecuda.h# and \verb#cuda.h#. The first one contains the
definition of the class {\sf adouble} and the overloaded operators
for this version of ADOL-C. As in the other versions all derivative
calculations are introduced by calls to overloaded operators. The
second header file is needed for the use of Cuda.
One possibility to solve the problem above is the following. First of all
three {\sf double} arrays are needed: {\sf x} for the independent variables,
{\sf y} for the dependent variables and {\sf deriv} for the values of the
Jacobian matrices. The independent variables have to be initialized; to this end
the points at which the function should be evaluated are stored one after another
in the same array of length $3*M$. For the computation on GPUs one also has to allocate memory
on this device. Using the Cuda syntax one can allocate an array of
length $3*M$ (number of independent variables times number of points) for
the independent variables as follows:
\begin{center}
  \begin{tabular}{l}
    {\sf double * devx;}\\
    {\sf cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
  \end{tabular}
\end{center}
The arrays for the dependent variables and the values of the Jacobian matrices
are allocated in the same way. Then the values of the independent variables
have to be copied to the GPU using the following command:
\begin{center}
  \begin{tabular}{l}
    {\sf cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
  \end{tabular}
\end{center}
The argument {\sf cudaMemcpyHostToDevice} indicates that the values are copied from
the host to the GPU. In this case the values stored in {\sf x} are copied to {\sf devx}.

Now all required information has been transferred to the GPU. The changes
in the source code made so far are summarized in \autoref{fig:cudacode1}.
\begin{figure}[hbt]
\begin{center}
\begin{tabbing}
\hspace*{-1cm} \= \kill
\> {\sf \#include $<$iostream$>$}\\
\> {\sf \#include $<$cuda.h$>$}\\
\> {\sf \#include $<$adoublecuda.h$>$}\\
\> {\sf using namespace std;}\\
\> \\
\> {\sf int main() \{}\\
\> {\sf \rule{0.5cm}{0pt}int M=1024;}\\
\> {\sf \rule{0.5cm}{0pt}double* deriv = new double[9*M];}\\
\> {\sf \rule{0.5cm}{0pt}double* y = new double[3*M];}\\
\> {\sf \rule{0.5cm}{0pt}double* x = new double[3*M];}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Initialize $x_i$}\\
\> {\sf \rule{0.5cm}{0pt}for (int k=0; k$<$M; ++k)\{}\\
\> {\sf \rule{1cm}{0pt}for (int i=0; i$<$3; ++i)}\\
\> {\sf \rule{1.5cm}{0pt}x[k*3+i] = i + 1.0/(k+1);\}}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Allocate array for independent and dependent variables}\\
\> {\sf \rule{0.5cm}{0pt}// and Jacobian matrices on GPU}\\
\> {\sf \rule{0.5cm}{0pt}double * devx;}\\
\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
\> {\sf \rule{0.5cm}{0pt}double * devy;}\\
\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devy, 3*M*sizeof(double));}\\
\> {\sf \rule{0.5cm}{0pt}double * devderiv;}\\
\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devderiv, 3*3*M*sizeof(double));}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Copy values of independent variables from host to GPU}\\
\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Call function to specify amount of blocks and threads to be used}\\
\> {\sf \rule{0.5cm}{0pt}kernellaunch(devx, devy, devderiv, M);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Copy values of dependent variables and Jacobian matrices from GPU to host}\\
\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M, cudaMemcpyDeviceToHost);}\\
\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(deriv, devderiv, sizeof(double)*3*3*M, cudaMemcpyDeviceToHost);}\\
\> {\sf \}}
\end{tabbing}
\end{center}
\caption{Example for traceless scalar forward mode with Cuda: main program}
\label{fig:cudacode1}
\end{figure}
In the next step the user has to specify how many blocks and threads per block
will be needed for the function evaluations. In the present example this is done by
the call of the function {\sf kernellaunch}, see \autoref{fig:cudacode2}.
In this case the blocks are two dimensional: the x-dimension is determined
by the number of points $M$ at which the Jacobian matrix has to be calculated,
while the y-dimension is given by the number of independent variables, i.e., 3.
Since a block cannot contain more than 1024 threads, the
x-dimension in the example is 64 instead of $M=1024$. Therefore $1024/64=16$
blocks are needed. The described division into blocks is reasonable as each
thread has to perform a function evaluation for one point and one direction,
hence $M*3$ threads are needed, where 3 denotes the number of independent
variables corresponding to the number of directions needed for the computation
of the Jacobian at one point.
\begin{figure}[hbt]
\begin{center}
\begin{tabbing}
\hspace*{-1cm} \= \kill
\> {\sf cudaError\_t kernellaunch(double* inx, double* outy, double* outderiv, int M) \{}\\
\> {\sf \rule{0.5cm}{0pt}// Create 16 blocks}\\
\> {\sf \rule{0.5cm}{0pt}int Blocks=16;}\\
\> {\sf \rule{0.5cm}{0pt}// Two dimensional (M/Blocks)$\times$3 blocks}\\
\> {\sf \rule{0.5cm}{0pt}dim3 threadsPerBlock(M/Blocks,3);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Call kernel function with 16 blocks and (M/Blocks)$\times$3 threads per block}\\
\> {\sf \rule{0.5cm}{0pt}kernel $<<<$ Blocks, threadsPerBlock $>>>$(inx, outy, outderiv);}\\
\> {\sf \rule{0.5cm}{0pt}cudaError\_t cudaErr = cudaGetLastError();}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}return cudaErr;}\\
\> {\sf \}}
\end{tabbing}
\end{center}
\caption{Example for traceless scalar forward mode with Cuda: kernel launch}
\label{fig:cudacode2}
\end{figure}
We can now perform the function evaluations together with the calculation
of the Jacobians. The corresponding code is illustrated in \autoref{fig:cudacode3}.
Since the function evaluations should be performed on a GPU, a kernel function is
needed, which is defined by using the \mbox{{\sf \_\_global\_\_}} declaration specifier
(see \autoref{fig:cudacode3}). Then each thread executes the operations that are
defined in the kernel.
\begin{figure}[hbt]
\begin{center}
\begin{tabbing}
\hspace*{-1cm} \= \kill
\> {\sf \_\_global\_\_ void kernel(double* inx, double* outy, double* outderiv) \{}\\
\> {\sf \rule{0.5cm}{0pt}const int index = threadIdx.x;}\\
\> {\sf \rule{0.5cm}{0pt}const int index1 = threadIdx.y;}\\
\> {\sf \rule{0.5cm}{0pt}const int index2 = blockIdx.x;}\\
\> {\sf \rule{0.5cm}{0pt}const int index3 = blockDim.x;}\\
\> {\sf \rule{0.5cm}{0pt}const int index4 = blockDim.x*blockDim.y;}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Declare dependent and independent variables as {\sf adouble}s}\\
\> {\sf \rule{0.5cm}{0pt}adtlc::adouble y[3], x[3];}\\
\> {\sf \rule{0.5cm}{0pt}// Read out point for function evaluation}\\
\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
\> {\sf \rule{1cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
\> {\sf \rule{0.5cm}{0pt}// Set direction for calculation of derivatives}\\
\> {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}// Function evaluation}\\
\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
\> \\
\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
\> {\sf \rule{1cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
\> {\sf \rule{1cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
\> {\sf \}}
\end{tabbing}
\end{center}
\caption{Example for traceless scalar forward mode with Cuda: kernel function}
\label{fig:cudacode3}
\end{figure}
In this example each thread is assigned the task of calculating the
derivatives at one point with respect to one independent variable. Therefore
some indices are needed for the implementation. Each thread has a unique thread
ID in a block, consisting of an x- and a y-dimension. The blocks have an ID as
well. In the example the following indices are used:
\begin{itemize}
 \item {\sf index = threadIdx.x} denotes the x-dimension of a thread (ranges from 0 to 63 in the example)
 \item {\sf index1 = threadIdx.y} denotes the y-dimension of a thread (ranges from 0 to 2 in the example)
 \item {\sf index2 = blockIdx.x} denotes the block index (ranges from 0 to 15 in the example)
 \item {\sf index3 = blockDim.x} denotes the x-dimension of a block (always 64 in the example)
 \item {\sf index4 = blockDim.x*blockDim.y} denotes the size of a block (always $64*3=192$ in the example)
\end{itemize}
For the calculation of derivatives the function evaluation has to be performed
with {\sf adouble}s. Therefore the dependent and the independent variables
have to be declared as {\sf adouble}s as in the other versions of ADOL-C (see, e.g., \autoref{tapeless}).
Similar to the traceless version without Cuda, the namespace \verb#adtlc# is used.
Now each thread has to read out the point at which the function evaluation is
then performed. This is determined by the block index and the x-dimension of
the thread; therefore the independent variables in a thread
have the values
\begin{center}
  \begin{tabular}{l}
    {\sf \rule{0.5cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
  \end{tabular}
\end{center}
for {\sf i=0,1,2}, where {\sf inx} corresponds to the vector {\sf devx}. The direction for
the derivatives is given by {\sf index1}:
\begin{center}
  \begin{tabular}{l}
     {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
  \end{tabular}
\end{center}
The functions for setting and getting the value and the derivative value of an
{\sf adouble} are the same as in the traceless version of ADOL-C for first
order derivatives (see \autoref{tapeless}).
The function evaluation is then performed with the {\sf adouble} array {\sf x} on the GPU and the results
are saved in the {\sf adouble} array {\sf y}. The function evaluation itself remains unchanged
(see \autoref{fig:cudacode3}). Then we store the values of the function
evaluations at all points one after another in one array:
\begin{center}
  \begin{tabular}{l}
   {\sf \rule{0.5cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
  \end{tabular}
\end{center}
for {\sf i=0,1,2}. The values of the Jacobians are stored column by column,
one Jacobian after another:
\begin{center}
  \begin{tabular}{l}
   {\sf \rule{0.5cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
  \end{tabular}
\end{center}
for {\sf i=0,1,2}.
Now there is one thing left to do. The values calculated on the GPU have to be copied
back to the host. For the dependent variables in the example this can be done by the
following call:
\begin{center}
  \begin{tabular}{l}
 {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M, cudaMemcpyDeviceToHost);}\\
  \end{tabular}
\end{center}
see \autoref{fig:cudacode1}. The argument {\sf cudaMemcpyDeviceToHost} indicates that
the values are copied from the GPU to the host.
\subsection{Compiling and Linking the Source Code}

After incorporating the required changes, one has to compile the
source code and link the object files to get the executable. For
the compilation of a Cuda file the {\sf nvcc} compiler is needed.
The compile sequence should be similar to the following example:
\begin{center}
  \begin{tabular}{l}
    {\sf nvcc -arch=sm\_20 -o traceless\_cuda traceless\_cuda.cu}
  \end{tabular}
\end{center}
The compiler option \verb#-arch=sm_20# specifies the compute capability that is
assumed, in this case one that supports double precision.

The source code {\sf traceless\_cuda.cu} illustrating the use of the
traceless scalar forward mode with Cuda, as well as a further example that adapts
the OpenMP example to the traceless version with Cuda, can be found in
the directory {\sf examples}.
\section{Installing and Using ADOL-C}

\subsection{Generating the ADOL-C Library}
\label{genlib}

The current build system is best summarized by the ubiquitous GNU
install triplet
\begin{center}
\verb=configure - make - make install= .
\end{center}
Executing these three steps from the package base directory
\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
ADOL-C library with default options and install the package (libraries
and headers) into the default installation directory {\tt
  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
the subdirectory \verb=include= will contain all the installed header
files that may be included by the user program, the subdirectory
\verb=lib= will contain the 32-bit compiled library
and the subdirectory \verb=lib64= will contain the 64-bit compiled
library. Depending on the compiler only one of \verb=lib= or
\verb=lib64= may be created.
Before doing so the user may modify the header file \verb=usrparms.h=
in order to tailor the \mbox{ADOL-C} package to the needs of the
particular system environment as discussed in
\autoref{Customizing}. The configure procedure which creates the necessary
\verb=Makefile=s can be customized by use of some switches. Available
options and their meaning can be obtained by executing
\verb=./configure --help= from the package base directory.

All object files and other intermediately generated files can be
removed by the call \verb=make clean=. Uninstalling ADOL-C by
executing \verb=make uninstall= is only reasonable after a previously
called \verb=make install= and will remove all installed package files
but will leave the created directories behind.
The sparse drivers are included in the ADOL-C libraries if the
\verb=./configure= command is executed with the option
\verb=--enable-sparse=. The ColPack library is required to
compute the sparse structures, and is searched for in all the default
locations as well as in the subdirectory \verb=<ThirdParty/ColPack/>=.
In case the library and its headers are installed in a nonstandard path
this may be specified with the \verb?--with-colpack=PATH? option.
It is assumed that the library and its header files have the following
directory structure: \verb?PATH/include? contains all the header
files, \verb?PATH/lib? contains the 32-bit compiled library and
\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
the compiler used to compile {\sf ADOL-C} one of these libraries will
be used for linking.

The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default
constructor. This will improve efficiency but requires that there be no implicit array initialization in the code.
\subsection{Compiling and Linking the Example Programs}

The installation procedure described in \autoref{genlib} also
provides the \verb=Makefile=s to compile the example programs in the
directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
additional examples in
\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
one has to execute the
\verb=configure= command with appropriate options for the ADOL-C package to enable the compilation of
examples. Available options are:
\begin{center}
\begin{tabular}{ll}
\verb=--enable-docexa=&build all examples discussed in this manual\\
&(compare \autoref{example})\\
\verb=--enable-addexa=&build all additional examples\\
&(see file \verb=README= in the various subdirectories)
\end{tabular}
\end{center}
Just calling \verb=make= from the package's base directory generates
all configured examples and the library if necessary. Compiling from the
subdirectory \verb=examples= or one of its subfolders is possible
too. At least one kind of the ADOL-C library (static or shared) must
have been built previously in that case. Hence, building the library
is always the first step.

For compiling the library and the documented examples on Windows using
Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
\verb=<ADOL-C/examples/>= subdirectories.
\subsection{Description of Important Header Files}

The application of the facilities of ADOL-C requires the user
source code (program or module) to include appropriate
header files where the desired data types and routines are
prototyped. The new hierarchy of header files enables the user
to take one of two possible ways to access the right interfaces.
The first and easy way is recommended to beginners: As indicated in
\autoref{globalHeaders} the provided {\em global} header file
\verb=<adolc/adolc.h>= can be included by any user code to support all
capabilities of ADOL-C depending on the particular programming language
of the source.

\begin{table}[hbt]
\center \small
\begin{tabular}{|lp{9cm}|} \hline
\verb=<adolc/adolc.h>= &
\begin{tabular*}{9cm}{cp{8.5cm}}
  \boldmath $\rightarrow$ \unboldmath
                 & global header file available for easy use of ADOL-C; \\
  $\bullet$      & includes all ADOL-C header files depending on
                   whether the users source is C++ or C code.
\end{tabular*}
\\ \hline
\verb=<adolc/usrparms.h>= &
\begin{tabular*}{9cm}{cp{8.5cm}}
  \boldmath $\rightarrow$ \unboldmath
                 & user customization of the ADOL-C package (see
                   \autoref{Customizing}); \\
  $\bullet$      & after a change of
                   user options the ADOL-C library \verb=libadolc.*=
                   has to be rebuilt (see \autoref{genlib}); \\
  $\bullet$      & is included by all ADOL-C header files and thus by all user
                   programs.
\end{tabular*} \\ \hline
\end{tabular}
\caption{Global header files}
\label{globalHeaders}
\end{table}
The second way is meant for the more advanced ADOL-C user: Some source code
includes only those interfaces used by the particular application.
The respectively needed header files are indicated
throughout the manual.
Existing application determined dependences between the provided
ADOL-C routines are realized by automatic includes of headers in order
to maintain easy use. The header files important to the user are described
in \autoref{importantHeaders1} and \autoref{importantHeaders2}.

\begin{table}[hbt]
\center \small
\begin{tabular}{|lp{9cm}|} \hline
%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
\verb=<adolc/adouble.h>= &
\begin{tabular*}{9cm}{cp{8.5cm}}
  \boldmath $\rightarrow$ \unboldmath
                & provides the interface to the basic active
                  scalar data type of ADOL-C: {\sf class adouble}
                  (see \autoref{prepar});
%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
\end{tabular*}
\\ \hline
% \verb=<adolc/avector.h>= &
%  \boldmath $\rightarrow$ \unboldmath
%                & provides the interface to the active vector
%                  and matrix data types of ADOL-C: {\sf class adoublev}
%                  and {\sf class adoublem}, respectively
%                   (see \autoref{arrays}); \\
%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
%\\ \hline
 \verb=<adolc/taputil.h>= &
\begin{tabular*}{9cm}{cp{8.5cm}}
  \boldmath $\rightarrow$ \unboldmath
                & provides functions to start/stop the tracing of
                  active sections (see \autoref{markingActive})
                  as well as utilities to obtain
                  tape statistics (see \autoref{examiningTape}); \\
  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
\end{tabular*}
\\ \hline
\end{tabular}
\caption{Important header files: tracing/taping}
\label{importantHeaders1}
\end{table}
\center \small
%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
\verb=<adolc/interfaces.h>= &
  \boldmath $\rightarrow$ \unboldmath
                & provides interfaces to the {\sf forward} and
                  {\sf reverse} routines as basic versions of derivative
                  evaluation (see \autoref{forw_rev}); \\
  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
\\ \hline
\verb=<adolc/drivers/drivers.h>= &
  \boldmath $\rightarrow$ \unboldmath
                & provides ``easy to use'' drivers for solving
                  optimization problems and nonlinear equations
                  (see \autoref{optdrivers}); \\
  $\bullet$     & comprises C and Fortran-callable versions.
\\ \hline
\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
\end{minipage}  &
  \boldmath $\rightarrow$ \unboldmath
                & provides the ``easy to use'' sparse drivers
                  to exploit the sparsity structure of
                  Jacobians (see \autoref{sparse}); \\
  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
                  of {\sf forward} and {\sf reverse} routines
                  propagating bit patterns (see \autoref{ProBit}); \\
  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
\\ \hline
\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
\end{minipage}  &
  \boldmath $\rightarrow$ \unboldmath
                & provides interfaces to the underlying C-callable
                  versions of {\sf forward} and {\sf reverse} routines
                  propagating bit patterns.
\\ \hline
\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
\end{minipage}  &
  \boldmath $\rightarrow$ \unboldmath
                & provides ``easy to use'' drivers for the numerical
                  solution of ordinary differential equations
                  (see \autoref{odedrivers}); \\
  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
\\ \hline
\verb=<adolc/drivers/=\newline\verb= taylor.h>=
\end{minipage}  &
  \boldmath $\rightarrow$ \unboldmath
                & provides ``easy to use'' drivers for the evaluation
                  of higher order derivative tensors (see
                  \autoref{higherOrderDeriv}) and inverse/implicit function
                  differentiation (see \autoref{implicitInverse});\\
  $\bullet$     & comprises C++ and C-callable versions.
\\ \hline
\verb=<adolc/adalloc.h>= &
  \boldmath $\rightarrow$ \unboldmath
                & provides C++ and C functions for the allocation of
                  vectors, matrices and three-dimensional arrays
                  of {\sf double}s.
\\ \hline
\caption{Important header files: evaluation of derivatives}
\subsection{Compiling and Linking C/C++ Programs}

To compile a C/C++ program or a single module that uses ADOL-C
data types and routines, one has to ensure that all necessary
header files according to \autoref{ssec:DesIH} are
included. All modules involving {\em active} data types such as
{\sf adouble}
%, {\bf adoublev} and {\bf adoublem}
have to be compiled as C++. Modules that make use of a previously
generated tape to evaluate derivatives can be programmed either in ANSI-C
(avoiding all C++ interfaces) or in C++. Depending
on the chosen programming language the header files provide
the appropriate ADOL-C prototypes.
For linking the resulting object code the library \verb=libadolc.*=
must be used (see \autoref{genlib}).
\subsection{Adding Quadratures as Special Functions}

Suppose an integral
\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
is evaluated numerically by a user-supplied function

{\sf  double  myintegral(double\& x);}

Similarly, let us suppose that the integrand itself is evaluated by
a user-supplied block of C code {\sf integrand}, which computes a
variable with the fixed name {\sf val} from a variable with the fixed
name {\sf arg}. In many cases of interest, {\sf integrand} will
simply be of the form

{\sf \{ val = expression(arg) \}}\enspace .

In general, the final assignment to {\sf val} may be preceded
by several intermediate calculations, possibly involving local
active variables of type {\sf adouble}, but no external or static
variables of that type.  However, {\sf integrand} may involve local
or global variables of type {\sf double} or {\sf int}, provided they
do not depend on the value of {\sf arg}. The variables {\sf arg} and
{\sf val} are declared automatically, and since {\sf integrand} is a block
rather than a function, {\sf integrand} should have no header line.

Now the function {\sf myintegral} can be overloaded for {\sf adouble}
arguments and thus included in the library of elementary functions
by the following modifications:
At the end of the file \verb=<adouble.cpp>=, include the full code
defining \\ {\sf double myintegral(double\& x)}, and add the line

{\sf extend\_quad(myintegral, integrand); }

This macro expands to the definition of
 {\sf adouble myintegral(adouble\& arg)}.
Then remake the library \verb=libadolc.*= (see \autoref{genlib}).

In the definition of the class
{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement

{\sf friend adouble myintegral(adouble\&)}.

In the first modification, {\sf myintegral} represents the name of the
{\sf double} function, whereas {\sf integrand} represents the actual block
of C code.

For example, in case of the inverse hyperbolic cosine, we have
{\sf myintegral} = {\sf acosh}, whose derivative, i.e.\ integrand, is
$1/\sqrt{x^2-1}$. Hence {\sf integrand} can be written as
{\sf \{ val = 1.0/sqrt(arg*arg-1); \}}
so that the line

{\sf extend\_quad(acosh, val = 1.0/sqrt(arg*arg-1));}

can be added to the file \verb=<adouble.cpp>=.
A mathematically equivalent but longer representation of
{\sf integrand} is

{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
 \> \ \> {\sf  temp = temp*temp; } \\
 \> \ \> {\sf  val = 1.0/sqrt(temp-1); \}}

The code block {\sf integrand} may call on any elementary function that has already
been defined in the file \verb=<adouble.cpp>=, so that one may also introduce
iterated integrals.
\section{Example Codes}

The following listings are all simplified versions of codes that
are contained in the examples subdirectory
\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
we have left out timings, which are included in the complete codes.

\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}

The first example evaluates the gradient and the Hessian of
the function
\[
y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
\]
using the appropriate drivers {\sf gradient} and {\sf hessian}.
#include <adolc/adouble.h>         // use of active doubles and taping
#include <adolc/drivers/drivers.h> // use of "Easy to Use" drivers
                                   // gradient(.) and hessian(.)
#include <adolc/taping.h>          // use of taping

int main() {
int n,i,j;
size_t tape_stats[STAT_SIZE];
cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
cout << "number of independent variables = ?  \n";
cin >> n;
double* xp = new double[n];
double  yp = 0.0;
adouble* x = new adouble[n];
adouble  y = 1;
for(i=0;i<n;i++)
  xp[i] = (i+1.0)/(2.0+i);         // some initialization
trace_on(1);                       // tag = 1, keep = 0 by default
  for(i=0;i<n;i++) {
    x[i] <<= xp[i]; y *= x[i]; }
  y >>= yp;
  delete[] x;
trace_off();
tapestats(1,tape_stats);           // reading of tape statistics
cout<<"maxlive "<<tape_stats[2]<<"\n";
...                                // ..... print other tape stats
double* g = new double[n];
gradient(1,n,xp,g);                // gradient evaluation
double** H=(double**)malloc(n*sizeof(double*));
for(i=0;i<n;i++)
  H[i]=(double*)malloc((i+1)*sizeof(double));
hessian(1,n,xp,H);                 // H x = (n-1) g since g is
double errg = 0;                   // homogeneous of degree n-1.
double errh = 0;
for(i=0;i<n;i++)
  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
for(i=0;i<n;i++) {
  for(j=0;j<n;j++) {
    if (i>j)                       // lower half of hessian
      errh += fabs(H[i][j]-g[i]/xp[j]); } }
cout << yp-1/(1.0+n) << " error in function \n";
cout << errg <<" error in gradient \n";
cout << errh <<" consistency check \n";
}                                  // end main
\subsection{Power Example ({\tt powexam.cpp})}

The second example function evaluates the $n$-th power of a real
variable $x$ in
$\log_2 n$ multiplications by recursive halving of the exponent. Since
there is only one independent variable, the scalar derivative can be
computed by
using both {\sf forward} and {\sf reverse}, and the
results are subsequently compared.
#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces

adouble power(adouble x, int n) {
adouble z=1;
if (n>0) {                         // recursion and branches
  int nh =n/2;                     // that do not depend on
  z = power(x,nh);                 // adoubles are fine !!!!
  z *= z;
  if (2*nh != n)
    z *= x;
  return z; }                      // end if
else {
  if (n==0)                        // the local adouble z dies
    return z;                      // as it goes out of scope.
  else
    return 1/power(x,-n); }        // end else
} // end power
The function {\sf power} above was obtained from the original
undifferentiated version by simply changing the type of all
{\sf double}s, including the return variable, to {\sf adouble}s. The new version
can now be called from within any active section, as in the following
main program.
#include ...                       // as above
int main() {
int i,n,tag=1;
cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
cout<<"monomial degree=? \n";      // input the desired degree
cin >> n;
                                   // allocations and initializations
double* Y[1];
*Y = new double[n+2];
double* X[1];                      // allocate passive variables with
*X = new double[n+4];              // extra dimension for derivatives
X[0][0] = 0.5;                     // function value = 0. coefficient
X[0][1] = 1.0;                     // first derivative = 1. coefficient
for(i=0;i<n+2;i++)
  X[0][i+2]=0;                     // further coefficients
double* Z[1];                      // used for checking consistency
*Z = new double[n+2];              // between forward and reverse
adouble y,x;                       // declare active variables
                                   // beginning of active section
trace_on(1);                       // tag = 1 and keep = 0
x <<= X[0][0];                     // only one independent var
y = power(x,n);                    // actual function call
y >>= Y[0][0];                     // only one dependent adouble
trace_off();                       // no global adouble has died
                                   // end of active section
double u[1];                       // weighting vector
u[0]=1;                            // for reverse call
for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-th derivative
  if (i==0)
    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value()
    << " (should be 0)\n";
  else
    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i]
    << " (should be 0)\n";
  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylor coeff.
return 1;
}                                  // end main
Since this example has only one independent and one dependent variable,
{\sf forward} and {\sf reverse} have the same complexity and calculate
the same scalar derivatives, albeit with a slightly different scaling.
By replacing the function {\sf power} with any other univariate test function,
one can check that {\sf forward} and {\sf reverse} are at least consistent.
In the following example the number of independents is much larger
than the number of dependents, which makes the reverse mode preferable.
\subsection{Determinant Example ({\tt detexam.cpp})}

Now let us consider an exponentially expensive calculation,
namely, the evaluation of a determinant by recursive expansion
along rows. The gradient of the determinant with respect to the
matrix elements is simply the adjoint, i.e.\ the matrix of cofactors.
Hence the correctness of the numerical result is easily checked by
matrix-vector multiplication. The example illustrates the use
of {\sf adouble} arrays and pointers.
#include <adolc/adouble.h>         // use of active doubles and taping
#include <adolc/interfaces.h>      // use of basic forward/reverse
                                   // interfaces of ADOL-C
adouble** A;                       // A is an n x n matrix
int i,n;                           // k <= n is the order
adouble det(int k, int m) {        // of the sub-matrix
if (m == 0) return 1.0 ;           // its column indices
else {                             // are encoded in m
  adouble* pt = A[k-1];
  adouble t = zero;                // zero is predefined
  int s, p =1;
  if (k%2) s = 1; else s = -1;
  for(i=0;i<n;i++) {
    int p1 = 2*p;
    if (m%p1 >= p) {
      if (m == p) {
        if (s>0) t += *pt; else t -= *pt; }
      else {
        if (s>0)
          t += *pt*det(k-1,m-p);   // recursive call to det
        else
          t -= *pt*det(k-1,m-p); } // recursive call to det
      s = -s;}
    ++pt;
    p = p1;}
  return t; }
}                                  // end det
As one can see, the overloading mechanism has no problem with pointers
and looks exactly the same as the original undifferentiated function
except for the change of type from {\sf double} to {\sf adouble}.
If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
a compile-time error would have resulted. Now consider a corresponding
calling program.
#include ...                       // as above
int main() {
int i,j, m=1,tag=1,keep=1;
cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
cout << "order of matrix = ? \n";  // select matrix size
cin >> n;
A = new adouble*[n];
trace_on(tag,keep);                // tag=1=keep
  double detout=0.0, diag = 1.0;   // here keep the intermediates for
  for(i=0;i<n;i++) {               // the subsequent call to reverse
    m *=2;
    A[i] = new adouble[n];         // not needed for adoublem
    adouble* pt = A[i];
    for(j=0;j<n;j++)
      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
    diag += A[i][i].value();       // value() converts to double
    A[i][i] += 1.0; }
  det(n,m-1) >>= detout;           // actual function call
  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
trace_off();
double u[1];
u[0] = 1.0;
double* B = new double[n*n];
reverse(tag,1,n*n,0,u,B);          // gradient of det w.r.t. all entries
cout <<" \n first base? : ";
for (i=0;i<n;i++) {
  adouble sum = 0;
  for (j=0;j<n;j++)                // the matrix A times the first n
    sum += A[i][j]*B[j];           // components of the gradient B
  cout<<sum.value()<<" "; }        // must be a Cartesian basis vector
return 1;
}                                  // end main
The variable {\sf diag} should be mathematically
equal to the determinant, because the
matrix {\sf A} is defined as a rank-1 perturbation of the identity.
\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}

Here, we consider a nonlinear ordinary differential equation that
is a slight modification of the Robertson test problem
given in Hairer and Wanner's book on the numerical solution of
ODEs \cite{HW}. The following source code computes the corresponding
values of $y^{\prime} \in \R^3$:
#include <adolc/adouble.h>                  // use of active doubles and taping
#include <adolc/drivers/odedrivers.h>       // use of "Easy to Use" ODE drivers
#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities

void tracerhs(short int tag, double* py, double* pyprime) {
adouble y[3];                         // this time we left the parameters
adouble yprime[3];                    // passive and use adouble arrays
trace_on(tag);
for (int i=0; i<3; i++)
     y[i] <<= py[i];                  // initialize and mark independents
yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
yprime[2] = -yprime[0] - yprime[1];
for (int i=0; i<3; i++)
     yprime[i] >>= pyprime[i];        // mark and pass dependents
trace_off();
}                                     // end tracerhs
The Jacobian of the right-hand side has large
negative eigenvalues, which make the ODE quite stiff. We have added
some numerically benign transcendentals to make the differentiation
more interesting.
The following main program uses {\sf forode} to calculate the Taylor series
defined by the ODE at the given point $y_0$ and {\sf reverse} as well
as {\sf accode} to compute the Jacobians of the coefficient vectors
with respect to $y_0$.
#include .......                   // as above
int main() {
int i,j,deg;
int n=3;
double py[3];
double pyp[3];
cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
cout << "degree of Taylor series =?\n";
cin >> deg;
double **X;
X=(double**)malloc(n*sizeof(double*));
for(i=0;i<n;i++)
  X[i]=(double*)malloc((deg+1)*sizeof(double));
double*** Z=new double**[n];
double*** B=new double**[n];
short** nz = new short*[n];
for(i=0;i<n;i++) {
  Z[i]=new double*[n];
  B[i]=new double*[n];
  for(j=0;j<n;j++) {
    Z[i][j]=new double[deg];
    B[i][j]=new double[deg]; }     // end for
}                                  // end for
for(i=0;i<n;i++) {
  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
  X[i][0] = py[i];                 // and the Taylor coefficient;
  nz[i] = new short[n]; }          // set up sparsity array
tracerhs(1,py,pyp);                // trace RHS with tag = 1
forode(1,n,deg,X);                 // compute deg coefficients
reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
cout << "nonzero pattern:\n";
for(i=0;i<n;i++) {
  for(j=0;j<n;j++)
    cout << nz[i][j]<<"\t";
  cout <<"\n"; }                   // end for
return 1;
}                                  // end main
\noindent The pattern {\sf nz} returned by {\sf accode} is

              3  -1   4
              1   2   2
              3   2   4

The original pattern {\sf nz} returned by {\sf reverse} is the same
except that the negative entry $-1$ was zero.
%\subsection{Gaussian Elimination Example ({\tt gaussexam.cpp})}

%The following example uses conditional assignments to show the usage of a once-produced tape
%for evaluation at new arguments. The elimination is performed with
%column pivoting.

%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces

%void gausselim(int n, adoublem& A, adoublev& bv) {
%along i;                           // active integer declaration
%adoublev temp(n);                  // active vector declaration
%adouble r,rj,temps;
%int j,k;
%for(k=0;k<n;k++) {                 // elimination loop
%  i = k;
%  r = fabs(A[k][k]);               // initial pivot size
%  for(j=k+1;j<n;j++) {
%    rj = fabs(A[j][k]);
%    condassign(i,rj-r,j);          // look for a larger element in the same
%    condassign(r,rj-r,rj); }       // column with conditional assignments
%  temp = A[i];                     // switch rows using active subscripting
%  A[i] = A[k];                     // necessary even if i happens to equal
%  A[k] = temp;                     // k during taping
%  temps = bv[i];
%  bv[i]=bv[k];
%  bv[k]=temps;
%  if (!value(A[k][k]))             // passive subscripting
%    exit(1);                       // matrix singular!
%  temps= A[k][k];
%  A[k] /= temps;
%  bv[k] /= temps;
%  for(j=k+1;j<n;j++) {
%    temps= A[j][k];
%    A[j] -= temps*A[k];            // vector operations
%    bv[j] -= temps*bv[k]; }        // endfor
%}                                  // end elimination loop

%for(k=n-1;k>=0;k--)                // backsubstitution
%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];

%}                                  // end gausselim

%\noindent This function can be called from any program
%that suitably initializes
%the components of {\sf A} and {\sf bv}
%as independents. The resulting tape can be
%used to solve any nonsingular linear system of the same size and
%to get the sensitivities of the solution with respect to the
%system matrix and the right-hand side.
Parts of the ADOL-C source were developed by Andreas
Kowarz, Hristo Mitev, Sebastian Schlenkrich, and Olaf
Vogel. We are also indebted to George Corliss,
Tom Epperly, Bruce Christianson, David Gay, David Juedes,
Brad Karp, Koichi Kubota, Bob Olson, Marcela Rosemblun, Dima
Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
various ways with the development and documentation of ADOL-C.
Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha, and Alan Carle.
\newblock {\em Efficient computation of gradients and Jacobians by dynamic
  exploitation of sparsity in automatic differentiation}.
\newblock Optimization Methods and Software 7(1):1--39, 1996.

Bruce Christianson.
\newblock {\em Reverse accumulation and accurate rounding error estimates for
Taylor series}.
\newblock Optimization Methods and Software 1:81--94, 1992.

Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
  derivatives}.
\newblock SIAM Review 47(4):629--705, 2005.

Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar, and Andrea Walther.
{\em Efficient Computation of Sparse Hessians: An Experimental Study
  using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing.

\bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea
  Walther.
{\em Exploiting Sparsity in Jacobian Computation via Coloring and Automatic Differentiation:
a Case Study in a Simulated Moving Bed Process}.
In Chr. Bischof et al., eds., {\em Proceedings AD 2008 Conference}, LNCSE 64, pp. 327--338, Springer (2008).

Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen.
{\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}.
SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.

Andreas Griewank and Andrea Walther. {\em Evaluating Derivatives: Principles and Techniques of
  Algorithmic Differentiation. Second edition}. SIAM, 2008.

Andreas Griewank, Jean Utke, and Andrea Walther.
\newblock {\em Evaluating higher derivative tensors by forward propagation
          of univariate Taylor series}.
\newblock Mathematics of Computation, 69:1117--1130, 2000.

Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse
                 or Adjoint Mode of Computational Differentiation}.
                 ACM Transactions on Mathematical Software 26:19--45, 2000.

Ernst Hairer and Gerhard Wanner.
{\it Solving Ordinary Differential Equations II.\/}
Springer-Verlag, Berlin, 1991.

Donald~E. Knuth.
\newblock {\em The Art of Computer Programming. Second edition.}
\newblock Addison-Wesley, Reading, 1973.

Andrea Walther.
\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
\newblock ACM Transactions on Mathematical Software, 34(1), Article 3 (2008).