source: trunk/ADOL-C/doc/adolc-manual.tex @ 708

Last change on this file since 708 was 708, checked in by kulshres, 3 years ago

Merge branch 'master' of 'gitclone' into 'svn'

The following changes have been merged:

commit e2291bde44a282a133894b0db350aeb0b92a87db
Author: Mladen Banovic <mladenbanovic2705@…>
Date: Fri Jul 8 10:15:51 2016 +0200

Add methods getNumLiveVar and getNumDir in adtl.h, change counter type in FOR_I_EQ_0_LT_NUMDIR macro to size_t (instead of int). Update chunk size of BOOST pool in adouble_tl.cpp according to adouble::numDir.

commit 2ffb294465b973bfd4bf1f73d84478f8233c0d2f
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jun 23 12:32:14 2016 +0200

implement missing ref_eq_mult_p und ref_eq_min_p in ho_rev.c

somehow these were left out when parameters were being implemented.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 8cf0e5c1bd36f1dcf3be72cd67de631b2e1d0ee6
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jun 23 12:31:04 2016 +0200

make sure the result is the last locint written in trace for each operation

since we're trying to generate ascii traces in the future, we'll need this
convention that the last location is the result, and previous locations
are arguments. This has been the case for almost all operations anyway
except for a few new one's that I wrote without keeping this in mind.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 9ae0ff220f37463f2ed85cafc8a626c24e472f2f
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Jun 21 14:16:27 2016 +0200

on some compilers newer boost interferes with AC_FUNC_MALLOC test

so do AC_FUNC_MALLOC and AC_FUNC_REALLOC as usual and check for boost
library later.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit b746f620772cc8cce53e8f350adc6281279caf72
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon Jun 20 15:32:22 2016 +0200

make Klaus Röbenack's name UTF-8 instead of ISO-8859-1

These are the only places where we're not simple ASCII or UTF-8 already

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 1171aa3961b5eb46a5d2ee64751c02a393a8a6f5
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Jun 17 10:42:39 2016 +0200

correct short_ref document about include file

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 2c6b2aac2ef04431ece2c6ff80e574aa2e58814b
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Jun 17 10:40:34 2016 +0200

correct error message to new semantics

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 506cde73451740bf0a15eff7d4abb158ee719ab0
Author: mflehmig <martin.flehmig@…>
Date: Fri Jun 17 10:14:26 2016 +0200

Fixed include of ADOL-C header.

ADOL-C header was included in old fashion (without adolc directory) for this example.

commit 2a023d3281d3d6d9824bad724a5768e3ee2fff94
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jun 16 13:50:39 2016 +0200

Try to use boost::pool for allocating advals in traceless vector mode

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 80f1e2019ac1faab96fe06f3e9da47efcc1bcd23
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon May 23 15:13:22 2016 +0200

correct a reference in doc and rebuild

commit d7ab5283afe58bacb2e8739d72ede4e17f4c8081
Author: Mladen Banovic <mladenbanovic2705@…>
Date: Fri May 20 16:42:13 2016 +0200

Update section 7 of adolc-manual related to the Traceless forward differentiation.

commit bedb8e36f959c5272e4610fe504acc83208e5e9d
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue May 17 16:09:36 2016 +0200

macro name correction

commit 92ff596a0331776901df7f172ca347572e3daafd
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue May 17 15:56:17 2016 +0200

Add a warning about using static build of ADOL-C

static build of ADOL-C does not call constructors
for internal global objects, thereby causing
segmentation faults.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

File size: 227.8 KB
Line 
1% Latex file containing the documentation of ADOL-C
2%
3% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
4%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
5%
6% This file is part of ADOL-C. This software is provided as open source.
7% Any use, reproduction, or distribution of the software constitutes
8% recipient's acceptance of the terms of the accompanying license file.
9%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
10
11\documentclass[11pt,twoside]{article}
12\usepackage{hyperref}
13\usepackage{amsmath,amsthm,amssymb}
14\usepackage{graphicx}
15\usepackage{datetime}
16
17\newdateformat{monthyear}{\monthname\ \THEYEAR}
18
19\usepackage{color}
20
21\pagestyle{headings} 
22\bibliographystyle{plain}
23\parskip=6pt
24
25\setlength{\textwidth}{433.6pt}
26\setlength{\oddsidemargin}{23pt}
27\setlength{\evensidemargin}{23pt}
28\setlength{\topmargin}{25.0pt}
29\setlength{\textheight}{580pt}
30\setlength{\baselineskip}{8pt}
31
32\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
33\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
34\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
35
36\renewcommand{\sectionautorefname}{Section}
37\renewcommand{\subsectionautorefname}{Section}
38\renewcommand{\figureautorefname}{Figure}
39\renewcommand{\tableautorefname}{Table}
40
41\setcounter{equation}{0}
42
43\input{version.tex}
44
45%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
46\begin{document}
47
48\begin{titlepage}
49\begin{center}
50{\Large {\bf ADOL-C:}} 
51\footnote{The development of earlier versions was supported by the Office of
52  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
53  Forschungsgemeinschaft. During the development of the current
54  version Andrea Walther and Andreas Kowarz were supported by the
55  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} 
56\vspace{0.2in} \\
57%
58{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
59{\Large of Algorithms Written in C/C++}\\
60\vspace{.2in}
61{\large\bf  Version \packageversion, \monthyear\today} \\
62\bigskip
63 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
64   of Paderborn, 33098 Paderborn, Germany} and
65 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
66 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
67\end{center}
68%
69%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
70\begin{abstract}
71The C++ package ADOL-C described here facilitates the evaluation of
72first and higher derivatives of vector functions that are defined
73by computer programs written in C or C++. The resulting derivative
74evaluation routines may be called from C, C++, Fortran, or any other
75language that can be linked with C.
76
77The numerical values of derivative vectors are obtained free
78of truncation errors at a small multiple of the run time and
79random access memory required by the given function evaluation program.
80Derivative matrices are obtained by columns, by rows or in sparse format.
81For solution curves defined by ordinary differential equations,
82special routines are provided that evaluate the Taylor coefficient vectors
83and their Jacobians with respect to the current state vector.
84For explicitly or implicitly defined functions derivative tensors are
85obtained with a complexity that grows only quadratically in their
86degree. The derivative calculations involve a possibly substantial but
87always predictable amount of data. Since the data is accessed strictly sequentially
88it can be automatically paged out to external files.
89\end{abstract}
90%
91{\bf Keywords}: Computational Differentiation, Automatic
92         Differentiation,
93         Chain Rule, Overloading, Taylor Coefficients,
94         Gradients, Hessians, Forward Mode, Reverse Mode,
95         Implicit Function Differentiation, Inverse Function Differentiation
96\medskip
97
98\noindent
99{\bf Abbreviated title}: Automatic differentiation by overloading in C++
100%
101\end{titlepage}
102%
103\tableofcontents       
104%
105
106\newpage
107%
108%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
109%
110%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
111\section{Preparing a Section of C or C++ Code for Differentiation}
112\label{prepar}
113%
114\subsection{Introduction}
115%
116\setcounter{equation}{0}
117The package \mbox{ADOL-C} 
118utilizes overloading in C++, but the
119user has to know only C. The acronym stands for {\bf A}utomatic
120{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
121In contrast to source transformation approaches, overloading does not generate intermediate
122source code.
123As starting points to retrieve further information on techniques and
124application of automatic differentiation, as well as on other AD
125tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
126\verb=http://www.autodiff.org= of the AD community forms a rich source
127of further information and pointers.
128
129
130ADOL-C facilitates the simultaneous
131evaluation of arbitrarily high directional derivatives and the
132gradients of these Taylor coefficients with respect to all independent
133variables. Relative to the cost of evaluating the underlying function,
134the cost for evaluating any such scalar-vector pair grows as the
135square of the degree of the derivative but is still completely
136independent of the numbers $m$ and $n$.
137
138This manual is organized as follows. This section explains the
139modifications required to convert undifferentiated code to code that
140compiles with ADOL-C.
141\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
142evaluate arbitrarily high order derivatives. The discussion includes storage
143requirements and the tailoring of certain tape characteristics to fit specific
144user needs. Descriptions of easy-to-use drivers for a  convenient derivative
145evaluation are contained in \autoref{drivers}.
146\autoref{forw_rev_ad} offers a more mathematical characterization of
147the different modes of AD to compute derivatives. At the same time, the
148corresponding drivers of ADOL-C are explained. 
149The overloaded derivative evaluation routines using the forward and the reverse
150mode of AD are explained in \autoref{forw_rev}.
151Advanced differentiation techniques as the optimal checkpointing for
152time integrations, the exploitation of fixed point iterations, the usages
153of external differentiated functions and the differentiation of OpenMP
154parallel programs are described in \autoref{adv_ad}.
155The tapeless forward mode is presented in \autoref{tapeless}.
156\autoref{install} details the installation and
157use of the ADOL-C package. Finally, \autoref{example} 
158furnishes some example programs that incorporate the ADOL-C package to
159evaluate first and higher-order
160derivatives.  These and other examples are distributed with the ADOL-C
161source code.
162The user should simply refer to them if the more abstract and general
163descriptions of ADOL-C provided in this document do not suffice.
164%
165%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
166\subsection{Declaring Active Variables}
167%
168\label{DecActVar}
169%
170The key ingredient of automatic differentiation by overloading is the
171concept of an {\em active variable}. All variables that may be
172considered as differentiable quantities at some time
173during the program execution must be of an active
174type. ADOL-C uses one
175active scalar type, called {\sf adouble}, whose real part is of the
176standard type {\sf double}.
177Typically, one will declare the independent variables
178and all quantities that directly or indirectly depend on them as
179{\em active}. Other variables that do not depend on the independent
180variables but enter, for example, as parameters, may remain one of the
181{\em passive} types {\sf double, float}, or {\sf int}. There is no
182implicit type conversion from {\sf adouble} to any of these passive
183types; thus, {\bf failure to declare variables as active when they
184depend on other active variables will result in a compile-time error
185message}. In data flow terminology, the set of active variable names
186must contain all its successors in the dependency graph. All components
187of indexed arrays must have the same activity status.
188
189The real component of an {\sf adouble x} can be extracted as
190{\sf x.value()}. In particular,
191such explicit conversions are needed for the standard output procedure
192{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
193that first the real part of an {\sf adouble} and then the string
194``{\sf (a)}" is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath  can
195be used to assign a constant value to an {\sf adouble}.
196Naturally, {\sf adouble}s may be
197components of vectors, matrices, and other arrays, as well as
198members of structures or classes.
199
200The C++ class {\sf adouble}, its member functions, and the overloaded
201versions of all arithmetic operations, comparison operators, and
202most ANSI C functions are contained in the file \verb=adouble.cpp= and its
203header \verb=<adolc/adouble.h>=. The latter must be included for compilation
204of all program files containing {\sf adouble}s and corresponding
205operations.
206%
207%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
208\subsection{Marking Active Sections}
209\label{markingActive}
210%
211All calculations involving active variables that occur between
212the void function calls
213\begin{center}
214{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
215{\sf trace\_off(file)}
216\end{center}
217are recorded on a sequential data set called {\em tape}. Pairs of
218these function calls can appear anywhere in a C++ program, but
219they must not overlap. The nonnegative integer argument {\sf tag} identifies the
220particular tape for subsequent function or derivative evaluations.
221Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
222The optional integer arguments {\sf keep} and
223{\sf file} will be discussed in \autoref{tape}. We will refer to the
224sequence of statements executed between a particular call to
225{\sf trace\_on} and the following call to {\sf trace\_off} as an
226{\em active section} of the code. The same active section may be
227entered repeatedly, and one can successively generate several traces
228on distinct tapes by changing the value of {\sf tag}.
229Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
230the header file \verb=<adolc/taputil.h>=, which is included by the header
231\verb=<adolc/adouble.h>= automatically.
232
233Active sections may contain nested or even recursive calls to functions
234provided by the user. Naturally, their formal and actual parameters
235must have matching types. In particular, the functions must be
236compiled with their active variables declared as
237{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. 
238Variables of type {\sf adouble} may be declared outside an active section and need not
239go out of scope before the end of an active section.
240It is not necessary -- though desirable -- that free-store {\sf adouble}s
241allocated within
242an active section be deleted before its completion. The values of all
243{\sf adouble}s that exist at the beginning and end of an active section
244are automatically
245recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
246%
247%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
248\subsection{Selecting Independent and Dependent Variables}
249%
250One or more active variables that are read in or initialized to
251the values of constants or passive variables must be distinguished as
252independent variables. Other active variables that are similarly
253initialized may be considered as temporaries (e.g., a variable that
254accumulates the partial sums of a scalar product after being
255initialized to zero). In order to distinguish an active variable {\sf x} as
256independent, ADOL-C requires an assignment of the form
257\begin{center}
258{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
259\end{center}
260This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
261precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
262other values subsequently. Similarly, one or more active variables {\sf y}
263must be distinguished as dependent by an assignment of the form
264\begin{center}
265{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$ 
266\end{center}
267which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
268by any other assignment to {\sf y}. However, a dependent variable {\sf y} 
269may have been assigned other real values previously, and it could even be an
270independent variable as well.  The derivative values calculated after
271the
272completion of an active section always represent {\bf derivatives of the final
273values of the dependent variables with respect to the initial values of the
274independent variables}.
275
276The order in which the independent and dependent variables are marked
277by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
278derivative evaluations. However, these variables do not have to be
279combined into contiguous vectors. ADOL-C counts the number of
280independent and dependent variable specifications within each active
281section and records them in the header of the tape.
282%
283%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
284\subsection{A Subprogram as an Active Section} 
285%
286As a generic example let us consider a C(++) function of the form
287shown in \autoref{code1}.
288%
289\begin{figure}[hbt]
290\framebox[\textwidth]{\parbox{\textwidth}{
291\begin{center}
292\begin{tabbing}
293{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
294\>{\sf  double *x,} \> // independent variable vector \\
295\>{\sf  double *y,} \> // dependent variable vector  \\ 
296\> {\sf int *k, } \> // integer parameters \\ 
297\>{\sf  double *z)}  \> // real parameters \\
298{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
299\>{\sf double t = 0;}  \> // local variable declaration \\
300\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
301\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
302\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
303\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
304{\sf  \} } \>  \> // end of function
305\end{tabbing}
306\end{center}
307}} 
308\caption{Generic example of a subprogram to be activated}
309\label{code1}
310\end{figure}
311%
312
313If {\sf eval} is to be called from within an active C(++)
314section with {\sf x}
315and {\sf y} as vectors of {\sf adouble}s and the other parameters
316passive, then one merely has to change the type declarations of all
317variables that depend on {\sf x} from {\sf double} or {\sf float} to
318{\sf adouble}. Subsequently, the subprogram must be compiled with the
319header file \verb=<adolc/adouble.h>= included as described
320in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
321still to be called with integer and real arguments, possibly from
322a program written in Fortran77, which  does not allow overloading.
323
324To automatically compute derivatives of the dependent
325variables {\sf y} with respect to the independent variables {\sf x}, we
326can make the body of the function into an active section. For
327example, we may modify the previous program segment
328as in \autoref{adolcexam}.
329The renaming and doubling up of the original independent and dependent
330variable vectors by active counterparts may seem at first a bit clumsy.
331However, this transformation has the advantage that the calling
332sequence and the computational part, i.e., where the function is
333really evaluated, of {\sf eval} remain completely
334unaltered. If the temporary variable {\sf t} had remained a {\sf double},
335the code would not compile, because of a type conflict in the assignment
336following the declaration. More detailed example codes are listed in
337\autoref{example}.
338
339\begin{figure}[htb]
340\framebox[\textwidth]{\parbox{\textwidth}{
341\begin{center}
342\begin{tabbing}
343{\sf void eval(} \= {\sf  int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\
344\> {\sf double *px,} \> // independent passive variable vector \\
345\> {\sf double *py,} \> // dependent passive variable vector  \\ 
346\> {\sf int *k,}  \> // integer parameters \\
347\> {\sf double *z)} \> // parameter vector \\
348{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
349\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
350\>{\sf trace\_on(tag);} \> // start tracing  \\
351\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
352\>{\sf x = new adouble[n];}\>// declare active independent variables \\ 
353\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
354\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
355\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
356\>{\sf adouble t = 0;}  \> // local variable declaration \\
357     \>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
358     \>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
359     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
360     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
361     \>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
362     \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
363     \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
364     \>{\sf  delete[] y;} \>// delete dependent active variables \\
365     \>{\sf  delete[] x;} \>// delete independent active variables \\
366     \>{\sf trace\_off();} \> // complete tape \\
367{\sf  \}}   \>\> // end of function
368\end{tabbing} 
369\end{center}}}
370\caption{Activated version of the code listed in \autoref{code1}}
371\label{adolcexam}
372\end{figure}
373%
374%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
375\subsection{Overloaded Operators and Functions}
376\label{OverOper}
377%
378As in the subprogram discussed above, the actual computational
379statements of a C(++) code need not be altered for the purposes of
380automatic differentiation. All arithmetic operations, as well as the
381comparison and assignment operators, are overloaded, so any or all of
382their operands can be an active variable. An {\sf adouble x} occurring
383in a comparison operator is effectively replaced by its real value
384{\sf x.value()}. Most functions contained in the ANSI C standard for
385the math library are overloaded for active arguments. The only
386exceptions are the non-differentiable functions {\sf fmod} and
387{\sf modf}. Otherwise, legitimate C code in active sections can remain
388completely unchanged, provided the direct output of active variables
389is avoided. The rest of this subsection may be skipped by first time
390users who are not worried about marginal issues of differentiability
391and efficiency.
392
393The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
394properly differentiable at the origin, which raises the question of
395how this exception ought to be handled. Fortunately, one can easily
396see that {\sf fabs(x)} and all its compositions with smooth
397functions are still directionally differentiable. These
398directional derivatives of arbitrary order can be propagated in the
399forward mode without any ambiguity. In other words, the forward mode as
400implemented in ADOL-C  computes Gateaux derivatives
401in certain directions, which reduce to Fr\'echet derivatives only
402if the dependence on the direction is linear. Otherwise,
403the directional derivatives are merely positively homogeneous with
404respect to the scaling of the directions.
405For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
406the origin somewhat arbitrarily to zero.
407
408We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
409arguments, so that function and derivative values are obtained consistent
410with those of {\sf fabs} according to the identities
411\[
412 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
413 \max(a,b) = [a+b+|a-b|]/2 \quad .
414\]
415These relations cannot hold if either $a$ or $b$ is infinite, in which
416case {\sf fmin} or {\sf fmax} and their derivatives may still be well
417defined. It should be noted that the directional differentiation of
418{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
419the corresponding assignment based on the sign of $a-b$. For example,
420the statement
421\begin{center}
422 {\sf if (a $<$ b) c = a; else c = b;}
423\end{center} 
424yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
425the incorrect directional derivative value
426{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
427{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
428should be avoided by use of the function $\sf fmin(a,b)$. There
429are also versions of {\sf fmin} and {\sf fmax} for two passive
430arguments and mixed passive/active arguments are handled by
431implicit conversion.
432On the function class obtained by composing the modulus with real
433analytic functions, the concept of directional differentiation can be
434extended to the propagation of unique one-sided Taylor expansions.
435The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded
436on the tape.
437
438The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
439functions have infinite slopes at the boundary points of their domains.
440At these marginal points the derivatives are set by ADOL-C to
441either {\sf $\pm$InfVal}, 0
442or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
443parameters, see \autoref{Customizing}.
444On IEEE machines {\sf InfVal} can be set to the special value
445{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
446For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ 
447of {\sf b}~=~{\sf sqrt(a)} is set to
448\[
449{\sf b}^\prime = \left\{
450\begin{array}{ll}
451\sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0  \\
4520&\mbox{if}\;\;{\sf a}^\prime =0 \\
453\sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\
454\end{array} \right\enspace .
455\]
456In other words, we consider {\sf a} and
457consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
458all computed Taylor coefficients are zero.
459
460The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
461it is defined for the corresponding {\sf double} arguments. If {\sf x} is
462negative, however, the partial derivative with respect to an integral exponent
463is set to zero.
464%Similarly, the partial of {\bf pow} with respect to both arguments
465%is set to zero at the origin, where both arguments vanish.     
466The derivatives of the step functions
467{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
468arguments {\sf x}. The result values of the step functions
469are recorded on the tape and can later be checked to recognize
470whether a step to another level was taken during a forward sweep
471at different arguments than at taping time.
472
473Some C implementations supply other special
474functions, in particular the error function {\sf erf(x)}. For the
475latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which
476has been commented out for systems on which the {\sf double} valued version
477is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
478postfix) are available for {\sf adouble}s.
479%
480% XXX: Vector and matrix class have to be reimplemented !!!
481%
482% and also the
483%active subscripts described in the \autoref{act_subscr}.
484Ambiguous statements like {\sf a += a++;} must be
485avoided because the compiler may sequence the evaluation of the
486overloaded
487expression differently from the original in terms of {\sf double}s.
488
489As we have indicated above, all subroutines called with active arguments
490must be modified or suitably overloaded. The simplest procedure is
491to declare the local variables of the function as active so that
492their internal calculations are also recorded on the tape.
493Unfortunately, this approach is likely to be unnecessarily inefficient
494and inaccurate if the original subroutine evaluates a special function
495that is defined as the solution of a particular mathematical problem.
496The most important examples are implicit functions, quadratures,
497and solutions of ordinary differential equations. Often
498the numerical methods for evaluating such special functions are
499elaborate, and their internal workings are not at all differentiable in
500the data. Rather than differentiating through such an adaptive
501procedure, one can obtain first and higher derivatives directly from
502the mathematical definition of the special function. Currently this
503direct approach has been implemented only for user-supplied quadratures
504as described in \autoref{quadrat}.
505%
506%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
507\subsection{Reusing the Tape for Arbitrary Input Values}
508\label{reuse_tape}
509%
510In some situations it may be desirable to calculate the value and
511derivatives of a function at arbitrary arguments by using a tape of
512the function evaluation at one argument and reevaluating the
513function  and its derivatives using the given ADOL-C
514routines. This approach can
515significantly reduce run times, and it
516also allows to port problem functions, in the form of the 
517corresponding tape files, into a computing environment that
518does not support C++ but does support C or Fortran. 
519Therefore, the routines provided by ADOL-C for the evaluation of derivatives
520can be used to at arguments $x$ other than the
521point at which the tape was generated, provided there are
522no user defined quadratures and all comparisons involving
523{\sf adouble}s yield the same result. The last condition
524implies that the control flow is unaltered by the change
525of the independent variable values. Therefore, this sufficient
526condition is tested by ADOL-C and if it is not met
527the ADOL-C routine called for derivative calculations indicates this
528contingency through its return value. Currently, there are six return values,
529see \autoref{retvalues}.
530\begin{table}[h]
531\center\small
532\begin{tabular}{|r|l|}\hline
533 +3 &
534\begin{minipage}{12.5cm}
535\vspace*{1ex}
536The function is locally analytic.
537\vspace*{1ex}
538\end{minipage} \\ \hline
539 +2 &
540\begin{minipage}{12.5cm}
541\vspace*{1ex}
542The function is locally analytic but the sparsity
543structure (compared to the situation at the  taping point)
544may have changed, e.g. while at taping arguments
545{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
546the argument currently used.
547\vspace*{1ex}
548\end{minipage} \\ \hline
549 +1 &
550\begin{minipage}{12.5cm}
551\vspace*{1ex}
552At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
553is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
554Lipschitz-continuous but possibly non-differentiable.
555\vspace*{1ex}
556\end{minipage} \\ \hline
557 0 &
558\begin{minipage}{12.5cm}
559\vspace*{1ex}
560Some arithmetic comparison involving {\sf adouble}s yields a tie.
561Hence, the function to be differentiated  may be discontinuous.
562\vspace*{1ex}
563\end{minipage} \\ \hline
564 -1 &
565\begin{minipage}{12.5cm}
566\vspace*{1ex}
567An {\sf adouble} comparison yields different results
568from the evaluation point at which the tape was generated.
569\vspace*{1ex}
570\end{minipage} \\ \hline
571 -2 &
572\begin{minipage}{12.5cm}
573\vspace*{1ex}
574The argument of a user-defined quadrature has changed
575from the evaluation point at which the tape was generated.
576\vspace*{1ex}
577\end{minipage} \\ \hline
578\end{tabular}
579\caption{Description of return values}
580\label{retvalues}
581\end{table}                           
582
583\begin{figure}[h]
584\centering\includegraphics[width=10.0cm]{tap_point}
585\caption{Return values around the taping point}
586\label{fi:tap_point}
587\end{figure}         
588
589In \autoref{fi:tap_point} these return values are illustrated.
590If the user finds the return value of an ADOL-C routine to be negative the
591taping process simply has to be repeated by executing the active section again.
592The crux of the problem lies in the fact that the tape records only
593the operations that are executed during one particular evaluation of the
594function.
595It also has no way to evaluate integrals since the corresponding
596quadratures are never recorded on the tape.
597Therefore, when there are user-defined quadratures the retaping is necessary at each
598new point. If there are only branches conditioned on {\sf adouble}
599comparisons one may hope that re-taping becomes unnecessary when
600the points settle down in some small neighborhood, as one would
601expect for example in an iterative equation solver.
602%
603%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
604\subsection{Conditional Assignments}
605\label{condassign}
606%
607It appears unsatisfactory that, for example, a simple table lookup
608of some physical property forces the re-recording of a possibly
609much larger calculation. However, the basic philosophy of ADOL-C
610is to overload arithmetic, rather than to generate a new program
611with jumps between ``instructions'', which would destroy the
612strictly sequential tape access and
613require the infusion of substantial compiler technology.
614Therefore, we introduce the two constructs of conditional
615assignments and active integers as partial remedies to the
616branching problem.
617
618In many cases, the functionality of branches
619can be replaced by conditional assignments. 
620For this purpose, we provide a special function called
621{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
622syntax of the conditional assignment
623\begin{center}
624    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} 
625\end{center}
626which C++ inherited from C. However, here the arguments are restricted to be
627active or passive scalar arguments, and all expression arguments
628are evaluated before the test on {\sf  b}, which is different from
629the usual conditional assignment or the code segment.
630
631Suppose the original program contains the code segment
632\begin{center}
633{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\
634\end{center}
635Here, only one of the expressions (or, more generally, program blocks)
636{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
637for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
638may first execute both branches and then pick either {\sf c}
639or {\sf d} using
640{\sf condassign(a,b,c,d)}. To maintain
641consistency with the original code, one has to make sure
642that the two branches do not have any side effects that can
643interfere with each other or may be important for subsequent
644calculations. Furthermore the test parameter {\sf b} has to be an
645{\sf adouble} or an {\sf adouble} expression. Otherwise the
646test condition {\sf b} is recorded on the tape as a {\em constant} with its
647run time value. Thus the original dependency of {\sf b} on
648active variables gets lost, for instance if {\sf b} is a comparison
649expression, see \autoref{OverOper}.
650If there is no {\sf else} part in a conditional assignment, one may call
651the three argument version
652{\sf condassign(a,b,c)}, which
653is logically equivalent to {\sf condassign(a,b,c,a)} in that
654nothing happens if {\sf b} is non-positive. 
655The header file \verb=<adolc/adouble.h>=
656contains also corresponding definitions of
657{\sf condassign(a,b,c,d)} 
658and {\sf condassign(a,b,c)} for
659passive {\sf double} arguments so that the modified code
660without any differentiation can be tested
661for correctness.
662
663A generalization of this concept for more than two branches, e.g.,
664akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling
665\textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on
666elements of an \texttt{advector} with active indices.
667%
668%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
669\subsection{Step-by-Step Modification Procedure}
670%
671To prepare a section of given C or C++ code for automatic
672differentiation as described above, one applies the following step-by-step procedure.
673\begin{enumerate}
674\item
675Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
676and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
677beginning and end of the active section.
678\item 
679Select the set of active variables, and change their type from
680{\sf double} or {\sf float} to {\sf adouble}.
681\item
682Select a sequence of independent variables, and initialize them with
683\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
684\item
685Select a sequence of dependent variables among the active variables,
686and pass their final values to passive variable or vectors thereof
687by \boldmath $\gg=$ \unboldmath assignments.
688\item 
689Compile the codes after including the header file \verb=<adolc/adouble.h>=.
690\end{enumerate}
691Typically, the first compilation will detect several type conflicts
692-- usually attempts to convert from active to passive
693variables or to perform standard I/O of active variables.
694Since all standard
695C programs can be activated by a mechanical application of the
696procedure above, the following section is of importance
697only to advanced users.
698%                                                                 
699%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
700\section{Numbering the Tapes and Controlling the Buffer}
701\label{tape}
702%
703The trace generated by the execution of an active section may stay
704within a triplet of internal arrays or it may be written out
705to three corresponding files. We will refer to these triplets as the
706tape array or tape file, in general tape, which may subsequently be
707used to evaluate the
708underlying function and its derivatives at the original point or at
709alternative arguments. If the active section involves user-defined
710quadratures it must be executed and
711re-taped at each new argument. Similarly, if conditions on
712{\sf adouble} values lead to a different program branch being taken at
713a new argument the evaluation process also needs to be re-taped at the
714new point. Otherwise, direct evaluation from
715the tape by the routine {\sf function} (\autoref{optdrivers}) is
716likely to be
717faster. The use of quadratures and the results of all comparisons on
718{\sf adouble}s are recorded on the tape so that {\sf function} and other
719forward routines stop and  return appropriate flags if their use without
720prior re-taping is unsafe. To avoid any re-taping certain types of
721branches can be recorded on the tape through
722the use of conditional assignments 
723described before in \autoref{condassign}.
724
725Several tapes may be generated and kept simultaneously.
726A tape array is used as a triplet of buffers or a tape file is generated if
727the length of any of the buffers exceeds the maximal array lengths of
728{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
729defined in the header file \verb=<adolc/usrparms.h>=
730and may be adjusted by the user in the header file before compiling
731the ADOL-C library, or on runtime using a file named \verb=.adolcrc=.
732Lines in this file must have the form
733\begin{verbatim}
734"VARIABLE" = "VALUE"
735\end{verbatim}
736where the quotation marks are mandatory.
737The filesystem folder, where the tapes files may be written to disk,
738can be changed by changing the definition of {\sf TAPE\_DIR} in
739the header file \verb=<adolc/dvlparms.h>= before
740compiling the ADOL-C library, or on runtime by defining {\sf
741  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined
742to be the present working directory (\verb=.=).
743
744For simple usage, {\sf trace\_on} may be called with only the tape
745{\sf tag} as argument, and {\sf trace\_off} may be called
746without argument. The optional integer argument {\sf keep} of
747{\sf trace\_on} determines whether the numerical values of all
748active variables are recorded in a buffered temporary array or file
749called the taylor stack.
750This option takes effect if
751{\sf keep} = 1 and prepares the scene for an immediately following
752gradient evaluation by a call to a routine implementing the reverse mode
753as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A
754file is used instead of an array if the size exceeds the maximal array
755length of {\sf TBUFSIZE} defined in \verb=<adolc/usrparms.h>= and may
756be adjusted in the same way like the other buffer sizes mentioned above.
757Alternatively, gradients may be evaluated by a call
758to {\sf gradient}, which includes a preparatory forward sweep
759for the creation of the temporary file. If omitted, the argument
760{\sf  keep} defaults to 0, so that no temporary
761taylor stack file is generated.
762
763By setting the optional integer argument {\sf file} of
764{\sf  trace\_off} to 1, the user may force a numbered  tape
765file to be written even if the tape array (buffer) does not overflow.
766If the argument {\sf file} is omitted, it
767defaults to 0, so that the tape array is written onto a tape file only
768if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
769
770After the execution of an active section, if a tape file was generated, i.e.,
771if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
772argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
773saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
774the current working directory) under filenames formed by
775the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
776  ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf
777  ADOLC\_TAYLORS\_NAME} defined in
778the header file \verb=<adolc/dvlparms.h>= appended with the number
779given as the {\sf tag} argument to {\sf trace\_on} and have the
780extension {\sf .tap}.
781
782 Later, all problem-independent routines
783like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
784expect as first argument a {\sf tag} to determine
785the tape on which their respective computational task is to be performed.
786By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
787several tapes for various function evaluations and subsequently perform
788function and derivative evaluations on one or more of them.
789
790For example, suppose one wishes to calculate for two smooth functions
791$f_1(x)$ and $f_2(x)$ 
792\[
793   f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x),
794\]
795and possibly higher derivatives where the two functions do not tie.
796Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
797one can generate two different tapes by calling {\sf trace\_on} with
798{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
799sections.
800Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
801current argument and then evaluate the gradient $\nabla f(x)$ by calling
802{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
803{\sf tag} = 2.
804%
805%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
806\subsection{Examining the Tape and Predicting Storage Requirements }
807\label{examiningTape}
808%
809At any point in the program, one may call the routine
810\begin{center}
811{\sf void tapestats(unsigned short tag, size\_t* counts)}
812\end{center}
813with {\sf counts} beeing an array of at least eleven integers.
814The first argument {\sf tag} specifies the particular tape of
815interest. The components of {\sf counts} represent
816\[
817\begin{tabular}{ll}
818{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
819{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ 
820{\sf counts[2]}: & the maximal number of live active variables,\\
821{\sf counts[3]}: & the size of taylor stack (number of overwrites),\\
822{\sf counts[4]}: & the buffer size (a multiple of eight),
823\end{tabular}
824\]
825\[
826\begin{tabular}{ll}
827{\sf counts[5]}: & the total number of operations recorded,\\
828{\sf counts[6-13]}: & other internal information about the tape.
829\end{tabular}
830\]
831The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} 
832determine the temporary
833storage requirements during calls to the routines
834implementing the forward and the reverse mode.
835For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
836forward mode involves apart from the tape buffers an array of
837 $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
838core and, in addition, a sequential data set called the value stack
839of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
840option {\sf keep} $>$ 0. Here
841the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage
842requirement for the sequential data set, which stays in core if
843its length is less than {\sf TBUFSIZE} bytes and is otherwise written
844out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=.
845The drawback of the economical
846{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
847yield gradients and other adjoint vectors only in single-precision
848accuracy. This may be acceptable if the adjoint vectors
849represent rows of a Jacobian that is  used for the calculation of
850Newton steps. In its scalar version, the reverse mode implementation involves
851the same number of {\sf double}s and twice as many {\sf revreal}s as the
852forward mode implementation.
853The storage requirements of the vector versions of the forward mode and
854reverse mode implementation are equal to that of the scalar versions multiplied by
855the vector length.
856%
857%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
858\subsection{Customizing ADOL-C}
859\label{Customizing}
860%
861Based on the information provided by the routine {\sf tapestats}, the user may alter the
862following types and constant dimensions in the header file \verb=<adolc/usrparms.h>=
863to suit his problem and environment.
864
865\begin{description}
866\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integer determines the length of
867in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all
868required data, any file access is avoided unless {\sf trace\_off}
869is called with a positive argument. This desirable situation can
870be achieved for many problem functions with an execution trace of moderate
871size. Primarily these values occur as an argument
872to {\sf malloc}, so that setting it unnecessarily large may have no
873ill effects, unless the operating system prohibits or penalizes large
874array allocations. It is however recommended to leave the values in
875\texttt{<adolc/usrparms.h>} unchanged and set them using the
876\texttt{.adolcrc} file in the current working directory at runtime.
877
878\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
879in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288).
880
881\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
882
883\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
884
885\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
886
887\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
888sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
889of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
890the default setting produces the standard {\sf Inf}. On non-IEEE machines
891change these values to produce a small {\sf InfVal} value and compare
892the results of two forward sweeps with different {\sf InfVal} settings
893to detect a ``vertical'' slope.
894
895\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
896
897\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
898sets the mathematically
899undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
900of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
901the default setting produces the standard {\sf NaN}. On non-IEEE machines
902change these values to produce a small {\sf NoNum} value and compare
903the results of two forward sweeps with different {\sf NoNum} settings
904to detect the occurrence of undefined derivative values.
905
906\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
907
908\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
909
910\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
911\end{description}
912
913The following types and options may be set using the command-line
914options of the \texttt{./configure} script.
915
916\begin{description}
917\item[{\sf locint}{\rm :}] The range of the integer type
918{\sf locint} determines how many {\sf adouble}s can be simultaneously
919alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
920alive at any one time, the type {\sf locint} must be changed to
921 {\sf unsigned long}. This can be done by passing
922 \texttt{--enable-ulong} to \texttt{./configure}.
923
924\item[{\sf revreal}{\rm :}] The choice of this floating-point type
925trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions
926and their derivatives are always evaluated in double precision
927during forward sweeps, gradients and other adjoint vectors are obtained
928with the precision determined by the type {\sf revreal}. The less
929accurate choice {\sf revreal} = {\sf float} nearly halves the
930storage requirement during reverse sweeps. This can be done by passing
931\texttt{--disable-double} to \texttt{./configure}.
932
933\item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse
934  hyperbolic functions and the error function are enabled (default:
935  undefined) by passing \texttt{--enable-atrig-erf}
936  to \texttt{./configure}
937
938\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
939  used by ADOL-C. {\sf Malloc} will be used if this variable is
940  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
941  result caused by uninitialized memory. It can be set undefined by
942  passing \texttt{--disable-use-calloc} to \texttt{./configure}.
943
944\item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines
945  required for automatic branch selection (default: disabled). The
946  boolean valued comparison operators with two \texttt{adouble} type
947  arguments will not return boolean values anymore and may not be used
948  in branch control statements (\texttt{if}, \texttt{while}, \texttt{for}
949  etc.). Instead conditional assignments using \texttt{condassign} or
950  selection operations on elements of \texttt{advector} type should be
951  used. Enabling this option and rewriting the function evaluation
952  using \texttt{condassign} or selections of \texttt{advector}
953  elements will prevent the need for retracing the function at branch
954  switches. This can be enabled by passing
955  \texttt{--enable-advanced-branching} to \texttt{./configure}.
956\end{description}
957%
958%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
959\subsection{Warnings and Suggestions for Improved Efficiency}
960\label{WarSug}
961%
962Since the type {\sf adouble} has a nontrivial constructor,
963the mere declaration of large {\sf adouble} arrays may take up
964considerable run time. The user should be warned against
965the usual Fortran practice of declaring fixed-size arrays
966that can accommodate the largest possible case of an evaluation program
967with variable dimensions. If such programs are converted to or written
968in C, the overloading in combination with ADOL-C will lead to very
969large run time increases for comparatively small values of the
970problem dimension, because the actual computation is completely
971dominated by the construction of the large {\sf adouble} arrays.
972The user is advised to
973create dynamic arrays of
974{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
975using {\sf delete}. For storage efficiency it is desirable that
976dynamic objects are created and destroyed in a last-in-first-out
977fashion.
978
979Whenever an {\sf adouble} is declared, the constructor for the type
980{\sf adouble} assigns it a nominal address, which we will refer to as
981its  {\em location}.  The location is of the type {\sf locint} defined
982in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy
983a range of contiguous locations. As long as the program execution
984never involves more than 65$\,$536 active variables, the type {\sf locint}
985may be defined as {\sf unsigned short}. Otherwise, the range may be
986extended by defining {\sf locint} as {\sf (unsigned) int} or
987{\sf (unsigned) long}, which may nearly double
988the overall mass storage requirement. Sometimes one can avoid exceeding
989the accessible range of {\sf unsigned short}s by using more local variables and deleting
990{\sf adouble}s  created by the new operator in a
991last-in-first-out
992fashion.  When memory for {\sf adouble}s is requested through a call to
993{\sf malloc()} or other related C memory-allocating
994functions, the storage for these {\sf adouble}s is allocated; however, the
995C++ {\sf adouble} constructor is never called.  The newly defined
996{\sf adouble}s are never assigned a location and are not counted in
997the stack of live variables. Thus, any results depending upon these
998pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
999  malloc() and related C memory-allocating
1000functions when declaring adoubles (see the following paragraph).}
1001%
1002% XXX: Vector and matrix class have to be reimplemented !!!
1003%
1004%The same point applies, of course,
1005% for active vectors.
1006
1007When an {\sf adouble}
1008%
1009% XXX: Vector and matrix class have to be reimplemented !!!
1010%
1011% or {\bf adoublev}
1012goes out of
1013scope or is explicitly deleted, the destructor notices that its
1014location(s) may be
1015freed for subsequent (nominal) reallocation. In general, this is not done
1016immediately but is delayed until the locations to be deallocated form a
1017contiguous tail of all locations currently being used. 
1018
1019 As a consequence of this allocation scheme, the currently
1020alive {\sf adouble} locations always form a contiguous range of integers
1021that grows and shrinks like a stack. Newly declared {\sf adouble}s are
1022placed on the top so that vectors of {\sf adouble}s obtain a contiguous
1023range of locations. While the C++ compiler can be expected to construct
1024and destruct automatic variables in a last-in-first-out fashion, the
1025user may upset this desirable pattern by deleting free-store {\sf adouble}s
1026too early or too late. Then the {\sf adouble} stack may grow
1027unnecessarily, but the numerical results will still be
1028correct, unless an exception occurs because the range of {\sf locint}
1029is exceeded. In general, free-store {\sf adouble}s
1030%
1031% XXX: Vector and matrix class have to be reimplemented !!!
1032%
1033%and {\bf adoublev}s
1034should be deleted in a last-in-first-out fashion toward the end of
1035the program block in which they were created.
1036When this pattern is maintained, the maximum number of
1037{\sf adouble}s alive and, as a consequence, the
1038randomly accessed storage space
1039of the derivative evaluation routines is bounded by a
1040small multiple of the memory used in the relevant section of the
1041original program. Failure to delete dynamically allocated {\sf adouble}s
1042may cause that the  maximal number of {\sf adouble}s alive at one time will be exceeded
1043if the same active section is called repeatedly. The same effect
1044occurs if static {\sf adouble}s are used.
1045
1046To avoid the storage and manipulation of structurally
1047trivial derivative values, one should pay careful attention to
1048the naming of variables. Ideally, the intermediate
1049values generated during the evaluation of a vector function
1050should be assigned to program variables that are
1051consistently either active or passive, in that all their values
1052either are or are not dependent on the independent variables
1053in a nontrivial way. For example, this rule is violated if a temporary
1054variable is successively used to accumulate inner products involving
1055first only passive and later active arrays. Then the first inner
1056product and all its successors in the data dependency graph become
1057artificially active and the derivative evaluation routines
1058described later will waste
1059time allocating and propagating
1060trivial or useless derivatives. Sometimes even values that do
1061depend on the independent variables may be of only transitory
1062importance and may not affect the dependent variables. For example,
1063this is true for multipliers that are used to scale linear
1064equations, but whose values do not influence the dependent
1065variables in a mathematical sense. Such dead-end variables
1066can be deactivated by the use of the {\sf value} function, which
1067converts {\sf adouble}s to {\sf double}s. The deleterious effects
1068of unnecessary activity are partly alleviated by run time
1069activity flags in the derivative routine
1070{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1071
1072The {\sf adouble} default constructor sets to zero the associated value.
1073This implies a certain overhead that may seem unnecessary when no initial value
1074is actually given, however,
1075the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior.
1076An array instantiation such as
1077\begin{center}
1078\sf double x[3]=\{2.0\};
1079\end{center}
1080will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements
1081{\sf x[1]} and {\sf x[2]}  to {\sf 0.0}. According to the C++ standard the array element  construction of
1082the type changed instantiation
1083\begin{center}
1084\sf adouble x[3]=\{2.0\};
1085\end{center}
1086will use the constructor {\sf adouble(const double\&);} for {\sf x[0]} passing in {\sf 2.0} but
1087will call the {\sf adouble} default constructor {\sf x[1]} and {\sf x[2]} leaving these array
1088elements uninitialized {\em unless} the default constructor does implement the initialization to
1089zero.
1090The C++ constructor syntax does not provide a means to  distinguish this implicit initialization from the declaration of any simple uninitialized variable.
1091If the user can ascertain the absence of array instantiations such as the above then one can 
1092configure ADOL-C with the \verb=--disable-stdczero= option , see \autoref{genlib}, to
1093avoid the overhead of these initializations. 
1094 
1095%
1096%
1097%
1098%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1099\section{Easy-To-Use Drivers}
1100\label{drivers}
1101%
1102For the convenience of the user, ADOL-C provides several
1103easy-to-use drivers that compute the most frequently required
1104derivative objects. Throughout, we assume that after the execution of an
1105active section, the corresponding tape with the identifier {\sf tag}
1106contains a detailed record of the computational process by which the
1107final values $y$ of the dependent variables were obtained from the
1108values $x$ of the independent variables. We will denote this functional
1109relation between the input variables $x$ and the output variables $y$ by
1110\[
1111F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
1112\]
1113The return value of all drivers presented in this section
1114indicate the validity of the tape as explained in \autoref{reuse_tape}.
1115The presented drivers are all C functions and therefore can be used within
1116C and C++ programs. Some Fortran-callable companions can be found
1117in the appropriate header files.
1118%
1119%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1120\subsection{Drivers for Optimization and Nonlinear Equations}
1121%
1122\label{optdrivers}
1123%
1124The drivers provided for solving optimization problems and nonlinear
1125equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1126which is included automatically by the global header file \verb=<adolc/adolc.h>=
1127(see \autoref{ssec:DesIH}).
1128
1129The routine {\sf function} allows to evaluate the desired function from
1130the tape instead of executing the corresponding source code:
1131%
1132\begin{tabbing}
1133\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1134\>{\sf int function(tag,m,n,x,y)}\\
1135\>{\sf short int tag;}         \> // tape identification \\
1136\>{\sf int m;}                 \> // number of dependent variables $m$\\
1137\>{\sf int n;}                 \> // number of independent variables $n$\\
1138\>{\sf double x[n];}           \> // independent vector $x$ \\
1139\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1140\end{tabbing}
1141%
1142If the original evaluation program is available this double version
1143should be used to compute the function value in order to avoid the
1144interpretative overhead. 
1145
1146For the calculation of whole derivative vectors and matrices up to order
11472 there are the following procedures:
1148%
1149\begin{tabbing}
1150\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1151\>{\sf int gradient(tag,n,x,g)}\\
1152\>{\sf short int tag;}         \> // tape identification \\
1153\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1154\>{\sf double x[n];}           \> // independent vector $x$ \\
1155\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1156\end{tabbing}
1157%
1158\begin{tabbing}
1159\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1160\>{\sf int jacobian(tag,m,n,x,J)}\\
1161\>{\sf short int tag;}         \> // tape identification \\
1162\>{\sf int m;}                 \> // number of dependent variables $m$\\
1163\>{\sf int n;}                 \> // number of independent variables $n$\\
1164\>{\sf double x[n];}           \> // independent vector $x$ \\
1165\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1166\end{tabbing}
1167%
1168\begin{tabbing}
1169\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1170\>{\sf int hessian(tag,n,x,H)}\\
1171\>{\sf short int tag;}         \> // tape identification \\
1172\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1173\>{\sf double x[n];}           \> // independent vector $x$ \\
1174\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1175\end{tabbing}
1176%
1177The driver routine {\sf hessian} computes only the lower half of
1178$\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ 
1179of {\sf H} allocated as a square array remain untouched during the call
1180of {\sf hessian}. Hence only $i+1$ {\sf double}s  need to be
1181allocated starting at the position {\sf H[i]}.
1182
1183To use the full capability of automatic differentiation when the
1184product of derivatives with certain weight vectors or directions are needed, ADOL-C offers
1185the following four drivers: 
1186%
1187\begin{tabbing}
1188\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1189\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1190\>{\sf short int tag;}         \> // tape identification \\
1191\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1192\>{\sf int n;}                 \> // number of independent variables $n$\\
1193\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1194\>{\sf double x[n];}           \> // independent vector $x$ \\
1195\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1196\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1197\end{tabbing}
1198If a nonzero value of the parameter {\sf repeat} indicates that the
1199routine {\sf vec\_jac} has been called at the same argument immediately
1200before, the internal forward mode evaluation will be skipped and only
1201reverse mode evaluation with the corresponding arguments is executed
1202resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1203%
1204\begin{tabbing}
1205\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1206\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1207\>{\sf short int tag;}         \> // tape identification \\
1208\>{\sf int m;}                 \> // number of dependent variables $m$\\
1209\>{\sf int n;}                 \> // number of independent variables $n$\\
1210\>{\sf double x[n];}           \> // independent vector $x$\\
1211\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1212\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1213\end{tabbing}
1214%
1215\begin{tabbing}
1216\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1217\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1218\>{\sf short int tag;}         \> // tape identification \\
1219\>{\sf int n;}                 \> // number of independent variables $n$\\
1220\>{\sf double x[n];}           \> // independent vector $x$\\
1221\>{\sf double v[n];}           \> // tangent vector $v$\\
1222\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1223\end{tabbing}
1224%
1225\begin{tabbing}
1226\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1227\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1228\>{\sf short int tag;}         \> // tape identification \\
1229\>{\sf int n;}                 \> // number of independent variables $n$\\
1230\>{\sf int p;}                 \> // number of columns in $V$\\
1231\>{\sf double x[n];}           \> // independent vector $x$\\
1232\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1233\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1234\end{tabbing}
1235%
1236\begin{tabbing}
1237\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1238\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1239\>{\sf short int tag;}         \> // tape identification \\
1240\>{\sf int m;}                 \> // number of dependent variables $m$\\
1241\>{\sf int n;}                 \> // number of independent variables $n$\\
1242\>{\sf double x[n];}           \> // independent vector $x$\\
1243\>{\sf double v[n];}           \> // tangent vector $v$\\
1244\>{\sf double u[m];}           \> // range weight vector $u$ \\
1245\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1246\end{tabbing}
1247%
1248The next procedure allows the user to perform Newton steps only
1249having the corresponding tape at hand:
1250%
1251\begin{tabbing}
1252\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1253\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1254\>{\sf short int tag;}         \> // tape identification \\
1255\>{\sf int n;}                 \> // number of independent variables $n$\\
1256\>{\sf double x[n];}           \> // independent vector $x$ as\\
1257\>{\sf double b[n];}           \> // in: right-hand side b, out: result $w$ of
1258$F(x)w = b$\\
1259\>{\sf int mode;}              \> // option to choose different solvers
1260\end{tabbing}
1261%
1262On entry, parameter {\sf b} of the routine {\sf jac\_solv}
1263contains the right-hand side of the equation $F(x)w = b$ to be solved. On exit,
1264{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only
1265the Jacobian of the function
1266given by the tape labeled with {\sf tag} is provided internally.
1267The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
1268solution of the equation is calculated if {\sf mode} = 2.
1269Hence, it is possible to compute the
1270LU-factorization only once. Then the equation can be solved for several
1271right-hand sides $b$ without calculating the Jacobian and
1272its factorization again. 
1273
1274If the original evaluation code of a function contains neither
1275quadratures nor branches, all drivers described above can be used to
1276evaluate derivatives at any argument in its domain. The same still
1277applies if there are no user defined quadratures and
1278all comparisons  involving {\sf adouble}s have the same result as
1279during taping. If this assumption is falsely made all drivers
1280while internally calling the forward mode evaluation will return the value -1 or -2
1281as already specified in \autoref{reuse_tape}
1282
1283%
1284%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1285\subsection{Drivers for Ordinary Differential Equations}
1286\label{odedrivers}
1287%
1288When $F$ is the right-hand side of an (autonomous) ordinary
1289differential equation 
1290\[
1291x^\prime(t) \; = \; F(x(t)) , 
1292\] 
1293we must have $m=n$. Along any solution path $x(t)$ its Taylor
1294coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
1295the relation
1296\[
1297 x_{i+1} = \frac{1}{1+i} y_i.
1298\]
1299with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
1300\[
1301 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
1302\]
1303defined by an autonomous right-hand side $F$ recorded on the tape.
1304Using this relation, one can generate the Taylor coefficients $x_i$,
1305$i \le deg$,
1306recursively from the current point $x_0$. This task is achieved by the
1307driver routine {\sf forode} defined as follows:
1308%
1309\begin{tabbing}
1310\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1311\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1312\>{\sf short int tag;}         \> // tape identification \\
1313\>{\sf int n;}                 \> // number of state variables $n$\\
1314\>{\sf double tau;}            \> // scaling parameter\\
1315\>{\sf int dol;}               \> // degree on previous call\\
1316\>{\sf int deg;}               \> // degree on current call\\
1317\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1318\end{tabbing}
1319%
1320If {\sf dol} is positive, it is assumed that {\sf forode}
1321has been called before at the same point so that all Taylor coefficient
1322vectors up to the {\sf dol}-th are already correct.
1323
1324Subsequently one may call the driver routine {\sf reverse} or corresponding
1325low level routines as explained in the \autoref{forw_rev} and
1326\autoref{forw_rev_ad}, respectively, to compute
1327the family of square matrices {\sf Z[n][n][deg]} defined by
1328\[
1329Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
1330\]
1331with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}.
1332
1333For the numerical solutions of ordinary differential equations,
1334one may also wish to calculate the Jacobians
1335\begin{equation} 
1336\label{eq:bees}
1337B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1338\end{equation}
1339which exist provided $F$ is sufficiently smooth. These matrices can
1340be obtained from the partial derivatives $\partial y_i/\partial x_0$
1341by an appropriate version of the chain rule.
1342To compute the total derivatives $B = (B_j)_{0\leq j <d}$
1343defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
1344matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
1345corresponding evaluation of the {\sf hov\_reverse} function. The interface of
1346{\sf accode} is defined as follows:
1347%
1348\begin{tabbing}
1349\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1350\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1351\>{\sf int n;}                 \> // number of state variables $n$ \\
1352\>{\sf double tau;}            \> // scaling parameter\\
1353\>{\sf int deg;}               \> // degree on current call\\
1354\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1355\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1356\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1357\end{tabbing}
1358%
1359Sparsity information can be exploited by {\sf accode} using the array {\sf
1360nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1361reverse} or the corresponding basic routines as explained below in
1362\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1363non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1364return
1365\[
1366  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1367\]
1368In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
1369sparsity pattern that fills in as $k$ grows. Note, that there need to be no
1370loss in computational efficiency if a time-dependent ordinary differential equation
1371is rewritten in autonomous form.
1372
1373The prototype of the ODE-drivers {\sf forode} and {\sf accode} is contained in the header file
1374\verb=<adolc/drivers/odedrivers.h>=. The global header file
1375\verb=<adolc/adolc.h>=
1376includes this file automatically, see \autoref{ssec:DesIH}.
1377
1378An example program using the procedures {\sf forode} and {\sf accode} together
1379with more detailed information about the coding can be found in
1380\autoref{exam:ode}. The corresponding source code
1381\verb=odexam.cpp= is contained in the subdirectory
1382\verb=examples=.
1383%
1384%
1385%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1386\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1387\label{sparse}
1388%
1389Quite often, the Jacobians and Hessians that have to be computed are sparse
1390matrices. Therefore, ADOL-C provides additionally drivers that
1391allow the exploitation of sparsity. The exploitation of sparsity is
1392frequently based on {\em graph coloring} methods, discussed
1393for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
1394rely on the the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
1395ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
1396separately to use the sparse drivers described here. ColPack is available for download at
1397\verb=http://cscapes.cs.purdue.edu/coloringpage/software.htm=. More information about the required
1398installation of ColPack is given in \autoref{install}.
1399%
1400\subsubsection*{Sparse Jacobians and Sparse Hessians}
1401%
1402To compute the entries of sparse Jacobians and sparse Hessians,
1403respectively, in coordinate format one may use the drivers:
1404\begin{tabbing}
1405\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1406\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1407\>{\sf short int tag;}         \> // tape identification \\
1408\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1409\>{\sf int n;}                 \> // number of independent variables $n$\\
1410\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1411\>{\sf double x[n];}           \> // independent vector $x$ \\
1412\>{\sf int nnz;}               \> // number of nonzeros \\ 
1413\>{\sf unsigned int rind[nnz];}\> // row index\\ 
1414\>{\sf unsigned int cind[nnz];}\> // column index\\ 
1415\>{\sf double values[nnz];}    \> // non-zero values\\ 
1416\>{\sf int options[4];}        \> // array of control parameters\\ 
1417\end{tabbing}
1418%
1419\begin{tabbing}
1420\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1421\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1422\>{\sf short int tag;}         \> // tape identification \\
1423\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1424\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1425\>{\sf double x[n];}           \> // independent vector $x$ \\
1426\>{\sf int nnz;}               \> // number of nonzeros \\ 
1427\>{\sf unsigned int rind[nnz];}\> // row indices\\ 
1428\>{\sf unsigned int cind[nnz];}\> // column indices\\ 
1429\>{\sf double values[nnz];}    \> // non-zero values  \\
1430\>{\sf int options[2];}        \> // array of control parameters\\ 
1431\end{tabbing}
1432%
1433Once more, the input variables are the identifier for the internal
1434representation {\sf tag}, if required the number of dependents {\sf m},
1435and the number of independents {\sf n} for a consistency check.
1436Furthermore, the flag {\sf repeat=0} indicates that the functions are called
1437at a point with a new sparsity structure, whereas  {\sf repeat=1} results in
1438the re-usage of the sparsity pattern from the previous call.
1439The current values of the independents are given by the array {\sf x}.
1440The input/output
1441variable {\sf nnz} stores the number of the nonzero entries.
1442Therefore, {\sf nnz} denotes also the length of the arrays {\sf r\_ind} storing
1443the row indices, {\sf c\_ind} storing the column indices, and
1444{\sf values} storing the values of the nonzero entries.
1445If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
1446the functions determine the number of nonzeros for the sparsity pattern
1447defined by the value of {\sf x}, allocate appropriate arrays {\sf r\_ind},
1448{\sf c\_ind}, and {\sf values} and store the desired information in these
1449arrays.
1450During the next function call with {\sf repeat=1} the allocated memory
1451is reused such that only the values of the arrays are changed.   
1452Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
1453  repeat=0} the user is responsible for the deallocation of the array
1454 {\sf r\_ind}, {\sf c\_ind}, and {\sf values} using the function {\sf
1455   free()}!
1456
1457For each driver the array {\sf options} can be used to adapted the
1458computation of the sparse derivative matrices to the special
1459needs of application under consideration. Most frequently, the default options
1460will give a reasonable performance. The elements of the array {\sf options} control the action of
1461{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
1462\begin{table}[h]
1463\center
1464\begin{tabular}{|c|c|l|} \hline
1465component & value &  \\ \hline
1466{\sf options[0]} &    &  way of sparsity pattern computation \\
1467                 & 0  &  propagation of index domains (default) \\
1468                 & 1  &  propagation of bit pattern \\ \hline
1469{\sf options[1]} &    &  test the computational graph control flow \\
1470                 & 0  &  safe mode (default) \\
1471                 & 1  &  tight mode \\ \hline
1472{\sf options[2]} &    &  way of bit pattern propagation \\
1473                 & 0  &  automatic detection (default) \\
1474                 & 1  &  forward mode \\ 
1475                 & 2  &  reverse mode \\ \hline
1476{\sf options[3]} &    &  way of compression \\
1477                 & 0  &  column compression (default) \\
1478                 & 1  &  row compression \\ \hline
1479\end{tabular}
1480\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
1481\end{table}           
1482
1483The component {\sf options[1]} determines
1484the usage of the safe or tight mode of sparsity computation.
1485The first, more conservative option is the default. It accounts for all
1486dependences that might occur for any value of the
1487independent variables. For example, the intermediate
1488{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1489always assumed to depend on all independent variables that {\sf a} or {\sf b}
1490dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1491logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1492In contrast
1493the tight option gives this result only in the unlikely event of an exact
1494tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1495associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1496depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1497Obviously, the sparsity pattern obtained with the tight option may contain
1498more zeros than that obtained with the safe option. On the other hand, it
1499will only be valid at points belonging to an area where the function $F$ is locally
1500analytic and that contains the point at which the internal representation was
1501generated. Since generating the sparsity structure using the safe version does not
1502require any reevaluation, it may thus reduce the overall computational cost
1503despite the fact that it produces more nonzero entries.
1504The value of {\sf options[2]} selects the direction of bit pattern propagation.
1505Depending on the number of independent $n$ and of dependent variables $m$ 
1506one would prefer the forward mode if $n$ is significant smaller than $m$ and
1507would otherwise use the reverse mode.
1508
1509 The elements of the array {\sf options} control the action of
1510{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
1511\begin{table}[h]
1512\center
1513\begin{tabular}{|c|c|l|} \hline
1514component & value &  \\ \hline
1515{\sf options[0]} &    &  test the computational graph control flow \\
1516                 & 0  &  safe mode (default) \\
1517                 & 1  &  tight mode \\ \hline
1518{\sf options[1]} &    &  way of recovery \\
1519                 & 0  &  indirect recovery (default) \\
1520                 & 1  &  direct recovery \\ \hline
1521\end{tabular}
1522\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
1523\end{table}           
1524
1525The described driver routines for the computation of sparse derivative
1526matrices are prototyped in the header file
1527\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1528global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1529Example codes illustrating the usage of {\sf
1530  sparse\_jac} and {\sf sparse\_hess} can be found in the file
1531\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
1532\verb=examples/additional_examples/sparse=.
1533%
1534%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1535\subsubsection*{Computation of Sparsity Pattern}
1536%
1537ADOL-C offers a convenient way of determining the 
1538sparsity structure of a Jacobian matrix using the function:
1539%
1540\begin{tabbing}
1541\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1542\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
1543\>{\sf short int tag;} \> // tape identification \\
1544\>{\sf int m;} \> // number of dependent variables $m$\\
1545\>{\sf int n;} \> // number of independent variables $n$\\
1546\>{\sf double x[n];} \> // independent variables $x_0$\\
1547\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
1548\>{\sf int options[2];} \> // array of control parameters
1549\end{tabbing}
1550%
1551The sparsity pattern of the
1552Jacobian is computed in a compressed row format. For this purpose,
1553{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
1554  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
1555During the call of  {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
1556entries in row $i$ of the Jacobian is determined for all $1\le i\le
1557m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
1558points to a block of $\hat{n}_i+1$ {\sf  unsigned int} for all $1\le
1559i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
1560column indices of the $j$ nonzero entries in the $i$th row are stored
1561in the components  {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][j]}.
1562
1563The elements of the array {\sf options} control the action of
1564{\sf jac\_pat} according to \autoref{options}.
1565\begin{table}[h]
1566\center
1567\begin{tabular}{|c|c|l|} \hline
1568component & value &  \\ \hline
1569{\sf options[0]} &    &  way of sparsity pattern computation \\
1570                 & 0  &  propagation of index domains (default) \\
1571                 & 1  &  propagation of bit pattern \\ \hline
1572{\sf options[1]} &    &  test the computational graph control flow \\
1573                 & 0  &  safe mode (default) \\
1574                 & 1  &  tight mode \\ \hline
1575{\sf options[2]} &    &  way of bit pattern propagation \\
1576                 & 0  &  automatic detection (default) \\
1577                 & 1  &  forward mode \\ 
1578                 & 2  &  reverse mode \\ \hline
1579\end{tabular}
1580\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
1581\end{table}           
1582The value of {\sf options[0]} selects the way to compute the sparsity
1583pattern. The component {\sf options[1]} determines
1584the usage of the safe or tight mode of bit pattern propagation.
1585The first, more conservative option is the default. It accounts for all
1586dependences that might occur for any value of the
1587independent variables. For example, the intermediate
1588{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1589always assumed to depend on all independent variables that {\sf a} or {\sf b}
1590dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1591logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1592In contrast
1593the tight option gives this result only in the unlikely event of an exact
1594tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1595associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1596depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1597Obviously, the sparsity pattern obtained with the tight option may contain
1598more zeros than that obtained with the safe option. On the other hand, it
1599will only be valid at points belonging to an area where the function $F$ is locally
1600analytic and that contains the point at which the internal representation was
1601generated. Since generating the sparsity structure using the safe version does not
1602require any reevaluation, it may thus reduce the overall computational cost
1603despite the fact that it produces more nonzero entries. The value of
1604{\sf options[2]} selects the direction of bit pattern propagation.
1605Depending on the number of independent $n$ and of dependent variables $m$ 
1606one would prefer the forward mode if $n$ is significant smaller than $m$ and
1607would otherwise use the reverse mode.
1608
1609The routine {\sf jac\_pat} may use the propagation of bitpattern to
1610determine the sparsity pattern. Therefore, a kind of ``strip-mining''
1611is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
1612the value of the constant {\sf PQ\_STRIPMINE\_MAX}
1613following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1614
1615The driver routine is prototyped in the header file
1616\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1617global header file \verb=<adolc/adolc.h>= (see
1618\autoref{ssec:DesIH}). The determination of sparsity patterns is
1619illustrated by the examples \verb=sparse_jacobian.cpp=
1620and \verb=jacpatexam.cpp=
1621contained in
1622\verb=examples/additional_examples/sparse=.
1623
1624To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
1625driver
1626\begin{tabbing}
1627\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1628\>{\sf int hess\_pat(tag, n, x, HP, options)}\\
1629\>{\sf short int tag;}       \> // tape identification \\
1630\>{\sf int n;}               \> // number of independent variables $n$\\
1631\>{\sf double x[n];}         \> // independent variables $x_0$\\
1632\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
1633\>{\sf int option;}          \> // control parameter
1634\end{tabbing}
1635where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
1636 unsigned int}s.
1637After the function call {\sf HP} contains the sparsity pattern,
1638where {\sf HP[j][0]} contains the number of nonzero elements in the
1639 $j$th row for $1 \le j\le n$.
1640The components {\sf P[j][i]}, $0<${\sf i}~$\le$~{\sf P[j][0]} store the
1641 indices of these entries. For determining the sparsity pattern, ADOL-C uses
1642 the algorithm described in \cite{Wa05a}.  The parameter{\sf option} determines
1643the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
1644  option = 1}) of the computation of the sparsity pattern as described
1645above.
1646
1647This driver routine is prototyped in the header file
1648\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1649global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1650An example employing the procedure {\sf hess\_pat}  can be found in the file
1651\verb=sparse_hessian.cpp=  contained in
1652\verb=examples/additional_examples/sparse=.
1653%
1654%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1655\subsubsection*{Calculation of Seed Matrices}
1656%
1657To compute a compressed derivative matrix from a given sparsity
1658pattern, one has to calculate an appropriate seed matrix that can be
1659used as input for the derivative calculation. To facilitate the
1660generation of seed matrices for a sparsity pattern given in
1661row compressed form, ADOL-C provides the following two drivers,
1662which are based on the ColPack library:
1663\begin{tabbing}
1664\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1665\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
1666\>{\sf int m;} \> // number of dependent variables $m$\\
1667\>{\sf int n;} \> // number of independent variables $n$\\
1668\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
1669of Jacobian\\
1670\>{\sf double S[n][p];} \> // seed matrix\\
1671\>{\sf int p;} \> // number of columns in $S$
1672\end{tabbing}
1673The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1674number of independent variables {\sf n} and the sparsity pattern {\sf
1675  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1676{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1677pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1678determines the number of columns {\sf p} in the seed
1679matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1680 S} and initializes {\sf S} according to the graph coloring.
1681The coloring algorithm that is applied in {\sf
1682  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1683described earlier.
1684
1685\begin{tabbing}
1686\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1687\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
1688\>{\sf int n;} \> // number of independent variables $n$\\
1689\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
1690of Jacobian\\
1691\>{\sf double S[n][p];} \> // seed matrix\\
1692\>{\sf int p;} \> // number of columns in $S$
1693\end{tabbing}
1694The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1695and the sparsity pattern {\sf HP} of the Hessian computed for example
1696by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1697appropriate coloring of the adjacency graph defined by the sparsity
1698pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its
1699    compressed representation and a star coloring in the case of a direct recovery.
1700 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1701 S} and initializes {\sf S} according to the graph coloring.
1702The coloring algorithm applied in {\sf
1703  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1704described earlier.
1705
1706The specific set of criteria used to define a seed matrix $S$ depends
1707on whether the sparse derivative matrix
1708to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1709It also depends on whether the entries of the derivative matrix  are to be
1710recovered from the compressed representation \emph{directly}
1711(without requiring any further arithmetic) or \emph{indirectly} (for
1712example, by solving for unknowns via successive substitutions).
1713Appropriate recovery routines are provided by ColPack and used
1714in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1715the previous subsection. Examples with a detailed analysis of the
1716employed drivers for the exploitation of sparsity can be found in the
1717papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1718
1719
1720These driver routines are prototyped in
1721\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1722global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1723An example code illustrating the usage of {\sf
1724generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1725\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
1726%
1727%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1728\subsection{Higher Derivative Tensors}
1729\label{higherOrderDeriv}
1730%
1731Many applications in scientific computing need second- and higher-order
1732derivatives. Often, one does not require full derivative tensors but
1733only the derivatives in certain directions $s_i \in \R^{n}$.
1734Suppose a collection of $p$ directions
1735$s_i \in \R^{n}$ is given, which form a matrix
1736\[
1737S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
1738 \R^{n \times p}.
1739\]
1740One possible choice is $S = I_n$ with  $p = n$, which leads to
1741full tensors being evaluated.
1742ADOL-C provides the function {\sf tensor\_eval}
1743to calculate the derivative tensors
1744\begin{eqnarray}
1745\label{eq:tensor}
1746\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
1747     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} 
1748     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
1749\end{eqnarray}
1750simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
1751parameters:
1752%
1753\begin{tabbing}
1754\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1755\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
1756\>{\sf short int tag;}         \> // tape identification \\
1757\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1758\>{\sf int n;}                 \> // number of independent variables $n$\\
1759\>{\sf int d;}                 \> // highest derivative degree $d$\\
1760\>{\sf int p;}                 \> // number of directions $p$\\
1761\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
1762\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
1763\>{\sf double S[n][p];}        \> // seed matrix $S$
1764\end{tabbing}
1765%
1766Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory 
1767requirement can be reduced enormously. The collection of  tensors up to order $d$ comprises 
1768$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
1769greater or equal to $\binom{p+d}{d}$.
1770To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor
1771series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
1772approach is described in \cite{Griewank97}.
1773
1774The access of individual entries in symmetric tensors of
1775higher order is a little tricky. We always store the derivative
1776values in the two dimensional array {\sf tensor} and provide two
1777different ways of accessing them. 
1778The leading dimension of the tensor array ranges over
1779the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17800,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1781structure for all $i$. Each of them represents the symmetric tensors up to
1782order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 
1783%
1784The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
1785tensors are linearly ordered according to the tetrahedral
1786scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
1787case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ 
1788as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
1789$l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
1790function value $F_{i+1}$ itself and the gradient
1791$\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$
1792with $j=0$ for $k=1,\ldots,p$.
1793
1794For general $d$ we combine the variable
1795indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
1796where $j_k$ indicates differentiation with respect to variable
1797$x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1798no differentiation so that all lower derivatives are also
1799contained in the same data structure as described above for
1800the quadratic case. The location of the partial derivative specified
1801by $j$ is computed by the function
1802%
1803\begin{tabbing}
1804\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1805\>{\sf int tensor\_address(d,$\,$j)} \\
1806\>{\sf int d;}                 \> // highest derivative degree $d$ \\
1807\>{\sf int j[d];}              \> // multi-index $j$
1808\end{tabbing}       
1809%
1810and it may thus be referenced as {\sf tensor[i][tensor\_address(d,$\,$j)]}.
1811Notice that the address computation does depend on the degree $d$ 
1812but not on the number of directions $p$, which could theoretically be
1813enlarged without the need to reallocate the original tensor.
1814Also, the components of $j$ need to be non-increasing.
1815%
1816To some C programmers it may appear more natural to access tensor
1817entries by successive dereferencing in the form
1818{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1819We have also provided this mode, albeit with the restriction
1820that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1821In the second order case this means that the Hessian entries must be
1822specified in or below the diagonal. If this restriction is
1823violated the values are almost certain to be wrong and array bounds
1824may be violated. We emphasize that subscripting is not overloaded
1825but that {\sf tensorentry} is a conventional and
1826thus moderately efficient C pointer structure.
1827Such a pointer structure can be allocated and set up completely by the
1828function
1829%
1830\begin{tabbing}
1831\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1832\>{\sf void** tensorsetup(m,p,d,tensor)} \\
1833\>{\sf int m;}                 \> // number of dependent variables $n$ \\
1834\>{\sf int p;}                 \> // number of directions $p$\\
1835\>{\sf int d;}                 \> // highest derivative degree $d$\\
1836\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
1837\end{tabbing}     
1838%
1839Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1840$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1841 $d-1$ layers of pointers are set up so that the return value
1842allows the direct dereferencing of individual tensor elements.
1843
1844For example, suppose some active section involving  $m \geq 5$ dependents and
1845$n \geq 2$ independents has been executed and taped. We may
1846select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1847columns $s_1$ and $s_2$. Then we are able to execute the code segment
1848\begin{tabbing}
1849\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
1850              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}   
1851\end{tabbing}
1852This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
1853in both directions $s_1$ and
1854$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer
1855structure {\sf tensorentry} has been created. Now, 
1856the value of the mixed partial
1857\[
1858 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2
1859\]
1860can be recovered as
1861\begin{center}
1862   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][tensor\_address(d,$\,$j)]},
1863\end{center}
1864where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). 
1865Analogously, the entry
1866\begin{center}   
1867   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][tensor\_address(d,$\,$j)]}
1868\end{center}
1869with {\sf j} = (1,0,0) contains the first derivative of the third dependent
1870variable $F_3$ with respect to the first differentiation parameter $z_1$.
1871
1872Note, that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
1873array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly effects the values accessed
1874by {\sf tensorentry}.
1875%
1876When no more derivative evaluations are desired the pointer structure
1877{\sf tensorentry} can be deallocated by a call to the function
1878%
1879\begin{tabbing}
1880\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1881\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
1882\>{\sf int m;}                    \> // number of dependent variables $m$ \\
1883\>{\sf int p;}                    \> // number of independent variables $p$\\
1884\>{\sf int d;}                    \> // highest derivative degree $d$\\
1885\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} 
1886\end{tabbing} 
1887%
1888that does not deallocate the array {\sf tensor}.
1889
1890The drivers provided for efficient calculation of higher order
1891derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1892which is included by the global header file \verb=<adolc/adolc.h>= automatically
1893(see \autoref{ssec:DesIH}).
1894Example codes using the above procedures can be found in the files
1895\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
1896\verb=examples/additional_examples/taylor=.
1897%
1898%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1899\subsection{Derivatives of Implicit and Inverse Functions}
1900\label{implicitInverse}
1901%
1902Frequently, one needs derivatives of variables
1903$y \in \R^{m}$ that are implicitly defined as
1904functions of some variables $x \in \R^{n-m}$
1905by an algebraic system of equations
1906\[
1907G(z) \; = \; 0 \in \R^m \quad
1908{\rm with} \quad z = (y, x) \in \R^n .
1909\] 
1910Naturally, the $n$ arguments of $G$ need not be partitioned in
1911this regular fashion and we wish to provide flexibility for a
1912convenient selection of the $n-m$ {\em truly} independent
1913variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
1914that picks out these variables so that it is a column
1915permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
1916Then the nonlinear system
1917\[
1918  G(z) \; = \; 0, \quad P z =  x,                           
1919\] 
1920has a regular Jacobian, wherever the implicit function theorem
1921yields $y$ as a function of $x$. Hence, we may also write
1922\begin{equation}
1923\label{eq:inv_tensor}
1924F(z) = \left(\begin{array}{c}
1925                        G(z) \\
1926                        P z
1927                      \end{array} \right)\; \equiv \;
1928                \left(\begin{array}{c}
1929                        0 \\
1930                        P z
1931                      \end{array} \right)\; \equiv \; S\, x,
1932\end{equation}
1933where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
1934the original implicit functional relation between $x$ and $y$ as an inverse
1935relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
1936by marking $n-m$ of the independents also dependent. 
1937
1938Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1939seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1940with respect to $x \in \R^p$ by calling the following routine:
1941%
1942\begin{tabbing}
1943\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1944\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
1945\>{\sf short int tag;}         \> // tape identification \\
1946\>{\sf int n;}                 \> // number of variables $n$\\
1947\>{\sf int d;}                 \> // highest derivative degree $d$\\
1948\>{\sf int p;}                 \> // number of directions $p$\\
1949\>{\sf double z[n];}          \> // values of independent variables $z$\\
1950\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
1951\>{\sf double S[n][p];}        \> // seed matrix $S$
1952\end{tabbing}                 
1953%                 
1954The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1955{\sf tag} pointing to a tape for the evaluation of the inverse function
1956$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1957to the domain of that function. Individual derivative components can be
1958accessed in tensor exactly as in the explicit case described above.
1959
1960It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1961derivatives of $z$ with respect to $x$ that is defined by the equation
1962$F(z)=F(z_0)+S \, x$. In other words the base point at
1963which the inverse function is differentiated is given by $F(z_0)$.
1964The routine has no capability for inverting $F$ itself as
1965solving systems of nonlinear
1966equations $F(z)=0$ in the first place is not just a differentiation task.
1967However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1968useful for that purpose.
1969
1970As an example consider the following two nonlinear expressions
1971\begin{eqnarray*}
1972      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^\\
1973      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
1974\end{eqnarray*}   
1975The equations $G(z)=0$ describe the relation between the Cartesian
1976coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
1977Now, suppose we are interested in the derivatives of the second Cartesian
1978$y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
1979to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
1980could look simply like
1981%
1982\begin{tabbing}
1983\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
1984\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
1985\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
1986\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
1987\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
1988\end{tabbing}     
1989%
1990where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
1991In the last line the two independent variables {\sf z[1]} and
1992{\sf z[3]} are made
1993simultaneously dependent thus generating a square system that can be
1994inverted (at most arguments). The corresponding projection and seed
1995matrix are
1996\begin{eqnarray*}
1997P \;=\; \left( \begin{array}{cccc}
1998               1 & 0 & 0 & 0 \\
1999               0 & 0 & 1 & 0
2000            \end{array}\right) \quad \mbox{and} \quad
2001S^T \; = \; \left( \begin{array}{cccc}
2002               0 & 0 & 1 & 0 \\
2003               0 & 0 & 0 & 1
2004            \end{array}\right\enspace .
2005\end{eqnarray*}
2006Provided the vector {\sf zp} is consistent in that its Cartesian and polar
2007components describe the same point in the plane the resulting tuple
2008{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
2009$n=4$, $p=2$ and $d$
2010as desired will yield the implicit derivatives, provided
2011{\sf tensor} has been allocated appropriately of course and $S$ has the value
2012given above.
2013%
2014The example is untypical in that the implicit function could also be
2015obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
2016the subset of $z$ components that are to be considered as truly
2017independent can be selected and altered with next to no effort at all.
2018
2019The presented drivers are prototyped in the header file
2020\verb=<adolc/drivers/taylor.h>=. As indicated before this header
2021is included by the global header file \verb=<adolc/adolc.h>= automatically
2022(see \autoref{ssec:DesIH}).
2023The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
2024\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
2025show the application of the procedures described here.
2026%
2027%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2028\subsection{Drivers of the Toolbox for Lie Derivatives}
2029Nonlinear controller and observer design often require certain types of Lie derivatives. These derivatives also arise in other areas such as classical mechanics and relativity.
2030Lie derivatives are total derivatives of tensor fields along a vector field.
2031The drivers for calculating Lie derivatives in ADOL-C presented in this section
2032were developed by Klaus R{\"o}benack~\cite{Roeb05} and his co-workers Jan Winkler, Siqian Wang and Mirko Franke~\cite{Roeb11}.
2033They are prototyped as C functions in the header file {\verb=<adolc_lie.h>=} and
2034allow the calculation of Lie derivatives of scalar, vector and covector fields. In addition, the calculation of gradients of Lie derivatives of scalar fields is supported.
2035
2036We consider computational problems occurring in controller and observer design for
2037nonlinear control systems
2038\begin{equation*}
2039\dot{\mathbf x}=\mathbf f(\mathbf x)+\mathbf g(\mathbf x)u,
2040\quad \mathbf y=\mathbf h(\mathbf x),\quad \mathbf x(0)=\mathbf x_0 \in \Omega\\
2041\end{equation*}
2042with the scalar input u, the state $\mathbf x$, and the output $\mathbf y$, where the vector
2043fields $\mathbf f : \Omega \to \R^n$, $\mathbf g: \Omega \to \R^n$
2044and the map $\mathbf h: \Omega\to \R^m$ (for $m=1$ this map is the scalar field $h$)
2045are defined on an open subset $\Omega\subseteq\R^n$.
2046
2047\sloppy
2048\subsubsection*{Lie Derivatives of Scalar Fields}
2049Consider the initial value problem
2050\[
2051 \dot{\mathbf x}={\mathbf f}(\mathbf x),\quad y=h(\mathbf x),\quad \mathbf{x}(0)=\mathbf{x}_0\in\Omega
2052\]
2053with the vector field $\mathbf f : \Omega \to \R^n$. The Lie derivative of the scalar field $h:\Omega\to\R$ along the vector field~$\mathbf{f}$ are time derivatives of the curve~$y$, i.e.,
2054\[
2055 \dot{y}(0)=L_\mathbf{f}h(\mathbf{x}_0),\;
2056 \ddot{y}(0)=L_\mathbf{f}^2h(\mathbf{x}_0),\;  \ldots,\;
2057 y^{(d)}(0)=L_\mathbf{f}^dh(\mathbf{x}_0).
2058\]
2059Formally, the Lie derivative of the scalar field~$h$ along the vector field~$\mathbf{f}$ is defined by
2060\[
2061 L_{\mathbf f} h(\mathbf x)=\frac{\partial h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x).
2062\]
2063Higher order Lie derivatives are given by the recursion
2064\[
2065 L_{\mathbf f}^{k+1} h(\mathbf x)= \frac{\partial L_{\mathbf f}^k h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x)\quad\text{with}\quad L_{\mathbf f}^0 h(\mathbf x)= h(\mathbf x).
2066\]
2067To compute Lie derivatives
2068\[
2069(L_\mathbf f^0h(\mathbf x_0),\ldots,L_\mathbf f^dh(\mathbf x_0))
2070\]
2071we need the trace numbers of the active sections of the vector field~$\mathbf f$ and the scalar field~$h$,
2072the number of independent variables $n$, the initial value $\mathbf x_0\in\Omega$
2073and the highest derivative degree $d$. The values of the Lie derivatives are
2074then stored in the one dimensional array {\verb=result=} of length $d+1$:
2075
2076\begin{tabbing}
2077\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2078\>{\sf int lie\_scalarc(Tape\_F, Tape\_H, n, x0, d, result)}\\
2079\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2080\>{\sf short Tape\_H;}         \> // trace identification of scalar field $h$\\
2081\>{\sf short n;}               \> // number of independent variables $n$ and $m = 1$\\
2082\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2083\>{\sf short d; }              \> // highest derivative degree $d$\\
2084\>{\sf double result[d+1];}    \> // resulting Lie derivatives of a scalar field\\
2085\end{tabbing}   
2086
2087
2088For a smooth vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ with
2089$\mathbf h(\mathbf x)=(h_1(\mathbf x),\ldots,h_m(\mathbf x))^T$
2090we define the Lie derivative
2091\begin{equation}\label{eq:lie_vectorial}
2092L_\mathbf f^k\mathbf h(\mathbf x_0)=
2093(L_\mathbf f^kh_1(\mathbf x_0),\ldots,L_\mathbf f^kh_m(\mathbf x_0))^T\in\R^m
2094 \quad \text{for}\quad k = 0, \cdots, d,
2095\end{equation}
2096component-wise by the Lie derivatives $L_\mathbf f^k h_1(\mathbf x_0),\ldots,L_\mathbf f^k h_m(\mathbf x_0)$ of the $m$ scalar fields $h_1, \ldots, h_m:\Omega\to\R$. Note that~\eqref{eq:lie_vectorial} should not be confused with the Lie derivative of a vector field. To compute Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h$
2097we additionally need the number of dependent variables $m$:
2098
2099\begin{tabbing}
2100\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2101\>{\sf int lie\_scalarcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\
2102\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2103\>{\sf short Tape\_H;}         \> // trace identification of vector-valued map $\mathbf h$\\
2104\>{\sf short n;}               \> // number of independent variables $n$\\
2105\>{\sf short m;}               \> // number of dependent variables $m$\\
2106\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2107\>{\sf short d; }              \> // highest derivative degree $d$\\
2108\>{\sf double result[m][d+1];} \> // resulting Lie derivatives of vectorial scalar fields\\
2109\end{tabbing} 
2110
2111\subsubsection*{Gradients of Lie Derivatives of Scalar Fields}
2112To compute the gradients
2113\[
2114(\mathrm{d}L_\mathbf f^0h(\mathbf x_0),\ldots, \mathrm{d}L_\mathbf f^dh(\mathbf x_0))
2115\]
2116of the Lie derivatives $L_\mathbf f^0h(\mathbf x_0),\ldots, L_\mathbf f^dh(\mathbf x_0)$ of the scalar field $h:\Omega\to\R$ along the vector field $\mathbf f:\Omega\to\R^n$,
2117the following C function can be used:
2118
2119\begin{tabbing}
2120\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2121\>{\sf int lie\_gradientc(Tape\_F, Tape\_H, n, x0, d, result)}\\
2122\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2123\>{\sf short Tape\_H;}         \> // trace identification of scalar field $h$\\
2124\>{\sf short n;}               \> // number of independent variables $n$ and $m=1$\\
2125\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2126\>{\sf short d;}               \> // highest derivative degree $d$\\
2127\>{\sf double result[n][d+1];} \> // resulting gradients of Lie derivatives\\
2128\>{\sf} \> // of a scalar field\\
2129\end{tabbing} 
2130
2131For calculating the jacobians $\mathrm{d}L_\mathbf f^k\mathbf h(\mathbf x_0)\in\R^{m\times n}$
2132of Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ for $k=0,\ldots,d$ one can use the following function, where the $d+1$ matrices are stored in the three dimensional array {\verb=result=}:
2133
2134\begin{tabbing}
2135\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2136\>{\sf int lie\_gradientcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\
2137\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2138\>{\sf short Tape\_H;}         \> // trace identification of vector-valued map $\mathbf h$\\
2139\>{\sf short n;}               \> // number of independent variables $n$\\
2140\>{\sf short m;}               \> // number of dependent variables $m$\\
2141\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2142\>{\sf short d;}               \> // highest derivative degree $d$\\
2143\>{\sf double result[m][n][d+1];} \> // resulting jacobians of Lie derivatives\\
2144\>{\sf} \> // of vectorial scalar fields\\
2145\end{tabbing} 
2146
2147\subsubsection*{Lie Derivatives of Covector Fields}
2148
2149If we consider the elements of the real vector space $\R^n$ as column vectors, the elements of the associated dual space $(\R^n)^*$ can be represented by row vectors. These row vectors are called adjoint vectors in ADOL-C and covectors in differential geometry.
2150In a program the user does not have to distinguish between a vector field
2151$\mathbf f: \Omega \rightarrow \R^n$ and a covector field
2152$\boldsymbol{\omega}: \Omega \rightarrow (\R^n)^*$, since for both fields the dependent variables are stored in one dimensional arrays.
2153The Lie derivative of a covector field~$\boldsymbol{\omega}$ is defined by
2154\[
2155 L_{\mathbf f}\boldsymbol{\omega}(\mathbf x)=\boldsymbol{\omega}(\mathbf x)\,{\mathbf f}^\prime(\mathbf x)
2156 + {\mathbf f}^T(\mathbf x) \left(\frac{\partial \boldsymbol{\omega}^T (\mathbf x)}{\partial \mathbf x}\right)^T.
2157\]
2158The $d+1$ Lie derivatives $L_{\mathbf f}^0\boldsymbol{\omega}(\mathbf x),\ldots,L_{\mathbf f}^d\boldsymbol{\omega}(\mathbf x)$ of the covector field $\boldsymbol{\omega}$ along the vector field~$\mathbf{f}$
2159can be computed by
2160
2161\begin{tabbing}
2162\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2163\>{\sf int lie\_covector(Tape\_F, Tape\_W, n, x0, d, result)}\\
2164\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2165\>{\sf short Tape\_W;}         \> // trace identification of covector field $\boldsymbol{\omega}$\\
2166\>{\sf short n;}               \> // number of independent variables $n$\\
2167\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2168\>{\sf short d;}               \> // highest derivative degree $d$\\
2169\>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a covector field\\
2170\end{tabbing} 
2171
2172\subsubsection*{Lie Derivatives of a Vector Field (Lie Brackets)}
2173
2174Lie derivatives (Lie brackets) of a vector field $\mathbf g: \Omega \rightarrow \R^n$ along a vector field $\mathbf f: \Omega \rightarrow \R^n$ are defined by
2175\[
2176 \mathrm{ad}_\mathbf{f}\mathbf{g}(\mathbf{x})=[\mathbf{f},\mathbf{g}](\mathbf{x})
2177 =\mathbf{g}^{\prime}(\mathbf{x})\mathbf{f}(\mathbf{x})-\mathbf{f}^{\prime}(\mathbf{x})\mathbf{g}(\mathbf{x}).
2178\]
2179We can calculate iterated Lie brackets $\mathrm{ad}_\mathbf{f}^0\mathbf{g}(\mathbf{x}),\ldots,\mathrm{ad}_\mathbf{f}^{d}\mathbf{g}(\mathbf{x})$ at the point $\mathbf{x}_0\in\Omega$ 
2180using the C function listed below:
2181
2182\begin{tabbing}
2183\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2184\>{\sf int lie\_bracket(Tape\_F, Tape\_G, n, x0, d, result)}\\
2185\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2186\>{\sf short Tape\_G;}         \> // trace identification of vector field $\mathbf g$\\
2187\>{\sf short n;}               \> // number of independent variables $n$\\
2188\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2189\>{\sf short d;}               \> // highest derivative degree $d$\\
2190\>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a vector field\\
2191\end{tabbing} 
2192
2193\subsubsection*{Additional C++ Drivers for Lie Derivatives}
2194
2195For Lie derivatives of scalar fields, the C interface offers two different drivers, namely {\verb=lie_scalarc=} for $m=1$ scalar field $h:\Omega\to\R$ and {\verb=lie_scalarcv=} for $m\geq1$ scalar fields $h_1,\ldots,h_m:\Omega\to\R$, which are combined in a vector-valued map $\mathbf{h}:\Omega\to\R^m$. Using the polymorphism in C++, both C drivers are unified by the C++ driver {\verb=lie_scalar=} for the computation of Lie derivatives of scalar fields:
2196
2197\begin{tabbing}
2198\hspace{0.5in}\={\sf int lie\_scalar(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.45in}\= // case $m=1$\\
2199\>{\sf int lie\_scalar(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$
2200\end{tabbing} 
2201
2202The same situation occurs with the gradients or jacobians of Lie derivatives of scalar fields. Here, the C drivers {\verb=lie_gradientc=} and {\verb=lie_gradientcv=} are unified by the C++ driver {\verb=lie_gradient=}:
2203
2204\begin{tabbing}
2205\hspace{0.5in}\={\sf int lie\_gradient(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.35in}\= // case $m=1$\\
2206\>{\sf int lie\_gradient(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$
2207\end{tabbing} 
2208
2209The C++ drivers are also prototyped in the header file {\sf $<$adolc\_lie.h$>$}. An example how to use these drivers for the calculation of Lie derivatives can be found in the file {\sf GantryCrane.cpp} in the directory {\verb=additional_examples/lie=}.
2210
2211%
2212%
2213%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
2214\section{Basic Drivers for the Forward and Reverse Mode}
2215\label{forw_rev_ad}
2216%
2217In this section, we present tailored drivers for different
2218variants of the forward mode and the reverse mode, respectively.
2219For a better understanding, we start with a short
2220description of the mathematical background.
2221
2222Provided no arithmetic exception occurs,
2223no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
2224{\sf fabs} does not yield zero,
2225and all special functions were evaluated in the
2226interior of their domains, the functional relation between the input
2227variables $x$
2228and the output variables $y$ denoted by $y=F(x)$ is in
2229fact analytic.  In other words, we can compute arbitrarily high
2230derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
2231by the active section.
2232We find it most convenient to describe and
2233compute derivatives in terms of univariate Taylor expansions, which
2234are truncated after the highest derivative degree $d$ that is desired
2235by the user. Let
2236\begin{equation}
2237\label{eq:x_of_t}
2238x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \;  I\!\!R \; \mapsto \;
2239I\!\!R^n
2240\end{equation} 
2241denote any vector polynomial in the scalar variable $t \in I\!\!R$.
2242In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2243The Taylor coefficient vectors
2244\[ x_j \; = \; 
2245\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
2246\right |_{t=0}
2247\] 
2248are simply the scaled derivatives of $x(t)$ at the parameter
2249origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2250visualized as tangent and curvature at the base point $x_0$,
2251respectively.
2252Provided that $F$ is $d$ times continuously differentiable, it
2253follows from the chain rule that the image path
2254\begin{equation}
2255\label{eq:rela}
2256 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2257\end{equation}
2258is also smooth and has $(d+1)$ Taylor coefficient vectors
2259$y_j \in I\!\!R^m$ at $t=0$, so that
2260\begin{equation}
2261\label{eq:series}
2262y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
2263\end{equation}
2264Also as a consequence of the chain rule, one can observe that
2265each $y_j$ is uniquely and smoothly determined by the coefficient
2266vectors $x_i$ with $i \leq j$.  In particular we have
2267\begin{align}
2268\label{eq:y_0y_1}
2269  y_0 & = F(x_0) \nonumber \\
2270  y_1 & = F'(x_0) x_1 \nonumber\\
2271  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
2272  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
2273          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
2274  & \ldots\nonumber
2275\end{align}
2276In writing down the last equations we have already departed from the
2277usual matrix-vector notation. It is well known that the number of
2278terms that occur in these ``symbolic'' expressions for
2279the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2280the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
2281with $j$. Fortunately, this exponential growth does not occur
2282in automatic differentiation, where the many terms are somehow
2283implicitly combined  so that storage and operations count grow only
2284quadratically in the bound $d$ on $j$.
2285
2286Provided $F$ is analytic, this property is inherited by the functions
2287\[
2288y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
2289\]
2290and their derivatives satisfy the identities
2291\begin{equation}
2292\label{eq:identity}
2293\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
2294{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
2295\end{equation}
2296as established in \cite{Chri91a}. This yields in particular
2297\begin{align*}
2298  \frac{\partial y_0}{\partial x_0} =
2299  \frac{\partial y_1}{\partial x_1} =
2300  \frac{\partial y_2}{\partial x_2} =
2301  \frac{\partial y_3}{\partial x_3} =
2302  A_0 & = F'(x_0) \\
2303  \frac{\partial y_1}{\partial x_0} =
2304  \frac{\partial y_2}{\partial x_1} =
2305  \frac{\partial y_3}{\partial x_2} =
2306  A_1 & = F''(x_0) x_1 \\
2307  \frac{\partial y_2}{\partial x_0} =
2308  \frac{\partial y_3}{\partial x_1} =
2309  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
2310  \frac{\partial y_3}{\partial x_0} =
2311  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
2312          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
2313  & \ldots
2314\end{align*}
2315The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2316coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2317interest primarily in the context of ordinary differential
2318equations and differential algebraic equations.
2319
2320Given the tape of an active section and the coefficients $x_j$,
2321the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2322by appropriate calls to the ADOL-C forward mode implementations and
2323the ADOL-C reverse mode implementations. The scalar versions of the forward
2324mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2325to the $(y_j)_{j\leq d}$. The vector versions of the forward
2326mode propagate families of $p\geq 1$ such truncated Taylor series
2327in order to reduce the relative cost of the overhead incurred
2328in the tape interpretation. In detail, ADOL-C provides
2329\begin{tabbing}
2330\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2331\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
2332\>{\sf short int tag;}         \> // tape identification \\
2333\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2334\>{\sf int n;}                 \> // number of independent variables $n$\\
2335\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2336\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
2337\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
2338\end{tabbing}                 
2339for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2340$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2341flag {\sf keep} plays a similar role as in the call to 
2342{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2343the first Taylor coefficients of all intermediate quantities into a buffered
2344temporary file, i.e., the value stack, in preparation for a subsequent
2345reverse mode evaluation. The value {\sf keep} $=1$
2346prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as exlained below.
2347
2348To compute first-order derivatives, one has
2349\begin{tabbing}
2350\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2351\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
2352\>{\sf short int tag;}         \> // tape identification \\
2353\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2354\>{\sf int n;}                 \> // number of independent variables $n$\\
2355\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2356\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2357\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
2358\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2359\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
2360\end{tabbing}                 
2361for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
2362$0\leq\text{\sf keep}\leq 2$, where
2363\begin{align*}
2364\text{\sf keep} = \left\{\begin{array}{cl}
2365       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2366       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2367       \end{array}\right.
2368\end{align*}
2369as exlained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
2370ADOL-C provides
2371\begin{tabbing}
2372\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2373\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
2374\>{\sf short int tag;}         \> // tape identification \\
2375\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2376\>{\sf int n;}                 \> // number of independent variables $n$\\
2377\>{\sf int p;}                 \> // number of directions\\
2378\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2379\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
2380\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2381\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x)X$
2382\end{tabbing}                 
2383For the computation of higher derivative, the driver
2384\begin{tabbing}
2385\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2386\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
2387\>{\sf short int tag;}         \> // tape identification \\
2388\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2389\>{\sf int n;}                 \> // number of independent variables $n$\\
2390\>{\sf int d;}                 \> // highest derivative degree $d$\\
2391\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2392\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2393\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
2394\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2395\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
2396\end{tabbing}
2397implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2398The rows of the matrix $X$ must correspond to the independent variables in the order of their
2399initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2400$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2401\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2402dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2403The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2404Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2405computes the values
2406$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2407$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2408$0\leq\text{\sf keep}\leq d+1$, with
2409\begin{align*}
2410\text{\sf keep}  \left\{\begin{array}{cl}
2411       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2412       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2413       \end{array}\right.
2414\end{align*}
2415Once more, there is also a vector version given by
2416\begin{tabbing}
2417\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2418\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2419\>{\sf short int tag;}         \> // tape identification \\
2420\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2421\>{\sf int n;}                 \> // number of independent variables $n$\\
2422\>{\sf int d;}                 \> // highest derivative degree $d$\\
2423\>{\sf int p;}                 \> // number of directions $p$\\
2424\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2425\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
2426\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2427\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
2428\end{tabbing}
2429for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
2430$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and 
2431$Y=[Y_1,Y_2,\ldots,Y_d]$.
2432
2433There are also overloaded versions providing a general {\sf forward}-call.
2434Details of the appropriate calling sequences are given in \autoref{forw_rev}.
2435
2436Once, the required information is generated due to a forward mode evaluation
2437with an approriate value of the parameter {\sf keep}, one may use the
2438following implementation variants of the reverse mode. To compute first-order derivatives
2439one can use
2440\begin{tabbing}
2441\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2442\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
2443\>{\sf short int tag;}         \> // tape identification \\
2444\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2445\>{\sf int n;}                 \> // number of independent variables $n$\\
2446\>{\sf double u[m];}           \> // weight vector $u$\\
2447\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
2448\end{tabbing}                 
2449as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2450the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2451{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2452  o}rder {\bf v}ector reverse mode driver is given by
2453\begin{tabbing}
2454\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2455\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
2456\>{\sf short int tag;}         \> // tape identification \\
2457\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2458\>{\sf int n;}                 \> // number of independent variables $n$\\
2459\>{\sf int q;}                 \> // number of weight vectors $q$\\
2460\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2461\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
2462\end{tabbing}                 
2463that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2464{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2465ADOL-C provides
2466\begin{tabbing}
2467\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2468\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
2469\>{\sf short int tag;}         \> // tape identification \\
2470\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2471\>{\sf int n;}                 \> // number of independent variables $n$\\
2472\>{\sf int d;}                 \> // highest derivative degree $d$\\
2473\>{\sf double u[m];}           \> // weight vector $u$\\
2474\>{\sf double Z[n][d+1];}      \> // resulting adjoints
2475\end{tabbing}                 
2476as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2477the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2478\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2479{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
2480\begin{tabbing}
2481\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2482\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
2483\>{\sf short int tag;}         \> // tape identification \\
2484\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2485\>{\sf int n;}                 \> // number of independent variables $n$\\
2486\>{\sf int d;}                 \> // highest derivative degree $d$\\
2487\>{\sf double U[q][m];}        \> // weight vector $u$\\
2488\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
2489\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
2490\end{tabbing}                 
2491as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2492the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2493\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2494{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2495After the function call, the last argument of {\sf hov\_reverse} 
2496contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2497has a value that characterizes the functional relation between the
2498$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2499$x_j$ as:
2500\begin{center}
2501\begin{tabular}{ll}
2502 0 & trivial \\
2503 1 & linear
2504\end{tabular} \hspace*{4ex}
2505\begin{tabular}{ll}
2506 2 & polynomial\\
2507 3 & rational
2508\end{tabular} \hspace*{4ex}
2509\begin{tabular}{ll}
2510 4 & transcendental\\
2511 5 & non-smooth
2512\end{tabular}
2513\end{center}
2514Here, ``trivial'' means that there is no dependence at all and ``linear'' means
2515that the partial derivative is a constant that
2516does not dependent on other variables either. ``Non-smooth'' means that one of
2517the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2518where it is not differentiable.  All positive labels
2519$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2520in fact be simpler, for example due to exact cancellations. 
2521
2522There are also overloaded versions providing a general {\sf reverse}-call.
2523Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
2524
2525
2526\input{absdrivers}
2527%
2528%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2529\section{Overloaded Forward and Reverse Calls}
2530\label{forw_rev}
2531%
2532In this section, the several versions of the {\sf forward} and
2533{\sf reverse} routines, which utilize the overloading capabilities
2534of C++, are described in detail. With exception of the bit pattern
2535versions all interfaces are prototyped in the header file
2536\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
2537and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
2538C and Fortran-callable versions prototyped in the same header file.
2539The bit pattern versions of {\sf forward} and {\sf reverse} introduced
2540in the \autoref{ProBit} are prototyped in the header file
2541\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2542file \verb=<adolc/interfaces.h>= automatically.
2543%
2544%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2545\subsection{The Scalar Case}
2546%
2547\label{ScaCas}
2548%     
2549Given any correct tape, one may call from within
2550the generating program, or subsequently during another run, the following
2551procedure:
2552%
2553\begin{tabbing}
2554\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2555\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
2556\>{\sf short int tag;}         \> // tape identification \\
2557\>{\sf int m;}                 \> // number of dependent variables $m$\\
2558\>{\sf int n;}                 \> // number of independent variables $n$\\
2559\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2560\>{\sf  int keep;}             \> // flag for reverse sweep \\ 
2561\>{\sf  double X[n][d+1];}     \> // Taylor coefficients $X$ of
2562                                     independent variables \\
2563\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
2564                                     in \eqref{eq:series}
2565\end{tabbing}
2566%
2567The rows of the matrix $X$ must correspond to the independent variables in the order of their
2568initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2569$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2570\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2571correspond to the
2572dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2573The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2574Taylor coefficient vectors as in \eqref{eq:series}.
2575Thus the first column of $Y$ contains the
2576function value $F(x)$ itself, the next column represents the first
2577Taylor coefficient vector of $F$, and the last column the
2578$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2579how many Taylor coefficients of all intermediate quantities are
2580written into the value stack as explained in \autoref{forw_rev_ad}.
2581 If {\sf keep} is omitted, it defaults to 0.
2582
2583The given {\sf tag} value is used by {\sf forward} to determine the
2584name of the file on which the tape was written. If the tape file does
2585not exist, {\sf forward} assumes that the relevant
2586tape is still in core and reads from the buffers.
2587After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2588{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2589the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
2590identifier {\sf tag}. When $u$ is a vector
2591and $Z$ an $n\times (d+1)$ matrix
2592{\sf reverse} is executed in the scalar mode by the calling
2593sequence
2594%
2595\begin{tabbing}
2596\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
2597\>{\sf int reverse(tag,m,n,d,u,Z)}\\
2598\>{\sf short int tag;}         \> // tape identification \\
2599\>{\sf int m;}                 \> // number of dependent variables $m$\\
2600\>{\sf int n;}                 \> // number of independent variables $n$\\
2601\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2602\>{\sf  double u[m];}          \> // weighting vector $u$\\
2603\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$ 
2604\end{tabbing}
2605to compute
2606the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2607\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
2608%
2609%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2610\subsection{The Vector Case}
2611%
2612\label{vecCas}
2613%
2614When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence
2615%
2616\begin{tabbing}
2617\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
2618\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
2619\>{\sf short int tag;}         \> // tape identification \\
2620\>{\sf int m;}                 \> // number of dependent variables $m$\\
2621\>{\sf int n;}                 \> // number of independent variables $n$\\
2622\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2623\>{\sf int q;}                 \> // number of weight vectors $q$\\
2624\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2625\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
2626\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
2627\end{tabbing}
2628%
2629to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2630\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
2631When the arguments {\sf p} and {\sf U} are omitted, they default to
2632$m$ and the identity matrix of order $m$, respectively. 
2633
2634Through the optional argument {\sf nz} of {\sf reverse} one can compute
2635information about the sparsity pattern of $Z$ as described in detail
2636in the previous \autoref{forw_rev_ad}.
2637
2638The return values of {\sf reverse} calls can be interpreted according
2639to \autoref{retvalues}, but negative return values are not
2640valid, since the corresponding forward sweep would have
2641stopped without completing the necessary taylor file.
2642The return value of {\sf reverse} may be higher
2643than that of the preceding {\sf forward} call because some operations
2644that were evaluated  at a critical argument during the forward sweep
2645were found not to impact the dependents during the reverse sweep.
2646
2647In both scalar and vector mode, the degree $d$ must agree with
2648{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2649equal to zero if {\sf reverse} directly follows the taping of an active
2650section. Otherwise, {\sf reverse} will return control with a suitable error
2651message.
2652In order to avoid possible confusion, the first four arguments must always be
2653present in the calling sequence. However, if $m$ or $d$
2654attain their trivial values 1 and 0, respectively, then
2655corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2656{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2657indirection.  For example, we may call
2658{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2659{\sf double g[n]} 
2660to calculate a gradient of a scalar-valued function.
2661
2662Sometimes it may be useful to perform a forward sweep for families of
2663Taylor series with the same leading term.
2664This vector version of {\sf forward} can be called in the form
2665%
2666\begin{tabbing}
2667\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2668\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2669\>{\sf short int tag;}         \> // tape identification \\
2670\>{\sf int m;}                 \> // number of dependent variables $m$\\
2671\>{\sf int n;}                 \> // number of independent variables $n$\\
2672\>{\sf int d;}                 \> // highest derivative degree $d$\\
2673\>{\sf int p;}                 \> // number of Taylor series $p$\\
2674\>{\sf  double x0[n];}          \> // values of independent variables $x_0$\\
2675\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
2676\>{\sf double y0[m];}           \> // values of dependent variables $y_0$\\
2677\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
2678\end{tabbing}
2679%
2680where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2681and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2682degree 0. There is no option to keep the values of active variables
2683that are going out of scope or that are overwritten. Therefore this
2684function cannot prepare a subsequent reverse sweep.
2685The return integer serves as a flag to indicate quadratures or altered
2686comparisons as described above in \autoref{reuse_tape}.
2687
2688Since the calculation of Jacobians is probably the most important
2689automatic differentia\-tion task, we have provided a specialization
2690of vector {\sf forward} to the case where $d = 1$. This version can be
2691called in the form
2692%
2693\begin{tabbing}
2694\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2695\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
2696\>{\sf short int tag;}         \> // tape identification \\
2697\>{\sf int m;}                 \> // number of dependent variables $m$\\
2698\>{\sf int n;}                 \> // number of independent variables $n$\\
2699\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
2700\>{\sf double x[n];}          \> // values of independent variables $x_0$\\
2701\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
2702\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
2703\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
2704\end{tabbing}
2705%
2706When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2707the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2708one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2709chosen initialization of $X$. In a workstation environment a value
2710of $p$ somewhere between $10$ and $50$
2711appears to be fairly optimal. For smaller $p$ the interpretive
2712overhead is not appropriately amortized, and for larger $p$ the
2713$p$-fold increase in storage causes too many page faults. Therefore,
2714large Jacobians that cannot be compressed via column coloring
2715as could be done for example using the driver {\sf sparse\_jac}
2716should be ``strip-mined'' in the sense that the above
2717first-order-vector version of {\sf forward} is called
2718repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2719a partition of the identity matrix of order $n$.
2720%
2721%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2722\subsection{Dependence Analysis}
2723%
2724\label{ProBit}
2725%
2726The sparsity pattern of Jacobians is often needed to set up data structures
2727for their storage and factorization or to allow their economical evaluation
2728by compression \cite{BeKh96}. Compared to the evaluation of the full
2729Jacobian $F'(x_0)$ in real arithmetic computing the Boolean matrix
2730$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2731pattern in the obvious way requires a little less run-time and
2732certainly a lot less memory.
2733
2734The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2735of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2736dependence between the $i$-th independent variable $x_{i}$ and
2737the $j$-th dependent variable $y_{j}$. Just like for real arguments
2738one would wish to compute matrix-vector and vector-matrix products
2739of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2740by appropriate {\sf forward} and {\sf reverse} routines where
2741$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
2742Here, multiplication corresponds to logical
2743{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
2744semi-ring.
2745
2746For practical reasons it is assumed that
2747$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2748$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2749$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2750as bit patterns. Moreover $p$ or $q$ such integer vectors may
2751be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2752and $U\in\N^{q \times m}$, which naturally correspond
2753to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
2754and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
2755bit pattern versions of {\sf forward} and {\sf reverse} allow
2756to compute integer matrices $Y\in\N^{m \times p}$ and
2757$Z\in\N^{q \times m}$ corresponding to
2758\begin{equation}
2759\label{eq:int_forrev}
2760\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad 
2761\tilde{Z} = \tilde{U}\tilde{P} \, ,
2762\end{equation} 
2763respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
2764and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times n}$.
2765In general, the application of the bit pattern versions of
2766{\sf forward} or {\sf reverse} can be interpreted as
2767propagating dependences between variables forward or backward, therefore
2768both the propagated integer matrices and the corresponding
2769Boolean matrices are called {\em dependence structures}.
2770 
2771The bit pattern {\sf forward} routine
2772%
2773\begin{tabbing}
2774\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2775\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
2776\>{\sf short int tag;}              \> // tape identification \\
2777\>{\sf int m;}                      \> // number of dependent variables $m$\\
2778\>{\sf int n;}                      \> // number of independent variables $n$\\
2779\>{\sf int p;}                      \> // number of integers propagated $p$\\
2780\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
2781\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
2782\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
2783\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
2784                                     \eqref{eq:int_forrev}\\
2785\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
2786\end{tabbing}
2787%
2788can be used to obtain the dependence structure $Y$ for a given dependence structure
2789$X$. The dependence structures are
2790represented as arrays of {\sf unsigned long int} the entries of which are
2791interpreted as bit patterns as described above.   
2792For example, for $n=3$ the identity matrix $I_3$ should be passed
2793with $p=1$ as the $3 \times 1$ array
2794\begin{eqnarray*}
2795{\sf X} \; = \;
2796\left( \begin{array}{r}
2797         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2798         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2799         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
2800       \end{array} \right)
2801\end{eqnarray*}
2802in the 4-byte long integer format. The parameter {\sf mode} determines
2803the mode of dependence analysis as explained already in \autoref{sparse}.
2804
2805A call to the corresponding bit pattern {\sf reverse} routine
2806%
2807\begin{tabbing}
2808\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2809\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
2810\>{\sf short int tag;}         \> // tape identification \\
2811\>{\sf int m;}                 \> // number of dependent variables $m$\\
2812\>{\sf int n;}                 \> // number of independent variables $n$\\
2813\>{\sf int q;}                 \> // number of integers propagated q\\
2814\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
2815\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
2816                                     to \eqref{eq:int_forrev}\\
2817\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
2818\end{tabbing}
2819%
2820yields the dependence structure $Z$ for a given dependence structure
2821$U$.
2822
2823To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2824as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2825with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2826corresponding dependence structure $X$ or $U$ must be defined to represent 
2827the identity matrix of the respective dimension.
2828Due to the fact that always a multiple of $s$ Boolean vectors are propagated
2829there may be superfluous vectors, which can be set to zero.
2830
2831The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2832correspond to those described in \autoref{retvalues}.
2833
2834One can control the storage growth by the factor $p$ using
2835``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
2836groups of columns or respectively rows at a time, i.e.~partitioning
2837$X$ or $U$ appropriately as described for the computation of Jacobians
2838in \autoref{vecCas}.
2839%
2840%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2841%
2842\section{Advanced algorithmic differentiation in ADOL-C}
2843\label{adv_ad}
2844%
2845\subsection{Differentiating parameter dependent functions}
2846%
2847Normally the functions to be differentiated using AD are defined as
2848mappings from $\R^n$ to $\R^m$. The derivatives of the $m$ dependents
2849are then computed w.r.t. all the $n$ independents. However sometimes
2850the application requires only derivatives w.r.t. some of the
2851independents and the others only occur in the evaluation procedure as
2852parameters. One important such use case is Lagrange Multipliers in a
2853nonlinear optimization problem. Declaring such variables as
2854\verb=adouble= and marking them as independent results in the
2855evaluation of superfluous derivative information and increases
2856runtime. However, if the parameters are not defined as \verb=adouble=
2857type variables they get hard coded as constants onto the ADOL-C
2858trace. One may want to change these parameters for some evaluations,
2859however the overhead of retracing the function may be very high. In
2860such cases these constants may be specially marked as parameters
2861during tracing using the function
2862\begin{tabbing}
2863\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2864\>{\sf mkparam(d)}\\
2865\>{\sf double d;}         \> // constant value to be marked as parameter \\
2866\end{tabbing}
2867This stores the constant in a different place on the trace than the
2868other unmarked constants and enables the user to replace a whole
2869vector of parameters with a new one before computing derivatives later
2870on. All such marked parameter constants can be used in any evaluation
2871expression where the original unmarked constants were used.
2872
2873Sometimes the same parameter may be needed in various different
2874expressions. During trace creation the marked parameters are given a
2875running index as they are marked. This index may me saved by the user
2876and used to access the same stored value in different expressions.
2877
2878To save the index while marking a parameter use
2879\begin{tabbing}
2880\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2881\>{\sf locint mkparam\_idx(d)}\\
2882\>{\sf double d;} \> // constant value to be marked
2883\end{tabbing}
2884The resulting \verb=locint= can be saved in a variable by the user and
2885later used to access the saved parameter from in the trace in any expression
2886
2887To access a saved parameter inside any evaluation expression use
2888\begin{tabbing}
2889\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2890\>{\sf getparam(idx)}\\
2891\>{\sf locint idx;} \> // index of the parameter as returned by
2892mkparam\_idx()\\
2893\end{tabbing}
2894
2895The above three functions work only during the creation of the trace,
2896i.e., between the invokation of \verb=trace_on()= and \verb=trace_off()=.
2897
2898For all following derivative computations using the drivers described
2899in the preceding sections, the saved parameter values will be used,
2900unless the user asks to replace them with a different vector of
2901values.
2902
2903To replace the whole parameter vector one uses the routine
2904\begin{tabbing}
2905\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2906\>{\sf set\_param\_vec(tag, number, parameters)}\\
2907\>{\sf short int tag;}   \> // tape identification\\
2908\>{\sf size\_t number;}  \> // total number of constants marked as
2909parameters\\
2910\>{\sf double* parameters;} \> // array of constants to be used as replacement\\
2911\end{tabbing}
2912All following derivative computations using the drivers described in
2913the preceding sections will now use the new parameters.
2914
2915After replacing the parameters, however, one cannot directly call a basic
2916reverse mode driver. A basic forward mode driver must be called. However,
2917the easy to use drivers from Section \ref{drivers} do this automatically.
2918%
2919\subsection{Differentiating external functions}
2920%
2921Ideally, AD is applied to a given computation as a whole.
2922This is not always possible because parts of the computation may
2923be coded in a different programming language or may a call to
2924an external library.
2925In the former case one may want to differentiate the parts in
2926question with a different AD tool or provide hand written derivatives.
2927To integrate these
2928In practice, however, sophisticated projects usually evolve over a long period of time.
2929Within this process, a heterogeneous code base for the project
2930develops, which may include the incorporation of external solutions,
2931changes in programming paradigms or even of programming languages.
2932Equally heterogeneous, the computation of derivative values appears.
2933Hence, different \mbox{AD-tools} may be combined with hand-derived
2934codes based on the same or different programming languages.
2935ADOL-C supports such settings  by  the concept of externally
2936differentiated functions, that is, a function
2937not differentiated by ADOL-C itself. The required derivatives
2938have to be provided by the user.
2939
2940For this purpose, it is required that the externally differentiated
2941function (for example named {\sf\em euler\_step} ) has the following signature.
2942\smallskip
2943
2944\noindent
2945\hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);}
2946\medskip
2947
2948\noindent
2949Note that the formal paraemters in the signature have {\sf double} type, that is,
2950they are not active as in the original program before the ADOL-C type change.
2951The externally differentiated function has to
2952be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows.
2953\smallskip
2954
2955\noindent
2956\hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);}.
2957\smallskip
2958
2959\noindent
2960This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function.
2961Then, the user has to supply the function pointers for the call back methods (for example here
2962{\sf zos\_for\_euler\_step} and {\sf  fos\_rev\_euler\_step}) the user implemented
2963to compute the derivatives as follows.
2964\begin{tabbing}
2965\hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\
2966             \> {\sf // function pointer for computing
2967               Zero-Order-Scalar (=zos)}\\
2968             \> {\sf // forward information}\\
2969%             \> {\sf edf-$>$dp\_x = xp;}\\
2970%             \> {\sf edf-$>$dp\_y = yp;}\\
2971%             \> {\sf // double arrays for arguments and results}\\
2972             \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\
2973             \> {\sf // function pointer for computing
2974               First-Order-Scalar (=fos)}\\ 
2975             \> {\sf reverse information}
2976\end{tabbing}
2977To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y}
2978one has to provide (allocate) both variants. I.e. if the call to {\sf euler\_step} was originally
2979\noindent
2980\hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);}
2981\medskip
2982then the ADOL-C typechange for the calling context will turn  {\sf sOld} and {\sf sNew} in {\sf adouble} pointers.
2983To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace)
2984the original call to the externally differentiated function
2985must be substituted by
2986\medskip
2987
2988\noindent
2989\hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);}
2990\medskip
2991
2992\noindent
2993Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively) 
2994to the active arguments {\sf sNew} in {\sf adouble}.
2995The usage of the external function facility is illustrated by the
2996example \verb=ext_diff_func= contained in
2997\verb=examples/additional_examples/ext_diff_func=.
2998There,the external differentiated function is also a C code, but the
2999handling as external differentiated functions also a decrease of the
3000overall required tape size.
3001
3002%
3003\subsection{Advanced algorithmic differentiation of time integration processes}
3004%
3005For many time-dependent applications, the corresponding simulations
3006are based on ordinary or partial differential equations.
3007Furthermore, frequently there are quantities that influence the
3008result of the simulation and can be seen as  control of the systems.
3009To compute an approximation of the
3010simulated process for a time interval $[0,T]$ and evaluated the
3011desired target function, one applies an
3012appropriate integration scheme given by
3013\begin{tabbing}
3014\hspace{5mm} \= some initializations yielding $x_0$\\
3015\> for $i=0,\ldots, N-1$\\
3016\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
3017\hspace{5mm} \= evaluation of the target function
3018\end{tabbing}
3019where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
3020time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
3021$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$
3022defines the time step to compute the state at time $t_i$. Note that we
3023do not assume a uniform grid.
3024
3025When computing derivatives of the target function with respect to the
3026control, the consequences for the tape generation using the ``basic''
3027taping approach as implemented in ADOL-C so far are shown in the left part of
3028\autoref{fig:bas_tap}.
3029\begin{figure}[htbp] 
3030\begin{center}
3031\hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill
3032\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
3033\end{center}
3034\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
3035\caption{Different taping approaches}
3036\label{fig:bas_tap}
3037\end{figure} 
3038As can be seen, the iterative process is completely
3039unrolled due to the taping process. That is, the tape contains an internal representation of each
3040time step. Hence, the overall tape comprises a serious amount of redundant
3041information as illustrated by the light grey rectangles in
3042\autoref{fig:bas_tap}
3043
3044To overcome the repeated storage of essentially the same information,
3045a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
3046the right-hand side of \autoref{fig:bas_tap}. This new
3047capability allows the encapsulation of the time-stepping procedure
3048such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one
3049representative of the time steps in addition to a function pointer to the
3050evaluation procedure $F$ of the time steps.  The function pointer has
3051to be stored for a possibly necessary retaping during the derivative calculation
3052as explained below.
3053
3054Instead of storing the complete tape, only a very limited number of intermediate
3055states are kept in memory. They serve as checkpoints, such that
3056the required information for the backward integration is generated
3057piecewise during the adjoint calculation.
3058For this modified adjoint computation the optimal checkpointing schedules
3059provided by {\bf revolve} are employed. An adapted version of the
3060software package {\sf revolve} is part of ADOL-C and automatically
3061integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
3062distributed such that computational effort is minimized for the given
3063number of checkpoints and time steps $N$. It is important to note that the overall tape
3064size is drastically reduced due to the advanced taping strategy.  For the
3065implementation of this nested taping we introduced
3066a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
3067handle different internal function representations during the taping
3068procedure and the derivative calculation. This approach allows the generation of a new
3069tape inside the overall tape, where the coupling of the different tapes is based on
3070the {\em external differentiated function} described above.
3071
3072Written under the objective of minimal user effort, the checkpointing routines
3073of \mbox{ADOL-C} need only very limited information. The user must
3074provide two routines as implementation of the time-stepping function $F$ 
3075with the signatures
3076\medskip
3077
3078\noindent
3079\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
3080\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
3081\medskip
3082
3083\noindent
3084where the function names can be chosen by the user as long as the names are
3085unique.It is possible that the result vector of one time step
3086iteration overwrites the argument vector of the same time step. Then, no
3087copy operations are required to prepare the next time step.
3088
3089At first, the {\sf adouble} version of the time step function has to
3090be {\em registered} using the \mbox{ADOL-C} function
3091\medskip
3092
3093\noindent
3094\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
3095\medskip
3096
3097\noindent
3098This function initializes the structure {\sf cpc}. Then,
3099the user has to provide the remaining checkpointing information
3100by the following commands:
3101\begin{tabbing}
3102\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
3103             \> {\sf // double variante of the time step function}\\
3104             \> {\sf cpc.setNumberOfSteps(N);}\\
3105             \> {\sf // number of time steps to perform}\\
3106             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
3107             \> {\sf // number of checkpoint} \\
3108             \> {\sf cpc.setDimensionXY(n);}\\
3109             \> {\sf // dimension of input/output}\\
3110             \> {\sf cpc.setInput(y);}\\
3111             \> {\sf // input vector} \\
3112             \> {\sf cpc.setOutput(y);}\\
3113             \> {\sf // output vector }\\
3114             \> {\sf cpc.setTapeNumber(tag\_check);}\\
3115             \> {\sf // subtape number for checkpointing} \\
3116             \> {\sf cpc.setAlwaysRetaping(false);}\\
3117             \> {\sf // always retape or not ?}
3118\end{tabbing}
3119Subsequently, the time loop in the function evaluation can be
3120substituted by a call of the function
3121\medskip
3122
3123\noindent
3124\hspace*{2cm}{\sf int cpc.checkpointing();}
3125\medskip
3126
3127\noindent
3128Then, ADOL-C computes derivative information using the optimal checkpointing
3129strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
3130
3131The presented driver is prototyped in the header file
3132\verb=<adolc/checkpointing.h>=. This header
3133is included by the global header file \verb=<adolc/adolc.h>= automatically.
3134An example program \verb=checkpointing.cpp= illustrates the
3135checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
3136%
3137%
3138%
3139\subsection{Advanced algorithmic differentiation of fixed point iterations}
3140%
3141Quite often, the state of the considered system denoted by $x\in\R^n$
3142depends on some design parameters denoted by $u\in\R^m$. One example for this setting
3143forms the flow over an aircraft wing. Here, the shape of the wing that
3144is defined by the design vector $u$ 
3145determines the flow field $x$. The desired quasi-steady state $x_*$
3146fulfills the fixed point equation
3147\begin{align}
3148  \label{eq:fixedpoint}
3149  x_* = F(x_*,u)
3150\end{align}
3151for a given continuously differentiable function
3152$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
3153also exploited by many other applications.
3154
3155Assume that one can apply the iteration 
3156\begin{align}
3157\label{eq:iteration}
3158 x_{k+1} = F(x_k,u)
3159\end{align}
3160to obtain a linear converging sequence $\{x_k\}$ generated
3161for any given control $u\in\R^n$. Then the limit point $x_*\in\R^n$ fulfils the fixed
3162point equation~\eqref{eq:fixedpoint}. Moreover,
3163suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
3164$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
3165Hence, there exists a
3166differentiable function $\phi:\R^m \rightarrow \R^n$,
3167such that $\phi(u) = F(\phi(u),u)$, where the state
3168$\phi(u)$ is a fixed point of $F$ according to a control
3169$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
3170the design vector $u$, derivatives of $\phi$ with respect
3171to $u$ are of particular interest.
3172
3173To exploit the advanced algorithmic differentiation  of such fixed point iterations
3174ADOL-C provides the special functions {\tt fp\_iteration(...)}.
3175It has the following interface:
3176\begin{tabbing}
3177\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
3178\>{\sf int
3179  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
3180\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
3181\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
3182\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
3183\>{\sf int *double\_F;}         \> // pointer to a function that compute for $x$ and $u$ \\
3184\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
3185\>{\sf int *adouble\_F;}        \> // pointer to a function that compute for $x$ and $u$ \\
3186\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
3187\>{\sf int *norm;}              \> // pointer to a function that computes\\
3188\>                              \> // the norm of a vector\\
3189\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
3190\>                              \> // the norm of a vector\\
3191\>{\sf double eps;}             \> // termination criterion for fixed point iteration\\
3192\>{\sf double eps\_deriv;}      \> // termination criterion for adjoint fixed point iteration\\
3193\>{\sf N\_max;}                 \> // maximal number of itertions for state computation\\
3194\>{\sf N\_max\_deriv;}          \> // maximal number of itertions for adjoint computation\\
3195\>{\sf adouble *x\_0;}          \> // inital state of fixed point iteration\\
3196\>{\sf adouble *u;}             \> // value of $u$\\
3197\>{\sf adouble *x\_fic;}        \> // final state of fixed point iteration\\
3198\>{\sf int dim\_x;}             \> // dimension of $x$\\
3199\>{\sf int dim\_u;}             \> // dimension of $u$\\
3200\end{tabbing}
3201%
3202Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
3203should be used for the fixed point iteration.
3204{\tt double\_F} and {\tt adouble\_F} are pointers to functions, that
3205compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby
3206{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
3207uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
3208{\tt norm\_deriv} are pointers to functions computing the norm
3209of a vector. The latter functions together with {\tt eps},
3210{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
3211the iterations. Thus the following loops are performed:
3212\begin{center}
3213\begin{tabular}{ll}
3214  do                     &   do                           \\
3215  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
3216  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
3217  ~~~~$y = F(x,u)$       &   ~~~
3218  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
3219  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
3220  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
3221  & and $k\leq N_{max,deriv}$
3222\end{tabular}
3223\end{center}
3224The vector for the initial iterate and the control is stored
3225in {\tt x\_0} and {\tt u} respectively. The vector in which the
3226fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
3227and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
3228corresponding vectors.
3229
3230The presented driver is prototyped in the header file
3231\verb=<adolc/fixpoint.h>=. This header
3232is included by the global header file \verb=<adolc/adolc.h>= automatically.
3233An example code that shows also the
3234expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
3235%
3236\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
3237%
3238ADOL-C allows to compute derivatives in parallel for functions
3239containing OpenMP parallel loops.
3240This implies that an explicit loop-handling approach is applied. A
3241typical situation is shown in \autoref{fig:basic_layout},
3242\begin{figure}[hbt]
3243    \vspace{3ex}
3244    \begin{center}
3245        \includegraphics[height=4cm]{multiplexed} \\
3246        \begin{picture}(0,0)
3247            \put(-48,40){\vdots}
3248            \put(48,40){\vdots}
3249            \put(-48,80){\vdots}
3250            \put(48,80){\vdots}
3251            \put(-83,132){function eval.}
3252            \put(5,132){derivative calcul.}
3253        \end{picture}
3254    \end{center}
3255    \vspace{-5ex}
3256    \caption{Basic layout of mixed function and the corresponding derivation process}
3257    \label{fig:basic_layout}
3258\end{figure}
3259where the OpenMP-parallel loop is preceded by a serial startup
3260calculation and followed by a serial finalization phase.
3261
3262Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
3263Two macros are available that only differ in the way the global tape information is handled.
3264Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
3265For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used.
3266Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
3267Due to the inserted macro, the OpenMP statement has the following structure:
3268\begin{tabbing}
3269\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
3270              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
3271\end{tabbing}
3272Inside the parallel region, separate tapes may then be created.
3273Each single thread works in its own dedicated AD-environment, and all
3274serial facilities of \mbox{ADOL-C} are applicable as usual. The global
3275derivatives can be computed using the tapes created in the serial and
3276parallel parts of the function evaluation, where user interaction is
3277required for the correct derivative concatenation of the various tapes.
3278
3279For the usage of the parallel facilities, the \verb=configure=-command
3280has to be used with the option \verb?--with-openmp-flag=FLAG?, where
3281\verb=FLAG= stands for the system dependent OpenMP flag.
3282The parallel differentiation of a parallel program is illustrated
3283by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
3284%
3285%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3286%
3287\section{Traceless forward differentiation in ADOL-C}
3288\label{tapeless}
3289%
3290Up to version 1.9.0, the development of the ADOL-C software package
3291was based on the decision to store all data necessary for derivative
3292computation on traces (tapes), where large applications require the traces to be
3293written out to corresponding files. In almost all cases this means
3294a considerable drawback in terms of run time due to the excessive
3295memory accesses. Using these traces enables ADOL-C to offer multiple
3296functions. However, it is not necessary for all tasks of derivative
3297computation to do that.
3298
3299Starting with version 1.10.0, ADOL-C now features a traceless forward
3300mode for computing first order derivatives in scalar mode, i.e.,
3301$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
3302This traceless variant coexists with the more universal
3303trace based mode in the package. The following subsections describe
3304the source code modifications required to use the traceless forward mode of
3305ADOL-C. 
3306%
3307\subsection{Modifying the Source Code}
3308%
3309Let us consider the coordinate transformation from Cartesian to spherical
3310polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3311= F(x)$, with
3312\begin{eqnarray*}
3313y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3314y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3315y_3  =  \arctan(x_2/x_1),
3316\end{eqnarray*}
3317as an example. The corresponding source code is shown in \autoref{fig:traceless}.
3318\begin{figure}[htb]
3319\framebox[\textwidth]{\parbox{\textwidth}{
3320%\begin{center}
3321%\begin{flushleft}
3322\begin{tabbing}
3323\= \kill
3324\> {\sf \#include} {\sf $<$iostream$>$}\\
3325\> {\sf using namespace std;}\\
3326\> \\
3327\> {\sf int main() \{}\\
3328\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
3329\> \\
3330\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3331\> {\sf \rule{1cm}{0pt}...}\\
3332\> \\
3333\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3334\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3335\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3336\> \\
3337\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
3338\> \\
3339\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3340\> \}
3341\end{tabbing}
3342%\end{flushleft}
3343%\end{center}
3344}}
3345\caption{Example for traceless forward mode}
3346\label{fig:traceless}
3347\end{figure}
3348%
3349Changes to the source code that are necessary for applying the
3350traceless forward mode of ADOL-C are described in the following two
3351subsections, where the vector mode version is described
3352as extension of the scalar mode.
3353%
3354\subsubsection*{The scalar mode}
3355%
3356To use the traceless forward mode, one has to include only
3357the header file \verb#adtl.h# since it does not include the
3358trace based functions defined in other header files.
3359
3360As in the trace based forward version of ADOL-C all derivative
3361calculations are introduced by calls to overloaded
3362operators. Therefore, similar to the trace-based version all
3363independent, intermediate and dependent variables must be declared
3364with type {\sf adouble}. The whole traceless functionality provided by
3365\verb#adtl.h# was written as complete inline intended code
3366due to run time aspects, where the real portion of inlined code can
3367be influenced by switches for many compilers. Likely, the whole
3368derivative code is inlined by default. Our experiments
3369with the traceless mode have produced complete inlined code by using
3370standard switches (optimization) for GNU and Intel C++
3371compiler.
3372
3373To avoid name conflicts
3374resulting from the inlining the traceless version has its own namespace
3375\verb#adtl#. As a result four possibilities of using the {\sf adouble}
3376type are available for the traceless version:
3377\begin{itemize}
3378\item Defining a new type
3379      \begin{center}
3380        \begin{tabular}{l}
3381          {\sf typedef adtl::adouble adouble;}\\
3382          ...\\
3383          {\sf adouble tmp;}
3384        \end{tabular}
3385      \end{center}
3386      This is the preferred way. Remember, you can not write an own
3387      {\sf adouble} type/class with different meaning after doing the typedef.
3388\item Declaring with namespace prefix
3389      \begin{center}
3390        \begin{tabular}{l}
3391          {\sf adtl::adouble tmp;}
3392        \end{tabular}
3393      \end{center}
3394      Not the most handsome and efficient way with respect to coding
3395      but without any doubt one of the safest ways. The identifier
3396      {\sf adouble} is still available for user types/classes.
3397\item Trusting macros
3398      \begin{center}
3399        \begin{tabular}{l}
3400          {\sf \#define adouble adtl::adouble}\\
3401          ...\\
3402          {\sf adouble tmp;}
3403        \end{tabular}
3404      \end{center}
3405      This approach should be used with care, since standard defines are text replacements.
3406  \item Using the complete namespace
3407        \begin{center}
3408          \begin{tabular}{l}
3409            {\sf using namespace adtl;}\\
3410            ...\\
3411            {\sf adouble tmp;}
3412          \end{tabular}
3413        \end{center}
3414        A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise!
3415\end{itemize}
3416After defining the variables only two things are left to do. First
3417one needs to initialize the values of the independent variables for the
3418function evaluation. This can be done by assigning the variables a {\sf
3419double} value. The {\sf ad}-value is set to zero in this case.
3420Additionally, the traceless forward mode variant of ADOL-C
3421offers a function named {\sf setValue} for setting the value without
3422changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
3423variables ADOL-C offers two possibilities:
3424\begin{itemize}
3425  \item Using the constructor
3426        \begin{center}
3427          \begin{tabular}{l}
3428        {\sf double seedOne = 1., seedZero = 0.;}\\
3429            {\sf adouble x1(2,\&seedOne), x2(4,\&seedZero), y;}
3430          \end{tabular}
3431        \end{center}
3432        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
3433        remains uninitialized. In terms of function evaluation
3434        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3435        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3436   \item Setting point values directly
3437         \begin{center}
3438           \begin{tabular}{l}
3439         {\sf double seedOne = 1., seedZero = 0.;}\\
3440             {\sf adouble x1=2, x2=4, y;}\\
3441             ...\\
3442             {\sf x1.setADValue(\&seedOne);}\\
3443             {\sf x2.setADValue(\&seedZero);}
3444           \end{tabular}
3445         \end{center}
3446         The same example as above but now using {\sf setADValue}-method for initializing the derivative values. It is important to note that in both cases the derivative values have to be set by passing an address of a variable or a pointer of the {\sf double} type (instead of passing the value directly). The reason for doing this will become clear in the next subsection.
3447\end{itemize}
3448%
3449The derivatives can be obtained at any time during the evaluation
3450process by calling the {\sf getADValue}-method which returns a pointer (instead of a value), and therefore, it can be assigned to another pointer or it can be dereferenced immediately in order to read the value:
3451\begin{center}
3452  \begin{tabular}{l}
3453    {\sf adouble y;}\\
3454    ...\\
3455    {\sf cout $<<$ *(y.getADValue());}
3456  \end{tabular}
3457\end{center}
3458\autoref{fig:modcode} shows the resulting source code incorporating
3459all required changes for the example
3460given above.
3461
3462\begin{figure}[htb]
3463\framebox[\textwidth]{\parbox{\textwidth}{
3464%\begin{center}
3465\begin{tabbing}
3466\hspace*{-1cm} \= \kill
3467\> {\sf \#include $<$iostream$>$}\\
3468\> {\sf using namespace std;}\\
3469\> \\
3470\> {\sf \#include $<$adolc/adtl.h$>$}\\
3471\> {\sf typedef adtl::adouble adouble;}\\
3472\\
3473\> {\sf int main() \{}\\
3474\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3475\\
3476\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3477\> {\sf \rule{1cm}{0pt}...}\\
3478\\
3479\> {\sf \rule{0.5cm}{0pt}double seed = 1.;}\\
3480\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(\&seed);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3481\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3482\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3483\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3484\\
3485\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3486\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ *(y[1].getADValue()) $<<$ endl;}\\
3487\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3488\> {\sf \}}
3489\end{tabbing}
3490%\end{center}
3491}}
3492\caption{Example for traceless scalar forward mode}
3493\label{fig:modcode}
3494\end{figure}
3495%
3496\subsubsection*{The vector mode}
3497%
3498In scalar mode only one direction element has to be stored per {\sf
3499  adouble} whereas a field of $p$ elements is needed in the vector
3500  mode to cover the computations for the given $p$ directions. The
3501  resulting changes to the source code are described in this section.
3502
3503In order to use the vector mode one has to call
3504\verb#adtl::setNumDir(N)#.  This function takes the maximal number of
3505directions to be used within the resulting vector mode. Changing the
3506number of directions in the middle of executing code containing active
3507variables is undesirable and can lead to segmentation violations,
3508hence such an operation will result in an
3509error. \verb#adtl::setNumDir(N)# must therefore be called before
3510declaring any \verb#adtl::adouble# objects.
3511
3512Setting and getting the derivative values is done in the same manner as in the scalar case, by passing and retrieving the pointers, as illustrated in the following example:
3513\begin{center}
3514  \begin{tabular}{l}
3515    {\sf adtl::setNumDir(10);}\\
3516    {\sf adouble x, y;}\\
3517    {\sf double *ptr1 = new double[10];}\\
3518    {\sf double *ptr2;}\\
3519      ...\\
3520    {\sf x1=2;}\\
3521    {\sf x1.setADValue(ptr1);}\\
3522    ...\\
3523    {\sf ptr2=y.getADValue();}
3524  \end{tabular}
3525\end{center}
3526Now using the pointers in the methods {\sf setADValue} and {\sf getADValue} makes more sense since the vector mode requires the field of $p$ elements per {\sf adouble} which holds the derivative values.
3527Additionally, the traceless vector forward mode of ADOL-C offers two
3528new methods for setting/getting the derivative values. Setting a derivative value of the desired vector element is possible by passing a {\sf double} value and its position in the vector ({\sf integer}). Getting a derivative value of the desired vector element requires as argument only its position in the vector. One can note that the pointers are not used in these cases, as illustrated in the following example:
3529\begin{center}
3530  \begin{tabular}{l}
3531    {\sf adtl::setNumDir(10);}\\
3532    ...\\
3533    {\sf adouble x, y;}\\
3534    ...\\
3535    {\sf x1=2;}\\
3536    {\sf x1.setADValue(5,1);\hspace*{3.7cm}// set the 6th point value of x to 1.0}\\
3537      ...\\
3538    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3539  \end{tabular}
3540\end{center}
3541The resulting source code containing all changes that are required is
3542shown in \autoref{fig:modcode2}
3543
3544\begin{figure}[!h!t!b]
3545\framebox[\textwidth]{\parbox{\textwidth}{
3546\begin{tabbing}
3547\hspace*{-1cm} \= \kill
3548\> {\sf \#include $<$iostream$>$}\\
3549\> {\sf  using namespace std;}\\
3550\\
3551\> {\sf \#include $<$adolc/adtl.h$>$}\\
3552\> {\sf typedef adtl::adouble adouble;}\\
3553\\
3554\> {\sf int main() \{}\\
3555\> {\sf \rule{0.5cm}{0pt}adtl::setNumDir(3);}\\
3556\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3557\\
3558\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3559\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3560\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3561\> {\sf \rule{0.5cm}{0pt}\}}\\
3562\\
3563\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3564\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3565\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3566\\
3567\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3568\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3569\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3570\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3571\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3572\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3573\> {\sf \rule{0.5cm}{0pt}\}}\\
3574\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3575\> {\sf \}}
3576\end{tabbing}
3577}}
3578\caption{Example for traceless vector forward mode}
3579\label{fig:modcode2}
3580\end{figure}
3581%
3582\subsection{Compiling and Linking the Source Code}
3583%
3584After incorporating the required changes, one has to compile the
3585source code and link the object files to get the executable.
3586As long as the ADOL-C header files are not included in the absolute path
3587the compile sequence should be similar to the following example:
3588\begin{center}
3589  \begin{tabular}{l}
3590    {\sf g++ -I/home/username/adolc\_base/include -c traceless\_scalar.cpp}
3591  \end{tabular}
3592\end{center}
3593The \verb#-I# option tells the compiler where to search for the ADOL-C
3594header files. This option can be omitted when the headers are included
3595with absolute path or if ADOL-C is installed in a ``global'' directory.
3596
3597Although originally designed to be completely inline, certain global
3598variables need to be initialised in the compiled part of the library
3599and therefore the ADOL-C library must be linked with the compiled code.
3600The example started above could be finished with the
3601following command:
3602\begin{center}
3603  \begin{tabular}{l}
3604    {\sf g++ -o traceless\_scalar traceless\_scalar.o}\\
3605    {\sf -Wl,-{}-rpath -Wl,/home/username/adolc\_base/lib -L/home/username/adolc\_base/lib -ladolc}
3606  \end{tabular}
3607\end{center}
3608The mentioned source codes {\sf traceless\_scalar.c} and {\sf traceless\_vector.c} 
3609illustrating the use of the for traceless scalar and vector mode can be found in
3610the directory {\sf examples}.
3611%
3612\subsection{Concluding Remarks for the Traceless Forward Mode Variant}
3613%
3614As many other AD methods the traceless forward mode provided by the
3615ADOL-C package has its own strengths and drawbacks. Please read the
3616following section carefully to become familiar with the things that
3617can occur:
3618\begin{itemize}
3619  \item Advantages:
3620    \begin{itemize}
3621      \item Code speed\\
3622        Increasing computation speed was one of the main aspects in writing
3623        the traceless code. In many cases higher performance can be
3624        expected this way. However this is not guarenteed for repeated
3625        evaluations.
3626      \item Smaller overall memory requirements\\
3627        Traceless ADOL-C does not write traces anymore, as the name
3628        implies. Loop ''unrolling'' can be avoided this
3629        way. Considered main memory plus disk space as overall memory
3630        requirements the traceless version can be
3631        executed in a more efficient way.
3632    \end{itemize}
3633  \item Drawbacks:
3634    \begin{itemize}
3635    \item Main memory limitations\\
3636      The ability to compute derivatives to a given function is
3637      bounded by the main memory plus swap size  when using
3638      traceless ADOL-C. Computation from swap should be avoided anyway
3639      as far as possible since it slows down the computing time
3640      drastically. Therefore, if the program execution is 
3641      terminated without error message insufficient memory size can be
3642      the reason among other things. The memory requirements $M$ can
3643      be determined roughly as followed:
3644      \begin{itemize}
3645        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3646        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3647          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3648      \end{itemize}
3649      where the storage size of all non {\sf adouble} based variables is described by $M_p$.
3650    \item Compile time\\
3651      As discussed in the previous sections, the traceless forward mode of
3652      the ADOL-C package is implemented as inline intended version. Using
3653      this approach results in a higher source code size, since every
3654      operation involving at least one {\sf adouble} stands for the
3655      operation itself as well as for the corresponding derivative
3656      code after the inlining process. Therefore, the compilation time
3657      needed for the traceless version may be higher than that of the trace based code.
3658    \item Code Size\\
3659      A second drawback and result of the code inlining is the
3660      increase of code sizes for the binaries. The increase
3661      factor compared to the corresponding trace based program is
3662      difficult to quantify as it is task dependent. Practical results
3663      have shown that values between 1.0 and 2.0 can be
3664      expected. Factors higher than 2.0 are possible too and even
3665      values below 1.0 have been observed.
3666    \end{itemize}
3667\end{itemize}
3668
3669
3670\section{Traceless forward differentiation in ADOL-C using Cuda}
3671\label{tracelessCuda}
3672%
3673One major drawback using the traceless version of ADOL-C is the fact
3674that several function evaluations are needed to compute derivatives
3675in many different points. More precisely, to calculate the Jacobian for
3676a function $F:\mathbb{R}^n\rightarrow \mathbb{R}^m$ in $M$ points, $M$
3677function evaluations are needed for the traceless vector mode and even
3678$M*n$ for the traceless scalar mode. Depending on the size of the function
3679this can result in a long runtime. To achieve a better performance one can
3680use parallelisation techniques as the same operations are
3681performed during a function evaluation. One possibility is to use GPUs
3682since they are optimized for data parallel computation. Starting with
3683version 2.3.0 ADOL-C now features a traceless forward mode for computing
3684first order derivatives in scalar mode on GPUs using the
3685general purpose parallel computing architecture Cuda.
3686
3687The idea is to include parallel code that executes in many GPU threads across
3688processing elements. This can be done by using kernel functions, that is functions
3689which are executed on GPU as an array of threads in parallel. In general all
3690threads execute the same code. They are grouped into blocks which are then grouped
3691into grids. A kernel is executed as a grid of blocks of threads. For more
3692details see, e.g., the NVIDIA CUDA C Programming Guide which can be downloaded from
3693the web page {\sf www.nvidia.com}. To solve the problem of calculating
3694the Jacobian of $F$ at $M$ points it is possible to let each thread perform a
3695function evaluation and thus the computation of derivatives for one direction
3696at one point. The advantage is that the function is evaluated in different points
3697in parallel which can result in a faster wallclock runtime.
3698
3699The following subsection describes the source code modifications required to use
3700the traceless forward mode of ADOL-C with Cuda. 
3701
3702\subsection{Modifying the source code}
3703
3704Let us again consider the coordinate transformation from Cartesian to spherical
3705polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3706= F(x)$, with
3707\begin{eqnarray*}
3708y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3709y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3710y_3  =  \arctan(x_2/x_1),
3711\end{eqnarray*}
3712as an example. We now calculate the Jacobian at $M=1024$ different points.
3713The source code for one point is shown in \autoref{fig:traceless}. This example
3714has no real application but can still be used to show the combination of the traceless version
3715of ADOL-C and Cuda.
3716
3717For the use of this mode a Cuda toolkit, which is suitable for the grafic card used,
3718has to be installed. Furthermore, it is important to check that the graphic card
3719used supports double precision (for details see e.g. NVIDIA CUDA C Programming Guide).
3720Otherwise the data type employed inside of the {\sf adouble} class has to be adapted
3721to {\sf float}. To use the traceless forward mode with Cuda, one has to include the header
3722files \verb#adoublecuda.h# and \verb#cuda.h#. The first one contains the
3723definition of the class {\sf adouble} and the overloaded operators
3724for this version of ADOL-C. As in the other versions all derivative
3725calculations are introduced by calls to overloaded operators. The
3726second header file is needed for the use of Cuda. 
3727
3728One possibility to solve the problem above is the following. First of all
3729three {\sf double} arrays are needed: {\sf x} for the independent variables,
3730{\sf y} for the dependent variables and {\sf deriv} for the values of the
3731Jacobian matrices. The independent variables have to be initialised, therefore
3732the points at which the function should be evaluated are saved in a row in the
3733same array of length $3*M$. For the computation on GPUs one also has to allocate memory
3734on this device. Using the syntax in Cuda one can allocate an array of
3735length $3*M$ (number of independent variables times number of points) for
3736the independent variables as follows:
3737\begin{center}
3738  \begin{tabular}{l}
3739    {\sf double * devx;}\\
3740    {\sf cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
3741  \end{tabular}
3742\end{center}
3743The arrays for the dependent variables and the values of the Jacobian matrices
3744are allocated in the same way. Then the values of the independent variables
3745have to be copied to the GPU using the following command
3746\begin{center}
3747  \begin{tabular}{l}
3748    {\sf cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
3749  \end{tabular}
3750\end{center}   
3751The argument {\sf cudaMemcpyHostToDevice} indicates that the values are copied from
3752the host to the GPU. In this case the values stored in {\sf x} are copied to {\sf devx}.
3753
3754Now all required information has been transferred to the GPU. The changes
3755in the source code made so far are summarized in \autoref{fig:cudacode1}.
3756 
3757\begin{figure}[!h!t!b]
3758\framebox[\textwidth]{\parbox{\textwidth}{
3759%\begin{center}
3760\begin{tabbing}
3761\hspace*{-1cm} \= \kill
3762\> {\sf \#include $<$iostream$>$}\\
3763\> {\sf \#include $<$cuda.h$>$}\\
3764\> {\sf \#include $<$adoublecuda.h$>$}\\
3765\> {\sf using namespace std;}\\
3766\\
3767\> {\sf int main() \{}\\
3768\> {\sf \rule{0.5cm}{0pt}int M=1024;}\\
3769\> {\sf \rule{0.5cm}{0pt}double* deriv = new double[9*M];}\\
3770\> {\sf \rule{0.5cm}{0pt}double* y = new double[3*M];}\\
3771\> {\sf \rule{0.5cm}{0pt}double* x = new double[3*M];}\\
3772\\
3773\> {\sf \rule{0.5cm}{0pt}// Initialize $x_i$}\\
3774\> {\sf \rule{0.5cm}{0pt}for (int k=0; k$<$M; ++k)\{}\\
3775\> {\sf \rule{1cm}{0pt}for (int i=0; i$<$3; ++i)}\\
3776\> {\sf \rule{1.5cm}{0pt}x[k*3+i] =i + 1/(k+1);\}}\\
3777\\
3778\> {\sf \rule{0.5cm}{0pt}// Allocate array for independent and dependent variables}\\
3779\> {\sf \rule{0.5cm}{0pt}// and Jacobian matrices on GPU}\\
3780\> {\sf \rule{0.5cm}{0pt}double * devx;}\\
3781\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
3782\> {\sf \rule{0.5cm}{0pt}double * devy;}\\
3783\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devy, 3*M*sizeof(double));}\\
3784\> {\sf \rule{0.5cm}{0pt}double * devderiv;}\\
3785\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devderiv, 3*3*M*sizeof(double));}\\
3786\\
3787\> {\sf \rule{0.5cm}{0pt}// Copy values of independent variables from host to GPU}\\
3788\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
3789\\
3790\> {\sf \rule{0.5cm}{0pt}// Call function to specify amount of blocks and threads to be used}\\
3791\> {\sf \rule{0.5cm}{0pt}kernellaunch(devx, devy, devderiv,M);}\\
3792\\
3793\> {\sf \rule{0.5cm}{0pt}// Copy values of dependent variables and Jacobian matrices from GPU to host}\\
3794\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\
3795\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(deriv, devderiv, sizeof(double)*3*3*M, cudaMemcpyDeviceToHost);}\\
3796\> {\sf \}}
3797\end{tabbing}
3798%\end{center}
3799}}
3800\caption{Example for traceless scalar forward mode with Cuda}
3801\label{fig:cudacode1}
3802\end{figure}
3803
3804In the next step the user has to specify how many blocks and threads per block
3805will be needed for the function evaluations. In the present example this is done by 
3806the call of the function {\sf kernellaunch}, see \autoref{fig:cudacode2}.
3807In this case the blocks are two dimensional: the x-dimension is determined
3808by the number of points $M$ at which the Jacobian matrix has to be calculated
3809while the y-dimension is given by the number of independent variables, i.e., 3.
3810Since a block cannot contain more than 1024 threads, the
3811x-dimension in the example is 64 instead of $M=1024$. Therefore $1024/64=16$
3812blocks are needed. The described division into blocks is reasonable as each
3813thread has to perform a function evaluation for one point and one direction,
3814hence $M*3$ threads are needed where 3 denotes the number of independent
3815variables corresponding to the number of directions needed for the computation
3816of the Jacobian in one point.
3817
3818\begin{figure}[!h!t!b]
3819\framebox[\textwidth]{\parbox{\textwidth}{
3820\begin{tabbing}
3821\hspace*{-1cm} \= \kill
3822\> {\sf cudaError\_t kernellaunch(double* inx, double* outy, double* outderiv, int M) \{}\\
3823\> {\sf \rule{0.5cm}{0pt}// Create 16 blocks}\\
3824\> {\sf \rule{0.5cm}{0pt}int Blocks=16;}\\
3825\> {\sf \rule{0.5cm}{0pt}// Two dimensional (M/Blocks)$\times$3 blocks}\\
3826\> {\sf \rule{0.5cm}{0pt}dim3 threadsPerBlock(M/Blocks,3);}\\
3827\\
3828\> {\sf \rule{0.5cm}{0pt}// Call kernel function with 16 blocks with (M/Blocks)$\times$3 threads per block}\\
3829\> {\sf \rule{0.5cm}{0pt}kernel $<<<$ Blocks, threadsPerBlock $>>>$(inx, outy, outderiv);}\\
3830\> {\sf \rule{0.5cm}{0pt}cudaError\_t cudaErr = cudaGetLastError();}\\
3831\\
3832\> {\sf \rule{0.5cm}{0pt}return cudaErr;}\\
3833\> {\sf \}}
3834\end{tabbing}
3835}}
3836\caption{Example for traceless scalar forward mode with Cuda}
3837\label{fig:cudacode2}
3838\end{figure}
3839
3840We can now perform the function evaluations together with the calculation
3841of the Jacobians. The corresponding code is illustrated in \autoref{fig:cudacode3}.
3842Since the function evaluations should be performed on a GPU a kernel function is
3843needed which is defined by using the \mbox{{\sf \_\_ global\_\_}} declaration specifier
3844(see \autoref{fig:cudacode3}). Then each thread executes the operations that are
3845defined in the kernel.
3846
3847\begin{figure}[!h!t!b]
3848\framebox[\textwidth]{\parbox{\textwidth}{
3849\begin{tabbing}
3850\hspace*{-1cm} \= \kill
3851\> {\sf \_\_global\_\_ void kernel(double* inx, double* outy, double* outderiv) \{}\\
3852\> {\sf \rule{0.5cm}{0pt}const int index = threadIdx.x ;}\\
3853\> {\sf \rule{0.5cm}{0pt}const int index1 = threadIdx.y;}\\
3854\> {\sf \rule{0.5cm}{0pt}const int index2 = blockIdx.x;}\\
3855\> {\sf \rule{0.5cm}{0pt}const int index3 = blockDim.x;}\\
3856\> {\sf \rule{0.5cm}{0pt}const int index4 = blockDim.x*blockDim.y;}\\
3857\\
3858\> {\sf \rule{0.5cm}{0pt}// Declare dependent and independent variables as {\sf adouble}s}\\
3859\> {\sf \rule{0.5cm}{0pt}adtlc::adouble y[3], x[3];}\\
3860\> {\sf \rule{0.5cm}{0pt}// Read out point for function evaluation}\\
3861\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3862\> {\sf \rule{1cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
3863\> {\sf \rule{0.5cm}{0pt}// Set direction for calculation of derivatives}\\
3864\> {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
3865\\
3866\> {\sf \rule{0.5cm}{0pt}// Function evaluation}\\
3867\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3868\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3869\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3870\\
3871\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3872\> {\sf \rule{1cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
3873\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3874\> {\sf \rule{1cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
3875\> {\sf \}}
3876\end{tabbing}
3877}}
3878\caption{Example for traceless scalar forward mode with Cuda}
3879\label{fig:cudacode3}
3880\end{figure}
3881
3882In this example each thread is assigned the task of calculating the
3883derivatives in one point with respect to one independent variable. Therefore
3884some indices are needed for the implementation. Each thread has a unique thread
3885ID in a block consisting of an x and an y-dimension. The blocks have an ID as
3886well. In the example the following indices are used.
3887 
3888\begin{itemize}
3889 \item {\sf index = threadIdx.x} denotes the x-dimension of a thread (ranges from 0 to 63 in the example)
3890 \item {\sf index1 = threadIdx.y} denotes the y-dimension of a thread (ranges from 0 to 2 in the example)
3891 \item {\sf index2 = blockIdx.x} denotes the block index (ranges from 0 to 15 in the example)
3892 \item {\sf index3 = blockDim.x} denotes the x-dimension of a block (always 64 in the example)
3893 \item {\sf index4 = blockDim.x*blockDim.y} denotes the size of a block (always $64*3=192$ in the example)
3894\end{itemize}
3895
3896For the calculation of derivatives the function evaluation has to be performed
3897with {\sf adouble}s. Therefore the dependent and the independent variables
3898have to be declared as {\sf adouble}s as in the other versions of ADOL-C (see, e.g., \autoref{tapeless}).
3899Similar to the traceless version without Cuda the namespace \verb#adtlc# is used.
3900Now each thread has to read out the point at which the function evaluation is
3901then performed. This is determined by the blockindex and the x-dimension of
3902the thread, therefore the independent variables in a thread
3903have the values
3904\begin{center}
3905  \begin{tabular}{l}
3906    {\sf \rule{0.5cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
3907  \end{tabular}
3908\end{center} 
3909for {\sf i=0,1,2} where {\sf inx} corresponds to the vector {\sf devx}. The direction for
3910the derivatives is given by {\sf index1}.
3911\begin{center}
3912  \begin{tabular}{l}
3913     {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
3914  \end{tabular}
3915\end{center} 
3916The functions for setting and getting the value and the derivative value of an
3917{\sf adouble} are the same as in the traceless version of ADOL-C for first
3918order derivatives (see \autoref{tapeless}).
3919The function evaluation is then performed with {\sf adouble x} on GPU and the results
3920are saved in {\sf adouble y}. The function evaluation itself remains unchanged
3921(see \autoref{fig:cudacode3}). Then we store the values of the function
3922evaluations in each point in a row in one array:
3923\begin{center}
3924  \begin{tabular}{l}
3925   {\sf \rule{0.5cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
3926  \end{tabular}
3927\end{center} 
3928for {\sf i=0,1,2}. The values of a Jacobian are stored column by column in
3929a row:
3930\begin{center}
3931  \begin{tabular}{l}
3932   {\sf \rule{0.5cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
3933  \end{tabular}
3934\end{center} 
3935for {\sf i=0,1,2}.
3936
3937Now there is one thing left to do. The values calculated on the GPU have to be copied
3938back to host. For the dependent variables in the example this can be done by the
3939following call:
3940\begin{center}
3941  \begin{tabular}{l}
3942 {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\
3943  \end{tabular}
3944\end{center} 
3945see \autoref{fig:cudacode1}. The argument {\sf cudaMemcpyDeviceToHost} determines that
3946the values are copied from GPU to host.
3947
3948%
3949\subsection{Compiling and Linking the Source Code}
3950%
3951After incorporating the required changes, one has to compile the
3952source code and link the object files to get the executable. For
3953the compilation of a Cuda file the {\sf nvcc} compiler is needed.
3954The compile sequence should be similar to the following example:
3955\begin{center}
3956  \begin{tabular}{l}
3957    {\sf nvcc -arch=sm\_20 -o traceless\_cuda traceless\_cuda.cu}
3958  \end{tabular}
3959\end{center}
3960The compiler option \verb#-arch=sm_20# specifies the compute capability that is
3961assumed, in this case one that supports double precision.
3962
3963The mentioned source code {\sf traceless\_cuda.cu} illustrating the use of the
3964forward traceless scalar mode with Cuda and a further example {\sf liborgpu.cu}
3965can be found in the directory examples. The second example is an adaption of
3966the OpenMP example to the traceless version with Cuda.
3967
3968%
3969%++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3970\section{Installing and Using ADOL-C}
3971\label{install}
3972%
3973\subsection{Generating the ADOL-C Library}
3974\label{genlib}
3975%
3976The currently built system is best summarized by the ubiquitous gnu
3977install triplet
3978\begin{center}
3979\verb=configure - make - make install= .
3980\end{center}
3981Executing this three steps from the package base directory
3982\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3983ADOL-C library with default options and install the package (libraries
3984and headers) into the default installation directory {\tt
3985  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3986the subdirectory \verb=include= will contain all the installed header
3987files that may be included by the user program, the subdirectory
3988\verb=lib= will contain the 32-bit compiled library
3989and the subdirectory \verb=lib64= will contain the 64-bit compiled
3990library. Depending on the compiler only one of \verb=lib= or
3991\verb=lib64= may be created.
3992
3993Before doing so the user may modify the header file \verb=usrparms.h=
3994in order to tailor the \mbox{ADOL-C} package to the needs in the
3995particular system environment as discussed in
3996\autoref{Customizing}. The configure procedure which creates the necessary
3997\verb=Makefile=s can be customized by use of some switches. Available
3998options and their meaning can be obtained by executing
3999\verb=./configure --help= from the package base directory.
4000
4001All object files and other intermediately generated files can be
4002removed by the call \verb=make clean=. Uninstalling ADOL-C by
4003executing \verb=make uninstall= is only reasonable after a previous
4004called \verb=make install= and will remove all installed package files
4005but will leave the created directories behind.
4006
4007The sparse drivers are included in the ADOL-C libraries if the
4008\verb=./configure= command is executed with the option
4009\verb=--enable-sparse=. The ColPack library available at
4010\verb=http://cscapes.cs.purdue.edu/coloringpage/software.htm= is required to
4011compute the sparse structures, and is searched for in all the default
4012locations.
4013In case the library and its headers are installed in a nonstandard path
4014this may be specified with the \verb?--with-colpack=PATH? option.
4015It is assumed that the library and its header files have the following
4016directory structure: \verb?PATH/include? contains all the header
4017files,
4018\verb?PATH/lib? contains the 32-bit compiled library and
4019\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
4020the compiler used to compile {\sf ADOL-C} one of these libraries will
4021be used for linking.
4022 
4023The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default
4024constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see
4025\autoref{WarSug}
4026
4027Support for MPI and the AdjoinableMPI API (see\\
4028\verb=http://www.mcs.anl.gov/~utke/AdjoinableMPI/AdjoinableMPIDox/index.html=
4029) may be enabled using the option \verb=--enable-ampi=. This requires
4030the presence of MPI compiler wrappers \verb=mpicc= and \verb=mpicxx=
4031in the \verb=$PATH= and the AdjoinableMPI libraries in the standard
4032locations. If MPI is installed in a nonstandard path one may specify
4033this using the \verb?--with-mpi=PATH? option. Similarly if the
4034AdjoinableMPI libraries are in an nonstandard path this may be
4035specified using the \verb?--with-ampi=PATH? option.  When MPI and
4036AdjoinableMPI support is compiled into {\sf ADOL-C} the generated
4037library will be unsuitable for linking with non-MPI
4038programs and is called \verb=libadolc_ampi=. Therefore the user
4039programs should use the flag \verb=-ladolc_ampi= instead of
4040\verb=-ladolc= in this case. An advanced user may infact specify
4041whatever name the resulting {\sf ADOL-C} library should have using the
4042\verb?--with-soname=SONAME? option.
4043
4044
4045\subsection{Compiling and Linking the Example Programs}
4046%
4047The installation procedure described in \autoref{genlib} also
4048provides the \verb=Makefile=s  to compile the example programs in the
4049directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
4050additional examples in
4051\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
4052one has to execute the
4053\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
4054examples. Available options are:
4055\begin{center}
4056\begin{tabular}[t]{ll}
4057\verb=--enable-docexa=&build all examples discussed in this manual\\
4058&(compare \autoref{example})\\
4059\verb=--enable-addexa=&build all additional examples\\
4060&(See file \verb=README= in the various subdirectories)
4061\end{tabular}
4062\end{center}
4063
4064Just calling \verb=make= from the packages base directory generates
4065all configured examples and the library if necessary. Compiling from
4066subdirectory \verb=examples= or one of its subfolders is possible
4067too. At least one kind of the ADOL-C library (static or shared) must
4068have been built previously in that case. Hence, building the library
4069is always the first step.
4070
4071For Compiling the library and the documented examples on Windows using
4072Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
4073the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
4074\verb=<ADOL-C/examples/>= subdirectories.
4075%
4076\subsection{Description of Important Header Files}
4077\label{ssec:DesIH}
4078%
4079The application of the facilities of ADOL-C requires the user
4080source code (program or module) to include appropriate
4081header files where the desired data types and routines are
4082prototyped. The new hierarchy of header files enables the user
4083to take one of two possible ways to access the right interfaces.
4084The first and easy way is recommended to beginners: As indicated in
4085\autoref{globalHeaders} the provided {\em global} header file
4086\verb=<adolc/adolc.h>= can be included by any user code to support all
4087capabilities of ADOL-C depending on the particular programming language
4088of the source.   
4089
4090\begin{table}[h]
4091\center \small
4092\begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline
4093\verb=<adolc/adolc.h>= &
4094\begin{tabular*}{10.5cm}{cp{9.5cm}}
4095  \boldmath $\rightarrow$ \unboldmath
4096                 & global header file available for easy use of ADOL-C; \\
4097  $\bullet$      & includes all ADOL-C header files depending on
4098                   whether the users source is C++ or C code.
4099\end{tabular*}
4100\\ \hline
4101\verb=<adolc/usrparms.h>= &
4102\begin{tabular*}{10.5cm}{cp{9.5cm}}
4103  \boldmath $\rightarrow$ \unboldmath
4104                 & user customization of ADOL-C package (see
4105                   \autoref{Customizing}); \\
4106  $\bullet$      & after a change of
4107                   user options the ADOL-C library \verb=libadolc.*=
4108                   has to be rebuilt (see \autoref{genlib}); \\
4109  $\bullet$      & is included by all ADOL-C header files and thus by all user
4110                   programs.
4111\end{tabular*} \\ \hline
4112\end{tabular}
4113\caption{Global header files}
4114\label{globalHeaders}
4115\end{table} 
4116
4117The second way is meant for the more advanced ADOL-C user: Some source code
4118includes only those interfaces used by the particular application.
4119The respectively needed header files are indicated
4120throughout the manual.
4121Existing application determined dependences between the provided
4122ADOL-C routines are realized by automatic includes of headers in order
4123to maintain easy use. The header files important to the user are described
4124in the \autoref{importantHeaders1} and \autoref{importantHeaders2}.
4125
4126\begin{table}[h]
4127\center \small
4128\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
4129%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
4130\verb=<adolc/adouble.h>= &
4131\begin{tabular*}{10.5cm}{cp{9.5cm}}
4132  \boldmath $\rightarrow$ \unboldmath
4133                & provides the interface to the basic active
4134                  scalar data type of ADOL-C: {\sf class adouble}
4135                  (see \autoref{prepar});
4136%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
4137\end{tabular*}
4138\\ \hline
4139% \verb=<adolc/avector.h>= &
4140%\begin{tabular*}{10.5cm}{cp{9.5cm}}
4141%  \boldmath $\rightarrow$ \unboldmath
4142%                & provides the interface to the active vector
4143%                  and matrix data types of ADOL-C: {\sf class adoublev}
4144%                  and {\sf class adoublem}, respectively
4145%                   (see \autoref{arrays}); \\
4146%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
4147%\end{tabular*}
4148%\\ \hline
4149 \verb=<adolc/taputil.h>= &
4150\begin{tabular*}{10.5cm}{cp{9.5cm}}
4151  \boldmath $\rightarrow$ \unboldmath
4152                & provides functions to start/stop the tracing of
4153                  active sections (see \autoref{markingActive})
4154                  as well as utilities to obtain
4155                  tape statistics (see \autoref{examiningTape}); \\
4156  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
4157\end{tabular*}
4158\\ \hline
4159\end{tabular}
4160\caption{Important header files: tracing/taping}
4161\label{importantHeaders1}
4162\end{table} 
4163%
4164\begin{table}[h]
4165\center \small
4166\begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline
4167%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
4168\verb=<adolc/interfaces.h>= &
4169\begin{tabular*}{10.5cm}{cp{9.5cm}}
4170  \boldmath $\rightarrow$ \unboldmath
4171                & provides interfaces to the {\sf forward} and
4172                  {\sf reverse} routines as basic versions of derivative
4173                  evaluation (see \autoref{forw_rev}); \\
4174  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
4175  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
4176  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
4177\end{tabular*}
4178\\ \hline
4179\verb=<adolc/drivers.h>= &
4180\begin{tabular*}{10.5cm}{cp{9.5cm}}
4181  \boldmath $\rightarrow$ \unboldmath
4182                & provides ``easy to use'' drivers for solving
4183                  optimization problems and nonlinear equations
4184                  (see \autoref{optdrivers}); \\
4185  $\bullet$     & comprises C and Fortran-callable versions.
4186\end{tabular*}
4187\\ \hline
4188\begin{minipage}{3cm}
4189\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
4190\end{minipage}  &
4191\begin{tabular*}{10.5cm}{cp{9.5cm}}
4192  \boldmath $\rightarrow$ \unboldmath
4193                & provides the ``easy to use'' sparse drivers
4194                  to exploit the sparsity structure of
4195                  Jacobians (see \autoref{sparse}); \\
4196  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
4197                  of {\sf forward} and {\sf reverse} routines
4198                  propagating bit patterns (see \autoref{ProBit}); \\
4199
4200  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
4201\end{tabular*}
4202\\ \hline
4203\begin{minipage}{3cm}
4204\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
4205\end{minipage}  &
4206\begin{tabular*}{10.5cm}{cp{9.5cm}}
4207  \boldmath $\rightarrow$ \unboldmath
4208                & provides interfaces to the underlying C-callable
4209                  versions of {\sf forward} and {\sf reverse} routines
4210                  propagating bit patterns.
4211\end{tabular*}
4212\\ \hline
4213\begin{minipage}{3cm}
4214\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
4215\end{minipage}  &
4216\begin{tabular*}{10.5cm}{cp{9.5cm}}
4217  \boldmath $\rightarrow$ \unboldmath
4218                & provides ``easy to use'' drivers for numerical
4219                  solution of ordinary differential equations
4220                  (see \autoref{odedrivers}); \\
4221  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
4222  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
4223\end{tabular*}
4224\\ \hline
4225\begin{minipage}{3cm}
4226\verb=<adolc/drivers/=\newline\verb= taylor.h>=
4227\end{minipage}  &
4228\begin{tabular*}{10.5cm}{cp{9.5cm}}
4229  \boldmath $\rightarrow$ \unboldmath
4230                & provides ``easy to use'' drivers for evaluation
4231                  of higher order derivative tensors (see
4232                  \autoref{higherOrderDeriv}) and inverse/implicit function
4233                  differentiation (see \autoref{implicitInverse});\\
4234  $\bullet$     & comprises C++ and C-callable versions.
4235\end{tabular*}
4236\\ \hline
4237\verb=<adolc/adalloc.h>= &
4238\begin{tabular*}{10.5cm}{cp{9.5cm}}
4239  \boldmath $\rightarrow$ \unboldmath
4240                & provides C++ and C functions for allocation of
4241                  vectors, matrices and three dimensional arrays
4242                  of {\sf double}s.
4243\end{tabular*}
4244\\ \hline
4245\end{tabular}
4246\caption{Important header files: evaluation of derivatives}
4247\label{importantHeaders2}
4248\end{table} 
4249%
4250\subsection{Compiling and Linking C/C++ Programs}
4251%
4252To compile a C/C++ program or single module using ADOL-C
4253data types and routines one has to ensure that all necessary
4254header files according to \autoref{ssec:DesIH} are
4255included. All modules involving {\em active} data types as
4256{\sf adouble}
4257%, {\bf adoublev} and {\bf adoublem}
4258have to be compiled as C++. Modules that make use of a previously
4259generated tape to evaluate derivatives can either be programmed in ANSI-C
4260(while avoiding all C++ interfaces) or in C++. Depending
4261on the chosen programming language the header files provide
4262the right ADOL-C prototypes.
4263For linking the resulting object codes the library \verb=libadolc.*= 
4264must be used (see \autoref{genlib}).
4265%
4266\subsection{Adding Quadratures as Special Functions}
4267%
4268\label{quadrat}
4269%
4270Suppose an integral
4271\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
4272is evaluated numerically by a user-supplied function
4273\begin{center}
4274{\sf  double  myintegral(double\& x);}
4275\end{center}
4276Similarly, let us suppose that the integrand itself is evaluated by
4277a user-supplied block of C code {\sf integrand}, which computes a
4278variable with the fixed name {\sf val} from a variable with the fixed
4279name {\sf arg}. In many cases of interest, {\sf integrand} will
4280simply be of the form
4281\begin{center}
4282{\sf \{ val = expression(arg) \}}\enspace .
4283\end{center}
4284In general, the final assignment to {\sf val} may be preceded
4285by several intermediate calculations, possibly involving local
4286active variables of type {\sf adouble}, but no external or static
4287variables of that type.  However, {\sf integrand} may involve local
4288or global variables of type {\sf double} or {\sf int}, provided they
4289do not depend on the value of {\sf arg}. The variables {\sf arg} and
4290{\sf val} are declared automatically; and as {\sf integrand} is a block
4291rather than a function, {\sf integrand} should have no header line. 
4292
4293Now the function {\sf myintegral} can be overloaded for {\sf adouble}
4294arguments and thus included in the library of elementary functions
4295by the following modifications:
4296\begin{enumerate}
4297\item
4298At the end of the file \verb=<adouble.cpp>=, include the full code
4299defining \\ {\sf double myintegral(double\& x)}, and add the line
4300\begin{center}
4301{\sf extend\_quad(myintegral, integrand); }
4302\end{center}
4303This macro is extended to the definition of
4304 {\sf adouble myintegral(adouble\& arg)}.
4305Then remake the library \verb=libadolc.*= (see \autoref{genlib}).
4306\item
4307In the definition of the class
4308{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
4309\begin{center}
4310{\sf friend adouble myintegral(adouble\&)}.
4311\end{center}
4312\end{enumerate}
4313In the first modification, {\sf myintegral} represents the name of the
4314{\sf double} function, whereas {\sf integrand} represents the actual block
4315of C code.
4316
4317For example, in case of the inverse hyperbolic cosine, we have
4318{\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as
4319{\sf \{ val = sqrt(arg*arg-1); \}}
4320so that the line
4321\begin{center}
4322{\sf extend\_quad(acosh,val = sqrt(arg*arg-1));}
4323\end{center}
4324can be added to the file \verb=<adouble.cpp>=.
4325A mathematically equivalent but longer representation of
4326{\sf integrand} is
4327\begin{center}
4328\begin{tabbing}
4329{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
4330 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
4331 \> \ \> {\sf  temp = temp*temp; } \\ 
4332 \> \ \> {\sf  val = sqrt(temp-1); \}
4333\end{tabbing}
4334\end{center}
4335The code block {\sf integrand} may call on any elementary function that has already
4336been defined in file \verb=<adouble.cpp>=, so that one may also introduce
4337iterated integrals.
4338%
4339%
4340\section{Example Codes}
4341\label{example}
4342%
4343The following listings are all simplified versions of codes that
4344are contained in the example subdirectory
4345\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
4346we have left out timings, which are included in the complete codes.
4347%
4348\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
4349%
4350The first example evaluates the gradient and the Hessian of
4351the function
4352\[ 
4353y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
4354\] 
4355using the appropriate drivers {\sf gradient} and {\sf hessian}.
4356
4357\begin{verbatim}
4358#include <adolc/adouble.h>               // use of active doubles and taping
4359#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
4360                                   // gradient(.) and hessian(.)
4361#include <adolc/taping.h>                // use of taping
4362...
4363void main() {
4364int n,i,j;
4365size_t tape_stats[STAT_SIZE];
4366cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
4367cout << "number of independent variables = ?  \n";
4368cin >> n;
4369double* xp = new double[n];         
4370double  yp = 0.0;
4371adouble* x = new adouble[n];     
4372adouble  y = 1;
4373for(i=0;i<n;i++)
4374  xp[i] = (i+1.0)/(2.0+i);         // some initialization
4375trace_on(1);                       // tag =1, keep=0 by default
4376  for(i=0;i<n;i++) {
4377    x[i] <<= xp[i]; y *= x[i]; }     
4378  y >>= yp;
4379  delete[] x;                     
4380trace_off();
4381tapestats(1,tape_stats);           // reading of tape statistics
4382cout<<"maxlive "<<tape_stats[2]<<"\n";
4383...                                // ..... print other tape stats
4384double* g = new double[n];       
4385gradient(1,n,xp,g);                // gradient evaluation
4386double** H=(double**)malloc(n*sizeof(double*));
4387for(i=0;i<n;i++)
4388  H[i]=(double*)malloc((i+1)*sizeof(double));
4389hessian(1,n,xp,H);                 // H equals (n-1)g since g is
4390double errg = 0;                   // homogeneous of degree n-1.
4391double errh = 0;
4392for(i=0;i<n;i++)
4393  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
4394for(i=0;i<n;i++) {
4395  for(j=0;j<n;j++) {
4396    if (i>j)                       // lower half of hessian
4397      errh += fabs(H[i][j]-g[i]/xp[j]); } }
4398cout << yp-1/(1.0+n) << " error in function \n";
4399cout << errg <<" error in gradient \n";
4400cout << errh <<" consistency check \n";
4401}                                  // end main
4402\end{verbatim}
4403%
4404\subsection{Power Example ({\tt powexam.cpp})}
4405%
4406The second example function evaluates the $n$-th power of a real
4407variable $x$ in
4408$\log_2 n$ multiplications by recursive halving of the exponent. Since
4409there is only one independent variable, the scalar derivative can be
4410computed by
4411using both {\sf forward} and {\sf reverse}, and the
4412results are subsequently compared.
4413\begin{verbatim}
4414#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces
4415
4416adouble power(adouble x, int n) {
4417adouble z=1;
4418if (n>0) {                         // recursion and branches
4419  int nh =n/2;                     // that do not depend on
4420  z = power(x,nh);                 // adoubles are fine !!!!
4421  z *= z;
4422  if (2*nh != n) 
4423    z *= x;
4424  return z; }                      // end if
4425else {
4426  if (n==0)                        // the local adouble z dies
4427    return z;                      // as it goes out of scope.
4428  else
4429    return 1/power(x,-n); }        // end else
4430} // end power
4431\end{verbatim}
4432The function {\sf power} above was obtained from the original
4433undifferentiated version by simply changing the type of all
4434{\sf double}s including the return variable to {\sf adouble}s. The new version
4435can now be called from within any active section, as in the following
4436main program.
4437\begin{verbatim}
4438#include ...                       // as above
4439int main() {
4440int i,n,tag=1;
4441cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
4442cout<<"monomial degree=? \n";      // input the desired degree
4443cin >> n;
4444                                   // allocations and initializations
4445double* Y[1];
4446*Y = new double[n+2];
4447double* X[1];                      // allocate passive variables with
4448*X = new double[n+4];              // extra dimension for derivatives
4449X[0][0] = 0.5;                     // function value = 0. coefficient
4450X[0][1] = 1.0;                     // first derivative = 1. coefficient
4451for(i=0;i<n+2;i++)
4452  X[0][i+2]=0;                     // further coefficients
4453double* Z[1];                      // used for checking consistency
4454*Z = new double[n+2];              // between forward and reverse
4455adouble y,x;                       // declare active variables
4456                                   // beginning of active section
4457trace_on(1);                       // tag = 1 and keep = 0
4458x <<= X[0][0];                     // only one independent var
4459y = power(x,n);                    // actual function call
4460y >>= Y[0][0];                     // only one dependent adouble
4461trace_off();                       // no global adouble has died
4462                                   // end of active section
4463double u[1];                       // weighting vector
4464u[0]=1;                            // for reverse call
4465for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
4466  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-the derivative
4467  if (i==0)
4468    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value() 
4469    << " (should be 0)\n";
4470  else
4471    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i] 
4472    << " (should be 0)\n";
4473  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
4474  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylorcoeff.
4475return 1;
4476}                                  // end main
4477\end{verbatim}
4478Since this example has only one independent and one dependent variable,
4479{\sf forward} and {\sf reverse} have the same complexity and calculate
4480the same scalar derivatives, albeit with a slightly different scaling.
4481By replacing the function {\sf power} with any other univariate test function,
4482one can check that {\sf forward} and {\sf reverse} are at least consistent.
4483In the following example the number of independents is much larger
4484than the number of dependents, which makes the reverse mode preferable.
4485%
4486\subsection{Determinant Example ({\tt detexam.cpp})}
4487%
4488Now let us consider an exponentially expensive calculation,
4489namely, the evaluation of a determinant by recursive expansion
4490along rows. The gradient of the determinant with respect to the
4491matrix elements is simply the adjoint, i.e. the matrix of cofactors.
4492Hence the correctness of the numerical result is easily checked by
4493matrix-vector multiplication. The example illustrates the use
4494of {\sf adouble} arrays and pointers. 
4495
4496\begin{verbatim}
4497#include <adolc/adouble.h>               // use of active doubles and taping
4498#include <adolc/interfaces.h>            // use of basic forward/reverse
4499                                   // interfaces of ADOL-C
4500adouble** A;                       // A is an n x n matrix
4501int i,n;                           // k <= n is the order
4502adouble det(int k, int m) {        // of the sub-matrix
4503if (m == 0) return 1.0 ;           // its column indices
4504else {                             // are encoded in m
4505  adouble* pt = A[k-1];
4506  adouble t = zero;                // zero is predefined
4507  int s, p =1;
4508  if (k%2) s = 1; else s = -1;
4509  for(i=0;i<n;i++) {
4510    int p1 = 2*p;
4511    if (m%p1 >= p) {
4512      if (m == p) {
4513        if (s>0) t += *pt; else t -= *pt; }
4514      else {
4515        if (s>0)
4516          t += *pt*det(k-1,m-p);   // recursive call to det
4517        else
4518          t -= *pt*det(k-1,m-p); } // recursive call to det
4519      s = -s;}
4520    ++pt;
4521    p = p1;}
4522  return t; }
4523}                                  // end det
4524\end{verbatim}
4525As one can see, the overloading mechanism has no problem with pointers
4526and looks exactly the same as the original undifferentiated function
4527except for the change of type from {\sf double} to {\sf adouble}.
4528If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
4529a compile time error would have resulted. Now consider a corresponding
4530calling program.
4531
4532\begin{verbatim}
4533#include ...                       // as above
4534int main() {
4535int i,j, m=1,tag=1,keep=1;
4536cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
4537cout << "order of matrix = ? \n";  // select matrix size
4538cin >> n;
4539A = new adouble*[n];             
4540trace_on(tag,keep);                // tag=1=keep
4541  double detout=0.0, diag = 1.0;   // here keep the intermediates for
4542  for(i=0;i<n;i++) {               // the subsequent call to reverse
4543    m *=2;
4544    A[i] = new adouble[n];         // not needed for adoublem
4545    adouble* pt = A[i];
4546    for(j=0;j<n;j++)
4547      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
4548    diag += A[i][i].value();        // value() converts to double
4549    A[i][i] += 1.0; }
4550  det(n,m-1) >>= detout;           // actual function call
4551  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
4552trace_off();
4553double u[1];
4554u[0] = 1.0;
4555double* B = new double[n*n];
4556reverse(tag,1,n*n,1,u,B);
4557cout <<" \n first base? : ";
4558for (i=0;i<n;i++) {
4559  adouble sum = 0;
4560  for (j=0;j<n;j++)                // the matrix A times the first n
4561    sum += A[i][j]*B[j];           // components of the gradient B
4562  cout<<sum.value()<<" "; }         // must be a Cartesian basis vector
4563return 1;
4564}                                  // end main
4565\end{verbatim}
4566The variable {\sf diag} should be mathematically
4567equal to the determinant, because the
4568matrix {\sf A} is defined as a rank 1 perturbation of the identity.
4569%
4570\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
4571\label{exam:ode}
4572%
4573Here, we consider a nonlinear ordinary differential equation that
4574is a slight modification of the Robertson test problem
4575given in Hairer and Wanner's book on the numerical solution of
4576ODEs \cite{HW}. The following source code computes the corresponding
4577values of $y^{\prime} \in \R^3$:
4578\begin{verbatim}
4579#include <adolc/adouble.h>                  // use of active doubles and taping
4580#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
4581#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities
4582
4583void tracerhs(short int tag, double* py, double* pyprime) {
4584adouble y[3];                         // this time we left the parameters
4585adouble yprime[3];                    // passive and use the vector types
4586trace_on(tag);
4587for (int i=0; i<3; i++)
4588     y[i] <<= py[i];                  // initialize and mark independents
4589yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
4590yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
4591yprime[2] = -yprime[0] - yprime[1];
4592yprime >>= pyprime;                   // mark and pass dependents
4593trace_off(tag);
4594}                                     // end tracerhs
4595\end{verbatim}
4596The Jacobian of the right-hand side has large
4597negative eigenvalues, which make the ODE quite stiff. We  have added
4598some numerically benign transcendentals to make the differentiation
4599more interesting.
4600The following main program uses {\sf forode} to calculate the Taylor series
4601defined by the ODE at the given point $y_0$ and {\sf reverse} as well
4602as {\sf accode} to compute the Jacobians of the coefficient vectors
4603with respect to $x_0$.
4604\begin{verbatim}
4605#include .......                   // as above
4606int main() {
4607int i,j,deg; 
4608int n=3;
4609double py[3];
4610double pyp[3];
4611cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
4612cout << "degree of Taylor series =?\n";
4613cin >> deg;
4614double **X;
4615X=(double**)malloc(n*sizeof(double*));
4616for(i=0;i<n;i++)
4617  X[i]=(double*)malloc((deg+1)*sizeof(double));
4618double*** Z=new double**[n];
4619double*** B=new double**[n];
4620short** nz = new short*[n];
4621for(i=0;i<n;i++) {
4622  Z[i]=new double*[n];
4623  B[i]=new double*[n];
4624  for(j=0;j<n;j++) {
4625    Z[i][j]=new double[deg];
4626    B[i][j]=new double[deg]; }     // end for
4627}                                  // end for
4628for(i=0;i<n;i++) {
4629  py[i] = (i == 0) ? 1.0 : 0.0;    // initialize the base point
4630  X[i][0] = py[i];                 // and the Taylor coefficient;
4631  nz[i] = new short[n]; }          // set up sparsity array
4632tracerhs(1,py,pyp);                // trace RHS with tag = 1
4633forode(1,n,deg,X);                 // compute deg coefficients
4634reverse(1,n,n,deg-1,Z,nz);         // U defaults to the identity
4635accode(n,deg-1,Z,B,nz);
4636cout << "nonzero pattern:\n";
4637for(i=0;i<n;i++) {
4638  for(j=0;j<n;j++)
4639    cout << nz[i][j]<<"\t";
4640  cout <<"\n"; }                   // end for
4641return 1;
4642}                                  // end main
4643\end{verbatim}
4644\noindent The pattern {\sf nz} returned by {\sf accode} is
4645\begin{verbatim}
4646              3  -1   4
4647              1   2   2
4648              3   2   4
4649\end{verbatim}
4650The original pattern {\sf nz} returned by {\sf reverse} is the same
4651except that the negative entry $-1$ was zero.
4652%
4653%\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})}
4654%\label{gaussexam}
4655%
4656%The following example uses conditional assignments to show the usage of a once produced tape
4657%for evaluation at new arguments. The elimination is performed with
4658%column pivoting.
4659%\begin{verbatim}
4660%#include <adolc/adolc.h>           // use of ALL ADOL-C interfaces
4661
4662%void gausselim(int n, adoublem& A, adoublev& bv) {
4663%along i;                           // active integer declaration
4664%adoublev temp(n);                  // active vector declaration
4665%adouble r,rj,temps;
4666%int j,k;
4667%for(k=0;k<n;k++) {                 // elimination loop
4668%  i = k;
4669%  r = fabs(A[k][k]);               // initial pivot size
4670%  for(j=k+1;j<n;j++) {
4671%    rj = fabs(A[j][k]);             
4672%    condassign(i,rj-r,j);          // look for a larger element in the same
4673%    condassign(r,rj-r,rj); }       // column with conditional assignments
4674%  temp = A[i];                     // switch rows using active subscripting
4675%  A[i] = A[k];                     // necessary even if i happens to equal
4676%  A[k] = temp;                     // k during taping
4677%  temps = bv[i];
4678%  bv[i]=bv[k];
4679%  bv[k]=temps;
4680%  if (!value(A[k][k]))             // passive subscripting
4681%    exit(1);                       // matrix singular!
4682%  temps= A[k][k];
4683%  A[k] /= temps;
4684%  bv[k] /= temps;
4685%  for(j=k+1;j<n;j++) {
4686%    temps= A[j][k];
4687%    A[j] -= temps*A[k];            // vector operations
4688%    bv[j] -= temps*bv[k]; }        // endfor
4689%}                                  // end elimination loop
4690%temp=0.0;
4691%for(k=n-1;k>=0;k--)                // backsubstitution
4692%  temp[k] = (bv[k]-(A[k]*temp))/A[k][k];
4693%bv=temp;
4694%}                                  // end gausselim
4695%\end{verbatim}
4696%\noindent This function can be called from any program
4697%that suitably initializes
4698%the components of {\sf A} and {\sf bv}
4699%as independents. The resulting tape can be
4700%used to solve any nonsingular linear system of the same size and
4701%to get the sensitivities of the solution with respect to the
4702%system matrix and the right hand side.
4703%\vspace*{-4mm}
4704%
4705\section*{Acknowledgements}
4706%
4707Parts of the ADOL-C source were developed by Andreas
4708Kowarz, Hristo Mitev, Sebastian Schlenkrich,  and Olaf
4709Vogel. We are also indebted to George Corliss,
4710Tom Epperly, Bruce Christianson, David Gay,  David Juedes,
4711Brad Karp, Koichi Kubota, Bob Olson,  Marcela Rosemblun, Dima
4712Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in
4713various ways with the development and documentation of ADOL-C.
4714%
4715\begin{thebibliography}{10}
4716
4717\bibitem{BeKh96}
4718Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle.
4719\newblock {\em Efficient computation of gradients and Jacobians by dynamic
4720  exploitation of sparsity in automatic differentiation}.
4721\newblock Optimization Methods and Software 7(1):1-39, 1996.
4722
4723\bibitem{Chri91a}
4724Bruce Christianson.
4725\newblock {\em Reverse accumulation and accurate rounding error estimates for
4726Taylor series}.
4727\newblock  Optimization Methods and Software 1:81--94, 1992.
4728
4729\bibitem{GeMaPo05}
4730Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen.
4731\newblock {\em What color is your {J}acobian? {G}raph coloring for computing
4732  derivatives}.
4733\newblock SIAM Review 47(4):629--705, 2005.
4734
4735\bibitem{GePoTaWa06}
4736Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther.
4737{\em Efficient Computation of Sparse Hessians: An Experimental Study
4738  using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing.
4739
4740\bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea
4741  Walther.
4742{\em Exploiting  Sparsity  in Jacobian Computation via Coloring and Automatic Differentiation:
4743a Case Study in a Simulated Moving Bed Process}.
4744In Chr. Bischof et al., eds.,  {\em Proceedings AD 2008 conference}, LNCSE 64, pp. 327 -- 338, Springer (2008).
4745
4746\bibitem{GeTaMaPo07}
4747Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen,
4748{\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}.
4749SIAM Journal on Scientific Computing 29(3):1042--1072, 2007.
4750
4751\bibitem{Griewank13}
4752Andreas Griewank,
4753{\em On stable piecewise linearization and generalized algorithmic differentiation}.
4754Optimization Methods and Software 28(6):1139--1178, 2013.
4755
4756\bibitem{GrWa08}
4757Andreas Griewank and Andrea Walther: {\em Evaluating Derivatives, Principles and Techniques of
4758  Algorithmic Differentiation. Second edition}. SIAM, 2008.
4759
4760
4761\bibitem{Griewank97}
4762Andreas Griewank, Jean Utke, and Andrea Walther.
4763\newblock {\em Evaluating higher derivative tensors by forward propagation
4764          of univariate Taylor series}.
4765\newblock Mathematics of Computation, 69:1117--1130, 2000.
4766
4767\bibitem{GrWa00}
4768Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse
4769                 or Adjoint Mode of Computational Differentiation},
4770                 ACM Transaction on Mathematical Software 26:19--45, 2000.
4771
4772\bibitem{HW}
4773    Ernst Hairer and Gerhard Wanner.
4774    {\it Solving Ordinary Differential Equations II.\/}
4775    Springer-Verlag, Berlin, 1991.
4776
4777\bibitem{Knuth73}
4778Donald~E. Knuth.
4779\newblock {\em The Art of Computer Programming. Second edition.}
4780\newblock Addison-Wesley, Reading, 1973.
4781
4782\bibitem{Roeb05}
4783Klaus R\"{o}benack.
4784\newblock{\em Computation of Lie Derivatives of Tensor Fields Required for
4785         Nonlinear Controller and Observer Design Employing Automatic Differentiation. }
4786\newblock PAMM, 5(1): 181-184, 2005.
4787
4788\bibitem{Roeb11}
4789Klaus R\"{o}benack, Jan Winkler and Siqian Wang.
4790\newblock {\em LIEDRIVERS - A Toolbox for the Efficient Computation of Lie
4791          Derivatives Based on the Object-Oriented Algorithmic Differentiation
4792          Package ADOL-C. }
4793\newblock Proc. of the 4th International Workshop on Equation-Based Object-Oriented
4794          Modeling Languages and Tools, F. E. Cellier and D. Broman and P. Fritzson
4795          and E. A. Lee, editors, volume 56 of Link\"oping Electronic Conference Proceedings,
4796          pages 57-66, Zurich 2011.
4797\bibitem{Wa05a}
4798Andrea Walther.
4799\newblock {\em Computing Sparse Hessians with Automatic Differentiation}.
4800\newblock Transaction on Mathematical Software, 34(1), Artikel 3 (2008).
4801\end{thebibliography}
4802\end{document}
4803
Note: See TracBrowser for help on using the repository browser.