1 | % Latex file containing the documentation of ADOL-C |
---|
2 | % |
---|
3 | % Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz, |
---|
4 | % Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel |
---|
5 | % |
---|
6 | % This file is part of ADOL-C. This software is provided as open source. |
---|
7 | % Any use, reproduction, or distribution of the software constitutes |
---|
8 | % recipient's acceptance of the terms of the accompanying license file. |
---|
9 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
10 | |
---|
11 | \documentclass[11pt,twoside]{article} |
---|
12 | \usepackage{hyperref} |
---|
13 | \usepackage{amsmath,amsthm,amssymb} |
---|
14 | \usepackage{graphicx} |
---|
15 | \usepackage{datetime} |
---|
16 | |
---|
17 | \newdateformat{monthyear}{\monthname\ \THEYEAR} |
---|
18 | |
---|
19 | \usepackage{color} |
---|
20 | |
---|
21 | \pagestyle{headings} |
---|
22 | \bibliographystyle{plain} |
---|
23 | \parskip=6pt |
---|
24 | |
---|
25 | \setlength{\textwidth}{433.6pt} |
---|
26 | \setlength{\oddsidemargin}{23pt} |
---|
27 | \setlength{\evensidemargin}{23pt} |
---|
28 | \setlength{\topmargin}{25.0pt} |
---|
29 | \setlength{\textheight}{580pt} |
---|
30 | \setlength{\baselineskip}{8pt} |
---|
31 | |
---|
32 | \newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }} |
---|
33 | \newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }} |
---|
34 | \newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }} |
---|
35 | |
---|
36 | \renewcommand{\sectionautorefname}{Section} |
---|
37 | \renewcommand{\subsectionautorefname}{Section} |
---|
38 | \renewcommand{\figureautorefname}{Figure} |
---|
39 | \renewcommand{\tableautorefname}{Table} |
---|
40 | |
---|
41 | \setcounter{equation}{0} |
---|
42 | |
---|
43 | \input{version.tex} |
---|
44 | |
---|
45 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
46 | \begin{document} |
---|
47 | |
---|
48 | \begin{titlepage} |
---|
49 | \begin{center} |
---|
50 | {\Large {\bf ADOL-C:}} |
---|
51 | \footnote{The development of earlier versions was supported by the Office of |
---|
52 | Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche |
---|
53 | Forschungsgemeinschaft. During the development of the current |
---|
54 | version Andrea Walther and Andreas Kowarz were supported by the |
---|
55 | grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} |
---|
56 | \vspace{0.2in} \\ |
---|
57 | % |
---|
58 | {\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\ |
---|
59 | {\Large of Algorithms Written in C/C++}\\ |
---|
60 | \vspace{.2in} |
---|
61 | {\large\bf Version \packageversion, \monthyear\today} \\ |
---|
62 | \bigskip |
---|
63 | \mbox{Andrea Walther}\footnote{Institute of Mathematics, University |
---|
64 | of Paderborn, 33098 Paderborn, Germany} and |
---|
65 | \mbox{Andreas Griewank}\footnote{Department of Mathematics, |
---|
66 | Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany} |
---|
67 | \end{center} |
---|
68 | % |
---|
69 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
70 | \begin{abstract} |
---|
71 | The C++ package ADOL-C described here facilitates the evaluation of |
---|
72 | first and higher derivatives of vector functions that are defined |
---|
73 | by computer programs written in C or C++. The resulting derivative |
---|
74 | evaluation routines may be called from C, C++, Fortran, or any other |
---|
75 | language that can be linked with C. |
---|
76 | |
---|
77 | The numerical values of derivative vectors are obtained free |
---|
78 | of truncation errors at a small multiple of the run time and |
---|
79 | random access memory required by the given function evaluation program. |
---|
80 | Derivative matrices are obtained by columns, by rows or in sparse format. |
---|
81 | For solution curves defined by ordinary differential equations, |
---|
82 | special routines are provided that evaluate the Taylor coefficient vectors |
---|
83 | and their Jacobians with respect to the current state vector. |
---|
84 | For explicitly or implicitly defined functions derivative tensors are |
---|
85 | obtained with a complexity that grows only quadratically in their |
---|
86 | degree. The derivative calculations involve a possibly substantial but |
---|
87 | always predictable amount of data. Since the data is accessed strictly sequentially |
---|
88 | it can be automatically paged out to external files. |
---|
89 | \end{abstract} |
---|
90 | % |
---|
91 | {\bf Keywords}: Computational Differentiation, Automatic |
---|
92 | Differentiation, |
---|
93 | Chain Rule, Overloading, Taylor Coefficients, |
---|
94 | Gradients, Hessians, Forward Mode, Reverse Mode, |
---|
95 | Implicit Function Differentiation, Inverse Function Differentiation |
---|
96 | \medskip |
---|
97 | |
---|
98 | \noindent |
---|
99 | {\bf Abbreviated title}: Automatic differentiation by overloading in C++ |
---|
100 | % |
---|
101 | \end{titlepage} |
---|
102 | % |
---|
103 | \tableofcontents |
---|
104 | % |
---|
105 | |
---|
106 | \newpage |
---|
107 | % |
---|
108 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
109 | % |
---|
110 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
111 | \section{Preparing a Section of C or C++ Code for Differentiation} |
---|
112 | \label{prepar} |
---|
113 | % |
---|
114 | \subsection{Introduction} |
---|
115 | % |
---|
116 | \setcounter{equation}{0} |
---|
117 | The package \mbox{ADOL-C} |
---|
118 | utilizes overloading in C++, but the |
---|
119 | user has to know only C. The acronym stands for {\bf A}utomatic |
---|
120 | {\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++. |
---|
121 | In contrast to source transformation approaches, overloading does not generate intermediate |
---|
122 | source code. |
---|
123 | As starting points to retrieve further information on techniques and |
---|
124 | application of automatic differentiation, as well as on other AD |
---|
125 | tools, we refer to the book \cite{GrWa08}. Furthermore, the web page |
---|
126 | \verb=http://www.autodiff.org= of the AD community forms a rich source |
---|
127 | of further information and pointers. |
---|
128 | |
---|
129 | |
---|
130 | ADOL-C facilitates the simultaneous |
---|
131 | evaluation of arbitrarily high directional derivatives and the |
---|
132 | gradients of these Taylor coefficients with respect to all independent |
---|
133 | variables. Relative to the cost of evaluating the underlying function, |
---|
134 | the cost for evaluating any such scalar-vector pair grows as the |
---|
135 | square of the degree of the derivative but is still completely |
---|
136 | independent of the numbers $m$ and $n$. |
---|
137 | |
---|
138 | This manual is organized as follows. This section explains the |
---|
139 | modifications required to convert undifferentiated code to code that |
---|
140 | compiles with ADOL-C. |
---|
141 | \autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to |
---|
142 | evaluate arbitrarily high order derivatives. The discussion includes storage |
---|
143 | requirements and the tailoring of certain tape characteristics to fit specific |
---|
144 | user needs. Descriptions of easy-to-use drivers for a convenient derivative |
---|
145 | evaluation are contained in \autoref{drivers}. |
---|
146 | \autoref{forw_rev_ad} offers a more mathematical characterization of |
---|
147 | the different modes of AD to compute derivatives. At the same time, the |
---|
148 | corresponding drivers of ADOL-C are explained. |
---|
149 | The overloaded derivative evaluation routines using the forward and the reverse |
---|
150 | mode of AD are explained in \autoref{forw_rev}. |
---|
151 | Advanced differentiation techniques as the optimal checkpointing for |
---|
152 | time integrations, the exploitation of fixed point iterations, the usages |
---|
153 | of external differentiated functions and the differentiation of OpenMP |
---|
154 | parallel programs are described in \autoref{adv_ad}. |
---|
155 | The tapeless forward mode is presented in \autoref{tapeless}. |
---|
156 | \autoref{install} details the installation and |
---|
157 | use of the ADOL-C package. Finally, \autoref{example} |
---|
158 | furnishes some example programs that incorporate the ADOL-C package to |
---|
159 | evaluate first and higher-order |
---|
160 | derivatives. These and other examples are distributed with the ADOL-C |
---|
161 | source code. |
---|
162 | The user should simply refer to them if the more abstract and general |
---|
163 | descriptions of ADOL-C provided in this document do not suffice. |
---|
164 | % |
---|
165 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
166 | \subsection{Declaring Active Variables} |
---|
167 | % |
---|
168 | \label{DecActVar} |
---|
169 | % |
---|
170 | The key ingredient of automatic differentiation by overloading is the |
---|
171 | concept of an {\em active variable}. All variables that may be |
---|
172 | considered as differentiable quantities at some time |
---|
173 | during the program execution must be of an active |
---|
174 | type. ADOL-C uses one |
---|
175 | active scalar type, called {\sf adouble}, whose real part is of the |
---|
176 | standard type {\sf double}. |
---|
177 | Typically, one will declare the independent variables |
---|
178 | and all quantities that directly or indirectly depend on them as |
---|
179 | {\em active}. Other variables that do not depend on the independent |
---|
180 | variables but enter, for example, as parameters, may remain one of the |
---|
181 | {\em passive} types {\sf double, float}, or {\sf int}. There is no |
---|
182 | implicit type conversion from {\sf adouble} to any of these passive |
---|
183 | types; thus, {\bf failure to declare variables as active when they |
---|
184 | depend on other active variables will result in a compile-time error |
---|
185 | message}. In data flow terminology, the set of active variable names |
---|
186 | must contain all its successors in the dependency graph. All components |
---|
187 | of indexed arrays must have the same activity status. |
---|
188 | |
---|
189 | The real component of an {\sf adouble x} can be extracted as |
---|
190 | {\sf x.value()}. In particular, |
---|
191 | such explicit conversions are needed for the standard output procedure |
---|
192 | {\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such |
---|
193 | that first the real part of an {\sf adouble} and then the string |
---|
194 | ``{\sf (a)}" is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath can |
---|
195 | be used to assign a constant value to an {\sf adouble}. |
---|
196 | Naturally, {\sf adouble}s may be |
---|
197 | components of vectors, matrices, and other arrays, as well as |
---|
198 | members of structures or classes. |
---|
199 | |
---|
200 | The C++ class {\sf adouble}, its member functions, and the overloaded |
---|
201 | versions of all arithmetic operations, comparison operators, and |
---|
202 | most ANSI C functions are contained in the file \verb=adouble.cpp= and its |
---|
203 | header \verb=<adolc/adouble.h>=. The latter must be included for compilation |
---|
204 | of all program files containing {\sf adouble}s and corresponding |
---|
205 | operations. |
---|
206 | % |
---|
207 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
208 | \subsection{Marking Active Sections} |
---|
209 | \label{markingActive} |
---|
210 | % |
---|
211 | All calculations involving active variables that occur between |
---|
212 | the void function calls |
---|
213 | \begin{center} |
---|
214 | {\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in} |
---|
215 | {\sf trace\_off(file)} |
---|
216 | \end{center} |
---|
217 | are recorded on a sequential data set called {\em tape}. Pairs of |
---|
218 | these function calls can appear anywhere in a C++ program, but |
---|
219 | they must not overlap. The nonnegative integer argument {\sf tag} identifies the |
---|
220 | particular tape for subsequent function or derivative evaluations. |
---|
221 | Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout. |
---|
222 | The optional integer arguments {\sf keep} and |
---|
223 | {\sf file} will be discussed in \autoref{tape}. We will refer to the |
---|
224 | sequence of statements executed between a particular call to |
---|
225 | {\sf trace\_on} and the following call to {\sf trace\_off} as an |
---|
226 | {\em active section} of the code. The same active section may be |
---|
227 | entered repeatedly, and one can successively generate several traces |
---|
228 | on distinct tapes by changing the value of {\sf tag}. |
---|
229 | Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in |
---|
230 | the header file \verb=<adolc/taputil.h>=, which is included by the header |
---|
231 | \verb=<adolc/adouble.h>= automatically. |
---|
232 | |
---|
233 | Active sections may contain nested or even recursive calls to functions |
---|
234 | provided by the user. Naturally, their formal and actual parameters |
---|
235 | must have matching types. In particular, the functions must be |
---|
236 | compiled with their active variables declared as |
---|
237 | {\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. |
---|
238 | Variables of type {\sf adouble} may be declared outside an active section and need not |
---|
239 | go out of scope before the end of an active section. |
---|
240 | It is not necessary -- though desirable -- that free-store {\sf adouble}s |
---|
241 | allocated within |
---|
242 | an active section be deleted before its completion. The values of all |
---|
243 | {\sf adouble}s that exist at the beginning and end of an active section |
---|
244 | are automatically |
---|
245 | recorded by {\sf trace\_on} and {\sf trace\_off}, respectively. |
---|
246 | % |
---|
247 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
248 | \subsection{Selecting Independent and Dependent Variables} |
---|
249 | % |
---|
250 | One or more active variables that are read in or initialized to |
---|
251 | the values of constants or passive variables must be distinguished as |
---|
252 | independent variables. Other active variables that are similarly |
---|
253 | initialized may be considered as temporaries (e.g., a variable that |
---|
254 | accumulates the partial sums of a scalar product after being |
---|
255 | initialized to zero). In order to distinguish an active variable {\sf x} as |
---|
256 | independent, ADOL-C requires an assignment of the form |
---|
257 | \begin{center} |
---|
258 | {\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$ |
---|
259 | \end{center} |
---|
260 | This special initialization ensures that {\sf x.value()} = {\sf px}, and it should |
---|
261 | precede any other assignment to {\sf x}. However, {\sf x} may be reassigned |
---|
262 | other values subsequently. Similarly, one or more active variables {\sf y} |
---|
263 | must be distinguished as dependent by an assignment of the form |
---|
264 | \begin{center} |
---|
265 | {\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any passive type $\enspace ,$ |
---|
266 | \end{center} |
---|
267 | which ensures that {\sf py} = {\sf y.value()} and should not be succeeded |
---|
268 | by any other assignment to {\sf y}. However, a dependent variable {\sf y} |
---|
269 | may have been assigned other real values previously, and it could even be an |
---|
270 | independent variable as well. The derivative values calculated after |
---|
271 | the |
---|
272 | completion of an active section always represent {\bf derivatives of the final |
---|
273 | values of the dependent variables with respect to the initial values of the |
---|
274 | independent variables}. |
---|
275 | |
---|
276 | The order in which the independent and dependent variables are marked |
---|
277 | by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent |
---|
278 | derivative evaluations. However, these variables do not have to be |
---|
279 | combined into contiguous vectors. ADOL-C counts the number of |
---|
280 | independent and dependent variable specifications within each active |
---|
281 | section and records them in the header of the tape. |
---|
282 | % |
---|
283 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
284 | \subsection{A Subprogram as an Active Section} |
---|
285 | % |
---|
286 | As a generic example let us consider a C(++) function of the form |
---|
287 | shown in \autoref{code1}. |
---|
288 | % |
---|
289 | \begin{figure}[hbt] |
---|
290 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
291 | \begin{center} |
---|
292 | \begin{tabbing} |
---|
293 | {\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \= // number of independents and dependents\\ |
---|
294 | \>{\sf double *x,} \> // independent variable vector \\ |
---|
295 | \>{\sf double *y,} \> // dependent variable vector \\ |
---|
296 | \> {\sf int *k, } \> // integer parameters \\ |
---|
297 | \>{\sf double *z)} \> // real parameters \\ |
---|
298 | {\sf \{ }\hspace{0.1 in } \= \> // beginning of function body \\ |
---|
299 | \>{\sf double t = 0;} \> // local variable declaration \\ |
---|
300 | \>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\ |
---|
301 | \>\hspace{0.2in}{\sf t += z[i]*x[i];} \> // continue \\ |
---|
302 | \>{\sf $\cdots \cdots \cdots \cdots $} \> // continue \\ |
---|
303 | \>{\sf y[m-1] = t/m; } \> // end of computation \\ |
---|
304 | {\sf \} } \> \> // end of function |
---|
305 | \end{tabbing} |
---|
306 | \end{center} |
---|
307 | }} |
---|
308 | \caption{Generic example of a subprogram to be activated} |
---|
309 | \label{code1} |
---|
310 | \end{figure} |
---|
311 | % |
---|
312 | |
---|
313 | If {\sf eval} is to be called from within an active C(++) |
---|
314 | section with {\sf x} |
---|
315 | and {\sf y} as vectors of {\sf adouble}s and the other parameters |
---|
316 | passive, then one merely has to change the type declarations of all |
---|
317 | variables that depend on {\sf x} from {\sf double} or {\sf float} to |
---|
318 | {\sf adouble}. Subsequently, the subprogram must be compiled with the |
---|
319 | header file \verb=<adolc/adouble.h>= included as described |
---|
320 | in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is |
---|
321 | still to be called with integer and real arguments, possibly from |
---|
322 | a program written in Fortran77, which does not allow overloading. |
---|
323 | |
---|
324 | To automatically compute derivatives of the dependent |
---|
325 | variables {\sf y} with respect to the independent variables {\sf x}, we |
---|
326 | can make the body of the function into an active section. For |
---|
327 | example, we may modify the previous program segment |
---|
328 | as in \autoref{adolcexam}. |
---|
329 | The renaming and doubling up of the original independent and dependent |
---|
330 | variable vectors by active counterparts may seem at first a bit clumsy. |
---|
331 | However, this transformation has the advantage that the calling |
---|
332 | sequence and the computational part, i.e., where the function is |
---|
333 | really evaluated, of {\sf eval} remain completely |
---|
334 | unaltered. If the temporary variable {\sf t} had remained a {\sf double}, |
---|
335 | the code would not compile, because of a type conflict in the assignment |
---|
336 | following the declaration. More detailed example codes are listed in |
---|
337 | \autoref{example}. |
---|
338 | |
---|
339 | \begin{figure}[htb] |
---|
340 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
341 | \begin{center} |
---|
342 | \begin{tabbing} |
---|
343 | {\sf void eval(} \= {\sf int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\ |
---|
344 | \> {\sf double *px,} \> // independent passive variable vector \\ |
---|
345 | \> {\sf double *py,} \> // dependent passive variable vector \\ |
---|
346 | \> {\sf int *k,} \> // integer parameters \\ |
---|
347 | \> {\sf double *z)} \> // parameter vector \\ |
---|
348 | {\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\ |
---|
349 | \>{\sf short int tag = 0;} \> // tape array and/or tape file specifier\\ |
---|
350 | \>{\sf trace\_on(tag);} \> // start tracing \\ |
---|
351 | \>{\sf adouble *x, *y;} \> // declare active variable pointers \\ |
---|
352 | \>{\sf x = new adouble[n];}\>// declare active independent variables \\ |
---|
353 | \>{\sf y = new adouble[m];} \> // declare active dependent variables \\ |
---|
354 | \>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\ |
---|
355 | \>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath px[i];} \> // select independent variables \\ |
---|
356 | \>{\sf adouble t = 0;} \> // local variable declaration \\ |
---|
357 | \>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin crunch \\ |
---|
358 | \>\hspace{0.2in}{\sf t += z[i]*x[i];} \> // continue crunch \\ |
---|
359 | \>{\sf $\cdots \cdots \cdots \cdots $} \> // continue crunch \\ |
---|
360 | \>{\sf $\cdots \cdots \cdots \cdots $} \> // continue crunch \\ |
---|
361 | \>{\sf y[m-1] = t/m; } \> // end crunch as before\\ |
---|
362 | \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\ |
---|
363 | \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\ |
---|
364 | \>{\sf delete[] y;} \>// delete dependent active variables \\ |
---|
365 | \>{\sf delete[] x;} \>// delete independent active variables \\ |
---|
366 | \>{\sf trace\_off();} \> // complete tape \\ |
---|
367 | {\sf \}} \>\> // end of function |
---|
368 | \end{tabbing} |
---|
369 | \end{center}}} |
---|
370 | \caption{Activated version of the code listed in \autoref{code1}} |
---|
371 | \label{adolcexam} |
---|
372 | \end{figure} |
---|
373 | % |
---|
374 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
375 | \subsection{Overloaded Operators and Functions} |
---|
376 | \label{OverOper} |
---|
377 | % |
---|
378 | As in the subprogram discussed above, the actual computational |
---|
379 | statements of a C(++) code need not be altered for the purposes of |
---|
380 | automatic differentiation. All arithmetic operations, as well as the |
---|
381 | comparison and assignment operators, are overloaded, so any or all of |
---|
382 | their operands can be an active variable. An {\sf adouble x} occurring |
---|
383 | in a comparison operator is effectively replaced by its real value |
---|
384 | {\sf x.value()}. Most functions contained in the ANSI C standard for |
---|
385 | the math library are overloaded for active arguments. The only |
---|
386 | exceptions are the non-differentiable functions {\sf fmod} and |
---|
387 | {\sf modf}. Otherwise, legitimate C code in active sections can remain |
---|
388 | completely unchanged, provided the direct output of active variables |
---|
389 | is avoided. The rest of this subsection may be skipped by first time |
---|
390 | users who are not worried about marginal issues of differentiability |
---|
391 | and efficiency. |
---|
392 | |
---|
393 | The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not |
---|
394 | properly differentiable at the origin, which raises the question of |
---|
395 | how this exception ought to be handled. Fortunately, one can easily |
---|
396 | see that {\sf fabs(x)} and all its compositions with smooth |
---|
397 | functions are still directionally differentiable. These |
---|
398 | directional derivatives of arbitrary order can be propagated in the |
---|
399 | forward mode without any ambiguity. In other words, the forward mode as |
---|
400 | implemented in ADOL-C computes Gateaux derivatives |
---|
401 | in certain directions, which reduce to Fr\'echet derivatives only |
---|
402 | if the dependence on the direction is linear. Otherwise, |
---|
403 | the directional derivatives are merely positively homogeneous with |
---|
404 | respect to the scaling of the directions. |
---|
405 | For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at |
---|
406 | the origin somewhat arbitrarily to zero. |
---|
407 | |
---|
408 | We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble} |
---|
409 | arguments, so that function and derivative values are obtained consistent |
---|
410 | with those of {\sf fabs} according to the identities |
---|
411 | \[ |
---|
412 | \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad |
---|
413 | \max(a,b) = [a+b+|a-b|]/2 \quad . |
---|
414 | \] |
---|
415 | These relations cannot hold if either $a$ or $b$ is infinite, in which |
---|
416 | case {\sf fmin} or {\sf fmax} and their derivatives may still be well |
---|
417 | defined. It should be noted that the directional differentiation of |
---|
418 | {\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from |
---|
419 | the corresponding assignment based on the sign of $a-b$. For example, |
---|
420 | the statement |
---|
421 | \begin{center} |
---|
422 | {\sf if (a $<$ b) c = a; else c = b;} |
---|
423 | \end{center} |
---|
424 | yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$ |
---|
425 | the incorrect directional derivative value |
---|
426 | {\sf c}$^\prime = $~{\sf b}$^\prime$ rather than the correct |
---|
427 | {\sf c}$^\prime = $~{\sf a}$^\prime$. Therefore this form of conditional assignment |
---|
428 | should be avoided by use of the function $\sf fmin(a,b)$. There |
---|
429 | are also versions of {\sf fmin} and {\sf fmax} for two passive |
---|
430 | arguments and mixed passive/active arguments are handled by |
---|
431 | implicit conversion. |
---|
432 | On the function class obtained by composing the modulus with real |
---|
433 | analytic functions, the concept of directional differentiation can be |
---|
434 | extended to the propagation of unique one-sided Taylor expansions. |
---|
435 | The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded |
---|
436 | on the tape. |
---|
437 | |
---|
438 | The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric |
---|
439 | functions have infinite slopes at the boundary points of their domains. |
---|
440 | At these marginal points the derivatives are set by ADOL-C to |
---|
441 | either {\sf $\pm$InfVal}, 0 |
---|
442 | or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined |
---|
443 | parameters, see \autoref{Customizing}. |
---|
444 | On IEEE machines {\sf InfVal} can be set to the special value |
---|
445 | {\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$. |
---|
446 | For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ |
---|
447 | of {\sf b}~=~{\sf sqrt(a)} is set to |
---|
448 | \[ |
---|
449 | {\sf b}^\prime = \left\{ |
---|
450 | \begin{array}{ll} |
---|
451 | \sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0 \\ |
---|
452 | 0&\mbox{if}\;\;{\sf a}^\prime =0 \\ |
---|
453 | \sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\ |
---|
454 | \end{array} \right. \enspace . |
---|
455 | \] |
---|
456 | In other words, we consider {\sf a} and |
---|
457 | consequently {\sf b} as a constant when {\sf a}$^\prime$ or more generally |
---|
458 | all computed Taylor coefficients are zero. |
---|
459 | |
---|
460 | The general power function ${\sf pow(x,y)=x^y}$ is computed whenever |
---|
461 | it is defined for the corresponding {\sf double} arguments. If {\sf x} is |
---|
462 | negative, however, the partial derivative with respect to an integral exponent |
---|
463 | is set to zero. |
---|
464 | %Similarly, the partial of {\bf pow} with respect to both arguments |
---|
465 | %is set to zero at the origin, where both arguments vanish. |
---|
466 | The derivatives of the step functions |
---|
467 | {\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all |
---|
468 | arguments {\sf x}. The result values of the step functions |
---|
469 | are recorded on the tape and can later be checked to recognize |
---|
470 | whether a step to another level was taken during a forward sweep |
---|
471 | at different arguments than at taping time. |
---|
472 | |
---|
473 | Some C implementations supply other special |
---|
474 | functions, in particular the error function {\sf erf(x)}. For the |
---|
475 | latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which |
---|
476 | has been commented out for systems on which the {\sf double} valued version |
---|
477 | is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and |
---|
478 | postfix) are available for {\sf adouble}s. |
---|
479 | % |
---|
480 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
481 | % |
---|
482 | % and also the |
---|
483 | %active subscripts described in the \autoref{act_subscr}. |
---|
484 | Ambiguous statements like {\sf a += a++;} must be |
---|
485 | avoided because the compiler may sequence the evaluation of the |
---|
486 | overloaded |
---|
487 | expression differently from the original in terms of {\sf double}s. |
---|
488 | |
---|
489 | As we have indicated above, all subroutines called with active arguments |
---|
490 | must be modified or suitably overloaded. The simplest procedure is |
---|
491 | to declare the local variables of the function as active so that |
---|
492 | their internal calculations are also recorded on the tape. |
---|
493 | Unfortunately, this approach is likely to be unnecessarily inefficient |
---|
494 | and inaccurate if the original subroutine evaluates a special function |
---|
495 | that is defined as the solution of a particular mathematical problem. |
---|
496 | The most important examples are implicit functions, quadratures, |
---|
497 | and solutions of ordinary differential equations. Often |
---|
498 | the numerical methods for evaluating such special functions are |
---|
499 | elaborate, and their internal workings are not at all differentiable in |
---|
500 | the data. Rather than differentiating through such an adaptive |
---|
501 | procedure, one can obtain first and higher derivatives directly from |
---|
502 | the mathematical definition of the special function. Currently this |
---|
503 | direct approach has been implemented only for user-supplied quadratures |
---|
504 | as described in \autoref{quadrat}. |
---|
505 | % |
---|
506 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
507 | \subsection{Reusing the Tape for Arbitrary Input Values} |
---|
508 | \label{reuse_tape} |
---|
509 | % |
---|
510 | In some situations it may be desirable to calculate the value and |
---|
511 | derivatives of a function at arbitrary arguments by using a tape of |
---|
512 | the function evaluation at one argument and reevaluating the |
---|
513 | function and its derivatives using the given ADOL-C |
---|
514 | routines. This approach can |
---|
515 | significantly reduce run times, and it |
---|
516 | also allows to port problem functions, in the form of the |
---|
517 | corresponding tape files, into a computing environment that |
---|
518 | does not support C++ but does support C or Fortran. |
---|
519 | Therefore, the routines provided by ADOL-C for the evaluation of derivatives |
---|
520 | can be used to at arguments $x$ other than the |
---|
521 | point at which the tape was generated, provided there are |
---|
522 | no user defined quadratures and all comparisons involving |
---|
523 | {\sf adouble}s yield the same result. The last condition |
---|
524 | implies that the control flow is unaltered by the change |
---|
525 | of the independent variable values. Therefore, this sufficient |
---|
526 | condition is tested by ADOL-C and if it is not met |
---|
527 | the ADOL-C routine called for derivative calculations indicates this |
---|
528 | contingency through its return value. Currently, there are six return values, |
---|
529 | see \autoref{retvalues}. |
---|
530 | \begin{table}[h] |
---|
531 | \center\small |
---|
532 | \begin{tabular}{|r|l|}\hline |
---|
533 | +3 & |
---|
534 | \begin{minipage}{12.5cm} |
---|
535 | \vspace*{1ex} |
---|
536 | The function is locally analytic. |
---|
537 | \vspace*{1ex} |
---|
538 | \end{minipage} \\ \hline |
---|
539 | +2 & |
---|
540 | \begin{minipage}{12.5cm} |
---|
541 | \vspace*{1ex} |
---|
542 | The function is locally analytic but the sparsity |
---|
543 | structure (compared to the situation at the taping point) |
---|
544 | may have changed, e.g. while at taping arguments |
---|
545 | {\sf fmax(a,b)} returned {\sf a} we get {\sf b} at |
---|
546 | the argument currently used. |
---|
547 | \vspace*{1ex} |
---|
548 | \end{minipage} \\ \hline |
---|
549 | +1 & |
---|
550 | \begin{minipage}{12.5cm} |
---|
551 | \vspace*{1ex} |
---|
552 | At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs} |
---|
553 | is evaluated at a tie or zero, respectively. Hence, the function to be differentiated is |
---|
554 | Lipschitz-continuous but possibly non-differentiable. |
---|
555 | \vspace*{1ex} |
---|
556 | \end{minipage} \\ \hline |
---|
557 | 0 & |
---|
558 | \begin{minipage}{12.5cm} |
---|
559 | \vspace*{1ex} |
---|
560 | Some arithmetic comparison involving {\sf adouble}s yields a tie. |
---|
561 | Hence, the function to be differentiated may be discontinuous. |
---|
562 | \vspace*{1ex} |
---|
563 | \end{minipage} \\ \hline |
---|
564 | -1 & |
---|
565 | \begin{minipage}{12.5cm} |
---|
566 | \vspace*{1ex} |
---|
567 | An {\sf adouble} comparison yields different results |
---|
568 | from the evaluation point at which the tape was generated. |
---|
569 | \vspace*{1ex} |
---|
570 | \end{minipage} \\ \hline |
---|
571 | -2 & |
---|
572 | \begin{minipage}{12.5cm} |
---|
573 | \vspace*{1ex} |
---|
574 | The argument of a user-defined quadrature has changed |
---|
575 | from the evaluation point at which the tape was generated. |
---|
576 | \vspace*{1ex} |
---|
577 | \end{minipage} \\ \hline |
---|
578 | \end{tabular} |
---|
579 | \caption{Description of return values} |
---|
580 | \label{retvalues} |
---|
581 | \end{table} |
---|
582 | |
---|
583 | \begin{figure}[h] |
---|
584 | \centering\includegraphics[width=10.0cm]{tap_point} |
---|
585 | \caption{Return values around the taping point} |
---|
586 | \label{fi:tap_point} |
---|
587 | \end{figure} |
---|
588 | |
---|
589 | In \autoref{fi:tap_point} these return values are illustrated. |
---|
590 | If the user finds the return value of an ADOL-C routine to be negative the |
---|
591 | taping process simply has to be repeated by executing the active section again. |
---|
592 | The crux of the problem lies in the fact that the tape records only |
---|
593 | the operations that are executed during one particular evaluation of the |
---|
594 | function. |
---|
595 | It also has no way to evaluate integrals since the corresponding |
---|
596 | quadratures are never recorded on the tape. |
---|
597 | Therefore, when there are user-defined quadratures the retaping is necessary at each |
---|
598 | new point. If there are only branches conditioned on {\sf adouble} |
---|
599 | comparisons one may hope that re-taping becomes unnecessary when |
---|
600 | the points settle down in some small neighborhood, as one would |
---|
601 | expect for example in an iterative equation solver. |
---|
602 | % |
---|
603 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
604 | \subsection{Conditional Assignments} |
---|
605 | \label{condassign} |
---|
606 | % |
---|
607 | It appears unsatisfactory that, for example, a simple table lookup |
---|
608 | of some physical property forces the re-recording of a possibly |
---|
609 | much larger calculation. However, the basic philosophy of ADOL-C |
---|
610 | is to overload arithmetic, rather than to generate a new program |
---|
611 | with jumps between ``instructions'', which would destroy the |
---|
612 | strictly sequential tape access and |
---|
613 | require the infusion of substantial compiler technology. |
---|
614 | Therefore, we introduce the two constructs of conditional |
---|
615 | assignments and active integers as partial remedies to the |
---|
616 | branching problem. |
---|
617 | |
---|
618 | In many cases, the functionality of branches |
---|
619 | can be replaced by conditional assignments. |
---|
620 | For this purpose, we provide a special function called |
---|
621 | {\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the |
---|
622 | syntax of the conditional assignment |
---|
623 | \begin{center} |
---|
624 | {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} |
---|
625 | \end{center} |
---|
626 | which C++ inherited from C. However, here the arguments are restricted to be |
---|
627 | active or passive scalar arguments, and all expression arguments |
---|
628 | are evaluated before the test on {\sf b}, which is different from |
---|
629 | the usual conditional assignment or the code segment. |
---|
630 | |
---|
631 | Suppose the original program contains the code segment |
---|
632 | \begin{center} |
---|
633 | {\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\ |
---|
634 | \end{center} |
---|
635 | Here, only one of the expressions (or, more generally, program blocks) |
---|
636 | {\sf c} and {\sf d} is evaluated, which exactly constitutes the problem |
---|
637 | for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one |
---|
638 | may first execute both branches and then pick either {\sf c} |
---|
639 | or {\sf d} using |
---|
640 | {\sf condassign(a,b,c,d)}. To maintain |
---|
641 | consistency with the original code, one has to make sure |
---|
642 | that the two branches do not have any side effects that can |
---|
643 | interfere with each other or may be important for subsequent |
---|
644 | calculations. Furthermore the test parameter {\sf b} has to be an |
---|
645 | {\sf adouble} or an {\sf adouble} expression. Otherwise the |
---|
646 | test condition {\sf b} is recorded on the tape as a {\em constant} with its |
---|
647 | run time value. Thus the original dependency of {\sf b} on |
---|
648 | active variables gets lost, for instance if {\sf b} is a comparison |
---|
649 | expression, see \autoref{OverOper}. |
---|
650 | If there is no {\sf else} part in a conditional assignment, one may call |
---|
651 | the three argument version |
---|
652 | {\sf condassign(a,b,c)}, which |
---|
653 | is logically equivalent to {\sf condassign(a,b,c,a)} in that |
---|
654 | nothing happens if {\sf b} is non-positive. |
---|
655 | The header file \verb=<adolc/adouble.h>= |
---|
656 | contains also corresponding definitions of |
---|
657 | {\sf condassign(a,b,c,d)} |
---|
658 | and {\sf condassign(a,b,c)} for |
---|
659 | passive {\sf double} arguments so that the modified code |
---|
660 | without any differentiation can be tested |
---|
661 | for correctness. |
---|
662 | |
---|
663 | A generalization of this concept for more than two branches, e.g., |
---|
664 | akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling |
---|
665 | \textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on |
---|
666 | elements of an \texttt{advector} with active indices. |
---|
667 | % |
---|
668 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
669 | \subsection{Step-by-Step Modification Procedure} |
---|
670 | % |
---|
671 | To prepare a section of given C or C++ code for automatic |
---|
672 | differentiation as described above, one applies the following step-by-step procedure. |
---|
673 | \begin{enumerate} |
---|
674 | \item |
---|
675 | Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)} |
---|
676 | and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the |
---|
677 | beginning and end of the active section. |
---|
678 | \item |
---|
679 | Select the set of active variables, and change their type from |
---|
680 | {\sf double} or {\sf float} to {\sf adouble}. |
---|
681 | \item |
---|
682 | Select a sequence of independent variables, and initialize them with |
---|
683 | \boldmath $\ll=$ \unboldmath assignments from passive variables or vectors. |
---|
684 | \item |
---|
685 | Select a sequence of dependent variables among the active variables, |
---|
686 | and pass their final values to passive variable or vectors thereof |
---|
687 | by \boldmath $\gg=$ \unboldmath assignments. |
---|
688 | \item |
---|
689 | Compile the codes after including the header file \verb=<adolc/adouble.h>=. |
---|
690 | \end{enumerate} |
---|
691 | Typically, the first compilation will detect several type conflicts |
---|
692 | -- usually attempts to convert from active to passive |
---|
693 | variables or to perform standard I/O of active variables. |
---|
694 | Since all standard |
---|
695 | C programs can be activated by a mechanical application of the |
---|
696 | procedure above, the following section is of importance |
---|
697 | only to advanced users. |
---|
698 | % |
---|
699 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
700 | \section{Numbering the Tapes and Controlling the Buffer} |
---|
701 | \label{tape} |
---|
702 | % |
---|
703 | The trace generated by the execution of an active section may stay |
---|
704 | within a triplet of internal arrays or it may be written out |
---|
705 | to three corresponding files. We will refer to these triplets as the |
---|
706 | tape array or tape file, in general tape, which may subsequently be |
---|
707 | used to evaluate the |
---|
708 | underlying function and its derivatives at the original point or at |
---|
709 | alternative arguments. If the active section involves user-defined |
---|
710 | quadratures it must be executed and |
---|
711 | re-taped at each new argument. Similarly, if conditions on |
---|
712 | {\sf adouble} values lead to a different program branch being taken at |
---|
713 | a new argument the evaluation process also needs to be re-taped at the |
---|
714 | new point. Otherwise, direct evaluation from |
---|
715 | the tape by the routine {\sf function} (\autoref{optdrivers}) is |
---|
716 | likely to be |
---|
717 | faster. The use of quadratures and the results of all comparisons on |
---|
718 | {\sf adouble}s are recorded on the tape so that {\sf function} and other |
---|
719 | forward routines stop and return appropriate flags if their use without |
---|
720 | prior re-taping is unsafe. To avoid any re-taping certain types of |
---|
721 | branches can be recorded on the tape through |
---|
722 | the use of conditional assignments |
---|
723 | described before in \autoref{condassign}. |
---|
724 | |
---|
725 | Several tapes may be generated and kept simultaneously. |
---|
726 | A tape array is used as a triplet of buffers or a tape file is generated if |
---|
727 | the length of any of the buffers exceeds the maximal array lengths of |
---|
728 | {\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are |
---|
729 | defined in the header file \verb=<adolc/usrparms.h>= |
---|
730 | and may be adjusted by the user in the header file before compiling |
---|
731 | the ADOL-C library, or on runtime using a file named \verb=.adolcrc=. |
---|
732 | The filesystem folder, where the tapes files may be written to disk, |
---|
733 | can be changed by changing the definition of {\sf TAPE\_DIR} in |
---|
734 | the header file \verb=<adolc/dvlparms.h>= before |
---|
735 | compiling the ADOL-C library, or on runtime by defining {\sf |
---|
736 | TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined |
---|
737 | to be the present working directory (\verb=.=). |
---|
738 | |
---|
739 | For simple usage, {\sf trace\_on} may be called with only the tape |
---|
740 | {\sf tag} as argument, and {\sf trace\_off} may be called |
---|
741 | without argument. The optional integer argument {\sf keep} of |
---|
742 | {\sf trace\_on} determines whether the numerical values of all |
---|
743 | active variables are recorded in a buffered temporary array or file |
---|
744 | called the taylor stack. |
---|
745 | This option takes effect if |
---|
746 | {\sf keep} = 1 and prepares the scene for an immediately following |
---|
747 | gradient evaluation by a call to a routine implementing the reverse mode |
---|
748 | as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A |
---|
749 | file is used instead of an array if the size exceeds the maximal array |
---|
750 | length of {\sf TBUFSIZE} defined in \verb=<adolc/usrparms.h>= and may |
---|
751 | be adjusted in the same way like the other buffer sizes mentioned above. |
---|
752 | Alternatively, gradients may be evaluated by a call |
---|
753 | to {\sf gradient}, which includes a preparatory forward sweep |
---|
754 | for the creation of the temporary file. If omitted, the argument |
---|
755 | {\sf keep} defaults to 0, so that no temporary |
---|
756 | taylor stack file is generated. |
---|
757 | |
---|
758 | By setting the optional integer argument {\sf file} of |
---|
759 | {\sf trace\_off} to 1, the user may force a numbered tape |
---|
760 | file to be written even if the tape array (buffer) does not overflow. |
---|
761 | If the argument {\sf file} is omitted, it |
---|
762 | defaults to 0, so that the tape array is written onto a tape file only |
---|
763 | if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements. |
---|
764 | |
---|
765 | After the execution of an active section, if a tape file was generated, i.e., |
---|
766 | if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the |
---|
767 | argument {\sf file} of {\sf trace\_off} was set to 1, the files will be |
---|
768 | saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default |
---|
769 | the current working directory) under filenames formed by |
---|
770 | the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf |
---|
771 | ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf |
---|
772 | ADOLC\_TAYLORS\_NAME} defined in |
---|
773 | the header file \verb=<adolc/dvlparms.h>= appended with the number |
---|
774 | given as the {\sf tag} argument to {\sf trace\_on} and have the |
---|
775 | extension {\sf .tap}. |
---|
776 | |
---|
777 | Later, all problem-independent routines |
---|
778 | like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others |
---|
779 | expect as first argument a {\sf tag} to determine |
---|
780 | the tape on which their respective computational task is to be performed. |
---|
781 | By calling {\sf trace\_on} with different tape {\sf tag}s, one can create |
---|
782 | several tapes for various function evaluations and subsequently perform |
---|
783 | function and derivative evaluations on one or more of them. |
---|
784 | |
---|
785 | For example, suppose one wishes to calculate for two smooth functions |
---|
786 | $f_1(x)$ and $f_2(x)$ |
---|
787 | \[ |
---|
788 | f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x), |
---|
789 | \] |
---|
790 | and possibly higher derivatives where the two functions do not tie. |
---|
791 | Provided $f_1$ and $f_2$ are evaluated in two separate active sections, |
---|
792 | one can generate two different tapes by calling {\sf trace\_on} with |
---|
793 | {\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active |
---|
794 | sections. |
---|
795 | Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the |
---|
796 | current argument and then evaluate the gradient $\nabla f(x)$ by calling |
---|
797 | {\sf gradient} with the appropriate argument value {\sf tag} = 1 or |
---|
798 | {\sf tag} = 2. |
---|
799 | % |
---|
800 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
801 | \subsection{Examining the Tape and Predicting Storage Requirements } |
---|
802 | \label{examiningTape} |
---|
803 | % |
---|
804 | At any point in the program, one may call the routine |
---|
805 | \begin{center} |
---|
806 | {\sf void tapestats(unsigned short tag, size\_t* counts)} |
---|
807 | \end{center} |
---|
808 | with {\sf counts} beeing an array of at least eleven integers. |
---|
809 | The first argument {\sf tag} specifies the particular tape of |
---|
810 | interest. The components of {\sf counts} represent |
---|
811 | \[ |
---|
812 | \begin{tabular}{ll} |
---|
813 | {\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\ |
---|
814 | {\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ |
---|
815 | {\sf counts[2]}: & the maximal number of live active variables,\\ |
---|
816 | {\sf counts[3]}: & the size of taylor stack (number of overwrites),\\ |
---|
817 | {\sf counts[4]}: & the buffer size (a multiple of eight), |
---|
818 | \end{tabular} |
---|
819 | \] |
---|
820 | \[ |
---|
821 | \begin{tabular}{ll} |
---|
822 | {\sf counts[5]}: & the total number of operations recorded,\\ |
---|
823 | {\sf counts[6-13]}: & other internal information about the tape. |
---|
824 | \end{tabular} |
---|
825 | \] |
---|
826 | The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} |
---|
827 | determine the temporary |
---|
828 | storage requirements during calls to the routines |
---|
829 | implementing the forward and the reverse mode. |
---|
830 | For a certain degree {\sf deg} $\geq$ 0, the scalar version of the |
---|
831 | forward mode involves apart from the tape buffers an array of |
---|
832 | $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in |
---|
833 | core and, in addition, a sequential data set called the value stack |
---|
834 | of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the |
---|
835 | option {\sf keep} $>$ 0. Here |
---|
836 | the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage |
---|
837 | requirement for the sequential data set, which stays in core if |
---|
838 | its length is less than {\sf TBUFSIZE} bytes and is otherwise written |
---|
839 | out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/usrparms.h>=. |
---|
840 | The drawback of the economical |
---|
841 | {\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations |
---|
842 | yield gradients and other adjoint vectors only in single-precision |
---|
843 | accuracy. This may be acceptable if the adjoint vectors |
---|
844 | represent rows of a Jacobian that is used for the calculation of |
---|
845 | Newton steps. In its scalar version, the reverse mode implementation involves |
---|
846 | the same number of {\sf double}s and twice as many {\sf revreal}s as the |
---|
847 | forward mode implementation. |
---|
848 | The storage requirements of the vector versions of the forward mode and |
---|
849 | reverse mode implementation are equal to that of the scalar versions multiplied by |
---|
850 | the vector length. |
---|
851 | % |
---|
852 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
853 | \subsection{Customizing ADOL-C} |
---|
854 | \label{Customizing} |
---|
855 | % |
---|
856 | Based on the information provided by the routine {\sf tapestats}, the user may alter the |
---|
857 | following types and constant dimensions in the header file \verb=<adolc/usrparms.h>= |
---|
858 | to suit his problem and environment. |
---|
859 | |
---|
860 | \begin{description} |
---|
861 | \item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integer determines the length of |
---|
862 | in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all |
---|
863 | required data, any file access is avoided unless {\sf trace\_off} |
---|
864 | is called with a positive argument. This desirable situation can |
---|
865 | be achieved for many problem functions with an execution trace of moderate |
---|
866 | size. Primarily these values occur as an argument |
---|
867 | to {\sf malloc}, so that setting it unnecessarily large may have no |
---|
868 | ill effects, unless the operating system prohibits or penalizes large |
---|
869 | array allocations. It is however recommended to leave the values in |
---|
870 | \texttt{<adolc/usrparms.h>} unchanged and set them using the |
---|
871 | \texttt{.adolcrc} file in the current working directory at runtime. |
---|
872 | |
---|
873 | \item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the |
---|
874 | in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288). |
---|
875 | |
---|
876 | \item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32). |
---|
877 | |
---|
878 | \item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions. |
---|
879 | |
---|
880 | \item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions. |
---|
881 | |
---|
882 | \item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den} |
---|
883 | sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} |
---|
884 | of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines |
---|
885 | the default setting produces the standard {\sf Inf}. On non-IEEE machines |
---|
886 | change these values to produce a small {\sf InfVal} value and compare |
---|
887 | the results of two forward sweeps with different {\sf InfVal} settings |
---|
888 | to detect a ``vertical'' slope. |
---|
889 | |
---|
890 | \item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0). |
---|
891 | |
---|
892 | \item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} |
---|
893 | sets the mathematically |
---|
894 | undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den} |
---|
895 | of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines |
---|
896 | the default setting produces the standard {\sf NaN}. On non-IEEE machines |
---|
897 | change these values to produce a small {\sf NoNum} value and compare |
---|
898 | the results of two forward sweeps with different {\sf NoNum} settings |
---|
899 | to detect the occurrence of undefined derivative values. |
---|
900 | |
---|
901 | \item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0). |
---|
902 | |
---|
903 | \item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20). |
---|
904 | |
---|
905 | \item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout). |
---|
906 | \end{description} |
---|
907 | |
---|
908 | The following types and options may be set using the command-line |
---|
909 | options of the \texttt{./configure} script. |
---|
910 | |
---|
911 | \begin{description} |
---|
912 | \item[{\sf locint}{\rm :}] The range of the integer type |
---|
913 | {\sf locint} determines how many {\sf adouble}s can be simultaneously |
---|
914 | alive (default: {\sf unsigned int}). In extreme cases when there are more than $2^{32}$ {\sf adouble}s |
---|
915 | alive at any one time, the type {\sf locint} must be changed to |
---|
916 | {\sf unsigned long}. This can be done by passing |
---|
917 | \texttt{--enable-ulong} to \texttt{./configure}. |
---|
918 | |
---|
919 | \item[{\sf revreal}{\rm :}] The choice of this floating-point type |
---|
920 | trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions |
---|
921 | and their derivatives are always evaluated in double precision |
---|
922 | during forward sweeps, gradients and other adjoint vectors are obtained |
---|
923 | with the precision determined by the type {\sf revreal}. The less |
---|
924 | accurate choice {\sf revreal} = {\sf float} nearly halves the |
---|
925 | storage requirement during reverse sweeps. This can be done by passing |
---|
926 | \texttt{--disable-double} to \texttt{./configure}. |
---|
927 | |
---|
928 | \item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse |
---|
929 | hyperbolic functions and the error function are enabled (default: |
---|
930 | undefined) by passing \texttt{--enable-atrig-erf} |
---|
931 | to \texttt{./configure} |
---|
932 | |
---|
933 | \item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine |
---|
934 | used by ADOL-C. {\sf Malloc} will be used if this variable is |
---|
935 | undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect |
---|
936 | result caused by uninitialized memory. It can be set undefined by |
---|
937 | passing \texttt{--disable-use-calloc} to \texttt{./configure}. |
---|
938 | |
---|
939 | \item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines |
---|
940 | required for automatic branch selection (default: disabled). The |
---|
941 | boolean valued comparison operators with two \texttt{adouble} type |
---|
942 | arguments will not return boolean values anymore and may not be used |
---|
943 | in branch control statements (\texttt{if}, \texttt{while}, \texttt{for} |
---|
944 | etc.). Instead conditional assignments using \texttt{condassign} or |
---|
945 | selection operations on elements of \texttt{advector} type should be |
---|
946 | used. Enabling this option and rewriting the function evaluation |
---|
947 | using \texttt{condassign} or selections of \texttt{advector} |
---|
948 | elements will prevent the need for retracing the function at branch |
---|
949 | switches. This can be enabled by passing |
---|
950 | \texttt{--enable-advanced-branching} to \texttt{./configure}. |
---|
951 | \end{description} |
---|
952 | % |
---|
953 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
954 | \subsection{Warnings and Suggestions for Improved Efficiency} |
---|
955 | \label{WarSug} |
---|
956 | % |
---|
957 | Since the type {\sf adouble} has a nontrivial constructor, |
---|
958 | the mere declaration of large {\sf adouble} arrays may take up |
---|
959 | considerable run time. The user should be warned against |
---|
960 | the usual Fortran practice of declaring fixed-size arrays |
---|
961 | that can accommodate the largest possible case of an evaluation program |
---|
962 | with variable dimensions. If such programs are converted to or written |
---|
963 | in C, the overloading in combination with ADOL-C will lead to very |
---|
964 | large run time increases for comparatively small values of the |
---|
965 | problem dimension, because the actual computation is completely |
---|
966 | dominated by the construction of the large {\sf adouble} arrays. |
---|
967 | The user is advised to |
---|
968 | create dynamic arrays of |
---|
969 | {\sf adouble}s by using the C++ operator {\sf new} and to destroy them |
---|
970 | using {\sf delete}. For storage efficiency it is desirable that |
---|
971 | dynamic objects are created and destroyed in a last-in-first-out |
---|
972 | fashion. |
---|
973 | |
---|
974 | Whenever an {\sf adouble} is declared, the constructor for the type |
---|
975 | {\sf adouble} assigns it a nominal address, which we will refer to as |
---|
976 | its {\em location}. The location is of the type {\sf locint} defined |
---|
977 | in the header file \verb=<adolc/usrparms.h>=. Active vectors occupy |
---|
978 | a range of contiguous locations. As long as the program execution |
---|
979 | never involves more than 65$\,$536 active variables, the type {\sf locint} |
---|
980 | may be defined as {\sf unsigned short}. Otherwise, the range may be |
---|
981 | extended by defining {\sf locint} as {\sf (unsigned) int} or |
---|
982 | {\sf (unsigned) long}, which may nearly double |
---|
983 | the overall mass storage requirement. Sometimes one can avoid exceeding |
---|
984 | the accessible range of {\sf unsigned short}s by using more local variables and deleting |
---|
985 | {\sf adouble}s created by the new operator in a |
---|
986 | last-in-first-out |
---|
987 | fashion. When memory for {\sf adouble}s is requested through a call to |
---|
988 | {\sf malloc()} or other related C memory-allocating |
---|
989 | functions, the storage for these {\sf adouble}s is allocated; however, the |
---|
990 | C++ {\sf adouble} constructor is never called. The newly defined |
---|
991 | {\sf adouble}s are never assigned a location and are not counted in |
---|
992 | the stack of live variables. Thus, any results depending upon these |
---|
993 | pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use |
---|
994 | malloc() and related C memory-allocating |
---|
995 | functions when declaring adoubles (see the following paragraph).} |
---|
996 | % |
---|
997 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
998 | % |
---|
999 | %The same point applies, of course, |
---|
1000 | % for active vectors. |
---|
1001 | |
---|
1002 | When an {\sf adouble} |
---|
1003 | % |
---|
1004 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
1005 | % |
---|
1006 | % or {\bf adoublev} |
---|
1007 | goes out of |
---|
1008 | scope or is explicitly deleted, the destructor notices that its |
---|
1009 | location(s) may be |
---|
1010 | freed for subsequent (nominal) reallocation. In general, this is not done |
---|
1011 | immediately but is delayed until the locations to be deallocated form a |
---|
1012 | contiguous tail of all locations currently being used. |
---|
1013 | |
---|
1014 | As a consequence of this allocation scheme, the currently |
---|
1015 | alive {\sf adouble} locations always form a contiguous range of integers |
---|
1016 | that grows and shrinks like a stack. Newly declared {\sf adouble}s are |
---|
1017 | placed on the top so that vectors of {\sf adouble}s obtain a contiguous |
---|
1018 | range of locations. While the C++ compiler can be expected to construct |
---|
1019 | and destruct automatic variables in a last-in-first-out fashion, the |
---|
1020 | user may upset this desirable pattern by deleting free-store {\sf adouble}s |
---|
1021 | too early or too late. Then the {\sf adouble} stack may grow |
---|
1022 | unnecessarily, but the numerical results will still be |
---|
1023 | correct, unless an exception occurs because the range of {\sf locint} |
---|
1024 | is exceeded. In general, free-store {\sf adouble}s |
---|
1025 | % |
---|
1026 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
1027 | % |
---|
1028 | %and {\bf adoublev}s |
---|
1029 | should be deleted in a last-in-first-out fashion toward the end of |
---|
1030 | the program block in which they were created. |
---|
1031 | When this pattern is maintained, the maximum number of |
---|
1032 | {\sf adouble}s alive and, as a consequence, the |
---|
1033 | randomly accessed storage space |
---|
1034 | of the derivative evaluation routines is bounded by a |
---|
1035 | small multiple of the memory used in the relevant section of the |
---|
1036 | original program. Failure to delete dynamically allocated {\sf adouble}s |
---|
1037 | may cause that the maximal number of {\sf adouble}s alive at one time will be exceeded |
---|
1038 | if the same active section is called repeatedly. The same effect |
---|
1039 | occurs if static {\sf adouble}s are used. |
---|
1040 | |
---|
1041 | To avoid the storage and manipulation of structurally |
---|
1042 | trivial derivative values, one should pay careful attention to |
---|
1043 | the naming of variables. Ideally, the intermediate |
---|
1044 | values generated during the evaluation of a vector function |
---|
1045 | should be assigned to program variables that are |
---|
1046 | consistently either active or passive, in that all their values |
---|
1047 | either are or are not dependent on the independent variables |
---|
1048 | in a nontrivial way. For example, this rule is violated if a temporary |
---|
1049 | variable is successively used to accumulate inner products involving |
---|
1050 | first only passive and later active arrays. Then the first inner |
---|
1051 | product and all its successors in the data dependency graph become |
---|
1052 | artificially active and the derivative evaluation routines |
---|
1053 | described later will waste |
---|
1054 | time allocating and propagating |
---|
1055 | trivial or useless derivatives. Sometimes even values that do |
---|
1056 | depend on the independent variables may be of only transitory |
---|
1057 | importance and may not affect the dependent variables. For example, |
---|
1058 | this is true for multipliers that are used to scale linear |
---|
1059 | equations, but whose values do not influence the dependent |
---|
1060 | variables in a mathematical sense. Such dead-end variables |
---|
1061 | can be deactivated by the use of the {\sf value} function, which |
---|
1062 | converts {\sf adouble}s to {\sf double}s. The deleterious effects |
---|
1063 | of unnecessary activity are partly alleviated by run time |
---|
1064 | activity flags in the derivative routine |
---|
1065 | {\sf hov\_reverse} presented in \autoref{forw_rev_ad}. |
---|
1066 | |
---|
1067 | The {\sf adouble} default constructor sets to zero the associated value. |
---|
1068 | This implies a certain overhead that may seem unnecessary when no initial value |
---|
1069 | is actually given, however, |
---|
1070 | the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior. |
---|
1071 | An array instantiation such as |
---|
1072 | \begin{center} |
---|
1073 | \sf double x[3]=\{2.0\}; |
---|
1074 | \end{center} |
---|
1075 | will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements |
---|
1076 | {\sf x[1]} and {\sf x[2]} to {\sf 0.0}. According to the C++ standard the array element construction of |
---|
1077 | the type changed instantiation |
---|
1078 | \begin{center} |
---|
1079 | \sf adouble x[3]=\{2.0\}; |
---|
1080 | \end{center} |
---|
1081 | will use the constructor {\sf adouble(const double\&);} for {\sf x[0]} passing in {\sf 2.0} but |
---|
1082 | will call the {\sf adouble} default constructor {\sf x[1]} and {\sf x[2]} leaving these array |
---|
1083 | elements uninitialized {\em unless} the default constructor does implement the initialization to |
---|
1084 | zero. |
---|
1085 | The C++ constructor syntax does not provide a means to distinguish this implicit initialization from the declaration of any simple uninitialized variable. |
---|
1086 | If the user can ascertain the absence of array instantiations such as the above then one can |
---|
1087 | configure ADOL-C with the \verb=--disable-stdczero= option , see \autoref{genlib}, to |
---|
1088 | avoid the overhead of these initializations. |
---|
1089 | |
---|
1090 | % |
---|
1091 | % |
---|
1092 | % |
---|
1093 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1094 | \section{Easy-To-Use Drivers} |
---|
1095 | \label{drivers} |
---|
1096 | % |
---|
1097 | For the convenience of the user, ADOL-C provides several |
---|
1098 | easy-to-use drivers that compute the most frequently required |
---|
1099 | derivative objects. Throughout, we assume that after the execution of an |
---|
1100 | active section, the corresponding tape with the identifier {\sf tag} |
---|
1101 | contains a detailed record of the computational process by which the |
---|
1102 | final values $y$ of the dependent variables were obtained from the |
---|
1103 | values $x$ of the independent variables. We will denote this functional |
---|
1104 | relation between the input variables $x$ and the output variables $y$ by |
---|
1105 | \[ |
---|
1106 | F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y. |
---|
1107 | \] |
---|
1108 | The return value of all drivers presented in this section |
---|
1109 | indicate the validity of the tape as explained in \autoref{reuse_tape}. |
---|
1110 | The presented drivers are all C functions and therefore can be used within |
---|
1111 | C and C++ programs. Some Fortran-callable companions can be found |
---|
1112 | in the appropriate header files. |
---|
1113 | % |
---|
1114 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1115 | \subsection{Drivers for Optimization and Nonlinear Equations} |
---|
1116 | % |
---|
1117 | \label{optdrivers} |
---|
1118 | % |
---|
1119 | The drivers provided for solving optimization problems and nonlinear |
---|
1120 | equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=, |
---|
1121 | which is included automatically by the global header file \verb=<adolc/adolc.h>= |
---|
1122 | (see \autoref{ssec:DesIH}). |
---|
1123 | |
---|
1124 | The routine {\sf function} allows to evaluate the desired function from |
---|
1125 | the tape instead of executing the corresponding source code: |
---|
1126 | % |
---|
1127 | \begin{tabbing} |
---|
1128 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1129 | \>{\sf int function(tag,m,n,x,y)}\\ |
---|
1130 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1131 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1132 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1133 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1134 | \>{\sf double y[m];} \> // dependent vector $y=F(x)$ |
---|
1135 | \end{tabbing} |
---|
1136 | % |
---|
1137 | If the original evaluation program is available this double version |
---|
1138 | should be used to compute the function value in order to avoid the |
---|
1139 | interpretative overhead. |
---|
1140 | |
---|
1141 | For the calculation of whole derivative vectors and matrices up to order |
---|
1142 | 2 there are the following procedures: |
---|
1143 | % |
---|
1144 | \begin{tabbing} |
---|
1145 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1146 | \>{\sf int gradient(tag,n,x,g)}\\ |
---|
1147 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1148 | \>{\sf int n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
1149 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1150 | \>{\sf double g[n];} \> // resulting gradient $\nabla F(x)$ |
---|
1151 | \end{tabbing} |
---|
1152 | % |
---|
1153 | \begin{tabbing} |
---|
1154 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1155 | \>{\sf int jacobian(tag,m,n,x,J)}\\ |
---|
1156 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1157 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1158 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1159 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1160 | \>{\sf double J[m][n];} \> // resulting Jacobian $F^\prime (x)$ |
---|
1161 | \end{tabbing} |
---|
1162 | % |
---|
1163 | \begin{tabbing} |
---|
1164 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1165 | \>{\sf int hessian(tag,n,x,H)}\\ |
---|
1166 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1167 | \>{\sf int n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
1168 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1169 | \>{\sf double H[n][n];} \> // resulting Hessian matrix $\nabla^2F(x)$ |
---|
1170 | \end{tabbing} |
---|
1171 | % |
---|
1172 | The driver routine {\sf hessian} computes only the lower half of |
---|
1173 | $\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ |
---|
1174 | of {\sf H} allocated as a square array remain untouched during the call |
---|
1175 | of {\sf hessian}. Hence only $i+1$ {\sf double}s need to be |
---|
1176 | allocated starting at the position {\sf H[i]}. |
---|
1177 | |
---|
1178 | To use the full capability of automatic differentiation when the |
---|
1179 | product of derivatives with certain weight vectors or directions are needed, ADOL-C offers |
---|
1180 | the following four drivers: |
---|
1181 | % |
---|
1182 | \begin{tabbing} |
---|
1183 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1184 | \>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\ |
---|
1185 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1186 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1187 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1188 | \>{\sf int repeat;} \> // indicate repeated call at same argument\\ |
---|
1189 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1190 | \>{\sf double u[m];} \> // range weight vector $u$ \\ |
---|
1191 | \>{\sf double z[n];} \> // result $z = u^TF^\prime (x)$ |
---|
1192 | \end{tabbing} |
---|
1193 | If a nonzero value of the parameter {\sf repeat} indicates that the |
---|
1194 | routine {\sf vec\_jac} has been called at the same argument immediately |
---|
1195 | before, the internal forward mode evaluation will be skipped and only |
---|
1196 | reverse mode evaluation with the corresponding arguments is executed |
---|
1197 | resulting in a reduced computational complexity of the function {\sf vec\_jac}. |
---|
1198 | % |
---|
1199 | \begin{tabbing} |
---|
1200 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1201 | \>{\sf int jac\_vec(tag,m,n,x,v,z)}\\ |
---|
1202 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1203 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1204 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1205 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1206 | \>{\sf double v[n];} \> // tangent vector $v$\\ |
---|
1207 | \>{\sf double z[m];} \> // result $z = F^\prime (x)v$ |
---|
1208 | \end{tabbing} |
---|
1209 | % |
---|
1210 | \begin{tabbing} |
---|
1211 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1212 | \>{\sf int hess\_vec(tag,n,x,v,z)}\\ |
---|
1213 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1214 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1215 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1216 | \>{\sf double v[n];} \> // tangent vector $v$\\ |
---|
1217 | \>{\sf double z[n];} \> // result $z = \nabla^2F(x) v$ |
---|
1218 | \end{tabbing} |
---|
1219 | % |
---|
1220 | \begin{tabbing} |
---|
1221 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1222 | \>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\ |
---|
1223 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1224 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1225 | \>{\sf int p;} \> // number of columns in $V$\\ |
---|
1226 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1227 | \>{\sf double V[n][p];} \> // tangent matrix $V$\\ |
---|
1228 | \>{\sf double Z[n][p];} \> // result $Z = \nabla^2F(x) V$ |
---|
1229 | \end{tabbing} |
---|
1230 | % |
---|
1231 | \begin{tabbing} |
---|
1232 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1233 | \>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\ |
---|
1234 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1235 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1236 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1237 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1238 | \>{\sf double v[n];} \> // tangent vector $v$\\ |
---|
1239 | \>{\sf double u[m];} \> // range weight vector $u$ \\ |
---|
1240 | \>{\sf double h[n];} \> // result $h = u^T\nabla^2F(x) v $ |
---|
1241 | \end{tabbing} |
---|
1242 | % |
---|
1243 | The next procedure allows the user to perform Newton steps only |
---|
1244 | having the corresponding tape at hand: |
---|
1245 | % |
---|
1246 | \begin{tabbing} |
---|
1247 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1248 | \>{\sf int jac\_solv(tag,n,x,b,mode)} \\ |
---|
1249 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1250 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1251 | \>{\sf double x[n];} \> // independent vector $x$ as\\ |
---|
1252 | \>{\sf double b[n];} \> // in: right-hand side b, out: result $w$ of |
---|
1253 | $F(x)w = b$\\ |
---|
1254 | \>{\sf int mode;} \> // option to choose different solvers |
---|
1255 | \end{tabbing} |
---|
1256 | % |
---|
1257 | On entry, parameter {\sf b} of the routine {\sf jac\_solv} |
---|
1258 | contains the right-hand side of the equation $F(x)w = b$ to be solved. On exit, |
---|
1259 | {\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only |
---|
1260 | the Jacobian of the function |
---|
1261 | given by the tape labeled with {\sf tag} is provided internally. |
---|
1262 | The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The |
---|
1263 | solution of the equation is calculated if {\sf mode} = 2. |
---|
1264 | Hence, it is possible to compute the |
---|
1265 | LU-factorization only once. Then the equation can be solved for several |
---|
1266 | right-hand sides $b$ without calculating the Jacobian and |
---|
1267 | its factorization again. |
---|
1268 | |
---|
1269 | If the original evaluation code of a function contains neither |
---|
1270 | quadratures nor branches, all drivers described above can be used to |
---|
1271 | evaluate derivatives at any argument in its domain. The same still |
---|
1272 | applies if there are no user defined quadratures and |
---|
1273 | all comparisons involving {\sf adouble}s have the same result as |
---|
1274 | during taping. If this assumption is falsely made all drivers |
---|
1275 | while internally calling the forward mode evaluation will return the value -1 or -2 |
---|
1276 | as already specified in \autoref{reuse_tape}. |
---|
1277 | % |
---|
1278 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1279 | \subsection{Drivers for Ordinary Differential Equations} |
---|
1280 | \label{odedrivers} |
---|
1281 | % |
---|
1282 | When $F$ is the right-hand side of an (autonomous) ordinary |
---|
1283 | differential equation |
---|
1284 | \[ |
---|
1285 | x^\prime(t) \; = \; F(x(t)) , |
---|
1286 | \] |
---|
1287 | we must have $m=n$. Along any solution path $x(t)$ its Taylor |
---|
1288 | coefficients $x_j$ at some time, e.g., $t=0$, must satisfy |
---|
1289 | the relation |
---|
1290 | \[ |
---|
1291 | x_{i+1} = \frac{1}{1+i} y_i. |
---|
1292 | \] |
---|
1293 | with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely, |
---|
1294 | \[ |
---|
1295 | y(t) \; \equiv \; F(x(t)) \; : \; I\!\!R \;\mapsto \;I\!\!R^m |
---|
1296 | \] |
---|
1297 | defined by an autonomous right-hand side $F$ recorded on the tape. |
---|
1298 | Using this relation, one can generate the Taylor coefficients $x_i$, |
---|
1299 | $i \le deg$, |
---|
1300 | recursively from the current point $x_0$. This task is achieved by the |
---|
1301 | driver routine {\sf forode} defined as follows: |
---|
1302 | % |
---|
1303 | \begin{tabbing} |
---|
1304 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1305 | \>{\sf int forode(tag,n,tau,dol,deg,X)}\\ |
---|
1306 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1307 | \>{\sf int n;} \> // number of state variables $n$\\ |
---|
1308 | \>{\sf double tau;} \> // scaling parameter\\ |
---|
1309 | \>{\sf int dol;} \> // degree on previous call\\ |
---|
1310 | \>{\sf int deg;} \> // degree on current call\\ |
---|
1311 | \>{\sf double X[n][deg+1];} \> // Taylor coefficient vector $X$ |
---|
1312 | \end{tabbing} |
---|
1313 | % |
---|
1314 | If {\sf dol} is positive, it is assumed that {\sf forode} |
---|
1315 | has been called before at the same point so that all Taylor coefficient |
---|
1316 | vectors up to the {\sf dol}-th are already correct. |
---|
1317 | |
---|
1318 | Subsequently one may call the driver routine {\sf reverse} or corresponding |
---|
1319 | low level routines as explained in the \autoref{forw_rev} and |
---|
1320 | \autoref{forw_rev_ad}, respectively, to compute |
---|
1321 | the family of square matrices {\sf Z[n][n][deg]} defined by |
---|
1322 | \[ |
---|
1323 | Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} , |
---|
1324 | \] |
---|
1325 | with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}. |
---|
1326 | |
---|
1327 | For the numerical solutions of ordinary differential equations, |
---|
1328 | one may also wish to calculate the Jacobians |
---|
1329 | \begin{equation} |
---|
1330 | \label{eq:bees} |
---|
1331 | B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, , |
---|
1332 | \end{equation} |
---|
1333 | which exist provided $F$ is sufficiently smooth. These matrices can |
---|
1334 | be obtained from the partial derivatives $\partial y_i/\partial x_0$ |
---|
1335 | by an appropriate version of the chain rule. |
---|
1336 | To compute the total derivatives $B = (B_j)_{0\leq j <d}$ |
---|
1337 | defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$ |
---|
1338 | matrix-matrix products. This can be done by a call of the routine {\sf accode} after the |
---|
1339 | corresponding evaluation of the {\sf hov\_reverse} function. The interface of |
---|
1340 | {\sf accode} is defined as follows: |
---|
1341 | % |
---|
1342 | \begin{tabbing} |
---|
1343 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1344 | \>{\sf int accode(n,tau,deg,Z,B,nz)}\\ |
---|
1345 | \>{\sf int n;} \> // number of state variables $n$ \\ |
---|
1346 | \>{\sf double tau;} \> // scaling parameter\\ |
---|
1347 | \>{\sf int deg;} \> // degree on current call\\ |
---|
1348 | \>{\sf double Z[n][n][deg];} \> // partials of coefficient vectors\\ |
---|
1349 | \>{\sf double B[n][n][deg];} \> // result $B$ as defined in \eqref{eq:bees}\\ |
---|
1350 | \>{\sf short nz[n][n];} \> // optional nonzero pattern |
---|
1351 | \end{tabbing} |
---|
1352 | % |
---|
1353 | Sparsity information can be exploited by {\sf accode} using the array {\sf |
---|
1354 | nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf |
---|
1355 | reverse} or the corresponding basic routines as explained below in |
---|
1356 | \autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The |
---|
1357 | non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon |
---|
1358 | return |
---|
1359 | \[ |
---|
1360 | \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; . |
---|
1361 | \] |
---|
1362 | In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a |
---|
1363 | sparsity pattern that fills in as $k$ grows. Note, that there need to be no |
---|
1364 | loss in computational efficiency if a time-dependent ordinary differential equation |
---|
1365 | is rewritten in autonomous form. |
---|
1366 | |
---|
1367 | The prototype of the ODE-drivers {\sf forode} and {\sf accode} is contained in the header file |
---|
1368 | \verb=<adolc/drivers/odedrivers.h>=. The global header file |
---|
1369 | \verb=<adolc/adolc.h>= |
---|
1370 | includes this file automatically, see \autoref{ssec:DesIH}. |
---|
1371 | |
---|
1372 | An example program using the procedures {\sf forode} and {\sf accode} together |
---|
1373 | with more detailed information about the coding can be found in |
---|
1374 | \autoref{exam:ode}. The corresponding source code |
---|
1375 | \verb=odexam.cpp= is contained in the subdirectory |
---|
1376 | \verb=examples=. |
---|
1377 | % |
---|
1378 | % |
---|
1379 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1380 | \subsection{Drivers for Sparse Jacobians and Sparse Hessians} |
---|
1381 | \label{sparse} |
---|
1382 | % |
---|
1383 | Quite often, the Jacobians and Hessians that have to be computed are sparse |
---|
1384 | matrices. Therefore, ADOL-C provides additionally drivers that |
---|
1385 | allow the exploitation of sparsity. The exploitation of sparsity is |
---|
1386 | frequently based on {\em graph coloring} methods, discussed |
---|
1387 | for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section |
---|
1388 | rely on the the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}. |
---|
1389 | ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed |
---|
1390 | separately to use the sparse drivers described here. ColPack is available for download at |
---|
1391 | \verb=http://www.cscapes.org/coloringpage/software.htm=. More information about the required |
---|
1392 | installation of ColPack is given in \autoref{install}. |
---|
1393 | % |
---|
1394 | \subsubsection*{Sparse Jacobians and Sparse Hessians} |
---|
1395 | % |
---|
1396 | To compute the entries of sparse Jacobians and sparse Hessians, |
---|
1397 | respectively, in coordinate format one may use the drivers: |
---|
1398 | \begin{tabbing} |
---|
1399 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1400 | \>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\ |
---|
1401 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1402 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1403 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1404 | \>{\sf int repeat;} \> // indicate repeated call at same argument\\ |
---|
1405 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1406 | \>{\sf int nnz;} \> // number of nonzeros \\ |
---|
1407 | \>{\sf unsigned int rind[nnz];}\> // row index\\ |
---|
1408 | \>{\sf unsigned int cind[nnz];}\> // column index\\ |
---|
1409 | \>{\sf double values[nnz];} \> // non-zero values\\ |
---|
1410 | \>{\sf int options[4];} \> // array of control parameters\\ |
---|
1411 | \end{tabbing} |
---|
1412 | % |
---|
1413 | \begin{tabbing} |
---|
1414 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1415 | \>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\ |
---|
1416 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1417 | \>{\sf int n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
1418 | \>{\sf int repeat;} \> // indicate repeated call at same argument\\ |
---|
1419 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1420 | \>{\sf int nnz;} \> // number of nonzeros \\ |
---|
1421 | \>{\sf unsigned int rind[nnz];}\> // row indices\\ |
---|
1422 | \>{\sf unsigned int cind[nnz];}\> // column indices\\ |
---|
1423 | \>{\sf double values[nnz];} \> // non-zero values \\ |
---|
1424 | \>{\sf int options[2];} \> // array of control parameters\\ |
---|
1425 | \end{tabbing} |
---|
1426 | % |
---|
1427 | Once more, the input variables are the identifier for the internal |
---|
1428 | representation {\sf tag}, if required the number of dependents {\sf m}, |
---|
1429 | and the number of independents {\sf n} for a consistency check. |
---|
1430 | Furthermore, the flag {\sf repeat=0} indicates that the functions are called |
---|
1431 | at a point with a new sparsity structure, whereas {\sf repeat=1} results in |
---|
1432 | the re-usage of the sparsity pattern from the previous call. |
---|
1433 | The current values of the independents are given by the array {\sf x}. |
---|
1434 | The input/output |
---|
1435 | variable {\sf nnz} stores the number of the nonzero entries. |
---|
1436 | Therefore, {\sf nnz} denotes also the length of the arrays {\sf r\_ind} storing |
---|
1437 | the row indices, {\sf c\_ind} storing the column indices, and |
---|
1438 | {\sf values} storing the values of the nonzero entries. |
---|
1439 | If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0}, |
---|
1440 | the functions determine the number of nonzeros for the sparsity pattern |
---|
1441 | defined by the value of {\sf x}, allocate appropriate arrays {\sf r\_ind}, |
---|
1442 | {\sf c\_ind}, and {\sf values} and store the desired information in these |
---|
1443 | arrays. |
---|
1444 | During the next function call with {\sf repeat=1} the allocated memory |
---|
1445 | is reused such that only the values of the arrays are changed. |
---|
1446 | Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf |
---|
1447 | repeat=0} the user is responsible for the deallocation of the array |
---|
1448 | {\sf r\_ind}, {\sf c\_ind}, and {\sf values} using the function {\sf |
---|
1449 | delete[]}! |
---|
1450 | |
---|
1451 | For each driver the array {\sf options} can be used to adapted the |
---|
1452 | computation of the sparse derivative matrices to the special |
---|
1453 | needs of application under consideration. Most frequently, the default options |
---|
1454 | will give a reasonable performance. The elements of the array {\sf options} control the action of |
---|
1455 | {\sf sparse\_jac} according to \autoref{options_sparse_jac}. |
---|
1456 | \begin{table}[h] |
---|
1457 | \center |
---|
1458 | \begin{tabular}{|c|c|l|} \hline |
---|
1459 | component & value & \\ \hline |
---|
1460 | {\sf options[0]} & & way of sparsity pattern computation \\ |
---|
1461 | & 0 & propagation of index domains (default) \\ |
---|
1462 | & 1 & propagation of bit pattern \\ \hline |
---|
1463 | {\sf options[1]} & & test the computational graph control flow \\ |
---|
1464 | & 0 & safe mode (default) \\ |
---|
1465 | & 1 & tight mode \\ \hline |
---|
1466 | {\sf options[2]} & & way of bit pattern propagation \\ |
---|
1467 | & 0 & automatic detection (default) \\ |
---|
1468 | & 1 & forward mode \\ |
---|
1469 | & 2 & reverse mode \\ \hline |
---|
1470 | {\sf options[3]} & & way of compression \\ |
---|
1471 | & 0 & column compression (default) \\ |
---|
1472 | & 1 & row compression \\ \hline |
---|
1473 | \end{tabular} |
---|
1474 | \caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}} |
---|
1475 | \end{table} |
---|
1476 | |
---|
1477 | The component {\sf options[1]} determines |
---|
1478 | the usage of the safe or tight mode of sparsity computation. |
---|
1479 | The first, more conservative option is the default. It accounts for all |
---|
1480 | dependences that might occur for any value of the |
---|
1481 | independent variables. For example, the intermediate |
---|
1482 | {\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is |
---|
1483 | always assumed to depend on all independent variables that {\sf a} or {\sf b} |
---|
1484 | dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the |
---|
1485 | logical {\sf OR} of those associated with {\sf a} and {\sf b}. |
---|
1486 | In contrast |
---|
1487 | the tight option gives this result only in the unlikely event of an exact |
---|
1488 | tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern |
---|
1489 | associated with {\sf c} either to that of {\sf a} or to that of {\sf b}, |
---|
1490 | depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally. |
---|
1491 | Obviously, the sparsity pattern obtained with the tight option may contain |
---|
1492 | more zeros than that obtained with the safe option. On the other hand, it |
---|
1493 | will only be valid at points belonging to an area where the function $F$ is locally |
---|
1494 | analytic and that contains the point at which the internal representation was |
---|
1495 | generated. Since generating the sparsity structure using the safe version does not |
---|
1496 | require any reevaluation, it may thus reduce the overall computational cost |
---|
1497 | despite the fact that it produces more nonzero entries. |
---|
1498 | The value of {\sf options[2]} selects the direction of bit pattern propagation. |
---|
1499 | Depending on the number of independent $n$ and of dependent variables $m$ |
---|
1500 | one would prefer the forward mode if $n$ is significant smaller than $m$ and |
---|
1501 | would otherwise use the reverse mode. |
---|
1502 | |
---|
1503 | The elements of the array {\sf options} control the action of |
---|
1504 | {\sf sparse\_hess} according to \autoref{options_sparse_hess}. |
---|
1505 | \begin{table}[h] |
---|
1506 | \center |
---|
1507 | \begin{tabular}{|c|c|l|} \hline |
---|
1508 | component & value & \\ \hline |
---|
1509 | {\sf options[0]} & & test the computational graph control flow \\ |
---|
1510 | & 0 & safe mode (default) \\ |
---|
1511 | & 1 & tight mode \\ \hline |
---|
1512 | {\sf options[1]} & & way of recovery \\ |
---|
1513 | & 0 & indirect recovery (default) \\ |
---|
1514 | & 1 & direct recovery \\ \hline |
---|
1515 | \end{tabular} |
---|
1516 | \caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}} |
---|
1517 | \end{table} |
---|
1518 | |
---|
1519 | The described driver routines for the computation of sparse derivative |
---|
1520 | matrices are prototyped in the header file |
---|
1521 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1522 | global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}). |
---|
1523 | Example codes illustrating the usage of {\sf |
---|
1524 | sparse\_jac} and {\sf sparse\_hess} can be found in the file |
---|
1525 | \verb=sparse_jacobian.cpp= and \verb=sparse_hessian.cpp= contained in %the subdirectory |
---|
1526 | \verb=examples/additional_examples/sparse=. |
---|
1527 | % |
---|
1528 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1529 | \subsubsection*{Computation of Sparsity Pattern} |
---|
1530 | % |
---|
1531 | ADOL-C offers a convenient way of determining the |
---|
1532 | sparsity structure of a Jacobian matrix using the function: |
---|
1533 | % |
---|
1534 | \begin{tabbing} |
---|
1535 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1536 | \>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\ |
---|
1537 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1538 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1539 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1540 | \>{\sf double x[n];} \> // independent variables $x_0$\\ |
---|
1541 | \>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\ |
---|
1542 | \>{\sf int options[2];} \> // array of control parameters |
---|
1543 | \end{tabbing} |
---|
1544 | % |
---|
1545 | The sparsity pattern of the |
---|
1546 | Jacobian is computed in a compressed row format. For this purpose, |
---|
1547 | {\sf JP} has to be an $m$ dimensional array of pointers to {\sf |
---|
1548 | unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}. |
---|
1549 | During the call of {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero |
---|
1550 | entries in row $i$ of the Jacobian is determined for all $1\le i\le |
---|
1551 | m$. Then, a memory allocation is performed such that {\sf JP[i-1]} |
---|
1552 | points to a block of $\hat{n}_i+1$ {\sf unsigned int} for all $1\le |
---|
1553 | i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the |
---|
1554 | column indices of the $j$ nonzero entries in the $i$th row are stored |
---|
1555 | in the components {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][j]}. |
---|
1556 | |
---|
1557 | The elements of the array {\sf options} control the action of |
---|
1558 | {\sf jac\_pat} according to \autoref{options}. |
---|
1559 | \begin{table}[h] |
---|
1560 | \center |
---|
1561 | \begin{tabular}{|c|c|l|} \hline |
---|
1562 | component & value & \\ \hline |
---|
1563 | {\sf options[0]} & & way of sparsity pattern computation \\ |
---|
1564 | & 0 & propagation of index domains (default) \\ |
---|
1565 | & 1 & propagation of bit pattern \\ \hline |
---|
1566 | {\sf options[1]} & & test the computational graph control flow \\ |
---|
1567 | & 0 & safe mode (default) \\ |
---|
1568 | & 1 & tight mode \\ \hline |
---|
1569 | {\sf options[2]} & & way of bit pattern propagation \\ |
---|
1570 | & 0 & automatic detection (default) \\ |
---|
1571 | & 1 & forward mode \\ |
---|
1572 | & 2 & reverse mode \\ \hline |
---|
1573 | \end{tabular} |
---|
1574 | \caption{ {\sf jac\_pat} parameter {\sf options}\label{options}} |
---|
1575 | \end{table} |
---|
1576 | The value of {\sf options[0]} selects the way to compute the sparsity |
---|
1577 | pattern. The component {\sf options[1]} determines |
---|
1578 | the usage of the safe or tight mode of bit pattern propagation. |
---|
1579 | The first, more conservative option is the default. It accounts for all |
---|
1580 | dependences that might occur for any value of the |
---|
1581 | independent variables. For example, the intermediate |
---|
1582 | {\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is |
---|
1583 | always assumed to depend on all independent variables that {\sf a} or {\sf b} |
---|
1584 | dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the |
---|
1585 | logical {\sf OR} of those associated with {\sf a} and {\sf b}. |
---|
1586 | In contrast |
---|
1587 | the tight option gives this result only in the unlikely event of an exact |
---|
1588 | tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern |
---|
1589 | associated with {\sf c} either to that of {\sf a} or to that of {\sf b}, |
---|
1590 | depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally. |
---|
1591 | Obviously, the sparsity pattern obtained with the tight option may contain |
---|
1592 | more zeros than that obtained with the safe option. On the other hand, it |
---|
1593 | will only be valid at points belonging to an area where the function $F$ is locally |
---|
1594 | analytic and that contains the point at which the internal representation was |
---|
1595 | generated. Since generating the sparsity structure using the safe version does not |
---|
1596 | require any reevaluation, it may thus reduce the overall computational cost |
---|
1597 | despite the fact that it produces more nonzero entries. The value of |
---|
1598 | {\sf options[2]} selects the direction of bit pattern propagation. |
---|
1599 | Depending on the number of independent $n$ and of dependent variables $m$ |
---|
1600 | one would prefer the forward mode if $n$ is significant smaller than $m$ and |
---|
1601 | would otherwise use the reverse mode. |
---|
1602 | |
---|
1603 | The routine {\sf jac\_pat} may use the propagation of bitpattern to |
---|
1604 | determine the sparsity pattern. Therefore, a kind of ``strip-mining'' |
---|
1605 | is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce |
---|
1606 | the value of the constant {\sf PQ\_STRIPMINE\_MAX} |
---|
1607 | following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=. |
---|
1608 | |
---|
1609 | The driver routine is prototyped in the header file |
---|
1610 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1611 | global header file \verb=<adolc/adolc.h>= (see |
---|
1612 | \autoref{ssec:DesIH}). The determination of sparsity patterns is |
---|
1613 | illustrated by the examples \verb=sparse_jacobian.cpp= |
---|
1614 | and \verb=jacpatexam.cpp= |
---|
1615 | contained in |
---|
1616 | \verb=examples/additional_examples/sparse=. |
---|
1617 | |
---|
1618 | To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the |
---|
1619 | driver |
---|
1620 | \begin{tabbing} |
---|
1621 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1622 | \>{\sf int hess\_pat(tag, n, x, HP, options)}\\ |
---|
1623 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1624 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1625 | \>{\sf double x[n];} \> // independent variables $x_0$\\ |
---|
1626 | \>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\ |
---|
1627 | \>{\sf int option;} \> // control parameter |
---|
1628 | \end{tabbing} |
---|
1629 | where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf |
---|
1630 | unsigned int}s. |
---|
1631 | After the function call {\sf HP} contains the sparsity pattern, |
---|
1632 | where {\sf HP[j][0]} contains the number of nonzero elements in the |
---|
1633 | $j$th row for $1 \le j\le n$. |
---|
1634 | The components {\sf P[j][i]}, $0<${\sf i}~$\le$~{\sf P[j][0]} store the |
---|
1635 | indices of these entries. For determining the sparsity pattern, ADOL-C uses |
---|
1636 | the algorithm described in \cite{Wa05a}. The parameter{\sf option} determines |
---|
1637 | the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf |
---|
1638 | option = 1}) of the computation of the sparsity pattern as described |
---|
1639 | above. |
---|
1640 | |
---|
1641 | This driver routine is prototyped in the header file |
---|
1642 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1643 | global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}). |
---|
1644 | An example employing the procedure {\sf hess\_pat} can be found in the file |
---|
1645 | \verb=sparse_hessian.cpp= contained in |
---|
1646 | \verb=examples/additional_examples/sparse=. |
---|
1647 | % |
---|
1648 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1649 | \subsubsection*{Calculation of Seed Matrices} |
---|
1650 | % |
---|
1651 | To compute a compressed derivative matrix from a given sparsity |
---|
1652 | pattern, one has to calculate an appropriate seed matrix that can be |
---|
1653 | used as input for the derivative calculation. To facilitate the |
---|
1654 | generation of seed matrices for a sparsity pattern given in |
---|
1655 | row compressed form, ADOL-C provides the following two drivers, |
---|
1656 | which are based on the ColPack library: |
---|
1657 | \begin{tabbing} |
---|
1658 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1659 | \>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\ |
---|
1660 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1661 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1662 | \>{\sf unsigned int JP[][];} \> // row compressed sparsity structure |
---|
1663 | of Jacobian\\ |
---|
1664 | \>{\sf double S[n][p];} \> // seed matrix\\ |
---|
1665 | \>{\sf int p;} \> // number of columns in $S$ |
---|
1666 | \end{tabbing} |
---|
1667 | The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the |
---|
1668 | number of independent variables {\sf n} and the sparsity pattern {\sf |
---|
1669 | JP} of the Jacobian computed for example by {\sf jac\_pat}. First, |
---|
1670 | {\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity |
---|
1671 | pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring |
---|
1672 | determines the number of columns {\sf p} in the seed |
---|
1673 | matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf |
---|
1674 | S} and initializes {\sf S} according to the graph coloring. |
---|
1675 | The coloring algorithm that is applied in {\sf |
---|
1676 | generate\_seed\_jac} is used also by the driver {\sf sparse\_jac} |
---|
1677 | described earlier. |
---|
1678 | |
---|
1679 | \begin{tabbing} |
---|
1680 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1681 | \>{\sf int generate\_seed\_hess(n, HP, S, p)}\\ |
---|
1682 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1683 | \>{\sf unsigned int HP[][];} \> // row compressed sparsity structure |
---|
1684 | of Jacobian\\ |
---|
1685 | \>{\sf double S[n][p];} \> // seed matrix\\ |
---|
1686 | \>{\sf int p;} \> // number of columns in $S$ |
---|
1687 | \end{tabbing} |
---|
1688 | The input variables to {\sf generate\_seed\_hess} are the number of independents $n$ |
---|
1689 | and the sparsity pattern {\sf HP} of the Hessian computed for example |
---|
1690 | by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an |
---|
1691 | appropriate coloring of the adjacency graph defined by the sparsity |
---|
1692 | pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its |
---|
1693 | compressed representation and a star coloring in the case of a direct recovery. |
---|
1694 | Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf |
---|
1695 | S} and initializes {\sf S} according to the graph coloring. |
---|
1696 | The coloring algorithm applied in {\sf |
---|
1697 | generate\_seed\_hess} is used also by the driver {\sf sparse\_hess} |
---|
1698 | described earlier. |
---|
1699 | |
---|
1700 | The specific set of criteria used to define a seed matrix $S$ depends |
---|
1701 | on whether the sparse derivative matrix |
---|
1702 | to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). |
---|
1703 | It also depends on whether the entries of the derivative matrix are to be |
---|
1704 | recovered from the compressed representation \emph{directly} |
---|
1705 | (without requiring any further arithmetic) or \emph{indirectly} (for |
---|
1706 | example, by solving for unknowns via successive substitutions). |
---|
1707 | Appropriate recovery routines are provided by ColPack and used |
---|
1708 | in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in |
---|
1709 | the previous subsection. Examples with a detailed analysis of the |
---|
1710 | employed drivers for the exploitation of sparsity can be found in the |
---|
1711 | papers \cite{GePoTaWa06} and \cite{GePoWa08}. |
---|
1712 | |
---|
1713 | |
---|
1714 | These driver routines are prototyped in |
---|
1715 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1716 | global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}). |
---|
1717 | An example code illustrating the usage of {\sf |
---|
1718 | generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file |
---|
1719 | \verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=. |
---|
1720 | % |
---|
1721 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1722 | \subsection{Higher Derivative Tensors} |
---|
1723 | \label{higherOrderDeriv} |
---|
1724 | % |
---|
1725 | Many applications in scientific computing need second- and higher-order |
---|
1726 | derivatives. Often, one does not require full derivative tensors but |
---|
1727 | only the derivatives in certain directions $s_i \in \R^{n}$. |
---|
1728 | Suppose a collection of $p$ directions |
---|
1729 | $s_i \in \R^{n}$ is given, which form a matrix |
---|
1730 | \[ |
---|
1731 | S\; =\; \left [ s_1, s_2,\ldots, s_p \right ]\; \in \; |
---|
1732 | \R^{n \times p}. |
---|
1733 | \] |
---|
1734 | One possible choice is $S = I_n$ with $p = n$, which leads to |
---|
1735 | full tensors being evaluated. |
---|
1736 | ADOL-C provides the function {\sf tensor\_eval} |
---|
1737 | to calculate the derivative tensors |
---|
1738 | \begin{eqnarray} |
---|
1739 | \label{eq:tensor} |
---|
1740 | \left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k} |
---|
1741 | F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} |
---|
1742 | \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d |
---|
1743 | \end{eqnarray} |
---|
1744 | simultaneously. The function {\sf tensor\_eval} has the following calling sequence and |
---|
1745 | parameters: |
---|
1746 | % |
---|
1747 | \begin{tabbing} |
---|
1748 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1749 | \>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\ |
---|
1750 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1751 | \>{\sf int m;} \> // number of dependent variables $m$ \\ |
---|
1752 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1753 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1754 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
1755 | \>{\sf double x[n];} \> // values of independent variables $x_0$\\ |
---|
1756 | \>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\ |
---|
1757 | \>{\sf double S[n][p];} \> // seed matrix $S$ |
---|
1758 | \end{tabbing} |
---|
1759 | % |
---|
1760 | Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory |
---|
1761 | requirement can be reduced enormously. The collection of tensors up to order $d$ comprises |
---|
1762 | $\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be |
---|
1763 | greater or equal to $\binom{p+d}{d}$. |
---|
1764 | To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor |
---|
1765 | series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This |
---|
1766 | approach is described in \cite{Griewank97}. |
---|
1767 | |
---|
1768 | The access of individual entries in symmetric tensors of |
---|
1769 | higher order is a little tricky. We always store the derivative |
---|
1770 | values in the two dimensional array {\sf tensor} and provide two |
---|
1771 | different ways of accessing them. |
---|
1772 | The leading dimension of the tensor array ranges over |
---|
1773 | the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i = |
---|
1774 | 0,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical |
---|
1775 | structure for all $i$. Each of them represents the symmetric tensors up to |
---|
1776 | order $d$ of the scalar function $F_{i+1}$ in $p$ variables. |
---|
1777 | % |
---|
1778 | The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$ |
---|
1779 | tensors are linearly ordered according to the tetrahedral |
---|
1780 | scheme described by Knuth \cite{Knuth73}. In the familiar quadratic |
---|
1781 | case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ |
---|
1782 | as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with |
---|
1783 | $l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the |
---|
1784 | function value $F_{i+1}$ itself and the gradient |
---|
1785 | $\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$ |
---|
1786 | with $j=0$ for $k=1,\ldots,p$. |
---|
1787 | |
---|
1788 | For general $d$ we combine the variable |
---|
1789 | indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$, |
---|
1790 | where $j_k$ indicates differentiation with respect to variable |
---|
1791 | $x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates |
---|
1792 | no differentiation so that all lower derivatives are also |
---|
1793 | contained in the same data structure as described above for |
---|
1794 | the quadratic case. The location of the partial derivative specified |
---|
1795 | by $j$ is computed by the function |
---|
1796 | % |
---|
1797 | \begin{tabbing} |
---|
1798 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1799 | \>{\sf int address(d,$\,$j)} \\ |
---|
1800 | \>{\sf int d;} \> // highest derivative degree $d$ \\ |
---|
1801 | \>{\sf int j[d];} \> // multi-index $j$ |
---|
1802 | \end{tabbing} |
---|
1803 | % |
---|
1804 | and it may thus be referenced as {\sf tensor[i][address(d,$\,$j)]}. |
---|
1805 | Notice that the address computation does depend on the degree $d$ |
---|
1806 | but not on the number of directions $p$, which could theoretically be |
---|
1807 | enlarged without the need to reallocate the original tensor. |
---|
1808 | Also, the components of $j$ need to be non-increasing. |
---|
1809 | % |
---|
1810 | To some C programmers it may appear more natural to access tensor |
---|
1811 | entries by successive dereferencing in the form |
---|
1812 | {\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}. |
---|
1813 | We have also provided this mode, albeit with the restriction |
---|
1814 | that the indices $j_1,j_2,\ldots,j_d$ are non-increasing. |
---|
1815 | In the second order case this means that the Hessian entries must be |
---|
1816 | specified in or below the diagonal. If this restriction is |
---|
1817 | violated the values are almost certain to be wrong and array bounds |
---|
1818 | may be violated. We emphasize that subscripting is not overloaded |
---|
1819 | but that {\sf tensorentry} is a conventional and |
---|
1820 | thus moderately efficient C pointer structure. |
---|
1821 | Such a pointer structure can be allocated and set up completely by the |
---|
1822 | function |
---|
1823 | % |
---|
1824 | \begin{tabbing} |
---|
1825 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1826 | \>{\sf void** tensorsetup(m,p,d,tensor)} \\ |
---|
1827 | \>{\sf int m;} \> // number of dependent variables $n$ \\ |
---|
1828 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
1829 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1830 | \>{\sf double tensor[m][size];}\> // pointer to two dimensional array |
---|
1831 | \end{tabbing} |
---|
1832 | % |
---|
1833 | Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size} |
---|
1834 | $\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup}, |
---|
1835 | $d-1$ layers of pointers are set up so that the return value |
---|
1836 | allows the direct dereferencing of individual tensor elements. |
---|
1837 | |
---|
1838 | For example, suppose some active section involving $m \geq 5$ dependents and |
---|
1839 | $n \geq 2$ independents has been executed and taped. We may |
---|
1840 | select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two |
---|
1841 | columns $s_1$ and $s_2$. Then we are able to execute the code segment |
---|
1842 | \begin{tabbing} |
---|
1843 | \hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\ |
---|
1844 | \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);} |
---|
1845 | \end{tabbing} |
---|
1846 | This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3 |
---|
1847 | in both directions $s_1$ and |
---|
1848 | $s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer |
---|
1849 | structure {\sf tensorentry} has been created. Now, |
---|
1850 | the value of the mixed partial |
---|
1851 | \[ |
---|
1852 | \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2} \right |_{z_1=0=z_2} |
---|
1853 | \] |
---|
1854 | can be recovered as |
---|
1855 | \begin{center} |
---|
1856 | {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][address(d,$\,$j)]}, |
---|
1857 | \end{center} |
---|
1858 | where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). |
---|
1859 | Analogously, the entry |
---|
1860 | \begin{center} |
---|
1861 | {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][address(d,$\,$j)]} |
---|
1862 | \end{center} |
---|
1863 | with {\sf j} = (1,0,0) contains the first derivative of the third dependent |
---|
1864 | variable $F_3$ with respect to the first differentiation parameter $z_1$. |
---|
1865 | |
---|
1866 | Note, that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the |
---|
1867 | array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly effects the values accessed |
---|
1868 | by {\sf tensorentry}. |
---|
1869 | % |
---|
1870 | When no more derivative evaluations are desired the pointer structure |
---|
1871 | {\sf tensorentry} can be deallocated by a call to the function |
---|
1872 | % |
---|
1873 | \begin{tabbing} |
---|
1874 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1875 | \>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\ |
---|
1876 | \>{\sf int m;} \> // number of dependent variables $m$ \\ |
---|
1877 | \>{\sf int p;} \> // number of independent variables $p$\\ |
---|
1878 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1879 | \>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} |
---|
1880 | \end{tabbing} |
---|
1881 | % |
---|
1882 | that does not deallocate the array {\sf tensor}. |
---|
1883 | |
---|
1884 | The drivers provided for efficient calculation of higher order |
---|
1885 | derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=, |
---|
1886 | which is included by the global header file \verb=<adolc/adolc.h>= automatically |
---|
1887 | (see \autoref{ssec:DesIH}). |
---|
1888 | Example codes using the above procedures can be found in the files |
---|
1889 | \verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory |
---|
1890 | \verb=examples/additional_examples/taylor=. |
---|
1891 | % |
---|
1892 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1893 | \subsection{Derivatives of Implicit and Inverse Functions} |
---|
1894 | \label{implicitInverse} |
---|
1895 | % |
---|
1896 | Frequently, one needs derivatives of variables |
---|
1897 | $y \in \R^{m}$ that are implicitly defined as |
---|
1898 | functions of some variables $x \in \R^{n-m}$ |
---|
1899 | by an algebraic system of equations |
---|
1900 | \[ |
---|
1901 | G(z) \; = \; 0 \in \R^m \quad |
---|
1902 | {\rm with} \quad z = (y, x) \in \R^n . |
---|
1903 | \] |
---|
1904 | Naturally, the $n$ arguments of $G$ need not be partitioned in |
---|
1905 | this regular fashion and we wish to provide flexibility for a |
---|
1906 | convenient selection of the $n-m$ {\em truly} independent |
---|
1907 | variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix |
---|
1908 | that picks out these variables so that it is a column |
---|
1909 | permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$. |
---|
1910 | Then the nonlinear system |
---|
1911 | \[ |
---|
1912 | G(z) \; = \; 0, \quad P z = x, |
---|
1913 | \] |
---|
1914 | has a regular Jacobian, wherever the implicit function theorem |
---|
1915 | yields $y$ as a function of $x$. Hence, we may also write |
---|
1916 | \begin{equation} |
---|
1917 | \label{eq:inv_tensor} |
---|
1918 | F(z) = \left(\begin{array}{c} |
---|
1919 | G(z) \\ |
---|
1920 | P z |
---|
1921 | \end{array} \right)\; \equiv \; |
---|
1922 | \left(\begin{array}{c} |
---|
1923 | 0 \\ |
---|
1924 | P z |
---|
1925 | \end{array} \right)\; \equiv \; S\, x, |
---|
1926 | \end{equation} |
---|
1927 | where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten |
---|
1928 | the original implicit functional relation between $x$ and $y$ as an inverse |
---|
1929 | relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply |
---|
1930 | by marking $n-m$ of the independents also dependent. |
---|
1931 | |
---|
1932 | Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary |
---|
1933 | seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$ |
---|
1934 | with respect to $x \in \R^p$ by calling the following routine: |
---|
1935 | % |
---|
1936 | \begin{tabbing} |
---|
1937 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1938 | \>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\ |
---|
1939 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1940 | \>{\sf int n;} \> // number of variables $n$\\ |
---|
1941 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1942 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
1943 | \>{\sf double z[n];} \> // values of independent variables $z$\\ |
---|
1944 | \>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\ |
---|
1945 | \>{\sf double S[n][p];} \> // seed matrix $S$ |
---|
1946 | \end{tabbing} |
---|
1947 | % |
---|
1948 | The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with |
---|
1949 | {\sf tag} pointing to a tape for the evaluation of the inverse function |
---|
1950 | $z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong |
---|
1951 | to the domain of that function. Individual derivative components can be |
---|
1952 | accessed in tensor exactly as in the explicit case described above. |
---|
1953 | |
---|
1954 | It must be understood that {\sf inverse\_tensor\_eval} actually computes the |
---|
1955 | derivatives of $z$ with respect to $x$ that is defined by the equation |
---|
1956 | $F(z)=F(z_0)+S \, x$. In other words the base point at |
---|
1957 | which the inverse function is differentiated is given by $F(z_0)$. |
---|
1958 | The routine has no capability for inverting $F$ itself as |
---|
1959 | solving systems of nonlinear |
---|
1960 | equations $F(z)=0$ in the first place is not just a differentiation task. |
---|
1961 | However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very |
---|
1962 | useful for that purpose. |
---|
1963 | |
---|
1964 | As an example consider the following two nonlinear expressions |
---|
1965 | \begin{eqnarray*} |
---|
1966 | G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^2 \\ |
---|
1967 | G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace . |
---|
1968 | \end{eqnarray*} |
---|
1969 | The equations $G(z)=0$ describe the relation between the Cartesian |
---|
1970 | coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane. |
---|
1971 | Now, suppose we are interested in the derivatives of the second Cartesian |
---|
1972 | $y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect |
---|
1973 | to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section |
---|
1974 | could look simply like |
---|
1975 | % |
---|
1976 | \begin{tabbing} |
---|
1977 | \hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath zp[j];}\\ |
---|
1978 | \>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\ |
---|
1979 | \>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\ |
---|
1980 | \>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\ |
---|
1981 | \>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];} |
---|
1982 | \end{tabbing} |
---|
1983 | % |
---|
1984 | where {\sf zd[1]} and {\sf zd[2]} are dummy arguments. |
---|
1985 | In the last line the two independent variables {\sf z[1]} and |
---|
1986 | {\sf z[3]} are made |
---|
1987 | simultaneously dependent thus generating a square system that can be |
---|
1988 | inverted (at most arguments). The corresponding projection and seed |
---|
1989 | matrix are |
---|
1990 | \begin{eqnarray*} |
---|
1991 | P \;=\; \left( \begin{array}{cccc} |
---|
1992 | 1 & 0 & 0 & 0 \\ |
---|
1993 | 0 & 0 & 1 & 0 |
---|
1994 | \end{array}\right) \quad \mbox{and} \quad |
---|
1995 | S^T \; = \; \left( \begin{array}{cccc} |
---|
1996 | 0 & 0 & 1 & 0 \\ |
---|
1997 | 0 & 0 & 0 & 1 |
---|
1998 | \end{array}\right) \enspace . |
---|
1999 | \end{eqnarray*} |
---|
2000 | Provided the vector {\sf zp} is consistent in that its Cartesian and polar |
---|
2001 | components describe the same point in the plane the resulting tuple |
---|
2002 | {\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with |
---|
2003 | $n=4$, $p=2$ and $d$ |
---|
2004 | as desired will yield the implicit derivatives, provided |
---|
2005 | {\sf tensor} has been allocated appropriately of course and $S$ has the value |
---|
2006 | given above. |
---|
2007 | % |
---|
2008 | The example is untypical in that the implicit function could also be |
---|
2009 | obtained explicitly by symbolic mani\-pu\-lations. It is typical in that |
---|
2010 | the subset of $z$ components that are to be considered as truly |
---|
2011 | independent can be selected and altered with next to no effort at all. |
---|
2012 | |
---|
2013 | The presented drivers are prototyped in the header file |
---|
2014 | \verb=<adolc/drivers/taylor.h>=. As indicated before this header |
---|
2015 | is included by the global header file \verb=<adolc/adolc.h>= automatically |
---|
2016 | (see \autoref{ssec:DesIH}). |
---|
2017 | The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and |
---|
2018 | \verb=trigger.cpp= in the directory \verb=examples/additional_examples/taylor= |
---|
2019 | show the application of the procedures described here. |
---|
2020 | % |
---|
2021 | % |
---|
2022 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2023 | \section{Basic Drivers for the Forward and Reverse Mode} |
---|
2024 | \label{forw_rev_ad} |
---|
2025 | % |
---|
2026 | In this section, we present tailored drivers for different |
---|
2027 | variants of the forward mode and the reverse mode, respectively. |
---|
2028 | For a better understanding, we start with a short |
---|
2029 | description of the mathematical background. |
---|
2030 | |
---|
2031 | Provided no arithmetic exception occurs, |
---|
2032 | no comparison including {\sf fmax} or {\sf fmin} yields a tie, |
---|
2033 | {\sf fabs} does not yield zero, |
---|
2034 | and all special functions were evaluated in the |
---|
2035 | interior of their domains, the functional relation between the input |
---|
2036 | variables $x$ |
---|
2037 | and the output variables $y$ denoted by $y=F(x)$ is in |
---|
2038 | fact analytic. In other words, we can compute arbitrarily high |
---|
2039 | derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined |
---|
2040 | by the active section. |
---|
2041 | We find it most convenient to describe and |
---|
2042 | compute derivatives in terms of univariate Taylor expansions, which |
---|
2043 | are truncated after the highest derivative degree $d$ that is desired |
---|
2044 | by the user. Let |
---|
2045 | \begin{equation} |
---|
2046 | \label{eq:x_of_t} |
---|
2047 | x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \; I\!\!R \; \mapsto \; |
---|
2048 | I\!\!R^n |
---|
2049 | \end{equation} |
---|
2050 | denote any vector polynomial in the scalar variable $t \in I\!\!R$. |
---|
2051 | In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$. |
---|
2052 | The Taylor coefficient vectors |
---|
2053 | \[ x_j \; = \; |
---|
2054 | \frac{1}{j!} \left . \frac{\partial ^j}{\partial t^j} x(t) |
---|
2055 | \right |_{t=0} |
---|
2056 | \] |
---|
2057 | are simply the scaled derivatives of $x(t)$ at the parameter |
---|
2058 | origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be |
---|
2059 | visualized as tangent and curvature at the base point $x_0$, |
---|
2060 | respectively. |
---|
2061 | Provided that $F$ is $d$ times continuously differentiable, it |
---|
2062 | follows from the chain rule that the image path |
---|
2063 | \begin{equation} |
---|
2064 | \label{eq:rela} |
---|
2065 | y(t) \; \equiv \; F(x(t)) \; : \; I\!\!R \;\mapsto \;I\!\!R^m |
---|
2066 | \end{equation} |
---|
2067 | is also smooth and has $(d+1)$ Taylor coefficient vectors |
---|
2068 | $y_j \in I\!\!R^m$ at $t=0$, so that |
---|
2069 | \begin{equation} |
---|
2070 | \label{eq:series} |
---|
2071 | y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}). |
---|
2072 | \end{equation} |
---|
2073 | Also as a consequence of the chain rule, one can observe that |
---|
2074 | each $y_j$ is uniquely and smoothly determined by the coefficient |
---|
2075 | vectors $x_i$ with $i \leq j$. In particular we have |
---|
2076 | \begin{align} |
---|
2077 | \label{eq:y_0y_1} |
---|
2078 | y_0 & = F(x_0) \nonumber \\ |
---|
2079 | y_1 & = F'(x_0) x_1 \nonumber\\ |
---|
2080 | y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\ |
---|
2081 | y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2 |
---|
2082 | + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\ |
---|
2083 | & \ldots\nonumber |
---|
2084 | \end{align} |
---|
2085 | In writing down the last equations we have already departed from the |
---|
2086 | usual matrix-vector notation. It is well known that the number of |
---|
2087 | terms that occur in these ``symbolic'' expressions for |
---|
2088 | the $y_j$ in terms of the first $j$ derivative tensors of $F$ and |
---|
2089 | the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly |
---|
2090 | with $j$. Fortunately, this exponential growth does not occur |
---|
2091 | in automatic differentiation, where the many terms are somehow |
---|
2092 | implicitly combined so that storage and operations count grow only |
---|
2093 | quadratically in the bound $d$ on $j$. |
---|
2094 | |
---|
2095 | Provided $F$ is analytic, this property is inherited by the functions |
---|
2096 | \[ |
---|
2097 | y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m , |
---|
2098 | \] |
---|
2099 | and their derivatives satisfy the identities |
---|
2100 | \begin{equation} |
---|
2101 | \label{eq:identity} |
---|
2102 | \frac{\partial y_j}{\partial x_i} = \frac{\partial y_{j-i}} |
---|
2103 | {\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i}) |
---|
2104 | \end{equation} |
---|
2105 | as established in \cite{Chri91a}. This yields in particular |
---|
2106 | \begin{align*} |
---|
2107 | \frac{\partial y_0}{\partial x_0} = |
---|
2108 | \frac{\partial y_1}{\partial x_1} = |
---|
2109 | \frac{\partial y_2}{\partial x_2} = |
---|
2110 | \frac{\partial y_3}{\partial x_3} = |
---|
2111 | A_0 & = F'(x_0) \\ |
---|
2112 | \frac{\partial y_1}{\partial x_0} = |
---|
2113 | \frac{\partial y_2}{\partial x_1} = |
---|
2114 | \frac{\partial y_3}{\partial x_2} = |
---|
2115 | A_1 & = F''(x_0) x_1 \\ |
---|
2116 | \frac{\partial y_2}{\partial x_0} = |
---|
2117 | \frac{\partial y_3}{\partial x_1} = |
---|
2118 | A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\ |
---|
2119 | \frac{\partial y_3}{\partial x_0} = |
---|
2120 | A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2 |
---|
2121 | + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\ |
---|
2122 | & \ldots |
---|
2123 | \end{align*} |
---|
2124 | The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor |
---|
2125 | coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of |
---|
2126 | interest primarily in the context of ordinary differential |
---|
2127 | equations and differential algebraic equations. |
---|
2128 | |
---|
2129 | Given the tape of an active section and the coefficients $x_j$, |
---|
2130 | the resulting $y_j$ and their derivatives $A_j$ can be evaluated |
---|
2131 | by appropriate calls to the ADOL-C forward mode implementations and |
---|
2132 | the ADOL-C reverse mode implementations. The scalar versions of the forward |
---|
2133 | mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$ |
---|
2134 | to the $(y_j)_{j\leq d}$. The vector versions of the forward |
---|
2135 | mode propagate families of $p\geq 1$ such truncated Taylor series |
---|
2136 | in order to reduce the relative cost of the overhead incurred |
---|
2137 | in the tape interpretation. In detail, ADOL-C provides |
---|
2138 | \begin{tabbing} |
---|
2139 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2140 | \>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\ |
---|
2141 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2142 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2143 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2144 | \>{\sf int keep;} \> // flag for reverse mode preparation\\ |
---|
2145 | \>{\sf double x[n];} \> // independent vector $x=x_0$\\ |
---|
2146 | \>{\sf double y[m];} \> // dependent vector $y=F(x_0)$ |
---|
2147 | \end{tabbing} |
---|
2148 | for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes |
---|
2149 | $y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer |
---|
2150 | flag {\sf keep} plays a similar role as in the call to |
---|
2151 | {\sf trace\_on}: It determines if {\sf zos\_forward} writes |
---|
2152 | the first Taylor coefficients of all intermediate quantities into a buffered |
---|
2153 | temporary file, i.e., the value stack, in preparation for a subsequent |
---|
2154 | reverse mode evaluation. The value {\sf keep} $=1$ |
---|
2155 | prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as exlained below. |
---|
2156 | |
---|
2157 | To compute first-order derivatives, one has |
---|
2158 | \begin{tabbing} |
---|
2159 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2160 | \>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\ |
---|
2161 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2162 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2163 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2164 | \>{\sf int keep;} \> // flag for reverse mode preparation\\ |
---|
2165 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2166 | \>{\sf double x1[n];} \> // tangent vector $x_1$\\ |
---|
2167 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2168 | \>{\sf double y1[m];} \> // first derivative $y_1=F'(x_0)x_1$ |
---|
2169 | \end{tabbing} |
---|
2170 | for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has |
---|
2171 | $0\leq\text{\sf keep}\leq 2$, where |
---|
2172 | \begin{align*} |
---|
2173 | \text{\sf keep} = \left\{\begin{array}{cl} |
---|
2174 | 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\ |
---|
2175 | 2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}} |
---|
2176 | \end{array}\right. |
---|
2177 | \end{align*} |
---|
2178 | as exlained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode, |
---|
2179 | ADOL-C provides |
---|
2180 | \begin{tabbing} |
---|
2181 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2182 | \>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\ |
---|
2183 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2184 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2185 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2186 | \>{\sf int p;} \> // number of directions\\ |
---|
2187 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2188 | \>{\sf double X[n][p];} \> // tangent matrix $X$\\ |
---|
2189 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2190 | \>{\sf double Y[m][p];} \> // first derivative matrix $Y=F'(x)X$ |
---|
2191 | \end{tabbing} |
---|
2192 | For the computation of higher derivative, the driver |
---|
2193 | \begin{tabbing} |
---|
2194 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2195 | \>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\ |
---|
2196 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2197 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2198 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2199 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2200 | \>{\sf int keep;} \> // flag for reverse mode preparation\\ |
---|
2201 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2202 | \>{\sf double X[n][d];} \> // tangent matrix $X$\\ |
---|
2203 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2204 | \>{\sf double Y[m][d];} \> // derivative matrix $Y$ |
---|
2205 | \end{tabbing} |
---|
2206 | implementing the {\bf h}igher-{\bf o}rder {\bf s}calar forward mode. |
---|
2207 | The rows of the matrix $X$ must correspond to the independent variables in the order of their |
---|
2208 | initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of |
---|
2209 | $X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in |
---|
2210 | \eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the |
---|
2211 | dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator. |
---|
2212 | The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent |
---|
2213 | Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward} |
---|
2214 | computes the values |
---|
2215 | $y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where |
---|
2216 | $X=[x_1,x_2,\ldots,x_d]$ and $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has |
---|
2217 | $0\leq\text{\sf keep}\leq d+1$, with |
---|
2218 | \begin{align*} |
---|
2219 | \text{\sf keep} \left\{\begin{array}{cl} |
---|
2220 | = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\ |
---|
2221 | > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}} |
---|
2222 | \end{array}\right. |
---|
2223 | \end{align*} |
---|
2224 | Once more, there is also a vector version given by |
---|
2225 | \begin{tabbing} |
---|
2226 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2227 | \>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\ |
---|
2228 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2229 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2230 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2231 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2232 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
2233 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2234 | \>{\sf double X[n][p][d];} \> // tangent matrix $X$\\ |
---|
2235 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2236 | \>{\sf double Y[m][p][d];} \> // derivative matrix $Y$ |
---|
2237 | \end{tabbing} |
---|
2238 | for the {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes |
---|
2239 | $y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and |
---|
2240 | $Y=[Y_1,Y_2,\ldots,Y_d]$. |
---|
2241 | |
---|
2242 | There are also overloaded versions providing a general {\sf forward}-call. |
---|
2243 | Details of the appropriate calling sequences are given in \autoref{forw_rev}. |
---|
2244 | |
---|
2245 | Once, the required information is generated due to a forward mode evaluation |
---|
2246 | with an approriate value of the parameter {\sf keep}, one may use the |
---|
2247 | following implementation variants of the reverse mode. To compute first-order derivatives |
---|
2248 | one can use |
---|
2249 | \begin{tabbing} |
---|
2250 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2251 | \>{\sf int fos\_reverse(tag,m,n,u,z)}\\ |
---|
2252 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2253 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2254 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2255 | \>{\sf double u[m];} \> // weight vector $u$\\ |
---|
2256 | \>{\sf double z[n];} \> // resulting adjoint value $z^T=u^T F'(x)$ |
---|
2257 | \end{tabbing} |
---|
2258 | as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes |
---|
2259 | the product $z^T=u^T F'(x)$ after calling {\sf zos\_forward}, {\sf fos\_forward}, or |
---|
2260 | {\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf |
---|
2261 | o}rder {\bf v}ector reverse mode driver is given by |
---|
2262 | \begin{tabbing} |
---|
2263 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2264 | \>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\ |
---|
2265 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2266 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2267 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2268 | \>{\sf int q;} \> // number of weight vectors $q$\\ |
---|
2269 | \>{\sf double U[q][m];} \> // weight matrix $U$\\ |
---|
2270 | \>{\sf double Z[q][n];} \> // resulting adjoint $Z=U F'(x)$ |
---|
2271 | \end{tabbing} |
---|
2272 | that can be used after calling {\sf zos\_forward}, {\sf fos\_forward}, or |
---|
2273 | {\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives, |
---|
2274 | ADOL-C provides |
---|
2275 | \begin{tabbing} |
---|
2276 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2277 | \>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\ |
---|
2278 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2279 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2280 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2281 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2282 | \>{\sf double u[m];} \> // weight vector $u$\\ |
---|
2283 | \>{\sf double Z[n][d+1];} \> // resulting adjoints |
---|
2284 | \end{tabbing} |
---|
2285 | as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding |
---|
2286 | the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$, |
---|
2287 | \ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling {\sf fos\_forward} or |
---|
2288 | {\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by |
---|
2289 | \begin{tabbing} |
---|
2290 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2291 | \>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\ |
---|
2292 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2293 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2294 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2295 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2296 | \>{\sf double U[q][m];} \> // weight vector $u$\\ |
---|
2297 | \>{\sf double Z[q][n][d+1];} \> // resulting adjoints\\ |
---|
2298 | \>{\sf short int nz[q][n];} \> // nonzero pattern of {\sf Z} |
---|
2299 | \end{tabbing} |
---|
2300 | as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute |
---|
2301 | the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$, |
---|
2302 | \ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling {\sf fos\_forward} or |
---|
2303 | {\sf hos\_forward} with {\sf keep} $=d+1>1$. |
---|
2304 | After the function call, the last argument of {\sf hov\_reverse} |
---|
2305 | contains information about the sparsity pattern, i.e. each {\sf nz[i][j]} |
---|
2306 | has a value that characterizes the functional relation between the |
---|
2307 | $i$-th component of $UF^\prime(x)$ and the $j$-th independent value |
---|
2308 | $x_j$ as: |
---|
2309 | \begin{center} |
---|
2310 | \begin{tabular}{ll} |
---|
2311 | 0 & trivial \\ |
---|
2312 | 1 & linear |
---|
2313 | \end{tabular} \hspace*{4ex} |
---|
2314 | \begin{tabular}{ll} |
---|
2315 | 2 & polynomial\\ |
---|
2316 | 3 & rational |
---|
2317 | \end{tabular} \hspace*{4ex} |
---|
2318 | \begin{tabular}{ll} |
---|
2319 | 4 & transcendental\\ |
---|
2320 | 5 & non-smooth |
---|
2321 | \end{tabular} |
---|
2322 | \end{center} |
---|
2323 | Here, ``trivial'' means that there is no dependence at all and ``linear'' means |
---|
2324 | that the partial derivative is a constant that |
---|
2325 | does not dependent on other variables either. ``Non-smooth'' means that one of |
---|
2326 | the functions on the path between $x_i$ and $y_j$ was evaluated at a point |
---|
2327 | where it is not differentiable. All positive labels |
---|
2328 | $1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may |
---|
2329 | in fact be simpler, for example due to exact cancellations. |
---|
2330 | |
---|
2331 | There are also overloaded versions providing a general {\sf reverse}-call. |
---|
2332 | Details of the appropriate calling sequences are given in the following \autoref{forw_rev}. |
---|
2333 | % |
---|
2334 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2335 | \section{Overloaded Forward and Reverse Calls} |
---|
2336 | \label{forw_rev} |
---|
2337 | % |
---|
2338 | In this section, the several versions of the {\sf forward} and |
---|
2339 | {\sf reverse} routines, which utilize the overloading capabilities |
---|
2340 | of C++, are described in detail. With exception of the bit pattern |
---|
2341 | versions all interfaces are prototyped in the header file |
---|
2342 | \verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward} |
---|
2343 | and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides |
---|
2344 | C and Fortran-callable versions prototyped in the same header file. |
---|
2345 | The bit pattern versions of {\sf forward} and {\sf reverse} introduced |
---|
2346 | in the \autoref{ProBit} are prototyped in the header file |
---|
2347 | \verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header |
---|
2348 | file \verb=<adolc/interfaces.h>= automatically. |
---|
2349 | % |
---|
2350 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2351 | \subsection{The Scalar Case} |
---|
2352 | % |
---|
2353 | \label{ScaCas} |
---|
2354 | % |
---|
2355 | Given any correct tape, one may call from within |
---|
2356 | the generating program, or subsequently during another run, the following |
---|
2357 | procedure: |
---|
2358 | % |
---|
2359 | \begin{tabbing} |
---|
2360 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2361 | \>{\sf int forward(tag,m,n,d,keep,X,Y)} \\ |
---|
2362 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2363 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2364 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2365 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2366 | \>{\sf int keep;} \> // flag for reverse sweep \\ |
---|
2367 | \>{\sf double X[n][d+1];} \> // Taylor coefficients $X$ of |
---|
2368 | independent variables \\ |
---|
2369 | \>{\sf double Y[m][d+1];} \> // Taylor coefficients $Y$ as |
---|
2370 | in \eqref{eq:series} |
---|
2371 | \end{tabbing} |
---|
2372 | % |
---|
2373 | The rows of the matrix $X$ must correspond to the independent variables in the order of their |
---|
2374 | initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of |
---|
2375 | $X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in |
---|
2376 | \eqref{eq:x_of_t}. The rows of the matrix $Y$ must |
---|
2377 | correspond to the |
---|
2378 | dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator. |
---|
2379 | The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent |
---|
2380 | Taylor coefficient vectors as in \eqref{eq:series}. |
---|
2381 | Thus the first column of $Y$ contains the |
---|
2382 | function value $F(x)$ itself, the next column represents the first |
---|
2383 | Taylor coefficient vector of $F$, and the last column the |
---|
2384 | $d$-th Taylor coefficient vector. The integer flag {\sf keep} determines |
---|
2385 | how many Taylor coefficients of all intermediate quantities are |
---|
2386 | written into the value stack as explained in \autoref{forw_rev_ad}. |
---|
2387 | If {\sf keep} is omitted, it defaults to 0. |
---|
2388 | |
---|
2389 | The given {\sf tag} value is used by {\sf forward} to determine the |
---|
2390 | name of the file on which the tape was written. If the tape file does |
---|
2391 | not exist, {\sf forward} assumes that the relevant |
---|
2392 | tape is still in core and reads from the buffers. |
---|
2393 | After the execution of an active section with \mbox{{\sf keep} = 1} or a call to |
---|
2394 | {\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call |
---|
2395 | the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape |
---|
2396 | identifier {\sf tag}. When $u$ is a vector |
---|
2397 | and $Z$ an $n\times (d+1)$ matrix |
---|
2398 | {\sf reverse} is executed in the scalar mode by the calling |
---|
2399 | sequence |
---|
2400 | % |
---|
2401 | \begin{tabbing} |
---|
2402 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2403 | \>{\sf int reverse(tag,m,n,d,u,Z)}\\ |
---|
2404 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2405 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2406 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2407 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2408 | \>{\sf double u[m];} \> // weighting vector $u$\\ |
---|
2409 | \>{\sf double Z[n][d+1];} \> // resulting adjoints $Z$ |
---|
2410 | \end{tabbing} |
---|
2411 | to compute |
---|
2412 | the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$, |
---|
2413 | \ldots, where $Z=[z_0,z_1,\ldots,z_d]$. |
---|
2414 | % |
---|
2415 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2416 | \subsection{The Vector Case} |
---|
2417 | % |
---|
2418 | \label{vecCas} |
---|
2419 | % |
---|
2420 | When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence |
---|
2421 | % |
---|
2422 | \begin{tabbing} |
---|
2423 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2424 | \>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\ |
---|
2425 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2426 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2427 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2428 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2429 | \>{\sf int q;} \> // number of weight vectors $q$\\ |
---|
2430 | \>{\sf double U[q][m];} \> // weight matrix $U$\\ |
---|
2431 | \>{\sf double Z[q][n][d+1];} \> // resulting adjoints \\ |
---|
2432 | \>{\sf short nz[q][n];} \> // nonzero pattern of {\sf Z} |
---|
2433 | \end{tabbing} |
---|
2434 | % |
---|
2435 | to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$, |
---|
2436 | \ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$. |
---|
2437 | When the arguments {\sf p} and {\sf U} are omitted, they default to |
---|
2438 | $m$ and the identity matrix of order $m$, respectively. |
---|
2439 | |
---|
2440 | Through the optional argument {\sf nz} of {\sf reverse} one can compute |
---|
2441 | information about the sparsity pattern of $Z$ as described in detail |
---|
2442 | in the previous \autoref{forw_rev_ad}. |
---|
2443 | |
---|
2444 | The return values of {\sf reverse} calls can be interpreted according |
---|
2445 | to \autoref{retvalues}, but negative return values are not |
---|
2446 | valid, since the corresponding forward sweep would have |
---|
2447 | stopped without completing the necessary taylor file. |
---|
2448 | The return value of {\sf reverse} may be higher |
---|
2449 | than that of the preceding {\sf forward} call because some operations |
---|
2450 | that were evaluated at a critical argument during the forward sweep |
---|
2451 | were found not to impact the dependents during the reverse sweep. |
---|
2452 | |
---|
2453 | In both scalar and vector mode, the degree $d$ must agree with |
---|
2454 | {\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be |
---|
2455 | equal to zero if {\sf reverse} directly follows the taping of an active |
---|
2456 | section. Otherwise, {\sf reverse} will return control with a suitable error |
---|
2457 | message. |
---|
2458 | In order to avoid possible confusion, the first four arguments must always be |
---|
2459 | present in the calling sequence. However, if $m$ or $d$ |
---|
2460 | attain their trivial values 1 and 0, respectively, then |
---|
2461 | corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u}, |
---|
2462 | {\sf U}, or {\sf Z} can be omitted, thus eliminating one level of |
---|
2463 | indirection. For example, we may call |
---|
2464 | {\sf reverse(tag,1,n,0,1.0,g)} after declaring |
---|
2465 | {\sf double g[n]} |
---|
2466 | to calculate a gradient of a scalar-valued function. |
---|
2467 | |
---|
2468 | Sometimes it may be useful to perform a forward sweep for families of |
---|
2469 | Taylor series with the same leading term. |
---|
2470 | This vector version of {\sf forward} can be called in the form |
---|
2471 | % |
---|
2472 | \begin{tabbing} |
---|
2473 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2474 | \>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\ |
---|
2475 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2476 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2477 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2478 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2479 | \>{\sf int p;} \> // number of Taylor series $p$\\ |
---|
2480 | \>{\sf double x0[n];} \> // values of independent variables $x_0$\\ |
---|
2481 | \>{\sf double X[n][p][d];} \> // Taylor coefficients $X$ of independent variables\\ |
---|
2482 | \>{\sf double y0[m];} \> // values of dependent variables $y_0$\\ |
---|
2483 | \>{\sf double Y[m][p][d];} \> // Taylor coefficients $Y$ of dependent variables |
---|
2484 | \end{tabbing} |
---|
2485 | % |
---|
2486 | where {\sf X} and {\sf Y} hold the Taylor coefficients of first |
---|
2487 | and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of |
---|
2488 | degree 0. There is no option to keep the values of active variables |
---|
2489 | that are going out of scope or that are overwritten. Therefore this |
---|
2490 | function cannot prepare a subsequent reverse sweep. |
---|
2491 | The return integer serves as a flag to indicate quadratures or altered |
---|
2492 | comparisons as described above in \autoref{reuse_tape}. |
---|
2493 | |
---|
2494 | Since the calculation of Jacobians is probably the most important |
---|
2495 | automatic differentia\-tion task, we have provided a specialization |
---|
2496 | of vector {\sf forward} to the case where $d = 1$. This version can be |
---|
2497 | called in the form |
---|
2498 | % |
---|
2499 | \begin{tabbing} |
---|
2500 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2501 | \>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\ |
---|
2502 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2503 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2504 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2505 | \>{\sf int p;} \> // number of partial derivatives $p$ \\ |
---|
2506 | \>{\sf double x[n];} \> // values of independent variables $x_0$\\ |
---|
2507 | \>{\sf double X[n][p];} \> // seed derivatives of independent variables $X$\\ |
---|
2508 | \>{\sf double y[m];} \> // values of dependent variables $y_0$\\ |
---|
2509 | \>{\sf double Y[m][p];} \> // first derivatives of dependent variables $Y$ |
---|
2510 | \end{tabbing} |
---|
2511 | % |
---|
2512 | When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix, |
---|
2513 | the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general, |
---|
2514 | one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the |
---|
2515 | chosen initialization of $X$. In a workstation environment a value |
---|
2516 | of $p$ somewhere between $10$ and $50$ |
---|
2517 | appears to be fairly optimal. For smaller $p$ the interpretive |
---|
2518 | overhead is not appropriately amortized, and for larger $p$ the |
---|
2519 | $p$-fold increase in storage causes too many page faults. Therefore, |
---|
2520 | large Jacobians that cannot be compressed via column coloring |
---|
2521 | as could be done for example using the driver {\sf sparse\_jac} |
---|
2522 | should be ``strip-mined'' in the sense that the above |
---|
2523 | first-order-vector version of {\sf forward} is called |
---|
2524 | repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming |
---|
2525 | a partition of the identity matrix of order $n$. |
---|
2526 | % |
---|
2527 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2528 | \subsection{Dependence Analysis} |
---|
2529 | % |
---|
2530 | \label{ProBit} |
---|
2531 | % |
---|
2532 | The sparsity pattern of Jacobians is often needed to set up data structures |
---|
2533 | for their storage and factorization or to allow their economical evaluation |
---|
2534 | by compression \cite{BeKh96}. Compared to the evaluation of the full |
---|
2535 | Jacobian $F'(x_0)$ in real arithmetic computing the Boolean matrix |
---|
2536 | $\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity |
---|
2537 | pattern in the obvious way requires a little less run-time and |
---|
2538 | certainly a lot less memory. |
---|
2539 | |
---|
2540 | The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column |
---|
2541 | of $\tilde{P}$ should be $1 = true$ exactly when there is a data |
---|
2542 | dependence between the $i$-th independent variable $x_{i}$ and |
---|
2543 | the $j$-th dependent variable $y_{j}$. Just like for real arguments |
---|
2544 | one would wish to compute matrix-vector and vector-matrix products |
---|
2545 | of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ |
---|
2546 | by appropriate {\sf forward} and {\sf reverse} routines where |
---|
2547 | $\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$. |
---|
2548 | Here, multiplication corresponds to logical |
---|
2549 | {\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a |
---|
2550 | semi-ring. |
---|
2551 | |
---|
2552 | For practical reasons it is assumed that |
---|
2553 | $s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors |
---|
2554 | $\tilde{v}$ and $\tilde{u}$ are combined to integer vectors |
---|
2555 | $v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted |
---|
2556 | as bit patterns. Moreover $p$ or $q$ such integer vectors may |
---|
2557 | be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ |
---|
2558 | and $U\in\N^{q \times m}$, which naturally correspond |
---|
2559 | to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$ |
---|
2560 | and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided |
---|
2561 | bit pattern versions of {\sf forward} and {\sf reverse} allow |
---|
2562 | to compute integer matrices $Y\in\N^{m \times p}$ and |
---|
2563 | $Z\in\N^{q \times m}$ corresponding to |
---|
2564 | \begin{equation} |
---|
2565 | \label{eq:int_forrev} |
---|
2566 | \tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad |
---|
2567 | \tilde{Z} = \tilde{U}\tilde{P} \, , |
---|
2568 | \end{equation} |
---|
2569 | respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$ |
---|
2570 | and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times n}$. |
---|
2571 | In general, the application of the bit pattern versions of |
---|
2572 | {\sf forward} or {\sf reverse} can be interpreted as |
---|
2573 | propagating dependences between variables forward or backward, therefore |
---|
2574 | both the propagated integer matrices and the corresponding |
---|
2575 | Boolean matrices are called {\em dependence structures}. |
---|
2576 | |
---|
2577 | The bit pattern {\sf forward} routine |
---|
2578 | % |
---|
2579 | \begin{tabbing} |
---|
2580 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2581 | \>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\ |
---|
2582 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2583 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2584 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2585 | \>{\sf int p;} \> // number of integers propagated $p$\\ |
---|
2586 | \>{\sf double x[n];} \> // values of independent variables $x_0$\\ |
---|
2587 | \>{\sf unsigned long int X[n][p];} \> // dependence structure $X$ \\ |
---|
2588 | \>{\sf double y[m];} \> // values of dependent variables $y_0$\\ |
---|
2589 | \>{\sf unsigned long int Y[m][p];} \> // dependence structure $Y$ according to |
---|
2590 | \eqref{eq:int_forrev}\\ |
---|
2591 | \>{\sf char mode;} \> // 0 : safe mode (default), 1 : tight mode |
---|
2592 | \end{tabbing} |
---|
2593 | % |
---|
2594 | can be used to obtain the dependence structure $Y$ for a given dependence structure |
---|
2595 | $X$. The dependence structures are |
---|
2596 | represented as arrays of {\sf unsigned long int} the entries of which are |
---|
2597 | interpreted as bit patterns as described above. |
---|
2598 | For example, for $n=3$ the identity matrix $I_3$ should be passed |
---|
2599 | with $p=1$ as the $3 \times 1$ array |
---|
2600 | \begin{eqnarray*} |
---|
2601 | {\sf X} \; = \; |
---|
2602 | \left( \begin{array}{r} |
---|
2603 | {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\ |
---|
2604 | 0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\ |
---|
2605 | 00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2 |
---|
2606 | \end{array} \right) |
---|
2607 | \end{eqnarray*} |
---|
2608 | in the 4-byte long integer format. The parameter {\sf mode} determines |
---|
2609 | the mode of dependence analysis as explained already in \autoref{sparse}. |
---|
2610 | |
---|
2611 | A call to the corresponding bit pattern {\sf reverse} routine |
---|
2612 | % |
---|
2613 | \begin{tabbing} |
---|
2614 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2615 | \>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\ |
---|
2616 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2617 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2618 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2619 | \>{\sf int q;} \> // number of integers propagated q\\ |
---|
2620 | \>{\sf unsigned long int U[q][m];} \> // dependence structure $U$ \\ |
---|
2621 | \>{\sf unsigned long int Z[q][n];} \> // dependence structure $Z$ according |
---|
2622 | to \eqref{eq:int_forrev}\\ |
---|
2623 | \>{\sf char mode;} \> // 0 : safe mode (default), 1 : tight mode |
---|
2624 | \end{tabbing} |
---|
2625 | % |
---|
2626 | yields the dependence structure $Z$ for a given dependence structure |
---|
2627 | $U$. |
---|
2628 | |
---|
2629 | To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$ |
---|
2630 | as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} |
---|
2631 | with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the |
---|
2632 | corresponding dependence structure $X$ or $U$ must be defined to represent |
---|
2633 | the identity matrix of the respective dimension. |
---|
2634 | Due to the fact that always a multiple of $s$ Boolean vectors are propagated |
---|
2635 | there may be superfluous vectors, which can be set to zero. |
---|
2636 | |
---|
2637 | The return values of the bit pattern {\sf forward} and {\sf reverse} routines |
---|
2638 | correspond to those described in \autoref{retvalues}. |
---|
2639 | |
---|
2640 | One can control the storage growth by the factor $p$ using |
---|
2641 | ``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive |
---|
2642 | groups of columns or respectively rows at a time, i.e.~partitioning |
---|
2643 | $X$ or $U$ appropriately as described for the computation of Jacobians |
---|
2644 | in \autoref{vecCas}. |
---|
2645 | % |
---|
2646 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2647 | % |
---|
2648 | \section{Advanced algorithmic differentiation in ADOL-C} |
---|
2649 | \label{adv_ad} |
---|
2650 | % |
---|
2651 | \subsection{Differentiating external functions} |
---|
2652 | % |
---|
2653 | Ideally, AD is applied to a given computation as a whole. |
---|
2654 | This is not always possible because parts of the computation may |
---|
2655 | be coded in a different programming language or may a call to |
---|
2656 | an external library. |
---|
2657 | In the former case one may want to differentiate the parts in |
---|
2658 | question with a different AD tool or provide hand written derivatives. |
---|
2659 | To integrate these |
---|
2660 | In practice, however, sophisticated projects usually evolve over a long period of time. |
---|
2661 | Within this process, a heterogeneous code base for the project |
---|
2662 | develops, which may include the incorporation of external solutions, |
---|
2663 | changes in programming paradigms or even of programming languages. |
---|
2664 | Equally heterogeneous, the computation of derivative values appears. |
---|
2665 | Hence, different \mbox{AD-tools} may be combined with hand-derived |
---|
2666 | codes based on the same or different programming languages. |
---|
2667 | ADOL-C supports such settings by the concept of externally |
---|
2668 | differentiated functions, that is, a function |
---|
2669 | not differentiated by ADOL-C itself. The required derivatives |
---|
2670 | have to be provided by the user. |
---|
2671 | |
---|
2672 | For this purpose, it is required that the externally differentiated |
---|
2673 | function (for example named {\sf\em euler\_step} ) has the following signature. |
---|
2674 | \smallskip |
---|
2675 | |
---|
2676 | \noindent |
---|
2677 | \hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);} |
---|
2678 | \medskip |
---|
2679 | |
---|
2680 | \noindent |
---|
2681 | Note that the formal paraemters in the signature have {\sf double} type, that is, |
---|
2682 | they are not active as in the original program before the ADOL-C type change. |
---|
2683 | The externally differentiated function has to |
---|
2684 | be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows. |
---|
2685 | \smallskip |
---|
2686 | |
---|
2687 | \noindent |
---|
2688 | \hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);}. |
---|
2689 | \smallskip |
---|
2690 | |
---|
2691 | \noindent |
---|
2692 | This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function. |
---|
2693 | Then, the user has to supply the function pointers for the call back methods (for example here |
---|
2694 | {\sf zos\_for\_euler\_step} and {\sf fos\_rev\_euler\_step}) the user implemented |
---|
2695 | to compute the derivatives as follows. |
---|
2696 | \begin{tabbing} |
---|
2697 | \hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\ |
---|
2698 | \> {\sf // function pointer for computing |
---|
2699 | Zero-Order-Scalar (=zos)}\\ |
---|
2700 | \> {\sf // forward information}\\ |
---|
2701 | % \> {\sf edf-$>$dp\_x = xp;}\\ |
---|
2702 | % \> {\sf edf-$>$dp\_y = yp;}\\ |
---|
2703 | % \> {\sf // double arrays for arguments and results}\\ |
---|
2704 | \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\ |
---|
2705 | \> {\sf // function pointer for computing |
---|
2706 | First-Order-Scalar (=fos)}\\ |
---|
2707 | \> {\sf reverse information} |
---|
2708 | \end{tabbing} |
---|
2709 | To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y} |
---|
2710 | one has to provide (allocate) both variants. I.e. if the call to {\sf euler\_step} was originally |
---|
2711 | \noindent |
---|
2712 | \hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);} |
---|
2713 | \medskip |
---|
2714 | then the ADOL-C typechange for the calling context will turn {\sf sOld} and {\sf sNew} in {\sf adouble} pointers. |
---|
2715 | To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace) |
---|
2716 | the original call to the externally differentiated function |
---|
2717 | must be substituted by |
---|
2718 | \medskip |
---|
2719 | |
---|
2720 | \noindent |
---|
2721 | \hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);} |
---|
2722 | \medskip |
---|
2723 | |
---|
2724 | \noindent |
---|
2725 | Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively) |
---|
2726 | to the active arguments {\sf sNew} in {\sf adouble}. |
---|
2727 | The usage of the external function facility is illustrated by the |
---|
2728 | example \verb=ext_diff_func= contained in |
---|
2729 | \verb=examples/additional_examples/ext_diff_func=. |
---|
2730 | There,the external differentiated function is also a C code, but the |
---|
2731 | handling as external differentiated functions also a decrease of the |
---|
2732 | overall required tape size. |
---|
2733 | |
---|
2734 | % |
---|
2735 | \subsection{Advanced algorithmic differentiation of time integration processes} |
---|
2736 | % |
---|
2737 | For many time-dependent applications, the corresponding simulations |
---|
2738 | are based on ordinary or partial differential equations. |
---|
2739 | Furthermore, frequently there are quantities that influence the |
---|
2740 | result of the simulation and can be seen as control of the systems. |
---|
2741 | To compute an approximation of the |
---|
2742 | simulated process for a time interval $[0,T]$ and evaluated the |
---|
2743 | desired target function, one applies an |
---|
2744 | appropriate integration scheme given by |
---|
2745 | \begin{tabbing} |
---|
2746 | \hspace{5mm} \= some initializations yielding $x_0$\\ |
---|
2747 | \> for $i=0,\ldots, N-1$\\ |
---|
2748 | \hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\ |
---|
2749 | \hspace{5mm} \= evaluation of the target function |
---|
2750 | \end{tabbing} |
---|
2751 | where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at |
---|
2752 | time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and |
---|
2753 | $t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$ |
---|
2754 | defines the time step to compute the state at time $t_i$. Note that we |
---|
2755 | do not assume a uniform grid. |
---|
2756 | |
---|
2757 | When computing derivatives of the target function with respect to the |
---|
2758 | control, the consequences for the tape generation using the ``basic'' |
---|
2759 | taping approach as implemented in ADOL-C so far are shown in the left part of |
---|
2760 | \autoref{fig:bas_tap}. |
---|
2761 | \begin{figure}[htbp] |
---|
2762 | \begin{center} |
---|
2763 | \hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill |
---|
2764 | \includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\ |
---|
2765 | \end{center} |
---|
2766 | \hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process |
---|
2767 | \caption{Different taping approaches} |
---|
2768 | \label{fig:bas_tap} |
---|
2769 | \end{figure} |
---|
2770 | As can be seen, the iterative process is completely |
---|
2771 | unrolled due to the taping process. That is, the tape contains an internal representation of each |
---|
2772 | time step. Hence, the overall tape comprises a serious amount of redundant |
---|
2773 | information as illustrated by the light grey rectangles in |
---|
2774 | \autoref{fig:bas_tap}. |
---|
2775 | |
---|
2776 | To overcome the repeated storage of essentially the same information, |
---|
2777 | a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on |
---|
2778 | the right-hand side of \autoref{fig:bas_tap}. This new |
---|
2779 | capability allows the encapsulation of the time-stepping procedure |
---|
2780 | such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one |
---|
2781 | representative of the time steps in addition to a function pointer to the |
---|
2782 | evaluation procedure $F$ of the time steps. The function pointer has |
---|
2783 | to be stored for a possibly necessary retaping during the derivative calculation |
---|
2784 | as explained below. |
---|
2785 | |
---|
2786 | Instead of storing the complete tape, only a very limited number of intermediate |
---|
2787 | states are kept in memory. They serve as checkpoints, such that |
---|
2788 | the required information for the backward integration is generated |
---|
2789 | piecewise during the adjoint calculation. |
---|
2790 | For this modified adjoint computation the optimal checkpointing schedules |
---|
2791 | provided by {\bf revolve} are employed. An adapted version of the |
---|
2792 | software package {\sf revolve} is part of ADOL-C and automatically |
---|
2793 | integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are |
---|
2794 | distributed such that computational effort is minimized for the given |
---|
2795 | number of checkpoints and time steps $N$. It is important to note that the overall tape |
---|
2796 | size is drastically reduced due to the advanced taping strategy. For the |
---|
2797 | implementation of this nested taping we introduced |
---|
2798 | a so-called ``differentiating context'' that enables \mbox{ADOL-C} to |
---|
2799 | handle different internal function representations during the taping |
---|
2800 | procedure and the derivative calculation. This approach allows the generation of a new |
---|
2801 | tape inside the overall tape, where the coupling of the different tapes is based on |
---|
2802 | the {\em external differentiated function} described above. |
---|
2803 | |
---|
2804 | Written under the objective of minimal user effort, the checkpointing routines |
---|
2805 | of \mbox{ADOL-C} need only very limited information. The user must |
---|
2806 | provide two routines as implementation of the time-stepping function $F$ |
---|
2807 | with the signatures |
---|
2808 | \medskip |
---|
2809 | |
---|
2810 | \noindent |
---|
2811 | \hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\ |
---|
2812 | \hspace*{2cm}{\sf int time\_step\_function(int n, double *u);} |
---|
2813 | \medskip |
---|
2814 | |
---|
2815 | \noindent |
---|
2816 | where the function names can be chosen by the user as long as the names are |
---|
2817 | unique.It is possible that the result vector of one time step |
---|
2818 | iteration overwrites the argument vector of the same time step. Then, no |
---|
2819 | copy operations are required to prepare the next time step. |
---|
2820 | |
---|
2821 | At first, the {\sf adouble} version of the time step function has to |
---|
2822 | be {\em registered} using the \mbox{ADOL-C} function |
---|
2823 | \medskip |
---|
2824 | |
---|
2825 | \noindent |
---|
2826 | \hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}. |
---|
2827 | \medskip |
---|
2828 | |
---|
2829 | \noindent |
---|
2830 | This function initializes the structure {\sf cpc}. Then, |
---|
2831 | the user has to provide the remaining checkpointing information |
---|
2832 | by the following commands: |
---|
2833 | \begin{tabbing} |
---|
2834 | \hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\ |
---|
2835 | \> {\sf // double variante of the time step function}\\ |
---|
2836 | \> {\sf cpc.setNumberOfSteps(N);}\\ |
---|
2837 | \> {\sf // number of time steps to perform}\\ |
---|
2838 | \> {\sf cpc.setNumberOfCheckpoints(10);}\\ |
---|
2839 | \> {\sf // number of checkpoint} \\ |
---|
2840 | \> {\sf cpc.setDimensionXY(n);}\\ |
---|
2841 | \> {\sf // dimension of input/output}\\ |
---|
2842 | \> {\sf cpc.setInput(y);}\\ |
---|
2843 | \> {\sf // input vector} \\ |
---|
2844 | \> {\sf cpc.setOutput(y);}\\ |
---|
2845 | \> {\sf // output vector }\\ |
---|
2846 | \> {\sf cpc.setTapeNumber(tag\_check);}\\ |
---|
2847 | \> {\sf // subtape number for checkpointing} \\ |
---|
2848 | \> {\sf cpc.setAlwaysRetaping(false);}\\ |
---|
2849 | \> {\sf // always retape or not ?} |
---|
2850 | \end{tabbing} |
---|
2851 | Subsequently, the time loop in the function evaluation can be |
---|
2852 | substituted by a call of the function |
---|
2853 | \medskip |
---|
2854 | |
---|
2855 | \noindent |
---|
2856 | \hspace*{2cm}{\sf int cpc.checkpointing();} |
---|
2857 | \medskip |
---|
2858 | |
---|
2859 | \noindent |
---|
2860 | Then, ADOL-C computes derivative information using the optimal checkpointing |
---|
2861 | strategy provided by {\sf revolve} internally, i.e., completely hidden from the user. |
---|
2862 | |
---|
2863 | The presented driver is prototyped in the header file |
---|
2864 | \verb=<adolc/checkpointing.h>=. This header |
---|
2865 | is included by the global header file \verb=<adolc/adolc.h>= automatically. |
---|
2866 | An example program \verb=checkpointing.cpp= illustrates the |
---|
2867 | checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=. |
---|
2868 | % |
---|
2869 | % |
---|
2870 | % |
---|
2871 | \subsection{Advanced algorithmic differentiation of fixed point iterations} |
---|
2872 | % |
---|
2873 | Quite often, the state of the considered system denoted by $x\in\R^n$ |
---|
2874 | depends on some design parameters denoted by $u\in\R^m$. One example for this setting |
---|
2875 | forms the flow over an aircraft wing. Here, the shape of the wing that |
---|
2876 | is defined by the design vector $u$ |
---|
2877 | determines the flow field $x$. The desired quasi-steady state $x_*$ |
---|
2878 | fulfills the fixed point equation |
---|
2879 | \begin{align} |
---|
2880 | \label{eq:fixedpoint} |
---|
2881 | x_* = F(x_*,u) |
---|
2882 | \end{align} |
---|
2883 | for a given continuously differentiable function |
---|
2884 | $F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is |
---|
2885 | also exploited by many other applications. |
---|
2886 | |
---|
2887 | Assume that one can apply the iteration |
---|
2888 | \begin{align} |
---|
2889 | \label{eq:iteration} |
---|
2890 | x_{k+1} = F(x_k,u) |
---|
2891 | \end{align} |
---|
2892 | to obtain a linear converging sequence $\{x_k\}$ generated |
---|
2893 | for any given control $u\in\R^n$. Then the limit point $x_*\in\R^n$ fulfils the fixed |
---|
2894 | point equation~\eqref{eq:fixedpoint}. Moreover, |
---|
2895 | suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair |
---|
2896 | $(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}. |
---|
2897 | Hence, there exists a |
---|
2898 | differentiable function $\phi:\R^m \rightarrow \R^n$, |
---|
2899 | such that $\phi(u) = F(\phi(u),u)$, where the state |
---|
2900 | $\phi(u)$ is a fixed point of $F$ according to a control |
---|
2901 | $u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to |
---|
2902 | the design vector $u$, derivatives of $\phi$ with respect |
---|
2903 | to $u$ are of particular interest. |
---|
2904 | |
---|
2905 | To exploit the advanced algorithmic differentiation of such fixed point iterations |
---|
2906 | ADOL-C provides the special functions {\tt fp\_iteration(...)}. |
---|
2907 | It has the following interface: |
---|
2908 | \begin{tabbing} |
---|
2909 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2910 | \>{\sf int |
---|
2911 | fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\ |
---|
2912 | \> \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\ |
---|
2913 | \hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill % define tab position |
---|
2914 | \>{\sf short int sub\_tape\_num;} \> // tape identification for sub\_tape \\ |
---|
2915 | \>{\sf int *double\_F;} \> // pointer to a function that compute for $x$ and $u$ \\ |
---|
2916 | \> \> // the value $y=F(x,u)$ for {\sf double} arguments\\ |
---|
2917 | \>{\sf int *adouble\_F;} \> // pointer to a function that compute for $x$ and $u$ \\ |
---|
2918 | \> \> // the value $y=F(x,u)$ for {\sf double} arguments\\ |
---|
2919 | \>{\sf int *norm;} \> // pointer to a function that computes\\ |
---|
2920 | \> \> // the norm of a vector\\ |
---|
2921 | \>{\sf int *norm\_deriv;} \> // pointer to a function that computes\\ |
---|
2922 | \> \> // the norm of a vector\\ |
---|
2923 | \>{\sf double eps;} \> // termination criterion for fixed point iteration\\ |
---|
2924 | \>{\sf double eps\_deriv;} \> // termination criterion for adjoint fixed point iteration\\ |
---|
2925 | \>{\sf N\_max;} \> // maximal number of itertions for state computation\\ |
---|
2926 | \>{\sf N\_max\_deriv;} \> // maximal number of itertions for adjoint computation\\ |
---|
2927 | \>{\sf adouble *x\_0;} \> // inital state of fixed point iteration\\ |
---|
2928 | \>{\sf adouble *u;} \> // value of $u$\\ |
---|
2929 | \>{\sf adouble *x\_fic;} \> // final state of fixed point iteration\\ |
---|
2930 | \>{\sf int dim\_x;} \> // dimension of $x$\\ |
---|
2931 | \>{\sf int dim\_u;} \> // dimension of $u$\\ |
---|
2932 | \end{tabbing} |
---|
2933 | % |
---|
2934 | Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that |
---|
2935 | should be used for the fixed point iteration. |
---|
2936 | {\tt double\_F} and {\tt adouble\_F} are pointers to functions, that |
---|
2937 | compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby |
---|
2938 | {\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F} |
---|
2939 | uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and |
---|
2940 | {\tt norm\_deriv} are pointers to functions computing the norm |
---|
2941 | of a vector. The latter functions together with {\tt eps}, |
---|
2942 | {\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control |
---|
2943 | the iterations. Thus the following loops are performed: |
---|
2944 | \begin{center} |
---|
2945 | \begin{tabular}{ll} |
---|
2946 | do & do \\ |
---|
2947 | ~~~~$k = k+1$ & ~~~~$k = k+1$ \\ |
---|
2948 | ~~~~$x = y$ & ~~~~$\zeta = \xi$ \\ |
---|
2949 | ~~~~$y = F(x,u)$ & ~~~ |
---|
2950 | $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\ |
---|
2951 | while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} & |
---|
2952 | while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$ \\ |
---|
2953 | & and $k\leq N_{max,deriv}$ |
---|
2954 | \end{tabular} |
---|
2955 | \end{center} |
---|
2956 | The vector for the initial iterate and the control is stored |
---|
2957 | in {\tt x\_0} and {\tt u} respectively. The vector in which the |
---|
2958 | fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x} |
---|
2959 | and {\tt dim\_u} represent the dimensions $n$ and $m$ of the |
---|
2960 | corresponding vectors. |
---|
2961 | |
---|
2962 | The presented driver is prototyped in the header file |
---|
2963 | \verb=<adolc/fixpoint.h>=. This header |
---|
2964 | is included by the global header file \verb=<adolc/adolc.h>= automatically. |
---|
2965 | An example code that shows also the |
---|
2966 | expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=. |
---|
2967 | % |
---|
2968 | \subsection{Advanced algorithmic differentiation of OpenMP parallel programs} |
---|
2969 | % |
---|
2970 | ADOL-C allows to compute derivatives in parallel for functions |
---|
2971 | containing OpenMP parallel loops. |
---|
2972 | This implies that an explicit loop-handling approach is applied. A |
---|
2973 | typical situation is shown in \autoref{fig:basic_layout}, |
---|
2974 | \begin{figure}[hbt] |
---|
2975 | \vspace{3ex} |
---|
2976 | \begin{center} |
---|
2977 | \includegraphics[height=4cm]{multiplexed} \\ |
---|
2978 | \begin{picture}(0,0) |
---|
2979 | \put(-48,40){\vdots} |
---|
2980 | \put(48,40){\vdots} |
---|
2981 | \put(-48,80){\vdots} |
---|
2982 | \put(48,80){\vdots} |
---|
2983 | \put(-83,132){function eval.} |
---|
2984 | \put(5,132){derivative calcul.} |
---|
2985 | \end{picture} |
---|
2986 | \end{center} |
---|
2987 | \vspace{-5ex} |
---|
2988 | \caption{Basic layout of mixed function and the corresponding derivation process} |
---|
2989 | \label{fig:basic_layout} |
---|
2990 | \end{figure} |
---|
2991 | where the OpenMP-parallel loop is preceded by a serial startup |
---|
2992 | calculation and followed by a serial finalization phase. |
---|
2993 | |
---|
2994 | Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement. |
---|
2995 | Two macros are available that only differ in the way the global tape information is handled. |
---|
2996 | Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization. |
---|
2997 | For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used. |
---|
2998 | Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread. |
---|
2999 | Due to the inserted macro, the OpenMP statement has the following structure: |
---|
3000 | \begin{tabbing} |
---|
3001 | \hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\ |
---|
3002 | \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC} |
---|
3003 | \end{tabbing} |
---|
3004 | Inside the parallel region, separate tapes may then be created. |
---|
3005 | Each single thread works in its own dedicated AD-environment, and all |
---|
3006 | serial facilities of \mbox{ADOL-C} are applicable as usual. The global |
---|
3007 | derivatives can be computed using the tapes created in the serial and |
---|
3008 | parallel parts of the function evaluation, where user interaction is |
---|
3009 | required for the correct derivative concatenation of the various tapes. |
---|
3010 | |
---|
3011 | For the usage of the parallel facilities, the \verb=configure=-command |
---|
3012 | has to be used with the option \verb?--with-openmp-flag=FLAG?, where |
---|
3013 | \verb=FLAG= stands for the system dependent OpenMP flag. |
---|
3014 | The parallel differentiation of a parallel program is illustrated |
---|
3015 | by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=. |
---|
3016 | % |
---|
3017 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
3018 | % |
---|
3019 | \section{Tapeless forward differentiation in ADOL-C} |
---|
3020 | \label{tapeless} |
---|
3021 | % |
---|
3022 | Up to version 1.9.0, the development of the ADOL-C software package |
---|
3023 | was based on the decision to store all data necessary for derivative |
---|
3024 | computation on tapes, where large applications require the tapes to be |
---|
3025 | written out to corresponding files. In almost all cases this means |
---|
3026 | a considerable drawback in terms of run time due to the excessive |
---|
3027 | memory accesses. Using these tapes enables ADOL-C to offer multiple |
---|
3028 | functions. However, it is not necessary for all tasks of derivative |
---|
3029 | computation to do that. |
---|
3030 | |
---|
3031 | Starting with version 1.10.0, ADOL-C now features a tapeless forward |
---|
3032 | mode for computing first order derivatives in scalar mode, i.e., |
---|
3033 | $\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$. |
---|
3034 | This tapeless variant coexists with the more universal |
---|
3035 | tape based mode in the package. The following subsections describe |
---|
3036 | the source code modifications required to use the tapeless forward mode of |
---|
3037 | ADOL-C. |
---|
3038 | % |
---|
3039 | \subsection{Modifying the Source Code} |
---|
3040 | % |
---|
3041 | Let us consider the coordinate transformation from Cartesian to spherical |
---|
3042 | polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y |
---|
3043 | = F(x)$, with |
---|
3044 | \begin{eqnarray*} |
---|
3045 | y_1 = \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad |
---|
3046 | y_2 = \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad |
---|
3047 | y_3 = \arctan(x_2/x_1), |
---|
3048 | \end{eqnarray*} |
---|
3049 | as an example. The corresponding source code is shown in \autoref{fig:tapeless}. |
---|
3050 | \begin{figure}[htb] |
---|
3051 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3052 | %\begin{center} |
---|
3053 | %\begin{flushleft} |
---|
3054 | \begin{tabbing} |
---|
3055 | \= \kill |
---|
3056 | \> {\sf \#include} {\sf $<$iostream$>$}\\ |
---|
3057 | \> {\sf using namespace std;}\\ |
---|
3058 | \> \\ |
---|
3059 | \> {\sf int main() \{}\\ |
---|
3060 | \> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\ |
---|
3061 | \> \\ |
---|
3062 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\ |
---|
3063 | \> {\sf \rule{1cm}{0pt}...}\\ |
---|
3064 | \> \\ |
---|
3065 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3066 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3067 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3068 | \> \\ |
---|
3069 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\ |
---|
3070 | \> \\ |
---|
3071 | \> {\sf \rule{0.5cm}{0pt}return 0;}\\ |
---|
3072 | \> \} |
---|
3073 | \end{tabbing} |
---|
3074 | %\end{flushleft} |
---|
3075 | %\end{center} |
---|
3076 | }} |
---|
3077 | \caption{Example for tapeless forward mode} |
---|
3078 | \label{fig:tapeless} |
---|
3079 | \end{figure} |
---|
3080 | % |
---|
3081 | Changes to the source code that are necessary for applying the |
---|
3082 | tapeless forward ADOL-C are described in the following two |
---|
3083 | subsections, where the vector mode version is described |
---|
3084 | as extension of the scalar mode. |
---|
3085 | % |
---|
3086 | \subsubsection*{The scalar mode} |
---|
3087 | % |
---|
3088 | To use the tapeless forward mode, one has to include one |
---|
3089 | of the header files \verb#adolc.h# or \verb#adouble.h# |
---|
3090 | where the latter should be preferred since it does not include the |
---|
3091 | tape based functions defined in other header files. Hence, including |
---|
3092 | \verb#adouble.h# avoids mode mixtures, since |
---|
3093 | \verb#adolc.h# is just a wrapper for including all public |
---|
3094 | headers of the ADOL-C package and does not offer own functions. |
---|
3095 | Since the two ADOL-C forward mode variants tape-based and tapeless, |
---|
3096 | are prototyped in the same header file, the compiler needs to know if a |
---|
3097 | tapeless version is intended. This can be done by defining a |
---|
3098 | preprocessor macro named {\sf ADOLC\_TAPELESS}. Note that it is |
---|
3099 | important to define this macro before the header file is included. |
---|
3100 | Otherwise, the tape-based version of ADOL-C will be used. |
---|
3101 | |
---|
3102 | As in the tape based forward version of ADOL-C all derivative |
---|
3103 | calculations are introduced by calls to overloaded |
---|
3104 | operators. Therefore, similar to the tape-based version all |
---|
3105 | independent, intermediate and dependent variables must be declared |
---|
3106 | with type {\sf adouble}. The whole tapeless functionality provided by |
---|
3107 | \verb#adolc.h# was written as complete inline intended code |
---|
3108 | due to run time aspects, where the real portion of inlined code can |
---|
3109 | be influenced by switches for many compilers. Likely, the whole |
---|
3110 | derivative code is inlined by default. Our experiments |
---|
3111 | with the tapeless mode have produced complete inlined code by using |
---|
3112 | standard switches (optimization) for GNU and Intel C++ |
---|
3113 | compiler. |
---|
3114 | |
---|
3115 | To avoid name conflicts |
---|
3116 | resulting from the inlining the tapeless version has its own namespace |
---|
3117 | \verb#adtl#. As a result four possibilities of using the {\sf adouble} |
---|
3118 | type are available for the tapeless version: |
---|
3119 | \begin{itemize} |
---|
3120 | \item Defining a new type |
---|
3121 | \begin{center} |
---|
3122 | \begin{tabular}{l} |
---|
3123 | {\sf typedef adtl::adouble adouble;}\\ |
---|
3124 | ...\\ |
---|
3125 | {\sf adouble tmp;} |
---|
3126 | \end{tabular} |
---|
3127 | \end{center} |
---|
3128 | This is the preferred way. Remember, you can not write an own |
---|
3129 | {\sf adouble} type/class with different meaning after doing the typedef. |
---|
3130 | \item Declaring with namespace prefix |
---|
3131 | \begin{center} |
---|
3132 | \begin{tabular}{l} |
---|
3133 | {\sf adtl::adouble tmp;} |
---|
3134 | \end{tabular} |
---|
3135 | \end{center} |
---|
3136 | Not the most handsome and efficient way with respect to coding |
---|
3137 | but without any doubt one of the safest ways. The identifier |
---|
3138 | {\sf adouble} is still available for user types/classes. |
---|
3139 | \item Trusting macros |
---|
3140 | \begin{center} |
---|
3141 | \begin{tabular}{l} |
---|
3142 | {\sf \#define adouble adtl::adouble}\\ |
---|
3143 | ...\\ |
---|
3144 | {\sf adouble tmp;} |
---|
3145 | \end{tabular} |
---|
3146 | \end{center} |
---|
3147 | This approach should be used with care, since standard defines are text replacements. |
---|
3148 | \item Using the complete namespace |
---|
3149 | \begin{center} |
---|
3150 | \begin{tabular}{l} |
---|
3151 | {\sf \#using namespace adtl;}\\ |
---|
3152 | ...\\ |
---|
3153 | {\sf adouble tmp;} |
---|
3154 | \end{tabular} |
---|
3155 | \end{center} |
---|
3156 | A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise! |
---|
3157 | \end{itemize} |
---|
3158 | After defining the variables only two things are left to do. First |
---|
3159 | one needs to initialize the values of the independent variables for the |
---|
3160 | function evaluation. This can be done by assigning the variables a {\sf |
---|
3161 | double} value. The {\sf ad}-value is set to zero in this case. |
---|
3162 | Additionally, the tapeless forward mode variant of ADOL-C |
---|
3163 | offers a function named {\sf setValue} for setting the value without |
---|
3164 | changing the {\sf ad}-value. To set the {\sf ad}-values of the independent |
---|
3165 | variables ADOL-C offers two possibilities: |
---|
3166 | \begin{itemize} |
---|
3167 | \item Using the constructor |
---|
3168 | \begin{center} |
---|
3169 | \begin{tabular}{l} |
---|
3170 | {\sf adouble x1(2,1), x2(4,0), y;} |
---|
3171 | \end{tabular} |
---|
3172 | \end{center} |
---|
3173 | This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter |
---|
3174 | remains uninitialized. In terms of function evaluation |
---|
3175 | $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values |
---|
3176 | are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$. |
---|
3177 | \item Setting point values directly |
---|
3178 | \begin{center} |
---|
3179 | \begin{tabular}{l} |
---|
3180 | {\sf adouble x1=2, x2=4, y;}\\ |
---|
3181 | ...\\ |
---|
3182 | {\sf x1.setADValue(1);}\\ |
---|
3183 | {\sf x2.setADValue(0);} |
---|
3184 | \end{tabular} |
---|
3185 | \end{center} |
---|
3186 | The same example as above but now using {\sf setADValue}-method for initializing the derivative values. |
---|
3187 | \end{itemize} |
---|
3188 | % |
---|
3189 | The derivatives can be obtained at any time during the evaluation |
---|
3190 | process by calling the {\sf getADValue}-method |
---|
3191 | \begin{center} |
---|
3192 | \begin{tabular}{l} |
---|
3193 | {\sf adouble y;}\\ |
---|
3194 | ...\\ |
---|
3195 | {\sf cout $<<$ y.getADValue();} |
---|
3196 | \end{tabular} |
---|
3197 | \end{center} |
---|
3198 | \autoref{fig:modcode} shows the resulting source code incorporating |
---|
3199 | all required changes for the example |
---|
3200 | given above. |
---|
3201 | |
---|
3202 | \begin{figure}[htb] |
---|
3203 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3204 | %\begin{center} |
---|
3205 | \begin{tabbing} |
---|
3206 | \hspace*{-1cm} \= \kill |
---|
3207 | \> {\sf \#include $<$iostream$>$}\\ |
---|
3208 | \> {\sf using namespace std;}\\ |
---|
3209 | \> \\ |
---|
3210 | \> {\sf \#define ADOLC\_TAPELESS}\\ |
---|
3211 | \> {\sf \#include $<$adouble.h$>$}\\ |
---|
3212 | \> {\sf typedef adtl::adouble adouble;}\\ |
---|
3213 | \\ |
---|
3214 | \> {\sf int main() \{}\\ |
---|
3215 | \> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\ |
---|
3216 | \\ |
---|
3217 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\ |
---|
3218 | \> {\sf \rule{1cm}{0pt}...}\\ |
---|
3219 | \\ |
---|
3220 | \> {\sf \rule{0.5cm}{0pt}x[0].setADValue(1);\hspace*{3cm}// derivative of f with respect to $x_1$}\\ |
---|
3221 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3222 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3223 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3224 | \\ |
---|
3225 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\ |
---|
3226 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ y[1].getADValue() $<<$ endl;}\\ |
---|
3227 | \> {\sf \rule{0.5cm}{0pt}return 0;}\\ |
---|
3228 | \> {\sf \}} |
---|
3229 | \end{tabbing} |
---|
3230 | %\end{center} |
---|
3231 | }} |
---|
3232 | \caption{Example for tapeless scalar forward mode} |
---|
3233 | \label{fig:modcode} |
---|
3234 | \end{figure} |
---|
3235 | % |
---|
3236 | \subsubsection*{The vector mode} |
---|
3237 | % |
---|
3238 | In scalar mode only one direction element has to be stored per {\sf |
---|
3239 | adouble} whereas a field of $p$ elements is needed in the vector |
---|
3240 | mode to cover the computations for the given $p$ directions. The |
---|
3241 | resulting changes to the source code are described in this section. |
---|
3242 | |
---|
3243 | Similar to tapeless scalar forward mode, the tapeless vector forward |
---|
3244 | mode is used by defining {\sf ADOLC\_TAPELESS}. Furthermore, one has to define |
---|
3245 | an additional preprocessor macro named {\sf NUMBER\_DIRECTIONS}. This |
---|
3246 | macro takes the maximal number of directions to be used within the |
---|
3247 | resulting vector mode. Just as {\sf ADOLC\_TAPELESS} the new macro |
---|
3248 | must be defined before including the \verb#<adolc.h/adouble.h># |
---|
3249 | header file since it is ignored otherwise. |
---|
3250 | |
---|
3251 | In many situations recompiling the source code to get a new number of |
---|
3252 | directions is at least undesirable. ADOL-C offers a function named |
---|
3253 | {\sf setNumDir} to work around this problem partially. Calling this |
---|
3254 | function, ADOL-C does not take the number of directions |
---|
3255 | from the macro {\sf NUMBER\_DIRECTIONS} but from the argument of |
---|
3256 | {\sf setNumDir}. A corresponding source code would contain the following lines: |
---|
3257 | \begin{center} |
---|
3258 | \begin{tabular}{l} |
---|
3259 | {\sf \#define NUMBER\_DIRECTIONS 10}\\ |
---|
3260 | ...\\ |
---|
3261 | {\sf adtl::setNumDir(5);} |
---|
3262 | \end{tabular} |
---|
3263 | \end{center} |
---|
3264 | Note that using this function does not |
---|
3265 | change memory requirements that can be roughly determined by |
---|
3266 | ({\sf NUMBER\_DIRECTIONS}$+1$)*(number of {\sf adouble}s). |
---|
3267 | |
---|
3268 | Compared to the scalar case setting and getting the derivative |
---|
3269 | values, i.e. the directions, is more involved. Instead of |
---|
3270 | working with single {\sf double} values, pointer to fields of {\sf |
---|
3271 | double}s are used as illustrated by the following example: |
---|
3272 | \begin{center} |
---|
3273 | \begin{tabular}{l} |
---|
3274 | {\sf \#define NUMBER\_DIRECTIONS 10}\\ |
---|
3275 | ...\\ |
---|
3276 | {\sf adouble x, y;}\\ |
---|
3277 | {\sf double *ptr=new double[NUMBER\_DIRECTIONS];}\\ |
---|
3278 | ...\\ |
---|
3279 | {\sf x1=2;}\\ |
---|
3280 | {\sf x1.setADValue(ptr);}\\ |
---|
3281 | ...\\ |
---|
3282 | {\sf ptr=y.getADValue();} |
---|
3283 | \end{tabular} |
---|
3284 | \end{center} |
---|
3285 | Additionally, the tapeless vector forward mode of ADOL-C offers two |
---|
3286 | new methods for setting/getting the derivative values. Similar |
---|
3287 | to the scalar case, {\sf double} values are used but due to the vector |
---|
3288 | mode the position of the desired vector element must be supplied in |
---|
3289 | the argument list: |
---|
3290 | \begin{center} |
---|
3291 | \begin{tabular}{l} |
---|
3292 | {\sf \#define NUMBER\_DIRECTIONS 10}\\ |
---|
3293 | ...\\ |
---|
3294 | {\sf adouble x, y;}\\ |
---|
3295 | ...\\ |
---|
3296 | {\sf x1=2;}\\ |
---|
3297 | {\sf x1.setADValue(5,1);\hspace*{3.7cm}// set the 6th point value of x to 1.0}\\ |
---|
3298 | ...\\ |
---|
3299 | {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y} |
---|
3300 | \end{tabular} |
---|
3301 | \end{center} |
---|
3302 | The resulting source code containing all changes that are required is |
---|
3303 | shown in \autoref{fig:modcode2} |
---|
3304 | |
---|
3305 | \begin{figure}[!h!t!b] |
---|
3306 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3307 | \begin{tabbing} |
---|
3308 | \hspace*{-1cm} \= \kill |
---|
3309 | \> {\sf \#include $<$iostream$>$}\\ |
---|
3310 | \> {\sf using namespace std;}\\ |
---|
3311 | \\ |
---|
3312 | \> {\sf \#define ADOLC\_TAPELESS}\\ |
---|
3313 | \> {\sf \#define NUMBER\_DIRECTIONS 3}\\ |
---|
3314 | \> {\sf \#include $<$adouble.h$>$}\\ |
---|
3315 | \> {\sf typedef adtl::adouble adouble;}\\ |
---|
3316 | \\ |
---|
3317 | \> {\sf ADOLC\_TAPELESS\_UNIQUE\_INTERNALS;}\\ |
---|
3318 | \\ |
---|
3319 | \> {\sf int main() \{}\\ |
---|
3320 | \> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\ |
---|
3321 | \\ |
---|
3322 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\ |
---|
3323 | \> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\ |
---|
3324 | \> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\ |
---|
3325 | \> {\sf \rule{0.5cm}{0pt}\}}\\ |
---|
3326 | \\ |
---|
3327 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3328 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3329 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3330 | \\ |
---|
3331 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\ |
---|
3332 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\ |
---|
3333 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\ |
---|
3334 | \> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\ |
---|
3335 | \> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ " ";}\\ |
---|
3336 | \> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\ |
---|
3337 | \> {\sf \rule{0.5cm}{0pt}\}}\\ |
---|
3338 | \> {\sf \rule{0.5cm}{0pt}return 0;}\\ |
---|
3339 | \> {\sf \}} |
---|
3340 | \end{tabbing} |
---|
3341 | }} |
---|
3342 | \caption{Example for tapeless vector forward mode} |
---|
3343 | \label{fig:modcode2} |
---|
3344 | \end{figure} |
---|
3345 | % |
---|
3346 | \subsection{Compiling and Linking the Source Code} |
---|
3347 | % |
---|
3348 | After incorporating the required changes, one has to compile the |
---|
3349 | source code and link the object files to get the executable. |
---|
3350 | As long as the ADOL-C header files are not included in the absolute path |
---|
3351 | the compile sequence should be similar to the following example: |
---|
3352 | \begin{center} |
---|
3353 | \begin{tabular}{l} |
---|
3354 | {\sf g++ -I/home/username/adolc\_base/include -c tapeless\_scalar.cpp} |
---|
3355 | \end{tabular} |
---|
3356 | \end{center} |
---|
3357 | The \verb#-I# option tells the compiler where to search for the ADOL-C |
---|
3358 | header files. This option can be omitted when the headers are included |
---|
3359 | with absolute path or if ADOL-C is installed in a ``global'' directory. |
---|
3360 | |
---|
3361 | Since the tapeless forward version of ADOL-C is implemented in the |
---|
3362 | header \verb#adouble.h# as complete inline intended version, |
---|
3363 | the object files do not need to be linked against any external ADOL-C |
---|
3364 | code or the ADOL-C library. Therefore, the example started above could be finished with the |
---|
3365 | following command: |
---|
3366 | \begin{center} |
---|
3367 | \begin{tabular}{l} |
---|
3368 | {\sf g++ -o tapeless\_scalar tapeless\_scalar.o} |
---|
3369 | \end{tabular} |
---|
3370 | \end{center} |
---|
3371 | The mentioned source codes {\sf tapeless\_scalar.c} and {\sf tapeless\_vector.c} |
---|
3372 | illustrating the use of the for tapeless scalar and vector mode can be found in |
---|
3373 | the directory {\sf examples}. |
---|
3374 | % |
---|
3375 | \subsection{Concluding Remarks for the Tapeless Forward Mode Variant} |
---|
3376 | % |
---|
3377 | As many other AD methods the tapeless forward mode provided by the |
---|
3378 | ADOL-C package has its own strengths and drawbacks. Please read the |
---|
3379 | following section carefully to become familiar with the things that |
---|
3380 | can occur: |
---|
3381 | \begin{itemize} |
---|
3382 | \item Advantages: |
---|
3383 | \begin{itemize} |
---|
3384 | \item Code speed\\ |
---|
3385 | Increasing computation speed was one of the main aspects in writing |
---|
3386 | the tapeless code. In many cases higher performance can be |
---|
3387 | expected this way. |
---|
3388 | \item Easier linking process\\ |
---|
3389 | As another result from the code inlining the object code does |
---|
3390 | not need to be linked against an ADOL-C library. |
---|
3391 | \item Smaller overall memory requirements\\ |
---|
3392 | Tapeless ADOL-C does not write tapes anymore, as the name |
---|
3393 | implies. Loop ''unrolling'' can be avoided this |
---|
3394 | way. Considered main memory plus disk space as overall memory |
---|
3395 | requirements the tapeless version can be |
---|
3396 | executed in a more efficient way. |
---|
3397 | \end{itemize} |
---|
3398 | \item Drawbacks: |
---|
3399 | \begin{itemize} |
---|
3400 | \item Main memory limitations\\ |
---|
3401 | The ability to compute derivatives to a given function is |
---|
3402 | bounded by the main memory plus swap size when using |
---|
3403 | tapeless ADOL-C. Computation from swap should be avoided anyway |
---|
3404 | as far as possible since it slows down the computing time |
---|
3405 | drastically. Therefore, if the program execution is |
---|
3406 | terminated without error message insufficient memory size can be |
---|
3407 | the reason among other things. The memory requirements $M$ can |
---|
3408 | be determined roughly as followed: |
---|
3409 | \begin{itemize} |
---|
3410 | \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$ |
---|
3411 | \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf |
---|
3412 | NUMBER\_DIRECTIONS}$+1) + M_p$ |
---|
3413 | \end{itemize} |
---|
3414 | where the storage size of all non {\sf adouble} based variables is described by $M_p$. |
---|
3415 | \item Compile time\\ |
---|
3416 | As discussed in the previous sections, the tapeless forward mode of |
---|
3417 | the ADOL-C package is implemented as inline intended version. Using |
---|
3418 | this approach results in a higher source code size, since every |
---|
3419 | operation involving at least one {\sf adouble} stands for the |
---|
3420 | operation itself as well as for the corresponding derivative |
---|
3421 | code after the inlining process. Therefore, the compilation time |
---|
3422 | needed for the tapeless version may be higher than that of the tape based code. |
---|
3423 | \item Code Size\\ |
---|
3424 | A second drawback and result of the code inlining is the |
---|
3425 | increase of code sizes for the binaries. The increase |
---|
3426 | factor compared to the corresponding tape based program is |
---|
3427 | difficult to quantify as it is task dependent. Practical results |
---|
3428 | have shown that values between 1.0 and 2.0 can be |
---|
3429 | expected. Factors higher than 2.0 are possible too and even |
---|
3430 | values below 1.0 have been observed. |
---|
3431 | \end{itemize} |
---|
3432 | \end{itemize} |
---|
3433 | % |
---|
3434 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
3435 | \section{Installing and Using ADOL-C} |
---|
3436 | \label{install} |
---|
3437 | % |
---|
3438 | \subsection{Generating the ADOL-C Library} |
---|
3439 | \label{genlib} |
---|
3440 | % |
---|
3441 | The currently built system is best summarized by the ubiquitous gnu |
---|
3442 | install triplet |
---|
3443 | \begin{center} |
---|
3444 | \verb=configure - make - make install= . |
---|
3445 | \end{center} |
---|
3446 | Executing this three steps from the package base directory |
---|
3447 | \verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic |
---|
3448 | ADOL-C library with default options and install the package (libraries |
---|
3449 | and headers) into the default installation directory {\tt |
---|
3450 | \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory |
---|
3451 | the subdirectory \verb=include= will contain all the installed header |
---|
3452 | files that may be included by the user program, the subdirectory |
---|
3453 | \verb=lib= will contain the 32-bit compiled library |
---|
3454 | and the subdirectory \verb=lib64= will contain the 64-bit compiled |
---|
3455 | library. Depending on the compiler only one of \verb=lib= or |
---|
3456 | \verb=lib64= may be created. |
---|
3457 | |
---|
3458 | Before doing so the user may modify the header file \verb=usrparms.h= |
---|
3459 | in order to tailor the \mbox{ADOL-C} package to the needs in the |
---|
3460 | particular system environment as discussed in |
---|
3461 | \autoref{Customizing}. The configure procedure which creates the necessary |
---|
3462 | \verb=Makefile=s can be customized by use of some switches. Available |
---|
3463 | options and their meaning can be obtained by executing |
---|
3464 | \verb=./configure --help= from the package base directory. |
---|
3465 | |
---|
3466 | All object files and other intermediately generated files can be |
---|
3467 | removed by the call \verb=make clean=. Uninstalling ADOL-C by |
---|
3468 | executing \verb=make uninstall= is only reasonable after a previous |
---|
3469 | called \verb=make install= and will remove all installed package files |
---|
3470 | but will leave the created directories behind. |
---|
3471 | |
---|
3472 | The sparse drivers are included in the ADOL-C libraries if the |
---|
3473 | \verb=./configure= command is executed with the option |
---|
3474 | \verb=--enable-sparse=. The ColPack library available at |
---|
3475 | \verb=http://www.cscapes.org/coloringpage/software.htm= is required to |
---|
3476 | compute the sparse structures, and is searched for in all the default |
---|
3477 | locations as well as in the subdirectory \verb=<ThirdParty/ColPack/>=. |
---|
3478 | In case the library and its headers are installed in a nonstandard path |
---|
3479 | this may be specified with the \verb?--with-colpack=PATH? option. |
---|
3480 | It is assumed that the library and its header files have the following |
---|
3481 | directory structure: \verb?PATH/include? contains all the header |
---|
3482 | files, |
---|
3483 | \verb?PATH/lib? contains the 32-bit compiled library and |
---|
3484 | \verb?PATH/lib64? contains the 64-bit compiled library. Depending on |
---|
3485 | the compiler used to compile {\sf ADOL-C} one of these libraries will |
---|
3486 | be used for linking. |
---|
3487 | |
---|
3488 | The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default |
---|
3489 | constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see |
---|
3490 | \autoref{WarSug}. |
---|
3491 | |
---|
3492 | |
---|
3493 | \subsection{Compiling and Linking the Example Programs} |
---|
3494 | % |
---|
3495 | The installation procedure described in \autoref{genlib} also |
---|
3496 | provides the \verb=Makefile=s to compile the example programs in the |
---|
3497 | directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the |
---|
3498 | additional examples in |
---|
3499 | \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However, |
---|
3500 | one has to execute the |
---|
3501 | \verb=configure= command with appropriate options for the ADOL-C package to enable the compilation of |
---|
3502 | examples. Available options are: |
---|
3503 | \begin{center} |
---|
3504 | \begin{tabular}[t]{ll} |
---|
3505 | \verb=--enable-docexa=&build all examples discussed in this manual\\ |
---|
3506 | &(compare \autoref{example})\\ |
---|
3507 | \verb=--enable-addexa=&build all additional examples\\ |
---|
3508 | &(See file \verb=README= in the various subdirectories) |
---|
3509 | \end{tabular} |
---|
3510 | \end{center} |
---|
3511 | |
---|
3512 | Just calling \verb=make= from the packages base directory generates |
---|
3513 | all configured examples and the library if necessary. Compiling from |
---|
3514 | subdirectory \verb=examples= or one of its subfolders is possible |
---|
3515 | too. At least one kind of the ADOL-C library (static or shared) must |
---|
3516 | have been built previously in that case. Hence, building the library |
---|
3517 | is always the first step. |
---|
3518 | |
---|
3519 | For Compiling the library and the documented examples on Windows using |
---|
3520 | Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in |
---|
3521 | the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and |
---|
3522 | \verb=<ADOL-C/examples/>= subdirectories. |
---|
3523 | % |
---|
3524 | \subsection{Description of Important Header Files} |
---|
3525 | \label{ssec:DesIH} |
---|
3526 | % |
---|
3527 | The application of the facilities of ADOL-C requires the user |
---|
3528 | source code (program or module) to include appropriate |
---|
3529 | header files where the desired data types and routines are |
---|
3530 | prototyped. The new hierarchy of header files enables the user |
---|
3531 | to take one of two possible ways to access the right interfaces. |
---|
3532 | The first and easy way is recommended to beginners: As indicated in |
---|
3533 | \autoref{globalHeaders} the provided {\em global} header file |
---|
3534 | \verb=<adolc/adolc.h>= can be included by any user code to support all |
---|
3535 | capabilities of ADOL-C depending on the particular programming language |
---|
3536 | of the source. |
---|
3537 | |
---|
3538 | \begin{table}[h] |
---|
3539 | \center \small |
---|
3540 | \begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline |
---|
3541 | \verb=<adolc/adolc.h>= & |
---|
3542 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3543 | \boldmath $\rightarrow$ \unboldmath |
---|
3544 | & global header file available for easy use of ADOL-C; \\ |
---|
3545 | $\bullet$ & includes all ADOL-C header files depending on |
---|
3546 | whether the users source is C++ or C code. |
---|
3547 | \end{tabular*} |
---|
3548 | \\ \hline |
---|
3549 | \verb=<adolc/usrparms.h>= & |
---|
3550 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3551 | \boldmath $\rightarrow$ \unboldmath |
---|
3552 | & user customization of ADOL-C package (see |
---|
3553 | \autoref{Customizing}); \\ |
---|
3554 | $\bullet$ & after a change of |
---|
3555 | user options the ADOL-C library \verb=libadolc.*= |
---|
3556 | has to be rebuilt (see \autoref{genlib}); \\ |
---|
3557 | $\bullet$ & is included by all ADOL-C header files and thus by all user |
---|
3558 | programs. |
---|
3559 | \end{tabular*} \\ \hline |
---|
3560 | \end{tabular} |
---|
3561 | \caption{Global header files} |
---|
3562 | \label{globalHeaders} |
---|
3563 | \end{table} |
---|
3564 | |
---|
3565 | The second way is meant for the more advanced ADOL-C user: Some source code |
---|
3566 | includes only those interfaces used by the particular application. |
---|
3567 | The respectively needed header files are indicated |
---|
3568 | throughout the manual. |
---|
3569 | Existing application determined dependences between the provided |
---|
3570 | ADOL-C routines are realized by automatic includes of headers in order |
---|
3571 | to maintain easy use. The header files important to the user are described |
---|
3572 | in the \autoref{importantHeaders1} and \autoref{importantHeaders2}. |
---|
3573 | |
---|
3574 | \begin{table}[h] |
---|
3575 | \center \small |
---|
3576 | \begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline |
---|
3577 | %\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline |
---|
3578 | \verb=<adolc/adouble.h>= & |
---|
3579 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3580 | \boldmath $\rightarrow$ \unboldmath |
---|
3581 | & provides the interface to the basic active |
---|
3582 | scalar data type of ADOL-C: {\sf class adouble} |
---|
3583 | (see \autoref{prepar}); |
---|
3584 | % $\bullet$ & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=. |
---|
3585 | \end{tabular*} |
---|
3586 | \\ \hline |
---|
3587 | % \verb=<adolc/avector.h>= & |
---|
3588 | %\begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3589 | % \boldmath $\rightarrow$ \unboldmath |
---|
3590 | % & provides the interface to the active vector |
---|
3591 | % and matrix data types of ADOL-C: {\sf class adoublev} |
---|
3592 | % and {\sf class adoublem}, respectively |
---|
3593 | % (see \autoref{arrays}); \\ |
---|
3594 | % $\bullet$ & is included by the header \verb=<adolc/adouble.h>=. |
---|
3595 | %\end{tabular*} |
---|
3596 | %\\ \hline |
---|
3597 | \verb=<adolc/taputil.h>= & |
---|
3598 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3599 | \boldmath $\rightarrow$ \unboldmath |
---|
3600 | & provides functions to start/stop the tracing of |
---|
3601 | active sections (see \autoref{markingActive}) |
---|
3602 | as well as utilities to obtain |
---|
3603 | tape statistics (see \autoref{examiningTape}); \\ |
---|
3604 | $\bullet$ & is included by the header \verb=<adolc/adouble.h>=. |
---|
3605 | \end{tabular*} |
---|
3606 | \\ \hline |
---|
3607 | \end{tabular} |
---|
3608 | \caption{Important header files: tracing/taping} |
---|
3609 | \label{importantHeaders1} |
---|
3610 | \end{table} |
---|
3611 | % |
---|
3612 | \begin{table}[h] |
---|
3613 | \center \small |
---|
3614 | \begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline |
---|
3615 | %\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline |
---|
3616 | \verb=<adolc/interfaces.h>= & |
---|
3617 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3618 | \boldmath $\rightarrow$ \unboldmath |
---|
3619 | & provides interfaces to the {\sf forward} and |
---|
3620 | {\sf reverse} routines as basic versions of derivative |
---|
3621 | evaluation (see \autoref{forw_rev}); \\ |
---|
3622 | $\bullet$ & comprises C++, C, and Fortran-callable versions; \\ |
---|
3623 | $\bullet$ & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\ |
---|
3624 | $\bullet$ & is included by the header \verb=<adolc/drivers/odedrivers.h>=. |
---|
3625 | \end{tabular*} |
---|
3626 | \\ \hline |
---|
3627 | \verb=<adolc/drivers.h>= & |
---|
3628 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3629 | \boldmath $\rightarrow$ \unboldmath |
---|
3630 | & provides ``easy to use'' drivers for solving |
---|
3631 | optimization problems and nonlinear equations |
---|
3632 | (see \autoref{optdrivers}); \\ |
---|
3633 | $\bullet$ & comprises C and Fortran-callable versions. |
---|
3634 | \end{tabular*} |
---|
3635 | \\ \hline |
---|
3636 | \begin{minipage}{3cm} |
---|
3637 | \verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>= |
---|
3638 | \end{minipage} & |
---|
3639 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3640 | \boldmath $\rightarrow$ \unboldmath |
---|
3641 | & provides the ``easy to use'' sparse drivers |
---|
3642 | to exploit the sparsity structure of |
---|
3643 | Jacobians (see \autoref{sparse}); \\ |
---|
3644 | \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions |
---|
3645 | of {\sf forward} and {\sf reverse} routines |
---|
3646 | propagating bit patterns (see \autoref{ProBit}); \\ |
---|
3647 | |
---|
3648 | $\bullet$ & is included by the header \verb=<adolc/interfaces.h>=. |
---|
3649 | \end{tabular*} |
---|
3650 | \\ \hline |
---|
3651 | \begin{minipage}{3cm} |
---|
3652 | \verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>= |
---|
3653 | \end{minipage} & |
---|
3654 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3655 | \boldmath $\rightarrow$ \unboldmath |
---|
3656 | & provides interfaces to the underlying C-callable |
---|
3657 | versions of {\sf forward} and {\sf reverse} routines |
---|
3658 | propagating bit patterns. |
---|
3659 | \end{tabular*} |
---|
3660 | \\ \hline |
---|
3661 | \begin{minipage}{3cm} |
---|
3662 | \verb=<adolc/drivers/=\newline\verb= odedrivers.h>= |
---|
3663 | \end{minipage} & |
---|
3664 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3665 | \boldmath $\rightarrow$ \unboldmath |
---|
3666 | & provides ``easy to use'' drivers for numerical |
---|
3667 | solution of ordinary differential equations |
---|
3668 | (see \autoref{odedrivers}); \\ |
---|
3669 | $\bullet$ & comprises C++, C, and Fortran-callable versions; \\ |
---|
3670 | $\bullet$ & includes the header \verb=<adolc/interfaces.h>=. |
---|
3671 | \end{tabular*} |
---|
3672 | \\ \hline |
---|
3673 | \begin{minipage}{3cm} |
---|
3674 | \verb=<adolc/drivers/=\newline\verb= taylor.h>= |
---|
3675 | \end{minipage} & |
---|
3676 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3677 | \boldmath $\rightarrow$ \unboldmath |
---|
3678 | & provides ``easy to use'' drivers for evaluation |
---|
3679 | of higher order derivative tensors (see |
---|
3680 | \autoref{higherOrderDeriv}) and inverse/implicit function |
---|
3681 | differentiation (see \autoref{implicitInverse});\\ |
---|
3682 | $\bullet$ & comprises C++ and C-callable versions. |
---|
3683 | \end{tabular*} |
---|
3684 | \\ \hline |
---|
3685 | \verb=<adolc/adalloc.h>= & |
---|
3686 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
3687 | \boldmath $\rightarrow$ \unboldmath |
---|
3688 | & provides C++ and C functions for allocation of |
---|
3689 | vectors, matrices and three dimensional arrays |
---|
3690 | of {\sf double}s. |
---|
3691 | \end{tabular*} |
---|
3692 | \\ \hline |
---|
3693 | \end{tabular} |
---|
3694 | \caption{Important header files: evaluation of derivatives} |
---|
3695 | \label{importantHeaders2} |
---|
3696 | \end{table} |
---|
3697 | % |
---|
3698 | \subsection{Compiling and Linking C/C++ Programs} |
---|
3699 | % |
---|
3700 | To compile a C/C++ program or single module using ADOL-C |
---|
3701 | data types and routines one has to ensure that all necessary |
---|
3702 | header files according to \autoref{ssec:DesIH} are |
---|
3703 | included. All modules involving {\em active} data types as |
---|
3704 | {\sf adouble} |
---|
3705 | %, {\bf adoublev} and {\bf adoublem} |
---|
3706 | have to be compiled as C++. Modules that make use of a previously |
---|
3707 | generated tape to evaluate derivatives can either be programmed in ANSI-C |
---|
3708 | (while avoiding all C++ interfaces) or in C++. Depending |
---|
3709 | on the chosen programming language the header files provide |
---|
3710 | the right ADOL-C prototypes. |
---|
3711 | For linking the resulting object codes the library \verb=libadolc.*= |
---|
3712 | must be used (see \autoref{genlib}). |
---|
3713 | % |
---|
3714 | \subsection{Adding Quadratures as Special Functions} |
---|
3715 | % |
---|
3716 | \label{quadrat} |
---|
3717 | % |
---|
3718 | Suppose an integral |
---|
3719 | \[ f(x) = \int\limits^{x}_{0} g(t) dt \] |
---|
3720 | is evaluated numerically by a user-supplied function |
---|
3721 | \begin{center} |
---|
3722 | {\sf double myintegral(double\& x);} |
---|
3723 | \end{center} |
---|
3724 | Similarly, let us suppose that the integrand itself is evaluated by |
---|
3725 | a user-supplied block of C code {\sf integrand}, which computes a |
---|
3726 | variable with the fixed name {\sf val} from a variable with the fixed |
---|
3727 | name {\sf arg}. In many cases of interest, {\sf integrand} will |
---|
3728 | simply be of the form |
---|
3729 | \begin{center} |
---|
3730 | {\sf \{ val = expression(arg) \}}\enspace . |
---|
3731 | \end{center} |
---|
3732 | In general, the final assignment to {\sf val} may be preceded |
---|
3733 | by several intermediate calculations, possibly involving local |
---|
3734 | active variables of type {\sf adouble}, but no external or static |
---|
3735 | variables of that type. However, {\sf integrand} may involve local |
---|
3736 | or global variables of type {\sf double} or {\sf int}, provided they |
---|
3737 | do not depend on the value of {\sf arg}. The variables {\sf arg} and |
---|
3738 | {\sf val} are declared automatically; and as {\sf integrand} is a block |
---|
3739 | rather than a function, {\sf integrand} should have no header line. |
---|
3740 | |
---|
3741 | Now the function {\sf myintegral} can be overloaded for {\sf adouble} |
---|
3742 | arguments and thus included in the library of elementary functions |
---|
3743 | by the following modifications: |
---|
3744 | \begin{enumerate} |
---|
3745 | \item |
---|
3746 | At the end of the file \verb=<adouble.cpp>=, include the full code |
---|
3747 | defining \\ {\sf double myintegral(double\& x)}, and add the line |
---|
3748 | \begin{center} |
---|
3749 | {\sf extend\_quad(myintegral, integrand); } |
---|
3750 | \end{center} |
---|
3751 | This macro is extended to the definition of |
---|
3752 | {\sf adouble myintegral(adouble\& arg)}. |
---|
3753 | Then remake the library \verb=libadolc.*= (see \autoref{genlib}). |
---|
3754 | \item |
---|
3755 | In the definition of the class |
---|
3756 | {\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement |
---|
3757 | \begin{center} |
---|
3758 | {\sf friend adouble myintegral(adouble\&)}. |
---|
3759 | \end{center} |
---|
3760 | \end{enumerate} |
---|
3761 | In the first modification, {\sf myintegral} represents the name of the |
---|
3762 | {\sf double} function, whereas {\sf integrand} represents the actual block |
---|
3763 | of C code. |
---|
3764 | |
---|
3765 | For example, in case of the inverse hyperbolic cosine, we have |
---|
3766 | {\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as |
---|
3767 | {\sf \{ val = sqrt(arg*arg-1); \}} |
---|
3768 | so that the line |
---|
3769 | \begin{center} |
---|
3770 | {\sf extend\_quad(acosh,val = sqrt(arg*arg-1));} |
---|
3771 | \end{center} |
---|
3772 | can be added to the file \verb=<adouble.cpp>=. |
---|
3773 | A mathematically equivalent but longer representation of |
---|
3774 | {\sf integrand} is |
---|
3775 | \begin{center} |
---|
3776 | \begin{tabbing} |
---|
3777 | {\sf \{ }\hspace{1.0in}\= {\sf \{ adouble} \= temp = \kill |
---|
3778 | \>{\sf \{ adouble} \> {\sf temp = arg;} \\ |
---|
3779 | \> \ \> {\sf temp = temp*temp; } \\ |
---|
3780 | \> \ \> {\sf val = sqrt(temp-1); \}} |
---|
3781 | \end{tabbing} |
---|
3782 | \end{center} |
---|
3783 | The code block {\sf integrand} may call on any elementary function that has already |
---|
3784 | been defined in file \verb=<adouble.cpp>=, so that one may also introduce |
---|
3785 | iterated integrals. |
---|
3786 | % |
---|
3787 | % |
---|
3788 | \section{Example Codes} |
---|
3789 | \label{example} |
---|
3790 | % |
---|
3791 | The following listings are all simplified versions of codes that |
---|
3792 | are contained in the example subdirectory |
---|
3793 | \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular, |
---|
3794 | we have left out timings, which are included in the complete codes. |
---|
3795 | % |
---|
3796 | \subsection{Speelpenning's Example ({\tt speelpenning.cpp})} |
---|
3797 | % |
---|
3798 | The first example evaluates the gradient and the Hessian of |
---|
3799 | the function |
---|
3800 | \[ |
---|
3801 | y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i |
---|
3802 | \] |
---|
3803 | using the appropriate drivers {\sf gradient} and {\sf hessian}. |
---|
3804 | |
---|
3805 | \begin{verbatim} |
---|
3806 | #include <adolc/adouble.h> // use of active doubles and taping |
---|
3807 | #include <adolc/drivers/drivers.h> // use of "Easy to Use" drivers |
---|
3808 | // gradient(.) and hessian(.) |
---|
3809 | #include <adolc/taping.h> // use of taping |
---|
3810 | ... |
---|
3811 | void main() { |
---|
3812 | int n,i,j; |
---|
3813 | size_t tape_stats[STAT_SIZE]; |
---|
3814 | cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n"; |
---|
3815 | cout << "number of independent variables = ? \n"; |
---|
3816 | cin >> n; |
---|
3817 | double* xp = new double[n]; |
---|
3818 | double yp = 0.0; |
---|
3819 | adouble* x = new adouble[n]; |
---|
3820 | adouble y = 1; |
---|
3821 | for(i=0;i<n;i++) |
---|
3822 | xp[i] = (i+1.0)/(2.0+i); // some initialization |
---|
3823 | trace_on(1); // tag =1, keep=0 by default |
---|
3824 | for(i=0;i<n;i++) { |
---|
3825 | x[i] <<= xp[i]; y *= x[i]; } |
---|
3826 | y >>= yp; |
---|
3827 | delete[] x; |
---|
3828 | trace_off(); |
---|
3829 | tapestats(1,tape_stats); // reading of tape statistics |
---|
3830 | cout<<"maxlive "<<tape_stats[2]<<"\n"; |
---|
3831 | ... // ..... print other tape stats |
---|
3832 | double* g = new double[n]; |
---|
3833 | gradient(1,n,xp,g); // gradient evaluation |
---|
3834 | double** H=(double**)malloc(n*sizeof(double*)); |
---|
3835 | for(i=0;i<n;i++) |
---|
3836 | H[i]=(double*)malloc((i+1)*sizeof(double)); |
---|
3837 | hessian(1,n,xp,H); // H equals (n-1)g since g is |
---|
3838 | double errg = 0; // homogeneous of degree n-1. |
---|
3839 | double errh = 0; |
---|
3840 | for(i=0;i<n;i++) |
---|
3841 | errg += fabs(g[i]-yp/xp[i]); // vanishes analytically. |
---|
3842 | for(i=0;i<n;i++) { |
---|
3843 | for(j=0;j<n;j++) { |
---|
3844 | if (i>j) // lower half of hessian |
---|
3845 | errh += fabs(H[i][j]-g[i]/xp[j]); } } |
---|
3846 | cout << yp-1/(1.0+n) << " error in function \n"; |
---|
3847 | cout << errg <<" error in gradient \n"; |
---|
3848 | cout << errh <<" consistency check \n"; |
---|
3849 | } // end main |
---|
3850 | \end{verbatim} |
---|
3851 | % |
---|
3852 | \subsection{Power Example ({\tt powexam.cpp})} |
---|
3853 | % |
---|
3854 | The second example function evaluates the $n$-th power of a real |
---|
3855 | variable $x$ in |
---|
3856 | $\log_2 n$ multiplications by recursive halving of the exponent. Since |
---|
3857 | there is only one independent variable, the scalar derivative can be |
---|
3858 | computed by |
---|
3859 | using both {\sf forward} and {\sf reverse}, and the |
---|
3860 | results are subsequently compared. |
---|
3861 | \begin{verbatim} |
---|
3862 | #include <adolc/adolc.h> // use of ALL ADOL-C interfaces |
---|
3863 | |
---|
3864 | adouble power(adouble x, int n) { |
---|
3865 | adouble z=1; |
---|
3866 | if (n>0) { // recursion and branches |
---|
3867 | int nh =n/2; // that do not depend on |
---|
3868 | z = power(x,nh); // adoubles are fine !!!! |
---|
3869 | z *= z; |
---|
3870 | if (2*nh != n) |
---|
3871 | z *= x; |
---|
3872 | return z; } // end if |
---|
3873 | else { |
---|
3874 | if (n==0) // the local adouble z dies |
---|
3875 | return z; // as it goes out of scope. |
---|
3876 | else |
---|
3877 | return 1/power(x,-n); } // end else |
---|
3878 | } // end power |
---|
3879 | \end{verbatim} |
---|
3880 | The function {\sf power} above was obtained from the original |
---|
3881 | undifferentiated version by simply changing the type of all |
---|
3882 | {\sf double}s including the return variable to {\sf adouble}s. The new version |
---|
3883 | can now be called from within any active section, as in the following |
---|
3884 | main program. |
---|
3885 | \begin{verbatim} |
---|
3886 | #include ... // as above |
---|
3887 | int main() { |
---|
3888 | int i,n,tag=1; |
---|
3889 | cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n"; |
---|
3890 | cout<<"monomial degree=? \n"; // input the desired degree |
---|
3891 | cin >> n; |
---|
3892 | // allocations and initializations |
---|
3893 | double* Y[1]; |
---|
3894 | *Y = new double[n+2]; |
---|
3895 | double* X[1]; // allocate passive variables with |
---|
3896 | *X = new double[n+4]; // extra dimension for derivatives |
---|
3897 | X[0][0] = 0.5; // function value = 0. coefficient |
---|
3898 | X[0][1] = 1.0; // first derivative = 1. coefficient |
---|
3899 | for(i=0;i<n+2;i++) |
---|
3900 | X[0][i+2]=0; // further coefficients |
---|
3901 | double* Z[1]; // used for checking consistency |
---|
3902 | *Z = new double[n+2]; // between forward and reverse |
---|
3903 | adouble y,x; // declare active variables |
---|
3904 | // beginning of active section |
---|
3905 | trace_on(1); // tag = 1 and keep = 0 |
---|
3906 | x <<= X[0][0]; // only one independent var |
---|
3907 | y = power(x,n); // actual function call |
---|
3908 | y >>= Y[0][0]; // only one dependent adouble |
---|
3909 | trace_off(); // no global adouble has died |
---|
3910 | // end of active section |
---|
3911 | double u[1]; // weighting vector |
---|
3912 | u[0]=1; // for reverse call |
---|
3913 | for(i=0;i<n+2;i++) { // note that keep = i+1 in call |
---|
3914 | forward(tag,1,1,i,i+1,X,Y); // evaluate the i-the derivative |
---|
3915 | if (i==0) |
---|
3916 | cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value() |
---|
3917 | << " (should be 0)\n"; |
---|
3918 | else |
---|
3919 | cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i] |
---|
3920 | << " (should be 0)\n"; |
---|
3921 | reverse(tag,1,1,i,u,Z); // evaluate the (i+1)-st derivative |
---|
3922 | Z[0][i+1]=Z[0][i]/(i+1); } // scale derivative to Taylorcoeff. |
---|
3923 | return 1; |
---|
3924 | } // end main |
---|
3925 | \end{verbatim} |
---|
3926 | Since this example has only one independent and one dependent variable, |
---|
3927 | {\sf forward} and {\sf reverse} have the same complexity and calculate |
---|
3928 | the same scalar derivatives, albeit with a slightly different scaling. |
---|
3929 | By replacing the function {\sf power} with any other univariate test function, |
---|
3930 | one can check that {\sf forward} and {\sf reverse} are at least consistent. |
---|
3931 | In the following example the number of independents is much larger |
---|
3932 | than the number of dependents, which makes the reverse mode preferable. |
---|
3933 | % |
---|
3934 | \subsection{Determinant Example ({\tt detexam.cpp})} |
---|
3935 | % |
---|
3936 | Now let us consider an exponentially expensive calculation, |
---|
3937 | namely, the evaluation of a determinant by recursive expansion |
---|
3938 | along rows. The gradient of the determinant with respect to the |
---|
3939 | matrix elements is simply the adjoint, i.e. the matrix of cofactors. |
---|
3940 | Hence the correctness of the numerical result is easily checked by |
---|
3941 | matrix-vector multiplication. The example illustrates the use |
---|
3942 | of {\sf adouble} arrays and pointers. |
---|
3943 | |
---|
3944 | \begin{verbatim} |
---|
3945 | #include <adolc/adouble.h> // use of active doubles and taping |
---|
3946 | #include <adolc/interfaces.h> // use of basic forward/reverse |
---|
3947 | // interfaces of ADOL-C |
---|
3948 | adouble** A; // A is an n x n matrix |
---|
3949 | int i,n; // k <= n is the order |
---|
3950 | adouble det(int k, int m) { // of the sub-matrix |
---|
3951 | if (m == 0) return 1.0 ; // its column indices |
---|
3952 | else { // are encoded in m |
---|
3953 | adouble* pt = A[k-1]; |
---|
3954 | adouble t = zero; // zero is predefined |
---|
3955 | int s, p =1; |
---|
3956 | if (k%2) s = 1; else s = -1; |
---|
3957 | for(i=0;i<n;i++) { |
---|
3958 | int p1 = 2*p; |
---|
3959 | if (m%p1 >= p) { |
---|
3960 | if (m == p) { |
---|
3961 | if (s>0) t += *pt; else t -= *pt; } |
---|
3962 | else { |
---|
3963 | if (s>0) |
---|
3964 | t += *pt*det(k-1,m-p); // recursive call to det |
---|
3965 | else |
---|
3966 | t -= *pt*det(k-1,m-p); } // recursive call to det |
---|
3967 | s = -s;} |
---|
3968 | ++pt; |
---|
3969 | p = p1;} |
---|
3970 | return t; } |
---|
3971 | } // end det |
---|
3972 | \end{verbatim} |
---|
3973 | As one can see, the overloading mechanism has no problem with pointers |
---|
3974 | and looks exactly the same as the original undifferentiated function |
---|
3975 | except for the change of type from {\sf double} to {\sf adouble}. |
---|
3976 | If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed, |
---|
3977 | a compile time error would have resulted. Now consider a corresponding |
---|
3978 | calling program. |
---|
3979 | |
---|
3980 | \begin{verbatim} |
---|
3981 | #include ... // as above |
---|
3982 | int main() { |
---|
3983 | int i,j, m=1,tag=1,keep=1; |
---|
3984 | cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n"; |
---|
3985 | cout << "order of matrix = ? \n"; // select matrix size |
---|
3986 | cin >> n; |
---|
3987 | A = new adouble*[n]; |
---|
3988 | trace_on(tag,keep); // tag=1=keep |
---|
3989 | double detout=0.0, diag = 1.0; // here keep the intermediates for |
---|
3990 | for(i=0;i<n;i++) { // the subsequent call to reverse |
---|
3991 | m *=2; |
---|
3992 | A[i] = new adouble[n]; // not needed for adoublem |
---|
3993 | adouble* pt = A[i]; |
---|
3994 | for(j=0;j<n;j++) |
---|
3995 | A[i][j] <<= j/(1.0+i); // make all elements of A independent |
---|
3996 | diag += A[i][i].value(); // value() converts to double |
---|
3997 | A[i][i] += 1.0; } |
---|
3998 | det(n,m-1) >>= detout; // actual function call |
---|
3999 | printf("\n %f - %f = %f (should be 0)\n",detout,diag,detout-diag); |
---|
4000 | trace_off(); |
---|
4001 | double u[1]; |
---|
4002 | u[0] = 1.0; |
---|
4003 | double* B = new double[n*n]; |
---|
4004 | reverse(tag,1,n*n,1,u,B); |
---|
4005 | cout <<" \n first base? : "; |
---|
4006 | for (i=0;i<n;i++) { |
---|
4007 | adouble sum = 0; |
---|
4008 | for (j=0;j<n;j++) // the matrix A times the first n |
---|
4009 | sum += A[i][j]*B[j]; // components of the gradient B |
---|
4010 | cout<<sum.value()<<" "; } // must be a Cartesian basis vector |
---|
4011 | return 1; |
---|
4012 | } // end main |
---|
4013 | \end{verbatim} |
---|
4014 | The variable {\sf diag} should be mathematically |
---|
4015 | equal to the determinant, because the |
---|
4016 | matrix {\sf A} is defined as a rank 1 perturbation of the identity. |
---|
4017 | % |
---|
4018 | \subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})} |
---|
4019 | \label{exam:ode} |
---|
4020 | % |
---|
4021 | Here, we consider a nonlinear ordinary differential equation that |
---|
4022 | is a slight modification of the Robertson test problem |
---|
4023 | given in Hairer and Wanner's book on the numerical solution of |
---|
4024 | ODEs \cite{HW}. The following source code computes the corresponding |
---|
4025 | values of $y^{\prime} \in \R^3$: |
---|
4026 | \begin{verbatim} |
---|
4027 | #include <adolc/adouble.h> // use of active doubles and taping |
---|
4028 | #include <adolc/drivers/odedrivers.h> // use of "Easy To use" ODE drivers |
---|
4029 | #include <adolc/adalloc.h> // use of ADOL-C allocation utilities |
---|
4030 | |
---|
4031 | void tracerhs(short int tag, double* py, double* pyprime) { |
---|
4032 | adouble y[3]; // this time we left the parameters |
---|
4033 | adouble yprime[3]; // passive and use the vector types |
---|
4034 | trace_on(tag); |
---|
4035 | for (int i=0; i<3; i++) |
---|
4036 | y[i] <<= py[i]; // initialize and mark independents |
---|
4037 | yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]); |
---|
4038 | yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]); |
---|
4039 | yprime[2] = -yprime[0] - yprime[1]; |
---|
4040 | yprime >>= pyprime; // mark and pass dependents |
---|
4041 | trace_off(tag); |
---|
4042 | } // end tracerhs |
---|
4043 | \end{verbatim} |
---|
4044 | The Jacobian of the right-hand side has large |
---|
4045 | negative eigenvalues, which make the ODE quite stiff. We have added |
---|
4046 | some numerically benign transcendentals to make the differentiation |
---|
4047 | more interesting. |
---|
4048 | The following main program uses {\sf forode} to calculate the Taylor series |
---|
4049 | defined by the ODE at the given point $y_0$ and {\sf reverse} as well |
---|
4050 | as {\sf accode} to compute the Jacobians of the coefficient vectors |
---|
4051 | with respect to $x_0$. |
---|
4052 | \begin{verbatim} |
---|
4053 | #include ....... // as above |
---|
4054 | int main() { |
---|
4055 | int i,j,deg; |
---|
4056 | int n=3; |
---|
4057 | double py[3]; |
---|
4058 | double pyp[3]; |
---|
4059 | cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n"; |
---|
4060 | cout << "degree of Taylor series =?\n"; |
---|
4061 | cin >> deg; |
---|
4062 | double **X; |
---|
4063 | X=(double**)malloc(n*sizeof(double*)); |
---|
4064 | for(i=0;i<n;i++) |
---|
4065 | X[i]=(double*)malloc((deg+1)*sizeof(double)); |
---|
4066 | double*** Z=new double**[n]; |
---|
4067 | double*** B=new double**[n]; |
---|
4068 | short** nz = new short*[n]; |
---|
4069 | for(i=0;i<n;i++) { |
---|
4070 | Z[i]=new double*[n]; |
---|
4071 | B[i]=new double*[n]; |
---|
4072 | for(j=0;j<n;j++) { |
---|
4073 | Z[i][j]=new double[deg]; |
---|
4074 | B[i][j]=new double[deg]; } // end for |
---|
4075 | } // end for |
---|
4076 | for(i=0;i<n;i++) { |
---|
4077 | py[i] = (i == 0) ? 1.0 : 0.0; // initialize the base point |
---|
4078 | X[i][0] = py[i]; // and the Taylor coefficient; |
---|
4079 | nz[i] = new short[n]; } // set up sparsity array |
---|
4080 | tracerhs(1,py,pyp); // trace RHS with tag = 1 |
---|
4081 | forode(1,n,deg,X); // compute deg coefficients |
---|
4082 | reverse(1,n,n,deg-1,Z,nz); // U defaults to the identity |
---|
4083 | accode(n,deg-1,Z,B,nz); |
---|
4084 | cout << "nonzero pattern:\n"; |
---|
4085 | for(i=0;i<n;i++) { |
---|
4086 | for(j=0;j<n;j++) |
---|
4087 | cout << nz[i][j]<<"\t"; |
---|
4088 | cout <<"\n"; } // end for |
---|
4089 | return 1; |
---|
4090 | } // end main |
---|
4091 | \end{verbatim} |
---|
4092 | \noindent The pattern {\sf nz} returned by {\sf accode} is |
---|
4093 | \begin{verbatim} |
---|
4094 | 3 -1 4 |
---|
4095 | 1 2 2 |
---|
4096 | 3 2 4 |
---|
4097 | \end{verbatim} |
---|
4098 | The original pattern {\sf nz} returned by {\sf reverse} is the same |
---|
4099 | except that the negative entry $-1$ was zero. |
---|
4100 | % |
---|
4101 | %\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})} |
---|
4102 | %\label{gaussexam} |
---|
4103 | % |
---|
4104 | %The following example uses conditional assignments to show the usage of a once produced tape |
---|
4105 | %for evaluation at new arguments. The elimination is performed with |
---|
4106 | %column pivoting. |
---|
4107 | %\begin{verbatim} |
---|
4108 | %#include <adolc/adolc.h> // use of ALL ADOL-C interfaces |
---|
4109 | |
---|
4110 | %void gausselim(int n, adoublem& A, adoublev& bv) { |
---|
4111 | %along i; // active integer declaration |
---|
4112 | %adoublev temp(n); // active vector declaration |
---|
4113 | %adouble r,rj,temps; |
---|
4114 | %int j,k; |
---|
4115 | %for(k=0;k<n;k++) { // elimination loop |
---|
4116 | % i = k; |
---|
4117 | % r = fabs(A[k][k]); // initial pivot size |
---|
4118 | % for(j=k+1;j<n;j++) { |
---|
4119 | % rj = fabs(A[j][k]); |
---|
4120 | % condassign(i,rj-r,j); // look for a larger element in the same |
---|
4121 | % condassign(r,rj-r,rj); } // column with conditional assignments |
---|
4122 | % temp = A[i]; // switch rows using active subscripting |
---|
4123 | % A[i] = A[k]; // necessary even if i happens to equal |
---|
4124 | % A[k] = temp; // k during taping |
---|
4125 | % temps = bv[i]; |
---|
4126 | % bv[i]=bv[k]; |
---|
4127 | % bv[k]=temps; |
---|
4128 | % if (!value(A[k][k])) // passive subscripting |
---|
4129 | % exit(1); // matrix singular! |
---|
4130 | % temps= A[k][k]; |
---|
4131 | % A[k] /= temps; |
---|
4132 | % bv[k] /= temps; |
---|
4133 | % for(j=k+1;j<n;j++) { |
---|
4134 | % temps= A[j][k]; |
---|
4135 | % A[j] -= temps*A[k]; // vector operations |
---|
4136 | % bv[j] -= temps*bv[k]; } // endfor |
---|
4137 | %} // end elimination loop |
---|
4138 | %temp=0.0; |
---|
4139 | %for(k=n-1;k>=0;k--) // backsubstitution |
---|
4140 | % temp[k] = (bv[k]-(A[k]*temp))/A[k][k]; |
---|
4141 | %bv=temp; |
---|
4142 | %} // end gausselim |
---|
4143 | %\end{verbatim} |
---|
4144 | %\noindent This function can be called from any program |
---|
4145 | %that suitably initializes |
---|
4146 | %the components of {\sf A} and {\sf bv} |
---|
4147 | %as independents. The resulting tape can be |
---|
4148 | %used to solve any nonsingular linear system of the same size and |
---|
4149 | %to get the sensitivities of the solution with respect to the |
---|
4150 | %system matrix and the right hand side. |
---|
4151 | %\vspace*{-4mm} |
---|
4152 | % |
---|
4153 | \section*{Acknowledgements} |
---|
4154 | % |
---|
4155 | Parts of the ADOL-C source were developed by Andreas |
---|
4156 | Kowarz, Hristo Mitev, Sebastian Schlenkrich, and Olaf |
---|
4157 | Vogel. We are also indebted to George Corliss, |
---|
4158 | Tom Epperly, Bruce Christianson, David Gay, David Juedes, |
---|
4159 | Brad Karp, Koichi Kubota, Bob Olson, Marcela Rosemblun, Dima |
---|
4160 | Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in |
---|
4161 | various ways with the development and documentation of ADOL-C. |
---|
4162 | % |
---|
4163 | \begin{thebibliography}{10} |
---|
4164 | |
---|
4165 | \bibitem{BeKh96} |
---|
4166 | Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle. |
---|
4167 | \newblock {\em Efficient computation of gradients and Jacobians by dynamic |
---|
4168 | exploitation of sparsity in automatic differentiation}. |
---|
4169 | \newblock Optimization Methods and Software 7(1):1-39, 1996. |
---|
4170 | |
---|
4171 | \bibitem{Chri91a} |
---|
4172 | Bruce Christianson. |
---|
4173 | \newblock {\em Reverse accumulation and accurate rounding error estimates for |
---|
4174 | Taylor series}. |
---|
4175 | \newblock Optimization Methods and Software 1:81--94, 1992. |
---|
4176 | |
---|
4177 | \bibitem{GeMaPo05} |
---|
4178 | Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen. |
---|
4179 | \newblock {\em What color is your {J}acobian? {G}raph coloring for computing |
---|
4180 | derivatives}. |
---|
4181 | \newblock SIAM Review 47(4):629--705, 2005. |
---|
4182 | |
---|
4183 | \bibitem{GePoTaWa06} |
---|
4184 | Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther. |
---|
4185 | {\em Efficient Computation of Sparse Hessians: An Experimental Study |
---|
4186 | using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing. |
---|
4187 | |
---|
4188 | \bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea |
---|
4189 | Walther. |
---|
4190 | {\em Exploiting Sparsity in Jacobian Computation via Coloring and Automatic Differentiation: |
---|
4191 | a Case Study in a Simulated Moving Bed Process}. |
---|
4192 | In Chr. Bischof et al., eds., {\em Proceedings AD 2008 conference}, LNCSE 64, pp. 327 -- 338, Springer (2008). |
---|
4193 | |
---|
4194 | \bibitem{GeTaMaPo07} |
---|
4195 | Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen, |
---|
4196 | {\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}. |
---|
4197 | SIAM Journal on Scientific Computing 29(3):1042--1072, 2007. |
---|
4198 | |
---|
4199 | |
---|
4200 | \bibitem{GrWa08} |
---|
4201 | Andreas Griewank and Andrea Walther: {\em Evaluating Derivatives, Principles and Techniques of |
---|
4202 | Algorithmic Differentiation. Second edition}. SIAM, 2008. |
---|
4203 | |
---|
4204 | |
---|
4205 | \bibitem{Griewank97} |
---|
4206 | Andreas Griewank, Jean Utke, and Andrea Walther. |
---|
4207 | \newblock {\em Evaluating higher derivative tensors by forward propagation |
---|
4208 | of univariate Taylor series}. |
---|
4209 | \newblock Mathematics of Computation, 69:1117--1130, 2000. |
---|
4210 | |
---|
4211 | \bibitem{GrWa00} |
---|
4212 | Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse |
---|
4213 | or Adjoint Mode of Computational Differentiation}, |
---|
4214 | ACM Transaction on Mathematical Software 26:19--45, 2000. |
---|
4215 | |
---|
4216 | \bibitem{HW} |
---|
4217 | Ernst Hairer and Gerhard Wanner. |
---|
4218 | {\it Solving Ordinary Differential Equations II.\/} |
---|
4219 | Springer-Verlag, Berlin, 1991. |
---|
4220 | |
---|
4221 | \bibitem{Knuth73} |
---|
4222 | Donald~E. Knuth. |
---|
4223 | \newblock {\em The Art of Computer Programming. Second edition.} |
---|
4224 | \newblock Addison-Wesley, Reading, 1973. |
---|
4225 | |
---|
4226 | \bibitem{Wa05a} |
---|
4227 | Andrea Walther. |
---|
4228 | \newblock {\em Computing Sparse Hessians with Automatic Differentiation}. |
---|
4229 | \newblock Transaction on Mathematical Software, 34(1), Artikel 3 (2008). |
---|
4230 | \end{thebibliography} |
---|
4231 | \end{document} |
---|
4232 | |
---|