1 | % Latex file containing the documentation of ADOL-C |
---|
2 | % |
---|
3 | % Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz, |
---|
4 | % Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel |
---|
5 | % |
---|
6 | % This file is part of ADOL-C. This software is provided as open source. |
---|
7 | % Any use, reproduction, or distribution of the software constitutes |
---|
8 | % recipient's acceptance of the terms of the accompanying license file. |
---|
9 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
10 | |
---|
11 | \documentclass[11pt,twoside]{article} |
---|
12 | \usepackage{hyperref} |
---|
13 | \usepackage{amsmath,amsthm,amssymb} |
---|
14 | \usepackage{graphicx} |
---|
15 | \usepackage{datetime} |
---|
16 | |
---|
17 | \newdateformat{monthyear}{\monthname\ \THEYEAR} |
---|
18 | |
---|
19 | \usepackage{color} |
---|
20 | |
---|
21 | \pagestyle{headings} |
---|
22 | \bibliographystyle{plain} |
---|
23 | \parskip=6pt |
---|
24 | |
---|
25 | \setlength{\textwidth}{433.6pt} |
---|
26 | \setlength{\oddsidemargin}{23pt} |
---|
27 | \setlength{\evensidemargin}{23pt} |
---|
28 | \setlength{\topmargin}{25.0pt} |
---|
29 | \setlength{\textheight}{580pt} |
---|
30 | \setlength{\baselineskip}{8pt} |
---|
31 | |
---|
32 | \newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }} |
---|
33 | \newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }} |
---|
34 | \newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }} |
---|
35 | |
---|
36 | \renewcommand{\sectionautorefname}{Section} |
---|
37 | \renewcommand{\subsectionautorefname}{Section} |
---|
38 | \renewcommand{\figureautorefname}{Figure} |
---|
39 | \renewcommand{\tableautorefname}{Table} |
---|
40 | |
---|
41 | \setcounter{equation}{0} |
---|
42 | |
---|
43 | \input{version.tex} |
---|
44 | |
---|
45 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
46 | \begin{document} |
---|
47 | |
---|
48 | \begin{titlepage} |
---|
49 | \begin{center} |
---|
50 | {\Large {\bf ADOL-C:}} |
---|
51 | \footnote{The development of earlier versions was supported by the Office of |
---|
52 | Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche |
---|
53 | Forschungsgemeinschaft. During the development of the current |
---|
54 | version Andrea Walther and Andreas Kowarz were supported by the |
---|
55 | grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} |
---|
56 | \vspace{0.2in} \\ |
---|
57 | % |
---|
58 | {\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\ |
---|
59 | {\Large of Algorithms Written in C/C++}\\ |
---|
60 | \vspace{.2in} |
---|
61 | {\large\bf Version \packageversion, \monthyear\today} \\ |
---|
62 | \bigskip |
---|
63 | \mbox{Andrea Walther}\footnote{Institute of Mathematics, University |
---|
64 | of Paderborn, 33098 Paderborn, Germany} and |
---|
65 | \mbox{Andreas Griewank}\footnote{Department of Mathematics, |
---|
66 | Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany} |
---|
67 | \end{center} |
---|
68 | % |
---|
69 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
70 | \begin{abstract} |
---|
71 | The C++ package ADOL-C described here facilitates the evaluation of |
---|
72 | first and higher derivatives of vector functions that are defined |
---|
73 | by computer programs written in C or C++. The resulting derivative |
---|
74 | evaluation routines may be called from C, C++, Fortran, or any other |
---|
75 | language that can be linked with C. |
---|
76 | |
---|
77 | The numerical values of derivative vectors are obtained free |
---|
78 | of truncation errors at a small multiple of the run time and |
---|
79 | random access memory required by the given function evaluation program. |
---|
80 | Derivative matrices are obtained by columns, by rows or in sparse format. |
---|
81 | For solution curves defined by ordinary differential equations, |
---|
82 | special routines are provided that evaluate the Taylor coefficient vectors |
---|
83 | and their Jacobians with respect to the current state vector. |
---|
84 | For explicitly or implicitly defined functions derivative tensors are |
---|
85 | obtained with a complexity that grows only quadratically in their |
---|
86 | degree. The derivative calculations involve a possibly substantial but |
---|
87 | always predictable amount of data. Since the data is accessed strictly sequentially |
---|
88 | it can be automatically paged out to external files. |
---|
89 | \end{abstract} |
---|
90 | % |
---|
91 | {\bf Keywords}: Computational Differentiation, Automatic |
---|
92 | Differentiation, |
---|
93 | Chain Rule, Overloading, Taylor Coefficients, |
---|
94 | Gradients, Hessians, Forward Mode, Reverse Mode, |
---|
95 | Implicit Function Differentiation, Inverse Function Differentiation |
---|
96 | \medskip |
---|
97 | |
---|
98 | \noindent |
---|
99 | {\bf Abbreviated title}: Automatic differentiation by overloading in C++ |
---|
100 | % |
---|
101 | \end{titlepage} |
---|
102 | % |
---|
103 | \tableofcontents |
---|
104 | % |
---|
105 | |
---|
106 | \newpage |
---|
107 | % |
---|
108 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
109 | % |
---|
110 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
111 | \section{Preparing a Section of C or C++ Code for Differentiation} |
---|
112 | \label{prepar} |
---|
113 | % |
---|
114 | \subsection{Introduction} |
---|
115 | % |
---|
116 | \setcounter{equation}{0} |
---|
117 | The package \mbox{ADOL-C} |
---|
118 | utilizes overloading in C++, but the |
---|
119 | user has to know only C. The acronym stands for {\bf A}utomatic |
---|
120 | {\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++. |
---|
121 | In contrast to source transformation approaches, overloading does not generate intermediate |
---|
122 | source code. |
---|
123 | As starting points to retrieve further information on techniques and |
---|
124 | application of automatic differentiation, as well as on other AD |
---|
125 | tools, we refer to the book \cite{GrWa08}. Furthermore, the web page |
---|
126 | \verb=http://www.autodiff.org= of the AD community forms a rich source |
---|
127 | of further information and pointers. |
---|
128 | |
---|
129 | |
---|
130 | ADOL-C facilitates the simultaneous |
---|
131 | evaluation of arbitrarily high directional derivatives and the |
---|
132 | gradients of these Taylor coefficients with respect to all independent |
---|
133 | variables. Relative to the cost of evaluating the underlying function, |
---|
134 | the cost for evaluating any such scalar-vector pair grows as the |
---|
135 | square of the degree of the derivative but is still completely |
---|
136 | independent of the numbers $m$ and $n$. |
---|
137 | |
---|
138 | This manual is organized as follows. This section explains the |
---|
139 | modifications required to convert undifferentiated code to code that |
---|
140 | compiles with ADOL-C. |
---|
141 | \autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to |
---|
142 | evaluate arbitrarily high order derivatives. The discussion includes storage |
---|
143 | requirements and the tailoring of certain tape characteristics to fit specific |
---|
144 | user needs. Descriptions of easy-to-use drivers for a convenient derivative |
---|
145 | evaluation are contained in \autoref{drivers}. |
---|
146 | \autoref{forw_rev_ad} offers a more mathematical characterization of |
---|
147 | the different modes of AD to compute derivatives. At the same time, the |
---|
148 | corresponding drivers of ADOL-C are explained. |
---|
149 | The overloaded derivative evaluation routines using the forward and the reverse |
---|
150 | mode of AD are explained in \autoref{forw_rev}. |
---|
151 | Advanced differentiation techniques as the optimal checkpointing for |
---|
152 | time integrations, the exploitation of fixed point iterations, the usages |
---|
153 | of external differentiated functions and the differentiation of OpenMP |
---|
154 | parallel programs are described in \autoref{adv_ad}. |
---|
155 | The traceless forward mode is presented in \autoref{traceless}. |
---|
156 | \autoref{install} details the installation and |
---|
157 | use of the ADOL-C package. Finally, \autoref{example} |
---|
158 | furnishes some example programs that incorporate the ADOL-C package to |
---|
159 | evaluate first and higher-order |
---|
160 | derivatives. These and other examples are distributed with the ADOL-C |
---|
161 | source code. |
---|
162 | The user should simply refer to them if the more abstract and general |
---|
163 | descriptions of ADOL-C provided in this document do not suffice. |
---|
164 | % |
---|
165 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
166 | \subsection{Declaring Active Variables} |
---|
167 | % |
---|
168 | \label{DecActVar} |
---|
169 | % |
---|
170 | The key ingredient of automatic differentiation by overloading is the |
---|
171 | concept of an {\em active variable}. All variables that may be |
---|
172 | considered as differentiable quantities at some time |
---|
173 | during the program execution must be of an active |
---|
174 | type. ADOL-C uses one |
---|
175 | active scalar type, called {\sf adouble}, whose real part is of the |
---|
176 | standard type {\sf double}. |
---|
177 | Typically, one will declare the independent variables |
---|
178 | and all quantities that directly or indirectly depend on them as |
---|
179 | {\em active}. Other variables that do not depend on the independent |
---|
180 | variables but enter, for example, as parameters, may remain one of the |
---|
181 | {\em passive} types {\sf double, float}, or {\sf int}. There is no |
---|
182 | implicit type conversion from {\sf adouble} to any of these passive |
---|
183 | types; thus, {\bf failure to declare variables as active when they |
---|
184 | depend on other active variables will result in a compile-time error |
---|
185 | message}. In data flow terminology, the set of active variable names |
---|
186 | must contain all its successors in the dependency graph. All components |
---|
187 | of indexed arrays must have the same activity status. |
---|
188 | |
---|
189 | The real component of an {\sf adouble x} can be extracted as |
---|
190 | {\sf x.value()}. In particular, |
---|
191 | such explicit conversions are needed for the standard output procedure |
---|
192 | {\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such |
---|
193 | that first the real part of an {\sf adouble} and then the string |
---|
194 | ``{\sf (a)}" is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath can |
---|
195 | be used to assign a constant value to an {\sf adouble}. |
---|
196 | Naturally, {\sf adouble}s may be |
---|
197 | components of vectors, matrices, and other arrays, as well as |
---|
198 | members of structures or classes. |
---|
199 | |
---|
200 | The C++ class {\sf adouble}, its member functions, and the overloaded |
---|
201 | versions of all arithmetic operations, comparison operators, and |
---|
202 | most ANSI C functions are contained in the file \verb=adouble.cpp= and its |
---|
203 | header \verb=<adolc/adouble.h>=. The latter must be included for compilation |
---|
204 | of all program files containing {\sf adouble}s and corresponding |
---|
205 | operations. |
---|
206 | % |
---|
207 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
208 | \subsection{Marking Active Sections} |
---|
209 | \label{markingActive} |
---|
210 | % |
---|
211 | All calculations involving active variables that occur between |
---|
212 | the void function calls |
---|
213 | \begin{center} |
---|
214 | {\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in} |
---|
215 | {\sf trace\_off(file)} |
---|
216 | \end{center} |
---|
217 | are recorded on a sequential data set called {\em tape}. Pairs of |
---|
218 | these function calls can appear anywhere in a C++ program, but |
---|
219 | they must not overlap. The nonnegative integer argument {\sf tag} identifies the |
---|
220 | particular tape for subsequent function or derivative evaluations. |
---|
221 | Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout. |
---|
222 | The optional integer arguments {\sf keep} and |
---|
223 | {\sf file} will be discussed in \autoref{tape}. We will refer to the |
---|
224 | sequence of statements executed between a particular call to |
---|
225 | {\sf trace\_on} and the following call to {\sf trace\_off} as an |
---|
226 | {\em active section} of the code. The same active section may be |
---|
227 | entered repeatedly, and one can successively generate several traces |
---|
228 | on distinct tapes by changing the value of {\sf tag}. |
---|
229 | Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in |
---|
230 | the header file \verb=<adolc/taputil.h>=, which is included by the header |
---|
231 | \verb=<adolc/adouble.h>= automatically. |
---|
232 | |
---|
233 | Active sections may contain nested or even recursive calls to functions |
---|
234 | provided by the user. Naturally, their formal and actual parameters |
---|
235 | must have matching types. In particular, the functions must be |
---|
236 | compiled with their active variables declared as |
---|
237 | {\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. |
---|
238 | Variables of type {\sf adouble} may be declared outside an active section and need not |
---|
239 | go out of scope before the end of an active section. |
---|
240 | It is not necessary -- though desirable -- that free-store {\sf adouble}s |
---|
241 | allocated within |
---|
242 | an active section be deleted before its completion. The values of all |
---|
243 | {\sf adouble}s that exist at the beginning and end of an active section |
---|
244 | are automatically |
---|
245 | recorded by {\sf trace\_on} and {\sf trace\_off}, respectively. |
---|
246 | % |
---|
247 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
248 | \subsection{Selecting Independent and Dependent Variables} |
---|
249 | % |
---|
250 | One or more active variables that are read in or initialized to |
---|
251 | the values of constants or passive variables must be distinguished as |
---|
252 | independent variables. Other active variables that are similarly |
---|
253 | initialized may be considered as temporaries (e.g., a variable that |
---|
254 | accumulates the partial sums of a scalar product after being |
---|
255 | initialized to zero). In order to distinguish an active variable {\sf x} as |
---|
256 | independent, ADOL-C requires an assignment of the form |
---|
257 | \begin{center} |
---|
258 | {\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$ |
---|
259 | \end{center} |
---|
260 | This special initialization ensures that {\sf x.value()} = {\sf px}, and it should |
---|
261 | precede any other assignment to {\sf x}. However, {\sf x} may be reassigned |
---|
262 | other values subsequently. Similarly, one or more active variables {\sf y} |
---|
263 | must be distinguished as dependent by an assignment of the form |
---|
264 | \begin{center} |
---|
265 | {\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any passive type $\enspace ,$ |
---|
266 | \end{center} |
---|
267 | which ensures that {\sf py} = {\sf y.value()} and should not be succeeded |
---|
268 | by any other assignment to {\sf y}. However, a dependent variable {\sf y} |
---|
269 | may have been assigned other real values previously, and it could even be an |
---|
270 | independent variable as well. The derivative values calculated after |
---|
271 | the |
---|
272 | completion of an active section always represent {\bf derivatives of the final |
---|
273 | values of the dependent variables with respect to the initial values of the |
---|
274 | independent variables}. |
---|
275 | |
---|
276 | The order in which the independent and dependent variables are marked |
---|
277 | by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent |
---|
278 | derivative evaluations. However, these variables do not have to be |
---|
279 | combined into contiguous vectors. ADOL-C counts the number of |
---|
280 | independent and dependent variable specifications within each active |
---|
281 | section and records them in the header of the tape. |
---|
282 | % |
---|
283 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
284 | \subsection{A Subprogram as an Active Section} |
---|
285 | % |
---|
286 | As a generic example let us consider a C(++) function of the form |
---|
287 | shown in \autoref{code1}. |
---|
288 | % |
---|
289 | \begin{figure}[hbt] |
---|
290 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
291 | \begin{center} |
---|
292 | \begin{tabbing} |
---|
293 | {\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \= // number of independents and dependents\\ |
---|
294 | \>{\sf double *x,} \> // independent variable vector \\ |
---|
295 | \>{\sf double *y,} \> // dependent variable vector \\ |
---|
296 | \> {\sf int *k, } \> // integer parameters \\ |
---|
297 | \>{\sf double *z)} \> // real parameters \\ |
---|
298 | {\sf \{ }\hspace{0.1 in } \= \> // beginning of function body \\ |
---|
299 | \>{\sf double t = 0;} \> // local variable declaration \\ |
---|
300 | \>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\ |
---|
301 | \>\hspace{0.2in}{\sf t += z[i]*x[i];} \> // continue \\ |
---|
302 | \>{\sf $\cdots \cdots \cdots \cdots $} \> // continue \\ |
---|
303 | \>{\sf y[m-1] = t/m; } \> // end of computation \\ |
---|
304 | {\sf \} } \> \> // end of function |
---|
305 | \end{tabbing} |
---|
306 | \end{center} |
---|
307 | }} |
---|
308 | \caption{Generic example of a subprogram to be activated} |
---|
309 | \label{code1} |
---|
310 | \end{figure} |
---|
311 | % |
---|
312 | |
---|
313 | If {\sf eval} is to be called from within an active C(++) |
---|
314 | section with {\sf x} |
---|
315 | and {\sf y} as vectors of {\sf adouble}s and the other parameters |
---|
316 | passive, then one merely has to change the type declarations of all |
---|
317 | variables that depend on {\sf x} from {\sf double} or {\sf float} to |
---|
318 | {\sf adouble}. Subsequently, the subprogram must be compiled with the |
---|
319 | header file \verb=<adolc/adouble.h>= included as described |
---|
320 | in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is |
---|
321 | still to be called with integer and real arguments, possibly from |
---|
322 | a program written in Fortran77, which does not allow overloading. |
---|
323 | |
---|
324 | To automatically compute derivatives of the dependent |
---|
325 | variables {\sf y} with respect to the independent variables {\sf x}, we |
---|
326 | can make the body of the function into an active section. For |
---|
327 | example, we may modify the previous program segment |
---|
328 | as in \autoref{adolcexam}. |
---|
329 | The renaming and doubling up of the original independent and dependent |
---|
330 | variable vectors by active counterparts may seem at first a bit clumsy. |
---|
331 | However, this transformation has the advantage that the calling |
---|
332 | sequence and the computational part, i.e., where the function is |
---|
333 | really evaluated, of {\sf eval} remain completely |
---|
334 | unaltered. If the temporary variable {\sf t} had remained a {\sf double}, |
---|
335 | the code would not compile, because of a type conflict in the assignment |
---|
336 | following the declaration. More detailed example codes are listed in |
---|
337 | \autoref{example}. |
---|
338 | |
---|
339 | \begin{figure}[htb] |
---|
340 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
341 | \begin{center} |
---|
342 | \begin{tabbing} |
---|
343 | {\sf void eval(} \= {\sf int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\ |
---|
344 | \> {\sf double *px,} \> // independent passive variable vector \\ |
---|
345 | \> {\sf double *py,} \> // dependent passive variable vector \\ |
---|
346 | \> {\sf int *k,} \> // integer parameters \\ |
---|
347 | \> {\sf double *z)} \> // parameter vector \\ |
---|
348 | {\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\ |
---|
349 | \>{\sf short int tag = 0;} \> // tape array and/or tape file specifier\\ |
---|
350 | \>{\sf trace\_on(tag);} \> // start tracing \\ |
---|
351 | \>{\sf adouble *x, *y;} \> // declare active variable pointers \\ |
---|
352 | \>{\sf x = new adouble[n];}\>// declare active independent variables \\ |
---|
353 | \>{\sf y = new adouble[m];} \> // declare active dependent variables \\ |
---|
354 | \>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\ |
---|
355 | \>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath px[i];} \> // select independent variables \\ |
---|
356 | \>{\sf adouble t = 0;} \> // local variable declaration \\ |
---|
357 | \>{\sf for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin crunch \\ |
---|
358 | \>\hspace{0.2in}{\sf t += z[i]*x[i];} \> // continue crunch \\ |
---|
359 | \>{\sf $\cdots \cdots \cdots \cdots $} \> // continue crunch \\ |
---|
360 | \>{\sf $\cdots \cdots \cdots \cdots $} \> // continue crunch \\ |
---|
361 | \>{\sf y[m-1] = t/m; } \> // end crunch as before\\ |
---|
362 | \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\ |
---|
363 | \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\ |
---|
364 | \>{\sf delete[] y;} \>// delete dependent active variables \\ |
---|
365 | \>{\sf delete[] x;} \>// delete independent active variables \\ |
---|
366 | \>{\sf trace\_off();} \> // complete tape \\ |
---|
367 | {\sf \}} \>\> // end of function |
---|
368 | \end{tabbing} |
---|
369 | \end{center}}} |
---|
370 | \caption{Activated version of the code listed in \autoref{code1}} |
---|
371 | \label{adolcexam} |
---|
372 | \end{figure} |
---|
373 | % |
---|
374 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
375 | \subsection{Overloaded Operators and Functions} |
---|
376 | \label{OverOper} |
---|
377 | % |
---|
378 | As in the subprogram discussed above, the actual computational |
---|
379 | statements of a C(++) code need not be altered for the purposes of |
---|
380 | automatic differentiation. All arithmetic operations, as well as the |
---|
381 | comparison and assignment operators, are overloaded, so any or all of |
---|
382 | their operands can be an active variable. An {\sf adouble x} occurring |
---|
383 | in a comparison operator is effectively replaced by its real value |
---|
384 | {\sf x.value()}. Most functions contained in the ANSI C standard for |
---|
385 | the math library are overloaded for active arguments. The only |
---|
386 | exceptions are the non-differentiable functions {\sf fmod} and |
---|
387 | {\sf modf}. Otherwise, legitimate C code in active sections can remain |
---|
388 | completely unchanged, provided the direct output of active variables |
---|
389 | is avoided. The rest of this subsection may be skipped by first time |
---|
390 | users who are not worried about marginal issues of differentiability |
---|
391 | and efficiency. |
---|
392 | |
---|
393 | The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not |
---|
394 | properly differentiable at the origin, which raises the question of |
---|
395 | how this exception ought to be handled. Fortunately, one can easily |
---|
396 | see that {\sf fabs(x)} and all its compositions with smooth |
---|
397 | functions are still directionally differentiable. These |
---|
398 | directional derivatives of arbitrary order can be propagated in the |
---|
399 | forward mode without any ambiguity. In other words, the forward mode as |
---|
400 | implemented in ADOL-C computes Gateaux derivatives |
---|
401 | in certain directions, which reduce to Fr\'echet derivatives only |
---|
402 | if the dependence on the direction is linear. Otherwise, |
---|
403 | the directional derivatives are merely positively homogeneous with |
---|
404 | respect to the scaling of the directions. |
---|
405 | For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at |
---|
406 | the origin somewhat arbitrarily to zero. |
---|
407 | |
---|
408 | We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble} |
---|
409 | arguments, so that function and derivative values are obtained consistent |
---|
410 | with those of {\sf fabs} according to the identities |
---|
411 | \[ |
---|
412 | \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad |
---|
413 | \max(a,b) = [a+b+|a-b|]/2 \quad . |
---|
414 | \] |
---|
415 | These relations cannot hold if either $a$ or $b$ is infinite, in which |
---|
416 | case {\sf fmin} or {\sf fmax} and their derivatives may still be well |
---|
417 | defined. It should be noted that the directional differentiation of |
---|
418 | {\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from |
---|
419 | the corresponding assignment based on the sign of $a-b$. For example, |
---|
420 | the statement |
---|
421 | \begin{center} |
---|
422 | {\sf if (a $<$ b) c = a; else c = b;} |
---|
423 | \end{center} |
---|
424 | yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$ |
---|
425 | the incorrect directional derivative value |
---|
426 | {\sf c}$^\prime = $~{\sf b}$^\prime$ rather than the correct |
---|
427 | {\sf c}$^\prime = $~{\sf a}$^\prime$. Therefore this form of conditional assignment |
---|
428 | should be avoided by use of the function $\sf fmin(a,b)$. There |
---|
429 | are also versions of {\sf fmin} and {\sf fmax} for two passive |
---|
430 | arguments and mixed passive/active arguments are handled by |
---|
431 | implicit conversion. |
---|
432 | On the function class obtained by composing the modulus with real |
---|
433 | analytic functions, the concept of directional differentiation can be |
---|
434 | extended to the propagation of unique one-sided Taylor expansions. |
---|
435 | The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded |
---|
436 | on the tape. |
---|
437 | |
---|
438 | The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric |
---|
439 | functions have infinite slopes at the boundary points of their domains. |
---|
440 | At these marginal points the derivatives are set by ADOL-C to |
---|
441 | either {\sf $\pm$InfVal}, 0 |
---|
442 | or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined |
---|
443 | parameters, see \autoref{Customizing}. |
---|
444 | On IEEE machines {\sf InfVal} can be set to the special value |
---|
445 | {\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$. |
---|
446 | For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ |
---|
447 | of {\sf b}~=~{\sf sqrt(a)} is set to |
---|
448 | \[ |
---|
449 | {\sf b}^\prime = \left\{ |
---|
450 | \begin{array}{ll} |
---|
451 | \sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0 \\ |
---|
452 | 0&\mbox{if}\;\;{\sf a}^\prime =0 \\ |
---|
453 | \sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\ |
---|
454 | \end{array} \right. \enspace . |
---|
455 | \] |
---|
456 | In other words, we consider {\sf a} and |
---|
457 | consequently {\sf b} as a constant when {\sf a}$^\prime$ or more generally |
---|
458 | all computed Taylor coefficients are zero. |
---|
459 | |
---|
460 | The general power function ${\sf pow(x,y)=x^y}$ is computed whenever |
---|
461 | it is defined for the corresponding {\sf double} arguments. If {\sf x} is |
---|
462 | negative, however, the partial derivative with respect to an integral exponent |
---|
463 | is set to zero. |
---|
464 | %Similarly, the partial of {\bf pow} with respect to both arguments |
---|
465 | %is set to zero at the origin, where both arguments vanish. |
---|
466 | The derivatives of the step functions |
---|
467 | {\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all |
---|
468 | arguments {\sf x}. The result values of the step functions |
---|
469 | are recorded on the tape and can later be checked to recognize |
---|
470 | whether a step to another level was taken during a forward sweep |
---|
471 | at different arguments than at taping time. |
---|
472 | |
---|
473 | Some C implementations supply other special |
---|
474 | functions, in particular the error function {\sf erf(x)}. For the |
---|
475 | latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which |
---|
476 | has been commented out for systems on which the {\sf double} valued version |
---|
477 | is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and |
---|
478 | postfix) are available for {\sf adouble}s. |
---|
479 | % |
---|
480 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
481 | % |
---|
482 | % and also the |
---|
483 | %active subscripts described in the \autoref{act_subscr}. |
---|
484 | Ambiguous statements like {\sf a += a++;} must be |
---|
485 | avoided because the compiler may sequence the evaluation of the |
---|
486 | overloaded |
---|
487 | expression differently from the original in terms of {\sf double}s. |
---|
488 | |
---|
489 | As we have indicated above, all subroutines called with active arguments |
---|
490 | must be modified or suitably overloaded. The simplest procedure is |
---|
491 | to declare the local variables of the function as active so that |
---|
492 | their internal calculations are also recorded on the tape. |
---|
493 | Unfortunately, this approach is likely to be unnecessarily inefficient |
---|
494 | and inaccurate if the original subroutine evaluates a special function |
---|
495 | that is defined as the solution of a particular mathematical problem. |
---|
496 | The most important examples are implicit functions, quadratures, |
---|
497 | and solutions of ordinary differential equations. Often |
---|
498 | the numerical methods for evaluating such special functions are |
---|
499 | elaborate, and their internal workings are not at all differentiable in |
---|
500 | the data. Rather than differentiating through such an adaptive |
---|
501 | procedure, one can obtain first and higher derivatives directly from |
---|
502 | the mathematical definition of the special function. Currently this |
---|
503 | direct approach has been implemented only for user-supplied quadratures |
---|
504 | as described in \autoref{quadrat}. |
---|
505 | % |
---|
506 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
507 | \subsection{Reusing the Tape for Arbitrary Input Values} |
---|
508 | \label{reuse_tape} |
---|
509 | % |
---|
510 | In some situations it may be desirable to calculate the value and |
---|
511 | derivatives of a function at arbitrary arguments by using a tape of |
---|
512 | the function evaluation at one argument and reevaluating the |
---|
513 | function and its derivatives using the given ADOL-C |
---|
514 | routines. This approach can |
---|
515 | significantly reduce run times, and it |
---|
516 | also allows to port problem functions, in the form of the |
---|
517 | corresponding tape files, into a computing environment that |
---|
518 | does not support C++ but does support C or Fortran. |
---|
519 | Therefore, the routines provided by ADOL-C for the evaluation of derivatives |
---|
520 | can be used to at arguments $x$ other than the |
---|
521 | point at which the tape was generated, provided there are |
---|
522 | no user defined quadratures and all comparisons involving |
---|
523 | {\sf adouble}s yield the same result. The last condition |
---|
524 | implies that the control flow is unaltered by the change |
---|
525 | of the independent variable values. Therefore, this sufficient |
---|
526 | condition is tested by ADOL-C and if it is not met |
---|
527 | the ADOL-C routine called for derivative calculations indicates this |
---|
528 | contingency through its return value. Currently, there are six return values, |
---|
529 | see \autoref{retvalues}. |
---|
530 | \begin{table}[h] |
---|
531 | \center\small |
---|
532 | \begin{tabular}{|r|l|}\hline |
---|
533 | +3 & |
---|
534 | \begin{minipage}{12.5cm} |
---|
535 | \vspace*{1ex} |
---|
536 | The function is locally analytic. |
---|
537 | \vspace*{1ex} |
---|
538 | \end{minipage} \\ \hline |
---|
539 | +2 & |
---|
540 | \begin{minipage}{12.5cm} |
---|
541 | \vspace*{1ex} |
---|
542 | The function is locally analytic but the sparsity |
---|
543 | structure (compared to the situation at the taping point) |
---|
544 | may have changed, e.g. while at taping arguments |
---|
545 | {\sf fmax(a,b)} returned {\sf a} we get {\sf b} at |
---|
546 | the argument currently used. |
---|
547 | \vspace*{1ex} |
---|
548 | \end{minipage} \\ \hline |
---|
549 | +1 & |
---|
550 | \begin{minipage}{12.5cm} |
---|
551 | \vspace*{1ex} |
---|
552 | At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs} |
---|
553 | is evaluated at a tie or zero, respectively. Hence, the function to be differentiated is |
---|
554 | Lipschitz-continuous but possibly non-differentiable. |
---|
555 | \vspace*{1ex} |
---|
556 | \end{minipage} \\ \hline |
---|
557 | 0 & |
---|
558 | \begin{minipage}{12.5cm} |
---|
559 | \vspace*{1ex} |
---|
560 | Some arithmetic comparison involving {\sf adouble}s yields a tie. |
---|
561 | Hence, the function to be differentiated may be discontinuous. |
---|
562 | \vspace*{1ex} |
---|
563 | \end{minipage} \\ \hline |
---|
564 | -1 & |
---|
565 | \begin{minipage}{12.5cm} |
---|
566 | \vspace*{1ex} |
---|
567 | An {\sf adouble} comparison yields different results |
---|
568 | from the evaluation point at which the tape was generated. |
---|
569 | \vspace*{1ex} |
---|
570 | \end{minipage} \\ \hline |
---|
571 | -2 & |
---|
572 | \begin{minipage}{12.5cm} |
---|
573 | \vspace*{1ex} |
---|
574 | The argument of a user-defined quadrature has changed |
---|
575 | from the evaluation point at which the tape was generated. |
---|
576 | \vspace*{1ex} |
---|
577 | \end{minipage} \\ \hline |
---|
578 | \end{tabular} |
---|
579 | \caption{Description of return values} |
---|
580 | \label{retvalues} |
---|
581 | \end{table} |
---|
582 | |
---|
583 | \begin{figure}[h] |
---|
584 | \centering\includegraphics[width=10.0cm]{tap_point} |
---|
585 | \caption{Return values around the taping point} |
---|
586 | \label{fi:tap_point} |
---|
587 | \end{figure} |
---|
588 | |
---|
589 | In \autoref{fi:tap_point} these return values are illustrated. |
---|
590 | If the user finds the return value of an ADOL-C routine to be negative the |
---|
591 | taping process simply has to be repeated by executing the active section again. |
---|
592 | The crux of the problem lies in the fact that the tape records only |
---|
593 | the operations that are executed during one particular evaluation of the |
---|
594 | function. |
---|
595 | It also has no way to evaluate integrals since the corresponding |
---|
596 | quadratures are never recorded on the tape. |
---|
597 | Therefore, when there are user-defined quadratures the retaping is necessary at each |
---|
598 | new point. If there are only branches conditioned on {\sf adouble} |
---|
599 | comparisons one may hope that re-taping becomes unnecessary when |
---|
600 | the points settle down in some small neighborhood, as one would |
---|
601 | expect for example in an iterative equation solver. |
---|
602 | % |
---|
603 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
604 | \subsection{Conditional Assignments} |
---|
605 | \label{condassign} |
---|
606 | % |
---|
607 | It appears unsatisfactory that, for example, a simple table lookup |
---|
608 | of some physical property forces the re-recording of a possibly |
---|
609 | much larger calculation. However, the basic philosophy of ADOL-C |
---|
610 | is to overload arithmetic, rather than to generate a new program |
---|
611 | with jumps between ``instructions'', which would destroy the |
---|
612 | strictly sequential tape access and |
---|
613 | require the infusion of substantial compiler technology. |
---|
614 | Therefore, we introduce the two constructs of conditional |
---|
615 | assignments and active integers as partial remedies to the |
---|
616 | branching problem. |
---|
617 | |
---|
618 | In many cases, the functionality of branches |
---|
619 | can be replaced by conditional assignments. |
---|
620 | For this purpose, we provide a special function called |
---|
621 | {\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the |
---|
622 | syntax of the conditional assignment |
---|
623 | \begin{center} |
---|
624 | {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} |
---|
625 | \end{center} |
---|
626 | which C++ inherited from C. However, here the arguments are restricted to be |
---|
627 | active or passive scalar arguments, and all expression arguments |
---|
628 | are evaluated before the test on {\sf b}, which is different from |
---|
629 | the usual conditional assignment or the code segment. |
---|
630 | |
---|
631 | Suppose the original program contains the code segment |
---|
632 | \begin{center} |
---|
633 | {\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\ |
---|
634 | \end{center} |
---|
635 | Here, only one of the expressions (or, more generally, program blocks) |
---|
636 | {\sf c} and {\sf d} is evaluated, which exactly constitutes the problem |
---|
637 | for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one |
---|
638 | may first execute both branches and then pick either {\sf c} |
---|
639 | or {\sf d} using |
---|
640 | {\sf condassign(a,b,c,d)}. To maintain |
---|
641 | consistency with the original code, one has to make sure |
---|
642 | that the two branches do not have any side effects that can |
---|
643 | interfere with each other or may be important for subsequent |
---|
644 | calculations. Furthermore the test parameter {\sf b} has to be an |
---|
645 | {\sf adouble} or an {\sf adouble} expression. Otherwise the |
---|
646 | test condition {\sf b} is recorded on the tape as a {\em constant} with its |
---|
647 | run time value. Thus the original dependency of {\sf b} on |
---|
648 | active variables gets lost, for instance if {\sf b} is a comparison |
---|
649 | expression, see \autoref{OverOper}. |
---|
650 | If there is no {\sf else} part in a conditional assignment, one may call |
---|
651 | the three argument version |
---|
652 | {\sf condassign(a,b,c)}, which |
---|
653 | is logically equivalent to {\sf condassign(a,b,c,a)} in that |
---|
654 | nothing happens if {\sf b} is non-positive. |
---|
655 | The header file \verb=<adolc/adouble.h>= |
---|
656 | contains also corresponding definitions of |
---|
657 | {\sf condassign(a,b,c,d)} |
---|
658 | and {\sf condassign(a,b,c)} for |
---|
659 | passive {\sf double} arguments so that the modified code |
---|
660 | without any differentiation can be tested |
---|
661 | for correctness. |
---|
662 | |
---|
663 | A generalization of this concept for more than two branches, e.g., |
---|
664 | akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling |
---|
665 | \textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on |
---|
666 | elements of an \texttt{advector} with active indices. |
---|
667 | % |
---|
668 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
669 | \subsection{Step-by-Step Modification Procedure} |
---|
670 | % |
---|
671 | To prepare a section of given C or C++ code for automatic |
---|
672 | differentiation as described above, one applies the following step-by-step procedure. |
---|
673 | \begin{enumerate} |
---|
674 | \item |
---|
675 | Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)} |
---|
676 | and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the |
---|
677 | beginning and end of the active section. |
---|
678 | \item |
---|
679 | Select the set of active variables, and change their type from |
---|
680 | {\sf double} or {\sf float} to {\sf adouble}. |
---|
681 | \item |
---|
682 | Select a sequence of independent variables, and initialize them with |
---|
683 | \boldmath $\ll=$ \unboldmath assignments from passive variables or vectors. |
---|
684 | \item |
---|
685 | Select a sequence of dependent variables among the active variables, |
---|
686 | and pass their final values to passive variable or vectors thereof |
---|
687 | by \boldmath $\gg=$ \unboldmath assignments. |
---|
688 | \item |
---|
689 | Compile the codes after including the header file \verb=<adolc/adouble.h>=. |
---|
690 | \end{enumerate} |
---|
691 | Typically, the first compilation will detect several type conflicts |
---|
692 | -- usually attempts to convert from active to passive |
---|
693 | variables or to perform standard I/O of active variables. |
---|
694 | Since all standard |
---|
695 | C programs can be activated by a mechanical application of the |
---|
696 | procedure above, the following section is of importance |
---|
697 | only to advanced users. |
---|
698 | % |
---|
699 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
700 | \section{Numbering the Tapes and Controlling the Buffer} |
---|
701 | \label{tape} |
---|
702 | % |
---|
703 | The trace generated by the execution of an active section may stay |
---|
704 | within a triplet of internal arrays or it may be written out |
---|
705 | to three corresponding files. We will refer to these triplets as the |
---|
706 | tape array or tape file, in general tape, which may subsequently be |
---|
707 | used to evaluate the |
---|
708 | underlying function and its derivatives at the original point or at |
---|
709 | alternative arguments. If the active section involves user-defined |
---|
710 | quadratures it must be executed and |
---|
711 | re-taped at each new argument. Similarly, if conditions on |
---|
712 | {\sf adouble} values lead to a different program branch being taken at |
---|
713 | a new argument the evaluation process also needs to be re-taped at the |
---|
714 | new point. Otherwise, direct evaluation from |
---|
715 | the tape by the routine {\sf function} (\autoref{optdrivers}) is |
---|
716 | likely to be |
---|
717 | faster. The use of quadratures and the results of all comparisons on |
---|
718 | {\sf adouble}s are recorded on the tape so that {\sf function} and other |
---|
719 | forward routines stop and return appropriate flags if their use without |
---|
720 | prior re-taping is unsafe. To avoid any re-taping certain types of |
---|
721 | branches can be recorded on the tape through |
---|
722 | the use of conditional assignments |
---|
723 | described before in \autoref{condassign}. |
---|
724 | |
---|
725 | Several tapes may be generated and kept simultaneously. |
---|
726 | A tape array is used as a triplet of buffers or a tape file is generated if |
---|
727 | the length of any of the buffers exceeds the maximal array lengths of |
---|
728 | {\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are |
---|
729 | defined in the header file \verb=<adolc/internal/usrparms.h>= |
---|
730 | and may be adjusted by the user in the header file before compiling |
---|
731 | the ADOL-C library, or on runtime using a file named \verb=.adolcrc=. |
---|
732 | Lines in this file must have the form |
---|
733 | \begin{verbatim} |
---|
734 | "VARIABLE" = "VALUE" |
---|
735 | \end{verbatim} |
---|
736 | where the quotation marks are mandatory. |
---|
737 | The filesystem folder, where the tapes files may be written to disk, |
---|
738 | can be changed by changing the definition of {\sf TAPE\_DIR} in |
---|
739 | the header file \verb=<adolc/dvlparms.h>= before |
---|
740 | compiling the ADOL-C library, or on runtime by defining {\sf |
---|
741 | TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined |
---|
742 | to be the present working directory (\verb=.=). |
---|
743 | |
---|
744 | For simple usage, {\sf trace\_on} may be called with only the tape |
---|
745 | {\sf tag} as argument, and {\sf trace\_off} may be called |
---|
746 | without argument. The optional integer argument {\sf keep} of |
---|
747 | {\sf trace\_on} determines whether the numerical values of all |
---|
748 | active variables are recorded in a buffered temporary array or file |
---|
749 | called the taylor stack. |
---|
750 | This option takes effect if |
---|
751 | {\sf keep} = 1 and prepares the scene for an immediately following |
---|
752 | gradient evaluation by a call to a routine implementing the reverse mode |
---|
753 | as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A |
---|
754 | file is used instead of an array if the size exceeds the maximal array |
---|
755 | length of {\sf TBUFSIZE} defined in \verb=<adolc/internal/usrparms.h>= and may |
---|
756 | be adjusted in the same way like the other buffer sizes mentioned above. |
---|
757 | Alternatively, gradients may be evaluated by a call |
---|
758 | to {\sf gradient}, which includes a preparatory forward sweep |
---|
759 | for the creation of the temporary file. If omitted, the argument |
---|
760 | {\sf keep} defaults to 0, so that no temporary |
---|
761 | taylor stack file is generated. |
---|
762 | |
---|
763 | By setting the optional integer argument {\sf file} of |
---|
764 | {\sf trace\_off} to 1, the user may force a numbered tape |
---|
765 | file to be written even if the tape array (buffer) does not overflow. |
---|
766 | If the argument {\sf file} is omitted, it |
---|
767 | defaults to 0, so that the tape array is written onto a tape file only |
---|
768 | if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements. |
---|
769 | |
---|
770 | After the execution of an active section, if a tape file was generated, i.e., |
---|
771 | if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the |
---|
772 | argument {\sf file} of {\sf trace\_off} was set to 1, the files will be |
---|
773 | saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default |
---|
774 | the current working directory) under filenames formed by |
---|
775 | the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf |
---|
776 | ADOLC\_LOCATIONS\_NAME}, {\sf ADOLC\_VALUES\_NAME} and {\sf |
---|
777 | ADOLC\_TAYLORS\_NAME} defined in |
---|
778 | the header file \verb=<adolc/dvlparms.h>= appended with the number |
---|
779 | given as the {\sf tag} argument to {\sf trace\_on} and have the |
---|
780 | extension {\sf .tap}. |
---|
781 | |
---|
782 | Later, all problem-independent routines |
---|
783 | like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others |
---|
784 | expect as first argument a {\sf tag} to determine |
---|
785 | the tape on which their respective computational task is to be performed. |
---|
786 | By calling {\sf trace\_on} with different tape {\sf tag}s, one can create |
---|
787 | several tapes for various function evaluations and subsequently perform |
---|
788 | function and derivative evaluations on one or more of them. |
---|
789 | |
---|
790 | For example, suppose one wishes to calculate for two smooth functions |
---|
791 | $f_1(x)$ and $f_2(x)$ |
---|
792 | \[ |
---|
793 | f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x), |
---|
794 | \] |
---|
795 | and possibly higher derivatives where the two functions do not tie. |
---|
796 | Provided $f_1$ and $f_2$ are evaluated in two separate active sections, |
---|
797 | one can generate two different tapes by calling {\sf trace\_on} with |
---|
798 | {\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active |
---|
799 | sections. |
---|
800 | Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the |
---|
801 | current argument and then evaluate the gradient $\nabla f(x)$ by calling |
---|
802 | {\sf gradient} with the appropriate argument value {\sf tag} = 1 or |
---|
803 | {\sf tag} = 2. |
---|
804 | % |
---|
805 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
806 | \subsection{Examining the Tape and Predicting Storage Requirements } |
---|
807 | \label{examiningTape} |
---|
808 | % |
---|
809 | At any point in the program, one may call the routine |
---|
810 | \begin{center} |
---|
811 | {\sf void tapestats(unsigned short tag, size\_t* counts)} |
---|
812 | \end{center} |
---|
813 | with {\sf counts} beeing an array of at least eleven integers. |
---|
814 | The first argument {\sf tag} specifies the particular tape of |
---|
815 | interest. The components of {\sf counts} represent |
---|
816 | \[ |
---|
817 | \begin{tabular}{ll} |
---|
818 | {\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\ |
---|
819 | {\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ |
---|
820 | {\sf counts[2]}: & the maximal number of live active variables,\\ |
---|
821 | {\sf counts[3]}: & the size of taylor stack (number of overwrites),\\ |
---|
822 | {\sf counts[4]}: & the buffer size (a multiple of eight), |
---|
823 | \end{tabular} |
---|
824 | \] |
---|
825 | \[ |
---|
826 | \begin{tabular}{ll} |
---|
827 | {\sf counts[5]}: & the total number of operations recorded,\\ |
---|
828 | {\sf counts[6-13]}: & other internal information about the tape. |
---|
829 | \end{tabular} |
---|
830 | \] |
---|
831 | The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} |
---|
832 | determine the temporary |
---|
833 | storage requirements during calls to the routines |
---|
834 | implementing the forward and the reverse mode. |
---|
835 | For a certain degree {\sf deg} $\geq$ 0, the scalar version of the |
---|
836 | forward mode involves apart from the tape buffers an array of |
---|
837 | $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in |
---|
838 | core and, in addition, a sequential data set called the value stack |
---|
839 | of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the |
---|
840 | option {\sf keep} $>$ 0. Here |
---|
841 | the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage |
---|
842 | requirement for the sequential data set, which stays in core if |
---|
843 | its length is less than {\sf TBUFSIZE} bytes and is otherwise written |
---|
844 | out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/internal/usrparms.h>=. |
---|
845 | The drawback of the economical |
---|
846 | {\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations |
---|
847 | yield gradients and other adjoint vectors only in single-precision |
---|
848 | accuracy. This may be acceptable if the adjoint vectors |
---|
849 | represent rows of a Jacobian that is used for the calculation of |
---|
850 | Newton steps. In its scalar version, the reverse mode implementation involves |
---|
851 | the same number of {\sf double}s and twice as many {\sf revreal}s as the |
---|
852 | forward mode implementation. |
---|
853 | The storage requirements of the vector versions of the forward mode and |
---|
854 | reverse mode implementation are equal to that of the scalar versions multiplied by |
---|
855 | the vector length. |
---|
856 | % |
---|
857 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
858 | \subsection{Customizing ADOL-C} |
---|
859 | \label{Customizing} |
---|
860 | % |
---|
861 | Based on the information provided by the routine {\sf tapestats}, the user may alter the |
---|
862 | following types and constant dimensions in the header file \verb=<adolc/internal/usrparms.h>= |
---|
863 | to suit his problem and environment. |
---|
864 | |
---|
865 | \begin{description} |
---|
866 | \item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integer determines the length of |
---|
867 | in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all |
---|
868 | required data, any file access is avoided unless {\sf trace\_off} |
---|
869 | is called with a positive argument. This desirable situation can |
---|
870 | be achieved for many problem functions with an execution trace of moderate |
---|
871 | size. Primarily these values occur as an argument |
---|
872 | to {\sf malloc}, so that setting it unnecessarily large may have no |
---|
873 | ill effects, unless the operating system prohibits or penalizes large |
---|
874 | array allocations. It is however recommended to leave the values in |
---|
875 | \texttt{<adolc/internal/usrparms.h>} unchanged and set them using the |
---|
876 | \texttt{.adolcrc} file in the current working directory at runtime. |
---|
877 | |
---|
878 | \item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the |
---|
879 | in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288). |
---|
880 | |
---|
881 | \item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32). |
---|
882 | |
---|
883 | \item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions. |
---|
884 | |
---|
885 | \item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions. |
---|
886 | |
---|
887 | \item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den} |
---|
888 | sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} |
---|
889 | of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines |
---|
890 | the default setting produces the standard {\sf Inf}. On non-IEEE machines |
---|
891 | change these values to produce a small {\sf InfVal} value and compare |
---|
892 | the results of two forward sweeps with different {\sf InfVal} settings |
---|
893 | to detect a ``vertical'' slope. |
---|
894 | |
---|
895 | \item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0). |
---|
896 | |
---|
897 | \item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} |
---|
898 | sets the mathematically |
---|
899 | undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den} |
---|
900 | of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines |
---|
901 | the default setting produces the standard {\sf NaN}. On non-IEEE machines |
---|
902 | change these values to produce a small {\sf NoNum} value and compare |
---|
903 | the results of two forward sweeps with different {\sf NoNum} settings |
---|
904 | to detect the occurrence of undefined derivative values. |
---|
905 | |
---|
906 | \item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0). |
---|
907 | |
---|
908 | \item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20). |
---|
909 | |
---|
910 | \item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout). |
---|
911 | \end{description} |
---|
912 | |
---|
913 | The following types and options may be set using the command-line |
---|
914 | options of the \texttt{./configure} script. |
---|
915 | |
---|
916 | \begin{description} |
---|
917 | \item[{\sf locint}{\rm :}] The range of the integer type |
---|
918 | {\sf locint} determines how many {\sf adouble}s can be simultaneously |
---|
919 | alive (default: {\sf unsigned int}). In extreme cases when there are more than $2^{32}$ {\sf adouble}s |
---|
920 | alive at any one time, the type {\sf locint} must be changed to |
---|
921 | {\sf unsigned long}. This can be done by passing |
---|
922 | \texttt{--enable-ulong} to \texttt{./configure}. |
---|
923 | |
---|
924 | \item[{\sf revreal}{\rm :}] The choice of this floating-point type |
---|
925 | trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions |
---|
926 | and their derivatives are always evaluated in double precision |
---|
927 | during forward sweeps, gradients and other adjoint vectors are obtained |
---|
928 | with the precision determined by the type {\sf revreal}. The less |
---|
929 | accurate choice {\sf revreal} = {\sf float} nearly halves the |
---|
930 | storage requirement during reverse sweeps. This can be done by passing |
---|
931 | \texttt{--disable-double} to \texttt{./configure}. |
---|
932 | |
---|
933 | \item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse |
---|
934 | hyperbolic functions and the error function are enabled (default: |
---|
935 | undefined) by passing \texttt{--enable-atrig-erf} |
---|
936 | to \texttt{./configure} |
---|
937 | |
---|
938 | \item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine |
---|
939 | used by ADOL-C. {\sf Malloc} will be used if this variable is |
---|
940 | undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect |
---|
941 | result caused by uninitialized memory. It can be set undefined by |
---|
942 | passing \texttt{--disable-use-calloc} to \texttt{./configure}. |
---|
943 | |
---|
944 | \item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines |
---|
945 | required for automatic branch selection (default: disabled). The |
---|
946 | boolean valued comparison operators with two \texttt{adouble} type |
---|
947 | arguments will not return boolean values anymore and may not be used |
---|
948 | in branch control statements (\texttt{if}, \texttt{while}, \texttt{for} |
---|
949 | etc.). Instead conditional assignments using \texttt{condassign} or |
---|
950 | selection operations on elements of \texttt{advector} type should be |
---|
951 | used. Enabling this option and rewriting the function evaluation |
---|
952 | using \texttt{condassign} or selections of \texttt{advector} |
---|
953 | elements will prevent the need for retracing the function at branch |
---|
954 | switches. This can be enabled by passing |
---|
955 | \texttt{--enable-advanced-branching} to \texttt{./configure}. |
---|
956 | \end{description} |
---|
957 | % |
---|
958 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
959 | \subsection{Warnings and Suggestions for Improved Efficiency} |
---|
960 | \label{WarSug} |
---|
961 | % |
---|
962 | Since the type {\sf adouble} has a nontrivial constructor, |
---|
963 | the mere declaration of large {\sf adouble} arrays may take up |
---|
964 | considerable run time. The user should be warned against |
---|
965 | the usual Fortran practice of declaring fixed-size arrays |
---|
966 | that can accommodate the largest possible case of an evaluation program |
---|
967 | with variable dimensions. If such programs are converted to or written |
---|
968 | in C, the overloading in combination with ADOL-C will lead to very |
---|
969 | large run time increases for comparatively small values of the |
---|
970 | problem dimension, because the actual computation is completely |
---|
971 | dominated by the construction of the large {\sf adouble} arrays. |
---|
972 | The user is advised to |
---|
973 | create dynamic arrays of |
---|
974 | {\sf adouble}s by using the C++ operator {\sf new} and to destroy them |
---|
975 | using {\sf delete}. For storage efficiency it is desirable that |
---|
976 | dynamic objects are created and destroyed in a last-in-first-out |
---|
977 | fashion. |
---|
978 | |
---|
979 | Whenever an {\sf adouble} is declared, the constructor for the type |
---|
980 | {\sf adouble} assigns it a nominal address, which we will refer to as |
---|
981 | its {\em location}. The location is of the type {\sf locint} defined |
---|
982 | in the header file \verb=<adolc/internal/usrparms.h>=. Active vectors occupy |
---|
983 | a range of contiguous locations. As long as the program execution |
---|
984 | never involves more than 65$\,$536 active variables, the type {\sf locint} |
---|
985 | may be defined as {\sf unsigned short}. Otherwise, the range may be |
---|
986 | extended by defining {\sf locint} as {\sf (unsigned) int} or |
---|
987 | {\sf (unsigned) long}, which may nearly double |
---|
988 | the overall mass storage requirement. Sometimes one can avoid exceeding |
---|
989 | the accessible range of {\sf unsigned short}s by using more local variables and deleting |
---|
990 | {\sf adouble}s created by the new operator in a |
---|
991 | last-in-first-out |
---|
992 | fashion. When memory for {\sf adouble}s is requested through a call to |
---|
993 | {\sf malloc()} or other related C memory-allocating |
---|
994 | functions, the storage for these {\sf adouble}s is allocated; however, the |
---|
995 | C++ {\sf adouble} constructor is never called. The newly defined |
---|
996 | {\sf adouble}s are never assigned a location and are not counted in |
---|
997 | the stack of live variables. Thus, any results depending upon these |
---|
998 | pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use |
---|
999 | malloc() and related C memory-allocating |
---|
1000 | functions when declaring adoubles (see the following paragraph).} |
---|
1001 | % |
---|
1002 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
1003 | % |
---|
1004 | %The same point applies, of course, |
---|
1005 | % for active vectors. |
---|
1006 | |
---|
1007 | When an {\sf adouble} |
---|
1008 | % |
---|
1009 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
1010 | % |
---|
1011 | % or {\bf adoublev} |
---|
1012 | goes out of |
---|
1013 | scope or is explicitly deleted, the destructor notices that its |
---|
1014 | location(s) may be |
---|
1015 | freed for subsequent (nominal) reallocation. In general, this is not done |
---|
1016 | immediately but is delayed until the locations to be deallocated form a |
---|
1017 | contiguous tail of all locations currently being used. |
---|
1018 | |
---|
1019 | As a consequence of this allocation scheme, the currently |
---|
1020 | alive {\sf adouble} locations always form a contiguous range of integers |
---|
1021 | that grows and shrinks like a stack. Newly declared {\sf adouble}s are |
---|
1022 | placed on the top so that vectors of {\sf adouble}s obtain a contiguous |
---|
1023 | range of locations. While the C++ compiler can be expected to construct |
---|
1024 | and destruct automatic variables in a last-in-first-out fashion, the |
---|
1025 | user may upset this desirable pattern by deleting free-store {\sf adouble}s |
---|
1026 | too early or too late. Then the {\sf adouble} stack may grow |
---|
1027 | unnecessarily, but the numerical results will still be |
---|
1028 | correct, unless an exception occurs because the range of {\sf locint} |
---|
1029 | is exceeded. In general, free-store {\sf adouble}s |
---|
1030 | % |
---|
1031 | % XXX: Vector and matrix class have to be reimplemented !!! |
---|
1032 | % |
---|
1033 | %and {\bf adoublev}s |
---|
1034 | should be deleted in a last-in-first-out fashion toward the end of |
---|
1035 | the program block in which they were created. |
---|
1036 | When this pattern is maintained, the maximum number of |
---|
1037 | {\sf adouble}s alive and, as a consequence, the |
---|
1038 | randomly accessed storage space |
---|
1039 | of the derivative evaluation routines is bounded by a |
---|
1040 | small multiple of the memory used in the relevant section of the |
---|
1041 | original program. Failure to delete dynamically allocated {\sf adouble}s |
---|
1042 | may cause that the maximal number of {\sf adouble}s alive at one time will be exceeded |
---|
1043 | if the same active section is called repeatedly. The same effect |
---|
1044 | occurs if static {\sf adouble}s are used. |
---|
1045 | |
---|
1046 | To avoid the storage and manipulation of structurally |
---|
1047 | trivial derivative values, one should pay careful attention to |
---|
1048 | the naming of variables. Ideally, the intermediate |
---|
1049 | values generated during the evaluation of a vector function |
---|
1050 | should be assigned to program variables that are |
---|
1051 | consistently either active or passive, in that all their values |
---|
1052 | either are or are not dependent on the independent variables |
---|
1053 | in a nontrivial way. For example, this rule is violated if a temporary |
---|
1054 | variable is successively used to accumulate inner products involving |
---|
1055 | first only passive and later active arrays. Then the first inner |
---|
1056 | product and all its successors in the data dependency graph become |
---|
1057 | artificially active and the derivative evaluation routines |
---|
1058 | described later will waste |
---|
1059 | time allocating and propagating |
---|
1060 | trivial or useless derivatives. Sometimes even values that do |
---|
1061 | depend on the independent variables may be of only transitory |
---|
1062 | importance and may not affect the dependent variables. For example, |
---|
1063 | this is true for multipliers that are used to scale linear |
---|
1064 | equations, but whose values do not influence the dependent |
---|
1065 | variables in a mathematical sense. Such dead-end variables |
---|
1066 | can be deactivated by the use of the {\sf value} function, which |
---|
1067 | converts {\sf adouble}s to {\sf double}s. The deleterious effects |
---|
1068 | of unnecessary activity are partly alleviated by run time |
---|
1069 | activity flags in the derivative routine |
---|
1070 | {\sf hov\_reverse} presented in \autoref{forw_rev_ad}. |
---|
1071 | |
---|
1072 | The {\sf adouble} default constructor sets to zero the associated value. |
---|
1073 | This implies a certain overhead that may seem unnecessary when no initial value |
---|
1074 | is actually given, however, |
---|
1075 | the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior. |
---|
1076 | An array instantiation such as |
---|
1077 | \begin{center} |
---|
1078 | \sf double x[3]=\{2.0\}; |
---|
1079 | \end{center} |
---|
1080 | will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements |
---|
1081 | {\sf x[1]} and {\sf x[2]} to {\sf 0.0}. According to the C++ standard the array element construction of |
---|
1082 | the type changed instantiation |
---|
1083 | \begin{center} |
---|
1084 | \sf adouble x[3]=\{2.0\}; |
---|
1085 | \end{center} |
---|
1086 | will use the constructor {\sf adouble(const double\&);} for {\sf x[0]} passing in {\sf 2.0} but |
---|
1087 | will call the {\sf adouble} default constructor {\sf x[1]} and {\sf x[2]} leaving these array |
---|
1088 | elements uninitialized {\em unless} the default constructor does implement the initialization to |
---|
1089 | zero. |
---|
1090 | The C++ constructor syntax does not provide a means to distinguish this implicit initialization from the declaration of any simple uninitialized variable. |
---|
1091 | If the user can ascertain the absence of array instantiations such as the above then one can |
---|
1092 | configure ADOL-C with the \verb=--disable-stdczero= option , see \autoref{genlib}, to |
---|
1093 | avoid the overhead of these initializations. |
---|
1094 | |
---|
1095 | % |
---|
1096 | % |
---|
1097 | % |
---|
1098 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1099 | \section{Easy-To-Use Drivers} |
---|
1100 | \label{drivers} |
---|
1101 | % |
---|
1102 | For the convenience of the user, ADOL-C provides several |
---|
1103 | easy-to-use drivers that compute the most frequently required |
---|
1104 | derivative objects. Throughout, we assume that after the execution of an |
---|
1105 | active section, the corresponding tape with the identifier {\sf tag} |
---|
1106 | contains a detailed record of the computational process by which the |
---|
1107 | final values $y$ of the dependent variables were obtained from the |
---|
1108 | values $x$ of the independent variables. We will denote this functional |
---|
1109 | relation between the input variables $x$ and the output variables $y$ by |
---|
1110 | \[ |
---|
1111 | F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y. |
---|
1112 | \] |
---|
1113 | The return value of all drivers presented in this section |
---|
1114 | indicate the validity of the tape as explained in \autoref{reuse_tape}. |
---|
1115 | The presented drivers are all C functions and therefore can be used within |
---|
1116 | C and C++ programs. Some Fortran-callable companions can be found |
---|
1117 | in the appropriate header files. |
---|
1118 | % |
---|
1119 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1120 | \subsection{Drivers for Optimization and Nonlinear Equations} |
---|
1121 | % |
---|
1122 | \label{optdrivers} |
---|
1123 | % |
---|
1124 | The drivers provided for solving optimization problems and nonlinear |
---|
1125 | equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=, |
---|
1126 | which is included automatically by the global header file \verb=<adolc/adolc.h>= |
---|
1127 | (see \autoref{ssec:DesIH}). |
---|
1128 | |
---|
1129 | The routine {\sf function} allows to evaluate the desired function from |
---|
1130 | the tape instead of executing the corresponding source code: |
---|
1131 | % |
---|
1132 | \begin{tabbing} |
---|
1133 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1134 | \>{\sf int function(tag,m,n,x,y)}\\ |
---|
1135 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1136 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1137 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1138 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1139 | \>{\sf double y[m];} \> // dependent vector $y=F(x)$ |
---|
1140 | \end{tabbing} |
---|
1141 | % |
---|
1142 | If the original evaluation program is available this double version |
---|
1143 | should be used to compute the function value in order to avoid the |
---|
1144 | interpretative overhead. |
---|
1145 | |
---|
1146 | For the calculation of whole derivative vectors and matrices up to order |
---|
1147 | 2 there are the following procedures: |
---|
1148 | % |
---|
1149 | \begin{tabbing} |
---|
1150 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1151 | \>{\sf int gradient(tag,n,x,g)}\\ |
---|
1152 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1153 | \>{\sf int n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
1154 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1155 | \>{\sf double g[n];} \> // resulting gradient $\nabla F(x)$ |
---|
1156 | \end{tabbing} |
---|
1157 | % |
---|
1158 | \begin{tabbing} |
---|
1159 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1160 | \>{\sf int jacobian(tag,m,n,x,J)}\\ |
---|
1161 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1162 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1163 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1164 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1165 | \>{\sf double J[m][n];} \> // resulting Jacobian $F^\prime (x)$ |
---|
1166 | \end{tabbing} |
---|
1167 | % |
---|
1168 | \begin{tabbing} |
---|
1169 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1170 | \>{\sf int hessian(tag,n,x,H)}\\ |
---|
1171 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1172 | \>{\sf int n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
1173 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1174 | \>{\sf double H[n][n];} \> // resulting Hessian matrix $\nabla^2F(x)$ |
---|
1175 | \end{tabbing} |
---|
1176 | % |
---|
1177 | The driver routine {\sf hessian} computes only the lower half of |
---|
1178 | $\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ |
---|
1179 | of {\sf H} allocated as a square array remain untouched during the call |
---|
1180 | of {\sf hessian}. Hence only $i+1$ {\sf double}s need to be |
---|
1181 | allocated starting at the position {\sf H[i]}. |
---|
1182 | |
---|
1183 | To use the full capability of automatic differentiation when the |
---|
1184 | product of derivatives with certain weight vectors or directions are needed, ADOL-C offers |
---|
1185 | the following four drivers: |
---|
1186 | % |
---|
1187 | \begin{tabbing} |
---|
1188 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1189 | \>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\ |
---|
1190 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1191 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1192 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1193 | \>{\sf int repeat;} \> // indicate repeated call at same argument\\ |
---|
1194 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1195 | \>{\sf double u[m];} \> // range weight vector $u$ \\ |
---|
1196 | \>{\sf double z[n];} \> // result $z = u^TF^\prime (x)$ |
---|
1197 | \end{tabbing} |
---|
1198 | If a nonzero value of the parameter {\sf repeat} indicates that the |
---|
1199 | routine {\sf vec\_jac} has been called at the same argument immediately |
---|
1200 | before, the internal forward mode evaluation will be skipped and only |
---|
1201 | reverse mode evaluation with the corresponding arguments is executed |
---|
1202 | resulting in a reduced computational complexity of the function {\sf vec\_jac}. |
---|
1203 | % |
---|
1204 | \begin{tabbing} |
---|
1205 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1206 | \>{\sf int jac\_vec(tag,m,n,x,v,z)}\\ |
---|
1207 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1208 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1209 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1210 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1211 | \>{\sf double v[n];} \> // tangent vector $v$\\ |
---|
1212 | \>{\sf double z[m];} \> // result $z = F^\prime (x)v$ |
---|
1213 | \end{tabbing} |
---|
1214 | % |
---|
1215 | \begin{tabbing} |
---|
1216 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1217 | \>{\sf int hess\_vec(tag,n,x,v,z)}\\ |
---|
1218 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1219 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1220 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1221 | \>{\sf double v[n];} \> // tangent vector $v$\\ |
---|
1222 | \>{\sf double z[n];} \> // result $z = \nabla^2F(x) v$ |
---|
1223 | \end{tabbing} |
---|
1224 | % |
---|
1225 | \begin{tabbing} |
---|
1226 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1227 | \>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\ |
---|
1228 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1229 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1230 | \>{\sf int p;} \> // number of columns in $V$\\ |
---|
1231 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1232 | \>{\sf double V[n][p];} \> // tangent matrix $V$\\ |
---|
1233 | \>{\sf double Z[n][p];} \> // result $Z = \nabla^2F(x) V$ |
---|
1234 | \end{tabbing} |
---|
1235 | % |
---|
1236 | \begin{tabbing} |
---|
1237 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1238 | \>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\ |
---|
1239 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1240 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1241 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1242 | \>{\sf double x[n];} \> // independent vector $x$\\ |
---|
1243 | \>{\sf double v[n];} \> // tangent vector $v$\\ |
---|
1244 | \>{\sf double u[m];} \> // range weight vector $u$ \\ |
---|
1245 | \>{\sf double h[n];} \> // result $h = u^T\nabla^2F(x) v $ |
---|
1246 | \end{tabbing} |
---|
1247 | % |
---|
1248 | The next procedure allows the user to perform Newton steps only |
---|
1249 | having the corresponding tape at hand: |
---|
1250 | % |
---|
1251 | \begin{tabbing} |
---|
1252 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1253 | \>{\sf int jac\_solv(tag,n,x,b,mode)} \\ |
---|
1254 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1255 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1256 | \>{\sf double x[n];} \> // independent vector $x$ as\\ |
---|
1257 | \>{\sf double b[n];} \> // in: right-hand side b, out: result $w$ of |
---|
1258 | $F(x)w = b$\\ |
---|
1259 | \>{\sf int mode;} \> // option to choose different solvers |
---|
1260 | \end{tabbing} |
---|
1261 | % |
---|
1262 | On entry, parameter {\sf b} of the routine {\sf jac\_solv} |
---|
1263 | contains the right-hand side of the equation $F(x)w = b$ to be solved. On exit, |
---|
1264 | {\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only |
---|
1265 | the Jacobian of the function |
---|
1266 | given by the tape labeled with {\sf tag} is provided internally. |
---|
1267 | The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The |
---|
1268 | solution of the equation is calculated if {\sf mode} = 2. |
---|
1269 | Hence, it is possible to compute the |
---|
1270 | LU-factorization only once. Then the equation can be solved for several |
---|
1271 | right-hand sides $b$ without calculating the Jacobian and |
---|
1272 | its factorization again. |
---|
1273 | |
---|
1274 | If the original evaluation code of a function contains neither |
---|
1275 | quadratures nor branches, all drivers described above can be used to |
---|
1276 | evaluate derivatives at any argument in its domain. The same still |
---|
1277 | applies if there are no user defined quadratures and |
---|
1278 | all comparisons involving {\sf adouble}s have the same result as |
---|
1279 | during taping. If this assumption is falsely made all drivers |
---|
1280 | while internally calling the forward mode evaluation will return the value -1 or -2 |
---|
1281 | as already specified in \autoref{reuse_tape}. |
---|
1282 | |
---|
1283 | % |
---|
1284 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1285 | \subsection{Drivers for Ordinary Differential Equations} |
---|
1286 | \label{odedrivers} |
---|
1287 | % |
---|
1288 | When $F$ is the right-hand side of an (autonomous) ordinary |
---|
1289 | differential equation |
---|
1290 | \[ |
---|
1291 | x^\prime(t) \; = \; F(x(t)) , |
---|
1292 | \] |
---|
1293 | we must have $m=n$. Along any solution path $x(t)$ its Taylor |
---|
1294 | coefficients $x_j$ at some time, e.g., $t=0$, must satisfy |
---|
1295 | the relation |
---|
1296 | \[ |
---|
1297 | x_{i+1} = \frac{1}{1+i} y_i. |
---|
1298 | \] |
---|
1299 | with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely, |
---|
1300 | \[ |
---|
1301 | y(t) \; \equiv \; F(x(t)) \; : \; I\!\!R \;\mapsto \;I\!\!R^m |
---|
1302 | \] |
---|
1303 | defined by an autonomous right-hand side $F$ recorded on the tape. |
---|
1304 | Using this relation, one can generate the Taylor coefficients $x_i$, |
---|
1305 | $i \le deg$, |
---|
1306 | recursively from the current point $x_0$. This task is achieved by the |
---|
1307 | driver routine {\sf forode} defined as follows: |
---|
1308 | % |
---|
1309 | \begin{tabbing} |
---|
1310 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1311 | \>{\sf int forode(tag,n,tau,dol,deg,X)}\\ |
---|
1312 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1313 | \>{\sf int n;} \> // number of state variables $n$\\ |
---|
1314 | \>{\sf double tau;} \> // scaling parameter\\ |
---|
1315 | \>{\sf int dol;} \> // degree on previous call\\ |
---|
1316 | \>{\sf int deg;} \> // degree on current call\\ |
---|
1317 | \>{\sf double X[n][deg+1];} \> // Taylor coefficient vector $X$ |
---|
1318 | \end{tabbing} |
---|
1319 | % |
---|
1320 | If {\sf dol} is positive, it is assumed that {\sf forode} |
---|
1321 | has been called before at the same point so that all Taylor coefficient |
---|
1322 | vectors up to the {\sf dol}-th are already correct. |
---|
1323 | |
---|
1324 | Subsequently one may call the driver routine {\sf reverse} or corresponding |
---|
1325 | low level routines as explained in the \autoref{forw_rev} and |
---|
1326 | \autoref{forw_rev_ad}, respectively, to compute |
---|
1327 | the family of square matrices {\sf Z[n][n][deg]} defined by |
---|
1328 | \[ |
---|
1329 | Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} , |
---|
1330 | \] |
---|
1331 | with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}. |
---|
1332 | |
---|
1333 | For the numerical solutions of ordinary differential equations, |
---|
1334 | one may also wish to calculate the Jacobians |
---|
1335 | \begin{equation} |
---|
1336 | \label{eq:bees} |
---|
1337 | B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, , |
---|
1338 | \end{equation} |
---|
1339 | which exist provided $F$ is sufficiently smooth. These matrices can |
---|
1340 | be obtained from the partial derivatives $\partial y_i/\partial x_0$ |
---|
1341 | by an appropriate version of the chain rule. |
---|
1342 | To compute the total derivatives $B = (B_j)_{0\leq j <d}$ |
---|
1343 | defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$ |
---|
1344 | matrix-matrix products. This can be done by a call of the routine {\sf accode} after the |
---|
1345 | corresponding evaluation of the {\sf hov\_reverse} function. The interface of |
---|
1346 | {\sf accode} is defined as follows: |
---|
1347 | % |
---|
1348 | \begin{tabbing} |
---|
1349 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1350 | \>{\sf int accode(n,tau,deg,Z,B,nz)}\\ |
---|
1351 | \>{\sf int n;} \> // number of state variables $n$ \\ |
---|
1352 | \>{\sf double tau;} \> // scaling parameter\\ |
---|
1353 | \>{\sf int deg;} \> // degree on current call\\ |
---|
1354 | \>{\sf double Z[n][n][deg];} \> // partials of coefficient vectors\\ |
---|
1355 | \>{\sf double B[n][n][deg];} \> // result $B$ as defined in \eqref{eq:bees}\\ |
---|
1356 | \>{\sf short nz[n][n];} \> // optional nonzero pattern |
---|
1357 | \end{tabbing} |
---|
1358 | % |
---|
1359 | Sparsity information can be exploited by {\sf accode} using the array {\sf |
---|
1360 | nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf |
---|
1361 | reverse} or the corresponding basic routines as explained below in |
---|
1362 | \autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The |
---|
1363 | non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon |
---|
1364 | return |
---|
1365 | \[ |
---|
1366 | \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; . |
---|
1367 | \] |
---|
1368 | In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a |
---|
1369 | sparsity pattern that fills in as $k$ grows. Note, that there need to be no |
---|
1370 | loss in computational efficiency if a time-dependent ordinary differential equation |
---|
1371 | is rewritten in autonomous form. |
---|
1372 | |
---|
1373 | The prototype of the ODE-drivers {\sf forode} and {\sf accode} is contained in the header file |
---|
1374 | \verb=<adolc/drivers/odedrivers.h>=. The global header file |
---|
1375 | \verb=<adolc/adolc.h>= |
---|
1376 | includes this file automatically, see \autoref{ssec:DesIH}. |
---|
1377 | |
---|
1378 | An example program using the procedures {\sf forode} and {\sf accode} together |
---|
1379 | with more detailed information about the coding can be found in |
---|
1380 | \autoref{exam:ode}. The corresponding source code |
---|
1381 | \verb=odexam.cpp= is contained in the subdirectory |
---|
1382 | \verb=examples=. |
---|
1383 | % |
---|
1384 | % |
---|
1385 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1386 | \subsection{Drivers for Sparse Jacobians and Sparse Hessians} |
---|
1387 | \label{sparse} |
---|
1388 | % |
---|
1389 | Quite often, the Jacobians and Hessians that have to be computed are sparse |
---|
1390 | matrices. Therefore, ADOL-C provides additionally drivers that |
---|
1391 | allow the exploitation of sparsity. The exploitation of sparsity is |
---|
1392 | frequently based on {\em graph coloring} methods, discussed |
---|
1393 | for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section |
---|
1394 | rely on the the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}. |
---|
1395 | ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed |
---|
1396 | separately to use the sparse drivers described here. ColPack is available for download at |
---|
1397 | \verb=http://cscapes.cs.purdue.edu/coloringpage/software.htm=. More information about the required |
---|
1398 | installation of ColPack is given in \autoref{install}. |
---|
1399 | % |
---|
1400 | \subsubsection*{Sparse Jacobians and Sparse Hessians} |
---|
1401 | % |
---|
1402 | To compute the entries of sparse Jacobians and sparse Hessians, |
---|
1403 | respectively, in coordinate format one may use the drivers: |
---|
1404 | \begin{tabbing} |
---|
1405 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1406 | \>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\ |
---|
1407 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1408 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1409 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1410 | \>{\sf int repeat;} \> // indicate repeated call at same argument\\ |
---|
1411 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1412 | \>{\sf int nnz;} \> // number of nonzeros \\ |
---|
1413 | \>{\sf unsigned int rind[nnz];}\> // row index\\ |
---|
1414 | \>{\sf unsigned int cind[nnz];}\> // column index\\ |
---|
1415 | \>{\sf double values[nnz];} \> // non-zero values\\ |
---|
1416 | \>{\sf int options[4];} \> // array of control parameters\\ |
---|
1417 | \end{tabbing} |
---|
1418 | % |
---|
1419 | \begin{tabbing} |
---|
1420 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1421 | \>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\ |
---|
1422 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1423 | \>{\sf int n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
1424 | \>{\sf int repeat;} \> // indicate repeated call at same argument\\ |
---|
1425 | \>{\sf double x[n];} \> // independent vector $x$ \\ |
---|
1426 | \>{\sf int nnz;} \> // number of nonzeros \\ |
---|
1427 | \>{\sf unsigned int rind[nnz];}\> // row indices\\ |
---|
1428 | \>{\sf unsigned int cind[nnz];}\> // column indices\\ |
---|
1429 | \>{\sf double values[nnz];} \> // non-zero values \\ |
---|
1430 | \>{\sf int options[2];} \> // array of control parameters\\ |
---|
1431 | \end{tabbing} |
---|
1432 | % |
---|
1433 | Once more, the input variables are the identifier for the internal |
---|
1434 | representation {\sf tag}, if required the number of dependents {\sf m}, |
---|
1435 | and the number of independents {\sf n} for a consistency check. |
---|
1436 | Furthermore, the flag {\sf repeat=0} indicates that the functions are called |
---|
1437 | at a point with a new sparsity structure, whereas {\sf repeat=1} results in |
---|
1438 | the re-usage of the sparsity pattern from the previous call. |
---|
1439 | The current values of the independents are given by the array {\sf x}. |
---|
1440 | The input/output |
---|
1441 | variable {\sf nnz} stores the number of the nonzero entries. |
---|
1442 | Therefore, {\sf nnz} denotes also the length of the arrays {\sf r\_ind} storing |
---|
1443 | the row indices, {\sf c\_ind} storing the column indices, and |
---|
1444 | {\sf values} storing the values of the nonzero entries. |
---|
1445 | If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0}, |
---|
1446 | the functions determine the number of nonzeros for the sparsity pattern |
---|
1447 | defined by the value of {\sf x}, allocate appropriate arrays {\sf r\_ind}, |
---|
1448 | {\sf c\_ind}, and {\sf values} and store the desired information in these |
---|
1449 | arrays. |
---|
1450 | During the next function call with {\sf repeat=1} the allocated memory |
---|
1451 | is reused such that only the values of the arrays are changed. |
---|
1452 | Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf |
---|
1453 | repeat=0} the user is responsible for the deallocation of the array |
---|
1454 | {\sf r\_ind}, {\sf c\_ind}, and {\sf values} using the function {\sf |
---|
1455 | free()}! |
---|
1456 | |
---|
1457 | For each driver the array {\sf options} can be used to adapted the |
---|
1458 | computation of the sparse derivative matrices to the special |
---|
1459 | needs of application under consideration. Most frequently, the default options |
---|
1460 | will give a reasonable performance. The elements of the array {\sf options} control the action of |
---|
1461 | {\sf sparse\_jac} according to \autoref{options_sparse_jac}. |
---|
1462 | \begin{table}[h] |
---|
1463 | \center |
---|
1464 | \begin{tabular}{|c|c|l|} \hline |
---|
1465 | component & value & \\ \hline |
---|
1466 | {\sf options[0]} & & way of sparsity pattern computation \\ |
---|
1467 | & 0 & propagation of index domains (default) \\ |
---|
1468 | & 1 & propagation of bit pattern \\ \hline |
---|
1469 | {\sf options[1]} & & test the computational graph control flow \\ |
---|
1470 | & 0 & safe mode (default) \\ |
---|
1471 | & 1 & tight mode \\ \hline |
---|
1472 | {\sf options[2]} & & way of bit pattern propagation \\ |
---|
1473 | & 0 & automatic detection (default) \\ |
---|
1474 | & 1 & forward mode \\ |
---|
1475 | & 2 & reverse mode \\ \hline |
---|
1476 | {\sf options[3]} & & way of compression \\ |
---|
1477 | & 0 & column compression (default) \\ |
---|
1478 | & 1 & row compression \\ \hline |
---|
1479 | \end{tabular} |
---|
1480 | \caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}} |
---|
1481 | \end{table} |
---|
1482 | |
---|
1483 | The component {\sf options[1]} determines |
---|
1484 | the usage of the safe or tight mode of sparsity computation. |
---|
1485 | The first, more conservative option is the default. It accounts for all |
---|
1486 | dependences that might occur for any value of the |
---|
1487 | independent variables. For example, the intermediate |
---|
1488 | {\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is |
---|
1489 | always assumed to depend on all independent variables that {\sf a} or {\sf b} |
---|
1490 | dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the |
---|
1491 | logical {\sf OR} of those associated with {\sf a} and {\sf b}. |
---|
1492 | In contrast |
---|
1493 | the tight option gives this result only in the unlikely event of an exact |
---|
1494 | tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern |
---|
1495 | associated with {\sf c} either to that of {\sf a} or to that of {\sf b}, |
---|
1496 | depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally. |
---|
1497 | Obviously, the sparsity pattern obtained with the tight option may contain |
---|
1498 | more zeros than that obtained with the safe option. On the other hand, it |
---|
1499 | will only be valid at points belonging to an area where the function $F$ is locally |
---|
1500 | analytic and that contains the point at which the internal representation was |
---|
1501 | generated. Since generating the sparsity structure using the safe version does not |
---|
1502 | require any reevaluation, it may thus reduce the overall computational cost |
---|
1503 | despite the fact that it produces more nonzero entries. |
---|
1504 | The value of {\sf options[2]} selects the direction of bit pattern propagation. |
---|
1505 | Depending on the number of independent $n$ and of dependent variables $m$ |
---|
1506 | one would prefer the forward mode if $n$ is significant smaller than $m$ and |
---|
1507 | would otherwise use the reverse mode. |
---|
1508 | |
---|
1509 | The elements of the array {\sf options} control the action of |
---|
1510 | {\sf sparse\_hess} according to \autoref{options_sparse_hess}. |
---|
1511 | \begin{table}[h] |
---|
1512 | \center |
---|
1513 | \begin{tabular}{|c|c|l|} \hline |
---|
1514 | component & value & \\ \hline |
---|
1515 | {\sf options[0]} & & test the computational graph control flow \\ |
---|
1516 | & 0 & safe mode (default) \\ |
---|
1517 | & 1 & tight mode \\ \hline |
---|
1518 | {\sf options[1]} & & way of recovery \\ |
---|
1519 | & 0 & indirect recovery (default) \\ |
---|
1520 | & 1 & direct recovery \\ \hline |
---|
1521 | \end{tabular} |
---|
1522 | \caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}} |
---|
1523 | \end{table} |
---|
1524 | |
---|
1525 | The described driver routines for the computation of sparse derivative |
---|
1526 | matrices are prototyped in the header file |
---|
1527 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1528 | global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}). |
---|
1529 | Example codes illustrating the usage of {\sf |
---|
1530 | sparse\_jac} and {\sf sparse\_hess} can be found in the file |
---|
1531 | \verb=sparse_jacobian.cpp= and \verb=sparse_hessian.cpp= contained in %the subdirectory |
---|
1532 | \verb=examples/additional_examples/sparse=. |
---|
1533 | % |
---|
1534 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1535 | \subsubsection*{Computation of Sparsity Pattern} |
---|
1536 | % |
---|
1537 | ADOL-C offers a convenient way of determining the |
---|
1538 | sparsity structure of a Jacobian matrix using the function: |
---|
1539 | % |
---|
1540 | \begin{tabbing} |
---|
1541 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1542 | \>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\ |
---|
1543 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1544 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1545 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1546 | \>{\sf double x[n];} \> // independent variables $x_0$\\ |
---|
1547 | \>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\ |
---|
1548 | \>{\sf int options[2];} \> // array of control parameters |
---|
1549 | \end{tabbing} |
---|
1550 | % |
---|
1551 | The sparsity pattern of the |
---|
1552 | Jacobian is computed in a compressed row format. For this purpose, |
---|
1553 | {\sf JP} has to be an $m$ dimensional array of pointers to {\sf |
---|
1554 | unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}. |
---|
1555 | During the call of {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero |
---|
1556 | entries in row $i$ of the Jacobian is determined for all $1\le i\le |
---|
1557 | m$. Then, a memory allocation is performed such that {\sf JP[i-1]} |
---|
1558 | points to a block of $\hat{n}_i+1$ {\sf unsigned int} for all $1\le |
---|
1559 | i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the |
---|
1560 | column indices of the $j$ nonzero entries in the $i$th row are stored |
---|
1561 | in the components {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][j]}. |
---|
1562 | |
---|
1563 | The elements of the array {\sf options} control the action of |
---|
1564 | {\sf jac\_pat} according to \autoref{options}. |
---|
1565 | \begin{table}[h] |
---|
1566 | \center |
---|
1567 | \begin{tabular}{|c|c|l|} \hline |
---|
1568 | component & value & \\ \hline |
---|
1569 | {\sf options[0]} & & way of sparsity pattern computation \\ |
---|
1570 | & 0 & propagation of index domains (default) \\ |
---|
1571 | & 1 & propagation of bit pattern \\ \hline |
---|
1572 | {\sf options[1]} & & test the computational graph control flow \\ |
---|
1573 | & 0 & safe mode (default) \\ |
---|
1574 | & 1 & tight mode \\ \hline |
---|
1575 | {\sf options[2]} & & way of bit pattern propagation \\ |
---|
1576 | & 0 & automatic detection (default) \\ |
---|
1577 | & 1 & forward mode \\ |
---|
1578 | & 2 & reverse mode \\ \hline |
---|
1579 | \end{tabular} |
---|
1580 | \caption{ {\sf jac\_pat} parameter {\sf options}\label{options}} |
---|
1581 | \end{table} |
---|
1582 | The value of {\sf options[0]} selects the way to compute the sparsity |
---|
1583 | pattern. The component {\sf options[1]} determines |
---|
1584 | the usage of the safe or tight mode of bit pattern propagation. |
---|
1585 | The first, more conservative option is the default. It accounts for all |
---|
1586 | dependences that might occur for any value of the |
---|
1587 | independent variables. For example, the intermediate |
---|
1588 | {\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is |
---|
1589 | always assumed to depend on all independent variables that {\sf a} or {\sf b} |
---|
1590 | dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the |
---|
1591 | logical {\sf OR} of those associated with {\sf a} and {\sf b}. |
---|
1592 | In contrast |
---|
1593 | the tight option gives this result only in the unlikely event of an exact |
---|
1594 | tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern |
---|
1595 | associated with {\sf c} either to that of {\sf a} or to that of {\sf b}, |
---|
1596 | depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally. |
---|
1597 | Obviously, the sparsity pattern obtained with the tight option may contain |
---|
1598 | more zeros than that obtained with the safe option. On the other hand, it |
---|
1599 | will only be valid at points belonging to an area where the function $F$ is locally |
---|
1600 | analytic and that contains the point at which the internal representation was |
---|
1601 | generated. Since generating the sparsity structure using the safe version does not |
---|
1602 | require any reevaluation, it may thus reduce the overall computational cost |
---|
1603 | despite the fact that it produces more nonzero entries. The value of |
---|
1604 | {\sf options[2]} selects the direction of bit pattern propagation. |
---|
1605 | Depending on the number of independent $n$ and of dependent variables $m$ |
---|
1606 | one would prefer the forward mode if $n$ is significant smaller than $m$ and |
---|
1607 | would otherwise use the reverse mode. |
---|
1608 | |
---|
1609 | The routine {\sf jac\_pat} may use the propagation of bitpattern to |
---|
1610 | determine the sparsity pattern. Therefore, a kind of ``strip-mining'' |
---|
1611 | is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce |
---|
1612 | the value of the constant {\sf PQ\_STRIPMINE\_MAX} |
---|
1613 | following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=. |
---|
1614 | |
---|
1615 | The driver routine is prototyped in the header file |
---|
1616 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1617 | global header file \verb=<adolc/adolc.h>= (see |
---|
1618 | \autoref{ssec:DesIH}). The determination of sparsity patterns is |
---|
1619 | illustrated by the examples \verb=sparse_jacobian.cpp= |
---|
1620 | and \verb=jacpatexam.cpp= |
---|
1621 | contained in |
---|
1622 | \verb=examples/additional_examples/sparse=. |
---|
1623 | |
---|
1624 | To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the |
---|
1625 | driver |
---|
1626 | \begin{tabbing} |
---|
1627 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1628 | \>{\sf int hess\_pat(tag, n, x, HP, options)}\\ |
---|
1629 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1630 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1631 | \>{\sf double x[n];} \> // independent variables $x_0$\\ |
---|
1632 | \>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\ |
---|
1633 | \>{\sf int option;} \> // control parameter |
---|
1634 | \end{tabbing} |
---|
1635 | where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf |
---|
1636 | unsigned int}s. |
---|
1637 | After the function call {\sf HP} contains the sparsity pattern, |
---|
1638 | where {\sf HP[j][0]} contains the number of nonzero elements in the |
---|
1639 | $j$th row for $1 \le j\le n$. |
---|
1640 | The components {\sf P[j][i]}, $0<${\sf i}~$\le$~{\sf P[j][0]} store the |
---|
1641 | indices of these entries. For determining the sparsity pattern, ADOL-C uses |
---|
1642 | the algorithm described in \cite{Wa05a}. The parameter{\sf option} determines |
---|
1643 | the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf |
---|
1644 | option = 1}) of the computation of the sparsity pattern as described |
---|
1645 | above. |
---|
1646 | |
---|
1647 | This driver routine is prototyped in the header file |
---|
1648 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1649 | global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}). |
---|
1650 | An example employing the procedure {\sf hess\_pat} can be found in the file |
---|
1651 | \verb=sparse_hessian.cpp= contained in |
---|
1652 | \verb=examples/additional_examples/sparse=. |
---|
1653 | % |
---|
1654 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1655 | \subsubsection*{Calculation of Seed Matrices} |
---|
1656 | % |
---|
1657 | To compute a compressed derivative matrix from a given sparsity |
---|
1658 | pattern, one has to calculate an appropriate seed matrix that can be |
---|
1659 | used as input for the derivative calculation. To facilitate the |
---|
1660 | generation of seed matrices for a sparsity pattern given in |
---|
1661 | row compressed form, ADOL-C provides the following two drivers, |
---|
1662 | which are based on the ColPack library: |
---|
1663 | \begin{tabbing} |
---|
1664 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1665 | \>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\ |
---|
1666 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
1667 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1668 | \>{\sf unsigned int JP[][];} \> // row compressed sparsity structure |
---|
1669 | of Jacobian\\ |
---|
1670 | \>{\sf double S[n][p];} \> // seed matrix\\ |
---|
1671 | \>{\sf int p;} \> // number of columns in $S$ |
---|
1672 | \end{tabbing} |
---|
1673 | The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the |
---|
1674 | number of independent variables {\sf n} and the sparsity pattern {\sf |
---|
1675 | JP} of the Jacobian computed for example by {\sf jac\_pat}. First, |
---|
1676 | {\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity |
---|
1677 | pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring |
---|
1678 | determines the number of columns {\sf p} in the seed |
---|
1679 | matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf |
---|
1680 | S} and initializes {\sf S} according to the graph coloring. |
---|
1681 | The coloring algorithm that is applied in {\sf |
---|
1682 | generate\_seed\_jac} is used also by the driver {\sf sparse\_jac} |
---|
1683 | described earlier. |
---|
1684 | |
---|
1685 | \begin{tabbing} |
---|
1686 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill % define tab position |
---|
1687 | \>{\sf int generate\_seed\_hess(n, HP, S, p)}\\ |
---|
1688 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1689 | \>{\sf unsigned int HP[][];} \> // row compressed sparsity structure |
---|
1690 | of Jacobian\\ |
---|
1691 | \>{\sf double S[n][p];} \> // seed matrix\\ |
---|
1692 | \>{\sf int p;} \> // number of columns in $S$ |
---|
1693 | \end{tabbing} |
---|
1694 | The input variables to {\sf generate\_seed\_hess} are the number of independents $n$ |
---|
1695 | and the sparsity pattern {\sf HP} of the Hessian computed for example |
---|
1696 | by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an |
---|
1697 | appropriate coloring of the adjacency graph defined by the sparsity |
---|
1698 | pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its |
---|
1699 | compressed representation and a star coloring in the case of a direct recovery. |
---|
1700 | Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf |
---|
1701 | S} and initializes {\sf S} according to the graph coloring. |
---|
1702 | The coloring algorithm applied in {\sf |
---|
1703 | generate\_seed\_hess} is used also by the driver {\sf sparse\_hess} |
---|
1704 | described earlier. |
---|
1705 | |
---|
1706 | The specific set of criteria used to define a seed matrix $S$ depends |
---|
1707 | on whether the sparse derivative matrix |
---|
1708 | to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). |
---|
1709 | It also depends on whether the entries of the derivative matrix are to be |
---|
1710 | recovered from the compressed representation \emph{directly} |
---|
1711 | (without requiring any further arithmetic) or \emph{indirectly} (for |
---|
1712 | example, by solving for unknowns via successive substitutions). |
---|
1713 | Appropriate recovery routines are provided by ColPack and used |
---|
1714 | in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in |
---|
1715 | the previous subsection. Examples with a detailed analysis of the |
---|
1716 | employed drivers for the exploitation of sparsity can be found in the |
---|
1717 | papers \cite{GePoTaWa06} and \cite{GePoWa08}. |
---|
1718 | |
---|
1719 | |
---|
1720 | These driver routines are prototyped in |
---|
1721 | \verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the |
---|
1722 | global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}). |
---|
1723 | An example code illustrating the usage of {\sf |
---|
1724 | generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file |
---|
1725 | \verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=. |
---|
1726 | % |
---|
1727 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1728 | \subsection{Higher Derivative Tensors} |
---|
1729 | \label{higherOrderDeriv} |
---|
1730 | % |
---|
1731 | Many applications in scientific computing need second- and higher-order |
---|
1732 | derivatives. Often, one does not require full derivative tensors but |
---|
1733 | only the derivatives in certain directions $s_i \in \R^{n}$. |
---|
1734 | Suppose a collection of $p$ directions |
---|
1735 | $s_i \in \R^{n}$ is given, which form a matrix |
---|
1736 | \[ |
---|
1737 | S\; =\; \left [ s_1, s_2,\ldots, s_p \right ]\; \in \; |
---|
1738 | \R^{n \times p}. |
---|
1739 | \] |
---|
1740 | One possible choice is $S = I_n$ with $p = n$, which leads to |
---|
1741 | full tensors being evaluated. |
---|
1742 | ADOL-C provides the function {\sf tensor\_eval} |
---|
1743 | to calculate the derivative tensors |
---|
1744 | \begin{eqnarray} |
---|
1745 | \label{eq:tensor} |
---|
1746 | \left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k} |
---|
1747 | F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} |
---|
1748 | \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d |
---|
1749 | \end{eqnarray} |
---|
1750 | simultaneously. The function {\sf tensor\_eval} has the following calling sequence and |
---|
1751 | parameters: |
---|
1752 | % |
---|
1753 | \begin{tabbing} |
---|
1754 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1755 | \>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\ |
---|
1756 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1757 | \>{\sf int m;} \> // number of dependent variables $m$ \\ |
---|
1758 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
1759 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1760 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
1761 | \>{\sf double x[n];} \> // values of independent variables $x_0$\\ |
---|
1762 | \>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\ |
---|
1763 | \>{\sf double S[n][p];} \> // seed matrix $S$ |
---|
1764 | \end{tabbing} |
---|
1765 | % |
---|
1766 | Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory |
---|
1767 | requirement can be reduced enormously. The collection of tensors up to order $d$ comprises |
---|
1768 | $\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be |
---|
1769 | greater or equal to $\binom{p+d}{d}$. |
---|
1770 | To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor |
---|
1771 | series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This |
---|
1772 | approach is described in \cite{Griewank97}. |
---|
1773 | |
---|
1774 | The access of individual entries in symmetric tensors of |
---|
1775 | higher order is a little tricky. We always store the derivative |
---|
1776 | values in the two dimensional array {\sf tensor} and provide two |
---|
1777 | different ways of accessing them. |
---|
1778 | The leading dimension of the tensor array ranges over |
---|
1779 | the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i = |
---|
1780 | 0,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical |
---|
1781 | structure for all $i$. Each of them represents the symmetric tensors up to |
---|
1782 | order $d$ of the scalar function $F_{i+1}$ in $p$ variables. |
---|
1783 | % |
---|
1784 | The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$ |
---|
1785 | tensors are linearly ordered according to the tetrahedral |
---|
1786 | scheme described by Knuth \cite{Knuth73}. In the familiar quadratic |
---|
1787 | case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ |
---|
1788 | as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with |
---|
1789 | $l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the |
---|
1790 | function value $F_{i+1}$ itself and the gradient |
---|
1791 | $\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$ |
---|
1792 | with $j=0$ for $k=1,\ldots,p$. |
---|
1793 | |
---|
1794 | For general $d$ we combine the variable |
---|
1795 | indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$, |
---|
1796 | where $j_k$ indicates differentiation with respect to variable |
---|
1797 | $x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates |
---|
1798 | no differentiation so that all lower derivatives are also |
---|
1799 | contained in the same data structure as described above for |
---|
1800 | the quadratic case. The location of the partial derivative specified |
---|
1801 | by $j$ is computed by the function |
---|
1802 | % |
---|
1803 | \begin{tabbing} |
---|
1804 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1805 | \>{\sf int tensor\_address(d,$\,$j)} \\ |
---|
1806 | \>{\sf int d;} \> // highest derivative degree $d$ \\ |
---|
1807 | \>{\sf int j[d];} \> // multi-index $j$ |
---|
1808 | \end{tabbing} |
---|
1809 | % |
---|
1810 | and it may thus be referenced as {\sf tensor[i][tensor\_address(d,$\,$j)]}. |
---|
1811 | Notice that the address computation does depend on the degree $d$ |
---|
1812 | but not on the number of directions $p$, which could theoretically be |
---|
1813 | enlarged without the need to reallocate the original tensor. |
---|
1814 | Also, the components of $j$ need to be non-increasing. |
---|
1815 | % |
---|
1816 | To some C programmers it may appear more natural to access tensor |
---|
1817 | entries by successive dereferencing in the form |
---|
1818 | {\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}. |
---|
1819 | We have also provided this mode, albeit with the restriction |
---|
1820 | that the indices $j_1,j_2,\ldots,j_d$ are non-increasing. |
---|
1821 | In the second order case this means that the Hessian entries must be |
---|
1822 | specified in or below the diagonal. If this restriction is |
---|
1823 | violated the values are almost certain to be wrong and array bounds |
---|
1824 | may be violated. We emphasize that subscripting is not overloaded |
---|
1825 | but that {\sf tensorentry} is a conventional and |
---|
1826 | thus moderately efficient C pointer structure. |
---|
1827 | Such a pointer structure can be allocated and set up completely by the |
---|
1828 | function |
---|
1829 | % |
---|
1830 | \begin{tabbing} |
---|
1831 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1832 | \>{\sf void** tensorsetup(m,p,d,tensor)} \\ |
---|
1833 | \>{\sf int m;} \> // number of dependent variables $n$ \\ |
---|
1834 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
1835 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1836 | \>{\sf double tensor[m][size];}\> // pointer to two dimensional array |
---|
1837 | \end{tabbing} |
---|
1838 | % |
---|
1839 | Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size} |
---|
1840 | $\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup}, |
---|
1841 | $d-1$ layers of pointers are set up so that the return value |
---|
1842 | allows the direct dereferencing of individual tensor elements. |
---|
1843 | |
---|
1844 | For example, suppose some active section involving $m \geq 5$ dependents and |
---|
1845 | $n \geq 2$ independents has been executed and taped. We may |
---|
1846 | select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two |
---|
1847 | columns $s_1$ and $s_2$. Then we are able to execute the code segment |
---|
1848 | \begin{tabbing} |
---|
1849 | \hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\ |
---|
1850 | \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);} |
---|
1851 | \end{tabbing} |
---|
1852 | This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3 |
---|
1853 | in both directions $s_1$ and |
---|
1854 | $s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer |
---|
1855 | structure {\sf tensorentry} has been created. Now, |
---|
1856 | the value of the mixed partial |
---|
1857 | \[ |
---|
1858 | \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2} \right |_{z_1=0=z_2} |
---|
1859 | \] |
---|
1860 | can be recovered as |
---|
1861 | \begin{center} |
---|
1862 | {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][tensor\_address(d,$\,$j)]}, |
---|
1863 | \end{center} |
---|
1864 | where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). |
---|
1865 | Analogously, the entry |
---|
1866 | \begin{center} |
---|
1867 | {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][tensor\_address(d,$\,$j)]} |
---|
1868 | \end{center} |
---|
1869 | with {\sf j} = (1,0,0) contains the first derivative of the third dependent |
---|
1870 | variable $F_3$ with respect to the first differentiation parameter $z_1$. |
---|
1871 | |
---|
1872 | Note, that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the |
---|
1873 | array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly effects the values accessed |
---|
1874 | by {\sf tensorentry}. |
---|
1875 | % |
---|
1876 | When no more derivative evaluations are desired the pointer structure |
---|
1877 | {\sf tensorentry} can be deallocated by a call to the function |
---|
1878 | % |
---|
1879 | \begin{tabbing} |
---|
1880 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1881 | \>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\ |
---|
1882 | \>{\sf int m;} \> // number of dependent variables $m$ \\ |
---|
1883 | \>{\sf int p;} \> // number of independent variables $p$\\ |
---|
1884 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1885 | \>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} |
---|
1886 | \end{tabbing} |
---|
1887 | % |
---|
1888 | that does not deallocate the array {\sf tensor}. |
---|
1889 | |
---|
1890 | The drivers provided for efficient calculation of higher order |
---|
1891 | derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=, |
---|
1892 | which is included by the global header file \verb=<adolc/adolc.h>= automatically |
---|
1893 | (see \autoref{ssec:DesIH}). |
---|
1894 | Example codes using the above procedures can be found in the files |
---|
1895 | \verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory |
---|
1896 | \verb=examples/additional_examples/taylor=. |
---|
1897 | % |
---|
1898 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
1899 | \subsection{Derivatives of Implicit and Inverse Functions} |
---|
1900 | \label{implicitInverse} |
---|
1901 | % |
---|
1902 | Frequently, one needs derivatives of variables |
---|
1903 | $y \in \R^{m}$ that are implicitly defined as |
---|
1904 | functions of some variables $x \in \R^{n-m}$ |
---|
1905 | by an algebraic system of equations |
---|
1906 | \[ |
---|
1907 | G(z) \; = \; 0 \in \R^m \quad |
---|
1908 | {\rm with} \quad z = (y, x) \in \R^n . |
---|
1909 | \] |
---|
1910 | Naturally, the $n$ arguments of $G$ need not be partitioned in |
---|
1911 | this regular fashion and we wish to provide flexibility for a |
---|
1912 | convenient selection of the $n-m$ {\em truly} independent |
---|
1913 | variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix |
---|
1914 | that picks out these variables so that it is a column |
---|
1915 | permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$. |
---|
1916 | Then the nonlinear system |
---|
1917 | \[ |
---|
1918 | G(z) \; = \; 0, \quad P z = x, |
---|
1919 | \] |
---|
1920 | has a regular Jacobian, wherever the implicit function theorem |
---|
1921 | yields $y$ as a function of $x$. Hence, we may also write |
---|
1922 | \begin{equation} |
---|
1923 | \label{eq:inv_tensor} |
---|
1924 | F(z) = \left(\begin{array}{c} |
---|
1925 | G(z) \\ |
---|
1926 | P z |
---|
1927 | \end{array} \right)\; \equiv \; |
---|
1928 | \left(\begin{array}{c} |
---|
1929 | 0 \\ |
---|
1930 | P z |
---|
1931 | \end{array} \right)\; \equiv \; S\, x, |
---|
1932 | \end{equation} |
---|
1933 | where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten |
---|
1934 | the original implicit functional relation between $x$ and $y$ as an inverse |
---|
1935 | relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply |
---|
1936 | by marking $n-m$ of the independents also dependent. |
---|
1937 | |
---|
1938 | Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary |
---|
1939 | seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$ |
---|
1940 | with respect to $x \in \R^p$ by calling the following routine: |
---|
1941 | % |
---|
1942 | \begin{tabbing} |
---|
1943 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
1944 | \>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\ |
---|
1945 | \>{\sf short int tag;} \> // tape identification \\ |
---|
1946 | \>{\sf int n;} \> // number of variables $n$\\ |
---|
1947 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
1948 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
1949 | \>{\sf double z[n];} \> // values of independent variables $z$\\ |
---|
1950 | \>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\ |
---|
1951 | \>{\sf double S[n][p];} \> // seed matrix $S$ |
---|
1952 | \end{tabbing} |
---|
1953 | % |
---|
1954 | The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with |
---|
1955 | {\sf tag} pointing to a tape for the evaluation of the inverse function |
---|
1956 | $z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong |
---|
1957 | to the domain of that function. Individual derivative components can be |
---|
1958 | accessed in tensor exactly as in the explicit case described above. |
---|
1959 | |
---|
1960 | It must be understood that {\sf inverse\_tensor\_eval} actually computes the |
---|
1961 | derivatives of $z$ with respect to $x$ that is defined by the equation |
---|
1962 | $F(z)=F(z_0)+S \, x$. In other words the base point at |
---|
1963 | which the inverse function is differentiated is given by $F(z_0)$. |
---|
1964 | The routine has no capability for inverting $F$ itself as |
---|
1965 | solving systems of nonlinear |
---|
1966 | equations $F(z)=0$ in the first place is not just a differentiation task. |
---|
1967 | However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very |
---|
1968 | useful for that purpose. |
---|
1969 | |
---|
1970 | As an example consider the following two nonlinear expressions |
---|
1971 | \begin{eqnarray*} |
---|
1972 | G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^2 \\ |
---|
1973 | G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace . |
---|
1974 | \end{eqnarray*} |
---|
1975 | The equations $G(z)=0$ describe the relation between the Cartesian |
---|
1976 | coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane. |
---|
1977 | Now, suppose we are interested in the derivatives of the second Cartesian |
---|
1978 | $y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect |
---|
1979 | to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section |
---|
1980 | could look simply like |
---|
1981 | % |
---|
1982 | \begin{tabbing} |
---|
1983 | \hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath zp[j];}\\ |
---|
1984 | \>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\ |
---|
1985 | \>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\ |
---|
1986 | \>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\ |
---|
1987 | \>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];} |
---|
1988 | \end{tabbing} |
---|
1989 | % |
---|
1990 | where {\sf zd[1]} and {\sf zd[2]} are dummy arguments. |
---|
1991 | In the last line the two independent variables {\sf z[1]} and |
---|
1992 | {\sf z[3]} are made |
---|
1993 | simultaneously dependent thus generating a square system that can be |
---|
1994 | inverted (at most arguments). The corresponding projection and seed |
---|
1995 | matrix are |
---|
1996 | \begin{eqnarray*} |
---|
1997 | P \;=\; \left( \begin{array}{cccc} |
---|
1998 | 1 & 0 & 0 & 0 \\ |
---|
1999 | 0 & 0 & 1 & 0 |
---|
2000 | \end{array}\right) \quad \mbox{and} \quad |
---|
2001 | S^T \; = \; \left( \begin{array}{cccc} |
---|
2002 | 0 & 0 & 1 & 0 \\ |
---|
2003 | 0 & 0 & 0 & 1 |
---|
2004 | \end{array}\right) \enspace . |
---|
2005 | \end{eqnarray*} |
---|
2006 | Provided the vector {\sf zp} is consistent in that its Cartesian and polar |
---|
2007 | components describe the same point in the plane the resulting tuple |
---|
2008 | {\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with |
---|
2009 | $n=4$, $p=2$ and $d$ |
---|
2010 | as desired will yield the implicit derivatives, provided |
---|
2011 | {\sf tensor} has been allocated appropriately of course and $S$ has the value |
---|
2012 | given above. |
---|
2013 | % |
---|
2014 | The example is untypical in that the implicit function could also be |
---|
2015 | obtained explicitly by symbolic mani\-pu\-lations. It is typical in that |
---|
2016 | the subset of $z$ components that are to be considered as truly |
---|
2017 | independent can be selected and altered with next to no effort at all. |
---|
2018 | |
---|
2019 | The presented drivers are prototyped in the header file |
---|
2020 | \verb=<adolc/drivers/taylor.h>=. As indicated before this header |
---|
2021 | is included by the global header file \verb=<adolc/adolc.h>= automatically |
---|
2022 | (see \autoref{ssec:DesIH}). |
---|
2023 | The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and |
---|
2024 | \verb=trigger.cpp= in the directory \verb=examples/additional_examples/taylor= |
---|
2025 | show the application of the procedures described here. |
---|
2026 | % |
---|
2027 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2028 | \subsection{Drivers of the Toolbox for Lie Derivatives} |
---|
2029 | Nonlinear controller and observer design often require certain types of Lie derivatives. These derivatives also arise in other areas such as classical mechanics and relativity. |
---|
2030 | Lie derivatives are total derivatives of tensor fields along a vector field. |
---|
2031 | The drivers for calculating Lie derivatives in ADOL-C presented in this section |
---|
2032 | were developed by Klaus R{\"o}benack~\cite{Roeb05} and his co-workers Jan Winkler, Siqian Wang and Mirko Franke~\cite{Roeb11}. |
---|
2033 | They are prototyped as C functions in the header file {\verb=<adolc_lie.h>=} and |
---|
2034 | allow the calculation of Lie derivatives of scalar, vector and covector fields. In addition, the calculation of gradients of Lie derivatives of scalar fields is supported. |
---|
2035 | |
---|
2036 | We consider computational problems occurring in controller and observer design for |
---|
2037 | nonlinear control systems |
---|
2038 | \begin{equation*} |
---|
2039 | \dot{\mathbf x}=\mathbf f(\mathbf x)+\mathbf g(\mathbf x)u, |
---|
2040 | \quad \mathbf y=\mathbf h(\mathbf x),\quad \mathbf x(0)=\mathbf x_0 \in \Omega\\ |
---|
2041 | \end{equation*} |
---|
2042 | with the scalar input u, the state $\mathbf x$, and the output $\mathbf y$, where the vector |
---|
2043 | fields $\mathbf f : \Omega \to \R^n$, $\mathbf g: \Omega \to \R^n$ |
---|
2044 | and the map $\mathbf h: \Omega\to \R^m$ (for $m=1$ this map is the scalar field $h$) |
---|
2045 | are defined on an open subset $\Omega\subseteq\R^n$. |
---|
2046 | |
---|
2047 | \sloppy |
---|
2048 | \subsubsection*{Lie Derivatives of Scalar Fields} |
---|
2049 | Consider the initial value problem |
---|
2050 | \[ |
---|
2051 | \dot{\mathbf x}={\mathbf f}(\mathbf x),\quad y=h(\mathbf x),\quad \mathbf{x}(0)=\mathbf{x}_0\in\Omega |
---|
2052 | \] |
---|
2053 | with the vector field $\mathbf f : \Omega \to \R^n$. The Lie derivative of the scalar field $h:\Omega\to\R$ along the vector field~$\mathbf{f}$ are time derivatives of the curve~$y$, i.e., |
---|
2054 | \[ |
---|
2055 | \dot{y}(0)=L_\mathbf{f}h(\mathbf{x}_0),\; |
---|
2056 | \ddot{y}(0)=L_\mathbf{f}^2h(\mathbf{x}_0),\; \ldots,\; |
---|
2057 | y^{(d)}(0)=L_\mathbf{f}^dh(\mathbf{x}_0). |
---|
2058 | \] |
---|
2059 | Formally, the Lie derivative of the scalar field~$h$ along the vector field~$\mathbf{f}$ is defined by |
---|
2060 | \[ |
---|
2061 | L_{\mathbf f} h(\mathbf x)=\frac{\partial h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x). |
---|
2062 | \] |
---|
2063 | Higher order Lie derivatives are given by the recursion |
---|
2064 | \[ |
---|
2065 | L_{\mathbf f}^{k+1} h(\mathbf x)= \frac{\partial L_{\mathbf f}^k h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x)\quad\text{with}\quad L_{\mathbf f}^0 h(\mathbf x)= h(\mathbf x). |
---|
2066 | \] |
---|
2067 | To compute Lie derivatives |
---|
2068 | \[ |
---|
2069 | (L_\mathbf f^0h(\mathbf x_0),\ldots,L_\mathbf f^dh(\mathbf x_0)) |
---|
2070 | \] |
---|
2071 | we need the trace numbers of the active sections of the vector field~$\mathbf f$ and the scalar field~$h$, |
---|
2072 | the number of independent variables $n$, the initial value $\mathbf x_0\in\Omega$ |
---|
2073 | and the highest derivative degree $d$. The values of the Lie derivatives are |
---|
2074 | then stored in the one dimensional array {\verb=result=} of length $d+1$: |
---|
2075 | |
---|
2076 | \begin{tabbing} |
---|
2077 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2078 | \>{\sf int lie\_scalarc(Tape\_F, Tape\_H, n, x0, d, result)}\\ |
---|
2079 | \>{\sf short Tape\_F;} \> // trace identification of vector field $\mathbf f$ \\ |
---|
2080 | \>{\sf short Tape\_H;} \> // trace identification of scalar field $h$\\ |
---|
2081 | \>{\sf short n;} \> // number of independent variables $n$ and $m = 1$\\ |
---|
2082 | \>{\sf double x0[n];} \> // values of independent variables $\mathbf{x}_0$\\ |
---|
2083 | \>{\sf short d; } \> // highest derivative degree $d$\\ |
---|
2084 | \>{\sf double result[d+1];} \> // resulting Lie derivatives of a scalar field\\ |
---|
2085 | \end{tabbing} |
---|
2086 | |
---|
2087 | |
---|
2088 | For a smooth vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ with |
---|
2089 | $\mathbf h(\mathbf x)=(h_1(\mathbf x),\ldots,h_m(\mathbf x))^T$ |
---|
2090 | we define the Lie derivative |
---|
2091 | \begin{equation}\label{eq:lie_vectorial} |
---|
2092 | L_\mathbf f^k\mathbf h(\mathbf x_0)= |
---|
2093 | (L_\mathbf f^kh_1(\mathbf x_0),\ldots,L_\mathbf f^kh_m(\mathbf x_0))^T\in\R^m |
---|
2094 | \quad \text{for}\quad k = 0, \cdots, d, |
---|
2095 | \end{equation} |
---|
2096 | component-wise by the Lie derivatives $L_\mathbf f^k h_1(\mathbf x_0),\ldots,L_\mathbf f^k h_m(\mathbf x_0)$ of the $m$ scalar fields $h_1, \ldots, h_m:\Omega\to\R$. Note that~\eqref{eq:lie_vectorial} should not be confused with the Lie derivative of a vector field. To compute Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h$ |
---|
2097 | we additionally need the number of dependent variables $m$: |
---|
2098 | |
---|
2099 | \begin{tabbing} |
---|
2100 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2101 | \>{\sf int lie\_scalarcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\ |
---|
2102 | \>{\sf short Tape\_F;} \> // trace identification of vector field $\mathbf f$ \\ |
---|
2103 | \>{\sf short Tape\_H;} \> // trace identification of vector-valued map $\mathbf h$\\ |
---|
2104 | \>{\sf short n;} \> // number of independent variables $n$\\ |
---|
2105 | \>{\sf short m;} \> // number of dependent variables $m$\\ |
---|
2106 | \>{\sf double x0[n];} \> // values of independent variables $\mathbf{x}_0$\\ |
---|
2107 | \>{\sf short d; } \> // highest derivative degree $d$\\ |
---|
2108 | \>{\sf double result[m][d+1];} \> // resulting Lie derivatives of vectorial scalar fields\\ |
---|
2109 | \end{tabbing} |
---|
2110 | |
---|
2111 | \subsubsection*{Gradients of Lie Derivatives of Scalar Fields} |
---|
2112 | To compute the gradients |
---|
2113 | \[ |
---|
2114 | (\mathrm{d}L_\mathbf f^0h(\mathbf x_0),\ldots, \mathrm{d}L_\mathbf f^dh(\mathbf x_0)) |
---|
2115 | \] |
---|
2116 | of the Lie derivatives $L_\mathbf f^0h(\mathbf x_0),\ldots, L_\mathbf f^dh(\mathbf x_0)$ of the scalar field $h:\Omega\to\R$ along the vector field $\mathbf f:\Omega\to\R^n$, |
---|
2117 | the following C function can be used: |
---|
2118 | |
---|
2119 | \begin{tabbing} |
---|
2120 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2121 | \>{\sf int lie\_gradientc(Tape\_F, Tape\_H, n, x0, d, result)}\\ |
---|
2122 | \>{\sf short Tape\_F;} \> // trace identification of vector field $\mathbf f$ \\ |
---|
2123 | \>{\sf short Tape\_H;} \> // trace identification of scalar field $h$\\ |
---|
2124 | \>{\sf short n;} \> // number of independent variables $n$ and $m=1$\\ |
---|
2125 | \>{\sf double x0[n];} \> // values of independent variables $\mathbf{x}_0$\\ |
---|
2126 | \>{\sf short d;} \> // highest derivative degree $d$\\ |
---|
2127 | \>{\sf double result[n][d+1];} \> // resulting gradients of Lie derivatives\\ |
---|
2128 | \>{\sf} \> // of a scalar field\\ |
---|
2129 | \end{tabbing} |
---|
2130 | |
---|
2131 | For calculating the jacobians $\mathrm{d}L_\mathbf f^k\mathbf h(\mathbf x_0)\in\R^{m\times n}$ |
---|
2132 | of Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ for $k=0,\ldots,d$ one can use the following function, where the $d+1$ matrices are stored in the three dimensional array {\verb=result=}: |
---|
2133 | |
---|
2134 | \begin{tabbing} |
---|
2135 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2136 | \>{\sf int lie\_gradientcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\ |
---|
2137 | \>{\sf short Tape\_F;} \> // trace identification of vector field $\mathbf f$ \\ |
---|
2138 | \>{\sf short Tape\_H;} \> // trace identification of vector-valued map $\mathbf h$\\ |
---|
2139 | \>{\sf short n;} \> // number of independent variables $n$\\ |
---|
2140 | \>{\sf short m;} \> // number of dependent variables $m$\\ |
---|
2141 | \>{\sf double x0[n];} \> // values of independent variables $\mathbf{x}_0$\\ |
---|
2142 | \>{\sf short d;} \> // highest derivative degree $d$\\ |
---|
2143 | \>{\sf double result[m][n][d+1];} \> // resulting jacobians of Lie derivatives\\ |
---|
2144 | \>{\sf} \> // of vectorial scalar fields\\ |
---|
2145 | \end{tabbing} |
---|
2146 | |
---|
2147 | \subsubsection*{Lie Derivatives of Covector Fields} |
---|
2148 | |
---|
2149 | If we consider the elements of the real vector space $\R^n$ as column vectors, the elements of the associated dual space $(\R^n)^*$ can be represented by row vectors. These row vectors are called adjoint vectors in ADOL-C and covectors in differential geometry. |
---|
2150 | In a program the user does not have to distinguish between a vector field |
---|
2151 | $\mathbf f: \Omega \rightarrow \R^n$ and a covector field |
---|
2152 | $\boldsymbol{\omega}: \Omega \rightarrow (\R^n)^*$, since for both fields the dependent variables are stored in one dimensional arrays. |
---|
2153 | The Lie derivative of a covector field~$\boldsymbol{\omega}$ is defined by |
---|
2154 | \[ |
---|
2155 | L_{\mathbf f}\boldsymbol{\omega}(\mathbf x)=\boldsymbol{\omega}(\mathbf x)\,{\mathbf f}^\prime(\mathbf x) |
---|
2156 | + {\mathbf f}^T(\mathbf x) \left(\frac{\partial \boldsymbol{\omega}^T (\mathbf x)}{\partial \mathbf x}\right)^T. |
---|
2157 | \] |
---|
2158 | The $d+1$ Lie derivatives $L_{\mathbf f}^0\boldsymbol{\omega}(\mathbf x),\ldots,L_{\mathbf f}^d\boldsymbol{\omega}(\mathbf x)$ of the covector field $\boldsymbol{\omega}$ along the vector field~$\mathbf{f}$ |
---|
2159 | can be computed by |
---|
2160 | |
---|
2161 | \begin{tabbing} |
---|
2162 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2163 | \>{\sf int lie\_covector(Tape\_F, Tape\_W, n, x0, d, result)}\\ |
---|
2164 | \>{\sf short Tape\_F;} \> // trace identification of vector field $\mathbf f$ \\ |
---|
2165 | \>{\sf short Tape\_W;} \> // trace identification of covector field $\boldsymbol{\omega}$\\ |
---|
2166 | \>{\sf short n;} \> // number of independent variables $n$\\ |
---|
2167 | \>{\sf double x0[n];} \> // values of independent variables $\mathbf{x}_0$\\ |
---|
2168 | \>{\sf short d;} \> // highest derivative degree $d$\\ |
---|
2169 | \>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a covector field\\ |
---|
2170 | \end{tabbing} |
---|
2171 | |
---|
2172 | \subsubsection*{Lie Derivatives of a Vector Field (Lie Brackets)} |
---|
2173 | |
---|
2174 | Lie derivatives (Lie brackets) of a vector field $\mathbf g: \Omega \rightarrow \R^n$ along a vector field $\mathbf f: \Omega \rightarrow \R^n$ are defined by |
---|
2175 | \[ |
---|
2176 | \mathrm{ad}_\mathbf{f}\mathbf{g}(\mathbf{x})=[\mathbf{f},\mathbf{g}](\mathbf{x}) |
---|
2177 | =\mathbf{g}^{\prime}(\mathbf{x})\mathbf{f}(\mathbf{x})-\mathbf{f}^{\prime}(\mathbf{x})\mathbf{g}(\mathbf{x}). |
---|
2178 | \] |
---|
2179 | We can calculate iterated Lie brackets $\mathrm{ad}_\mathbf{f}^0\mathbf{g}(\mathbf{x}),\ldots,\mathrm{ad}_\mathbf{f}^{d}\mathbf{g}(\mathbf{x})$ at the point $\mathbf{x}_0\in\Omega$ |
---|
2180 | using the C function listed below: |
---|
2181 | |
---|
2182 | \begin{tabbing} |
---|
2183 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2184 | \>{\sf int lie\_bracket(Tape\_F, Tape\_G, n, x0, d, result)}\\ |
---|
2185 | \>{\sf short Tape\_F;} \> // trace identification of vector field $\mathbf f$ \\ |
---|
2186 | \>{\sf short Tape\_G;} \> // trace identification of vector field $\mathbf g$\\ |
---|
2187 | \>{\sf short n;} \> // number of independent variables $n$\\ |
---|
2188 | \>{\sf double x0[n];} \> // values of independent variables $\mathbf{x}_0$\\ |
---|
2189 | \>{\sf short d;} \> // highest derivative degree $d$\\ |
---|
2190 | \>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a vector field\\ |
---|
2191 | \end{tabbing} |
---|
2192 | |
---|
2193 | \subsubsection*{Additional C++ Drivers for Lie Derivatives} |
---|
2194 | |
---|
2195 | For Lie derivatives of scalar fields, the C interface offers two different drivers, namely {\verb=lie_scalarc=} for $m=1$ scalar field $h:\Omega\to\R$ and {\verb=lie_scalarcv=} for $m\geq1$ scalar fields $h_1,\ldots,h_m:\Omega\to\R$, which are combined in a vector-valued map $\mathbf{h}:\Omega\to\R^m$. Using the polymorphism in C++, both C drivers are unified by the C++ driver {\verb=lie_scalar=} for the computation of Lie derivatives of scalar fields: |
---|
2196 | |
---|
2197 | \begin{tabbing} |
---|
2198 | \hspace{0.5in}\={\sf int lie\_scalar(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.45in}\= // case $m=1$\\ |
---|
2199 | \>{\sf int lie\_scalar(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$ |
---|
2200 | \end{tabbing} |
---|
2201 | |
---|
2202 | The same situation occurs with the gradients or jacobians of Lie derivatives of scalar fields. Here, the C drivers {\verb=lie_gradientc=} and {\verb=lie_gradientcv=} are unified by the C++ driver {\verb=lie_gradient=}: |
---|
2203 | |
---|
2204 | \begin{tabbing} |
---|
2205 | \hspace{0.5in}\={\sf int lie\_gradient(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.35in}\= // case $m=1$\\ |
---|
2206 | \>{\sf int lie\_gradient(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$ |
---|
2207 | \end{tabbing} |
---|
2208 | |
---|
2209 | The C++ drivers are also prototyped in the header file {\sf $<$adolc\_lie.h$>$}. An example how to use these drivers for the calculation of Lie derivatives can be found in the file {\sf GantryCrane.cpp} in the directory {\verb=additional_examples/lie=}. |
---|
2210 | |
---|
2211 | % |
---|
2212 | % |
---|
2213 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2214 | \section{Basic Drivers for the Forward and Reverse Mode} |
---|
2215 | \label{forw_rev_ad} |
---|
2216 | % |
---|
2217 | In this section, we present tailored drivers for different |
---|
2218 | variants of the forward mode and the reverse mode, respectively. |
---|
2219 | For a better understanding, we start with a short |
---|
2220 | description of the mathematical background. |
---|
2221 | |
---|
2222 | Provided no arithmetic exception occurs, |
---|
2223 | no comparison including {\sf fmax} or {\sf fmin} yields a tie, |
---|
2224 | {\sf fabs} does not yield zero, |
---|
2225 | and all special functions were evaluated in the |
---|
2226 | interior of their domains, the functional relation between the input |
---|
2227 | variables $x$ |
---|
2228 | and the output variables $y$ denoted by $y=F(x)$ is in |
---|
2229 | fact analytic. In other words, we can compute arbitrarily high |
---|
2230 | derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined |
---|
2231 | by the active section. |
---|
2232 | We find it most convenient to describe and |
---|
2233 | compute derivatives in terms of univariate Taylor expansions, which |
---|
2234 | are truncated after the highest derivative degree $d$ that is desired |
---|
2235 | by the user. Let |
---|
2236 | \begin{equation} |
---|
2237 | \label{eq:x_of_t} |
---|
2238 | x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \; I\!\!R \; \mapsto \; |
---|
2239 | I\!\!R^n |
---|
2240 | \end{equation} |
---|
2241 | denote any vector polynomial in the scalar variable $t \in I\!\!R$. |
---|
2242 | In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$. |
---|
2243 | The Taylor coefficient vectors |
---|
2244 | \[ x_j \; = \; |
---|
2245 | \frac{1}{j!} \left . \frac{\partial ^j}{\partial t^j} x(t) |
---|
2246 | \right |_{t=0} |
---|
2247 | \] |
---|
2248 | are simply the scaled derivatives of $x(t)$ at the parameter |
---|
2249 | origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be |
---|
2250 | visualized as tangent and curvature at the base point $x_0$, |
---|
2251 | respectively. |
---|
2252 | Provided that $F$ is $d$ times continuously differentiable, it |
---|
2253 | follows from the chain rule that the image path |
---|
2254 | \begin{equation} |
---|
2255 | \label{eq:rela} |
---|
2256 | y(t) \; \equiv \; F(x(t)) \; : \; I\!\!R \;\mapsto \;I\!\!R^m |
---|
2257 | \end{equation} |
---|
2258 | is also smooth and has $(d+1)$ Taylor coefficient vectors |
---|
2259 | $y_j \in I\!\!R^m$ at $t=0$, so that |
---|
2260 | \begin{equation} |
---|
2261 | \label{eq:series} |
---|
2262 | y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}). |
---|
2263 | \end{equation} |
---|
2264 | Also as a consequence of the chain rule, one can observe that |
---|
2265 | each $y_j$ is uniquely and smoothly determined by the coefficient |
---|
2266 | vectors $x_i$ with $i \leq j$. In particular we have |
---|
2267 | \begin{align} |
---|
2268 | \label{eq:y_0y_1} |
---|
2269 | y_0 & = F(x_0) \nonumber \\ |
---|
2270 | y_1 & = F'(x_0) x_1 \nonumber\\ |
---|
2271 | y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\ |
---|
2272 | y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2 |
---|
2273 | + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\ |
---|
2274 | & \ldots\nonumber |
---|
2275 | \end{align} |
---|
2276 | In writing down the last equations we have already departed from the |
---|
2277 | usual matrix-vector notation. It is well known that the number of |
---|
2278 | terms that occur in these ``symbolic'' expressions for |
---|
2279 | the $y_j$ in terms of the first $j$ derivative tensors of $F$ and |
---|
2280 | the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly |
---|
2281 | with $j$. Fortunately, this exponential growth does not occur |
---|
2282 | in automatic differentiation, where the many terms are somehow |
---|
2283 | implicitly combined so that storage and operations count grow only |
---|
2284 | quadratically in the bound $d$ on $j$. |
---|
2285 | |
---|
2286 | Provided $F$ is analytic, this property is inherited by the functions |
---|
2287 | \[ |
---|
2288 | y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m , |
---|
2289 | \] |
---|
2290 | and their derivatives satisfy the identities |
---|
2291 | \begin{equation} |
---|
2292 | \label{eq:identity} |
---|
2293 | \frac{\partial y_j}{\partial x_i} = \frac{\partial y_{j-i}} |
---|
2294 | {\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i}) |
---|
2295 | \end{equation} |
---|
2296 | as established in \cite{Chri91a}. This yields in particular |
---|
2297 | \begin{align*} |
---|
2298 | \frac{\partial y_0}{\partial x_0} = |
---|
2299 | \frac{\partial y_1}{\partial x_1} = |
---|
2300 | \frac{\partial y_2}{\partial x_2} = |
---|
2301 | \frac{\partial y_3}{\partial x_3} = |
---|
2302 | A_0 & = F'(x_0) \\ |
---|
2303 | \frac{\partial y_1}{\partial x_0} = |
---|
2304 | \frac{\partial y_2}{\partial x_1} = |
---|
2305 | \frac{\partial y_3}{\partial x_2} = |
---|
2306 | A_1 & = F''(x_0) x_1 \\ |
---|
2307 | \frac{\partial y_2}{\partial x_0} = |
---|
2308 | \frac{\partial y_3}{\partial x_1} = |
---|
2309 | A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\ |
---|
2310 | \frac{\partial y_3}{\partial x_0} = |
---|
2311 | A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2 |
---|
2312 | + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\ |
---|
2313 | & \ldots |
---|
2314 | \end{align*} |
---|
2315 | The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor |
---|
2316 | coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of |
---|
2317 | interest primarily in the context of ordinary differential |
---|
2318 | equations and differential algebraic equations. |
---|
2319 | |
---|
2320 | Given the tape of an active section and the coefficients $x_j$, |
---|
2321 | the resulting $y_j$ and their derivatives $A_j$ can be evaluated |
---|
2322 | by appropriate calls to the ADOL-C forward mode implementations and |
---|
2323 | the ADOL-C reverse mode implementations. The scalar versions of the forward |
---|
2324 | mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$ |
---|
2325 | to the $(y_j)_{j\leq d}$. The vector versions of the forward |
---|
2326 | mode propagate families of $p\geq 1$ such truncated Taylor series |
---|
2327 | in order to reduce the relative cost of the overhead incurred |
---|
2328 | in the tape interpretation. In detail, ADOL-C provides |
---|
2329 | \begin{tabbing} |
---|
2330 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2331 | \>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\ |
---|
2332 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2333 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2334 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2335 | \>{\sf int keep;} \> // flag for reverse mode preparation\\ |
---|
2336 | \>{\sf double x[n];} \> // independent vector $x=x_0$\\ |
---|
2337 | \>{\sf double y[m];} \> // dependent vector $y=F(x_0)$ |
---|
2338 | \end{tabbing} |
---|
2339 | for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes |
---|
2340 | $y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer |
---|
2341 | flag {\sf keep} plays a similar role as in the call to |
---|
2342 | {\sf trace\_on}: It determines if {\sf zos\_forward} writes |
---|
2343 | the first Taylor coefficients of all intermediate quantities into a buffered |
---|
2344 | temporary file, i.e., the value stack, in preparation for a subsequent |
---|
2345 | reverse mode evaluation. The value {\sf keep} $=1$ |
---|
2346 | prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as exlained below. |
---|
2347 | |
---|
2348 | To compute first-order derivatives, one has |
---|
2349 | \begin{tabbing} |
---|
2350 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2351 | \>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\ |
---|
2352 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2353 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2354 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2355 | \>{\sf int keep;} \> // flag for reverse mode preparation\\ |
---|
2356 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2357 | \>{\sf double x1[n];} \> // tangent vector $x_1$\\ |
---|
2358 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2359 | \>{\sf double y1[m];} \> // first derivative $y_1=F'(x_0)x_1$ |
---|
2360 | \end{tabbing} |
---|
2361 | for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has |
---|
2362 | $0\leq\text{\sf keep}\leq 2$, where |
---|
2363 | \begin{align*} |
---|
2364 | \text{\sf keep} = \left\{\begin{array}{cl} |
---|
2365 | 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\ |
---|
2366 | 2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}} |
---|
2367 | \end{array}\right. |
---|
2368 | \end{align*} |
---|
2369 | as exlained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode, |
---|
2370 | ADOL-C provides |
---|
2371 | \begin{tabbing} |
---|
2372 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2373 | \>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\ |
---|
2374 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2375 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2376 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2377 | \>{\sf int p;} \> // number of directions\\ |
---|
2378 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2379 | \>{\sf double X[n][p];} \> // tangent matrix $X$\\ |
---|
2380 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2381 | \>{\sf double Y[m][p];} \> // first derivative matrix $Y=F'(x)X$ |
---|
2382 | \end{tabbing} |
---|
2383 | For the computation of higher derivative, the driver |
---|
2384 | \begin{tabbing} |
---|
2385 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2386 | \>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\ |
---|
2387 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2388 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2389 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2390 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2391 | \>{\sf int keep;} \> // flag for reverse mode preparation\\ |
---|
2392 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2393 | \>{\sf double X[n][d];} \> // tangent matrix $X$\\ |
---|
2394 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2395 | \>{\sf double Y[m][d];} \> // derivative matrix $Y$ |
---|
2396 | \end{tabbing} |
---|
2397 | implementing the {\bf h}igher-{\bf o}rder {\bf s}calar forward mode. |
---|
2398 | The rows of the matrix $X$ must correspond to the independent variables in the order of their |
---|
2399 | initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of |
---|
2400 | $X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in |
---|
2401 | \eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the |
---|
2402 | dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator. |
---|
2403 | The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent |
---|
2404 | Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward} |
---|
2405 | computes the values |
---|
2406 | $y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where |
---|
2407 | $X=[x_1,x_2,\ldots,x_d]$ and $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has |
---|
2408 | $0\leq\text{\sf keep}\leq d+1$, with |
---|
2409 | \begin{align*} |
---|
2410 | \text{\sf keep} \left\{\begin{array}{cl} |
---|
2411 | = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\ |
---|
2412 | > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}} |
---|
2413 | \end{array}\right. |
---|
2414 | \end{align*} |
---|
2415 | Once more, there is also a vector version given by |
---|
2416 | \begin{tabbing} |
---|
2417 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2418 | \>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\ |
---|
2419 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2420 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2421 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2422 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2423 | \>{\sf int p;} \> // number of directions $p$\\ |
---|
2424 | \>{\sf double x0[n];} \> // independent vector $x_0$\\ |
---|
2425 | \>{\sf double X[n][p][d];} \> // tangent matrix $X$\\ |
---|
2426 | \>{\sf double y0[m];} \> // dependent vector $y_0=F(x_0)$\\ |
---|
2427 | \>{\sf double Y[m][p][d];} \> // derivative matrix $Y$ |
---|
2428 | \end{tabbing} |
---|
2429 | for the {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes |
---|
2430 | $y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and |
---|
2431 | $Y=[Y_1,Y_2,\ldots,Y_d]$. |
---|
2432 | |
---|
2433 | There are also overloaded versions providing a general {\sf forward}-call. |
---|
2434 | Details of the appropriate calling sequences are given in \autoref{forw_rev}. |
---|
2435 | |
---|
2436 | Once, the required information is generated due to a forward mode evaluation |
---|
2437 | with an approriate value of the parameter {\sf keep}, one may use the |
---|
2438 | following implementation variants of the reverse mode. To compute first-order derivatives |
---|
2439 | one can use |
---|
2440 | \begin{tabbing} |
---|
2441 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2442 | \>{\sf int fos\_reverse(tag,m,n,u,z)}\\ |
---|
2443 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2444 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2445 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2446 | \>{\sf double u[m];} \> // weight vector $u$\\ |
---|
2447 | \>{\sf double z[n];} \> // resulting adjoint value $z^T=u^T F'(x)$ |
---|
2448 | \end{tabbing} |
---|
2449 | as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes |
---|
2450 | the product $z^T=u^T F'(x)$ after calling {\sf zos\_forward}, {\sf fos\_forward}, or |
---|
2451 | {\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf |
---|
2452 | o}rder {\bf v}ector reverse mode driver is given by |
---|
2453 | \begin{tabbing} |
---|
2454 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2455 | \>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\ |
---|
2456 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2457 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2458 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2459 | \>{\sf int q;} \> // number of weight vectors $q$\\ |
---|
2460 | \>{\sf double U[q][m];} \> // weight matrix $U$\\ |
---|
2461 | \>{\sf double Z[q][n];} \> // resulting adjoint $Z=U F'(x)$ |
---|
2462 | \end{tabbing} |
---|
2463 | that can be used after calling {\sf zos\_forward}, {\sf fos\_forward}, or |
---|
2464 | {\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives, |
---|
2465 | ADOL-C provides |
---|
2466 | \begin{tabbing} |
---|
2467 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2468 | \>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\ |
---|
2469 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2470 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2471 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2472 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2473 | \>{\sf double u[m];} \> // weight vector $u$\\ |
---|
2474 | \>{\sf double Z[n][d+1];} \> // resulting adjoints |
---|
2475 | \end{tabbing} |
---|
2476 | as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding |
---|
2477 | the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$, |
---|
2478 | \ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling {\sf fos\_forward} or |
---|
2479 | {\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by |
---|
2480 | \begin{tabbing} |
---|
2481 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2482 | \>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\ |
---|
2483 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2484 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2485 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2486 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2487 | \>{\sf double U[q][m];} \> // weight vector $u$\\ |
---|
2488 | \>{\sf double Z[q][n][d+1];} \> // resulting adjoints\\ |
---|
2489 | \>{\sf short int nz[q][n];} \> // nonzero pattern of {\sf Z} |
---|
2490 | \end{tabbing} |
---|
2491 | as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute |
---|
2492 | the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$, |
---|
2493 | \ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling {\sf fos\_forward} or |
---|
2494 | {\sf hos\_forward} with {\sf keep} $=d+1>1$. |
---|
2495 | After the function call, the last argument of {\sf hov\_reverse} |
---|
2496 | contains information about the sparsity pattern, i.e. each {\sf nz[i][j]} |
---|
2497 | has a value that characterizes the functional relation between the |
---|
2498 | $i$-th component of $UF^\prime(x)$ and the $j$-th independent value |
---|
2499 | $x_j$ as: |
---|
2500 | \begin{center} |
---|
2501 | \begin{tabular}{ll} |
---|
2502 | 0 & trivial \\ |
---|
2503 | 1 & linear |
---|
2504 | \end{tabular} \hspace*{4ex} |
---|
2505 | \begin{tabular}{ll} |
---|
2506 | 2 & polynomial\\ |
---|
2507 | 3 & rational |
---|
2508 | \end{tabular} \hspace*{4ex} |
---|
2509 | \begin{tabular}{ll} |
---|
2510 | 4 & transcendental\\ |
---|
2511 | 5 & non-smooth |
---|
2512 | \end{tabular} |
---|
2513 | \end{center} |
---|
2514 | Here, ``trivial'' means that there is no dependence at all and ``linear'' means |
---|
2515 | that the partial derivative is a constant that |
---|
2516 | does not dependent on other variables either. ``Non-smooth'' means that one of |
---|
2517 | the functions on the path between $x_i$ and $y_j$ was evaluated at a point |
---|
2518 | where it is not differentiable. All positive labels |
---|
2519 | $1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may |
---|
2520 | in fact be simpler, for example due to exact cancellations. |
---|
2521 | |
---|
2522 | There are also overloaded versions providing a general {\sf reverse}-call. |
---|
2523 | Details of the appropriate calling sequences are given in the following \autoref{forw_rev}. |
---|
2524 | |
---|
2525 | |
---|
2526 | \input{absdrivers} |
---|
2527 | % |
---|
2528 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2529 | \section{Overloaded Forward and Reverse Calls} |
---|
2530 | \label{forw_rev} |
---|
2531 | % |
---|
2532 | In this section, the several versions of the {\sf forward} and |
---|
2533 | {\sf reverse} routines, which utilize the overloading capabilities |
---|
2534 | of C++, are described in detail. With exception of the bit pattern |
---|
2535 | versions all interfaces are prototyped in the header file |
---|
2536 | \verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward} |
---|
2537 | and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides |
---|
2538 | C and Fortran-callable versions prototyped in the same header file. |
---|
2539 | The bit pattern versions of {\sf forward} and {\sf reverse} introduced |
---|
2540 | in the \autoref{ProBit} are prototyped in the header file |
---|
2541 | \verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header |
---|
2542 | file \verb=<adolc/interfaces.h>= automatically. |
---|
2543 | % |
---|
2544 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2545 | \subsection{The Scalar Case} |
---|
2546 | % |
---|
2547 | \label{ScaCas} |
---|
2548 | % |
---|
2549 | Given any correct tape, one may call from within |
---|
2550 | the generating program, or subsequently during another run, the following |
---|
2551 | procedure: |
---|
2552 | % |
---|
2553 | \begin{tabbing} |
---|
2554 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2555 | \>{\sf int forward(tag,m,n,d,keep,X,Y)} \\ |
---|
2556 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2557 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2558 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2559 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2560 | \>{\sf int keep;} \> // flag for reverse sweep \\ |
---|
2561 | \>{\sf double X[n][d+1];} \> // Taylor coefficients $X$ of |
---|
2562 | independent variables \\ |
---|
2563 | \>{\sf double Y[m][d+1];} \> // Taylor coefficients $Y$ as |
---|
2564 | in \eqref{eq:series} |
---|
2565 | \end{tabbing} |
---|
2566 | % |
---|
2567 | The rows of the matrix $X$ must correspond to the independent variables in the order of their |
---|
2568 | initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of |
---|
2569 | $X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in |
---|
2570 | \eqref{eq:x_of_t}. The rows of the matrix $Y$ must |
---|
2571 | correspond to the |
---|
2572 | dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator. |
---|
2573 | The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent |
---|
2574 | Taylor coefficient vectors as in \eqref{eq:series}. |
---|
2575 | Thus the first column of $Y$ contains the |
---|
2576 | function value $F(x)$ itself, the next column represents the first |
---|
2577 | Taylor coefficient vector of $F$, and the last column the |
---|
2578 | $d$-th Taylor coefficient vector. The integer flag {\sf keep} determines |
---|
2579 | how many Taylor coefficients of all intermediate quantities are |
---|
2580 | written into the value stack as explained in \autoref{forw_rev_ad}. |
---|
2581 | If {\sf keep} is omitted, it defaults to 0. |
---|
2582 | |
---|
2583 | The given {\sf tag} value is used by {\sf forward} to determine the |
---|
2584 | name of the file on which the tape was written. If the tape file does |
---|
2585 | not exist, {\sf forward} assumes that the relevant |
---|
2586 | tape is still in core and reads from the buffers. |
---|
2587 | After the execution of an active section with \mbox{{\sf keep} = 1} or a call to |
---|
2588 | {\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call |
---|
2589 | the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape |
---|
2590 | identifier {\sf tag}. When $u$ is a vector |
---|
2591 | and $Z$ an $n\times (d+1)$ matrix |
---|
2592 | {\sf reverse} is executed in the scalar mode by the calling |
---|
2593 | sequence |
---|
2594 | % |
---|
2595 | \begin{tabbing} |
---|
2596 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2597 | \>{\sf int reverse(tag,m,n,d,u,Z)}\\ |
---|
2598 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2599 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2600 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2601 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2602 | \>{\sf double u[m];} \> // weighting vector $u$\\ |
---|
2603 | \>{\sf double Z[n][d+1];} \> // resulting adjoints $Z$ |
---|
2604 | \end{tabbing} |
---|
2605 | to compute |
---|
2606 | the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$, |
---|
2607 | \ldots, where $Z=[z_0,z_1,\ldots,z_d]$. |
---|
2608 | % |
---|
2609 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2610 | \subsection{The Vector Case} |
---|
2611 | % |
---|
2612 | \label{vecCas} |
---|
2613 | % |
---|
2614 | When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence |
---|
2615 | % |
---|
2616 | \begin{tabbing} |
---|
2617 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2618 | \>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\ |
---|
2619 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2620 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2621 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2622 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2623 | \>{\sf int q;} \> // number of weight vectors $q$\\ |
---|
2624 | \>{\sf double U[q][m];} \> // weight matrix $U$\\ |
---|
2625 | \>{\sf double Z[q][n][d+1];} \> // resulting adjoints \\ |
---|
2626 | \>{\sf short nz[q][n];} \> // nonzero pattern of {\sf Z} |
---|
2627 | \end{tabbing} |
---|
2628 | % |
---|
2629 | to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$, |
---|
2630 | \ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$. |
---|
2631 | When the arguments {\sf p} and {\sf U} are omitted, they default to |
---|
2632 | $m$ and the identity matrix of order $m$, respectively. |
---|
2633 | |
---|
2634 | Through the optional argument {\sf nz} of {\sf reverse} one can compute |
---|
2635 | information about the sparsity pattern of $Z$ as described in detail |
---|
2636 | in the previous \autoref{forw_rev_ad}. |
---|
2637 | |
---|
2638 | The return values of {\sf reverse} calls can be interpreted according |
---|
2639 | to \autoref{retvalues}, but negative return values are not |
---|
2640 | valid, since the corresponding forward sweep would have |
---|
2641 | stopped without completing the necessary taylor file. |
---|
2642 | The return value of {\sf reverse} may be higher |
---|
2643 | than that of the preceding {\sf forward} call because some operations |
---|
2644 | that were evaluated at a critical argument during the forward sweep |
---|
2645 | were found not to impact the dependents during the reverse sweep. |
---|
2646 | |
---|
2647 | In both scalar and vector mode, the degree $d$ must agree with |
---|
2648 | {\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be |
---|
2649 | equal to zero if {\sf reverse} directly follows the taping of an active |
---|
2650 | section. Otherwise, {\sf reverse} will return control with a suitable error |
---|
2651 | message. |
---|
2652 | In order to avoid possible confusion, the first four arguments must always be |
---|
2653 | present in the calling sequence. However, if $m$ or $d$ |
---|
2654 | attain their trivial values 1 and 0, respectively, then |
---|
2655 | corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u}, |
---|
2656 | {\sf U}, or {\sf Z} can be omitted, thus eliminating one level of |
---|
2657 | indirection. For example, we may call |
---|
2658 | {\sf reverse(tag,1,n,0,1.0,g)} after declaring |
---|
2659 | {\sf double g[n]} |
---|
2660 | to calculate a gradient of a scalar-valued function. |
---|
2661 | |
---|
2662 | Sometimes it may be useful to perform a forward sweep for families of |
---|
2663 | Taylor series with the same leading term. |
---|
2664 | This vector version of {\sf forward} can be called in the form |
---|
2665 | % |
---|
2666 | \begin{tabbing} |
---|
2667 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2668 | \>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\ |
---|
2669 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2670 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2671 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2672 | \>{\sf int d;} \> // highest derivative degree $d$\\ |
---|
2673 | \>{\sf int p;} \> // number of Taylor series $p$\\ |
---|
2674 | \>{\sf double x0[n];} \> // values of independent variables $x_0$\\ |
---|
2675 | \>{\sf double X[n][p][d];} \> // Taylor coefficients $X$ of independent variables\\ |
---|
2676 | \>{\sf double y0[m];} \> // values of dependent variables $y_0$\\ |
---|
2677 | \>{\sf double Y[m][p][d];} \> // Taylor coefficients $Y$ of dependent variables |
---|
2678 | \end{tabbing} |
---|
2679 | % |
---|
2680 | where {\sf X} and {\sf Y} hold the Taylor coefficients of first |
---|
2681 | and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of |
---|
2682 | degree 0. There is no option to keep the values of active variables |
---|
2683 | that are going out of scope or that are overwritten. Therefore this |
---|
2684 | function cannot prepare a subsequent reverse sweep. |
---|
2685 | The return integer serves as a flag to indicate quadratures or altered |
---|
2686 | comparisons as described above in \autoref{reuse_tape}. |
---|
2687 | |
---|
2688 | Since the calculation of Jacobians is probably the most important |
---|
2689 | automatic differentia\-tion task, we have provided a specialization |
---|
2690 | of vector {\sf forward} to the case where $d = 1$. This version can be |
---|
2691 | called in the form |
---|
2692 | % |
---|
2693 | \begin{tabbing} |
---|
2694 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2695 | \>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\ |
---|
2696 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2697 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2698 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2699 | \>{\sf int p;} \> // number of partial derivatives $p$ \\ |
---|
2700 | \>{\sf double x[n];} \> // values of independent variables $x_0$\\ |
---|
2701 | \>{\sf double X[n][p];} \> // seed derivatives of independent variables $X$\\ |
---|
2702 | \>{\sf double y[m];} \> // values of dependent variables $y_0$\\ |
---|
2703 | \>{\sf double Y[m][p];} \> // first derivatives of dependent variables $Y$ |
---|
2704 | \end{tabbing} |
---|
2705 | % |
---|
2706 | When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix, |
---|
2707 | the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general, |
---|
2708 | one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the |
---|
2709 | chosen initialization of $X$. In a workstation environment a value |
---|
2710 | of $p$ somewhere between $10$ and $50$ |
---|
2711 | appears to be fairly optimal. For smaller $p$ the interpretive |
---|
2712 | overhead is not appropriately amortized, and for larger $p$ the |
---|
2713 | $p$-fold increase in storage causes too many page faults. Therefore, |
---|
2714 | large Jacobians that cannot be compressed via column coloring |
---|
2715 | as could be done for example using the driver {\sf sparse\_jac} |
---|
2716 | should be ``strip-mined'' in the sense that the above |
---|
2717 | first-order-vector version of {\sf forward} is called |
---|
2718 | repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming |
---|
2719 | a partition of the identity matrix of order $n$. |
---|
2720 | % |
---|
2721 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2722 | \subsection{Dependence Analysis} |
---|
2723 | % |
---|
2724 | \label{ProBit} |
---|
2725 | % |
---|
2726 | The sparsity pattern of Jacobians is often needed to set up data structures |
---|
2727 | for their storage and factorization or to allow their economical evaluation |
---|
2728 | by compression \cite{BeKh96}. Compared to the evaluation of the full |
---|
2729 | Jacobian $F'(x_0)$ in real arithmetic computing the Boolean matrix |
---|
2730 | $\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity |
---|
2731 | pattern in the obvious way requires a little less run-time and |
---|
2732 | certainly a lot less memory. |
---|
2733 | |
---|
2734 | The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column |
---|
2735 | of $\tilde{P}$ should be $1 = true$ exactly when there is a data |
---|
2736 | dependence between the $i$-th independent variable $x_{i}$ and |
---|
2737 | the $j$-th dependent variable $y_{j}$. Just like for real arguments |
---|
2738 | one would wish to compute matrix-vector and vector-matrix products |
---|
2739 | of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ |
---|
2740 | by appropriate {\sf forward} and {\sf reverse} routines where |
---|
2741 | $\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$. |
---|
2742 | Here, multiplication corresponds to logical |
---|
2743 | {\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a |
---|
2744 | semi-ring. |
---|
2745 | |
---|
2746 | For practical reasons it is assumed that |
---|
2747 | $s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors |
---|
2748 | $\tilde{v}$ and $\tilde{u}$ are combined to integer vectors |
---|
2749 | $v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted |
---|
2750 | as bit patterns. Moreover $p$ or $q$ such integer vectors may |
---|
2751 | be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ |
---|
2752 | and $U\in\N^{q \times m}$, which naturally correspond |
---|
2753 | to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$ |
---|
2754 | and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided |
---|
2755 | bit pattern versions of {\sf forward} and {\sf reverse} allow |
---|
2756 | to compute integer matrices $Y\in\N^{m \times p}$ and |
---|
2757 | $Z\in\N^{q \times m}$ corresponding to |
---|
2758 | \begin{equation} |
---|
2759 | \label{eq:int_forrev} |
---|
2760 | \tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad |
---|
2761 | \tilde{Z} = \tilde{U}\tilde{P} \, , |
---|
2762 | \end{equation} |
---|
2763 | respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$ |
---|
2764 | and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times n}$. |
---|
2765 | In general, the application of the bit pattern versions of |
---|
2766 | {\sf forward} or {\sf reverse} can be interpreted as |
---|
2767 | propagating dependences between variables forward or backward, therefore |
---|
2768 | both the propagated integer matrices and the corresponding |
---|
2769 | Boolean matrices are called {\em dependence structures}. |
---|
2770 | |
---|
2771 | The bit pattern {\sf forward} routine |
---|
2772 | % |
---|
2773 | \begin{tabbing} |
---|
2774 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2775 | \>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\ |
---|
2776 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2777 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2778 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2779 | \>{\sf int p;} \> // number of integers propagated $p$\\ |
---|
2780 | \>{\sf double x[n];} \> // values of independent variables $x_0$\\ |
---|
2781 | \>{\sf unsigned long int X[n][p];} \> // dependence structure $X$ \\ |
---|
2782 | \>{\sf double y[m];} \> // values of dependent variables $y_0$\\ |
---|
2783 | \>{\sf unsigned long int Y[m][p];} \> // dependence structure $Y$ according to |
---|
2784 | \eqref{eq:int_forrev}\\ |
---|
2785 | \>{\sf char mode;} \> // 0 : safe mode (default), 1 : tight mode |
---|
2786 | \end{tabbing} |
---|
2787 | % |
---|
2788 | can be used to obtain the dependence structure $Y$ for a given dependence structure |
---|
2789 | $X$. The dependence structures are |
---|
2790 | represented as arrays of {\sf unsigned long int} the entries of which are |
---|
2791 | interpreted as bit patterns as described above. |
---|
2792 | For example, for $n=3$ the identity matrix $I_3$ should be passed |
---|
2793 | with $p=1$ as the $3 \times 1$ array |
---|
2794 | \begin{eqnarray*} |
---|
2795 | {\sf X} \; = \; |
---|
2796 | \left( \begin{array}{r} |
---|
2797 | {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\ |
---|
2798 | 0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\ |
---|
2799 | 00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2 |
---|
2800 | \end{array} \right) |
---|
2801 | \end{eqnarray*} |
---|
2802 | in the 4-byte long integer format. The parameter {\sf mode} determines |
---|
2803 | the mode of dependence analysis as explained already in \autoref{sparse}. |
---|
2804 | |
---|
2805 | A call to the corresponding bit pattern {\sf reverse} routine |
---|
2806 | % |
---|
2807 | \begin{tabbing} |
---|
2808 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2809 | \>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\ |
---|
2810 | \>{\sf short int tag;} \> // tape identification \\ |
---|
2811 | \>{\sf int m;} \> // number of dependent variables $m$\\ |
---|
2812 | \>{\sf int n;} \> // number of independent variables $n$\\ |
---|
2813 | \>{\sf int q;} \> // number of integers propagated q\\ |
---|
2814 | \>{\sf unsigned long int U[q][m];} \> // dependence structure $U$ \\ |
---|
2815 | \>{\sf unsigned long int Z[q][n];} \> // dependence structure $Z$ according |
---|
2816 | to \eqref{eq:int_forrev}\\ |
---|
2817 | \>{\sf char mode;} \> // 0 : safe mode (default), 1 : tight mode |
---|
2818 | \end{tabbing} |
---|
2819 | % |
---|
2820 | yields the dependence structure $Z$ for a given dependence structure |
---|
2821 | $U$. |
---|
2822 | |
---|
2823 | To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$ |
---|
2824 | as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} |
---|
2825 | with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the |
---|
2826 | corresponding dependence structure $X$ or $U$ must be defined to represent |
---|
2827 | the identity matrix of the respective dimension. |
---|
2828 | Due to the fact that always a multiple of $s$ Boolean vectors are propagated |
---|
2829 | there may be superfluous vectors, which can be set to zero. |
---|
2830 | |
---|
2831 | The return values of the bit pattern {\sf forward} and {\sf reverse} routines |
---|
2832 | correspond to those described in \autoref{retvalues}. |
---|
2833 | |
---|
2834 | One can control the storage growth by the factor $p$ using |
---|
2835 | ``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive |
---|
2836 | groups of columns or respectively rows at a time, i.e.~partitioning |
---|
2837 | $X$ or $U$ appropriately as described for the computation of Jacobians |
---|
2838 | in \autoref{vecCas}. |
---|
2839 | % |
---|
2840 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
2841 | % |
---|
2842 | \section{Advanced algorithmic differentiation in ADOL-C} |
---|
2843 | \label{adv_ad} |
---|
2844 | % |
---|
2845 | \subsection{Differentiating parameter dependent functions} |
---|
2846 | % |
---|
2847 | Normally the functions to be differentiated using AD are defined as |
---|
2848 | mappings from $\R^n$ to $\R^m$. The derivatives of the $m$ dependents |
---|
2849 | are then computed w.r.t. all the $n$ independents. However sometimes |
---|
2850 | the application requires only derivatives w.r.t. some of the |
---|
2851 | independents and the others only occur in the evaluation procedure as |
---|
2852 | parameters. One important such use case is Lagrange Multipliers in a |
---|
2853 | nonlinear optimization problem. Declaring such variables as |
---|
2854 | \verb=adouble= and marking them as independent results in the |
---|
2855 | evaluation of superfluous derivative information and increases |
---|
2856 | runtime. However, if the parameters are not defined as \verb=adouble= |
---|
2857 | type variables they get hard coded as constants onto the ADOL-C |
---|
2858 | trace. One may want to change these parameters for some evaluations, |
---|
2859 | however the overhead of retracing the function may be very high. In |
---|
2860 | such cases these constants may be specially marked as parameters |
---|
2861 | during tracing using the function |
---|
2862 | \begin{tabbing} |
---|
2863 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2864 | \>{\sf mkparam(d)}\\ |
---|
2865 | \>{\sf double d;} \> // constant value to be marked as parameter \\ |
---|
2866 | \end{tabbing} |
---|
2867 | This stores the constant in a different place on the trace than the |
---|
2868 | other unmarked constants and enables the user to replace a whole |
---|
2869 | vector of parameters with a new one before computing derivatives later |
---|
2870 | on. All such marked parameter constants can be used in any evaluation |
---|
2871 | expression where the original unmarked constants were used. |
---|
2872 | |
---|
2873 | Sometimes the same parameter may be needed in various different |
---|
2874 | expressions. During trace creation the marked parameters are given a |
---|
2875 | running index as they are marked. This index may me saved by the user |
---|
2876 | and used to access the same stored value in different expressions. |
---|
2877 | |
---|
2878 | To save the index while marking a parameter use |
---|
2879 | \begin{tabbing} |
---|
2880 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2881 | \>{\sf locint mkparam\_idx(d)}\\ |
---|
2882 | \>{\sf double d;} \> // constant value to be marked |
---|
2883 | \end{tabbing} |
---|
2884 | The resulting \verb=locint= can be saved in a variable by the user and |
---|
2885 | later used to access the saved parameter from in the trace in any expression |
---|
2886 | |
---|
2887 | To access a saved parameter inside any evaluation expression use |
---|
2888 | \begin{tabbing} |
---|
2889 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2890 | \>{\sf getparam(idx)}\\ |
---|
2891 | \>{\sf locint idx;} \> // index of the parameter as returned by |
---|
2892 | mkparam\_idx()\\ |
---|
2893 | \end{tabbing} |
---|
2894 | |
---|
2895 | The above three functions work only during the creation of the trace, |
---|
2896 | i.e., between the invokation of \verb=trace_on()= and \verb=trace_off()=. |
---|
2897 | |
---|
2898 | For all following derivative computations using the drivers described |
---|
2899 | in the preceding sections, the saved parameter values will be used, |
---|
2900 | unless the user asks to replace them with a different vector of |
---|
2901 | values. |
---|
2902 | |
---|
2903 | To replace the whole parameter vector one uses the routine |
---|
2904 | \begin{tabbing} |
---|
2905 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
2906 | \>{\sf set\_param\_vec(tag, number, parameters)}\\ |
---|
2907 | \>{\sf short int tag;} \> // tape identification\\ |
---|
2908 | \>{\sf size\_t number;} \> // total number of constants marked as |
---|
2909 | parameters\\ |
---|
2910 | \>{\sf double* parameters;} \> // array of constants to be used as replacement\\ |
---|
2911 | \end{tabbing} |
---|
2912 | All following derivative computations using the drivers described in |
---|
2913 | the preceding sections will now use the new parameters. |
---|
2914 | |
---|
2915 | After replacing the parameters, however, one cannot directly call a basic |
---|
2916 | reverse mode driver. A basic forward mode driver must be called. However, |
---|
2917 | the easy to use drivers from Section \ref{drivers} do this automatically. |
---|
2918 | % |
---|
2919 | \subsection{Differentiating external functions} |
---|
2920 | % |
---|
2921 | Ideally, AD is applied to a given computation as a whole. |
---|
2922 | This is not always possible because parts of the computation may |
---|
2923 | be coded in a different programming language or may a call to |
---|
2924 | an external library. |
---|
2925 | In the former case one may want to differentiate the parts in |
---|
2926 | question with a different AD tool or provide hand written derivatives. |
---|
2927 | In practice, however, sophisticated projects usually evolve over a long period of time. |
---|
2928 | Within this process, a heterogeneous code base for the project |
---|
2929 | develops, which may include the incorporation of external solutions, |
---|
2930 | changes in programming paradigms or even of programming languages. |
---|
2931 | Equally heterogeneous, the computation of derivative values appears. |
---|
2932 | Hence, different \mbox{AD-tools} may be combined with hand-derived |
---|
2933 | codes based on the same or different programming languages. |
---|
2934 | ADOL-C supports such settings by the concept of externally |
---|
2935 | differentiated functions, that is, a function |
---|
2936 | not differentiated by ADOL-C itself. The required derivatives |
---|
2937 | have to be provided by the user. |
---|
2938 | |
---|
2939 | For this purpose, it is required that the externally differentiated |
---|
2940 | function (for example named {\sf\em euler\_step} ) has the following signature |
---|
2941 | \smallskip |
---|
2942 | |
---|
2943 | \noindent |
---|
2944 | \hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);} . |
---|
2945 | \medskip |
---|
2946 | |
---|
2947 | \noindent |
---|
2948 | Note that the formal paramters in the signature have {\sf double} type, that is, |
---|
2949 | they are not active as in the original program before the ADOL-C type change. |
---|
2950 | The externally differentiated function has to |
---|
2951 | be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows |
---|
2952 | \smallskip |
---|
2953 | |
---|
2954 | \noindent |
---|
2955 | \hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);} . |
---|
2956 | \smallskip |
---|
2957 | |
---|
2958 | \noindent |
---|
2959 | This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function. |
---|
2960 | Then, the user has to supply the function pointers for the callback methods (for example here |
---|
2961 | {\sf zos\_for\_euler\_step} and {\sf fos\_rev\_euler\_step}) the user implemented |
---|
2962 | to compute the derivatives as follows: |
---|
2963 | \begin{tabbing} |
---|
2964 | \hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\ |
---|
2965 | \> {\sf // function pointer for computing |
---|
2966 | Zero-Order-Scalar (=zos)}\\ |
---|
2967 | \> {\sf // forward information}\\ |
---|
2968 | % \> {\sf edf-$>$dp\_x = xp;}\\ |
---|
2969 | % \> {\sf edf-$>$dp\_y = yp;}\\ |
---|
2970 | % \> {\sf // double arrays for arguments and results}\\ |
---|
2971 | \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\ |
---|
2972 | \> {\sf // function pointer for computing |
---|
2973 | First-Order-Scalar (=fos)}\\ |
---|
2974 | \> {\sf reverse information} |
---|
2975 | \end{tabbing} |
---|
2976 | To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y} |
---|
2977 | one has to provide (allocate) both variants. I.e. if the call to {\sf euler\_step} was originally |
---|
2978 | \smallskip |
---|
2979 | |
---|
2980 | \noindent |
---|
2981 | \hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);} |
---|
2982 | \medskip |
---|
2983 | |
---|
2984 | \noindent |
---|
2985 | then the ADOL-C typechange for the calling context will turn {\sf sOld} and {\sf sNew} in {\sf adouble} pointers. |
---|
2986 | To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace) |
---|
2987 | the original call to the externally differentiated function |
---|
2988 | must be substituted by |
---|
2989 | \smallskip |
---|
2990 | |
---|
2991 | \noindent |
---|
2992 | \hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);} . |
---|
2993 | \medskip |
---|
2994 | |
---|
2995 | \noindent |
---|
2996 | Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively) |
---|
2997 | to the active arguments {\sf sNew} in {\sf adouble}. |
---|
2998 | The usage of the external function facility is illustrated by the |
---|
2999 | example \verb=ext_diff_func= contained in |
---|
3000 | \verb=examples/additional_examples/ext_diff_func=. |
---|
3001 | There, the external differentiated function is also a C code, but the |
---|
3002 | handling as external differentiated function decreases the |
---|
3003 | overall required tape size. |
---|
3004 | |
---|
3005 | % |
---|
3006 | \subsection{Advanced algorithmic differentiation of time integration processes} |
---|
3007 | % |
---|
3008 | For many time-dependent applications, the corresponding simulations |
---|
3009 | are based on ordinary or partial differential equations. |
---|
3010 | Furthermore, frequently there are quantities that influence the |
---|
3011 | result of the simulation and can be seen as control of the systems. |
---|
3012 | To compute an approximation of the |
---|
3013 | simulated process for a time interval $[0,T]$ and evaluated the |
---|
3014 | desired target function, one applies an |
---|
3015 | appropriate integration scheme given by |
---|
3016 | \begin{tabbing} |
---|
3017 | \hspace{5mm} \= some initializations yielding $x_0$\\ |
---|
3018 | \> for $i=0,\ldots, N-1$\\ |
---|
3019 | \hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\ |
---|
3020 | \hspace{5mm} \= evaluation of the target function |
---|
3021 | \end{tabbing} |
---|
3022 | where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at |
---|
3023 | time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and |
---|
3024 | $t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$ |
---|
3025 | defines the time step to compute the state at time $t_i$. Note that we |
---|
3026 | do not assume a uniform grid. |
---|
3027 | |
---|
3028 | When computing derivatives of the target function with respect to the |
---|
3029 | control, the consequences for the tape generation using the ``basic'' |
---|
3030 | taping approach as implemented in ADOL-C so far are shown in the left part of |
---|
3031 | \autoref{fig:bas_tap}. |
---|
3032 | \begin{figure}[htbp] |
---|
3033 | \begin{center} |
---|
3034 | \hspace*{0.5cm}\includegraphics[width=5.8cm]{tapebasic}\hfill |
---|
3035 | \includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\ |
---|
3036 | \end{center} |
---|
3037 | \hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process |
---|
3038 | \caption{Different taping approaches} |
---|
3039 | \label{fig:bas_tap} |
---|
3040 | \end{figure} |
---|
3041 | As can be seen, the iterative process is completely |
---|
3042 | unrolled due to the taping process. That is, the tape contains an internal representation of each |
---|
3043 | time step. Hence, the overall tape comprises a serious amount of redundant |
---|
3044 | information as illustrated by the light grey rectangles in |
---|
3045 | \autoref{fig:bas_tap}. |
---|
3046 | |
---|
3047 | To overcome the repeated storage of essentially the same information, |
---|
3048 | a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on |
---|
3049 | the right-hand side of \autoref{fig:bas_tap}. This new |
---|
3050 | capability allows the encapsulation of the time-stepping procedure |
---|
3051 | such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one |
---|
3052 | representative of the time steps in addition to a function pointer to the |
---|
3053 | evaluation procedure $F$ of the time steps. The function pointer has |
---|
3054 | to be stored for a possibly necessary retaping during the derivative calculation |
---|
3055 | as explained below. |
---|
3056 | |
---|
3057 | Instead of storing the complete tape, only a very limited number of intermediate |
---|
3058 | states are kept in memory. They serve as checkpoints, such that |
---|
3059 | the required information for the backward integration is generated |
---|
3060 | piecewise during the adjoint calculation. |
---|
3061 | For this modified adjoint computation the optimal checkpointing schedules |
---|
3062 | provided by {\bf revolve} are employed. An adapted version of the |
---|
3063 | software package {\sf revolve} is part of ADOL-C and automatically |
---|
3064 | integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are |
---|
3065 | distributed such that computational effort is minimized for the given |
---|
3066 | number of checkpoints and time steps $N$. It is important to note that the overall tape |
---|
3067 | size is drastically reduced due to the advanced taping strategy. For the |
---|
3068 | implementation of this nested taping we introduced |
---|
3069 | a so-called ``differentiating context'' that enables \mbox{ADOL-C} to |
---|
3070 | handle different internal function representations during the taping |
---|
3071 | procedure and the derivative calculation. This approach allows the generation of a new |
---|
3072 | tape inside the overall tape, where the coupling of the different tapes is based on |
---|
3073 | the {\em external differentiated function} described above. |
---|
3074 | |
---|
3075 | Written under the objective of minimal user effort, the checkpointing routines |
---|
3076 | of \mbox{ADOL-C} need only very limited information. The user must |
---|
3077 | provide two routines as implementation of the time-stepping function $F$ |
---|
3078 | with the signatures |
---|
3079 | \medskip |
---|
3080 | |
---|
3081 | \noindent |
---|
3082 | \hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\ |
---|
3083 | \hspace*{2cm}{\sf int time\_step\_function(int n, double *u);} |
---|
3084 | \medskip |
---|
3085 | |
---|
3086 | \noindent |
---|
3087 | where the function names can be chosen by the user as long as the names are |
---|
3088 | unique.It is possible that the result vector of one time step |
---|
3089 | iteration overwrites the argument vector of the same time step. Then, no |
---|
3090 | copy operations are required to prepare the next time step. |
---|
3091 | |
---|
3092 | At first, the {\sf adouble} version of the time step function has to |
---|
3093 | be {\em registered} using the \mbox{ADOL-C} function |
---|
3094 | \medskip |
---|
3095 | |
---|
3096 | \noindent |
---|
3097 | \hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}. |
---|
3098 | \medskip |
---|
3099 | |
---|
3100 | \noindent |
---|
3101 | This function initializes the structure {\sf cpc}. Then, |
---|
3102 | the user has to provide the remaining checkpointing information |
---|
3103 | by the following commands: |
---|
3104 | \begin{tabbing} |
---|
3105 | \hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\ |
---|
3106 | \> {\sf // double variante of the time step function}\\ |
---|
3107 | \> {\sf cpc.setNumberOfSteps(N);}\\ |
---|
3108 | \> {\sf // number of time steps to perform}\\ |
---|
3109 | \> {\sf cpc.setNumberOfCheckpoints(10);}\\ |
---|
3110 | \> {\sf // number of checkpoint} \\ |
---|
3111 | \> {\sf cpc.setDimensionXY(n);}\\ |
---|
3112 | \> {\sf // dimension of input/output}\\ |
---|
3113 | \> {\sf cpc.setInput(y);}\\ |
---|
3114 | \> {\sf // input vector} \\ |
---|
3115 | \> {\sf cpc.setOutput(y);}\\ |
---|
3116 | \> {\sf // output vector }\\ |
---|
3117 | \> {\sf cpc.setTapeNumber(tag\_check);}\\ |
---|
3118 | \> {\sf // subtape number for checkpointing} \\ |
---|
3119 | \> {\sf cpc.setAlwaysRetaping(false);}\\ |
---|
3120 | \> {\sf // always retape or not ?} |
---|
3121 | \end{tabbing} |
---|
3122 | Subsequently, the time loop in the function evaluation can be |
---|
3123 | substituted by a call of the function |
---|
3124 | \medskip |
---|
3125 | |
---|
3126 | \noindent |
---|
3127 | \hspace*{2cm}{\sf int cpc.checkpointing();} |
---|
3128 | \medskip |
---|
3129 | |
---|
3130 | \noindent |
---|
3131 | Then, ADOL-C computes derivative information using the optimal checkpointing |
---|
3132 | strategy provided by {\sf revolve} internally, i.e., completely hidden from the user. |
---|
3133 | |
---|
3134 | The presented driver is prototyped in the header file |
---|
3135 | \verb=<adolc/checkpointing.h>=. This header |
---|
3136 | is included by the global header file \verb=<adolc/adolc.h>= automatically. |
---|
3137 | An example program \verb=checkpointing.cpp= illustrates the |
---|
3138 | checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=. |
---|
3139 | % |
---|
3140 | % |
---|
3141 | % |
---|
3142 | \subsection{Advanced algorithmic differentiation of fixed point iterations} |
---|
3143 | % |
---|
3144 | Quite often, the state of the considered system denoted by $x\in\R^n$ |
---|
3145 | depends on some design parameters denoted by $u\in\R^m$. One example for this setting |
---|
3146 | forms the flow over an aircraft wing. Here, the shape of the wing that |
---|
3147 | is defined by the design vector $u$ |
---|
3148 | determines the flow field $x$. The desired quasi-steady state $x_*$ |
---|
3149 | fulfills the fixed point equation |
---|
3150 | \begin{align} |
---|
3151 | \label{eq:fixedpoint} |
---|
3152 | x_* = F(x_*,u) |
---|
3153 | \end{align} |
---|
3154 | for a given continuously differentiable function |
---|
3155 | $F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is |
---|
3156 | also exploited by many other applications. |
---|
3157 | |
---|
3158 | Assume that one can apply the iteration |
---|
3159 | \begin{align} |
---|
3160 | \label{eq:iteration} |
---|
3161 | x_{k+1} = F(x_k,u) |
---|
3162 | \end{align} |
---|
3163 | to obtain a linear converging sequence $\{x_k\}$ generated |
---|
3164 | for any given control $u\in\R^n$. Then the limit point $x_*\in\R^n$ fulfils the fixed |
---|
3165 | point equation~\eqref{eq:fixedpoint}. Moreover, |
---|
3166 | suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair |
---|
3167 | $(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}. |
---|
3168 | Hence, there exists a |
---|
3169 | differentiable function $\phi:\R^m \rightarrow \R^n$, |
---|
3170 | such that $\phi(u) = F(\phi(u),u)$, where the state |
---|
3171 | $\phi(u)$ is a fixed point of $F$ according to a control |
---|
3172 | $u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to |
---|
3173 | the design vector $u$, derivatives of $\phi$ with respect |
---|
3174 | to $u$ are of particular interest. |
---|
3175 | |
---|
3176 | To exploit the advanced algorithmic differentiation of such fixed point iterations |
---|
3177 | ADOL-C provides the special functions {\tt fp\_iteration(...)}. |
---|
3178 | It has the following interface: |
---|
3179 | \begin{tabbing} |
---|
3180 | \hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill % define tab position |
---|
3181 | \>{\sf int |
---|
3182 | fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\ |
---|
3183 | \> \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\ |
---|
3184 | \hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill % define tab position |
---|
3185 | \>{\sf short int sub\_tape\_num;} \> // tape identification for sub\_tape \\ |
---|
3186 | \>{\sf int *double\_F;} \> // pointer to a function that compute for $x$ and $u$ \\ |
---|
3187 | \> \> // the value $y=F(x,u)$ for {\sf double} arguments\\ |
---|
3188 | \>{\sf int *adouble\_F;} \> // pointer to a function that compute for $x$ and $u$ \\ |
---|
3189 | \> \> // the value $y=F(x,u)$ for {\sf double} arguments\\ |
---|
3190 | \>{\sf int *norm;} \> // pointer to a function that computes\\ |
---|
3191 | \> \> // the norm of a vector\\ |
---|
3192 | \>{\sf int *norm\_deriv;} \> // pointer to a function that computes\\ |
---|
3193 | \> \> // the norm of a vector\\ |
---|
3194 | \>{\sf double eps;} \> // termination criterion for fixed point iteration\\ |
---|
3195 | \>{\sf double eps\_deriv;} \> // termination criterion for adjoint fixed point iteration\\ |
---|
3196 | \>{\sf N\_max;} \> // maximal number of itertions for state computation\\ |
---|
3197 | \>{\sf N\_max\_deriv;} \> // maximal number of itertions for adjoint computation\\ |
---|
3198 | \>{\sf adouble *x\_0;} \> // inital state of fixed point iteration\\ |
---|
3199 | \>{\sf adouble *u;} \> // value of $u$\\ |
---|
3200 | \>{\sf adouble *x\_fic;} \> // final state of fixed point iteration\\ |
---|
3201 | \>{\sf int dim\_x;} \> // dimension of $x$\\ |
---|
3202 | \>{\sf int dim\_u;} \> // dimension of $u$\\ |
---|
3203 | \end{tabbing} |
---|
3204 | % |
---|
3205 | Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that |
---|
3206 | should be used for the fixed point iteration. |
---|
3207 | {\tt double\_F} and {\tt adouble\_F} are pointers to functions, that |
---|
3208 | compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby |
---|
3209 | {\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F} |
---|
3210 | uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and |
---|
3211 | {\tt norm\_deriv} are pointers to functions computing the norm |
---|
3212 | of a vector. The latter functions together with {\tt eps}, |
---|
3213 | {\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control |
---|
3214 | the iterations. Thus the following loops are performed: |
---|
3215 | \begin{center} |
---|
3216 | \begin{tabular}{ll} |
---|
3217 | do & do \\ |
---|
3218 | ~~~~$k = k+1$ & ~~~~$k = k+1$ \\ |
---|
3219 | ~~~~$x = y$ & ~~~~$\zeta = \xi$ \\ |
---|
3220 | ~~~~$y = F(x,u)$ & ~~~ |
---|
3221 | $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\ |
---|
3222 | while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} & |
---|
3223 | while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$ \\ |
---|
3224 | & and $k\leq N_{max,deriv}$ |
---|
3225 | \end{tabular} |
---|
3226 | \end{center} |
---|
3227 | The vector for the initial iterate and the control is stored |
---|
3228 | in {\tt x\_0} and {\tt u} respectively. The vector in which the |
---|
3229 | fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x} |
---|
3230 | and {\tt dim\_u} represent the dimensions $n$ and $m$ of the |
---|
3231 | corresponding vectors. |
---|
3232 | |
---|
3233 | The presented driver is prototyped in the header file |
---|
3234 | \verb=<adolc/fixpoint.h>=. This header |
---|
3235 | is included by the global header file \verb=<adolc/adolc.h>= automatically. |
---|
3236 | An example code that shows also the |
---|
3237 | expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=. |
---|
3238 | % |
---|
3239 | \subsection{Advanced algorithmic differentiation of OpenMP parallel programs} |
---|
3240 | % |
---|
3241 | ADOL-C allows to compute derivatives in parallel for functions |
---|
3242 | containing OpenMP parallel loops. |
---|
3243 | This implies that an explicit loop-handling approach is applied. A |
---|
3244 | typical situation is shown in \autoref{fig:basic_layout}, |
---|
3245 | \begin{figure}[hbt] |
---|
3246 | \vspace{3ex} |
---|
3247 | \begin{center} |
---|
3248 | \includegraphics[height=4cm]{multiplexed} \\ |
---|
3249 | \begin{picture}(0,0) |
---|
3250 | \put(-48,40){\vdots} |
---|
3251 | \put(48,40){\vdots} |
---|
3252 | \put(-48,80){\vdots} |
---|
3253 | \put(48,80){\vdots} |
---|
3254 | \put(-83,132){function eval.} |
---|
3255 | \put(5,132){derivative calcul.} |
---|
3256 | \end{picture} |
---|
3257 | \end{center} |
---|
3258 | \vspace{-5ex} |
---|
3259 | \caption{Basic layout of mixed function and the corresponding derivation process} |
---|
3260 | \label{fig:basic_layout} |
---|
3261 | \end{figure} |
---|
3262 | where the OpenMP-parallel loop is preceded by a serial startup |
---|
3263 | calculation and followed by a serial finalization phase. |
---|
3264 | |
---|
3265 | Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement. |
---|
3266 | Two macros are available that only differ in the way the global tape information is handled. |
---|
3267 | Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization. |
---|
3268 | For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used. |
---|
3269 | Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread. |
---|
3270 | Due to the inserted macro, the OpenMP statement has the following structure: |
---|
3271 | \begin{tabbing} |
---|
3272 | \hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\ |
---|
3273 | \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC} |
---|
3274 | \end{tabbing} |
---|
3275 | Inside the parallel region, separate tapes may then be created. |
---|
3276 | Each single thread works in its own dedicated AD-environment, and all |
---|
3277 | serial facilities of \mbox{ADOL-C} are applicable as usual. The global |
---|
3278 | derivatives can be computed using the tapes created in the serial and |
---|
3279 | parallel parts of the function evaluation, where user interaction is |
---|
3280 | required for the correct derivative concatenation of the various tapes. |
---|
3281 | |
---|
3282 | For the usage of the parallel facilities, the \verb=configure=-command |
---|
3283 | has to be used with the option \verb?--with-openmp-flag=FLAG?, where |
---|
3284 | \verb=FLAG= stands for the system dependent OpenMP flag. |
---|
3285 | The parallel differentiation of a parallel program is illustrated |
---|
3286 | by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=. |
---|
3287 | % |
---|
3288 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
3289 | % |
---|
3290 | \section{Traceless forward differentiation in ADOL-C} |
---|
3291 | \label{traceless} |
---|
3292 | % |
---|
3293 | Up to version 1.9.0, the development of the ADOL-C software package |
---|
3294 | was based on the decision to store all data necessary for derivative |
---|
3295 | computation on traces (tapes), where large applications require the traces to be |
---|
3296 | written out to corresponding files. In almost all cases this means |
---|
3297 | a considerable drawback in terms of run time due to the excessive |
---|
3298 | memory accesses. Using these traces enables ADOL-C to offer multiple |
---|
3299 | functions. However, it is not necessary for all tasks of derivative |
---|
3300 | computation to do that. |
---|
3301 | |
---|
3302 | Starting with version 1.10.0, ADOL-C now features a traceless forward |
---|
3303 | mode for computing first order derivatives in scalar mode, i.e., |
---|
3304 | $\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$. |
---|
3305 | This traceless variant coexists with the more universal |
---|
3306 | trace based mode in the package. The following subsections describe |
---|
3307 | the source code modifications required to use the traceless forward mode of |
---|
3308 | ADOL-C. |
---|
3309 | % |
---|
3310 | \subsection{Modifying the Source Code} |
---|
3311 | % |
---|
3312 | Let us consider the coordinate transformation from Cartesian to spherical |
---|
3313 | polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y |
---|
3314 | = F(x)$, with |
---|
3315 | \begin{eqnarray*} |
---|
3316 | y_1 = \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad |
---|
3317 | y_2 = \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad |
---|
3318 | y_3 = \arctan(x_2/x_1), |
---|
3319 | \end{eqnarray*} |
---|
3320 | as an example. The corresponding source code is shown in \autoref{fig:traceless}. |
---|
3321 | \begin{figure}[htb] |
---|
3322 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3323 | %\begin{center} |
---|
3324 | %\begin{flushleft} |
---|
3325 | \begin{tabbing} |
---|
3326 | \= \kill |
---|
3327 | \> {\sf \#include} {\sf $<$iostream$>$}\\ |
---|
3328 | \> {\sf using namespace std;}\\ |
---|
3329 | \> \\ |
---|
3330 | \> {\sf int main() \{}\\ |
---|
3331 | \> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\ |
---|
3332 | \> \\ |
---|
3333 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\ |
---|
3334 | \> {\sf \rule{1cm}{0pt}...}\\ |
---|
3335 | \> \\ |
---|
3336 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3337 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3338 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3339 | \> \\ |
---|
3340 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\ |
---|
3341 | \> \\ |
---|
3342 | \> {\sf \rule{0.5cm}{0pt}return 0;}\\ |
---|
3343 | \> \} |
---|
3344 | \end{tabbing} |
---|
3345 | %\end{flushleft} |
---|
3346 | %\end{center} |
---|
3347 | }} |
---|
3348 | \caption{Example for traceless forward mode} |
---|
3349 | \label{fig:traceless} |
---|
3350 | \end{figure} |
---|
3351 | % |
---|
3352 | Changes to the source code that are necessary for applying the |
---|
3353 | traceless forward mode of ADOL-C are described in the following two |
---|
3354 | subsections, where the vector mode version is described |
---|
3355 | as extension of the scalar mode. |
---|
3356 | % |
---|
3357 | \subsubsection*{The scalar mode} |
---|
3358 | % |
---|
3359 | To use the traceless forward mode, one has to include only |
---|
3360 | the header file \verb#adtl.h# since it does not include the |
---|
3361 | trace based functions defined in other header files. |
---|
3362 | |
---|
3363 | As in the trace based forward version of ADOL-C all derivative |
---|
3364 | calculations are introduced by calls to overloaded |
---|
3365 | operators. Therefore, similar to the trace-based version all |
---|
3366 | independent, intermediate and dependent variables must be declared |
---|
3367 | with type {\sf adouble}. The whole traceless functionality provided by |
---|
3368 | \verb#adtl.h# was written as complete inline intended code |
---|
3369 | due to run time aspects, where the real portion of inlined code can |
---|
3370 | be influenced by switches for many compilers. Likely, the whole |
---|
3371 | derivative code is inlined by default. Our experiments |
---|
3372 | with the traceless mode have produced complete inlined code by using |
---|
3373 | standard switches (optimization) for GNU and Intel C++ |
---|
3374 | compiler. |
---|
3375 | |
---|
3376 | To avoid name conflicts |
---|
3377 | resulting from the inlining the traceless version has its own namespace |
---|
3378 | \verb#adtl#. As a result four possibilities of using the {\sf adouble} |
---|
3379 | type are available for the traceless version: |
---|
3380 | \begin{itemize} |
---|
3381 | \item Defining a new type |
---|
3382 | \begin{center} |
---|
3383 | \begin{tabular}{l} |
---|
3384 | {\sf typedef adtl::adouble adouble;}\\ |
---|
3385 | ...\\ |
---|
3386 | {\sf adouble tmp;} |
---|
3387 | \end{tabular} |
---|
3388 | \end{center} |
---|
3389 | This is the preferred way. Remember, you can not write an own |
---|
3390 | {\sf adouble} type/class with different meaning after doing the typedef. |
---|
3391 | \item Declaring with namespace prefix |
---|
3392 | \begin{center} |
---|
3393 | \begin{tabular}{l} |
---|
3394 | {\sf adtl::adouble tmp;} |
---|
3395 | \end{tabular} |
---|
3396 | \end{center} |
---|
3397 | Not the most handsome and efficient way with respect to coding |
---|
3398 | but without any doubt one of the safest ways. The identifier |
---|
3399 | {\sf adouble} is still available for user types/classes. |
---|
3400 | \item Trusting macros |
---|
3401 | \begin{center} |
---|
3402 | \begin{tabular}{l} |
---|
3403 | {\sf \#define adouble adtl::adouble}\\ |
---|
3404 | ...\\ |
---|
3405 | {\sf adouble tmp;} |
---|
3406 | \end{tabular} |
---|
3407 | \end{center} |
---|
3408 | This approach should be used with care, since standard defines are text replacements. |
---|
3409 | \item Using the complete namespace |
---|
3410 | \begin{center} |
---|
3411 | \begin{tabular}{l} |
---|
3412 | {\sf using namespace adtl;}\\ |
---|
3413 | ...\\ |
---|
3414 | {\sf adouble tmp;} |
---|
3415 | \end{tabular} |
---|
3416 | \end{center} |
---|
3417 | A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise! |
---|
3418 | \end{itemize} |
---|
3419 | After defining the variables only two things are left to do. First |
---|
3420 | one needs to initialize the values of the independent variables for the |
---|
3421 | function evaluation. This can be done by assigning the variables a {\sf |
---|
3422 | double} value. The {\sf ad}-value is set to zero in this case. |
---|
3423 | Additionally, the traceless forward mode variant of ADOL-C |
---|
3424 | offers a function named {\sf setValue} for setting the value without |
---|
3425 | changing the {\sf ad}-value. To set the {\sf ad}-values of the independent |
---|
3426 | variables ADOL-C offers two possibilities: |
---|
3427 | \begin{itemize} |
---|
3428 | \item Using the constructor |
---|
3429 | \begin{center} |
---|
3430 | \begin{tabular}{l} |
---|
3431 | {\sf double seedOne = 1., seedZero = 0.;}\\ |
---|
3432 | {\sf adouble x1(2,\&seedOne), x2(4,\&seedZero), y;} |
---|
3433 | \end{tabular} |
---|
3434 | \end{center} |
---|
3435 | This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter |
---|
3436 | remains uninitialized. In terms of function evaluation |
---|
3437 | $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values |
---|
3438 | are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$. |
---|
3439 | \item Setting point values directly |
---|
3440 | \begin{center} |
---|
3441 | \begin{tabular}{l} |
---|
3442 | {\sf double seedOne = 1., seedZero = 0.;}\\ |
---|
3443 | {\sf adouble x1=2, x2=4, y;}\\ |
---|
3444 | ...\\ |
---|
3445 | {\sf x1.setADValue(\&seedOne);}\\ |
---|
3446 | {\sf x2.setADValue(\&seedZero);} |
---|
3447 | \end{tabular} |
---|
3448 | \end{center} |
---|
3449 | The same example as above but now using {\sf setADValue}-method for initializing the derivative values. It is important to note that in both cases the derivative values have to be set by passing an address of a variable or a pointer of the {\sf double} type (instead of passing the value directly). The reason for doing this will become clear in the next subsection. |
---|
3450 | \end{itemize} |
---|
3451 | % |
---|
3452 | The derivatives can be obtained at any time during the evaluation |
---|
3453 | process by calling the {\sf getADValue}-method which returns a pointer (instead of a value), and therefore, it can be assigned to another pointer or it can be dereferenced immediately in order to read the value: |
---|
3454 | \begin{center} |
---|
3455 | \begin{tabular}{l} |
---|
3456 | {\sf adouble y;}\\ |
---|
3457 | ...\\ |
---|
3458 | {\sf cout $<<$ *(y.getADValue());} |
---|
3459 | \end{tabular} |
---|
3460 | \end{center} |
---|
3461 | \autoref{fig:modcode} shows the resulting source code incorporating |
---|
3462 | all required changes for the example |
---|
3463 | given above. |
---|
3464 | |
---|
3465 | \begin{figure}[htb] |
---|
3466 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3467 | %\begin{center} |
---|
3468 | \begin{tabbing} |
---|
3469 | \hspace*{-1cm} \= \kill |
---|
3470 | \> {\sf \#include $<$iostream$>$}\\ |
---|
3471 | \> {\sf using namespace std;}\\ |
---|
3472 | \> \\ |
---|
3473 | \> {\sf \#include $<$adolc/adtl.h$>$}\\ |
---|
3474 | \> {\sf typedef adtl::adouble adouble;}\\ |
---|
3475 | \\ |
---|
3476 | \> {\sf int main() \{}\\ |
---|
3477 | \> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\ |
---|
3478 | \\ |
---|
3479 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\ |
---|
3480 | \> {\sf \rule{1cm}{0pt}...}\\ |
---|
3481 | \\ |
---|
3482 | \> {\sf \rule{0.5cm}{0pt}double seed = 1.;}\\ |
---|
3483 | \> {\sf \rule{0.5cm}{0pt}x[0].setADValue(\&seed);\hspace*{3cm}// derivative of f with respect to $x_1$}\\ |
---|
3484 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3485 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3486 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3487 | \\ |
---|
3488 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\ |
---|
3489 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ *(y[1].getADValue()) $<<$ endl;}\\ |
---|
3490 | \> {\sf \rule{0.5cm}{0pt}return 0;}\\ |
---|
3491 | \> {\sf \}} |
---|
3492 | \end{tabbing} |
---|
3493 | %\end{center} |
---|
3494 | }} |
---|
3495 | \caption{Example for traceless scalar forward mode} |
---|
3496 | \label{fig:modcode} |
---|
3497 | \end{figure} |
---|
3498 | % |
---|
3499 | \subsubsection*{The vector mode} |
---|
3500 | % |
---|
3501 | In scalar mode only one direction element has to be stored per {\sf |
---|
3502 | adouble} whereas a field of $p$ elements is needed in the vector |
---|
3503 | mode to cover the computations for the given $p$ directions. The |
---|
3504 | resulting changes to the source code are described in this section. |
---|
3505 | |
---|
3506 | In order to use the vector mode one has to call |
---|
3507 | \verb#adtl::setNumDir(N)#. This function takes the maximal number of |
---|
3508 | directions to be used within the resulting vector mode. Changing the |
---|
3509 | number of directions in the middle of executing code containing active |
---|
3510 | variables is undesirable and can lead to segmentation violations, |
---|
3511 | hence such an operation will result in an |
---|
3512 | error. \verb#adtl::setNumDir(N)# must therefore be called before |
---|
3513 | declaring any \verb#adtl::adouble# objects. |
---|
3514 | |
---|
3515 | Setting and getting the derivative values is done in the same manner as in the scalar case, by passing and retrieving the pointers, as illustrated in the following example: |
---|
3516 | \begin{center} |
---|
3517 | \begin{tabular}{l} |
---|
3518 | {\sf adtl::setNumDir(10);}\\ |
---|
3519 | {\sf adouble x, y;}\\ |
---|
3520 | {\sf double *ptr1 = new double[10];}\\ |
---|
3521 | {\sf double *ptr2;}\\ |
---|
3522 | ...\\ |
---|
3523 | {\sf x1=2;}\\ |
---|
3524 | {\sf x1.setADValue(ptr1);}\\ |
---|
3525 | ...\\ |
---|
3526 | {\sf ptr2=y.getADValue();} |
---|
3527 | \end{tabular} |
---|
3528 | \end{center} |
---|
3529 | Now using the pointers in the methods {\sf setADValue} and {\sf getADValue} makes more sense since the vector mode requires the field of $p$ elements per {\sf adouble} which holds the derivative values. |
---|
3530 | Additionally, the traceless vector forward mode of ADOL-C offers two |
---|
3531 | new methods for setting/getting the derivative values. Setting a derivative value of the desired vector element is possible by passing a {\sf double} value and its position in the vector ({\sf integer}). Getting a derivative value of the desired vector element requires as argument only its position in the vector. One can note that the pointers are not used in these cases, as illustrated in the following example: |
---|
3532 | \begin{center} |
---|
3533 | \begin{tabular}{l} |
---|
3534 | {\sf adtl::setNumDir(10);}\\ |
---|
3535 | ...\\ |
---|
3536 | {\sf adouble x, y;}\\ |
---|
3537 | ...\\ |
---|
3538 | {\sf x1=2;}\\ |
---|
3539 | {\sf x1.setADValue(5,1);\hspace*{3.7cm}// set the 6th point value of x to 1.0}\\ |
---|
3540 | ...\\ |
---|
3541 | {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y} |
---|
3542 | \end{tabular} |
---|
3543 | \end{center} |
---|
3544 | The resulting source code containing all changes that are required is |
---|
3545 | shown in \autoref{fig:modcode2} |
---|
3546 | |
---|
3547 | \begin{figure}[!h!t!b] |
---|
3548 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3549 | \begin{tabbing} |
---|
3550 | \hspace*{-1cm} \= \kill |
---|
3551 | \> {\sf \#include $<$iostream$>$}\\ |
---|
3552 | \> {\sf using namespace std;}\\ |
---|
3553 | \\ |
---|
3554 | \> {\sf \#include $<$adolc/adtl.h$>$}\\ |
---|
3555 | \> {\sf typedef adtl::adouble adouble;}\\ |
---|
3556 | \\ |
---|
3557 | \> {\sf int main() \{}\\ |
---|
3558 | \> {\sf \rule{0.5cm}{0pt}adtl::setNumDir(3);}\\ |
---|
3559 | \> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\ |
---|
3560 | \\ |
---|
3561 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\ |
---|
3562 | \> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\ |
---|
3563 | \> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\ |
---|
3564 | \> {\sf \rule{0.5cm}{0pt}\}}\\ |
---|
3565 | \\ |
---|
3566 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3567 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3568 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3569 | \\ |
---|
3570 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\ |
---|
3571 | \> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\ |
---|
3572 | \> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\ |
---|
3573 | \> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\ |
---|
3574 | \> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ " ";}\\ |
---|
3575 | \> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\ |
---|
3576 | \> {\sf \rule{0.5cm}{0pt}\}}\\ |
---|
3577 | \> {\sf \rule{0.5cm}{0pt}return 0;}\\ |
---|
3578 | \> {\sf \}} |
---|
3579 | \end{tabbing} |
---|
3580 | }} |
---|
3581 | \caption{Example for traceless vector forward mode} |
---|
3582 | \label{fig:modcode2} |
---|
3583 | \end{figure} |
---|
3584 | % |
---|
3585 | \subsection{Compiling and Linking the Source Code} |
---|
3586 | % |
---|
3587 | After incorporating the required changes, one has to compile the |
---|
3588 | source code and link the object files to get the executable. |
---|
3589 | As long as the ADOL-C header files are not included in the absolute path |
---|
3590 | the compile sequence should be similar to the following example: |
---|
3591 | \begin{center} |
---|
3592 | \begin{tabular}{l} |
---|
3593 | {\sf g++ -I/home/username/adolc\_base/include -c traceless\_scalar.cpp} |
---|
3594 | \end{tabular} |
---|
3595 | \end{center} |
---|
3596 | The \verb#-I# option tells the compiler where to search for the ADOL-C |
---|
3597 | header files. This option can be omitted when the headers are included |
---|
3598 | with absolute path or if ADOL-C is installed in a ``global'' directory. |
---|
3599 | |
---|
3600 | Although originally designed to be completely inline, certain global |
---|
3601 | variables need to be initialised in the compiled part of the library |
---|
3602 | and therefore the ADOL-C library must be linked with the compiled code. |
---|
3603 | The example started above could be finished with the |
---|
3604 | following command: |
---|
3605 | \begin{center} |
---|
3606 | \begin{tabular}{l} |
---|
3607 | {\sf g++ -o traceless\_scalar traceless\_scalar.o}\\ |
---|
3608 | {\sf -Wl,-{}-rpath -Wl,/home/username/adolc\_base/lib -L/home/username/adolc\_base/lib -ladolc} |
---|
3609 | \end{tabular} |
---|
3610 | \end{center} |
---|
3611 | The mentioned source codes {\sf traceless\_scalar.c} and {\sf traceless\_vector.c} |
---|
3612 | illustrating the use of the for traceless scalar and vector mode can be found in |
---|
3613 | the directory {\sf examples}. |
---|
3614 | % |
---|
3615 | \subsection{Concluding Remarks for the Traceless Forward Mode Variant} |
---|
3616 | % |
---|
3617 | As many other AD methods the traceless forward mode provided by the |
---|
3618 | ADOL-C package has its own strengths and drawbacks. Please read the |
---|
3619 | following section carefully to become familiar with the things that |
---|
3620 | can occur: |
---|
3621 | \begin{itemize} |
---|
3622 | \item Advantages: |
---|
3623 | \begin{itemize} |
---|
3624 | \item Code speed\\ |
---|
3625 | Increasing computation speed was one of the main aspects in writing |
---|
3626 | the traceless code. In many cases higher performance can be |
---|
3627 | expected this way. However this is not guarenteed for repeated |
---|
3628 | evaluations. |
---|
3629 | \item Smaller overall memory requirements\\ |
---|
3630 | Traceless ADOL-C does not write traces anymore, as the name |
---|
3631 | implies. Loop ''unrolling'' can be avoided this |
---|
3632 | way. Considered main memory plus disk space as overall memory |
---|
3633 | requirements the traceless version can be |
---|
3634 | executed in a more efficient way. |
---|
3635 | \end{itemize} |
---|
3636 | \item Drawbacks: |
---|
3637 | \begin{itemize} |
---|
3638 | \item Main memory limitations\\ |
---|
3639 | The ability to compute derivatives to a given function is |
---|
3640 | bounded by the main memory plus swap size when using |
---|
3641 | traceless ADOL-C. Computation from swap should be avoided anyway |
---|
3642 | as far as possible since it slows down the computing time |
---|
3643 | drastically. Therefore, if the program execution is |
---|
3644 | terminated without error message insufficient memory size can be |
---|
3645 | the reason among other things. The memory requirements $M$ can |
---|
3646 | be determined roughly as followed: |
---|
3647 | \begin{itemize} |
---|
3648 | \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$ |
---|
3649 | \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf |
---|
3650 | NUMBER\_DIRECTIONS}$+1) + M_p$ |
---|
3651 | \end{itemize} |
---|
3652 | where the storage size of all non {\sf adouble} based variables is described by $M_p$. |
---|
3653 | \item Compile time\\ |
---|
3654 | As discussed in the previous sections, the traceless forward mode of |
---|
3655 | the ADOL-C package is implemented as inline intended version. Using |
---|
3656 | this approach results in a higher source code size, since every |
---|
3657 | operation involving at least one {\sf adouble} stands for the |
---|
3658 | operation itself as well as for the corresponding derivative |
---|
3659 | code after the inlining process. Therefore, the compilation time |
---|
3660 | needed for the traceless version may be higher than that of the trace based code. |
---|
3661 | \item Code Size\\ |
---|
3662 | A second drawback and result of the code inlining is the |
---|
3663 | increase of code sizes for the binaries. The increase |
---|
3664 | factor compared to the corresponding trace based program is |
---|
3665 | difficult to quantify as it is task dependent. Practical results |
---|
3666 | have shown that values between 1.0 and 2.0 can be |
---|
3667 | expected. Factors higher than 2.0 are possible too and even |
---|
3668 | values below 1.0 have been observed. |
---|
3669 | \end{itemize} |
---|
3670 | \end{itemize} |
---|
3671 | |
---|
3672 | |
---|
3673 | \section{Traceless forward differentiation in ADOL-C using Cuda} |
---|
3674 | \label{tracelessCuda} |
---|
3675 | % |
---|
3676 | One major drawback using the traceless version of ADOL-C is the fact |
---|
3677 | that several function evaluations are needed to compute derivatives |
---|
3678 | in many different points. More precisely, to calculate the Jacobian for |
---|
3679 | a function $F:\mathbb{R}^n\rightarrow \mathbb{R}^m$ in $M$ points, $M$ |
---|
3680 | function evaluations are needed for the traceless vector mode and even |
---|
3681 | $M*n$ for the traceless scalar mode. Depending on the size of the function |
---|
3682 | this can result in a long runtime. To achieve a better performance one can |
---|
3683 | use parallelisation techniques as the same operations are |
---|
3684 | performed during a function evaluation. One possibility is to use GPUs |
---|
3685 | since they are optimized for data parallel computation. Starting with |
---|
3686 | version 2.3.0 ADOL-C now features a traceless forward mode for computing |
---|
3687 | first order derivatives in scalar mode on GPUs using the |
---|
3688 | general purpose parallel computing architecture Cuda. |
---|
3689 | |
---|
3690 | The idea is to include parallel code that executes in many GPU threads across |
---|
3691 | processing elements. This can be done by using kernel functions, that is functions |
---|
3692 | which are executed on GPU as an array of threads in parallel. In general all |
---|
3693 | threads execute the same code. They are grouped into blocks which are then grouped |
---|
3694 | into grids. A kernel is executed as a grid of blocks of threads. For more |
---|
3695 | details see, e.g., the NVIDIA CUDA C Programming Guide which can be downloaded from |
---|
3696 | the web page {\sf www.nvidia.com}. To solve the problem of calculating |
---|
3697 | the Jacobian of $F$ at $M$ points it is possible to let each thread perform a |
---|
3698 | function evaluation and thus the computation of derivatives for one direction |
---|
3699 | at one point. The advantage is that the function is evaluated in different points |
---|
3700 | in parallel which can result in a faster wallclock runtime. |
---|
3701 | |
---|
3702 | The following subsection describes the source code modifications required to use |
---|
3703 | the traceless forward mode of ADOL-C with Cuda. |
---|
3704 | |
---|
3705 | \subsection{Modifying the source code} |
---|
3706 | |
---|
3707 | Let us again consider the coordinate transformation from Cartesian to spherical |
---|
3708 | polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y |
---|
3709 | = F(x)$, with |
---|
3710 | \begin{eqnarray*} |
---|
3711 | y_1 = \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad |
---|
3712 | y_2 = \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad |
---|
3713 | y_3 = \arctan(x_2/x_1), |
---|
3714 | \end{eqnarray*} |
---|
3715 | as an example. We now calculate the Jacobian at $M=1024$ different points. |
---|
3716 | The source code for one point is shown in \autoref{fig:traceless}. This example |
---|
3717 | has no real application but can still be used to show the combination of the traceless version |
---|
3718 | of ADOL-C and Cuda. |
---|
3719 | |
---|
3720 | For the use of this mode a Cuda toolkit, which is suitable for the grafic card used, |
---|
3721 | has to be installed. Furthermore, it is important to check that the graphic card |
---|
3722 | used supports double precision (for details see e.g. NVIDIA CUDA C Programming Guide). |
---|
3723 | Otherwise the data type employed inside of the {\sf adouble} class has to be adapted |
---|
3724 | to {\sf float}. To use the traceless forward mode with Cuda, one has to include the header |
---|
3725 | files \verb#adoublecuda.h# and \verb#cuda.h#. The first one contains the |
---|
3726 | definition of the class {\sf adouble} and the overloaded operators |
---|
3727 | for this version of ADOL-C. As in the other versions all derivative |
---|
3728 | calculations are introduced by calls to overloaded operators. The |
---|
3729 | second header file is needed for the use of Cuda. |
---|
3730 | |
---|
3731 | One possibility to solve the problem above is the following. First of all |
---|
3732 | three {\sf double} arrays are needed: {\sf x} for the independent variables, |
---|
3733 | {\sf y} for the dependent variables and {\sf deriv} for the values of the |
---|
3734 | Jacobian matrices. The independent variables have to be initialised, therefore |
---|
3735 | the points at which the function should be evaluated are saved in a row in the |
---|
3736 | same array of length $3*M$. For the computation on GPUs one also has to allocate memory |
---|
3737 | on this device. Using the syntax in Cuda one can allocate an array of |
---|
3738 | length $3*M$ (number of independent variables times number of points) for |
---|
3739 | the independent variables as follows: |
---|
3740 | \begin{center} |
---|
3741 | \begin{tabular}{l} |
---|
3742 | {\sf double * devx;}\\ |
---|
3743 | {\sf cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\ |
---|
3744 | \end{tabular} |
---|
3745 | \end{center} |
---|
3746 | The arrays for the dependent variables and the values of the Jacobian matrices |
---|
3747 | are allocated in the same way. Then the values of the independent variables |
---|
3748 | have to be copied to the GPU using the following command |
---|
3749 | \begin{center} |
---|
3750 | \begin{tabular}{l} |
---|
3751 | {\sf cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\ |
---|
3752 | \end{tabular} |
---|
3753 | \end{center} |
---|
3754 | The argument {\sf cudaMemcpyHostToDevice} indicates that the values are copied from |
---|
3755 | the host to the GPU. In this case the values stored in {\sf x} are copied to {\sf devx}. |
---|
3756 | |
---|
3757 | Now all required information has been transferred to the GPU. The changes |
---|
3758 | in the source code made so far are summarized in \autoref{fig:cudacode1}. |
---|
3759 | |
---|
3760 | \begin{figure}[!h!t!b] |
---|
3761 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3762 | %\begin{center} |
---|
3763 | \begin{tabbing} |
---|
3764 | \hspace*{-1cm} \= \kill |
---|
3765 | \> {\sf \#include $<$iostream$>$}\\ |
---|
3766 | \> {\sf \#include $<$cuda.h$>$}\\ |
---|
3767 | \> {\sf \#include $<$adoublecuda.h$>$}\\ |
---|
3768 | \> {\sf using namespace std;}\\ |
---|
3769 | \\ |
---|
3770 | \> {\sf int main() \{}\\ |
---|
3771 | \> {\sf \rule{0.5cm}{0pt}int M=1024;}\\ |
---|
3772 | \> {\sf \rule{0.5cm}{0pt}double* deriv = new double[9*M];}\\ |
---|
3773 | \> {\sf \rule{0.5cm}{0pt}double* y = new double[3*M];}\\ |
---|
3774 | \> {\sf \rule{0.5cm}{0pt}double* x = new double[3*M];}\\ |
---|
3775 | \\ |
---|
3776 | \> {\sf \rule{0.5cm}{0pt}// Initialize $x_i$}\\ |
---|
3777 | \> {\sf \rule{0.5cm}{0pt}for (int k=0; k$<$M; ++k)\{}\\ |
---|
3778 | \> {\sf \rule{1cm}{0pt}for (int i=0; i$<$3; ++i)}\\ |
---|
3779 | \> {\sf \rule{1.5cm}{0pt}x[k*3+i] =i + 1/(k+1);\}}\\ |
---|
3780 | \\ |
---|
3781 | \> {\sf \rule{0.5cm}{0pt}// Allocate array for independent and dependent variables}\\ |
---|
3782 | \> {\sf \rule{0.5cm}{0pt}// and Jacobian matrices on GPU}\\ |
---|
3783 | \> {\sf \rule{0.5cm}{0pt}double * devx;}\\ |
---|
3784 | \> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\ |
---|
3785 | \> {\sf \rule{0.5cm}{0pt}double * devy;}\\ |
---|
3786 | \> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devy, 3*M*sizeof(double));}\\ |
---|
3787 | \> {\sf \rule{0.5cm}{0pt}double * devderiv;}\\ |
---|
3788 | \> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devderiv, 3*3*M*sizeof(double));}\\ |
---|
3789 | \\ |
---|
3790 | \> {\sf \rule{0.5cm}{0pt}// Copy values of independent variables from host to GPU}\\ |
---|
3791 | \> {\sf \rule{0.5cm}{0pt}cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\ |
---|
3792 | \\ |
---|
3793 | \> {\sf \rule{0.5cm}{0pt}// Call function to specify amount of blocks and threads to be used}\\ |
---|
3794 | \> {\sf \rule{0.5cm}{0pt}kernellaunch(devx, devy, devderiv,M);}\\ |
---|
3795 | \\ |
---|
3796 | \> {\sf \rule{0.5cm}{0pt}// Copy values of dependent variables and Jacobian matrices from GPU to host}\\ |
---|
3797 | \> {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\ |
---|
3798 | \> {\sf \rule{0.5cm}{0pt}cudaMemcpy(deriv, devderiv, sizeof(double)*3*3*M, cudaMemcpyDeviceToHost);}\\ |
---|
3799 | \> {\sf \}} |
---|
3800 | \end{tabbing} |
---|
3801 | %\end{center} |
---|
3802 | }} |
---|
3803 | \caption{Example for traceless scalar forward mode with Cuda} |
---|
3804 | \label{fig:cudacode1} |
---|
3805 | \end{figure} |
---|
3806 | |
---|
3807 | In the next step the user has to specify how many blocks and threads per block |
---|
3808 | will be needed for the function evaluations. In the present example this is done by |
---|
3809 | the call of the function {\sf kernellaunch}, see \autoref{fig:cudacode2}. |
---|
3810 | In this case the blocks are two dimensional: the x-dimension is determined |
---|
3811 | by the number of points $M$ at which the Jacobian matrix has to be calculated |
---|
3812 | while the y-dimension is given by the number of independent variables, i.e., 3. |
---|
3813 | Since a block cannot contain more than 1024 threads, the |
---|
3814 | x-dimension in the example is 64 instead of $M=1024$. Therefore $1024/64=16$ |
---|
3815 | blocks are needed. The described division into blocks is reasonable as each |
---|
3816 | thread has to perform a function evaluation for one point and one direction, |
---|
3817 | hence $M*3$ threads are needed where 3 denotes the number of independent |
---|
3818 | variables corresponding to the number of directions needed for the computation |
---|
3819 | of the Jacobian in one point. |
---|
3820 | |
---|
3821 | \begin{figure}[!h!t!b] |
---|
3822 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3823 | \begin{tabbing} |
---|
3824 | \hspace*{-1cm} \= \kill |
---|
3825 | \> {\sf cudaError\_t kernellaunch(double* inx, double* outy, double* outderiv, int M) \{}\\ |
---|
3826 | \> {\sf \rule{0.5cm}{0pt}// Create 16 blocks}\\ |
---|
3827 | \> {\sf \rule{0.5cm}{0pt}int Blocks=16;}\\ |
---|
3828 | \> {\sf \rule{0.5cm}{0pt}// Two dimensional (M/Blocks)$\times$3 blocks}\\ |
---|
3829 | \> {\sf \rule{0.5cm}{0pt}dim3 threadsPerBlock(M/Blocks,3);}\\ |
---|
3830 | \\ |
---|
3831 | \> {\sf \rule{0.5cm}{0pt}// Call kernel function with 16 blocks with (M/Blocks)$\times$3 threads per block}\\ |
---|
3832 | \> {\sf \rule{0.5cm}{0pt}kernel $<<<$ Blocks, threadsPerBlock $>>>$(inx, outy, outderiv);}\\ |
---|
3833 | \> {\sf \rule{0.5cm}{0pt}cudaError\_t cudaErr = cudaGetLastError();}\\ |
---|
3834 | \\ |
---|
3835 | \> {\sf \rule{0.5cm}{0pt}return cudaErr;}\\ |
---|
3836 | \> {\sf \}} |
---|
3837 | \end{tabbing} |
---|
3838 | }} |
---|
3839 | \caption{Example for traceless scalar forward mode with Cuda} |
---|
3840 | \label{fig:cudacode2} |
---|
3841 | \end{figure} |
---|
3842 | |
---|
3843 | We can now perform the function evaluations together with the calculation |
---|
3844 | of the Jacobians. The corresponding code is illustrated in \autoref{fig:cudacode3}. |
---|
3845 | Since the function evaluations should be performed on a GPU a kernel function is |
---|
3846 | needed which is defined by using the \mbox{{\sf \_\_ global\_\_}} declaration specifier |
---|
3847 | (see \autoref{fig:cudacode3}). Then each thread executes the operations that are |
---|
3848 | defined in the kernel. |
---|
3849 | |
---|
3850 | \begin{figure}[!h!t!b] |
---|
3851 | \framebox[\textwidth]{\parbox{\textwidth}{ |
---|
3852 | \begin{tabbing} |
---|
3853 | \hspace*{-1cm} \= \kill |
---|
3854 | \> {\sf \_\_global\_\_ void kernel(double* inx, double* outy, double* outderiv) \{}\\ |
---|
3855 | \> {\sf \rule{0.5cm}{0pt}const int index = threadIdx.x ;}\\ |
---|
3856 | \> {\sf \rule{0.5cm}{0pt}const int index1 = threadIdx.y;}\\ |
---|
3857 | \> {\sf \rule{0.5cm}{0pt}const int index2 = blockIdx.x;}\\ |
---|
3858 | \> {\sf \rule{0.5cm}{0pt}const int index3 = blockDim.x;}\\ |
---|
3859 | \> {\sf \rule{0.5cm}{0pt}const int index4 = blockDim.x*blockDim.y;}\\ |
---|
3860 | \\ |
---|
3861 | \> {\sf \rule{0.5cm}{0pt}// Declare dependent and independent variables as {\sf adouble}s}\\ |
---|
3862 | \> {\sf \rule{0.5cm}{0pt}adtlc::adouble y[3], x[3];}\\ |
---|
3863 | \> {\sf \rule{0.5cm}{0pt}// Read out point for function evaluation}\\ |
---|
3864 | \> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\ |
---|
3865 | \> {\sf \rule{1cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\ |
---|
3866 | \> {\sf \rule{0.5cm}{0pt}// Set direction for calculation of derivatives}\\ |
---|
3867 | \> {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\ |
---|
3868 | \\ |
---|
3869 | \> {\sf \rule{0.5cm}{0pt}// Function evaluation}\\ |
---|
3870 | \> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\ |
---|
3871 | \> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\ |
---|
3872 | \> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\ |
---|
3873 | \\ |
---|
3874 | \> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\ |
---|
3875 | \> {\sf \rule{1cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\ |
---|
3876 | \> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\ |
---|
3877 | \> {\sf \rule{1cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\ |
---|
3878 | \> {\sf \}} |
---|
3879 | \end{tabbing} |
---|
3880 | }} |
---|
3881 | \caption{Example for traceless scalar forward mode with Cuda} |
---|
3882 | \label{fig:cudacode3} |
---|
3883 | \end{figure} |
---|
3884 | |
---|
3885 | In this example each thread is assigned the task of calculating the |
---|
3886 | derivatives in one point with respect to one independent variable. Therefore |
---|
3887 | some indices are needed for the implementation. Each thread has a unique thread |
---|
3888 | ID in a block consisting of an x and an y-dimension. The blocks have an ID as |
---|
3889 | well. In the example the following indices are used. |
---|
3890 | |
---|
3891 | \begin{itemize} |
---|
3892 | \item {\sf index = threadIdx.x} denotes the x-dimension of a thread (ranges from 0 to 63 in the example) |
---|
3893 | \item {\sf index1 = threadIdx.y} denotes the y-dimension of a thread (ranges from 0 to 2 in the example) |
---|
3894 | \item {\sf index2 = blockIdx.x} denotes the block index (ranges from 0 to 15 in the example) |
---|
3895 | \item {\sf index3 = blockDim.x} denotes the x-dimension of a block (always 64 in the example) |
---|
3896 | \item {\sf index4 = blockDim.x*blockDim.y} denotes the size of a block (always $64*3=192$ in the example) |
---|
3897 | \end{itemize} |
---|
3898 | |
---|
3899 | For the calculation of derivatives the function evaluation has to be performed |
---|
3900 | with {\sf adouble}s. Therefore the dependent and the independent variables |
---|
3901 | have to be declared as {\sf adouble}s as in the other versions of ADOL-C (see, e.g., \autoref{traceless}). |
---|
3902 | Similar to the traceless version without Cuda the namespace \verb#adtlc# is used. |
---|
3903 | Now each thread has to read out the point at which the function evaluation is |
---|
3904 | then performed. This is determined by the blockindex and the x-dimension of |
---|
3905 | the thread, therefore the independent variables in a thread |
---|
3906 | have the values |
---|
3907 | \begin{center} |
---|
3908 | \begin{tabular}{l} |
---|
3909 | {\sf \rule{0.5cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\ |
---|
3910 | \end{tabular} |
---|
3911 | \end{center} |
---|
3912 | for {\sf i=0,1,2} where {\sf inx} corresponds to the vector {\sf devx}. The direction for |
---|
3913 | the derivatives is given by {\sf index1}. |
---|
3914 | \begin{center} |
---|
3915 | \begin{tabular}{l} |
---|
3916 | {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\ |
---|
3917 | \end{tabular} |
---|
3918 | \end{center} |
---|
3919 | The functions for setting and getting the value and the derivative value of an |
---|
3920 | {\sf adouble} are the same as in the traceless version of ADOL-C for first |
---|
3921 | order derivatives (see \autoref{traceless}). |
---|
3922 | The function evaluation is then performed with {\sf adouble x} on GPU and the results |
---|
3923 | are saved in {\sf adouble y}. The function evaluation itself remains unchanged |
---|
3924 | (see \autoref{fig:cudacode3}). Then we store the values of the function |
---|
3925 | evaluations in each point in a row in one array: |
---|
3926 | \begin{center} |
---|
3927 | \begin{tabular}{l} |
---|
3928 | {\sf \rule{0.5cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\ |
---|
3929 | \end{tabular} |
---|
3930 | \end{center} |
---|
3931 | for {\sf i=0,1,2}. The values of a Jacobian are stored column by column in |
---|
3932 | a row: |
---|
3933 | \begin{center} |
---|
3934 | \begin{tabular}{l} |
---|
3935 | {\sf \rule{0.5cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\ |
---|
3936 | \end{tabular} |
---|
3937 | \end{center} |
---|
3938 | for {\sf i=0,1,2}. |
---|
3939 | |
---|
3940 | Now there is one thing left to do. The values calculated on the GPU have to be copied |
---|
3941 | back to host. For the dependent variables in the example this can be done by the |
---|
3942 | following call: |
---|
3943 | \begin{center} |
---|
3944 | \begin{tabular}{l} |
---|
3945 | {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\ |
---|
3946 | \end{tabular} |
---|
3947 | \end{center} |
---|
3948 | see \autoref{fig:cudacode1}. The argument {\sf cudaMemcpyDeviceToHost} determines that |
---|
3949 | the values are copied from GPU to host. |
---|
3950 | |
---|
3951 | % |
---|
3952 | \subsection{Compiling and Linking the Source Code} |
---|
3953 | % |
---|
3954 | After incorporating the required changes, one has to compile the |
---|
3955 | source code and link the object files to get the executable. For |
---|
3956 | the compilation of a Cuda file the {\sf nvcc} compiler is needed. |
---|
3957 | The compile sequence should be similar to the following example: |
---|
3958 | \begin{center} |
---|
3959 | \begin{tabular}{l} |
---|
3960 | {\sf nvcc -arch=sm\_20 -o traceless\_cuda traceless\_cuda.cu} |
---|
3961 | \end{tabular} |
---|
3962 | \end{center} |
---|
3963 | The compiler option \verb#-arch=sm_20# specifies the compute capability that is |
---|
3964 | assumed, in this case one that supports double precision. |
---|
3965 | |
---|
3966 | The mentioned source code {\sf traceless\_cuda.cu} illustrating the use of the |
---|
3967 | forward traceless scalar mode with Cuda and a further example {\sf liborgpu.cu} |
---|
3968 | can be found in the directory examples. The second example is an adaption of |
---|
3969 | the OpenMP example to the traceless version with Cuda. |
---|
3970 | |
---|
3971 | % |
---|
3972 | %++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
---|
3973 | \section{Installing and Using ADOL-C} |
---|
3974 | \label{install} |
---|
3975 | % |
---|
3976 | \subsection{Generating the ADOL-C Library} |
---|
3977 | \label{genlib} |
---|
3978 | % |
---|
3979 | The currently built system is best summarized by the ubiquitous gnu |
---|
3980 | install triplet |
---|
3981 | \begin{center} |
---|
3982 | \verb=configure - make - make install= . |
---|
3983 | \end{center} |
---|
3984 | Executing this three steps from the package base directory |
---|
3985 | \verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic |
---|
3986 | ADOL-C library with default options and install the package (libraries |
---|
3987 | and headers) into the default installation directory {\tt |
---|
3988 | \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory |
---|
3989 | the subdirectory \verb=include= will contain all the installed header |
---|
3990 | files that may be included by the user program, the subdirectory |
---|
3991 | \verb=lib= will contain the 32-bit compiled library |
---|
3992 | and the subdirectory \verb=lib64= will contain the 64-bit compiled |
---|
3993 | library. Depending on the compiler only one of \verb=lib= or |
---|
3994 | \verb=lib64= may be created. |
---|
3995 | |
---|
3996 | Before doing so the user may modify the header file \verb=usrparms.h= |
---|
3997 | in order to tailor the \mbox{ADOL-C} package to the needs in the |
---|
3998 | particular system environment as discussed in |
---|
3999 | \autoref{Customizing}. The configure procedure which creates the necessary |
---|
4000 | \verb=Makefile=s can be customized by use of some switches. Available |
---|
4001 | options and their meaning can be obtained by executing |
---|
4002 | \verb=./configure --help= from the package base directory. |
---|
4003 | |
---|
4004 | All object files and other intermediately generated files can be |
---|
4005 | removed by the call \verb=make clean=. Uninstalling ADOL-C by |
---|
4006 | executing \verb=make uninstall= is only reasonable after a previous |
---|
4007 | called \verb=make install= and will remove all installed package files |
---|
4008 | but will leave the created directories behind. |
---|
4009 | |
---|
4010 | The sparse drivers are included in the ADOL-C libraries if the |
---|
4011 | \verb=./configure= command is executed with the option |
---|
4012 | \verb=--enable-sparse=. The ColPack library available at |
---|
4013 | \verb=http://cscapes.cs.purdue.edu/coloringpage/software.htm= is required to |
---|
4014 | compute the sparse structures, and is searched for in all the default |
---|
4015 | locations. |
---|
4016 | In case the library and its headers are installed in a nonstandard path |
---|
4017 | this may be specified with the \verb?--with-colpack=PATH? option. |
---|
4018 | It is assumed that the library and its header files have the following |
---|
4019 | directory structure: \verb?PATH/include? contains all the header |
---|
4020 | files, |
---|
4021 | \verb?PATH/lib? contains the 32-bit compiled library and |
---|
4022 | \verb?PATH/lib64? contains the 64-bit compiled library. Depending on |
---|
4023 | the compiler used to compile {\sf ADOL-C} one of these libraries will |
---|
4024 | be used for linking. |
---|
4025 | |
---|
4026 | The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default |
---|
4027 | constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see |
---|
4028 | \autoref{WarSug}. |
---|
4029 | |
---|
4030 | Support for MPI and the AdjoinableMPI API (see\\ |
---|
4031 | \verb=http://www.mcs.anl.gov/~utke/AdjoinableMPI/AdjoinableMPIDox/index.html= |
---|
4032 | ) may be enabled using the option \verb=--enable-ampi=. This requires |
---|
4033 | the presence of MPI compiler wrappers \verb=mpicc= and \verb=mpicxx= |
---|
4034 | in the \verb=$PATH= and the AdjoinableMPI libraries in the standard |
---|
4035 | locations. If MPI is installed in a nonstandard path one may specify |
---|
4036 | this using the \verb?--with-mpi=PATH? option. Similarly if the |
---|
4037 | AdjoinableMPI libraries are in an nonstandard path this may be |
---|
4038 | specified using the \verb?--with-ampi=PATH? option. When MPI and |
---|
4039 | AdjoinableMPI support is compiled into {\sf ADOL-C} the generated |
---|
4040 | library will be unsuitable for linking with non-MPI |
---|
4041 | programs and is called \verb=libadolc_ampi=. Therefore the user |
---|
4042 | programs should use the flag \verb=-ladolc_ampi= instead of |
---|
4043 | \verb=-ladolc= in this case. An advanced user may infact specify |
---|
4044 | whatever name the resulting {\sf ADOL-C} library should have using the |
---|
4045 | \verb?--with-soname=SONAME? option. |
---|
4046 | |
---|
4047 | |
---|
4048 | \subsection{Compiling and Linking the Example Programs} |
---|
4049 | % |
---|
4050 | The installation procedure described in \autoref{genlib} also |
---|
4051 | provides the \verb=Makefile=s to compile the example programs in the |
---|
4052 | directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the |
---|
4053 | additional examples in |
---|
4054 | \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However, |
---|
4055 | one has to execute the |
---|
4056 | \verb=configure= command with appropriate options for the ADOL-C package to enable the compilation of |
---|
4057 | examples. Available options are: |
---|
4058 | \begin{center} |
---|
4059 | \begin{tabular}[t]{ll} |
---|
4060 | \verb=--enable-docexa=&build all examples discussed in this manual\\ |
---|
4061 | &(compare \autoref{example})\\ |
---|
4062 | \verb=--enable-addexa=&build all additional examples\\ |
---|
4063 | &(See file \verb=README= in the various subdirectories) |
---|
4064 | \end{tabular} |
---|
4065 | \end{center} |
---|
4066 | |
---|
4067 | Just calling \verb=make= from the packages base directory generates |
---|
4068 | all configured examples and the library if necessary. Compiling from |
---|
4069 | subdirectory \verb=examples= or one of its subfolders is possible |
---|
4070 | too. At least one kind of the ADOL-C library (static or shared) must |
---|
4071 | have been built previously in that case. Hence, building the library |
---|
4072 | is always the first step. |
---|
4073 | |
---|
4074 | For Compiling the library and the documented examples on Windows using |
---|
4075 | Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in |
---|
4076 | the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and |
---|
4077 | \verb=<ADOL-C/examples/>= subdirectories. |
---|
4078 | % |
---|
4079 | \subsection{Description of Important Header Files} |
---|
4080 | \label{ssec:DesIH} |
---|
4081 | % |
---|
4082 | The application of the facilities of ADOL-C requires the user |
---|
4083 | source code (program or module) to include appropriate |
---|
4084 | header files where the desired data types and routines are |
---|
4085 | prototyped. The new hierarchy of header files enables the user |
---|
4086 | to take one of two possible ways to access the right interfaces. |
---|
4087 | The first and easy way is recommended to beginners: As indicated in |
---|
4088 | \autoref{globalHeaders} the provided {\em global} header file |
---|
4089 | \verb=<adolc/adolc.h>= can be included by any user code to support all |
---|
4090 | capabilities of ADOL-C depending on the particular programming language |
---|
4091 | of the source. |
---|
4092 | |
---|
4093 | \begin{table}[h] |
---|
4094 | \center \small |
---|
4095 | \begin{tabular}{|p{3.6cm}|p{10.5cm}|}\hline |
---|
4096 | \verb=<adolc/adolc.h>= & |
---|
4097 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4098 | \boldmath $\rightarrow$ \unboldmath |
---|
4099 | & global header file available for easy use of ADOL-C; \\ |
---|
4100 | $\bullet$ & includes all ADOL-C header files depending on |
---|
4101 | whether the users source is C++ or C code. |
---|
4102 | \end{tabular*} |
---|
4103 | \\ \hline |
---|
4104 | \verb=<adolc/internal/usrparms.h>= & |
---|
4105 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4106 | \boldmath $\rightarrow$ \unboldmath |
---|
4107 | & user customization of ADOL-C package (see |
---|
4108 | \autoref{Customizing}); \\ |
---|
4109 | $\bullet$ & after a change of |
---|
4110 | user options the ADOL-C library \verb=libadolc.*= |
---|
4111 | has to be rebuilt (see \autoref{genlib}); \\ |
---|
4112 | $\bullet$ & is included by all ADOL-C header files and thus by all user |
---|
4113 | programs. |
---|
4114 | \end{tabular*} \\ \hline |
---|
4115 | \end{tabular} |
---|
4116 | \caption{Global header files} |
---|
4117 | \label{globalHeaders} |
---|
4118 | \end{table} |
---|
4119 | |
---|
4120 | The second way is meant for the more advanced ADOL-C user: Some source code |
---|
4121 | includes only those interfaces used by the particular application. |
---|
4122 | The respectively needed header files are indicated |
---|
4123 | throughout the manual. |
---|
4124 | Existing application determined dependences between the provided |
---|
4125 | ADOL-C routines are realized by automatic includes of headers in order |
---|
4126 | to maintain easy use. The header files important to the user are described |
---|
4127 | in the \autoref{importantHeaders1} and \autoref{importantHeaders2}. |
---|
4128 | |
---|
4129 | \begin{table}[h] |
---|
4130 | \center \small |
---|
4131 | \begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline |
---|
4132 | %\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline |
---|
4133 | \verb=<adolc/adouble.h>= & |
---|
4134 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4135 | \boldmath $\rightarrow$ \unboldmath |
---|
4136 | & provides the interface to the basic active |
---|
4137 | scalar data type of ADOL-C: {\sf class adouble} |
---|
4138 | (see \autoref{prepar}); |
---|
4139 | % $\bullet$ & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=. |
---|
4140 | \end{tabular*} |
---|
4141 | \\ \hline |
---|
4142 | % \verb=<adolc/avector.h>= & |
---|
4143 | %\begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4144 | % \boldmath $\rightarrow$ \unboldmath |
---|
4145 | % & provides the interface to the active vector |
---|
4146 | % and matrix data types of ADOL-C: {\sf class adoublev} |
---|
4147 | % and {\sf class adoublem}, respectively |
---|
4148 | % (see \autoref{arrays}); \\ |
---|
4149 | % $\bullet$ & is included by the header \verb=<adolc/adouble.h>=. |
---|
4150 | %\end{tabular*} |
---|
4151 | %\\ \hline |
---|
4152 | \verb=<adolc/taputil.h>= & |
---|
4153 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4154 | \boldmath $\rightarrow$ \unboldmath |
---|
4155 | & provides functions to start/stop the tracing of |
---|
4156 | active sections (see \autoref{markingActive}) |
---|
4157 | as well as utilities to obtain |
---|
4158 | tape statistics (see \autoref{examiningTape}); \\ |
---|
4159 | $\bullet$ & is included by the header \verb=<adolc/adouble.h>=. |
---|
4160 | \end{tabular*} |
---|
4161 | \\ \hline |
---|
4162 | \end{tabular} |
---|
4163 | \caption{Important header files: tracing/taping} |
---|
4164 | \label{importantHeaders1} |
---|
4165 | \end{table} |
---|
4166 | % |
---|
4167 | \begin{table}[h] |
---|
4168 | \center \small |
---|
4169 | \begin{tabular}{|p{3.8cm}|p{10.5cm}|}\hline |
---|
4170 | %\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline |
---|
4171 | \verb=<adolc/interfaces.h>= & |
---|
4172 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4173 | \boldmath $\rightarrow$ \unboldmath |
---|
4174 | & provides interfaces to the {\sf forward} and |
---|
4175 | {\sf reverse} routines as basic versions of derivative |
---|
4176 | evaluation (see \autoref{forw_rev}); \\ |
---|
4177 | $\bullet$ & comprises C++, C, and Fortran-callable versions; \\ |
---|
4178 | $\bullet$ & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\ |
---|
4179 | $\bullet$ & is included by the header \verb=<adolc/drivers/odedrivers.h>=. |
---|
4180 | \end{tabular*} |
---|
4181 | \\ \hline |
---|
4182 | \verb=<adolc/drivers.h>= & |
---|
4183 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4184 | \boldmath $\rightarrow$ \unboldmath |
---|
4185 | & provides ``easy to use'' drivers for solving |
---|
4186 | optimization problems and nonlinear equations |
---|
4187 | (see \autoref{optdrivers}); \\ |
---|
4188 | $\bullet$ & comprises C and Fortran-callable versions. |
---|
4189 | \end{tabular*} |
---|
4190 | \\ \hline |
---|
4191 | \begin{minipage}{3cm} |
---|
4192 | \verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>= |
---|
4193 | \end{minipage} & |
---|
4194 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4195 | \boldmath $\rightarrow$ \unboldmath |
---|
4196 | & provides the ``easy to use'' sparse drivers |
---|
4197 | to exploit the sparsity structure of |
---|
4198 | Jacobians (see \autoref{sparse}); \\ |
---|
4199 | \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions |
---|
4200 | of {\sf forward} and {\sf reverse} routines |
---|
4201 | propagating bit patterns (see \autoref{ProBit}); \\ |
---|
4202 | |
---|
4203 | $\bullet$ & is included by the header \verb=<adolc/interfaces.h>=. |
---|
4204 | \end{tabular*} |
---|
4205 | \\ \hline |
---|
4206 | \begin{minipage}{3cm} |
---|
4207 | \verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>= |
---|
4208 | \end{minipage} & |
---|
4209 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4210 | \boldmath $\rightarrow$ \unboldmath |
---|
4211 | & provides interfaces to the underlying C-callable |
---|
4212 | versions of {\sf forward} and {\sf reverse} routines |
---|
4213 | propagating bit patterns. |
---|
4214 | \end{tabular*} |
---|
4215 | \\ \hline |
---|
4216 | \begin{minipage}{3cm} |
---|
4217 | \verb=<adolc/drivers/=\newline\verb= odedrivers.h>= |
---|
4218 | \end{minipage} & |
---|
4219 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4220 | \boldmath $\rightarrow$ \unboldmath |
---|
4221 | & provides ``easy to use'' drivers for numerical |
---|
4222 | solution of ordinary differential equations |
---|
4223 | (see \autoref{odedrivers}); \\ |
---|
4224 | $\bullet$ & comprises C++, C, and Fortran-callable versions; \\ |
---|
4225 | $\bullet$ & includes the header \verb=<adolc/interfaces.h>=. |
---|
4226 | \end{tabular*} |
---|
4227 | \\ \hline |
---|
4228 | \begin{minipage}{3cm} |
---|
4229 | \verb=<adolc/drivers/=\newline\verb= taylor.h>= |
---|
4230 | \end{minipage} & |
---|
4231 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4232 | \boldmath $\rightarrow$ \unboldmath |
---|
4233 | & provides ``easy to use'' drivers for evaluation |
---|
4234 | of higher order derivative tensors (see |
---|
4235 | \autoref{higherOrderDeriv}) and inverse/implicit function |
---|
4236 | differentiation (see \autoref{implicitInverse});\\ |
---|
4237 | $\bullet$ & comprises C++ and C-callable versions. |
---|
4238 | \end{tabular*} |
---|
4239 | \\ \hline |
---|
4240 | \verb=<adolc/adalloc.h>= & |
---|
4241 | \begin{tabular*}{10.5cm}{cp{9.5cm}} |
---|
4242 | \boldmath $\rightarrow$ \unboldmath |
---|
4243 | & provides C++ and C functions for allocation of |
---|
4244 | vectors, matrices and three dimensional arrays |
---|
4245 | of {\sf double}s. |
---|
4246 | \end{tabular*} |
---|
4247 | \\ \hline |
---|
4248 | \end{tabular} |
---|
4249 | \caption{Important header files: evaluation of derivatives} |
---|
4250 | \label{importantHeaders2} |
---|
4251 | \end{table} |
---|
4252 | % |
---|
4253 | \subsection{Compiling and Linking C/C++ Programs} |
---|
4254 | % |
---|
4255 | To compile a C/C++ program or single module using ADOL-C |
---|
4256 | data types and routines one has to ensure that all necessary |
---|
4257 | header files according to \autoref{ssec:DesIH} are |
---|
4258 | included. All modules involving {\em active} data types as |
---|
4259 | {\sf adouble} |
---|
4260 | %, {\bf adoublev} and {\bf adoublem} |
---|
4261 | have to be compiled as C++. Modules that make use of a previously |
---|
4262 | generated tape to evaluate derivatives can either be programmed in ANSI-C |
---|
4263 | (while avoiding all C++ interfaces) or in C++. Depending |
---|
4264 | on the chosen programming language the header files provide |
---|
4265 | the right ADOL-C prototypes. |
---|
4266 | For linking the resulting object codes the library \verb=libadolc.*= |
---|
4267 | must be used (see \autoref{genlib}). |
---|
4268 | % |
---|
4269 | \subsection{Adding Quadratures as Special Functions} |
---|
4270 | % |
---|
4271 | \label{quadrat} |
---|
4272 | % |
---|
4273 | Suppose an integral |
---|
4274 | \[ f(x) = \int\limits^{x}_{0} g(t) dt \] |
---|
4275 | is evaluated numerically by a user-supplied function |
---|
4276 | \begin{center} |
---|
4277 | {\sf double myintegral(double\& x);} |
---|
4278 | \end{center} |
---|
4279 | Similarly, let us suppose that the integrand itself is evaluated by |
---|
4280 | a user-supplied block of C code {\sf integrand}, which computes a |
---|
4281 | variable with the fixed name {\sf val} from a variable with the fixed |
---|
4282 | name {\sf arg}. In many cases of interest, {\sf integrand} will |
---|
4283 | simply be of the form |
---|
4284 | \begin{center} |
---|
4285 | {\sf \{ val = expression(arg) \}}\enspace . |
---|
4286 | \end{center} |
---|
4287 | In general, the final assignment to {\sf val} may be preceded |
---|
4288 | by several intermediate calculations, possibly involving local |
---|
4289 | active variables of type {\sf adouble}, but no external or static |
---|
4290 | variables of that type. However, {\sf integrand} may involve local |
---|
4291 | or global variables of type {\sf double} or {\sf int}, provided they |
---|
4292 | do not depend on the value of {\sf arg}. The variables {\sf arg} and |
---|
4293 | {\sf val} are declared automatically; and as {\sf integrand} is a block |
---|
4294 | rather than a function, {\sf integrand} should have no header line. |
---|
4295 | |
---|
4296 | Now the function {\sf myintegral} can be overloaded for {\sf adouble} |
---|
4297 | arguments and thus included in the library of elementary functions |
---|
4298 | by the following modifications: |
---|
4299 | \begin{enumerate} |
---|
4300 | \item |
---|
4301 | At the end of the file \verb=<adouble.cpp>=, include the full code |
---|
4302 | defining \\ {\sf double myintegral(double\& x)}, and add the line |
---|
4303 | \begin{center} |
---|
4304 | {\sf extend\_quad(myintegral, integrand); } |
---|
4305 | \end{center} |
---|
4306 | This macro is extended to the definition of |
---|
4307 | {\sf adouble myintegral(adouble\& arg)}. |
---|
4308 | Then remake the library \verb=libadolc.*= (see \autoref{genlib}). |
---|
4309 | \item |
---|
4310 | In the definition of the class |
---|
4311 | {\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement |
---|
4312 | \begin{center} |
---|
4313 | {\sf friend adouble myintegral(adouble\&)}. |
---|
4314 | \end{center} |
---|
4315 | \end{enumerate} |
---|
4316 | In the first modification, {\sf myintegral} represents the name of the |
---|
4317 | {\sf double} function, whereas {\sf integrand} represents the actual block |
---|
4318 | of C code. |
---|
4319 | |
---|
4320 | For example, in case of the inverse hyperbolic cosine, we have |
---|
4321 | {\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as |
---|
4322 | {\sf \{ val = sqrt(arg*arg-1); \}} |
---|
4323 | so that the line |
---|
4324 | \begin{center} |
---|
4325 | {\sf extend\_quad(acosh,val = sqrt(arg*arg-1));} |
---|
4326 | \end{center} |
---|
4327 | can be added to the file \verb=<adouble.cpp>=. |
---|
4328 | A mathematically equivalent but longer representation of |
---|
4329 | {\sf integrand} is |
---|
4330 | \begin{center} |
---|
4331 | \begin{tabbing} |
---|
4332 | {\sf \{ }\hspace{1.0in}\= {\sf \{ adouble} \= temp = \kill |
---|
4333 | \>{\sf \{ adouble} \> {\sf temp = arg;} \\ |
---|
4334 | \> \ \> {\sf temp = temp*temp; } \\ |
---|
4335 | \> \ \> {\sf val = sqrt(temp-1); \}} |
---|
4336 | \end{tabbing} |
---|
4337 | \end{center} |
---|
4338 | The code block {\sf integrand} may call on any elementary function that has already |
---|
4339 | been defined in file \verb=<adouble.cpp>=, so that one may also introduce |
---|
4340 | iterated integrals. |
---|
4341 | % |
---|
4342 | % |
---|
4343 | \section{Example Codes} |
---|
4344 | \label{example} |
---|
4345 | % |
---|
4346 | The following listings are all simplified versions of codes that |
---|
4347 | are contained in the example subdirectory |
---|
4348 | \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular, |
---|
4349 | we have left out timings, which are included in the complete codes. |
---|
4350 | % |
---|
4351 | \subsection{Speelpenning's Example ({\tt speelpenning.cpp})} |
---|
4352 | % |
---|
4353 | The first example evaluates the gradient and the Hessian of |
---|
4354 | the function |
---|
4355 | \[ |
---|
4356 | y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i |
---|
4357 | \] |
---|
4358 | using the appropriate drivers {\sf gradient} and {\sf hessian}. |
---|
4359 | |
---|
4360 | \begin{verbatim} |
---|
4361 | #include <adolc/adouble.h> // use of active doubles and taping |
---|
4362 | #include <adolc/drivers/drivers.h> // use of "Easy to Use" drivers |
---|
4363 | // gradient(.) and hessian(.) |
---|
4364 | #include <adolc/taping.h> // use of taping |
---|
4365 | ... |
---|
4366 | void main() { |
---|
4367 | int n,i,j; |
---|
4368 | size_t tape_stats[STAT_SIZE]; |
---|
4369 | cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n"; |
---|
4370 | cout << "number of independent variables = ? \n"; |
---|
4371 | cin >> n; |
---|
4372 | double* xp = new double[n]; |
---|
4373 | double yp = 0.0; |
---|
4374 | adouble* x = new adouble[n]; |
---|
4375 | adouble y = 1; |
---|
4376 | for(i=0;i<n;i++) |
---|
4377 | xp[i] = (i+1.0)/(2.0+i); // some initialization |
---|
4378 | trace_on(1); // tag =1, keep=0 by default |
---|
4379 | for(i=0;i<n;i++) { |
---|
4380 | x[i] <<= xp[i]; y *= x[i]; } |
---|
4381 | y >>= yp; |
---|
4382 | delete[] x; |
---|
4383 | trace_off(); |
---|
4384 | tapestats(1,tape_stats); // reading of tape statistics |
---|
4385 | cout<<"maxlive "<<tape_stats[2]<<"\n"; |
---|
4386 | ... // ..... print other tape stats |
---|
4387 | double* g = new double[n]; |
---|
4388 | gradient(1,n,xp,g); // gradient evaluation |
---|
4389 | double** H=(double**)malloc(n*sizeof(double*)); |
---|
4390 | for(i=0;i<n;i++) |
---|
4391 | H[i]=(double*)malloc((i+1)*sizeof(double)); |
---|
4392 | hessian(1,n,xp,H); // H equals (n-1)g since g is |
---|
4393 | double errg = 0; // homogeneous of degree n-1. |
---|
4394 | double errh = 0; |
---|
4395 | for(i=0;i<n;i++) |
---|
4396 | errg += fabs(g[i]-yp/xp[i]); // vanishes analytically. |
---|
4397 | for(i=0;i<n;i++) { |
---|
4398 | for(j=0;j<n;j++) { |
---|
4399 | if (i>j) // lower half of hessian |
---|
4400 | errh += fabs(H[i][j]-g[i]/xp[j]); } } |
---|
4401 | cout << yp-1/(1.0+n) << " error in function \n"; |
---|
4402 | cout << errg <<" error in gradient \n"; |
---|
4403 | cout << errh <<" consistency check \n"; |
---|
4404 | } // end main |
---|
4405 | \end{verbatim} |
---|
4406 | % |
---|
4407 | \subsection{Power Example ({\tt powexam.cpp})} |
---|
4408 | % |
---|
4409 | The second example function evaluates the $n$-th power of a real |
---|
4410 | variable $x$ in |
---|
4411 | $\log_2 n$ multiplications by recursive halving of the exponent. Since |
---|
4412 | there is only one independent variable, the scalar derivative can be |
---|
4413 | computed by |
---|
4414 | using both {\sf forward} and {\sf reverse}, and the |
---|
4415 | results are subsequently compared. |
---|
4416 | \begin{verbatim} |
---|
4417 | #include <adolc/adolc.h> // use of ALL ADOL-C interfaces |
---|
4418 | |
---|
4419 | adouble power(adouble x, int n) { |
---|
4420 | adouble z=1; |
---|
4421 | if (n>0) { // recursion and branches |
---|
4422 | int nh =n/2; // that do not depend on |
---|
4423 | z = power(x,nh); // adoubles are fine !!!! |
---|
4424 | z *= z; |
---|
4425 | if (2*nh != n) |
---|
4426 | z *= x; |
---|
4427 | return z; } // end if |
---|
4428 | else { |
---|
4429 | if (n==0) // the local adouble z dies |
---|
4430 | return z; // as it goes out of scope. |
---|
4431 | else |
---|
4432 | return 1/power(x,-n); } // end else |
---|
4433 | } // end power |
---|
4434 | \end{verbatim} |
---|
4435 | The function {\sf power} above was obtained from the original |
---|
4436 | undifferentiated version by simply changing the type of all |
---|
4437 | {\sf double}s including the return variable to {\sf adouble}s. The new version |
---|
4438 | can now be called from within any active section, as in the following |
---|
4439 | main program. |
---|
4440 | \begin{verbatim} |
---|
4441 | #include ... // as above |
---|
4442 | int main() { |
---|
4443 | int i,n,tag=1; |
---|
4444 | cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n"; |
---|
4445 | cout<<"monomial degree=? \n"; // input the desired degree |
---|
4446 | cin >> n; |
---|
4447 | // allocations and initializations |
---|
4448 | double* Y[1]; |
---|
4449 | *Y = new double[n+2]; |
---|
4450 | double* X[1]; // allocate passive variables with |
---|
4451 | *X = new double[n+4]; // extra dimension for derivatives |
---|
4452 | X[0][0] = 0.5; // function value = 0. coefficient |
---|
4453 | X[0][1] = 1.0; // first derivative = 1. coefficient |
---|
4454 | for(i=0;i<n+2;i++) |
---|
4455 | X[0][i+2]=0; // further coefficients |
---|
4456 | double* Z[1]; // used for checking consistency |
---|
4457 | *Z = new double[n+2]; // between forward and reverse |
---|
4458 | adouble y,x; // declare active variables |
---|
4459 | // beginning of active section |
---|
4460 | trace_on(1); // tag = 1 and keep = 0 |
---|
4461 | x <<= X[0][0]; // only one independent var |
---|
4462 | y = power(x,n); // actual function call |
---|
4463 | y >>= Y[0][0]; // only one dependent adouble |
---|
4464 | trace_off(); // no global adouble has died |
---|
4465 | // end of active section |
---|
4466 | double u[1]; // weighting vector |
---|
4467 | u[0]=1; // for reverse call |
---|
4468 | for(i=0;i<n+2;i++) { // note that keep = i+1 in call |
---|
4469 | forward(tag,1,1,i,i+1,X,Y); // evaluate the i-the derivative |
---|
4470 | if (i==0) |
---|
4471 | cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value() |
---|
4472 | << " (should be 0)\n"; |
---|
4473 | else |
---|
4474 | cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i] |
---|
4475 | << " (should be 0)\n"; |
---|
4476 | reverse(tag,1,1,i,u,Z); // evaluate the (i+1)-st derivative |
---|
4477 | Z[0][i+1]=Z[0][i]/(i+1); } // scale derivative to Taylorcoeff. |
---|
4478 | return 1; |
---|
4479 | } // end main |
---|
4480 | \end{verbatim} |
---|
4481 | Since this example has only one independent and one dependent variable, |
---|
4482 | {\sf forward} and {\sf reverse} have the same complexity and calculate |
---|
4483 | the same scalar derivatives, albeit with a slightly different scaling. |
---|
4484 | By replacing the function {\sf power} with any other univariate test function, |
---|
4485 | one can check that {\sf forward} and {\sf reverse} are at least consistent. |
---|
4486 | In the following example the number of independents is much larger |
---|
4487 | than the number of dependents, which makes the reverse mode preferable. |
---|
4488 | % |
---|
4489 | \subsection{Determinant Example ({\tt detexam.cpp})} |
---|
4490 | % |
---|
4491 | Now let us consider an exponentially expensive calculation, |
---|
4492 | namely, the evaluation of a determinant by recursive expansion |
---|
4493 | along rows. The gradient of the determinant with respect to the |
---|
4494 | matrix elements is simply the adjoint, i.e. the matrix of cofactors. |
---|
4495 | Hence the correctness of the numerical result is easily checked by |
---|
4496 | matrix-vector multiplication. The example illustrates the use |
---|
4497 | of {\sf adouble} arrays and pointers. |
---|
4498 | |
---|
4499 | \begin{verbatim} |
---|
4500 | #include <adolc/adouble.h> // use of active doubles and taping |
---|
4501 | #include <adolc/interfaces.h> // use of basic forward/reverse |
---|
4502 | // interfaces of ADOL-C |
---|
4503 | adouble** A; // A is an n x n matrix |
---|
4504 | int i,n; // k <= n is the order |
---|
4505 | adouble det(int k, int m) { // of the sub-matrix |
---|
4506 | if (m == 0) return 1.0 ; // its column indices |
---|
4507 | else { // are encoded in m |
---|
4508 | adouble* pt = A[k-1]; |
---|
4509 | adouble t = zero; // zero is predefined |
---|
4510 | int s, p =1; |
---|
4511 | if (k%2) s = 1; else s = -1; |
---|
4512 | for(i=0;i<n;i++) { |
---|
4513 | int p1 = 2*p; |
---|
4514 | if (m%p1 >= p) { |
---|
4515 | if (m == p) { |
---|
4516 | if (s>0) t += *pt; else t -= *pt; } |
---|
4517 | else { |
---|
4518 | if (s>0) |
---|
4519 | t += *pt*det(k-1,m-p); // recursive call to det |
---|
4520 | else |
---|
4521 | t -= *pt*det(k-1,m-p); } // recursive call to det |
---|
4522 | s = -s;} |
---|
4523 | ++pt; |
---|
4524 | p = p1;} |
---|
4525 | return t; } |
---|
4526 | } // end det |
---|
4527 | \end{verbatim} |
---|
4528 | As one can see, the overloading mechanism has no problem with pointers |
---|
4529 | and looks exactly the same as the original undifferentiated function |
---|
4530 | except for the change of type from {\sf double} to {\sf adouble}. |
---|
4531 | If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed, |
---|
4532 | a compile time error would have resulted. Now consider a corresponding |
---|
4533 | calling program. |
---|
4534 | |
---|
4535 | \begin{verbatim} |
---|
4536 | #include ... // as above |
---|
4537 | int main() { |
---|
4538 | int i,j, m=1,tag=1,keep=1; |
---|
4539 | cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n"; |
---|
4540 | cout << "order of matrix = ? \n"; // select matrix size |
---|
4541 | cin >> n; |
---|
4542 | A = new adouble*[n]; |
---|
4543 | trace_on(tag,keep); // tag=1=keep |
---|
4544 | double detout=0.0, diag = 1.0; // here keep the intermediates for |
---|
4545 | for(i=0;i<n;i++) { // the subsequent call to reverse |
---|
4546 | m *=2; |
---|
4547 | A[i] = new adouble[n]; // not needed for adoublem |
---|
4548 | adouble* pt = A[i]; |
---|
4549 | for(j=0;j<n;j++) |
---|
4550 | A[i][j] <<= j/(1.0+i); // make all elements of A independent |
---|
4551 | diag += A[i][i].value(); // value() converts to double |
---|
4552 | A[i][i] += 1.0; } |
---|
4553 | det(n,m-1) >>= detout; // actual function call |
---|
4554 | printf("\n %f - %f = %f (should be 0)\n",detout,diag,detout-diag); |
---|
4555 | trace_off(); |
---|
4556 | double u[1]; |
---|
4557 | u[0] = 1.0; |
---|
4558 | double* B = new double[n*n]; |
---|
4559 | reverse(tag,1,n*n,1,u,B); |
---|
4560 | cout <<" \n first base? : "; |
---|
4561 | for (i=0;i<n;i++) { |
---|
4562 | adouble sum = 0; |
---|
4563 | for (j=0;j<n;j++) // the matrix A times the first n |
---|
4564 | sum += A[i][j]*B[j]; // components of the gradient B |
---|
4565 | cout<<sum.value()<<" "; } // must be a Cartesian basis vector |
---|
4566 | return 1; |
---|
4567 | } // end main |
---|
4568 | \end{verbatim} |
---|
4569 | The variable {\sf diag} should be mathematically |
---|
4570 | equal to the determinant, because the |
---|
4571 | matrix {\sf A} is defined as a rank 1 perturbation of the identity. |
---|
4572 | % |
---|
4573 | \subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})} |
---|
4574 | \label{exam:ode} |
---|
4575 | % |
---|
4576 | Here, we consider a nonlinear ordinary differential equation that |
---|
4577 | is a slight modification of the Robertson test problem |
---|
4578 | given in Hairer and Wanner's book on the numerical solution of |
---|
4579 | ODEs \cite{HW}. The following source code computes the corresponding |
---|
4580 | values of $y^{\prime} \in \R^3$: |
---|
4581 | \begin{verbatim} |
---|
4582 | #include <adolc/adouble.h> // use of active doubles and taping |
---|
4583 | #include <adolc/drivers/odedrivers.h> // use of "Easy To use" ODE drivers |
---|
4584 | #include <adolc/adalloc.h> // use of ADOL-C allocation utilities |
---|
4585 | |
---|
4586 | void tracerhs(short int tag, double* py, double* pyprime) { |
---|
4587 | adouble y[3]; // this time we left the parameters |
---|
4588 | adouble yprime[3]; // passive and use the vector types |
---|
4589 | trace_on(tag); |
---|
4590 | for (int i=0; i<3; i++) |
---|
4591 | y[i] <<= py[i]; // initialize and mark independents |
---|
4592 | yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]); |
---|
4593 | yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]); |
---|
4594 | yprime[2] = -yprime[0] - yprime[1]; |
---|
4595 | yprime >>= pyprime; // mark and pass dependents |
---|
4596 | trace_off(tag); |
---|
4597 | } // end tracerhs |
---|
4598 | \end{verbatim} |
---|
4599 | The Jacobian of the right-hand side has large |
---|
4600 | negative eigenvalues, which make the ODE quite stiff. We have added |
---|
4601 | some numerically benign transcendentals to make the differentiation |
---|
4602 | more interesting. |
---|
4603 | The following main program uses {\sf forode} to calculate the Taylor series |
---|
4604 | defined by the ODE at the given point $y_0$ and {\sf reverse} as well |
---|
4605 | as {\sf accode} to compute the Jacobians of the coefficient vectors |
---|
4606 | with respect to $x_0$. |
---|
4607 | \begin{verbatim} |
---|
4608 | #include ....... // as above |
---|
4609 | int main() { |
---|
4610 | int i,j,deg; |
---|
4611 | int n=3; |
---|
4612 | double py[3]; |
---|
4613 | double pyp[3]; |
---|
4614 | cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n"; |
---|
4615 | cout << "degree of Taylor series =?\n"; |
---|
4616 | cin >> deg; |
---|
4617 | double **X; |
---|
4618 | X=(double**)malloc(n*sizeof(double*)); |
---|
4619 | for(i=0;i<n;i++) |
---|
4620 | X[i]=(double*)malloc((deg+1)*sizeof(double)); |
---|
4621 | double*** Z=new double**[n]; |
---|
4622 | double*** B=new double**[n]; |
---|
4623 | short** nz = new short*[n]; |
---|
4624 | for(i=0;i<n;i++) { |
---|
4625 | Z[i]=new double*[n]; |
---|
4626 | B[i]=new double*[n]; |
---|
4627 | for(j=0;j<n;j++) { |
---|
4628 | Z[i][j]=new double[deg]; |
---|
4629 | B[i][j]=new double[deg]; } // end for |
---|
4630 | } // end for |
---|
4631 | for(i=0;i<n;i++) { |
---|
4632 | py[i] = (i == 0) ? 1.0 : 0.0; // initialize the base point |
---|
4633 | X[i][0] = py[i]; // and the Taylor coefficient; |
---|
4634 | nz[i] = new short[n]; } // set up sparsity array |
---|
4635 | tracerhs(1,py,pyp); // trace RHS with tag = 1 |
---|
4636 | forode(1,n,deg,X); // compute deg coefficients |
---|
4637 | reverse(1,n,n,deg-1,Z,nz); // U defaults to the identity |
---|
4638 | accode(n,deg-1,Z,B,nz); |
---|
4639 | cout << "nonzero pattern:\n"; |
---|
4640 | for(i=0;i<n;i++) { |
---|
4641 | for(j=0;j<n;j++) |
---|
4642 | cout << nz[i][j]<<"\t"; |
---|
4643 | cout <<"\n"; } // end for |
---|
4644 | return 1; |
---|
4645 | } // end main |
---|
4646 | \end{verbatim} |
---|
4647 | \noindent The pattern {\sf nz} returned by {\sf accode} is |
---|
4648 | \begin{verbatim} |
---|
4649 | 3 -1 4 |
---|
4650 | 1 2 2 |
---|
4651 | 3 2 4 |
---|
4652 | \end{verbatim} |
---|
4653 | The original pattern {\sf nz} returned by {\sf reverse} is the same |
---|
4654 | except that the negative entry $-1$ was zero. |
---|
4655 | % |
---|
4656 | %\subsection {Gaussian Elimination Example ({\tt gaussexam.cpp})} |
---|
4657 | %\label{gaussexam} |
---|
4658 | % |
---|
4659 | %The following example uses conditional assignments to show the usage of a once produced tape |
---|
4660 | %for evaluation at new arguments. The elimination is performed with |
---|
4661 | %column pivoting. |
---|
4662 | %\begin{verbatim} |
---|
4663 | %#include <adolc/adolc.h> // use of ALL ADOL-C interfaces |
---|
4664 | |
---|
4665 | %void gausselim(int n, adoublem& A, adoublev& bv) { |
---|
4666 | %along i; // active integer declaration |
---|
4667 | %adoublev temp(n); // active vector declaration |
---|
4668 | %adouble r,rj,temps; |
---|
4669 | %int j,k; |
---|
4670 | %for(k=0;k<n;k++) { // elimination loop |
---|
4671 | % i = k; |
---|
4672 | % r = fabs(A[k][k]); // initial pivot size |
---|
4673 | % for(j=k+1;j<n;j++) { |
---|
4674 | % rj = fabs(A[j][k]); |
---|
4675 | % condassign(i,rj-r,j); // look for a larger element in the same |
---|
4676 | % condassign(r,rj-r,rj); } // column with conditional assignments |
---|
4677 | % temp = A[i]; // switch rows using active subscripting |
---|
4678 | % A[i] = A[k]; // necessary even if i happens to equal |
---|
4679 | % A[k] = temp; // k during taping |
---|
4680 | % temps = bv[i]; |
---|
4681 | % bv[i]=bv[k]; |
---|
4682 | % bv[k]=temps; |
---|
4683 | % if (!value(A[k][k])) // passive subscripting |
---|
4684 | % exit(1); // matrix singular! |
---|
4685 | % temps= A[k][k]; |
---|
4686 | % A[k] /= temps; |
---|
4687 | % bv[k] /= temps; |
---|
4688 | % for(j=k+1;j<n;j++) { |
---|
4689 | % temps= A[j][k]; |
---|
4690 | % A[j] -= temps*A[k]; // vector operations |
---|
4691 | % bv[j] -= temps*bv[k]; } // endfor |
---|
4692 | %} // end elimination loop |
---|
4693 | %temp=0.0; |
---|
4694 | %for(k=n-1;k>=0;k--) // backsubstitution |
---|
4695 | % temp[k] = (bv[k]-(A[k]*temp))/A[k][k]; |
---|
4696 | %bv=temp; |
---|
4697 | %} // end gausselim |
---|
4698 | %\end{verbatim} |
---|
4699 | %\noindent This function can be called from any program |
---|
4700 | %that suitably initializes |
---|
4701 | %the components of {\sf A} and {\sf bv} |
---|
4702 | %as independents. The resulting tape can be |
---|
4703 | %used to solve any nonsingular linear system of the same size and |
---|
4704 | %to get the sensitivities of the solution with respect to the |
---|
4705 | %system matrix and the right hand side. |
---|
4706 | %\vspace*{-4mm} |
---|
4707 | % |
---|
4708 | \section*{Acknowledgements} |
---|
4709 | % |
---|
4710 | Parts of the ADOL-C source were developed by Andreas |
---|
4711 | Kowarz, Hristo Mitev, Sebastian Schlenkrich, and Olaf |
---|
4712 | Vogel. We are also indebted to George Corliss, |
---|
4713 | Tom Epperly, Bruce Christianson, David Gay, David Juedes, |
---|
4714 | Brad Karp, Koichi Kubota, Bob Olson, Marcela Rosemblun, Dima |
---|
4715 | Shiriaev, Jay Srinivasan, Chuck Tyner, Jean Utke, and Duane Yoder for helping in |
---|
4716 | various ways with the development and documentation of ADOL-C. |
---|
4717 | % |
---|
4718 | \begin{thebibliography}{10} |
---|
4719 | |
---|
4720 | \bibitem{BeKh96} |
---|
4721 | Christian~H. Bischof, Peyvand~M. Khademi, Ali Bouaricha and Alan Carle. |
---|
4722 | \newblock {\em Efficient computation of gradients and Jacobians by dynamic |
---|
4723 | exploitation of sparsity in automatic differentiation}. |
---|
4724 | \newblock Optimization Methods and Software 7(1):1-39, 1996. |
---|
4725 | |
---|
4726 | \bibitem{Chri91a} |
---|
4727 | Bruce Christianson. |
---|
4728 | \newblock {\em Reverse accumulation and accurate rounding error estimates for |
---|
4729 | Taylor series}. |
---|
4730 | \newblock Optimization Methods and Software 1:81--94, 1992. |
---|
4731 | |
---|
4732 | \bibitem{GeMaPo05} |
---|
4733 | Assefaw Gebremedhin, Fredrik Manne, and Alex Pothen. |
---|
4734 | \newblock {\em What color is your {J}acobian? {G}raph coloring for computing |
---|
4735 | derivatives}. |
---|
4736 | \newblock SIAM Review 47(4):629--705, 2005. |
---|
4737 | |
---|
4738 | \bibitem{GePoTaWa06} |
---|
4739 | Assefaw Gebremedhin, Alex Pothen, Arijit Tarafdar and Andrea Walther. |
---|
4740 | {\em Efficient Computation of Sparse Hessians: An Experimental Study |
---|
4741 | using ADOL-C}. Tech. Rep. (2006). To appear in INFORMS Journal on Computing. |
---|
4742 | |
---|
4743 | \bibitem{GePoWa08} Assefaw Gebremedhin, Alex Pothen, and Andrea |
---|
4744 | Walther. |
---|
4745 | {\em Exploiting Sparsity in Jacobian Computation via Coloring and Automatic Differentiation: |
---|
4746 | a Case Study in a Simulated Moving Bed Process}. |
---|
4747 | In Chr. Bischof et al., eds., {\em Proceedings AD 2008 conference}, LNCSE 64, pp. 327 -- 338, Springer (2008). |
---|
4748 | |
---|
4749 | \bibitem{GeTaMaPo07} |
---|
4750 | Assefaw Gebremedhin, Arijit Tarafdar, Fredrik Manne, and Alex Pothen, |
---|
4751 | {\em New Acyclic and Star Coloring Algorithms with Applications to Hessian Computation}. |
---|
4752 | SIAM Journal on Scientific Computing 29(3):1042--1072, 2007. |
---|
4753 | |
---|
4754 | \bibitem{Griewank13} |
---|
4755 | Andreas Griewank, |
---|
4756 | {\em On stable piecewise linearization and generalized algorithmic differentiation}. |
---|
4757 | Optimization Methods and Software 28(6):1139--1178, 2013. |
---|
4758 | |
---|
4759 | \bibitem{GrWa08} |
---|
4760 | Andreas Griewank and Andrea Walther: {\em Evaluating Derivatives, Principles and Techniques of |
---|
4761 | Algorithmic Differentiation. Second edition}. SIAM, 2008. |
---|
4762 | |
---|
4763 | |
---|
4764 | \bibitem{Griewank97} |
---|
4765 | Andreas Griewank, Jean Utke, and Andrea Walther. |
---|
4766 | \newblock {\em Evaluating higher derivative tensors by forward propagation |
---|
4767 | of univariate Taylor series}. |
---|
4768 | \newblock Mathematics of Computation, 69:1117--1130, 2000. |
---|
4769 | |
---|
4770 | \bibitem{GrWa00} |
---|
4771 | Andreas Griewank and Andrea Walther. {\em Revolve: An Implementation of Checkpointing for the Reverse |
---|
4772 | or Adjoint Mode of Computational Differentiation}, |
---|
4773 | ACM Transaction on Mathematical Software 26:19--45, 2000. |
---|
4774 | |
---|
4775 | \bibitem{HW} |
---|
4776 | Ernst Hairer and Gerhard Wanner. |
---|
4777 | {\it Solving Ordinary Differential Equations II.\/} |
---|
4778 | Springer-Verlag, Berlin, 1991. |
---|
4779 | |
---|
4780 | \bibitem{Knuth73} |
---|
4781 | Donald~E. Knuth. |
---|
4782 | \newblock {\em The Art of Computer Programming. Second edition.} |
---|
4783 | \newblock Addison-Wesley, Reading, 1973. |
---|
4784 | |
---|
4785 | \bibitem{Roeb05} |
---|
4786 | Klaus R\"{o}benack. |
---|
4787 | \newblock{\em Computation of Lie Derivatives of Tensor Fields Required for |
---|
4788 | Nonlinear Controller and Observer Design Employing Automatic Differentiation. } |
---|
4789 | \newblock PAMM, 5(1): 181-184, 2005. |
---|
4790 | |
---|
4791 | \bibitem{Roeb11} |
---|
4792 | Klaus R\"{o}benack, Jan Winkler and Siqian Wang. |
---|
4793 | \newblock {\em LIEDRIVERS - A Toolbox for the Efficient Computation of Lie |
---|
4794 | Derivatives Based on the Object-Oriented Algorithmic Differentiation |
---|
4795 | Package ADOL-C. } |
---|
4796 | \newblock Proc. of the 4th International Workshop on Equation-Based Object-Oriented |
---|
4797 | Modeling Languages and Tools, F. E. Cellier and D. Broman and P. Fritzson |
---|
4798 | and E. A. Lee, editors, volume 56 of Link\"oping Electronic Conference Proceedings, |
---|
4799 | pages 57-66, Zurich 2011. |
---|
4800 | \bibitem{Wa05a} |
---|
4801 | Andrea Walther. |
---|
4802 | \newblock {\em Computing Sparse Hessians with Automatic Differentiation}. |
---|
4803 | \newblock Transaction on Mathematical Software, 34(1), Artikel 3 (2008). |
---|
4804 | \end{thebibliography} |
---|
4805 | \end{document} |
---|
4806 | |
---|