source: trunk/ADOL-C/doc/adolc-manual.tex @ 764

Last change on this file since 764 was 764, checked in by mbanovic, 19 months ago

Merge branch swig into svn.

The following commits have been merged:

commit 15c64c07d051ef872afd8bc04f772624ac3c0580
Merge: a22cb31 617f278
Author: Mladen Banovic <mladenbanovic2705@…>
Date: Wed Jan 30 14:53:13 2019 +0100

Merge branch 'swig'

commit 617f2781bde18dd7c58f3a2554dec70d94cedd92
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Dec 21 16:23:11 2018 +0100

remove unnecessary %include

commit b7718a5407516afea3b1dabafeb3fa084a9b1ab3
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Dec 21 16:12:06 2018 +0100

fix destructors for EDF classes

commit 79ae7862f370fbbc04033d42cfd796b429c19e5f
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Dec 21 15:52:29 2018 +0100

improve example

commit 0dbe7842bde1f9f92909fed8459e8298183c7890
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Dec 21 15:47:43 2018 +0100

add python examples

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 08ae3d98cadfb86016a48322319548d549ae3418
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Dec 21 15:47:14 2018 +0100

Make external functions work

EDFobject_v2 is not implemented yet

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit acfd9c5e41c2c2e99d7e530f0837dd6a5b0dac96
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Dec 11 12:07:01 2018 +0100

regenerate doc

commit c673f73ee35f8b8ce74258add01e085e68c0c13a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Dec 11 12:06:21 2018 +0100

add python and R installation to doc

commit 317c3c5a6744d38091e809d804c23d637d900fe9
Author: kulshres <kulshres@94ac48a7-3327-4b6a-8511-9a4036a20e83>
Date: Wed Jun 13 15:21:36 2018 +0000

make init_lib() static to avoid name clashes in other units

git-svn-id: 94ac48a7-3327-4b6a-8511-9a4036a20e83

commit 9a291b9b5646ae665382efafc3e9993c51aa4bc0
Author: kulshres <kulshres@94ac48a7-3327-4b6a-8511-9a4036a20e83>
Date: Wed Jun 13 15:02:22 2018 +0000

rename void init() to void init_lib()

this function is totally internal to tape_handling.cpp and called
from the constructor of Keeper class. However the name init() seems to refer
to something else in the scope of the class and correct initialization
isn't done, causing errors later.

git-svn-id: 94ac48a7-3327-4b6a-8511-9a4036a20e83

commit f84f47f5f12c9a69ef44ccefe1b654fa3524cda3
Merge: 5021e4a 6c0d2d5
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Oct 11 21:53:14 2017 +0200

Merge branch 'swig' of into swig

commit 5021e4ab8cab3fd0760635e13c1a59e25f3c7eb6
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Oct 11 21:44:36 2017 +0200

add abs and rename asin/acos/atan to arcsin/arccos/arctan etc

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 6c0d2d5c9c90350b384b360dfb9659c9f2ca4bd6
Merge: cba0df1 2444ff3
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Oct 5 19:39:47 2017 +0000

Merge branch 'swig' into 'swig'

Python and R interface

See merge request adol-c/adol-c!11

commit 2444ff3a28e4a33a4670c8ebf6956f7c85a2274e
Merge: 068b059 946c873
Author: Sri Hari Krishna Narayanan <snarayan@…>
Date: Thu Oct 5 12:26:38 2017 -0500

Merge branch 'swig' of into swig

commit 068b0592f59cbd11a434dca90be74ffbb767bde3
Author: Sri Hari Krishna Narayanan <snarayan@…>
Date: Thu Oct 5 12:19:52 2017 -0500

Change all use of clib to lib

commit ac987aca06b5c932adbd31bcf08811008b325257
Author: Sri Hari Krishna Narayanan <snarayan@…>
Date: Thu Oct 5 12:10:27 2017 -0500

Add python linking libraries

commit cba0df178387670ec3a1b94c237e13542090e78d
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Sep 5 13:37:03 2017 +0200

make psdrivers numpy conformant and remove one warning

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit e450b62a4b3e501e9940400137fc89351dbdb8ba
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Sep 5 13:35:07 2017 +0200

Make interface of psdrivers conformant and remove redundant parameter

Letters are as represented in the 2017 OMS paper.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 83c5f035130849b98a299857c7c5524a6ffb29a2
Merge: a50a2e1 363e176
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Aug 3 14:05:37 2017 +0200

Merge branch 'master' into swig

commit a50a2e19f79d3178592948c400d6eb89ed828b06
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jun 22 13:52:26 2017 +0200

make install do everything

Of course this assumes you have autotools, libtool, and C++ compiler
previously installed as well as boost library and colpack.

commit 1a5f22002c83b99e8d69c297ae89cb03cc304e50
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jun 22 13:51:41 2017 +0200

add missing support for numpy.sqrt

this was seemingly just oversight

commit f06f5011e183a203eca79444d6b175556f242f6a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Jun 16 12:43:46 2017 +0200

add a for compatibilty with python installers

like pip or anaconda. In this case the will compile the C/C++
original library as well as the swig wrapper.
arguments to build are:

will install the C/C++ library andn headers in <DIR>/include <DIR>/lib64


will assume that ColPack? is installed in <DIR>/include and <DIR>/lib64


will assume that boost is installed in <DIR>/include and <DIR>/lib64

I haven't tested on anything except 64-bit linux until now.

for install I've only tested installing with --user option

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit cd793f6a8503a1e3751cfaeb843af249c522ce46
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri May 5 10:16:23 2017 +0200

a slightly more meaningful naming

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 4c2a248fe5937daf292948e8ace9af0e4ccdffc1
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri May 5 10:08:24 2017 +0200

Fix Python Exception properly

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit f272c673fcb0941f3f054462eb9843be4e8b829a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Apr 26 17:29:37 2017 +0200

export the cleanUp() function for wrappers

commit 0ed3a9394d3ed0e2804a3cef8b39dba105817e84
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Apr 19 18:09:39 2017 +0200

python3 compatibility and slight change in

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit c085c16e96c2f5bce7f4a50a77d4164bcaa47cf6
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Feb 16 15:58:46 2017 +0100

add headers for implicit inclusions in edfclasses.h

commit 239078661e4d41e6ecdb131311f2fafe560874f3
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Feb 16 09:24:17 2017 +0100

add support for vectors and matrices in octave drivers

forward and reverse drivers to come. sparse drivers to come

commit 19e89992da84bdf392233c8add69d02bed714e60
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Dec 22 10:24:44 2016 +0100

Add swig interface for octave

commit 76a264d5412c5599102a4c9a182d3163a6ab6851
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Nov 18 15:49:36 2016 +0100

rename persistent local header to .hpp

since it is a c++ header. And some temporary .h files are created that
will require cleaning in 'make clean'

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit b59d463221a708e98af350decd97bdd26feb27d7
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Fri Nov 18 14:01:19 2016 +0100

reorder some lines to fix compilation and %ignore issues

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 26535bfc6d9db954c246d991b5ef09d8cd8d66d1
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Nov 17 10:43:14 2016 +0100

Add directors for python extension of edfclasses.h

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit c522a5f6400e819eb393cfe44a68cae06fbb9ffe
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Nov 2 14:26:11 2016 -0500

call() must not be overloaded by child classes either

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 98fc1f127e71c3ffef73ed43525cd21ec45942b1
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Nov 2 14:19:59 2016 -0500

init_edf() must not be overloaded by child classes

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 1e5e0c68a7a0ff3369e9a3991edaa7d04164a3e5
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Nov 2 14:05:40 2016 -0500

design a C++ object oriented wrapper for external functions

it is easier to work with classes, inheritence and overloaded member
functions in scripting languages like python and R instead of raw
pointers to functions in order to call arbitrary code.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 782465d4175a2453934cd1ffe9649e3f78a42c75
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Sep 14 09:49:12 2016 +0200

add support for calling numpy.cos and friends on adouble arrays

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit b0ed723914bdc46ce9128727972dc287288b0a71
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 18:28:43 2016 +0200

helper for printing exception

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 6f263543749e7fcb688e8af5cde68da3745886b5
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 18:06:59 2016 +0200

expose FatalError? to swig python interface

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 7f579bb5629eb5f5c5a49ce4d0484a306640748d
Merge: d2f49d3 3bbd2a2
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 17:54:41 2016 +0200

Merge branch 'master' into swig

commit d2f49d373e00f583e9b7c0ee69c160fbae1322c7
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 13:37:10 2016 +0200

improve logic in fmin and fmax

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit e138ca65b6361b23a4000c77128d0ed684662e1c
Merge: cafefb7 3c5e311
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 13:31:06 2016 +0200

Merge remote-tracking branch 'origin/swig' into swig

commit cafefb7081ff8d5ba7c772b5b022404101b813e0
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 13:30:32 2016 +0200

wrap newly added functions in master branch

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit ff975ebd035995f0e9dc01578ac3e84b91b1604f
Merge: 02e32a7 4c79b27
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 13:11:42 2016 +0200

Merge remote-tracking branch 'origin/master' into swig

commit 02e32a7aa72d4310238f7ae9b85a52e29b46cf4a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Aug 10 13:09:31 2016 +0200

add fmin, fmax, fabs, and printTapeStats for swig-python

fmin, fmax and fabs should work on scalars as well as badoubles

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 3c5e3112cdb978f0274f4d6cacd128abc034ff9b
Merge: dd06803 7aa4b9a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon Aug 8 13:37:12 2016 +0200

Merge branch 'master' into swig

commit dd06803b3f40b3057976c315241f583bc0007397
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Aug 4 15:08:45 2016 +0200

raise exceptions for retracing in python

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit d253dfaa74f38b5b545f4f88ba08d86a65aa960a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 21 13:49:16 2016 +0200

fix typo in variable name

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit ade470ecf44fad03025ef9fcfadc5cac5d3e9f80
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 21 11:36:06 2016 +0200

handle locint as an unsigned long

for some reason swig doesn't do this on it's own.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit e60c95d9dd70d962aed3caed4e05d0ef48ffcab2
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 21 11:04:44 2016 +0200

use the compiler set up in ${CXX}

if ${CXX} is not set it defaults to 'c++'

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 42b439a47e2601325cb4d02aa27ac4b9c0535eea
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 21 03:12:12 2016 +0200

add numpy compatible set_param_vec()

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 774a2204590aeb57cbc18e654bc6fe21763142c2
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 21 02:41:22 2016 +0200

add function to convert numpy arrays directly to adouble arrays

works for any contiguous iterable type like list, tuple etc.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit c3e04155d83b3ee76df294c2adea78ca1df865f0
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 14 10:57:43 2016 +0200

refresh swig patches for current swig master branch

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 48e0ef1cf7650d5b2e232d0d5450a158080d7849
Merge: 52445e4 06427e9
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Jul 14 10:36:59 2016 +0200

Merge branch 'master' into swig

commit 52445e41a8b768299084702418973ec79bd972c2
Merge: 666ba79 7507929
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Mar 23 14:59:05 2016 +0100

Merge branch 'master' into swig

commit 666ba7931e001bfd78171c7347b400b359dbd2d2
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Wed Mar 23 14:58:59 2016 +0100

add reverse drivers and use kwargs for overloaded forward/reverse call

also add tapestats.
more reverse drivers coming.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit c16bf7ff198728740896a804d85684c0575c1e47
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon Mar 21 16:08:10 2016 +0100

add conversion for contiguous matrices and tensors

this time in the swig interface only, because they aren't
needed anywhere else.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit f3895ffd1dbb07b8a04e867982683ae8de7a5052
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon Mar 21 15:45:22 2016 +0100

Revert "add conversion for contiguous matrices and tensors"

This reverts commit 7ad2746027b650147ec1d2da6567a3e5a7ecb876.

commit 76eb5d0899280cb9e3b4f4c1902823a45e884774
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon Mar 21 13:46:12 2016 +0100

change forward drivers to return numpy arrays

instead of expecting inplace arguments

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 73bde825b256809ae07d124c1c6c66e9ae50eb82
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Mon Mar 21 12:57:20 2016 +0100

change drivers to return numpy arrays

instead of expecting inplace arguments for output

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 946c87335675e68b7ea826eadd801d8d6fca4f54
Author: Sri Hari Krishna Narayanan <sriharikrishna@…>
Date: Thu Mar 17 07:55:25 2016 -0500

R: Added special handling for sparse_jac

commit 74d1c42cf5704c5ed7ae193cab6b2344334219c5
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Mar 15 14:32:47 2016 +0100

fix inclusion of sparse drivers for swig (and other users)

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit e97a73cc2a33645c517a83367d3e709ff3f4a244
Merge: de9eef7 dbc9f38
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 22:52:57 2016 +0000

Merge branch 'swig' into 'swig'



See merge request !4

commit dbc9f3847b58781367d8bcfd0545f3283d3ce350
Merge: b2dd718 de9eef7
Author: Sri Hari Krishna Narayanan <sriharikrishna@…>
Date: Thu Mar 10 16:45:26 2016 -0600

Merge remote-tracking branch 'origin/swig' into swig

commit b2dd71855ce277d4f956c99f04e03f954b0ff344
Author: Sri Hari Krishna Narayanan <sriharikrishna@…>
Date: Thu Mar 10 16:45:01 2016 -0600

R: removed extraneous pragmas

commit de9eef7ce7ba0d5a4256018f4d0bbf79bba25f7f
Merge: 0096858 fdaa52b
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 22:37:21 2016 +0000

Merge branch 'swig' into 'swig'


Bugfixes for SWIG/R interface

See merge request !3

commit fdaa52bf6654fa2558c2982738f4de1f49c2c960
Merge: 2fa790a 0096858
Author: Sri Hari Krishna Narayanan <sriharikrishna@…>
Date: Thu Mar 10 16:20:57 2016 -0600

Merge remote-tracking branch 'origin/swig' into swig

commit 2fa790a85c04b41755e5587baa21e59fe7416ab9
Author: Sri Hari Krishna Narayanan <sriharikrishna@…>
Date: Thu Mar 10 16:13:41 2016 -0600

R: debug all SWIG and R related bugs.

SWIG 3.0.8 needs the two patches and the generated adolc.R needs
the diff. Then ADOL-C can be imported into the R runtime by
source ("init_adolc.R")

commit 009685847f9d81b97d7efcc85d02aa7325bb6b6a
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 22:05:52 2016 +0100

reflect file rename in for packaging

commit 9c10d2065974fe1afba0284f5970e55f2e7f28b7
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 22:05:04 2016 +0100

add numpy implementation for common drivers

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 08b723756aa4e6ab2e6ff3f6c3d532691c831019
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 22:04:26 2016 +0100

change header extension to .hpp to suppress clang warning

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 109cb5ca0ae4c07592854af2a48cb6935e2e629b
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 20:51:16 2016 +0100

numpy: Be more verbose in error message for mismatch of dimension

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit d76918c8e2306ca2e13665fae7feb183b45652aa
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 19:58:08 2016 +0100

more licensing info

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 075bda30cf9d1a6432a9db4e587d6f3897ec8f1d
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 18:47:47 2016 +0100

compile interface based on choice

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit c368a02f8d44ad772da5659234a8bdeed254ab7c
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 18:21:54 2016 +0100

add numpy support to python swig interface for forward mode

also in swigprocess compatibility with macosx and general bugfixing

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 7ad2746027b650147ec1d2da6567a3e5a7ecb876
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 10 18:13:35 2016 +0100

add conversion for contiguous matrices and tensors

Just a contiguous row major chunk for matrices ins't
good for indexing using 2 indices, we need to convert
it to double and add pointers. Similarly for tensors.

These will do just that, required for supporting numpy
arrays in the python swig interface

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 41cb636430fa636b4e25ab81da73d2e6fd741dec
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 3 17:08:03 2016 +0100

more verbosity for invoking shell commands

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 82ee8be28c090f84da7eb8a1254440d301fc5675
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 3 17:07:43 2016 +0100

add copyright and licensing everywhere

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit e5ec0154c1743d68b28e3fba1864742b23e3715b
Merge: 6ac52a6 a539d68
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 3 16:28:40 2016 +0100

Merge branch 'master' into swig

commit 6ac52a6a45d8fc9c11c9b0c3b1d0399f6537eadc
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Mar 3 16:28:33 2016 +0100

make script more verbose

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit a6ef9a80ea9593ea5f1861d3ff19315b92a75130
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Tue Feb 2 08:09:26 2016 +0100

correction in documentation of sparse functions

commit dbd7a07a1f5bcd6871d9a728f34a3d0679e01e4f
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Dec 17 22:50:11 2015 +0100

add error handling and make script python3 compatible

the swig generated interface may still not be python3 compatible

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 35ee8b63bea8bf7a0dea7af0e85f69336c6702e5
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Dec 17 18:14:12 2015 +0100

define better python operators

just renaming doesn't work. Maybe we'll need something similar in R too
but since I don't know R, I can't say.

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

commit 9654fa7e3b2a83187ee828f2ec9b39e6300ecaa7
Author: Kshitij Kulshreshtha <kshitij@…>
Date: Thu Dec 17 14:34:58 2015 +0100

Add a swig based python and R interface to ADOL-C

Signed-off-by: Kshitij Kulshreshtha <kshitij@…>

File size: 229.0 KB
1% Latex file containing the documentation of ADOL-C
3% Copyright (C) Andrea Walther, Andreas Griewank, Andreas Kowarz,
4%               Hristo Mitev, Sebastian Schlenkrich, Jean Utke, Olaf Vogel
6% This file is part of ADOL-C. This software is provided as open source.
7% Any use, reproduction, or distribution of the software constitutes
8% recipient's acceptance of the terms of the accompanying license file.
17\newdateformat{monthyear}{\monthname\ \THEYEAR}
32\newcommand{\N}{{ {\rm I} \kern -.225em {\rm N} }}
33\newcommand{\R}{{ {\rm I} \kern -.225em {\rm R} }}
34\newcommand{\T}{{ {\rm I} \kern -.425em {\rm T} }}
50{\Large {\bf ADOL-C:}} 
51\footnote{The development of earlier versions was supported by the Office of
52  Scientific Computing, U.S. Department of Energy, the NSF, and the Deutsche
53  Forschungsgemeinschaft. During the development of the current
54  version Andrea Walther and Andreas Kowarz were supported by the
55  grant Wa 1607/2-1 of the Deutsche Forschungsgemeinschaft} 
56\vspace{0.2in} \\
58{\Large A Package for the Automatic Differentiation}\vspace{0.1in} \\
59{\Large of Algorithms Written in C/C++}\\
61{\large\bf  Version \packageversion, \monthyear\today} \\
63 \mbox{Andrea Walther}\footnote{Institute of Mathematics, University
64   of Paderborn, 33098 Paderborn, Germany} and
65 \mbox{Andreas Griewank}\footnote{Department of Mathematics,
66 Humboldt-Universit\"at zu Berlin, 10099 Berlin, Germany}
71The C++ package ADOL-C described here facilitates the evaluation of
72first and higher derivatives of vector functions that are defined
73by computer programs written in C or C++. The resulting derivative
74evaluation routines may be called from C, C++, Fortran, or any other
75language that can be linked with C.
77The numerical values of derivative vectors are obtained free
78of truncation errors at a small multiple of the run time and
79random access memory required by the given function evaluation program.
80Derivative matrices are obtained by columns, by rows or in sparse format.
81For solution curves defined by ordinary differential equations,
82special routines are provided that evaluate the Taylor coefficient vectors
83and their Jacobians with respect to the current state vector.
84For explicitly or implicitly defined functions derivative tensors are
85obtained with a complexity that grows only quadratically in their
86degree. The derivative calculations involve a possibly substantial but
87always predictable amount of data. Since the data is accessed strictly sequentially
88it can be automatically paged out to external files.
91{\bf Keywords}: Computational Differentiation, Automatic
92         Differentiation,
93         Chain Rule, Overloading, Taylor Coefficients,
94         Gradients, Hessians, Forward Mode, Reverse Mode,
95         Implicit Function Differentiation, Inverse Function Differentiation
99{\bf Abbreviated title}: Automatic differentiation by overloading in C++
111\section{Preparing a Section of C or C++ Code for Differentiation}
117The package \mbox{ADOL-C} 
118utilizes overloading in C++, but the
119user has to know only C. The acronym stands for {\bf A}utomatic
120{\bf D}ifferentiation by {\bf O}ver{\bf L}oading in {\bf C}++.
121In contrast to source transformation approaches, overloading does not generate intermediate
122source code.
123As starting points to retrieve further information on techniques and
124application of automatic differentiation, as well as on other AD
125tools, we refer to the book \cite{GrWa08}. Furthermore, the web page
126\verb= of the AD community forms a rich source
127of further information and pointers.
130ADOL-C facilitates the simultaneous
131evaluation of arbitrarily high directional derivatives and the
132gradients of these Taylor coefficients with respect to all independent
133variables. Relative to the cost of evaluating the underlying function,
134the cost for evaluating any such scalar-vector pair grows as the
135square of the degree of the derivative but is still completely
136independent of the numbers $m$ and $n$.
138This manual is organized as follows. This section explains the
139modifications required to convert undifferentiated code to code that
140compiles with ADOL-C.
141\autoref{tape} covers aspects of the tape of recorded data that ADOL-C uses to
142evaluate arbitrarily high order derivatives. The discussion includes storage
143requirements and the tailoring of certain tape characteristics to fit specific
144user needs. Descriptions of easy-to-use drivers for a  convenient derivative
145evaluation are contained in \autoref{drivers}.
146\autoref{forw_rev_ad} offers a more mathematical characterization of
147the different modes of AD to compute derivatives. At the same time, the
148corresponding drivers of ADOL-C are explained. 
149The overloaded derivative evaluation routines using the forward and the reverse
150mode of AD are explained in \autoref{forw_rev}.
151Advanced differentiation techniques as the optimal checkpointing for
152time integrations, the exploitation of fixed point iterations, the usages
153of external differentiated functions and the differentiation of OpenMP
154parallel programs are described in \autoref{adv_ad}.
155The traceless forward mode is presented in \autoref{traceless}.
156\autoref{install} details the installation and
157use of the ADOL-C package. Finally, \autoref{example} 
158furnishes some example programs that incorporate the ADOL-C package to
159evaluate first and higher-order
160derivatives.  These and other examples are distributed with the ADOL-C
161source code.
162The user should simply refer to them if the more abstract and general
163descriptions of ADOL-C provided in this document do not suffice.
166\subsection{Declaring Active Variables}
170The key ingredient of automatic differentiation by overloading is the
171concept of an {\em active variable}. All variables that may be
172considered as differentiable quantities at some time
173during the program execution must be of an active
174type. ADOL-C uses one
175active scalar type, called {\sf adouble}, whose real part is of the
176standard type {\sf double}.
177Typically, one will declare the independent variables
178and all quantities that directly or indirectly depend on them as
179{\em active}. Other variables that do not depend on the independent
180variables but enter, for example, as parameters, may remain one of the
181{\em passive} types {\sf double, float}, or {\sf int}. There is no
182implicit type conversion from {\sf adouble} to any of these passive
183types; thus, {\bf failure to declare variables as active when they
184depend on other active variables will result in a compile-time error
185message}. In data flow terminology, the set of active variable names
186must contain all its successors in the dependency graph. All components
187of indexed arrays must have the same activity status.
189The real component of an {\sf adouble x} can be extracted as
190{\sf x.value()}. In particular,
191such explicit conversions are needed for the standard output procedure
192{\sf printf}. The output stream operator \boldmath $\ll$ \unboldmath is overloaded such
193that first the real part of an {\sf adouble} and then the string
194``{\sf (a)}" is added to the stream. The input stream operator \boldmath $\gg$ \unboldmath  can
195be used to assign a constant value to an {\sf adouble}.
196Naturally, {\sf adouble}s may be
197components of vectors, matrices, and other arrays, as well as
198members of structures or classes.
200The C++ class {\sf adouble}, its member functions, and the overloaded
201versions of all arithmetic operations, comparison operators, and
202most ANSI C functions are contained in the file \verb=adouble.cpp= and its
203header \verb=<adolc/adouble.h>=. The latter must be included for compilation
204of all program files containing {\sf adouble}s and corresponding
208\subsection{Marking Active Sections}
211All calculations involving active variables that occur between
212the void function calls
214{\sf trace\_on(tag,keep)} \hspace{0.3in} and \hspace{0.3in}
215{\sf trace\_off(file)}
217are recorded on a sequential data set called {\em tape}. Pairs of
218these function calls can appear anywhere in a C++ program, but
219they must not overlap. The nonnegative integer argument {\sf tag} identifies the
220particular tape for subsequent function or derivative evaluations.
221Unless several tapes need to be kept, ${\sf tag} =0$ may be used throughout.
222The optional integer arguments {\sf keep} and
223{\sf file} will be discussed in \autoref{tape}. We will refer to the
224sequence of statements executed between a particular call to
225{\sf trace\_on} and the following call to {\sf trace\_off} as an
226{\em active section} of the code. The same active section may be
227entered repeatedly, and one can successively generate several traces
228on distinct tapes by changing the value of {\sf tag}.
229Both functions {\sf trace\_on} and {\sf trace\_off} are prototyped in
230the header file \verb=<adolc/taputil.h>=, which is included by the header
231\verb=<adolc/adouble.h>= automatically.
233Active sections may contain nested or even recursive calls to functions
234provided by the user. Naturally, their formal and actual parameters
235must have matching types. In particular, the functions must be
236compiled with their active variables declared as
237{\sf adouble}s and with the header file \verb=<adolc/adouble.h>= included. 
238Variables of type {\sf adouble} may be declared outside an active section and need not
239go out of scope before the end of an active section.
240It is not necessary -- though desirable -- that free-store {\sf adouble}s
241allocated within
242an active section be deleted before its completion. The values of all
243{\sf adouble}s that exist at the beginning and end of an active section
244are automatically
245recorded by {\sf trace\_on} and {\sf trace\_off}, respectively.
248\subsection{Selecting Independent and Dependent Variables}
250One or more active variables that are read in or initialized to
251the values of constants or passive variables must be distinguished as
252independent variables. Other active variables that are similarly
253initialized may be considered as temporaries (e.g., a variable that
254accumulates the partial sums of a scalar product after being
255initialized to zero). In order to distinguish an active variable {\sf x} as
256independent, ADOL-C requires an assignment of the form
258{\sf x} \boldmath $\ll=$ \unboldmath {\sf px}\hspace{0.2in}// {\sf px} of any passive numeric type $\enspace .$
260This special initialization ensures that {\sf x.value()} = {\sf px}, and it should
261precede any other assignment to {\sf x}. However, {\sf x} may be reassigned
262other values subsequently. Similarly, one or more active variables {\sf y}
263must be distinguished as dependent by an assignment of the form
265{\sf y \boldmath $\gg=$ \unboldmath py}\hspace{0.2in}// {\sf py} of any  passive type $\enspace ,$ 
267which ensures that {\sf py} = {\sf y.value()} and should not be succeeded
268by any other assignment to {\sf y}. However, a dependent variable {\sf y} 
269may have been assigned other real values previously, and it could even be an
270independent variable as well.  The derivative values calculated after
272completion of an active section always represent {\bf derivatives of the final
273values of the dependent variables with respect to the initial values of the
274independent variables}.
276The order in which the independent and dependent variables are marked
277by the \boldmath $\ll=$ \unboldmath and \boldmath $\gg=$ \unboldmath statements matters crucially for the subsequent
278derivative evaluations. However, these variables do not have to be
279combined into contiguous vectors. ADOL-C counts the number of
280independent and dependent variable specifications within each active
281section and records them in the header of the tape.
284\subsection{A Subprogram as an Active Section} 
286As a generic example let us consider a C(++) function of the form
287shown in \autoref{code1}.
293{\sf void eval(}\= {\sf int n, int m,} \hspace{0.5 in} \=  // number of independents and dependents\\
294\>{\sf  double *x,} \> // independent variable vector \\
295\>{\sf  double *y,} \> // dependent variable vector  \\ 
296\> {\sf int *k, } \> // integer parameters \\ 
297\>{\sf  double *z)}  \> // real parameters \\
298{\sf \{ }\hspace{0.1 in } \=  \> // beginning of function body \\
299\>{\sf double t = 0;}  \> // local variable declaration \\
300\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> // begin of computation \\
301\>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue  \\
302\>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue \\
303\>{\sf  y[m-1] = t/m; }   \> //   end of computation \\
304{\sf  \} } \>  \> // end of function
308\caption{Generic example of a subprogram to be activated}
313If {\sf eval} is to be called from within an active C(++)
314section with {\sf x}
315and {\sf y} as vectors of {\sf adouble}s and the other parameters
316passive, then one merely has to change the type declarations of all
317variables that depend on {\sf x} from {\sf double} or {\sf float} to
318{\sf adouble}. Subsequently, the subprogram must be compiled with the
319header file \verb=<adolc/adouble.h>= included as described
320in \autoref{DecActVar}. Now let us consider the situation when {\sf eval} is
321still to be called with integer and real arguments, possibly from
322a program written in Fortran77, which  does not allow overloading.
324To automatically compute derivatives of the dependent
325variables {\sf y} with respect to the independent variables {\sf x}, we
326can make the body of the function into an active section. For
327example, we may modify the previous program segment
328as in \autoref{adolcexam}.
329The renaming and doubling up of the original independent and dependent
330variable vectors by active counterparts may seem at first a bit clumsy.
331However, this transformation has the advantage that the calling
332sequence and the computational part, i.e., where the function is
333really evaluated, of {\sf eval} remain completely
334unaltered. If the temporary variable {\sf t} had remained a {\sf double},
335the code would not compile, because of a type conflict in the assignment
336following the declaration. More detailed example codes are listed in
343{\sf void eval(} \= {\sf  int n,m,} \hspace{1.0 in}\= // number of independents and dependents\\
344\> {\sf double *px,} \> // independent passive variable vector \\
345\> {\sf double *py,} \> // dependent passive variable vector  \\ 
346\> {\sf int *k,}  \> // integer parameters \\
347\> {\sf double *z)} \> // parameter vector \\
348{\sf \{}\hspace{0.1 in}\= \> // beginning of function body \\
349\>{\sf  short int tag = 0;} \>   // tape array and/or tape file specifier\\
350\>{\sf trace\_on(tag);} \> // start tracing  \\
351\>{\sf adouble *x, *y;} \> // declare active variable pointers \\
352\>{\sf x = new adouble[n];}\>// declare active independent variables \\ 
353\>{\sf y = new adouble[m];} \> // declare active dependent variables \\
354\>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \\
355\>\hspace{0.2in} {\sf x[i] \boldmath $\ll=$ \unboldmath  px[i];} \> // select independent variables \\
356\>{\sf adouble t = 0;}  \> // local variable declaration \\
357     \>{\sf  for (int i=0; i \boldmath $<$ \unboldmath n; i++)} \> //  begin crunch \\
358     \>\hspace{0.2in}{\sf  t += z[i]*x[i];} \> //  continue crunch \\
359     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
360     \>{\sf  $\cdots \cdots \cdots \cdots $} \> // continue crunch \\
361     \>{\sf  y[m-1] = t/m; }   \> //   end crunch as before\\
362     \>{\sf for (int j=0; j \boldmath $<$ \unboldmath m; j++)} \\
363     \>\hspace{0.2in}{\sf y[j] \boldmath $\gg=$ \unboldmath py[j];} \> // select dependent variables \\
364     \>{\sf  delete[] y;} \>// delete dependent active variables \\
365     \>{\sf  delete[] x;} \>// delete independent active variables \\
366     \>{\sf trace\_off();} \> // complete tape \\
367{\sf  \}}   \>\> // end of function
370\caption{Activated version of the code listed in \autoref{code1}}
375\subsection{Overloaded Operators and Functions}
378As in the subprogram discussed above, the actual computational
379statements of a C(++) code need not be altered for the purposes of
380automatic differentiation. All arithmetic operations, as well as the
381comparison and assignment operators, are overloaded, so any or all of
382their operands can be an active variable. An {\sf adouble x} occurring
383in a comparison operator is effectively replaced by its real value
384{\sf x.value()}. Most functions contained in the ANSI C standard for
385the math library are overloaded for active arguments. The only
386exceptions are the non-differentiable functions {\sf fmod} and
387{\sf modf}. Otherwise, legitimate C code in active sections can remain
388completely unchanged, provided the direct output of active variables
389is avoided. The rest of this subsection may be skipped by first time
390users who are not worried about marginal issues of differentiability
391and efficiency.
393The modulus {\sf fabs(x)} is everywhere Lipschitz continuous but not
394properly differentiable at the origin, which raises the question of
395how this exception ought to be handled. Fortunately, one can easily
396see that {\sf fabs(x)} and all its compositions with smooth
397functions are still directionally differentiable. These
398directional derivatives of arbitrary order can be propagated in the
399forward mode without any ambiguity. In other words, the forward mode as
400implemented in ADOL-C  computes Gateaux derivatives
401in certain directions, which reduce to Fr\'echet derivatives only
402if the dependence on the direction is linear. Otherwise,
403the directional derivatives are merely positively homogeneous with
404respect to the scaling of the directions.
405For the reverse mode, ADOL-C sets the derivative of {\sf fabs(x)} at
406the origin somewhat arbitrarily to zero.
408We have defined binary functions {\sf fmin} and {\sf fmax} for {\sf adouble}
409arguments, so that function and derivative values are obtained consistent
410with those of {\sf fabs} according to the identities
412 \min(a,b) = [a+b-|a-b|]/2 \quad {\rm and} \quad
413 \max(a,b) = [a+b+|a-b|]/2 \quad .
415These relations cannot hold if either $a$ or $b$ is infinite, in which
416case {\sf fmin} or {\sf fmax} and their derivatives may still be well
417defined. It should be noted that the directional differentiation of
418{\sf fmin} and {\sf fmax} yields at ties $a=b$ different results from
419the corresponding assignment based on the sign of $a-b$. For example,
420the statement
422 {\sf if (a $<$ b) c = a; else c = b;}
424yields for {\sf a}~=~{\sf b} and {\sf a}$^\prime < $~{\sf b}$^\prime$
425the incorrect directional derivative value
426{\sf c}$^\prime = $~{\sf  b}$^\prime$ rather than the correct
427{\sf c}$^\prime = $~{\sf  a}$^\prime$. Therefore this form of conditional assignment
428should be avoided by use of the function $\sf fmin(a,b)$. There
429are also versions of {\sf fmin} and {\sf fmax} for two passive
430arguments and mixed passive/active arguments are handled by
431implicit conversion.
432On the function class obtained by composing the modulus with real
433analytic functions, the concept of directional differentiation can be
434extended to the propagation of unique one-sided Taylor expansions.
435The branches taken by {\sf fabs, fmin}, and {\sf fmax}, are recorded
436on the tape.
438The functions {\sf sqrt}, {\sf pow}, and some inverse trigonometric
439functions have infinite slopes at the boundary points of their domains.
440At these marginal points the derivatives are set by ADOL-C to
441either {\sf $\pm$InfVal}, 0
442or {\sf NoNum}, where {\sf InfVal} and {\sf NoNum} are user-defined
443parameters, see \autoref{Customizing}.
444On IEEE machines {\sf InfVal} can be set to the special value
445{\sf Inf}~=~$1.0/0.0$ and {\sf NoNum} to {\sf NaN}~=~$0.0/0.0$.
446For example, at {\sf a}~=~0 the first derivative {\sf b}$^\prime$ 
447of {\sf b}~=~{\sf sqrt(a)} is set to
449{\sf b}^\prime = \left\{
451\sf InfVal&\mbox{if}\;\; {\sf a}^\prime>0  \\
4520&\mbox{if}\;\;{\sf a}^\prime =0 \\
453\sf NoNum&\mbox{if}\;\;{\sf a}^\prime <0\\
454\end{array} \right\enspace .
456In other words, we consider {\sf a} and
457consequently {\sf b}  as a constant when {\sf a}$^\prime$ or more generally
458all computed Taylor coefficients are zero.
460The general power function ${\sf pow(x,y)=x^y}$ is computed whenever
461it is defined for the corresponding {\sf double} arguments. If {\sf x} is
462negative, however, the partial derivative with respect to an integral exponent
463is set to zero.
464%Similarly, the partial of {\bf pow} with respect to both arguments
465%is set to zero at the origin, where both arguments vanish.     
466The derivatives of the step functions
467{\sf floor}, {\sf ceil}, {\sf frexp}, and {\sf ldexp} are set to zero at all
468arguments {\sf x}. The result values of the step functions
469are recorded on the tape and can later be checked to recognize
470whether a step to another level was taken during a forward sweep
471at different arguments than at taping time.
473Some C implementations supply other special
474functions, in particular the error function {\sf erf(x)}. For the
475latter, we have included an {\sf adouble} version in \verb=<adouble.cpp>=, which
476has been commented out for systems on which the {\sf double} valued version
477is not available. The increment and decrement operators {\sf ++}, \boldmath $--$ \unboldmath (prefix and
478postfix) are available for {\sf adouble}s.
480% XXX: Vector and matrix class have to be reimplemented !!!
482% and also the
483%active subscripts described in the \autoref{act_subscr}.
484Ambiguous statements like {\sf a += a++;} must be
485avoided because the compiler may sequence the evaluation of the
487expression differently from the original in terms of {\sf double}s.
489As we have indicated above, all subroutines called with active arguments
490must be modified or suitably overloaded. The simplest procedure is
491to declare the local variables of the function as active so that
492their internal calculations are also recorded on the tape.
493Unfortunately, this approach is likely to be unnecessarily inefficient
494and inaccurate if the original subroutine evaluates a special function
495that is defined as the solution of a particular mathematical problem.
496The most important examples are implicit functions, quadratures,
497and solutions of ordinary differential equations. Often
498the numerical methods for evaluating such special functions are
499elaborate, and their internal workings are not at all differentiable in
500the data. Rather than differentiating through such an adaptive
501procedure, one can obtain first and higher derivatives directly from
502the mathematical definition of the special function. Currently this
503direct approach has been implemented only for user-supplied quadratures
504as described in \autoref{quadrat}.
507\subsection{Reusing the Tape for Arbitrary Input Values}
510In some situations it may be desirable to calculate the value and
511derivatives of a function at arbitrary arguments by using a tape of
512the function evaluation at one argument and reevaluating the
513function  and its derivatives using the given ADOL-C
514routines. This approach can
515significantly reduce run times, and it
516also allows to port problem functions, in the form of the 
517corresponding tape files, into a computing environment that
518does not support C++ but does support C or Fortran. 
519Therefore, the routines provided by ADOL-C for the evaluation of derivatives
520can be used to at arguments $x$ other than the
521point at which the tape was generated, provided there are
522no user defined quadratures and all comparisons involving
523{\sf adouble}s yield the same result. The last condition
524implies that the control flow is unaltered by the change
525of the independent variable values. Therefore, this sufficient
526condition is tested by ADOL-C and if it is not met
527the ADOL-C routine called for derivative calculations indicates this
528contingency through its return value. Currently, there are six return values,
529see \autoref{retvalues}.
533 +3 &
536The function is locally analytic.
538\end{minipage} \\ \hline
539 +2 &
542The function is locally analytic but the sparsity
543structure (compared to the situation at the  taping point)
544may have changed, e.g. while at taping arguments
545{\sf fmax(a,b)} returned {\sf a} we get {\sf b} at
546the argument currently used.
548\end{minipage} \\ \hline
549 +1 &
552At least one of the functions {\sf fmin}, {\sf fmax} or {\sf fabs}
553is  evaluated at a tie or zero, respectively.  Hence, the function to be differentiated is
554Lipschitz-continuous but possibly non-differentiable.
556\end{minipage} \\ \hline
557 0 &
560Some arithmetic comparison involving {\sf adouble}s yields a tie.
561Hence, the function to be differentiated  may be discontinuous.
563\end{minipage} \\ \hline
564 -1 &
567An {\sf adouble} comparison yields different results
568from the evaluation point at which the tape was generated.
570\end{minipage} \\ \hline
571 -2 &
574The argument of a user-defined quadrature has changed
575from the evaluation point at which the tape was generated.
577\end{minipage} \\ \hline
579\caption{Description of return values}
585\caption{Return values around the taping point}
589In \autoref{fi:tap_point} these return values are illustrated.
590If the user finds the return value of an ADOL-C routine to be negative the
591taping process simply has to be repeated by executing the active section again.
592The crux of the problem lies in the fact that the tape records only
593the operations that are executed during one particular evaluation of the
595It also has no way to evaluate integrals since the corresponding
596quadratures are never recorded on the tape.
597Therefore, when there are user-defined quadratures the retaping is necessary at each
598new point. If there are only branches conditioned on {\sf adouble}
599comparisons one may hope that re-taping becomes unnecessary when
600the points settle down in some small neighborhood, as one would
601expect for example in an iterative equation solver.
604\subsection{Conditional Assignments}
607It appears unsatisfactory that, for example, a simple table lookup
608of some physical property forces the re-recording of a possibly
609much larger calculation. However, the basic philosophy of ADOL-C
610is to overload arithmetic, rather than to generate a new program
611with jumps between ``instructions'', which would destroy the
612strictly sequential tape access and
613require the infusion of substantial compiler technology.
614Therefore, we introduce the two constructs of conditional
615assignments and active integers as partial remedies to the
616branching problem.
618In many cases, the functionality of branches
619can be replaced by conditional assignments. 
620For this purpose, we provide a special function called
621{\sf condassign(a,b,c,d)}. Its calling sequence corresponds to the
622syntax of the conditional assignment
624    {\sf a = (b \boldmath $>$ \unboldmath 0) ? c : d;} 
626which C++ inherited from C. However, here the arguments are restricted to be
627active or passive scalar arguments, and all expression arguments
628are evaluated before the test on {\sf  b}, which is different from
629the usual conditional assignment or the code segment.
631Suppose the original program contains the code segment
633{\sf if (b \boldmath $>$ \unboldmath 0) a = c; else a = d;}\\
635Here, only one of the expressions (or, more generally, program blocks)
636{\sf c} and {\sf d} is evaluated, which exactly constitutes the problem
637for ADOL-C. To obtain the correct value {\sf a} with ADOL-C, one
638may first execute both branches and then pick either {\sf c}
639or {\sf d} using
640{\sf condassign(a,b,c,d)}. To maintain
641consistency with the original code, one has to make sure
642that the two branches do not have any side effects that can
643interfere with each other or may be important for subsequent
644calculations. Furthermore the test parameter {\sf b} has to be an
645{\sf adouble} or an {\sf adouble} expression. Otherwise the
646test condition {\sf b} is recorded on the tape as a {\em constant} with its
647run time value. Thus the original dependency of {\sf b} on
648active variables gets lost, for instance if {\sf b} is a comparison
649expression, see \autoref{OverOper}.
650If there is no {\sf else} part in a conditional assignment, one may call
651the three argument version
652{\sf condassign(a,b,c)}, which
653is logically equivalent to {\sf condassign(a,b,c,a)} in that
654nothing happens if {\sf b} is non-positive. 
655The header file \verb=<adolc/adouble.h>=
656contains also corresponding definitions of
657{\sf condassign(a,b,c,d)} 
658and {\sf condassign(a,b,c)} for
659passive {\sf double} arguments so that the modified code
660without any differentiation can be tested
661for correctness.
663A generalization of this concept for more than two branches, e.g.,
664akin to a \texttt{switch} statement or a cascade of \texttt{if...else if}, may be done by enabling
665\textsf{ADOLC\_ADVANCED\_BRANCHING} and performing selection on
666elements of an \texttt{advector} with active indices.
669\subsection{Step-by-Step Modification Procedure}
671To prepare a section of given C or C++ code for automatic
672differentiation as described above, one applies the following step-by-step procedure.
675Use the statements {\sf trace\_on(tag)} or {\sf trace\_on(tag,keep)}
676and {\sf trace\_off()} or {\sf trace\_off(file)} to mark the
677beginning and end of the active section.
679Select the set of active variables, and change their type from
680{\sf double} or {\sf float} to {\sf adouble}.
682Select a sequence of independent variables, and initialize them with
683\boldmath $\ll=$ \unboldmath assignments from passive variables or vectors.
685Select a sequence of dependent variables among the active variables,
686and pass their final values to passive variable or vectors thereof
687by \boldmath $\gg=$ \unboldmath assignments.
689Compile the codes after including the header file \verb=<adolc/adouble.h>=.
691Typically, the first compilation will detect several type conflicts
692-- usually attempts to convert from active to passive
693variables or to perform standard I/O of active variables.
694Since all standard
695C programs can be activated by a mechanical application of the
696procedure above, the following section is of importance
697only to advanced users.
700\section{Numbering the Tapes and Controlling the Buffer}
703The trace generated by the execution of an active section may stay
704within a triplet of internal arrays or it may be written out
705to three corresponding files. We will refer to these triplets as the
706tape array or tape file, in general tape, which may subsequently be
707used to evaluate the
708underlying function and its derivatives at the original point or at
709alternative arguments. If the active section involves user-defined
710quadratures it must be executed and
711re-taped at each new argument. Similarly, if conditions on
712{\sf adouble} values lead to a different program branch being taken at
713a new argument the evaluation process also needs to be re-taped at the
714new point. Otherwise, direct evaluation from
715the tape by the routine {\sf function} (\autoref{optdrivers}) is
716likely to be
717faster. The use of quadratures and the results of all comparisons on
718{\sf adouble}s are recorded on the tape so that {\sf function} and other
719forward routines stop and  return appropriate flags if their use without
720prior re-taping is unsafe. To avoid any re-taping certain types of
721branches can be recorded on the tape through
722the use of conditional assignments 
723described before in \autoref{condassign}.
725Several tapes may be generated and kept simultaneously.
726A tape array is used as a triplet of buffers or a tape file is generated if
727the length of any of the buffers exceeds the maximal array lengths of
728{\sf OBUFSIZE}, {\sf VBUFSIZE} or {\sf LBUFSIZE}. These parameters are
729defined in the header file \verb=<adolc/internal/usrparms.h>=
730and may be adjusted by the user in the header file before compiling
731the ADOL-C library, or on runtime using a file named \verb=.adolcrc=.
732Lines in this file must have the form
736where the quotation marks are mandatory.
737The filesystem folder, where the tapes files may be written to disk,
738can be changed by changing the definition of {\sf TAPE\_DIR} in
739the header file \verb=<adolc/dvlparms.h>= before
740compiling the ADOL-C library, or on runtime by defining {\sf
741  TAPE\_DIR} in the \verb=.adolcrc= file. By default this is defined
742to be the present working directory (\verb=.=).
744For simple usage, {\sf trace\_on} may be called with only the tape
745{\sf tag} as argument, and {\sf trace\_off} may be called
746without argument. The optional integer argument {\sf keep} of
747{\sf trace\_on} determines whether the numerical values of all
748active variables are recorded in a buffered temporary array or file
749called the taylor stack.
750This option takes effect if
751{\sf keep} = 1 and prepares the scene for an immediately following
752gradient evaluation by a call to a routine implementing the reverse mode
753as described in the \autoref{forw_rev_ad} and \autoref{forw_rev}. A
754file is used instead of an array if the size exceeds the maximal array
755length of {\sf TBUFSIZE} defined in \verb=<adolc/internal/usrparms.h>= and may
756be adjusted in the same way like the other buffer sizes mentioned above.
757Alternatively, gradients may be evaluated by a call
758to {\sf gradient}, which includes a preparatory forward sweep
759for the creation of the temporary file. If omitted, the argument
760{\sf  keep} defaults to 0, so that no temporary
761taylor stack file is generated.
763By setting the optional integer argument {\sf file} of
764{\sf  trace\_off} to 1, the user may force a numbered  tape
765file to be written even if the tape array (buffer) does not overflow.
766If the argument {\sf file} is omitted, it
767defaults to 0, so that the tape array is written onto a tape file only
768if the length of any of the buffers exceeds {\sf [OLVT]BUFSIZE} elements.
770After the execution of an active section, if a tape file was generated, i.e.,
771if the length of some buffer exceeded {\sf [OLVT]BUFSIZE} elements or if the
772argument {\sf file} of {\sf trace\_off} was set to 1, the files will be
773saved in the directory defined as {\sf ADOLC\_TAPE\_DIR} (by default
774the current working directory) under filenames formed by
775the strings {\sf ADOLC\_OPERATIONS\_NAME}, {\sf
777  ADOLC\_TAYLORS\_NAME} defined in
778the header file \verb=<adolc/dvlparms.h>= appended with the number
779given as the {\sf tag} argument to {\sf trace\_on} and have the
780extension {\sf .tap}.
782 Later, all problem-independent routines
783like {\sf gradient}, {\sf jacobian}, {\sf forward}, {\sf reverse}, and others
784expect as first argument a {\sf tag} to determine
785the tape on which their respective computational task is to be performed.
786By calling {\sf trace\_on} with different tape {\sf tag}s, one can create
787several tapes for various function evaluations and subsequently perform
788function and derivative evaluations on one or more of them.
790For example, suppose one wishes to calculate for two smooth functions
791$f_1(x)$ and $f_2(x)$ 
793   f(x) = \max \{f_1(x) ,f_2(x)\},\qquad \nabla f(x),
795and possibly higher derivatives where the two functions do not tie.
796Provided $f_1$ and $f_2$ are evaluated in two separate active sections,
797one can generate two different tapes by calling {\sf trace\_on} with
798{\sf tag} = 1 and {\sf tag} = 2 at the beginning of the respective active
800Subsequently, one can decide whether $f(x)=f_1(x)$ or $f(x)=f_2(x)$ at the
801current argument and then evaluate the gradient $\nabla f(x)$ by calling
802{\sf gradient} with the appropriate argument value {\sf tag} = 1 or
803{\sf tag} = 2.
806\subsection{Examining the Tape and Predicting Storage Requirements }
809At any point in the program, one may call the routine
811{\sf void tapestats(unsigned short tag, size\_t* counts)}
813with {\sf counts} beeing an array of at least eleven integers.
814The first argument {\sf tag} specifies the particular tape of
815interest. The components of {\sf counts} represent
818{\sf counts[0]}: & the number of independents, i.e.~calls to \boldmath $\ll=$ \unboldmath, \\
819{\sf counts[1]}: & the number of dependents, i.e.~calls to \boldmath $\gg=$ \unboldmath,\\ 
820{\sf counts[2]}: & the maximal number of live active variables,\\
821{\sf counts[3]}: & the size of taylor stack (number of overwrites),\\
822{\sf counts[4]}: & the buffer size (a multiple of eight),
827{\sf counts[5]}: & the total number of operations recorded,\\
828{\sf counts[6-13]}: & other internal information about the tape.
831The values {\sf maxlive} = {\sf counts[2]} and {\sf tssize} = {\sf counts[3]} 
832determine the temporary
833storage requirements during calls to the routines
834implementing the forward and the reverse mode.
835For a certain degree {\sf deg} $\geq$ 0, the scalar version of the
836forward mode involves apart from the tape buffers an array of
837 $(${\sf deg}$+1)*${\sf maxlive} {\sf double}s in
838core and, in addition, a sequential data set called the value stack
839of {\sf tssize}$*${\sf keep} {\sf revreal}s if called with the
840option {\sf keep} $>$ 0. Here
841the type {\sf revreal} is defined as {\sf double} or {\sf float}. The latter choice halves the storage
842requirement for the sequential data set, which stays in core if
843its length is less than {\sf TBUFSIZE} bytes and is otherwise written
844out to a temporary file. The parameter {\sf TBUFSIZE} is defined in the header file \verb=<adolc/internal/usrparms.h>=.
845The drawback of the economical
846{\sf revreal} = {\sf float} choice is that subsequent calls to reverse mode implementations
847yield gradients and other adjoint vectors only in single-precision
848accuracy. This may be acceptable if the adjoint vectors
849represent rows of a Jacobian that is  used for the calculation of
850Newton steps. In its scalar version, the reverse mode implementation involves
851the same number of {\sf double}s and twice as many {\sf revreal}s as the
852forward mode implementation.
853The storage requirements of the vector versions of the forward mode and
854reverse mode implementation are equal to that of the scalar versions multiplied by
855the vector length.
858\subsection{Customizing ADOL-C}
861Based on the information provided by the routine {\sf tapestats}, the user may alter the
862following types and constant dimensions in the header file \verb=<adolc/internal/usrparms.h>=
863to suit his problem and environment.
866\item[{\sf OBUFSIZE}, {\sf LBUFSIZE}, {\sf VBUFSIZE}{\rm :}] These integer determines the length of
867in\-ter\-nal buf\-fers (default: 524$\,$288). If the buffers are large enough to accommodate all
868required data, any file access is avoided unless {\sf trace\_off}
869is called with a positive argument. This desirable situation can
870be achieved for many problem functions with an execution trace of moderate
871size. Primarily these values occur as an argument
872to {\sf malloc}, so that setting it unnecessarily large may have no
873ill effects, unless the operating system prohibits or penalizes large
874array allocations. It is however recommended to leave the values in
875\texttt{<adolc/internal/usrparms.h>} unchanged and set them using the
876\texttt{.adolcrc} file in the current working directory at runtime.
878\item[{\sf TBUFSIZE}{\rm :}] This integer determines the length of the
879in\-ter\-nal buf\-fer for a taylor stack (default: 524$\,$288).
881\item[{\sf TBUFNUM}{\rm :}] This integer determines the maximal number of taylor stacks (default: 32).
883\item[{\sf fint}{\rm :}] The integer data type used by Fortran callable versions of functions.
885\item[{\sf fdouble}{\rm :}] The floating point data type used by Fortran callable versions of functions.
887\item[{\sf inf\_num}{\rm :}] This together with {\sf inf\_den}
888sets the ``vertical'' slope {\sf InfVal} = {\sf inf\_num/inf\_den} 
889of special functions at the boundaries of their domains (default: {\sf inf\_num} = 1.0). On IEEE machines
890the default setting produces the standard {\sf Inf}. On non-IEEE machines
891change these values to produce a small {\sf InfVal} value and compare
892the results of two forward sweeps with different {\sf InfVal} settings
893to detect a ``vertical'' slope.
895\item[{\sf inf\_den}{\rm :}] See {\sf inf\_num} (default: 0.0).
897\item[{\sf non\_num}{\rm :}] This together with {\sf non\_den} 
898sets the mathematically
899undefined derivative value {\sf NoNum} = {\sf non\_num/non\_den}
900of special functions at the boundaries of their domains (default: {\sf non\_num} = 0.0). On IEEE machines
901the default setting produces the standard {\sf NaN}. On non-IEEE machines
902change these values to produce a small {\sf NoNum} value and compare
903the results of two forward sweeps with different {\sf NoNum} settings
904to detect the occurrence of undefined derivative values.
906\item[{\sf non\_den}{\rm :}] See {\sf non\_num} (default: 0.0).
908\item[{\sf ADOLC\_EPS}{\rm :}] For testing on small numbers to avoid overflows (default: 10E-20).
910\item[{\sf DIAG\_OUT}{\rm :}] File identifier used as standard output for ADOL-C diagnostics (default: stdout).
913The following types and options may be set using the command-line
914options of the \texttt{./configure} script.
917\item[{\sf locint}{\rm :}] The range of the integer type
918{\sf locint} determines how many {\sf adouble}s can be simultaneously
919alive (default: {\sf unsigned int}).  In extreme cases when there are more than $2^{32}$ {\sf adouble}s
920alive at any one time, the type {\sf locint} must be changed to
921 {\sf unsigned long}. This can be done by passing
922 \texttt{--enable-ulong} to \texttt{./configure}.
924\item[{\sf revreal}{\rm :}] The choice of this floating-point type
925trades accuracy with storage for reverse sweeps (default: {\sf double}). While functions
926and their derivatives are always evaluated in double precision
927during forward sweeps, gradients and other adjoint vectors are obtained
928with the precision determined by the type {\sf revreal}. The less
929accurate choice {\sf revreal} = {\sf float} nearly halves the
930storage requirement during reverse sweeps. This can be done by passing
931\texttt{--disable-double} to \texttt{./configure}.
933\item[{\sf ATRIG\_ERF}{\rm :}] The overloaded versions of the inverse
934  hyperbolic functions and the error function are enabled (default:
935  undefined) by passing \texttt{--enable-atrig-erf}
936  to \texttt{./configure}
938\item[{\sf ADOLC\_USE\_CALLOC}{\rm :}] Selects the memory allocation routine
939  used by ADOL-C. {\sf Malloc} will be used if this variable is
940  undefined. {\sf ADOLC\_USE\_CALLOC} is defined by default to avoid incorrect
941  result caused by uninitialized memory. It can be set undefined by
942  passing \texttt{--disable-use-calloc} to \texttt{./configure}.
944\item[{\sf ADOLC\_ADVANCED\_BRANCHING}{\rm :}] Enables routines
945  required for automatic branch selection (default: disabled). The
946  boolean valued comparison operators with two \texttt{adouble} type
947  arguments will not return boolean values anymore and may not be used
948  in branch control statements (\texttt{if}, \texttt{while}, \texttt{for}
949  etc.). Instead conditional assignments using \texttt{condassign} or
950  selection operations on elements of \texttt{advector} type should be
951  used. Enabling this option and rewriting the function evaluation
952  using \texttt{condassign} or selections of \texttt{advector}
953  elements will prevent the need for retracing the function at branch
954  switches. This can be enabled by passing
955  \texttt{--enable-advanced-branching} to \texttt{./configure}.
959\subsection{Warnings and Suggestions for Improved Efficiency}
962Since the type {\sf adouble} has a nontrivial constructor,
963the mere declaration of large {\sf adouble} arrays may take up
964considerable run time. The user should be warned against
965the usual Fortran practice of declaring fixed-size arrays
966that can accommodate the largest possible case of an evaluation program
967with variable dimensions. If such programs are converted to or written
968in C, the overloading in combination with ADOL-C will lead to very
969large run time increases for comparatively small values of the
970problem dimension, because the actual computation is completely
971dominated by the construction of the large {\sf adouble} arrays.
972The user is advised to
973create dynamic arrays of
974{\sf adouble}s by using the C++ operator {\sf new} and to destroy them
975using {\sf delete}. For storage efficiency it is desirable that
976dynamic objects are created and destroyed in a last-in-first-out
979Whenever an {\sf adouble} is declared, the constructor for the type
980{\sf adouble} assigns it a nominal address, which we will refer to as
981its  {\em location}.  The location is of the type {\sf locint} defined
982in the header file \verb=<adolc/internal/usrparms.h>=. Active vectors occupy
983a range of contiguous locations. As long as the program execution
984never involves more than 65$\,$536 active variables, the type {\sf locint}
985may be defined as {\sf unsigned short}. Otherwise, the range may be
986extended by defining {\sf locint} as {\sf (unsigned) int} or
987{\sf (unsigned) long}, which may nearly double
988the overall mass storage requirement. Sometimes one can avoid exceeding
989the accessible range of {\sf unsigned short}s by using more local variables and deleting
990{\sf adouble}s  created by the new operator in a
992fashion.  When memory for {\sf adouble}s is requested through a call to
993{\sf malloc()} or other related C memory-allocating
994functions, the storage for these {\sf adouble}s is allocated; however, the
995C++ {\sf adouble} constructor is never called.  The newly defined
996{\sf adouble}s are never assigned a location and are not counted in
997the stack of live variables. Thus, any results depending upon these
998pseudo-{\sf adouble}s will be incorrect. For these reasons {\bf DO NOT use
999  malloc() and related C memory-allocating
1000functions when declaring adoubles (see the following paragraph).}
1002% XXX: Vector and matrix class have to be reimplemented !!!
1004%The same point applies, of course,
1005% for active vectors.
1007When an {\sf adouble}
1009% XXX: Vector and matrix class have to be reimplemented !!!
1011% or {\bf adoublev}
1012goes out of
1013scope or is explicitly deleted, the destructor notices that its
1014location(s) may be
1015freed for subsequent (nominal) reallocation. In general, this is not done
1016immediately but is delayed until the locations to be deallocated form a
1017contiguous tail of all locations currently being used. 
1019 As a consequence of this allocation scheme, the currently
1020alive {\sf adouble} locations always form a contiguous range of integers
1021that grows and shrinks like a stack. Newly declared {\sf adouble}s are
1022placed on the top so that vectors of {\sf adouble}s obtain a contiguous
1023range of locations. While the C++ compiler can be expected to construct
1024and destruct automatic variables in a last-in-first-out fashion, the
1025user may upset this desirable pattern by deleting free-store {\sf adouble}s
1026too early or too late. Then the {\sf adouble} stack may grow
1027unnecessarily, but the numerical results will still be
1028correct, unless an exception occurs because the range of {\sf locint}
1029is exceeded. In general, free-store {\sf adouble}s
1031% XXX: Vector and matrix class have to be reimplemented !!!
1033%and {\bf adoublev}s
1034should be deleted in a last-in-first-out fashion toward the end of
1035the program block in which they were created.
1036When this pattern is maintained, the maximum number of
1037{\sf adouble}s alive and, as a consequence, the
1038randomly accessed storage space
1039of the derivative evaluation routines is bounded by a
1040small multiple of the memory used in the relevant section of the
1041original program. Failure to delete dynamically allocated {\sf adouble}s
1042may cause that the  maximal number of {\sf adouble}s alive at one time will be exceeded
1043if the same active section is called repeatedly. The same effect
1044occurs if static {\sf adouble}s are used.
1046To avoid the storage and manipulation of structurally
1047trivial derivative values, one should pay careful attention to
1048the naming of variables. Ideally, the intermediate
1049values generated during the evaluation of a vector function
1050should be assigned to program variables that are
1051consistently either active or passive, in that all their values
1052either are or are not dependent on the independent variables
1053in a nontrivial way. For example, this rule is violated if a temporary
1054variable is successively used to accumulate inner products involving
1055first only passive and later active arrays. Then the first inner
1056product and all its successors in the data dependency graph become
1057artificially active and the derivative evaluation routines
1058described later will waste
1059time allocating and propagating
1060trivial or useless derivatives. Sometimes even values that do
1061depend on the independent variables may be of only transitory
1062importance and may not affect the dependent variables. For example,
1063this is true for multipliers that are used to scale linear
1064equations, but whose values do not influence the dependent
1065variables in a mathematical sense. Such dead-end variables
1066can be deactivated by the use of the {\sf value} function, which
1067converts {\sf adouble}s to {\sf double}s. The deleterious effects
1068of unnecessary activity are partly alleviated by run time
1069activity flags in the derivative routine
1070{\sf hov\_reverse} presented in \autoref{forw_rev_ad}.
1072The {\sf adouble} default constructor sets to zero the associated value.
1073This implies a certain overhead that may seem unnecessary when no initial value
1074is actually given, however,
1075the implicit initialization of arrays from a partial value list is the only legitimate construct (known to us) that requires this behavior.
1076An array instantiation such as
1078\sf double x[3]=\{2.0\};
1080will initialize {\sf x[0]} to {\sf 2.0} and initialize (implicitly) the remaining array elements
1081{\sf x[1]} and {\sf x[2]}  to {\sf 0.0}. According to the C++ standard the array element  construction of
1082the type changed instantiation
1084\sf adouble x[3]=\{2.0\};
1086will use the constructor {\sf adouble(const double\&);} for {\sf x[0]} passing in {\sf 2.0} but
1087will call the {\sf adouble} default constructor {\sf x[1]} and {\sf x[2]} leaving these array
1088elements uninitialized {\em unless} the default constructor does implement the initialization to
1090The C++ constructor syntax does not provide a means to  distinguish this implicit initialization from the declaration of any simple uninitialized variable.
1091If the user can ascertain the absence of array instantiations such as the above then one can 
1092configure ADOL-C with the \verb=--disable-stdczero= option , see \autoref{genlib}, to
1093avoid the overhead of these initializations. 
1099\section{Easy-To-Use Drivers}
1102For the convenience of the user, ADOL-C provides several
1103easy-to-use drivers that compute the most frequently required
1104derivative objects. Throughout, we assume that after the execution of an
1105active section, the corresponding tape with the identifier {\sf tag}
1106contains a detailed record of the computational process by which the
1107final values $y$ of the dependent variables were obtained from the
1108values $x$ of the independent variables. We will denote this functional
1109relation between the input variables $x$ and the output variables $y$ by
1111F : \R^n \mapsto \R^m, \qquad x \rightarrow F(x) \equiv y.
1113The return value of all drivers presented in this section
1114indicate the validity of the tape as explained in \autoref{reuse_tape}.
1115The presented drivers are all C functions and therefore can be used within
1116C and C++ programs. Some Fortran-callable companions can be found
1117in the appropriate header files.
1120\subsection{Drivers for Optimization and Nonlinear Equations}
1124The drivers provided for solving optimization problems and nonlinear
1125equations are prototyped in the header file \verb=<adolc/drivers/drivers.h>=,
1126which is included automatically by the global header file \verb=<adolc/adolc.h>=
1127(see \autoref{ssec:DesIH}).
1129The routine {\sf function} allows to evaluate the desired function from
1130the tape instead of executing the corresponding source code:
1133\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1134\>{\sf int function(tag,m,n,x,y)}\\
1135\>{\sf short int tag;}         \> // tape identification \\
1136\>{\sf int m;}                 \> // number of dependent variables $m$\\
1137\>{\sf int n;}                 \> // number of independent variables $n$\\
1138\>{\sf double x[n];}           \> // independent vector $x$ \\
1139\>{\sf double y[m];}           \> // dependent vector $y=F(x)$ 
1142If the original evaluation program is available this double version
1143should be used to compute the function value in order to avoid the
1144interpretative overhead. 
1146For the calculation of whole derivative vectors and matrices up to order
11472 there are the following procedures:
1150\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1151\>{\sf int gradient(tag,n,x,g)}\\
1152\>{\sf short int tag;}         \> // tape identification \\
1153\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1154\>{\sf double x[n];}           \> // independent vector $x$ \\
1155\>{\sf double g[n];}           \> // resulting gradient $\nabla F(x)$
1159\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1160\>{\sf int jacobian(tag,m,n,x,J)}\\
1161\>{\sf short int tag;}         \> // tape identification \\
1162\>{\sf int m;}                 \> // number of dependent variables $m$\\
1163\>{\sf int n;}                 \> // number of independent variables $n$\\
1164\>{\sf double x[n];}           \> // independent vector $x$ \\
1165\>{\sf double J[m][n];}        \> // resulting Jacobian $F^\prime (x)$
1169\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1170\>{\sf int hessian(tag,n,x,H)}\\
1171\>{\sf short int tag;}         \> // tape identification \\
1172\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1173\>{\sf double x[n];}           \> // independent vector $x$ \\
1174\>{\sf double H[n][n];}        \> // resulting Hessian matrix $\nabla^2F(x)$ 
1177The driver routine {\sf hessian} computes only the lower half of
1178$\nabla^2f(x_0)$ so that all values {\sf H[i][j]} with $j>i$ 
1179of {\sf H} allocated as a square array remain untouched during the call
1180of {\sf hessian}. Hence only $i+1$ {\sf double}s  need to be
1181allocated starting at the position {\sf H[i]}.
1183To use the full capability of automatic differentiation when the
1184product of derivatives with certain weight vectors or directions are needed, ADOL-C offers
1185the following four drivers: 
1188\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1189\>{\sf int vec\_jac(tag,m,n,repeat,x,u,z)}\\
1190\>{\sf short int tag;}         \> // tape identification \\
1191\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1192\>{\sf int n;}                 \> // number of independent variables $n$\\
1193\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1194\>{\sf double x[n];}           \> // independent vector $x$ \\
1195\>{\sf double u[m];}           \> // range weight vector $u$ \\ 
1196\>{\sf double z[n];}           \> // result $z = u^TF^\prime (x)$
1198If a nonzero value of the parameter {\sf repeat} indicates that the
1199routine {\sf vec\_jac} has been called at the same argument immediately
1200before, the internal forward mode evaluation will be skipped and only
1201reverse mode evaluation with the corresponding arguments is executed
1202resulting in a reduced computational complexity of the function {\sf vec\_jac}.
1205\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1206\>{\sf int jac\_vec(tag,m,n,x,v,z)}\\
1207\>{\sf short int tag;}         \> // tape identification \\
1208\>{\sf int m;}                 \> // number of dependent variables $m$\\
1209\>{\sf int n;}                 \> // number of independent variables $n$\\
1210\>{\sf double x[n];}           \> // independent vector $x$\\
1211\>{\sf double v[n];}           \> // tangent vector $v$\\ 
1212\>{\sf double z[m];}           \> // result $z = F^\prime (x)v$
1216\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1217\>{\sf int hess\_vec(tag,n,x,v,z)}\\
1218\>{\sf short int tag;}         \> // tape identification \\
1219\>{\sf int n;}                 \> // number of independent variables $n$\\
1220\>{\sf double x[n];}           \> // independent vector $x$\\
1221\>{\sf double v[n];}           \> // tangent vector $v$\\
1222\>{\sf double z[n];}           \> // result $z = \nabla^2F(x) v$ 
1226\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1227\>{\sf int hess\_mat(tag,n,p,x,V,Z)}\\
1228\>{\sf short int tag;}         \> // tape identification \\
1229\>{\sf int n;}                 \> // number of independent variables $n$\\
1230\>{\sf int p;}                 \> // number of columns in $V$\\
1231\>{\sf double x[n];}           \> // independent vector $x$\\
1232\>{\sf double V[n][p];}        \> // tangent matrix $V$\\
1233\>{\sf double Z[n][p];}        \> // result $Z = \nabla^2F(x) V$ 
1237\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1238\>{\sf int lagra\_hess\_vec(tag,m,n,x,v,u,h)}\\
1239\>{\sf short int tag;}         \> // tape identification \\
1240\>{\sf int m;}                 \> // number of dependent variables $m$\\
1241\>{\sf int n;}                 \> // number of independent variables $n$\\
1242\>{\sf double x[n];}           \> // independent vector $x$\\
1243\>{\sf double v[n];}           \> // tangent vector $v$\\
1244\>{\sf double u[m];}           \> // range weight vector $u$ \\
1245\>{\sf double h[n];}           \> // result $h = u^T\nabla^2F(x) v $
1248The next procedure allows the user to perform Newton steps only
1249having the corresponding tape at hand:
1252\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1253\>{\sf int jac\_solv(tag,n,x,b,mode)} \\
1254\>{\sf short int tag;}         \> // tape identification \\
1255\>{\sf int n;}                 \> // number of independent variables $n$\\
1256\>{\sf double x[n];}           \> // independent vector $x$ as\\
1257\>{\sf double b[n];}           \> // in: right-hand side b, out: result $w$ of
1258$F(x)w = b$\\
1259\>{\sf int mode;}              \> // option to choose different solvers
1262On entry, parameter {\sf b} of the routine {\sf jac\_solv}
1263contains the right-hand side of the equation $F(x)w = b$ to be solved. On exit,
1264{\sf b} equals the solution $w$ of this equation. If {\sf mode} = 0 only
1265the Jacobian of the function
1266given by the tape labeled with {\sf tag} is provided internally.
1267The LU-factorization of this Jacobian is computed for {\sf mode} = 1. The
1268solution of the equation is calculated if {\sf mode} = 2.
1269Hence, it is possible to compute the
1270LU-factorization only once. Then the equation can be solved for several
1271right-hand sides $b$ without calculating the Jacobian and
1272its factorization again. 
1274If the original evaluation code of a function contains neither
1275quadratures nor branches, all drivers described above can be used to
1276evaluate derivatives at any argument in its domain. The same still
1277applies if there are no user defined quadratures and
1278all comparisons  involving {\sf adouble}s have the same result as
1279during taping. If this assumption is falsely made all drivers
1280while internally calling the forward mode evaluation will return the value -1 or -2
1281as already specified in \autoref{reuse_tape}
1285\subsection{Drivers for Ordinary Differential Equations}
1288When $F$ is the right-hand side of an (autonomous) ordinary
1289differential equation 
1291x^\prime(t) \; = \; F(x(t)) , 
1293we must have $m=n$. Along any solution path $x(t)$ its Taylor
1294coefficients $x_j$ at some time, e.g., $t=0$, must satisfy
1295the relation
1297 x_{i+1} = \frac{1}{1+i} y_i.
1299with the $y_j$ the Taylor coefficients of its derivative $y(t)=x^\prime(t)$, namely,
1301 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
1303defined by an autonomous right-hand side $F$ recorded on the tape.
1304Using this relation, one can generate the Taylor coefficients $x_i$,
1305$i \le deg$,
1306recursively from the current point $x_0$. This task is achieved by the
1307driver routine {\sf forode} defined as follows:
1310\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1311\>{\sf int forode(tag,n,tau,dol,deg,X)}\\
1312\>{\sf short int tag;}         \> // tape identification \\
1313\>{\sf int n;}                 \> // number of state variables $n$\\
1314\>{\sf double tau;}            \> // scaling parameter\\
1315\>{\sf int dol;}               \> // degree on previous call\\
1316\>{\sf int deg;}               \> // degree on current call\\
1317\>{\sf double X[n][deg+1];}    \> // Taylor coefficient vector $X$
1320If {\sf dol} is positive, it is assumed that {\sf forode}
1321has been called before at the same point so that all Taylor coefficient
1322vectors up to the {\sf dol}-th are already correct.
1324Subsequently one may call the driver routine {\sf reverse} or corresponding
1325low level routines as explained in the \autoref{forw_rev} and
1326\autoref{forw_rev_ad}, respectively, to compute
1327the family of square matrices {\sf Z[n][n][deg]} defined by
1329Z_j \equiv U\/\frac{\partial y_j}{\partial x_0} \in{I\!\!R}^{q \times n} ,
1331with {\sf double** U}$=I_n$ the identity matrix of order {\sf n}.
1333For the numerical solutions of ordinary differential equations,
1334one may also wish to calculate the Jacobians
1337B_j \; \equiv \; \frac{\mbox{d}x_{j+1}}{\mbox{d} x_0}\;\in\;{I\!\!R}^{n \times n}\, ,
1339which exist provided $F$ is sufficiently smooth. These matrices can
1340be obtained from the partial derivatives $\partial y_i/\partial x_0$
1341by an appropriate version of the chain rule.
1342To compute the total derivatives $B = (B_j)_{0\leq j <d}$
1343defined in \eqref{eq:bees}, one has to evaluate $\frac{1}{2}d(d-1)$
1344matrix-matrix products. This can be done by a call of the routine {\sf accode} after the
1345corresponding evaluation of the {\sf hov\_reverse} function. The interface of
1346{\sf accode} is defined as follows:
1349\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1350\>{\sf int accode(n,tau,deg,Z,B,nz)}\\
1351\>{\sf int n;}                 \> // number of state variables $n$ \\
1352\>{\sf double tau;}            \> // scaling parameter\\
1353\>{\sf int deg;}               \> // degree on current call\\
1354\>{\sf double Z[n][n][deg];}   \> // partials of coefficient vectors\\
1355\>{\sf double B[n][n][deg];}   \> // result $B$ as defined in \eqref{eq:bees}\\
1356\>{\sf short nz[n][n];}        \> // optional nonzero pattern
1359Sparsity information can be exploited by {\sf accode} using the array {\sf
1360nz}. For this purpose, {\sf nz} has to be set by a call of the routine {\sf
1361reverse} or the corresponding basic routines as explained below in
1362\autoref{forw_rev_ad} and \autoref{forw_rev}, respectively. The
1363non-positive entries of {\sf nz} are then changed by {\sf accode} so that upon
1366  \mbox{{\sf B[i][j][k]}} \; \equiv \; 0 \quad {\rm if} \quad \mbox{\sf k} \leq \mbox{\sf $-$nz[i][j]}\; .
1368In other words, the matrices $B_k$ = {\sf B[ ][ ][k]} have a
1369sparsity pattern that fills in as $k$ grows. Note, that there need to be no
1370loss in computational efficiency if a time-dependent ordinary differential equation
1371is rewritten in autonomous form.
1373The prototype of the ODE-drivers {\sf forode} and {\sf accode} is contained in the header file
1374\verb=<adolc/drivers/odedrivers.h>=. The global header file
1376includes this file automatically, see \autoref{ssec:DesIH}.
1378An example program using the procedures {\sf forode} and {\sf accode} together
1379with more detailed information about the coding can be found in
1380\autoref{exam:ode}. The corresponding source code
1381\verb=odexam.cpp= is contained in the subdirectory
1386\subsection{Drivers for Sparse Jacobians and Sparse Hessians}
1389Quite often, the Jacobians and Hessians that have to be computed are sparse
1390matrices. Therefore, ADOL-C provides additionally drivers that
1391allow the exploitation of sparsity. The exploitation of sparsity is
1392frequently based on {\em graph coloring} methods, discussed
1393for example in \cite{GeMaPo05} and \cite{GeTaMaPo07}. The sparse drivers of ADOL-C presented in this section
1394rely on the the coloring package ColPack developed by the authors of \cite{GeMaPo05} and \cite{GeTaMaPo07}.
1395ColPack is not directly incorporated in ADOL-C, and therefore needs to be installed
1396separately to use the sparse drivers described here. ColPack is available for download at
1397\verb= More information about the required
1398installation of ColPack is given in \autoref{install}.
1400\subsubsection*{Sparse Jacobians and Sparse Hessians}
1402To compute the entries of sparse Jacobians and sparse Hessians,
1403respectively, in coordinate format one may use the drivers:
1405\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1406\>{\sf int sparse\_jac(tag,m,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1407\>{\sf short int tag;}         \> // tape identification \\
1408\>{\sf int m;}                 \> // number of dependent variables $m$\\ 
1409\>{\sf int n;}                 \> // number of independent variables $n$\\
1410\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1411\>{\sf double x[n];}           \> // independent vector $x$ \\
1412\>{\sf int nnz;}               \> // number of nonzeros \\ 
1413\>{\sf unsigned int rind[nnz];}\> // row index\\ 
1414\>{\sf unsigned int cind[nnz];}\> // column index\\ 
1415\>{\sf double values[nnz];}    \> // non-zero values\\ 
1416\>{\sf int options[4];}        \> // array of control parameters\\ 
1420\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1421\>{\sf int sparse\_hess(tag,n,repeat,x,\&nnz,\&rind,\&cind,\&values,\&options)}\\
1422\>{\sf short int tag;}         \> // tape identification \\
1423\>{\sf int n;}                 \> // number of independent variables $n$ and $m=1$\\
1424\>{\sf int repeat;}            \> // indicate repeated call at same argument\\
1425\>{\sf double x[n];}           \> // independent vector $x$ \\
1426\>{\sf int nnz;}               \> // number of nonzeros \\ 
1427\>{\sf unsigned int rind[nnz];}\> // row indices\\ 
1428\>{\sf unsigned int cind[nnz];}\> // column indices\\ 
1429\>{\sf double values[nnz];}    \> // non-zero values  \\
1430\>{\sf int options[2];}        \> // array of control parameters\\ 
1433Once more, the input variables are the identifier for the internal
1434representation {\sf tag}, if required the number of dependents {\sf m},
1435and the number of independents {\sf n} for a consistency check.
1436Furthermore, the flag {\sf repeat=0} indicates that the functions are called
1437at a point with a new sparsity structure, whereas  {\sf repeat=1} results in
1438the re-usage of the sparsity pattern from the previous call.
1439The current values of the independents are given by the array {\sf x}.
1440The input/output
1441variable {\sf nnz} stores the number of the nonzero entries.
1442Therefore, {\sf nnz} denotes also the length of the arrays {\sf r\_ind} storing
1443the row indices, {\sf c\_ind} storing the column indices, and
1444{\sf values} storing the values of the nonzero entries.
1445If {\sf sparse\_jac} and {\sf sparse\_hess} are called with {\sf repeat=0},
1446the functions determine the number of nonzeros for the sparsity pattern
1447defined by the value of {\sf x}, allocate appropriate arrays {\sf r\_ind},
1448{\sf c\_ind}, and {\sf values} and store the desired information in these
1450During the next function call with {\sf repeat=1} the allocated memory
1451is reused such that only the values of the arrays are changed.   
1452Before calling {\sf sparse\_jac} or {\sf sparse\_hess} once more with {\sf
1453  repeat=0} the user is responsible for the deallocation of the array
1454 {\sf r\_ind}, {\sf c\_ind}, and {\sf values} using the function {\sf
1455   free()}!
1457For each driver the array {\sf options} can be used to adapted the
1458computation of the sparse derivative matrices to the special
1459needs of application under consideration. Most frequently, the default options
1460will give a reasonable performance. The elements of the array {\sf options} control the action of
1461{\sf sparse\_jac} according to \autoref{options_sparse_jac}.
1464\begin{tabular}{|c|c|l|} \hline
1465component & value &  \\ \hline
1466{\sf options[0]} &    &  way of sparsity pattern computation \\
1467                 & 0  &  propagation of index domains (default) \\
1468                 & 1  &  propagation of bit pattern \\ \hline
1469{\sf options[1]} &    &  test the computational graph control flow \\
1470                 & 0  &  safe mode (default) \\
1471                 & 1  &  tight mode \\ \hline
1472{\sf options[2]} &    &  way of bit pattern propagation \\
1473                 & 0  &  automatic detection (default) \\
1474                 & 1  &  forward mode \\ 
1475                 & 2  &  reverse mode \\ \hline
1476{\sf options[3]} &    &  way of compression \\
1477                 & 0  &  column compression (default) \\
1478                 & 1  &  row compression \\ \hline
1480\caption{ {\sf sparse\_jac} parameter {\sf options}\label{options_sparse_jac}}
1483The component {\sf options[1]} determines
1484the usage of the safe or tight mode of sparsity computation.
1485The first, more conservative option is the default. It accounts for all
1486dependences that might occur for any value of the
1487independent variables. For example, the intermediate
1488{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1489always assumed to depend on all independent variables that {\sf a} or {\sf b}
1490dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1491logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1492In contrast
1493the tight option gives this result only in the unlikely event of an exact
1494tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1495associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1496depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1497Obviously, the sparsity pattern obtained with the tight option may contain
1498more zeros than that obtained with the safe option. On the other hand, it
1499will only be valid at points belonging to an area where the function $F$ is locally
1500analytic and that contains the point at which the internal representation was
1501generated. Since generating the sparsity structure using the safe version does not
1502require any reevaluation, it may thus reduce the overall computational cost
1503despite the fact that it produces more nonzero entries.
1504The value of {\sf options[2]} selects the direction of bit pattern propagation.
1505Depending on the number of independent $n$ and of dependent variables $m$ 
1506one would prefer the forward mode if $n$ is significant smaller than $m$ and
1507would otherwise use the reverse mode.
1509 The elements of the array {\sf options} control the action of
1510{\sf sparse\_hess} according to \autoref{options_sparse_hess}.
1513\begin{tabular}{|c|c|l|} \hline
1514component & value &  \\ \hline
1515{\sf options[0]} &    &  test the computational graph control flow \\
1516                 & 0  &  safe mode (default) \\
1517                 & 1  &  tight mode \\ \hline
1518{\sf options[1]} &    &  way of recovery \\
1519                 & 0  &  indirect recovery (default) \\
1520                 & 1  &  direct recovery \\ \hline
1522\caption{ {\sf sparse\_hess} parameter {\sf options}\label{options_sparse_hess}}
1525The described driver routines for the computation of sparse derivative
1526matrices are prototyped in the header file
1527\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1528global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1529Example codes illustrating the usage of {\sf
1530  sparse\_jac} and {\sf sparse\_hess} can be found in the file
1531\verb=sparse_jacobian.cpp=  and \verb=sparse_hessian.cpp= contained in %the subdirectory
1535\subsubsection*{Computation of Sparsity Pattern}
1537ADOL-C offers a convenient way of determining the 
1538sparsity structure of a Jacobian matrix using the function:
1541\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1542\>{\sf int jac\_pat(tag, m, n, x, JP, options)}\\
1543\>{\sf short int tag;} \> // tape identification \\
1544\>{\sf int m;} \> // number of dependent variables $m$\\
1545\>{\sf int n;} \> // number of independent variables $n$\\
1546\>{\sf double x[n];} \> // independent variables $x_0$\\
1547\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure\\
1548\>{\sf int options[2];} \> // array of control parameters
1551The sparsity pattern of the
1552Jacobian is computed in a compressed row format. For this purpose,
1553{\sf JP} has to be an $m$ dimensional array of pointers to {\sf
1554  unsigned int}s, i.e., one has {\sf unsigned int* JP[m]}.
1555During the call of  {\sf jac\_pat}, the number $\hat{n}_i$ of nonzero
1556entries in row $i$ of the Jacobian is determined for all $1\le i\le
1557m$. Then, a memory allocation is performed such that {\sf JP[i-1]}
1558points to a block of $\hat{n}_i+1$ {\sf  unsigned int} for all $1\le
1559i\le m$ and {\sf JP[i-1][0]} is set to $\hat{n}_i$. Subsequently, the
1560column indices of the $j$ nonzero entries in the $i$th row are stored
1561in the components  {\sf JP[i-1][1]}, \ldots, {\sf JP[i-1][j]}.
1563The elements of the array {\sf options} control the action of
1564{\sf jac\_pat} according to \autoref{options}.
1567\begin{tabular}{|c|c|l|} \hline
1568component & value &  \\ \hline
1569{\sf options[0]} &    &  way of sparsity pattern computation \\
1570                 & 0  &  propagation of index domains (default) \\
1571                 & 1  &  propagation of bit pattern \\ \hline
1572{\sf options[1]} &    &  test the computational graph control flow \\
1573                 & 0  &  safe mode (default) \\
1574                 & 1  &  tight mode \\ \hline
1575{\sf options[2]} &    &  way of bit pattern propagation \\
1576                 & 0  &  automatic detection (default) \\
1577                 & 1  &  forward mode \\ 
1578                 & 2  &  reverse mode \\ \hline
1580\caption{ {\sf jac\_pat} parameter {\sf options}\label{options}}
1582The value of {\sf options[0]} selects the way to compute the sparsity
1583pattern. The component {\sf options[1]} determines
1584the usage of the safe or tight mode of bit pattern propagation.
1585The first, more conservative option is the default. It accounts for all
1586dependences that might occur for any value of the
1587independent variables. For example, the intermediate
1588{\sf c}~$=$~{\sf max}$(${\sf a}$,${\sf b}$)$ is
1589always assumed to depend on all independent variables that {\sf a} or {\sf b}
1590dependent on, i.e.\ the bit pattern associated with {\sf c} is set to the
1591logical {\sf OR} of those associated with {\sf a} and {\sf b}.
1592In contrast
1593the tight option gives this result only in the unlikely event of an exact
1594tie {\sf a}~$=$~{\sf b}. Otherwise it sets the bit pattern
1595associated with {\sf c} either to that of {\sf a} or to that of {\sf b},
1596depending on whether {\sf c}~$=$~{\sf a} or {\sf c}~$=$~{\sf b} locally.
1597Obviously, the sparsity pattern obtained with the tight option may contain
1598more zeros than that obtained with the safe option. On the other hand, it
1599will only be valid at points belonging to an area where the function $F$ is locally
1600analytic and that contains the point at which the internal representation was
1601generated. Since generating the sparsity structure using the safe version does not
1602require any reevaluation, it may thus reduce the overall computational cost
1603despite the fact that it produces more nonzero entries. The value of
1604{\sf options[2]} selects the direction of bit pattern propagation.
1605Depending on the number of independent $n$ and of dependent variables $m$ 
1606one would prefer the forward mode if $n$ is significant smaller than $m$ and
1607would otherwise use the reverse mode.
1609The routine {\sf jac\_pat} may use the propagation of bitpattern to
1610determine the sparsity pattern. Therefore, a kind of ``strip-mining''
1611is used to cope with large matrix dimensions. If the system happens to run out of memory, one may reduce
1612the value of the constant {\sf PQ\_STRIPMINE\_MAX}
1613following the instructions in \verb=<adolc/sparse/sparse_fo_rev.h>=.
1615The driver routine is prototyped in the header file
1616\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1617global header file \verb=<adolc/adolc.h>= (see
1618\autoref{ssec:DesIH}). The determination of sparsity patterns is
1619illustrated by the examples \verb=sparse_jacobian.cpp=
1620and \verb=jacpatexam.cpp=
1621contained in
1624To compute the sparsity pattern of a Hessian in a row compressed form, ADOL-C provides the
1627\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1628\>{\sf int hess\_pat(tag, n, x, HP, options)}\\
1629\>{\sf short int tag;}       \> // tape identification \\
1630\>{\sf int n;}               \> // number of independent variables $n$\\
1631\>{\sf double x[n];}         \> // independent variables $x_0$\\
1632\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure\\
1633\>{\sf int option;}          \> // control parameter
1635where the user has to provide {\sf HP} as an $n$ dimensional array of pointers to {\sf
1636 unsigned int}s.
1637After the function call {\sf HP} contains the sparsity pattern,
1638where {\sf HP[j][0]} contains the number of nonzero elements in the
1639 $j$th row for $1 \le j\le n$.
1640The components {\sf P[j][i]}, $0<${\sf i}~$\le$~{\sf P[j][0]} store the
1641 indices of these entries. For determining the sparsity pattern, ADOL-C uses
1642 the algorithm described in \cite{Wa05a}.  The parameter{\sf option} determines
1643the usage of the safe ({\sf option = 0}, default) or tight mode ({\sf
1644  option = 1}) of the computation of the sparsity pattern as described
1647This driver routine is prototyped in the header file
1648\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1649global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1650An example employing the procedure {\sf hess\_pat}  can be found in the file
1651\verb=sparse_hessian.cpp=  contained in
1655\subsubsection*{Calculation of Seed Matrices}
1657To compute a compressed derivative matrix from a given sparsity
1658pattern, one has to calculate an appropriate seed matrix that can be
1659used as input for the derivative calculation. To facilitate the
1660generation of seed matrices for a sparsity pattern given in
1661row compressed form, ADOL-C provides the following two drivers,
1662which are based on the ColPack library:
1664\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1665\>{\sf int generate\_seed\_jac(m, n, JP, S, p)}\\
1666\>{\sf int m;} \> // number of dependent variables $m$\\
1667\>{\sf int n;} \> // number of independent variables $n$\\
1668\>{\sf unsigned int JP[][];} \> // row compressed sparsity structure
1669of Jacobian\\
1670\>{\sf double S[n][p];} \> // seed matrix\\
1671\>{\sf int p;} \> // number of columns in $S$
1673The input variables to {\sf generate\_seed\_jac} are the number of dependent variables $m$, the
1674number of independent variables {\sf n} and the sparsity pattern {\sf
1675  JP} of the Jacobian computed for example by {\sf jac\_pat}. First,
1676{\sf generate\_seed\_jac} performs a distance-2 coloring of the bipartite graph defined by the sparsity
1677pattern {\sf JP} as described in \cite{GeMaPo05}. The number of colors needed for the coloring
1678determines the number of columns {\sf p} in the seed
1679matrix. Subsequently, {\sf generate\_seed\_jac} allocates the memory needed by {\sf
1680 S} and initializes {\sf S} according to the graph coloring.
1681The coloring algorithm that is applied in {\sf
1682  generate\_seed\_jac} is used also by the driver {\sf sparse\_jac}
1683described earlier.
1686\hspace{0.5in}\={\sf short int tag;} \hspace{1.3in}\= \kill    % define tab position
1687\>{\sf int generate\_seed\_hess(n, HP, S, p)}\\
1688\>{\sf int n;} \> // number of independent variables $n$\\
1689\>{\sf unsigned int HP[][];} \> // row compressed sparsity structure
1690of Jacobian\\
1691\>{\sf double S[n][p];} \> // seed matrix\\
1692\>{\sf int p;} \> // number of columns in $S$
1694The input variables to {\sf generate\_seed\_hess} are the number of independents $n$
1695and the sparsity pattern {\sf HP} of the Hessian computed for example
1696by {\sf hess\_pat}. First, {\sf generate\_seed\_hess} performs an
1697appropriate coloring of the adjacency graph defined by the sparsity
1698pattern {\sf HP}: An acyclic coloring in the case of an indirect recovery of the Hessian from its
1699    compressed representation and a star coloring in the case of a direct recovery.
1700 Subsequently, {\sf generate\_seed\_hess} allocates the memory needed by {\sf
1701 S} and initializes {\sf S} according to the graph coloring.
1702The coloring algorithm applied in {\sf
1703  generate\_seed\_hess} is used also by the driver {\sf sparse\_hess}
1704described earlier.
1706The specific set of criteria used to define a seed matrix $S$ depends
1707on whether the sparse derivative matrix
1708to be computed is a Jacobian (nonsymmetric) or a Hessian (symmetric). 
1709It also depends on whether the entries of the derivative matrix  are to be
1710recovered from the compressed representation \emph{directly}
1711(without requiring any further arithmetic) or \emph{indirectly} (for
1712example, by solving for unknowns via successive substitutions).
1713Appropriate recovery routines are provided by ColPack and used
1714in the drivers {\sf sparse\_jac} and {\sf sparse\_hess} described in
1715the previous subsection. Examples with a detailed analysis of the
1716employed drivers for the exploitation of sparsity can be found in the
1717papers \cite{GePoTaWa06} and \cite{GePoWa08}.
1720These driver routines are prototyped in
1721\verb=<adolc/sparse/sparsedrivers.h>=, which is included automatically by the
1722global header file \verb=<adolc/adolc.h>= (see \autoref{ssec:DesIH}).
1723An example code illustrating the usage of {\sf
1724generate\_seed\_jac} and {\sf generate\_seed\_hess} can be found in the file
1725\verb=sparse_jac_hess_exam.cpp= contained in \verb=examples/additional_examples/sparse=.
1728\subsection{Higher Derivative Tensors}
1731Many applications in scientific computing need second- and higher-order
1732derivatives. Often, one does not require full derivative tensors but
1733only the derivatives in certain directions $s_i \in \R^{n}$.
1734Suppose a collection of $p$ directions
1735$s_i \in \R^{n}$ is given, which form a matrix
1737S\; =\; \left [ s_1, s_2,\ldots,  s_p \right ]\; \in \;
1738 \R^{n \times p}.
1740One possible choice is $S = I_n$ with  $p = n$, which leads to
1741full tensors being evaluated.
1742ADOL-C provides the function {\sf tensor\_eval}
1743to calculate the derivative tensors
1746\left. \nabla_{\mbox{$\scriptstyle \!\!S$}}^{k}
1747     F(x_0) \; = \; \frac{\partial^k}{\partial z^k} F(x_0+Sz) \right |_{z=0} 
1748     \in \R^{p^k}\quad \mbox{for} \quad k = 0,\ldots,d
1750simultaneously. The function {\sf tensor\_eval} has the following calling sequence and
1754\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1755\>{\sf void tensor\_eval(tag,m,n,d,p,x,tensor,S)}\\
1756\>{\sf short int tag;}         \> // tape identification \\
1757\>{\sf int m;}                 \> // number of dependent variables $m$ \\
1758\>{\sf int n;}                 \> // number of independent variables $n$\\
1759\>{\sf int d;}                 \> // highest derivative degree $d$\\
1760\>{\sf int p;}                 \> // number of directions $p$\\
1761\>{\sf double x[n];}           \> // values of independent variables $x_0$\\
1762\>{\sf double tensor[m][size];}\> // result as defined in \eqref{eq:tensor} in compressed form\\
1763\>{\sf double S[n][p];}        \> // seed matrix $S$
1766Using the symmetry of the tensors defined by \eqref{eq:tensor}, the memory 
1767requirement can be reduced enormously. The collection of  tensors up to order $d$ comprises 
1768$\binom{p+d}{d}$ distinct elements. Hence, the second dimension of {\sf tensor} must be
1769greater or equal to $\binom{p+d}{d}$.
1770To compute the derivatives, {\sf tensor\_eval} propagates internally univariate Taylor
1771series along $\binom{n+d-1}{d}$ directions. Then the desired values are interpolated. This
1772approach is described in \cite{Griewank97}.
1774The access of individual entries in symmetric tensors of
1775higher order is a little tricky. We always store the derivative
1776values in the two dimensional array {\sf tensor} and provide two
1777different ways of accessing them. 
1778The leading dimension of the tensor array ranges over
1779the component index $i$ of the function $F$, i.e., $F_{i+1}$ for $i =
17800,\ldots,m-1$. The sub-arrays pointed to by {\sf tensor[i]} have identical
1781structure for all $i$. Each of them represents the symmetric tensors up to
1782order $d$ of the scalar function $F_{i+1}$ in $p$ variables. 
1784The $\binom{p+d}{d}$ mixed partial derivatives in each of the $m$
1785tensors are linearly ordered according to the tetrahedral
1786scheme described by Knuth \cite{Knuth73}. In the familiar quadratic
1787case $d=2$ the derivative with respect to $z_j$ and $z_k$ with $z$ 
1788as in \eqref{eq:tensor} and $j \leq k$ is stored at {\sf tensor[i][l]} with
1789$l = k*(k+1)/2+j$. At $j = 0 = k$ and hence $l = 0$ we find the
1790function value $F_{i+1}$ itself and the gradient
1791$\nabla F_{i+1}= \partial F_{i+1}/\partial x_k $ is stored at $l=k(k+1)/2$
1792with $j=0$ for $k=1,\ldots,p$.
1794For general $d$ we combine the variable
1795indices to a multi-index $j = (j_1,j_2,\ldots,j_d)$,
1796where $j_k$ indicates differentiation with respect to variable
1797$x_{j_k}$ with $j_k \in \{0,1,\ldots,p\}$. The value $j_k=0$ indicates
1798no differentiation so that all lower derivatives are also
1799contained in the same data structure as described above for
1800the quadratic case. The location of the partial derivative specified
1801by $j$ is computed by the function
1804\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1805\>{\sf int tensor\_address(d,$\,$j)} \\
1806\>{\sf int d;}                 \> // highest derivative degree $d$ \\
1807\>{\sf int j[d];}              \> // multi-index $j$
1810and it may thus be referenced as {\sf tensor[i][tensor\_address(d,$\,$j)]}.
1811Notice that the address computation does depend on the degree $d$ 
1812but not on the number of directions $p$, which could theoretically be
1813enlarged without the need to reallocate the original tensor.
1814Also, the components of $j$ need to be non-increasing.
1816To some C programmers it may appear more natural to access tensor
1817entries by successive dereferencing in the form
1818{\sf tensorentry[i][$\,$j1$\,$][$\,$j2$\,$]$\ldots$[$\,$jd$\,$]}.
1819We have also provided this mode, albeit with the restriction
1820that the indices $j_1,j_2,\ldots,j_d$ are non-increasing.
1821In the second order case this means that the Hessian entries must be
1822specified in or below the diagonal. If this restriction is
1823violated the values are almost certain to be wrong and array bounds
1824may be violated. We emphasize that subscripting is not overloaded
1825but that {\sf tensorentry} is a conventional and
1826thus moderately efficient C pointer structure.
1827Such a pointer structure can be allocated and set up completely by the
1831\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1832\>{\sf void** tensorsetup(m,p,d,tensor)} \\
1833\>{\sf int m;}                 \> // number of dependent variables $n$ \\
1834\>{\sf int p;}                 \> // number of directions $p$\\
1835\>{\sf int d;}                 \> // highest derivative degree $d$\\
1836\>{\sf double tensor[m][size];}\> // pointer to two dimensional array
1839Here, {\sf tensor} is the array of $m$ pointers pointing to arrays of {\sf size}
1840$\geq \binom{p+d}{d}$ allocated by the user before. During the execution of {\sf tensorsetup},
1841 $d-1$ layers of pointers are set up so that the return value
1842allows the direct dereferencing of individual tensor elements.
1844For example, suppose some active section involving  $m \geq 5$ dependents and
1845$n \geq 2$ independents has been executed and taped. We may
1846select $p=2$, $d=3$ and initialize the $n\times 2$ seed matrix $S$ with two
1847columns $s_1$ and $s_2$. Then we are able to execute the code segment
1849\hspace{0.5in}\={\sf double**** tensorentry = (double****) tensorsetup(m,p,d,tensor);} \\
1850              \>{\sf tensor\_eval(tag,m,n,d,p,x,tensor,S);}   
1852This way, we evaluated all tensors defined in \eqref{eq:tensor} up to degree 3
1853in both directions $s_1$ and
1854$s_2$ at some argument $x$. To allow the access of tensor entries by dereferencing the pointer
1855structure {\sf tensorentry} has been created. Now, 
1856the value of the mixed partial
1858 \left. \frac{\partial ^ 3 F_5(x+s_1 z_1+s_2 z_2)}{\partial z_1^2 \partial z_2}   \right |_{z_1=0=z_2
1860can be recovered as
1862   {\sf tensorentry[4][2][1][1]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[4][tensor\_address(d,$\,$j)]},
1864where the integer array {\sf j} may equal (1,1,2), (1,2,1) or (2,1,1). 
1865Analogously, the entry
1867   {\sf tensorentry[2][1][0][0]} \hspace{0.2in} or \hspace{0.2in} {\sf tensor[2][tensor\_address(d,$\,$j)]}
1869with {\sf j} = (1,0,0) contains the first derivative of the third dependent
1870variable $F_3$ with respect to the first differentiation parameter $z_1$.
1872Note, that the pointer structure {\sf tensorentry} has to be set up only once. Changing the values of the
1873array {\sf tensor}, e.g.~by a further call of {\sf tensor\_eval}, directly effects the values accessed
1874by {\sf tensorentry}.
1876When no more derivative evaluations are desired the pointer structure
1877{\sf tensorentry} can be deallocated by a call to the function
1880\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1881\>{\sf int freetensor(m,p,d, (double ****) tensorentry)}\\
1882\>{\sf int m;}                    \> // number of dependent variables $m$ \\
1883\>{\sf int p;}                    \> // number of independent variables $p$\\
1884\>{\sf int d;}                    \> // highest derivative degree $d$\\
1885\>{\sf double*** tensorentry[m];} \> // return value of {\sf tensorsetup} 
1888that does not deallocate the array {\sf tensor}.
1890The drivers provided for efficient calculation of higher order
1891derivatives are prototyped in the header file \verb=<adolc/drivers/taylor.h>=,
1892which is included by the global header file \verb=<adolc/adolc.h>= automatically
1893(see \autoref{ssec:DesIH}).
1894Example codes using the above procedures can be found in the files
1895\verb=taylorexam.C= and \verb=accessexam.C= contained in the subdirectory
1899\subsection{Derivatives of Implicit and Inverse Functions}
1902Frequently, one needs derivatives of variables
1903$y \in \R^{m}$ that are implicitly defined as
1904functions of some variables $x \in \R^{n-m}$
1905by an algebraic system of equations
1907G(z) \; = \; 0 \in \R^m \quad
1908{\rm with} \quad z = (y, x) \in \R^n .
1910Naturally, the $n$ arguments of $G$ need not be partitioned in
1911this regular fashion and we wish to provide flexibility for a
1912convenient selection of the $n-m$ {\em truly} independent
1913variables. Let $P \in \R^{(n-m)\times n}$ be a $0-1$ matrix
1914that picks out these variables so that it is a column
1915permutation of the matrix $[0,I_{n-m}] \in \R^{(n-m)\times n}$.
1916Then the nonlinear system
1918  G(z) \; = \; 0, \quad P z =  x,                           
1920has a regular Jacobian, wherever the implicit function theorem
1921yields $y$ as a function of $x$. Hence, we may also write
1924F(z) = \left(\begin{array}{c}
1925                        G(z) \\
1926                        P z
1927                      \end{array} \right)\; \equiv \;
1928                \left(\begin{array}{c}
1929                        0 \\
1930                        P z
1931                      \end{array} \right)\; \equiv \; S\, x,
1933where $S = [0,I_p]^{T} \in \R^{n \times p}$ with $p=n-m$. Now, we have rewritten
1934the original implicit functional relation between $x$ and $y$ as an inverse
1935relation $F(z) = Sx$. In practice, we may implement the projection $P$ simply
1936by marking $n-m$ of the independents also dependent. 
1938Given any $ F : \R^n \mapsto \R^n $ that is locally invertible and an arbitrary
1939seed matrix $S \in \R^{n \times p}$ we may evaluate all derivatives of $z \in \R^n$
1940with respect to $x \in \R^p$ by calling the following routine:
1943\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
1944\>{\sf void inverse\_tensor\_eval(tag,n,d,p,z,tensor,S)}\\
1945\>{\sf short int tag;}         \> // tape identification \\
1946\>{\sf int n;}                 \> // number of variables $n$\\
1947\>{\sf int d;}                 \> // highest derivative degree $d$\\
1948\>{\sf int p;}                 \> // number of directions $p$\\
1949\>{\sf double z[n];}          \> // values of independent variables $z$\\
1950\>{\sf double tensor[n][size];}\> // partials of $z$ with respect to $x$\\
1951\>{\sf double S[n][p];}        \> // seed matrix $S$
1954The results obtained in {\sf tensor} are exactly the same as if we had called {\sf tensor\_eval} with
1955{\sf tag} pointing to a tape for the evaluation of the inverse function
1956$z=F^{-1}(y)$ for which naturally $n=m$. Note that the columns of $S$ belong
1957to the domain of that function. Individual derivative components can be
1958accessed in tensor exactly as in the explicit case described above.
1960It must be understood that {\sf inverse\_tensor\_eval} actually computes the
1961derivatives of $z$ with respect to $x$ that is defined by the equation
1962$F(z)=F(z_0)+S \, x$. In other words the base point at
1963which the inverse function is differentiated is given by $F(z_0)$.
1964The routine has no capability for inverting $F$ itself as
1965solving systems of nonlinear
1966equations $F(z)=0$ in the first place is not just a differentiation task.
1967However, the routine {\sf jac\_solv} described in \autoref{optdrivers} may certainly be very
1968useful for that purpose.
1970As an example consider the following two nonlinear expressions
1972      G_1(z_1,z_2,z_3,z_4) & = & z_1^2+z_2^2-z_3^\\
1973      G_2(z_1,z_2,z_3,z_4) & = & \cos(z_4) - z_1/z_3 \enspace   .
1975The equations $G(z)=0$ describe the relation between the Cartesian
1976coordinates $(z_1,z_2)$ and the polar coordinates $(z_3,z_4)$ in the plane.
1977Now, suppose we are interested in the derivatives of the second Cartesian
1978$y_1=z_2$ and the second (angular) polar coordinate $y_2=z_4$ with respect
1979to the other two variables $x_1=z_1$ and $x_2=z_3$. Then the active section
1980could look simply like
1983\hspace{1.5in}\={\sf for (j=1; j $<$ 5;$\,$j++)}\hspace{0.15in} \= {\sf z[j] \boldmath $\ll=$ \unboldmath  zp[j];}\\
1984\>{\sf g[1] = z[1]*z[1]+z[2]*z[2]-z[3]*z[3]; }\\
1985\>{\sf g[2] = cos(z[4]) - z[1]/z[3]; }\\
1986\>{\sf g[1] \boldmath $\gg=$ \unboldmath gp[1];} \> {\sf g[2] \boldmath $\gg=$ \unboldmath gp[2];}\\
1987\>{\sf z[1] \boldmath $\gg=$ \unboldmath zd[1];} \> {\sf z[3] \boldmath $\gg=$ \unboldmath zd[2];}
1990where {\sf zd[1]} and {\sf zd[2]} are dummy arguments.
1991In the last line the two independent variables {\sf z[1]} and
1992{\sf z[3]} are made
1993simultaneously dependent thus generating a square system that can be
1994inverted (at most arguments). The corresponding projection and seed
1995matrix are
1997P \;=\; \left( \begin{array}{cccc}
1998               1 & 0 & 0 & 0 \\
1999               0 & 0 & 1 & 0
2000            \end{array}\right) \quad \mbox{and} \quad
2001S^T \; = \; \left( \begin{array}{cccc}
2002               0 & 0 & 1 & 0 \\
2003               0 & 0 & 0 & 1
2004            \end{array}\right\enspace .
2006Provided the vector {\sf zp} is consistent in that its Cartesian and polar
2007components describe the same point in the plane the resulting tuple
2008{\sf gp} must vanish. The call to {\sf inverse\_tensor\_eval} with
2009$n=4$, $p=2$ and $d$
2010as desired will yield the implicit derivatives, provided
2011{\sf tensor} has been allocated appropriately of course and $S$ has the value
2012given above.
2014The example is untypical in that the implicit function could also be
2015obtained explicitly by symbolic mani\-pu\-lations. It is typical in that
2016the subset of $z$ components that are to be considered as truly
2017independent can be selected and altered with next to no effort at all.
2019The presented drivers are prototyped in the header file
2020\verb=<adolc/drivers/taylor.h>=. As indicated before this header
2021is included by the global header file \verb=<adolc/adolc.h>= automatically
2022(see \autoref{ssec:DesIH}).
2023The example programs \verb=inversexam.cpp=, \verb=coordinates.cpp= and
2024\verb=trigger.cpp=  in the directory \verb=examples/additional_examples/taylor=
2025show the application of the procedures described here.
2028\subsection{Drivers of the Toolbox for Lie Derivatives}
2029Nonlinear controller and observer design often require certain types of Lie derivatives. These derivatives also arise in other areas such as classical mechanics and relativity.
2030Lie derivatives are total derivatives of tensor fields along a vector field.
2031The drivers for calculating Lie derivatives in ADOL-C presented in this section
2032were developed by Klaus R{\"o}benack~\cite{Roeb05} and his co-workers Jan Winkler, Siqian Wang and Mirko Franke~\cite{Roeb11}.
2033They are prototyped as C functions in the header file {\verb=<adolc_lie.h>=} and
2034allow the calculation of Lie derivatives of scalar, vector and covector fields. In addition, the calculation of gradients of Lie derivatives of scalar fields is supported.
2036We consider computational problems occurring in controller and observer design for
2037nonlinear control systems
2039\dot{\mathbf x}=\mathbf f(\mathbf x)+\mathbf g(\mathbf x)u,
2040\quad \mathbf y=\mathbf h(\mathbf x),\quad \mathbf x(0)=\mathbf x_0 \in \Omega\\
2042with the scalar input u, the state $\mathbf x$, and the output $\mathbf y$, where the vector
2043fields $\mathbf f : \Omega \to \R^n$, $\mathbf g: \Omega \to \R^n$
2044and the map $\mathbf h: \Omega\to \R^m$ (for $m=1$ this map is the scalar field $h$)
2045are defined on an open subset $\Omega\subseteq\R^n$.
2048\subsubsection*{Lie Derivatives of Scalar Fields}
2049Consider the initial value problem
2051 \dot{\mathbf x}={\mathbf f}(\mathbf x),\quad y=h(\mathbf x),\quad \mathbf{x}(0)=\mathbf{x}_0\in\Omega
2053with the vector field $\mathbf f : \Omega \to \R^n$. The Lie derivative of the scalar field $h:\Omega\to\R$ along the vector field~$\mathbf{f}$ are time derivatives of the curve~$y$, i.e.,
2055 \dot{y}(0)=L_\mathbf{f}h(\mathbf{x}_0),\;
2056 \ddot{y}(0)=L_\mathbf{f}^2h(\mathbf{x}_0),\;  \ldots,\;
2057 y^{(d)}(0)=L_\mathbf{f}^dh(\mathbf{x}_0).
2059Formally, the Lie derivative of the scalar field~$h$ along the vector field~$\mathbf{f}$ is defined by
2061 L_{\mathbf f} h(\mathbf x)=\frac{\partial h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x).
2063Higher order Lie derivatives are given by the recursion
2065 L_{\mathbf f}^{k+1} h(\mathbf x)= \frac{\partial L_{\mathbf f}^k h(\mathbf x)}{\partial \mathbf x}\,\mathbf f(\mathbf x)\quad\text{with}\quad L_{\mathbf f}^0 h(\mathbf x)= h(\mathbf x).
2067To compute Lie derivatives
2069(L_\mathbf f^0h(\mathbf x_0),\ldots,L_\mathbf f^dh(\mathbf x_0))
2071we need the trace numbers of the active sections of the vector field~$\mathbf f$ and the scalar field~$h$,
2072the number of independent variables $n$, the initial value $\mathbf x_0\in\Omega$
2073and the highest derivative degree $d$. The values of the Lie derivatives are
2074then stored in the one dimensional array {\verb=result=} of length $d+1$:
2077\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2078\>{\sf int lie\_scalarc(Tape\_F, Tape\_H, n, x0, d, result)}\\
2079\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2080\>{\sf short Tape\_H;}         \> // trace identification of scalar field $h$\\
2081\>{\sf short n;}               \> // number of independent variables $n$ and $m = 1$\\
2082\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2083\>{\sf short d; }              \> // highest derivative degree $d$\\
2084\>{\sf double result[d+1];}    \> // resulting Lie derivatives of a scalar field\\
2088For a smooth vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ with
2089$\mathbf h(\mathbf x)=(h_1(\mathbf x),\ldots,h_m(\mathbf x))^T$
2090we define the Lie derivative
2092L_\mathbf f^k\mathbf h(\mathbf x_0)=
2093(L_\mathbf f^kh_1(\mathbf x_0),\ldots,L_\mathbf f^kh_m(\mathbf x_0))^T\in\R^m
2094 \quad \text{for}\quad k = 0, \cdots, d,
2096component-wise by the Lie derivatives $L_\mathbf f^k h_1(\mathbf x_0),\ldots,L_\mathbf f^k h_m(\mathbf x_0)$ of the $m$ scalar fields $h_1, \ldots, h_m:\Omega\to\R$. Note that~\eqref{eq:lie_vectorial} should not be confused with the Lie derivative of a vector field. To compute Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h$
2097we additionally need the number of dependent variables $m$:
2100\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2101\>{\sf int lie\_scalarcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\
2102\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2103\>{\sf short Tape\_H;}         \> // trace identification of vector-valued map $\mathbf h$\\
2104\>{\sf short n;}               \> // number of independent variables $n$\\
2105\>{\sf short m;}               \> // number of dependent variables $m$\\
2106\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2107\>{\sf short d; }              \> // highest derivative degree $d$\\
2108\>{\sf double result[m][d+1];} \> // resulting Lie derivatives of vectorial scalar fields\\
2111\subsubsection*{Gradients of Lie Derivatives of Scalar Fields}
2112To compute the gradients
2114(\mathrm{d}L_\mathbf f^0h(\mathbf x_0),\ldots, \mathrm{d}L_\mathbf f^dh(\mathbf x_0))
2116of the Lie derivatives $L_\mathbf f^0h(\mathbf x_0),\ldots, L_\mathbf f^dh(\mathbf x_0)$ of the scalar field $h:\Omega\to\R$ along the vector field $\mathbf f:\Omega\to\R^n$,
2117the following C function can be used:
2120\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2121\>{\sf int lie\_gradientc(Tape\_F, Tape\_H, n, x0, d, result)}\\
2122\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2123\>{\sf short Tape\_H;}         \> // trace identification of scalar field $h$\\
2124\>{\sf short n;}               \> // number of independent variables $n$ and $m=1$\\
2125\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2126\>{\sf short d;}               \> // highest derivative degree $d$\\
2127\>{\sf double result[n][d+1];} \> // resulting gradients of Lie derivatives\\
2128\>{\sf} \> // of a scalar field\\
2131For calculating the jacobians $\mathrm{d}L_\mathbf f^k\mathbf h(\mathbf x_0)\in\R^{m\times n}$
2132of Lie derivatives~\eqref{eq:lie_vectorial} of the vector-valued map $\mathbf h: \Omega\rightarrow \R^m$ for $k=0,\ldots,d$ one can use the following function, where the $d+1$ matrices are stored in the three dimensional array {\verb=result=}:
2135\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2136\>{\sf int lie\_gradientcv(Tape\_F, Tape\_H, n, m, x0, d, result)}\\
2137\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2138\>{\sf short Tape\_H;}         \> // trace identification of vector-valued map $\mathbf h$\\
2139\>{\sf short n;}               \> // number of independent variables $n$\\
2140\>{\sf short m;}               \> // number of dependent variables $m$\\
2141\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2142\>{\sf short d;}               \> // highest derivative degree $d$\\
2143\>{\sf double result[m][n][d+1];} \> // resulting jacobians of Lie derivatives\\
2144\>{\sf} \> // of vectorial scalar fields\\
2147\subsubsection*{Lie Derivatives of Covector Fields}
2149If we consider the elements of the real vector space $\R^n$ as column vectors, the elements of the associated dual space $(\R^n)^*$ can be represented by row vectors. These row vectors are called adjoint vectors in ADOL-C and covectors in differential geometry.
2150In a program the user does not have to distinguish between a vector field
2151$\mathbf f: \Omega \rightarrow \R^n$ and a covector field
2152$\boldsymbol{\omega}: \Omega \rightarrow (\R^n)^*$, since for both fields the dependent variables are stored in one dimensional arrays.
2153The Lie derivative of a covector field~$\boldsymbol{\omega}$ is defined by
2155 L_{\mathbf f}\boldsymbol{\omega}(\mathbf x)=\boldsymbol{\omega}(\mathbf x)\,{\mathbf f}^\prime(\mathbf x)
2156 + {\mathbf f}^T(\mathbf x) \left(\frac{\partial \boldsymbol{\omega}^T (\mathbf x)}{\partial \mathbf x}\right)^T.
2158The $d+1$ Lie derivatives $L_{\mathbf f}^0\boldsymbol{\omega}(\mathbf x),\ldots,L_{\mathbf f}^d\boldsymbol{\omega}(\mathbf x)$ of the covector field $\boldsymbol{\omega}$ along the vector field~$\mathbf{f}$
2159can be computed by
2162\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2163\>{\sf int lie\_covector(Tape\_F, Tape\_W, n, x0, d, result)}\\
2164\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2165\>{\sf short Tape\_W;}         \> // trace identification of covector field $\boldsymbol{\omega}$\\
2166\>{\sf short n;}               \> // number of independent variables $n$\\
2167\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2168\>{\sf short d;}               \> // highest derivative degree $d$\\
2169\>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a covector field\\
2172\subsubsection*{Lie Derivatives of a Vector Field (Lie Brackets)}
2174Lie derivatives (Lie brackets) of a vector field $\mathbf g: \Omega \rightarrow \R^n$ along a vector field $\mathbf f: \Omega \rightarrow \R^n$ are defined by
2176 \mathrm{ad}_\mathbf{f}\mathbf{g}(\mathbf{x})=[\mathbf{f},\mathbf{g}](\mathbf{x})
2177 =\mathbf{g}^{\prime}(\mathbf{x})\mathbf{f}(\mathbf{x})-\mathbf{f}^{\prime}(\mathbf{x})\mathbf{g}(\mathbf{x}).
2179We can calculate iterated Lie brackets $\mathrm{ad}_\mathbf{f}^0\mathbf{g}(\mathbf{x}),\ldots,\mathrm{ad}_\mathbf{f}^{d}\mathbf{g}(\mathbf{x})$ at the point $\mathbf{x}_0\in\Omega$ 
2180using the C function listed below:
2183\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2184\>{\sf int lie\_bracket(Tape\_F, Tape\_G, n, x0, d, result)}\\
2185\>{\sf short Tape\_F;}         \> // trace identification of vector field $\mathbf f$ \\
2186\>{\sf short Tape\_G;}         \> // trace identification of vector field $\mathbf g$\\
2187\>{\sf short n;}               \> // number of independent variables $n$\\
2188\>{\sf double x0[n];}          \> // values of independent variables $\mathbf{x}_0$\\
2189\>{\sf short d;}               \> // highest derivative degree $d$\\
2190\>{\sf double result[n][d+1];} \> // resulting Lie derivatives of a vector field\\
2193\subsubsection*{Additional C++ Drivers for Lie Derivatives}
2195For Lie derivatives of scalar fields, the C interface offers two different drivers, namely {\verb=lie_scalarc=} for $m=1$ scalar field $h:\Omega\to\R$ and {\verb=lie_scalarcv=} for $m\geq1$ scalar fields $h_1,\ldots,h_m:\Omega\to\R$, which are combined in a vector-valued map $\mathbf{h}:\Omega\to\R^m$. Using the polymorphism in C++, both C drivers are unified by the C++ driver {\verb=lie_scalar=} for the computation of Lie derivatives of scalar fields:
2198\hspace{0.5in}\={\sf int lie\_scalar(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.45in}\= // case $m=1$\\
2199\>{\sf int lie\_scalar(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$
2202The same situation occurs with the gradients or jacobians of Lie derivatives of scalar fields. Here, the C drivers {\verb=lie_gradientc=} and {\verb=lie_gradientcv=} are unified by the C++ driver {\verb=lie_gradient=}:
2205\hspace{0.5in}\={\sf int lie\_gradient(Tape\_F, Tape\_H, n, x0, d, result)} \hspace{0.35in}\= // case $m=1$\\
2206\>{\sf int lie\_gradient(Tape\_F, Tape\_H, n, m, x0, d, result)} \> // case $m\geq1$
2209The C++ drivers are also prototyped in the header file {\sf $<$adolc\_lie.h$>$}. An example how to use these drivers for the calculation of Lie derivatives can be found in the file {\sf GantryCrane.cpp} in the directory {\verb=additional_examples/lie=}.
2214\section{Basic Drivers for the Forward and Reverse Mode}
2217In this section, we present tailored drivers for different
2218variants of the forward mode and the reverse mode, respectively.
2219For a better understanding, we start with a short
2220description of the mathematical background.
2222Provided no arithmetic exception occurs,
2223no comparison including {\sf fmax} or  {\sf fmin} yields a tie,
2224{\sf fabs} does not yield zero,
2225and all special functions were evaluated in the
2226interior of their domains, the functional relation between the input
2227variables $x$
2228and the output variables $y$ denoted by $y=F(x)$ is in
2229fact analytic.  In other words, we can compute arbitrarily high
2230derivatives of the vector function $F : I\!\!R^n \mapsto I\!\!R^m$ defined
2231by the active section.
2232We find it most convenient to describe and
2233compute derivatives in terms of univariate Taylor expansions, which
2234are truncated after the highest derivative degree $d$ that is desired
2235by the user. Let
2238x(t) \; \equiv \; \sum_{j=0}^dx_jt^j \; : \;  I\!\!R \; \mapsto \;
2241denote any vector polynomial in the scalar variable $t \in I\!\!R$.
2242In other words, $x(t)$ describes a path in $I\!\!R^n$ parameterized by $t$.
2243The Taylor coefficient vectors
2244\[ x_j \; = \; 
2245\frac{1}{j!} \left .  \frac{\partial ^j}{\partial t^j} x(t)
2246\right |_{t=0}
2248are simply the scaled derivatives of $x(t)$ at the parameter
2249origin $t=0$. The first two vectors $x_1,x_2 \in I\!\!R^n$ can be
2250visualized as tangent and curvature at the base point $x_0$,
2252Provided that $F$ is $d$ times continuously differentiable, it
2253follows from the chain rule that the image path
2256 y(t) \; \equiv \; F(x(t)) \; : \;  I\!\!R \;\mapsto \;I\!\!R^m
2258is also smooth and has $(d+1)$ Taylor coefficient vectors
2259$y_j \in I\!\!R^m$ at $t=0$, so that
2262y(t) \; = \; \sum_{j=0}^d y_jt^j + O(t^{d+1}).
2264Also as a consequence of the chain rule, one can observe that
2265each $y_j$ is uniquely and smoothly determined by the coefficient
2266vectors $x_i$ with $i \leq j$.  In particular we have
2269  y_0 & = F(x_0) \nonumber \\
2270  y_1 & = F'(x_0) x_1 \nonumber\\
2271  y_2 & = F'(x_0) x_2 + \frac{1}{2}F''(x_0)x_1 x_1 \\
2272  y_3 & = F'(x_0) x_3 + F''(x_0)x_1 x_2
2273          + \frac{1}{6}F'''(x_0)x_1 x_1 x_1\nonumber\\
2274  & \ldots\nonumber
2276In writing down the last equations we have already departed from the
2277usual matrix-vector notation. It is well known that the number of
2278terms that occur in these ``symbolic'' expressions for
2279the $y_j$ in terms of the first $j$ derivative tensors of $F$ and
2280the ``input'' coefficients $x_i$ with $i\leq j$ grows very rapidly
2281with $j$. Fortunately, this exponential growth does not occur
2282in automatic differentiation, where the many terms are somehow
2283implicitly combined  so that storage and operations count grow only
2284quadratically in the bound $d$ on $j$.
2286Provided $F$ is analytic, this property is inherited by the functions
2288y_j = y_j (x_0,x_1, \ldots ,x_j) \in {I\!\!R}^m ,
2290and their derivatives satisfy the identities
2293\frac{\partial y_j}{\partial x_i}  = \frac{\partial y_{j-i}}
2294{\partial x_0} = A_{j-i}(x_0,x_1, \ldots ,x_{j-i})
2296as established in \cite{Chri91a}. This yields in particular
2298  \frac{\partial y_0}{\partial x_0} =
2299  \frac{\partial y_1}{\partial x_1} =
2300  \frac{\partial y_2}{\partial x_2} =
2301  \frac{\partial y_3}{\partial x_3} =
2302  A_0 & = F'(x_0) \\
2303  \frac{\partial y_1}{\partial x_0} =
2304  \frac{\partial y_2}{\partial x_1} =
2305  \frac{\partial y_3}{\partial x_2} =
2306  A_1 & = F''(x_0) x_1 \\
2307  \frac{\partial y_2}{\partial x_0} =
2308  \frac{\partial y_3}{\partial x_1} =
2309  A_2 & = F''(x_0) x_2 + \frac{1}{2}F'''(x_0)x_1 x_1 \\
2310  \frac{\partial y_3}{\partial x_0} =
2311  A_3 & = F''(x_0) x_3 + F'''(x_0)x_1 x_2
2312          + \frac{1}{6}F^{(4)}(x_0)x_1 x_1 x_1 \\
2313  & \ldots
2315The $m \times n$ matrices $A_k, k=0,\ldots,d$, are actually the Taylor
2316coefficients of the Jacobian path $F^\prime(x(t))$, a fact that is of
2317interest primarily in the context of ordinary differential
2318equations and differential algebraic equations.
2320Given the tape of an active section and the coefficients $x_j$,
2321the resulting $y_j$ and their derivatives $A_j$ can be evaluated
2322by appropriate calls to the ADOL-C forward mode implementations and
2323the ADOL-C reverse mode implementations. The scalar versions of the forward
2324mode propagate just one truncated Taylor series from the $(x_j)_{j\leq d}$
2325to the $(y_j)_{j\leq d}$. The vector versions of the forward
2326mode propagate families of $p\geq 1$ such truncated Taylor series
2327in order to reduce the relative cost of the overhead incurred
2328in the tape interpretation. In detail, ADOL-C provides
2330\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2331\>{\sf int zos\_forward(tag,m,n,keep,x,y)}\\
2332\>{\sf short int tag;}         \> // tape identification \\
2333\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2334\>{\sf int n;}                 \> // number of independent variables $n$\\
2335\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2336\>{\sf double x[n];}           \> // independent vector $x=x_0$\\
2337\>{\sf double y[m];}           \> // dependent vector $y=F(x_0)$
2339for the {\bf z}ero-{\bf o}rder {\bf s}calar forward mode. This driver computes
2340$y=F(x)$ with $0\leq\text{\sf keep}\leq 1$. The integer
2341flag {\sf keep} plays a similar role as in the call to 
2342{\sf trace\_on}: It determines if {\sf zos\_forward} writes
2343the first Taylor coefficients of all intermediate quantities into a buffered
2344temporary file, i.e., the value stack, in preparation for a subsequent
2345reverse mode evaluation. The value {\sf keep} $=1$
2346prepares for {\sf fos\_reverse} or {\sf fov\_reverse} as exlained below.
2348To compute first-order derivatives, one has
2350\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2351\>{\sf int fos\_forward(tag,m,n,keep,x0,x1,y0,y1)}\\
2352\>{\sf short int tag;}         \> // tape identification \\
2353\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2354\>{\sf int n;}                 \> // number of independent variables $n$\\
2355\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2356\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2357\>{\sf double x1[n];}          \> // tangent vector $x_1$\\
2358\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2359\>{\sf double y1[m];}          \> // first derivative $y_1=F'(x_0)x_1$
2361for the {\bf f}irst-{\bf o}rder {\bf s}calar forward mode. Here, one has
2362$0\leq\text{\sf keep}\leq 2$, where
2364\text{\sf keep} = \left\{\begin{array}{cl}
2365       1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2366       2 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2367       \end{array}\right.
2369as exlained below. For the {\bf f}irst-{\bf o}rder {\bf v}ector forward mode,
2370ADOL-C provides
2372\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2373\>{\sf int fov\_forward(tag,m,n,p,x0,X,y0,Y)}\\
2374\>{\sf short int tag;}         \> // tape identification \\
2375\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2376\>{\sf int n;}                 \> // number of independent variables $n$\\
2377\>{\sf int p;}                 \> // number of directions\\
2378\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2379\>{\sf double X[n][p];}        \> // tangent matrix $X$\\
2380\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2381\>{\sf double Y[m][p];}        \> // first derivative matrix $Y=F'(x)X$
2383For the computation of higher derivative, the driver
2385\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2386\>{\sf int hos\_forward(tag,m,n,d,keep,x0,X,y0,Y)}\\
2387\>{\sf short int tag;}         \> // tape identification \\
2388\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2389\>{\sf int n;}                 \> // number of independent variables $n$\\
2390\>{\sf int d;}                 \> // highest derivative degree $d$\\
2391\>{\sf int keep;}              \> // flag for reverse mode preparation\\
2392\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2393\>{\sf double X[n][d];}        \> // tangent matrix $X$\\
2394\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2395\>{\sf double Y[m][d];}        \> // derivative matrix $Y$
2397implementing the  {\bf h}igher-{\bf o}rder {\bf s}calar forward mode.
2398The rows of the matrix $X$ must correspond to the independent variables in the order of their
2399initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2400$X = \{x_j\}_{j=1\ldots d}$ represent Taylor coefficient vectors as in
2401\eqref{eq:x_of_t}. The rows of the matrix $Y$ must correspond to the
2402dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2403The columns of $Y = \{y_j\}_{j=1\ldots d}$ represent
2404Taylor coefficient vectors as in \eqref{eq:series}, i.e., {\sf hos\_forward}
2405computes the values
2406$y_0=F(x_0)$, $y_1=F'(x_0)x_1$, \ldots, where
2407$X=[x_1,x_2,\ldots,x_d]$ and  $Y=[y_1,y_2,\ldots,y_d]$. Furthermore, one has
2408$0\leq\text{\sf keep}\leq d+1$, with
2410\text{\sf keep}  \left\{\begin{array}{cl}
2411       = 1 & \text{prepares for {\sf fos\_reverse} or {\sf fov\_reverse}} \\
2412       > 1 & \text{prepares for {\sf hos\_reverse} or {\sf hov\_reverse}}
2413       \end{array}\right.
2415Once more, there is also a vector version given by
2417\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2418\>{\sf int hov\_forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2419\>{\sf short int tag;}         \> // tape identification \\
2420\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2421\>{\sf int n;}                 \> // number of independent variables $n$\\
2422\>{\sf int d;}                 \> // highest derivative degree $d$\\
2423\>{\sf int p;}                 \> // number of directions $p$\\
2424\>{\sf double x0[n];}          \> // independent vector $x_0$\\
2425\>{\sf double X[n][p][d];}     \> // tangent matrix $X$\\
2426\>{\sf double y0[m];}          \> // dependent vector $y_0=F(x_0)$\\
2427\>{\sf double Y[m][p][d];}     \> // derivative matrix $Y$
2429for the  {\bf h}igher-{\bf o}rder {\bf v}ector forward mode that computes
2430$y_0=F(x_0)$, $Y_1=F'(x_0)X_1$, \ldots, where $X=[X_1,X_2,\ldots,X_d]$ and 
2433There are also overloaded versions providing a general {\sf forward}-call.
2434Details of the appropriate calling sequences are given in \autoref{forw_rev}.
2436Once, the required information is generated due to a forward mode evaluation
2437with an approriate value of the parameter {\sf keep}, one may use the
2438following implementation variants of the reverse mode. To compute first-order derivatives
2439one can use
2441\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2442\>{\sf int fos\_reverse(tag,m,n,u,z)}\\
2443\>{\sf short int tag;}         \> // tape identification \\
2444\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2445\>{\sf int n;}                 \> // number of independent variables $n$\\
2446\>{\sf double u[m];}           \> // weight vector $u$\\
2447\>{\sf double z[n];}           \> // resulting adjoint value $z^T=u^T F'(x)$
2449as {\bf f}irst-{\bf o}rder {\bf s}calar reverse mode implementation that computes
2450the product $z^T=u^T F'(x)$ after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2451{\sf hos\_forward} with {\sf keep}=1. The corresponding {\bf f}irst-{\bf
2452  o}rder {\bf v}ector reverse mode driver is given by
2454\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2455\>{\sf int fov\_reverse(tag,m,n,q,U,Z)}\\
2456\>{\sf short int tag;}         \> // tape identification \\
2457\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2458\>{\sf int n;}                 \> // number of independent variables $n$\\
2459\>{\sf int q;}                 \> // number of weight vectors $q$\\
2460\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2461\>{\sf double Z[q][n];}        \> // resulting adjoint $Z=U F'(x)$
2463that can be used after calling  {\sf zos\_forward}, {\sf fos\_forward}, or
2464{\sf hos\_forward} with {\sf keep}=1. To compute higher-order derivatives,
2465ADOL-C provides
2467\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2468\>{\sf int hos\_reverse(tag,m,n,d,u,Z)}\\
2469\>{\sf short int tag;}         \> // tape identification \\
2470\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2471\>{\sf int n;}                 \> // number of independent variables $n$\\
2472\>{\sf int d;}                 \> // highest derivative degree $d$\\
2473\>{\sf double u[m];}           \> // weight vector $u$\\
2474\>{\sf double Z[n][d+1];}      \> // resulting adjoints
2476as {\bf h}igher-{\bf o}rder {\bf s}calar reverse mode implementation yielding
2477the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2478\ldots, where $Z=[z_0,z_1,\ldots,z_d]$ after calling  {\sf fos\_forward} or
2479{\sf hos\_forward} with {\sf keep} $=d+1>1$. The vector version is given by
2481\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2482\>{\sf int hov\_reverse(tag,m,n,d,q,U,Z,nz)}\\
2483\>{\sf short int tag;}         \> // tape identification \\
2484\>{\sf int m;}                 \> // number of  dependent variables $m$\\
2485\>{\sf int n;}                 \> // number of independent variables $n$\\
2486\>{\sf int d;}                 \> // highest derivative degree $d$\\
2487\>{\sf double U[q][m];}        \> // weight vector $u$\\
2488\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints\\
2489\>{\sf short int nz[q][n];}    \> // nonzero pattern of {\sf Z}
2491as {\bf h}igher-{\bf o}rder {\bf v}ector reverse mode driver to compute
2492the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2493\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$ after calling  {\sf fos\_forward} or
2494{\sf hos\_forward} with {\sf keep} $=d+1>1$.
2495After the function call, the last argument of {\sf hov\_reverse} 
2496contains information about the sparsity pattern, i.e. each {\sf nz[i][j]}
2497has a value that characterizes the functional relation between the
2498$i$-th component of $UF^\prime(x)$ and the $j$-th independent value
2499$x_j$ as:
2502 0 & trivial \\
2503 1 & linear
2504\end{tabular} \hspace*{4ex}
2506 2 & polynomial\\
2507 3 & rational
2508\end{tabular} \hspace*{4ex}
2510 4 & transcendental\\
2511 5 & non-smooth
2514Here, ``trivial'' means that there is no dependence at all and ``linear'' means
2515that the partial derivative is a constant that
2516does not dependent on other variables either. ``Non-smooth'' means that one of
2517the functions on the path between $x_i$ and $y_j$ was evaluated at a point
2518where it is not differentiable.  All positive labels
2519$1, 2, 3, 4, 5$ are pessimistic in that the actual functional relation may
2520in fact be simpler, for example due to exact cancellations. 
2522There are also overloaded versions providing a general {\sf reverse}-call.
2523Details of the appropriate calling sequences are given in the following \autoref{forw_rev}.
2529\section{Overloaded Forward and Reverse Calls}
2532In this section, the several versions of the {\sf forward} and
2533{\sf reverse} routines, which utilize the overloading capabilities
2534of C++, are described in detail. With exception of the bit pattern
2535versions all interfaces are prototyped in the header file
2536\verb=<adolc/interfaces.h>=, where also some more specialized {\sf forward}
2537and {\sf reverse} routines are explained. Furthermore, \mbox{ADOL-C} provides
2538C and Fortran-callable versions prototyped in the same header file.
2539The bit pattern versions of {\sf forward} and {\sf reverse} introduced
2540in the \autoref{ProBit} are prototyped in the header file
2541\verb=<adolc/sparse/sparsedrivers.h>=, which will be included by the header
2542file \verb=<adolc/interfaces.h>= automatically.
2545\subsection{The Scalar Case}
2549Given any correct tape, one may call from within
2550the generating program, or subsequently during another run, the following
2554\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2555\>{\sf int forward(tag,m,n,d,keep,X,Y)} \\
2556\>{\sf short int tag;}         \> // tape identification \\
2557\>{\sf int m;}                 \> // number of dependent variables $m$\\
2558\>{\sf int n;}                 \> // number of independent variables $n$\\
2559\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2560\>{\sf  int keep;}             \> // flag for reverse sweep \\ 
2561\>{\sf  double X[n][d+1];}     \> // Taylor coefficients $X$ of
2562                                     independent variables \\
2563\>{\sf double Y[m][d+1];}      \> // Taylor coefficients $Y$ as
2564                                     in \eqref{eq:series}
2567The rows of the matrix $X$ must correspond to the independent variables in the order of their
2568initialization by the \boldmath $\ll=$ \unboldmath operator. The columns of
2569$X = \{x_j\}_{j=0\ldots d}$ represent Taylor coefficient vectors as in
2570\eqref{eq:x_of_t}. The rows of the matrix $Y$ must
2571correspond to the
2572dependent variables in the order of their selection by the \boldmath $\gg=$ \unboldmath operator.
2573The columns of $Y = \{y_j\}_{j=0\ldots d}$ represent
2574Taylor coefficient vectors as in \eqref{eq:series}.
2575Thus the first column of $Y$ contains the
2576function value $F(x)$ itself, the next column represents the first
2577Taylor coefficient vector of $F$, and the last column the
2578$d$-th Taylor coefficient vector. The integer flag {\sf keep} determines
2579how many Taylor coefficients of all intermediate quantities are
2580written into the value stack as explained in \autoref{forw_rev_ad}.
2581 If {\sf keep} is omitted, it defaults to 0.
2583The given {\sf tag} value is used by {\sf forward} to determine the
2584name of the file on which the tape was written. If the tape file does
2585not exist, {\sf forward} assumes that the relevant
2586tape is still in core and reads from the buffers.
2587After the execution of an active section with \mbox{{\sf keep} = 1} or a call to
2588{\sf forward} with any {\sf keep} $\leq$ $d+1$, one may call
2589the function {\sf reverse} with \mbox{{\sf d} = {\sf keep} $-$ 1} and the same tape
2590identifier {\sf tag}. When $u$ is a vector
2591and $Z$ an $n\times (d+1)$ matrix
2592{\sf reverse} is executed in the scalar mode by the calling
2596\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position             
2597\>{\sf int reverse(tag,m,n,d,u,Z)}\\
2598\>{\sf short int tag;}         \> // tape identification \\
2599\>{\sf int m;}                 \> // number of dependent variables $m$\\
2600\>{\sf int n;}                 \> // number of independent variables $n$\\
2601\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2602\>{\sf  double u[m];}          \> // weighting vector $u$\\
2603\>{\sf double Z[n][d+1];}      \> // resulting adjoints $Z$ 
2605to compute
2606the adjoints $z_0^T=u^T F'(x_0)=u^T A_0$, $z_1^T=u^T F''(x_0)x_1=u^T A_1$,
2607\ldots, where $Z=[z_0,z_1,\ldots,z_d]$.
2610\subsection{The Vector Case}
2614When $U$ is a matrix {\sf reverse} is executed in the vector mode by the following calling sequence
2617\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position       
2618\>{\sf int reverse(tag,m,n,d,q,U,Z,nz)}\\
2619\>{\sf short int tag;}         \> // tape identification \\
2620\>{\sf int m;}                 \> // number of dependent variables $m$\\
2621\>{\sf int n;}                 \> // number of independent variables $n$\\
2622\>{\sf  int d;}                \> // highest derivative degree $d$\\ 
2623\>{\sf int q;}                 \> // number of weight vectors $q$\\
2624\>{\sf double U[q][m];}        \> // weight matrix $U$\\
2625\>{\sf double Z[q][n][d+1];}   \> // resulting adjoints \\
2626\>{\sf short nz[q][n];}        \> // nonzero pattern of {\sf Z}
2629to compute the adjoints $Z_0=U F'(x_0)=U A_0$, $Z_1=U F''(x_0)x_1=U A_1$,
2630\ldots, where $Z=[Z_0,Z_1,\ldots,Z_d]$.
2631When the arguments {\sf p} and {\sf U} are omitted, they default to
2632$m$ and the identity matrix of order $m$, respectively. 
2634Through the optional argument {\sf nz} of {\sf reverse} one can compute
2635information about the sparsity pattern of $Z$ as described in detail
2636in the previous \autoref{forw_rev_ad}.
2638The return values of {\sf reverse} calls can be interpreted according
2639to \autoref{retvalues}, but negative return values are not
2640valid, since the corresponding forward sweep would have
2641stopped without completing the necessary taylor file.
2642The return value of {\sf reverse} may be higher
2643than that of the preceding {\sf forward} call because some operations
2644that were evaluated  at a critical argument during the forward sweep
2645were found not to impact the dependents during the reverse sweep.
2647In both scalar and vector mode, the degree $d$ must agree with
2648{\sf keep}~$-$~1 for the most recent call to {\sf forward}, or it must be
2649equal to zero if {\sf reverse} directly follows the taping of an active
2650section. Otherwise, {\sf reverse} will return control with a suitable error
2652In order to avoid possible confusion, the first four arguments must always be
2653present in the calling sequence. However, if $m$ or $d$
2654attain their trivial values 1 and 0, respectively, then
2655corresponding dimensions of the arrays {\sf X}, {\sf Y}, {\sf u},
2656{\sf U}, or {\sf Z} can be omitted, thus eliminating one level of
2657indirection.  For example, we may call
2658{\sf reverse(tag,1,n,0,1.0,g)} after declaring
2659{\sf double g[n]} 
2660to calculate a gradient of a scalar-valued function.
2662Sometimes it may be useful to perform a forward sweep for families of
2663Taylor series with the same leading term.
2664This vector version of {\sf forward} can be called in the form
2667\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2668\>{\sf int forward(tag,m,n,d,p,x0,X,y0,Y)}\\
2669\>{\sf short int tag;}         \> // tape identification \\
2670\>{\sf int m;}                 \> // number of dependent variables $m$\\
2671\>{\sf int n;}                 \> // number of independent variables $n$\\
2672\>{\sf int d;}                 \> // highest derivative degree $d$\\
2673\>{\sf int p;}                 \> // number of Taylor series $p$\\
2674\>{\sf  double x0[n];}          \> // values of independent variables $x_0$\\
2675\>{\sf double X[n][p][d];}     \> // Taylor coefficients $X$ of independent variables\\
2676\>{\sf double y0[m];}           \> // values of dependent variables $y_0$\\
2677\>{\sf double Y[m][p][d];}     \> // Taylor coefficients $Y$ of dependent variables
2680where {\sf X} and {\sf Y} hold the Taylor coefficients of first
2681and higher degree and {\sf x0}, {\sf y0} the common Taylor coefficients of
2682degree 0. There is no option to keep the values of active variables
2683that are going out of scope or that are overwritten. Therefore this
2684function cannot prepare a subsequent reverse sweep.
2685The return integer serves as a flag to indicate quadratures or altered
2686comparisons as described above in \autoref{reuse_tape}.
2688Since the calculation of Jacobians is probably the most important
2689automatic differentia\-tion task, we have provided a specialization
2690of vector {\sf forward} to the case where $d = 1$. This version can be
2691called in the form
2694\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2695\>{\sf int forward(tag,m,n,p,x,X,y,Y)}\\
2696\>{\sf short int tag;}         \> // tape identification \\
2697\>{\sf int m;}                 \> // number of dependent variables $m$\\
2698\>{\sf int n;}                 \> // number of independent variables $n$\\
2699\>{\sf int p;}                 \> // number of partial derivatives $p$ \\
2700\>{\sf double x[n];}          \> // values of independent variables $x_0$\\
2701\>{\sf double X[n][p];}        \> // seed derivatives of independent variables $X$\\
2702\>{\sf double y[m];}           \> // values of dependent variables $y_0$\\
2703\>{\sf double Y[m][p];}        \> // first derivatives of dependent variables $Y$
2706When this routine is called with {\sf p} = {\sf n} and {\sf X} the identity matrix,
2707the resulting {\sf Y} is simply the Jacobian $F^\prime(x_0)$. In general,
2708one obtains the $m\times p$ matrix $Y=F^\prime(x_0)\,X $ for the
2709chosen initialization of $X$. In a workstation environment a value
2710of $p$ somewhere between $10$ and $50$
2711appears to be fairly optimal. For smaller $p$ the interpretive
2712overhead is not appropriately amortized, and for larger $p$ the
2713$p$-fold increase in storage causes too many page faults. Therefore,
2714large Jacobians that cannot be compressed via column coloring
2715as could be done for example using the driver {\sf sparse\_jac}
2716should be ``strip-mined'' in the sense that the above
2717first-order-vector version of {\sf forward} is called
2718repeatedly with the successive \mbox{$n \times p$} matrices $X$ forming
2719a partition of the identity matrix of order $n$.
2722\subsection{Dependence Analysis}
2726The sparsity pattern of Jacobians is often needed to set up data structures
2727for their storage and factorization or to allow their economical evaluation
2728by compression \cite{BeKh96}. Compared to the evaluation of the full
2729Jacobian $F'(x_0)$ in real arithmetic computing the Boolean matrix
2730$\tilde{P}\in\left\{0,1\right\}^{m\times n}$ representing its sparsity
2731pattern in the obvious way requires a little less run-time and
2732certainly a lot less memory.
2734The entry $\tilde{P}_{ji}$ in the $j$-th row and $i$-th column
2735of $\tilde{P}$ should be $1 = true$ exactly when there is a data
2736dependence between the $i$-th independent variable $x_{i}$ and
2737the $j$-th dependent variable $y_{j}$. Just like for real arguments
2738one would wish to compute matrix-vector and vector-matrix products
2739of the form $\tilde{P}\tilde{v}$ or $\tilde{u}^{T}\tilde{P}$ 
2740by appropriate {\sf forward} and {\sf reverse} routines where
2741$\tilde{v}\in\{0,1\}^{n}$ and $\tilde{u}\in\{0,1\}^{m}$.
2742Here, multiplication corresponds to logical
2743{\sf AND} and addition to logical {\sf OR}, so that algebra is performed in a
2746For practical reasons it is assumed that
2747$s=8*${\sf sizeof}$(${\sf unsigned long int}$)$ such Boolean vectors
2748$\tilde{v}$ and $\tilde{u}$ are combined to integer vectors
2749$v\in\N^{n}$ and $u\in\N^{m}$ whose components can be interpreted
2750as bit patterns. Moreover $p$ or $q$ such integer vectors may
2751be combined column-wise or row-wise to integer matrices $X\in\N^{n \times p}$ 
2752and $U\in\N^{q \times m}$, which naturally correspond
2753to Boolean matrices $\tilde{X}\in\{0,1\}^{n\times\left(sp\right)}$
2754and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times m}$. The provided
2755bit pattern versions of {\sf forward} and {\sf reverse} allow
2756to compute integer matrices $Y\in\N^{m \times p}$ and
2757$Z\in\N^{q \times m}$ corresponding to
2760\tilde{Y} = \tilde{P}\tilde{X} \qquad \mbox{and} \qquad 
2761\tilde{Z} = \tilde{U}\tilde{P} \, ,
2763respectively, with $\tilde{Y}\in\{0,1\}^{m\times\left(sp\right)}$
2764and $\tilde{U}\in\{0,1\}^{\left(sq\right)\times n}$.
2765In general, the application of the bit pattern versions of
2766{\sf forward} or {\sf reverse} can be interpreted as
2767propagating dependences between variables forward or backward, therefore
2768both the propagated integer matrices and the corresponding
2769Boolean matrices are called {\em dependence structures}.
2771The bit pattern {\sf forward} routine
2774\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2775\>{\sf int forward(tag,m,n,p,x,X,y,Y,mode)}\\
2776\>{\sf short int tag;}              \> // tape identification \\
2777\>{\sf int m;}                      \> // number of dependent variables $m$\\
2778\>{\sf int n;}                      \> // number of independent variables $n$\\
2779\>{\sf int p;}                      \> // number of integers propagated $p$\\
2780\>{\sf double x[n];}                \> // values of independent variables $x_0$\\
2781\>{\sf unsigned long int X[n][p];}  \> // dependence structure $X$ \\
2782\>{\sf double y[m];}                \> // values of dependent variables $y_0$\\
2783\>{\sf unsigned long int Y[m][p];}  \> // dependence structure $Y$ according to
2784                                     \eqref{eq:int_forrev}\\
2785\>{\sf char mode;}                  \> // 0 : safe mode (default), 1 : tight mode
2788can be used to obtain the dependence structure $Y$ for a given dependence structure
2789$X$. The dependence structures are
2790represented as arrays of {\sf unsigned long int} the entries of which are
2791interpreted as bit patterns as described above.   
2792For example, for $n=3$ the identity matrix $I_3$ should be passed
2793with $p=1$ as the $3 \times 1$ array
2795{\sf X} \; = \;
2796\left( \begin{array}{r}
2797         {\sf 1}0000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2798         0{\sf 1}000000 \: 00000000 \: 00000000 \: 00000000_2 \\
2799         00{\sf 1}00000 \: 00000000 \: 00000000 \: 00000000_2
2800       \end{array} \right)
2802in the 4-byte long integer format. The parameter {\sf mode} determines
2803the mode of dependence analysis as explained already in \autoref{sparse}.
2805A call to the corresponding bit pattern {\sf reverse} routine
2808\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2809\>{\sf int reverse(tag,m,n,q,U,Z,mode)}\\
2810\>{\sf short int tag;}         \> // tape identification \\
2811\>{\sf int m;}                 \> // number of dependent variables $m$\\
2812\>{\sf int n;}                 \> // number of independent variables $n$\\
2813\>{\sf int q;}                 \> // number of integers propagated q\\
2814\>{\sf unsigned long int U[q][m];}  \> // dependence structure $U$ \\
2815\>{\sf unsigned long int Z[q][n];}  \> // dependence structure $Z$ according
2816                                     to \eqref{eq:int_forrev}\\
2817\>{\sf char mode;}        \> // 0 : safe mode (default), 1 : tight mode
2820yields the dependence structure $Z$ for a given dependence structure
2823To determine the whole sparsity pattern $\tilde{P}$ of the Jacobian $F'(x)$
2824as an integer matrix $P$ one may call {\sf forward} or {\sf reverse} 
2825with $p \ge n/s$ or $q \ge m/s$, respectively. For this purpose the
2826corresponding dependence structure $X$ or $U$ must be defined to represent 
2827the identity matrix of the respective dimension.
2828Due to the fact that always a multiple of $s$ Boolean vectors are propagated
2829there may be superfluous vectors, which can be set to zero.
2831The return values of the bit pattern {\sf forward} and {\sf reverse} routines
2832correspond to those described in \autoref{retvalues}.
2834One can control the storage growth by the factor $p$ using
2835``strip-mining'' for the calls of {\sf forward} or {\sf reverse} with successive
2836groups of columns or respectively rows at a time, i.e.~partitioning
2837$X$ or $U$ appropriately as described for the computation of Jacobians
2838in \autoref{vecCas}.
2842\section{Advanced algorithmic differentiation in ADOL-C}
2845\subsection{Differentiating parameter dependent functions}
2847Normally the functions to be differentiated using AD are defined as
2848mappings from $\R^n$ to $\R^m$. The derivatives of the $m$ dependents
2849are then computed w.r.t. all the $n$ independents. However sometimes
2850the application requires only derivatives w.r.t. some of the
2851independents and the others only occur in the evaluation procedure as
2852parameters. One important such use case is Lagrange Multipliers in a
2853nonlinear optimization problem. Declaring such variables as
2854\verb=adouble= and marking them as independent results in the
2855evaluation of superfluous derivative information and increases
2856runtime. However, if the parameters are not defined as \verb=adouble=
2857type variables they get hard coded as constants onto the ADOL-C
2858trace. One may want to change these parameters for some evaluations,
2859however the overhead of retracing the function may be very high. In
2860such cases these constants may be specially marked as parameters
2861during tracing using the function
2863\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2864\>{\sf mkparam(d)}\\
2865\>{\sf double d;}         \> // constant value to be marked as parameter \\
2867This stores the constant in a different place on the trace than the
2868other unmarked constants and enables the user to replace a whole
2869vector of parameters with a new one before computing derivatives later
2870on. All such marked parameter constants can be used in any evaluation
2871expression where the original unmarked constants were used.
2873Sometimes the same parameter may be needed in various different
2874expressions. During trace creation the marked parameters are given a
2875running index as they are marked. This index may me saved by the user
2876and used to access the same stored value in different expressions.
2878To save the index while marking a parameter use
2880\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2881\>{\sf locint mkparam\_idx(d)}\\
2882\>{\sf double d;} \> // constant value to be marked
2884The resulting \verb=locint= can be saved in a variable by the user and
2885later used to access the saved parameter from in the trace in any expression
2887To access a saved parameter inside any evaluation expression use
2889\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2890\>{\sf getparam(idx)}\\
2891\>{\sf locint idx;} \> // index of the parameter as returned by
2895The above three functions work only during the creation of the trace,
2896i.e., between the invokation of \verb=trace_on()= and \verb=trace_off()=.
2898For all following derivative computations using the drivers described
2899in the preceding sections, the saved parameter values will be used,
2900unless the user asks to replace them with a different vector of
2903To replace the whole parameter vector one uses the routine
2905\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
2906\>{\sf set\_param\_vec(tag, number, parameters)}\\
2907\>{\sf short int tag;}   \> // tape identification\\
2908\>{\sf size\_t number;}  \> // total number of constants marked as
2910\>{\sf double* parameters;} \> // array of constants to be used as replacement\\
2912All following derivative computations using the drivers described in
2913the preceding sections will now use the new parameters.
2915After replacing the parameters, however, one cannot directly call a basic
2916reverse mode driver. A basic forward mode driver must be called. However,
2917the easy to use drivers from Section \ref{drivers} do this automatically.
2919\subsection{Differentiating external functions}
2921Ideally, AD is applied to a given computation as a whole.
2922This is not always possible because parts of the computation may
2923be coded in a different programming language or may a call to
2924an external library.
2925In the former case one may want to differentiate the parts in
2926question with a different AD tool or provide hand written derivatives.
2927In practice, however, sophisticated projects usually evolve over a long period of time.
2928Within this process, a heterogeneous code base for the project
2929develops, which may include the incorporation of external solutions,
2930changes in programming paradigms or even of programming languages.
2931Equally heterogeneous, the computation of derivative values appears.
2932Hence, different \mbox{AD-tools} may be combined with hand-derived
2933codes based on the same or different programming languages.
2934ADOL-C supports such settings by the concept of externally
2935differentiated functions, that is, a function
2936not differentiated by ADOL-C itself. The required derivatives
2937have to be provided by the user.
2939For this purpose, it is required that the externally differentiated
2940function (for example named {\sf\em euler\_step} ) has the following signature
2944\hspace*{2cm}{\sf int euler\_step(int n, double *x, int m, double *y);} .
2948Note that the formal paramters in the signature have {\sf double} type, that is,
2949they are not active as in the original program before the ADOL-C type change.
2950The externally differentiated function has to
2951be {\em registered}\footnote{we record the function pointer} using an \mbox{ADOL-C} method as follows
2955\hspace*{2cm}{\sf ext\_diff\_fct *edf = reg\_ext\_fct(euler\_step);} .
2959This returns a pointer to an {\sf ext\_diff\_fct} instance specific to the registered function.
2960Then, the user has to supply the function pointers for the callback methods (for example here
2961{\sf zos\_for\_euler\_step} and {\sf  fos\_rev\_euler\_step}) the user implemented
2962to compute the derivatives as follows:
2964\hspace*{2cm}\= {\sf edf-$>$zos\_forward = {\em zos\_for\_euler\_step};}\\
2965             \> {\sf // function pointer for computing
2966               Zero-Order-Scalar (=zos)}\\
2967             \> {\sf // forward information}\\
2968%             \> {\sf edf-$>$dp\_x = xp;}\\
2969%             \> {\sf edf-$>$dp\_y = yp;}\\
2970%             \> {\sf // double arrays for arguments and results}\\
2971             \> {\sf edf-$>$fos\_reverse = fos\_rev\_euler\_step;} \\
2972             \> {\sf // function pointer for computing
2973               First-Order-Scalar (=fos)}\\ 
2974             \> {\sf reverse information}
2976To facilitate the switch between active and passive versions of the parameters {\sf x} and {\sf y}
2977one has to provide (allocate) both variants. I.e. if the call to {\sf euler\_step} was originally
2981\hspace*{2cm}{\sf rc=euler\_step(n, sOld, m, sNew);}
2985then the ADOL-C typechange for the calling context will turn  {\sf sOld} and {\sf sNew} in {\sf adouble} pointers.
2986To trigger the appropriate action for the derivative computation (i.e. creating an external differentiation entry on the trace)
2987the original call to the externally differentiated function
2988must be substituted by
2992\hspace*{2cm}{\sf rc=call\_ext\_fct(edf, n, sOldPassive, sOld, m, sNewPassive, sNew);} .
2996Here, {\sf sOldPassive} and {\sf sNewPassive} are the passive counterparts ({\sf double} pointers allocated to length {\sf n} and {\sf m}, respectively) 
2997to the active arguments {\sf sNew} in {\sf adouble}.
2998The usage of the external function facility is illustrated by the
2999example \verb=ext_diff_func= contained in
3001There, the external differentiated function is also a C code, but the
3002handling as external differentiated function decreases the
3003overall required tape size.
3006\subsection{Advanced algorithmic differentiation of time integration processes}
3008For many time-dependent applications, the corresponding simulations
3009are based on ordinary or partial differential equations.
3010Furthermore, frequently there are quantities that influence the
3011result of the simulation and can be seen as  control of the systems.
3012To compute an approximation of the
3013simulated process for a time interval $[0,T]$ and evaluated the
3014desired target function, one applies an
3015appropriate integration scheme given by
3017\hspace{5mm} \= some initializations yielding $x_0$\\
3018\> for $i=0,\ldots, N-1$\\
3019\hspace{10mm}\= $x_{i+1} = F(x_i,u_i,t_i)$\\
3020\hspace{5mm} \= evaluation of the target function
3022where $x_i\in {\bf R}^n$ denotes the state and $u_i\in {\bf R}^m$ the control at
3023time $t_i$ for a given time grid $t_0,\ldots,t_N$ with $t_0=0$ and
3024$t_N=T$. The operator $F : {\bf R}^n \times {\bf R}^m \times {\bf R} \mapsto {\bf R}^n$
3025defines the time step to compute the state at time $t_i$. Note that we
3026do not assume a uniform grid.
3028When computing derivatives of the target function with respect to the
3029control, the consequences for the tape generation using the ``basic''
3030taping approach as implemented in ADOL-C so far are shown in the left part of
3035\includegraphics[width=5.8cm]{tapeadv} \hspace*{0.5cm}\
3037\hspace*{0.8cm} Basic taping process \hspace*{4.3cm} Advanced taping process
3038\caption{Different taping approaches}
3041As can be seen, the iterative process is completely
3042unrolled due to the taping process. That is, the tape contains an internal representation of each
3043time step. Hence, the overall tape comprises a serious amount of redundant
3044information as illustrated by the light grey rectangles in
3047To overcome the repeated storage of essentially the same information,
3048a {\em nested taping} mechanism has been incorporated into ADOL-C as illustrated on
3049the right-hand side of \autoref{fig:bas_tap}. This new
3050capability allows the encapsulation of the time-stepping procedure
3051such that only the last time step $x_{N} = F(x_{N-1},u_{N-1})$ is taped as one
3052representative of the time steps in addition to a function pointer to the
3053evaluation procedure $F$ of the time steps.  The function pointer has
3054to be stored for a possibly necessary retaping during the derivative calculation
3055as explained below.
3057Instead of storing the complete tape, only a very limited number of intermediate
3058states are kept in memory. They serve as checkpoints, such that
3059the required information for the backward integration is generated
3060piecewise during the adjoint calculation.
3061For this modified adjoint computation the optimal checkpointing schedules
3062provided by {\bf revolve} are employed. An adapted version of the
3063software package {\sf revolve} is part of ADOL-C and automatically
3064integrated in the ADOL-C library. Based on {\sf revolve}, $c$ checkpoints are
3065distributed such that computational effort is minimized for the given
3066number of checkpoints and time steps $N$. It is important to note that the overall tape
3067size is drastically reduced due to the advanced taping strategy.  For the
3068implementation of this nested taping we introduced
3069a so-called ``differentiating context'' that enables \mbox{ADOL-C} to
3070handle different internal function representations during the taping
3071procedure and the derivative calculation. This approach allows the generation of a new
3072tape inside the overall tape, where the coupling of the different tapes is based on
3073the {\em external differentiated function} described above.
3075Written under the objective of minimal user effort, the checkpointing routines
3076of \mbox{ADOL-C} need only very limited information. The user must
3077provide two routines as implementation of the time-stepping function $F$ 
3078with the signatures
3082\hspace*{2cm}{\sf int time\_step\_function(int n, adouble *u);}\\
3083\hspace*{2cm}{\sf int time\_step\_function(int n, double *u);}
3087where the function names can be chosen by the user as long as the names are
3088unique.It is possible that the result vector of one time step
3089iteration overwrites the argument vector of the same time step. Then, no
3090copy operations are required to prepare the next time step.
3092At first, the {\sf adouble} version of the time step function has to
3093be {\em registered} using the \mbox{ADOL-C} function
3097\hspace*{2cm}{\sf CP\_Context cpc(time\_step\_function);}.
3101This function initializes the structure {\sf cpc}. Then,
3102the user has to provide the remaining checkpointing information
3103by the following commands:
3105\hspace*{2cm}\= {\sf cpc.setDoubleFct(time\_step\_function);}\\
3106             \> {\sf // double variante of the time step function}\\
3107             \> {\sf cpc.setNumberOfSteps(N);}\\
3108             \> {\sf // number of time steps to perform}\\
3109             \> {\sf cpc.setNumberOfCheckpoints(10);}\\
3110             \> {\sf // number of checkpoint} \\
3111             \> {\sf cpc.setDimensionXY(n);}\\
3112             \> {\sf // dimension of input/output}\\
3113             \> {\sf cpc.setInput(y);}\\
3114             \> {\sf // input vector} \\
3115             \> {\sf cpc.setOutput(y);}\\
3116             \> {\sf // output vector }\\
3117             \> {\sf cpc.setTapeNumber(tag\_check);}\\
3118             \> {\sf // subtape number for checkpointing} \\
3119             \> {\sf cpc.setAlwaysRetaping(false);}\\
3120             \> {\sf // always retape or not ?}
3122Subsequently, the time loop in the function evaluation can be
3123substituted by a call of the function
3127\hspace*{2cm}{\sf int cpc.checkpointing();}
3131Then, ADOL-C computes derivative information using the optimal checkpointing
3132strategy provided by {\sf revolve} internally, i.e., completely hidden from the user.
3134The presented driver is prototyped in the header file
3135\verb=<adolc/checkpointing.h>=. This header
3136is included by the global header file \verb=<adolc/adolc.h>= automatically.
3137An example program \verb=checkpointing.cpp= illustrates the
3138checkpointing facilities. It can be found in the directory \verb=examples/additional_examples/checkpointing=.
3142\subsection{Advanced algorithmic differentiation of fixed point iterations}
3144Quite often, the state of the considered system denoted by $x\in\R^n$
3145depends on some design parameters denoted by $u\in\R^m$. One example for this setting
3146forms the flow over an aircraft wing. Here, the shape of the wing that
3147is defined by the design vector $u$ 
3148determines the flow field $x$. The desired quasi-steady state $x_*$
3149fulfills the fixed point equation
3151  \label{eq:fixedpoint}
3152  x_* = F(x_*,u)
3154for a given continuously differentiable function
3155$F:\R^n\times\R^m\rightarrow\R^n$. A fixed point property of this kind is
3156also exploited by many other applications.
3158Assume that one can apply the iteration 
3161 x_{k+1} = F(x_k,u)
3163to obtain a linear converging sequence $\{x_k\}$ generated
3164for any given control $u\in\R^n$. Then the limit point $x_*\in\R^n$ fulfils the fixed
3165point equation~\eqref{eq:fixedpoint}. Moreover,
3166suppose that $\|\frac{dF}{dx}(x_*,u)\|<1$ holds for any pair
3167$(x_*,u)$ satisfying equation \eqref{eq:fixedpoint}.
3168Hence, there exists a
3169differentiable function $\phi:\R^m \rightarrow \R^n$,
3170such that $\phi(u) = F(\phi(u),u)$, where the state
3171$\phi(u)$ is a fixed point of $F$ according to a control
3172$u$. To optimize the system described by the state vector $x=\phi(u)$ with respect to
3173the design vector $u$, derivatives of $\phi$ with respect
3174to $u$ are of particular interest.
3176To exploit the advanced algorithmic differentiation  of such fixed point iterations
3177ADOL-C provides the special functions {\tt fp\_iteration(...)}.
3178It has the following interface:
3180\hspace{0.5in}\={\sf short int tag;} \hspace{1.1in}\= \kill    % define tab position
3181\>{\sf int
3182  fp\_iteration(}\={\sf sub\_tape\_num,double\_F,adouble\_F,norm,norm\_deriv,eps,eps\_deriv,}\\
3183\>              \>{\sf N\_max,N\_max\_deriv,x\_0,u,x\_fix,dim\_x,dim\_u)}\\
3184\hspace{0.5in}\={\sf short int tag;} \hspace{0.9in}\= \kill    % define tab position
3185\>{\sf short int sub\_tape\_num;}         \> // tape identification for sub\_tape \\
3186\>{\sf int *double\_F;}         \> // pointer to a function that compute for $x$ and $u$ \\
3187\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
3188\>{\sf int *adouble\_F;}        \> // pointer to a function that compute for $x$ and $u$ \\
3189\>                              \> // the value $y=F(x,u)$ for {\sf double} arguments\\             
3190\>{\sf int *norm;}              \> // pointer to a function that computes\\
3191\>                              \> // the norm of a vector\\
3192\>{\sf int *norm\_deriv;}       \> // pointer to a function that computes\\
3193\>                              \> // the norm of a vector\\
3194\>{\sf double eps;}             \> // termination criterion for fixed point iteration\\
3195\>{\sf double eps\_deriv;}      \> // termination criterion for adjoint fixed point iteration\\
3196\>{\sf N\_max;}                 \> // maximal number of itertions for state computation\\
3197\>{\sf N\_max\_deriv;}          \> // maximal number of itertions for adjoint computation\\
3198\>{\sf adouble *x\_0;}          \> // inital state of fixed point iteration\\
3199\>{\sf adouble *u;}             \> // value of $u$\\
3200\>{\sf adouble *x\_fic;}        \> // final state of fixed point iteration\\
3201\>{\sf int dim\_x;}             \> // dimension of $x$\\
3202\>{\sf int dim\_u;}             \> // dimension of $u$\\
3205Here {\tt sub\_tape\_num} is an ADOL-C identifier for the subtape that
3206should be used for the fixed point iteration.
3207{\tt double\_F} and {\tt adouble\_F} are pointers to functions, that
3208compute for $x$ and $u$ a single iteration step $y=F(x,u)$. Thereby
3209{\tt double\_F} uses {\tt double} arguments and {\tt adouble\_F}
3210uses ADOL-C {\tt adouble} arguments. The parameters {\tt norm} and
3211{\tt norm\_deriv} are pointers to functions computing the norm
3212of a vector. The latter functions together with {\tt eps},
3213{\tt eps\_deriv}, {\tt N\_max}, and {\tt N\_max\_deriv} control
3214the iterations. Thus the following loops are performed:
3217  do                     &   do                           \\
3218  ~~~~$k = k+1$          &   ~~~~$k = k+1$                \\
3219  ~~~~$x = y$            &   ~~~~$\zeta = \xi$            \\
3220  ~~~~$y = F(x,u)$       &   ~~~
3221  $(\xi^T,\bar u^T) = \zeta^TF'(x_*,u) + (\bar x^T, 0^T)$ \\
3222  while $\|y-x\|\geq\varepsilon$ and $k\leq N_{max}$ \hspace*{0.5cm} &
3223  while $\|\xi -\zeta\|_{deriv}\geq\varepsilon_{deriv}$   \\
3224  & and $k\leq N_{max,deriv}$
3227The vector for the initial iterate and the control is stored
3228in {\tt x\_0} and {\tt u} respectively. The vector in which the
3229fixed point is stored is {\tt x\_fix}. Finally {\tt dim\_x}
3230and {\tt dim\_u} represent the dimensions $n$ and $m$ of the
3231corresponding vectors.
3233The presented driver is prototyped in the header file
3234\verb=<adolc/fixpoint.h>=. This header
3235is included by the global header file \verb=<adolc/adolc.h>= automatically.
3236An example code that shows also the
3237expected signature of the function pointers is contained in the directory \verb=examples/additional_examples/fixpoint_exam=.
3239\subsection{Advanced algorithmic differentiation of OpenMP parallel programs}
3241ADOL-C allows to compute derivatives in parallel for functions
3242containing OpenMP parallel loops.
3243This implies that an explicit loop-handling approach is applied. A
3244typical situation is shown in \autoref{fig:basic_layout},
3246    \vspace{3ex}
3247    \begin{center}
3248        \includegraphics[height=4cm]{multiplexed} \\
3249        \begin{picture}(0,0)
3250            \put(-48,40){\vdots}
3251            \put(48,40){\vdots}
3252            \put(-48,80){\vdots}
3253            \put(48,80){\vdots}
3254            \put(-83,132){function eval.}
3255            \put(5,132){derivative calcul.}
3256        \end{picture}
3257    \end{center}
3258    \vspace{-5ex}
3259    \caption{Basic layout of mixed function and the corresponding derivation process}
3260    \label{fig:basic_layout}
3262where the OpenMP-parallel loop is preceded by a serial startup
3263calculation and followed by a serial finalization phase.
3265Initialization of the OpenMP-parallel regions for \mbox{ADOL-C} is only a matter of adding a macro to the outermost OpenMP statement.
3266Two macros are available that only differ in the way the global tape information is handled.
3267Using {\tt ADOLC\_OPENMP}, this information, including the values of the augmented variables, is always transferred from the serial to the parallel region using {\it firstprivate} directives for initialization.
3268For the special case of iterative codes where parallel regions, working on the same data structures, are called repeatedly the {\tt ADOLC\_OPENMP\_NC} macro can be used.
3269Then, the information transfer is performed only once within the iterative process upon encounter of the first parallel region through use of the {\it threadprivate} feature of OpenMP that makes use of thread-local storage, i.e., global memory local to a thread.
3270Due to the inserted macro, the OpenMP statement has the following structure:
3272\hspace*{1cm} \= {\sf \#pragma omp ... ADOLC\_OPENMP} \qquad \qquad or \\
3273              \> {\sf \#pragma omp ... ADOLC\_OPENMP\_NC}
3275Inside the parallel region, separate tapes may then be created.
3276Each single thread works in its own dedicated AD-environment, and all
3277serial facilities of \mbox{ADOL-C} are applicable as usual. The global
3278derivatives can be computed using the tapes created in the serial and
3279parallel parts of the function evaluation, where user interaction is
3280required for the correct derivative concatenation of the various tapes.
3282For the usage of the parallel facilities, the \verb=configure=-command
3283has to be used with the option \verb?--with-openmp-flag=FLAG?, where
3284\verb=FLAG= stands for the system dependent OpenMP flag.
3285The parallel differentiation of a parallel program is illustrated
3286by the example program \verb=openmp_exam.cpp= contained in \verb=examples/additional_examples/openmp_exam=.
3290\section{Traceless forward differentiation in ADOL-C}
3293Up to version 1.9.0, the development of the ADOL-C software package
3294was based on the decision to store all data necessary for derivative
3295computation on traces (tapes), where large applications require the traces to be
3296written out to corresponding files. In almost all cases this means
3297a considerable drawback in terms of run time due to the excessive
3298memory accesses. Using these traces enables ADOL-C to offer multiple
3299functions. However, it is not necessary for all tasks of derivative
3300computation to do that.
3302Starting with version 1.10.0, ADOL-C now features a traceless forward
3303mode for computing first order derivatives in scalar mode, i.e.,
3304$\dot{y} = F'(x)\dot{x}$, and in vector mode, i.e., $\dot{Y} = F'(x)\dot{X}$.
3305This traceless variant coexists with the more universal
3306trace based mode in the package. The following subsections describe
3307the source code modifications required to use the traceless forward mode of
3310\subsection{Modifying the Source Code}
3312Let us consider the coordinate transformation from Cartesian to spherical
3313polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3314= F(x)$, with
3316y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3317y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3318y_3  =  \arctan(x_2/x_1),
3320as an example. The corresponding source code is shown in \autoref{fig:traceless}.
3326\= \kill
3327\> {\sf \#include} {\sf $<$iostream$>$}\\
3328\> {\sf using namespace std;}\\
3329\> \\
3330\> {\sf int main() \{}\\
3331\> {\sf \rule{0.5cm}{0pt}double x[3], y[3];}\\
3332\> \\
3333\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3334\> {\sf \rule{1cm}{0pt}...}\\
3335\> \\
3336\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3337\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3338\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3339\> \\
3340\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0] $<<$ " , y2=" $<<$ y[1] $<<$ " , y3=" $<<$ y[2] $<<$ endl;}\\
3341\> \\
3342\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3343\> \}
3348\caption{Example for traceless forward mode}
3352Changes to the source code that are necessary for applying the
3353traceless forward mode of ADOL-C are described in the following two
3354subsections, where the vector mode version is described
3355as extension of the scalar mode.
3357\subsubsection*{The scalar mode}
3359To use the traceless forward mode, one has to include only
3360the header file \verb#adtl.h# since it does not include the
3361trace based functions defined in other header files.
3363As in the trace based forward version of ADOL-C all derivative
3364calculations are introduced by calls to overloaded
3365operators. Therefore, similar to the trace-based version all
3366independent, intermediate and dependent variables must be declared
3367with type {\sf adouble}. The whole traceless functionality provided by
3368\verb#adtl.h# was written as complete inline intended code
3369due to run time aspects, where the real portion of inlined code can
3370be influenced by switches for many compilers. Likely, the whole
3371derivative code is inlined by default. Our experiments
3372with the traceless mode have produced complete inlined code by using
3373standard switches (optimization) for GNU and Intel C++
3376To avoid name conflicts
3377resulting from the inlining the traceless version has its own namespace
3378\verb#adtl#. As a result four possibilities of using the {\sf adouble}
3379type are available for the traceless version:
3381\item Defining a new type
3382      \begin{center}
3383        \begin{tabular}{l}
3384          {\sf typedef adtl::adouble adouble;}\\
3385          ...\\
3386          {\sf adouble tmp;}
3387        \end{tabular}
3388      \end{center}
3389      This is the preferred way. Remember, you can not write an own
3390      {\sf adouble} type/class with different meaning after doing the typedef.
3391\item Declaring with namespace prefix
3392      \begin{center}
3393        \begin{tabular}{l}
3394          {\sf adtl::adouble tmp;}
3395        \end{tabular}
3396      \end{center}
3397      Not the most handsome and efficient way with respect to coding
3398      but without any doubt one of the safest ways. The identifier
3399      {\sf adouble} is still available for user types/classes.
3400\item Trusting macros
3401      \begin{center}
3402        \begin{tabular}{l}
3403          {\sf \#define adouble adtl::adouble}\\
3404          ...\\
3405          {\sf adouble tmp;}
3406        \end{tabular}
3407      \end{center}
3408      This approach should be used with care, since standard defines are text replacements.
3409  \item Using the complete namespace
3410        \begin{center}
3411          \begin{tabular}{l}
3412            {\sf using namespace adtl;}\\
3413            ...\\
3414            {\sf adouble tmp;}
3415          \end{tabular}
3416        \end{center}
3417        A very clear approach with the disadvantage of uncovering all the hidden secrets. Name conflicts may arise!
3419After defining the variables only two things are left to do. First
3420one needs to initialize the values of the independent variables for the
3421function evaluation. This can be done by assigning the variables a {\sf
3422double} value. The {\sf ad}-value is set to zero in this case.
3423Additionally, the traceless forward mode variant of ADOL-C
3424offers a function named {\sf setValue} for setting the value without
3425changing the {\sf ad}-value. To set the {\sf ad}-values of the independent
3426variables ADOL-C offers two possibilities:
3428  \item Using the constructor
3429        \begin{center}
3430          \begin{tabular}{l}
3431        {\sf double seedOne = 1., seedZero = 0.;}\\
3432            {\sf adouble x1(2,\&seedOne), x2(4,\&seedZero), y;}
3433          \end{tabular}
3434        \end{center}
3435        This would create three adoubles $x_1$, $x_2$ and $y$. Obviously, the latter
3436        remains uninitialized. In terms of function evaluation
3437        $x_1$ holds the value 2 and $x_2$ the value 4 whereas the derivative values
3438        are initialized to $\dot{x}_1=1$ and $\dot{x}_2=0$.
3439   \item Setting point values directly
3440         \begin{center}
3441           \begin{tabular}{l}
3442         {\sf double seedOne = 1., seedZero = 0.;}\\
3443             {\sf adouble x1=2, x2=4, y;}\\
3444             ...\\
3445             {\sf x1.setADValue(\&seedOne);}\\
3446             {\sf x2.setADValue(\&seedZero);}
3447           \end{tabular}
3448         \end{center}
3449         The same example as above but now using {\sf setADValue}-method for initializing the derivative values. It is important to note that in both cases the derivative values have to be set by passing an address of a variable or a pointer of the {\sf double} type (instead of passing the value directly). The reason for doing this will become clear in the next subsection.
3452The derivatives can be obtained at any time during the evaluation
3453process by calling the {\sf getADValue}-method which returns a pointer (instead of a value), and therefore, it can be assigned to another pointer or it can be dereferenced immediately in order to read the value:
3455  \begin{tabular}{l}
3456    {\sf adouble y;}\\
3457    ...\\
3458    {\sf cout $<<$ *(y.getADValue());}
3459  \end{tabular}
3461\autoref{fig:modcode} shows the resulting source code incorporating
3462all required changes for the example
3463given above.
3469\hspace*{-1cm} \= \kill
3470\> {\sf \#include $<$iostream$>$}\\
3471\> {\sf using namespace std;}\\
3472\> \\
3473\> {\sf \#include $<$adolc/adtl.h$>$}\\
3474\> {\sf typedef adtl::adouble adouble;}\\
3476\> {\sf int main() \{}\\
3477\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3479\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i)\hspace*{3cm}// Initialize $x_i$}\\
3480\> {\sf \rule{1cm}{0pt}...}\\
3482\> {\sf \rule{0.5cm}{0pt}double seed = 1.;}\\
3483\> {\sf \rule{0.5cm}{0pt}x[0].setADValue(\&seed);\hspace*{3cm}// derivative of f with respect to $x_1$}\\
3484\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3485\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3486\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3488\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3489\> {\sf \rule{0.5cm}{0pt}cout $<<$ "dy2/dx1 = " $<<$ *(y[1].getADValue()) $<<$ endl;}\\
3490\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3491\> {\sf \}}
3495\caption{Example for traceless scalar forward mode}
3499\subsubsection*{The vector mode}
3501In scalar mode only one direction element has to be stored per {\sf
3502  adouble} whereas a field of $p$ elements is needed in the vector
3503  mode to cover the computations for the given $p$ directions. The
3504  resulting changes to the source code are described in this section.
3506In order to use the vector mode one has to call
3507\verb#adtl::setNumDir(N)#.  This function takes the maximal number of
3508directions to be used within the resulting vector mode. Changing the
3509number of directions in the middle of executing code containing active
3510variables is undesirable and can lead to segmentation violations,
3511hence such an operation will result in an
3512error. \verb#adtl::setNumDir(N)# must therefore be called before
3513declaring any \verb#adtl::adouble# objects.
3515Setting and getting the derivative values is done in the same manner as in the scalar case, by passing and retrieving the pointers, as illustrated in the following example:
3517  \begin{tabular}{l}
3518    {\sf adtl::setNumDir(10);}\\
3519    {\sf adouble x, y;}\\
3520    {\sf double *ptr1 = new double[10];}\\
3521    {\sf double *ptr2;}\\
3522      ...\\
3523    {\sf x1=2;}\\
3524    {\sf x1.setADValue(ptr1);}\\
3525    ...\\
3526    {\sf ptr2=y.getADValue();}
3527  \end{tabular}
3529Now using the pointers in the methods {\sf setADValue} and {\sf getADValue} makes more sense since the vector mode requires the field of $p$ elements per {\sf adouble} which holds the derivative values.
3530Additionally, the traceless vector forward mode of ADOL-C offers two
3531new methods for setting/getting the derivative values. Setting a derivative value of the desired vector element is possible by passing a {\sf double} value and its position in the vector ({\sf integer}). Getting a derivative value of the desired vector element requires as argument only its position in the vector. One can note that the pointers are not used in these cases, as illustrated in the following example:
3533  \begin{tabular}{l}
3534    {\sf adtl::setNumDir(10);}\\
3535    ...\\
3536    {\sf adouble x, y;}\\
3537    ...\\
3538    {\sf x1=2;}\\
3539    {\sf x1.setADValue(5,1);\hspace*{3.7cm}// set the 6th point value of x to 1.0}\\
3540      ...\\
3541    {\sf cout $<<$ y.getADValue(3) $<<$ endl;\hspace*{1cm}// print the 4th derivative value of y}
3542  \end{tabular}
3544The resulting source code containing all changes that are required is
3545shown in \autoref{fig:modcode2}
3550\hspace*{-1cm} \= \kill
3551\> {\sf \#include $<$iostream$>$}\\
3552\> {\sf  using namespace std;}\\
3554\> {\sf \#include $<$adolc/adtl.h$>$}\\
3555\> {\sf typedef adtl::adouble adouble;}\\
3557\> {\sf int main() \{}\\
3558\> {\sf \rule{0.5cm}{0pt}adtl::setNumDir(3);}\\
3559\> {\sf \rule{0.5cm}{0pt}adouble x[3], y[3];}\\
3561\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3562\> {\sf \rule{1cm}{0pt}...\hspace*{3cm}// Initialize $x_i$}\\
3563\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j) if (i==j) x[i].setADValue(j,1);}\\
3564\> {\sf \rule{0.5cm}{0pt}\}}\\
3566\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3567\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3568\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3570\> {\sf \rule{0.5cm}{0pt}cout $<<$ "y1=" $<<$ y[0].getValue() $<<$ " , y2=" $<<$ y[1].getValue ... ;}\\
3571\> {\sf \rule{0.5cm}{0pt}cout $<<$ "jacobian : " $<<$ endl;}\\
3572\> {\sf \rule{0.5cm}{0pt}for (int i=0; i$<$3; ++i) \{}\\
3573\> {\sf \rule{1cm}{0pt}for (int j=0; j$<$3; ++j)}\\
3574\> {\sf \rule{1.5cm}{0pt}cout $<<$ y[i].getADValue(j) $<<$ "  ";}\\
3575\> {\sf \rule{1cm}{0pt}cout $<<$ endl;}\\
3576\> {\sf \rule{0.5cm}{0pt}\}}\\
3577\> {\sf \rule{0.5cm}{0pt}return 0;}\\
3578\> {\sf \}}
3581\caption{Example for traceless vector forward mode}
3585\subsection{Compiling and Linking the Source Code}
3587After incorporating the required changes, one has to compile the
3588source code and link the object files to get the executable.
3589As long as the ADOL-C header files are not included in the absolute path
3590the compile sequence should be similar to the following example:
3592  \begin{tabular}{l}
3593    {\sf g++ -I/home/username/adolc\_base/include -c traceless\_scalar.cpp}
3594  \end{tabular}
3596The \verb#-I# option tells the compiler where to search for the ADOL-C
3597header files. This option can be omitted when the headers are included
3598with absolute path or if ADOL-C is installed in a ``global'' directory.
3600Although originally designed to be completely inline, certain global
3601variables need to be initialised in the compiled part of the library
3602and therefore the ADOL-C library must be linked with the compiled code.
3603The example started above could be finished with the
3604following command:
3606  \begin{tabular}{l}
3607    {\sf g++ -o traceless\_scalar traceless\_scalar.o}\\
3608    {\sf -Wl,-{}-rpath -Wl,/home/username/adolc\_base/lib -L/home/username/adolc\_base/lib -ladolc}
3609  \end{tabular}
3611The mentioned source codes {\sf traceless\_scalar.c} and {\sf traceless\_vector.c} 
3612illustrating the use of the for traceless scalar and vector mode can be found in
3613the directory {\sf examples}.
3615\subsection{Concluding Remarks for the Traceless Forward Mode Variant}
3617As many other AD methods the traceless forward mode provided by the
3618ADOL-C package has its own strengths and drawbacks. Please read the
3619following section carefully to become familiar with the things that
3620can occur:
3622  \item Advantages:
3623    \begin{itemize}
3624      \item Code speed\\
3625        Increasing computation speed was one of the main aspects in writing
3626        the traceless code. In many cases higher performance can be
3627        expected this way. However this is not guarenteed for repeated
3628        evaluations.
3629      \item Smaller overall memory requirements\\
3630        Traceless ADOL-C does not write traces anymore, as the name
3631        implies. Loop ''unrolling'' can be avoided this
3632        way. Considered main memory plus disk space as overall memory
3633        requirements the traceless version can be
3634        executed in a more efficient way.
3635    \end{itemize}
3636  \item Drawbacks:
3637    \begin{itemize}
3638    \item Main memory limitations\\
3639      The ability to compute derivatives to a given function is
3640      bounded by the main memory plus swap size  when using
3641      traceless ADOL-C. Computation from swap should be avoided anyway
3642      as far as possible since it slows down the computing time
3643      drastically. Therefore, if the program execution is 
3644      terminated without error message insufficient memory size can be
3645      the reason among other things. The memory requirements $M$ can
3646      be determined roughly as followed:
3647      \begin{itemize}
3648        \item Scalar mode: $M=$(number of {\sf adouble}s)$*2 + M_p$
3649        \item Vector mode: $M=$(number of {\sf adouble}s)*({\sf
3650          NUMBER\_DIRECTIONS}$+1) + M_p$ 
3651      \end{itemize}
3652      where the storage size of all non {\sf adouble} based variables is described by $M_p$.
3653    \item Compile time\\
3654      As discussed in the previous sections, the traceless forward mode of
3655      the ADOL-C package is implemented as inline intended version. Using
3656      this approach results in a higher source code size, since every
3657      operation involving at least one {\sf adouble} stands for the
3658      operation itself as well as for the corresponding derivative
3659      code after the inlining process. Therefore, the compilation time
3660      needed for the traceless version may be higher than that of the trace based code.
3661    \item Code Size\\
3662      A second drawback and result of the code inlining is the
3663      increase of code sizes for the binaries. The increase
3664      factor compared to the corresponding trace based program is
3665      difficult to quantify as it is task dependent. Practical results
3666      have shown that values between 1.0 and 2.0 can be
3667      expected. Factors higher than 2.0 are possible too and even
3668      values below 1.0 have been observed.
3669    \end{itemize}
3673\section{Traceless forward differentiation in ADOL-C using Cuda}
3676One major drawback using the traceless version of ADOL-C is the fact
3677that several function evaluations are needed to compute derivatives
3678in many different points. More precisely, to calculate the Jacobian for
3679a function $F:\mathbb{R}^n\rightarrow \mathbb{R}^m$ in $M$ points, $M$
3680function evaluations are needed for the traceless vector mode and even
3681$M*n$ for the traceless scalar mode. Depending on the size of the function
3682this can result in a long runtime. To achieve a better performance one can
3683use parallelisation techniques as the same operations are
3684performed during a function evaluation. One possibility is to use GPUs
3685since they are optimized for data parallel computation. Starting with
3686version 2.3.0 ADOL-C now features a traceless forward mode for computing
3687first order derivatives in scalar mode on GPUs using the
3688general purpose parallel computing architecture Cuda.
3690The idea is to include parallel code that executes in many GPU threads across
3691processing elements. This can be done by using kernel functions, that is functions
3692which are executed on GPU as an array of threads in parallel. In general all
3693threads execute the same code. They are grouped into blocks which are then grouped
3694into grids. A kernel is executed as a grid of blocks of threads. For more
3695details see, e.g., the NVIDIA CUDA C Programming Guide which can be downloaded from
3696the web page {\sf}. To solve the problem of calculating
3697the Jacobian of $F$ at $M$ points it is possible to let each thread perform a
3698function evaluation and thus the computation of derivatives for one direction
3699at one point. The advantage is that the function is evaluated in different points
3700in parallel which can result in a faster wallclock runtime.
3702The following subsection describes the source code modifications required to use
3703the traceless forward mode of ADOL-C with Cuda. 
3705\subsection{Modifying the source code}
3707Let us again consider the coordinate transformation from Cartesian to spherical
3708polar coordinates given by the function $F: \mathbb{R}^3 \to \mathbb{R}^3$, $y
3709= F(x)$, with
3711y_1  =  \sqrt{x_1^2 + x_2^2 + x_3^2},\qquad
3712y_2  =  \arctan\left(\sqrt{x_1^2 + x_2^2}/x_3\right),\qquad
3713y_3  =  \arctan(x_2/x_1),
3715as an example. We now calculate the Jacobian at $M=1024$ different points.
3716The source code for one point is shown in \autoref{fig:traceless}. This example
3717has no real application but can still be used to show the combination of the traceless version
3718of ADOL-C and Cuda.
3720For the use of this mode a Cuda toolkit, which is suitable for the grafic card used,
3721has to be installed. Furthermore, it is important to check that the graphic card
3722used supports double precision (for details see e.g. NVIDIA CUDA C Programming Guide).
3723Otherwise the data type employed inside of the {\sf adouble} class has to be adapted
3724to {\sf float}. To use the traceless forward mode with Cuda, one has to include the header
3725files \verb#adoublecuda.h# and \verb#cuda.h#. The first one contains the
3726definition of the class {\sf adouble} and the overloaded operators
3727for this version of ADOL-C. As in the other versions all derivative
3728calculations are introduced by calls to overloaded operators. The
3729second header file is needed for the use of Cuda. 
3731One possibility to solve the problem above is the following. First of all
3732three {\sf double} arrays are needed: {\sf x} for the independent variables,
3733{\sf y} for the dependent variables and {\sf deriv} for the values of the
3734Jacobian matrices. The independent variables have to be initialised, therefore
3735the points at which the function should be evaluated are saved in a row in the
3736same array of length $3*M$. For the computation on GPUs one also has to allocate memory
3737on this device. Using the syntax in Cuda one can allocate an array of
3738length $3*M$ (number of independent variables times number of points) for
3739the independent variables as follows:
3741  \begin{tabular}{l}
3742    {\sf double * devx;}\\
3743    {\sf cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
3744  \end{tabular}
3746The arrays for the dependent variables and the values of the Jacobian matrices
3747are allocated in the same way. Then the values of the independent variables
3748have to be copied to the GPU using the following command
3750  \begin{tabular}{l}
3751    {\sf cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
3752  \end{tabular}
3754The argument {\sf cudaMemcpyHostToDevice} indicates that the values are copied from
3755the host to the GPU. In this case the values stored in {\sf x} are copied to {\sf devx}.
3757Now all required information has been transferred to the GPU. The changes
3758in the source code made so far are summarized in \autoref{fig:cudacode1}.
3764\hspace*{-1cm} \= \kill
3765\> {\sf \#include $<$iostream$>$}\\
3766\> {\sf \#include $<$cuda.h$>$}\\
3767\> {\sf \#include $<$adoublecuda.h$>$}\\
3768\> {\sf using namespace std;}\\
3770\> {\sf int main() \{}\\
3771\> {\sf \rule{0.5cm}{0pt}int M=1024;}\\
3772\> {\sf \rule{0.5cm}{0pt}double* deriv = new double[9*M];}\\
3773\> {\sf \rule{0.5cm}{0pt}double* y = new double[3*M];}\\
3774\> {\sf \rule{0.5cm}{0pt}double* x = new double[3*M];}\\
3776\> {\sf \rule{0.5cm}{0pt}// Initialize $x_i$}\\
3777\> {\sf \rule{0.5cm}{0pt}for (int k=0; k$<$M; ++k)\{}\\
3778\> {\sf \rule{1cm}{0pt}for (int i=0; i$<$3; ++i)}\\
3779\> {\sf \rule{1.5cm}{0pt}x[k*3+i] =i + 1/(k+1);\}}\\
3781\> {\sf \rule{0.5cm}{0pt}// Allocate array for independent and dependent variables}\\
3782\> {\sf \rule{0.5cm}{0pt}// and Jacobian matrices on GPU}\\
3783\> {\sf \rule{0.5cm}{0pt}double * devx;}\\
3784\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devx, 3*M*sizeof(double));}\\
3785\> {\sf \rule{0.5cm}{0pt}double * devy;}\\
3786\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devy, 3*M*sizeof(double));}\\
3787\> {\sf \rule{0.5cm}{0pt}double * devderiv;}\\
3788\> {\sf \rule{0.5cm}{0pt}cudaMalloc((void**)\&devderiv, 3*3*M*sizeof(double));}\\
3790\> {\sf \rule{0.5cm}{0pt}// Copy values of independent variables from host to GPU}\\
3791\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(devx, x, sizeof(double)*3*M, cudaMemcpyHostToDevice);}\\
3793\> {\sf \rule{0.5cm}{0pt}// Call function to specify amount of blocks and threads to be used}\\
3794\> {\sf \rule{0.5cm}{0pt}kernellaunch(devx, devy, devderiv,M);}\\
3796\> {\sf \rule{0.5cm}{0pt}// Copy values of dependent variables and Jacobian matrices from GPU to host}\\
3797\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\
3798\> {\sf \rule{0.5cm}{0pt}cudaMemcpy(deriv, devderiv, sizeof(double)*3*3*M, cudaMemcpyDeviceToHost);}\\
3799\> {\sf \}}
3803\caption{Example for traceless scalar forward mode with Cuda}
3807In the next step the user has to specify how many blocks and threads per block
3808will be needed for the function evaluations. In the present example this is done by 
3809the call of the function {\sf kernellaunch}, see \autoref{fig:cudacode2}.
3810In this case the blocks are two dimensional: the x-dimension is determined
3811by the number of points $M$ at which the Jacobian matrix has to be calculated
3812while the y-dimension is given by the number of independent variables, i.e., 3.
3813Since a block cannot contain more than 1024 threads, the
3814x-dimension in the example is 64 instead of $M=1024$. Therefore $1024/64=16$
3815blocks are needed. The described division into blocks is reasonable as each
3816thread has to perform a function evaluation for one point and one direction,
3817hence $M*3$ threads are needed where 3 denotes the number of independent
3818variables corresponding to the number of directions needed for the computation
3819of the Jacobian in one point.
3824\hspace*{-1cm} \= \kill
3825\> {\sf cudaError\_t kernellaunch(double* inx, double* outy, double* outderiv, int M) \{}\\
3826\> {\sf \rule{0.5cm}{0pt}// Create 16 blocks}\\
3827\> {\sf \rule{0.5cm}{0pt}int Blocks=16;}\\
3828\> {\sf \rule{0.5cm}{0pt}// Two dimensional (M/Blocks)$\times$3 blocks}\\
3829\> {\sf \rule{0.5cm}{0pt}dim3 threadsPerBlock(M/Blocks,3);}\\
3831\> {\sf \rule{0.5cm}{0pt}// Call kernel function with 16 blocks with (M/Blocks)$\times$3 threads per block}\\
3832\> {\sf \rule{0.5cm}{0pt}kernel $<<<$ Blocks, threadsPerBlock $>>>$(inx, outy, outderiv);}\\
3833\> {\sf \rule{0.5cm}{0pt}cudaError\_t cudaErr = cudaGetLastError();}\\
3835\> {\sf \rule{0.5cm}{0pt}return cudaErr;}\\
3836\> {\sf \}}
3839\caption{Example for traceless scalar forward mode with Cuda}
3843We can now perform the function evaluations together with the calculation
3844of the Jacobians. The corresponding code is illustrated in \autoref{fig:cudacode3}.
3845Since the function evaluations should be performed on a GPU a kernel function is
3846needed which is defined by using the \mbox{{\sf \_\_ global\_\_}} declaration specifier
3847(see \autoref{fig:cudacode3}). Then each thread executes the operations that are
3848defined in the kernel.
3853\hspace*{-1cm} \= \kill
3854\> {\sf \_\_global\_\_ void kernel(double* inx, double* outy, double* outderiv) \{}\\
3855\> {\sf \rule{0.5cm}{0pt}const int index = threadIdx.x ;}\\
3856\> {\sf \rule{0.5cm}{0pt}const int index1 = threadIdx.y;}\\
3857\> {\sf \rule{0.5cm}{0pt}const int index2 = blockIdx.x;}\\
3858\> {\sf \rule{0.5cm}{0pt}const int index3 = blockDim.x;}\\
3859\> {\sf \rule{0.5cm}{0pt}const int index4 = blockDim.x*blockDim.y;}\\
3861\> {\sf \rule{0.5cm}{0pt}// Declare dependent and independent variables as {\sf adouble}s}\\
3862\> {\sf \rule{0.5cm}{0pt}adtlc::adouble y[3], x[3];}\\
3863\> {\sf \rule{0.5cm}{0pt}// Read out point for function evaluation}\\
3864\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3865\> {\sf \rule{1cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
3866\> {\sf \rule{0.5cm}{0pt}// Set direction for calculation of derivatives}\\
3867\> {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
3869\> {\sf \rule{0.5cm}{0pt}// Function evaluation}\\
3870\> {\sf \rule{0.5cm}{0pt}y[0] = sqrt(x[0]*x[0]+x[1]*x[1]+x[2]*x[2]);}\\
3871\> {\sf \rule{0.5cm}{0pt}y[1] = atan(sqrt(x[0]*x[0]+x[1]*x[1])/x[2]);}\\
3872\> {\sf \rule{0.5cm}{0pt}y[2] = atan(x[1]/x[0]);}\\
3874\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3875\> {\sf \rule{1cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
3876\> {\sf \rule{0.5cm}{0pt}for(int i=0; i $<$ 3; i++)}\\
3877\> {\sf \rule{1cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
3878\> {\sf \}}
3881\caption{Example for traceless scalar forward mode with Cuda}
3885In this example each thread is assigned the task of calculating the
3886derivatives in one point with respect to one independent variable. Therefore
3887some indices are needed for the implementation. Each thread has a unique thread
3888ID in a block consisting of an x and an y-dimension. The blocks have an ID as
3889well. In the example the following indices are used.
3892 \item {\sf index = threadIdx.x} denotes the x-dimension of a thread (ranges from 0 to 63 in the example)
3893 \item {\sf index1 = threadIdx.y} denotes the y-dimension of a thread (ranges from 0 to 2 in the example)
3894 \item {\sf index2 = blockIdx.x} denotes the block index (ranges from 0 to 15 in the example)
3895 \item {\sf index3 = blockDim.x} denotes the x-dimension of a block (always 64 in the example)
3896 \item {\sf index4 = blockDim.x*blockDim.y} denotes the size of a block (always $64*3=192$ in the example)
3899For the calculation of derivatives the function evaluation has to be performed
3900with {\sf adouble}s. Therefore the dependent and the independent variables
3901have to be declared as {\sf adouble}s as in the other versions of ADOL-C (see, e.g., \autoref{traceless}).
3902Similar to the traceless version without Cuda the namespace \verb#adtlc# is used.
3903Now each thread has to read out the point at which the function evaluation is
3904then performed. This is determined by the blockindex and the x-dimension of
3905the thread, therefore the independent variables in a thread
3906have the values
3908  \begin{tabular}{l}
3909    {\sf \rule{0.5cm}{0pt}x[i]=inx[index2*index4+index*3+i];}\\
3910  \end{tabular}
3912for {\sf i=0,1,2} where {\sf inx} corresponds to the vector {\sf devx}. The direction for
3913the derivatives is given by {\sf index1}.
3915  \begin{tabular}{l}
3916     {\sf \rule{0.5cm}{0pt}x[index1].setADValue(1);}\\
3917  \end{tabular}
3919The functions for setting and getting the value and the derivative value of an
3920{\sf adouble} are the same as in the traceless version of ADOL-C for first
3921order derivatives (see \autoref{traceless}).
3922The function evaluation is then performed with {\sf adouble x} on GPU and the results
3923are saved in {\sf adouble y}. The function evaluation itself remains unchanged
3924(see \autoref{fig:cudacode3}). Then we store the values of the function
3925evaluations in each point in a row in one array:
3927  \begin{tabular}{l}
3928   {\sf \rule{0.5cm}{0pt}outy[(index2*index3+index)*3+i]=y[i].getValue();}\\
3929  \end{tabular}
3931for {\sf i=0,1,2}. The values of a Jacobian are stored column by column in
3932a row:
3934  \begin{tabular}{l}
3935   {\sf \rule{0.5cm}{0pt}outderiv[(index2*index4+index*3+index1)*3+i]=y[i].getADValue();}\\
3936  \end{tabular}
3938for {\sf i=0,1,2}.
3940Now there is one thing left to do. The values calculated on the GPU have to be copied
3941back to host. For the dependent variables in the example this can be done by the
3942following call:
3944  \begin{tabular}{l}
3945 {\sf \rule{0.5cm}{0pt}cudaMemcpy(y, devy, sizeof(double)*3*M,cudaMemcpyDeviceToHost);}\\
3946  \end{tabular}
3948see \autoref{fig:cudacode1}. The argument {\sf cudaMemcpyDeviceToHost} determines that
3949the values are copied from GPU to host.
3952\subsection{Compiling and Linking the Source Code}
3954After incorporating the required changes, one has to compile the
3955source code and link the object files to get the executable. For
3956the compilation of a Cuda file the {\sf nvcc} compiler is needed.
3957The compile sequence should be similar to the following example:
3959  \begin{tabular}{l}
3960    {\sf nvcc -arch=sm\_20 -o traceless\_cuda traceless\}
3961  \end{tabular}
3963The compiler option \verb#-arch=sm_20# specifies the compute capability that is
3964assumed, in this case one that supports double precision.
3966The mentioned source code {\sf traceless\} illustrating the use of the
3967forward traceless scalar mode with Cuda and a further example {\sf}
3968can be found in the directory examples. The second example is an adaption of
3969the OpenMP example to the traceless version with Cuda.
3973\section{Installing and Using ADOL-C}
3976\subsection{Generating the ADOL-C Library}
3979The currently built system is best summarized by the ubiquitous gnu
3980install triplet
3982\verb=configure - make - make install= .
3984Executing this three steps from the package base directory
3985\verb=</SOMEPATH/=\texttt{\packagetar}\verb=>= will compile the static and the dynamic
3986ADOL-C library with default options and install the package (libraries
3987and headers) into the default installation directory {\tt
3988  \verb=<=\$HOME/adolc\_base\verb=>=}. Inside the install directory
3989the subdirectory \verb=include= will contain all the installed header
3990files that may be included by the user program, the subdirectory
3991\verb=lib= will contain the 32-bit compiled library
3992and the subdirectory \verb=lib64= will contain the 64-bit compiled
3993library. Depending on the compiler only one of \verb=lib= or
3994\verb=lib64= may be created.
3996Before doing so the user may modify the header file \verb=usrparms.h=
3997in order to tailor the \mbox{ADOL-C} package to the needs in the
3998particular system environment as discussed in
3999\autoref{Customizing}. The configure procedure which creates the necessary
4000\verb=Makefile=s can be customized by use of some switches. Available
4001options and their meaning can be obtained by executing
4002\verb=./configure --help= from the package base directory.
4004All object files and other intermediately generated files can be
4005removed by the call \verb=make clean=. Uninstalling ADOL-C by
4006executing \verb=make uninstall= is only reasonable after a previous
4007called \verb=make install= and will remove all installed package files
4008but will leave the created directories behind.
4010The sparse drivers are included in the ADOL-C libraries if the
4011\verb=./configure= command is executed with the option
4012\verb=--enable-sparse=. The ColPack library available at
4013\verb= is required to
4014compute the sparse structures, and is searched for in all the default
4016In case the library and its headers are installed in a nonstandard path
4017this may be specified with the \verb?--with-colpack=PATH? option.
4018It is assumed that the library and its header files have the following
4019directory structure: \verb?PATH/include? contains all the header
4021\verb?PATH/lib? contains the 32-bit compiled library and
4022\verb?PATH/lib64? contains the 64-bit compiled library. Depending on
4023the compiler used to compile {\sf ADOL-C} one of these libraries will
4024be used for linking.
4026The option \verb=--disable-stdczero= turns off the initialization in the {\sf adouble} default
4027constructor. This will improve efficiency but requires that there be no implicit array initialization in the code, see
4030Support for MPI and the AdjoinableMPI API (see\\
4032) may be enabled using the option \verb=--enable-ampi=. This requires
4033the presence of MPI compiler wrappers \verb=mpicc= and \verb=mpicxx=
4034in the \verb=$PATH= and the AdjoinableMPI libraries in the standard
4035locations. If MPI is installed in a nonstandard path one may specify
4036this using the \verb?--with-mpi=PATH? option. Similarly if the
4037AdjoinableMPI libraries are in an nonstandard path this may be
4038specified using the \verb?--with-ampi=PATH? option.  When MPI and
4039AdjoinableMPI support is compiled into {\sf ADOL-C} the generated
4040library will be unsuitable for linking with non-MPI
4041programs and is called \verb=libadolc_ampi=. Therefore the user
4042programs should use the flag \verb=-ladolc_ampi= instead of
4043\verb=-ladolc= in this case. An advanced user may infact specify
4044whatever name the resulting {\sf ADOL-C} library should have using the
4045\verb?--with-soname=SONAME? option.
4048\subsection{Compiling and Linking the Example Programs}
4050The installation procedure described in \autoref{genlib} also
4051provides the \verb=Makefile=s  to compile the example programs in the
4052directories \verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= and the
4053additional examples in
4054\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples/additional_examples=. However,
4055one has to execute the
4056\verb=configure= command with  appropriate options for the ADOL-C package to enable the compilation of
4057examples. Available options are:
4060\verb=--enable-docexa=&build all examples discussed in this manual\\
4061&(compare \autoref{example})\\
4062\verb=--enable-addexa=&build all additional examples\\
4063&(See file \verb=README= in the various subdirectories)
4067Just calling \verb=make= from the packages base directory generates
4068all configured examples and the library if necessary. Compiling from
4069subdirectory \verb=examples= or one of its subfolders is possible
4070too. At least one kind of the ADOL-C library (static or shared) must
4071have been built previously in that case. Hence, building the library
4072is always the first step.
4074For Compiling the library and the documented examples on Windows using
4075Visual Studio please refer to the \verb=<Readme_VC++.txt>= files in
4076the \verb=<windows/>=, \verb=<ThirdParty/ColPack/>= and
4077\verb=<ADOL-C/examples/>= subdirectories.
4079\subsection{Description of Important Header Files}
4082The application of the facilities of ADOL-C requires the user
4083source code (program or module) to include appropriate
4084header files where the desired data types and routines are
4085prototyped. The new hierarchy of header files enables the user
4086to take one of two possible ways to access the right interfaces.
4087The first and easy way is recommended to beginners: As indicated in
4088\autoref{globalHeaders} the provided {\em global} header file
4089\verb=<adolc/adolc.h>= can be included by any user code to support all
4090capabilities of ADOL-C depending on the particular programming language
4091of the source.   
4094\center \small
4096\verb=<adolc/adolc.h>= &
4098  \boldmath $\rightarrow$ \unboldmath
4099                 & global header file available for easy use of ADOL-C; \\
4100  $\bullet$      & includes all ADOL-C header files depending on
4101                   whether the users source is C++ or C code.
4103\\ \hline
4104\verb=<adolc/internal/usrparms.h>= &
4106  \boldmath $\rightarrow$ \unboldmath
4107                 & user customization of ADOL-C package (see
4108                   \autoref{Customizing}); \\
4109  $\bullet$      & after a change of
4110                   user options the ADOL-C library \verb=libadolc.*=
4111                   has to be rebuilt (see \autoref{genlib}); \\
4112  $\bullet$      & is included by all ADOL-C header files and thus by all user
4113                   programs.
4114\end{tabular*} \\ \hline
4116\caption{Global header files}
4120The second way is meant for the more advanced ADOL-C user: Some source code
4121includes only those interfaces used by the particular application.
4122The respectively needed header files are indicated
4123throughout the manual.
4124Existing application determined dependences between the provided
4125ADOL-C routines are realized by automatic includes of headers in order
4126to maintain easy use. The header files important to the user are described
4127in the \autoref{importantHeaders1} and \autoref{importantHeaders2}.
4130\center \small
4132%\multicolumn{2}{|l|}{\bf Tracing/taping}\\ \hline
4133\verb=<adolc/adouble.h>= &
4135  \boldmath $\rightarrow$ \unboldmath
4136                & provides the interface to the basic active
4137                  scalar data type of ADOL-C: {\sf class adouble}
4138                  (see \autoref{prepar});
4139%  $\bullet$     & includes the header files \verb=<adolc/avector.h>= and \verb=<adolc/taputil.h>=.
4141\\ \hline
4142% \verb=<adolc/avector.h>= &
4144%  \boldmath $\rightarrow$ \unboldmath
4145%                & provides the interface to the active vector
4146%                  and matrix data types of ADOL-C: {\sf class adoublev}
4147%                  and {\sf class adoublem}, respectively
4148%                   (see \autoref{arrays}); \\
4149%  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
4151%\\ \hline
4152 \verb=<adolc/taputil.h>= &
4154  \boldmath $\rightarrow$ \unboldmath
4155                & provides functions to start/stop the tracing of
4156                  active sections (see \autoref{markingActive})
4157                  as well as utilities to obtain
4158                  tape statistics (see \autoref{examiningTape}); \\
4159  $\bullet$     & is included by the header \verb=<adolc/adouble.h>=.
4161\\ \hline
4163\caption{Important header files: tracing/taping}
4168\center \small
4170%\multicolumn{2}{|l|}{\bf Evaluation of derivatives}\\ \hline
4171\verb=<adolc/interfaces.h>= &
4173  \boldmath $\rightarrow$ \unboldmath
4174                & provides interfaces to the {\sf forward} and
4175                  {\sf reverse} routines as basic versions of derivative
4176                  evaluation (see \autoref{forw_rev}); \\
4177  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
4178  $\bullet$     & includes the header \verb=<adolc/sparse/sparsedrivers.h>=; \\
4179  $\bullet$     & is included by the header \verb=<adolc/drivers/odedrivers.h>=.
4181\\ \hline
4182\verb=<adolc/drivers.h>= &
4184  \boldmath $\rightarrow$ \unboldmath
4185                & provides ``easy to use'' drivers for solving
4186                  optimization problems and nonlinear equations
4187                  (see \autoref{optdrivers}); \\
4188  $\bullet$     & comprises C and Fortran-callable versions.
4190\\ \hline
4192\verb=<adolc/sparse/=\newline\verb= sparsedrivers.h>=
4193\end{minipage}  &
4195  \boldmath $\rightarrow$ \unboldmath
4196                & provides the ``easy to use'' sparse drivers
4197                  to exploit the sparsity structure of
4198                  Jacobians (see \autoref{sparse}); \\
4199  \boldmath $\rightarrow$ \unboldmath & provides interfaces to \mbox{C++}-callable versions
4200                  of {\sf forward} and {\sf reverse} routines
4201                  propagating bit patterns (see \autoref{ProBit}); \\
4203  $\bullet$     & is included by the header \verb=<adolc/interfaces.h>=.
4205\\ \hline
4207\verb=<adolc/sparse/=\newline\verb= sparse_fo_rev.h>=
4208\end{minipage}  &
4210  \boldmath $\rightarrow$ \unboldmath
4211                & provides interfaces to the underlying C-callable
4212                  versions of {\sf forward} and {\sf reverse} routines
4213                  propagating bit patterns.
4215\\ \hline
4217\verb=<adolc/drivers/=\newline\verb= odedrivers.h>=
4218\end{minipage}  &
4220  \boldmath $\rightarrow$ \unboldmath
4221                & provides ``easy to use'' drivers for numerical
4222                  solution of ordinary differential equations
4223                  (see \autoref{odedrivers}); \\
4224  $\bullet$     & comprises C++, C, and Fortran-callable versions; \\
4225  $\bullet$     & includes the header \verb=<adolc/interfaces.h>=.
4227\\ \hline
4229\verb=<adolc/drivers/=\newline\verb= taylor.h>=
4230\end{minipage}  &
4232  \boldmath $\rightarrow$ \unboldmath
4233                & provides ``easy to use'' drivers for evaluation
4234                  of higher order derivative tensors (see
4235                  \autoref{higherOrderDeriv}) and inverse/implicit function
4236                  differentiation (see \autoref{implicitInverse});\\
4237  $\bullet$     & comprises C++ and C-callable versions.
4239\\ \hline
4240\verb=<adolc/adalloc.h>= &
4242  \boldmath $\rightarrow$ \unboldmath
4243                & provides C++ and C functions for allocation of
4244                  vectors, matrices and three dimensional arrays
4245                  of {\sf double}s.
4247\\ \hline
4249\caption{Important header files: evaluation of derivatives}
4253\subsection{Compiling and Linking C/C++ Programs}
4255To compile a C/C++ program or single module using ADOL-C
4256data types and routines one has to ensure that all necessary
4257header files according to \autoref{ssec:DesIH} are
4258included. All modules involving {\em active} data types as
4259{\sf adouble}
4260%, {\bf adoublev} and {\bf adoublem}
4261have to be compiled as C++. Modules that make use of a previously
4262generated tape to evaluate derivatives can either be programmed in ANSI-C
4263(while avoiding all C++ interfaces) or in C++. Depending
4264on the chosen programming language the header files provide
4265the right ADOL-C prototypes.
4266For linking the resulting object codes the library \verb=libadolc.*= 
4267must be used (see \autoref{genlib}).
4268\subsection{Interfacing with Python and R}
4269For generating the SWIG based interface to Python and R programming
4270languages the prerequisite is an installation of the SWIG tool (at
4271least newer than vesion 3.0.8). Furthermore, for installing and using
4272the R interface please refer to the documentation and software at
4274Installation of the Python interface can be done by changung into the
4275\verb=ADOL-C/swig= subdirectory of the ADOL-C source code and calling
4276the command
4278\verb=python install --user=
4280This will internally cause the compilation and installtion of the
4281ADOL-C library and header files using default configuration options as
4282described in \autoref{genlib}.
4284For a description on the working of these interfaces refer to the
4285following article.\\[1ex]
4286K. Kulshreshtha, S.H.K. Narayanan, J. Bessac and K. MacIntyre:
4287Efficient computation of derivatives for solving optimizaiton problems
4288in R and Python using SWIG generated interfaces to
4289ADOL-C. \emph{Optimisation Methods and Software}, 33(4-6),
4290pp1173-1191. 2018.
4292\subsection{Adding Quadratures as Special Functions}
4296Suppose an integral
4297\[ f(x) = \int\limits^{x}_{0} g(t) dt \]
4298is evaluated numerically by a user-supplied function
4300{\sf  double  myintegral(double\& x);}
4302Similarly, let us suppose that the integrand itself is evaluated by
4303a user-supplied block of C code {\sf integrand}, which computes a
4304variable with the fixed name {\sf val} from a variable with the fixed
4305name {\sf arg}. In many cases of interest, {\sf integrand} will
4306simply be of the form
4308{\sf \{ val = expression(arg) \}}\enspace .
4310In general, the final assignment to {\sf val} may be preceded
4311by several intermediate calculations, possibly involving local
4312active variables of type {\sf adouble}, but no external or static
4313variables of that type.  However, {\sf integrand} may involve local
4314or global variables of type {\sf double} or {\sf int}, provided they
4315do not depend on the value of {\sf arg}. The variables {\sf arg} and
4316{\sf val} are declared automatically; and as {\sf integrand} is a block
4317rather than a function, {\sf integrand} should have no header line. 
4319Now the function {\sf myintegral} can be overloaded for {\sf adouble}
4320arguments and thus included in the library of elementary functions
4321by the following modifications:
4324At the end of the file \verb=<adouble.cpp>=, include the full code
4325defining \\ {\sf double myintegral(double\& x)}, and add the line
4327{\sf extend\_quad(myintegral, integrand); }
4329This macro is extended to the definition of
4330 {\sf adouble myintegral(adouble\& arg)}.
4331Then remake the library \verb=libadolc.*= (see \autoref{genlib}).
4333In the definition of the class
4334{\sf ADOLC\_DLL\_EXPORT adouble} in \verb=<adolc/adouble.h>=, add the statement
4336{\sf friend adouble myintegral(adouble\&)}.
4339In the first modification, {\sf myintegral} represents the name of the
4340{\sf double} function, whereas {\sf integrand} represents the actual block
4341of C code.
4343For example, in case of the inverse hyperbolic cosine, we have
4344{\sf myintegral} = {\sf acosh}. Then {\sf integrand} can be written as
4345{\sf \{ val = sqrt(arg*arg-1); \}}
4346so that the line
4348{\sf extend\_quad(acosh,val = sqrt(arg*arg-1));}
4350can be added to the file \verb=<adouble.cpp>=.
4351A mathematically equivalent but longer representation of
4352{\sf integrand} is
4355{\sf \{ }\hspace{1.0in}\= {\sf  \{ adouble} \= temp =   \kill
4356 \>{\sf  \{ adouble} \> {\sf temp = arg;} \\
4357 \> \ \> {\sf  temp = temp*temp; } \\ 
4358 \> \ \> {\sf  val = sqrt(temp-1); \}
4361The code block {\sf integrand} may call on any elementary function that has already
4362been defined in file \verb=<adouble.cpp>=, so that one may also introduce
4363iterated integrals.
4366\section{Example Codes}
4369The following listings are all simplified versions of codes that
4370are contained in the example subdirectory
4371\verb=<=\texttt{\packagetar}\verb=>/ADOL-C/examples= of ADOL-C. In particular,
4372we have left out timings, which are included in the complete codes.
4374\subsection{Speelpenning's Example ({\tt speelpenning.cpp})}
4376The first example evaluates the gradient and the Hessian of
4377the function
4379y \; = \; f(x)\; =\; \prod_{i=0}^{n-1} x_i
4381using the appropriate drivers {\sf gradient} and {\sf hessian}.
4384#include <adolc/adouble.h>               // use of active doubles and taping
4385#include <adolc/drivers/drivers.h>       // use of "Easy to Use" drivers
4386                                   // gradient(.) and hessian(.)
4387#include <adolc/taping.h>                // use of taping
4389void main() {
4390int n,i,j;
4391size_t tape_stats[STAT_SIZE];
4392cout << "SPEELPENNINGS PRODUCT (ADOL-C Documented Example) \n";
4393cout << "number of independent variables = ?  \n";
4394cin >> n;
4395double* xp = new double[n];         
4396double  yp = 0.0;
4397adouble* x = new adouble[n];     
4398adouble  y = 1;
4400  xp[i] = (i+1.0)/(2.0+i);         // some initialization
4401trace_on(1);                       // tag =1, keep=0 by default
4402  for(i=0;i<n;i++) {
4403    x[i] <<= xp[i]; y *= x[i]; }     
4404  y >>= yp;
4405  delete[] x;                     
4407tapestats(1,tape_stats);           // reading of tape statistics
4408cout<<"maxlive "<<tape_stats[2]<<"\n";
4409...                                // ..... print other tape stats
4410double* g = new double[n];       
4411gradient(1,n,xp,g);                // gradient evaluation
4412double** H=(double**)malloc(n*sizeof(double*));
4414  H[i]=(double*)malloc((i+1)*sizeof(double));
4415hessian(1,n,xp,H);                 // H equals (n-1)g since g is
4416double errg = 0;                   // homogeneous of degree n-1.
4417double errh = 0;
4419  errg += fabs(g[i]-yp/xp[i]);     // vanishes analytically.
4420for(i=0;i<n;i++) {
4421  for(j=0;j<n;j++) {
4422    if (i>j)                       // lower half of hessian
4423      errh += fabs(H[i][j]-g[i]/xp[j]); } }
4424cout << yp-1/(1.0+n) << " error in function \n";
4425cout << errg <<" error in gradient \n";
4426cout << errh <<" consistency check \n";
4427}                                  // end main
4430\subsection{Power Example ({\tt powexam.cpp})}
4432The second example function evaluates the $n$-th power of a real
4433variable $x$ in
4434$\log_2 n$ multiplications by recursive halving of the exponent. Since
4435there is only one independent variable, the scalar derivative can be
4436computed by
4437using both {\sf forward} and {\sf reverse}, and the
4438results are subsequently compared.
4440#include <adolc/adolc.h>                 // use of ALL ADOL-C interfaces
4442adouble power(adouble x, int n) {
4443adouble z=1;
4444if (n>0) {                         // recursion and branches
4445  int nh =n/2;                     // that do not depend on
4446  z = power(x,nh);                 // adoubles are fine !!!!
4447  z *= z;
4448  if (2*nh != n) 
4449    z *= x;
4450  return z; }                      // end if
4451else {
4452  if (n==0)                        // the local adouble z dies
4453    return z;                      // as it goes out of scope.
4454  else
4455    return 1/power(x,-n); }        // end else
4456} // end power
4458The function {\sf power} above was obtained from the original
4459undifferentiated version by simply changing the type of all
4460{\sf double}s including the return variable to {\sf adouble}s. The new version
4461can now be called from within any active section, as in the following
4462main program.
4464#include ...                       // as above
4465int main() {
4466int i,n,tag=1;
4467cout <<"COMPUTATION OF N-TH POWER (ADOL-C Documented Example)\n\n";
4468cout<<"monomial degree=? \n";      // input the desired degree
4469cin >> n;
4470                                   // allocations and initializations
4471double* Y[1];
4472*Y = new double[n+2];
4473double* X[1];                      // allocate passive variables with
4474*X = new double[n+4];              // extra dimension for derivatives
4475X[0][0] = 0.5;                     // function value = 0. coefficient
4476X[0][1] = 1.0;                     // first derivative = 1. coefficient
4478  X[0][i+2]=0;                     // further coefficients
4479double* Z[1];                      // used for checking consistency
4480*Z = new double[n+2];              // between forward and reverse
4481adouble y,x;                       // declare active variables
4482                                   // beginning of active section
4483trace_on(1);                       // tag = 1 and keep = 0
4484x <<= X[0][0];                     // only one independent var
4485y = power(x,n);                    // actual function call
4486y >>= Y[0][0];                     // only one dependent adouble
4487trace_off();                       // no global adouble has died
4488                                   // end of active section
4489double u[1];                       // weighting vector
4490u[0]=1;                            // for reverse call
4491for(i=0;i<n+2;i++) {               // note that keep = i+1 in call
4492  forward(tag,1,1,i,i+1,X,Y);      // evaluate the i-the derivative
4493  if (i==0)
4494    cout << Y[0][i] << " - " << y.value() << " = " << Y[0][i]-y.value() 
4495    << " (should be 0)\n";
4496  else
4497    cout << Y[0][i] << " - " << Z[0][i] << " = " << Y[0][i]-Z[0][i] 
4498    << " (should be 0)\n";
4499  reverse(tag,1,1,i,u,Z);          // evaluate the (i+1)-st derivative
4500  Z[0][i+1]=Z[0][i]/(i+1); }       // scale derivative to Taylorcoeff.
4501return 1;
4502}                                  // end main
4504Since this example has only one independent and one dependent variable,
4505{\sf forward} and {\sf reverse} have the same complexity and calculate
4506the same scalar derivatives, albeit with a slightly different scaling.
4507By replacing the function {\sf power} with any other univariate test function,
4508one can check that {\sf forward} and {\sf reverse} are at least consistent.
4509In the following example the number of independents is much larger
4510than the number of dependents, which makes the reverse mode preferable.
4512\subsection{Determinant Example ({\tt detexam.cpp})}
4514Now let us consider an exponentially expensive calculation,
4515namely, the evaluation of a determinant by recursive expansion
4516along rows. The gradient of the determinant with respect to the
4517matrix elements is simply the adjoint, i.e. the matrix of cofactors.
4518Hence the correctness of the numerical result is easily checked by
4519matrix-vector multiplication. The example illustrates the use
4520of {\sf adouble} arrays and pointers. 
4523#include <adolc/adouble.h>               // use of active doubles and taping
4524#include <adolc/interfaces.h>            // use of basic forward/reverse
4525                                   // interfaces of ADOL-C
4526adouble** A;                       // A is an n x n matrix
4527int i,n;                           // k <= n is the order
4528adouble det(int k, int m) {        // of the sub-matrix
4529if (m == 0) return 1.0 ;           // its column indices
4530else {                             // are encoded in m
4531  adouble* pt = A[k-1];
4532  adouble t = zero;                // zero is predefined
4533  int s, p =1;
4534  if (k%2) s = 1; else s = -1;
4535  for(i=0;i<n;i++) {
4536    int p1 = 2*p;
4537    if (m%p1 >= p) {
4538      if (m == p) {
4539        if (s>0) t += *pt; else t -= *pt; }
4540      else {
4541        if (s>0)
4542          t += *pt*det(k-1,m-p);   // recursive call to det
4543        else
4544          t -= *pt*det(k-1,m-p); } // recursive call to det
4545      s = -s;}
4546    ++pt;
4547    p = p1;}
4548  return t; }
4549}                                  // end det
4551As one can see, the overloading mechanism has no problem with pointers
4552and looks exactly the same as the original undifferentiated function
4553except for the change of type from {\sf double} to {\sf adouble}.
4554If the type of the temporary {\sf t} or the pointer {\sf pt} had not been changed,
4555a compile time error would have resulted. Now consider a corresponding
4556calling program.
4559#include ...                       // as above
4560int main() {
4561int i,j, m=1,tag=1,keep=1;
4562cout << "COMPUTATION OF DETERMINANTS (ADOL-C Documented Example)\n\n";
4563cout << "order of matrix = ? \n";  // select matrix size
4564cin >> n;
4565A = new adouble*[n];             
4566trace_on(tag,keep);                // tag=1=keep
4567  double detout=0.0, diag = 1.0;   // here keep the intermediates for
4568  for(i=0;i<n;i++) {               // the subsequent call to reverse
4569    m *=2;
4570    A[i] = new adouble[n];         // not needed for adoublem
4571    adouble* pt = A[i];
4572    for(j=0;j<n;j++)
4573      A[i][j] <<= j/(1.0+i);       // make all elements of A independent
4574    diag += A[i][i].value();        // value() converts to double
4575    A[i][i] += 1.0; }
4576  det(n,m-1) >>= detout;           // actual function call
4577  printf("\n %f - %f = %f  (should be 0)\n",detout,diag,detout-diag);
4579double u[1];
4580u[0] = 1.0;
4581double* B = new double[n*n];
4583cout <<" \n first base? : ";
4584for (i=0;i<n;i++) {
4585  adouble sum = 0;
4586  for (j=0;j<n;j++)                // the matrix A times the first n
4587    sum += A[i][j]*B[j];           // components of the gradient B
4588  cout<<sum.value()<<" "; }         // must be a Cartesian basis vector
4589return 1;
4590}                                  // end main
4592The variable {\sf diag} should be mathematically
4593equal to the determinant, because the
4594matrix {\sf A} is defined as a rank 1 perturbation of the identity.
4596\subsection{Ordinary Differential Equation Example ({\tt odexam.cpp})}
4599Here, we consider a nonlinear ordinary differential equation that
4600is a slight modification of the Robertson test problem
4601given in Hairer and Wanner's book on the numerical solution of
4602ODEs \cite{HW}. The following source code computes the corresponding
4603values of $y^{\prime} \in \R^3$:
4605#include <adolc/adouble.h>                  // use of active doubles and taping
4606#include <adolc/drivers/odedrivers.h>       // use of "Easy To use" ODE drivers
4607#include <adolc/adalloc.h>                  // use of ADOL-C allocation utilities
4609void tracerhs(short int tag, double* py, double* pyprime) {
4610adouble y[3];                         // this time we left the parameters
4611adouble yprime[3];                    // passive and use the vector types
4613for (int i=0; i<3; i++)
4614     y[i] <<= py[i];                  // initialize and mark independents
4615yprime[0] = -sin(y[2]) + 1e8*y[2]*(1-1/y[0]);
4616yprime[1] = -10*y[0] + 3e7*y[2]*(1-y[1]);
4617yprime[2] = -yprime[0] - yprime[1];
4618yprime >>= pyprime;                   // mark and pass dependents
4620}                                     // end tracerhs
4622The Jacobian of the right-hand side has large
4623negative eigenvalues, which make the ODE quite stiff. We  have added
4624some numerically benign transcendentals to make the differentiation
4625more interesting.
4626The following main program uses {\sf forode} to calculate the Taylor series
4627defined by the ODE at the given point $y_0$ and {\sf reverse} as well
4628as {\sf accode} to compute the Jacobians of the coefficient vectors
4629with respect to $x_0$.
4631#include .......                   // as above
4632int main() {
4633int i,j,deg; 
4634int n=3;
4635double py[3];
4636double pyp[3];
4637cout << "MODIFIED ROBERTSON TEST PROBLEM (ADOL-C Documented Example)\n";
4638cout << "degree of Taylor series =?\n";
4639cin >> deg;
4640double **X;
4643  X[i]=(double*)malloc((deg+1)*sizeof(double));