Version 4 (modified by pkambadu, 4 years ago)

PFunc allows functions to be executed in parallel. Let us explore this notion in a bit more detail. A normal function call is executed sequentially, and so is a sequence of function calls. However, it is often the case that some function calls can be executed at the same time without any harmful side effects. In such cases, one can use PFunc to execute these functions in parallel with respect to each other. For example, consider the problem of calculating the sum of an integer array.

```
int array_sum (int a[], int n) {
  int sum = 0;
  int i;
  for (i=0; i<n; ++i) sum += a[i];

  return sum;
}
```

Now, suppose that we are to sum up an array of 100 elements. We could then invoke array_sum as shown below:

```
int main () {
  int a[100] = {0}; /* zero-initialized so that the sum is well defined */
  return array_sum (a, 100);
}
```

Although this serves our purpose, the calculation can be restructured by splitting the array into two halves and calling array_sum on each half:

```
int main () {
  int a[100] = {0}; /* zero-initialized so that the sum is well defined */
  return array_sum (a, 50) + array_sum (a+50, 50);
}
```

Once the problem is written in this form, we can see that the two invocations of array_sum are independent of each other and can therefore be executed in parallel, which is where the speedup comes from. This is precisely what PFunc allows us to do.
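To make the idea concrete without yet introducing the PFunc API, the two half-sums can be sketched with the C++ standard library. This is only an illustration: std::async is a stand-in for spawning a task and is not how PFunc itself is invoked, and the name parallel_array_sum is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>

// Sequential sum, same logic as array_sum in the text.
int array_sum(const int* a, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// Hypothetical helper: sums the two halves concurrently.
// std::async stands in for a task spawn; it is NOT the PFunc API.
int parallel_array_sum(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);
    const std::size_t half = n / 2;
    // Launch the first half on another thread.
    std::future<int> first = std::async(std::launch::async, array_sum, a, half);
    // Sum the second half on the calling thread in the meantime.
    const int second = array_sum(a + half, n - half);
    // get() blocks until the first half is done, much like waiting on a task.
    return first.get() + second;
}
```

Calling parallel_array_sum (a, 100) gives the same result as array_sum (a, 100). The two halves never touch the same elements, which is why running them at the same time is safe.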

## Creating work

In the introductory section above, we saw what PFunc allows us to do. However, the term **function** is broad, and PFunc can only accept functions expressed in particular forms. In brief, PFunc accepts work in two forms: as function pointers (C and C++) and as function objects (C++ only). This section describes both.

### C-style pointers

PFunc accepts function pointers of the type void (*)(void*). The example below shows what such a function looks like.

```
#include <stdio.h>

void parallel_foo (void* arg) {
  char* string = (char*) arg;
  printf ("PFunc task printing: %s\n", string);
}
```

Note that the function accepts only a single argument of type void*. Because C and C++ are statically typed, PFunc cannot accept functions with arbitrary signatures as tasks. However, PFunc provides two function calls, pfunc_pack and pfunc_unpack, to help bundle several arguments for a parallel function (see PackUnpack).
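Independently of pfunc_pack and pfunc_unpack (whose signatures are not covered on this page), a common way to live with the single void* argument is to bundle everything the task needs into one struct. The sketch below assumes a hypothetical sum_args struct and a hypothetical task function parallel_sum; neither is part of PFunc.

```cpp
#include <cassert>

// Hypothetical argument bundle: everything the task needs, in one struct.
struct sum_args {
    const int* data;  // input array
    int count;        // number of elements to sum
    int result;       // written by the task
};

// Matches the void (*)(void*) shape described above.
void parallel_sum(void* arg) {
    sum_args* args = static_cast<sum_args*>(arg);
    assert(args != nullptr);
    args->result = 0;
    for (int i = 0; i < args->count; ++i) args->result += args->data[i];
}
```

The caller fills in a sum_args object, passes its address as the void* argument, and reads result once the task has finished. The struct must stay alive until then.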

### C++ function objects

PFunc also accepts C++ function objects (types that overload operator()) as work. However, using function objects as work requires some attention. Because each function object is a concrete type, users must decide whether more than one type of function object needs to be parallelized. If so, all the function objects must derive from a common base class, which is then used as the type of the Function object feature during library instance generation. In the following, we explain how to generate library instance descriptions for both cases.

• Single function object: In this case, for optimal performance, it is beneficial to explicitly name the function object that is going to be used at library instance description generation time. For example, consider the code sample given below:
```
/* Forward declaration */
struct parallel_foo;

/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         parallel_foo> my_pfunc; /* function object */

struct parallel_foo {
  ...
  ...
  void operator()() { ... };
};
```

In this case, parallel_foo is the only function object that can be parallelized by the library instance my_pfunc. As the function object is explicitly named, PFunc avoids making virtual function calls when spawning tasks. Parallelizing a single function object suffices for many applications (e.g., computing Fibonacci numbers).

• Multiple function objects: In this case, users are required to name a common type during library instance description generation and have all their function objects derive from this type. To facilitate this case, PFunc provides a built-in base type that users can derive from. The following example demonstrates the use of the common base type:
```
/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         pfunc::use_default> my_pfunc; /* function object */

/* First function object */
struct parallel_foo : public my_pfunc::functor {
  void operator()() { ... };
};

/* Second function object */
struct parallel_bar : public my_pfunc::functor {
  void operator()() { ... };
};
```

In the example above, pfunc::use_default is used as the value for the Function object feature. As a result, PFunc uses a base class that declares a virtual operator(). The type of this class can be accessed from the generated library instance description using the nested type ::functor. Now, invocations of operator() on both parallel_foo and parallel_bar can be parallelized.
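The role of the common base type can be sketched in plain C++ without PFunc. In the sketch below, functor_base is a hypothetical stand-in for what my_pfunc::functor provides, and run_all is a hypothetical runner that only knows the base type; it executes the work sequentially, since the point here is the dispatch mechanism rather than the scheduling.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the role my_pfunc::functor plays.
struct functor_base {
    virtual void operator()() = 0;
    virtual ~functor_base() {}
};

// First function object: adds one to its target.
struct add_one : functor_base {
    explicit add_one(int& t) : target(t) {}
    void operator()() { target += 1; }
    int& target;
};

// Second function object: doubles its target.
struct times_two : functor_base {
    explicit times_two(int& t) : target(t) {}
    void operator()() { target *= 2; }
    int& target;
};

// Runner that only knows the base type; each call is a virtual dispatch.
void run_all(const std::vector<functor_base*>& work) {
    for (functor_base* f : work) {
        assert(f != nullptr);
        (*f)();
    }
}
```

Because run_all sees only functor_base, any number of derived function objects can be mixed in one work list. The price is one virtual call per invocation, which is the cost the single function object case avoids by naming the concrete type.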

Once we have initialized the library and created work (functions and function objects), we can parallelize the execution of these work packets using PFunc. In addition to the work packet, each task carries three further pieces of information:

• Attribute: controls the execution of the task. PFunc provides a suitable default value for this parameter.
• Group: enables SPMD-style task groups. PFunc provides a suitable default value for this parameter.
• Task handle: a receipt for the spawned task. This handle can be used to query the status of the spawned task.

In C++, these types can be accessed as nested types of the generated library instance description. In C, these types are pre-generated.
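The idea of a task handle as a receipt can be illustrated with the C++ standard library. This is an analogy only: std::future is not a PFunc type, and task_handle, spawn_square and is_done are hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Hypothetical handle type: std::future plays the "receipt" role here.
typedef std::future<int> task_handle;

int square(int x) { return x * x; }

// "Spawn": start the work and hand back the receipt.
task_handle spawn_square(int x) {
    return std::async(std::launch::async, square, x);
}

// Non-blocking status query on the receipt.
bool is_done(const task_handle& handle) {
    assert(handle.valid());
    return handle.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

The caller keeps the handle returned by spawn_square, can poll it with is_done, and can block on it with wait() or get() when the result is needed.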

In this section, we will introduce parallelization of a simple function using PFunc by means of an example. Consider the code sample given below.

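```
#include <stdio.h>

void parallel_foo (void* arg) {
  char* string = (char*) arg;
  printf ("PFunc task number: %s\n", string);
}

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  char labels[10][16]; /* one argument string per task */
  int i;

  /* Initialize a global instance of the library
     (pfunc_cilk_taskmgr_init sets up cilk_tmanager) */

  for (i=0; i<10; ++i) {
    /* tasks[i] must be initialized with pfunc_cilk_task_init before use */
    snprintf (labels[i], sizeof labels[i], "%d", i);
    pfunc_cilk_spawn_c (cilk_tmanager, tasks[i], NULL, NULL, parallel_foo, labels[i]);
  }

  for (i=0; i<10; ++i) {
    /* Wait for tasks[i] with pfunc_cilk_wait, then clear the handle */
  }

  /* Clear the library (pfunc_cilk_taskmgr_clear) */
  return 0;
}
```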

In the above example, we have parallelized the execution of parallel_foo using PFunc. First, we initialize the Cilk-style library instance using the function call pfunc_cilk_taskmgr_init. In this example, we use 4 task queues with 1 thread per queue, and allow default values for thread affinities. Second, we spawn 10 instances of parallel_foo using the function pfunc_cilk_spawn_c. In this example, we choose to use the default value (NULL) for both attribute and group. Notice that the task handle has to be initialized (using pfunc_cilk_task_init) prior to its use in pfunc_cilk_spawn_c. This is required because the C types are mere pointers to their C++ counterparts. Third, we wait for the spawned tasks to finish using pfunc_cilk_wait before clearing the task handles. Finally, we clear the initialized library using pfunc_cilk_taskmgr_clear. This deallocates all resources (threads and internal queues) that are in use by PFunc. Note that we could have used the global runtime facility provided by PFunc in this example by setting up cilk_tmanager using pfunc_cilk_init.

In this section, we will parallelize the execution of a function object that is equivalent to the function parallelized in the previous section. The code is given below:

```
#include <iostream>

struct parallel_foo {
  void initialize (const int& _id) { id = _id; }
  void operator()() {
    std::cout << "PFunc task number: " << id << std::endl;
  }
private:
  int id;
};

/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         parallel_foo> my_pfunc; /* function object */

int main () {
  parallel_foo work[10];
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};

  /* Initialize an instance of the library (cilk_tmanager) */

  /* Make this instance the global runtime (pfunc::init) */

  for (int i=0; i<10; ++i) {
    work[i].initialize (i);
    /* Spawn work[i] with the global pfunc::spawn; tasks[i] is its handle */
  }

  for (int i=0; i<10; ++i) pfunc::wait (tasks[i]);

  /* Clear the global runtime */
  pfunc::clear();

  return 0;
}
```

This example differs from its C counterpart in several ways. First, notice that we do not have to initialize objects such as task, attribute, or group, as they are initialized on construction. Second, default values for unused parameters such as affinity (for pfunc::init) and attribute and group (for pfunc::spawn) are filled in automatically, so there is no need to pass them explicitly. Finally, notice that we use the global versions of the functions spawn and wait because we set up cilk_tmanager as our global runtime.

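Instead of waiting on each task in a loop, all ten tasks can be waited on with a single call. In C:

```
pfunc_cilk_wait_all (cilk_tmanager, tasks, 10);
```

In C++, using the global runtime:

```
pfunc::wait_all (tasks, 10);
```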