To initialize PFunc's runtime, users are required to provide three pieces of information: the number of queues, the number of threads per queue, and the affinities of threads to processors. By manipulating these parameters, users can choose from a wide variety of mappings, ranging from the centralized work-sharing model to the distributed work-stealing model. These three parameters are summarized in the table below:

| Parameter | Value type | Explanation |
| --- | --- | --- |
| Num queues | unsigned int | Number of task queues to be used. Queues are numbered from 0 to N-1. |
| Num threads per queue | unsigned int[] | Number of threads to work on each queue. Allows an m x n mapping. A 1 x n mapping represents work-sharing (thread pools). An n x 1 mapping represents work-stealing (Cilk-style). |
| Thread affinities | unsigned int[][] | Affinity of each thread in each queue to a processor. Processors are numbered from 0 to N-1. Default values are accepted. |
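To make the relationship between the three parameters concrete, the following is a minimal sketch that groups them in one structure, checks that they agree with each other, and names the resulting mapping. The type `runtime_config` and the functions `is_consistent` and `classify_mapping` are hypothetical helpers written for this illustration; they are not part of PFunc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not part of PFunc): it only groups the three
// initialization parameters described in the table above.
struct runtime_config {
  unsigned int num_queues;                            // number of task queues
  std::vector<unsigned int> num_threads_per_queue;    // one entry per queue
  std::vector<std::vector<unsigned int>> affinities;  // [queue][thread] -> processor
};

// A configuration is consistent when there is at least one queue, every
// queue has a thread count, and every thread has an affinity entry.
bool is_consistent(const runtime_config& cfg) {
  if (cfg.num_queues == 0) return false;
  if (cfg.num_threads_per_queue.size() != cfg.num_queues) return false;
  if (cfg.affinities.size() != cfg.num_queues) return false;
  for (std::size_t q = 0; q < cfg.num_queues; ++q) {
    if (cfg.affinities[q].size() != cfg.num_threads_per_queue[q]) return false;
  }
  return true;
}

// Names the mapping: one shared queue is work-sharing, one thread per queue
// is work-stealing, and anything else is a hybrid m x n mapping.
std::string classify_mapping(const runtime_config& cfg) {
  assert(is_consistent(cfg));
  if (cfg.num_queues == 1) return "work-sharing";
  for (unsigned int n : cfg.num_threads_per_queue) {
    if (n != 1) return "hybrid";
  }
  return "work-stealing";
}
```

With these helpers, a configuration of 4 queues with 1 thread each is classified as work-stealing, while 1 queue with 4 threads is classified as work-sharing.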

## Initializing in C++

In this section, we initialize PFunc's runtime to use Cilk-style work stealing. Consider the following example:

/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         parallel_foo> my_pfunc; /* function object */

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};

  /* Create a variable of the type taskmgr */
  ...
  return 0; /* PFunc runtime is destroyed when my_taskmgr goes out of scope */
}


Let us now walk through the above example step by step. First, we generate the required library instance description using PFunc's generator interface. Next, we create an object of type taskmgr, constructed with the required parameters, that acts as our runtime. For the scheduling policy, we choose a Cilk-style runtime with 4 queues and 1 thread per queue. Notice that this configuration sets up each thread with its own queue. When a thread runs out of work on its own queue, it steals work from the other task queues; hence, this model is called the work-stealing model. At the other end of the spectrum, if we had chosen to have a single queue and put all our threads on it, the configuration would constitute a work-sharing model. PFunc also allows users to define an m x n model, which is a hybrid between the work-stealing and work-sharing models.

The work-stealing model has been shown to be efficient for applications written in a divide-and-conquer style (for example, the naive Fibonacci example). In such applications, each thread generates ample tasks to keep itself busy and avoids the contention associated with a single task queue. The best scheduling policy for an application is usually found by experimenting with different configurations. With PFunc, this only requires changing the library instance description and the initialization of the runtime.
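The work-stealing behaviour described above can be illustrated with a small, single-threaded simulation. This is a sketch under stated assumptions, not PFunc's scheduler: the workers are stepped in round-robin order so that the run is reproducible, each worker takes the newest task from its own queue, and an idle worker steals the oldest task from another queue. The names `steal_stats` and `simulate_work_stealing` are hypothetical. The workload is the naive Fibonacci recursion, where every task with argument n >= 2 spawns two child tasks onto its own worker's queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical result type for this illustration.
struct steal_stats {
  unsigned long fib_value = 0;        // sum of the leaf values, i.e. fib(n)
  std::vector<std::size_t> executed;  // number of tasks run by each worker
  std::size_t steals = 0;             // tasks taken from another worker's queue
};

// Simulates one queue per worker. Each queue holds pending fib arguments.
steal_stats simulate_work_stealing(unsigned int n, std::size_t num_workers) {
  assert(num_workers > 0);
  std::vector<std::deque<unsigned int>> queues(num_workers);
  steal_stats s;
  s.executed.assign(num_workers, 0);

  queues[0].push_back(n);  // the root task starts on worker 0
  std::size_t pending = 1;

  while (pending > 0) {
    for (std::size_t self = 0; self < num_workers; ++self) {
      unsigned int arg = 0;
      bool have_task = false;

      if (!queues[self].empty()) {
        // Own queue first: take the most recently spawned task.
        arg = queues[self].back();
        queues[self].pop_back();
        have_task = true;
      } else {
        // Out of work: try to steal the oldest task from another queue.
        for (std::size_t k = 1; k < num_workers && !have_task; ++k) {
          std::deque<unsigned int>& victim = queues[(self + k) % num_workers];
          if (!victim.empty()) {
            arg = victim.front();
            victim.pop_front();
            have_task = true;
            ++s.steals;
          }
        }
      }
      if (!have_task) continue;  // nothing to do in this round

      --pending;
      ++s.executed[self];
      if (arg < 2) {
        s.fib_value += arg;  // leaf task: fib(0) = 0, fib(1) = 1
      } else {
        // Divide and conquer: spawn both children onto our own queue.
        queues[self].push_back(arg - 1);
        queues[self].push_back(arg - 2);
        pending += 2;
      }
    }
  }
  return s;
}
```

Calling `simulate_work_stealing(10, 4)` and `simulate_work_stealing(10, 1)` yields the same Fibonacci value; only the distribution of tasks over the workers and the number of steals differ, which is the kind of difference one would explore when experimenting with configurations.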

In our example, we also specify the processor affinities for each of the threads: we bind thread 0 to processor 0, thread 1 to processor 1, thread 2 to processor 2 and thread 3 to processor 3. Processor affinities are currently only supported on Linux platforms. By default, each thread can be scheduled to run on any of the available processors (cores). Binding a thread to a particular processor (core) might result in better cache reuse for applications running on dedicated machines. However, setting a thread's affinity also prevents it from being scheduled on other processors (cores).
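For readers unfamiliar with processor affinities, the sketch below shows what binding a thread to a processor looks like on Linux using the standard `sched_getaffinity` and `sched_setaffinity` calls. How PFunc applies affinities internally is not shown in this section, so this is only an illustration of the underlying operating system facility. The helper names `first_allowed_cpu` and `bind_calling_thread` are hypothetical, and the code assumes a Linux system with glibc (the `CPU_*` macros need `_GNU_SOURCE`, which g++ defines by default).

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux)

// Returns the lowest-numbered processor the calling thread is currently
// allowed to run on, or -1 if the affinity mask cannot be read.
int first_allowed_cpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  // A pid of 0 refers to the calling thread.
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &allowed)) return cpu;
  }
  return -1;
}

// Restricts the calling thread to a single processor. Returns true on
// success. After this call the thread is no longer scheduled elsewhere.
bool bind_calling_thread(int cpu) {
  if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(cpu, &mask);
  return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

A thread would typically call `bind_calling_thread` once at start-up. The call can fail, for example when the requested processor is not available to the process, so the return value should be checked.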


## Initializing in C

In this section, we see how to initialize PFunc's runtime to exactly the same specification as in the previous section. Initializing PFunc in C is much the same as in C++, and is shown in the code given below:

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};

  /* Initialize a global instance of the library */
  ...
  /* Clear the global instance of the library */

  return 0;
}


Two differences from the C++ example are immediately apparent. First, as we are programming in C, PFunc is initialized using a function call (pfunc_cilk_taskmgr_init in this case) rather than a constructor. Second, unlike in C++, PFunc's runtime needs to be explicitly cleared to release all the resources allocated by PFunc (using pfunc_cilk_taskmgr_clear in this case).
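The difference between the two lifetimes can be sketched as follows. The code uses toy stand-ins, not PFunc's functions: `toy_runtime`, `toy_runtime_init`, `toy_runtime_clear`, `live_runtime_count` and `scoped_runtime` are hypothetical names chosen for this illustration. The C-style pair must be called explicitly and in matching order, whereas the C++ wrapper ties the clear step to the end of the enclosing scope.

```cpp
#include <cassert>

// Toy C-style runtime handle (hypothetical, for illustration only).
struct toy_runtime {
  unsigned int num_queues;
  bool initialized;
};

// Counts runtimes that were initialized but not yet cleared.
static int live_runtimes = 0;
int live_runtime_count() { return live_runtimes; }

// C style: the user calls init and, later, clear.
void toy_runtime_init(toy_runtime* rt, unsigned int num_queues) {
  rt->num_queues = num_queues;
  rt->initialized = true;
  ++live_runtimes;
}

void toy_runtime_clear(toy_runtime* rt) {
  assert(rt->initialized);  // clearing twice would be a usage error
  rt->initialized = false;
  --live_runtimes;
}

// C++ style: the constructor initializes and the destructor clears, so the
// resources are released when the object goes out of scope.
class scoped_runtime {
 public:
  explicit scoped_runtime(unsigned int num_queues) {
    toy_runtime_init(&rt_, num_queues);
  }
  ~scoped_runtime() { toy_runtime_clear(&rt_); }

  scoped_runtime(const scoped_runtime&) = delete;
  scoped_runtime& operator=(const scoped_runtime&) = delete;

 private:
  toy_runtime rt_;
};
```

Forgetting the call to `toy_runtime_clear` in the C style leaves `live_runtime_count()` above zero, which mirrors the resource leak that an omitted clear call would cause.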

## Using global runtimes

The most common use of PFunc involves only one object of type taskmgr (one runtime). Under such circumstances, it becomes tedious to explicitly specify the runtime to use when spawning tasks. To avoid this, PFunc allows users to set up a global runtime, which is used as the default whenever a specific runtime is not given in a PFunc function call. In the following C++ code sample, we set up a global runtime and then change the number of attempts made by each thread to check for the availability of a task before yielding control to the thread scheduler.

/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         parallel_foo> my_pfunc; /* function object */

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};
  unsigned int num_attempts;

  /* Create a variable of the type taskmgr */

  /* Set up my_taskmgr as the global runtime */

  /* Change the number of attempts if necessary */
  if (10000 > num_attempts) pfunc::taskmgr_max_attempts_set (10000);

  /* Clear my_taskmgr as the global runtime */
  pfunc::clear ();

  return 0; /* my_taskmgr is destroyed when my_taskmgr goes out of scope */
}


The global runtime is set up by first initializing an object of the type taskmgr (my_taskmgr in our case) as before and then using the function init to specify the use of my_taskmgr as the global runtime. Correspondingly, it is necessary to clear the global runtime using the function clear. This does not destroy my_taskmgr, but merely unsets the use of my_taskmgr as the global runtime, which is useful when users want to switch to a different object of type taskmgr as the global runtime. Finally, we turn our attention to how setting up the global runtime simplifies further function calls. In our case, we have simply omitted the first argument (meant to be my_taskmgr) from calls to the functions taskmgr_max_attempts_set and taskmgr_max_attempts_get. Similarly, once the global runtime has been set up, users can omit the taskmgr argument from other PFunc function calls.
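The mechanism of a global default runtime combined with overloaded functions can be sketched in a few lines. Everything in the `toy` namespace below is hypothetical and only mimics the shape of the calls discussed above; the signatures, the default value of 1000 and the error handling are assumptions for this illustration, not PFunc's actual interface.

```cpp
#include <cassert>
#include <stdexcept>

namespace toy {  // hypothetical namespace, not PFunc

class taskmgr {
 public:
  unsigned int max_attempts = 1000;  // arbitrary default for illustration
};

namespace detail {
// Holds a non-owning pointer to the runtime currently used as the default.
inline taskmgr*& global_slot() {
  static taskmgr* slot = nullptr;
  return slot;
}
inline taskmgr& global() {
  if (global_slot() == nullptr) {
    throw std::logic_error("no global runtime has been set up");
  }
  return *global_slot();
}
}  // namespace detail

// init registers an existing runtime as the default; clear only unsets the
// registration and does not destroy the runtime object.
inline void init(taskmgr& tm) { detail::global_slot() = &tm; }
inline void clear() { detail::global_slot() = nullptr; }

// Overloads taking an explicit runtime as the first argument.
inline void taskmgr_max_attempts_set(taskmgr& tm, unsigned int n) {
  tm.max_attempts = n;
}
inline void taskmgr_max_attempts_get(const taskmgr& tm, unsigned int& n) {
  n = tm.max_attempts;
}

// Overloads without the runtime argument forward to the global runtime.
inline void taskmgr_max_attempts_set(unsigned int n) {
  taskmgr_max_attempts_set(detail::global(), n);
}
inline void taskmgr_max_attempts_get(unsigned int& n) {
  taskmgr_max_attempts_get(detail::global(), n);
}

}  // namespace toy
```

In this sketch, calling one of the shorter overloads after `toy::clear()` raises an exception, which makes the dependency on a registered global runtime explicit. Because the global slot is a non-owning pointer, switching to another runtime is a matter of calling `toy::clear()` followed by `toy::init()` with the new object.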

The code given below demonstrates the programmatic equivalent of the above example in C. To set up and clear the global runtime, we have used the functions pfunc_cilk_init and pfunc_cilk_clear respectively. The one marked difference from the C++ example is the addition of the _gbl suffix to the names of the functions that operate on the global runtime. Such suffixing is necessary because C does not provide function overloading. For example, in the code below, the local equivalent of the function pfunc_cilk_taskmgr_max_attempts_set_gbl is pfunc_cilk_taskmgr_max_attempts_set.

int main () {
unsigned int num_queues = 4;
const unsigned int num_threads_per_queue[] = {1,1,1,1};
const unsigned int affinities[4][1] = {{0},{1},{2},{3}};
unsigned int num_attempts;

/* Initialize a global instance of the library */

/* Set up the global runtime */
pfunc_cilk_init (&cilk_tmanager);

/* Change the number of attempts if necessary */