wiki:LibraryInitialization

Once the appropriate library instance description has been generated, the next step is to initialize the PFunc runtime. PFunc's runtime is encapsulated by objects of type taskmgr. As an aid to understanding, each object of type taskmgr can be thought of as the equivalent of an OpenMP parallel section. Each such object encapsulates a task scheduling policy, a number of task queues into which tasks can be placed, and the threads attached to these task queues, which execute the tasks. Typically, there is one object of type taskmgr per application run. However, users can create as many objects of type taskmgr as they deem necessary. For example, if two disjoint sets of tasks need to run simultaneously under different scheduling policies, it is advisable to create two objects of type taskmgr. Each such object represents a separate initialized instance of PFunc's runtime. For users who require just one runtime per application run, PFunc also allows a global object of type taskmgr to be specified, which is then used as an implicit argument in many function calls. In this document, the words runtime and taskmgr are used interchangeably.

To initialize PFunc's runtime, users are required to provide three pieces of information: the number of queues, the number of threads per queue, and the affinities of threads to processors. By manipulating these parameters, users can choose from a wide variety of mappings, ranging from the centralized work-sharing model to the distributed work-stealing model. These three parameters are summarized in the table below:

Parameter | Value type | Explanation
Num queues | unsigned int | Number of task queues to be used. Queues are numbered from 0 to N-1.
Num threads per queue | unsigned int[] | Number of threads to work on each queue. Allows an m x n mapping. A 1 x n mapping represents work-sharing (thread pools). An n x 1 mapping represents work-stealing (Cilk-style).
Thread affinities | unsigned int[][] | Affinity of each thread in each queue to a processor. Processors are numbered from 0 to N-1. Default values are accepted.

Initializing in C++

In this section, we initialize PFunc's runtime to use Cilk-style work stealing. Consider the following example:

/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         parallel_foo> my_pfunc; /*function object*/

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};

  /* Create a variable of the type taskmgr */
  my_pfunc::taskmgr my_taskmgr (num_queues, num_threads_per_queue, affinities);
  ...
  return 0; /* PFunc runtime is destroyed when my_taskmgr goes out of scope */
}

Let us now walk through the above example step by step. First, we generate the required library instance description using PFunc's generator interface. Next, we create an object of type taskmgr, with the required parameters, that is going to act as our runtime. For the scheduling policy, we choose a Cilk-style runtime with 4 queues and 1 thread per queue. Notice that this configuration sets up each thread with its own queue. When a thread runs out of work on its own queue, it steals work from other task queues. Hence, this model is called the work-stealing model. At the other end of the spectrum, if we had chosen to have a single queue and put all our threads on it, it would constitute a work-sharing model. PFunc also allows users to define an m x n model, which is a hybrid between the work-stealing and work-sharing models.

The work-stealing model has been proven to be efficient for running applications that are written in a divide-and-conquer style (for example, the naive Fibonacci example). In such applications, each thread generates ample tasks to keep itself busy and avoids the contention associated with having a single task queue. The best scheduling policy for an application is usually found by experimenting with different configurations. With PFunc, this is as simple as changing the library instance description and the initialization of the runtime.

In our example, we also specify the processor affinities for each of the threads. Here, we bind thread 0 to processor 0, thread 1 to processor 1, thread 2 to processor 2, and thread 3 to processor 3. Processor affinities are currently supported only on Linux platforms. By default, each thread can be scheduled to run on any of the available processors (cores). Binding a thread to a particular processor (core) might result in better cache reuse for applications running on dedicated machines. However, setting a thread's affinity also prevents it from being scheduled on other processors (cores).

In PFunc, the number of threads created is the number of queues multiplied by the number of threads in each queue (when queues have different thread counts, it is the sum of the per-queue counts). In our example, we are creating 4 threads in all. These threads are created in addition to the main user thread that is already running. As a general rule, it is recommended to have only as many threads running an application as there are processors (cores). For example, on a dual-core machine, we recommend creating only two threads, regardless of the configuration in which they are set up (for example, 2 x 1 or 1 x 2). Creating more threads than processors might result in performance degradation as threads contend for shared computing resources. Furthermore, each PFunc runtime initialization (i.e., each object of type taskmgr) creates its own threads, separate from those of other instances. So, exercise caution when running more than one library instance.

As soon as PFunc's runtime is initialized, the task queues and their corresponding threads are created. Each thread continually checks the task queues (starting with its own) for tasks to execute. However, as such continuous checking can deplete compute resources, PFunc threads check for tasks a pre-specified number of times (2 x 10^6 by default) before yielding the processor that they are running on. Such yielding behavior allows PFunc applications to coexist with other applications without monopolizing compute resources. However, when the number of threads is less than or equal to the number of available processors and the application is running on a dedicated machine, users can opt for Cilk-like behavior by increasing the number of attempts made by each thread before yielding. The higher the number of attempts, the sooner a task placed in a task queue is picked up by a thread and executed. The code below demonstrates how the maximum number of attempts can be changed if the default is not to the user's liking.

unsigned int num_attempts;
pfunc::taskmgr_max_attempts_get (my_taskmgr, num_attempts);
if (10000 > num_attempts) {
  pfunc::taskmgr_max_attempts_set (my_taskmgr, 10000);
}

Initializing in C

In this section, we see how to initialize PFunc's runtime to exactly the same specification as in the previous section. Initializing PFunc in C is much the same as in C++, and is shown in the code given below:

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};
  pfunc_cilk_taskmgr_t cilk_tmanager;

  /* Initialize an instance of the library */
  pfunc_cilk_taskmgr_init 
    (&cilk_tmanager, num_queues, num_threads_per_queue, affinities);
  ...
  /* Clear the instance of the library */
  pfunc_cilk_taskmgr_clear (&cilk_tmanager);

  return 0;
}

Two differences from the C++ example are immediately apparent. First, as we are programming in C, PFunc is initialized using a function call (pfunc_cilk_taskmgr_init in this case) rather than a constructor. Second, unlike in C++, PFunc's runtime needs to be explicitly cleared to release all the resources allocated by PFunc (using pfunc_cilk_taskmgr_clear in this case).

Using global runtimes

The most common use of PFunc involves only one object of type taskmgr (one runtime). Under such circumstances, it becomes tedious to explicitly specify the runtime to use when spawning tasks. To avoid this, PFunc allows users to set up a global runtime, which is used as the default whenever a specific runtime is not specified in the various PFunc function calls. In the following C++ code sample, we set up a global runtime and then change the number of attempts made by each thread to check for the availability of a task before yielding control to the thread scheduler.

/* Library instance description */
typedef pfunc::generator<cilkS, /* scheduling policy */
                         pfunc::use_default, /* compare */
                         parallel_foo> my_pfunc; /*function object*/

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};
  unsigned int num_attempts;

  /* Create a variable of the type taskmgr */
  my_pfunc::taskmgr my_taskmgr (num_queues, num_threads_per_queue, affinities);

  /* Set up my_taskmgr as the global runtime */
  pfunc::init (my_taskmgr);

  /* Change the number of attempts if necessary */
  pfunc::taskmgr_max_attempts_get (num_attempts);
  if (10000 > num_attempts) pfunc::taskmgr_max_attempts_set (10000);

  /* Clear my_taskmgr as the global runtime */
  pfunc::clear ();

  return 0; /* PFunc runtime is destroyed when my_taskmgr goes out of scope */
}

The global runtime is set up by first initializing an object of type taskmgr (my_taskmgr in our case) as before, and then using the function init to specify my_taskmgr as the global runtime. Correspondingly, it is necessary to clear the global runtime using the function clear. This does not destroy my_taskmgr; it merely unsets my_taskmgr as the global runtime. This is useful when users want to switch to a different object of type taskmgr as the global runtime. Finally, we turn our attention to how setting up the global runtime simplifies further function calls. In our case, we have simply omitted the first argument (meant to be my_taskmgr) from the calls to the functions taskmgr_max_attempts_set and taskmgr_max_attempts_get. Similarly, once the global runtime has been set up, users can omit the taskmgr argument from other PFunc function calls.

The code given below demonstrates the programmatic equivalent of the above example in C. To set up and clear the global runtime, we use the functions pfunc_cilk_init and pfunc_cilk_clear, respectively. The one marked difference from the C++ example is the addition of the _gbl suffix to the names of the functions that operate on the global runtime. Such suffixing is necessary because C does not provide function overloading. For example, in the code below, the local equivalent of the function pfunc_cilk_taskmgr_max_attempts_set_gbl would be pfunc_cilk_taskmgr_max_attempts_set.

int main () {
  unsigned int num_queues = 4;
  const unsigned int num_threads_per_queue[] = {1,1,1,1};
  const unsigned int affinities[4][1] = {{0},{1},{2},{3}};
  pfunc_cilk_taskmgr_t cilk_tmanager;
  unsigned int num_attempts;

  /* Initialize a global instance of the library */
  pfunc_cilk_taskmgr_init 
    (&cilk_tmanager, num_queues, num_threads_per_queue, affinities);

  /* Set up the global runtime */
  pfunc_cilk_init (&cilk_tmanager);

  /* Change the number of attempts if necessary */
  pfunc_cilk_taskmgr_max_attempts_get_gbl (&num_attempts);
  if (10000 > num_attempts) pfunc_cilk_taskmgr_max_attempts_set_gbl (10000);

  /* Clear the global runtime */
  pfunc_cilk_clear (&cilk_tmanager);

  /* Clear the global instance of the library */
  pfunc_cilk_taskmgr_clear (&cilk_tmanager);

  return 0;
}

Last modified on Oct 22, 2009 1:34:50 PM