% source: trunk/SYMPHONY/Doc/man-design.tex (revision 827, checked in by tkr)
%===========================================================================%
%                                                                           %
% This file is part of the documentation for the SYMPHONY MILP Solver.      %
%                                                                           %
% SYMPHONY was jointly developed by Ted Ralphs (tkralphs@lehigh.edu) and    %
%                                                                           %
%                                                                           %
% accompanying file for terms.                                              %
%                                                                           %
%===========================================================================%

\section{Branch and Bound}

{\em Branch and bound} is the broad class of algorithms from which
branch, cut, and price is descended. A branch and bound algorithm uses
a divide and conquer strategy to partition the solution space into
{\em subproblems} and then optimizes individually over each
subproblem. For instance, let $S$ be the set of solutions to a given
problem, and let $c \in {\bf R}^S$ be a vector of costs associated
with members of $S$. Suppose we wish to determine a least cost member of
$S$ and we are given $\hat{s} \in S$, a ``good'' solution determined
heuristically. Using branch and bound, we initially examine the entire
solution space $S$. In the {\em processing} or {\em bounding} phase,
we relax the problem. In so doing, we admit solutions that are not in
the feasible set $S$. Solving this relaxation yields a lower bound on
the value of an optimal solution. If the solution to this relaxation
is a member of $S$ or has cost equal to that of $\hat{s}$, then we are
done---either the new solution or $\hat{s}$, respectively, is optimal.
Otherwise, we identify $n$ subsets of $S$, $S_1, \ldots, S_n$, such
that $\cup_{i = 1}^n S_i = S$. Each of these subsets is called a {\em
subproblem}; $S_1, \ldots, S_n$ are sometimes called the {\em
children} of $S$. We add the children of $S$ to the list of {\em
candidate subproblems} (those which need processing). This is called
{\em branching}.

To continue the algorithm, we select one of the candidate subproblems
and process it. There are four possible results. If we find a feasible
solution better than $\hat{s}$, then we replace $\hat{s}$ with the new
solution and continue. We may also find that the subproblem has no
solutions, in which case we discard, or {\em prune}, it. Otherwise, we
compare the lower bound to our global upper bound. If it is greater
than or equal to our current upper bound, then we may again prune the
subproblem. Finally, if we cannot prune the subproblem, we are forced
to branch and add the children of this subproblem to the list of
active candidates. We continue in this way until the list of active
subproblems is empty, at which point our current best solution is the
optimal one.
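
The loop just described can be sketched compactly in code. The following toy
implementation (in C, the language SYMPHONY itself is written in) applies
branch and bound to a small 0-1 knapsack instance. It is an illustrative
sketch only, not SYMPHONY code, and all names and data are hypothetical; it
uses maximization with upper bounds, which mirrors the minimization setting
described above. The bounding phase solves the LP relaxation of the
knapsack, which can be done greedily when items are sorted by
value-to-weight ratio.

```c
/* Toy branch and bound for a 0-1 knapsack (maximization; the text's
 * minimization case mirrors this with lower bounds).  Bounding solves
 * the LP relaxation greedily, which is valid because the items are
 * sorted by value/weight ratio.  All data are illustrative. */

#define N 4
static const double value[N]  = {10, 7, 4, 3};   /* sorted by ratio */
static const double weight[N] = {5,  4, 3, 3};
static const double CAP = 9;

static double best;                               /* incumbent value */

/* Greedy LP bound for the subproblem in which items 0..i-1 are fixed,
 * given the remaining capacity and the value accumulated so far. */
static double lp_bound(int i, double cap, double val)
{
    for (; i < N && weight[i] <= cap; i++) { cap -= weight[i]; val += value[i]; }
    if (i < N) val += value[i] * (cap / weight[i]);   /* fractional item */
    return val;
}

/* Process one candidate subproblem: update the incumbent, prune by
 * bound, or branch on the next variable (x_i = 1 and x_i = 0). */
static void branch(int i, double cap, double val)
{
    if (val > best) best = val;                   /* new incumbent */
    if (i == N) return;
    if (lp_bound(i, cap, val) <= best) return;    /* prune by bound */
    if (weight[i] <= cap)                         /* child: x_i = 1 */
        branch(i + 1, cap - weight[i], val + value[i]);
    branch(i + 1, cap, val);                      /* child: x_i = 0 */
}

double solve(void) { best = 0; branch(0, CAP, 0); return best; }
```

Each recursive call plays the role of processing one candidate subproblem;
the recursion stack takes the place of the explicit list of candidates.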

\section{Branch and Cut}
\label{branchandcut}

In many applications, the bounding operation is accomplished using the
tools of linear programming (LP), a technique first described in full
generality by Hoffman and Padberg \cite{hoff:LP}. This general class of
algorithms is known as {\em LP-based branch and bound}. Typically, the
integrality constraints of an integer programming formulation of the
problem are relaxed to obtain an {\em LP relaxation}, which is then
solved to obtain a lower bound for the problem. In \cite{padb:branc},
Padberg and Rinaldi improved on this basic idea by describing a method
of using globally valid inequalities (i.e., inequalities valid for the
convex hull of integer solutions) to strengthen the LP relaxation.
They called this technique {\em branch and cut}. Since then, many
implementations (including ours) have been fashioned around the
framework they described for solving the Traveling Salesman Problem.
\begin{figure}
\framebox[6.5in]{
\begin{minipage}{6.0in}
\vskip .1in
{\rm
{\bf Bounding Operation}\\
\underbar{Input:} A subproblem ${\cal S}$, described in
terms of a ``small'' set of inequalities ${\cal L'}$ such that ${\cal
S} = \{x^s : s \in {\cal F}\;\hbox{\rm and}\;ax^s \leq \beta\;\forall
\;(a,\beta) \in {\cal L'}\}$ and $\alpha$, an upper bound on the global
optimal value. \\
\underbar{Output:} Either (1) an optimal solution $s^* \in {\cal S}$ to
the subproblem, (2) a lower bound on the optimal value of the
subproblem, or (3) a message {\tt pruned} indicating that the
subproblem should not be considered further. \\
{\bf Step 1.} Set ${\cal C} \leftarrow {\cal L'}$. \\
{\bf Step 2.} Solve the LP $\min\{cx : ax \leq \beta\;\forall\;(a, \beta)
\in {\cal C}\}$. \\
{\bf Step 3.} If the LP has a feasible solution $\hat{x}$, then go to
Step 4. Otherwise, STOP and output {\tt pruned}. This subproblem has no
feasible solutions. \\
{\bf Step 4.} If $c\hat{x} < \alpha$, then go to Step
5. Otherwise, STOP and output {\tt pruned}. This subproblem
cannot produce a solution of value better than $\alpha$. \\
{\bf Step 5.} If $\hat{x}$ is the incidence vector of some $\hat{s}
\in {\cal S}$, then $\hat{s}$ is the optimal solution to this
subproblem. STOP and output $\hat{s}$ as $s^*$. Otherwise, apply
separation algorithms and heuristics to $\hat{x}$ to get a set of
violated inequalities ${\cal C'}$. If ${\cal C'} = \emptyset$, then
$c\hat{x}$ is a lower bound on the value of an optimal element of
${\cal S}$. STOP and return $\hat{x}$ and the lower bound
$c\hat{x}$. Otherwise, set ${\cal C} \leftarrow {\cal C} \cup {\cal
C'}$ and go to Step 2.}
\end{minipage}
}
\caption{Bounding in the branch and cut algorithm}
\label{proc-bound}
\end{figure}
As an example, let a combinatorial optimization problem $\hbox{\em CP} =
(E, {\cal F})$ with {\em ground set} $E$ and {\em feasible set} ${\cal F}
\subseteq 2^E$ be given along with a cost function $c \in {\bf R}^E$.
The incidence vectors corresponding to the members of ${\cal F}$ are
sometimes specified as the set of all incidence vectors obeying a
(relatively) small set of inequalities. These inequalities are
typically the ones used in the initial LP relaxation. Now let ${\cal
P}$ be the convex hull of incidence vectors of members of ${\cal
F}$. Then we know by Weyl's Theorem (see \cite{nemwol88}) that there exists
a finite set ${\cal L}$ of inequalities valid for ${\cal P}$ such that
\begin{equation}
\label{the-polyhedron}
{\cal P} = \{x \in {\bf R}^n: ax \leq \beta\;\;\forall\;(a, \beta) \in
{\cal L}\}.
\end{equation}
The inequalities in ${\cal L}$ are the potential cutting planes to be
added to the relaxation as needed. Unfortunately, it is usually
difficult, if not impossible, to enumerate all of the inequalities in
${\cal L}$ or we could simply solve the problem using linear
programming. Instead, they are defined implicitly and we use
separation algorithms and heuristics to generate these inequalities
when they are violated. In Figure \ref{proc-bound}, we describe more
precisely how the bounding operation is carried out in branch and cut.
\begin{figure}
\framebox[6.5in]{
\begin{minipage}{6.0in}
\vskip .1in
{\rm
{\bf Branching Operation} \\
\underbar{Input:} A subproblem ${\cal S}$ and $\hat{x}$, the LP solution
yielding the lower bound. \\
\underbar{Output:} $S_1, \ldots, S_p$ such that ${\cal S} = \cup_{i = 1}^p
S_i$. \\
{\bf Step 1.} Determine sets ${\cal L}_1, \ldots, {\cal L}_p$ of
inequalities such that ${\cal S} = \cup_{i = 1}^p \{x \in {\cal S}: ax \leq
\beta\;\forall\;(a, \beta) \in {\cal L}_i\}$ and $\hat{x} \notin
\cup_{i = 1}^p S_i$. \\
{\bf Step 2.} Set $S_i = \{x \in {\cal S}: ax \leq
\beta\;\;\forall\;(a, \beta) \in {\cal L}_i \cup {\cal L}'\}$ where
${\cal L'}$ is the set of inequalities used to describe ${\cal S}$.}
\end{minipage}
}
\caption{Branching in the branch and cut algorithm}
\label{branching-fig}
\end{figure}

\indent Once we have failed to either prune the current subproblem or separate
the current fractional solution from ${\cal P}$, we are forced to
branch. The branching operation is accomplished by specifying a set of
hyperplanes which divide the current subproblem in such a way that the
current solution is not feasible for the LP relaxation of any of the
new subproblems. For example, in a combinatorial optimization problem,
branching could be accomplished simply by fixing a variable whose
current value is fractional to 0 in one branch and 1
in the other. The procedure is described more formally in Figure
\ref{branching-fig}. Figure \ref{gb&c} gives a high level description
of the generic branch and cut algorithm.
\begin{figure}
\framebox[6.5in]{
\begin{minipage}{6.0in}
\vskip .1in
{\rm
{\bf Generic Branch and Cut Algorithm}\\
\underbar{Input:} A data array specifying the problem instance.\\
\underbar{Output:} The global optimal solution $s^*$ to the problem
instance. \\
{\bf Step 1.} Generate a ``good'' feasible solution ${\hat s}$ using
heuristics. Set $\alpha \leftarrow c(\hat{s})$. \\
{\bf Step 2.} Generate the first subproblem ${\cal S}^I$ by constructing a
small set ${\cal L'}$ of inequalities valid for ${\cal P}$. Set $A
\leftarrow \{{\cal S}^I\}$. \\
{\bf Step 3.} If $A = \emptyset$, STOP and output $\hat{s}$ as the
global optimum $s^*$. Otherwise, choose some ${\cal S} \in A$. Set $A
\leftarrow A \setminus \{{\cal S}\}$. Process ${\cal S}$. \\
{\bf Step 4.} If the result of Step 3 is a feasible solution
$\overline{s}$, then $c\overline{s} < c\hat{s}$.
Set $\hat{s} \leftarrow \overline{s}$ and $\alpha \leftarrow
c(\overline{s})$ and go to Step 3. If the subproblem was pruned, go to
Step 3. Otherwise, go to Step 5. \\
{\bf Step 5.} Perform the branching operation. Add the set of
subproblems generated to $A$ and go to Step 3.}
\end{minipage}
}
\caption{Description of the generic branch and cut algorithm}
\label{gb&c}
\end{figure}

In the remainder of the manual, we often use the term {\em search
tree}. This term derives from the common representation of the list of
subproblems as the nodes of a graph in which each subproblem is
connected only to its parent and its children. Storing the subproblems
in such a form is an important aspect of our global data structures.
Since the subproblems correspond to the nodes of this graph, they are
sometimes referred to as {\em nodes in the search tree} or simply
as {\em nodes}. The {\em root node} or {\em root} of the tree is the
node representing the initial subproblem.
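
The parent-link representation just described can be sketched as follows
(hypothetical types, not SYMPHONY's actual data structures): each node
stores only a pointer to its parent, and the children are reached through
the list of candidate subproblems rather than through the node itself.

```c
#include <stddef.h>

/* Sketch of a search tree node: each subproblem links only to its
 * parent; the root is the node whose parent pointer is NULL. */
typedef struct tree_node {
    struct tree_node *parent;    /* NULL for the root node */
    double            lower_bound;
} tree_node;

/* Depth of a node = number of edges on the path back to the root. */
int node_depth(const tree_node *n)
{
    int d = 0;
    while (n->parent != NULL) { n = n->parent; d++; }
    return d;
}
```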

\section{Design of SYMPHONY}
\label{design}

\BB\ was designed with two major goals in mind---portability and
ease of use. With respect to ease of use, we aimed for a ``black box''
design, whereby the user would not be required to know anything about
the implementation of the library, but only about the user interface.
With respect to portability, we aimed not only for it to be {\em
possible} to use the framework in a wide variety of settings and on a
wide variety of hardware, but also for it to perform {\em effectively} in all
these settings. Our primary measure of effectiveness was
how well the framework would perform in comparison to a problem-specific
(or hardware-specific) implementation written ``from scratch.''

It is important to point out that achieving such design goals involves
a number of very difficult tradeoffs. For instance, ease of use is quite
often at odds with efficiency. In several instances, we had to give up
some efficiency to make the code easy to work with and to maintain a
true black box implementation. Maintaining portability across a wide
variety of hardware, both sequential and parallel, also required some
difficult choices. For example, solving large-scale problems on
sequential platforms requires extremely memory-efficient data
structures in order to maintain the very large search trees that can
be generated. These storage schemes, however, are highly centralized
and do not scale well to large numbers of processors.

\subsection{An Object-oriented Approach}

As we have already alluded to, applying BCP to large-scale problems
presents several difficult challenges. First and foremost is designing
methods and data structures capable of handling the potentially huge
numbers of cuts and variables that need to be accounted for during the
solution process. The dynamic nature of the algorithm requires that we
also be able to efficiently move cuts and variables in and out of
the {\em active set} of each search node at any time. A second,
closely-related challenge is that of effectively dealing with the very
large search trees that can be generated for difficult problem
instances. This involves not only the important question of how to
store the data, but also how to move it between modules during
parallel execution. A final challenge in developing a generic
framework, such as SYMPHONY, is to deal with these issues using a
problem-independent approach.

Describing a node in the search tree consists of, among other things,
specifying which cuts and variables are initially {\em active} in the
subproblem. In fact, the vast majority of the methods in BCP that
depend on the model are related to generating, manipulating, and
storing the cuts and variables. Hence, SYMPHONY can be considered an
object-oriented framework with the central ``objects'' being the cuts
and variables. From the user's perspective, implementing a BCP
algorithm using SYMPHONY consists primarily of specifying various
properties of objects, such as how they are generated, how they are
represented, and how they should be realized within the context of a
particular subproblem.

With this approach, we achieved the ``black box'' structure by
separating these problem-specific functions from the rest of the
implementation. The internal library interfaces with the user's
subroutines through a well-defined Application Program Interface (API) (see
Section \ref{API})
and independently performs all the normal functions of BCP---tree
management, LP solution, and cut pool management, as well as inter-process
communication (when parallelism is employed). Although there are
default options for many of the operations, the user can also assert
control over the behavior of the algorithm by overriding the default
methods or by parameter setting.

Although we have described our approach as being ``object-oriented,''
we would like to point out that SYMPHONY is implemented in C, not C++.
To avoid inefficiencies and enhance the modularity of the code
(allowing for easy parallelization), we used a more
``function-oriented'' approach for the implementation of certain
aspects of the framework. For instance, methods used for communicating
data between modules are not naturally ``object-oriented'' because the
type of data being communicated is usually not known by the
message-passing interface. It is also common that efficiency
considerations require that a particular method be performed on a
whole set of objects at once rather than on just a single object.
Simply invoking the same method sequentially on each of the members of
the set can be extremely inefficient. In these cases, it is far better
to define a method which operates on the whole set at once. In order
to overcome these problems, we have also defined a set of {\em
interface functions}, which are associated with the computational
modules. These functions are described in detail in Section \ref{API}.

\subsection{Data Structures and Storage}
\label{data-structures}

Both the memory required to store the search tree and the time
required to process a node are largely dependent on the number of
objects (cuts and variables) that are active in each subproblem.
Keeping this active set as small as possible is one of the keys to
efficiently implementing BCP. For this reason, we chose data
structures that enhance our ability to efficiently move objects in and
out of the active set. Allowing sets of cuts and variables to move in
and out of the linear programs simultaneously is one of the most
significant challenges of BCP. We do this by maintaining an abstract
{\em representation} of each global object that contains information
about how it can be realized within a particular subproblem.

In the literature on linear and integer programming, the terms {\em
cut} and {\em row} are typically used interchangeably. Similarly, {\em
variable} and {\em column} are often used with similar meanings. In
many situations, this is appropriate and does not cause confusion.
However, in object-oriented BCP frameworks, such as \BB\ or ABACUS
\cite{abacus1}, a {\em cut} and a {\em row} are {\em fundamentally
different objects}. A {\em cut} (also referred to as a {\em
constraint}) is a user-defined representation of an abstract object
which can only be realized as a row in an LP matrix {\em with respect
to a particular set of active variables}. Similarly, a {\em variable}
is a representation which can only be realized as a column of an LP
matrix with respect to a {\em particular set of cuts}. This
distinction between the {\em representation} and the {\em realization}
of objects is a crucial design element and is what allows us to
effectively address some of the challenges inherent in BCP. In the
remainder of this section, we further discuss this distinction
and its implications.

%In later sections, we will discuss the computational issues involved
%in the efficient processing of individual subproblems.

\subsubsection{Variables}
\label{variables}

In \BB, problem variables are {\em represented} by a unique global
index assigned to each variable by the user. This index represents
each variable's position in a ``virtual'' global list known only to
the user. The main requirement of this indexing scheme is that, given
an index and a list of active cuts, the user must be able to generate
the corresponding column to be added to the matrix. As an example, in
problems where the variables correspond to the edges of an underlying
graph, the index could be derived from a lexicographic ordering of the
edges (when viewed as ordered pairs of nodes).
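
For the graph example just given, such an index is easy to compute. The
sketch below (illustrative, not SYMPHONY's API) maps an edge $(i,j)$ with
$i<j$ of a complete graph on $n$ nodes to its position in the lexicographic
ordering of the edges.

```c
/* Index of edge (i, j), with 0 <= i < j < n, in the lexicographic
 * ordering (0,1), (0,2), ..., (0,n-1), (1,2), ... of the edges of a
 * complete graph on n nodes.  Edges with first endpoint below i
 * account for i*n - i*(i+1)/2 positions, and (j - i - 1) edges with
 * first endpoint i precede (i, j). */
int edge_index(int i, int j, int n)
{
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}
```

Given this index and the list of active cuts, the user can reconstruct the
column corresponding to edge $(i,j)$ on demand, which is exactly the
requirement stated above.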

This indexing scheme provides a very compact representation, as well
as a simple and effective means of moving variables in and out of the
active set. However, it means that the user must have a priori
knowledge of all problem variables and a method for indexing them. For
combinatorial models such as the {\em Traveling Salesman Problem},
this does not present a problem. However, for some set partitioning
models, for instance, the number of columns may not be known in
advance. Even if the number of columns is known in advance, a viable
indexing scheme may not be evident. Eliminating the indexing
requirement by allowing variables to have abstract, user-defined
representations (such as we do for cuts) would allow for more
generality, but would also sacrifice some efficiency. A hybrid scheme,
allowing the user to have both indexed and {\em algorithmic} variables
(variables with user-defined representations), is planned for a future
version of SYMPHONY.

For efficiency, the problem variables can be divided into two sets, the
{\em base variables} and the {\em extra variables}. The base variables
are active in all subproblems, whereas the extra variables can be
added and removed. There is no theoretical difference between
base variables and extra variables; however, designating a well-chosen
set of base variables can significantly increase efficiency. Because
they can move in and out of the problem, maintaining extra variables
requires additional bookkeeping and computation. If the user has
reason to believe a priori that a variable is ``good'' or has a high
probability of having a non-zero value in some optimal solution to the
problem, then that variable should be designated as a base variable.
It is up to the user to designate which variables should be active in
the root subproblem. Typically, when column generation is used, only base
variables are active. Otherwise, all variables must be active in the
root node.

\subsubsection{Constraints}
\label{constraints}

Because the global list of potential constraints (also called cuts) is
not usually known a priori or is extremely large, constraints cannot
generally be represented simply by a user-assigned index. Instead,
each constraint is assigned a global index only after it becomes
active in some subproblem. It is up to the user, if desired, to
designate a compact {\em representation} for each class of constraints
that is to be generated and to implement subroutines for converting
from this compact representation to a matrix row, given the list of
active variables. For instance, suppose that the set of nonzero
variables in a particular class of constraints corresponds to the set
of edges across a cut in a graph. Instead of storing the indices of
each variable explicitly, one could simply store the set of nodes on
one side (the ``shore'') of the cut as a bit array. The constraint could
then be constructed easily for any particular set of active variables
(edges).
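
The shore representation just described can be realized as a matrix row in
a few lines. The sketch below (hypothetical code, not SYMPHONY's actual
subroutines) stores the shore as a bitmask and produces, for a given list
of active edges, coefficient 1 for every edge with exactly one endpoint in
the shore.

```c
/* Realize a cut stored compactly as a bitmask of "shore" nodes: for
 * each active edge (u, v), the coefficient is 1 if exactly one
 * endpoint lies in the shore (i.e., the edge crosses the cut) and 0
 * otherwise.  Assumes node indices small enough to fit the mask. */
void realize_cut_row(unsigned shore,
                     const int (*edges)[2], int n_edges,
                     double *coef)
{
    for (int e = 0; e < n_edges; e++) {
        int u_in = (shore >> edges[e][0]) & 1;
        int v_in = (shore >> edges[e][1]) & 1;
        coef[e] = (u_in != v_in) ? 1.0 : 0.0;
    }
}
```

The same compact representation yields a different row for each set of
active edges, which is precisely the representation/realization distinction
discussed above.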

Just as with variables, the constraints are divided into {\em core
constraints} and {\em extra constraints}. The core constraints are
those that are active in every subproblem, whereas the extra
constraints can be generated dynamically and are free to enter and leave
as appropriate. Obviously, the set of core constraints must be known
and constructed explicitly by the user. Extra constraints, on the
other hand, are generated dynamically by the cut generator as they are
violated. As with variables, a good set of core constraints can have a
significant effect on efficiency.

Note that the user is not {\em required} to designate a compact
representation scheme. Constraints can simply be represented
explicitly as matrix rows with respect to the global set of variables.
However, designating a compact form can result in large reductions in
memory use if the number of variables in the problem is large.

\subsubsection{Search Tree}

Having described the basics of how objects are represented, we now
describe the representation of search tree nodes. Since the base
constraints and variables are present in every subproblem, only the
indices of the extra constraints and variables are stored in each
node's description. A complete description of the current basis is
maintained to allow a warm start to the computation in each search
node. This basis is either inherited from the parent, computed during
strong branching (see Section \ref{branching}), or comes from earlier
partial processing of the node itself (see Section \ref{two-phase}).
Along with the set of active objects, we must also store the identity
of the object(s) which were branched upon to generate the node. The
branching operation is described in Section \ref{branching}.

Because the set of active objects and the status of the basis do not
tend to change much from parent to child, all of these data are stored
as differences with respect to the parent when that description is
smaller than the explicit one. This method of storing the entire tree
is highly memory-efficient. The list of nodes that are candidates for
processing is stored in a heap ordered by a comparison function
defined by the search strategy (see Section \ref{tree-management}). This
allows efficient generation of the next node to be processed.
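
A minimal sketch of the differencing idea (hypothetical code, not
SYMPHONY's storage routines): given the sorted index lists of the parent's
and child's extra objects, the child is stored as a diff only when the
symmetric difference is smaller than the child's explicit list.

```c
/* Size of the symmetric difference of two sorted index lists.  Storing
 * the child as (additions, deletions) relative to the parent pays off
 * when this count is smaller than child_n, the explicit list size. */
int diff_size(const int *parent, int parent_n,
              const int *child, int child_n)
{
    int i = 0, j = 0, d = 0;
    while (i < parent_n && j < child_n) {
        if (parent[i] == child[j])      { i++; j++; }
        else if (parent[i] < child[j])  { i++; d++; }   /* deleted */
        else                            { j++; d++; }   /* added   */
    }
    return d + (parent_n - i) + (child_n - j);
}
```

Since a child typically shares most of its active objects with its parent,
the diff is usually far smaller than the explicit description, which is
what makes this storage scheme memory-efficient.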

\subsection{Modular Implementation}
\label{SYMPHONY-modules}

\BB's functions are grouped into five independent computational
modules. This modular implementation not only facilitates code
maintenance, but also allows easy and highly configurable
parallelization. Depending on the computational setting, the modules
can be compiled as either (1) a single sequential code, (2) a
multi-threaded shared-memory parallel code, or (3) separate processes
running in distributed fashion over a network. The modules pass data
to each other either through shared memory (in the case of sequential
computation or shared-memory parallelism) or through a message-passing
protocol defined in a separate communications API (in the case of
distributed execution). A schematic overview of the modules is
presented in Figure \ref{overview}. In the remainder of the section,
we describe the modularization scheme and the implementation of each
module in a sequential environment.

\begin{figure}
\centering
\psfig{figure=pbandc.eps}
\caption{Schematic overview of the branch, cut, and price algorithm}
\label{overview}
\end{figure}

\subsubsection{The Master Module}
\label{master-process}

The {\em master module} includes functions that perform problem initialization
and I/O. This module is the only persistent module and stores all static
problem data. The other modules are created only during a solve call and
destroyed afterward. All calls to the API are processed through the master
module. The functions of the master module implement the following tasks:
%\underbar{Overall Function:} Handle input/output, maintain the data
%for the problem instance and serve requests to send out that
%data. Keep track of the best solution found so far.\\
%\underbar{Specific Functions:}
\begin{itemize}
        \item Initialize the environment.
        \item Set and maintain parameter values.
        \item Read and store static problem data for the instance to be solved.
        \item Compute an initial upper bound using heuristics.
        \item Perform problem preprocessing.
        \item Initialize the solution process, pass problem information to the
        solver modules, and store the results after completion of the solve
        call.
        \item Track the status of associated processes during parallel
        solution calls.
        \item Act as a clearing house for output during the solution process.
        \item Store warm start information between solver calls.
        \item Service requests from the user through the API for problem data,
        problem modification, and parameter modification.
\end{itemize}

\subsubsection{The Tree Management Module}

The \emph{tree manager} controls the overall execution of the algorithm. It
tracks the status of its worker modules, as well as that of the search
tree, and distributes the subproblems to be processed to the node processing
module(s). Functions performed by the tree management module are:
%\underbar{Overall Function:} Start up all the processes and keep
%track of the current state of the search tree.
%\noindent \underbar{Specific Functions:} \nobreak
\begin{itemize}
        \item Receive data for the root node and place it on the list
        of candidates for processing.
        \item Receive data for subproblems to be held for later
        processing.
        \item Handle requests from linear programming modules to
        release a subproblem for processing.
        \item Receive branching object information, set up data structures
        for the children, and add them to the list of candidate subproblems.
        \item Keep track of the global upper bound and notify all node
        processing modules when it changes.
        \item Write current state information out to disk periodically
        to allow a restart in the event of a system crash.
        \item Keep track of run data and send it to the master
        program at termination.
\end{itemize}

\subsubsection{The Node Processing Module}

The \emph{node processing} (NP) module is the most complex and
computationally intensive of the five modules. Its job is to perform
the bounding and branching operations. These operations
are, of course, central to the performance of the algorithm. Functions
performed by the NP module are:
%\underbar{Overall Function:} Process and bound subproblems. Branch and
%produce new subproblems. \\
%\underbar{Specific Functions:}
\begin{itemize}
        \item Inform the tree manager when a new subproblem is needed.
        \item Receive a subproblem and process it in conjunction
        with the cut generator and the cut pool.
        \item Decide which cuts should be sent to the global pool to
        be made available to other NP modules.
        \item If necessary, choose a branching object and send its
        description back to the tree manager.
        \item Perform the fathoming operation, including generating
        variables.
\end{itemize}

\subsubsection{The Cut Generation Module}

The {\em cut generator} performs only one function---generating valid
inequalities violated by the current fractional solution and sending
them back to the requesting NP module. Here are the functions
performed by the cut generator module:
%\underbar{Overall Function:} Receive solution vectors from the LP
%processes and generate valid inequalities violated by these vectors.\\
%\underbar{Specific Functions:}
\begin{itemize}
        \item Receive an LP solution and attempt to
        separate it from the convex hull of all solutions.
        \item Send generated valid inequalities back to the NP module.
        \item When finished processing a solution vector, inform the
        NP module not to expect any more cuts in case it is still waiting.
\end{itemize}

\subsubsection{The Cut Management Module}

The concept of a {\em cut pool} was first suggested by Padberg and
Rinaldi \cite{padb:branc}, and is based on the observation that in BCP, the
inequalities which are generated while processing a particular node in
the search tree are also generally valid and potentially useful at
other nodes. Since generating these cuts is usually a relatively
expensive operation, the cut pool maintains a list of the ``best'' or
``strongest'' cuts found in the tree so far for use in processing
future subproblems. Hence, the cut manager functions as an auxiliary cut
generator. More explicitly, here are the functions of the cut pool
module:
%\underbar{Overall Function:} Maintain a list of ``effective'' valid
%inequalities for use in processing the subproblems.\\
%\underbar{Specific Functions:}
\begin{itemize}
        \item Receive cuts generated by other modules and store them.
        \item Receive an LP solution and return a
        set of cuts which this solution violates.
        \item Periodically purge ``ineffective'' and duplicate cuts
        to control its size.
\end{itemize}
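
The pool's central query, returning stored cuts violated by a given LP
solution, can be sketched as follows (hypothetical structures, not
SYMPHONY's cut pool code). Each stored cut $ax \le \beta$ is checked
against the solution $\hat{x}$, and the indices of the violated cuts are
returned.

```c
/* One stored cut a x <= beta over n variables (dense, for brevity;
 * a real pool would store cuts sparsely and hash duplicates). */
typedef struct { const double *a; double beta; } pool_cut;

/* Write the indices of the cuts violated by x (by more than tol) into
 * out; return how many were found. */
int violated_cuts(const pool_cut *pool, int n_cuts,
                  const double *x, int n, double tol, int *out)
{
    int found = 0;
    for (int k = 0; k < n_cuts; k++) {
        double lhs = 0.0;
        for (int i = 0; i < n; i++) lhs += pool[k].a[i] * x[i];
        if (lhs > pool[k].beta + tol) out[found++] = k;
    }
    return found;
}
```

Because this check is much cheaper than running the separation algorithms
from scratch, the pool acts as the auxiliary cut generator described above.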

\subsection{Algorithm Summary}
\label{symphony}

Currently, \BB\ is what is known as a single-pool BCP algorithm.
The term {\em single-pool} refers to the fact that there is a single
central list of candidate subproblems to be processed, which is
maintained by the tree manager. Most sequential implementations use
such a single-pool scheme. However, other schemes may be used in
parallel implementations. For a description of various types of
parallel branch and bound, see \cite{gend:paral}.

The user begins by initializing the SYMPHONY environment and can then invoke
subroutines for reading in parameters and problem data, finding an initial
upper bound, and designating the initial set of active cuts and variables in
the root node. Once the user invokes a solve routine, a tree manager is
created to manage the solution process. The tree manager module in turn sets
up the cut pool module(s), the linear programming module(s), and the cut
generator module(s). Currently, there are three solve calls supported by the
API. The first call is the \emph{initial solve} (see Section
\ref{initial_solve}), which solves the problem from scratch without using warm
start information. The second type of solve call is a \emph{warm solve}, which
solves the problem using previously computed warm start information (see
Section \ref{warm_solve}). Finally, there is a \emph{multicriteria solve} call
which is used to enumerate efficient solutions to a given multicriteria MILP
(see Section \ref{mc_solve}).

During the solution process, the tree manager functions control the execution
by maintaining the list of candidate subproblems and sending them to the NP
modules as they become idle. The NP modules receive nodes from the tree
manager, process them, branch (if required), and send back the identity of the
chosen branching object to the tree manager, which in turn generates the
children and places them on the list of candidates to be processed (see
Section \ref{branching} for a description of the branching operation). A
schematic summary of the algorithm is shown in Figure \ref{overview}. The
preference ordering for processing nodes is a run-time parameter. Typically,
the node with the smallest lower bound is chosen to be processed next since
this strategy minimizes the overall size of the search tree. However, at
times, it is advantageous to {\em dive} down in the tree. The concepts of {\em
diving} and {\em search chains}, introduced in Section \ref{tree-management},
extend the basic ``best-first'' approach.
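The ``best-first'' preference ordering described above can be sketched with a
priority queue keyed on lower bounds. The following Python fragment is purely
illustrative (the node names and bounds are hypothetical, and SYMPHONY's
actual tree manager is written in C):

```python
import heapq

def best_first_order(candidates):
    # candidates: mapping from node id to its associated lower bound.
    # Best-first always processes the candidate with the smallest
    # lower bound next.
    heap = [(lb, node) for node, lb in candidates.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
    return order
```

Diving and search chains deviate from this strict ordering by retaining one
child after branching instead of returning it to the central queue.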

We mentioned earlier that cuts and variables can be treated in a
somewhat symmetric fashion. However, it should be clear by now
that our current implementation favors branch and cut algorithms,
where the computational effort spent
generating cuts dominates that of generating variables. Our methods of
representation also clearly favor such problems. In a future version
of the software, we plan to erase this bias by adding additional
functionality for handling variable generation and storage. This is
the approach already taken by COIN/BCP \cite{coin-or}. For more
discussion of the reasons for this bias and the differences between
the treatment of cuts and variables, see Section \ref{lp-relaxation}.

\section{Details of the Implementation}
\label{modules}

\subsection{The Master Module}
\label{master}

The primary functions performed by the master module were listed in
Section \ref{master-process}. Here, we describe the implementation details of
the various solve calls.
%If needed, the user must provide a
%routine to read problem-specific parameters in from the parameter
%file. She must also provide a subroutine for upper bounding if
%desired, though upper bounds can also be provided explicitly. A good
%initial upper bound can dramatically decrease the solution time by
%allowing more variable-fixing and earlier pruning of search tree
%nodes. If no upper bounding subroutine is available, then the
%two-phase algorithm, in which a good upper bound is found quickly in
%the first phase using a reduced set of variables can be advantageous.
%See Section \ref{two-phase} for details. The user's only unavoidable
%obligation during pre-processing is to specify the list of base
%variables and, if desired, the list of extra variables that are to be
%active in the root node. Again, we point out that selecting a good set
%of base variables can make a marked difference in solution speed,
%especially using the two-phase algorithm.

\subsubsection{Initial Solve}\label{initial_solve}

Calling the initial solve method solves a given MILP from scratch, as
described above. The first action taken is to create an instance of the tree
manager module that will control execution of the algorithm. If the algorithm
is to be executed in parallel on a distributed architecture, the master module
spawns a separate tree manager process that will autonomously control the
solution process. The tree manager in turn creates the modules for processing
the nodes of the search tree, generating cuts, and maintaining cut pools.
These modules work in concert to execute the solution process. When it makes
sense, sets of two or more modules, such as a node processing module and a cut
generation module, may be combined to yield a single process in which the
combined modules work in concert and communicate with each other through
shared memory instead of across the network. When running as separate
processes, the modules communicate with each other using a standard
communications protocol. Currently, the only option supported is PVM, but it
would be relatively easy to add an MPI implementation.

The overall flow of the algorithm is similar to other branch and bound
implementations and is detailed below. A priority queue of candidate
subproblems available for processing is maintained at all times and the
candidates are processed in an order determined by the search strategy. The
algorithm terminates when the queue is empty or when another specified
condition is satisfied. A new feature in SYMPHONY \VER\ is the ability to stop
the computation based on exceeding a given time limit, exceeding a given limit
on the number of processed nodes, achieving a target percentage gap between
the upper and lower bounds, or finding the first feasible solution. After
halting prematurely, the computation can be restarted after modifying
parameters or problem data. This enables the implementation of a wide range of
dynamic and on-line solution algorithms, as we describe next.

\subsubsection{Solve from Warm Start} \label{warm_solve}

Among the utility classes in the COIN-OR repository is a base class for
describing the data needed to warm start the solution process for a particular
solver or class of solvers. To support this option for SYMPHONY, we have
implemented such a warm start class for MILPs. The main content of the class
is a compact description of the search tree at the time the computation was
halted. This description contains complete information about the subproblem
corresponding to each node in the search tree, including the branching
decisions that led to the creation of the node, the list of active variables
and constraints, and warm start information for the subproblem itself (which
is a linear program). All information is stored compactly using SYMPHONY's
native data structures, which store only the differences between a child and
its parent, rather than an explicit description of every node. This approach
reduces the tree's description to a fraction of the size it would otherwise
be. In addition to the tree itself, other relevant information regarding the
status of the computation is recorded, such as the current bounds and best
feasible solution found so far. Using the warm start class, the user can save
a warm start to disk, read one from disk, or restart the computation at any
point after modifying parameters or the problem data itself. This allows
the user to easily implement periodic checkpointing, to design dynamic
algorithms in which the parameters are modified after the gap reaches a
certain threshold, or to modify problem data during the solution process if
needed.
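The differencing scheme for storing node descriptions can be illustrated as
follows: each node records only the bound changes imposed by its branching
decision, and a full description is recovered by composing the diffs along the
path back to the root. This is a simplified Python sketch with hypothetical
data, not SYMPHONY's native data structures:

```python
def full_bounds(node, tree):
    # Reconstruct a node's variable bounds by composing the chain of
    # parent diffs from the root down to the node.
    chain = []
    while node is not None:
        chain.append(tree[node]["diff"])
        node = tree[node]["parent"]
    bounds = {}
    for diff in reversed(chain):   # apply root first, leaf last
        bounds.update(diff)
    return bounds

# Hypothetical tree: bounds are (lower, upper) pairs per variable.
tree = {
    "root": {"parent": None, "diff": {"x1": (0, 1), "x2": (0, 1)}},
    "n1":   {"parent": "root", "diff": {"x1": (0, 0)}},  # branch: x1 <= 0
    "n2":   {"parent": "n1", "diff": {"x2": (1, 1)}},    # branch: x2 >= 1
}
```

Only the per-node diffs are stored, so the tree's description stays small even
when the number of nodes is large.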

\paragraph{Modifying Parameters.}

The most straightforward use of the warm start class is to restart the solver
after modifying problem parameters. To start the computation from a given warm
start when the problem data has not been modified, the tree manager simply
traverses the tree and adds those nodes marked as candidates for processing to
the node queue. Once the queue has been reformed, the algorithm is then able
to pick up exactly where it left off. Code for using the resolve command was
shown in Figure \ref{dynamic}. The situation is more challenging if the user
modifies problem data in between calls to the solver. We address this
situation next.

\paragraph{Modifying Problem Data.}

If the user modifies problem data in between calls to the solver, SYMPHONY
must make corresponding modifications to the leaf nodes of the current search
tree to allow execution of the algorithm to continue. In principle, any change
to the original data that does not invalidate the subproblem warm start data,
i.e., the basis information for the LP relaxation, can be
accommodated. Currently, SYMPHONY can only handle modifications to the rim
vectors of the original MILP. Methods for handling other modifications, such as
the addition of columns or the modification of the constraint matrix itself,
will be added in the future. To initialize the algorithm, each leaf node,
regardless of its status after termination of the previous solve call, must be
inserted into the queue of candidate nodes and reprocessed with the changed
rim vectors. After this reprocessing, the computation can continue as usual.
Optionally, the user can ``trim the tree'' before resolving. This consists of
locating nodes whose descendants are all likely to be pruned in the resolve
and eliminating those descendants in favor of processing the parent node
itself. This ability could be extended to allow changes that invalidate the
warm start data of some leaf nodes.

The ability to resolve after modifying problem data has a wide range of
applications in practice. One obvious use is to allow dynamic modification of
problem data during the solve procedure, or even after the procedure has been
completed. Implementing such a solver is simply a matter of periodically
stopping to check for user input describing a change to the problem. Another
obvious application is in situations where it is known a priori that the user
will be solving a sequence of very similar MILPs. This occurs, for instance,
when implementing algorithms for multicriteria optimization, as we describe in
Section \ref{mc_solve}. One approach to this is to solve a given ``base
problem'' (possibly limiting the size of the warm start tree), save the warm
start information from the base problem and then start each subsequent call
from this same checkpoint. Code for implementing this was shown in Figure
\ref{warm_start}.

\subsubsection{Bicriteria Solve}\label{mc_solve}

For those readers not familiar with bicriteria integer programming, we briefly
review the basic notions here. For clarity, we restrict the discussion to
pure integer programs (ILPs), but the principles are easily generalized. A
bicriteria ILP is a generalization of the standard ILP presented earlier that
includes a second objective function, yielding an optimization problem of the
form
\begin{equation}
\begin{array}{lrcl}
\vmin & [cx, dx],  \label{bicrit} \\
\textrm{s.t.} & Ax & \leq & b, \\
& x & \in & \Z^{n}.
\end{array}
\end{equation}
The operator \emph{vmin} is understood to mean that solving this program is
the problem of generating \emph{efficient} solutions, which are those feasible
solutions $p$ to (\ref{bicrit}) for which there does not exist a second
distinct feasible solution $q$ such that $cq \leq cp$ and $dq \leq dp$ and at
least one inequality is strict. Note that (\ref{bicrit}) does not have a
unique optimal solution value, but a set of pairs of solution values called
\emph{outcomes}. The pairs of solution values corresponding to efficient
solutions are called \emph{Pareto outcomes}. Surveys of methodology for
enumerating the Pareto outcomes of multicriteria integer programs are provided
by Climaco et al.~\cite{climaco97} and more recently by Ehrgott and
Gandibleux~\cite{ehrgott00, ehrgott02} and Ehrgott and
Wiecek~\cite{ehrgott04}.
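The definition of efficiency above amounts to a pairwise dominance test on
outcome pairs $(cx, dx)$. A minimal Python sketch of filtering a finite set of
outcomes down to its Pareto outcomes (illustrative only; the outcome data is
hypothetical):

```python
def pareto_outcomes(outcomes):
    # p dominates q if p is no worse in both objectives and differs
    # from q (for distinct pairs this forces strict improvement in
    # at least one objective).
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q
    return [q for q in outcomes
            if not any(dominates(p, q) for p in outcomes)]
```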

The bicriteria ILP (\ref{bicrit}) can be converted to a standard ILP by
taking a nonnegative linear combination of the objective
functions~\cite{geoff68}. Without loss of generality, the weights can be
scaled so they sum to one, resulting in a family of ILPs parameterized by a
scalar $0 \leq \alpha \leq 1$, with the bicriteria objective function replaced
by the \emph{weighted sum objective}
\begin{equation}\label{wsum}
(\alpha c + (1 - \alpha) d) x.
\end{equation}
Each selection of weight $\alpha$ produces a different single-objective
problem. Solving the resulting ILP produces a Pareto outcome called a
\emph{supported outcome}, since it is an extreme point on the convex lower
envelope of the set of Pareto outcomes. Unfortunately, not all efficient
outcomes are supported, so it is not possible to enumerate the set of Pareto
outcomes by solving a sequence of ILPs from this parameterized family. To
obtain all Pareto outcomes, one must replace the weighted sum objective
(\ref{wsum}) with an objective based on the \emph{weighted Chebyshev norm}
studied by Eswaran et al.~\cite{eswaran89} and Solanki~\cite{solanki91}. If
$x^c$ is a solution to a weighted sum problem with $\alpha = 1$ and $x^d$ is
the solution with $\alpha = 0$, then the weighted Chebyshev norm of a feasible
solution $p$ is
\begin{equation}
\max \{\alpha (cp - cx^c), (1 - \alpha)(dp - dx^d)\}.
\label{chebyshev}
\end{equation}
Although this objective function is not linear, it can easily be linearized by
adding an artificial variable, resulting in a second parameterized family of
ILPs. Under the assumption of \emph{uniform dominance}, Bowman showed that an
outcome is Pareto if and only if it can be obtained by solving some ILP in
this family~\cite{bowman76}. In \cite{WCN}, the authors presented a method for
enumerating all Pareto outcomes by solving a sequence of ILPs in this
parameterized family. By slightly perturbing the objective function, they also
showed how to relax the uniform dominance assumption. Note that the set of all
supported outcomes, which can be thought of as an approximation of the set of
Pareto outcomes, can be similarly obtained by solving a sequence of ILPs with
weighted sum objectives.
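The two parameterized objectives can be written down directly. The sketch
below simply evaluates the weighted sum objective (\ref{wsum}) and the
weighted Chebyshev norm (\ref{chebyshev}) for given objective values; it
illustrates the formulas only:

```python
def weighted_sum(alpha, cx, dx):
    # (alpha*c + (1 - alpha)*d) x, given the values cx and dx for a
    # solution x.
    return alpha * cx + (1 - alpha) * dx

def weighted_chebyshev(alpha, cp, dp, cx_c, dx_d):
    # max{alpha*(cp - c x^c), (1 - alpha)*(dp - d x^d)}, where x^c and
    # x^d solve the weighted sum problems with alpha = 1 and alpha = 0.
    return max(alpha * (cp - cx_c), (1 - alpha) * (dp - dx_d))
```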

SYMPHONY \VER\ contains a generic implementation of the algorithm described in
\cite{WCN}, along with a number of methods for approximating the set of Pareto
outcomes. To support these capabilities, we have extended the OSI interface so
that it allows the user to define a second objective function. Of course, we
have also added a method for invoking this bicriteria solver called
\texttt{multiCriteriaBranchAndBound()}. Relaxing the uniform dominance
assumption requires the underlying ILP solver to have the ability to
generate, among all optimal solutions to an ILP with a primary objective, a
solution minimizing a given secondary objective. We added this capability to
SYMPHONY through the use of optimality cuts, as described in \cite{WCN}.

Because implementing the algorithm requires the solution of a sequence of
ILPs that vary only in their objective functions, it is possible to use warm
starting to our advantage. Although the linearization of (\ref{chebyshev})
requires modifying the constraint matrix from iteration to iteration, it is
easy to show that these modifications cannot invalidate the basis. In the case
of enumerating all supported outcomes, only the objective function is modified
from one iteration to the next. In both cases, we save warm start information
from the solution of the first ILP in the sequence and use it for each
subsequent computation.

\subsection{The Node Processing Module}

The NP module is at the core of the algorithm, as it performs the
processing and bounding operations for each subproblem. A schematic
diagram of the node processing loop is presented in Fig. \ref{LP-loop}.
The details of the implementation are discussed in the following
sections.

\begin{figure}
\centering
\psfig{figure=lploop.eps,width=4.80in}
\caption{Overview of the node processing loop}
\label{LP-loop}
\end{figure}

\subsubsection{The LP Engine}

SYMPHONY requires the use of a third-party callable library (referred
to as the {\em LP engine} or {\em LP library}) to solve the LP
relaxations once they are formulated. As with the user functions,
SYMPHONY communicates with the LP engine through an API that converts
SYMPHONY's internal data structures into those of the LP engine.
Currently, the framework will only work with advanced, simplex-based
LP engines, such as CPLEX \cite{cplex}, since the LP engine must be
able to accept an advanced basis and provide a variety of data to the
framework during the solution process. The internal data structures
used for maintaining the LP relaxations are similar to those of CPLEX,
and matrices are stored in the standard column-ordered format.

\subsubsection{Managing the LP Relaxation}
\label{lp-relaxation}

The majority of the computational effort of BCP is spent
solving LPs and hence a major emphasis in the development was to make
this process as efficient as possible. Besides using a good LP engine,
the primary way in which this is done is by controlling the size of
each relaxation, both in terms of number of active variables and
number of active constraints.

The number of constraints is controlled through use of a local
pool and through purging of ineffective constraints. When a cut is
generated by the cut generator, it is first sent to the local cut
pool. In each iteration, up to a specified number of the strongest
cuts (measured by degree of violation) from the local pool are added
to the problem. Cuts that are not strong enough to be added to the
relaxation are eventually purged from the list. In addition, cuts are
purged from the LP itself when they have been deemed ineffective for
more than a specified number of iterations, where ineffective is
defined as (1) the corresponding slack variable is positive,
(2) the corresponding slack variable is basic, or (3) the dual value
corresponding to the row is zero (or very small). Cuts that have
remained effective in the LP for a specified number of iterations are
sent to the global pool where they can be used in later search nodes.
Cuts that have been purged from the LP can be made active again if
they later become violated.
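The three ineffectiveness criteria and the iteration counter that triggers
purging can be sketched as follows. This is an illustrative Python sketch; the
row data and threshold are hypothetical, and SYMPHONY's internal bookkeeping
differs:

```python
def is_ineffective(slack, is_basic, dual, tol=1e-7):
    # A cut's row is deemed ineffective when (1) its slack is positive,
    # (2) its slack variable is basic, or (3) its dual value is
    # zero (or very small).
    return slack > tol or is_basic or abs(dual) < tol

def update_purge_counters(rows, counters, limit):
    # rows: name -> (slack, slack_is_basic, dual).  Count consecutive
    # ineffective iterations per row and return rows to purge.
    purged = []
    for name, (slack, basic, dual) in rows.items():
        if is_ineffective(slack, basic, dual):
            counters[name] = counters.get(name, 0) + 1
        else:
            counters[name] = 0
        if counters.get(name, 0) >= limit:
            purged.append(name)
    return purged
```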

The number of variables (columns) in the relaxation is controlled
through {\em reduced cost fixing} and {\em dynamic column generation}.
Periodically, each active variable is {\em priced} to see if it can be
fixed by reduced cost. That is, the LP reduced cost is examined in an
effort to determine whether fixing that variable at
one of its bounds would remove improving solutions; if not, the
variable is fixed and removed from consideration. If the matrix is
{\em full} at the time of the fixing, meaning that all unfixed
variables are active, then the fixing is permanent for that subtree.
Otherwise, it is temporary and only remains in force until the next
time that columns are dynamically generated.
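For an integer variable, the reduced cost test just described can be sketched
as follows (minimization assumed; a simplified illustration, not SYMPHONY's
implementation):

```python
def can_fix_by_reduced_cost(lp_bound, reduced_cost, incumbent, at_lower):
    # A nonbasic variable sits at one of its bounds with a reduced cost
    # that is nonnegative (at the lower bound) or nonpositive (at the
    # upper bound).  Moving it one unit off its bound raises the LP
    # bound by |reduced_cost|; if that pushes the bound past the
    # incumbent value, no improving solution can use such a move, so
    # the variable can be fixed at its current bound.
    step = reduced_cost if at_lower else -reduced_cost
    return lp_bound + step > incumbent
```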

Because SYMPHONY was originally designed for combinatorial problems
with relatively small numbers of variables, techniques for performing
dynamic column generation are somewhat unrefined. Currently, variables
are priced out sequentially by index, which can be costly. To improve
the process of pricing variables, we plan to increase the symmetry
between our methods for handling variables and those for handling
cuts. This includes (1) allowing user-defined, abstract
representations for variables, (2) allowing the use of ``variable
generators'' analogous to cut generators, (3) implementing both global
and local pools for variables, (4) implementing heuristics that help
determine the order in which the indexed variables should be priced,
and (5) allowing for methods of simultaneously pricing out large
groups of variables. Much of this is already implemented in COIN/BCP.

Because pricing is computationally burdensome, it currently takes
place only either (1) before branching (optional), or (2) when a node
is about to be pruned (depending on the phase---see the description of
the two-phase algorithm in Sect. \ref{two-phase}). To use dynamic
column generation, the user must supply a subroutine which generates
the column corresponding to a particular user index, given the list of
active constraints in the current relaxation. When column generation
occurs, each column not currently active that has not been previously
fixed by reduced cost is either priced out immediately, or becomes
active in the current relaxation. Only a specified number of columns
may enter the problem at a time, so when that limit is reached, column
generation ceases. For further discussion of column generation, see
Sect. \ref{two-phase}, where the two-phase algorithm is described.

Since the matrix is stored in compressed form, considerable
computation may be needed to add and remove rows and columns. Hence,
rows and columns are only physically removed from the problem when
there are sufficiently many to make it ``worthwhile.'' Otherwise,
deleted rows and columns remain in the matrix but are simply ignored
by the computation. Note that because ineffective rows left in the
matrix increase the size of the basis unnecessarily, it is usually
advisable to adopt an aggressive strategy for their removal.

\subsubsection{Branching}
\label{branching}

Branching takes place whenever (1) both cut generation and
column generation (if performed) have failed; (2) ``tailing off'' in
the objective function value has been detected; or (3) the
user chooses to force branching. Branching can take place on cuts or
variables and can be fully automated or fully controlled by the user,
as desired. Branching can result in as many children as the user
desires, though two is typical. Once it is decided that branching will
occur, the user must either select the list of candidates for {\em
strong branching} (see below for the procedure) or allow SYMPHONY to
do so automatically by using one of several built-in strategies, such
as branching on the variable whose value is farthest from being
integral. The number of candidates may depend on the level of the
current node in the tree---it is usually best to expend more effort on
branching near the top of the tree.

959
960After the list of candidates is selected, each candidate is {\em
961pre-solved}, by performing a specified number of iterations of the
962dual simplex algorithm in each of the resulting subproblems. Based on
963the objective function values obtained in each of the potential
964children, the final branching object is selected, again either by the
965user or by built-in rule. This procedure of using exploratory LP
966information in this manner to select a branching candidate is commonly
967referred to as {\em strong branching}. When the branching object has
968been selected, the NP module sends a description of that object to the
969tree manager, which then creates the children and adds them to the
970list of candidate nodes. It is then up to the tree manager to specify
971which node the now-idle NP module should process next. This issue is
972further discussed below.
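Candidate selection and strong branching can be sketched together: candidates
are chosen by a built-in rule such as most-fractional, and each candidate's
children are pre-solved to obtain objective values. The Python sketch below is
illustrative, with hypothetical data; the selection rule shown (prefer the
candidate whose weaker child bound is largest, for minimization) is one common
choice, not necessarily SYMPHONY's exact built-in rule:

```python
def most_fractional(lp_values, k):
    # Choose the k variables whose LP values are farthest from integral.
    frac = lambda v: abs(v - round(v))
    return sorted(lp_values, key=lambda i: -frac(lp_values[i]))[:k]

def strong_branch(candidates, child_bounds):
    # child_bounds[i]: objective values from pre-solving the two
    # children of branching on candidate i.  Prefer the candidate whose
    # weaker (smaller) child bound is largest, favoring early pruning.
    return max(candidates, key=lambda i: min(child_bounds[i]))
```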

\subsection{The Tree Management Module}
\label{tree-management}

\subsubsection{Managing the Search Tree}

The tree manager's primary job is to control the execution of the
algorithm by deciding which candidate node should be chosen as the
next to be processed. This is done using either one of several
built-in rules or a user-defined rule. Usually, the goal of the search
strategy is to minimize overall running time, but it is sometimes
also important to find good feasible solutions early in the search
process. In general, there are two ways to decrease running
time---either by decreasing the size of the search tree or by
decreasing the time needed to process each search tree node.

To minimize the size of the search tree, the strategy is to select
consistently that candidate node with the smallest associated lower bound.
In theory, this strategy, sometimes called {\em best-first}, will lead to
the smallest possible search tree. However, we need to consider the
time required to process each search tree node as well. This is affected
by both the quality of the current upper bound and by such factors as
communication overhead and node set-up costs. When considering these
additional factors, it is sometimes more effective to deviate from the
best-first search order. We discuss the importance of such strategies
below.

\subsubsection{Search Chains and Diving}

One reason for not strictly enforcing the search order is that it
is somewhat expensive to construct a search node, send it to an NP
module, and set it up for processing. If, after branching, we choose
to continue processing one of the children of the current subproblem,
we avoid the set-up cost, as well as the cost of communicating the
node description of the retained child subproblem back to the tree
manager. This is called {\em diving} and the resulting chain of nodes
is called a {\em search chain}. There are a number of rules for
deciding when an NP module should be allowed to dive. One such rule is
to look at the number of variables in the current LP solution that
have fractional values. When this number is low, there may be a good
chance of finding a feasible integer solution quickly by diving. This
rule has the advantage of not requiring any global information. We
also dive if one of the children is ``close'' to being the best node,
where ``close'' is defined by a chosen parameter.
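The two diving rules just mentioned can be combined into a single test. This
is an illustrative sketch; the threshold parameters are hypothetical:

```python
def should_dive(num_fractional, child_bound, best_bound,
                max_fractional=5, close_tol=0.05):
    # Dive when few variables are fractional (a feasible solution may
    # be found quickly) or when the child's lower bound is "close" to
    # the best bound, within a relative tolerance (minimization).
    close = child_bound <= best_bound * (1 + close_tol)
    return num_fractional <= max_fractional or close
```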

In addition to the time saved by avoiding reconstruction of the LP in
the child, diving has the advantage of often leading quickly to the
discovery of feasible solutions, as discussed above. Good upper bounds
not only allow earlier pruning of unpromising search chains, but also
should decrease the time needed to process each search tree node by
allowing variables to be fixed by reduced cost.

\subsubsection{The Two-Phase Algorithm}
\label{two-phase}

If no heuristic subroutine is available for generating feasible
solutions quickly, then a unique two-phase algorithm can also be
invoked. In the two-phase method, the algorithm is first run to
completion on a specified set of core variables. Any node that would
have been pruned in the first phase is instead sent to a pool of
candidates for the second phase. If the set of core variables is
small, but well-chosen, this first phase should be finished quickly
and should result in a near-optimal solution. In addition, the first
phase will produce a list of useful cuts. Using the upper bound and
the list of cuts from the first phase, the root node is {\em
repriced}---that is, it is reprocessed with the full set of variables
and cuts. The hope is that most or all of the variables not included
in the first phase will be priced out of the problem in the new root
node. Any variable thus priced out can be eliminated from the problem
globally. If we are successful at pricing out all of the inactive
variables, we have shown that the solution from the first phase was,
in fact, optimal. If not, we must go back and price out the (reduced)
set of extra variables in each leaf of the search tree produced during
the first phase. We then continue processing any node in which we fail
to price out all the variables.

In order to avoid pricing variables in every leaf of the tree, we can
{\em trim the tree} before the start of the second phase. Trimming the
tree consists of eliminating the children of any node for which
each child has a lower bound above the current upper
bound. We then reprocess the parent node itself. This is typically
more efficient, since there is a high probability that, given the new
upper bound and cuts, we will be able to prune the parent node and
avoid the task of processing each child individually.
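Trimming can be sketched as a scan over the parents of leaf nodes. The Python
sketch below is illustrative, with hypothetical node data:

```python
def trim_tree(children, lower_bounds, upper_bound):
    # A parent is trimmed when every one of its children would be
    # pruned against the current upper bound; those children are then
    # discarded and the parent itself is reprocessed instead.
    return [parent for parent, kids in children.items()
            if kids and all(lower_bounds[k] >= upper_bound for k in kids)]
```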

\subsection{The Cut Generation Module}

To implement the cut generator process, the user must provide a
function that accepts an LP solution and returns cuts violated by that
solution to the NP module. In parallel configurations, each cut is
returned immediately to the NP module, rather than being passed back
as a group once the function exits. This allows the LP to begin adding
cuts and solving the current relaxation before the cut generator is
finished if desired. Parameters controlling if and when the LP should
begin solving the relaxation before the cut generator is finished can
be set by the user.

\subsection{The Cut Management Module}

\subsubsection{Maintaining and Scanning the Pool}

The cut manager's primary job is to receive a solution from an
NP module and return cuts from the pool that are violated by it. The
cuts are stored along with two pieces of information---the level of
the tree on which the cut was generated, known simply as the {\em
level} of the cut, and the number of times it has been checked for
violation since the last time it was actually found to be violated,
known as the number of {\em touches}. The number of touches
can be used as a simplistic measure of a cut's effectiveness. Since the pool
can get quite large, the user can choose to scan only cuts whose
number of touches is below a specified threshold and/or cuts that were
generated on a level at or above the current one in the tree. The idea
behind this second criterion is to try to avoid checking cuts that were
not generated ``nearby'' in the tree, as they are less likely to be
effective. Any cut generated at a level in the tree
below the level of the current node must have been generated in a
different part of the tree. Although this is admittedly a naive
method, it does seem to work reasonably well.
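The two scanning criteria can be expressed as a simple filter over the pool.
This illustrative sketch stores each pool entry as a hypothetical
`(touches, level)` pair:

```python
def cuts_to_check(pool, node_level, max_touches):
    # Check only cuts with few touches, and only cuts generated at or
    # above (i.e., no deeper than) the level of the current node.
    return [name for name, (touches, level) in pool.items()
            if touches < max_touches and level <= node_level]
```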

On the other hand, the user may define a specific measure of quality for
each cut to be used instead. For example, the degree of
violation is an obvious candidate. This measure of quality must be
computed by the user, since the cut pool module has no knowledge of
the cut data structures. The quality is recomputed every time
the user checks the cut for violation and a running average is used as
the global quality measure. The cuts in the pool are periodically
sorted by this measure and only the highest quality cuts
are checked each time. All duplicate cuts, as well as all cuts whose
number of touches exceeds or whose quality falls below specified
thresholds, are periodically purged from the pool to keep it as small as
possible.
1104
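
The running-average quality update and the periodic purge can be
sketched as follows. This is a hypothetical illustration (the class
and function names are invented, and SYMPHONY's C implementation
differs):

```python
class PooledCut:
    def __init__(self, data):
        self.data = data
        self.touches = 0
        self.quality = 0.0   # running average of user-computed quality
        self.checks = 0

    def record_check(self, user_quality, violated):
        """Update the running average after a violation check; the
        quality value itself is supplied by user code, since the pool
        has no knowledge of the cut data structures."""
        self.checks += 1
        self.quality += (user_quality - self.quality) / self.checks
        self.touches = 0 if violated else self.touches + 1

def purge(pool, max_touches, min_quality):
    """Periodically drop ineffective cuts to keep the pool small."""
    return [c for c in pool
            if c.touches <= max_touches and c.quality >= min_quality]
```
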
\subsubsection{Using Multiple Pools}
\label{multi-cut-pools}

For several reasons, it may be desirable to maintain multiple cut
pools. When there are multiple cut pools, each pool is initially
assigned to a particular node in the search tree. After being assigned
to that node, the pool services requests for cuts from that node and
all of its descendants until one of those descendants is itself
assigned to another cut pool. After that, it continues to serve all
the descendants of its assigned node that are not assigned to other
cut pools.

Initially, the first cut pool is assigned to the root node. All other cut
pools are unassigned. During execution, when a new node is sent to be
processed, the tree manager must determine which cut pool the node should be
serviced by. The default is to use the same cut pool as its parent. However,
if there is currently an idle cut pool process (either it has never been
assigned to any node or all the descendants of its assigned node have been
processed or reassigned), then that cut pool is assigned to this new node. All
the cuts currently in the cut pool of its parent node are copied to the new
pool to initialize it, after which the two pools operate independently on
their respective subtrees. When generating cuts, the NP module sends the new
cuts to the cut pool assigned to service the node during whose processing the
cuts were generated.

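
The tree manager's assignment rule can be sketched as follows (an
illustrative Python sketch with invented names; not SYMPHONY's actual
code, which operates on separate processes rather than objects):

```python
class CutPool:
    def __init__(self):
        self.idle = True        # not yet assigned to any node
        self.cuts = []

def assign_pool(parent_pool, pools):
    """Choose a cut pool for a new node: if an idle pool exists,
    assign it and seed it with a copy of the parent pool's cuts;
    otherwise the node simply inherits its parent's pool."""
    for pool in pools:
        if pool.idle:
            pool.idle = False
            pool.cuts = list(parent_pool.cuts)   # copy to initialize
            return pool
    return parent_pool
```
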
The motivation behind multiple cut pools is two-fold. First, we want
simply to limit the size of each pool as much as possible. By limiting
the number of nodes that a cut pool must service, the number of cuts
in the pool is similarly limited. This not only allows cut storage to
be spread over multiple processors, and hence increases the available
memory, but also increases the efficiency with which the cut pool can
be scanned for violated cuts. A secondary reason for maintaining
multiple cut pools is that doing so limits the scanning of cuts to
only those that were generated in the same subtree as the current
search node. As described above, this helps focus the search and
should increase the efficiency and effectiveness of the search. This
idea also allows us to generate locally valid cuts, such as the
classical Gomory cuts (see \cite{nemwol88}).

\section{Parallelizing BCP}
\label{parallelizing}

Because of the clear partitioning of work that occurs when the
branching operation generates new subproblems, branch and bound
algorithms lend themselves well to parallelization. As a result, there
is already a significant body of research on performing branch and
bound in parallel environments. We again point the reader to the
survey of parallel branch and bound algorithms by Gendron and Crainic
\cite{gend:paral}, as well as other references such as
\cite{PICO,GraKum99,kuma:para4,kuma:para3}.

In parallel BCP, as in general branch and bound, there are two major
sources of parallelism. First, it is clear that any number of
subproblems on the current candidate list can be processed
simultaneously. Once a subproblem has been added to the list, it can
be properly processed before, during, or after the processing of any
other subproblem. This is not to say that processing a particular node
at a different point in the algorithm won't produce different
results---it most certainly will---but the algorithm will terminate
correctly in any case. The second major source of parallelism is to
parallelize the processing of individual subproblems. By allowing
separation to be performed in parallel with the solution of the linear
programs, we can theoretically process a node in little more than the
amount of time it takes to solve the sequence of LP relaxations. Both
of these sources of parallelism can be easily exploited using the
\BB\ framework.

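
The claim that processing order affects the search path but not the
final answer can be illustrated with a toy branch and bound. The
sketch below (invented example, not SYMPHONY code) minimizes the total
cost of choosing exactly $k$ of $n$ nonnegative costs, processing the
candidate list either LIFO or FIFO; both orders find the same optimum.

```python
def solve(costs, k, lifo=True):
    """Toy branch and bound: choose exactly k of the given nonnegative
    costs at minimum total cost.  Candidate-list order (LIFO vs. FIFO)
    changes the search path but not the optimum found."""
    best = float("inf")
    candidates = [(0, 0, 0.0)]   # (next index, items chosen, cost so far)
    while candidates:
        i, chosen, cost = candidates.pop() if lifo else candidates.pop(0)
        if cost >= best:                       # bounding: prune dominated nodes
            continue
        if i == len(costs):
            if chosen == k:
                best = cost                    # new incumbent
            continue
        if chosen < k:                         # branch: include item i ...
            candidates.append((i + 1, chosen + 1, cost + costs[i]))
        if len(costs) - i - 1 >= k - chosen:   # ... or exclude it, if feasible
            candidates.append((i + 1, chosen, cost))
    return best
```
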
The most straightforward parallel implementation, and the one we
currently employ, is a master-slave model, in which a central manager
is responsible for partitioning the work and parceling it out to the
various slave processes that perform the actual computation. We chose
this approach because it allows memory-efficient data structures for
sequential computation and yet is conceptually easy to parallelize.
Unfortunately, this approach does have limited scalability. For
further discussion of the scalability of BCP algorithms and approaches
to improving it, see \cite{symphony1} and \cite{ALPS2}.

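
The master-slave division of labor can be sketched as a dispatch loop,
shown here sequentially for clarity (a hypothetical sketch; in the
parallel code each {\tt process\_node} call would run in a slave
process while the manager keeps the candidate list and incumbent):

```python
def master_loop(root, process_node):
    """Skeleton of the master's dispatch loop: the central manager
    owns the candidate list and the incumbent; process_node stands in
    for work done by a slave NP module and returns a feasible
    objective value (or None) plus any child subproblems."""
    incumbent = float("inf")
    candidates = [root]
    while candidates:
        node = candidates.pop()
        value, children = process_node(node, incumbent)
        if value is not None:
            incumbent = min(incumbent, value)
        candidates.extend(children)
    return incumbent
```
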
%\subsection{Details of the Parallel Implementation}

\subsection{Parallel Configurations}

SYMPHONY supports numerous configurations, ranging from completely
sequential to fully parallel, allowing efficient execution in many
different computational settings. As described in the previous
section, there are five modules in the standard distributed
configuration. Various subsets of these modules can be combined to
form separate executables capable of communicating with each other
across a network. When two or more modules are combined, they simply
communicate through shared memory instead of through message passing.
However, they are also forced to run sequentially in this case, unless
the user chooses to enable threading using an OpenMP-compliant
compiler (see the next section).

As an example, the default distributed configuration includes a
separate executable for each module type, allowing full parallelism.
However, if cut generation is fast and not memory-intensive,
it may not be worthwhile to have the NP module and its associated cut
generator work independently, as this increases communication
overhead without much potential benefit. In this case, the cut
generator functions can be called directly from the NP module,
creating a single, more efficient executable.

\subsection{Inter-process Communication}

SYMPHONY can utilize any third-party communication protocol supporting basic
message-passing functions. All communication subroutines interface with
SYMPHONY through a separate communications API. Currently, PVM \cite{PVMbook}
is the only message-passing protocol supported, but interfacing with another
protocol is a straightforward exercise.

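
The idea of a separate communications API can be sketched as a thin
interface behind which any message-passing protocol could sit. The
names below are invented for illustration and are not SYMPHONY's
actual API:

```python
class MessageAPI:
    """Minimal message-passing interface: any protocol supporting
    basic send/receive (PVM, MPI, ...) could be plugged in behind it."""
    def send(self, dest, msgtag, payload):
        raise NotImplementedError
    def receive(self, dest, msgtag):
        raise NotImplementedError

class InProcessAPI(MessageAPI):
    """Trivial backend: when modules are combined into a single
    executable, 'messages' reduce to in-memory queues."""
    def __init__(self):
        self.queues = {}
    def send(self, dest, msgtag, payload):
        self.queues.setdefault((dest, msgtag), []).append(payload)
    def receive(self, dest, msgtag):
        return self.queues[(dest, msgtag)].pop(0)
```
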
Additionally, it is possible to configure the code to run in parallel
using threading to process multiple search tree nodes simultaneously.
Currently, this is implemented using OpenMP compiler directives to
specify the parallel regions of the code and to perform memory-locking
functions. Compiling the code with an OpenMP-compliant compiler will
result in a shared-memory parallel executable. For a list of
OpenMP-compliant compilers and other resources, visit {\tt
http://www.openmp.org}.

\subsection{Fault Tolerance}
\label{fault-tolerance}

Fault tolerance is an important consideration for solving large problems on
computing networks whose nodes may fail unpredictably. The tree manager tracks
the status of all processes and can restart them as necessary. Since the state
of the entire tree is known at all times, the most that will be lost if an NP
module or cut generator is killed is the work that had been completed on that
particular search node. To protect against the tree manager itself or a cut
pool being killed, full logging capabilities have been implemented. If
desired, the tree manager can write out the entire state of the tree to disk
periodically, allowing a warm restart if a fault occurs. Similarly, the cut
pool process can be warm-started from a log file. This not only allows for
fault tolerance but also for full reconfiguration in the middle of solving a
long-running problem. Such reconfiguration could consist of anything from
adding more processors to moving the entire solution process to another
network.

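
The checkpoint/warm-restart cycle can be sketched as follows. This is
a generic illustration using an invented JSON log format, not
SYMPHONY's actual log files:

```python
import json
import os

def checkpoint(state, path):
    """Write the full tree state to disk atomically, so a fault during
    the write cannot corrupt the previous log."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)   # atomic rename on POSIX

def warm_start(path):
    """Restore the search from the last checkpoint; at most the work
    done since that checkpoint is lost."""
    with open(path) as f:
        return json.load(f)
```

The atomic-rename step reflects the general principle that a
checkpoint file should never be left half-written; any serialization
scheme with the same property would serve.
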