# source:coopr.doc/trunk/GettingStarted/current/pysp.txt@5583

Last change on this file since 5583 was 5583, checked in by dlwoodr, 2 years ago

Added a little section with a summary of file names. Make is not working
the way I think it should (time zones?) so I have not done a make.

File size: 17.8 KB
Line
1
2== PySP Overview  ==
3
4This chapter describes PySP: (Pyomo Stochastic Programming), where parameters are allowed to be uncertain.
5
6=== Overview of Modeling Components and Processes ===
7
8The sequence of activities is typically the following:
9
10* Create a deterministic model and declare components
11* Develop base-case data for the deterministic model
12* Test, verify and validate the deterministic model
13* Model the stochastic processes
14* Develop a way to generate scenarios (in the form of a tree if there are more than two stages)
15* Create the data files need to describe the stochastics
16* Use PySP to solve stochastic problem
17
18When viewed from the standpoint of file creation, the process is
19
20* Create an abstract model for the deterministic problem in a file called +ReferenceModel.py+
21* Specify data for this model in a file called +ReferenceModel.dat+
22* Specify the stochastics in a file called +ScenarioStructure.dat+
23* Specify scenario data
24
25=== Birge and Louveaux's Farmer Problem ===
26
27Birge and Louveaux <<BirgeLouveauxBook>> make use of the example of a farmer
28who has 500 acres that can be planted in wheat, corn or sugar beets,
29at a per acre cost of \$150, \$230 and \$260, respectively. The farmer
30needs to have at least 200 tons of wheat and 240 tons of corn to use as feed, but
31if enough is not grown, those crops can be purchased for \$238 and \$210, respectively.
32Corn and wheat grown in excess of the feed requirements can be sold for \$170 and \$150,
33respectively. A price of \$36 per ton is guaranteed for the first 6000 tons grown by
34any farmer, but beets in excess of that are sold for \$10 per ton. The yield
35is 2.5, 3, and 20 tons per acre for wheat, corn and sugar beets, respectively.
36
37==== ReferenceModel.py ====
38So far, this is a deterministic problem because we are assuming that we know all the data. The Pyomo
39model for this problem shown here is in the file +ReferenceModel.py+ in the sub-directory +examples/pysp/farmer/models+ that is
40distributed with Coopr.
41----
42include::examples/CooprGettingStarted/farmer.py[]
43----
44
45==== ReferenceModel.dat ====
46The data introduced here are in the file ReferenceModel.dat in the sub-directory examples/pysp/farmer/scenariodata that
47is distributed with Coopr.
48----
49include::examples/CooprGettingStarted/farmer.dat[]
50----
51
52Any of these data could be modeled as uncertain, but we will consider only the possibility that the yield per acre
53could be higher or lower than expected. Assume that there is a probability of 1/3 that the yields will be
54the average values that were given (i.e., wheat 2.5; corn 3; and beets 20). Assume that there is a 1/3 probability
55that they will be lower (2, 2.4, 16) and 1/3 probability they will be higher (3, 3.6, 24).  We refer to
56each full set of data as a _scenario_ and collectively we call them a _scenario_ _tree_.
57In this case the scenario tree is very simple: there is a root node
58and three leaf nodes: one corresponding to each scenario. The acreage-to-plant decisions are root node decisions
59because they must be made without knowing what the yield will be.
60The other variables are so-called _second_ _stage_ decisions, because they will depend on which scenario is realized.
61
62==== ScenarioStructure.dat ====
63PySP requires that users describe the scenario tree using specific constructs in a file named +ScenarioStructure.dat+; for the farmer
64problem, this file can be found in the coopr sub-directory +examples/pysp/farmer/scenariodata+ that is distributed with Coopr.
65----
66include::examples/CooprGettingStarted/farmerScenarioStructure.dat[]
67----
68This data file is verbose and somewhat redundant, but in most applications it is generated by software rather than by a person, so
69this is not an issue.
70Generally, the left-most part of each
71expression (e.g. ``set Stages :='') is required and uses reserved words (e.g., +Stages+) and the other names are
72supplied by the user (e.g., ``FirstStage'' could be any name). Every assignment is terminated with a semi-colon.
73We will now consider the assignments in this file one at a time.
74
75The first assignments provides names for the stages and the words "set Stages" are required, as are the := symbols.
76Any names can be used. In this example, we used "FirstStage" and "SecondStage" but we could
77have used "EtapPrimero" and "ZweiteEtage" if we had wanted to. Whatever names are given here will continue to be used
78to refer to the stages in the rest of the file. The order of the names is important. A simple way to think of it
79is that generally, the names must be in time order (technically, they need to be in order of information discovery, but
80that is usually time-order).  Stages refers to decision stages, which may, or may not, correspond directly with time stages.
81In the farmer example, decisions about how much to plant are made in the first stage and "decisions" (which are pretty obvious, but
82which are decision variables nonetheless) about how much to sell at each price and how much needs to be bought are
83second stage decisions because they are made after the yield is known.
84----
85set Stages := FirstStage SecondStage ;
86----
87
88Node names are constructed next. The words "set Nodes" are required, but any names may be assigned to the nodes. In two
89stage stochastic problems there is a root node, which we chose to name "RootNode" and then there is a node for each scenario.
90----
91set Nodes := RootNode
92             BelowAverageNode
93             AverageNode
94             AboveAverageNode ;
95----
96
97Nodes are associated with time stages with an assignment beginning with the required words "param Nodestage." The assignments must make
98use of previously defined node and stage names. Every node must be assigned a stage.
99----
100param NodeStage := RootNode         FirstStage
101                   BelowAverageNode SecondStage
102                   AverageNode      SecondStage
103                   AboveAverageNode SecondStage ;
104----
105
106The structure of the scenario tree is defined using assignment of children to each node that has them. Since this is a two stage
107problem, only the root node has children. The words "param Children" are required for every node that has children and the name
108of the node is in square brackets before the colon-equals assignment symbols. A list of children is assigned.
109----
110set Children[RootNode] := BelowAverageNode
111                          AverageNode
112                          AboveAverageNode ;
113----
114
115The probability for each node, conditional on observing the parent node is given in an assignment that begins with the
116required words "param ConditionalProbability." The root node always has a conditional probability of 1, but it must always
117be given anyway. In this example, the second stage nodes are equally likely.
118----
119param ConditionalProbability := RootNode          1.0
120                                BelowAverageNode  0.33333333
121                                AverageNode       0.33333334
122                                AboveAverageNode  0.33333333 ;
123----
124
125Scenario names are given in an assignment that begins with the required words "set Scenarios" and provides a list of
126the names of the scenarios. Any names may be given. In many applications they are given unimaginative names generated by
127software such as "Scen1" and the like. In this example, there are three scenarios and the names reflect the relative values of
128the yields.
129----
130set Scenarios := BelowAverageScenario
131                 AverageScenario
132                 AboveAverageScenario ;
133----
134
135Leaf nodes, which are nodes with no children, are associated with scenarios. This assignment must be one-to-one and it is
136initiated with the words "param ScenarioLeafNode" followed by the colon-equals assignment characters.
137----
138param ScenarioLeafNode :=
139                    BelowAverageScenario BelowAverageNode
140                    AverageScenario      AverageNode
141                    AboveAverageScenario AboveAverageNode ;
142----
143
144Variables are associated with stages using an assignment that begins with the required words "set StageVariables" and the name
145of a stage in square brackets followed by the colon-equals assignment characters. Variable names that have been defined
146in the file ReferenceModel.py can be assigned to stages. Any variables that are not assigned are assumed to be in the last stage.
147Variable indexes can be given explicitly and/or wildcards can be used.
148Note that the variable names appear without the prefix "model."
149In the farmer example, DevotedAcreage is the only
150first stage variable.
151----
152set StageVariables[FirstStage] :=  DevotedAcreage[*] ;
153set StageVariables[SecondStage] := QuantitySubQuotaSold[*]
154                                   QuantitySuperQuotaSold[*]
155                                   QuantityPurchased[*] ;
156----
157
158For reporting purposes, it is useful to define auxiliary variables in +ReferenceModel.py+ that will be assigned the cost
159associated with each stage. This variables do not impact algorithms, but the values are output by some software during execution
160as well as upon completion. The names of the variables are assigned to stages using the "param StageCostVariable" assignment. The stages
161are previously defined in +ScenarioStructure.dat+ and the variables are previously defined in +ReferenceModel.py+. Note that the variable names appear without the prefix "model."
162----
163param StageCostVariable := FirstStage  FirstStageCost
164                           SecondStage SecondStageCost ;
165----
166
167==== Scenario data specification ====
168So far, we have given a model in the file named +ReferenceModel.py+, a set of deterministic data in the file named +ReferenceModel.py+, and
169a description of the stochastics in the file named +ScenarioStructure.dat+. All that remains is to give the data for each scenario.
170There are two ways to do that in PySP: _scenario-based_ and _node-based_. The default is scenario-based so we will describe that first.
171
172For scenario-based data, the full data for each scenario is given in a +.dat+ file with the root name that is the name of the scenario.
173So, for example, the file named +AverageScenario.dat+ must contain all the data for the model for the scenario named "AvererageScenario." It turns out that this file can be created by simply copying the file +ReferenceModel.dat+ as shown above because it contains
174a full set of data for the "AverageScenario" scenario. The files +BelowAverageScenario.dat+ and +AboveAverageScenario.dat+ will differ from
175this file and from each other only in their last line, where the yield is specified.  These three files are distributed with
176Coopr and are in the coopr sub-directory +examples/pysp/farmer/scenariodata+ along with +ScenarioStructure.dat+ and +ReferenceModel.dat+.
177
178Scenario-based data wastes resources by specifying the same thing over and over again. In many cases, that does not matter
179and it is convenient to have full scenario data files available (for one thing, the scenarios can easily be run
180independently using the +pyomo+ command). However, in many other settings, it is better to use a node-based specification where the
181data that is unique to each node is specified in a .dat file with a root name that matches the node name. In the farmer example, the file
182+RootNode.dat+ will be the same as +ReferenceModel.dat+ except that it will lack the last line that specifies the yield. The files
183+BelowAverageNode.dat+, +AverageNode.dat+, and +AboveAverageNode.dat+ will contain only one line each to specify the yield. If node-based
184data is to be used, then the +ScenarioStructure.dat+ file must contain the following line:
185----
186param ScenarioBasedData := False ;
187----
188An entire set of files for node-based data for the farmer problem are distributed with Coopr in the sub-directory
189+examples/pysp/farmer/nodedata+
190
191=== Finding Solutions for Stochastic Models ===
192
193PySP provides a variety of tools for finding solutions to stochastic programs.
194
195==== runef ====
196
197The +runef+ command puts together the so-called _extensive_ _form_ version of the model. It creates a large model that has constraints to
198ensure that variables at a node have the same value. For example, in the farmer problem, all of the +DevotedAcres+ variables must have the
199same value regardless of which scenario is ultimately realized. The objective can be the expected value of the objective function, or
200the CVaR, or a weighted combination of the two. Expected value is the default. A full set of options
201for +runef+ can be obtained using the command:
202----
203runef --help
204----
205
206The coopr distribution contains the files need to run the farmer example in the sub-directories to the sub-directory
207+examples/pysp/farmer+ so if this is the current directory and
208if CPLEX is installed, the following command will cause formation of the EF and its solution using CPLEX.
209----
210runef -m models -i nodedata --solver=cplex --solve
211----
212
213The option +-m models+ has one dash and is short-hand for the option +--model-directory=models+ and note that the full option
214uses two dashes. The +-i+ is equivalent to +--instance-directory=+ in the same fashion. The default solver is CPLEX,
215so the solver option is not really needed. With the +--solve+ option, runef would simply write an .lp data file
216that could be passed to a solver.
217
218==== runph ====
219
220The runph command executes an implementation of Progressive Hedging (PH) that is intended to support scripting and extension.
221
222The coopr distribution contains the files need to run the farmer example in the sub-directories to the sub-directory
223examples/pysp/farmer so if this is the current directory and
224if CPLEX is installed, the following command will cause PH to execute using the default sub-problem solver, which
225is CPLEX.
226----
227runph -m models -i nodedata
228----
229
230The option +-m models+ has one dash and is short-hand for the option +--model-directory=models+ and note that the full option
231uses two dashes. The +-i+ is equivalent to +--instance-directory=+ in the same fashion.
232
233After about 33 iterations, the algorithm will achieve the default level of convergence and terminate. A lot of output is
234generated and among the output is the following solution information:
235----
236Variable=DevotedAcreage
237        Index: [CORN]            (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
238                Values:       79.9844      80.0000      79.9768     Max-Min=      0.0232     Avg=     79.9871
239        Index: [SUGAR_BEETS]             (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
240                Values:      249.9848     249.9770     250.0000     Max-Min=      0.0230     Avg=    249.9873
241        Index: [WHEAT]           (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
242                Values:      170.0308     170.0230     170.0232     Max-Min=      0.0078     Avg=    170.0256
243Cost Variable=FirstStageCost
244        Tree Node=RootNode               (Scenarios:  BelowAverageScenario   AverageScenario   AboveAverageScenario   )
245        Values:   108897.0836  108897.4725  108898.1476     Max-Min=      1.0640     Avg= 108897.5679
246----
247For problems with no, or few, integer variables, the default level of convergence leaves root-node variables almost
248converged. Since the acreage to be planted cannot depend on the scenario that will be realized in the future, the
249average, which is labeled "Avg" in this output, would be used.
250A farmer would probably interpret acreages of 79.9871, 249.9873, and 170.0256 to be 80, 250, and 170.
251In real-world applications, PH is embedded in scripts that produce output in a format desired by a decision maker.
252
253But in real-world applications, the default settings for PH seldom work well enough. In addition to post-processing the output,
254a number of parameters need to be adjusted and sometimes scripting to extend or augment the algorithm
255is needed to improve convergence rates.
256A full set of options can be obtained with the command.
257----
258runph --help
259----
260Note that there are two dashes before +help+.
261
262By default, PH uses quadratic objective functions after iteration zero; in some settings it may
263be desirable to linearize the quadratic terms. This is required to use a solver such as glpk
264for MIPs because it does not support quadratic MIPs. The directive
265+--linearize-nonbinary-penalty-terms=n+ causes linearization of the penalty terms using n pieces. For example,
266to use glpk on the farmer, assuming glpk is installed and the command is given when the current
267directory is the +examples/pysp/farmer+, the following command will use default settings for most parameters and
268four pieces to approximate quadratic terms in sub-problems:
269----
270runph -i nodedata -m models --solver=glpk --linearize-nonbinary-penalty-terms=4
271----
272Use of the +linearize-nonbinary-penalty-terms+ option requires that all variables not in the final stage have bounds.
273
274=== Summary of PySP File Names ===
275
276PySP scripts such as +runef+ and +runph+ require files that specify the model and data using files with specific names.
277All files can be in the current
278directory, but typically, the file +ReferenceModel.py+ is in a directory that is specified using +--model-directory=+ option (the short
279version of this option is +-i +) and the data files are in a directory specified in the +--instance-directory=+ option (the short version
280of this option is +-m +).
281
282* +ReferenceModel.py+: A full Pyomo model for a singe scenario. There should be no scenario indexes in this model because they are implicit.
283* +ReferenceModel.dat+: A full set of data for an arbitrary scenario. This will not be used during solution, but just used to define indexes.
284* +ScenarioStructure.dat+: Specifies the nature of the stochastics. It also specifies whether the rest of the data is node-based or scenario-based.  It is scenario-based unless +ScenarioStructure.dat+ contains the line
285----
286param ScenarioBasedData := False ;
287----
288
289If scenario-based, then there is a data file for each scenario that specifies a full set of data for the scenario. The name of the file
290is the name of the scenario with +.dat+ appended. The names of the scenarios are given in the +ScenarioStructure.dat+ file.
291
292If node-based, then there is a file with data for each node that specifies only that data that is unique for the node.
293The name of the file
294is the name of the node with +.dat+ appended. The names of the nodes are given in the +ScenarioStructure.dat+ file.
295
296// vim: set syntax=asciidoc:
Note: See TracBrowser for help on using the repository browser.