AlgorithmsforDecisionMaking
www.dbooks.org
AlgorithmsforDecisionMaking
MykelJ.Kochenderfer
TimA.Wheeler
KyleH.Wray
TheMITPress
Cambridge,Massachusetts
London,England
www.dbooks.org
©2022MassachusettsInstituteofTechnology
ThisworkissubjecttoaCreativeCommonsCC-BY-NC-NDlicense.Subjecttosuchlicense,allrightsare
reserved.
TheMITPresswouldliketothanktheanonymouspeerreviewerswhoprovidedcommentsondraftsofthis
book.Thegenerousworkofacademicexpertsisessentialforestablishingtheauthorityandqualityofour
publications.Weacknowledgewithgratitudethecontributionsoftheseotherwiseuncreditedreaders.
ThisbookwassetinT
E
XGyrePagellabytheauthorsinL
A
T
E
X.
PrintedandboundintheUnitedStatesofAmerica.
LibraryofCongressCataloging-in-PublicationData
Names:Kochenderfer,MykelJ.,1980–author.|Wheeler,TimA.(TimAllan),author.|Wray,KyleH.,author.
Title:Algorithmsfordecisionmaking/MykelJ.Kochenderfer,TimA.Wheeler,KyleH.Wray.
Description:Cambridge:MassachusettsInstituteofTechnology,[2022]|
Includesbibliographicalreferencesandindex.
Identiers:LCCN2021038701|ISBN9780262047012(hardcover)
Subjects:LCSH:Decisionsupportsystems–Mathematics.|Algorithms.
Classication:LCCT58.62.K6662022|DDC658.4/03—dc23
LCrecordavailableathttps://lccn.loc.gov/2021038701
10987654321
Toourfamilies
www.dbooks.org
Contents
Prefacexix
Acknowledgmentsxxi
1Introduction1
1.1DecisionMaking1
1.2Applications2
1.3Methods5
1.4History7
1.5SocietalImpact12
1.6Overview14
partiprobabilisticreasoning
2Representation19
2.1DegreesofBeliefandProbability19
2.2ProbabilityDistributions20
2.3JointDistributions24
2.4ConditionalDistributions29
2.5BayesianNetworks32
2.6ConditionalIndependence35
2.7Summary36
2.8Exercises38
www.dbooks.org
viiicontents
3Inference43
3.1InferenceinBayesianNetworks43
3.2InferenceinNaiveBayesModels48
3.3Sum-ProductVariableElimination49
3.4BeliefPropagation53
3.5ComputationalComplexity53
3.6DirectSampling54
3.7LikelihoodWeightedSampling57
3.8GibbsSampling60
3.9InferenceinGaussianModels63
3.10Summary65
3.11Exercises66
4ParameterLearning71
4.1MaximumLikelihoodParameterLearning71
4.2BayesianParameterLearning75
4.3NonparametricLearning82
4.4LearningwithMissingData82
4.5Summary89
4.6Exercises89
5StructureLearning97
5.1BayesianNetworkScoring97
5.2DirectedGraphSearch99
5.3MarkovEquivalenceClasses103
5.4PartiallyDirectedGraphSearch104
5.5Summary106
5.6Exercises107
6SimpleDecisions111
6.1ConstraintsonRationalPreferences111
6.2UtilityFunctions112
6.3UtilityElicitation114
6.4MaximumExpectedUtilityPrinciple116
6.5DecisionNetworks116
6.6ValueofInformation119
6.7Irrationality122
6.8Summary125
6.9Exercises125
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
contentsix
partiisequentialproblems
7ExactSolutionMethods133
7.1MarkovDecisionProcesses133
7.2PolicyEvaluation136
7.3ValueFunctionPolicies139
7.4PolicyIteration140
7.5ValueIteration141
7.6AsynchronousValueIteration145
7.7LinearProgramFormulation147
7.8LinearSystemswithQuadraticReward147
7.9Summary150
7.10Exercises151
8ApproximateValueFunctions161
8.1ParametricRepresentations161
8.2NearestNeighbor163
8.3KernelSmoothing164
8.4LinearInterpolation167
8.5SimplexInterpolation168
8.6LinearRegression172
8.7NeuralNetworkRegression174
8.8Summary175
8.9Exercises177
9OnlinePlanning181
9.1RecedingHorizonPlanning181
9.2LookaheadwithRollouts183
9.3ForwardSearch183
9.4BranchandBound185
9.5SparseSampling187
9.6MonteCarloTreeSearch187
9.7HeuristicSearch197
9.8LabeledHeuristicSearch197
9.9Open-LoopPlanning200
9.10Summary208
9.11Exercises209
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
www.dbooks.org
xcontents
10PolicySearch213
10.1ApproximatePolicyEvaluation213
10.2LocalSearch215
10.3GeneticAlgorithms215
10.4CrossEntropyMethod218
10.5EvolutionStrategies219
10.6IsotropicEvolutionaryStrategies224
10.7Summary226
10.8Exercises226
11PolicyGradientEstimation231
11.1FiniteDierence231
11.2RegressionGradient234
11.3LikelihoodRatio234
11.4Reward-to-Go237
11.5BaselineSubtraction241
11.6Summary245
11.7Exercises246
12PolicyGradientOptimization249
12.1GradientAscentUpdate249
12.2RestrictedGradientUpdate251
12.3NaturalGradientUpdate253
12.4TrustRegionUpdate254
12.5ClampedSurrogateObjective257
12.6Summary263
12.7Exercises264
13Actor-CriticMethods267
13.1Actor-Critic267
13.2GeneralizedAdvantageEstimation269
13.3DeterministicPolicyGradient272
13.4Actor-CriticwithMonteCarloTreeSearch274
13.5Summary277
13.6Exercises277
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
contentsxi
14PolicyValidation281
14.1PerformanceMetricEvaluation281
14.2RareEventSimulation285
14.3RobustnessAnalysis288
14.4TradeAnalysis289
14.5AdversarialAnalysis291
14.6Summary295
14.7Exercises295
partiiimodeluncertainty
15ExplorationandExploitation299
15.1BanditProblems299
15.2BayesianModelEstimation301
15.3UndirectedExplorationStrategies301
15.4DirectedExplorationStrategies303
15.5OptimalExplorationStrategies306
15.6ExplorationwithMultipleStates309
15.7Summary309
15.8Exercises311
16Model-BasedMethods317
16.1MaximumLikelihoodModels317
16.2UpdateSchemes318
16.3Exploration321
16.4BayesianMethods326
16.5Bayes-AdaptiveMarkovDecisionProcesses329
16.6PosteriorSampling330
16.7Summary332
16.8Exercises332
17Model-FreeMethods335
17.1IncrementalEstimationoftheMean335
17.2Q-Learning336
17.3Sarsa338
17.4EligibilityTraces341
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
www.dbooks.org
xiicontents
17.5RewardShaping343
17.6ActionValueFunctionApproximation343
17.7ExperienceReplay345
17.8Summary348
17.9Exercises351
18ImitationLearning355
18.1BehavioralCloning355
18.2DataSetAggregation358
18.3StochasticMixingIterativeLearning358
18.4MaximumMarginInverseReinforcementLearning361
18.5MaximumEntropyInverseReinforcementLearning365
18.6GenerativeAdversarialImitationLearning369
18.7Summary371
18.8Exercises372
partivstateuncertainty
19Beliefs379
19.1BeliefInitialization379
19.2DiscreteStateFilter380
19.3KalmanFilter383
19.4ExtendedKalmanFilter385
19.5UnscentedKalmanFilter387
19.6ParticleFilter390
19.7ParticleInjection394
19.8Summary395
19.9Exercises397
20ExactBeliefStatePlanning407
20.1Belief-StateMarkovDecisionProcesses407
20.2ConditionalPlans408
20.3AlphaVectors411
20.4Pruning412
20.5ValueIteration416
20.6LinearPolicies419
20.7Summary419
20.8Exercises422
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
contentsxiii
21OineBeliefStatePlanning427
21.1FullyObservableValueApproximation427
21.2FastInformedBound429
21.3FastLowerBounds430
21.4Point-BasedValueIteration431
21.5RandomizedPoint-BasedValueIteration433
21.6SawtoothUpperBound436
21.7PointSelection440
21.8SawtoothHeuristicSearch442
21.9TriangulatedValueFunctions445
21.10Summary447
21.11Exercises448
22OnlineBeliefStatePlanning453
22.1LookaheadwithRollouts453
22.2ForwardSearch453
22.3BranchandBound456
22.4SparseSampling456
22.5MonteCarloTreeSearch457
22.6DeterminizedSparseTreeSearch459
22.7GapHeuristicSearch460
22.8Summary464
22.9Exercises467
23ControllerAbstractions471
23.1Controllers471
23.2PolicyIteration475
23.3NonlinearProgramming478
23.4GradientAscent481
23.5Summary486
23.6Exercises486
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
www.dbooks.org
xivcontents
partvmultiagentsystems
24MultiagentReasoning493
24.1SimpleGames493
24.2ResponseModels494
24.3DominantStrategyEquilibrium497
24.4NashEquilibrium498
24.5CorrelatedEquilibrium498
24.6IteratedBestResponse503
24.7HierarchicalSoftmax504
24.8FictitiousPlay505
24.9GradientAscent509
24.10Summary509
24.11Exercises511
25SequentialProblems517
25.1MarkovGames517
25.2ResponseModels519
25.3NashEquilibrium520
25.4FictitiousPlay521
25.5GradientAscent526
25.6NashQ-Learning526
25.7Summary528
25.8Exercises530
26StateUncertainty533
26.1PartiallyObservableMarkovGames533
26.2PolicyEvaluation535
26.3NashEquilibrium537
26.4DynamicProgramming540
26.5Summary542
26.6Exercises542
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
contentsxv
27CollaborativeAgents545
27.1DecentralizedPartiallyObservableMarkovDecisionProcesses545
27.2Subclasses546
27.3DynamicProgramming549
27.4IteratedBestResponse550
27.5HeuristicSearch550
27.6NonlinearProgramming551
27.7Summary554
27.8Exercises556
appendices
AMathematicalConcepts561
A.1MeasureSpaces561
A.2ProbabilitySpaces562
A.3MetricSpaces562
A.4NormedVectorSpaces562
A.5PositiveDeniteness564
A.6Convexity564
A.7InformationContent565
A.8Entropy566
A.9CrossEntropy566
A.10RelativeEntropy567
A.11GradientAscent567
A.12TaylorExpansion568
A.13MonteCarloEstimation569
A.14ImportanceSampling570
A.15ContractionMappings570
A.16Graphs572
BProbabilityDistributions573
CComputationalComplexity575
C.1AsymptoticNotation575
C.2TimeComplexityClasses577
C.3SpaceComplexityClasses577
C.4Decidability579
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
www.dbooks.org
xvicontents
DNeuralRepresentations581
D.1NeuralNetworks581
D.2FeedforwardNetworks582
D.3ParameterRegularization585
D.4ConvolutionalNeuralNetworks587
D.5RecurrentNetworks588
D.6AutoencoderNetworks592
D.7AdversarialNetworks594
ESearchAlgorithms599
E.1SearchProblems599
E.2SearchGraphs600
E.3ForwardSearch600
E.4BranchandBound601
E.5DynamicProgramming604
E.6HeuristicSearch604
FProblems609
F.1HexWorld609
F.22048610
F.3Cart-Pole611
F.4MountainCar612
F.5SimpleRegulator613
F.6AircraftCollisionAvoidance614
F.7CryingBaby615
F.8MachineReplacement617
F.9Catch619
F.10Prisoner’sDilemma621
F.11Rock-Paper-Scissors621
F.12Traveler’sDilemma622
F.13Predator-PreyHexWorld623
F.14MulticaregiverCryingBaby624
F.15CollaborativePredator-PreyHexWorld625
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com