Algorithms for Decision Making
Mykel J. Kochenderfer
Tim A. Wheeler
Kyle H. Wray

The MIT Press
Cambridge, Massachusetts
London, England
© 2022 Massachusetts Institute of Technology

This work is subject to a Creative Commons CC-BY-NC-ND license. Subject to such license, all rights are reserved.

The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.

This book was set in TeX Gyre Pagella by the authors in LaTeX.

Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Names: Kochenderfer, Mykel J., 1980– author. | Wheeler, Tim A. (Tim Allan), author. | Wray, Kyle H., author.
Title: Algorithms for decision making / Mykel J. Kochenderfer, Tim A. Wheeler, Kyle H. Wray.
Description: Cambridge: Massachusetts Institute of Technology, [2022] | Includes bibliographical references and index.
Identifiers: LCCN 2021038701 | ISBN 9780262047012 (hardcover)
Subjects: LCSH: Decision support systems–Mathematics. | Algorithms.
Classification: LCC T58.62.K66 2022 | DDC 658.4/03—dc23
LC record available at https://lccn.loc.gov/2021038701
To our families
Contents

Preface xix
Acknowledgments xxi

1 Introduction 1
1.1 Decision Making 1
1.2 Applications 2
1.3 Methods 5
1.4 History 7
1.5 Societal Impact 12
1.6 Overview 14

Part I: Probabilistic Reasoning

2 Representation 19
2.1 Degrees of Belief and Probability 19
2.2 Probability Distributions 20
2.3 Joint Distributions 24
2.4 Conditional Distributions 29
2.5 Bayesian Networks 32
2.6 Conditional Independence 35
2.7 Summary 36
2.8 Exercises 38
3 Inference 43
3.1 Inference in Bayesian Networks 43
3.2 Inference in Naive Bayes Models 48
3.3 Sum-Product Variable Elimination 49
3.4 Belief Propagation 53
3.5 Computational Complexity 53
3.6 Direct Sampling 54
3.7 Likelihood Weighted Sampling 57
3.8 Gibbs Sampling 60
3.9 Inference in Gaussian Models 63
3.10 Summary 65
3.11 Exercises 66

4 Parameter Learning 71
4.1 Maximum Likelihood Parameter Learning 71
4.2 Bayesian Parameter Learning 75
4.3 Nonparametric Learning 82
4.4 Learning with Missing Data 82
4.5 Summary 89
4.6 Exercises 89

5 Structure Learning 97
5.1 Bayesian Network Scoring 97
5.2 Directed Graph Search 99
5.3 Markov Equivalence Classes 103
5.4 Partially Directed Graph Search 104
5.5 Summary 106
5.6 Exercises 107

6 Simple Decisions 111
6.1 Constraints on Rational Preferences 111
6.2 Utility Functions 112
6.3 Utility Elicitation 114
6.4 Maximum Expected Utility Principle 116
6.5 Decision Networks 116
6.6 Value of Information 119
6.7 Irrationality 122
6.8 Summary 125
6.9 Exercises 125
Part II: Sequential Problems

7 Exact Solution Methods 133
7.1 Markov Decision Processes 133
7.2 Policy Evaluation 136
7.3 Value Function Policies 139
7.4 Policy Iteration 140
7.5 Value Iteration 141
7.6 Asynchronous Value Iteration 145
7.7 Linear Program Formulation 147
7.8 Linear Systems with Quadratic Reward 147
7.9 Summary 150
7.10 Exercises 151

8 Approximate Value Functions 161
8.1 Parametric Representations 161
8.2 Nearest Neighbor 163
8.3 Kernel Smoothing 164
8.4 Linear Interpolation 167
8.5 Simplex Interpolation 168
8.6 Linear Regression 172
8.7 Neural Network Regression 174
8.8 Summary 175
8.9 Exercises 177

9 Online Planning 181
9.1 Receding Horizon Planning 181
9.2 Lookahead with Rollouts 183
9.3 Forward Search 183
9.4 Branch and Bound 185
9.5 Sparse Sampling 187
9.6 Monte Carlo Tree Search 187
9.7 Heuristic Search 197
9.8 Labeled Heuristic Search 197
9.9 Open-Loop Planning 200
9.10 Summary 208
9.11 Exercises 209
10 Policy Search 213
10.1 Approximate Policy Evaluation 213
10.2 Local Search 215
10.3 Genetic Algorithms 215
10.4 Cross Entropy Method 218
10.5 Evolution Strategies 219
10.6 Isotropic Evolutionary Strategies 224
10.7 Summary 226
10.8 Exercises 226

11 Policy Gradient Estimation 231
11.1 Finite Difference 231
11.2 Regression Gradient 234
11.3 Likelihood Ratio 234
11.4 Reward-to-Go 237
11.5 Baseline Subtraction 241
11.6 Summary 245
11.7 Exercises 246

12 Policy Gradient Optimization 249
12.1 Gradient Ascent Update 249
12.2 Restricted Gradient Update 251
12.3 Natural Gradient Update 253
12.4 Trust Region Update 254
12.5 Clamped Surrogate Objective 257
12.6 Summary 263
12.7 Exercises 264

13 Actor-Critic Methods 267
13.1 Actor-Critic 267
13.2 Generalized Advantage Estimation 269
13.3 Deterministic Policy Gradient 272
13.4 Actor-Critic with Monte Carlo Tree Search 274
13.5 Summary 277
13.6 Exercises 277
14 Policy Validation 281
14.1 Performance Metric Evaluation 281
14.2 Rare Event Simulation 285
14.3 Robustness Analysis 288
14.4 Trade Analysis 289
14.5 Adversarial Analysis 291
14.6 Summary 295
14.7 Exercises 295

Part III: Model Uncertainty

15 Exploration and Exploitation 299
15.1 Bandit Problems 299
15.2 Bayesian Model Estimation 301
15.3 Undirected Exploration Strategies 301
15.4 Directed Exploration Strategies 303
15.5 Optimal Exploration Strategies 306
15.6 Exploration with Multiple States 309
15.7 Summary 309
15.8 Exercises 311

16 Model-Based Methods 317
16.1 Maximum Likelihood Models 317
16.2 Update Schemes 318
16.3 Exploration 321
16.4 Bayesian Methods 326
16.5 Bayes-Adaptive Markov Decision Processes 329
16.6 Posterior Sampling 330
16.7 Summary 332
16.8 Exercises 332

17 Model-Free Methods 335
17.1 Incremental Estimation of the Mean 335
17.2 Q-Learning 336
17.3 Sarsa 338
17.4 Eligibility Traces 341
17.5 Reward Shaping 343
17.6 Action Value Function Approximation 343
17.7 Experience Replay 345
17.8 Summary 348
17.9 Exercises 351

18 Imitation Learning 355
18.1 Behavioral Cloning 355
18.2 Data Set Aggregation 358
18.3 Stochastic Mixing Iterative Learning 358
18.4 Maximum Margin Inverse Reinforcement Learning 361
18.5 Maximum Entropy Inverse Reinforcement Learning 365
18.6 Generative Adversarial Imitation Learning 369
18.7 Summary 371
18.8 Exercises 372

Part IV: State Uncertainty

19 Beliefs 379
19.1 Belief Initialization 379
19.2 Discrete State Filter 380
19.3 Kalman Filter 383
19.4 Extended Kalman Filter 385
19.5 Unscented Kalman Filter 387
19.6 Particle Filter 390
19.7 Particle Injection 394
19.8 Summary 395
19.9 Exercises 397

20 Exact Belief State Planning 407
20.1 Belief-State Markov Decision Processes 407
20.2 Conditional Plans 408
20.3 Alpha Vectors 411
20.4 Pruning 412
20.5 Value Iteration 416
20.6 Linear Policies 419
20.7 Summary 419
20.8 Exercises 422
21OineBeliefStatePlanning427
21.1FullyObservableValueApproximation427
21.2FastInformedBound429
21.3FastLowerBounds430
21.4Point-BasedValueIteration431
21.5RandomizedPoint-BasedValueIteration433
21.6SawtoothUpperBound436
21.7PointSelection440
21.8SawtoothHeuristicSearch442
21.9TriangulatedValueFunctions445
21.10Summary447
21.11Exercises448
22OnlineBeliefStatePlanning453
22.1LookaheadwithRollouts453
22.2ForwardSearch453
22.3BranchandBound456
22.4SparseSampling456
22.5MonteCarloTreeSearch457
22.6DeterminizedSparseTreeSearch459
22.7GapHeuristicSearch460
22.8Summary464
22.9Exercises467
23ControllerAbstractions471
23.1Controllers471
23.2PolicyIteration475
23.3NonlinearProgramming478
23.4GradientAscent481
23.5Summary486
23.6Exercises486
Part V: Multiagent Systems

24 Multiagent Reasoning 493
24.1 Simple Games 493
24.2 Response Models 494
24.3 Dominant Strategy Equilibrium 497
24.4 Nash Equilibrium 498
24.5 Correlated Equilibrium 498
24.6 Iterated Best Response 503
24.7 Hierarchical Softmax 504
24.8 Fictitious Play 505
24.9 Gradient Ascent 509
24.10 Summary 509
24.11 Exercises 511

25 Sequential Problems 517
25.1 Markov Games 517
25.2 Response Models 519
25.3 Nash Equilibrium 520
25.4 Fictitious Play 521
25.5 Gradient Ascent 526
25.6 Nash Q-Learning 526
25.7 Summary 528
25.8 Exercises 530

26 State Uncertainty 533
26.1 Partially Observable Markov Games 533
26.2 Policy Evaluation 535
26.3 Nash Equilibrium 537
26.4 Dynamic Programming 540
26.5 Summary 542
26.6 Exercises 542
27 Collaborative Agents 545
27.1 Decentralized Partially Observable Markov Decision Processes 545
27.2 Subclasses 546
27.3 Dynamic Programming 549
27.4 Iterated Best Response 550
27.5 Heuristic Search 550
27.6 Nonlinear Programming 551
27.7 Summary 554
27.8 Exercises 556

Appendices

A Mathematical Concepts 561
A.1 Measure Spaces 561
A.2 Probability Spaces 562
A.3 Metric Spaces 562
A.4 Normed Vector Spaces 562
A.5 Positive Definiteness 564
A.6 Convexity 564
A.7 Information Content 565
A.8 Entropy 566
A.9 Cross Entropy 566
A.10 Relative Entropy 567
A.11 Gradient Ascent 567
A.12 Taylor Expansion 568
A.13 Monte Carlo Estimation 569
A.14 Importance Sampling 570
A.15 Contraction Mappings 570
A.16 Graphs 572

B Probability Distributions 573

C Computational Complexity 575
C.1 Asymptotic Notation 575
C.2 Time Complexity Classes 577
C.3 Space Complexity Classes 577
C.4 Decidability 579
D Neural Representations 581
D.1 Neural Networks 581
D.2 Feedforward Networks 582
D.3 Parameter Regularization 585
D.4 Convolutional Neural Networks 587
D.5 Recurrent Networks 588
D.6 Autoencoder Networks 592
D.7 Adversarial Networks 594

E Search Algorithms 599
E.1 Search Problems 599
E.2 Search Graphs 600
E.3 Forward Search 600
E.4 Branch and Bound 601
E.5 Dynamic Programming 604
E.6 Heuristic Search 604

F Problems 609
F.1 Hex World 609
F.2 2048 610
F.3 Cart-Pole 611
F.4 Mountain Car 612
F.5 Simple Regulator 613
F.6 Aircraft Collision Avoidance 614
F.7 Crying Baby 615
F.8 Machine Replacement 617
F.9 Catch 619
F.10 Prisoner's Dilemma 621
F.11 Rock-Paper-Scissors 621
F.12 Traveler's Dilemma 622
F.13 Predator-Prey Hex World 623
F.14 Multicaregiver Crying Baby 624
F.15 Collaborative Predator-Prey Hex World 625
G Julia 627
G.1 Types 627
G.2 Functions 640
G.3 Control Flow 643
G.4 Packages 645
G.5 Convenience Functions 648

References 651
Index 671
Preface

This book provides a broad introduction to algorithms for decision making under uncertainty. We cover a wide variety of topics related to decision making, introducing the underlying mathematical problem formulations and the algorithms for solving them. Figures, examples, and exercises are provided to convey the intuition behind the various approaches.

This book is intended for advanced undergraduates and graduate students, as well as professionals. It requires some mathematical maturity and assumes prior exposure to multivariable calculus, linear algebra, and probability concepts. Some review material is provided in the appendices. Disciplines where the book would be especially useful include mathematics, statistics, computer science, aerospace, electrical engineering, and operations research.

Fundamental to this textbook are the algorithms, which are all implemented in the Julia programming language. We have found this language to be ideal for specifying algorithms in human-readable form. The priority in the design of the algorithmic implementations was interpretability rather than efficiency. Industrial applications, for example, may benefit from alternative implementations. Permission is granted, free of charge, to use the code snippets associated with this book, subject to the condition that the source of the code is acknowledged.

Mykel J. Kochenderfer
Tim A. Wheeler
Kyle H. Wray
Stanford, California
February 28, 2022
Acknowledgments

This textbook has grown from a course on decision making under uncertainty taught at Stanford. We are grateful to the students and teaching assistants who have helped shape the course over the past six years.

The authors wish to thank the many individuals who have provided valuable feedback on early drafts of our manuscript, including Dylan Asmar, Drew Bagnell, Safa Bakhshi, Edward Balaban, Jean Betterton, Raunak Bhattacharyya, Kelsey Bing, Maxime Bouton, Austin Chan, Simon Chauvin, Shushman Choudhury, Jon Cox, Matthew Daly, Victoria Dax, Richard Dewey, Dea Dressel, Ben Duprey, Torstein Eliassen, Johannes Fischer, Rushil Goradia, Jayesh Gupta, Arec Jamgochian, Rohan Kapre, Mark Koren, Liam Kruse, Tor Lattimore, Bernard Lange, Ritchie Lee, Sheng Li, Michael Littman, Robert Moss, Joshua Ott, Oriana Peltzer, Francesco Piccoli, Jeffrey Sarnoff, Marc Schlichting, Ransalu Senanayake, Chelsea Sidrane, Chris Strong, Zach Sunberg, Abiy Teshome, Alexandros Tzikas, Kemal Ure, Josh Wolff, Anıl Yıldız, and Zongzhang Zhang. We also would like to thank Sydney Katz, Kunal Menda, and Ayan Mukhopadhyay for their contributions to the discussion in chapter 1. Ross Alexander produced many of the exercises throughout the book. It has been a pleasure working with Elizabeth Swayze from the MIT Press in preparing this manuscript for publication.

The style of this book was inspired by Edward Tufte. Among other stylistic elements, we adopted his wide margins and use of small multiples. The typesetting of this book is based on the Tufte-LaTeX package by Kevin Godby, Bil Kleb, and Bill Wood. The book's color scheme was adapted from the Monokai theme by Jon Skinner of Sublime Text (sublimetext.com) and a palette that better accommodates individuals with color blindness (B. Wong, "Points of View: Color Blindness," Nature Methods, vol. 8, no. 6, pp. 441–442, 2011). For plots, we use the viridis colormap defined by Stéfan van der Walt and Nathaniel Smith.
We have also benefited from the various open-source packages on which this textbook depends (see appendix G). The typesetting of the code was done with the help of pythontex, which is maintained by Geoffrey Poore. The typeface used for the algorithms is JuliaMono (github.com/cormullion/juliamono). The plotting was handled by pgfplots, which is maintained by Christian Feuersänger.
1 Introduction

Many important problems involve decision making under uncertainty, including aircraft collision avoidance, wildfire management, and disaster response. When designing automated decision-making systems or decision-support systems, it is important to account for various sources of uncertainty while carefully balancing multiple objectives. We will discuss these challenges from a computational perspective, aiming to provide the theory behind decision-making models and computational approaches. This chapter introduces the problem of decision making under uncertainty, provides some examples of applications, and outlines the space of computational approaches. It then summarizes how various disciplines have contributed to our understanding of intelligent decision making and highlights areas of potential societal impact. We conclude with an outline of the remainder of the book.
1.1 Decision Making

An agent is an entity that acts based on observations of its environment. Agents may be physical entities, like humans or robots, or they may be nonphysical entities, such as decision support systems that are implemented entirely in software. As shown in figure 1.1, the interaction between the agent and the environment follows an observe-act cycle or loop.

The agent at time t receives an observation of the environment, denoted as o_t. Observations may be made, for example, through a biological sensory process, as in humans, or by a sensor system, like radar in an air traffic control system. Observations are often incomplete or noisy; humans may not see an approaching aircraft or a radar system might miss a detection due to electromagnetic interference. The agent then chooses an action a_t through some decision-making process.
Figure 1.1. Interaction between an agent and its environment: the environment produces an observation (o_t) for the agent, and the agent responds with an action (a_t).
This action, such as sounding an alert, may have a nondeterministic effect on the environment.
Our focus is on agents that interact intelligently to achieve their objectives over time. Given the past sequence of observations, o_1, . . . , o_t, and knowledge of the environment, the agent must choose an action a_t that best achieves its objectives in the presence of various sources of uncertainty (we focus here on discrete time problems; continuous time problems are studied in the field of control theory; see D. E. Kirk, Optimal Control Theory: An Introduction, Prentice-Hall, 1970), including the following:

• outcome uncertainty, where the effects of our actions are uncertain,
• model uncertainty, where our model of the problem is uncertain,
• state uncertainty, where the true state of the environment is uncertain, and
• interaction uncertainty, where the behavior of the other agents interacting in the environment is uncertain.
This book is organized around these four sources of uncertainty. Making decisions in the presence of uncertainty is central to the field of artificial intelligence (a comprehensive introduction to artificial intelligence is provided by S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed., Pearson, 2021), as well as many other fields, as outlined in section 1.4. We will discuss a variety of algorithms, or descriptions of computational processes, for making decisions that are robust to uncertainty.
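To make the observe-act loop concrete, the following minimal sketch shows one way the cycle might be written in Julia. It is our own illustration rather than code from this book: the RandomAgent type and the observe and step! arguments are hypothetical placeholders for a real agent and environment interface.

```julia
# A minimal sketch of the observe-act cycle from figure 1.1 (illustrative only).
struct RandomAgent
    actions::Vector{Symbol}
end

# The agent maps the history of observations to an action. This placeholder
# ignores the history and picks an action uniformly at random.
act(agent::RandomAgent, observations) = rand(agent.actions)

function simulate(agent, observe, step!, t_max)
    observations = []
    for t in 1:t_max
        o_t = observe()                  # observation, possibly incomplete or noisy
        push!(observations, o_t)
        a_t = act(agent, observations)   # choose an action given o_1, ..., o_t
        step!(a_t)                       # the action has a (possibly nondeterministic) effect
    end
end
```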
1.2 Applications

The decision-making framework presented in the previous section can be applied to a wide variety of domains. This section discusses a few conceptual examples with real-world applications. Appendix F outlines additional notional problems that are used throughout this text to demonstrate the algorithms we discuss.
1.2.1 Aircraft Collision Avoidance

To help prevent midair collisions between aircraft, we want to design a system that can alert pilots to potential threats and direct them how to maneuver to avoid them. (This application is discussed in a chapter titled "Collision Avoidance" by M. J. Kochenderfer, Decision Making Under Uncertainty: Theory and Application, MIT Press, 2015.) The system communicates with the transponders of other aircraft to identify their positions with some degree of accuracy. Deciding what guidance to provide to the pilots is challenging. There is uncertainty in how quickly the pilots will respond and how aggressively they will comply with the guidance. In addition, there is uncertainty in the behavior of other aircraft. We want our system to alert sufficiently early to provide enough time for pilots to maneuver their aircraft to avoid collisions, but we do not want our system to issue alerts too early, which would result in many unnecessary maneuvers. Since this system is to be used continuously worldwide, we need the system to provide an exceptional level of safety.
1.2.2 Automated Driving

We want to build an autonomous vehicle that can safely drive in urban environments. (A similar application was explored by M. Bouton, A. Nakhaei, K. Fujimura, and M. J. Kochenderfer, "Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments," in IEEE Intelligent Vehicles Symposium (IV), 2019.) The vehicle must rely on a suite of sensors to perceive its environment in order to make safe decisions. One type of sensor is lidar, which involves measuring laser reflections off the environment to determine distances to obstacles. Another type of sensor is a camera, which, through computer vision algorithms, can detect pedestrians and other vehicles. Both of these types of sensors are imperfect and susceptible to noise and occlusions. For example, a parked truck may occlude a pedestrian who may be trying to cross at a crosswalk. Our system must predict the intentions and future paths of other vehicles, pedestrians, and other road users from their observable behaviors in order to navigate safely to our destination.
1.2.3 Breast Cancer Screening

Worldwide, breast cancer is the most common cancer in women. Detecting breast cancer early can help save lives, with mammography being the most effective screening tool available. However, mammography carries with it potential risks, including false positives, which can result in unnecessary and invasive diagnostic follow-up. Research over the years has resulted in various population-based screening schedules based on age to balance the benefits and risks of testing. Developing a system that can make recommendations based on personal risk
characteristics and screening history has the potential to result in better health outcomes. (Such a concept is proposed by T. Ayer, O. Alagoz, and N. K. Stout, "A POMDP Approach to Personalize Mammography Screening Decisions," Operations Research, vol. 60, no. 5, pp. 1019–1034, 2012.) The success of such a system can be compared to populationwide screening schedules in terms of total expected quality-adjusted life years, the number of mammograms, the prevalence of false positives, and the risk of undetected, invasive cancer.
1.2.4 Financial Consumption and Portfolio Allocation

Suppose that we want to build a system that recommends how much of an individual's wealth should be consumed that year and how much should be invested. (A related problem was studied by R. C. Merton, "Optimum Consumption and Portfolio Rules in a Continuous-Time Model," Journal of Economic Theory, vol. 3, no. 4, pp. 373–413, 1971.) The investment portfolio may include stocks and bonds with different levels of risk and expected return. The evolution of wealth is stochastic due to uncertainty in both earned and investment income, often increasing until the investor is near retirement, and then steadily decreasing. The enjoyment that comes from the consumption of a unit of wealth in a year typically diminishes with the amount consumed, resulting in a desire to smooth consumption over the lifespan of the individual.
1.2.5DistributedWildreSurveillance
Situationalawarenessisamajorchallengewhenghtingwildres.Thestateofa
reevolvesovertime,inuencedbyfactorssuchaswindandthedistribution
offuelintheenvironment.Manywildresspanlargegeographicregions.One
conceptformonitoringawildreistouseateamofdronesequippedwith
sensorstoyaboveit.
7
Thesensingrangeofindividualdronesislimited,but
7
Thisapplicationwasexploredby
K.D.JulianandM.J.Kochender-
fer,“DistributedWildreSurveil-
lancewithAutonomousAircraft
UsingDeepReinforcementLearn-
ing,”AIAAJournalofGuidance,Con-
trol,andDynamics,vol.42,no.8,
pp.1768–1778,2019.
theinformationfromtheteamcanbefusedtoprovideauniedsnapshotofthe
situationtodriveresourceallocationdecisions.Wewouldliketheteammembers
toautonomouslydeterminehowtocollaboratewitheachothertoprovidethebest
coverageofthere.Eectivemonitoringrequiresdecidinghowtomaneuverto
coverareaswherenewsensorinformationislikelytobeuseful;spendingtimein
areaswherewearecertainofwhetherthereisburningornotwouldbewasteful.
Identifyingimportantareastoexplorerequiresreasoningaboutthestochastic
evolutionofthere,givenonlyimperfectknowledgeofitscurrentstate.
1.2.6 Mars Science Exploration

Rovers have made important discoveries on and increased our understanding of Mars. However, a major bottleneck in scientific exploration has been the communication link between the rover and the operations team on Earth. It can take as much as half an hour for sensor information to be sent from Mars to Earth and for commands to be sent from Earth to Mars. In addition, guidance to rovers needs to be planned in advance because there are limited upload and download windows with Mars due to the positions of orbiters serving as information relays between the planets. Recent research has suggested that the efficiency of science exploration missions can be improved by a factor of five through the introduction of greater levels of autonomy. (This concept is presented and evaluated by D. Gaines, G. Doran, M. Paton, B. Rothrock, J. Russino, R. Mackey, R. Anderson, R. Francis, C. Joswig, H. Justice, K. Kolcio, G. Rabideau, S. Schaffer, J. Sawoniewicz, A. Vasavada, V. Wong, K. Yu, and A.-a. Agha-mohammadi, "Self-Reliant Rovers for Increased Mission Productivity," Journal of Field Robotics, vol. 37, no. 7, pp. 1171–1196, 2020.) Human operators would still provide high-level guidance on mission objectives, but the rover would have the flexibility to select its own science targets using the most up-to-date information. In addition, it would be desirable for rovers to respond appropriately to various hazards and system failures without human intervention.
1.3 Methods

There are many methods for designing decision-making agents. Depending on the application, some may be more appropriate than others. They differ in the responsibilities of the designer and the tasks left to automation. This section briefly overviews a collection of these methods. The book will focus primarily on planning and reinforcement learning, but some of the techniques will involve elements of supervised learning and optimization.
1.3.1 Explicit Programming

The most direct method for designing a decision-making agent is to anticipate all the scenarios that the agent might find itself in and explicitly program what the agent should do in response to each one. The explicit programming approach may work well for simple problems, but it places a large burden on the designer to provide a complete strategy. Various agent programming languages and frameworks have been proposed to make programming agents easier.
1.3.2 Supervised Learning

With some problems, it may be easier to show an agent what to do rather than to write a program for the agent to follow. The designer provides a set of training examples, and an automated learning algorithm must generalize from these examples. This approach is known as supervised learning and has been widely applied to classification problems. This technique is sometimes called behavioral cloning when applied to learning mappings from observations to actions. Behavioral cloning works well when an expert designer actually knows the best course of action for a representative collection of situations. Although a wide variety of different learning algorithms exist, they generally cannot perform better than human designers in new situations.
1.3.3 Optimization

Another approach is for the designer to specify the space of possible decision strategies and a performance measure to be maximized. Evaluating the performance of a decision strategy generally involves running a batch of simulations. The optimization algorithm then performs a search in this space for the optimal strategy. If the space is relatively small and the performance measure does not have many local optima, then various local or global search methods may be appropriate. Although knowledge of a dynamic model is generally assumed to run the simulations, it is not otherwise used to guide the search, which can be important for complex problems. A minimal sketch of this approach follows.
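In the toy illustration below (our own, not the book's code), the strategy space is a single scalar parameter θ, each candidate's performance is estimated from a batch of simulations, and the search is a simple enumeration over candidates:

```julia
# Toy stochastic simulator: the reward depends on a scalar policy parameter θ.
simulate(θ) = -(θ - 0.3)^2 + 0.1 * randn()

# Monte Carlo estimate of a strategy's performance from a batch of m simulations.
performance(θ; m=1000) = sum(simulate(θ) for _ in 1:m) / m

# Search a small space of candidate strategies for the best performer.
candidates = range(-1.0, 1.0; length=21)
θ_best = candidates[argmax([performance(θ) for θ in candidates])]
```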
1.3.4 Planning

Planning is a form of optimization that uses a model of the problem dynamics to help guide the search. A broad base of literature explores various planning problems, much of it focused on deterministic problems. For some problems, it may be acceptable to approximate the dynamics with a deterministic model. Assuming a deterministic model allows us to use methods that can more easily scale to high-dimensional problems. For other problems, accounting for future uncertainty is critical. This book focuses entirely on problems in which accounting for uncertainty is important.
1.3.5 Reinforcement Learning

Reinforcement learning relaxes the assumption in planning that a model is known ahead of time. Instead, the decision-making strategy is learned while the agent interacts with the environment. The designer only has to provide a performance measure; it is up to the learning algorithm to optimize the behavior of the agent. One of the interesting complexities that arises in reinforcement learning is that the choice of action affects not only the immediate success of the agent in achieving its objectives, but also the agent's ability to learn about the environment and identify the characteristics of the problem that it can exploit.
1.4 History

The theory of automating the process of decision making has its roots in the dreams of early philosophers, scientists, mathematicians, and writers. The ancient Greeks began incorporating automation into myths and stories as early as 800 BC. The word automaton was first used in Homer's Iliad, which contains references to the notion of automatic machines, including mechanical tripods used to serve dinner guests (S. Vasileiadou, D. Kalligeropoulos, and N. Karcanias, "Systems, Modelling and Control in Ancient Greece: Part 1: Mythical Automata," Measurement and Control, vol. 36, no. 3, pp. 76–80, 2003). In the seventeenth century, philosophers proposed the use of logic rules to automatically settle disagreements. Their ideas created the foundation for mechanized reasoning.

Beginning in the late eighteenth century, inventors began creating automatic machines to perform labor. In particular, a series of innovations in the textile industry led to the development of the automatic loom, which in turn laid the foundation for the first factory robots (N. J. Nilsson, The Quest for Artificial Intelligence, Cambridge University Press, 2009). In the early nineteenth century, the use of intelligent machines to automate labor began to make its way into science fiction novels. The word robot originated in the Czech writer Karel Čapek's play titled R.U.R., short for Rossum's Universal Robots, about machines that could perform work that humans would prefer not to do. The play inspired other science fiction writers to incorporate robots into their writing. In the mid-twentieth century, the notable writer and professor Isaac Asimov laid out his vision for robotics in his famous Robot series.

A major challenge in practical implementations of automated decision making is accounting for uncertainty. Even at the end of the twentieth century, George Dantzig, most famous for developing the simplex algorithm, stated in 1991:
In retrospect it is interesting to note that the original problem that started my research is still outstanding—namely the problem of planning or scheduling dynamically over time, particularly planning dynamically under uncertainty. If such a problem could be successfully solved it could (eventually through better planning) contribute to the well-being and stability of the world. (G. B. Dantzig, "Linear Programming," Operations Research, vol. 50, no. 1, pp. 42–47, 2002.)

While decision making under uncertainty still remains an active area of research, over the past few centuries, researchers and engineers have come closer to making the concepts posed by these early dreamers possible. Current state-of-the-art decision-making algorithms rely on a convergence of concepts developed in multiple disciplines, including economics, psychology, neuroscience, computer science, engineering, mathematics, and operations research. This section highlights some major contributions from these disciplines. The cross-pollination between disciplines has led to many recent advances and will likely continue to support growth in the future.
1.4.1 Economics

Economics requires models of human decision making. One approach to building such models involves utility theory, which was first introduced in the late eighteenth century (G. J. Stigler, "The Development of Utility Theory. I," Journal of Political Economy, vol. 58, no. 4, pp. 307–327, 1950). Utility theory provides a means to model and compare the desirability of various outcomes. For example, utility can be used to compare the desirability of monetary quantities. In the Theory of Legislation, Jeremy Bentham summarized the nonlinearity in the utility of money:

1st. Each portion of wealth has a corresponding portion of happiness.
2nd. Of two individuals with unequal fortunes, he who has the most wealth has the most happiness.
3rd. The excess in happiness of the richer will not be so great as the excess of his wealth. (J. Bentham, Theory of Legislation, Trübner & Company, 1887.)

By combining the concept of utility with the notion of rational decision making, economists in the mid-twentieth century established a basis for the maximum expected utility principle. This principle is a key concept behind the creation of autonomous decision-making agents. Utility theory also gave rise to the development of game theory, which attempts to understand the behavior of multiple agents acting in the presence of one another to maximize their interests (O. Morgenstern and J. von Neumann, Theory of Games and Economic Behavior, Princeton University Press, 1953).
1.4.2 Psychology

Psychologists also study human decision making, typically from the perspective of human behavior. By studying the reactions of animals to stimuli, psychologists have been developing theories of trial-and-error learning since the nineteenth century. Researchers noticed that animals tend to make decisions based on the satisfaction or discomfort they experienced in previous similar situations. Russian psychologist Ivan Pavlov combined this idea with the concept of reinforcement after observing the salivation patterns of dogs when fed. Psychologists found that a pattern of behavior could be strengthened or weakened using continuous reinforcement of a particular stimulus. In the mid-twentieth century, the mathematician and computer scientist Alan Turing expressed the possibility of allowing machines to learn in the same manner:

The organization of a machine into a universal machine would be most impressive if the arrangements of interference involve very few inputs. The training of a human child depends largely on a system of rewards and punishments, and this suggests that it ought to be possible to carry through the organising with only two interfering inputs, one for "pleasure" or "reward" (R) and the other for "pain" or "punishment" (P). (A. M. Turing, "Intelligent Machinery," National Physical Laboratory, Report, 1948.)

The work of psychologists laid the foundation for the field of reinforcement learning, a critical technique used to teach agents to make decisions in uncertain environments (R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018).
1.4.3 Neuroscience

While psychologists study human behavior as it happens, neuroscientists focus on the biological processes used to create the behavior. At the end of the nineteenth century, scientists found that the brain is composed of an interconnected network of neurons, which is responsible for its ability to perceive and reason about the world. Artificial intelligence pioneer Nils Nilsson describes the application of these findings to decision making as follows:

Because it is the brain of an animal that is responsible for converting sensory information into action, it is to be expected that several good ideas can be found in the work of neurophysiologists and neuroanatomists who study brains and their fundamental components, neurons. (N. J. Nilsson, The Quest for Artificial Intelligence, Cambridge University Press, 2009.)
In the 1940s, researchers first proposed that neurons could be considered as individual "logic units" capable of performing computational operations when pieced together into a network. This work served as a basis for neural networks, which are used in the field of artificial intelligence to perform a variety of complex tasks.
1.4.4 Computer Science

In the mid-twentieth century, computer scientists began formulating the problem of intelligent decision making as a problem of symbolic manipulation through formal logic. The computer program Logic Theorist, written in the mid-twentieth century to perform automated reasoning, used this way of thinking to prove mathematical theorems. Herbert Simon, one of its inventors, addressed the symbolic nature of the program by relating it to the human mind:

We invented a computer program capable of thinking nonnumerically, and thereby solved the venerable mind/body problem, explaining how a system composed of matter can have the properties of mind. (Quoted by J. Agar, Science in the 20th Century and Beyond, Polity, 2012.)

These symbolic systems relied heavily on human expertise. An alternative approach to intelligence, called connectionism, was inspired in part by developments in neuroscience and focuses on the use of artificial neural networks as a substrate for intelligence. With the knowledge that neural networks could be trained for pattern recognition, connectionists attempt to learn intelligent behavior from data or experience rather than the hard-coded knowledge of experts. The connectionist paradigm underpinned the success of AlphaGo, the autonomous program that beat a human professional at the game of Go, as well as much of the development of autonomous vehicles. Algorithms that combine both symbolic and connectionist paradigms remain an active area of research today.
1.4.5 Engineering

The field of engineering has focused on allowing physical systems, such as robots, to make intelligent decisions. World-renowned roboticist Sebastian Thrun describes the components of these systems as follows:

Robotics systems have in common that they are situated in the physical world, perceive their environments through sensors, and manipulate their environment through things that move. (S. Thrun, "Probabilistic Robotics," Communications of the ACM, vol. 45, no. 3, pp. 52–57, 2002.)
To design these systems, engineers must address perception, planning, and acting. Physical systems perceive the world by using their sensors to create a representation of the salient features of their environment. The field of state estimation has focused on using sensor measurements to construct a belief about the state of the world. Planning requires reasoning about the ways to execute the tasks they are designed to perform. The planning process has been enabled by advances in the semiconductor industry spanning many decades (G. E. Moore, "Cramming More Components onto Integrated Circuits," Electronics, vol. 38, no. 8, pp. 114–117, 1965). Once a plan has been devised, an autonomous agent must act on it in the real world. This task requires both hardware (in the form of actuators) and algorithms to control the actuators and reject disturbances. The field of control theory has focused on the stabilization of mechanical systems through feedback control (D. A. Mindell, Between Human and Machine: Feedback, Control, and Computing Before Cybernetics, JHU Press, 2002). Automatic control systems are widely used in industry, from the regulation of temperature in an oven to the navigation of aerospace systems.
1.4.6 Mathematics

An agent must be able to quantify its uncertainty to make informed decisions in uncertain environments. The field of decision making relies heavily on probability theory for this task. In particular, Bayesian statistics plays an important role in this text. In 1763, a paper of Thomas Bayes was published posthumously, containing what would later be known as Bayes' rule. His approach to probabilistic inference fell in and out of favor until the mid-twentieth century, when researchers began to find Bayesian methods useful in a number of settings (W. M. Bolstad and J. M. Curran, Introduction to Bayesian Statistics, Wiley, 2016). Mathematician Bernard Koopman found practical use for the theory during World War II:

Every operation involved in search is beset with uncertainties; it can be understood quantitatively only in terms of [...] probability. This may now be regarded as a truism, but it seems to have taken the developments in operational research of the Second World War to drive home its practical implications. (B. O. Koopman, Search and Screening: General Principles with Historical Applications, Pergamon Press, 1980.)

Sampling-based methods (sometimes referred to as Monte Carlo methods), developed in the early twentieth century for large-scale calculations as part of the Manhattan Project, made some inference techniques possible that would previously have been intractable. These foundations serve as a basis for Bayesian networks, which increased in popularity later in the twentieth century in the field of artificial intelligence.
1.4.7 Operations Research

Operations research is concerned with finding optimal solutions to decision-making problems such as resource allocation, asset investment, and maintenance scheduling. In the late nineteenth century, researchers began to explore the application of mathematical and scientific analysis to the production of goods and services. The field was accelerated during the Industrial Revolution when companies began to subdivide their management into departments responsible for distinct aspects of overall decisions. During World War II, the optimization of decisions was applied to allocating resources to an army. Once the war came to an end, businesses began to notice that the same operations research concepts previously used to make military decisions could help them optimize business decisions. This realization led to the development of management science, as described by the organizational theorist Harold Koontz:

The abiding belief of this group is that, if management, or organization, or planning, or decision making is a logical process, it can be expressed in terms of mathematical symbols and relationships. The central approach of this school is the model, for it is through these devices that the problem is expressed in its basic relationships and in terms of selected goals or objectives. (H. Koontz, "The Management Theory Jungle," Academy of Management Journal, vol. 4, no. 3, pp. 174–188, 1961.)

This desire to be able to better model and understand business decisions sparked the development of a number of concepts used today, such as linear programming, dynamic programming, and queuing theory (F. S. Hillier, Introduction to Operations Research, McGraw-Hill, 2012).
1.5 Societal Impact

Algorithmic approaches to decision making have transformed society and will likely continue to in the future. This section briefly highlights a few ways that decision-making algorithms can contribute to society and introduces challenges that remain when attempting to ensure a broad benefit. (A much more thorough discussion is provided by Z. R. Shi, C. Wang, and F. Fang, "Artificial Intelligence for Social Good: A Survey," 2020, arXiv: 2001.01818v1.)

Algorithmic approaches have contributed to environmental sustainability. In the context of energy management, for example, Bayesian optimization has been applied to automated home energy management systems. Algorithms from the field of multiagent systems are used to predict the operation of smart grids, design markets for trading energy, and predict rooftop solar-power adoption. Algorithms have also been developed to protect biodiversity. For example, neural networks are used to automate wildlife censuses, game-theoretic approaches are used to
combat poaching in forests, and optimization techniques are employed to allocate resources for habitat management.
Decision-making algorithms have found success in the field of medicine for decades. Such algorithms have been used for matching residents to hospitals and organ donors to patients in need. An early application of Bayesian networks, which we will cover in the first part of this book, was disease diagnosis. Since then, Bayesian networks have been widely used in medicine for the diagnosis and prognosis of diseases. The field of medical image processing has been transformed by deep learning, and algorithmic ideas have recently played an important role in understanding the spread of disease.

Algorithms have enabled us to understand the growth of urban areas and facilitate their design. Data-driven algorithms have been widely used to improve public infrastructure. For example, stochastic processes have been used to predict failures in water pipelines, deep learning has improved traffic management, and Markov decision processes and Monte Carlo methods have been employed to improve emergency response. Ideas from decentralized multiagent systems have optimized travel routes, and path-planning techniques have been used to optimize the delivery of goods. Decision-making algorithms have been used for autonomous cars and improving aircraft safety.

Algorithms for optimizing decisions can amplify the impact of their users, regardless of their intention. If the objective of the user of these algorithms, for example, is to spread misinformation during a political election, then optimization processes can help facilitate this. However, similar algorithms can be used to monitor and counteract the spread of false information. Sometimes the implementation of these decision-making algorithms can lead to downstream consequences that their users did not intend. (For a general discussion, see B. Christian, The Alignment Problem, Norton & Company, 2020; see also D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, and D. Mané, "Concrete Problems in AI Safety," 2016, arXiv: 1606.06565v2.)
Although algorithms have the potential to bring significant benefits, there are also challenges associated with their implementation in society. Data-driven algorithms often suffer from inherent biases and blind spots due to the way that data is collected. As algorithms become part of our lives, it is important to understand how the risk of bias can be reduced and how the benefits of algorithmic progress can be distributed in a manner that is equitable and fair. Algorithms can also be vulnerable to adversarial manipulation, and it is critical that we design algorithms that are robust to such attacks. It is also important to extend moral and legal frameworks for preventing unintended consequences and assigning responsibility.
1.6 Overview

This book is divided into five parts. The first part addresses the problem of reasoning about uncertainty and objectives in simple decisions at a single point in time. The second extends decision making to sequential problems, where we must make a sequence of decisions in response to information about the outcomes of our actions as we proceed. The third addresses model uncertainty, where we do not start with a known model and must learn how to act through interaction with the environment. The fourth addresses state uncertainty, where imperfect perceptual information prevents us from knowing the full environmental state. The final part discusses decision contexts involving multiple agents.
1.6.1 Probabilistic Reasoning

Rational decision making requires reasoning about our uncertainty and objectives. This part of the book begins by discussing how to represent uncertainty as a probability distribution. Real-world problems require reasoning about distributions over many variables. We will discuss how to construct these models, how to use them to make inferences, and how to learn their parameters and structure from data. We then introduce the foundations of utility theory and show how it forms the basis for rational decision making under uncertainty through the maximum expected utility principle. We then discuss how notions of utility theory can be incorporated into the probabilistic graphical models introduced earlier in this chapter to form what are called decision networks.
1.6.2 Sequential Problems

Many important problems require that we make a series of decisions. The same principle of maximum expected utility still applies, but optimal decision making in a sequential context requires reasoning about future sequences of actions and observations. This part of the book will discuss sequential decision problems in stochastic environments, where the outcomes of our actions are uncertain. We will focus on a general formulation of sequential decision problems under the assumption that the model is known and that the environment is fully observable. We will relax both of these assumptions later in the book. Our discussion will begin with the introduction of the Markov decision process (MDP), the standard
mathematical model for sequential decision problems. We will discuss several approaches for finding exact solutions to these types of problems. Because large problems sometimes do not permit exact solutions to be found efficiently, we will discuss a collection of both offline and online approximate solution methods, along with a type of method that involves directly searching the space of parameterized decision policies. Finally, we will discuss approaches for validating that our decision strategies will perform as expected when deployed in the real world.
1.6.3 Model Uncertainty

In our discussion of sequential decision problems up to this point, we have assumed that the transition and reward models are known. In many problems, however, the dynamics and rewards are not known exactly, and the agent must learn to act through experience. By observing the outcomes of its actions in the form of state transitions and rewards, the agent is to choose actions that maximize its long-term accumulation of rewards. Solving such problems in which there is model uncertainty is the subject of the field of reinforcement learning and the focus of this part of the book. We will discuss several challenges in addressing model uncertainty. First, the agent must carefully balance the exploration of the environment with the exploitation of knowledge gained through experience. Second, rewards may be received long after the important decisions have been made, so credit for later rewards must be assigned to earlier decisions. Third, the agent must generalize from limited experience. We will review the theory and some of the key algorithms for addressing these challenges.
1.6.4 State Uncertainty

In this part, we extend uncertainty to include the state. Instead of observing the state exactly, we receive observations that have only a probabilistic relationship with the state. Such problems can be modeled as a partially observable Markov decision process (POMDP). A common approach to solving POMDPs involves inferring a belief distribution over the underlying state at the current time step and then applying a policy that maps beliefs to actions. This part begins by discussing how to update our belief distribution, given a past sequence of observations and actions. It then discusses a variety of exact and approximate methods for solving POMDPs.
1.6.5 Multiagent Systems

Up to this point, there has only been one agent making decisions within the environment. This part expands the previous four parts to multiple agents, discussing the challenges that arise from interaction uncertainty. We begin by discussing simple games, where a group of agents simultaneously each select an action. The result is an individual reward for each agent based on the combined joint action. The Markov game (MG) represents a generalization of both simple games to multiple states and the MDP to multiple agents. Consequently, the agents select actions that can stochastically change the state of a shared environment. Algorithms for MGs rely on reinforcement learning due to uncertainty about the policies of the other agents. A partially observable Markov game (POMG) introduces state uncertainty, further generalizing MGs and POMDPs, as agents now receive only noisy local observations. The decentralized partially observable Markov decision process (Dec-POMDP) focuses the POMG on a collaborative, multiagent team where there is a shared reward among the agents. This part of the book presents these four categories of problems and discusses exact and approximate algorithms that solve them.
Part I: Probabilistic Reasoning

Rational decision making requires reasoning about our uncertainty and objectives. Uncertainty arises from practical and theoretical limitations on our ability to predict future events. For example, predicting exactly how a human operator will respond to advice from a decision support system would require, among other things, a detailed model of how the human brain works. Even the paths of satellites can be difficult to predict. Although Newtonian physics permit highly precise predictions of satellite trajectories, spontaneous failures in attitude thrusters can result in large deviations from the nominal path, and even small imprecisions can compound over time. To achieve its objectives, a robust decision-making system must account for various sources of uncertainty in the current state of the world and future events. This part of the book begins by discussing how to represent uncertainty using probability distributions. Real-world problems require reasoning about distributions over many variables. We will discuss how to construct these models, use them to make inferences, and learn their parameters and structure from data. We then introduce the foundations of utility theory and show how it forms the basis for rational decision making under uncertainty. Utility theory can be incorporated into the probabilistic graphical models introduced earlier to form what are called decision networks. We focus on single-step decisions, reserving discussion of sequential decision problems for the next part of the book.
2 Representation

Computationally accounting for uncertainty requires a formal representation. This chapter discusses how to represent uncertainty. (A detailed discussion of a variety of approaches to representing uncertainty is provided by F. Cuzzolin, The Geometry of Uncertainty, Springer, 2021.) We begin by introducing the notion of degree of belief and show how a set of axioms results in our ability to use probability distributions to quantify our uncertainty. (For a more comprehensive elaboration, see E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, 2003.) We discuss several useful forms of distributions over both discrete and continuous variables. Because many important problems involve probability distributions over a large number of variables, we discuss a way to represent joint distributions efficiently that takes advantage of conditional independence between variables.
2.1 Degrees of Belief and Probability

In problems involving uncertainty, it is essential to be able to compare the plausibility of different statements. We would like to be able to represent, for example, that proposition A is more plausible than proposition B. If A represents "my actuator failed," and B represents "my sensor failed," then we would write A ≻ B. Using this basic relation, we can define several other relations:

A ≺ B if and only if B ≻ A    (2.1)
A ∼ B if and only if neither A ≻ B nor B ≻ A    (2.2)
A ⪰ B if and only if A ≻ B or A ∼ B    (2.3)
A ⪯ B if and only if B ≻ A or A ∼ B    (2.4)

We want to make certain assumptions about the relationships induced by the operators ≻, ∼, and ⪰. The assumption of universal comparability requires exactly one of the following to hold: A ≻ B, A ∼ B, or A ≺ B. The assumption of transitivity requires that if A ⪰ B and B ⪰ C, then A ⪰ C. Universal comparability
and transitivity assumptions lead to an ability to represent plausibility by a real-valued function P that has the following two properties (see discussion in E. T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, 2003):

P(A) > P(B) if and only if A ≻ B    (2.5)
P(A) = P(B) if and only if A ∼ B    (2.6)

If we make a set of additional assumptions about the form of P, then we can show that P must satisfy the basic axioms of probability (see appendix A.2). (The axiomatization of subjective probability is given by P. C. Fishburn, "The Axioms of Subjective Probability," Statistical Science, vol. 1, no. 3, pp. 335–345, 1986. A more recent axiomatization is contained in M. J. Dupré and F. J. Tipler, "New Axioms for Rigorous Bayesian Probability," Bayesian Analysis, vol. 4, no. 3, pp. 599–606, 2009.) If we are certain of A, then P(A) = 1. If we believe that A is impossible, then P(A) = 0. Uncertainty in the truth of A is represented by values between the two extrema. Hence, probability masses must lie between 0 and 1, with 0 ≤ P(A) ≤ 1.
2.2 Probability Distributions

A probability distribution assigns probabilities to different outcomes. (For an introduction to probability theory, see D. P. Bertsekas and J. N. Tsitsiklis, Introduction to Probability, Athena Scientific, 2002.) There are different ways to represent probability distributions depending on whether they involve discrete or continuous outcomes.
2.2.1 Discrete Probability Distributions

Figure 2.1. A probability mass function for a distribution over 1:6.
A discrete probability distribution is a distribution over a discrete set of values. We can represent such a distribution as a probability mass function, which assigns a probability to every possible assignment of its input variable to a value. For example, suppose that we have a variable X that can take on one of n values: 1, . . . , n, or, using colon notation, 1:n. (We will often use this colon notation for compactness. Other texts sometimes use the notation [1..n] for integer intervals from 1 to n. We will also use this colon notation to index into vectors and matrices. For example, x_{1:n} represents x_1, . . . , x_n. The colon notation is sometimes used in programming languages, such as Julia and MATLAB.) A distribution associated with X specifies the n probabilities of the various assignments of values to that variable, in particular P(X = 1), . . . , P(X = n). Figure 2.1 shows an example of a discrete distribution.

There are constraints on the probability masses associated with discrete distributions. The masses must sum to 1:

∑_{i=1}^{n} P(X = i) = 1    (2.7)

and 0 ≤ P(X = i) ≤ 1 for all i.

For notational convenience, we will use lowercase letters and superscripts as shorthand when discussing the assignment of values to variables. For example, P(x^3) is shorthand for P(X = 3). If X is a binary variable, it can take on the value of true or false. (Julia, like many other programming languages, similarly treats Boolean values as 0 and 1 in numerical operations.) We will use 0 to represent false and 1 to represent true. For example, we use P(x^0) to represent the probability that X is false.
The parameters of a distribution govern the probabilities associated with different assignments. For example, if we use X to represent the outcome of a roll of a six-sided die, then we would have P(x^1) = θ_1, . . . , P(x^6) = θ_6, with θ_{1:6} being the six parameters of the distribution. However, we need only five independent parameters to uniquely specify the distribution over the outcomes of the roll because we know that the distribution must sum to 1.
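As a small illustration (our own, not from the book's code base), we can represent such a discrete distribution in Julia as a vector of parameters θ and check the constraints from equation (2.7):

```julia
# A discrete distribution over 1:n represented by a parameter vector θ,
# where θ[i] corresponds to P(X = i). Here, a fair six-sided die.
θ = fill(1/6, 6)

@assert isapprox(sum(θ), 1.0)   # the masses must sum to 1
@assert all(0 .≤ θ .≤ 1)        # each mass must lie in [0, 1]

P(i) = θ[i]                      # probability of an assignment, e.g. P(3) is P(X = 3)

# Only n - 1 parameters are independent; the last is determined by the others.
θ₆ = 1 - sum(θ[1:end-1])
```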
2.2.2 Continuous Probability Distributions

Figure 2.2. Probability density functions are used to represent continuous probability distributions. If p(x) is a probability density, then p(x) dx, indicated by the area of the blue rectangle, is the probability that a sample from the random variable falls within the interval (x, x + dx) as dx → 0.
A continuous probability distribution is a distribution over a continuous set of values. Representing a distribution over a continuous variable is a little less straightforward than for a discrete variable. For instance, in many continuous distributions, the probability that a variable takes on a particular value is infinitesimally small. One way to represent a continuous probability distribution is to use a probability density function (see figure 2.2), represented with lowercase letters. If p(x) is a probability density function over X, then p(x) dx is the probability that X falls within the interval (x, x + dx) as dx → 0. Similar to how the probability masses associated with a discrete distribution must sum to 1, a probability density function p(x) must integrate to 1:

∫ p(x) dx = 1    (2.8)
2 0 2
0
0.5
1
x
p(x)
cdf
X
(x)
Figure2.3. Theprobabilitydensity
functionandcumulativedistribu-
tionfunctionforastandardGaus-
siandistribution.
Anotherwaytorepresentacontinuousdistributioniswithacumulativedistri-
butionfunction(seegure2.3),whichspeciestheprobabilitymassassociated
withvaluesbelowsomethreshold.Ifwehaveacumulativedistributionfunction
P
associatedwithvariable
X
,then
P(x)
representstheprobabilitymassassoci-
atedwith
X
takingonavaluelessthanorequalto
x
.Acumulativedistribution
functioncanbedenedintermsofaprobabilitydensityfunctionp asfollows:
cdf
X
(x) = P(X x) =
Z
x
p(x
) dx
(2.9)
Relatedtothecumulativedistributionfunctionisthequantilefunction,also
calledtheinversecumulativedistributionfunction(seegure2.4).Thevalueof
quantile
X
(α)
isthevalue
x
suchthat
P(X x) = α
.Inotherwords,thequantile
functionreturnstheminimumvalueof
x
whosecumulativedistributionvalueis
greaterthanorequaltoα.Ofcourse,wehave0 α 1.
0 0.5
1
2
0
2
α
quantile
X
(α)
Figure2.4.Thequantilefunction
forastandardGaussiandistribu-
tion.
Therearemanydierentparameterizedfamiliesofdistributions.Weoutline
severalinappendixB.Asimpledistributionfamilyistheuniformdistribution
©2022MassachusettsInstituteofTechnology,sharedunderaCreativeCommonsCC-BY-NC-NDlicense.
2022-05-0721:55:16-07:00,commentstobugs@algorithmsbook.com
www.dbooks.org
22chapter2.representation
U(a, b)
,whichassignsprobabilitydensityuniformlybetween
a
and
b
,andzero
elsewhere.Hence,theprobabilitydensityfunctionis
p(x) = 1/(b a)
for
x
in
theinterval
[a, b]
.Wecanuse
U(x | a, b)
torepresentthedensityat
x
.
8
Thesupport
8
Sometextsuseasemicolontosep-
aratetheparametersofthedistri-
bution.Forexample,onecanalso
writeU(x; a, b).
ofadistributionisthesetofvaluesthatareassignednonzerodensity.Inthecase
ofU(a, b),thesupportistheinterval[a, b].Seeexample2.1.
Example 2.1. An example of a uniform distribution with a lower bound of 0 and an upper bound of 10. The uniform distribution U(0, 10) assigns equal probability to all values in the range [0, 10] with a probability density function:

    U(x | 0, 10) = 1/10 if 0 ≤ x ≤ 10, and 0 otherwise    (2.10)

The probability that a random sample from this distribution is equal to the constant π is essentially zero. However, we can define nonzero probabilities for samples being within some interval, such as [3, 5]. For example, the probability that a sample lies between 3 and 5 given the distribution plotted in the example is:

    ∫₃⁵ U(x | 0, 10) dx = (5 − 3)/10 = 1/5    (2.11)

The support of this distribution is the interval [0, 10]. (The example includes a plot of the density U(x | 0, 10).)
Another common distribution for continuous variables is the Gaussian distribution (also called the normal distribution). The Gaussian distribution is parameterized by a mean µ and variance σ²:

    p(x) = N(x | µ, σ²)    (2.12)

Here, σ is the standard deviation, which is the square root of the variance. The variance is also commonly denoted by ν. We use N(µ, σ²) to represent a Gaussian distribution with parameters µ and σ², and N(x | µ, σ²) to represent the probability density at x, as given by

    N(x | µ, σ²) = (1/σ) φ((x − µ)/σ)    (2.13)

where φ is the standard normal density function:

    φ(x) = (1/√(2π)) exp(−x²/2)    (2.14)

Appendix B shows plots of Gaussian density functions with different parameters.
Although a Gaussian distribution is often convenient because it is defined by only two parameters and makes computation and derivation easy, it has some limitations. It assigns nonzero probability to large positive and negative values, which may not be appropriate for the quantity we are trying to model. For example, we might not want to assign nonzero probabilities for aircraft flying below the ground or at infeasible altitudes. We can use a truncated Gaussian distribution (see figure 2.5) to bound the support of possible values; that is, the range of values assigned nonzero probabilities. The density function is given by

    N(x | µ, σ², a, b) = (1/σ) φ((x − µ)/σ) / (Φ((b − µ)/σ) − Φ((a − µ)/σ))    (2.15)

when x is within the interval (a, b).

Figure 2.5. The probability density functions for a unit Gaussian distribution and the same distribution truncated between 1 and 2.

The function Φ is the standard normal cumulative distribution function, as given by

    Φ(x) = ∫_{−∞}^{x} φ(x′) dx′    (2.16)
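To make the truncation concrete, here is a minimal sketch using the Distributions.jl package (an assumption for this illustration; it is not the representation used elsewhere in this book). It compares the density of a standard Gaussian with the same distribution truncated to (1, 2), mirroring figure 2.5:

    using Distributions

    full = Normal(0.0, 1.0)            # N(x | µ = 0, σ² = 1)
    trunc = truncated(full, 1.0, 2.0)  # support restricted to (1, 2)

    pdf(full, 1.5)    # density of the full Gaussian at x = 1.5
    pdf(trunc, 1.5)   # larger, because the truncated density is renormalized
    pdf(trunc, 0.5)   # 0.0, since 0.5 lies outside the truncated support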
The Gaussian distribution is unimodal, meaning that there is a point in the distribution at which the density increases on one side and decreases on the other side. There are different ways to represent continuous distributions that are multimodal. One way is to use a mixture model, which is a mixture of multiple distributions. We mix together a collection of unimodal distributions to obtain a multimodal distribution. A Gaussian mixture model is a mixture model that is simply a weighted average of various Gaussian distributions. The parameters of a Gaussian mixture model include the parameters of the Gaussian distribution components µ_{1:n}, σ²_{1:n}, as well as their weights ρ_{1:n}. The density is given by

    p(x | µ_{1:n}, σ²_{1:n}, ρ_{1:n}) = ∑_{i=1}^{n} ρᵢ N(x | µᵢ, σᵢ²)    (2.17)

where the weights must sum to 1. Example 2.2 shows a Gaussian mixture model with two components.
Example 2.2. An example of a Gaussian mixture model. We can create a Gaussian mixture model with components µ₁ = 5, σ₁ = 2 and µ₂ = −5, σ₂ = 4, weighted according to ρ₁ = 0.6 and ρ₂ = 0.4. The example plots the density of the two components scaled by their weights, along with the mixture density.
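A mixture like the one in example 2.2 can be evaluated with a few lines of code. This sketch uses the MixtureModel type from Distributions.jl (an assumption for this illustration; it is not the book's own implementation), with the component parameters from the example:

    using Distributions

    # Note that Normal takes the standard deviation σ, not the variance σ².
    components = [Normal(5.0, 2.0), Normal(-5.0, 4.0)]
    weights = [0.6, 0.4]                 # ρ₁, ρ₂ must sum to 1
    gmm = MixtureModel(components, weights)

    pdf(gmm, 0.0)   # mixture density: ρ₁ N(0 | µ₁, σ₁²) + ρ₂ N(0 | µ₂, σ₂²)
    rand(gmm, 5)    # draw five samples from the mixture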
Another approach to representing multimodal continuous distributions is through discretization. For example, we can represent a distribution over a continuous variable as a piecewise-uniform density. The density is specified by the bin edges, and a probability mass is associated with each bin. Such a piecewise-uniform distribution is a type of mixture model where the components are uniform distributions.
2.3 Joint Distributions

A joint distribution is a probability distribution over multiple variables. A distribution over a single variable is called a univariate distribution, and a distribution over multiple variables is called a multivariate distribution. If we have a joint distribution over two discrete variables X and Y, then P(x, y) denotes the probability that both X = x and Y = y.

From a joint distribution, we can compute a marginal distribution of a variable or a set of variables by summing out all other variables using what is known as the law of total probability:⁹

    P(x) = ∑_y P(x, y)    (2.18)

This property is used throughout this book.

⁹ If our distribution is continuous, then we integrate out the other variables when marginalizing. For example: p(x) = ∫ p(x, y) dy.
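As a small illustration of equation (2.18), the sketch below marginalizes Y out of a joint table stored as a dictionary. The dictionary-of-tuples representation here is chosen only for this example and is simpler than the factor types introduced later in this chapter:

    # Joint distribution P(X, Y) over binary X and Y, keyed by (x, y).
    P_xy = Dict((0,0) => 0.3, (0,1) => 0.4, (1,0) => 0.2, (1,1) => 0.1)

    # Law of total probability: P(x) = Σ_y P(x, y).
    P_x = Dict(x => sum(p for ((xval, y), p) in P_xy if xval == x) for x in (0, 1))
    # P_x[0] == 0.7 and P_x[1] == 0.3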
Real-world decision making often requires reasoning about joint distributions involving many variables. Sometimes there are complex relationships between the variables that are important to represent. We may use different strategies to represent joint distributions depending on whether the variables involve discrete or continuous values.
2.3.1 Discrete Joint Distributions

If the variables are discrete, the joint distribution can be represented by a table like the one shown in table 2.1. That table lists all the possible assignments of values to three binary variables. Each variable can only be 0 or 1, resulting in 2³ = 8 possible assignments. As with other discrete distributions, the probabilities in the table must sum to 1. It follows that although there are eight entries in the table, only seven of them are independent. If θᵢ represents the probability in the ith row in the table, then we only need the parameters θ₁, . . . , θ₇ to represent the distribution because we know that θ₈ = 1 − (θ₁ + · · · + θ₇).

Table 2.1. Example of a joint distribution involving binary variables X, Y, and Z.

    X  Y  Z  P(X, Y, Z)
    0  0  0  0.08
    0  0  1  0.31
    0  1  0  0.09
    0  1  1  0.37
    1  0  0  0.01
    1  0  1  0.05
    1  1  0  0.02
    1  1  1  0.07

If we have n binary variables, then we need as many as 2ⁿ − 1 independent parameters to specify the joint distribution. This exponential growth in the number of parameters makes storing the distribution in memory difficult. In some cases, we can assume that our variables are independent, which means that the realization of one does not affect the probability distribution of the other. If X and Y are independent, which is sometimes written as X ⊥ Y, then we know that P(x, y) = P(x)P(y) for all x and y. Suppose we have binary variables X₁, . . . , Xₙ that are all independent of each other, resulting in P(x_{1:n}) = ∏ᵢ P(xᵢ). This factorization allows us to represent this joint distribution with only n independent parameters instead of the 2ⁿ − 1 required when we cannot assume independence (see table 2.2). Independence can result in an enormous savings in terms of representational complexity, but it is often a poor assumption.
Table 2.2. If we know the variables in table 2.1 are independent, we can represent P(x, y, z) using the product P(x)P(y)P(z). This representation requires only one parameter for each of the three univariate distributions.

    X  P(X)        Y  P(Y)        Z  P(Z)
    0  0.85        0  0.45        0  0.20
    1  0.15        1  0.55        1  0.80
We can represent joint distributions in terms of factors. A factor φ over a set of variables is a function from assignments of those variables to the real numbers. In order to represent a probability distribution, the real numbers in the factor must be nonnegative. A factor with nonnegative values can be normalized such that it represents a probability distribution. Algorithm 2.1 provides an implementation for discrete factors, and example 2.3 demonstrates how they work.
struct Variable
    name::Symbol
    r::Int # number of possible values
end

const Assignment = Dict{Symbol,Int}
const FactorTable = Dict{Assignment,Float64}

struct Factor
    vars::Vector{Variable}
    table::FactorTable
end

variablenames(ϕ::Factor) = [var.name for var in ϕ.vars]

select(a::Assignment, varnames::Vector{Symbol}) =
    Assignment(n => a[n] for n in varnames)

function assignments(vars::AbstractVector{Variable})
    names = [var.name for var in vars]
    return vec([Assignment(n => v for (n, v) in zip(names, values))
                for values in product((1:v.r for v in vars)...)])
end

function normalize!(ϕ::Factor)
    z = sum(p for (a, p) in ϕ.table)
    for (a, p) in ϕ.table
        ϕ.table[a] = p / z
    end
    return ϕ
end

Algorithm 2.1. Types and functions relevant to working with factors over a set of discrete variables. A variable is given a name (represented as a symbol) and may take on an integer from 1 to m. An assignment is a mapping from variable names to values represented as integers. A factor is defined by a factor table, which assigns values to different assignments involving a set of variables and is a mapping from assignments to real values. This mapping is represented by a dictionary. Any assignments not contained in the dictionary are set to 0. Also included in this algorithm block are some utility functions for returning the variable names associated with a factor, selecting a subset of an assignment, enumerating possible assignments, and normalizing factors. As discussed in appendix G.3.3, product produces the Cartesian product of a set of collections and is imported from Base.Iterators.

We can instantiate the table from table 2.1 using the FactorTable type with the following code:

# requires convenience functions from appendix G.5
X = Variable(:x, 2)
Y = Variable(:y, 2)
Z = Variable(:z, 2)
ϕ = Factor([X, Y, Z], FactorTable(
    (x=1, y=1, z=1) => 0.08, (x=1, y=1, z=2) => 0.31,
    (x=1, y=2, z=1) => 0.09, (x=1, y=2, z=2) => 0.37,
    (x=2, y=1, z=1) => 0.01, (x=2, y=1, z=2) => 0.05,
    (x=2, y=2, z=1) => 0.02, (x=2, y=2, z=2) => 0.07,
))

Example 2.3. Constructing a discrete factor. The construction of the factor table using named tuples takes advantage of the utility functions defined in appendix G.5.

Another approach to reduce the storage required to represent joint distributions with repeated values is to use a decision tree. A decision tree involving three discrete variables is shown in example 2.4. Although the savings in this example in terms of the number of parameters may not be significant, it can become quite substantial when there are many variables and many repeated values.
Example 2.4. A decision tree can be a more efficient representation of a joint distribution than a table. Suppose we have the following table representing a joint probability distribution. We can use the decision tree to the right of it to more compactly represent the values in the table. Red arrows are followed when a variable is 0, and blue arrows are followed when a variable is 1. Instead of storing eight probabilities, we store only five, along with a representation of the tree.

    X  Y  Z  P(X, Y, Z)
    0  0  0  0.01
    0  0  1  0.01
    0  1  0  0.50
    0  1  1  0.38
    1  0  0  0.02
    1  0  1  0.03
    1  1  0  0.02
    1  1  1  0.03

(The tree branches first on X. When X = 0, it branches on Y: if Y = 0 the probability is 0.01; if Y = 1 it branches on Z, giving 0.50 and 0.38. When X = 1, it branches directly on Z, giving 0.02 and 0.03.)
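A decision tree like the one in example 2.4 can be implemented with nested conditionals. The sketch below is a hand-coded lookup for this particular table, written only for illustration; it is not how decision trees are represented elsewhere in this book:

    # Probability lookup for the joint distribution in example 2.4,
    # structured like the decision tree: branch on X, then Y, then Z.
    function p_xyz(x::Int, y::Int, z::Int)
        if x == 0
            return y == 0 ? 0.01 : (z == 0 ? 0.50 : 0.38)
        else
            return z == 0 ? 0.02 : 0.03   # when X = 1, Y does not matter
        end
    end

    p_xyz(0, 1, 1)  # 0.38, matching the table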
2.3.2 Continuous Joint Distributions

We can also define joint distributions over continuous variables. A rather simple distribution is the multivariate uniform distribution, which assigns a constant probability density everywhere there is support. We can use U(a, b) to represent a uniform distribution over a box, which is a Cartesian product of intervals, with the ith interval being [aᵢ, bᵢ]. This family of uniform distributions is a special type of multivariate product distribution, which is a distribution defined in terms of the product of univariate distributions. In this case,

    U(x | a, b) = ∏ᵢ U(xᵢ | aᵢ, bᵢ)    (2.19)
We can create a mixture model from a weighted collection of multivariate uniform distributions, just as we can with univariate distributions. If we have a joint distribution over n variables and k mixture components, we need to define k(2n + 1) − 1 independent parameters. For each of the k components, we need to define the upper and lower bounds for each of the variables as well as their weights. We can subtract 1 because the weights must sum to 1. Figure 2.6 shows an example that can be represented by five components.

Figure 2.6. A density function for a mixture of multivariate uniform distributions (plotted over x₁ and x₂).
It is also common to represent piecewise constant density functions by discretizing each of the variables independently. The discretization is represented by a set of bin edges for each variable. These bin edges define a grid over the variables. We then associate a constant probability density with each grid cell. The bin edges do not have to be uniformly separated. In some cases, it may be desirable to have increased resolution around certain values. Different variables might have different bin edges associated with them. If there are n variables and m bins for each variable, then we need mⁿ − 1 independent parameters to define the distribution, in addition to the values that define the bin edges.

In some cases, it may be more memory efficient to represent a continuous joint distribution as a decision tree in a manner similar to what we discussed for discrete joint distributions. The internal nodes compare variables against thresholds, and the leaf nodes are density values. Figure 2.7 shows a decision tree that represents the density function in figure 2.6.
Figure 2.7. An example of a decision tree that represents a piecewise constant joint probability density defined over x₁ and x₂ over the interval [−10, 10]². The internal nodes compare x₁ and x₂ against thresholds, and the leaves hold constant densities such as 0.0002, 0.0027, and 0.003.
Another useful distribution is the multivariate Gaussian distribution with the density function

    N(x | µ, Σ) = 1/((2π)^(n/2) |Σ|^(1/2)) exp(−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ))    (2.20)

where x is in ℝⁿ, µ is the mean vector, and Σ is the covariance matrix. The density function given here requires that Σ be positive definite.¹⁰ The number of independent parameters is equal to n + (n + 1)n/2, the number of components in µ added to the number of components in the upper triangle of matrix Σ.¹¹ Appendix B shows plots of different multivariate Gaussian density functions. We can also define multivariate Gaussian mixture models. Figure 2.8 shows an example of one with three components.

¹⁰ This definition is reviewed in appendix A.5.
¹¹ If we know the parameters in the upper triangle of Σ, we know the parameters in the lower triangle as well, because Σ is symmetric.
If we have a multivariate Gaussian with all the variables independent, then the covariance matrix Σ is diagonal with only n independent parameters. In fact, we can write the density function as a product of univariate Gaussian densities:

    N(x | µ, Σ) = ∏ᵢ N(xᵢ | µᵢ, Σᵢᵢ)    (2.21)

Figure 2.8. Multivariate Gaussian mixture model with three components, with parameters µ = [0, 0], Σ = [1 0; 0 1]; µ = [0, 5], Σ = [3 0; 0 3]; and µ = [3, 3], Σ = [4 2; 2 4]. The components are mixed together with the weights 0.1, 0.5, and 0.4, respectively.
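For a quick check of equation (2.21), the sketch below evaluates a diagonal-covariance multivariate Gaussian both directly and as a product of univariate densities. It relies on the Distributions.jl package, which is an assumption for this illustration:

    using Distributions

    μ = [1.0, -2.0]
    Σ = [4.0 0.0; 0.0 9.0]          # diagonal covariance: independent components
    x = [0.5, -1.0]

    joint = pdf(MvNormal(μ, Σ), x)  # N(x | µ, Σ)
    product = pdf(Normal(μ[1], 2.0), x[1]) * pdf(Normal(μ[2], 3.0), x[2])
    # joint ≈ product, since Σ is diagonal (σ₁ = 2 and σ₂ = 3)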
2.4 Conditional Distributions

The previous section introduced the idea of independence, which can help reduce the number of parameters used to define a joint distribution. However, as was mentioned, independence can be too strong of an assumption. This section will introduce the idea of conditional independence, which can help reduce the number of independent parameters without making assumptions that are as strong. Before discussing conditional independence, we will first introduce the notion of a conditional distribution, which is a distribution over a variable given the value of one or more other ones.

The definition of conditional probability states that

    P(x | y) = P(x, y) / P(y)    (2.22)

where P(x | y) is read as "probability of x given y." In some contexts, it is common to refer to y as evidence.

Since a conditional probability distribution is a probability distribution over one or more variables given some evidence, we know that

    ∑_x P(x | y) = 1    (2.23)

for a discrete X. If X is continuous, it integrates to 1.

We can incorporate the definition of conditional probability into equation (2.18) to obtain a slightly different form of the law of total probability:

    P(x) = ∑_y P(x | y)P(y)    (2.24)

for a discrete distribution.

Another useful relationship that follows from the definition of conditional probability is Bayes' rule:¹²

    P(x | y) = P(y | x)P(x) / P(y)    (2.25)

¹² Named for the English statistician and Presbyterian minister Thomas Bayes (c. 1701–1761), who provided a formulation of this theorem. A history is provided by S. B. McGrayne, The Theory That Would Not Die. Yale University Press, 2011.

If we have a representation of a conditional distribution P(y | x), we can apply Bayes' rule to swap y and x to obtain the conditional distribution P(x | y).
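As a small numerical illustration of Bayes' rule (not taken from the text), suppose a binary condition X has prior P(x¹) = 0.01 and a test Y detects it with P(y¹ | x¹) = 0.9 and false-alarm rate P(y¹ | x⁰) = 0.05. The posterior after a positive test follows directly from equations (2.24) and (2.25):

    prior = 0.01          # P(x¹)
    p_pos_given_x1 = 0.9  # P(y¹ | x¹)
    p_pos_given_x0 = 0.05 # P(y¹ | x⁰)

    # Law of total probability: P(y¹) = P(y¹ | x¹)P(x¹) + P(y¹ | x⁰)P(x⁰)
    p_pos = p_pos_given_x1 * prior + p_pos_given_x0 * (1 - prior)

    # Bayes' rule: P(x¹ | y¹) = P(y¹ | x¹)P(x¹) / P(y¹)
    posterior = p_pos_given_x1 * prior / p_pos   # ≈ 0.154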
We will now discuss a variety of ways to represent conditional probability distributions over discrete and continuous variables.
2.4.1 Discrete Conditional Models

A conditional probability distribution over discrete variables can be represented using a table. In fact, we can use the same discrete factor representation that we used in section 2.3.1 for joint distributions. Table 2.3 shows an example of a table representing P(X | Y, Z) with all binary variables. In contrast with a joint table (e.g., table 2.1), the column containing the probabilities need not sum to 1. However, if we sum over the probabilities that are consistent with what we are conditioning on, we must get 1. For example, conditioning on y⁰ and z⁰ (the evidence), we have

    P(x⁰ | y⁰, z⁰) + P(x¹ | y⁰, z⁰) = 0.08 + 0.92 = 1    (2.26)

Table 2.3. An example of a conditional distribution involving the binary variables X, Y, and Z.

    X  Y  Z  P(X | Y, Z)
    0  0  0  0.08
    0  0  1  0.15
    0  1  0  0.05
    0  1  1  0.10
    1  0  0  0.92
    1  0  1  0.85
    1  1  0  0.95
    1  1  1  0.90

Conditional probability tables can become quite large. If we were to create a table like table 2.3, in which all variables can take on m values and we are conditioning on n variables, there would be mⁿ⁺¹ rows. However, since the m values of the variable we are not conditioning on must sum to 1, there are only (m − 1)mⁿ independent parameters. There is still an exponential growth in the number of variables on which we condition. When there are many repeated values in the conditional probability table, a decision tree (introduced in section 2.3.1) may be a more efficient representation.
2.4.2 Conditional Gaussian Models

A conditional Gaussian model can be used to represent a distribution over a continuous variable given one or more discrete variables. For example, if we have a continuous variable X and a discrete variable Y with values 1:n, we can define a conditional Gaussian model as follows:¹³

    p(x | y) = N(x | µ₁, σ₁²) if y¹
               ⋮
               N(x | µₙ, σₙ²) if yⁿ    (2.27)

with parameter vector θ = [µ_{1:n}, σ_{1:n}]. All 2n of those parameters can be varied independently. If we want to condition on multiple discrete variables, we just need to add more cases and associated parameters.

¹³ This definition is for a mixture of univariate Gaussians, but the concept can be easily generalized to a mixture of multidimensional Gaussians.
2.4.3 Linear Gaussian Models

The linear Gaussian model of P(X | Y) represents the distribution over a continuous variable X as a Gaussian distribution with the mean being a linear function of the value of the continuous variable Y. The conditional density function is

    p(x | y) = N(x | my + b, σ²)    (2.28)

with parameters θ = [m, b, σ]. The mean is a linear function of y defined by parameters m and b. The variance is constant. Figure 2.9 shows an example.

Figure 2.9. A linear Gaussian model with p(x | y) = N(x | 2y + 1, 10²).
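A linear Gaussian model is straightforward to evaluate and sample from. The following sketch uses Distributions.jl (an assumption for this illustration) and the parameters from figure 2.9 to implement equation (2.28):

    using Distributions

    m, b, σ = 2.0, 1.0, 10.0
    conditional(y) = Normal(m*y + b, σ)   # p(x | y) = N(x | my + b, σ²)

    pdf(conditional(3.0), 0.0)   # density of x = 0 given y = 3
    rand(conditional(3.0))       # sample x given y = 3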
2.4.4 Conditional Linear Gaussian Models

The conditional linear Gaussian model combines the ideas of conditional Gaussian and linear Gaussian models to be able to condition a continuous variable on both discrete and continuous variables. Suppose that we want to represent p(X | Y, Z), where X and Y are continuous and Z is discrete with values 1:n. The conditional density function is then

    p(x | y, z) = N(x | m₁y + b₁, σ₁²) if z¹
                  ⋮
                  N(x | mₙy + bₙ, σₙ²) if zⁿ    (2.29)

Here, the parameter vector θ = [m_{1:n}, b_{1:n}, σ_{1:n}] has 3n components.
2.4.5 Sigmoid Models

We can use a sigmoid¹⁴ model to represent a distribution over a binary variable conditioned on a continuous variable. For example, we may want to represent P(x¹ | y), where x is binary and y is continuous. Of course, we could just set a threshold θ and say that P(x¹ | y) = 0 if y < θ, and P(x¹ | y) = 1 otherwise. However, in many applications, we may not want to have such a hard threshold that results in assigning zero probability to x¹ for certain values of y.

¹⁴ A sigmoid is an S-shaped curve. There are different ways to define such a curve mathematically, but we will focus on the logit model.

Instead of a hard threshold, we could use a soft threshold, which assigns low probabilities when below a threshold and high probabilities when above a threshold. One way to represent a soft threshold is to use a logit model, which produces a sigmoid curve:

    P(x¹ | y) = 1 / (1 + exp(−2(y − θ₁)/θ₂))    (2.30)

The parameter θ₁ governs the location of the threshold, and θ₂ controls the "softness" or spread of the probabilities. Figure 2.10 shows a plot of P(x¹ | y) with a logit model.

Figure 2.10. The logit model with θ₁ = 0 and different values for θ₂ (1, 2, 3, and 10).
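Equation (2.30) is a one-liner in code. The sketch below evaluates the curves in figure 2.10 at a few points; the function name and argument values are chosen only for this illustration:

    # Logit (soft threshold) model: P(x¹ | y) with location θ₁ and spread θ₂.
    logit(y, θ1, θ2) = 1 / (1 + exp(-2 * (y - θ1) / θ2))

    logit(0.0, 0.0, 1.0)   # 0.5 at the threshold, regardless of θ₂
    logit(2.0, 0.0, 1.0)   # ≈ 0.982, a sharp transition for small θ₂
    logit(2.0, 0.0, 10.0)  # ≈ 0.599, a much softer transition for large θ₂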
2.4.6 Deterministic Variables

Some problems may involve a deterministic variable, whose value is fixed given evidence. In other words, we assign probability 1 to a value that is a deterministic function of its evidence. Using a conditional probability table to represent a discrete deterministic variable is possible, but it is wasteful. A single variable instantiation will have probability 1 for each parental instantiation, and the remaining entries will be 0. Our implementation can take advantage of this sparsity for a more compact representation. Algorithms in this text using discrete factors treat any assignments missing from the factor table as having value 0, making it so that we have to store only the assignments that have nonzero probability.
2.5 Bayesian Networks

A Bayesian network can be used to represent a joint probability distribution.¹⁵ The structure of a Bayesian network is defined by a directed acyclic graph consisting of nodes and directed edges.¹⁶ Each node corresponds to a variable. Directed edges connect pairs of nodes, with cycles in the graph being prohibited. The directed edges indicate direct probabilistic relationships.¹⁷ Associated with each node Xᵢ is a conditional distribution P(Xᵢ | Pa(Xᵢ)), where Pa(Xᵢ) represents the parents of Xᵢ in the graph. Algorithm 2.2 provides an implementation of a Bayesian network data structure. Example 2.5 illustrates the application of Bayesian networks to a satellite-monitoring problem.

¹⁵ For an in-depth treatment of Bayesian networks and other forms of probabilistic graphical models, see D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
¹⁶ Appendix A.16 reviews common graph terminology.
¹⁷ In causal networks, the direction of the edges indicates causal relationships between variables. However, causality is not required in general Bayesian networks. J. Pearl, Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge University Press, 2009.
struct BayesianNetwork
    vars::Vector{Variable}
    factors::Vector{Factor}
    graph::SimpleDiGraph{Int64}
end

Algorithm 2.2. A discrete Bayesian network representation in terms of a set of variables, factors, and a graph. The graph data structure is provided by Graphs.jl.
The chain rule for Bayesian networks specifies how to construct a joint distribution from the local conditional probability distributions. Suppose that we have the variables X_{1:n} and want to compute the probability of a particular assignment of all these variables to values, P(x_{1:n}). The chain rule says

    P(x_{1:n}) = ∏_{i=1}^{n} P(xᵢ | pa(xᵢ))    (2.31)

where pa(xᵢ) is the particular assignment of the parents of Xᵢ to their values. Algorithm 2.3 provides an implementation for Bayesian networks with conditional probability distributions represented as discrete factors.
function probability(bn::BayesianNetwork, assignment)
    subassignment(ϕ) = select(assignment, variablenames(ϕ))
    probability(ϕ) = get(ϕ.table, subassignment(ϕ), 0.0)
    return prod(probability(ϕ) for ϕ in bn.factors)
end

Algorithm 2.3. A function for evaluating the probability of an assignment given a Bayesian network bn. For example, if bn is as defined in example 2.5, then

    a = (b=1, s=1, e=1, d=2, c=1)
    probability(bn, Assignment(a))

returns 0.034228655999999996.
In the satellite example, suppose we want to compute the probability that nothing is wrong; that is, P(b⁰, s⁰, e⁰, d⁰, c⁰). From the chain rule,

    P(b⁰, s⁰, e⁰, d⁰, c⁰) = P(b⁰)P(s⁰)P(e⁰ | b⁰, s⁰)P(d⁰ | e⁰)P(c⁰ | e⁰)    (2.32)

If we had fully specified a joint distribution over the five variables B, S, E, D, and C, then we would have needed 2⁵ − 1 = 31 independent parameters. The structure assumed in our Bayesian network allows us to specify the joint distribution using only 1 + 1 + 4 + 2 + 2 = 10 independent parameters. The difference between 10 and 31 does not represent an especially significant savings in the number of parameters, but the savings can become enormous in larger Bayesian networks. The power of Bayesian networks comes from their ability to reduce the number of parameters required to specify a joint probability distribution.

Example 2.5. A Bayesian network representing a satellite-monitoring problem. The structure of the network is a directed acyclic graph in which E has parents B and S, and D and C each have parent E. Associated with each node is a conditional probability distribution: P(B), P(S), P(E | B, S), P(D | E), and P(C | E), where

    B  battery failure
    S  solar panel failure
    E  electrical system failure
    D  trajectory deviation
    C  communication loss

This Bayesian network for the satellite-monitoring problem involves five binary variables. Fortunately, battery failure and solar panel failures are both rare, although solar panel failures are somewhat more likely than battery failures. Failures in either can lead to electrical system failure. There may be causes of electrical system failure other than battery or solar panel failure, such as a problem with the power management unit. An electrical system failure can result in trajectory deviation, which can be observed from the Earth by telescope, as well as a communication loss that interrupts the transmission of telemetry and mission data down to various ground stations. Other anomalies not involving the electrical system can result in trajectory deviation and communication loss.

Associated with each of the five variables are five conditional probability distributions. Because B and S have no parents, we only need to specify P(B) and P(S). The code here creates a Bayesian network structure with example values for the elements of the associated factor tables. The tuples in the factor tables index into the domains of the variables, which is {0, 1} for all the variables. For example, (e=2, b=1, s=1) corresponds to (e¹, b⁰, s⁰).

# requires convenience functions from appendix G.5
B = Variable(:b, 2); S = Variable(:s, 2)
E = Variable(:e, 2)
D = Variable(:d, 2); C = Variable(:c, 2)
vars = [B, S, E, D, C]
factors = [
    Factor([B], FactorTable((b=1,) => 0.99, (b=2,) => 0.01)),
    Factor([S], FactorTable((s=1,) => 0.98, (s=2,) => 0.02)),
    Factor([E,B,S], FactorTable(
        (e=1,b=1,s=1) => 0.90, (e=1,b=1,s=2) => 0.04,
        (e=1,b=2,s=1) => 0.05, (e=1,b=2,s=2) => 0.01,
        (e=2,b=1,s=1) => 0.10, (e=2,b=1,s=2) => 0.96,
        (e=2,b=2,s=1) => 0.95, (e=2,b=2,s=2) => 0.99)),
    Factor([D,E], FactorTable(
        (d=1,e=1) => 0.96, (d=1,e=2) => 0.03,
        (d=2,e=1) => 0.04, (d=2,e=2) => 0.97)),
    Factor([C,E], FactorTable(
        (c=1,e=1) => 0.98, (c=1,e=2) => 0.01,
        (c=2,e=1) => 0.02, (c=2,e=2) => 0.99)),
]
graph = SimpleDiGraph(5)
add_edge!(graph, 1, 3); add_edge!(graph, 2, 3)
add_edge!(graph, 3, 4); add_edge!(graph, 3, 5)
bn = BayesianNetwork(vars, factors, graph)
2.6 Conditional Independence

The reason that a Bayesian network can represent a joint distribution with fewer independent parameters than would normally be required is the conditional independence assumptions encoded in its graphical structure.¹⁸ Conditional independence is a generalization of the notion of independence introduced in section 2.3.1. Variables X and Y are conditionally independent given Z if and only if P(X, Y | Z) = P(X | Z)P(Y | Z). The assertion that X and Y are conditionally independent given Z is written as (X ⊥ Y | Z). It is possible to show from this definition that (X ⊥ Y | Z) if and only if P(X | Z) = P(X | Y, Z). Given Z, information about Y provides no additional information about X, and vice versa. Example 2.6 shows an instance of this.

¹⁸ If the conditional independence assumptions made by the Bayesian network are invalid, then we run the risk of not properly modeling the joint distribution, as will be discussed in chapter 5.
Example 2.6. Conditional independence in the satellite-tracking problem. Suppose the presence of satellite trajectory deviation (D) is conditionally independent of whether we have a communication loss (C) given knowledge of whether we have an electrical system failure (E). We would write this (D ⊥ C | E). If we know that we have an electrical system failure, then the fact that we observe a loss of communication has no impact on our belief that there is a trajectory deviation. We may have an elevated expectation that there is a trajectory deviation, but that is only because we know that an electrical system failure has occurred.
We can use a set of rules to determine whether the structure of a Bayesian network implies that two variables must be conditionally independent given a set of other evidence variables.¹⁹ Suppose that we want to check whether (A ⊥ B | C) is implied by the network structure, where C is a set of evidence variables. We have to check all possible undirected paths from A to B for what is called d-separation. A path between A and B is d-separated by C if any of the following is true:

1. The path contains a chain of nodes, X → Y → Z, such that Y is in C.
2. The path contains a fork, X ← Y → Z, such that Y is in C.
3. The path contains an inverted fork (also called a v-structure), X → Y ← Z, such that Y is not in C and no descendant of Y is in C. Example 2.7 provides some intuition for this rule.

We say that A and B are d-separated by C if all the paths between A and B are d-separated by C. This d-separation implies that (A ⊥ B | C).²⁰ Example 2.8 demonstrates this process for checking whether a graph implies a particular conditional independence assumption.

¹⁹ Even if the structure of a network does not imply conditional independence, there may still be conditional independence due to the choice of conditional probability distributions. See exercise 2.10.
²⁰ An algorithm for efficiently determining d-separation is a bit complicated. See algorithm 3.1 in D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
Example 2.7. Intuition behind conditional independence assumptions implied (and not implied) in chains, forks, and inverted forks. If we have X → Y → Z (chain) or X ← Y → Z (fork) with evidence at Y, then X and Z are conditionally independent, meaning that P(X | Y, Z) = P(X | Y). Interestingly, if the directions of the arrows were slightly different, with X → Y ← Z (inverted fork), then X and Z may no longer be conditionally independent given Y. In other words, it may be the case that P(B | E) ≠ P(B | S, E). To provide some intuition, consider the inverted fork path from battery failure B to solar panel failure S via electrical system failure E. Suppose we know that we have an electrical failure. If we know that we do not have a battery failure, then we are more inclined to believe that we have a solar panel failure because it is an alternative cause of the electrical failure. Conversely, if we found out that we do have a battery failure, then our belief that we have a solar panel failure decreases. This effect is referred to as explaining away. Observing a solar panel failure explains away the cause of the electrical system failure.
Sometimes the term Markov blanket²¹ of node X is used to refer to the minimal set of nodes that, if their values were known, make X conditionally independent of all other nodes. A Markov blanket of a particular node turns out to consist of its parents, its children, and the other parents of its children.

²¹ Named after the Russian mathematician Andrey Andreyevich Markov (1856–1922). J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
Example 2.8. Conditional independence assumptions implied by the graphical structure of a network with nodes A, B, C, D, E, and F. Suppose that we want to determine whether the network shown in the margin implies that (D ⊥ B | F). There are two undirected paths from D to B. We need to check both paths for d-separation.

The path D ← A → C ← B involves the fork D ← A → C, followed by an inverted fork, A → C ← B. There is no evidence at A, so there is no d-separation from the fork. Since F is a descendant of C, there is no d-separation along the inverted fork. Hence, there is no d-separation along this path.

The second path, D → E ← C ← B, involves the inverted fork D → E ← C and a chain, E ← C ← B. Since F is a descendant of E, there is no d-separation along the inverted fork. Because there is no d-separation along the chain part of this path either, there is no d-separation along this path from D to B.

For D and B to be conditionally independent given F, there must be d-separation along all undirected paths from D to B. In this case, neither of the two paths has d-separation. Hence, conditional independence is not implied by the network structure.

2.7 Summary

• Representing uncertainty as a probability distribution is motivated by a set of axioms related to the comparison of the plausibility of different statements.
• There are many families of both discrete and continuous probability distributions.
• Continuous probability distributions can be represented by density functions.
• Probability distribution families can be combined in mixtures to create more flexible distributions.
• Joint distributions are distributions over multiple variables.
• Conditional distributions are distributions over one or more variables given the values of evidence variables.
• A Bayesian network is defined by a graphical structure and a set of conditional distributions.
• Depending on the structure of the Bayesian network, we can represent joint distributions with fewer parameters due to conditional independence assumptions.
2.8 Exercises

Exercise 2.1. Consider a continuous random variable X that follows the exponential distribution parameterized by λ with density p(x | λ) = λ exp(−λx) with nonnegative support. Compute the cumulative distribution function of X.

Solution: We start with the definition of the cumulative distribution function. Since the support of the distribution is lower-bounded by x = 0, there is no probability mass in the interval (−∞, 0), allowing us to adjust the lower bound of the integral to 0. After computing the integral, we obtain cdf_X(x):

    cdf_X(x) = ∫_{−∞}^{x} p(x′) dx′
             = ∫_{0}^{x} λ e^{−λx′} dx′
             = [−e^{−λx′}]_{0}^{x}
             = 1 − e^{−λx}
Exercise 2.2. For the density function in figure 2.6, what are the five components of the mixture? (There are multiple valid solutions.)

Solution: One solution is U([−10, −10], [−5, 10]), U([−5, 0], [0, 10]), U([−5, −10], [0, 0]), U([0, −10], [10, −5]), and U([0, −5], [10, 10]).
Exercise 2.3. Given the following table representation of P(X, Y, Z), generate an equivalent compact decision tree representation:

    X  Y  Z  P(X, Y, Z)
    0  0  0  0.13
    0  0  1  0.02
    0  1  0  0.05
    0  1  1  0.02
    1  0  0  0.13
    1  0  1  0.01
    1  1  0  0.05
    1  1  1  0.17
    2  0  0  0.13
    2  0  1  0.12
    2  1  0  0.05
    2  1  1  0.12
Solution: We start with the most common probabilities: 0.13, which occurs when Z = 0 and Y = 0, and 0.05, which occurs when Z = 0 and Y = 1. We choose to make Z the root of our decision tree, and when Z = 0, we continue to a Y node. Based on the value of Y, we branch to either 0.13 or 0.05. Next, we continue with cases when Z = 1. The most common probabilities are 0.02, which occurs when Z = 1 and X = 0, and 0.12, which occurs when Z = 1 and X = 2. So, when Z = 1, we choose to continue to an X node. Based on whether X is 0, 1, or 2, we continue to 0.02, a Y node, or 0.12, respectively. Finally, based on the value of Y, we branch to either 0.01 or 0.17. (The resulting tree has Z at the root; under Z = 0, a Y node with leaves 0.13 and 0.05; under Z = 1, an X node with leaves 0.02 and 0.12 and a Y node with leaves 0.01 and 0.17.)
Exercise 2.4. Suppose that we want to specify a multivariate Gaussian mixture model with three components defined over four variables. We require that two of the three Gaussian distributions assume independence between the four variables, while the other Gaussian distribution is defined without any independence assumptions. How many independent parameters are required to specify this mixture model?

Solution: For a Gaussian distribution over four variables (n = 4) with independence assumptions, we need to specify n + n = 2n = 8 independent parameters; there are four parameters for the mean vector and four parameters for the covariance matrix (which is equivalent to the mean and variance parameters of four independent univariate Gaussian distributions). For a Gaussian distribution over four variables without independence assumptions, we need to specify n + n(n + 1)/2 = 14 independent parameters; there are 4 parameters for the mean vector and 10 parameters for the covariance matrix. In addition, for our three mixture components (k = 3), we need to specify k − 1 = 2 independent parameters for the weights. Thus, we need 2(8) + 1(14) + 2 = 32 independent parameters to specify this mixture distribution.
Exercise 2.5. We have three independent variables X_{1:3} defined by piecewise-constant densities with 4, 7, and 3 bin edges, respectively. How many independent parameters are required to specify their joint distribution?
Solution: If we have a piecewise-constant density with m bin edges, then there are m − 1 bins and m − 2 independent parameters. For this problem, there will be (4 − 2) + (7 − 2) + (3 − 2) = 8 independent parameters.
Exercise 2.6. Suppose that we have four continuous random variables, X₁, X₂, Y₁, and Y₂, and we want to construct a linear Gaussian model of X = X_{1:2} given Y = Y_{1:2}; that is, p(X | Y). How many independent parameters are required for this model?

Solution: In this case, our mean vector for the Gaussian distribution is two-dimensional and requires four independent parameters for the transformation matrix M and two independent parameters for the bias vector b. We also require three independent parameters for the covariance matrix Σ. In total, we need 4 + 2 + 3 = 9 independent parameters to specify this model:

    p(x | y) = N(x | My + b, Σ)
Exercise 2.7. Given the following Bayesian network, in which each node can take on one of four values, how many independent parameters are there? What is the percent reduction in the number of independent parameters required when using the following Bayesian network compared to using a full joint probability table? (The network, shown in the original figure, has six nodes, A through F.)
Solution: The number of independent parameters for each node is equal to (k − 1)kᵐ, where k is the number of values that the node can take on and m is the number of parents that the node has. Variable A has 3, B has 12, C has 48, D has 3, E has 12, and F has 48 independent parameters. There are 126 total independent parameters for this Bayesian network.

The number of independent parameters required to specify a joint probability table over n variables that can take on k values is equal to kⁿ − 1. Therefore, specifying a joint probability table would require 4⁶ − 1 = 4096 − 1 = 4095 independent parameters.

The percent reduction in the number of independent parameters required is (4095 − 126)/4095 ≈ 96.9 %.
Exercise 2.8. Given the following Bayesian network, is A d-separated from E, given C? (The network, shown in the original figure, has nodes A, B, C, D, and E.)

Solution: There are two paths from A to E: one through D and one through C. There is d-separation along the path through C, but not along the path through D. Hence, A is not d-separated from E given C.
Exercise 2.9. Given the following Bayesian network, determine the Markov blanket of B. (The network, shown in the original figure, has nodes A through H.)

Solution: Paths from B to A can only be d-separated given A. Paths from B to D can only be d-separated given D. Paths from B to E, and simultaneously F, G, and H, can be efficiently d-separated given E. Paths from B to C are naturally d-separated due to a v-structure; however, since E must be contained in our Markov blanket, paths from B to C given E can only be d-separated given C. So, the Markov blanket of B is {A, C, D, E}.
Exercise 2.10. In a Bayesian network with structure A → B, is it possible for A to be independent of B?

Solution: There is a direct arrow from A to B, which indicates that independence is not implied. However, this does not mean that they are not independent. Whether A and B are independent depends on the choice of conditional probability tables. We can choose the tables so that there is independence. For example, suppose that both variables are binary and P(a) = 0.5 is uniform and P(b | a) = 0.5. Clearly, P(A)P(B | A) = P(A)P(B), which means they are independent.
3 Inference

The previous chapter explained how to represent probability distributions. This chapter will show how to use these probabilistic representations for inference, which involves determining the distribution over one or more unobserved variables given the values associated with a set of observed variables. It begins by introducing exact inference methods. Because exact inference can be computationally intractable depending on the structure of the network, we will also discuss several algorithms for approximate inference.
3.1 Inference in Bayesian Networks

In inference problems, we want to infer a distribution over query variables given some observed evidence variables. The other nodes are referred to as hidden variables. We often refer to the distribution over the query variables, given the evidence, as a posterior distribution.

To illustrate the computations involved in inference, recall the Bayesian network from example 2.5, the structure of which is reproduced in figure 3.1. Suppose we have B as a query variable and evidence D = 1 and C = 1. The inference task is to compute P(b¹ | d¹, c¹), which corresponds to computing the probability that we have a battery failure given an observed trajectory deviation and communication loss.

Figure 3.1. Bayesian network structure from example 2.5 (nodes B, S, E, D, and C).

From the definition of conditional probability introduced in equation (2.22), we know that

    P(b¹ | d¹, c¹) = P(b¹, d¹, c¹) / P(d¹, c¹)    (3.1)
To compute the numerator, we must use a process known as marginalization, where we sum out variables that are not involved (in this case S and E):

    P(b¹, d¹, c¹) = ∑_s ∑_e P(b¹, s, e, d¹, c¹)    (3.2)

We know from the chain rule for Bayesian networks introduced in equation (2.31) that

    P(b¹, s, e, d¹, c¹) = P(b¹)P(s)P(e | b¹, s)P(d¹ | e)P(c¹ | e)    (3.3)

All the components on the right side are specified in the conditional probability distributions associated with the nodes in the Bayesian network. We can compute the denominator in equation (3.1) using the same approach, but with an additional summation over the values for B.

This process of using the definition of conditional probability, marginalization, and applying the chain rule can be used to perform exact inference in any Bayesian network. We can implement exact inference using factors. Recall that factors represent discrete multivariate distributions. We use the following three operations on factors to achieve this:

• We use the factor product (algorithm 3.1) to combine two factors to produce a larger factor whose scope is the combined scope of the input factors. If we have φ(X, Y) and ψ(Y, Z), then φ · ψ will be over X, Y, and Z with (φ · ψ)(x, y, z) = φ(x, y)ψ(y, z). The factor product is demonstrated in example 3.1.
• We use factor marginalization (algorithm 3.2) to sum out a particular variable from the entire factor table, removing it from the resulting scope. Example 3.2 illustrates this process.
• We use factor conditioning (algorithm 3.3) with respect to some evidence to remove any rows in the table inconsistent with that evidence. Example 3.3 demonstrates factor conditioning.

These three factor operations are used together in algorithm 3.4 to perform exact inference. It starts by computing the product of all the factors, conditioning on the evidence, marginalizing out the hidden variables, and normalizing. One potential issue with this approach is the size of the product of all the factors. The size of the factor product is equal to the product of the number of values each variable can assume. For the satellite example problem, there are only 2⁵ = 32 possible assignments, but many interesting problems would have a factor product that is too large to enumerate practically.
function Base.:*(ϕ::Factor, ψ::Factor)
    ϕnames = variablenames(ϕ)
    ψnames = variablenames(ψ)
    ψonly = setdiff(ψ.vars, ϕ.vars)
    table = FactorTable()
    for (ϕa, ϕp) in ϕ.table
        for a in assignments(ψonly)
            a = merge(ϕa, a)
            ψa = select(a, ψnames)
            table[a] = ϕp * get(ψ.table, ψa, 0.0)
        end
    end
    vars = vcat(ϕ.vars, ψonly)
    return Factor(vars, table)
end

Algorithm 3.1. An implementation of the factor product, which constructs the factor representing the joint distribution of two smaller factors ϕ and ψ. If we want to compute the factor product of ϕ and ψ, we simply write ϕ*ψ.
Example 3.1. An illustration of a factor product combining two factors representing φ₁(X, Y) and φ₂(Y, Z) to produce a factor representing φ₃(X, Y, Z). The factor product of two factors produces a new factor over the union of their variables. Here, we produce a new factor from two factors that share a variable:

    X  Y  φ₁(X, Y)        Y  Z  φ₂(Y, Z)
    0  0  0.3              0  0  0.2
    0  1  0.4              0  1  0.0
    1  0  0.2              1  0  0.3
    1  1  0.1              1  1  0.5

    X  Y  Z  φ₃(X, Y, Z)
    0  0  0  0.06
    0  0  1  0.00
    0  1  0  0.12
    0  1  1  0.20
    1  0  0  0.04
    1  0  1  0.00
    1  1  0  0.03
    1  1  1  0.05
function marginalize(ϕ::Factor, name)
    table = FactorTable()
    for (a, p) in ϕ.table
        a′ = delete!(copy(a), name)
        table[a′] = get(table, a′, 0.0) + p
    end
    vars = filter(v -> v.name != name, ϕ.vars)
    return Factor(vars, table)
end

Algorithm 3.2. A method for marginalizing a variable named name from a factor ϕ.
Example 3.2. A demonstration of factor marginalization. Recall the joint probability distribution P(X, Y, Z) from table 2.1. We can marginalize out Y by summing the probabilities in each row that have matching assignments for X and Z:

    X  Y  Z  φ(X, Y, Z)            X  Z  φ(X, Z)
    0  0  0  0.08                   0  0  0.17
    0  0  1  0.31                   0  1  0.68
    0  1  0  0.09                   1  0  0.03
    0  1  1  0.37                   1  1  0.12
    1  0  0  0.01
    1  0  1  0.05
    1  1  0  0.02
    1  1  1  0.07
󰃷󰁜 󰁠 󰎑 󰁜 󰎑󰎑  for in 󰁠
function 󰁜  󰁠
if 󰃷󰁜 󰁠
return
end
 󰎑 󰁜󰁠
for 󰁜 󰁠 in 
if 󰁪󰁭 󰎑󰎑 

󰁪󰁜󰁜󰁠 󰁠󰁭 󰎑
end
end
 󰎑 󰁜 󰃣󰎖  󰎑  󰁠
return 󰁜 󰁠
end
function
󰁜 󰁠
for 󰁜 󰁠 in 󰁜󰁠
󰎑 󰁜  󰁠
end
return
end
Algorithm3.3.Twomethodsfor
factorconditioninggivensomeevi-
dence.Thersttakesafactor
and
returnsanewfactorwhosetable
entriesareconsistentwiththevari-
ablenamed

havingthevalue

.Thesecondtakesafactor
andappliesevidenceintheform
ofa namedtuple. The
󰃷
methodreturnstrueifthevariable
named

iswithinthescopeof
thefactor.
Example 3.3. An illustration of setting evidence, in this case for Y, in a factor. The resulting values must be renormalized. Factor conditioning involves dropping any rows inconsistent with the evidence. Here is the factor from table 2.1, and we condition on Y = 1. All rows for which Y ≠ 1 are removed:

    X  Y  Z  φ(X, Y, Z)            X  Z  φ(X, Z)
    0  0  0  0.08                   0  0  0.09
    0  0  1  0.31                   0  1  0.37
    0  1  0  0.09                   1  0  0.02
    0  1  1  0.37                   1  1  0.07
    1  0  0  0.01
    1  0  1  0.05
    1  1  0  0.02
    1  1  1  0.07
struct ExactInference end

function infer(M::ExactInference, bn, query, evidence)
    ϕ = prod(bn.factors)
    ϕ = condition(ϕ, evidence)
    for name in setdiff(variablenames(ϕ), query)
        ϕ = marginalize(ϕ, name)
    end
    return normalize!(ϕ)
end

Algorithm 3.4. A naive exact inference algorithm for a discrete Bayesian network bn, which takes as input a set of query variable names query and evidence associating values with observed variables. The algorithm computes a joint distribution over the query variables in the form of a factor. We introduce the ExactInference type to allow infer to be called with different inference methods, as shall be seen in the rest of this chapter.
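As a usage sketch, the query from the beginning of this section can be posed to algorithm 3.4 roughly as follows. The evidence values assume the convention from example 2.5 that integer value 2 encodes state 1 (true), so D = 1 and C = 1 correspond to d=2 and c=2; treat this as illustrative rather than as verified output:

    # bn is the satellite-monitoring network from example 2.5.
    M = ExactInference()
    ϕ = infer(M, bn, [:b], (d=2, c=2))   # factor representing P(B | d¹, c¹)
    # ϕ.table maps assignments of :b to their posterior probabilities.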
3.2 Inference in Naive Bayes Models

The previous section presented a general method for performing exact inference in any Bayesian network. This section discusses how this same method can be used to solve classification problems for a special kind of Bayesian network structure known as a naive Bayes model. This structure is given in figure 3.2. An equivalent but more compact representation is shown in figure 3.3 using a plate, shown here as a rounded box. The i = 1:n in the bottom of the box specifies that the i in the subscript of the variable name is repeated from 1 to n.

Figure 3.2. A naive Bayes model, with class C as the parent of observed features O₁, . . . , Oₙ.
Figure 3.3. Plate representation of a naive Bayes model.

In the naive Bayes model, class C is the query variable, and the observed features O_{1:n} are the evidence variables. The naive Bayes model is called naive because it assumes conditional independence between the evidence variables given the class. Using the notation introduced in section 2.6, we can say (Oᵢ ⊥ Oⱼ | C) for all i ≠ j. Of course, if these conditional independence assumptions do not hold, then we can add the necessary directed edges between the observed features.
We have to specify the prior P(C) and the class-conditional distributions P(Oᵢ | C). As done in the previous section, we can apply the chain rule to compute the joint distribution:

    P(c, o_{1:n}) = P(c) ∏_{i=1}^{n} P(oᵢ | c)    (3.4)

Our classification task involves computing the conditional probability P(c | o_{1:n}). From the definition of conditional probability, we have

    P(c | o_{1:n}) = P(c, o_{1:n}) / P(o_{1:n})    (3.5)
We can compute the denominator by marginalizing the joint distribution:

    P(o_{1:n}) = ∑_c P(c, o_{1:n})    (3.6)

The denominator in equation (3.5) is not a function of C and can therefore be treated as a constant. Hence, we can write

    P(c | o_{1:n}) = κ P(c, o_{1:n})    (3.7)

where κ is a normalization constant such that ∑_c P(c | o_{1:n}) = 1. We often drop κ and write

    P(c | o_{1:n}) ∝ P(c, o_{1:n})    (3.8)

where the proportional to symbol ∝ is used to represent that the left side is proportional to the right side. Example 3.4 illustrates how inference can be applied to classifying radar tracks.

We can use this method to infer a distribution over classes, but for many applications, we have to commit to a particular class. It is common to classify according to the class with the highest posterior probability, arg max_c P(c | o_{1:n}). However, choosing a class is really a decision problem that often should take into account the consequences of misclassification. For example, if we are interested in using our classifier to filter out targets that are not aircraft for the purpose of air traffic control, then we can afford to occasionally let a few birds and other clutter tracks through our filter. However, we would want to avoid filtering out any real aircraft because that could lead to a collision. In this case, we would probably want to classify a track as a bird only if the posterior probability were close to 1. Decision problems will be discussed in chapter 6.
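To make equations (3.4) to (3.8) concrete, here is a small hand-rolled naive Bayes sketch with two classes and two binary features. The prior and class-conditional values are made up for this illustration and are not the radar model from example 3.4:

    prior = Dict(:bird => 0.8, :aircraft => 0.2)        # P(c)
    likelihood = Dict(                                  # P(oᵢ = 1 | c)
        :bird     => [0.1, 0.7],
        :aircraft => [0.9, 0.2],
    )

    # Posterior: P(c | o₁:n) ∝ P(c) ∏ᵢ P(oᵢ | c), normalized by κ⁻¹.
    function posterior(o::Vector{Int})
        unnorm = Dict(c => prior[c] *
            prod(o[i] == 1 ? likelihood[c][i] : 1 - likelihood[c][i]
                 for i in eachindex(o)) for c in keys(prior))
        z = sum(values(unnorm))
        return Dict(c => p / z for (c, p) in unnorm)
    end

    posterior([1, 0])   # posterior over :bird and :aircraft given o₁ = 1, o₂ = 0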
3.3 Sum-Product Variable Elimination

A variety of methods can be used to perform efficient inference in more complicated Bayesian networks. One method is known as sum-product variable elimination, which interleaves eliminating hidden variables (summations) with applications of the chain rule (products). It is more efficient to marginalize variables out as early as possible to avoid generating large factors.
Example 3.4. Radar target classification in which we want to determine whether a radar track corresponds to a bird or an aircraft. Suppose that we have a radar track and we want to determine whether it was generated by a bird or an aircraft. We base our inference on airspeed and the amount of heading fluctuation. The first represents our belief about whether a target is a bird or an aircraft in the absence of any information about the track. (The example includes a plot of class-conditional distributions p(v | c) for airspeed v in m/s, as estimated from radar data, for the aircraft and bird classes.)

Suppose from the chain rule, we determine:

    P(bird, slow, little heading fluctuation) = 0.03
    P(aircraft, slow, little heading fluctuation) = 0.01

Of course, these probabilities do not sum to 1. If we want to determine the probability that a target is a bird given the evidence, then we would make the following calculation:

    P(bird | slow, little heading fluctuation) = 0.03 / (0.03 + 0.01) = 0.75
We will illustrate the variable elimination algorithm by computing the distribution P(B | d¹, c¹) for the Bayesian network in figure 3.1. The conditional probability distributions associated with the nodes in the network can be represented by the following factors:

    φ₁(B), φ₂(S), φ₃(E, B, S), φ₄(D, E), φ₅(C, E)    (3.9)

Because D and C are observed variables, the last two factors can be replaced with φ₆(E) and φ₇(E) by setting the evidence D = 1 and C = 1.

We then proceed by eliminating the hidden variables in sequence. Different strategies can be used for choosing an ordering, but for this example, we arbitrarily choose the ordering E and then S. To eliminate E, we take the product of all the factors involving E and then marginalize out E to get a new factor:

    φ₈(B, S) = ∑_e φ₃(e, B, S) φ₆(e) φ₇(e)    (3.10)

We can now discard φ₃, φ₆, and φ₇ because all the information we need from them is contained in φ₈.

Next, we eliminate S. Again, we gather all remaining factors that involve S and marginalize out S from the product of these factors:

    φ₉(B) = ∑_s φ₂(s) φ₈(B, s)    (3.11)

We discard φ₂ and φ₈ and are left with φ₁(B) and φ₉(B). Finally, we take the product of these two factors and normalize the result to obtain a factor representing P(B | d¹, c¹).

This procedure is equivalent to computing the following:

    P(B | d¹, c¹) ∝ φ₁(B) ( ∑_s φ₂(s) ( ∑_e φ₃(e | B, s) φ₄(d¹ | e) φ₅(c¹ | e) ) )    (3.12)

This produces the same result as, but is more efficient than, the naive procedure of taking the product of all the factors and then marginalizing:

    P(B | d¹, c¹) ∝ ∑_s ∑_e φ₁(B) φ₂(s) φ₃(e | B, s) φ₄(d¹ | e) φ₅(c¹ | e)    (3.13)
The sum-product variable elimination algorithm is implemented in algorithm 3.5. It takes as input a Bayesian network, a set of query variables, a list of observed values, and an ordering of the variables. We first set all observed values. Then, for each variable, we multiply all factors containing it and then marginalize that variable out. This new factor replaces the consumed factors, and we repeat the process for the next variable.

For many networks, variable elimination allows inference to be done in an amount of time that scales linearly with the size of the network, but it has exponential time complexity in the worst case. What influences the amount of computation is the variable elimination order. Choosing the optimal elimination order is NP-hard,¹ meaning that it cannot be done in polynomial time in the worst case (section 3.5). Even if we found the optimal elimination order, variable elimination can still require an exponential number of computations. Variable elimination heuristics generally try to minimize the number of variables involved in the intermediate factors generated by the algorithm.

¹ S. Arnborg, D. G. Corneil, and A. Proskurowski, "Complexity of Finding Embeddings in a k-Tree," SIAM Journal on Algebraic Discrete Methods, vol. 8, no. 2, pp. 277–284, 1987.
struct VariableElimination
    ordering # array of variable indices
end

function infer(M::VariableElimination, bn, query, evidence)
    # set the observed values in every factor
    Φ = [condition(ϕ, evidence) for ϕ in bn.factors]
    for i in M.ordering
        name = bn.vars[i].name
        if name ∉ query
            # multiply all factors containing the variable, then marginalize it out
            inds = findall(ϕ -> in_scope(name, ϕ), Φ)
            if !isempty(inds)
                ϕ = prod(Φ[inds])
                deleteat!(Φ, inds)
                ϕ = marginalize(ϕ, name)
                push!(Φ, ϕ)
            end
        end
    end
    return normalize!(prod(Φ))
end
Algorithm 3.5. An implementation of the sum-product variable elimination algorithm, which takes in a Bayesian network bn, a list of query variables query, and evidence evidence. The variables are processed in the order given by ordering.
3.4 Belief Propagation

An approach to inference known as belief propagation works by propagating "messages" through the network using the sum-product algorithm in order to compute the marginal distributions of the query variables.² Belief propagation requires linear time but provides an exact answer only if the network does not have undirected cycles. If the network has undirected cycles, then it can be converted to a tree by combining multiple variables into single nodes by using what is known as the junction tree algorithm. If the number of variables that have to be combined into any one node in the resulting network is small, then inference can be done efficiently. A variation of belief propagation known as loopy belief propagation can provide approximate solutions in networks with undirected cycles. Although this approach does not provide any guarantees and may not converge, it can work well in practice.³

² A tutorial on the sum-product algorithm with a discussion of its connections to many other algorithms developed in separate communities is provided by F. Kschischang, B. Frey, and H.-A. Loeliger, "Factor Graphs and the Sum-Product Algorithm," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.

³ Belief propagation and related algorithms are covered in detail by D. Barber, Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
3.5 Computational Complexity

We can show that inference in Bayesian networks is NP-hard by using an NP-complete problem called 3SAT.⁴ It is easy to construct a Bayesian network from an arbitrary 3SAT problem. For example, consider the following 3SAT formula:⁵

F(x₁, x₂, x₃, x₄) = (x₁ ∨ x₂ ∨ x₃) ∧ (¬x₁ ∨ ¬x₂ ∨ x₃) ∧ (x₂ ∨ ¬x₃ ∨ x₄)    (3.14)

where ¬ represents logical negation ("not"), ∧ represents logical conjunction ("and"), and ∨ represents logical disjunction ("or"). The formula consists of a conjunction of clauses, which are disjunctions of what are called literals. A literal is simply a variable or its negation.

⁴ G. F. Cooper, "The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks," Artificial Intelligence, vol. 42, no. 2–3, pp. 393–405, 1990. The Bayesian network construction in this section follows that text. See appendix C for a brief review of complexity classes.

⁵ This formula also appears in example C.3 in appendix C.
Figure 3.4 shows the corresponding Bayesian network representation. The variables are represented by X₁:₄, and the clauses are represented by C₁:₃. The distributions over the variables are uniform. The nodes representing clauses have as parents the participating variables. Because this is a 3SAT problem, each clause node has exactly three parents. Each clause node assigns probability 0 to assignments that do not satisfy the clause and probability 1 to all satisfying assignments. The remaining nodes assign probability 1 to true if all their parents are true. The original problem is satisfiable if and only if P(y¹) > 0. Hence, inference in Bayesian networks is at least as hard as 3SAT.
[Figure 3.4. Bayesian network representing a 3SAT problem, with variable nodes X₁, X₂, X₃, X₄, clause nodes C₁, C₂, C₃, intermediate nodes D₁, D₂, and a final node Y.]
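To make the clause-node distributions concrete, here is a small illustrative sketch (our own code, not the book's) of the deterministic conditional distribution for the first clause, x₁ ∨ x₂ ∨ x₃:

# probability of c¹ given the three parent assignments: 1 if the clause is satisfied, 0 otherwise
p_c1(x1, x2, x3) = (x1 == 1 || x2 == 1 || x3 == 1) ? 1.0 : 0.0
p_c1(0, 0, 0)   # 0.0, the only violating assignment
p_c1(1, 0, 0)   # 1.0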
The reason we go to the effort of showing that inference in Bayesian networks is NP-hard is so that we know to avoid wasting time looking for an efficient, exact inference algorithm that works on all Bayesian networks. Therefore, research over the past couple of decades has focused on approximate inference methods, which are discussed next.
3.6 Direct Sampling

Motivated by the fact that exact inference is computationally intractable, many approximation methods have been developed. One of the simplest methods for inference is based on direct sampling, where random samples from the joint distribution are used to arrive at a probability estimate.⁶ To illustrate this point, suppose that we want to infer P(b¹ | d¹, c¹) from a set of n samples from the joint distribution P(b, s, e, d, c). We use parenthetical superscripts to indicate the index of a sample, where we write (b⁽ⁱ⁾, s⁽ⁱ⁾, e⁽ⁱ⁾, d⁽ⁱ⁾, c⁽ⁱ⁾) for the ith sample. The direct sample estimate is

P(b¹ | d¹, c¹) ≈ [∑_i (b⁽ⁱ⁾ = 1 ∧ d⁽ⁱ⁾ = 1 ∧ c⁽ⁱ⁾ = 1)] / [∑_i (d⁽ⁱ⁾ = 1 ∧ c⁽ⁱ⁾ = 1)]    (3.15)

We use the convention where a logical statement in parentheses is treated numerically as 1 when true and 0 when false. The numerator is the number of samples consistent with b, d, and c all set to 1, and the denominator is the number of samples consistent with d and c all set to 1.

⁶ Sometimes approaches involving random sampling are referred to as Monte Carlo methods. The name comes from the Monte Carlo Casino in Monaco. An introduction to randomized algorithms and their application to a variety of problem domains is provided by R. Motwani and P. Raghavan, Randomized Algorithms. Cambridge University Press, 1995.
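Equation (3.15) translates directly into code. The sketch below uses ten hypothetical samples (the same ones tabulated later in example 3.5), stored as 0/1 columns for b, d, and c:

b = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
d = [1, 0, 0, 1, 0, 1, 0, 1, 0, 1]
c = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
num = sum((b .== 1) .& (d .== 1) .& (c .== 1))   # samples with b¹, d¹, and c¹
den = sum((d .== 1) .& (c .== 1))                # samples with d¹ and c¹
num / den                                        # 0.5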
Sampling from the joint distribution represented by a Bayesian network is straightforward. The first step involves finding a topological sort of the nodes in the Bayesian network. A topological sort of the nodes in a directed acyclic graph is an ordered list such that if there is an edge A → B, then A comes before B in the list.⁷ For example, a topological sort for the network in figure 3.1 is B, S, E, D, C. A topological sort always exists, but it may not be unique. Another topological sort for the network is S, B, E, C, D.

⁷ A. B. Kahn, "Topological Sorting of Large Networks," Communications of the ACM, vol. 5, no. 11, pp. 558–562, 1962. An implementation of topological sorting is provided by the package.
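As a concrete illustration, the Graphs.jl package (one readily available Julia graph library; not necessarily the package used by the book's code base) can compute such an ordering for the network in figure 3.1:

using Graphs
g = SimpleDiGraph(5)                      # vertices 1:5 stand for B, S, E, D, C
add_edge!(g, 1, 3); add_edge!(g, 2, 3)    # B → E and S → E
add_edge!(g, 3, 4); add_edge!(g, 3, 5)    # E → D and E → C
topological_sort_by_dfs(g)                # one valid ordering of the vertices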
Once we have a topological sort, we can begin sampling from the conditional probability distributions. Algorithm 3.6 shows how to sample from a Bayesian network given an ordering X₁:ₙ that represents a topological sort. We draw a sample from the conditional distribution associated with Xᵢ given the values of the parents that have already been assigned. Because X₁:ₙ is a topological sort, we know that all the parents of Xᵢ have already been instantiated, allowing this sampling to be done. Direct sampling is implemented in algorithm 3.7 and is demonstrated in example 3.5.
function Base.rand(ϕ::Factor)
    # draw an assignment from a factor by inverse transform sampling over its table
    tot, p, w = 0.0, rand(), sum(values(ϕ.table))
    for (a, v) in ϕ.table
        tot += v / w
        if tot >= p
            return a
        end
    end
    return Assignment()
end

function Base.rand(bn::BayesianNetwork)
    # sample each variable in topological order, conditioning on its already-assigned parents
    a = Assignment()
    for i in topological_sort(bn.graph)
        name, ϕ = bn.vars[i].name, bn.factors[i]
        a[name] = rand(condition(ϕ, a))[name]
    end
    return a
end
Algorithm 3.6. A method for sampling an assignment from a Bayesian network bn. We also provide a method for sampling an assignment from a factor.
Suppose we draw 10 random samples from the network in figure 3.1. We are interested in inferring P(b¹ | d¹, c¹). Only 2 of the 10 samples (pointed to in the table) are consistent with observations d¹ and c¹. One sample has b = 1, and the other sample has b = 0. From these samples, we infer that P(b¹ | d¹, c¹) = 0.5. Of course, we would want to use more than just 2 samples to accurately estimate P(b¹ | d¹, c¹).
B S E D C
0 0 1 1 0
0 0 0 0 0
1 0 1 0 0
1 0 1 1 1  ←
0 0 0 0 0
0 0 0 1 0
0 0 0 0 1
0 1 1 1 1  ←
0 0 0 0 0
0 0 0 1 0

Example 3.5. An example of how direct samples from a Bayesian network can be used for inference (arrows mark the two samples consistent with d¹ and c¹).
struct DirectSampling
    m # number of samples
end

function infer(M::DirectSampling, bn, query, evidence)
    table = FactorTable()
    for i in 1:(M.m)
        a = rand(bn)
        # keep only samples that agree with the evidence
        if all(a[k] == v for (k, v) in pairs(evidence))
            b = select(a, query)
            table[b] = get(table, b, 0) + 1
        end
    end
    vars = filter(v -> v.name ∈ query, bn.vars)
    return normalize!(Factor(vars, table))
end
Algorithm 3.7. The direct sampling inference method, which takes a Bayesian network bn, a list of query variables query, and evidence evidence. The method draws m samples from the Bayesian network and retains those samples that are consistent with the evidence. A factor over the query variables is returned. This method can fail if no samples that satisfy the evidence are found.
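A usage sketch, with the same hypothetical bn and evidence encoding as before:

M = DirectSampling(1000)                          # draw 1,000 joint samples
ϕ = infer(M, bn, [:b], Dict(:d => 1, :c => 1))    # keeps only samples with d¹ and c¹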
3.7 Likelihood Weighted Sampling

The problem with direct sampling is that we may waste time generating samples that are inconsistent with the observations, especially if the observations are unlikely. An alternative approach is called likelihood weighted sampling, which involves generating weighted samples that are consistent with the observations.

To illustrate, we will again attempt to infer P(b¹ | d¹, c¹). We have a set of n samples, where the ith sample is again denoted (b⁽ⁱ⁾, s⁽ⁱ⁾, e⁽ⁱ⁾, d⁽ⁱ⁾, c⁽ⁱ⁾). The weight of the ith sample is wᵢ. The weighted estimate is

P(b¹ | d¹, c¹) ≈ [∑_i wᵢ (b⁽ⁱ⁾ = 1 ∧ d⁽ⁱ⁾ = 1 ∧ c⁽ⁱ⁾ = 1)] / [∑_i wᵢ (d⁽ⁱ⁾ = 1 ∧ c⁽ⁱ⁾ = 1)]    (3.16)
               = [∑_i wᵢ (b⁽ⁱ⁾ = 1)] / [∑_i wᵢ]    (3.17)

To generate these weighted samples, we begin with a topological sort and sample from the conditional distributions in sequence. The only difference in likelihood weighting is how we handle observed variables. Instead of sampling their values from a conditional distribution, we assign variables to their observed values and adjust the weight of the sample appropriately. The weight of a sample is simply the product of the conditional probabilities at the observed nodes. Likelihood weighted sampling is implemented in algorithm 3.8. Example 3.6 demonstrates inference with likelihood weighted sampling.
struct LikelihoodWeightedSampling
    m # number of samples
end

function infer(M::LikelihoodWeightedSampling, bn, query, evidence)
    table = FactorTable()
    ordering = topological_sort(bn.graph)
    for i in 1:(M.m)
        a, w = Assignment(), 1.0
        for j in ordering
            name, ϕ = bn.vars[j].name, bn.factors[j]
            if haskey(evidence, name)
                # fix the observed value and multiply in its conditional probability
                a[name] = evidence[name]
                w *= ϕ.table[select(a, variablenames(ϕ))]
            else
                a[name] = rand(condition(ϕ, a))[name]
            end
        end
        b = select(a, query)
        table[b] = get(table, b, 0) + w
    end
    vars = filter(v -> v.name ∈ query, bn.vars)
    return normalize!(Factor(vars, table))
end
Algorithm 3.8. The likelihood weighted sampling inference method, which takes a Bayesian network bn, a list of query variables query, and evidence evidence. The method draws m samples from the Bayesian network but sets values from evidence when possible, keeping track of the conditional probability when doing so. These probabilities are used to weight the samples such that the final inference estimate is accurate. A factor over the query variables is returned.
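A corresponding usage sketch, again with the hypothetical bn:

M = LikelihoodWeightedSampling(1000)
ϕ = infer(M, bn, [:b], Dict(:d => 1, :c => 1))    # every sample is consistent with the evidence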
The table here shows five likelihood weighted samples from the network in figure 3.1. We sample from P(B), P(S), and P(E | B, S), as we would with direct sampling. When we come to D and C, we assign D = 1 and C = 1. If the sample has E = 1, then the weight is P(d¹ | e¹)P(c¹ | e¹); otherwise, the weight is P(d¹ | e⁰)P(c¹ | e⁰). If we assume

P(d¹ | e¹)P(c¹ | e¹) = 0.95
P(d¹ | e⁰)P(c¹ | e⁰) = 0.01

then we may approximate from the samples in the table:

P(b¹ | d¹, c¹) = 0.95 / (0.95 + 0.95 + 0.01 + 0.01 + 0.95) ≈ 0.331
B S E D C  Weight
1 0 1 1 1  P(d¹ | e¹)P(c¹ | e¹)
0 1 1 1 1  P(d¹ | e¹)P(c¹ | e¹)
0 0 0 1 1  P(d¹ | e⁰)P(c¹ | e⁰)
0 0 0 1 1  P(d¹ | e⁰)P(c¹ | e⁰)
0 0 1 1 1  P(d¹ | e¹)P(c¹ | e¹)

Example 3.6. Likelihood weighted samples from a Bayesian network.
[Figure 3.5. Chemical detection Bayesian network, with C indicating whether the chemical is present and D indicating whether the chemical is detected. The network has structure C → D with P(c¹) = 0.001, P(d¹ | c⁰) = 0.001, and P(d¹ | c¹) = 0.999.]
Although likelihood weighting makes it so that all samples are consistent with the observations, it can still be wasteful. Consider the simple chemical detection Bayesian network shown in figure 3.5, and assume that we detected a chemical of interest. We want to infer P(c¹ | d¹). Because this network is small, we can easily compute this probability exactly by using Bayes' rule:

P(c¹ | d¹) = P(d¹ | c¹)P(c¹) / (P(d¹ | c¹)P(c¹) + P(d¹ | c⁰)P(c⁰))    (3.18)
           = (0.999 × 0.001) / (0.999 × 0.001 + 0.001 × 0.999)    (3.19)
           = 0.5    (3.20)

If we use likelihood weighting, then 99.9 % of the samples will have C = 0, with a weight of 0.001. Until we get a sample of C = 1, which has an associated weight of 0.999, our estimate of P(c¹ | d¹) will be 0.
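Equation (3.18) is easy to verify numerically with the probabilities from figure 3.5 (a small illustrative sketch):

p_c1 = 0.001           # P(c¹)
p_d1_c1 = 0.999        # P(d¹ | c¹)
p_d1_c0 = 0.001        # P(d¹ | c⁰)
num = p_d1_c1 * p_c1
den = num + p_d1_c0 * (1 - p_c1)
num / den              # 0.5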
3.8 Gibbs Sampling

An alternative approach to inference is to use Gibbs sampling,⁸ which is a kind of Markov chain Monte Carlo technique. Gibbs sampling involves drawing samples consistent with the evidence in a way that does not involve weighting. From these samples, we can infer the distribution over the query variables.

Gibbs sampling involves generating a sequence of samples, starting with an initial sample, x₁:ₙ⁽¹⁾, generated randomly with the evidence variables set to their observed values. The kth sample x₁:ₙ⁽ᵏ⁾ depends probabilistically on the previous sample, x₁:ₙ⁽ᵏ⁻¹⁾. We modify x₁:ₙ⁽ᵏ⁻¹⁾ in place to obtain x₁:ₙ⁽ᵏ⁾ as follows. Using any ordering of the unobserved variables, which need not be a topological sort, xᵢ⁽ᵏ⁾ is sampled from the distribution represented by P(Xᵢ | x₋ᵢ⁽ᵏ⁾). Here, x₋ᵢ⁽ᵏ⁾ represents the values of all other variables except Xᵢ in sample k. Sampling from P(Xᵢ | x₋ᵢ⁽ᵏ⁾) can be done efficiently because we only need to consider the Markov blanket of variable Xᵢ (see section 2.6).

⁸ Named for the American scientist Josiah Willard Gibbs (1839–1903), who, with James Clerk Maxwell and Ludwig Boltzmann, created the field of statistical mechanics.
Unlike the other sampling methods discussed so far, the samples produced by this method are not independent. However, it can be proven that, in the limit, samples are drawn exactly from the joint distribution over the unobserved variables given the observations. Algorithm 3.9 shows how to compute a factor for P(Xᵢ | x₋ᵢ). Gibbs sampling is implemented in algorithm 3.10.
function blanket(bn, a, i)
    # distribution over variable i given the values of its Markov blanket in assignment a
    name = bn.vars[i].name
    val = a[name]
    a = delete!(copy(a), name)
    Φ = filter(ϕ -> in_scope(name, ϕ), bn.factors)
    ϕ = prod(condition(ϕ, a) for ϕ in Φ)
    return normalize!(ϕ)
end
Algorithm 3.9. A method for obtaining P(Xᵢ | x₋ᵢ) for a Bayesian network bn given a current assignment a.
Gibbs sampling can be applied to our running example. We can use our m samples to estimate

P(b¹ | d¹, c¹) ≈ (1/m) ∑_i (b⁽ⁱ⁾ = 1)    (3.21)
Figure 3.6 compares the convergence of the estimate of P(c¹ | d¹) in the chemical detection network using direct, likelihood weighted, and Gibbs sampling. Direct sampling takes the longest to converge. The direct sampling curve has long periods during which the estimate does not change because samples are inconsistent with the observations. Likelihood weighted sampling converges faster in this example. Spikes occur when a sample is generated with C = 1, and then gradually decrease. Gibbs sampling, in this example, quickly converges to the true value of 0.5.

As mentioned earlier, Gibbs sampling, like other Markov chain Monte Carlo methods, produces samples from the desired distribution in the limit. In practice, we have to run Gibbs for some amount of time, called the burn-in period, before converging to a steady-state distribution. The samples produced during burn-in are normally discarded. If many samples are to be used from a single Gibbs sampling series, it is common to thin the samples by keeping only every hth sample because of potential correlation between samples.
function update_gibbs_sample!(a, bn, evidence, ordering)
    # resample each unobserved variable from its distribution given its Markov blanket
    for i in ordering
        name = bn.vars[i].name
        if !haskey(evidence, name)
            b = blanket(bn, a, i)
            a[name] = rand(b)[name]
        end
    end
end

function gibbs_sample!(a, bn, evidence, ordering, m)
    for j in 1:m
        update_gibbs_sample!(a, bn, evidence, ordering)
    end
end

struct GibbsSampling
    m_samples # number of samples to use
    m_burnin  # number of samples to discard during burn-in
    m_skip    # number of samples to skip for thinning
    ordering  # array of variable indices
end

function infer(M::GibbsSampling, bn, query, evidence)
    table = FactorTable()
    a = merge(rand(bn), evidence)
    gibbs_sample!(a, bn, evidence, M.ordering, M.m_burnin)
    for i in 1:(M.m_samples)
        gibbs_sample!(a, bn, evidence, M.ordering, M.m_skip)
        b = select(a, query)
        table[b] = get(table, b, 0) + 1
    end
    vars = filter(v -> v.name ∈ query, bn.vars)
    return normalize!(Factor(vars, table))
end
Algorithm 3.10. Gibbs sampling implemented for a Bayesian network bn with evidence evidence and an ordering ordering. The method iteratively updates the assignment a for m_samples iterations.
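A usage sketch with the hypothetical five-variable network from before (the ordering covers all variable indices):

M = GibbsSampling(1000, 100, 5, 1:5)              # 1,000 samples, 100 burn-in, keep every 5th
ϕ = infer(M, bn, [:b], Dict(:d => 1, :c => 1))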
[Figure 3.6. A comparison of sampling-based inference methods on the chemical detection network, plotting the estimate of P(c¹ | d¹) against the number of samples for direct sampling, likelihood weighted sampling, and Gibbs sampling. Both likelihood weighted and direct sampling have poor convergence due to the rarity of events, whereas Gibbs sampling is able to converge to the true value efficiently, even with no burn-in period or thinning.]
3.9 Inference in Gaussian Models

If the joint distribution is Gaussian, we can perform exact inference analytically. Two jointly Gaussian random variables a and b can be written

[a; b] ∼ 𝒩([µ_a; µ_b], [A C; Cᵀ B])    (3.22)

The marginal distribution of a multivariate Gaussian is also Gaussian:

a ∼ 𝒩(µ_a, A)    b ∼ 𝒩(µ_b, B)    (3.23)

The conditional distribution of a multivariate Gaussian is also Gaussian, with a convenient closed-form solution:

p(a | b) = 𝒩(a | µ_{a|b}, Σ_{a|b})    (3.24)
µ_{a|b} = µ_a + C B⁻¹ (b − µ_b)    (3.25)
Σ_{a|b} = A − C B⁻¹ Cᵀ    (3.26)

Algorithm 3.11 shows how to use these equations to infer a distribution over a set of query variables given evidence. Example 3.7 illustrates how to extract the marginal and conditional distributions from a multivariate Gaussian.
function infer(D::MvNormal, query, evidencevars, evidence)
    μ, Σ = D.μ, Matrix(D.Σ)
    b, μa, μb = evidence, μ[query], μ[evidencevars]
    A = Σ[query, query]
    B = Σ[evidencevars, evidencevars]
    C = Σ[query, evidencevars]
    # conditional mean and covariance of the query variables given the evidence
    μ = μa + C * (B \ (b - μb))
    Σ = A - C * (B \ C')
    return MvNormal(μ, Σ)
end
Algorithm 3.11. Inference in a multivariate Gaussian distribution D. A vector of integers specifies the query variables in the query argument, and a vector of integers specifies the evidence variables in the evidencevars argument. The values of the evidence variables are contained in the vector evidence. The Distributions.jl package defines the MvNormal distribution.
Consider

[x₁; x₂] ∼ 𝒩([0; 1], [3 1; 1 2])

The marginal distribution for x₁ is 𝒩(0, 3), and the marginal distribution for x₂ is 𝒩(1, 2).

The conditional distribution for x₁ given x₂ = 2 is

µ_{x₁|x₂=2} = 0 + 1 · 2⁻¹ · (2 − 1) = 0.5
Σ_{x₁|x₂=2} = 3 − 1 · 2⁻¹ · 1 = 2.5
x₁ | (x₂ = 2) ∼ 𝒩(0.5, 2.5)

We can perform this inference calculation using algorithm 3.11 by constructing the joint distribution

D = MvNormal([0.0, 1.0], [3.0 1.0; 1.0 2.0])

and then calling infer(D, [1], [2], [2.0]).

Example 3.7. Marginal and conditional distributions for a multivariate Gaussian.
3.10 Summary

• Inference involves determining the probability of query variables given some evidence.

• Exact inference can be done by computing the joint distribution over the variables, setting evidence, and marginalizing out any hidden variables.

• Inference can be done efficiently in naive Bayes models, in which a single parent variable affects many conditionally independent children.

• The variable elimination algorithm can make exact inference more efficient by marginalizing variables in sequence.

• Belief propagation represents another method for inference, in which information is iteratively passed between factors to arrive at a result.

• Inference in a Bayesian network can be shown to be NP-hard through a reduction from the 3SAT problem, motivating the development of approximate inference methods.

• Approximate inference can be done by directly sampling from the joint distribution represented by a Bayesian network, but it may involve discarding many samples that are inconsistent with the evidence.

• Likelihood weighted sampling can reduce the computation required for approximate inference by only generating samples that are consistent with the evidence and weighting each sample accordingly.

• Gibbs sampling generates a series of unweighted samples that are consistent with the evidence and can greatly speed approximate inference.

• Exact inference can be done efficiently through matrix operations when the joint distribution is Gaussian.
3.11 Exercises

Exercise 3.1. Given the following Bayesian network and its associated conditional probability distributions, write the equation required to perform exact inference for the query P(a¹ | d¹). [The network is the chain A → B → C → D.]

Solution: We first expand the inference expression using the definition of conditional probability:

P(a¹ | d¹) = P(a¹, d¹) / P(d¹)

We can rewrite the numerator as a marginalization over the hidden variables and we can rewrite the denominator as a marginalization over both the hidden and query variables:

P(a¹ | d¹) = [∑_b ∑_c P(a¹, b, c, d¹)] / [∑_a ∑_b ∑_c P(a, b, c, d¹)]

The definition of the joint probability in both the numerator and the denominator can be rewritten using the chain rule for Bayesian networks, and the resulting equation can be simplified by removing constants from inside the summations:

P(a¹ | d¹) = [∑_b ∑_c P(a¹)P(b | a¹)P(c | b)P(d¹ | c)] / [∑_a ∑_b ∑_c P(a)P(b | a)P(c | b)P(d¹ | c)]
           = [P(a¹) ∑_b ∑_c P(b | a¹)P(c | b)P(d¹ | c)] / [∑_a ∑_b ∑_c P(a)P(b | a)P(c | b)P(d¹ | c)]
           = [P(a¹) ∑_b P(b | a¹) ∑_c P(c | b)P(d¹ | c)] / [∑_a P(a) ∑_b P(b | a) ∑_c P(c | b)P(d¹ | c)]
Exercise 3.2. Given the following Bayesian network and its associated conditional probability distributions, write the equation required to perform an exact inference for the query P(c¹, d¹ | a⁰, f¹). [The network factorizes as P(a)P(b | a, c)P(c)P(d | a)P(e | b, c, d)P(f | e).]

Solution: We first expand the inference expression using the definition of conditional probability:

P(c¹, d¹ | a⁰, f¹) = P(a⁰, c¹, d¹, f¹) / P(a⁰, f¹)

We can rewrite the numerator as a marginalization over the hidden variables, and we can rewrite the denominator as a marginalization over both the hidden and query variables:

P(c¹, d¹ | a⁰, f¹) = [∑_b ∑_e P(a⁰, b, c¹, d¹, e, f¹)] / [∑_b ∑_c ∑_d ∑_e P(a⁰, b, c, d, e, f¹)]

The definition of the joint probability in both the numerator and the denominator can be rewritten using the chain rule for Bayesian networks, and the resulting equation can be simplified by removing constants from inside the summations. Note that there are multiple possible orderings of the summations in the final equation:

P(c¹, d¹ | a⁰, f¹) = [∑_b ∑_e P(a⁰)P(b | a⁰, c¹)P(c¹)P(d¹ | a⁰)P(e | b, c¹, d¹)P(f¹ | e)] / [∑_b ∑_c ∑_d ∑_e P(a⁰)P(b | a⁰, c)P(c)P(d | a⁰)P(e | b, c, d)P(f¹ | e)]
= [P(a⁰)P(c¹)P(d¹ | a⁰) ∑_b ∑_e P(b | a⁰, c¹)P(e | b, c¹, d¹)P(f¹ | e)] / [P(a⁰) ∑_b ∑_c ∑_d ∑_e P(b | a⁰, c)P(c)P(d | a⁰)P(e | b, c, d)P(f¹ | e)]
= [P(c¹)P(d¹ | a⁰) ∑_b P(b | a⁰, c¹) ∑_e P(e | b, c¹, d¹)P(f¹ | e)] / [∑_c P(c) ∑_b P(b | a⁰, c) ∑_d P(d | a⁰) ∑_e P(e | b, c, d)P(f¹ | e)]
Exercise 3.3. Suppose that we are developing an object detection system for an autonomous vehicle driving in a city. Our vehicle's perception system reports an object's size S (either small, medium, or large) and speed V (either slow, moderate, or fast). We want to design a model that will determine the class C of an object (either a vehicle, a pedestrian, or a ball) given observations of the object's size and speed. Assuming a naive Bayes model with the following class prior and class-conditional distributions, what is the detected class given observations S = medium and V = slow?

C           P(C)
vehicle     0.80
pedestrian  0.19
ball        0.01

C           S         P(S | C)
vehicle     small     0.001
vehicle     medium    0.009
vehicle     large     0.990
pedestrian  small     0.200
pedestrian  medium    0.750
pedestrian  large     0.050
ball        small     0.800
ball        medium    0.199
ball        large     0.001

C           V         P(V | C)
vehicle     slow      0.2
vehicle     moderate  0.2
vehicle     fast      0.6
pedestrian  slow      0.5
pedestrian  moderate  0.4
pedestrian  fast      0.1
ball        slow      0.4
ball        moderate  0.4
ball        fast      0.2

Solution: To compute the posterior distribution P(c | o₁:ₙ), we use the definition of the joint distribution for a naive Bayes model in equation (3.4):

P(c | o₁:ₙ) ∝ P(c) ∏_{i=1}^n P(oᵢ | c)

P(vehicle | medium, slow) ∝ P(vehicle)P(S = medium | vehicle)P(V = slow | vehicle) = (0.80)(0.009)(0.2) = 0.00144
P(pedestrian | medium, slow) ∝ P(pedestrian)P(S = medium | pedestrian)P(V = slow | pedestrian) = (0.19)(0.75)(0.5) = 0.07125
P(ball | medium, slow) ∝ P(ball)P(S = medium | ball)P(V = slow | ball) = (0.01)(0.199)(0.4) = 0.000796

Since P(pedestrian | medium, slow) has the largest probability, the object is classified as a pedestrian.
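These unnormalized scores are simple to check in code (our own sketch, with the table values hard-coded):

s_vehicle = 0.80 * 0.009 * 0.2      # 0.00144
s_pedestrian = 0.19 * 0.750 * 0.5   # 0.07125
s_ball = 0.01 * 0.199 * 0.4         # 0.000796
# the pedestrian score is largest, so the object is classified as a pedestrian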
Exercise 3.4. Given the 3SAT formula in equation (3.14) and the Bayesian network structure in figure 3.4, what are the values of P(c₃¹ | x₂¹, x₃⁰, x₄¹) and P(y¹ | d₂¹, c₃⁰)?

Solution: We have P(c₃¹ | x₂¹, x₃⁰, x₄¹) = 1 because x₂¹, x₃⁰, x₄¹ makes the third clause true, and P(y¹ | d₂¹, c₃⁰) = 0, because Y = 1 requires that both D₂ and C₃ be true.
Exercise 3.5. Give a topological sort for each of the following directed graphs:

[Two directed graphs, (1) and (2), over the nodes A, B, C, D, E, and F; the edge structure is given in the original figures.]

Solution: There are three valid topological sorts for the first directed graph (Bayesian network): (F, D, A, B, C, E), (D, A, F, B, C, E), and (D, F, A, B, C, E). There are no valid topological sorts for the second directed graph since it is cyclic.
Exercise 3.6. Suppose that we have the following Bayesian network and we are interested in generating an approximation of the inference query P(e¹ | b⁰, d¹) using likelihood weighted sampling. Given the following samples, write the expressions for each of the sample weights. In addition, write the equation for estimating P(e¹ | b⁰, d¹) in terms of the sample weights wᵢ.