Journal of Mathematical Psychology 47 (2003) 90–100

Tutorial

Tutorial on maximum likelihood estimation

In Jae Myung

Department of Psychology, Ohio State University, 1885 Neil Avenue Mall, Columbus, OH 43210-1222, USA

Received 30 November 2001; revised 16 October 2002

0022-2496/03/$ - see front matter © 2003 Elsevier Science (USA).

Abstract

In this paper, I provide a tutorial exposition on maximum likelihood estimation (MLE). Unlike least-squares estimation, which is primarily a descriptive tool, MLE is a preferred method of parameter estimation in statistics and is an indispensable tool for many statistical modeling techniques.

In psychological science, we seek to uncover general laws and principles that govern the behavior under investigation. Hypotheses about the structure and inner working of the underlying process are evaluated by testing the viability of models of that process. Once a model is specified with its parameters, and data have been collected, one is in a position to evaluate its goodness of fit, that is, how well it fits the observed data. Goodness of fit is assessed by finding the parameter values of a model that best fit the data, a procedure called parameter estimation.

The two general methods of parameter estimation are least-squares estimation (LSE) and maximum likelihood estimation (MLE). LSE has been widely used in psychology (e.g., Rubin, Hinton, & Wenzel, 1999; Lamberts, 2000, but see Usher & McClelland, 2001) and is tied to many familiar statistical concepts such as linear regression, sum of squares error, and proportion variance accounted for. LSE, which unlike MLE requires no or minimal distributional assumptions, is useful for obtaining a descriptive measure for the purpose of summarizing observed data.

On the other hand, MLE is not as widely recognized among modelers in psychology, although it has several optimal properties in estimation: sufficiency (complete information about the parameter of interest contained in its MLE estimator); consistency (true parameter value that generated the data recovered asymptotically, i.e., for data of sufficiently large samples); efficiency (lowest-possible variance of parameter estimates achieved asymptotically); and parameterization invariance (same MLE solution obtained independent of the parametrization used). For this reason, LSE is usually regarded not as a general method for parameter estimation, but rather as an approach suited to particular cases. In addition, many of the inference methods in statistics are built on MLE. For example, MLE is a prerequisite for the chi-square test, the G-square test, Bayesian methods, inference with missing data, modeling of random effects, and many model selection criteria such as the Akaike information criterion (Akaike, 1973) and the Bayesian information criterion (Schwarz, 1978).
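$$f(y \mid n=10, w=0.2) = \frac{10!}{y!\,(10-y)!}\,(0.2)^{y}(0.8)^{10-y} \qquad (y = 0, 1, \ldots, 10) \tag{2}$$

[Figure: binomial probability distributions with sample size $n = 10$ and probability parameter $w = 0.2$ (top) and $w = 0.7$ (bottom).]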
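$$f(y \mid n=10, w=0.7) = \frac{10!}{y!\,(10-y)!}\,(0.7)^{y}(0.3)^{10-y} \qquad (y = 0, 1, \ldots, 10) \tag{3}$$

The following is the general expression of the PDF of the binomial distribution for arbitrary values of $w$ and $n$:

$$f(y \mid n, w) = \frac{n!}{y!\,(n-y)!}\,w^{y}(1-w)^{n-y} \qquad (0 \le w \le 1;\; y = 0, 1, \ldots, n) \tag{4}$$

The collection of all such PDFs, generated by varying the parameter across its range (0–1 in this case for $w$; $n \ge 1$), defines a model.

Given a set of parameter values, the corresponding PDF will show that some data are more probable than other data. In the previous example, the PDF with $w = 0.2$, $y = 2$ is more likely to occur than $y = 5$ (0.302 vs. 0.026). In reality, however, we have already observed the data. Accordingly, we are faced with an inverse problem: given the observed data and a model of interest, find the PDF, among all those the model prescribes, that is most likely to have produced the data. To solve this inverse problem, we define the likelihood function by reversing the roles of the data vector $y$ and the parameter vector $w$ in $f(y \mid w)$:

$$L(w \mid y) = f(y \mid w) \tag{5}$$

Thus $L(w \mid y)$ represents the likelihood of the parameter $w$ given the observed data $y$; as such, it is a function of $w$. For the one-parameter binomial example in Eq. (4), the likelihood function for $y = 7$ and $n = 10$ is given by

$$L(w \mid n=10, y=7) = f(y=7 \mid n=10, w) = \frac{10!}{7!\,3!}\,w^{7}(1-w)^{3} \qquad (0 \le w \le 1) \tag{6}$$

There is an important difference between the PDF $f(y \mid w)$ and the likelihood function $L(w \mid y)$: the two functions are defined on different axes and are therefore not directly comparable. The PDF is a function of the data given a particular set of parameter values, defined on the data scale. The likelihood function is a function of the parameter given a particular set of observed data, defined on the parameter scale.

[Figure: the likelihood function given observed data $y = 7$ and sample size $n = 10$ for the one-parameter model described in the text.]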
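The distinction between the PDF and the likelihood function can be checked numerically. The following is a minimal Matlab sketch, not part of the original appendix; the helper name `binom_pdf` is a hypothetical choice, and it assumes a scalar count `y`. It evaluates Eq. (4) once with the parameter fixed and the data varying, and once with the data fixed and the parameter varying over a grid.

```matlab
% Binomial PDF of Eq. (4) for a scalar count y; binom_pdf is a hypothetical name.
binom_pdf = @(y, n, w) nchoosek(n, y) .* w.^y .* (1 - w).^(n - y);

n = 10;

% PDF view: the parameter is fixed at w = 0.2 and the data y vary.
f_y2 = binom_pdf(2, n, 0.2);   % probability of observing y = 2
f_y5 = binom_pdf(5, n, 0.2);   % probability of observing y = 5
fprintf('f(y=2 | w=0.2) = %.3f, f(y=5 | w=0.2) = %.3f\n', f_y2, f_y5);

% Likelihood view: the data are fixed at y = 7 and the parameter w varies, Eq. (6).
w_grid = 0:0.01:1;
L = binom_pdf(7, n, w_grid);
[L_max, idx] = max(L);
fprintf('grid maximum of L(w | y=7): w = %.2f, L = %.3f\n', w_grid(idx), L_max);
```

The first line of output reproduces the comparison of $y = 2$ and $y = 5$ under $w = 0.2$; the second locates the peak of the likelihood curve on a grid of step 0.01, which is only as precise as the grid.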
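The principle of maximum likelihood estimation states that the desired probability distribution is the one that makes the observed data most likely, which means that one must seek the value of the parameter vector that maximizes the likelihood function $L(w \mid y)$. The resulting parameter vector, which is sought by searching the multi-dimensional parameter space, is called the MLE estimate, and is denoted by $w_{\mathrm{MLE}} = (w_{1,\mathrm{MLE}}, \ldots, w_{k,\mathrm{MLE}})$. For example, for the likelihood function in Eq. (6), the MLE estimate is $w_{\mathrm{MLE}} = 0.7$, for which the maximized likelihood value is $L(w_{\mathrm{MLE}} = 0.7 \mid n = 10, y = 7) = 0.267$. The probability distribution corresponding to this MLE estimate is the binomial distribution with $w = 0.7$. According to the MLE principle, this is the population that is most likely to have generated the observed data of $y = 7$. To summarize, maximum likelihood estimation is a method to seek the probability distribution that makes the observed data most likely.

For computational convenience, the MLE estimate is obtained by maximizing the log-likelihood function, $\ln L(w \mid y)$. This is because the two functions, $\ln L(w \mid y)$ and $L(w \mid y)$, are monotonically related to each other, so the same MLE estimate is obtained by maximizing either one. Assuming that the log-likelihood function, $\ln L(w \mid y)$, is differentiable, if $w_{\mathrm{MLE}}$ exists, it must satisfy the following partial differential equation known as the likelihood equation:

$$\frac{\partial \ln L(w \mid y)}{\partial w_i} = 0 \tag{7}$$

at $w_i = w_{i,\mathrm{MLE}}$ for all $i = 1, \ldots, k$.

As an example, taking the logarithm of the binomial likelihood in Eq. (6) gives

$$\ln L(w \mid n=10, y=7) = \ln\frac{10!}{7!\,3!} + 7\ln w + 3\ln(1-w) \tag{9}$$

Next, the first derivative of the log-likelihood is calculated as

$$\frac{d \ln L(w \mid n=10, y=7)}{dw} = \frac{7}{w} - \frac{3}{1-w} = \frac{7 - 10w}{w(1-w)} \tag{10}$$

By requiring this equation to be zero, the desired MLE estimate is obtained as $w_{\mathrm{MLE}} = 0.7$. To make sure that the solution represents a maximum, not a minimum, the second derivative of the log-likelihood is calculated and evaluated at $w = w_{\mathrm{MLE}}$:

$$\frac{d^{2} \ln L(w \mid n=10, y=7)}{dw^{2}} = -\frac{7}{w^{2}} - \frac{3}{(1-w)^{2}} = -47.62 < 0 \tag{11}$$

which is negative, as desired.

With more than one parameter, a more accurate test of this condition uses the Hessian matrix $H(w)$, defined as $H_{ij}(w) = \partial^{2} \ln L(w) / \partial w_i \partial w_j$ $(i, j = 1, \ldots, k)$. The matrix $H(w)$ must be negative definite at the solution, that is, $z' H(w = w_{\mathrm{MLE}})\, z < 0$ for any nonzero $k \times 1$ real-numbered vector $z$, where $z'$ denotes the transpose of $z$.

In practice, however, it is usually not possible to obtain an analytic form solution for the MLE estimate. In such situations, the MLE estimate must be sought numerically using non-linear optimization algorithms. The basic idea of non-linear optimization is to quickly find optimal parameters that maximize the log-likelihood.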
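For the one-parameter binomial example, the analytic solution can be compared with a numerical search. The sketch below is an illustration added here, not the author's code: it assumes the base Matlab function `fminbnd` as the optimizer and minimizes the minus log-likelihood of Eq. (9) with its constant term dropped. The names `negloglik`, `score` and `curv` are hypothetical.

```matlab
n = 10;   % sample size
y = 7;    % observed count

% Minus log-likelihood of Eq. (9); the constant ln(10!/(7!3!)) is dropped
% because it does not depend on w.
negloglik = @(w) -(y * log(w) + (n - y) * log(1 - w));

% Bounded one-dimensional search, kept strictly inside (0, 1).
w_hat = fminbnd(negloglik, 0.001, 0.999);

% Analytic first and second derivatives of the log-likelihood.
score = @(w) (y - n * w) ./ (w .* (1 - w));          % Eq. (10)
curv  = @(w) -y ./ w.^2 - (n - y) ./ (1 - w).^2;     % Eq. (11)

fprintf('numerical MLE estimate : %.4f\n', w_hat);
fprintf('score at w = 0.7       : %.2e\n', score(0.7));
fprintf('curvature at w = 0.7   : %.2f\n', curv(0.7));
```

The numerical estimate agrees with $w_{\mathrm{MLE}} = 0.7$ only up to the optimizer's tolerance, and the negative curvature confirms that the stationary point is a maximum of the log-likelihood.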
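In LSE, on the other hand, we seek the parameter values that provide the most accurate description of the data. Formally, in LSE, the sum of squares error (SSE) between observations and predictions is minimized:

$$\mathrm{SSE}(w) = \sum_{i=1}^{m} \left(y_i - \mathrm{prd}_i(w)\right)^{2} \tag{12}$$

where $\mathrm{prd}_i(w)$ denotes the model's prediction for the $i$th observation. Note that $\mathrm{SSE}(w)$ is a function of the parameter vector $w = (w_1, \ldots, w_k)$.

As in MLE, finding the parameter values that minimize SSE generally requires use of a non-linear optimization algorithm, and the minimization is subject to the local minima problem. In general, LSE estimates tend to differ from MLE estimates, especially for data that are not normally distributed such as proportion correct. One may therefore reach different conclusions about the same data set depending upon which method of estimation is used. When this happens, MLE should be preferred to LSE, unless the probability density function is unknown or difficult to obtain in an easily computable form, for instance, for the diffusion model of recognition memory (Ratcliff, 1978).³ There is a situation, however, in which the two methods coincide. This is when the observations are independent of one another and normally distributed with a constant variance. In this case, maximization of the log-likelihood is equivalent to minimization of SSE, and therefore, the same parameter values are obtained under either method.

³ For this model, the PDF is expressed as an infinite sum of transcendental functions.

In this section, I present an application example of maximum likelihood estimation. I chose forgetting data given the recent surge of interest in this topic (Rubin & Wenzel, 1996; Wickens, 1998; Wixted & Ebbesen, 1991). Among a half-dozen retention functions that have been proposed and tested in the past, I provide an example of MLE for two functions, power and exponential. Let $w = (w_1, w_2)$ be the parameter vector, $t$ time, and $p(w, t)$ the model's prediction of the probability of correct recall at time $t$. The two models are defined as

$$\text{power model: } p(w, t) = w_1 t^{-w_2} \quad (w_1, w_2 > 0)$$
$$\text{exponential model: } p(w, t) = w_1 \exp(-w_2 t) \quad (w_1, w_2 > 0) \tag{13}$$

Suppose that data $y = (y_1, \ldots, y_m)$ consists of $m$ observations in which $y_i$ $(0 \le y_i \le 1)$ represents an observed proportion of correct recall at time $t_i$ $(i = 1, \ldots, m)$. We are interested in testing the viability of these models. To specify the PDF $f(y \mid w)$, first we note that each observed proportion $y_i$ is obtained by dividing the number of correct responses $(x_i)$ by the total number of independent trials $(n)$, $y_i = x_i / n$ $(0 \le y_i \le 1)$. We then note that each $x_i$ is binomially distributed with probability $p(w, t)$, so that the PDFs for the two models are

$$\text{power: } f(x_i \mid n, w) = \frac{n!}{(n - x_i)!\,x_i!}\,\left(w_1 t_i^{-w_2}\right)^{x_i}\left(1 - w_1 t_i^{-w_2}\right)^{n - x_i}$$
$$\text{exponential: } f(x_i \mid n, w) = \frac{n!}{(n - x_i)!\,x_i!}\,\left(w_1 \exp(-w_2 t_i)\right)^{x_i}\left(1 - w_1 \exp(-w_2 t_i)\right)^{n - x_i} \tag{14}$$

where $x_i = 0, 1, \ldots, n$; $i = 1, \ldots, m$.

There are two points to be made regarding the PDFs in the above equation. First, the probability parameter of a binomial probability distribution (i.e., $w$ in Eq. (4)) is being modeled. Therefore, the PDF for each model in Eq. (14) is obtained by simply replacing the probability parameter $w$ in Eq. (4) with the model equation, $p(w, t)$, in Eq. (13). Second, $y_i$ is related to $x_i$ by a fixed scaling constant, $1/n$. As such, any statistical conclusion regarding $x_i$ is applicable directly to $y_i$, apart from the change of scale. In particular, the PDF for $y_i$, $f(y_i \mid n, w)$, is obtained by replacing $x_i$ in $f(x_i \mid n, w)$ with $n y_i$.

Now, assuming that the $x_i$'s are statistically independent of one another, the log-likelihood function for the power model is given by

$$\begin{aligned}
\ln L(w = (w_1, w_2) \mid n, x) &= \ln\left(f(x_1 \mid n, w) \cdot f(x_2 \mid n, w) \cdots f(x_m \mid n, w)\right) \\
&= \sum_{i=1}^{m} \ln f(x_i \mid n, w) \\
&= \sum_{i=1}^{m} \left(x_i \ln\left(w_1 t_i^{-w_2}\right) + (n - x_i)\ln\left(1 - w_1 t_i^{-w_2}\right) + \ln n! - \ln (n - x_i)! - \ln x_i!\right)
\end{aligned} \tag{15}$$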
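Eq. (15) can be written as a short function of the parameter vector. The sketch below is an illustration, not the author's code. The inputs `n`, `t` and `x` are made-up values chosen only so that the predictions stay inside (0, 1), and the names `power_p`, `kernel` and `const` are hypothetical. The factorial terms are computed with `gammaln`, using $\ln k! = \mathrm{gammaln}(k + 1)$.

```matlab
% Illustrative inputs only; these are not the data analyzed in the paper.
n = 10;            % trials per time point
t = [1; 2; 4];     % time points
x = [9; 7; 5];     % counts of correct responses

% Power model prediction of Eq. (13).
power_p = @(w) w(1) * t.^(-w(2));

% Parameter-dependent part of Eq. (15).
kernel = @(p) sum(x .* log(p) + (n - x) .* log(1 - p));

% Parameter-free part of Eq. (15): sum of ln n! - ln (n - x_i)! - ln x_i!.
const = sum(gammaln(n + 1) - gammaln(n - x + 1) - gammaln(x + 1));

% Full log-likelihood of Eq. (15).
loglik_full = @(w) kernel(power_p(w)) + const;

% Compare two candidate parameter vectors with and without the constant.
wa = [0.9; 0.4];
wb = [0.8; 0.2];
diff_full   = loglik_full(wa) - loglik_full(wb);
diff_kernel = kernel(power_p(wa)) - kernel(power_p(wb));
fprintf('difference with constant: %.6f, without: %.6f\n', diff_full, diff_kernel);
```

The two differences coincide because `const` does not involve `w`, so any comparison between parameter vectors, including the search for the maximum, is unaffected by it.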
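Each model has two parameters, $w_1$ and $w_2$. It is worth noting that the last three terms of the final expression in Eq. (15) (i.e., $\ln n! - \ln (n - x_i)! - \ln x_i!$) do not depend upon the parameter vector and therefore do not affect the MLE results. Accordingly, these terms can be ignored, and their values are often omitted in the calculation of the log-likelihood. Similarly, for the exponential model, the log-likelihood function can be obtained from Eq. (15) by substituting $w_1 \exp(-w_2 t_i)$ for $w_1 t_i^{-w_2}$.

In illustrating MLE, I used a data set from Murdock (1961). In this experiment, subjects were presented with a set of words or letters and were asked to recall the items after six different retention intervals, $(t_1, \ldots, t_6) = (1, 3, 6, 9, 12, 18)$ in seconds, and thus $m = 6$. The proportion of correct recall at each retention interval was based on 100 independent trials ($n = 100$), yielding the observed data $(y_1, \ldots, y_6) = (0.94, 0.77, 0.40, 0.26, 0.24, 0.16)$, from which the number of correct responses, $x_i$, is obtained as $100 y_i$, $i = 1, \ldots, 6$.

Table 1 summarizes the MLE results, including fit measures and parameter estimates, and also includes the LSE results for comparison. The Matlab code used for the calculations is included in the appendix.

Table 1. For each model fitted, log-likelihood under MLE, with $r^2$ in parentheses.
Power model: −313.37 (0.886)
Exponential model: −305.31 (0.963)

The results in Table 1 indicate that under either method of estimation, the exponential model fit better than the power model. That is, for the former, the log-likelihood was larger and the SSE smaller than for the latter. The same conclusion holds in terms of $r^2$. The differences between the MLE and LSE results are not unexpected and are due to the fact that the data are binomially distributed: the binomial variance, $np(1 - p)$, depends upon proportion correct $p$, so the constant-variance condition under which the two methods coincide does not hold.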
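The dependence of the variance on $p$ can be made concrete with the data reported above. The following sketch is an illustration and not the appendix program. It swaps in the base Matlab optimizer `fminsearch` for the bounded optimizer used in the appendix. The starting values and the names `power_p`, `expo_p`, `clip` and `negll` are assumptions made for this example. It fits both models by minimizing the minus log-likelihood kernel of Eq. (15) and then evaluates $np(1 - p)$ at the fitted exponential predictions.

```matlab
% Data reported in the text (Murdock, 1961).
n = 100;
t = [1 3 6 9 12 18]';
y = [.94 .77 .40 .26 .24 .16]';
x = n * y;                          % number of correct responses

% Model predictions of Eq. (13).
power_p = @(w) w(1) * t.^(-w(2));
expo_p  = @(w) w(1) * exp(-w(2) * t);

% Keep predictions strictly inside (0, 1) so the logarithms stay finite.
clip = @(p) min(max(p, 1e-5), 1 - 1e-5);

% Minus log-likelihood kernel of Eq. (15), constant terms omitted.
negll = @(p) -sum(x .* log(clip(p)) + (n - x) .* log(1 - clip(p)));

% Unconstrained simplex search from assumed starting values.
opts = optimset('Display', 'off');
[w_pow, nll_pow] = fminsearch(@(w) negll(power_p(w)), [0.9; 0.5], opts);
[w_exp, nll_exp] = fminsearch(@(w) negll(expo_p(w)),  [1.0; 0.1], opts);

fprintf('power:       w = (%.3f, %.3f), log-likelihood = %.2f\n', w_pow, -nll_pow);
fprintf('exponential: w = (%.3f, %.3f), log-likelihood = %.2f\n', w_exp, -nll_exp);

% Binomial variance n*p*(1-p) of each count at the fitted exponential predictions.
p_fit = clip(expo_p(w_exp));
var_x = n * p_fit .* (1 - p_fit);
disp(var_x');
```

Because `fminsearch` is a local, unconstrained method, its output depends on the starting values and should be compared against the log-likelihoods in Table 1 rather than taken on trust. The printed variances differ across retention intervals, which is the sense in which the constant-variance condition fails for these data.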
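Matlab Code for MLE

% Main program that finds MLE estimates. Given a model, it takes sample
% size (n), time intervals (t) and observed proportion correct (y) as inputs.
global n t x; % define global variables
opts = optimset('DerivativeCheck','off','Display','off','TolX',1e-6,'TolFun',1e-6,'Diagnostics','off','MaxIter',200,'LargeScale','off'); % option settings for optimization algorithm
n = 100; % number of independent binomial trials (i.e., sample size)
t = [1 3 6 9 12 18]'; % time intervals as a column vector
y = [.94 .77 .40 .26 .24 .16]'; % observed proportion correct as a column vector
x = n*y; % number of correct responses
initw = rand(2,1); % starting parameter values
loww = zeros(2,1); % parameter lower bounds
upw = 100*ones(2,1); % parameter upper bounds
while 1,
    [w1,lik1,exit1] = fmincon('powermle',initw,[],[],[],[],loww,upw,[],opts); % optimization for power model that minimizes minus log-likelihood
    % w1: MLE parameter estimates
    % lik1: minimized minus log-likelihood value
    % exit1: optimization has converged if exit1 > 0 or not otherwise
    [w2,lik2,exit2] = fmincon('expomle',initw,[],[],[],[],loww,upw,[],opts); % optimization for exponential model that minimizes minus log-likelihood
    prd1 = w1(1,1)*t.^(-w1(2,1)); % best fit prediction by power model
    r2(1,1) = 1-sum((prd1-y).^2)/sum((y-mean(y)).^2); % r^2 for power model
    prd2 = w2(1,1)*exp(-w2(2,1)*t); % best fit prediction by exponential model
    r2(2,1) = 1-sum((prd2-y).^2)/sum((y-mean(y)).^2); % r^2 for exponential model
    if sum(r2>0)==2
        break; % accept the fits once both r^2 values are positive
    else
        initw = rand(2,1); % otherwise restart from new random starting values
    end;
end;
format long;
disp(num2str([w1 w2 r2],5)); % display results
disp(num2str([lik1 lik2 exit1 exit2],5)); % display results
% end of the main program

function loglik = powermle(w)
% POWERMLE The log-likelihood function of the power model
global n t x;
p = w(1,1)*t.^(-w(2,1)); % power model prediction given parameter
p = p+(p==zeros(6,1))*1e-5-(p==ones(6,1))*1e-5; % ensure 0<p<1
loglik = (-1)*(x.*log(p)+(n-x).*log(1-p)); % minus log-likelihood for individual observations
loglik = sum(loglik); % overall minus log-likelihood being minimized

function loglik = expomle(w)
% EXPOMLE The log-likelihood function of the exponential model
global n t x;
p = w(1,1)*exp(-w(2,1)*t); % exponential model prediction
p = p+(p==zeros(6,1))*1e-5-(p==ones(6,1))*1e-5; % ensure 0<p<1
loglik = (-1)*(x.*log(p)+(n-x).*log(1-p)); % minus log-likelihood for individual observations
loglik = sum(loglik); % overall minus log-likelihood being minimized

Matlab Code for LSE

% Main program that finds LSE estimates. Given a model, it takes sample
% size (n), time intervals (t) and observed proportion correct (y) as
% inputs. It returns the parameter values that minimize the sum of
% squares error.
global t; % define global variable
opts = optimset('DerivativeCheck','off','Display','off','TolX',1e-6,'TolFun',1e-6,'Diagnostics','off','MaxIter',200,'LargeScale','off'); % option settings for optimization algorithm
n = 100; % number of independent binomial trials (i.e., sample size)
t = [1 3 6 9 12 18]'; % time intervals as a column vector
y = [.94 .77 .40 .26 .24 .16]'; % observed proportion correct as a column vector
initw = rand(2,1); % starting parameter values
loww = zeros(2,1); % parameter lower bounds
upw = 100*ones(2,1); % parameter upper bounds
[w1,sse1,res1,exit1] = lsqnonlin('powerlse',initw,loww,upw,opts,y); % optimization for power model
% w1: LSE estimates
% sse1: minimized SSE value
% res1: value of the residual at the solution
% exit1: optimization has converged if exit1 > 0 or not otherwise
[w2,sse2,res2,exit2] = lsqnonlin('expolse',initw,loww,upw,opts,y); % optimization for exponential model
r2(1,1) = 1-sse1/sum((y-mean(y)).^2); % r^2 for power model
r2(2,1) = 1-sse2/sum((y-mean(y)).^2); % r^2 for exponential model
format long;
disp(num2str([w1 w2 r2],5)); % display out results
disp(num2str([sse1 sse2 exit1 exit2],5)); % display out results
% end of the main program

function dev = powerlse(w,y)
% POWERLSE The deviation between observation and prediction of the power model
global t;
p = w(1,1)*t.^(-w(2,1)); % power model prediction
dev = p-y; % deviation between prediction and observation, the square of which is being minimized

function dev = expolse(w,y)
% EXPOLSE The deviation between observation and prediction of the exponential model
global t;
p = w(1,1)*exp(-w(2,1)*t); % exponential model prediction
dev = p-y; % deviation between prediction and observation, the square of which is being minimized

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov, & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Batchelder, W. H., & Crowther, C. S. (1997). Multinomial processing tree models of factorial categorization. Journal of Mathematical Psychology, 41, 45–55.
Bickel, P. J., & Doksum, K. A. (1977). Mathematical statistics. Oakland, CA: Holden-Day.
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury.
DeGroot, M. H., & Schervish, M. J. (2002). Probability and statistics (3rd ed.). Boston, MA: Addison-Wesley.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220, 671–680.
Lamberts, K. (2000). Information-accumulation theory of speeded categorization. Psychological Review, 107(2), 227–260.
Linhart, H., & Zucchini, W. (1986). Model selection. New York, NY: Wiley.
Murdock Jr., B. B. (1961). The retention of individual items. Journal of Experimental Psychology, 62, 618–625.
Myung, I. J., Forster, M., & Browne, M. W. (2000). Special issue on model selection. Journal of Mathematical Psychology, 44, 1–2.
Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method of selecting among computational models of cognition. Psychological Review, 109, 472–491.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Rubin, D. C., Hinton, S., & Wenzel, A. (1999). The precise time course of retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1161–1176.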