Title | The Stanford Natural Language Processing Group |
---|---|
Author | AI Solutions |
Course | Problem Based Learning in Science, Technology and Society |
Institution | Aalborg Universitet |
29/07/2019
The Stanford Natural Language Processing Group
Stanford Named Entity Recognizer (NER)
About

Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 (http://www.cnts.ua.ac.be/conll2003/ner/) English training data.

Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models on labeled data, you can actually use this code to build sequence models for NER or any other task. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001) (http://www.cis.upenn.edu/~pereira/papers/crf.pdf); see Sutton and McCallum (2006) (http://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf) or Sutton and McCallum (2010) (http://arxiv.org/pdf/1011.4088v1) for more comprehensible introductions.)

The original CRF code is by Jenny Finkel. The feature extractors are by Dan Klein, Christopher Manning, and Jenny Finkel. Much of the documentation and usability is due to Anna Rafferty. More recent code development has been done by various Stanford NLP Group members.

Stanford NER is available for download, licensed under the GNU General Public License (http://www.gnu.org/licenses/gpl-2.0.html) (v2 or later). Source is included. The package includes components for command-line invocation (look at the shell scripts and batch files included in the download), running as a server (look at NERServer in the sources jar file), and a Java API (look at the simple examples in the NERDemo.java (nerexample/NERDemo.java) file included in the download, and then at the javadocs). Stanford NER code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software (http://www.gnu.org/licenses/gpl-faq.html#GPLInProprietarySystem), commercial licensing (http://otlportal.stanford.edu/techfinder/technology/ID=24628) is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gifts.

https://nlp.stanford.edu/software/CRF-NER.shtml
Citation

The CRF sequence models provided here do not precisely correspond to any published paper, but the correct paper to cite for the model and software is:

Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf

The software provided here is similar to the baseline local+Viterbi model in that paper, but adds new distributional similarity based features (in the -distSim classifiers). Distributional similarity features improve performance but the models require somewhat more memory. Our big English NER models were trained on a mixture of CoNLL, MUC-6, MUC-7 and ACE named entity corpora, and as a result the models are fairly robust across domains.
Getting started

You can try out Stanford NER CRF classifiers (http://nlp.stanford.edu:8080/ner/) or Stanford NER as part of Stanford CoreNLP (http://corenlp.run/) on the web, to understand what Stanford NER is and whether it will be useful to you.

To use the software on your computer, download the zip file (http://nlp.stanford.edu/software/CRF-NER.html#Download). You then unzip the file by either double-clicking on the zip file, using a program for unpacking zip files, or by using the unzip command. This should create a stanford-ner folder. There is no installation procedure; you should be able to run Stanford NER from that folder. Normally, Stanford NER is run from the command line (i.e., shell or terminal). Current releases of Stanford NER require Java 1.8 or later. Either make sure you have or get Java 8 (http://java.com/) or consider running an earlier version of the software (versions through 3.4.1 support Java 6 and 7).

NER GUI

Provided java is on your PATH, you should be able to run an NER GUI demonstration by just clicking. It might work to double-click on the stanford-ner.jar archive, but this may well fail as the operating system does not give Java enough memory for our NER system, so it is safer to instead double-click on the ner-gui.bat icon (Windows) or ner-gui.sh (Linux/Unix/MacOSX). Then, using the top option from the Classifier menu, load a CRF classifier from the classifiers directory of the distribution. You can then either load a text file or web page from the File menu, or decide to use the default text in the window. Finally, you can now named entity tag the text by pressing the Run NER button.
Single CRF NER Classifier from command-line

From a command line, you need to have java on your PATH and the stanford-ner.jar file in your CLASSPATH. (The way of doing this depends on your OS/shell.) The supplied ner.bat and ner.sh should work to allow you to tag a single file, when running from inside the Stanford NER folder. For example, for Windows:

    ner file

This corresponds to the full command:

    java -mx600m -cp "*;lib\*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -textFile sample.txt

Or on Unix/Linux you should be able to parse the test file in the distribution directory with the command:

    java -mx600m -cp "*:lib/*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -textFile sample.txt
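The Windows and Unix/Linux commands above differ mainly in the classpath separator (; versus :). As a minimal sketch, assuming you want to assemble the same command from a script, the Python helper below builds the argument list and picks the separator for the current platform. The function name build_ner_command and its parameters are hypothetical; only the java options shown in the commands above are used.

```python
import os


def build_ner_command(classifier, text_file, memory="600m", path_sep=os.pathsep):
    """Return the argument list for running CRFClassifier on one text file.

    Hypothetical helper: it only mirrors the command lines shown in the text.
    """
    # Classpath entries are joined with ';' on Windows and ':' on Unix-like
    # systems; os.pathsep is the right separator for the current platform.
    classpath = path_sep.join(["*", "lib/*"])
    return [
        "java", f"-mx{memory}", "-cp", classpath,
        "edu.stanford.nlp.ie.crf.CRFClassifier",
        "-loadClassifier", classifier,
        "-textFile", text_file,
    ]


if __name__ == "__main__":
    cmd = build_ner_command(
        "classifiers/english.all.3class.distsim.crf.ser.gz", "sample.txt"
    )
    print(" ".join(cmd))
```

Passing this list to subprocess.run(cmd, check=True) from inside the stanford-ner folder would start Java without a shell, so the * wildcards in the classpath are left for Java to expand.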
Here's an output option that will print out entities and their class to the first two columns of a tab-separated columns output file (this example uses the Windows classpath separator ; and on Unix/Linux you would use : instead):

    java -mx600m -cp "*;lib/*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -outputFormat tabbedEntities -textFile sample.txt > sample.tsv
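A file written this way can be post-processed with a few lines of code. The sketch below is one possible reader, assuming only what the text states: the entity is in the first column and its class in the second. Rows where either column is empty are skipped. The function name read_entities and the SAMPLE string are hypothetical illustrations, not real program output.

```python
def read_entities(tsv_text):
    """Return (entity, label) pairs from the first two tab-separated columns."""
    pairs = []
    for line in tsv_text.splitlines():
        columns = line.split("\t")
        # Keep only rows where both the entity and the class column are filled.
        if len(columns) >= 2 and columns[0].strip() and columns[1].strip():
            pairs.append((columns[0].strip(), columns[1].strip()))
    return pairs


# Hypothetical sample in the assumed layout; not real Stanford NER output.
SAMPLE = (
    "Jane Doe\tPERSON\tvisited\n"
    "Stanford University\tORGANIZATION\tin\n"
    "\t\tthe spring .\n"
)

if __name__ == "__main__":
    for entity, label in read_entities(SAMPLE):
        print(f"{label}\t{entity}")
```

To read a real file, pass the contents of sample.tsv, opened with the encoding the file was written in, to read_entities.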
Full Stanford NER functionality

This standalone distribution also allows access to the full NER capabilities of the Stanford CoreNLP pipeline. These capabilities can be accessed via the NERClassifierCombiner class. NERClassifierCombiner allows for multiple CRFs to be used together, and has options for recognizing numeric sequence patterns and time patterns with the rule-based NER of SUTime.

To use NERClassifierCombiner at the command-line, the jars in the lib directory and stanford-ner.jar must be in the CLASSPATH. Here is an example command:

    java -mx1g -cp "*:lib/*" edu.stanford.nlp.ie.NERClassifierCombiner -textFile sample.txt -ner.model classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.dis
The one difference you should see from above is that Sunday is now recognized as a DATE.

Programmatic use via API

You can call Stanford NER from your own code. The file NERDemo.java included in the distribution illustrates several ways of calling the system programmatically. We suggest that you start from there, and then look at the javadoc, etc. as needed.

Programmatic use via a service

Stanford NER can also be set up to run as a server listening on a socket.
Questions

You can look at a Powerpoint Introduction to NER and the Stanford NER package [ppt (jenny-ner-2007.ppt)] [pdf (jenny-ner-2007.pdf)]. There is also a list of Frequently Asked Questions (crf-faq.html) (FAQ), with answers! This includes some information on training models. Further documentation is provided in the included README.txt and in the javadocs.

Have a support question? Ask us on Stack Overflow (http://stackoverflow.com) using the tag stanford-nlp.

Feedback and bug reports / fixes can be sent to our mailing lists.
Mailing Lists

We have 3 mailing lists for the Stanford Named Entity Recognizer, all of which are shared with other JavaNLP tools (with the exclusion of the parser). Each address is at @lists.stanford.edu :

1. java-nlp-user This is the best list to post to in order to send feature requests, make announcements, or for discussion among JavaNLP users. (Please ask support questions on Stack Overflow (http://stackoverflow.com) using the stanford-nlp tag.)
   You have to subscribe to be able to use this list. Join the list via this webpage (https://mailman.stanford.edu/mailman/listinfo/java-nlp-user) or by emailing [email protected] . (Leave the subject and message body empty.) You can also look at the list archives (https://mailman.stanford.edu/pipermail/java-nlp-user/).
2. java-nlp-announce This list will be used only to announce new versions of Stanford JavaNLP tools. So it will be very low volume (expect 1-3 messages a year). Join the list via this webpage (https://mailman.stanford.edu/mailman/listinfo/java-nlp-announce) or by emailing [email protected] . (Leave the subject and message body empty.)
3. java-nlp-support This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, you're better off joining and using java-nlp-user . You cannot join java-nlp-support , but you can mail questions to [email protected] .
Download

Download Stanford Named Entity Recognizer version 3.9.2 (stanford-ner-2018-10-16.zip)

The download is a 151M zipped file (mainly consisting of classifier data objects). If you unpack that file, you should have everything needed for English NER (or use as a general CRF). It includes batch files for running under Windows or Unix/Linux/MacOSX, a simple GUI, and the ability to run as a server. Stanford NER requires Java v1.8+. If you want to use Stanford NER for other languages, you'll also need to download model files for those languages; see further below.
Extensions: Packages by others using Stanford NER

For some (computer) languages, there are more up-to-date interfaces to Stanford NER available by using it inside Stanford CoreNLP (http://stanfordnlp.github.io/CoreNLP/other-languages.html), and you are better off getting those from the CoreNLP page and using them.

Apache Tika: Named Entity Recognition (NER) with Tika (https://wiki.apache.org/tika/TikaAndNER).
JavaScript/npm:
    Pranav Herur has written ner-server (https://www.npmjs.com/package/ner-server). Source (https://github.com/PranavHerur/ner-server) on github.
    Nikhil Srivastava has written ner (https://www.npmjs.com/package/ner). Source (https://github.com/niksrc/ner) on github.
    Varun Chatterji has written stanford-ner (https://www.npmjs.com/package/stanford-ner). Source (https://github.com/vchatterji/stanford-ner) on github.
.NET/F#/C#: Sergey Tihon has ported Stanford NER to F# (and other .NET languages, such as C#) (http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordNER.html), using IKVM. See also pages on: GitHub (http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordNER.html) and NuGet (http://nuget.org/packages/Stanford.NLP.NER/).
Perl: Kieren Diment has written Text-NLP-Stanford-EntityExtract (https://metacpan.org/pod/Text::NLP::Stanford::EntityExtract), a Perl module that provides an interface to Stanford NER running as a server.
PHP: Patrick Schur in 2017 wrote PHP wrapper for Stanford POS and NER taggers (https://github.com/patrickschur/stanford-nlp-tagger). Also on packagist (https://packagist.org/packages/patrickschur/stanford-nlp-tagger). Second choice: PHP-Stanford-NLP (https://github.com/agentile/PHP-Stanford-NLP). Supports POS Tagger, NER, Parser. By Anthony Gentile (agentile).
Python:
    Dat Hoang wrote pyner (https://github.com/dat/pyner), a Python interface to Stanford NER. [Old version.]
    NLTK (2.0+) (http://nltk.org/) contains an interface to Stanford NER written by Nitin Madnani: documentation (http://nltk.org/api/nltk.tag.html#module-nltk.tag.stanford) (note: set the character encoding or you get ASCII by default!), code (http://nltk.org/_modules/nltk/tag/stanford.html), on Github (https://github.com/nltk/nltk/blob/master/nltk/tag/stanford.py).
    scrapy-corenlp (https://github.com/vu3jej/scrapy-corenlp), a Python Scrapy (https://scrapy.org/) (web page scraping) middleware by Jithesh E. J. PyPI (https://pypi.python.org/pypi/scrapy-corenlp).
Ruby: tiendung has written a Ruby Binding (http://github.com/tiendung/ruby-nlp) for the Stanford POS tagger and Named Entity Recognizer.
UIMA: Florian Laws made a Stanford NER UIMA (http://uima.apache.org/) annotator using a modified version of Stanford NER, which is available on his homepage (http://www.florianlaws.de/software/). [Old version.]
Models

Included with Stanford NER are a 4 class model trained on the CoNLL 2003 eng.train, a 7 class model trained on the MUC 6 and MUC 7 training data sets, and a 3 class model trained on both data sets and some additional data (including ACE 2002 and limited amounts of in-house data) on the intersection of those class sets. (The training data for the 3 class model does not include any material from the CoNLL eng.testa or eng.testb data sets, nor any of the MUC 6 or 7 test or devtest datasets, nor Alan Ritter's Twitter NER data, so all of these remain valid tests of its performance.)

3 class: Location, Person, Organization
4 class: Location, Person, Organization, Misc
7 class: Location, Person, Organization, Money, Percent, Date, Time

These models each use distributional similarity features, which provide considerable performance gain at the cost of increasing their size and runtime. We also have models that are the same except without the distributional similarity features. You can find them in our English models jar. You can either unpack the jar file or add it to the classpath; if you add the jar file to the classpath, you can then load the models from the path edu/stanford/nlp/models/... . You can run jar -tf to get the list of files in the jar file.
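If the jar tool is not at hand, the same listing can be produced from a script, because a jar file is a zip archive. The sketch below uses Python's standard zipfile module to list entries under the edu/stanford/nlp/models/ path mentioned above. The function names are hypothetical, and the demo builds a tiny in-memory archive with a made-up entry name rather than reading a real models jar.

```python
import io
import zipfile


def list_models(jar, prefix="edu/stanford/nlp/models/"):
    """Return the sorted file entries under `prefix` in a jar (a zip archive).

    `jar` may be a path to a jar file or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        return sorted(
            name for name in archive.namelist()
            # Skip directory entries, which end with a slash.
            if name.startswith(prefix) and not name.endswith("/")
        )


def make_demo_jar():
    """Build a tiny in-memory jar; the entry names are hypothetical."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        archive.writestr("edu/stanford/nlp/models/ner/demo.crf.ser.gz", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_models(make_demo_jar()):
        print(name)
```

For a downloaded models jar, call list_models with the path to that jar instead of the demo archive.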
Also available are caseless versions of these models, better for use on texts that are mainly lower or upper case, rather than following the conventions of standard English.

CoreNLP models jars download page (https://stanfordnlp.github.io/CoreNLP/index.html#download)
Important note: There was a problem with the v3.6.0 English Caseless NER model. See this page (http://stanfordnlp.github.io/CoreNLP/caseless.html).
German

A German NER model is available, based on work by Manaal Faruqui and Sebastian Padó. You can find it in the CoreNLP German models jar. For citation and other information relating to the German classifiers, please see Sebastian Padó's German NER page (http://www.nlpado.de/~sebastian/software/ner_german.html) (but the models there are now many years old; you should use the better models that we have!). It is a 4 class IOB1 classifier (see, e.g., Memory-Based Shallow Parsing (https://arxiv.org/abs/cs/0204049) by Erik F. Tjong Kim Sang). The tags given to words are: I-LOC, I-PER, I-ORG, I-MISC, B-LOC, B-PER, B-ORG, B-MISC, O. It is trained over the CoNLL 2003 data with distributional similarity classes built from the Huge German Corpus.

CoreNLP models jars download page (https://stanfordnlp.github.io/CoreNLP/index.html#download)

Here are a couple of commands using these models, two sample files, and a couple of notes. Running on TSV files: the models were saved with options for testing on German CoNLL NER files. While the models use just the surface word form, the input reader expects the word in the first column and the class in the fifth column (1-indexed columns). You can either make the input like that or else change the expectations with, say, the option -map "word=0,answer=1" (0-indexed columns). These models were also trained on data with straight ASCII quotes and BIO entity tags. Also, be careful of the text encoding: The default is Unicode; use -encoding iso-8859-15 if the text is in 8-bit encoding.

TSV mini test file: german-ner.tsv (german-ner.tsv)
Text mini test file: german-ner.txt (german-ner.txt)

    java -cp "*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier edu/stanford/nlp/model
    java -cp "*" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier edu/stanford/nlp/model
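The IOB1 tags listed above can be turned into entity spans with a short routine. In IOB1, as described in the cited shallow parsing work, I-X marks a token inside an entity of type X, and B-X is used only when an entity starts immediately after another entity of the same type. The sketch below is an illustration under that reading, not code from the distribution. The function name iob1_to_spans and the example sentence are hypothetical.

```python
def iob1_to_spans(tags):
    """Convert IOB1 tags to (label, start, end) spans, with `end` exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        inside = tag != "O"
        # A new entity starts at a B- tag or when the entity type changes
        # (which includes the first tagged token after an O).
        starts_new = inside and (prefix == "B" or kind != label)
        # Close the open entity when we leave it or a new one begins.
        if label is not None and (not inside or starts_new):
            spans.append((label, start, i))
            start, label = None, None
        if starts_new:
            start, label = i, kind
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


if __name__ == "__main__":
    # Hypothetical tokens and tags, for illustration only.
    words = ["Angela", "Merkel", "reiste", "nach", "Berlin"]
    tags = ["I-PER", "I-PER", "O", "O", "I-LOC"]
    for label, start, end in iob1_to_spans(tags):
        print(label, " ".join(words[start:end]))
```

Two adjacent entities of the same type, such as the tags I-LOC B-LOC, come out as two separate spans, which is the case the B- prefix exists for.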
Spanish

From version 3.4.1 forward, we have a Spanish model available for NER. It is included in the Spanish corenlp models jar.

CoreNLP models jars download page (https://stanfordnlp.github.io/CoreNLP/index.html#download)
Chinese
We also provide Chinese models built from the Ontonotes Chinese named entity data. There are two models, one using distributional similarity clusters and one without. These are designed to be run on word-segmented Chinese. So, if you want to use these on normal Chinese text, you will first need to run Stanford Word Segmenter (http://nlp.stanford.edu/software/segmenter.html) or some other Chinese word segmenter, and then run NER on the output of that!

CoreNLP models jars download page (https://stanfordnlp.github.io/CoreNLP/index.html#download)
Online Demo

We have an online demo (http://nlp.stanford.edu:8080/ner) of several of our NER models. Special thanks to Dat Hoang (https://github.com/dat), who provided the initial version. Note that the online demo demonstrates single CRF models; in order to see the effect of the time annotator or the combined models, see CoreNLP (http://nlp.stanford.edu/software/corenlp.html).
Release History

Version | Date | Description |
---|---|---|
3.9.2 (stanford-ner-2018-10-16.zip) | 2018-10-16 | Updated for compatibility |
3.9.1 (stanford-ner-2018-02-27.zip) | 2018-02-27 | KBP ner models for Chinese and Spanish |
3.8.0 (stanford-ner-2017-06-09.zip) | 2017-06-09 | Updated for compatibility |
3.7.0 (stanford-ner-2016-10-31.zip) | 2016-10-31 | Improvements to Chinese and German NER |
3.6.0 (stanford-ner-2015-12-09.zip) | 2015-12-09 | Updated for compatibility |
3.5.2 (stanford-ner-2015-04-20.zip) | 2015-04-20 | synch standalone and CoreNLP functionality |
3.5.1 (stanford-ner-2015-01-29.zip) | 2015-01-29 | Substantial accuracy improvements |
3.5.0 (stanford-ner-2014-10-26.zip) | 2014-10-26 | Upgrade to Java 8 |
3.4.1 (stanford-ner-2014-08-27.zip) | 2014-08-27 | Added Spanish models |
3.4 (stanford-ner-2014- | 2014-06-16 | Fix seri... |