java.lang.Object
  edu.ucdenver.ccp.nlp.biolemmatizer.BioLemmatizer
public class BioLemmatizer
BioLemmatizer: Lemmatize a word in biomedical texts and return its lemma; the part of speech (POS) of the word is optional.
Usage:
java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar [-l] <input_string> [POS tag] or
java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar [-l] -i <input_file_name> -o <output_file_name> or
java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar [-l] -t
Example:
java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar catalyses NNS
Please see the README file for more usage examples.
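The batch-mode usage line above can also be assembled programmatically. The following is a minimal sketch, not part of the BioLemmatizer distribution: the class name `BatchCommandSketch`, the method `batchCommand`, and the file names are hypothetical, and only the JVM option and the `-l`, `-i`, `-o` flags are taken from the usage lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of the BioLemmatizer distribution. */
final class BatchCommandSketch {

    /**
     * Builds the argument list for the documented batch mode:
     * java -Xmx1G -jar <jar> [-l] -i <input_file_name> -o <output_file_name>
     */
    static List<String> batchCommand(String jarPath, String inputFile,
                                     String outputFile, boolean lemmaOnly) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xmx1G");
        cmd.add("-jar");
        cmd.add(jarPath);
        if (lemmaOnly) {
            cmd.add("-l"); // return only the lemma, without POS information
        }
        cmd.add("-i");
        cmd.add(inputFile);
        cmd.add("-o");
        cmd.add(outputFile);
        return cmd;
    }

    public static void main(String[] args) {
        // File names below are placeholders chosen for illustration.
        List<String> cmd = batchCommand(
                "biolemmatizer-core-1.0-jar-with-dependencies.jar",
                "input.txt", "output.txt", true);
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd)` to launch the tool as a child process.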
Field Summary | |
---|---|
static String | lemmaSeparator - Lemma separator character |
edu.northwestern.at.utils.corpuslinguistics.lemmatizer.Lemmatizer | lemmatizer - BioLemmatizer |
protected static String | mappingFileName - The part-of-speech mapping file |
Map<String,String[]> | mappingMajorClasstoPennPOS - Hierarchical mapping file from major class to Penn Treebank POS |
Map<String,String[]> | mappingPennPOStoNUPOS - Hierarchical mapping file from Penn Treebank POS to NUPOS |
edu.northwestern.at.utils.corpuslinguistics.partsofspeech.PartOfSpeechTags | partOfSpeechTags - NUPOS tags |
edu.ucdenver.ccp.nlp.biolemmatizer.POSEntry | posEntry - POSEntry object to retrieve POS tag information |
edu.northwestern.at.utils.corpuslinguistics.tokenizer.WordTokenizer | spellingTokenizer - Extracts individual word parts from a contracted word |
edu.northwestern.at.utils.corpuslinguistics.lexicon.Lexicon | wordLexicon - Word lexicon for lemma lookup |
Constructor Summary | |
---|---|
BioLemmatizer() | Default constructor; loads the lexicon from the classpath |
BioLemmatizer(File lexiconFile) | Constructor to initialize the class fields |
Method Summary | |
---|---|
LemmataEntry | lemmatizeByLexicon(String spelling, String partOfSpeech) - Lemmatize a string with POS tag using the lexicon only |
LemmataEntry | lemmatizeByLexiconAndRules(String spelling, String partOfSpeech) - Lemmatize a string with POS tag using both lexicon lookup and lemmatization rules. This is the preferred method, as it gives the best lemmatization performance |
LemmataEntry | lemmatizeByRules(String spelling, String partOfSpeech) - Lemmatize a string with POS tag using lemmatization rules only |
static void | main(String[] args) - Input arguments are parsed into a BioLemmatizerCmdOpts object |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static String lemmaSeparator
public edu.northwestern.at.utils.corpuslinguistics.lemmatizer.Lemmatizer lemmatizer
public edu.northwestern.at.utils.corpuslinguistics.lexicon.Lexicon wordLexicon
public edu.northwestern.at.utils.corpuslinguistics.partsofspeech.PartOfSpeechTags partOfSpeechTags
public edu.northwestern.at.utils.corpuslinguistics.tokenizer.WordTokenizer spellingTokenizer
public Map<String,String[]> mappingPennPOStoNUPOS
public Map<String,String[]> mappingMajorClasstoPennPOS
protected static String mappingFileName
public edu.ucdenver.ccp.nlp.biolemmatizer.POSEntry posEntry
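The two mapping fields share the shape `Map<String,String[]>`, which suggests a two-level lookup: major class to Penn Treebank tags, then Penn Treebank tags to NUPOS tags. The sketch below shows one way maps of that shape could be chained. It is an illustration only, not the class's actual logic: `PosMappingSketch`, `expand`, the key `"noun"`, and the sample tag values are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not BioLemmatizer's implementation. */
final class PosMappingSketch {

    /** Chains two maps of the documented shape: major class -> Penn tags -> NUPOS tags. */
    static List<String> expand(String majorClass,
                               Map<String, String[]> majorToPenn,
                               Map<String, String[]> pennToNupos) {
        List<String> result = new ArrayList<>();
        for (String penn : majorToPenn.getOrDefault(majorClass, new String[0])) {
            for (String nupos : pennToNupos.getOrDefault(penn, new String[0])) {
                if (!result.contains(nupos)) {
                    result.add(nupos); // keep first-seen order, skip duplicates
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample entries are illustrative, not read from the real mapping file.
        Map<String, String[]> majorToPenn = new HashMap<>();
        majorToPenn.put("noun", new String[] {"NN", "NNS"});

        Map<String, String[]> pennToNupos = new HashMap<>();
        pennToNupos.put("NN", new String[] {"n1"});
        pennToNupos.put("NNS", new String[] {"n2"});

        System.out.println(expand("noun", majorToPenn, pennToNupos)); // prints [n1, n2]
    }
}
```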
Constructor Detail |
---|
public BioLemmatizer()
public BioLemmatizer(File lexiconFile)
lexiconFile - a reference to the lexicon file to use. If null, the lexicon that comes with the BioLemmatizer distribution is loaded from the classpath
Method Detail |
---|
public LemmataEntry lemmatizeByLexicon(String spelling, String partOfSpeech)
spelling - an input string
partOfSpeech - POS tag of the input string
public LemmataEntry lemmatizeByRules(String spelling, String partOfSpeech)
spelling - an input string
partOfSpeech - POS tag of the input string
public LemmataEntry lemmatizeByLexiconAndRules(String spelling, String partOfSpeech)
spelling - an input string
partOfSpeech - POS tag of the input string
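The three lemmatization methods differ in which resources they consult: the lexicon, the rules, or both. The toy sketch below illustrates the combined strategy under the assumption that the lexicon is tried first and the rules serve as a fallback; the documentation above does not state the order. Everything in it (`LemmaFallbackSketch`, the two-entry lexicon, the single plural-stripping rule) is hypothetical and far simpler than the real lexicon and rule set.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy illustration only; not BioLemmatizer's implementation. */
final class LemmaFallbackSketch {

    // Hypothetical toy lexicon keyed by "spelling/POS".
    private static final Map<String, String> TOY_LEXICON = new HashMap<>();
    static {
        TOY_LEXICON.put("mice/NNS", "mouse");
        TOY_LEXICON.put("analyses/NNS", "analysis");
    }

    /** Lexicon-only lookup: empty when the word/POS pair is unknown. */
    static Optional<String> byLexicon(String spelling, String pos) {
        String key = spelling.toLowerCase(Locale.ROOT) + "/" + pos;
        return Optional.ofNullable(TOY_LEXICON.get(key));
    }

    /** Rules-only lookup: a single toy rule that strips a plural "s" from NNS words. */
    static Optional<String> byRules(String spelling, String pos) {
        String s = spelling.toLowerCase(Locale.ROOT);
        if ("NNS".equals(pos) && s.length() > 1 && s.endsWith("s")) {
            return Optional.of(s.substring(0, s.length() - 1));
        }
        return Optional.empty();
    }

    /** Assumed order: lexicon first, then rules, else the spelling unchanged. */
    static String byLexiconAndRules(String spelling, String pos) {
        Optional<String> hit = byLexicon(spelling, pos);
        if (hit.isPresent()) {
            return hit.get();
        }
        return byRules(spelling, pos).orElse(spelling);
    }

    public static void main(String[] args) {
        System.out.println(byLexiconAndRules("mice", "NNS"));   // prints mouse (lexicon hit)
        System.out.println(byLexiconAndRules("cells", "NNS"));  // prints cell (rule fallback)
        System.out.println(byLexiconAndRules("protein", "NN")); // prints protein (unchanged)
    }
}
```

With the real class, the corresponding call would be `new BioLemmatizer().lemmatizeByLexiconAndRules("catalyses", "NNS")`, which returns a `LemmataEntry`.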
public static void main(String[] args)
Input arguments are parsed into a BioLemmatizerCmdOpts object. Valid input arguments include:
VAL : Single input to be lemmatized
VAL : Part of speech of the single input to be lemmatized
-f VAL : optional path to a lexicon file. If not set, the default lexicon available on the classpath is used
-i VAL : the path to the input file
-l : if present, only the lemma is returned (part-of-speech information is suppressed)
-o VAL : the path to the output file
-t : if present, the interactive mode is used
args - the command-line arguments