java.lang.Object
  edu.ucdenver.ccp.nlp.biolemmatizer.BioLemmatizer
public class BioLemmatizer
BioLemmatizer: Lemmatizes a word from biomedical text and returns its lemma. Supplying the part-of-speech (POS) tag of the word is optional.
Usage:
 
    java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar [-l] <input_string> [POS tag]
or
    java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar [-l] -i <input_file_name> -o <output_file_name>
or
    java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar [-l] -t

Example:

    java -Xmx1G -jar biolemmatizer-core-1.0-jar-with-dependencies.jar catalyses NNS
 
 
Please see the README file for more usage examples.
| Field Summary | |
|---|---|
| static String | lemmaSeparator: Lemma separator character |
| edu.northwestern.at.utils.corpuslinguistics.lemmatizer.Lemmatizer | lemmatizer: BioLemmatizer |
| protected static String | mappingFileName: The part-of-speech mapping file |
| Map<String,String[]> | mappingMajorClasstoPennPOS: Hierarchical mapping from major class to Penn Treebank POS |
| Map<String,String[]> | mappingPennPOStoNUPOS: Hierarchical mapping from PennPOS to NUPOS |
| edu.northwestern.at.utils.corpuslinguistics.partsofspeech.PartOfSpeechTags | partOfSpeechTags: NUPOS tags |
| edu.ucdenver.ccp.nlp.biolemmatizer.POSEntry | posEntry: POSEntry object to retrieve POS tag information |
| edu.northwestern.at.utils.corpuslinguistics.tokenizer.WordTokenizer | spellingTokenizer: Extracts individual word parts from a contracted word |
| edu.northwestern.at.utils.corpuslinguistics.lexicon.Lexicon | wordLexicon: Word lexicon for lemma lookup |
| Constructor Summary | |
|---|---|
| BioLemmatizer() | Default constructor loads the lexicon from the classpath |
| BioLemmatizer(File lexiconFile) | Constructor to initialize the class fields |
| Method Summary | |
|---|---|
| LemmataEntry | lemmatizeByLexicon(String spelling, String partOfSpeech): Lemmatize a string with POS tag using the lexicon only. |
| LemmataEntry | lemmatizeByLexiconAndRules(String spelling, String partOfSpeech): Lemmatize a string with POS tag using both lexicon lookup and lemmatization rules. This is the preferred method, as it gives the best lemmatization performance. |
| LemmataEntry | lemmatizeByRules(String spelling, String partOfSpeech): Lemmatize a string with POS tag using lemmatization rules only. |
| static void | main(String[] args): Input arguments are parsed into a BioLemmatizerCmdOpts object. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail | 
|---|
public static String lemmaSeparator
public edu.northwestern.at.utils.corpuslinguistics.lemmatizer.Lemmatizer lemmatizer
public edu.northwestern.at.utils.corpuslinguistics.lexicon.Lexicon wordLexicon
public edu.northwestern.at.utils.corpuslinguistics.partsofspeech.PartOfSpeechTags partOfSpeechTags
public edu.northwestern.at.utils.corpuslinguistics.tokenizer.WordTokenizer spellingTokenizer
public Map<String,String[]> mappingPennPOStoNUPOS
public Map<String,String[]> mappingMajorClasstoPennPOS
protected static String mappingFileName
public edu.ucdenver.ccp.nlp.biolemmatizer.POSEntry posEntry
| Constructor Detail | 
|---|
public BioLemmatizer()
public BioLemmatizer(File lexiconFile)
lexiconFile - a reference to the lexicon file to use. If null, the lexicon that comes with the BioLemmatizer distribution is loaded from the classpath.
| Method Detail | 
|---|
public LemmataEntry lemmatizeByLexicon(String spelling,
                                       String partOfSpeech)
spelling - an input string
partOfSpeech - POS tag of the input string
public LemmataEntry lemmatizeByRules(String spelling,
                                     String partOfSpeech)
spelling - an input string
partOfSpeech - POS tag of the input string
public LemmataEntry lemmatizeByLexiconAndRules(String spelling,
                                               String partOfSpeech)
spelling - an input string
partOfSpeech - POS tag of the input string
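
The three lemmatization methods differ only in which resources they consult. As a rough illustration of how a lexicon lookup and suffix rules might be combined, the sketch below tries a toy lexicon first and falls back to a single toy rule. Everything in it is an assumption made for illustration: the class LexiconThenRulesSketch, its lexicon entries, its one rule, and the lookup-then-rules order are hypothetical and do not reproduce BioLemmatizer's lexicon, rules, or LemmataEntry return type.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical, self-contained sketch; not part of the BioLemmatizer API.
public class LexiconThenRulesSketch {

    // Toy lexicon keyed by "spelling/POS". The entries are invented for this example.
    private static final Map<String, String> TOY_LEXICON = new HashMap<>();
    static {
        TOY_LEXICON.put("mice/NNS", "mouse");
        TOY_LEXICON.put("was/VBD", "be");
    }

    // Lexicon-only lookup: empty when the (spelling, POS) pair is unknown.
    public static Optional<String> byLexicon(String spelling, String pos) {
        String key = spelling.toLowerCase(Locale.ROOT) + "/" + pos;
        return Optional.ofNullable(TOY_LEXICON.get(key));
    }

    // Rules-only lemmatization: one toy rule that strips a plural "s" from NNS words.
    public static String byRules(String spelling, String pos) {
        String word = spelling.toLowerCase(Locale.ROOT);
        if ("NNS".equals(pos) && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Combined strategy (assumed order): use the lexicon when it has an entry, else the rules.
    public static String byLexiconAndRules(String spelling, String pos) {
        return byLexicon(spelling, pos).orElseGet(() -> byRules(spelling, pos));
    }

    public static void main(String[] args) {
        System.out.println(byLexiconAndRules("mice", "NNS"));    // prints "mouse" (lexicon hit)
        System.out.println(byLexiconAndRules("kinases", "NNS")); // prints "kinase" (rule fallback)
    }
}
```

In this toy version an irregular form such as "mice" is only recoverable through the lexicon, while an unseen regular plural is only recoverable through the rule, which is one reason a combined method can outperform either resource alone.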
public static void main(String[] args)
Input arguments are parsed into a BioLemmatizerCmdOpts object. Valid input arguments include:
 
 
  VAL    : Single input to be lemmatized
  VAL    : Part of speech of the single input to be lemmatized
  -f VAL : optional path to a lexicon file. If not set, the default lexicon 
           available on the classpath is used
  -i VAL : the path to the input file
  -l     : if present, only the lemma is returned (part-of-speech information is 
           suppressed)
  -o VAL : the path to the output file
  -t     : if present, the interactive mode is used
 
args - 
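
To make the option list above concrete, here is a minimal sketch of how such arguments could be sorted into flags, valued options, and positional inputs. The class ToyCmdOpts and its field names are hypothetical; the fields and parsing logic of the real BioLemmatizerCmdOpts are not documented on this page, so this is not a reproduction of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the real BioLemmatizerCmdOpts.
public class ToyCmdOpts {
    public String lexiconFile;    // -f VAL : optional path to a lexicon file
    public String inputFile;      // -i VAL : path to the input file
    public String outputFile;     // -o VAL : path to the output file
    public boolean lemmaOnly;     // -l     : return only the lemma
    public boolean interactive;   // -t     : interactive mode
    // Positional values: the single input, then optionally its part of speech.
    public final List<String> positional = new ArrayList<>();

    public static ToyCmdOpts parse(String[] args) {
        ToyCmdOpts opts = new ToyCmdOpts();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-l": opts.lemmaOnly = true; break;
                case "-t": opts.interactive = true; break;
                case "-f": opts.lexiconFile = valueAt(args, ++i); break;
                case "-i": opts.inputFile = valueAt(args, ++i); break;
                case "-o": opts.outputFile = valueAt(args, ++i); break;
                default:   opts.positional.add(args[i]);
            }
        }
        return opts;
    }

    // Returns the value following an option, or fails if the option is last.
    private static String valueAt(String[] args, int i) {
        if (i >= args.length) {
            throw new IllegalArgumentException("missing value after " + args[i - 1]);
        }
        return args[i];
    }

    public static void main(String[] args) {
        ToyCmdOpts opts = parse(new String[] {"-l", "catalyses", "NNS"});
        System.out.println(opts.lemmaOnly + " " + opts.positional); // prints "true [catalyses, NNS]"
    }
}
```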