BioLemmatizer
The BioLemmatizer 1.2 release adds an optional feature that normalizes British English spellings to American English spellings before retrieving the corresponding lemmas. For instance, the lemma of "haemangioblastomata" will be "hemangioblastoma".
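The two-step idea described above (normalize the spelling first, then look up the lemma) can be illustrated with a minimal, self-contained sketch. This is not the BioLemmatizer API or its implementation: the class name, the method names, and the tiny rewrite and lexicon tables below are all hypothetical and exist only to show the order of operations.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy illustration only; NOT the BioLemmatizer API or its data.
public class SpellingNormalizationSketch {
    // Hypothetical British-to-American substring rewrites (assumption for illustration).
    private static final Map<String, String> BRITISH_TO_AMERICAN = new LinkedHashMap<>();
    // Hypothetical lexicon mapping American-spelled word forms to lemmas (assumption).
    private static final Map<String, String> LEXICON = new LinkedHashMap<>();

    static {
        BRITISH_TO_AMERICAN.put("haem", "hem");
        BRITISH_TO_AMERICAN.put("oedema", "edema");
        LEXICON.put("hemangioblastomata", "hemangioblastoma");
        LEXICON.put("edemas", "edema");
    }

    // Step 1: lowercase the token and apply each spelling rewrite in insertion order.
    static String normalizeSpelling(String token) {
        String result = token.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> rule : BRITISH_TO_AMERICAN.entrySet()) {
            result = result.replace(rule.getKey(), rule.getValue());
        }
        return result;
    }

    // Step 2: look up the normalized form; fall back to the normalized form itself
    // when the toy lexicon has no entry for it.
    static String lemmatize(String token) {
        String normalized = normalizeSpelling(token);
        return LEXICON.getOrDefault(normalized, normalized);
    }

    public static void main(String[] args) {
        System.out.println(lemmatize("haemangioblastomata")); // prints "hemangioblastoma"
    }
}
```

Normalizing before lookup means the lexicon only needs to store one spelling variant per word form, which is the design point the release note describes.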
The BioLemmatizer 1.1 public release is the FULL version of the BioLemmatizer. It includes the data from the EBI term repository, the publicly available part of the BioLexicon database. The lemmatization accuracy of the BioLemmatizer 1.1 is 99% on a sampled set of CRAFT, a richly annotated corpus of 97 full-text biomedical journal articles.
If you use the BioLemmatizer to support academic research, please cite the following paper:
Haibin Liu, Tom Christiansen, William A Baumgartner Jr, and Karin Verspoor. BioLemmatizer: a lemmatization tool for morphological processing of biomedical text. Journal of Biomedical Semantics, 2012, 3:3.
Source code and resources pertaining to the BioLemmatizer 1.2 release are available here
Java API documentation can be found at http://biolemmatizer.sourceforge.net/apidocs/
Version 1.2 of the BioLemmatizer is available via a Maven repository. If you use Maven as your build tool, you can add the BioLemmatizer as a dependency by adding the following to your pom.xml file:
<dependency>
  <groupId>edu.ucdenver.ccp</groupId>
  <artifactId>biolemmatizer-core</artifactId>
  <version>1.2</version>
</dependency>
<dependency>
  <groupId>edu.ucdenver.ccp</groupId>
  <artifactId>biolemmatizer-uima</artifactId>
  <version>1.2</version>
</dependency>

<repository>
  <id>bionlp-sourceforge</id>
  <url>http://svn.code.sf.net/p/bionlp/code/repo</url>
</repository>