I have to read and respond to this article for my 08:00 class:
Comeau and Wilbur (2004), “Non-Word Identification or Spell Checking without a Dictionary.” JASIST, 55(2), 169-177.
The hilarious part is that “Comeau” is misspelled as “Commeau” on the class website.
The article is actually fairly interesting. The authors use a context score, 𝛂, as a baseline of how likely a word is to be spelled correctly and develop a method to predict which low-𝛂 tokens will be misspellings. They say their method has a recall of 79% (finds 79% of the misspellings) and a precision of 86% (86% of the things it calls misspelled actually are) for the MEDLINE© corpus they use. Their use of MEDLINE© is important, since a huge percentage of the correct words in the corpus don’t appear in any dictionary. Oh, and their method can distinguish abbreviations/acronyms from words, too. Pretty hot.
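For anyone rusty on the two metrics, here is a quick Python sketch of how precision and recall fall out of raw counts. The function name and the counts are my own invention, picked only so the results land near the quoted 86% and 79%; they are not figures from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) computed from raw counts."""
    flagged = true_pos + false_pos   # everything the method calls misspelled
    actual = true_pos + false_neg    # everything that really is misspelled
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): out of 100 real misspellings,
# 79 are caught and 21 are missed, and 13 correct words are flagged by mistake.
precision, recall = precision_recall(true_pos=79, false_pos=13, false_neg=21)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The two numbers pull against each other: flagging more low-𝛂 tokens would probably catch more misspellings (higher recall) but would also flag more correct words (lower precision).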

