Wednesday, December 30, 2009

Glossary Replacement

TODOs 01/04/2010 10pm
  • Added the boundary back: (^|\W|\b) + phrase + ($|\W)
  • Removed word boundary from the regex, not sure of the impact
  • Is there a need to escape parentheses, full stops etc.?
  • Testing - long texts with tags, scripts etc., sizeable sample of terms
  • Tagged item requirement in sec 2.20: use :term: in the title and the same in the content instead of a tag? May need a list of valid tags to match
  • Exceptions (while parsing HTML, while replacing text (regex), error retrieving cache, invalid channel)
  • Product channel required instead of inferred (can be changed).
  • Case sensitivity - will cause issues with the lookbehind and the OR condition
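
A minimal sketch of how the boundary, escaping and case-sensitivity items above could fit together. It is not the code these notes refer to: the class and method names are hypothetical, and it swaps the consuming (^|\W) ... ($|\W) groups for zero-width lookarounds so the neighbouring character is not swallowed by the match.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the original code base.
class TermPattern {

    // Builds a case-insensitive pattern that matches `term` only as a whole word or phrase.
    static Pattern forTerm(String term) {
        // Pattern.quote turns parentheses, full stops etc. in the term into literals.
        String literal = Pattern.quote(term);
        // Zero-width boundaries: no letter or digit directly before or after the term.
        // Unlike \b, this also works for terms that end in punctuation such as "A.P.R.".
        String regex = "(?<![\\p{L}\\p{N}])" + literal + "(?![\\p{L}\\p{N}])";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }
}
```

With this shape, TermPattern.forTerm("rate") finds "Rate" in "a Rate (fixed)" but not inside "separate", and the case flag does not interfere with the one-character lookbehind.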
TODOs 01/04/2010
  • Refactor into smaller classes and methods
  • Loading of cache needs revisiting
  • If time permits look into OSCache for loading and refresh

HTML Parsing
  • Appending tags, attributes
  • Calling replace function
  • Using Jericho parser
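
The idea of calling the replace function only on text, never inside tags or scripts, can be sketched with the standard library alone. This is a regex stand-in for illustration, not the Jericho API; the class name is hypothetical, and the tag regex is an assumption that breaks on attribute values containing ">", which is one reason to keep the real parser.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the parser step: hands only text outside tags to `replace`.
class TextNodeWalker {

    // Matches a whole <script>/<style> element, or any single tag.
    private static final Pattern SKIP = Pattern.compile(
            "(?is)<(script|style)\\b.*?</\\1\\s*>|<[^>]*>");

    static String transformText(String html, UnaryOperator<String> replace) {
        StringBuilder out = new StringBuilder();
        Matcher m = SKIP.matcher(html);
        int last = 0;
        while (m.find()) {
            // Text between the previous tag and this one goes through the replace function.
            out.append(replace.apply(html.substring(last, m.start())));
            // The tag (or script/style element) itself is copied unchanged.
            out.append(m.group());
            last = m.end();
        }
        // Trailing text after the last tag.
        out.append(replace.apply(html.substring(last)));
        return out.toString();
    }
}
```

Attribute values and script bodies pass through untouched, so a term appearing in a title attribute or a JavaScript variable name is not wrapped.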
Load
  • Servlet
  • Own class, singleton
  • Caching
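
A rough sketch of the "own class, singleton" option with a refreshable cache. All names are hypothetical, and where the terms come from is left to a caller-supplied loader; a servlet's init() could call reload once at startup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical singleton holding the glossary terms (term -> definition).
final class GlossaryCache {

    private static final GlossaryCache INSTANCE = new GlossaryCache();

    // volatile so that a refreshed map becomes visible to all request threads.
    private volatile Map<String, String> terms = Collections.emptyMap();

    private GlossaryCache() {}

    static GlossaryCache getInstance() {
        return INSTANCE;
    }

    // Loads a fresh copy and swaps it in with a single assignment, so readers never
    // see a half-loaded glossary. If the loader throws, the old map stays in place.
    void reload(Supplier<Map<String, String>> loader) {
        Map<String, String> fresh = new HashMap<>(loader.get());
        terms = Collections.unmodifiableMap(fresh);
    }

    Map<String, String> terms() {
        return terms;
    }
}
```

Keeping the previous map when a reload fails is one way to handle the "error retrieving cache" case without breaking pages that are already being served.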
Regular expression
  • looping through HashMap for terms
  • Regex
  • Cleanup definition (strip html tags)? Since we have the parser already
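
The points above can be sketched as one pass: loop through the map once to build a single alternation, then replace every match. The class name, the span markup and the regex-based tag stripping are assumptions made for illustration; the existing parser would be the more robust way to clean definitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical replacer: wraps glossary terms found in plain text in a <span>.
class GlossaryReplacer {

    // Crude cleanup of a definition: drops anything that looks like a tag.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    static String replaceTerms(String text, Map<String, String> glossary) {
        if (glossary.isEmpty()) {
            return text;
        }
        List<String> keys = new ArrayList<>(glossary.keySet());
        keys.sort((a, b) -> b.length() - a.length()); // longest first, so phrases win over single words
        Map<String, String> byLowerCase = new HashMap<>();
        List<String> quoted = new ArrayList<>();
        for (String term : keys) {
            byLowerCase.put(term.toLowerCase(Locale.ROOT), stripTags(glossary.get(term)));
            quoted.add(Pattern.quote(term)); // escape parentheses, full stops etc.
        }
        Pattern pattern = Pattern.compile(
                "(?<![\\p{L}\\p{N}])(" + String.join("|", quoted) + ")(?![\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE);
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String definition = byLowerCase.get(m.group(1).toLowerCase(Locale.ROOT));
            // Only double quotes are escaped here; the matched text keeps its original case.
            String markup = "<span class=\"glossary\" title=\""
                    + definition.replace("\"", "&quot;") + "\">" + m.group(1) + "</span>";
            m.appendReplacement(out, Matcher.quoteReplacement(markup));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Doing the replacement in a single pass means text already inserted for one term (for example a definition in the title attribute) is never scanned again for other terms, which a term-by-term loop of replaceAll calls would risk.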