Pattern and motif discovery in proteins is central to biological sequence analysis. We therefore processed Bio-Medical dictionaries, such as the UCMP Glossary© for biological terms and the University of Maryland Medical Centre for popular medical terms, together with other dictionaries for technical terms, to obtain a collection of over 2008 terms spelled entirely with the single-letter characters used to represent amino acids.
Here we describe an approach for extracting valid medical and biological terms from Bio-Medical dictionaries, where a term is considered valid if it consists only of the single-letter codes commonly used to represent amino acids in protein sequences.
Terms containing any of the single-letter characters not commonly used to represent amino acids (B, J, O, U, X and Z, or simply ‘JUZBOX’) were treated as invalid and excluded. Our lexicon therefore contains only valid Bio-Medical terms, and we have named the tool ‘JUZBOX’ after the characters excluded from it. The valid terms can be used in pattern and motif analysis to investigate biological conservation and significance across different organisms.
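
The filtering rule described above can be illustrated with a minimal sketch. It is not the JUZBOX implementation: the function names `is_valid_term` and `build_lexicon` are hypothetical, and the sketch assumes that terms are single words, that case is ignored, and that any term containing a space, hyphen or digit is rejected.

```python
# Illustrative sketch only; names and normalisation rules are assumptions,
# not taken from the JUZBOX tool itself.

# The 20 single-letter codes commonly used for the standard amino acids,
# i.e. the Latin alphabet without B, J, O, U, X and Z.
VALID_CODES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_term(term: str) -> bool:
    """Return True if every character of the term is a valid amino acid code."""
    letters = term.strip().upper()
    # Empty strings, and terms with spaces, hyphens or digits, are rejected
    # because those characters are not in VALID_CODES.
    return bool(letters) and all(ch in VALID_CODES for ch in letters)


def build_lexicon(terms):
    """Keep valid terms, upper-cased, in first-seen order without duplicates."""
    seen = set()
    lexicon = []
    for term in terms:
        normalised = term.strip().upper()
        if is_valid_term(normalised) and normalised not in seen:
            seen.add(normalised)
            lexicon.append(normalised)
    return lexicon


if __name__ == "__main__":
    candidates = ["Gene", "protein", "LIPID", "gene", "enzyme", "cell"]
    print(build_lexicon(candidates))
```

In this sketch "protein" is rejected because it contains O and "enzyme" because it contains Z, while "gene", "lipid" and "cell" are retained and could then be searched for as patterns in protein sequences.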