Pattern and motif discovery in proteins is crucial for biological sequence analysis. We therefore processed biomedical dictionaries, namely the UCMP Glossary© for biological terms, the University of Maryland Medical Centre for popular medical terms, and other dictionaries for technical terms, to obtain a collection of over 2008 terms composed only of the valid single-letter characters used to represent amino acids.

Here we describe an approach in which valid medical and biological terms are extracted from biomedical dictionaries. A term is valid if it is spelled using only the single-letter amino acid codes commonly used to represent protein sequences. Invalid terms, those containing any of the single-letter characters not commonly used to represent amino acids (B, J, O, U, X, Z, or simply ‘JUZBOX’), were excluded. Our lexicon therefore contains only valid biomedical terms, and we have named the tool ‘JUZBOX’ after the characters excluded from it. The valid terms can be used in pattern and motif analysis to look for biological conservation and significance across different organisms.
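The filtering step described above can be illustrated with a minimal Python sketch. It is not the JUZBOX implementation: the names `EXCLUDED_LETTERS`, `is_valid_term` and `build_lexicon` and the sample terms are hypothetical, and the sketch assumes that a term is dropped whole if it contains any excluded letter, that matching is case-insensitive, and that only single words made of ASCII letters are considered.

```python
# Letters not commonly used as single-letter amino acid codes (from the text).
EXCLUDED_LETTERS = frozenset("BJOUXZ")


def is_valid_term(term: str) -> bool:
    """Return True if the term uses only the 20 common amino acid letters.

    Assumption: terms with spaces, digits, hyphens or non-ASCII characters
    are treated as invalid.
    """
    letters = term.strip().upper()
    return (
        bool(letters)
        and letters.isascii()
        and letters.isalpha()
        and EXCLUDED_LETTERS.isdisjoint(letters)
    )


def build_lexicon(terms):
    """Keep valid terms, upper-cased and de-duplicated, in input order."""
    seen = set()
    lexicon = []
    for term in terms:
        key = term.strip().upper()
        if is_valid_term(key) and key not in seen:
            seen.add(key)
            lexicon.append(key)
    return lexicon


# Hypothetical sample input, not taken from the dictionaries named above.
sample_terms = ["Kinase", "Helix", "Enzyme", "plasma", "Kinase"]
lexicon = build_lexicon(sample_terms)
```

In this sketch "Helix" and "Enzyme" are rejected because they contain X and Z, while "Kinase" and "plasma" are kept. The resulting upper-case strings can be searched for directly in protein sequences written in the single-letter code.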


We express our sincere gratitude to the following resources