Biological Sequence Markup

This is a simple convention for marking up DNA, RNA, or protein sequences. The markup should be valid HTML so that it can be displayed easily in browsers. An accompanying style sheet indicates how the features are to be displayed. We should probably have three style sheets, one each for DNA, RNA, and protein. For now, there is a style section in the header of this page, which should be enough to get started with the example of RNA markup below.

Features are noted with span elements. Each such feature should have a class attribute indicating what kind of feature it is, e.g. <span class="startCodon">AUG</span>. Any other annotation should go in the title attribute.
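
As a rough illustration of the rule above, here is a minimal Python sketch that builds one feature span. The helper name mark_feature, the stopCodon class, and the "ochre" note are assumptions made for the example; only the startCodon class comes from this page.

```python
from html import escape


def mark_feature(residues, feature_class, note=None):
    """Wrap residues in a span whose class names the kind of feature.

    mark_feature is a hypothetical helper, not part of any library.
    Any extra annotation goes into the title attribute.
    """
    attrs = f'class="{escape(feature_class, quote=True)}"'
    if note is not None:
        attrs += f' title="{escape(note, quote=True)}"'
    return f"<span {attrs}>{escape(residues)}</span>"


print(mark_feature("AUG", "startCodon"))
# Class name and note below are illustrative assumptions.
print(mark_feature("UAA", "stopCodon", note="ochre"))
```

Escaping the attribute values keeps the output valid HTML even if an annotation happens to contain quotes or angle brackets.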

Sequences, being linear, should probably not contain any white space, including newline characters, though I'm still a bit undecided about this. Each sequence should be in its own code element. If it is to be broken into lines, the whole thing should be enclosed in a pre element.
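
The wrapping rule could be sketched in Python roughly as follows. The function name wrap_sequence and its line_width parameter are assumptions made for this example, and the sketch handles only plain, unannotated sequences; breaking lines inside a sequence that already contains feature spans would need more care.

```python
from html import escape


def wrap_sequence(sequence, line_width=None):
    """Put a sequence in its own code element (hypothetical helper).

    All white space, including newlines, is removed first. If line_width
    is given, the residues are broken into lines of that length and the
    whole thing is enclosed in a pre element.
    """
    residues = "".join(sequence.split())
    if line_width is None:
        return f"<code>{escape(residues)}</code>"
    if line_width < 1:
        raise ValueError("line_width must be a positive integer")
    lines = [
        residues[i:i + line_width]
        for i in range(0, len(residues), line_width)
    ]
    body = "\n".join(escape(line) for line in lines)
    return f"<pre><code>{body}</code></pre>"


print(wrap_sequence("AUG GCC\nUAA"))
print(wrap_sequence("AUGGCCUAA", line_width=3))
```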

Overlapping features cannot be represented this way, because span elements must nest properly. One could resort to marking the start and stop positions separately, much like marking the start and stop codons instead of marking the entire coding region.
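
One way the start/stop workaround might look is sketched below in Python, using empty span elements as boundary markers. Everything specific here is an assumption made for illustration: the function name mark_boundaries, the Start/Stop class-name suffixes, the feature names orfA and orfB, and the use of 0-based positions with an exclusive end.

```python
from html import escape


def mark_boundaries(sequence, features):
    """Mark feature boundaries with empty spans (hypothetical helper).

    features is a list of (name, start, end) tuples, 0-based, with end
    exclusive. Because each marker is an empty element, the features
    may overlap freely. Positions outside the sequence are ignored.
    """
    markers = {}
    for name, start, end in features:
        cls = escape(name, quote=True)
        markers.setdefault(start, []).append(f'<span class="{cls}Start"></span>')
        markers.setdefault(end, []).append(f'<span class="{cls}Stop"></span>')

    parts = []
    for i in range(len(sequence) + 1):
        # Emit any markers that sit just before residue i.
        parts.extend(markers.get(i, []))
        if i < len(sequence):
            parts.append(escape(sequence[i]))
    return "<code>" + "".join(parts) + "</code>"


# Two overlapping regions: orfA covers residues 0-5, orfB covers 3-8.
print(mark_boundaries("AUGGCCUAA", [("orfA", 0, 6), ("orfB", 3, 9)]))
```

The trade-off is that a style sheet can no longer highlight the whole region directly, since the residues between a start marker and its stop marker are not enclosed in any single element.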


123456789 123456789 123456789 123456789 123456789 123456789