Putative Exon-Exon Junction Database for New Alternative Splicing

High-throughput tandem mass spectrometry (MS/MS) provides valuable information for rapidly identifying potentially novel alternatively spliced protein products in experimental datasets. However, the ability to identify alternative splicing events by tandem mass spectrometry depends on the database against which the spectra are searched: a peptide spanning an unannotated exon-exon junction cannot be matched if that junction sequence is absent from the database.

Our exon-exon junction database provides the scientific community with an efficient means of identifying novel alternatively spliced protein isoforms from mass spectrometry (MS) data, and will be useful for annotating genome structures as MS data rapidly accumulate.

We began by building a theoretical exon-exon junction protein database that accounts for all possible combinations of exons within a gene while preserving the translation frame (i.e., keeping only in-phase exon-exon combinations). The database was built from the Ensembl Core Database using scripts we wrote in Perl with BioPerl, MySQL and the Ensembl API. This theoretical database contains every compatible exon-exon junction protein sequence encoded in the human genome.
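
The in-phase rule can be illustrated with a minimal Python sketch. It assumes Ensembl-style exon phases, where `phase` is the reading frame at which an exon starts, `end_phase` the frame at which it ends, and -1 marks a non-coding boundary; two exons can be joined without a frameshift only when the upstream `end_phase` equals the downstream `phase`. The `Exon` record and `in_phase_junctions` function are hypothetical illustrations, not the authors' Perl scripts or the Ensembl API, and the sketch ignores transcript-specific differences in phase.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    """Hypothetical minimal exon record with Ensembl-style phase fields."""
    stable_id: str
    phase: int      # frame at the exon start: 0, 1, 2, or -1 if the start is non-coding
    end_phase: int  # frame at the exon end: 0, 1, 2, or -1 if the end is non-coding


def in_phase_junctions(exons):
    """Return every (upstream, downstream) pair of exon IDs whose junction keeps
    the reading frame. `exons` must be ordered 5' to 3' along the gene, so the
    pairs produced by combinations() always join an upstream exon to a
    downstream one, including exon-skipping pairs."""
    pairs = []
    for up, down in combinations(exons, 2):
        coding = up.end_phase != -1 and down.phase != -1
        if coding and up.end_phase == down.phase:
            pairs.append((up.stable_id, down.stable_id))
    return pairs
```

Given exons E1 (phase 0, end phase 1), E2 (1, 0), E3 (1, 2) and E4 (2, -1) in 5'-to-3' order, the function keeps E1-E2, E3-E4 and the exon-skipping junction E1-E3, and rejects the remaining pairs as frameshifting.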

For each compatible exon combination, we took 25 amino acid residues from each exon at the junction (if an exon encodes fewer than 25 amino acids, the whole exon is included). The rationale is as follows. A typical ion-trap mass spectrometer detects peptides with molecular weights of roughly 500 to 3,000 daltons, and a peptide of 25 amino acids has a molecular weight of about 3,000 daltons, at the upper end of this range. Twenty-five residues on each side of a junction are therefore enough to cover any detectable peptide that spans it.

We then excluded exon-exon junction sequences already annotated in the Ensembl database, since our purpose is to identify novel splicing isoforms. This also reduces the size of the junction database and therefore the MS search time. The final database (PEEJ_database_Ensembl_Core45.tar.gz) has 873,024 entries and is about 132 MB in size uncompressed.
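
The junction-sequence step can be sketched in Python as follows. It assumes each exon's coding portion has already been translated into a protein fragment and simply concatenates the fragments, ignoring the codon split across the junction (a real pipeline would translate the joined nucleotide sequence instead). `junction_peptide`, `build_junction_db`, `peptide_mass`, `in_ms_window` and `to_fasta` are hypothetical names; the 25-residue flank and the 500 to 3,000 Da window come from the text, and the residue masses are standard average (not monoisotopic) values.

```python
# Average residue masses in daltons (residue = amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153                 # added once per peptide for the free termini
FLANK = 25                      # residues kept from each exon (value from the text)
MS_WINDOW_DA = (500.0, 3000.0)  # typical ion-trap range quoted in the text


def peptide_mass(seq):
    """Average molecular weight of an unmodified peptide; unknown letters raise KeyError."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def in_ms_window(seq, window=MS_WINDOW_DA):
    """True if the peptide's average mass falls inside the detection window."""
    low, high = window
    return low <= peptide_mass(seq) <= high


def junction_peptide(up_protein, down_protein, flank=FLANK):
    """Last `flank` residues of the upstream exon joined to the first `flank`
    residues of the downstream exon; shorter exons are used whole."""
    upstream = up_protein[max(0, len(up_protein) - flank):]
    downstream = down_protein[:flank]
    return upstream + downstream


def build_junction_db(pairs, exon_protein, annotated, flank=FLANK):
    """Map 'UP-DOWN' names to junction peptides, skipping known junctions.

    pairs:        iterable of (upstream_id, downstream_id) in-phase exon pairs
    exon_protein: dict of exon id -> translated protein fragment
    annotated:    set of (upstream_id, downstream_id) junctions already in Ensembl
    """
    db = {}
    for up, down in pairs:
        if (up, down) in annotated:
            continue  # not novel; dropping it also shrinks the search space
        db[f"{up}-{down}"] = junction_peptide(exon_protein[up], exon_protein[down], flank)
    return db


def to_fasta(db):
    """Serialize the junction database as FASTA text for a search engine."""
    return "".join(f">{name}\n{seq}\n" for name, seq in db.items())
```

With average residue masses of about 110 to 120 Da, 25 residues come to roughly 2,800 to 3,000 Da, consistent with the figure in the text; peptides rich in heavy residues such as W can exceed the window, which `in_ms_window` reports.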

A more detailed description of the database-building process can be found in supplementary.doc.

Copyright 2008, Systems Biology Platform of ZCNI. All rights reserved.

Zhejiang California International Nanosystems Institute, 268 Kai Xuan Road, Hua Jia Chi Campus, Zhejiang University, Hangzhou, China 310029