Ref: DDBJ-KEGG-PDBj DDBJ-KEGG-PDBj/Results

A DDBJ-KEGG-PDBj workflow: from pathways to protein-protein interactions

The objective of this satellite group is to examine the potential of, and obstacles to, web services by implementing a real-life use case. The goal of the workflow is to enumerate possible physical protein-protein interactions among the proteins in a biochemical pathway. More specifically, the workflow proceeds as follows.

(1) The user provides a KEGG pathway ID.
(2) Extract the protein sequence of each enzyme in the specified pathway.
(3) For each protein sequence, run a BLAST search against the Swiss-Prot database.
(4) Construct a phylogenetic profile (a species-by-enzyme matrix) by identifying the top hit for each protein in each species.
(5) For each species in the phylogenetic profile, run BLAST searches for each protein sequence against PDB.
(6) If two amino acid sequences (of the same species) have homologs in the same PDB entry, they are inferred to be in physical contact, and hence predicted to be an interacting pair.
(7) Output image files highlighting the conserved and interacting proteins in the pathway map.
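The logic of steps (4) and (6) can be illustrated with a minimal Python sketch. It is not the code used by the group (the actual client was written in Java and Perl), and it assumes the BLAST results have already been parsed into simple tuples and sets. The function names, the tuple layout, and the use of the bit score to pick the top hit are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_phylogenetic_profile(hits):
    """Step (4): keep the top-scoring Swiss-Prot hit per (species, enzyme).

    `hits` is an iterable of (enzyme, species, subject_id, bit_score)
    tuples; this layout is an assumption, not the output format of any
    particular BLAST service.
    Returns {species: {enzyme: (subject_id, bit_score)}}.
    """
    profile = defaultdict(dict)
    for enzyme, species, subject_id, score in hits:
        best = profile[species].get(enzyme)
        if best is None or score > best[1]:
            profile[species][enzyme] = (subject_id, score)
    return dict(profile)


def predict_interactions(pdb_hits):
    """Step (6): pair up enzymes whose homologs share a PDB entry.

    `pdb_hits` maps species -> enzyme -> set of PDB entry IDs hit by
    that species' sequence for the enzyme (hypothetical structure).
    Returns {species: {(enzyme_a, enzyme_b): shared PDB entry IDs}}.
    """
    predictions = {}
    for species, by_enzyme in pdb_hits.items():
        pairs = {}
        # Compare every unordered pair of enzymes within one species.
        for a, b in combinations(sorted(by_enzyme), 2):
            shared = by_enzyme[a] & by_enzyme[b]
            if shared:
                pairs[(a, b)] = shared
        predictions[species] = pairs
    return predictions
```

For example, if the sequences for enzymes E1 and E2 of one species both hit a (made-up) PDB entry "1abc", `predict_interactions` reports the pair (E1, E2) for that species together with the shared entry. An enzyme whose hits overlap with no other enzyme is left out.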

To implement the workflow outlined above, we used the SOAP and REST APIs of DDBJ (http://www.ddbj.nig.ac.jp/), KEGG (http://www.genome.jp/) and PDBj (http://www.pdbj.org/). The workflow can be divided into three parts corresponding to the three web sites: Part I consists of steps (1) and (2) (using the KEGG API), Part II of steps (3) and (4) (using DDBJ WABI), and Part III of steps (5) and (6) (using the PDBj sequence navigator SOAP service). Step (7) is handled by a customized program on the client side. The main part of the client program was written in Java, but we were forced to switch to Perl for PDBj's sequence navigator due to a version incompatibility in the SOAP libraries. The image manipulation programs were written in Perl and Ruby.