Changes between Version 4 and Version 5 of DDBJ-KEGG-PDBj/Draft

Timestamp: 2009/06/29 16:21:26
Author: akinjo

= A DDBJ-KEGG-PDBj workflow: from pathways to protein-protein interactions =
== Objectives and Outline ==
The objective of this satellite group is to examine the potential and the obstacles of web services by implementing a real-life use case.
The goal of the workflow is to enumerate possible physical protein-protein interactions among proteins in a biochemical pathway.
     
(7) Output image files highlighting the conserved and interacting proteins in the pathway map.

== Implementation ==
To implement the workflow outlined above, we used the SOAP and REST APIs of DDBJ (http://www.ddbj.nig.ac.jp/), KEGG (http://www.genome.jp/) and PDBj (http://www.pdbj.org/). The workflow can be divided into three parts corresponding to the three web sites: Part I consists of steps (1) and (2) (using KEGG API), Part II of steps (3) and (4) (using DDBJ WABI), and Part III of steps (5) and (6) (using PDBj sequence navigator SOAP). Step (7) is handled by a customized program on the client side. The main part of the client program was written in Java, but we were forced to switch to Perl for PDBj's sequence navigator because of a version incompatibility in the SOAP libraries. Image manipulation programs were written in Perl and Ruby.

== Outcome ==
We were able to implement the above workflow within the three days of the BioHackathon. The following is a list of what we noticed during development.
First, we had to do a significant amount of coding in spite of the wealth of web services. Most of the code was devoted to converting file formats between steps. Although some of these conversions could be automated by providing new web services, we suspect that a non-trivial amount of coding for format conversion is inevitable whenever we tackle new problems. We also noted that the non-standard output format of the BLAST search in PDBj's sequence navigator caused some trouble.
Second, a whole analysis can take a significant amount of time to finish. This is mostly due to the number of BLAST searches required to construct a phylogenetic profile. In retrospect, this problem might have been solved if our client program had been made multi-threaded (assuming the web servers can handle hundreds of requests at the same time).
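
The multi-threading idea can be illustrated with a minimal Python sketch (not the Java/Perl client described above). It assumes that each BLAST request is an independent, blocking network call; `run_searches` and `fake_search` are hypothetical names, and `fake_search` is only a placeholder for a real remote request, not a DDBJ or PDBj API function.

```python
from concurrent.futures import ThreadPoolExecutor


def run_searches(queries, search, max_workers=8):
    """Run `search` on every query concurrently and return {query: result}.

    `search` is a hypothetical callable standing in for one remote BLAST
    request; it is not an actual DDBJ or PDBj client function.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with queries.
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))


def fake_search(sequence_id):
    # Placeholder for a network call; returns a made-up hit list.
    return [sequence_id + "_hit"]


if __name__ == "__main__":
    print(run_searches(["geneA", "geneB", "geneC"], fake_search, max_workers=3))
```

Threads suit this case because the client spends most of its time waiting for the servers. `max_workers` should be kept within whatever request rate the servers tolerate.
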
By actually solving biologically oriented problems, we can identify typical use cases that may be useful for further development of web services.
e.g. 1. Given a set of gene names, return a phylogenetic profile.
e.g. 2. Given a set of BLAST hits, group them according to their species.
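
Both use cases can be sketched on the client side in a few lines of Python. This is an illustration under stated assumptions, not a proposed service interface: it assumes the BLAST hits have already been reduced to `(query_gene, species, hit_id)` tuples, and the function names and example data are hypothetical. The profile here is the usual presence/absence vector of a gene across a list of species.

```python
from collections import defaultdict


def group_hits_by_species(hits):
    """Group (query_gene, species, hit_id) tuples as {species: [hit_id, ...]}."""
    grouped = defaultdict(list)
    for _query, species, hit_id in hits:
        grouped[species].append(hit_id)
    return dict(grouped)


def phylogenetic_profile(hits, genes, species_list):
    """Return {gene: [0 or 1 per species]}; 1 means the gene has a hit there."""
    present = {(query, species) for query, species, _hit in hits}
    return {
        gene: [1 if (gene, species) in present else 0 for species in species_list]
        for gene in genes
    }


# Made-up example data (hypothetical identifiers).
hits = [
    ("geneA", "E. coli", "hit1"),
    ("geneA", "H. sapiens", "hit2"),
    ("geneB", "E. coli", "hit3"),
]

if __name__ == "__main__":
    print(group_hits_by_species(hits))
    print(phylogenetic_profile(hits, ["geneA", "geneB"], ["E. coli", "H. sapiens"]))
```

A real service for use case 1 would also have to run the searches for the given gene names. The sketch starts from hits that are already available.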