Version 14 (modified by akinjo, 14 years ago)


A DDBJ-KEGG-PDBj workflow: from pathways to protein-protein interactions

for paper

(The following is a draft of a part of a paper to be submitted.)

The objective of this working group is to examine the potential and the obstacles of web services by implementing a real-life use case. The goal of the workflow is to enumerate possible physical protein-protein interactions among the proteins in a biochemical pathway. More specifically, the workflow proceeds as follows. (1) The user provides a KEGG pathway ID. (2) The amino acid sequence of each enzyme in the specified pathway is extracted. (3) For each amino acid sequence, a BLAST search is run against the Swiss-Prot database. (4) A phylogenetic profile (a species-by-enzyme matrix) is constructed by identifying the top hit for each protein in each species. (5) For each species in the phylogenetic profile, a BLAST search against PDB is run for each amino acid sequence. (6) If two amino acid sequences (of the same species) have homologs in the same PDB entry, they are inferred to be in possible contact, and hence predicted to be an interacting pair.


  • Yasumasa Shigemoto (WABI/SABI; DDBJ, Japan)
  • Akira Kinjo (PDBj, Japan)
  • Soichi Ogishima (Tokyo Medical and Dental Univ, Japan)
  • Masumi Itoh (Hokkaido Univ)

The objective

Given a KEGG pathway ID of a specific species,

  1. Identify homologous proteins of the pathway.
  2. Infer possible protein-protein interactions between components of the homologous pathways.


  1. Input: KEGG pathway ID
  2. Get amino acid sequences of all the proteins involved in the given pathway. (KEGG API)
  3. For each protein sequence, run BLAST against the UniProt (Swiss-Prot / TrEMBL) database to identify its homologs. (WABI)
  4. Sort the BLAST results according to species. (a local program)
    • At this point, we will have a phylogenetic profile.
  5. For each BLAST hit (a UniProt entry), run BLAST against PDB. (PDBj SeqNavi API).
  6. Find physical interactions between different components. (a local program)
    • That is, if two (UniProt) sequences are homologous to different chains of the same PDB entry, they are predicted to be in physical contact.
  7. Output the phylogenetic profile and the predicted species-wise PPI.
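
Step 6 above can be sketched as follows. This is only a minimal illustration, not the local program used in the workflow: the function name `predict_interactions`, the input layout (a dictionary from protein ID to a list of `(pdb_id, chain_id)` BLAST hits) and the IDs in the usage example are all assumptions made for the sketch. As in the rule stated above, sharing a PDB entry through different chains is taken as evidence of possible contact; the sketch does not check whether the chains actually touch in the structure.

```python
from collections import defaultdict
from itertools import combinations


def predict_interactions(pdb_hits):
    """Predict interacting pairs from PDB BLAST hits.

    pdb_hits: dict mapping a protein ID to an iterable of
    (pdb_id, chain_id) tuples (hypothetical input layout).
    Returns a dict mapping a sorted protein pair to the sorted list
    of PDB entries that support the prediction.
    """
    # Index: PDB entry -> chain -> proteins homologous to that chain.
    by_entry = defaultdict(lambda: defaultdict(set))
    for protein, hits in pdb_hits.items():
        for pdb_id, chain_id in hits:
            by_entry[pdb_id][chain_id].add(protein)

    pairs = defaultdict(set)
    for pdb_id, chains in by_entry.items():
        # Only *different* chains of the same entry count as evidence.
        for chain_a, chain_b in combinations(sorted(chains), 2):
            for a in chains[chain_a]:
                for b in chains[chain_b]:
                    if a != b:
                        pairs[tuple(sorted((a, b)))].add(pdb_id)
    return {pair: sorted(entries) for pair, entries in pairs.items()}


# Usage with made-up protein and PDB identifiers.
example_hits = {
    "protA": [("1abc", "A")],
    "protB": [("1abc", "B")],
    "protC": [("1abc", "A"), ("2xyz", "A")],
}
for pair, entries in sorted(predict_interactions(example_hits).items()):
    print(pair, entries)
```

Here `protA` and `protC` are not paired, because both match the same chain of the shared entry rather than different chains.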

An example

  1. KEGG pathway (TCA cycle in E. coli): path:eco00020.
  2. Phylogenetic profile
  3. The pathway colored according to homologs in Acinetobacter
  4. PPI list
  5. The pathway with PPI edges in Yersinia pestis

What we learned

  1. We had to do a significant amount of coding in spite of the wealth of Web services.
  2. It takes a long time to run the whole workflow.
  3. By actually solving biologically oriented problems, we can identify some typical use cases which might be useful for further development of web services.
    • e.g. 1. Given a set of gene names, return a phylogenetic profile.
    • e.g. 2. Given a set of BLAST hits, group them according to their species.
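
The two use cases above could look roughly like the sketch below, which groups BLAST hits by species and keeps the top hit per protein to form a species-by-enzyme profile. It is an illustration under assumptions rather than an existing service: the function name `build_profile`, the hit layout `(query, species, hit_id, bit_score)` and all identifiers in the usage example are hypothetical, and "top hit" is taken here to mean the highest bit score.

```python
from collections import defaultdict


def build_profile(hits):
    """Build a phylogenetic profile from BLAST hits.

    hits: iterable of (query, species, hit_id, bit_score) tuples
    (hypothetical layout; the species must already be known per hit).
    Returns {species: {query: best_hit_id}}.
    """
    # Keep only the highest-scoring hit for each (species, query) cell.
    best = {}
    for query, species, hit_id, score in hits:
        key = (species, query)
        if key not in best or score > best[key][1]:
            best[key] = (hit_id, score)

    # Regroup the surviving hits by species.
    profile = defaultdict(dict)
    for (species, query), (hit_id, _score) in best.items():
        profile[species][query] = hit_id
    return dict(profile)


# Usage with made-up query, species and hit identifiers.
example_hits = [
    ("enzyme1", "speciesX", "hit_x1", 310.0),
    ("enzyme1", "speciesX", "hit_x2", 120.5),
    ("enzyme1", "speciesY", "hit_y1", 95.0),
    ("enzyme2", "speciesX", "hit_x3", 200.0),
]
profile = build_profile(example_hits)
for species in sorted(profile):
    print(species, profile[species])
```

A query missing from a species' row corresponds to an empty cell of the profile matrix, that is, no homolog was found in that species.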