A DDBJ-KEGG-PDBj workflow: from pathways to protein-protein interactions


  • Yasumasa Shigemoto (WABI/SABI; DDBJ, Japan)
  • Akira Kinjo (PDBj, Japan)
  • Soichi Ogishima (Tokyo Medical and Dental Univ, Japan)
  • Masumi Itoh (Hokkaido Univ, Japan)

The objective

Given a KEGG pathway ID for a specific species,

  1. Identify homologs of the proteins in the pathway.
  2. Infer possible protein-protein interactions between components of the homologous pathways.

The workflow

  1. Input: KEGG pathway ID
  2. Retrieve the amino acid sequences of all proteins in the given pathway (KEGG API).
  3. For each protein sequence, run BLAST against the UniProt (Swiss-Prot/TrEMBL) database to identify its homologs (WABI).
  4. Group the BLAST hits by species (a local program).
    • The result is a phylogenetic profile: which species have homologs of which pathway proteins.
  5. For each BLAST hit (a UniProt entry), run BLAST against PDB (PDBj SeqNavi API).
  6. Predict physical interactions between different components (a local program).
    • That is, if two UniProt sequences are homologous to different chains of the same PDB entry, they are predicted to be in physical contact.
  7. Output the phylogenetic profile and the predicted PPIs for each species.
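Steps 1 and 2 can be sketched against the current KEGG REST API (https://rest.kegg.jp), which is not necessarily the interface the original workflow called. The `link` operation lists the genes of a pathway, and `get` with the `aaseq` option returns their amino acid sequences as FASTA, at most 10 entries per request. The function names are hypothetical, and the parsing assumes tab-separated `link` output and that the first token of each FASTA header is the KEGG gene ID.

```python
import urllib.request

KEGG_REST = "https://rest.kegg.jp"  # KEGG REST base URL


def fetch_text(url: str) -> str:
    """Download a KEGG REST response as UTF-8 text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_link(text: str) -> list[str]:
    """Parse `link` output ("path:ID<TAB>org:gene" per line) into gene IDs."""
    genes = []
    for line in text.splitlines():
        if line.strip():
            _pathway, gene = line.strip().split("\t")
            genes.append(gene)
    return genes


def parse_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {first header token: sequence}."""
    seqs: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def pathway_sequences(pathway_id: str) -> dict[str, str]:
    """Steps 1-2: KEGG pathway ID (e.g. 'path:eco00020') -> {gene ID: amino acid sequence}."""
    org = pathway_id.split(":")[-1][:-5]  # 'path:eco00020' -> 'eco' (map numbers have 5 digits)
    genes = parse_link(fetch_text(f"{KEGG_REST}/link/{org}/{pathway_id}"))
    seqs: dict[str, str] = {}
    for i in range(0, len(genes), 10):  # `get` accepts at most 10 entries per request
        batch = "+".join(genes[i:i + 10])
        seqs.update(parse_fasta(fetch_text(f"{KEGG_REST}/get/{batch}/aaseq")))
    return seqs
```

Calling `pathway_sequences("path:eco00020")` needs network access; the two parsers are kept separate so they can be checked offline against saved responses.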

An example

  1. KEGG pathway (TCA cycle in E. coli): path:eco00020
  2. Phylogenetic profile
  3. The pathway colored according to homologs in Acinetobacter
  4. PPI list
  5. The pathway with predicted PPI edges for Yersinia pestis
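The PPI list rests on the rule in workflow step 6: two UniProt sequences that hit different chains of the same PDB entry are predicted to interact. A minimal sketch of that rule, assuming the PDB search results (SeqNavi in the original) have already been parsed into a hypothetical mapping from each UniProt accession to the set of (PDB ID, chain ID) pairs it hits; `predict_ppi` is a hypothetical name, not part of any of the services used.

```python
from collections import defaultdict
from itertools import combinations


def predict_ppi(pdb_hits: dict[str, set[tuple[str, str]]]) -> set[tuple[str, str, str]]:
    """Return (uniprot_a, uniprot_b, pdb_id) for every pair of UniProt entries
    that hit different chains of the same PDB entry."""
    # Index: PDB entry -> {chain ID -> UniProt entries hitting that chain}
    by_entry: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for uniprot, chains in pdb_hits.items():
        for pdb_id, chain in chains:
            by_entry[pdb_id][chain].add(uniprot)

    predictions: set[tuple[str, str, str]] = set()
    for pdb_id, chains in by_entry.items():
        # Only pairs of *different* chains count as a physical contact.
        for chain_a, chain_b in combinations(sorted(chains), 2):
            for a in chains[chain_a]:
                for b in chains[chain_b]:
                    if a != b:  # skip a protein matching two chains (self-pair)
                        first, second = sorted((a, b))
                        predictions.add((first, second, pdb_id))
    return predictions
```

Pairs are reported per PDB entry, so a protein pair supported by several structures appears once per structure, which keeps the structural evidence visible. Restricting the pairs to different pathway components, or splitting them by species for the species-wise output, would be a filter on top of this set.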

What we learned

  1. We had to write a significant amount of code ourselves despite the wealth of available web services.
  2. Running the whole workflow takes a long time.
  3. By actually solving biologically oriented problems, we can identify typical use cases that could guide the further development of web services.
    • Example 1: given a set of gene names, return a phylogenetic profile.
    • Example 2: given a set of BLAST hits, group them by species.
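Once the BLAST hits are in hand, the two use cases above (and step 4 of the workflow) reduce to a small amount of local code. A sketch, assuming each hit is a UniProt-style description line that carries the organism in an `OS=` field followed by fields such as `OX=`, `GN=`, `PE=`, `SV=`; the function names are hypothetical. Species are keyed by the full `OS=` string, so different strains of one species are counted separately.

```python
import re
from collections import defaultdict

# Organism in a UniProt-style description: "... OS=<organism> OX=... GN=... PE=... SV=..."
OS_PATTERN = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")


def species_of(description: str) -> str:
    """Extract the OS= (organism) field from a hit description, or 'unknown'."""
    match = OS_PATTERN.search(description)
    return match.group(1) if match else "unknown"


def group_by_species(hits: list[str]) -> dict[str, list[str]]:
    """Use case 2: group BLAST hit descriptions by species."""
    groups: dict[str, list[str]] = defaultdict(list)
    for hit in hits:
        groups[species_of(hit)].append(hit)
    return dict(groups)


def phylogenetic_profile(hits_per_query: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Final step of use case 1: {query gene: hit descriptions} ->
    {species: {query gene: 1 if that species has a hit, else 0}}."""
    queries = sorted(hits_per_query)
    # Each new species starts with 0 (absent) for every query gene.
    profile: dict[str, dict[str, int]] = defaultdict(lambda: dict.fromkeys(queries, 0))
    for query, hits in hits_per_query.items():
        for species in group_by_species(hits):
            profile[species][query] = 1
    return dict(profile)
```

Presence/absence (0/1) is the simplest profile; recording the best E-value per cell instead would keep more of the BLAST information at the cost of a threshold choice later.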