Version 10 (modified by akinjo, 16 years ago)
Creating an example of workflows using DDBJ-KEGG-PDBj
Members: Shigemoto (DDBJ) and Kinjo (PDBj), with some help from the KEGG people.
What we want to do
Find pathway components (enzymes) that are expected to be in physical contact, as judged from homologous protein structures.
In short, find the physical protein-protein interaction (PPI) networks of homologous pathways in various species.
The procedure
Part I
- Pick a pathway of a specific organism, say, the Glycolysis / Gluconeogenesis pathway of Homo sapiens (human).
- Get the amino acid sequences of all components of that pathway (c1, c2, ..., cm).
Part II
- Run blast for each sequence against UniProt (SwissProt / TrEMBL).
- Group the blast results according to biological species (s1, s2, ..., sn).
- For each pathway component (ci), pick the top hit from each species.
- At this step, you have (at most) an m components x n species array of protein sequences: the Homologous Components Matrix (HCM).
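The grouping and top-hit selection of Part II can be sketched as follows. This is only an illustration, not the code used in the project: the function names `build_hcm` and `column`, the tuple layout of a hit, and the choice of "lowest e-value" as the meaning of "top hit" are all assumptions, and the sample data are placeholders.

```python
from collections import defaultdict


def build_hcm(hits):
    """Build the Homologous Components Matrix from blast hits.

    hits: iterable of (component, protein, species, evalue) tuples
          (hypothetical layout; adapt to the real blast output).
    Returns {component: {species: (protein, evalue)}}, keeping for each
    component and species the hit with the lowest e-value (assumed to be
    what "top hit" means).
    """
    hcm = defaultdict(dict)
    for component, protein, species, evalue in hits:
        best = hcm[component].get(species)
        if best is None or evalue < best[1]:
            hcm[component][species] = (protein, evalue)
    return {component: dict(row) for component, row in hcm.items()}


def column(hcm, species):
    """Return one column of the HCM: {component: protein} for a species.

    Components without a hit in that species are simply absent, which is
    why the matrix has "at most" m x n entries.
    """
    return {c: row[species][0] for c, row in hcm.items() if species in row}


# Placeholder data, not real results.
hits = [
    ("comp1", "protein1", "species1", 1e-50),
    ("comp1", "protein2", "species2", 1e-40),
    ("comp1", "protein3", "species1", 1e-30),
    ("comp1", "protein4", "species2", 1e-45),
    ("comp2", "protein5", "species1", 1e-20),
]
hcm = build_hcm(hits)
```

With this layout, `column(hcm, "species2")` gives the proteins to be searched against PDB in Part III for that species.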
Part III
- Pick one column (species) of the HCM (sj). For each component of that column, run blast against PDB sequences.
- You will have a list of PDB IDs and Chain IDs as a result of this blast search.
- If the blast results of two components share the same PDB ID but do not share the same Chain ID, they are judged to be in contact.
- By regarding each pathway component as a node, the inferred physical contacts between two components correspond to an edge in the network.
- Repeat the above steps for each column of the HCM.
- Finally, show the list of inferred PPI of the pathway for each species.
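The contact rule of Part III can be written down as a small sketch. It assumes one reading of the rule: two components are in contact if some hit of one and some hit of the other have the same PDB ID but different Chain IDs. The function names, the data layout and the PDB IDs below are placeholders for illustration and do not refer to real entries or results.

```python
from itertools import combinations


def in_contact(hits_a, hits_b):
    """True if the two hit sets share a PDB ID on different chains.

    hits_a, hits_b: sets of (pdb_id, chain_id) pairs from the blast
    search against PDB sequences.
    """
    return any(
        pdb_a == pdb_b and chain_a != chain_b
        for pdb_a, chain_a in hits_a
        for pdb_b, chain_b in hits_b
    )


def infer_ppi(pdb_hits):
    """Infer network edges for one species (one column of the HCM).

    pdb_hits: {component: set of (pdb_id, chain_id)}.
    Each component is a node; each returned pair is an edge.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(pdb_hits), 2)
        if in_contact(pdb_hits[a], pdb_hits[b])
    ]


def infer_ppi_by_species(pdb_hits_by_species):
    """Repeat the inference for every column of the HCM."""
    return {s: infer_ppi(h) for s, h in pdb_hits_by_species.items()}


# Placeholder hits: comp1 and comp2 hit different chains of the same
# entry; comp2 and comp3 hit the very same chain, so no contact.
example = {
    "species1": {
        "comp1": {("1abc", "A")},
        "comp2": {("1abc", "B"), ("2xyz", "A")},
        "comp3": {("2xyz", "A")},
    },
}
ppi = infer_ppi_by_species(example)
```

Note that under this reading two components that both hit only the same chain of an entry are not linked, since the rule requires different Chain IDs.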
Outline
Here is a pathway of human (see the attached hsa00010.gif, the human glycolysis pathway).
Pick amino acid sequences for the pathway components
>comp1
.....
>comp2
....
>comp3
....
After running blast for each pathway component
You will have a list of proteins annotated with species.
Comp1
  protein1 species1 evalue...
  protein2 species2 evalue...
  protein3 species1 evalue...
  protein4 species2 evalue...
  protein5 species3 evalue...
  ...
Comp2
  protein1 species1 evalue...
  ...
For each species, select the top hit for each pathway component
That is, fill in the Homologous Components Matrix: for each species, keep only the top hit for each component.
       | species1  species2  species3  ...  speciesN
-------+---------------------------------------------
comp1  | protein11 protein12 ...
comp2  | protein21 protein22 protein23 ...
comp3  | ...
comp4  | ...
....
compM
Pick one column and run blast against PDB
Let's pick species j.
for i = 1 to M
    run blast (protein(i,j)) against PDB
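One way to turn this loop into something concrete is sketched below. It rests on several assumptions that are not part of the procedure above: that NCBI BLAST+ `blastp` is used, that a protein database named `pdbaa` is available locally, that each protein(i,j) is stored in its own FASTA file, and that subject IDs in the tabular output look like `1abc_A` (PDB ID, underscore, Chain ID). The helper names are hypothetical, and the sketch only builds the command and parses its output; it does not run blast.

```python
def blastp_command(query_fasta, db="pdbaa", evalue=1e-5):
    """Build a blastp command line (as an argument list) for one protein.

    "-outfmt 6" requests tab-separated output whose second column is the
    subject (hit) identifier. The database name and e-value cutoff are
    assumptions.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_pdb_hits(tabular_text):
    """Extract the set of (pdb_id, chain_id) pairs from tabular output.

    Assumes subject IDs of the form "<pdb_id>_<chain_id>"; real databases
    may format identifiers differently.
    """
    hits = set()
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        subject_id = line.split("\t")[1]
        pdb_id, _, chain_id = subject_id.partition("_")
        hits.add((pdb_id.lower(), chain_id))
    return hits


# Placeholder output line in "-outfmt 6" layout (not a real result).
sample = "protein11\t1abc_A\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
parsed = parse_pdb_hits(sample)
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` for each i = 1 to M, and the parsed sets collected per component.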
Attachments
- hsa00010.gif (18.8 KB) - added by akinjo 16 years ago. Glycolysis pathway of human
- pdb_result.tgz (0.8 KB) - added by yshigemo 16 years ago.
- sp_kegggene_matrix.xls (336 bytes) - added by yshigemo 16 years ago.
- species1.gif (95.0 KB) - added by ogishima 16 years ago.
- species2.gif (95.0 KB) - added by ogishima 16 years ago.
- species3.gif (95.0 KB) - added by ogishima 16 years ago.
- sp1-3_conservation.gif (108.4 KB) - added by ogishima 16 years ago.
- Escherichia_coli_O1_K1_APEC.gif (55.8 KB) - added by ogishima 16 years ago.
- Escherichia_coli_O139_H28_strain_E24377A_ETEC.gif (55.8 KB) - added by ogishima 16 years ago.
- Escherichia_coli_O157_H7.gif (55.8 KB) - added by ogishima 16 years ago.
- Escherichia_coli_O157_H7_str_EC4076.gif (55.8 KB) - added by ogishima 16 years ago.
- Escherichia_coli_O157_H7_str_EC4196.gif (55.8 KB) - added by ogishima 16 years ago.
- Escherichia_coli_strain_UTI89_UPEC.gif (55.8 KB) - added by ogishima 16 years ago.
- Citrate_cycle_conservation.gif (55.7 KB) - added by ogishima 16 years ago.
- sample.gif (26.4 KB) - added by itoh 16 years ago. sample map with artificial data
- sp_kegggene_matrix.2.xls (59.1 KB) - added by yshigemo 16 years ago.
- ppi_by_species.txt (80.4 KB) - added by yshigemo 16 years ago.
- sp_kegggene_matrix.2.tgz (9.4 MB) - added by ogishima 16 years ago.
- sp_kegggene_matrix.3.xls (57.4 KB) - added by yshigemo 16 years ago.
- Yersinia_pestis.gif (25.1 KB) - added by itoh 16 years ago.
- Serratia_proteamaculans_strain_568.gif (32.5 KB) - added by itoh 16 years ago.