Changes between Version 52 and Version 53 of SatelliteBigData

Timestamp:
2009/03/20 14:25:22 (15 years ago)
Author:
severin
Comment:

--

  • SatelliteBigData

    v52 v53  
    96  96   - maybe not all data will end up in ONE archive (because it is so big); we may need to query multiple centers to find all data (DDBJ, SRA, GEO, ArrayExpress, Korea? China?)
    97  97
        98  What are the common ways we would want to query?
        99   - should we try to define them? yes
        100  - who should or can provide the data?
        101    - maybe the data production centers can also provide access to data
        102    - eventually the DataCenters will also provide fine-grained access
        103    - the new high data production rates are going to make it expensive for central data providers
        104  - again, we need region queries! this means the services need mappings
        105  - but we also want the read sequence, since we can extract SNPs from it; this needs quality values
        106  - we NEED data providers to create complete metadata descriptions.
    98  107
    99  108 ==== Processing ====
     
    140 149  - SRA is still evaluating technology for region-based access to short reads
    141 150
    142     Working with existing big data
    143      - SRA, GEO, ArrayExpress
    144      - now most of us pull the whole thing down and then work with it
    145      - sometimes it is even hard to send data for submission; sometimes DVDs are used to move it around. not optimal.
    146      - what are the queries we want to do?
    147        - what are the reads in this region, definitely
    148      - SRA maybe does not want to do everything, but they will not turn data down and want everyone to send them the published data
    149      - maybe not all data will end up in ONE archive (because it is so big); we may need to query multiple centers to find all data (DDBJ, SRA, GEO, ArrayExpress, Korea? China?)
    150 151
    151     What are the common ways we would want to query?
    152      - should we try to define them? yes
    153      - who should or can provide the data?
    154        - maybe the data production centers can also provide access to data
    155        - eventually the DataCenters will also provide fine-grained access
    156        - the new high data production rates are going to make it expensive for central data providers
    157      - again, we need region queries! this means the services need mappings
    158      - but we also want the read sequence, since we can extract SNPs from it; this needs quality values
    159      - we NEED data providers to create complete metadata descriptions.
    160 152
    161 153 IN THE END the goal is to find biology. Having access to the individual data elements is critical, and they cannot just be locked away inside files that cannot be accessed internally.
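
The region query asked for in these notes ("what are the reads in this region") could look roughly like the sketch below. It assumes the reads have already been mapped to a reference, which is why the notes say the services need mappings. The `MappedRead` record, its field names, and the `reads_in_region` function are hypothetical and not part of any SRA, GEO, or ArrayExpress interface. Coordinates are assumed to be 0-based and half-open. Each hit carries its sequence and quality string, so SNP extraction remains possible downstream.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MappedRead:
    """A short read plus its mapping (hypothetical record layout)."""
    read_id: str
    chrom: str
    start: int      # 0-based, inclusive (assumption)
    end: int        # 0-based, exclusive (assumption)
    sequence: str
    quality: str    # per-base quality string, same length as sequence


def reads_in_region(reads: Iterable[MappedRead], chrom: str,
                    start: int, end: int) -> List[MappedRead]:
    """Return the reads whose mapped interval overlaps [start, end) on chrom."""
    return [r for r in reads
            if r.chrom == chrom and r.start < end and r.end > start]


# Made-up example data, for illustration only.
reads = [
    MappedRead("r1", "chr1", 100, 136, "ACGT" * 9, "I" * 36),
    MappedRead("r2", "chr1", 500, 536, "TTGA" * 9, "I" * 36),
    MappedRead("r3", "chr2", 110, 146, "GGCA" * 9, "I" * 36),
]

hits = reads_in_region(reads, "chr1", 120, 200)
print([r.read_id for r in hits])
```

This is a linear scan, kept simple for clarity. A service answering such queries over a whole archive would presumably need some kind of positional index instead.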