Currently, public resources such as SRA, GEO and ArrayExpress provide query facilities only on the metadata of the experiments, not on the data itself. The data is available as files to download (often in the original format), but there are no facilities for exploring the data externally and asking biological questions of it. Anyone who wants to explore a dataset is therefore forced to download it into a local integration system before they can ask their biological questions.

Working with existing big data
- SRA, GEO, ArrayExpress: today they provide only the metadata of a dataset, not the ability to explore the actual data
- now most of us download the whole dataset and then work with it locally
- sometimes it is even hard to transfer the data for submission; sometimes DVDs are used to move it around. Not optimal.
- what queries do we want to run?
- which reads fall in this region? Definitely.
- also, SRA may not want to do everything, but it will not turn data down and wants everyone to send it their published data
- maybe not all data will end up in ONE archive (because there is so much of it), so we may need to query multiple centers to find all the data (DDBJ, SRA, GEO, ArrayExpress, Korea? China?)
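As a rough illustration of asking "which reads fall in this region?" across several archives, here is a minimal Python sketch. It assumes each archive could answer region queries over aligned reads using 0-based, half-open coordinates. `ArchiveIndex`, `federated_region_query` and the archive names are hypothetical stand-ins, and the in-memory lists replace what would in practice be remote query services; none of this reflects an actual SRA, GEO, ArrayExpress or DDBJ interface.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Read:
    """An aligned read; coordinates are 0-based, half-open [start, end)."""
    read_id: str
    chrom: str
    start: int
    end: int


class ArchiveIndex:
    """Hypothetical per-archive index of aligned reads, sorted by start.

    Stands in for whatever remote region-query service an archive might expose.
    """

    def __init__(self, name: str, reads: List[Read]) -> None:
        self.name = name
        self._by_chrom: Dict[str, List[Read]] = {}
        for read in sorted(reads, key=lambda r: (r.chrom, r.start)):
            self._by_chrom.setdefault(read.chrom, []).append(read)
        # Sorted start positions per chromosome, used for binary search.
        self._starts = {c: [r.start for r in rs] for c, rs in self._by_chrom.items()}
        # Longest read per chromosome bounds how far left an overlapping read can start.
        self._max_len = {c: max(r.end - r.start for r in rs) for c, rs in self._by_chrom.items()}

    def reads_in_region(self, chrom: str, start: int, end: int) -> List[Read]:
        """Return reads overlapping [start, end) on chrom."""
        reads = self._by_chrom.get(chrom, [])
        if not reads:
            return []
        starts = self._starts[chrom]
        # Candidates start no earlier than start - max_len and strictly before end.
        lo = bisect_left(starts, start - self._max_len[chrom])
        hi = bisect_left(starts, end)
        # Keep only candidates that actually reach into the region.
        return [r for r in reads[lo:hi] if r.end > start]


def federated_region_query(
    archives: List[ArchiveIndex], chrom: str, start: int, end: int
) -> Dict[str, List[Read]]:
    """Ask every archive for reads overlapping [start, end) and collect answers by archive name."""
    return {a.name: a.reads_in_region(chrom, start, end) for a in archives}


if __name__ == "__main__":
    archive_a = ArchiveIndex("archive_A", [Read("r1", "chr1", 100, 200), Read("r2", "chr1", 450, 550)])
    archive_b = ArchiveIndex("archive_B", [Read("r3", "chr1", 180, 280), Read("r4", "chr2", 150, 250)])
    hits = federated_region_query([archive_a, archive_b], "chr1", 150, 300)
    for name, found in hits.items():
        print(name, [r.read_id for r in found])
```

Keeping reads sorted by start and bounding the search by the longest read lets each archive answer a region query with a binary search instead of a full scan. The client only merges the per-archive answers, so no single archive has to hold all the data.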