Changes between Version 35 and Version 36 of SatelliteBigData

Timestamp:
2009/03/20 14:05:41 (15 years ago)
Author:
jmfernandez
Comment:

--

  • SatelliteBigData

    v35 v36
    61  61  Protein-protein interaction datasets typically consist of a relatively small number of small objects. This type of data requires no advanced storage systems or tweaks to common systems; a simple RDBMS will do.
    62  62  Data like genome sequences and assembly data involve larger objects (1 genome sequence -> 3Gb), but are still easily manageable on a standard filesystem.
    63      Really big objects such as the data from simulations [IS THIS CORRECT?] require specialized storage systems such as ZFS, lustre or PVFS.
        63  Really big objects such as the data from simulations [IS THIS CORRECT?] require specialized storage systems such as [http://opensolaris.org/os/community/zfs/ ZFS], [http://wiki.lustre.org/index.php?title=Main_Page Lustre], [http://oss.sgi.com/projects/xfs/ XFS], [http://www.pvfs.org/ PVFS2] or the future Linux filesystem [http://btrfs.wiki.kernel.org/index.php/Main_Page Btrfs].
    64  64
    65  65  In contrast to the above, diffraction results, microarray results or next-gen sequencing reads involve a largish number of objects which become more difficult to query. They are typically still stored in an RDBMS but might require some tweaking that departs from a normalized relational database model. Apart from obvious things to do such as creating good indices, further optimization can be found by using as few joins as possible and therefore organizing the data so that it can be stored in 2 or 3 tables/indexes (e.g. eeDB).
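
The "few joins, good indices" idea in the last paragraph can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module: the table name `feature`, its columns, and both function names are hypothetical and are not taken from eeDB or any other system mentioned on this page. The experiment name is stored in every row rather than joined from a separate table, and a single composite index serves the region query.

```python
import sqlite3


def build_feature_store(rows):
    """Create an in-memory database with one wide table and one composite index.

    rows: iterable of (experiment, chrom, start_pos, end_pos, score) tuples.
    All names here are hypothetical and chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    # Denormalized layout: the experiment name is repeated in every row,
    # so queries never need to join against a separate experiment table.
    conn.execute(
        """
        CREATE TABLE feature (
            id         INTEGER PRIMARY KEY,
            experiment TEXT    NOT NULL,
            chrom      TEXT    NOT NULL,
            start_pos  INTEGER NOT NULL,
            end_pos    INTEGER NOT NULL,
            score      REAL
        )
        """
    )
    # Composite index matching the most common lookup: by chromosome and position.
    conn.execute("CREATE INDEX feature_region ON feature (chrom, start_pos)")
    conn.executemany(
        "INSERT INTO feature (experiment, chrom, start_pos, end_pos, score) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def features_in_region(conn, chrom, start, end):
    """Return features on chrom that overlap the closed interval [start, end]."""
    cur = conn.execute(
        "SELECT experiment, start_pos, end_pos, score FROM feature "
        "WHERE chrom = ? AND start_pos <= ? AND end_pos >= ? "
        "ORDER BY start_pos",
        (chrom, end, start),
    )
    return cur.fetchall()
```

The trade-off is the usual one for denormalization: repeated values cost extra storage and make updates more awkward, in exchange for single-table reads.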