Options for Managing Scientific Data?

I am looking to restructure a few dozen terabytes of scientific data. Most of it is satellite imagery in various formats, scattered across undocumented flat-file directory structures and spiderwebbed together ad hoc. Some of it is raw data that must be securely backed up, and some is generated from the raw data and is less critical. Despite the variety, this set of requirements is common across many scientific datasets, so I have been searching for existing options (without much luck).

What can I use to best support research scientists wanting to access a large-ish repository of scientific data?

A primary requirement is spreading the files across multiple servers, but I also want to provide added functionality, like the ability to search using detailed metadata.
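To make the metadata requirement concrete, this is roughly the kind of lookup I have in mind. It is only a sketch under assumptions: the SQLite catalog, the table layout, the example field names (sensor, acquisition date, raw flag), and the functions `build_catalog`, `add_file`, and `search` are all placeholders for illustration, not something we run today.

```python
import sqlite3


def build_catalog(conn):
    """Create a hypothetical catalog table: one row per file, wherever it lives."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path     TEXT PRIMARY KEY,   -- path of the file on its server
               server   TEXT NOT NULL,      -- which server holds the file
               sensor   TEXT,               -- example metadata field
               acquired TEXT,               -- ISO date string, e.g. 2012-01-05
               is_raw   INTEGER NOT NULL    -- 1 = raw data, 0 = derived product
           )"""
    )


def add_file(conn, path, server, sensor, acquired, is_raw):
    """Register (or re-register) a file and its metadata in the catalog."""
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
        (path, server, sensor, acquired, int(is_raw)),
    )


def search(conn, sensor=None, start=None, end=None, raw_only=False):
    """Return (path, server) pairs matching the given metadata filters."""
    clauses, params = [], []
    if sensor is not None:
        clauses.append("sensor = ?")
        params.append(sensor)
    if start is not None:
        # ISO dates sort correctly as plain strings.
        clauses.append("acquired >= ?")
        params.append(start)
    if end is not None:
        clauses.append("acquired <= ?")
        params.append(end)
    if raw_only:
        clauses.append("is_raw = 1")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    query = f"SELECT path, server FROM files WHERE {where} ORDER BY path"
    return conn.execute(query, params).fetchall()
```

The point is that a scientist would ask the catalog which server and path hold the files they want, instead of walking the directory trees. Whether the catalog should be a plain database like this or part of the storage system itself is exactly what I am unsure about.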

For file storage itself I see solutions like:

  • building out a SAN
  • distributed filesystems like HDFS, GlusterFS, and Swift

And I also see projects developing databases for scientific data like:

I haven't really bought into any options I have seen so far. Can anybody here recommend software or provide general guidance for a 6-month sysadmin suddenly in charge of a research group's infrastructure?

Jr. Sysadmin
Posted
7 years ago