I am looking to restructure a few dozen terabytes of scientific data. Most of it is satellite imagery in various formats, scattered across undocumented flat-file directory structures and spiderwebbed together ad hoc. Some of it is raw data that must be securely backed up; the rest is derived from the raw data and is less critical. Despite the variety, this set of requirements is common across many scientific datasets, so I have been searching for options (without much luck).
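To make the raw-vs-derived split concrete, what I have in mind for the "securely backed up" tier is something like a checksum manifest: walk each tree, record path, size, and hash, and tag each entry with its tier so backups can be verified later. This is just a minimal sketch (the `tier` labels and manifest fields are my own invention, not any existing tool's format):

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large rasters never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root, tier):
    """Walk a directory tree and record (relative path, size, checksum, tier)."""
    root = Path(root)
    return [
        {
            "path": str(p.relative_to(root)),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
            "tier": tier,  # e.g. "raw" (must back up) vs "derived" (regenerable)
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
```

Re-running the manifest after a restore and diffing the checksums would catch silent corruption, which matters more for the raw tier than the derived one.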
What can I use to best support research scientists wanting to access a large-ish repository of scientific data?
A primary requirement is spreading the files across multiple servers, but I also want to provide added functionality like the ability to search on detailed metadata.
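By metadata search I mean something on the order of the following sketch: a small index (here SQLite, purely as an illustration) mapping each file to a few queryable fields like sensor, acquisition date, and cloud cover. The schema and field names are hypothetical, not from any existing catalog tool:

```python
import sqlite3

# Illustrative schema: one row per imagery file, with queryable metadata fields.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scenes (
        path        TEXT PRIMARY KEY,
        sensor      TEXT,
        acquired    TEXT,   -- ISO 8601 date
        cloud_cover REAL    -- percent
    )
""")

def register(path, sensor, acquired, cloud_cover):
    """Add one file's metadata to the index."""
    conn.execute("INSERT INTO scenes VALUES (?, ?, ?, ?)",
                 (path, sensor, acquired, cloud_cover))

def search(sensor=None, max_cloud=None):
    """Return paths matching whichever filters were supplied."""
    query, args = "SELECT path FROM scenes WHERE 1=1", []
    if sensor is not None:
        query += " AND sensor = ?"
        args.append(sensor)
    if max_cloud is not None:
        query += " AND cloud_cover <= ?"
        args.append(max_cloud)
    return [row[0] for row in conn.execute(query, args)]
```

The point is that researchers could ask "all Landsat scenes under 20% cloud cover" without knowing which server or directory the files landed in; whatever real system I adopt should support that kind of query.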
For file storage itself I see solutions like:
- building out a SAN
- distributed filesystems and object stores like HDFS, GlusterFS, & Swift
I have also seen projects developing databases specifically for scientific data.
I haven't really bought into any of the options I have seen so far. Can anybody here recommend software, or offer general guidance to a sysadmin of six months who is suddenly in charge of a research group's infrastructure?
Post Details
- Posted 7 years ago in r/sysadmin
- URL: reddit.com/r/sysadmin/co...