Need to move a ton of data. Deciding approach.

Need to move about 10 terabytes of data. Here's the scenario:

  • Data currently lives with a vendor we partner with. The only export option is for us to pull the data out via their APIs.
  • The data needs to land in a customer's S3 bucket for backup purposes.

Basically, I'm torn because we don't want to make customers write an integration themselves, but I'm considering a couple of options:

  1. A Python script that downloads the 50TB locally, then pushes it up to the customer's S3 bucket with something like rclone using their credentials. We'd have to do it in batches, obviously. I don't love making this a two-step process, though. (Rough sketch in the first code block after this list.)
  2. A generic script the customer runs somewhere that pulls data from the vendor into their S3 bucket, using our API as a proxy. This gets messy because we'd also have to join their data against some of our internal metadata (we don't want to grab everything). (Second sketch after this list.)
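Very rough sketch of what I mean by option 1, just to show the shape of it. The vendor endpoints, the rclone remote name, and the bucket path are all made-up placeholders:

```python
"""
Option 1 sketch: pull files from the vendor API in batches, then push each
batch to the customer's S3 bucket with rclone.

Assumptions (all placeholders):
  - vendor exposes GET /files?customer_id=...&page=... (JSON metadata)
    and GET /files/{id}/content (raw bytes)
  - an rclone remote named "customer-s3" is already configured with the
    customer's credentials (via `rclone config`)
"""
import pathlib
import shutil
import subprocess

import requests

VENDOR_API = "https://vendor.example.com/api"                      # placeholder
RCLONE_DEST = "customer-s3:customer-backup-bucket/vendor-export"   # placeholder
BATCH_DIR = pathlib.Path("/tmp/vendor-batch")
BATCH_SIZE = 200  # files per batch; tune to local disk space

session = requests.Session()
session.headers["Authorization"] = "Bearer <vendor-api-token>"


def list_file_ids(customer_id):
    """Yield vendor file ids for one customer, page by page."""
    page = 1
    while True:
        resp = session.get(f"{VENDOR_API}/files",
                           params={"customer_id": customer_id, "page": page},
                           timeout=60)
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:
            return
        for item in items:
            yield item["id"]
        page += 1


def download_file(file_id, dest_dir):
    """Stream one file from the vendor onto local disk."""
    url = f"{VENDOR_API}/files/{file_id}/content"
    with session.get(url, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        with open(dest_dir / file_id, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                fh.write(chunk)


def flush_batch(file_ids):
    """Download a batch locally, rclone it to the customer's bucket, clean up."""
    BATCH_DIR.mkdir(parents=True, exist_ok=True)
    for file_id in file_ids:
        download_file(file_id, BATCH_DIR)
    subprocess.run(["rclone", "copy", str(BATCH_DIR), RCLONE_DEST], check=True)
    shutil.rmtree(BATCH_DIR)  # never hold more than one batch locally


def export_customer(customer_id):
    batch = []
    for file_id in list_file_ids(customer_id):
        batch.append(file_id)
        if len(batch) == BATCH_SIZE:
            flush_batch(batch)
            batch = []
    if batch:
        flush_batch(batch)


if __name__ == "__main__":
    export_customer("customer-123")  # placeholder customer id
```

The batch dir gets wiped after each rclone copy, so we'd never need more than one batch of local disk, but it's still the two-step download-then-upload flow I'm not thrilled about.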
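And a sketch of option 2, i.e. the script the customer would run themselves. The /export/... proxy endpoints are placeholders; that proxy layer is the part we'd have to build (which is where it gets messy), but the customer's AWS credentials never leave their environment:

```python
"""
Option 2 sketch: a generic script the customer runs. It pulls each needed
file through our API (which would proxy the vendor and apply the metadata
filter server-side) and streams it straight into their S3 bucket with boto3.
"""
import boto3
import requests

OUR_API = "https://ourservice.example.com/api"   # placeholder
CUSTOMER_BUCKET = "customer-backup-bucket"       # customer fills this in

s3 = boto3.client("s3")  # picks up the customer's own AWS credentials
session = requests.Session()
session.headers["Authorization"] = "Bearer <export-token-we-issue>"


def export_to_s3(customer_id):
    # Our proxy returns only the files this customer actually needs;
    # the join against our internal metadata happens on our side.
    listing = session.get(f"{OUR_API}/export/{customer_id}/files", timeout=60)
    listing.raise_for_status()
    for item in listing.json()["items"]:
        file_id = item["id"]
        url = f"{OUR_API}/export/{customer_id}/files/{file_id}"
        with session.get(url, stream=True, timeout=300) as resp:
            resp.raise_for_status()
            resp.raw.decode_content = True
            # Stream the proxied vendor file directly into the customer's bucket.
            s3.upload_fileobj(resp.raw, CUSTOMER_BUCKET, f"vendor-export/{file_id}")


if __name__ == "__main__":
    export_to_s3("customer-123")  # placeholder
```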

Imagine the data looks something like this:

Vendor

  • mixed customer data
  • ~1,000 files related to the one customer we care about
  • ~500 more files for that customer that we don't need (abandoned, soft-deleted, etc.)

APIs are the only way to download from there ^

Our system

  • links objects to the vendor data above, which tells us which files we actually need to pull over (the ~1,000 we want, not all 1,500)
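Conceptually the join is simple; something like this, where the internal endpoint name is a placeholder for however we'd expose the links:

```python
# Placeholder sketch of the metadata join: ask our system which vendor file
# ids are actually linked to this customer, then only export those.
import requests

OUR_API = "https://ourservice.example.com/api"  # placeholder internal API


def needed_vendor_file_ids(customer_id, token):
    """Return the ~1,000 vendor file ids our system has linked to this customer."""
    resp = requests.get(f"{OUR_API}/customers/{customer_id}/needed-files",  # placeholder
                        headers={"Authorization": f"Bearer {token}"},
                        timeout=60)
    resp.raise_for_status()
    return {row["vendor_file_id"] for row in resp.json()["items"]}


def filter_vendor_listing(all_vendor_file_ids, needed_ids):
    """Keep the ~1,000 files we need, drop the ~500 abandoned/soft-deleted ones."""
    return [fid for fid in all_vendor_file_ids if fid in needed_ids]
```

Either option would run this filter before downloading anything, so we only ever move the files we actually need.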
