I've had some people mention that the file sizes for the monthly dumps are very large (especially when uncompressed). I've tossed around the idea of moving to a daily file format. The pros would be smaller file sizes and the ability for people only interested in specific days to grab the data more easily. The con is that there would be 365 files for a year's worth of data instead of 12.
Would moving to a daily file format like the one below be better than monthly dumps?
RC_2017-12-01.bz2
RC_2017-12-02.bz2
RC_2017-12-03.bz2
RC_2017-12-04.bz2
etc.
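For anyone who wants to script against daily files, here is a rough sketch of how the names could be generated for a date range. It assumes the `RC_YYYY-MM-DD.bz2` pattern shown above; the function name `daily_dump_names` and its parameters are made up for illustration and are not part of any existing tooling.

```python
from datetime import date, timedelta


def daily_dump_names(start, end, prefix="RC_", suffix=".bz2"):
    """Return one file name per day from start to end, inclusive.

    The prefix/suffix defaults follow the naming proposed in the post;
    this helper itself is hypothetical.
    """
    days = (end - start).days + 1
    return [
        f"{prefix}{(start + timedelta(days=i)).isoformat()}{suffix}"
        for i in range(days)
    ]


# Example: every daily file for December 2017.
names = daily_dump_names(date(2017, 12, 1), date(2017, 12, 31))
print(names[0], names[-1], len(names))
# prints: RC_2017-12-01.bz2 RC_2017-12-31.bz2 31
```

Because the date is in ISO format, the names sort chronologically as plain strings, which keeps a year's worth of 365 files easy to list and filter.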
Another benefit to moving to daily file dumps would be having data available more quickly.
Thoughts?
Posted 6 years ago in r/datasets.