This post has been de-listed
It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.
Hey All,
I'm the dev of Repost Sleuth here on Reddit.
I'm looking for input on the most performant way to select ~100 million rows from a table with 1.3 billion rows in a MySQL database.
Example table looks like this
post_type | created_at | hash |
---|---|---|
String(20) | DATETIME | String(64) |
There's a composite index containing post_type and created_at
I periodically need to select a year worth of rows as fast as possible.
A query similar to:
SELECT id, hash, post_id FROM post WHERE post_type='image' AND (created_at BETWEEN '2022-1-1' AND '2022-7-15');
Explain for query:
The test database above explain was run on has 580 million rows.
For the past several years I've maintained a table for each year. This allowed me to select the whole table. I was able to pull all records fast enough for my needs.
I'm experimenting with keeping all data in a single table VS split by years to simplify things. However, with the table and query described above, it takes a painfully long time to pull the data I need.
I'm hoping someone here with more experience can offer some suggestions to optimize this. I'm not sure if there's ways to do the indexes better or possibly core MySQL tweaks
I'd appreciate any input.
Subreddit
Post Details
- Posted
- 2 years ago
- Reddit URL
- View post on reddit.com
- External URL
- reddit.com/r/Database/co...