This post has been de-listed
It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.
So a bit of background this is a database representing surveys, surveytakers, and their demographics (Male, Female, location, etc)
The relationship between the takers and groups are in a through table, which i denormalized to include the survey for better join performance
SELECT count(*), taker.complete, group.name
from surveytaker taker
inner join surveytaker__group through
on taker.id = through.surveytaker_id
and taker.survey_id = through.survey_id
inner join group
on through.group_id = group.id
and group.company_id = taker.company_id
and group.survey_id = taker.survey_id
and through.group_id = group.id
where taker.survey_id = 22101
and taker.company_id = 2997
group by taker.complete, group.name
I've indexed the takers on (company_id, survey_id)
the groups on (company_id, survey_id)
and the through/join table on (survey_id, taker_id, group_id)
as you may be able to see from the group by
, I'm trying to get the count of complete/incomplete surveytakers by the groups they belong to; the response rate by demographics. I know that by doing this, I'm joining (in the case of this survey) around 10k rows of takers, to around 15k rows of groups, to the join table of over 200k rows 💥. The explain (without the groupby as it add complexity to the explain with no cost) is below
Aggregate (cost=22241.84..22241.85 rows=1 width=8)
-> Gather (cost=1054.71..22241.83 rows=1 width=0)
Workers Planned: 1
-> Nested Loop (cost=54.71..21241.73 rows=1 width=0)
-> Nested Loop (cost=54.28..18984.59 rows=286 width=16)
-> Parallel Bitmap Heap Scan on group (cost=53.71..9241.82 rows=1451 width=12)
Recheck Cond: ((company_id = 2997) AND (survey_id = 22101))
-> Bitmap Index Scan on group_company_id_survey_id_index (cost=0.00..53.09 rows=2466 width=0)
Index Cond: ((company_id = 2997) AND (survey_id = 22101))
-> Index Only Scan using surveytaker__group_group_id_survey on surveytaker__group through (cost=0.57..5.99 rows=72 width=12)
Index Cond: ((survey_id = 22101) AND (group_id = group.id))
-> Index Scan using surveytaker_pkey on survey_surveytaker taker (cost=0.43..7.89 rows=1 width=12)
Index Cond: (id = through.surveytaker_id)
Filter: ((survey_id = 22101) AND (company_id = 2997))
The pain point here seems to be on these nested loops, postgres isn't doing an index only scan on the groups which is interesting... is there an index I'm missing? or is this simply the best that postgres can do on many joins and i need to shovel more money into hardware? Its a rather complex explain so I'm trying to wrap my head around it.
Post Details
- Posted
- 5 years ago
- Reddit URL
- View post on reddit.com
- External URL
- reddit.com/r/SQL/comment...