For people who have not been keeping up with the news, Elon Musk recently announced his intention to buy Twitter. This deal, however, is on hold.
So our man Elon has new concerns that Twitter may be bot-infested, which would reduce how valuable the company is and reduce how much Elon should have to pay for it (why he didn't raise this concern before putting in an agreement to buy Twitter is a different story...).
To figure out whether fewer than 5% of users are bots, Elon Musk suggested taking a "random sample of 100 users" and counting how many are bots. To get this sample, he would:
- skip the first 1000 replies to one of his tweets (or one of the tweets of someone with a large number of followers),
- pick every 10th reply until he reached 100 users,
- count the number of bots to determine the overall percentage of active Twitter users who are bots (how he would decide whether an account is a bot is unclear and not the subject of this R1).
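For concreteness, here is a minimal sketch of the selection steps above. It is only an illustration: `replies` is a hypothetical list of reply authors in the order they appear under a tweet, `systematic_sample` is a name made up here, and both the starting point of the "every 10th" count and the de-duplication of accounts are assumptions, since the proposal doesn't specify them.

```python
def systematic_sample(replies, skip=1000, step=10, target=100):
    """Skip the first `skip` replies, then take every `step`-th reply
    until `target` distinct users have been collected.

    Assumptions (not stated in the proposal): counting starts at the
    first reply after the skipped ones, and each account counts once.
    """
    sample = []
    seen = set()
    for user in replies[skip::step]:
        if user in seen:
            continue  # same account replied more than once
        seen.add(user)
        sample.append(user)
        if len(sample) == target:
            break
    return sample


# Hypothetical reply thread with 5000 distinct repliers.
replies = [f"user{i}" for i in range(5000)]
picked = systematic_sample(replies)
print(len(picked), picked[:3])
```

Note that nothing in this procedure draws anything at random: who ends up in the sample is fully determined by who replied and in what order.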
Why is this bad:
There are questions about whether 100 users is a large enough sample to draw any meaningful conclusions (it isn't), but the biggest issue is what's called selection bias. People who reply to big accounts are neither random nor representative of Twitter users at large! Compare the replies to an Elon tweet with the replies to someone like Harvard economist Jason Furman. There's a big difference. If you sampled only from people who replied to Jason, you would likely conclude that there are close to zero bots on Twitter! Elon's tweets, on the other hand, attract a disproportionate number of bots, so sampling from his replies will overstate the proportion of bots on Twitter.
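A toy simulation makes both problems visible. Everything in it is an assumption chosen for illustration, not data about Twitter: a population of 200,000 accounts of which 5% are bots, bots that reply to a big account with probability 0.20, and humans that do so with probability 0.01. The names `simulate` and `margin_of_error` are made up for this sketch.

```python
import math
import random


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


def simulate(n_users=200_000, bot_share=0.05,
             p_reply_bot=0.20, p_reply_human=0.01,
             sample_size=100, seed=0):
    """Return (estimate from a simple random sample, estimate from repliers).

    All parameters are hypothetical; they only encode the idea that bots
    reply to big accounts far more often than humans do.
    """
    rng = random.Random(seed)
    n_bots = int(n_users * bot_share)
    population = [True] * n_bots + [False] * (n_users - n_bots)  # True = bot

    # Every account has the same chance of being drawn.
    random_sample = rng.sample(population, sample_size)

    # Only accounts that replied can be drawn, and bots reply more often.
    repliers = [is_bot for is_bot in population
                if rng.random() < (p_reply_bot if is_bot else p_reply_human)]
    reply_sample = rng.sample(repliers, sample_size)

    return sum(random_sample) / sample_size, sum(reply_sample) / sample_size


random_estimate, reply_estimate = simulate()
print(f"simple random sample: {random_estimate:.0%} bots")
print(f"sample of repliers:   {reply_estimate:.0%} bots")
print(f"margin of error at p=5%, n=100: +/-{margin_of_error(0.05, 100):.1%}")
```

Under these made-up reply rates, roughly half of all repliers are bots even though only 5% of accounts are, so the reply-based estimate lands far above the truth. Separately, even a properly random sample of 100 carries a margin of error of about 4 percentage points around a true rate of 5% (and the normal approximation is itself rough at that sample size), which is too wide to say whether the real figure is above or below 5%.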
To get a random sample, you have to actually sample randomly, or you have to formally model the selection process to account for different users having different probabilities of being included in your sample. In a survey, this would mean weighting respondents by the inverse of the probability that they responded; in economics, it could be something like a Heckman correction.
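As a rough sketch of the survey-style fix, the snippet below reweights a selected sample by the inverse of each account's inclusion probability. The numbers are invented for illustration (50 bots that each had a 0.20 chance of being included and 50 humans that each had a 0.01 chance), and `ipw_share` is a hypothetical helper name. In practice those probabilities are not known and would have to be estimated, which is the hard part. This shows weighting only; it is not a Heckman correction.

```python
def ipw_share(sample):
    """Inverse-probability-weighted share of bots.

    `sample` is an iterable of (is_bot, inclusion_probability) pairs.
    Each account is weighted by 1 / inclusion_probability, so accounts
    that were unlikely to be sampled stand in for more of the population.
    """
    weighted_bots = 0.0
    total_weight = 0.0
    for is_bot, p in sample:
        if not 0 < p <= 1:
            raise ValueError("inclusion probability must be in (0, 1]")
        weight = 1.0 / p
        total_weight += weight
        if is_bot:
            weighted_bots += weight
    return weighted_bots / total_weight


# Hypothetical selected sample: half bots, half humans.
sample = [(True, 0.20)] * 50 + [(False, 0.01)] * 50

naive = sum(is_bot for is_bot, _ in sample) / len(sample)
weighted = ipw_share(sample)
print(f"naive share of bots:    {naive:.1%}")
print(f"weighted share of bots: {weighted:.1%}")
```

With these assumed probabilities, the naive share is 50% while the weighted share comes out just under 5%, because each sampled human stands in for 100 accounts and each sampled bot for only 5.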