I did a bit of analysis on the GitHub open source data using Google's BigQuery. I processed ~3 GB of raw data into something workable, about 1 MB in size. You can get the data here: pt 1, pt 2.
The processed data is a time series of GitHub commit frequency per 15-minute period. That is, each row tells us how many commits there were in those 15 minutes, starting August 1st, 2015 and ending around September 11th, 2016 (January 31st, 2016 is missing).
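For anyone who wants to rebuild a series like this from raw commit timestamps, here is a minimal sketch of the binning step. It is not the BigQuery query I actually ran. The names `commits_per_period`, `START` and `PERIOD` are made up for illustration, and it assumes period 0 starts at midnight UTC on August 1st, 2015.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(minutes=15)
# Assumption: period 0 starts at midnight UTC on August 1st, 2015.
START = datetime(2015, 8, 1, tzinfo=timezone.utc)


def commits_per_period(commit_times, start=START, period=PERIOD):
    """Return a list where entry i is the number of commits in period i."""
    counts = Counter()
    for t in commit_times:
        index = (t - start) // period  # timedelta // timedelta gives an int
        if index >= 0:  # ignore anything before the start of the series
            counts[index] += 1
    if not counts:
        return []
    # Periods with no commits get an explicit 0 so rows stay evenly spaced.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


# Made-up timestamps, for illustration only.
sample = [
    START + timedelta(minutes=1),
    START + timedelta(minutes=14),
    START + timedelta(minutes=15),
    START + timedelta(minutes=50),
]
series = commits_per_period(sample)
print(series)
```

Filling empty periods with 0 keeps the row number usable as a time index, which is what the period numbers in the rest of this post rely on.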
Here is a plot of the full data
Pretty cool! Couple of things of interest:
That spike at period 7675 (which translates roughly to October 20th, 2015, 0:00 UTC). What caused it, I do not know. I did some research and couldn't find anything.
The dip around 13000-15000 is the holiday period. Programmers are people, too.
The only way I can explain the drop from 33000 to the end is that the data quality dropped. Alexa doesn't show GitHub usage dropping in that period. If you want to do serious analysis, you might want to cut that period out.
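The period numbers above can be turned back into dates with a bit of arithmetic. This is a rough sketch, assuming period 0 starts at midnight UTC on August 1st, 2015 and that every 15-minute period has a row. Because January 31st, 2016 is missing, dates for later periods may be off by up to a day if those rows are simply absent. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(minutes=15)
# Assumption: period 0 starts at midnight UTC on August 1st, 2015.
START = datetime(2015, 8, 1, tzinfo=timezone.utc)


def period_to_utc(period, start=START):
    """Start time of a given 15-minute period."""
    return start + period * PERIOD


def utc_to_period(when, start=START):
    """Index of the 15-minute period that contains `when`."""
    return (when - start) // PERIOD


def drop_tail(series, cutoff=33000):
    """Keep only the periods before the suspected data-quality drop."""
    return series[:cutoff]


for p in (7675, 13000, 15000, 33000):
    print(p, period_to_utc(p).isoformat())
```

Under those assumptions, period 7675 comes out as 2015-10-19 22:45 UTC, which matches the "roughly October 20th" reading of the spike.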
More fun graphs:
A Monday-Friday work week on GitHub. Note the daily seasonality. Note Friday night dropping lower.
Two months on GitHub. Notice those two days a week with drastically lower usage? Yep, that's the weekend. People on GitHub tend to be 9-to-5ers. They also tend to do fuckall on Saturdays, and do a little coding on Sundays.
Did the July 20th Stack Overflow outage affect GitHub commits? Doesn't seem like it (outage starts at period 59 here and ends at period 61). In fact, none of the 5 Stack Overflow outages in the dataset seem to have an effect at all.
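I only eyeballed the plots for this, but if you want a number, one possible check is to compare the outage window against the same time slots in the neighbouring weeks. The sketch below is not what I did, just one way to do it. The function name `outage_vs_baseline` and the demo series are made up, and `start`/`end` are indices into the full series, not into a zoomed-in plot.

```python
from statistics import mean

PERIODS_PER_WEEK = 7 * 24 * 4  # 672 fifteen-minute periods in a week


def outage_vs_baseline(series, start, end, weeks=2):
    """Mean commits in [start, end] vs. the same slots in nearby weeks."""
    outage = mean(series[start:end + 1])
    baseline_values = []
    for k in range(1, weeks + 1):
        for shift in (-k * PERIODS_PER_WEEK, k * PERIODS_PER_WEEK):
            lo, hi = start + shift, end + shift
            # Only use baseline windows that fit fully inside the series.
            if lo >= 0 and hi < len(series):
                baseline_values.extend(series[lo:hi + 1])
    if not baseline_values:
        raise ValueError("no full baseline window fits inside the series")
    return outage, mean(baseline_values)


# Synthetic demo data (not the real dataset): flat traffic with one dip.
demo = [100] * (5 * PERIODS_PER_WEEK)
dip_start = 2 * PERIODS_PER_WEEK + 59
demo[dip_start:dip_start + 3] = [50, 50, 50]
outage_mean, baseline_mean = outage_vs_baseline(demo, dip_start, dip_start + 2)
print(outage_mean, baseline_mean)
```

If the two means are close, that backs up the "no effect" reading. Shifting by whole weeks keeps the comparison on the same weekday and time of day, which matters given how strong the daily and weekly patterns are.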
So there you have it, people over 40: kids these days aren't all that dependent on Stack Exchange.