

Analysing GitHub user habits with BigQuery public data

I did a bit of analysis on the GitHub open source data using Google's BigQuery. I processed ~3 GB of raw data into something workable, about 1 MB in size. You can get the data here pt 1, pt 2.

The processed data is GitHub commit frequency per 15-minute period. That is, each row in the time series tells us how many commits there were in those 15 minutes, starting August 1st, 2015 and ending around September 11th, 2016 (January 31st, 2016 is missing).
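
For anyone who wants to rebuild this format from raw timestamps, here is a minimal sketch of the bucketing step. It assumes period 0 starts at 2015-08-01 00:00 UTC and that commit times are timezone-aware datetimes. The names `START`, `PERIOD_SECONDS`, and `commits_per_period` are made up for illustration and are not from the original processing.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumption: period 0 begins at midnight UTC on the first day of the data.
START = datetime(2015, 8, 1, tzinfo=timezone.utc)
PERIOD_SECONDS = 15 * 60


def commits_per_period(commit_times):
    """Count commits per 15-minute period, keyed by period index from START."""
    counts = Counter()
    for t in commit_times:
        offset = (t - START).total_seconds()
        if offset < 0:
            continue  # skip anything before the start of the series
        counts[int(offset // PERIOD_SECONDS)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        datetime(2015, 8, 1, 0, 0, 0, tzinfo=timezone.utc),
        datetime(2015, 8, 1, 0, 14, 59, tzinfo=timezone.utc),
        datetime(2015, 8, 1, 0, 15, 0, tzinfo=timezone.utc),
    ]
    print(sorted(commits_per_period(sample).items()))
```

Periods with no commits simply have no key here, so fill them with zeros before plotting if you want an unbroken series.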

Here is a plot of the full data

Pretty cool! Couple of things of interest:

  • That spike at period 7675 (which translates roughly to October 20th, 2015, 0:00 UTC). What caused it, I do not know. I did some research and couldn't find anything.

  • The dip around periods 13000-15000 is the holiday period (roughly mid-December 2015 to early January 2016). Programmers are people, too.

  • The only way I can explain the drop from period 33000 to the end is that the data quality dropped. Alexa does not show lower GitHub usage in that period. If you want to do serious analysis, you might want to cut that period out.
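
To turn the period numbers above into dates, a rough sketch like the following works. It assumes period 0 is 2015-08-01 00:00 UTC and that rows are contiguous. If the missing day (January 31st, 2016) is simply absent from the rows rather than zero-filled, indexes after that point would be off by 96 periods. `period_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: period 0 begins at midnight UTC on August 1st, 2015.
START = datetime(2015, 8, 1, tzinfo=timezone.utc)


def period_to_utc(index):
    """Return the UTC start time of a 15-minute period, assuming no gaps."""
    return START + timedelta(minutes=15 * index)


if __name__ == "__main__":
    for idx in (7675, 13000, 15000):
        print(idx, period_to_utc(idx).isoformat())
```

Under those assumptions, period 7675 starts at 2015-10-19 22:45 UTC, which lines up with the "roughly October 20th" spike, and 13000-15000 spans December 14th to January 4th.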

More fun graphs:

So there you have it, people over 40: kids these days aren't all that dependent on Stack Exchange.

Posted
8 years ago