Hello to the 4 of you that are still interested at this point! As always, if you're unfamiliar with WAYRStats, it's a data science module I developed for the WAYR threads. It tracks user data and organizes it into a bunch of different metrics and leaderboards in an effort to promote more activity in said threads. It also comes with a monthly contest: post in any three WAYR threads in one month and you're entered to win. I made a big post at the start of the month explaining everything; check it out if you're curious.
What is it
So for this sneak peek I'm gonna talk about what I think is the most impressive leaderboard in this project, at least from a coding perspective. The post streak leaderboard tracks users who post in consecutive WAYR threads and records the longest post streak each of them has achieved this year. As always, I'll give the data first and explain after:
Leaderboard stats: Longest Consecutive Post Streak
#1: /u/deathjohnson1....[19] posts between Jan 1 and May 6.
#2: /u/Alexfang452......[14] posts between Feb 5 and May 6.
#3: /u/UnknownNinja.....[10] posts between Mar 4 and May 6.
#4: /u/Lastshade01......[8] posts between Feb 26 and Apr 15.
#5: /u/boran_blok.......[6] posts between Feb 5 and Mar 11.
#6: /u/PHNX_Arcanus.....[6] posts between Apr 1 and May 6.
#7: /u/superange128.....[4] posts between Apr 8 and Apr 29.
#8: /u/Pizzaphotoseyes..[3] posts between Jan 1 and Jan 15.
#9: /u/_JohnTitor__.....[3] posts between Jan 8 and Jan 22.
#10: /u/Solaris251......[3] posts between Feb 19 and Mar 4.
How it Works
This next section is all about how it works, so if that's not your bag then skip this part. I love this one, and the fact that this shit actually works properly is one of the reasons I love it. To make things easier, here's the source code for this module; I'll be breaking down individual lines/blocks from here. To start, let's go with the very first code block, InitStreakData:
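def InitStreakData(self, submission):
    self.lastThread = submission
    # Every commenter in the first thread processed starts a streak of 1.
    for comment in submission.comments:
        if comment.author:  # deleted accounts have no author
            tempData = UserStreakInfo()
            tempData.endThread = submission.title.split("-")[1].strip()
            tempData.startThread = submission.title.split("-")[1].strip()
            tempData.streakVal = 1
            # The same object is stored in both the context and resident databases.
            self.curStreakContext[comment.author.name] = tempData
            self.streakData[comment.author.name] = tempData
    # Returned so the caller's doOnce flag flips after the first call.
    return True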
This code, in a sentence, sets up the first pass for our streak databases. Technically, in the first thread we process, every single user in the thread is starting a streak. To make that first pass easy to handle, this function gets called once and sets everything up before we go through the rest of the threads. But wait, how will it know to call this only once? And why does the function return True seemingly for nothing? I give you the next two lines:
# Skip the untranslated threads, then initialize once and process every thread after that.
if "untranslated" not in submission.title.lower():
    doOnce = WAYRStats.InitStreakData(submission) if not doOnce else WAYRStats.FindLongestStreak(submission)
This is the code that actually calls the streak functions, and it looks wonky, because it is. It's a clever little workaround for a couple of problems this module has. For starters, streaks are completely broken when we take the untranslated threads into account, so the if statement on top eliminates that problem. After that, the second line uses a ternary operator (Python calls it a conditional expression). If you're code savvy: a ternary operator takes an if-else statement and condenses it into a single line of code. If you have no idea what that means: it lets me use a single line of code to do one of two different things depending on whether a condition evaluates as true or false. So, if the doOnce variable is set to False, it will run InitStreakData, and if it is set to True, it will run FindLongestStreak. The "not" is in there simply to invert the logic check: while doOnce is False, not doOnce is True, and vice versa. Now here's where that return statement from earlier is pretty cool: InitStreakData returns True after it finishes running, which is then stored in doOnce, so the next time that line runs it will call FindLongestStreak (doOnce has to start out as False for this to work). It's a cheeky way of executing the initializer code only once and then running the main function every time from there on out. So, now that everything is set up, on to the main function.
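Here's a stripped-down sketch of that call pattern you can run on its own. Stub functions stand in for InitStreakData and FindLongestStreak, and the run helper, the stubs, and the thread titles are all hypothetical. It also shows the one thing the trick quietly depends on: the main function has to return something truthy as well. A Python function with no return statement returns None, so if FindLongestStreak did that, doOnce would go falsy again and the very next thread would re-run the initializer.

```python
def run(titles, main_fn):
    """Drive stub init/main functions the way the module's two-line loop does.

    main_fn is a hypothetical stand-in that decides what the main function returns.
    """
    calls = []

    def init_streak_data(title):
        calls.append(("init", title))
        return True  # flips do_once so the initializer doesn't run again

    def main(title):
        calls.append(("find", title))
        return main_fn()

    do_once = False  # must start falsy so the first thread runs the initializer
    for title in titles:
        if "untranslated" not in title.lower():
            # Conditional expression: initializer first, main function after.
            do_once = init_streak_data(title) if not do_once else main(title)
    return calls


# Hypothetical titles, newest thread first.
titles = ["WAYR - May 6", "WAYR - Apr 29", "Untranslated WAYR - Apr 27", "WAYR - Apr 22"]

# A main function that returns True keeps the flag set.
print([kind for kind, _ in run(titles, lambda: True)])  # ['init', 'find', 'find']

# A main function that returns None lets the initializer run again.
print([kind for kind, _ in run(titles, lambda: None)])  # ['init', 'find', 'init']
```

If that ever bites, a plain flag set to True right after the initializer runs, independent of any return value, does the same job with less cleverness.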
To start, I want to break down what I think are the four most important lines of code in this whole program:
threadUsers = [comment.author.name for comment in submission.comments if comment.author]
newLosers = [user for user in self.curStreakContext if user not in threadUsers]
newUsers = [name for name in threadUsers if name not in self.streakData.keys()]
newCruisers = [user for user in self.curStreakContext if user in threadUsers]
These four lines set things up to handle the three major events that happen in any given thread:
- A user begins a new streak
- A user's streak ends
- A user's streak continues
All three of these events occur in every thread and need to be handled, and those four lines kill it at sorting out what goes on. They rely on list comprehensions, Python's way of building a list out of another collection of data. The basic structure is variable = [item for data in structure if condition]: it builds a list of item for every data in structure that meets condition. With this, the first line creates a list of every username present in the thread (I mentioned this in the main post, but these functions are called once per thread and handle everything they need to within it before moving on). That list is then used to determine when:
- A user begins a new streak - username is in the thread but not in the resident database
- A user's streak ends - username is in the contextual database but not the thread
- A user's streak continues - username is in both the thread and the contextual database
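Here's that classification run on a toy thread so you can see what lands where. The usernames are made up, threadUsers is a plain list instead of the PRAW comment loop, and the two dictionaries hold placeholder numbers instead of UserStreakInfo objects. The second half asks the same questions with Python sets, which is where the SQL feel really shows: the three cases become set difference and intersection. That set version is a swapped-in alternative, not what the module does; it gives the same members (unordered) and avoids scanning the threadUsers list for every "in" check. Based on these lines alone, it also turns up a case worth double-checking: a user whose streak already ended (in the resident database but not the context one) who posts again lands in none of the three groups, so that second streak wouldn't get tracked.

```python
# Hypothetical toy data: who posted in this thread, who is mid-streak
# (context database), and who has ever been recorded (resident database).
threadUsers = ["alice", "bob", "carol", "dave"]
curStreakContext = {"alice": 3, "erin": 2}
streakData = {"alice": 3, "erin": 2, "carol": 5}

# The module's three classification lines, run on the toy inputs.
newLosers = [user for user in curStreakContext if user not in threadUsers]
newUsers = [name for name in threadUsers if name not in streakData.keys()]
newCruisers = [user for user in curStreakContext if user in threadUsers]

print(newLosers)    # ['erin']
print(newUsers)     # ['bob', 'dave']
print(newCruisers)  # ['alice']

# The same three questions as set operations.
thread = set(threadUsers)
context = set(curStreakContext)
resident = set(streakData)
print(sorted(context - thread))   # ['erin']  (streak ends)
print(sorted(thread - resident))  # ['bob', 'dave']  (new streak)
print(sorted(context & thread))   # ['alice']  (streak continues)

# carol posted, but she's in the resident database and not the context one,
# so none of the three groups picks her up.
print(sorted((thread & resident) - context))  # ['carol']
```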
You may have noticed I specified two different databases there: a contextual database (curStreakContext in the code) and a resident database (streakData). The contextual database is our running database that keeps track of active streaks, while the resident database holds the data once a user's streak ends. This keeps the context database very lightweight as it goes through the threads, and relevant data only gets written to the resident database when it needs to be, AKA when a streak ends and that user's information is removed from the context database.
Once we have these four data structures, we need to process the three scenarios before continuing. Two of these scenarios are relatively simple: for new streaks, we run the same code as in the InitStreakData function and add that user's data to both the resident and context databases, and for continuing streaks, we just add 1 to their streak value, easy as that. The last situation, when a streak ends, is a little trickier:
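for loser in newLosers:
    # Threads are processed newest-first, so the previous thread is where this streak started.
    self.curStreakContext[loser].startThread = self.lastThread.title.split("-")[1].strip()
    # Keep this streak if it's at least as long as the stored one.
    if self.curStreakContext[loser].streakVal >= self.streakData[loser].streakVal:
        self.streakData[loser] = self.curStreakContext[loser]
    del self.curStreakContext[loser]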
So, when a user's streak ends, we only find that out when we hit the first thread they didn't post in, right? Since the threads are processed newest-first, the last thread we saw them in (self.lastThread) is actually where their streak started, so that's what gets stored as startThread; the end thread was already recorded when the streak was first picked up. After that, we check whether this streak is at least as long as what's already stored in the resident database, overwrite it if so, and then delete that user from the contextual database.
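Putting the three cases together, here's a minimal standalone sketch of one pass per thread, fed newest-first the way the module appears to be. Everything in it is hypothetical scaffolding: threads are (date, authors) tuples instead of PRAW submissions, and UserStreakInfo is a dataclass stand-in for the module's class. It also deliberately differs from the module in one spot: a new streak is anyone in the thread who isn't in the context database (rather than the resident one), and it's only written to the resident database when it ends and is at least as long as the stored streak, so a user whose first streak ended can start a second one without clobbering the first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class UserStreakInfo:
    # Hypothetical stand-in for the module's UserStreakInfo class.
    startThread: str
    endThread: str
    streakVal: int


def process_thread(date: str, authors: List[Optional[str]],
                   context: Dict[str, UserStreakInfo],
                   resident: Dict[str, UserStreakInfo],
                   last_date: Optional[str]) -> None:
    thread_users = [name for name in authors if name]  # None = deleted account
    losers = [u for u in context if u not in thread_users]
    new_users = [n for n in thread_users if n not in context]
    cruisers = [u for u in context if u in thread_users]

    # Streak ends: newest-first, so the previous thread was its first one.
    for loser in losers:
        info = context.pop(loser)
        info.startThread = last_date
        best = resident.get(loser)
        if best is None or info.streakVal >= best.streakVal:
            resident[loser] = info

    # Streak continues: one more consecutive thread.
    for user in cruisers:
        context[user].streakVal += 1

    # New streak: newest-first, so this thread is where the streak ends.
    for name in new_users:
        context[name] = UserStreakInfo(startThread=date, endThread=date, streakVal=1)


# Hypothetical threads, newest first.
threads: List[Tuple[str, List[Optional[str]]]] = [
    ("May 6", ["alice", "bob"]),
    ("Apr 29", ["alice", "bob", None]),
    ("Apr 22", ["alice"]),
    ("Apr 15", ["bob"]),
    ("Apr 8", ["carol"]),
]

context: Dict[str, UserStreakInfo] = {}
resident: Dict[str, UserStreakInfo] = {}
last_date: Optional[str] = None
for date, authors in threads:
    process_thread(date, authors, context, resident, last_date)
    last_date = date

for name, info in sorted(resident.items()):
    print(name, info.streakVal, info.startThread, info.endThread)
# alice 3 Apr 22 May 6
# bob 2 Apr 29 May 6
print(sorted(context))  # ['carol']
```

One side effect of this shape is that the first thread needs no separate initializer, since an empty context database makes everyone in it a new streak. Streaks still running after the oldest thread (carol here) are left in the context database and still need closing out, which is the same problem the Jan 1 edge case deals with.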
So that's a ton of tricky processing we need to do on every single thread; thank goodness that's over, right? Wrong. Things would be too easy that way, wouldn't they? As it turns out, Jan 1 was the first WAYR thread of the year, and that's an explicit edge case we need to handle, because no matter what, everyone's streak needs to end on Jan 1; otherwise this is no longer a 2020 leaderboard. I give you this piece of ugly code that I still blame deathjohnson1 for forcing me to write, because the only reason I had to do it is that he has perfect fucking attendance for the year:
if submission.title.split("-")[1] == FIRST_THREAD:
    # Jan 1 is the end of the line: close out every streak that continues into it.
    for edgeCase in newCruisers:
        self.curStreakContext[edgeCase].startThread = submission.title.split("-")[1].strip()
        if self.curStreakContext[edgeCase].streakVal >= self.streakData[edgeCase].streakVal:
            self.streakData[edgeCase] = self.curStreakContext[edgeCase]
Basically, this is a hard-coded version of the previous streak-end code, meant to handle every user whose streak is still going in the Jan 1 WAYR thread. Hard-coded, for those unaware, means a specific value or situation (here, the Jan 1 thread, via FIRST_THREAD) is baked directly into the code instead of being worked out from the input, so the code only handles that one explicit situation and no more. It's generally good practice to hard-code as little as possible so your code doesn't run into scalability issues, but with shit like this there's not much you can do. After we've done ALLLLLLLLLLLLLLLLL that shit, the only thing left is to format things and print them to the console.
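Here are two hedged sketches for that wrap-up stage. The first is one possible way around the hard-coded FIRST_THREAD check, offered as an alternative rather than what the module does: once the oldest thread you want counted has been processed, close out every streak still in the context database against that thread, whatever its date happens to be (in the real data, that would be Jan 1). The second is a formatter that reproduces the leaderboard layout from the top of the post. Judging by that output, each "#rank: /u/name" prefix is padded with dots to 24 characters, but that width is inferred, not taken from the module. UserStreakInfo is again a dataclass stand-in, and close_out_streaks, format_row, leaderboard, and the toy users are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserStreakInfo:
    # Hypothetical stand-in for the module's UserStreakInfo class.
    startThread: str
    endThread: str
    streakVal: int


def close_out_streaks(context: Dict[str, UserStreakInfo],
                      resident: Dict[str, UserStreakInfo],
                      oldest_date: str) -> None:
    """End every still-active streak at the oldest processed thread."""
    for name in list(context):  # copy the keys, since we pop while looping
        info = context.pop(name)
        info.startThread = oldest_date
        best = resident.get(name)
        if best is None or info.streakVal >= best.streakVal:
            resident[name] = info


def format_row(rank: int, name: str, info: UserStreakInfo) -> str:
    # Pad "#rank: /u/name" with dots to 24 characters (width inferred from the post).
    prefix = f"#{rank}: /u/{name}".ljust(24, ".")
    return f"{prefix}[{info.streakVal}] posts between {info.startThread} and {info.endThread}."


def leaderboard(resident: Dict[str, UserStreakInfo], top: int = 10) -> List[str]:
    # Longest streaks first; ties keep their existing order (sorted is stable).
    ranked = sorted(resident.items(), key=lambda item: item[1].streakVal, reverse=True)
    return [format_row(i, name, info) for i, (name, info) in enumerate(ranked[:top], start=1)]


# Toy data: alice already has a 3-thread streak stored; her current 2-thread
# streak and carol's 1-thread streak are still running at the oldest thread.
resident = {"alice": UserStreakInfo("Apr 22", "May 6", 3)}
context = {"alice": UserStreakInfo("Apr 15", "Apr 15", 2),
           "carol": UserStreakInfo("Apr 8", "Apr 8", 1)}
close_out_streaks(context, resident, "Apr 8")
print(context)  # {}
print({name: info.streakVal for name, info in resident.items()})  # {'alice': 3, 'carol': 1}

for row in leaderboard(resident):
    print(row)

# Sanity check against the first row of the real leaderboard.
print(format_row(1, "deathjohnson1", UserStreakInfo("Jan 1", "May 6", 19)))
# #1: /u/deathjohnson1....[19] posts between Jan 1 and May 6.
```

The close-out step only assumes you know which thread was processed last, so it keeps working if the start of the year moves, at the cost of one extra call after the loop.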
Pros and Cons
Aight, that last section is dummy thicc and I apologize, but I'm very impressed with how this module came out and wanted to really dig into it. This module is less of a pro-con kinda situation, but I'll try to break down the upsides and downsides:
Pros
- Massively incentivizes users to continue posting week by week
- Users on longer streaks will get a continuous ego boost with each successive leaderboard
Cons
- At some point in the middle of the year the leaderboard may become 100% impenetrable
- Doesn't stop a user from generating a long streak of low-effort posts
Honestly, the pros aren't really crazy pros, and the cons aren't really crazy cons. Posting consistently is something this entire project incentivizes, so it's hard to say this module specifically can claim credit for it. Users on post streaks are also highly likely to be on other leaderboards, so this won't be the cream of the crop for them, but the top 3 may take pride in their streak score. For the cons, yeah, low-effort posts can happen, but I'm honestly not scared of low-effort content. The very fact that you took time out of your day to post in the thread, even if it was literally 2 sentences, is awesome, and I thank you so much for contributing to the discussion. We don't need a minimum character count or anything; any effort at all is a significant effort in my book, and I don't think that should be punished. As for the leaderboard becoming impenetrable, yeah, as long as the top spots keep their streaks going you're screwed, but the month a top user's streak ends will be a big event, no? We'll see what happens.
Conclusion
I loved working on this module, and I honestly think it's my favorite module of them all, especially those four lines of code, which remind me of SQL to some degree. So the question some may be asking is, "What about the untranslated threads?" You're right: this module breaks when processing those alongside the regular threads, but what happens when it processes only those? Welp, the answer is way more disappointing than you'd think:
Leaderboard stats: Longest Consecutive Post Streak
#1: /u/GitahMuttan......[2] posts between Apr 20 and Apr 27.
#2: /u/superange128.....[2] posts between Apr 20 and Apr 27.
Yeah that's like, it. That's the whole leaderboard. Kinda makes you realize merging the threads was a good idea.
Do you have any ideas on how this could work better? Something not make sense that needs clarification? Are my posts still too fucking long? Let me know!
Subreddit: r/visualnovel
Posted: 4 years ago
External URL: reddit.com/r/visualnovel...