Introducing WAYRStats!

TL;DR is, now you get to see numbers and leaderboards associated with your WAYR posts, and you’re entered into a raffle drawing for prizes if you post three times in one month. So go and frolic and bash on worst girl and argue whose taste is the most shit and complain about which subreddit you think is better subtly between the lines of a well thought out and introspective analysis of Boob Wars 2.

Large Post Warning - Take your time reading this one. There’s a lot here, I’m not expecting you to go through it all in one sitting. You essentially have a month to go through this before you start missing out on anything, worst case you have until the first WAYR thread of the month. But hey, I’m not a cop, I just like to make long posts. Enjoy.

What’s up everyone! Name’s Arcanus, I’ve been around for a while; if you recognize the name great to talk to you again, if not then it’s nice to meet you. Now if you have been around for a while, then doubtless this screenshot is something you remember from a long time ago. If that’s Greek to you, the short and sweet of it is, back in the day this subreddit was managed much more by hand and as a result we had a fair bit more extra features and add-ons. Custom CSS banners for specific VN discussions, weekly threads not even being handled by AutoModerator, and a hand-made WAYR archive, complete with a user leaderboard hosted on the subreddit’s wiki.

That screenshot was taken from the only internet archive snapshot that exists of it, circa Winter 2015. Back then, I loved that leaderboard. It was this tiny little corner of the internet that even plenty of subreddit regulars at the time didn’t care too much about, but in some sense, it was validation; it was motivation to keep posting in the WAYR threads. More importantly, it gamified the WAYR threads. Gamification is what the word implies: the process of turning any given action, habit, or task into a game with various degrees of interactivity. Something as simple as a leaderboard with one or two numbers representing a “score” can have powerful motivating effects on people, and as a result can increase activity, engagement, and popularity. Over time, however, this leaderboard was abandoned for a number of reasons, the simplest being that it became too arduous to maintain manually, and demand was not high enough to either tackle that issue head on or find automated alternatives.

Nowadays, also for a number of reasons, the WAYR threads aren’t pushing the level of popularity and activity that could be seen in “the good ol’ days” as it were. Simple as it was, that motivation to get onto and climb the leaderboard is gone; posting in the WAYR thread nowadays is an exercise in self-satisfaction. There’s only a small handful of users posting consistently in there - write-ups get fewer upvotes overall, there are fewer comment replies, and AutoMod-chan is the most active user in the threads. Now it’s certain that this can grow organically; a lot has happened since the start of the year and especially with things “starting fresh” in a couple senses these weekly threads still need to get their feet under them again. Lately I’ve been mulling over what I can do to boost activity in these threads, and one of the things I always found myself going back to was that leaderboard. So, I decided why not make my own?

It is with this thread that I announce the spiritual successor to the classic WAYR leaderboard. But this one will be better. This one will be automatic, this one will have more features, track more, be more public, and incentivize different things. This will be more than what it’s based on. Ladies and (mostly) Gentlemen, I present to you a personal project I’ve been working on: WAYRStats.

The Monthly WAYRStats, Leaderboard, and Competition

What is WAYRStats?

I’m so glad you asked in such an incredibly convenient and properly formatted manner! WAYRStats is a data science script written in Python that passes a search query to the Reddit API and processes the results. In plain English, my script passes a string to Reddit to run as a search on /r/visualnovels; this search string isolates and returns nothing but the WAYR threads. Using this fucking mountain of data I get as a result I can go through ALL that information, parse it in various ways, and set up basically whatever leaderboards/metrics/analytics I (or you!) could want. Later on in this post I’m gonna be breaking down one of the main modules of the script in more detail than you could ever want, so stay tuned for more details. Before that though, I’m excited to announce a new monthly competition revolving around the WAYR posts!

The WAYR Monthly Competition

I say competition but really everyone wins man. The idea behind this is to boost activity in the WAYR threads by providing tangible and intangible incentives to do so. This could work amazingly, this could be downvoted to oblivion, who knows. This competition as it were was the original idea when I came up with this whole project. At the start I asked “what would motivate [user] to participate more in the WAYR threads?” And the answer I came up with is what we all love to see: free shit! The system works like this:

Post in the WAYR threads three times in one month, and you’re eligible for a raffle drawing at the end of said month. Your post must be a parent comment, presumably talking about, well, uhh, What Are You Reading. Subtle, I know.
Raffle prizes are currently limited to sidebar suggestions and custom text or image flairs. If you have an idea for a prize that seems reasonable, let me or better yet one of the mods know and they will come to a decision about it. The prize selection is small for now, but we plan on expanding the prize pool if this new system gains traction.
There is no minimum character requirement, no extras for posting essays; post literally one sentence three times in one month and you’re golden, but we encourage you to discuss!
Every month I will be posting a huge thread going over the stats and leaderboards for the month; that thread will also be used to announce the winner of the previous month. I will try to post these threads in the first week of every month, most likely the day of the first new WAYR thread so the data is the most accurate.
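
The eligibility rule above boils down to a count per user followed by a random draw. Here is a minimal sketch of that idea, not the actual WAYRStats code: the function names, the sample usernames, and the uniform draw are all assumptions made for illustration.

```python
import random
from collections import Counter

MIN_POSTS = 3  # from the rules above: three parent comments in one month


def eligible_users(top_level_authors, min_posts=MIN_POSTS):
    """Return users with at least `min_posts` parent comments, sorted by name."""
    counts = Counter(top_level_authors)
    return sorted(user for user, n in counts.items() if n >= min_posts)


def draw_winner(entrants, rng=None):
    """Pick one entrant uniformly at random; None if nobody qualified."""
    rng = rng or random.Random()
    return rng.choice(entrants) if entrants else None


# Hypothetical month: one author name per parent comment (made-up users).
authors = ["alice", "bob", "alice", "carol", "alice", "bob", "bob", "carol"]
entrants = eligible_users(authors)
winner = draw_winner(entrants, random.Random(0))
print(entrants, winner)
```

Passing in a seeded `random.Random` makes a draw repeatable for testing; a real raffle would leave it unseeded.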

That’s the super basics of the competition. Talk about what you’re reading, get cool stuff. Very simple, no? It took me maybe a day to write the code that determines contest eligibility. I looked at things and just thought “yeah nah there’s more I can do here” and so I got to work. Over the past month I’ve been communicating with the mods about how they want this to go and I’m very pleased with the modules that are in the script now. So now that you know about the competition, let’s go on to the next section, breaking down that module.

WAYRStats

So this is literally the debut so nomenclature is still relatively WIP, but WAYRStats is basically half of what I’ve put together so far (I’ll tell you about the other half in the next section of this post). As I said above I send a search query to isolate the WAYR posts; WAYRStats specifically singles out every WAYR post for the entire month and slowly parses through all the data, building data structures as it goes. One term I’ll use consistently throughout here is Dictionary; if you’re unfamiliar with it, it’s a data structure that stores data in key : value pairs, where keys are used to access values. It’s a really flexible way of storing data that has a lot of options with it. As a special surprise, y’all will get to see (most of) April’s full suite of WAYR analytics as a part of me explaining what all this stuff does and how it does it. I say most of because there’s still a few more days for other users to comment in the most recent thread that may alter the data. Once the first WAYR in May goes up I’ll prolly run the script again and update it or create another thread. I won’t drone on with the opening paragraph, lemme show you what this thing can do.

You know what, fuck it. Have the largest data set first: Average User Data.

Average User Data

Average user data: [user] - [total char. count / num. of posts = avg character count]
[PHNX_Arcanus]----------[25794/6 = 4299]    [deathjohnson1]---------[21969/7 = 3138]    
[UnknownNinja]----------[19217/5 = 3843]    [Some_Guy_87]-----------[19192/2 = 9596]    
[alwayslonesome]--------[16856/2 = 8428]    [Alexfang452]-----------[15636/5 = 3127]    
[KaveAhangar]-----------[13366/2 = 6683]    [GitahMuttan]-----------[12837/2 = 6418]    
[superange128]-----------[8328/6 = 1388]    [greenhillmario]---------[7605/2 = 3802]    
[Betteroni]--------------[7554/1 = 7554]    [fallenguru]-------------[7410/3 = 2470]    
[eiruki]-----------------[6983/2 = 3491]    [JayOutslee]-------------[6750/2 = 3375]    
[RisingChaos]------------[6616/2 = 3308]    [nwl123]-----------------[6377/2 = 3188]    
[GorbyVodka]-------------[6063/1 = 6063]    [Kiesuu]-----------------[6013/3 = 2004]    
[DarknessInferno7]-------[5273/2 = 2636]    [SSparks31]--------------[4182/1 = 4182]    
[MidgetPanda3031]--------[3946/1 = 3946]    [SignificantMaybe]-------[3684/1 = 3684]    
[GeneralGom]-------------[3558/3 = 1186]    [Stefan474]--------------[3390/1 = 3390]    
[Worluvus]---------------[3259/1 = 3259]    [SpectrumDT]-------------[3200/1 = 3200]    
[iT__jUsT__WoRks]--------[3072/1 = 3072]    [Eterna1Ice]-------------[2768/1 = 2768]    
[therumisallgone]--------[2745/1 = 2745]    [faiiper]----------------[2610/1 = 2610]    
[Lastshade01]-------------[2381/3 = 793]    [tintintinintin]---------[2333/1 = 2333]    
[sorathecrow_]-----------[2117/1 = 2117]    [a_pale_horse]-----------[1983/1 = 1983]    
[KnightLunaaire]---------[1966/1 = 1966]    [caspar57]----------------[1947/3 = 649]    
[Inara_Seraph]-----------[1841/1 = 1841]    [Zagorz]-----------------[1557/1 = 1557]    
[SortaWeeb]--------------[1548/1 = 1548]    [drinkyourmilk94]--------[1506/1 = 1506]    
[tauros113]---------------[1322/2 = 661]    [OdaNova]-----------------[1226/3 = 408]    
[OhLookAtMeImSpecial]----[1179/1 = 1179]    [AngristIron-Cleaver]----[1173/1 = 1173]    
[yolo1234123]------------[1154/1 = 1154]    [AssembledVoid]----------[1136/1 = 1136]    
[sfisher923]--------------[1108/2 = 554]    [ShinKozato]-------------[1050/1 = 1050]    
[sirflimflam]------------[1019/1 = 1019]    [tostitosruler]------------[888/1 = 888]    
[nanogenesis]--------------[878/1 = 878]    [WalriderCosplay]----------[855/1 = 855]    
[Deost8003]----------------[677/1 = 677]    [Adan181]------------------[616/1 = 616]    
[YossaRedMage]-------------[590/1 = 590]    [Hikagura]-----------------[577/1 = 577]    
[Codex28]------------------[515/1 = 515]    [August_Hail]--------------[482/1 = 482]    
[Oglifatum]----------------[465/1 = 465]    [totallyhuman939]----------[415/1 = 415]    
[sultonydp]----------------[402/1 = 402]    [PlasmaLeaderN]------------[339/1 = 339]    
[VeriDF]-------------------[332/1 = 332]    [Jazz_Musician]------------[325/1 = 325]    
[morphogenic96]------------[313/1 = 313]    [davisjryoung]-------------[264/1 = 264]    
[Cenriqu3]-----------------[236/1 = 236]    [iHicham]------------------[216/1 = 216]    
[metroman1]----------------[172/1 = 172]    [Koyomi-senpai]------------[166/1 = 166]    
[ShoujoKakumeiLea]---------[139/1 = 139]    [Nirvash78]----------------[129/1 = 129]    
[cerek17]--------------------[70/1 = 70]    [chrispy4627]----------------[35/1 = 35]

The number you’re wondering about is 74, by the way. This one was one of the earlier modules I put in and the formatting used to be horrible. Here’s a sample:

OdaNova...............[1] Total Posts, [528] Total Post Length,   [528] Average Post Length.
stealthswor...........[1] Total Posts, [755] Total Post Length,   [755] Average Post Length.
deathjohnson1.........[4] Total Posts, [8270] Total Post Length,  [2067] Average Post Length.

That’s just 3 lines. It did that for every user. Look, we learn from our mistakes. Anyways, another little module attached to it:

Averages: Top 5 for the month
#1: /u/Some_Guy_87[9596]
#2: /u/alwayslonesome[8428]
#3: /u/KaveAhangar[6683]
#4: /u/GitahMuttan[6418]
#5: /u/PHNX_Arcanus[4299]
#6: /u/UnknownNinja[3843]
#7: /u/greenhillmario[3802]
#8: /u/eiruki-----[3491]
#9: /u/JayOutslee-[3375]
#10: /u/RisingChaos[3308]

It says top 5 because I intend to do only top 5s for monthly leaderboards, however this is the debut so I’ll give y’all a little extra. This one was fun to do because it pushed me to use dictionaries in a creative way. I created my own data structure here and stored that as the value in the dictionary, so one key could access more than one piece of data associated with it. In this case, your Reddit username serves as the key (as it will for almost every single module), which accesses both the total character count and the total number of your posts, both of which it adds up as the script goes through the threads one by one. Afterwards, do a bit of math and print it out nice and neat.
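
As a rough illustration of the username-keyed dictionary described above, here is a minimal sketch. The `UserTotals` class, `tally`, and `top_averages` are hypothetical names, not the script's own. Floor division is used because the averages in the table look truncated (6983/2 is shown as 3491). The `min_posts` filter is only a guess, prompted by the fact that the published top list seems to skip users with a single post.

```python
from dataclasses import dataclass


@dataclass
class UserTotals:
    chars: int = 0  # total characters across the user's posts
    posts: int = 0  # number of posts seen so far

    @property
    def average(self):
        # Floor division, matching the truncated averages in the table.
        return self.chars // self.posts if self.posts else 0


def tally(comments):
    """comments: iterable of (username, body) pairs -> {username: UserTotals}."""
    totals = {}
    for user, body in comments:
        entry = totals.setdefault(user, UserTotals())
        entry.chars += len(body)
        entry.posts += 1
    return totals


def top_averages(totals, n=5, min_posts=1):
    """Return the n highest (username, average) pairs, best first."""
    ranked = sorted(
        (user for user in totals if totals[user].posts >= min_posts),
        key=lambda user: totals[user].average,
        reverse=True,
    )
    return [(user, totals[user].average) for user in ranked[:n]]


# Made-up sample data, not real posts.
sample = [("alice", "x" * 100), ("bob", "x" * 301),
          ("alice", "x" * 201), ("bob", "x" * 100)]
totals = tally(sample)
print(top_averages(totals))
```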

Single Line Statistics

As the name says, these aren’t large aggregations of data, rather singular calculations on said large aggregations of data. Say that five times fast. Here’s some cool stats for the month:

The longest post in Apr was written by [Some_Guy_87] on [Apr 22] and had a length of [9964] characters.
We had an average of [14] comments per thread this month.
The average length of posts for this month is [2276] characters.
The Pretty People Coefficient: Percentage of users who have set custom character flairs: [29%]

Alright so the first one is relatively easy, it’s a dictionary with a few key : value pairs for username, character count, and post date. It goes through the threads and immediately takes the first comment it sees and calls it the biggest, then compares every comment through the whole month, swapping out the data when a bigger comment comes along.
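
The "keep the biggest seen so far" pass can be sketched like this. It assumes each comment has already been reduced to a plain dict with `author`, `body`, and `date` keys; that shape is an assumption for the example, not what the Reddit API hands back.

```python
def find_longest_post(comments):
    """Return a record for the longest comment, or None if there are none."""
    longest = None
    for comment in comments:
        length = len(comment["body"])
        # The first comment seen becomes the record; later ones must beat it.
        if longest is None or length > longest["length"]:
            longest = {
                "author": comment["author"],
                "length": length,
                "date": comment["date"],
            }
    return longest


# Made-up sample data.
sample = [
    {"author": "alice", "body": "short", "date": "Apr 01"},
    {"author": "bob", "body": "a much longer write-up", "date": "Apr 08"},
    {"author": "carol", "body": "medium one", "date": "Apr 15"},
]
record = find_longest_post(sample)
print(record)
```

With a strict `>` comparison, the earliest comment wins a tie.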

The next two come in a pair and are a very simple module, here’s the code:

def FindAvgPostInfo(self, submission):
    # Called once per thread: bump the thread count, then tally each comment.
    self.totalThreads += 1
    for comment in submission.comments:
        self.totalComments += 1
        self.totalCharacters += len(comment.body)

This function is called once per thread, increments the thread count, then for every comment in the thread it increments the count for comments and adds to the total character count. Again, a bit of math and print to console.
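
The "bit of math" at the end is just two divisions over the running totals. A small sketch, assuming the results are floored to whole numbers (the printed stats are integers) and using illustrative totals rather than the real April numbers:

```python
def monthly_averages(total_threads, total_comments, total_characters):
    """Return (comments per thread, characters per comment) as whole numbers."""
    per_thread = total_comments // total_threads if total_threads else 0
    per_comment = total_characters // total_comments if total_comments else 0
    return per_thread, per_comment


# Illustrative totals only.
print(monthly_averages(4, 60, 120000))
```

The guards keep an empty month from raising a ZeroDivisionError.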

The last one is unique and lets me explain a bit how the Reddit API works. The very first call you make in the program is an authentication call to Reddit, which gives you an instance of their API, fancy words for our own little copy of Reddit we can work with. The Reddit object has subreddit objects you can grab from it, so we get the VNs subreddit. We send a search on that subreddit object and we get a list of submission objects. On a submission object you’ll find a comments object, then an individual comment, then that comment’s author, and alongside it an author_flair_css_class attribute on the comment. It goes alllllll the way down the rabbit hole, but at least we set up a base camp around the search layer so it’s easy to build off of. By checking if that member variable has either no data at all or a default blank value we can get the total percentage of custom flairs.
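
Here is one way the flair check could look once the comments are in hand. This is a sketch on stand-in objects rather than live API results: `fake` is a hypothetical helper that only mimics the `author` and `author_flair_css_class` attributes, and treating an empty string as the "default blank value" is an assumption.

```python
from types import SimpleNamespace


def custom_flair_percentage(comments):
    """Percent of distinct commenters whose flair CSS class is set and non-blank."""
    has_flair = {}
    for comment in comments:
        if comment.author is None:  # deleted accounts have no author
            continue
        css = comment.author_flair_css_class
        flagged = bool(css and css.strip())
        name = comment.author.name
        has_flair[name] = has_flair.get(name, False) or flagged
    if not has_flair:
        return 0
    return round(100 * sum(has_flair.values()) / len(has_flair))


def fake(name, css):
    """Stand-in for a comment object (hypothetical, for illustration only)."""
    return SimpleNamespace(author=SimpleNamespace(name=name),
                           author_flair_css_class=css)


# Made-up sample: one flaired user posting twice, two unflaired users.
sample = [fake("alice", "Chizuru"), fake("bob", None),
          fake("carol", ""), fake("alice", "Chizuru")]
print(custom_flair_percentage(sample))
```

Counting each username once keeps a prolific poster from skewing the percentage.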

The Early Bird Club

The Early Bird Club is easy to figure out - post your review or comment within the first hour of the thread going live, and you get a point. Here’s this month’s notables:

The Early Bird Club: 
#1: /u/UnknownNinja-----[5]
#2: /u/superange128-----[4]
#3: /u/PHNX_Arcanus-----[3]
#4: /u/alwayslonesome---[2]
#5: /u/Alexfang452------[2]
#6: /u/Some_Guy_87------[2]
#7: /u/KaveAhangar------[2]

Like I said, normally I only want to do top 5s for monthly leaderboards, but it doesn’t feel fair to 6 and 7 and I haven’t implemented a weighted ranking system yet. This one is also easy to explain. By now you know we like dictionaries with usernames as our key, same deal here. By checking your post date against the thread’s post date, simply check if the hour value is the same and give a point if so. Sort it and print to console.

After writing this post I realize because this script only checks the post hour, that now gives a 1-hour window every 24 hours to qualify as an early bird; gonna have to update that shit before the end of the month.
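
One possible fix is to compare elapsed time instead of the hour value. The sketch below assumes both timestamps are Unix epoch seconds (the kind of value a `created_utc` field carries); the function names and the sample dates are made up for the example.

```python
from datetime import datetime, timedelta, timezone

EARLY_WINDOW_SECONDS = 60 * 60  # one hour


def is_early_bird(thread_created_utc, comment_created_utc,
                  window=EARLY_WINDOW_SECONDS):
    """True if the comment landed within `window` seconds after the thread."""
    elapsed = comment_created_utc - thread_created_utc
    return 0 <= elapsed < window


def same_hour(thread_created_utc, comment_created_utc):
    """The hour-of-day comparison described above, kept for contrast."""
    thread_hour = datetime.fromtimestamp(thread_created_utc, timezone.utc).hour
    comment_hour = datetime.fromtimestamp(comment_created_utc, timezone.utc).hour
    return thread_hour == comment_hour


# Made-up thread time: 15:30 UTC.
thread = datetime(2020, 4, 22, 15, 30, tzinfo=timezone.utc)
cases = {
    "20 minutes later": thread + timedelta(minutes=20),
    "45 minutes later": thread + timedelta(minutes=45),
    "next day, same hour": thread + timedelta(hours=23, minutes=40),
}
for label, when in cases.items():
    print(label,
          is_early_bird(thread.timestamp(), when.timestamp()),
          same_hour(thread.timestamp(), when.timestamp()))
```

The second case is a real early bird that the hour check misses, and the third is a day-late comment that the hour check wrongly rewards.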

The Sweet Talker’s Club

This was one of my favorite modules I came up with, and also a headache to implement. The Sweet Talker’s Club tracks total comment replies for the entire month, including shit that goes back and forth. Funny story about February 12th, I’ll tell you in a bit. Top 10 for this month is:

The Sweet Talker's Club: 
#1: /u/Some_Guy_87------[17]
#2: /u/AutoModerator----[17]
#3: /u/PHNX_Arcanus-----[11]
#4: /u/UnknownNinja-----[8]
#5: /u/tintintinintin---[7]
#6: /u/tauros113--------[4]
#7: /u/GeneralGom-------[4]
#8: /u/Bruxae-----------[3]
#9: /u/Zagorz-----------[3]
#10: /u/Veshurik--------[3]

Honestly I can’t fucking believe /u/Some_Guy_87 tied with Automod. I’ve got the year-in-total leaderboard and she’s got first place by a long shot. So how do we go about getting this data? Let me tell you about something called recursion. People familiar with this concept are already groaning, for those out of the loop (no pun intended but that was godlike), recursion is where a function calls itself, basically forcing the interpreter to run the same code again and again and again. However, think of it like adding layers to a cave; everything is identical but you’re still going deeper. When you find what you’re looking for you need to go back up to the surface, you don’t just warp back to where you started; you have to manage your ascent. (Sorry for all the cave analogies I watched Made in Abyss last week and this is how I’m coping) I’ll give you the code on this one:

def FindSweetTalkers(self, submission):
    # Expand every "load more comments" placeholder so only real comments remain.
    submission.comments.replace_more(limit = None)
    for comment in submission.comments:
        self.RecurseSweetTalkers(comment)

def RecurseSweetTalkers(self, comment):
    if not comment.stickied:
        for reply in comment.replies:
            # Go one level deeper first if this reply has replies of its own.
            if reply.replies._comments.__len__() > 0:
                self.RecurseSweetTalkers(reply)
            # Deleted accounts have no author, so they are skipped.
            if reply.author and reply.author.name not in self.sweetTalkers:
                self.sweetTalkers[reply.author.name] = 1
            elif reply.author:
                self.sweetTalkers[reply.author.name] += 1

That first function just sets up a loop to recurse through the thread. The RecurseSweetTalkers function will continue to call and execute itself until the Reddit API tells it there are no more comment replies. Then it goes all the way back up the comment chain and loops again. This crawls its way through every comment in every thread and gives points to replies only. So, February 12th fucking broke the shit out of this code because two crazy edge cases happened in a single thread. First, /u/tauros113 posted and stickied a comment in the thread. Stickied comments do all kinds of fucked up shit with the API, it’s got absolutely no idea what to do about it, and it crashes my code. Secondly, a handful of users went back and forth long enough for the standard “continue this thread” or “keep reading” prompt to show up. This also breaks the shit out of the API. That “continue reading” is an actual entity; the API thinks it’s a comment reply, but it has absolutely no data associated with it because it’s just a button to keep reading. Thankfully, the line submission.comments.replace_more(limit = None) is basically a global command on the thread to flush out those prompts and load every comment in the thread. Even funnier story, fixing that issue was the difference between a user being on the leaderboard or not. Wild stuff.
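
To see the recursion without touching Reddit at all, here is the same walk on a toy comment tree. `Node` is a hypothetical stand-in for a comment, with just the three attributes the walk needs. Unlike the code above, it recurses unconditionally, since looping over an empty reply list simply does nothing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    author: Optional[str]                 # None models a deleted account
    replies: list = field(default_factory=list)
    stickied: bool = False


def count_replies(comment, scores):
    """Credit the author of every reply nested anywhere under `comment`."""
    for reply in comment.replies:
        if reply.author is not None:
            scores[reply.author] = scores.get(reply.author, 0) + 1
        count_replies(reply, scores)      # ends on its own when replies is empty


def sweet_talkers(top_level_comments):
    """Return {author: reply count}, skipping stickied parent comments."""
    scores = {}
    for comment in top_level_comments:
        if not comment.stickied:
            count_replies(comment, scores)
    return scores


# Made-up thread: a back-and-forth, a deleted reply, and a stickied comment.
thread = [
    Node("alice", [Node("bob", [Node("alice", [Node("bob")])]), Node(None)]),
    Node("mod", [Node("carol")], stickied=True),
    Node("dave"),
]
print(sweet_talkers(thread))
```

Parent comments earn nothing on their own; only replies score, which matches the "points to replies only" behavior described above.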

The Perfect Attendance Club

I saved my favorite for last, I really do hope this category gets bigger as time goes on. The Perfect Attendance Club naturally is for users who posted in every thread for the month:

The Perfect Attendance Club:
/u/UnknownNinja
/u/deathjohnson1
/u/PHNX_Arcanus
/u/Alexfang452

Congratulations guys, good stuff. The way I get this list is actually kind of fun; I start with an array of every user in the last thread of the month (my search results come back sorted by new, so all of my data processing actually happens backwards), then thread by thread strip away any user in the current list who is not in the next thread. By the end, only the perfect attendees remain. Shouts out to these guys, they’re doing god’s work.
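
That stripping-away process is a running set intersection. A minimal sketch, assuming each thread has already been reduced to a list of the usernames that posted in it (the sample names are made up):

```python
def perfect_attendance(threads):
    """threads: list of per-thread username collections, in any order."""
    if not threads:
        return set()
    attendees = set(threads[0])       # start with everyone from one thread
    for authors in threads[1:]:
        attendees &= set(authors)     # drop anyone missing from this thread
    return attendees


# Made-up month, newest thread first.
month = [
    ["alice", "bob", "carol"],
    ["bob", "alice"],
    ["alice", "dave", "bob"],
    ["bob", "alice", "erin"],
]
print(sorted(perfect_attendance(month)))
```

Because intersection does not care about order, the result is the same whether the threads are processed newest first or oldest first.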

The WAYR Leaderboards

The WAYR Leaderboards are........a surprise! Now that WAYRStats has been announced, you all knowing of its existence changes the nature of the data that this thing aggregates, so for now I gave y’all a full suite of monthly analytics for April. At the end of May you’ll get the full suite and that thread will debut the year-in-total leaderboards, and this project will be going open source for anyone who is curious about Python, about how this all works, or if they want to run it on their machine to fiddle with things. In addition, I’m thinking that throughout the month of May, I’ll make some smaller threads to spotlight individual facets of my code, Python in general, and getting feedback from you guys; I’ll probably debut a couple leaderboards in that time. The leaderboards module is fully functional and I actually do have a couple ideas for more, so stay tuned. But, maybe I can give you a little peek...

For real, shouts out to /u/deathjohnson1 for being the only user to have posted in every single thread for the year of 2020 thus far (not including untranslated threads).

In Conclusion

Yes I know I’m in just about every metric, I made this shit, you think I wouldn’t completely rig it my way? /s

I’m very excited to debut this to the subreddit, and a bit nervous as well. In its own right I learned a lot about Python doing this and this project will likely end up on my portfolio as a code sample. I think you guys will like it, I’ve spoken to a few people that miss that old leaderboard, and I hope that this spiritual successor will feel more noticed, more accessible, more engaging, and more fun.

Do you have ideas to make this project better? By all fucking means shoot me a line if it has to do with the code or the mods if you want to contribute to the prize pool or offer suggestions!

It’s been a pleasure getting back into this community after a long break; having a place to just let those creative juices flow and pop off about something I care about is really important to me, and I wanted to show my appreciation with this. So at the end of it all, I would like to ask you a question:

What are you reading?

Author

PHNX_Arcanus

ChizuChizu | vndb.org/u86636

Subreddit

r/visualnovels

Posted: 4 years ago