
Stupid word statistics experiment with scripts

I had an idea today and ran a small experiment...

👨‍🔬🧪⚗️ The most distinctive words in my scripts

So what is this, what does it do and how is it done? It:

  • collects all words from all scripts
  • removes words that don't add any semantic progression to the script ("and", "through", "for", "my", "those": stuff that carries little meaning of its own and is only there to make it English)
  • collapses words to their stems (fuck, fucks, fucking and fucked all count as the same word)
  • counts how often every (surviving) word occurs across all of my scripts
  • works out, for each script, the relative frequency of every word in that script
  • picks the ten most disproportionately popular words for each script (the most popular ones, but weighed against how popular they are across all scripts)
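
The steps above can be sketched roughly like this in Python. This is not the code behind the experiment; it is a minimal, standard-library-only illustration under several assumptions: the stop-word list is a tiny hypothetical one, the stemmer is a crude suffix stripper rather than a real algorithm such as Porter's, and the scoring rule p_script * log(p_script / p_all) is my guess at "most popular, but weighed against words that are popular across all scripts". The names `crude_stem`, `tokenize` and `distinctive_words` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real run would need a much longer one).
STOP_WORDS = {
    "a", "an", "and", "the", "to", "of", "for", "in", "on", "through", "my",
    "your", "those", "these", "that", "this", "is", "it", "i", "you", "he",
    "she", "they",
}


def crude_stem(word: str) -> str:
    """Rough suffix stripper, NOT the Porter algorithm.

    Removes one trailing "ing", "ed" or plural "s" as long as at least three
    letters remain, so fucks/fucking/fucked all reduce to "fuck".
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            if suffix == "s" and word.endswith("ss"):
                continue  # leave words like "glass" alone
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lower-case the text, drop stop words and reduce the rest to stems."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def distinctive_words(scripts: dict[str, str], top_n: int = 10) -> dict[str, list[str]]:
    """Return, per script, the top_n words most over-represented in it.

    Assumed score: p_script * log(p_script / p_all), where p_script is the
    word's relative frequency in that script and p_all its relative frequency
    across all scripts. Frequent words score high unless they are just as
    frequent everywhere else.
    """
    per_script = {name: Counter(tokenize(text)) for name, text in scripts.items()}
    overall = Counter()
    for counts in per_script.values():
        overall.update(counts)  # occurrence count of every word in all scripts
    total_all = sum(overall.values())

    result = {}
    for name, counts in per_script.items():
        total = sum(counts.values())
        scores = {}
        for word, n in counts.items():
            p_script = n / total
            p_all = overall[word] / total_all
            scores[word] = p_script * math.log(p_script / p_all)
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[name] = ranked[:top_n]
    return result


if __name__ == "__main__":
    scripts = {
        "space_heist": "The crew plans the heist. The heist fails, the crew floats in space.",
        "bakery": "She bakes bread. Bread burns, she bakes more bread and the crew eats.",
    }
    print(distinctive_words(scripts, top_n=3))
    # {'space_heist': ['heist', 'fail', 'float'], 'bakery': ['bread', 'bake', 'burn']}
```

In the toy example "crew" appears in both scripts, so it ends up with a low (in "bakery", negative) score. The scoring rule is one choice among several: a plain ratio p_script / p_all would give every word that occurs in only one script the same score, and TF-IDF is the usual textbook alternative.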

I'm not in this field (data science), but I've tried stuff with generated scripts before. I think it's interesting how you can get a feel for the differences between scripts, even if the results reveal less of the plot than I expected.

Author flair: Cuddly male script writer

Posted 4 years ago