I don't really come from a statistics/data science background, and I don't have much formal training beyond online classes, but I've ended up in a position where I do a ton of data analysis every day. And goddamn do I wish someone had sat me down and explained this to me sooner.
The reality is it's SO EASY to mess something up and end up with data that looks right but isn't: forgetting to turn on the weights in a survey, messing up the filter in a SQL query, using count instead of distinct count in a pivot table, etc.
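To make two of those pitfalls concrete, here is a minimal sketch in Python (not the Excel/SPSS/R tools the post mentions) on a tiny made-up survey table. The data and function names are hypothetical; the point is that each wrong version still returns a plausible-looking number.

```python
from collections import Counter

# Hypothetical survey rows: (respondent_id, region, weight).
# Respondent 101 appears twice, e.g. because of a bad join upstream.
rows = [
    (101, "North", 1.0),
    (101, "North", 1.0),
    (102, "North", 1.0),
    (103, "South", 2.0),
]

def count_by_region(rows):
    """Row count per region: what a plain 'count' in a pivot table gives you."""
    return dict(Counter(region for _, region, _ in rows))

def distinct_count_by_region(rows):
    """Distinct respondents per region: what 'distinct count' gives you."""
    ids_by_region = {}
    for rid, region, _ in rows:
        ids_by_region.setdefault(region, set()).add(rid)
    return {region: len(ids) for region, ids in ids_by_region.items()}

def region_share(rows, region):
    """Share of respondents in `region`, unweighted and weighted.

    Rows are first deduplicated by respondent id.
    Returns (unweighted_share, weighted_share).
    """
    unique = {rid: (reg, w) for rid, reg, w in rows}
    total_n = len(unique)
    total_w = sum(w for _, w in unique.values())
    n = sum(1 for reg, _ in unique.values() if reg == region)
    w = sum(wt for reg, wt in unique.values() if reg == region)
    return n / total_n, w / total_w

if __name__ == "__main__":
    print(count_by_region(rows))           # {'North': 3, 'South': 1}
    print(distinct_count_by_region(rows))  # {'North': 2, 'South': 1}
    print(region_share(rows, "North"))     # unweighted 2/3 vs weighted 0.5
```

Counting rows says North has 3 respondents when there are really 2, and forgetting the weights puts North at two thirds of the sample instead of half. Neither mistake throws an error.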
I've had clients come back to me four months after a project saying they got different results when they ran the data. Fuck if I know. Time to spend 5 hours re-running everything. SAVE YOUR WORK.
I once got to the end of 6 weeks' worth of data runs and analyses for a client. I went to double-check the data in the final report and couldn't reproduce it. I had no idea what I had run, and didn't know whether I had messed up early on or was messing up now. Turns out it was early on, and I had to redo everything. SAVE YOUR WORK.
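One lightweight way to "save your work" is to log every run: which input file you used (with a fingerprint, so you can tell if the data changed), which settings you used (weights on or off, filters), and what came out. The sketch below is a minimal, hypothetical example in Python, assuming your inputs are files on disk; the function names and the JSON-lines log format are illustrative choices, not a standard tool.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def file_sha256(path):
    """Fingerprint an input file so you can tell later whether the data changed."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def log_run(log_path, input_path, params, results):
    """Append one analysis run (input, settings, outputs) as a JSON line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_file": str(input_path),
        "input_sha256": file_sha256(input_path),
        "params": params,    # e.g. weights on/off, filter expression
        "results": results,  # the numbers that went into the report
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record

def read_runs(log_path):
    """Load every logged run so you can see exactly what was done, and when."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "survey.csv"  # hypothetical input file
        data.write_bytes(b"id,region,weight\n101,North,1.0\n")
        log = Path(tmp) / "runs.jsonl"
        log_run(log, data, {"weighted": True, "filter": "region = 'North'"}, {"n": 1})
        for run in read_runs(log):
            print(run["timestamp"], run["params"], run["results"])
```

When a client comes back months later with different numbers, comparing the logged hash against a fresh hash of their file quickly shows whether the data changed or the settings did.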
Most of my day-to-day work is in Excel and SPSS. Unfortunately, that means a lot of it is point-and-click GUI work that's hard to reproduce, so I've been slowly integrating more and more R.
I know this is one of those things most of you will respond to with "duh," but when you're on a deadline or just doing exploratory analysis, it's so easy to take shortcuts and forget to do it.
Posted 7 years ago in r/statistics (reddit.com/r/statistics/...)