How to avoid non-technical errors and bugs?

Dear data scientists,

How do you guys prevent non-technical errors and bugs?

I work as a data scientist in a junior position. My typical workflow consists of the following steps:

1) the client gives us a problem

2) think about proper methodology

3) gather the data necessary to solve the problem

4) apply some statistical procedures to solve the problem (generally a model)

5) build a report to send to the client (this report must follow the company's format and standards).

One area where I notice I am having difficulties, and where I could improve, is what I will call the non-technical or non-statistical aspects of the workflow above. That is, suppose you gather the right data and choose the proper methodology to solve the problem. How can you then prevent errors in the coding and reporting? For instance:

- you have the right methodology, but when coding the model you use the wrong variable at some step, so the results are not valid (for instance, you have x_train and x_test and you mistakenly write m = x_test / 2 instead of m = x_train / 2).

- at the reporting stage, you export the wrong results.

These are just examples.
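
To make the first example concrete, here is a minimal sketch in Python. Only the names x_train, x_test and m come from the example above; the data values and the helper `halve` are made up for illustration. The point is that both versions run without any error or warning, so nothing in the code itself tells you that the wrong variable was used.

```python
# Hypothetical data, invented only to illustrate the mix-up described above.
x_train = [2.0, 4.0, 6.0, 8.0]
x_test = [1.0, 3.0]


def halve(values):
    """Divide every element by 2 (stand-in for the m = x / 2 step)."""
    return [v / 2 for v in values]


# What the methodology calls for.
m_intended = halve(x_train)

# The slip: same line, wrong variable. It still runs and returns numbers.
m_mistaken = halve(x_test)

print("intended:", m_intended)
print("mistaken:", m_mistaken)
print("same result?", m_intended == m_mistaken)
```

Both calls succeed and both outputs look like plausible numbers, which is why this kind of error tends to surface only later, when someone looks closely at the report.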

Then you send your report and, under scrutiny from your managers or while revisiting things to answer additional questions, you find these errors. It looks unprofessional to say that the initial results were wrong and that you will have to update them. It may also reduce confidence in your future results.

It has been hard for me to find ways to improve in this respect because these types of errors are hard to predict. When you are coding, you are already doing what you think is correct. Given the time frames we have, it is also infeasible to double-check every single line of code. Also, the problems are generally very diverse in nature, so you cannot just adopt an automated or semi-automated methodology and keep improving it. Many things have to be built from scratch every time you receive a new project.

How do you guys prevent these types of errors?

Thanks in advance.

Posted
4 years ago