
Is there a way to parse fixed-width reports that are mostly, but not always, consistent, or a suggested language that can do this?

Unfortunately, in about 6 months, my company is being migrated from an accounting system where I have SQL database access, beautiful reports, and a lot of stuff I've made refreshable in Excel so managers can make decisions, to a system that:

a) Has no data warehouse.

b) Has no direct SQL query access.

c) Produces reports that are, for the vast majority, fixed width. The widths are generally (but not always) the same, and below each line of useful data there's a sub-data line that overlaps the columns holding the actual data. The extra description is good for human eyes, but for Excel it's garbage. For example (this isn't real data):

Description   Item        Qty Moved
    Extra description
------------------------------------
3" PASTRY     220464       -35
    Status is in WH 13 BIN 5

My dilemma is that the extra line overlaps the data columns. There's no guarantee that the "Item" will be numeric or that the same positions in the extra line will be non-numeric, so I need a way to parse this data intelligibly. As I'm a novice programmer (I've dabbled), I was thinking Python could be used, but I'm not quite sure how.

Other accountants basically said, "Why not just use Data->Columns in Excel for this?" But I want automation (I have a bunch of reports that go through Automate out to the cloud, etc.), and sometimes things like the description push the item # over, so I can't rely on the physical position in the document unless I'm manually looking at the data. If the positions were reliable, I could just use Python and parse each line by character #.

If anyone could point me in a direction for something I can learn to help me accomplish this that makes it easier, I'd appreciate it.

Comments

Given the inconsistent nature of your data, Python, with libraries such as pandas and numpy, can indeed be very helpful for data manipulation and cleaning tasks like this.

It might have a steep learning curve in the beginning but it'll pay off in the long run.
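As a starting point, here is a rough sketch using only Python's standard library, based on the sample report in the post. It rests on a few assumptions that may not hold for the real reports: everything above the first dashed rule is header, the extra-description lines always start with whitespace, the main data lines never do, and the item and quantity never contain spaces. The names `parse_report` and `SAMPLE` are made up for illustration. Splitting each main line from the right means a long description that pushes the item # over should not matter.

```python
import re

# Sample text modeled on the (not real) report in the post.
SAMPLE = '''\
Description   Item        Qty Moved
    Extra description
------------------------------------
3" PASTRY     220464       -35
    Status is in WH 13 BIN 5
'''


def parse_report(text):
    """Return one dict per main data line, with its indented detail lines attached."""
    records = []
    in_body = False  # assumption: everything before the first dashed rule is header
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if re.fullmatch(r"-+", line.strip()):
            in_body = True  # dashed rule: the data starts after this
            continue
        if not in_body:
            continue  # still inside the header
        if line[0].isspace():
            # Assumption: indented lines are extra detail for the previous record.
            if records:
                records[-1]["detail"].append(line.strip())
            continue
        # Split from the right, so the last two whitespace-separated tokens are
        # item and quantity and the rest (spaces included) is the description.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            raise ValueError(f"Unexpected line layout: {line!r}")
        description, item, qty = parts
        records.append({
            "description": description.strip(),
            "item": item,   # kept as text, since items may not be numeric
            "qty": qty,     # kept as text; convert with int() if that is safe
            "detail": [],
        })
    return records


if __name__ == "__main__":
    for row in parse_report(SAMPLE):
        print(row)
```

Once the rows are in a list of dicts like this, they can be written out with the `csv` module or handed to `pandas.DataFrame(rows)` for Excel. If the real reports repeat the header on every page or have columns that can be blank, the rules above would need adjusting.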


Posted
1 year ago