Unfortunately, in about 6 months, my company is being migrated from an accounting system where I have SQL database access and beautiful reports and a lot of stuff I've made refreshable in Excel for managers to make decisions, to a system that:
a) Has no data warehouse.
b) Has no direct SQL query access.
c) Has reports that are, for the vast majority, fixed width. The widths are generally (but not always) the same, and below each line of useful data there is sub-data that overlaps the columns holding the actual data. For example (this isn't real data), the extra description below is good for human eyes, but for Excel it's garbage:
Description Item Qty Moved
Extra description
------------------------------------
3" PASTRY 220464 -35
Status is in WH 13 BIN 5
My dilemma is that the extra line overlaps the data columns. There's no guarantee that the "Item" will be numeric, or that the same positions in the extra line will not be numeric, so I need a way to parse this data intelligibly. As I'm a novice programmer (I've dabbled), I was thinking Python could be used, but I'm not quite sure how.
Other accountants basically said, "Why not just use Data->Columns in Excel for this?" But I want automation (I have a bunch of reports that go through Automate out to the cloud, etc.), and sometimes things like the description push the item # over, so I can't rely on the physical position in the document unless I'm manually looking at the data. (If I could, I would just use Python and parse each line by character #.)
If anyone could point me in a direction for something I can learn to help me accomplish this that makes it easier, I'd appreciate it.
Given the inconsistent nature of your data, Python can indeed be extremely helpful here. Libraries such as pandas and numpy are built for data manipulation and cleaning tasks like this.
The learning curve might be steep in the beginning, but it'll pay off in the long run.
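As a rough starting point, here is a minimal sketch that uses only the standard library. It rests on several assumptions that are not confirmed by the post: the records start after the line of dashes, every data line is followed by exactly one extra line, the quantity is always the last token on a data line and is a signed integer, and the item contains no spaces. The sample text and the names `SAMPLE_REPORT`, `DATA_LINE` and `parse_report` are made up for illustration. Instead of counting character positions, it reads each data line from the right, so a long description that pushes the item over does not break it.

```python
import re

# Hypothetical sample that mimics the layout in the question (not real data).
SAMPLE_REPORT = """\
Description          Item     Qty Moved
  Extra description
------------------------------------
3" PASTRY            220464        -35
  Status is in WH 13 BIN 5
BREAD ROLL LARGE WHITE AB-77      12
  Status is in WH 2 BIN 19
"""

# Read a data line from the right: the last token is a signed integer
# quantity, the token before it is the item (not necessarily numeric),
# and whatever remains on the left is the description.
DATA_LINE = re.compile(
    r"^\s*(?P<description>.*\S)\s+(?P<item>\S+)\s+(?P<qty>[+-]?\d+)\s*$"
)


def parse_report(text):
    """Return one dict per record (a data line plus its extra line)."""
    lines = text.splitlines()

    # Assumption: records begin after the first line made only of dashes.
    start = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            start = i + 1
            break
    body = [ln for ln in lines[start:] if ln.strip()]

    # Assumption: lines strictly alternate between data line and extra line.
    if len(body) % 2 != 0:
        raise ValueError("Expected an extra line after every data line")

    records = []
    for data, extra in zip(body[0::2], body[1::2]):
        match = DATA_LINE.match(data)
        if match is None:
            raise ValueError(f"Unrecognised data line: {data!r}")
        records.append({
            "description": match.group("description"),
            "item": match.group("item"),
            "qty": int(match.group("qty")),
            "extra": extra.strip(),
        })
    return records


records = parse_report(SAMPLE_REPORT)
for record in records:
    print(record)
```

The alternating-line assumption matters because the extra line cannot be told apart by its content alone: "Status is in WH 13 BIN 5" also ends in a number, so the same pattern would happily match it. If your real reports mark the extra line some other way (indentation, a fixed prefix), that would be a safer test. Once you have the list of dicts, `pandas.DataFrame(records)` or `csv.DictWriter` can turn it into something Excel reads cleanly.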
Post Details
- Posted
- 1 year ago
- External URL
- reddit.com/r/learnprogra...