.pdf to text question

Hello. Rank Python newbie here with a question. I have been working with text converted from PDFs using Python. The conversion itself is no problem: I got the code working well, cycling through multiple PDFs, EXCEPT for the low quality of the resulting text. I've had to do a lot of tweaking to the texts, and it's time-consuming. On a whim, I manually selected all the text in a PDF, copied it, and pasted it into a text file. I had previously converted this same PDF to text using Python (PyPDF2), and the difference in quality between the two text files was staggering. PyPDF2's text extraction just doesn't stand up to a manual copy and paste (C&P). If I had had C&P text files from the start, I could have saved myself a lot of time. However, I get a high number of new PDFs every day and do not have the time to C&P them manually. So here's my question:

Is there a way to use Python to select and copy the text of a PDF as if it were being done manually, and then paste it into a text file, rather than using the usual PyPDF2 text extraction method?
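For what it's worth, a manual copy and paste is itself text extraction: the PDF viewer's own engine decodes the page and puts the text on the clipboard, so scripting the clipboard usually isn't necessary. What tends to change the quality is which extraction engine does the work. The sketch below is a minimal example under stated assumptions: it assumes the third-party pdfminer.six package is installed (`pip install pdfminer.six`) and uses its `pdfminer.high_level.extract_text` function in place of PyPDF2. The helper names `clean_extracted_text`, `extract_pdf_text`, and `convert_folder`, and the cleanup rules, are hypothetical illustrations, not a known fix for any particular set of PDFs.

```python
import re
from pathlib import Path


def clean_extracted_text(raw: str) -> str:
    """Tidy a few common extraction artifacts (an assumed, illustrative set).

    - a word split by a hyphen at a line end is rejoined
    - hard line breaks inside a paragraph are joined with a space
    - runs of spaces/tabs are squeezed to one space
    Paragraphs are assumed to be separated by blank lines.
    """
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at the end of a line.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # str.split() with no argument splits on any whitespace, so this
        # joins wrapped lines and collapses repeated spaces in one step.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    return "\n\n".join(paragraphs)


def extract_pdf_text(pdf_path: Path) -> str:
    # Lazy import: pdfminer.six (an assumed dependency) is only needed
    # once a PDF is actually processed.
    from pdfminer.high_level import extract_text

    return extract_text(str(pdf_path))


def convert_folder(pdf_dir: Path, txt_dir: Path) -> list[Path]:
    """Write one cleaned .txt file per .pdf in pdf_dir; return the paths written."""
    txt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        text = clean_extracted_text(extract_pdf_text(pdf_path))
        out_path = txt_dir / f"{pdf_path.stem}.txt"
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Usage would be something like `convert_folder(Path("incoming_pdfs"), Path("extracted_txt"))`, where both folder names are placeholders. Only `extract_pdf_text` determines extraction quality, so if pdfminer.six doesn't beat PyPDF2 on these files, its body can be swapped for another engine, such as PyMuPDF (`page.get_text()` per page) or poppler's `pdftotext -layout` command-line tool, and a few outputs compared against a manual C&P file. Note that the hyphen rule will also merge genuine compounds such as "well-known" when the line break falls at the hyphen.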

Author

MasterTony127

Subreddit

r/pythonhelp

Post Details

Posted: 1 year ago
External URL: reddit.com/r/pythonhelp/...