A large part of programming is getting the right data in a form that is as machine readable as possible. Often you crawl the internet looking for the right table, but what you find adds complexity to the code instead of simplifying it. E.g. tables of city names, historical dates, dictionaries, etc.
Yesterday, for example, I needed a CSV in this format:
Old Spanish, Spanish
Delos sos oios tan fuerte mientre lorando , De sus ojos fuertemente llorando,
Tornaua la cabeça & estaua los catando , De un lado a otro volvía la cabeza mirándolos;
...
But instead I had a txt with the original medieval text and a pdf with a free-style translation, with added rhyme and different sentence structure and length, which left the two texts out of step and impossible to pair line by line. I didn't notice that until I had already lost a lot of time preformatting both texts. Now I luckily found this HTML:
<dd>Con sesenta abanderados, a los que a ver salÃan mujeres y varones; </TD> <TD style="BORDER-TOP: 0px solid" VALIGN="TOP"> </P> <P><font face="Old English Text MT">En su co<EM><SUP>n</SUP></EM>pan<EM><SUP>n</SUP></EM>a .Lx. pendones ([2leuaua]) exie<EM><SUP>n</SUP></EM> lo uer mugieres & uarones </TD> <TD></TD></TR> <TR><TD style="BORDER-TOP: 0px solid">
<dd>Asomados por las ventanas burgalese y burgalesas vio </TD>
<TD style="BORDER-TOP: 0px solid" VALIGN="TOP"> </P> <P><font face="Old English Text MT">Burgeses & burgesas por las finiestras son ([3puestas]) </TD> <TD></TD></TR> <TR><TD style="BORDER-TOP: 0px solid">
As you can see, I still have a lot of work to do on the HTML tags and encodings before I get to a CSV. So the question is: where can I find the right data for each application?
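
For reference, here is a rough Python sketch of the cleanup I mean, using only the standard library. It rests on several assumptions that may not hold for the whole page: every row starts with `<dd>`, the modern text runs up to the first `</TD>` and the old text up to the second, and the broken accents come from UTF-8 bytes that were read as Latin-1. The names `fix_mojibake`, `clean` and `extract_pairs` are made up for this example, and editorial marks such as `([2leuaua])` are left untouched.

```python
import csv
import html
import io
import re

TAG = re.compile(r"<[^>]+>")
# A UTF-8 lead byte plus its continuation bytes, each misread as one Latin-1 character.
MOJIBAKE = re.compile(r"[\u00c2-\u00f4][\u0080-\u00bf]+")


def fix_mojibake(text):
    """Repair UTF-8 sequences that were decoded as Latin-1; leave other text alone."""
    def repair(match):
        try:
            return match.group().encode("latin-1").decode("utf-8")
        except UnicodeDecodeError:
            return match.group()  # not valid UTF-8 after all, keep as found
    return MOJIBAKE.sub(repair, text)


def clean(fragment):
    """Strip tags, decode entities, fix the encoding and collapse whitespace."""
    text = html.unescape(TAG.sub("", fragment))
    return " ".join(fix_mojibake(text).split())


def extract_pairs(page):
    """Yield (old_spanish, spanish) tuples, one per <dd> row (assumed layout)."""
    for row in page.split("<dd>")[1:]:
        cells = re.split(r"</TD>", row, flags=re.IGNORECASE)
        if len(cells) >= 2:
            yield clean(cells[1]), clean(cells[0])


# Shortened row in the same shape as the fragment above; the escapes spell out
# the two characters that a misread "í" turns into.
sample = (
    "<dd>a los que a ver sal\u00c3\u00adan mujeres y varones; </TD> "
    '<TD style="BORDER-TOP: 0px solid" VALIGN="TOP"> </P> <P>'
    '<font face="Old English Text MT">exie<EM><SUP>n</SUP></EM> lo uer '
    "mugieres & uarones </TD> <TD></TD></TR>"
)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Old Spanish", "Spanish"])
writer.writerows(extract_pairs(sample))
print(buffer.getvalue())
```

Splitting on `<dd>` and `</TD>` is fragile on purpose: the markup is not well formed, so a real HTML parser may pair the cells differently than the page intends, and the regex only tracks the pattern visible in the fragment.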