Idiomatic way to create dictionary

I have written a crawler to fetch a bunch of university resources as they get uploaded, and I was wondering what the Pythonic way to create the list of dictionaries is in this case. From what I have read, the Hitchhiker's Guide to Python code style guide says that unit = {} is not a recommended convention.

Also, would it be advisable to change my regex string to a compiled regex? Performance-wise, I know that re caches compiled patterns anyway, so I gather it would mainly be a readability improvement.

def parse_config():
    soup = bs(open('config.xml').read(), 'lxml')
    units = []

    for u in (soup.find_all('unit')):
        unit = {}
        unit['unitname'] = re.sub('<[^<]+?>', '', str(u.unitname))
        unit['unitcode'] = re.sub('<[^<]+?>', '', str(u.unitcode))
        unit['directory'] = re.sub('<[^<]+?>', '', str(u.directory))
        unit['url'] = unescape(re.sub('<[^<]+?>', '', str(u.url)))
        unit['semester'] = re.sub('<[^<]+?>', '', str(u.semester))
        unit['year'] = re.sub('<[^<]+?>', '', str(u.year))

        units.append(unit)

    return units
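
To make the two questions concrete, here is a minimal sketch of a precompiled pattern combined with a dict comprehension. It assumes the intended pattern is '<[^<]+?>'. The names TAG_RE, FIELDS, strip_tags, and build_unit are hypothetical, and a plain mapping of strings stands in for the BeautifulSoup tags so the snippet runs with the standard library alone.

```python
import re
from html import unescape

# Compiled once at import time; the name documents what the pattern matches.
# Assumption: the intended pattern is '<[^<]+?>' (anything that looks like a tag).
TAG_RE = re.compile(r'<[^<]+?>')

# Hypothetical tuple of the child elements read from each <unit>.
FIELDS = ('unitname', 'unitcode', 'directory', 'url', 'semester', 'year')


def strip_tags(markup):
    """Return str(markup) with every tag-like substring removed."""
    return TAG_RE.sub('', str(markup))


def build_unit(raw):
    """Build one unit dict from a mapping of field name -> raw markup."""
    unit = {field: strip_tags(raw[field]) for field in FIELDS}
    unit['url'] = unescape(unit['url'])  # as in the loop above, only the URL is unescaped
    return unit


# Stand-in for one parsed <unit>; with BeautifulSoup this would be getattr(u, field).
raw_unit = {field: '<{0}>value</{0}>'.format(field) for field in FIELDS}
raw_unit['url'] = '<url>http://example.com/?a=1&amp;b=2</url>'
unit = build_unit(raw_unit)
```

Naming the pattern once removes the six repeated string literals, which is where most of the readability gain would come from. The comprehension builds the dict in one expression instead of starting from an empty dict.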

edit: Best solution was provided by /u/gschizas here and here.

As I said in a comment, I had actually already worked on a refactor, which was:

def parse_config():
    soup = bs(open('config.xml').read(), 'lxml')

    for u in (soup.find_all('unit')):
        yield {'name': clean_tags(u.unitname),
               'code': clean_tags(u.unitcode),
               'directory': clean_tags(u.directory),
               'url': unescape(clean_tags(u.url)),
               'semester': clean_tags(u.semester),
               'year': clean_tags(u.year)}

Just wanted some other insights into how to solve this problem.
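
One other possible angle, sketched here with the standard library's xml.etree.ElementTree in place of BeautifulSoup plus regex (a swapped-in technique, not the code above). It assumes config.xml is well-formed XML whose unit elements each contain the six child elements. The names FIELDS, parse_units, and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tuple of the child elements expected inside each <unit>.
FIELDS = ('unitname', 'unitcode', 'directory', 'url', 'semester', 'year')


def parse_units(xml_text):
    """Yield one dict per <unit> element found anywhere in xml_text."""
    root = ET.fromstring(xml_text)
    for unit in root.iter('unit'):
        # findtext returns the child's text with entities such as &amp;
        # already decoded, so no tag stripping or unescape step is needed.
        yield {field: unit.findtext(field, default='') for field in FIELDS}


# Assumed layout of the config file, for illustration only.
SAMPLE = """<config>
  <unit>
    <unitname>Algorithms</unitname>
    <unitcode>CS101</unitcode>
    <directory>algorithms</directory>
    <url>http://example.com/?a=1&amp;b=2</url>
    <semester>1</semester>
    <year>2016</year>
  </unit>
</config>"""

# A generator is consumed once; wrap it in list() when indexing or reuse is needed.
units = list(parse_units(SAMPLE))
```

Because the parser hands back text rather than markup, the regex question disappears in this variant. The trade-off is that ElementTree is strict about well-formedness, whereas BeautifulSoup with lxml tolerates sloppier input.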

edit 2: Should have checked which code style guide I was remembering :( Changed pep8 to the Hitchhiker's Guide to Python.

Posted 8 years ago