Idiomatic way to create dictionary

I have written a crawler that fetches a bunch of university resources as they get uploaded, and I was wondering what the Pythonic way to create the list of dictionaries is in this case. From what I have read, the ~~PEP 8~~ Hitchhiker's Guide to Python code style guide says that unit = {} is not a recommended convention.

Also, would it be advisable to change my regex string to a compiled regex? Performance-wise, I know that re caches compiled patterns anyway, so I gather the benefit would mainly be readability.
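
On the compiled-regex question, a minimal sketch of what that could look like is below. The names TAG_RE and strip_tags are made up for illustration, and the pattern assumes the intent is to strip anything that looks like a tag. Since re caches compiled patterns, this is unlikely to be measurably faster; the gain is that the pattern is written once and has a name.

```python
import re

# Compiled once at module level; the name documents what the pattern is for.
# Assumption: the goal is to remove anything shaped like <...>.
TAG_RE = re.compile(r'<[^<]+?>')


def strip_tags(markup):
    """Return markup with every tag-like substring removed."""
    return TAG_RE.sub('', markup)


print(strip_tags('<unitname>Intro to Programming</unitname>'))
# -> Intro to Programming
```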

def parse_config():
    soup = bs(open('config.xml').read(), 'lxml')
    units = []

    for u in (soup.find_all('unit')):
        unit = {}
        unit['unitname'] = re.sub('<[^<]+?>', '', str(u.unitname))
        unit['unitcode'] = re.sub('<[^<]+?>', '', str(u.unitcode))
        unit['directory'] = re.sub('<[^<]+?>', '', str(u.directory))
        unit['url'] = unescape(re.sub('<[^<]+?>', '', str(u.url)))
        unit['semester'] = re.sub('<[^<]+?>', '', str(u.semester))
        unit['year'] = re.sub('<[^<]+?>', '', str(u.year))

        units.append(unit)

    return units

edit: Best solution was provided by /u/gschizas here and here.

As I said in a comment, I had actually already worked on a refactor, which was:

def parse_config():
    soup = bs(open('config.xml').read(), 'lxml')

    for u in (soup.find_all('unit')):
        yield {'name': clean_tags(u.unitname),
               'code': clean_tags(u.unitcode),
               'directory': clean_tags(u.directory),
               'url': unescape(clean_tags(u.url)),
               'semester': clean_tags(u.semester),
               'year': clean_tags(u.year)}

Just wanted some other insights into how to solve this problem.
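
One other angle, sketched under assumptions: if the dictionary keys match the XML tag names, a dict comprehension over a tuple of field names avoids repeating the same line six times. The sketch below swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it is self-contained, takes the XML as a string instead of reading config.xml, and runs on made-up sample data. FIELDS and SAMPLE are hypothetical names. findtext returns the element text with entities already decoded, so no tag stripping or unescape call is needed in this variant.

```python
import xml.etree.ElementTree as ET

# Assumption: the dict keys are the same as the child tag names.
FIELDS = ('unitname', 'unitcode', 'directory', 'url', 'semester', 'year')


def parse_config(xml_text):
    """Yield one dict per <unit> element found in xml_text."""
    root = ET.fromstring(xml_text)
    for u in root.iter('unit'):
        # findtext returns None for a missing child, hence the `or ''`.
        yield {name: (u.findtext(name) or '').strip() for name in FIELDS}


# Made-up sample data, only here to make the sketch runnable.
SAMPLE = """<config>
  <unit>
    <unitname>Intro to Programming</unitname>
    <unitcode>CS101</unitcode>
    <directory>cs101</directory>
    <url>http://example.com/?a=1&amp;b=2</url>
    <semester>1</semester>
    <year>2015</year>
  </unit>
</config>"""

units = list(parse_config(SAMPLE))
print(units[0]['url'])
```

Keeping the function a generator, as in the refactor above, leaves it to the caller to decide whether a list is needed at all.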

edit 2: Should have checked which code style guide I was remembering :( Changed PEP 8 to the Hitchhiker's Guide to Python.

Author

eraptic

Subreddit

r/learnpython

Post Details

Posted: 8 years ago