I recently faced a situation where I wanted to extract parts of some HTML code. But then I realized I had no control over how much text the regex was giving me back.
Regular expressions have these quantifiers:
* + ?
They are all greedy. That means each one will always try to match as much text as possible while still letting the whole pattern match.
Let's see an example.
>>> import re
>>> r = re.search('<.*>', '<a><b><ab>')
>>> r.group()
'<a><b><ab>'
Now, I didn't ask for the whole string here. I was just trying to extract '<a>' from the text above.
So we need to put a ? right after the *. That makes the * non-greedy, so it matches as little as possible.
>>> r = re.search('<.*?>', '<a><b><ab>')
>>> r.group()
'<a>'
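If the goal is to pull out every tag rather than just the first one, the same non-greedy pattern can be handed to re.findall. This is only a small sketch: the helper name extract_tags is made up for illustration and is not part of the re module.

```python
import re

def extract_tags(text):
    # Hypothetical helper: return every '<...>' chunk found in text.
    # '.*?' is non-greedy, so each match stops at the first '>' it meets.
    return re.findall(r'<.*?>', text)

sample = '<a><b><ab>'

# Greedy: '.*' runs to the last '>', so there is a single big match.
greedy = re.findall(r'<.*>', sample)

# Non-greedy: one match per tag.
lazy = extract_tags(sample)

print(greedy)
print(lazy)
```

With the greedy pattern the list holds one item (the whole string); with the non-greedy one it holds the three tags separately.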
Another Example:
The first case is without the ?. The second is with *?
>>> r = re.search('(?:http://.*/)', 'http://www.google.com/search/query/')
>>> r.group()
'http://www.google.com/search/query/'
>>> r = re.search('(http://.*?/)', 'http://www.google.com/search/query/')
>>> r.group()
'http://www.google.com/'
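The examples so far only show *?, but the same trailing ? works on the other two quantifiers as well. Here is a quick sketch comparing each greedy form with its non-greedy twin. The test string 'aaaa' and the results dictionary are just names picked for this illustration.

```python
import re

text = 'aaaa'  # made-up sample string for this comparison

# Each greedy quantifier, followed by its non-greedy form (extra '?').
patterns = ['a*', 'a*?', 'a+', 'a+?', 'a?', 'a??']

# re.match anchors at the start of the string, so the only thing that
# varies between patterns is how many 'a' characters get consumed.
results = {p: re.match(p, text).group() for p in patterns}

for pattern, matched in results.items():
    print(repr(pattern), '->', repr(matched))
```

The greedy forms take as many characters as they are allowed ('aaaa', 'aaaa', 'a'), while the non-greedy forms take the minimum they can get away with ('', 'a', '').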