Let's talk Solutions: Regular Expressions. Let's not be greedy.

I recently faced a situation where I wanted to extract parts of some HTML code. But then I realized I had no control over how much text the regex was giving me back.

Regular expressions have these quantifiers:

* + ?

They are all greedy. That means they will always try to match as much text as possible.

Let's see an example.

>>> import re
>>> r = re.search('<.*>', '<a><b><ab>')
>>> r.group()
'<a><b><ab>'

Now, I didn't ask for the whole deal here. I was just trying to extract '<a>' from the above text.
So we need to put a ? right after the *. That makes the * non-greedy: it matches as little as possible.

>>> r = re.search('<.*?>', '<a><b><ab>')
>>> r.group()
'<a>'
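
The same comparison can be written as a small script. This is only a sketch: the variable names are made up for illustration, and the 'aaaa' and 'ab' inputs are extra test strings that are not part of the original example. It also shows that the trailing ? works the same way on + and ?.

```python
import re

text = '<a><b><ab>'

# Greedy: .* runs to the end of the string, then backs up to the last '>'.
greedy = re.search(r'<.*>', text).group()

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.search(r'<.*?>', text).group()

# With the non-greedy pattern, findall returns each tag separately.
tags = re.findall(r'<.*?>', text)

# The same suffix applies to the other two quantifiers.
plus_greedy = re.search(r'a+', 'aaaa').group()   # as many 'a' as possible
plus_lazy = re.search(r'a+?', 'aaaa').group()    # a single 'a' is enough
opt_greedy = re.search(r'ab?', 'ab').group()     # takes the optional 'b'
opt_lazy = re.search(r'ab??', 'ab').group()      # skips the optional 'b'

print(greedy, lazy, tags)
print(plus_greedy, plus_lazy, opt_greedy, opt_lazy)
```

Using re.findall here is just one convenient way to pull out every tag; re.search only ever returns the first match.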

Another Example:

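The first case uses the greedy .* and the second uses the non-greedy .*?. The (?: ... ) and ( ... ) wrappers are only groups; they don't change what group() returns here.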

>>> r = re.search('(?:http://.*/)', 'http://www.google.com/search/query/')
>>> r.group()
'http://www.google.com/search/query/'

>>> r = re.search('(http://.*?/)', 'http://www.google.com/search/query/')
>>> r.group()
'http://www.google.com/'
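
For this URL case there is another common way to get the same result, sketched below. It is an alternative to the non-greedy quantifier, not something taken from the example above: a negated character class [^/] refuses to cross a slash, so the greedy * cannot run too far. The variable names are hypothetical.

```python
import re

url = 'http://www.google.com/search/query/'

# Non-greedy version, as in the example above.
lazy = re.search(r'http://.*?/', url).group()

# Alternative: [^/]* matches anything except '/', so it stops at the first slash.
negated = re.search(r'http://[^/]*/', url).group()

# A capturing group pulls out just the host part.
host = re.search(r'http://([^/]*)/', url).group(1)

print(lazy)
print(negated)
print(host)
```

Both patterns give the same match for this input; which one reads better is mostly a matter of taste.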

11 September, 2018
