Using regex to match elements between lines in Python


I'm looking to use regex to extract a quantity from a shopping website. In the following example, I want to get "12.5 kilograms". However, the quantity within the first span is not always listed in kilograms; it could be in lbs., oz., etc.

<td class="size-price last first" colspan="4">
    <span>12.5 kilograms </span>
    <span> <span class="strike">$619.06</span> <span class="price">$523.91</span>
    </span>
</td>

The code above is just a small portion of what is extracted using BeautifulSoup. Whatever the page is, the quantity is always within a span, on a new line after

<td class="size-price last first" colspan="4">   

I've used regex in the past but I'm far from an expert. I'd like to know how to match elements between different lines. In this case, between

<td class="size-price last first" colspan="4"> 

and

<span> <span class="strike"> 

Avoid parsing HTML with regex. Use the right tool for the job: an HTML parser such as BeautifulSoup. It is powerful, easy to use, and can handle this case:

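from bs4 import BeautifulSoup

data = """
<td class="size-price last first" colspan="4">
    <span>12.5 kilograms </span>
    <span> <span class="strike">$619.06</span> <span class="price">$523.91</span>
    </span>
</td>"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")

# Text of the first <span> inside the first <td>
print(soup.td.span.text)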

Prints:

12.5 kilograms  

Or, if the td is part of a bigger structure, find it by class and get the first span's text out of it:

print(soup.find('td', {'class': 'size-price'}).span.text)

UPD (handling multiple results):

print([td.span.text for td in soup.find_all('td', {'class': 'size-price'})])
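
Since the unit varies (kilograms, lbs., oz., etc.), it may help to split each extracted string into a number and a unit. The following is only a sketch: the `texts` list is made-up sample data standing in for the result of the list comprehension above, and `parse_quantity` is a hypothetical helper name, not part of BeautifulSoup. Note that the regex here runs on the already extracted plain text, not on the HTML.

```python
import re

# Made-up sample of what the list comprehension above might return.
texts = ["12.5 kilograms ", " 3 lbs.", "16 oz. ", "n/a"]

# A number (optionally with a decimal part), then the rest of the text as the unit.
QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(.+?)\s*$")


def parse_quantity(text):
    """Return (value, unit), or None if the text does not look like a quantity."""
    match = QUANTITY.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


parsed = [parse_quantity(t) for t in texts]
print(parsed)
```

Strings that do not start with a number come back as `None`, so they can be filtered out or logged instead of raising an error.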

Hope that helps.
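
As for the literal question of matching across lines: `\s` in Python's `re` module also matches newlines, so `\s*` can bridge the gap between the two tags. The sketch below assumes the td's attributes appear exactly as in the sample, which is the kind of fragility that makes an HTML parser the safer choice.

```python
import re

data = """<td class="size-price last first" colspan="4">
    <span>12.5 kilograms </span>
    <span> <span class="strike">$619.06</span> <span class="price">$523.91</span>
    </span>
</td>"""

# \s* absorbs the spaces and newlines between the tags; the lazy (.*?)
# captures the text of the first span. re.DOTALL lets "." match newlines
# in case the span's text itself is spread over several lines.
# Assumption: the attributes and their order never change.
pattern = re.compile(
    r'<td class="size-price last first" colspan="4">\s*<span>(.*?)</span>',
    re.DOTALL,
)

match = pattern.search(data)
quantity = match.group(1).strip() if match else None
print(quantity)
```

For the sample above this prints `12.5 kilograms`. Any change in the markup (an extra attribute, a different class order) makes the pattern miss, while the BeautifulSoup version keeps working.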

