Navigating an HTML tree in Python

January 15, 2011

<td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**">
  <div class="inner">
    <div class="item">
      <div class="view-item view-item-aisd_calendar">
        <div class="calendar monthview">
          <div class="calendar.4168.field_date.8.0 contents">
            <a href="/event/2013/regular-board-meeting">**regular board meeting**</a>
            <span class="date-display-single">7:00 pm</span>
          </div>
          <div class="cutoff">&nbsp;</div>
        </div>
      </div>
    </div>
  </div>
</td>

Hi! I have the above HTML code. I need to extract the "date" attribute (2014-04-28) and the "a href" tag (regular board meeting) from it. How can I do this using Python? Can it be done using Beautiful Soup? Any help is appreciated.

Here's how you can do it via BeautifulSoup:

from bs4 import BeautifulSoup

data = """
<html>
    <body>
        <td id="aisd_calendar-2014-04-28-0" class="single-day future" colspan="1" rowspan="1" date="**2014-04-28**">
          <div class="inner">
            <div class="item">
              <div class="view-item view-item-aisd_calendar">
                <div class="calendar monthview">
                  <div class="calendar.4168.field_date.8.0 contents">
                    <a href="/event/2013/regular-board-meeting">**regular board meeting**</a>
                    <span class="date-display-single">7:00 pm</span>
                  </div>
                  <div class="cutoff">&nbsp;</div>
                </div>
              </div>
            </div>
          </div>
        </td>
    </body>
</html>
"""

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, 'html.parser')

td = soup.body.td  # or soup.find('td', id='aisd_calendar-2014-04-28-0')
print(td['date'].strip('*'))  # strip the literal asterisks around the value

# 'contents' is one of the two classes on the div that wraps the link
link = soup.find('div', {'class': 'contents'}).a
print(link['href'])

This prints:

2014-04-28
/event/2013/regular-board-meeting
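The question also mentions the link text ("regular board meeting"), which the code above does not print. Below is a minimal sketch of one way to read it, assuming the same markup and the built-in html.parser. The trimmed-down `snippet` string and the variable names `title` and `time_text` are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (illustrative)
snippet = """
<td date="**2014-04-28**">
  <div class="calendar.4168.field_date.8.0 contents">
    <a href="/event/2013/regular-board-meeting">**regular board meeting**</a>
    <span class="date-display-single">7:00 pm</span>
  </div>
</td>
"""

soup = BeautifulSoup(snippet, "html.parser")

# First <a> inside the <div> whose class list includes "contents"
link = soup.find("div", class_="contents").a

# get_text() returns the text inside the tag; strip("*") drops the literal asterisks
title = link.get_text().strip("*")
time_text = soup.find("span", class_="date-display-single").get_text()

print(title)
print(time_text)
```

Indexing the tag (`link['href']`) reads an attribute, while `get_text()` reads the content between the opening and closing tags, so both are available from the same `link` object.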

Also, if you need to convert the date to Python's datetime, you can use strptime():

from datetime import datetime

...

# %Y is the four-digit year; lowercase %y would expect a two-digit year
datetime.strptime(td['date'].strip('*'), '%Y-%m-%d')
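As a self-contained sketch of that conversion, the example below hard-codes the two strings instead of parsing the HTML. The names `raw_date`, `raw_time`, `event_date`, and `event_start` are stand-ins chosen for illustration. Combining the date with the "7:00 pm" text is an extra step not in the original answer, and it assumes an English locale for the `%p` directive.

```python
from datetime import datetime

raw_date = "**2014-04-28**"  # stand-in for td['date']
raw_time = "7:00 pm"         # stand-in for the text of the date-display-single span

# Date only: remove the asterisks, then parse year-month-day
event_date = datetime.strptime(raw_date.strip("*"), "%Y-%m-%d")

# Date and time together: %I is the 12-hour clock hour, %p is the AM/PM marker
event_start = datetime.strptime(
    raw_date.strip("*") + " " + raw_time, "%Y-%m-%d %I:%M %p"
)

print(event_date.date())
print(event_start)
```

strptime() raises ValueError when the string does not match the format, so a try/except around the call may be worth adding if the attribute can be missing or malformed.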

Hope that helps.
