Python - How to extract a specific paragraph tag
I want to extract the contents of this response:

    <div class="bio-container"> <p class="bio profile" > chinedu boy </p> </div>

Please assume there are other paragraph tags with different class attributes; I want to extract only the one with the class attribute "bio profile".
I want to extract "chinedu boy" to a file.
I tried:

    desc = bs.find('p', {'class': 'bio profile'})

but it is not working.
This is the exact code I am trying to apply the answer above to:

    import urllib
    from bs4 import BeautifulSoup as bsoup
    import string

    httpresponse = urllib.urlopen("https://twitter.com/drericcole")
    html = httpresponse.read()
    bs = bsoup(html)
    desc = bs.find("p", class_="bio profile-field")
    print desc.get_text().strip()

but I get this error:

    print desc.get_text().strip()
    AttributeError: 'NoneType' object has no attribute 'get_text'
You should use the .get_text() method on desc. Using Python 2.7 and bs4 4.3.2:

    from bs4 import BeautifulSoup as bsoup

    ofile = open("test.html")
    soup = bsoup(ofile)
    desc = soup.find("p", class_="bio profile")
    # or: desc = soup.find("p", {"class": "bio profile"})
    print desc.get_text().strip()

Result:

    chinedu boy
    [Finished in 0.2s]

Hope this helps.
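The AttributeError in the question means find() returned None, i.e. no <p> tag on the fetched page had a class attribute matching the string that was searched for. The sketch below uses Python 3 syntax rather than the Python 2.7 used above and shows two ways to match the tag, plus a None check before calling get_text(). The inline HTML string, the extra "location profile" paragraph, and the helper name extract_text are assumptions made for illustration; they are not from the original page.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question. The first <p> is a hypothetical
# extra paragraph added to show that only the wanted tag is matched.
html = """
<div class="bio-container">
  <p class="location profile">somewhere</p>
  <p class="bio profile" > chinedu boy </p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_text(tag):
    """Return the tag's stripped text, or None if nothing was matched."""
    return tag.get_text().strip() if tag is not None else None


# A multi-word class_ value is compared against the exact string
# of the class attribute, so the order of the class names matters.
exact = soup.find("p", class_="bio profile")

# A CSS selector matches a <p> that has both classes, in any order.
by_css = soup.select_one("p.bio.profile")

# A class string that is not on the page gives None, which is what
# triggers the AttributeError when get_text() is called on it directly.
missing = soup.find("p", class_="bio profile-field")

print(extract_text(exact))
print(extract_text(by_css))
print(extract_text(missing))
```

If the lookup still returns None against a live page, it may be worth saving the downloaded HTML to a file and checking that the expected tag and class names actually appear in it.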