Jsoup: getting values when the class names are the same for all elements
I have the HTML below and want two details from it:
publisher: springer-verlag, price: $7,284
The problem is that the outer and inner class names are the same for every row. Please suggest how to get the above two values from the HTML below using Jsoup.
<div class="details">
  <div class="fullname">analytical and bioanalytical chemistry (2011)</div>
  <div class="catbox">
    <div class="catcontents">
      <div class="contents_ct1">eigenfactor category:</div>
      <div class="contents_ct2" style="margin-left: -5px;">analytic chemistry</div>
    </div>
    <div class="catcontents">
      <div class="contents_ct1">isi category:</div>
      <div class="contents_ct2" style="margin-left: -49px;">co ea</div>
    </div>
    <div class="catcontents">
      <div class="contents_ct1">group:</div>
      <div class="contents_ct2" style="margin-left: -80px;">sci</div>
    </div>
    <div class="catcontents">
      <div class="contents_ct1">total articles (5yrs):</div>
      <div class="contents_ct2" style="margin-left: -12px;">3,544</div>
    </div>
  </div>
  <div class="catbox" style="margin-left: 20px">
    <div class="catcontents">
      <div class="contents_ct1">publisher:</div>
      <div class="contents_ct2" style="margin-left: -55px;">springer-verlag</div>
    </div>
    <div class="catcontents">
      <div class="contents_ct1">first published:</div>
      <div class="contents_ct2" style="margin-left: -35px;">2001</div>
    </div>
    <div class="catcontents">
      <div class="contents_ct1"><a href="http://journalprices.com/" title="prices provided by journalprices.com" target="_blank" style="font-size: 11px">price:</a></div>
      <div class="contents_ct2" style="margin-left: -80px;">$7,284</div>
    </div>
    <div class="catcontents">
      <div class="contents_ct1">cost effectiveness:</div>
      <div class="contents_ct2" style="margin-left: -18px;">1.0302</div>
    </div>
  </div>
  <div class="tgraph">
    <div class="plotb">
      <iframe src="plot1.php?issn=1618-2642" width="370px" height="220px" frameborder=0 scrolling="no"></iframe>
    </div>
    <div class="plotb" style="margin-left: 10px">
      <iframe src="plot2.php?issn=1618-2642" width="340px" height="220px" frameborder=0 scrolling="no"></iframe>
    </div>
  </div>
</div>
Static HTML structure
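Assuming the layout always follows the structure of the source you provided, you can use simple CSS selector syntax to specify which elements to parse. In Jsoup, :eq(n) matches an element whose zero-based index among its siblings is n. Inside div.details the fullname div is at index 0, so div.catbox:eq(2) is the second catbox, and div.catcontents:eq(2) is the third row inside it (the price row).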
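import org.jsoup.nodes.Element;

// doc is an org.jsoup.nodes.Document already parsed from the HTML above
// first() picks the first match, i.e. the value in the publisher row
Element publisher = doc.select("div.catbox:eq(2) div.catcontents div.contents_ct2").first();
// the third row (index 2) of the second catbox holds the price
Element price = doc.select("div.catbox:eq(2) div.catcontents:eq(2) div.contents_ct2").first();
System.out.println("publisher: " + publisher.text() + "\nprice: " + price.text());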
This prints:
publisher: springer-verlag
price: $7,284
Dynamic HTML structure
If the structure isn't the same every time, the code below should produce the same result, because it checks the text of each row to identify the right elements instead of relying on their position.
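import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Elements content = doc.select("div.catcontents");
Element publisher = null;
Element price = null;
for (Element element : content) {
    // text() of a row includes its label, e.g. "publisher: springer-verlag"
    if (element.text().startsWith("publisher")) {
        publisher = element;
    }
    if (element.text().startsWith("price")) {
        price = element;
    }
}
// publisher and price stay null if no row matched, so check before using them on other pages
System.out.println(publisher.text() + "\n" + price.text());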
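Matching with startsWith is sensitive to case, stray whitespace, and the trailing colon in the label. One possible way to make the text check more forgiving is to normalise each label and store the rows in a map. The following is only a sketch under assumptions: the class DetailLookup and its methods normalize and index are hypothetical names, not part of Jsoup, and plain strings stand in for the text() of the contents_ct1 and contents_ct2 divs of each div.catcontents row.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class DetailLookup {

    // Lower-cases a label, trims whitespace and drops one trailing colon,
    // so "Publisher:" and " publisher " both become "publisher".
    static String normalize(String label) {
        String s = label.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith(":")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s;
    }

    // Builds a label -> value map from {label, value} pairs.
    // If a label occurs twice, the later value overwrites the earlier one.
    static Map<String, String> index(String[][] rows) {
        Map<String, String> details = new LinkedHashMap<>();
        for (String[] row : rows) {
            details.put(normalize(row[0]), row[1].trim());
        }
        return details;
    }

    public static void main(String[] args) {
        // Assumed input: label and value text taken from each catcontents row.
        String[][] rows = {
            {"publisher:", "springer-verlag"},
            {"first published:", "2001"},
            {"price:", "$7,284"},
            {"cost effectiveness:", "1.0302"},
        };
        Map<String, String> details = index(rows);
        System.out.println("publisher: " + details.get("publisher"));
        System.out.println("price: " + details.get("price"));
    }
}
```

With a map like this, a missing row shows up as a null from get() rather than as a NullPointerException on text(), so the caller can decide how to handle pages that lack a publisher or price.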