C# - WebClient DownloadString output is different from the browser's View Source
I created a C# 4.0 application that downloads web page content using WebClient.
WebClient function
using System;
using System.Net;
using System.Text;

public static string GetDocText(string url)
{
    string html = string.Empty;
    try
    {
        using (ConfigurableWebClient client = new ConfigurableWebClient())
        {
            // Set the timeout for the WebClient (milliseconds).
            client.Timeout = 600000;

            // Build the URL.
            Uri innUri = null;
            if (!url.StartsWith("http://"))
                url = "http://" + url;
            Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out innUri);

            try
            {
                client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR "
                    + "3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2; AskTbFXTV5/5.15.4.23821; BRI/2)");
                client.Headers.Add("Vary", "Accept-Encoding");
                client.Encoding = Encoding.UTF8;
                html = client.DownloadString(innUri);

                // Check for a blocked page; the result is currently unused.
                if (html.Contains("pagina non disponibile"))
                {
                    string str = "site blocked";
                    str = "";
                }

                if (string.IsNullOrEmpty(html))
                {
                    return string.Empty;
                }
                else
                {
                    return html;
                }
            }
            catch (Exception)
            {
                return "";
            }
            finally
            {
                // Redundant inside a using block, but harmless.
                client.Dispose();
            }
        }
    }
    catch (Exception)
    {
        return "";
    }
}

public class ConfigurableWebClient : WebClient
{
    public int? Timeout { get; set; }
    public int? ConnectionLimit { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var baseRequest = base.GetWebRequest(address);
        var webRequest = baseRequest as HttpWebRequest;
        if (webRequest == null)
            return baseRequest;

        // Apply the optional settings to the underlying HTTP request.
        if (Timeout.HasValue)
            webRequest.Timeout = Timeout.Value;
        if (ConnectionLimit.HasValue)
            webRequest.ServicePoint.ConnectionLimit = ConnectionLimit.Value;
        return webRequest;
    }
}
When I examine the content downloaded by the C# WebClient, it is different from
what the browser shows. I give the same URL to the browser (Mozilla Firefox) and
to the WebClient function. The browser shows the page content correctly, but
WebClient.DownloadString returns different HTML. Please see the WebClient
response below.
WebClient downloaded HTML
<!doctype html>
<head>
<meta name="robots" content="noindex, nofollow">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="tue, 01 jan 1980 1:00:00 gmt" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_captcha.html?ref=/pgol/4-abbigliamento/3-roma%20%28rm%29/p-7&distil_rid=a8d2f8b6-b314-11e3-a5e9-e04c5dba1712" />
<script type="text/javascript" src="/ga.280243267228712.js?pid=6d4e4d1d-7094-375d-a439-0568a6a70836" defer></script>
<style type="text/css">#d__ffh{position:absolute;top:-5000px;left:-5000px}#d__ff{font-family:serif;font-size:200px;visibility:hidden}#glance7ca96c1b,#hiredf795fe70,#target01a7c05a,#hiredf795fe70{display:none!important}</style>
</head>
<body>
<div id="distil_ident_block"> </div>
<div id="d__ffh"><object id="d_dlg" classid="clsid:3050f819-98b5-11cf-bb82-00aa00bdce0b" width="0px" height="0px"></object><span id="d__ff"></span></div>
</body>
</html>
My problem is that the WebClient function does not return the actual web page content.
Please help.
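Some web applications respond differently depending on the HTTP request headers.
The HTML you received is not the real page: its meta refresh points to
/distil_r_captcha.html, which suggests the server did not treat the request as
coming from a normal browser.
So, if you want the same HTML that the web browser gets,
send the same HTTP request that the web browser sends.
How?
Open the Firefox or Chrome developer tools, inspect the request the browser makes for the page, and copy its HTTP request headers into your WebClient request.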
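As a rough illustration of copying browser headers onto a WebClient, here is a minimal sketch. The class name BrowserLikeDownloader, its methods, and the specific header values are assumptions made for this example; replace the values with the ones you actually see in your browser's network panel. Headers alone may still not be enough if the site also depends on cookies or JavaScript.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: the name and the header values below are illustrative only.
public static class BrowserLikeDownloader
{
    // Example request headers; copy the real ones from the browser's developer tools.
    public static Dictionary<string, string> BrowserHeaders()
    {
        return new Dictionary<string, string>
        {
            { "User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0" },
            { "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" },
            { "Accept-Language", "en-US,en;q=0.5" }
            // Accept-Encoding is left out on purpose: WebClient does not
            // decompress gzip/deflate responses on its own.
        };
    }

    // Creates a WebClient that carries the browser-like headers.
    public static WebClient CreateClient()
    {
        var client = new WebClient();
        client.Encoding = Encoding.UTF8;
        foreach (var header in BrowserHeaders())
        {
            client.Headers[header.Key] = header.Value;
        }
        return client;
    }

    // Downloads the page at the given URL using the browser-like headers.
    public static string Download(string url)
    {
        using (var client = CreateClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Calling BrowserLikeDownloader.Download(url) would then replace the DownloadString call in the question. WebClient clears its request headers after each request, so this sketch creates a fresh client per download.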