Friday, October 24, 2008

How to get the HTML content of a web page for a given URL programmatically with C#

This article shows how to grab the HTML content of a given URL programmatically with C#. The .NET Framework provides several classes for sending HTTP web requests programmatically, all under the namespace 'System.Net'. For this implementation I'm using the type 'HttpWebRequest' to send the HTTP request and 'HttpWebResponse' to capture the response sent by the web server.
The code looks like this:

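using System;
using System.IO;
using System.Net;

class Program
{
    static void Main(string[] args)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://google.com");
        try
        {
            // GetResponse() throws a WebException on failure rather than returning null,
            // so no null check is needed. The using block closes the response.
            using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
            {
                if (res.StatusCode == HttpStatusCode.OK)
                {
                    // Disposing the reader also closes the underlying response stream.
                    using (StreamReader reader = new StreamReader(res.GetResponseStream()))
                    {
                        string html = reader.ReadToEnd();
                        Console.Write(html);
                    }
                }
            }
        }
        catch (WebException ex)
        {
            // Report network and HTTP errors instead of silently swallowing them.
            Console.Error.WriteLine("Request failed: " + ex.Message);
        }
        Console.Read(); // keep the console window open
    }
}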


Neither HttpWebRequest nor HttpWebResponse is meant to be created directly through a constructor. Instead, the type WebRequest provides a static method called Create() which takes the URL of the web page as a parameter and returns a 'WebRequest' object; for an http:// or https:// URL this object can be cast to 'HttpWebRequest'.
Similarly, the HttpWebRequest object provides the method GetResponse(), which returns a WebResponse object that can be cast to HttpWebResponse. The body of the response can be read from the HttpWebResponse object by calling 'GetResponseStream()', which returns a stream.
If the method of the HTTP request is POST and there is post data to send with it, write that data to the request stream, which is obtained by calling the method GetRequestStream() of the HttpWebRequest object.
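
The POST case described above can be sketched roughly as follows. This is only a minimal illustration, not a complete implementation: the class name PostExample, the helper BuildFormBody and the form-encoded content type are my own assumptions for the example, and a real service may expect a different body format.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class PostExample
{
    // Hypothetical helper: encodes one form field as
    // application/x-www-form-urlencoded bytes (UTF-8).
    public static byte[] BuildFormBody(string name, string value)
    {
        string body = Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
        return Encoding.UTF8.GetBytes(body);
    }

    // Sends postData to url with the POST method and returns the response body.
    public static string Post(string url, byte[] postData)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded"; // assumed body format
        req.ContentLength = postData.Length;

        // Attach the post data by writing it to the request stream.
        using (Stream requestStream = req.GetRequestStream())
        {
            requestStream.Write(postData, 0, postData.Length);
        }

        // Read the response the same way as for a GET request.
        using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
        using (StreamReader reader = new StreamReader(res.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call would look something like PostExample.Post(url, PostExample.BuildFormBody("q", "hello world")), where url is whatever address accepts the form. As with the GET example, GetResponse() throws a WebException on failure, so the call should be wrapped in a try/catch.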
