baron.unsafe

Using Internet Explorer from .NET

5.0   Introduction
Earlier in this book we looked at how to read HTML from websites, and how to navigate through websites using GET and POST requests. These techniques certainly offer high performance, but with many websites using cryptic POST data, complex cookie data, and JavaScript-rendered text, it is useful to know that you can always call on the Internet Explorer browsing engine to help you get the data you need.

It must be stated though, that using Internet Explorer to data mine web pages creates a much larger memory footprint, and is not as fast as scanning using HTTP requests alone. But it does come into its own when a data mining process requires a degree of human interaction. A good example of this would be if you wanted to create an automated test of your website, and needed to allow a non-technical user the ability to follow a sequence of steps, and select data to extract and compare, based on the familiar Internet Explorer interface.

This chapter is divided into two main sections. The first deals with how to use the Internet Explorer object to interact with the various types of web page controls. The second deals with how Internet Explorer can detect and respond to a user interacting with web page elements.

5.1   Web page navigation
The procedure for including the Internet Explorer object in your application differs depending on which version of Visual Studio .NET you are using. After starting a new Windows Forms project, users of Visual Studio .NET 2002 should right click on their toolbox and select Customize Toolbox, click COM Components, then select Microsoft Web Browser. Users of Visual Studio .NET 2003 should right click on their toolbox and select Add/Remove Items, and then follow the same procedure as mentioned above. In Visual Studio .NET 2005, you do not need to add the web browser to the toolbox; just drag the WebBrowser control onto the form.

An important distinction between the Internet Explorer object used in Visual Studio .NET 2002/03 and the 2005 version is that the latter uses a native .NET class to interact with Internet Explorer, whereas the former uses a .NET wrapper around a COM (Component Object Model) object. This creates some syntactic differences between how Internet Explorer is used within .NET 2.0 and .NET 1.x. The first example in this chapter will cover both versions of .NET for completeness. Further examples will show .NET 2.0 code only, unless the equivalent .NET 1.x code would differ substantially.

The first thing you will need to know when using Internet Explorer is how to navigate to a web page. Since Internet Explorer works asynchronously, you will also need to know when it has finished loading a web page. In the following example, we will simply navigate to www.google.com and pop up a message box once the page is loaded.

To begin this example, drop an Internet Explorer object onto a form, as described above, and call it WebBrowser. Now add a button to the form and name it btnNavigate. Click on the button and add the following code:

C#

private void btnNavigate_Click(object sender, System.EventArgs e)

{

    NavigateToUrlSync("http://www.google.com");

    MessageBox.Show("page loaded");

}

VB.NET

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

    NavigateToUrlSync("http://www.google.com")

    MessageBox.Show("page loaded")

End Sub



We then create the NavigateToUrlSync method. Note how the C# version differs between .NET 1.x and 2.0. This is because the COM object expects four optional ref object parameters. These parameters can optionally define the flags, target frame name, post data and headers sent with the request. They are not used in this case, yet since C# 1.x does not support optional parameters they have to be passed in nonetheless.

C# 1.x

public void NavigateToUrlSync(string url)

{

    object oMissing = null;               

    bBusy=true;

    WebBrowser.Navigate(url,ref oMissing,ref oMissing,ref oMissing,ref oMissing);

    while(bBusy)

    {

          Application.DoEvents();

    }

}

C# 2.0

public void NavigateToUrlSync(string url)

{            

    bBusy=true;

    WebBrowser.Navigate(url);

    while(bBusy)

    {

          Application.DoEvents();

    }

}

VB.NET

Public Sub NavigateToUrlSync(ByVal url As String)

   bBusy = True

   WebBrowser.Navigate(url)

   While (bBusy)

     Application.DoEvents()

   End While

End Sub

The while loop polls until the public bBusy flag is cleared. The DoEvents call ensures that the application remains responsive whilst waiting for a response from the web server.

To clear the bBusy flag, we handle either the DocumentComplete event (.NET 1.x) or the DocumentCompleted event (.NET 2.0) thus:

C# 1.x

private void WebBrowser_DocumentComplete(object sender, AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent e)

{

    bBusy = false;

}

C# 2.0

private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)

{

    bBusy = false;

}

VB.NET 1.x

Private Sub WebBrowser_DocumentComplete(ByVal sender As Object, _

            ByVal e As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _

            Handles WebBrowser.DocumentComplete

    bBusy = False

End Sub

VB.NET 2.0

Private Sub WebBrowser_DocumentCompleted(ByVal sender As Object, _

           ByVal e As WebBrowserDocumentCompletedEventArgs) _

           Handles WebBrowser.DocumentCompleted

     bBusy = False

End Sub

To finish off the example, don't forget to declare the public bBusy flag.

C#

public bool bBusy = false;

VB.NET

Public bBusy As Boolean = False

To test the application, compile and run it in Visual Studio, then press the navigate button.  
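The polling loop shown in this section never exits if the completion event is not raised for some reason. As a minimal sketch only, one way to guard against that in .NET 2.0 is to bound the wait with a timer. The class name BrowserForm, the timeoutMs parameter and the boolean return value are illustrative assumptions and are not part of the original listing.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical form that hosts the managed WebBrowser control (.NET 2.0).
public class BrowserForm : Form
{
    private readonly WebBrowser browser = new WebBrowser();
    private bool busy;

    public BrowserForm()
    {
        browser.Dock = DockStyle.Fill;
        // Clear the flag when the control reports that a document has loaded.
        browser.DocumentCompleted += delegate { busy = false; };
        Controls.Add(browser);
    }

    // Returns true if the page finished loading before the timeout elapsed.
    public bool NavigateToUrlSync(string url, int timeoutMs)
    {
        busy = true;
        Stopwatch timer = Stopwatch.StartNew();
        browser.Navigate(url);
        while (busy && timer.ElapsedMilliseconds < timeoutMs)
        {
            // Keep pumping messages so the form stays responsive.
            Application.DoEvents();
        }
        return !busy;
    }
}
```

A caller could then write something like if (!NavigateToUrlSync("http://www.google.com", 30000)) and report the failure instead of hanging.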

5.2   Manipulating web pages
An advantage of using Internet Explorer over raw HTTP requests is that you get access to the DOM (Document Object Model) of web pages once they are loaded into Internet Explorer. For developers familiar with JavaScript, this should be an added bonus, since you will be able to control the web page in much the same way as if you were using JavaScript within an HTML page.

The main difference however, between using the DOM in .NET versus JavaScript, is that .NET is a strongly typed language, and therefore you must know the type of the element you are interacting with before you can access its full potential.

If you are using .NET 1.x you will need to reference the HTML type library by clicking Project > Add Reference, then selecting Microsoft.mshtml from the list. For each of the examples in this section you must import the namespace into your code thus:

C#

using mshtml;

VB.NET

Imports mshtml

If you then cast the WebBrowser.Document object to an HTMLDocument class, many of the code examples shown below should work equally well for .NET 1.x as for .NET 2.0.

5.2.1  Frames
Frames may be going out of fashion in modern websites, but you may still need to extract data from a website that uses them, so you need to be aware of how to handle frames within Internet Explorer. In this section, you will notice that the code differs substantially between versions 1.x and 2.0 of .NET, so source code for both is included.

To create a simple frameset, create three files, Frameset.html, Left.html and Right.html, containing the following HTML code respectively.

Frameset.html

<html>

<frameset cols="50%,50%">

  <frame name="LeftFrame" src="Left.html">

  <frame name="RightFrame" src="right.html">

</frameset>

</html>

Left.html

<html>

This is the left frame

</html>

Right.html

<html>

This is the right frame

</html>

In the following example, we will use Internet Explorer to read the HTML contents of the left frame. This example uses code from the program listing in section 5.1, and assumes you have saved the HTML files in C:\

VB.NET 1.x

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

    NavigateToUrlSync("C:\frameset.html")

    Dim hDoc As HTMLDocument

    hDoc = WebBrowser.Document

    hDoc = CType(hDoc.frames.item(0), HTMLWindow2).document

    MessageBox.Show(hDoc.body.innerHTML)

End Sub

VB.NET 2.0

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

     NavigateToUrlSync("C:\frameset.html")

     Dim hDoc As HtmlDocument

     hDoc = WebBrowser.Document.Window.Frames(0).Document

     MessageBox.Show(hDoc.Body.InnerHtml)

End Sub

C# 1.x

private void btnNavigate_Click(object sender, System.EventArgs e)

{

    NavigateToUrlSync(@"C:\frameset.html");

    HTMLDocument hDoc;

    object oFrameIndex = 0;

    hDoc = (HTMLDocument)WebBrowser.Document;

    hDoc = (HTMLDocument)((HTMLWindow2)hDoc.frames.item(

           ref oFrameIndex)).document;

    MessageBox.Show(hDoc.body.innerHTML); 

}

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

    NavigateToUrlSync(@"C:\frameset.html");

    HtmlDocument hDoc;

    hDoc = WebBrowser.Document.Window.Frames[0].Document;

    MessageBox.Show(hDoc.Body.InnerHtml);          

}

The main difference between the .NET 2.0 and .NET 1.x versions of the above code is that the indexer on the frames collection returns an object, which must be cast to an HTMLWindow2 under the COM wrapper in .NET 1.x. In .NET 2.0 the indexer performs the cast internally, and returns an HtmlWindow object.

To test the application, compile and run it from Visual Studio .NET, press the navigate button, and a message box should pop up saying "This is the left frame".

5.2.2  Input boxes
Input boxes are used in HTML to allow the user to enter text into a web page. Here we will automatically populate an input box with some data.

Given some HTML, which we save as InputBoxes.html, as follows:

<html>

  <form name="myForm">

  My Name is :

  <input type="text" value="" name="myName" id="myName">

</form>

</html>

We can get a reference to the input box on the form by calling GetElementById on the HtmlDocument. In .NET 1.x the result of getElementById should then be cast to an IHTMLInputElement.

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

    NavigateToUrlSync(@"C:\InputBoxes.html");

    HtmlElement hElement;

    hElement = WebBrowser.Document.GetElementById("myName");

    hElement.SetAttribute("value", "Joe Bloggs");

}

VB.NET 2.0

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

    NavigateToUrlSync("C:\InputBoxes.html")

    Dim hElement As HtmlElement

    hElement = WebBrowser.Document.GetElementById("myName")

    hElement.SetAttribute("value", "Joe Bloggs")

End Sub

In order to enter the text into the input box, we call the SetAttribute method of the HtmlElement, passing in the property to change, and the new text. In .NET 1.x we would set the value property of the IHTMLInputElement to the new text.
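For comparison, the .NET 1.x approach described above might look like the following sketch. It assumes a project reference to Microsoft.mshtml; the helper class and method names are hypothetical, and the doc argument is assumed to be the COM document obtained with (HTMLDocument)WebBrowser.Document.

```csharp
using mshtml;

// Hypothetical helper for the COM-based (.NET 1.x) browser object.
public sealed class InputBoxHelper
{
    // Sets the text of the input box identified by elementId.
    public static void SetInputValue(HTMLDocument doc, string elementId, string text)
    {
        // getElementById returns an IHTMLElement, which is cast to the input interface.
        IHTMLInputElement input = (IHTMLInputElement)doc.getElementById(elementId);
        input.value = text;
    }
}
```

With InputBoxes.html loaded, a call such as InputBoxHelper.SetInputValue(hDoc, "myName", "Joe Bloggs") would be the counterpart of the SetAttribute call in the .NET 2.0 listing.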

To test the application, compile and run it from Visual Studio .NET, then press the navigate button.

5.2.3  Drop down lists
In HTML, drop down lists are used in web pages to let users choose from a list of pre-defined values. In the following example, we will demonstrate how to set the value of a drop down list, and then read it back.

We shall start off with an HTML file, which we save as DropDownLists.html

<html>

  <form name="myForm">

   My favourite colour is:

   <select name="myColour" id="myColour">

    <option value="Blue">Blue</option>

    <option value="Red">Red</option>

   </select>

</form>

</html>

We can get a reference to the drop down list by calling GetElementById on the HtmlDocument. In .NET 1.x the result of getElementById should then be cast to an IHTMLSelectElement.

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

     NavigateToUrlSync(@"C:\dropdownlists.html");

     HtmlElement hElement;          

     hElement = WebBrowser.Document.GetElementById("myColour");

     hElement.SetAttribute("selectedIndex", "1");

     MessageBox.Show("My favourite colour is:" + hElement.GetAttribute("value"));

}

VB.NET 2.0

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

     NavigateToUrlSync("C:\dropdownlists.html")

     Dim hElement As HtmlElement

     hElement = WebBrowser.Document.GetElementById("myColour")

     hElement.SetAttribute("selectedIndex", "1")

     MessageBox.Show("My favourite colour is:" + hElement.GetAttribute("value"))

End Sub

Here, we can see that in order to set our selection we pass "selectedIndex" and the selection number to SetAttribute. We then pass "value" to GetAttribute in order to read back the selection. In .NET 1.x, we achieve the same results by setting the selectedIndex property on the IHTMLSelectElement and reading back the selection from the value property.

To test the application, compile and run it from Visual Studio .NET, press the navigate button, and you should see a message box appear saying "My favourite colour is:Red".
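A rough .NET 1.x counterpart, using the selectedIndex and value properties mentioned above, could be sketched as follows. The class and method names are hypothetical, and a reference to Microsoft.mshtml is assumed.

```csharp
using mshtml;

// Hypothetical helper for the COM-based (.NET 1.x) browser object.
public sealed class DropDownHelper
{
    // Selects the option at the given zero-based index and returns its value.
    public static string SelectByIndex(HTMLDocument doc, string elementId, int index)
    {
        IHTMLSelectElement list = (IHTMLSelectElement)doc.getElementById(elementId);
        list.selectedIndex = index;
        // The value property reflects the currently selected option.
        return list.value;
    }
}
```

Calling DropDownHelper.SelectByIndex(hDoc, "myColour", 1) on the sample page selects the second option in the list.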

5.2.4    Check boxes and radio buttons
Check boxes and radio buttons are generally used on web pages to allow the user to select between small numbers of options. In the following example, we shall demonstrate how to toggle check boxes and radio buttons.

We shall start off with an HTML file, which we will save as CheckBoxes.html

<html>

<form name="myForm">

  <input type="checkbox" name="myCheckBox" id="myCheckBox">Check this.<br>

  <input type="radio" name="myRadio" value="Yes">Yes

  <input type="radio" name="myRadio" checked="true" value="No">No

</form>

</html>

As before, we can get a reference to the check box by calling GetElementById. However, since the two radio buttons have the same name, we need to use Document.All.GetElementsByName and then select the required radio button from the HtmlElementCollection returned.

In .NET 1.x, we would use a call to getElementsByName on the HTMLDocument. This returns an IHTMLElementCollection. We can then get the reference to the IHTMLInputElement with the method item(null,1).

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

    NavigateToUrlSync(@"C:\checkboxes.html");

    HtmlElement hElement;

    HtmlElementCollection hElements;

    hElement = WebBrowser.Document.GetElementById("myCheckBox");

    hElement.SetAttribute("checked", "true");

    hElements = WebBrowser.Document.All.GetElementsByName("myRadio");

    hElement = hElements[0];

    hElement.SetAttribute("checked", "true");             

}

VB.NET 2.0

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

    NavigateToUrlSync("C:\checkboxes.html")

    Dim hElement As HtmlElement

    hElement = WebBrowser.Document.GetElementById("myCheckBox")

    hElement.SetAttribute("checked", "true")

    hElement = WebBrowser.Document.All.GetElementsByName("myRadio").Item(0)

    hElement.SetAttribute("checked", "true")

End Sub

As before, we set the property of the HtmlElement using the SetAttribute method. In .NET 1.x, you need to set the @checked property on the IHTMLInputElement.
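The .NET 1.x variant for the check box could be sketched like this. Because checked is a reserved word in C#, the property is written as @checked. The helper names are hypothetical, a reference to Microsoft.mshtml is assumed, and the radio button case (which goes through getElementsByName) is deliberately left out of this sketch.

```csharp
using mshtml;

// Hypothetical helper for the COM-based (.NET 1.x) browser object.
public sealed class CheckBoxHelper
{
    // Ticks or clears the check box identified by elementId.
    public static void SetChecked(HTMLDocument doc, string elementId, bool isChecked)
    {
        IHTMLInputElement box = (IHTMLInputElement)doc.getElementById(elementId);
        // "checked" is a C# keyword, so the @ prefix is required.
        box.@checked = isChecked;
    }
}
```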

To test the application, compile and run it from Visual Studio, then press the navigate button. You should see the check box and radio button toggle simultaneously.

5.2.5  Buttons
Submit buttons and standard buttons are generally used to submit forms in HTML. They form a crucial part in navigating any website.

Given a simple piece of HTML, which we save as Buttons.html as follows:

<html>

<form action="http://www.google.com/search" method="get" name="myForm">

  <input type="text" value=".NET" name="q">

  <input type="submit" name="btnSubmit" id="btnSubmit" value="Google Search">

</form>

</html>

We can get a reference to the button on the form by calling GetElementById on the HtmlDocument. In .NET 1.x, getElementById returns an IHTMLElement.

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

     NavigateToUrlSync(@"C:\buttons.html");

     HtmlElement hElement;          

     hElement = WebBrowser.Document.GetElementById("btnSubmit");

     hElement.InvokeMember("click");          

}

VB.NET 2.0

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

    NavigateToUrlSync("C:\buttons.html")

    Dim hElement As HtmlElement

    hElement = WebBrowser.Document.GetElementById("btnSubmit")

    hElement.InvokeMember("click")

End Sub

In the above example, we can see that after we get a reference to the button, we call the click method using InvokeMember. Similarly, if we wanted to submit the form without clicking the button, we could get a reference to myForm and pass "submit" to the InvokeMember method.

In .NET 1.x, there is no InvokeMember method on IHTMLElement, so you must call the click method of the IHTMLElement directly. In the case of a form, you should cast the IHTMLElement to an IHTMLFormElement and call its submit method.
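The two .NET 1.x calls just described can be sketched as below. The class and method names are hypothetical, a reference to Microsoft.mshtml is assumed, and SubmitForm additionally assumes that the form can be found through getElementById (for example because it carries an id attribute, which the sample page does not show).

```csharp
using mshtml;

// Hypothetical helper for the COM-based (.NET 1.x) browser object.
public sealed class ButtonHelper
{
    // Simulates a user click on the element identified by elementId.
    public static void ClickElement(HTMLDocument doc, string elementId)
    {
        IHTMLElement element = doc.getElementById(elementId);
        element.click();
    }

    // Submits a form directly, without clicking any button.
    public static void SubmitForm(HTMLDocument doc, string formId)
    {
        IHTMLFormElement form = (IHTMLFormElement)doc.getElementById(formId);
        form.submit();
    }
}
```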

To test this application, compile and run it from Visual Studio .NET, and press the navigate button. The form should load and then automatically forward itself to a google.com search result page.

5.2.6  JavaScript
Many web pages use JavaScript to perform complex interactions between the user and the page. It is important to know how to execute JavaScript functions from within Internet Explorer. The simplest method is to use Navigate with the prefix javascript: followed by the function name. However, this does not give us a return value, nor will it work correctly in all situations.

We shall start with an HTML page, which contains a JavaScript function to display some text. This will be saved as JavaScript.html

<html>

<span id="hiddenText" style="display:none">This was displayed by javascript</span>

  <script language="javascript">  

  function jsFunction()

  {

   window.document.all["hiddenText"].style.display="block";

   return "ok";

  }

</script>

</html>

We can then use the Document.InvokeScript method to execute the JavaScript thus:

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

     NavigateToUrlSync(@"C:\javascript.html");

     string strRetVal = "";

     strRetVal = (string)WebBrowser.Document.InvokeScript("jsFunction");

     MessageBox.Show(strRetVal);

}

VB.NET 2.0

Private Sub btnNavigate_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnNavigate.Click

        NavigateToUrlSync("C:\javascript.html")

        Dim strRetVal As String

        strRetVal = WebBrowser.Document.InvokeScript("jsFunction").ToString()

        MessageBox.Show(strRetVal)

End Sub

In .NET 1.x, we would call the parentWindow.execScript method on the HTMLDocument, not forgetting to add empty parentheses after the JavaScript function name. Unfortunately, execScript returns null instead of the JavaScript return value.
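As a sketch of the .NET 1.x call, assuming a reference to Microsoft.mshtml (the helper class and method names are hypothetical):

```csharp
using mshtml;

// Hypothetical helper for the COM-based (.NET 1.x) browser object.
public sealed class ScriptHelper
{
    // Runs a parameterless JavaScript function defined in the loaded page.
    public static void RunFunction(HTMLDocument doc, string functionName)
    {
        IHTMLWindow2 window = doc.parentWindow;
        // The empty parentheses turn the function name into a call expression.
        // The result of execScript is ignored, since it does not carry the
        // JavaScript return value.
        window.execScript(functionName + "()", "JavaScript");
    }
}
```

For the sample page this would be invoked as ScriptHelper.RunFunction(hDoc, "jsFunction").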

To test the application, compile and run it from Visual Studio .NET, then press the Navigate button.

5.3   Extracting data from web pages
In order to extract HTML from a web page using Internet Explorer, you need to call Body.Parent.OuterHtml in .NET 2.0 or body.parentElement.outerHTML in .NET 1.x. You should be aware that the HTML returned by this method is different from the actual HTML content of the page.

Internet Explorer will correct HTML in the page by adding <BODY>, <TBODY> and <HEAD> tags where missing. It will also capitalize existing HTML Tags, and make other formatting changes that you should be aware of.
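In .NET 2.0, the extraction described in this section can be wrapped in a small helper, sketched here with null checks added as a precaution. The class and method names are hypothetical.

```csharp
using System.Windows.Forms;

// Hypothetical helper for the managed (.NET 2.0) WebBrowser control.
public sealed class HtmlExtractor
{
    // Returns the page markup as Internet Explorer has normalized it,
    // or an empty string if no document body is available yet.
    public static string GetPageHtml(WebBrowser browser)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null || doc.Body == null)
        {
            return string.Empty;
        }
        // Body.Parent is the enclosing HTML element.
        return doc.Body.Parent.OuterHtml;
    }
}
```

Because the returned text is Internet Explorer's corrected version of the page, any pattern matching applied to it should allow for the added and capitalized tags mentioned above.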

Techniques for parsing this textual data are explained later in the book under the section concerning Regular Expressions.

5.4   Advanced user interaction
When designing an application that uses Internet Explorer as a tool for data mining, it is an added benefit that the user can interact with the control in a natural fashion in order to manipulate its behavior. The following sections describe ways in which a user can interact with Internet Explorer, and how these events can be handled within .NET.

5.4.1  Design mode
If you want to give the user the ability to manipulate web pages on the fly, there is no simpler way to do it than to use the built-in "design mode" in Internet Explorer. This particular feature is not supported by the managed .NET 2.0 WebBrowser control. However, it is possible to access the unmanaged interfaces, which we were using in .NET 1.x, through the Document.DomDocument property. This can then be cast to the HTMLDocument in the mshtml library (not to be confused with the managed HtmlDocument class). Therefore, in this case, you will need to add a reference to the mshtml library and add a "using mshtml" statement to the top of your code.

In this example, we will create a simple rich text editor based on Internet Explorer design mode. Within design mode the user can perform a wide variety of tasks using intuitive actions. For example, you can insert an image by right clicking on the browser, or convert text to bold by pressing CTRL+B. Many of these tasks can be further automated using the execCommand method of the HTMLDocument object. In the following example, we will demonstrate how to set fonts using this method.

Open a new project in Visual Studio .NET, and drag a WebBrowser control onto the form, followed by a button named btnFont. Also add a FontDialog control named fontDialog. Click on the form and type the following code for the form load event.

C# 2.0

private void Form1_Load(object sender, EventArgs e)

{

     string url = "about:blank";

     webBrowser.Navigate(url);

     Application.DoEvents();

     HTMLDocument hDoc = (HTMLDocument)webBrowser.Document.DomDocument;

     hDoc.designMode = "On";

}

VB.NET 2.0

Private Sub Form1_Load(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles MyBase.Load

      Dim url As String = "about:blank"

      webBrowser.Navigate(url)

      Application.DoEvents()

      Dim hDoc As HTMLDocument = webBrowser.Document.DomDocument

      hDoc.designMode = "On"

End Sub

In .NET 1.x, we would cast the Document to an HTMLDocument, rather than referencing the DomDocument property, and also, the Navigate method would be as described in section 5.1.

Now click on the font button and enter some code as follows

C# 2.0

private void btnFont_Click(object sender, EventArgs e)

{

     fontDialog.ShowDialog();

     HTMLDocument hDoc = (HTMLDocument)webBrowser.Document.DomDocument;

     IHTMLTxtRange selection = (IHTMLTxtRange)hDoc.selection.createRange();          

     hDoc.execCommand("FontName", false, fontDialog.Font.FontFamily.Name);

     hDoc.execCommand("FontSize", false, fontDialog.Font.Size);

     selection.select();

}

VB.NET 2.0

Private Sub btnFont_Click(ByVal sender As System.Object, _

            ByVal e As System.EventArgs) Handles btnFont.Click

     fontDialog.ShowDialog()

     Dim hDoc As HTMLDocument = webBrowser.Document.DomDocument

     Dim selection As IHTMLTxtRange = hDoc.selection.createRange()

     hDoc.execCommand("FontName", False, fontDialog.Font.FontFamily.Name)

     hDoc.execCommand("FontSize", False, fontDialog.Font.Size)

     selection.select()

End Sub

From the above code we can see that we get a reference to the text currently highlighted by the user using the selection.createRange method. We then execute two commands on this selection, FontName and FontSize. Other commands that could be used include ForeColor, Italic, Bold and so forth.

To test the application, compile and run it from Visual Studio .NET, enter some text into the space provided, then highlight it. Click on the font button and choose a new font and size. The text should change to the selected font.

5.4.2  Capturing Post data
When a user is navigating through web pages, it may be necessary to keep track of which URLs they are going to, and of the post data sent between Internet Explorer and the web server. Although there are ways and means of doing this using packet sniffing or third-party tools, these tend to capture too much data and record traffic from other applications. Due to a bug in the .NET wrapper for Internet Explorer (see Microsoft Knowledge Base article 311298), the BeforeNavigate event will not fire as you move between pages.

In order to subscribe to this event, we need to know a little about how COM events work under the hood. Every COM object which generates events will implement the IConnectionPointContainer interface. A client wishing to subscribe to events from this COM object must call the FindConnectionPoint method on this interface, passing the IID (Interface ID) of the required set of events.

Some COM objects support multiple sets of events, or "connection points". For example, Internet Explorer supports the DWebBrowserEvents connection point and the DWebBrowserEvents2 connection point. Herein lies the problem: the .NET wrapper will by default attach to the DWebBrowserEvents2 connection point, which contains a version of BeforeNavigate which is incompatible with .NET due to unsupported variant types.

To inspect the older connection point, open the ILDASM utility, click File > Open, and select Interop.SHDocVw.DLL. From the information in Figure 5.8 we can see that the Dispatch ID is set to 64 hex (100 decimal). While using ILDASM we can also find the IID of the DWebBrowserEvents connection point by double clicking on "class interface"; that is, eab22ac2-30c1-11cf-a7eb-0000c05bae0b. At this point we have everything we need to create an interface in C# for this event.

C#

[Guid("eab22ac2-30c1-11cf-a7eb-0000c05bae0b"),

InterfaceType(ComInterfaceType.InterfaceIsIDispatch)]

public interface IWebBrowserEvents

{

    [DispId(100)]

    void RaiseBeforeNavigate(String url, int flags, String targetFrameName,

               ref Object postData, String headers, ref Boolean cancel);

}

To put this all together, create a new project in Visual Studio .NET, and drop in a Web Browser control (not the .NET 2.0 version, but the COM version). Add a reference to Microsoft Internet Controls under COM references. You will need to include both SHDocVw and System.Runtime.InteropServices in the using list at the head of your code. Now add the code for the interface listed above.

C#

private void Form1_Load(object sender, System.EventArgs e)

{

    UCOMIConnectionPointContainer icpc;

    UCOMIConnectionPoint icp;

    int cookie = -1;

    icpc = (UCOMIConnectionPointContainer)axWebBrowser1.GetOcx();

    Guid g = typeof(DWebBrowserEvents).GUID;

    icpc.FindConnectionPoint(ref g, out icp);

    icp.Advise(this, out cookie);

}

What this code does is obtain a reference to Internet Explorer's underlying IConnectionPointContainer by calling the GetOcx method, which axWebBrowser1 has inherited from AxHost. From this, we can then obtain a reference to the required connection point by passing its GUID / IID to the FindConnectionPoint method. To subscribe to events, we call the Advise method. To unsubscribe, we should call the Unadvise method, if required.
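Since the cookie returned by Advise is what Unadvise needs later, it can be convenient to keep the connection point and cookie together. The following is only a sketch of that idea; the class name ConnectionPointSubscription is hypothetical, and it uses the same UCOMI interfaces as the listing above (later versions of .NET mark these as obsolete in favour of the types in System.Runtime.InteropServices.ComTypes).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper that pairs Advise with Unadvise.
public sealed class ConnectionPointSubscription : IDisposable
{
    private UCOMIConnectionPoint point;
    private int cookie;

    // source: the COM object raising events, e.g. axWebBrowser1.GetOcx()
    // eventsIid: the IID of the event set, e.g. typeof(DWebBrowserEvents).GUID
    // sink: the object implementing the event interface
    public ConnectionPointSubscription(object source, Guid eventsIid, object sink)
    {
        UCOMIConnectionPointContainer container =
            (UCOMIConnectionPointContainer)source;
        container.FindConnectionPoint(ref eventsIid, out point);
        point.Advise(sink, out cookie);
    }

    // Unsubscribes using the cookie handed back by Advise.
    public void Dispose()
    {
        if (point != null)
        {
            point.Unadvise(cookie);
            point = null;
        }
    }
}
```

The form could create one of these in its load handler and dispose of it when closing.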

To handle the event, we shall simply pop up a message box immediately before the page navigates. We shall also display any post data being sent.

C#

public void RaiseBeforeNavigate(String url, int flags, String

    targetFrameName, ref Object postData, String headers, ref Boolean cancel)

{

    string strPostData="";

    if (postData!=null)

    {

          strPostData= System.Text.Encoding.UTF8.GetString((byte[])postData);

          MessageBox.Show(strPostData);

    }                                     

}

Since we have specified that our class should implement IWebBrowserEvents, this dictates that the post data must be received as an object. This object should then be cast to a byte array, and then to a UTF8 string for readability.

To finish off the example, add a button to the form, and attach some code to it that navigates to a website with a post form on it; in this example, Amazon.com.

C#

private void button1_Click(object sender, EventArgs e)

{

    object o = null;

    this.axWebBrowser1.Navigate("http://www.amazon.com",ref o,ref o,ref o,ref o);

}

To test the application, run it from Visual Studio .NET, press the navigate button, enter something in the Amazon search box, and press go. You should see a message box appearing, containing the post data which you sent to the web server.

5.4.3  Capturing click events
Although you can capture events such as DocumentComplete to determine when a user navigates to a new page, it is a little trickier to trap events which do not involve page navigation, such as entering text into a text box for instance.

The event trapping technique differs substantially between .NET 1.x and .NET 2.0. In the former version, you need to implement the default COM interop method. This is an entry point in your application, marked as Dispatch ID 0, which COM calls back whenever an event that your application has subscribed to is raised. In order to use COM interoperability in .NET 1.x, you need to include a using System.Runtime.InteropServices statement at the top of your code.

In .NET 2.0, it is a little more straightforward. Here, we attach an HtmlElementEventHandler delegate to the Document.Click event, and implement it in our own event handler.

Basing this example on the sample code in section 5.1, we shall now add some extra event handling capabilities to pop up a message box whenever the user clicks an HTML element in the web browser.

C# 2.0

private void btnNavigate_Click(object sender, System.EventArgs e)

{

     NavigateToUrlSync(@"http://www.google.com");

     WebBrowser.Document.Click += new HtmlElementEventHandler(Document_Click);

}

C# 1.x

private void btnNavigate_Click(object sender, System.EventArgs e)

{

    NavigateToUrlSync(@"http://www.google.com");

    HTMLDocument hDoc = (HTMLDocument)WebBrowser.Document;

    hDoc.onclick = this;

}

At this point we have subscribed to the click event and, in the case of .NET 2.0, supplied a callback delegate named Document_Click. For demonstration purposes, we shall simply display the tag name of the element clicked, and the event type (which should always be "click" in our case).

C# 2.0

public void Document_Click(object sender, HtmlElementEventArgs e)

{

    string sTag = WebBrowser.Document.GetElementFromPoint(e.MousePosition).TagName;

    MessageBox.Show("Object: " + sTag + ", type:" + e.EventType);

}

C# 1.x

[DispId(0)]

public void DefaultMethod()

{

    HTMLDocument hDoc = (HTMLDocument)WebBrowser.Document;

    HTMLWindow2 hWin = (HTMLWindow2)hDoc.parentWindow;

    MessageBox.Show("Object: " + hWin.@event.srcElement.tagName +

                    ", Type: " + hWin.@event.type);

}

In order to get a reference to the element clicked, we have used a different technique for each version of .NET. In .NET 2.0, the GetElementFromPoint method is used to determine the element from the mouse location. In .NET 1.x, we can get the reference to the element via the Document.parentWindow.@event.srcElement property.

To test the application, compile and run it from Visual Studio .NET, press the navigate button, then click anywhere on the screen. You should see a message box appear with the tag name of the HTML element that you clicked on.

5.5   Extending Internet Explorer
The examples so far have dealt with embedding Internet Explorer in our applications, rather than embedding our applications in Internet Explorer. This may not be ideal for all users, as we lose the familiar interface that users are accustomed to. This section deals with how to build applications around running instances of Internet Explorer.

5.5.1    Menu extensions
When you right click on a web page, you see a context menu, which you can extend with a simple registry tweak. In this example, we add a "Send to a friend" link to the context menu, which links to a website that allows you to send emails. Firstly, create the following registry key:

HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\MenuExt\Send to a friend

Then set the default value to a location on your hard drive, say c:\SendToAFriend.html, which would contain the following HTML:

<script language="JavaScript">

window.open("http://www.pop3webmail.info/reply.aspx?url=" + external.menuArguments.document.URL);

</script>

After making the change to the registry, close all browser windows.
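The registry change described above could also be made from code rather than by hand. The following is a minimal sketch, assuming the key name and HTML file location used in the text; the class and method names (MenuExtensionInstaller, Install) are hypothetical and chosen only for illustration.

```csharp
using Microsoft.Win32;

public static class MenuExtensionInstaller
{
    // Registry path taken from the text above, relative to HKEY_CURRENT_USER.
    private const string MenuKey =
        @"Software\Microsoft\Internet Explorer\MenuExt\Send to a friend";

    // htmlPath is the local HTML file that Internet Explorer runs
    // when the menu item is chosen, e.g. c:\SendToAFriend.html.
    public static void Install(string htmlPath)
    {
        // CreateSubKey opens the key if it already exists, otherwise creates it.
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(MenuKey))
        {
            // An empty value name addresses the key's default value.
            key.SetValue("", htmlPath);
        }
    }
}
```

Calling MenuExtensionInstaller.Install(@"c:\SendToAFriend.html") once, for instance from an installer, should have the same effect as editing the registry manually.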

5.5.2    Spawning a new instance of Internet Explorer
A simple way of controlling instances of Internet Explorer is to create them yourself using COM. In this example I use COM late binding, which differs from the early-bound examples used earlier in this chapter, specifically in the .NET 1.x examples. Early bound COM objects are compiled into the application at design time. Late bound COM objects are loaded dynamically at run time.

The benefit of early bound objects is that the development environment is aware of the object model of the component, and Intellisense will help you determine which methods you can call. We do not have such a luxury with late bound objects. However, late binding has the advantage that we can bind to COM objects hosted as executables, as in the following example.

To start off, create a new windows forms application in Visual Studio .NET, drop a button on the form, and attach the following code to it.

C#

public Type tIE;
public object oIE;

private void btnNavigate_Click(object sender, EventArgs e)
{
    object[] oParameter = new object[1];

    // Resolve the COM type from its ProgID and start a new browser instance.
    tIE = Type.GetTypeFromProgID("InternetExplorer.Application");
    oIE = Activator.CreateInstance(tIE);

    // Late-bound equivalent of oIE.Visible = true
    oParameter[0] = true;
    tIE.InvokeMember("Visible", BindingFlags.SetProperty, null, oIE, oParameter);

    // Late-bound equivalent of oIE.Navigate2("http://www.google.com")
    oParameter[0] = "http://www.google.com";
    tIE.InvokeMember("Navigate2", BindingFlags.InvokeMethod,
                       null, oIE, oParameter);
}

VB.NET

Public tIE As Type
Public oIE As Object

Private Sub btnNavigate_Click(ByVal sender As System.Object, _
             ByVal e As System.EventArgs) Handles btnNavigate.Click
    Dim oParameter(0) As Object

    ' Resolve the COM type from its ProgID and start a new browser instance.
    tIE = Type.GetTypeFromProgID("InternetExplorer.Application")
    oIE = Activator.CreateInstance(tIE)

    oParameter(0) = True
    tIE.InvokeMember("Visible", BindingFlags.SetProperty, Nothing, oIE, oParameter)

    oParameter(0) = "http://www.google.com"
    tIE.InvokeMember("Navigate2", BindingFlags.InvokeMethod, _
                       Nothing, oIE, oParameter)
End Sub

You should also import the System.Threading and System.Reflection namespaces at the top of your code (with using in C#, or Imports in VB.NET).

The above code retrieves a reference to the COM object model for the Internet Explorer application by inspecting the ProgID "InternetExplorer.Application". It then creates an instance of this COM object. It sets its Visible property to true, then calls the Navigate2 method, passing the URL http://www.google.com as a parameter.

Unfortunately, it is not trivial to subscribe to events from this late bound object. If you need to detect navigation between pages, it may therefore be necessary to poll the LocationURL property of the browser.
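The polling approach could be sketched as follows. This is an illustration only: it assumes the tIE and oIE values created in the example above, and the class name, method names, attempt count and half-second interval are all hypothetical choices. If the user closes the browser window while polling is in progress, the InvokeMember call can be expected to throw, so real code would need to handle that case.

```csharp
using System;
using System.Reflection;
using System.Threading;

public class LocationPoller
{
    // Reads the LocationURL property from a late-bound
    // InternetExplorer.Application object.
    public static string GetLocationUrl(Type tIE, object oIE)
    {
        return (string)tIE.InvokeMember("LocationURL",
            BindingFlags.GetProperty, null, oIE, null);
    }

    // Polls until the URL differs from lastUrl or the attempts run out.
    // Returns the new URL, or lastUrl if no navigation was seen.
    public static string WaitForNavigation(Type tIE, object oIE,
                                           string lastUrl, int attempts)
    {
        for (int i = 0; i < attempts; i++)
        {
            string current = GetLocationUrl(tIE, oIE);
            if (current != lastUrl) return current;
            Thread.Sleep(500); // hypothetical polling interval
        }
        return lastUrl;
    }
}
```

Because WaitForNavigation blocks the calling thread, it is probably best run on a worker thread rather than directly inside a button click handler.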

To test this application, compile and run it from Visual Studio .NET, then press the button on the form. You should see a new browser window open.

5.5.3    Browser Helper Objects
When you need really tight integration with Internet Explorer, in cases where you want code to execute completely transparently to the user and yet have full control of the browser's document model and be able to subscribe to events, Browser Helper Objects (BHOs) are the way to go.

BHO technology is widely associated with Spyware applications, which silently run in the background, as a user is browsing websites. Since the BHO would have access to the HTMLDocument object of the Internet Explorer instance hosting it, it would be possible to read the text of the webpage being visited, and duly display context-sensitive advertisements.

Internet Explorer expects the BHO object to be COM based, not a .NET assembly. Therefore it is necessary to create a CCW (COM Callable Wrapper) for our assembly. This CCW has a unique Class ID, which we store in the registry at the following location:

HKLM\Software\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects

When Internet Explorer (or Windows Explorer) starts, it reads all the Class IDs listed at the registry location above and creates instances of their respective COM objects and, in our case, the underlying .NET assembly. It then interrogates the COM object to ensure that it implements the IObjectWithSite interface. This interface is very strictly defined, and is declared as follows:

C#

using System;
using System.Runtime.InteropServices;

namespace BrowserHelperObject
{
    [ComVisible(true),
    InterfaceType(ComInterfaceType.InterfaceIsIUnknown),
    Guid("FC4801A3-2BA9-11CF-A229-00AA003D7352")]
    public interface IObjectWithSite
    {
          [PreserveSig]
          int SetSite([MarshalAs(UnmanagedType.IUnknown)]object site);

          [PreserveSig]
          int GetSite(ref Guid guid, out IntPtr ppvSite);
    }
}

Internet Explorer uses the two methods listed above to interact with the BHO. The SetSite method is called by Internet Explorer whenever it starts up, or shuts down. This is to update the BHO with the status of any internal references it may hold to the instance of Internet Explorer which is hosting it. GetSite may be called by Internet Explorer to query the reference a BHO holds to it. Every BHO must implement both of these methods, and handle requests to and from the hosting instance correctly.

To demonstrate Browser Helper Objects, we shall go through a simple example, where we attach a BHO to Internet Explorer which appends the current date to every page visited by the user.

Start a new class library project in Visual Studio, add a reference to the Microsoft.mshtml .NET assembly, and also to the COM object named "Microsoft Internet Controls". Add a new class file containing the definition of IObjectWithSite as listed above. Then you can create the skeleton of your BHO thus:

C#

using System;
using System.Runtime.InteropServices;
using SHDocVw;
using Microsoft.Win32;
using mshtml;

namespace BrowserHelperObject
{
    [ComVisible(true),
    Guid("F839CC51-A6D8-4e9c-ACE5-F05071AD0C74"),
    ClassInterface(ClassInterfaceType.None)]
    public class DateStamp : IObjectWithSite
    {
          // Reference to the hosting browser (SHDocVw.WebBrowser).
          WebBrowser webBrowser;

    }
}

What you can see from the code above is that the class implements the IObjectWithSite interface, which is a prerequisite of any BHO. It also has a GUID (Globally Unique Identifier), which is used to uniquely identify the CCW and can be chosen arbitrarily, using the GuidGen.exe tool or similar. The WebBrowser class in the code does not refer to the familiar WebBrowser class as used in .NET 2.0, but instead is a class defined within SHDocVw. It is this object which will contain a reference to the hosting instance of Internet Explorer.

As mentioned previously, every BHO must implement both the GetSite and SetSite methods. In most cases there is little need to perform any custom actions within GetSite, so its implementation remains standard for most types of BHO. A typical implementation would be as follows:

C#

public int GetSite(ref Guid guid, out IntPtr ppvSite)
{
    // Get IUnknown for the host, then ask it for the requested interface.
    IntPtr punk = Marshal.GetIUnknownForObject(webBrowser);
    int hr = Marshal.QueryInterface(punk, ref guid, out ppvSite);
    Marshal.Release(punk);
    return hr;
}

This code first obtains a pointer to the IUnknown COM interface of our reference to the hosting instance of Internet Explorer. It then queries that IUnknown interface for the interface whose GUID Internet Explorer passed in, which yields a pointer to the requested interface in ppvSite. Finally, the code releases the IUnknown pointer and returns the HRESULT from the query, so that Internet Explorer can tell whether the call succeeded.

What is of more interest is the SetSite method. This is where we have the opportunity to attach custom event handlers to the hosting web browser. In this case, we attach the DocumentComplete event handler.

C#

public int SetSite(object site)
{
    if (site != null)
    {
          // Host is starting up: keep a reference and subscribe to its events.
          webBrowser = (WebBrowser)site;
          webBrowser.DocumentComplete += new
           DWebBrowserEvents2_DocumentCompleteEventHandler(
           this.OnDocumentComplete);
    }
    else if (webBrowser != null)
    {
          // Host is shutting down: unsubscribe and release the reference.
          webBrowser.DocumentComplete -= new
           DWebBrowserEvents2_DocumentCompleteEventHandler(
           this.OnDocumentComplete);
          webBrowser = null;
    }
    return 0;
}

As mentioned earlier, Internet Explorer also calls this method as it is shutting down, in which case the passed parameter is null. At that point we should detach the event handlers and free any associated resources.

At this point we are in a position to add our own custom logic, which we place within the OnDocumentComplete function thus:

C#

private void OnDocumentComplete(object frame, ref object urlObj)
{
    // Ignore events raised by frames other than the top-level browser.
    if (webBrowser != frame)   return;

    IHTMLDocument2 document = (IHTMLDocument2)webBrowser.Document;
    document.body.innerHTML = document.body.innerHTML +
                     DateTime.Now.ToShortDateString();
}

In the above code, we retrieve a reference to the HTMLDocument contained within the hosting Internet Explorer instance, and simply add the current date to the HTML of the page.
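One thing to be aware of is that assigning to innerHTML makes the browser re-parse the whole body, which may discard state such as event handlers that page scripts have attached. As a hedged alternative, the date could be appended with the insertAdjacentHTML method of the mshtml element interface, which adds markup without replacing the existing content. The helper class and method names below (DateStampHelper, AppendDate) are hypothetical; the sketch assumes the same mshtml reference as the BHO project.

```csharp
using System;
using mshtml;

public static class DateStampHelper
{
    // Appends the current date to the end of the document body
    // without rewriting the markup that is already there.
    public static void AppendDate(IHTMLDocument2 document)
    {
        // Some documents (for example non-HTML content) may have no body.
        if (document == null || document.body == null) return;

        // "beforeEnd" inserts the markup just before the body's closing tag.
        document.body.insertAdjacentHTML("beforeEnd",
            DateTime.Now.ToShortDateString());
    }
}
```

Within OnDocumentComplete, this would be called as DateStampHelper.AppendDate((IHTMLDocument2)webBrowser.Document) in place of the innerHTML assignment.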

Before we are ready to try out our new BHO, we should add some extra plumbing to enable the assembly to store the Class ID of its CCW in the registry with the other Browser Helper Objects installed on the system.

C#

public static string BHOKEYNAME = "Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\Browser Helper Objects";

[ComRegisterFunction]
public static void RegisterBHO(Type t)
{
    // Open (or create) the Browser Helper Objects key under HKLM.
    RegistryKey key = Registry.LocalMachine.OpenSubKey(BHOKEYNAME, true);
    if (key == null) key = Registry.LocalMachine.CreateSubKey(BHOKEYNAME);

    // Add a subkey named after the Class ID, e.g. {F839CC51-...}.
    string guidString = t.GUID.ToString("B");
    RegistryKey bhoKey = key.OpenSubKey(guidString);
    if (bhoKey == null)   bhoKey = key.CreateSubKey(guidString);

    key.Close();
    bhoKey.Close();
}

The above code is called whenever we create a CCW from the assembly. It inserts a new key containing the Class ID, at the registry location as specified.

Similarly, as we un-register the CCW, we will want to remove that key from the registry. This would be implemented thus:

C#

[ComUnregisterFunction]
public static void UnregisterBHO(Type t)
{
    RegistryKey key = Registry.LocalMachine.OpenSubKey(BHOKEYNAME, true);
    string guidString = t.GUID.ToString("B");
    if (key != null)
    {
        // Do not throw if the subkey is already missing.
        key.DeleteSubKey(guidString, false);
        key.Close();
    }
}

You will find that whilst developing a BHO, you may need to recompile and test the code several times to perfect your application. Every time that you attach a BHO to Internet Explorer, it will also attach itself to Windows Explorer, and the assembly will be locked for the duration of the lifetime of these two processes. When the assembly is locked you will not be able to delete it, or modify it by building a new version of the BHO over it.

To unlock the BHO, you will need to un-register it using regasm /unregister, then stop all iexplore.exe and explorer.exe processes, either through Task Manager or by logging off and logging back in again.

To test the above application, compile the code, then open up the Visual Studio .NET command prompt and navigate to the folder that contains the output DLL. Then run the command:

Regasm /codebase browserHelperObject.dll

Now open up an Internet Explorer window and you should see the date written at the bottom of the page.



If you receive the following warning, do not panic. As long as the GUID you used in your assembly is unique, it will not cause a problem.

RegAsm warning: Registering an unsigned assembly with /codebase can cause your assembly to interfere with other applications that may be installed on the same computer. The /codebase switch is intended to be used only with signed assemblies. Please give your assembly a strong name and re-register it.



5.6         Conclusion
This chapter has demonstrated how to control Internet Explorer from within a .NET application. It should pave the way for automating data mining processes using this versatile component.

With the added benefit of enabling a user to interact with web pages in a natural fashion, and being able to trap events from within Internet Explorer, it should be possible to implement data mining training tools, and website test automation utilities with the examples shown in this chapter.