INTRODUCTION to Parsing HTML :

If you are familiar with the PowerShell Invoke-WebRequest cmdlet, you already know that it returns parsed HTML for the requested URL. In Windows PowerShell, the response exposes the page's DOM structure through its ParsedHtml property, which you can use to reach the individual HTML elements of the web page.
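As a rough sketch of that idea, the helper below lists the link targets on a page. The function name `Get-PageLink` is made up for this illustration, and the code assumes Windows PowerShell 5.1, since newer PowerShell versions no longer provide `ParsedHtml`.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the href of every
# <a> element found on a web page.
# Assumption: Windows PowerShell 5.1, where the response has ParsedHtml.
function Get-PageLink {
    param(
        [Parameter(Mandatory)]
        [string]$Uri
    )

    $response = Invoke-WebRequest -Uri $Uri

    # ParsedHtml is the DOM of the downloaded page.
    $response.ParsedHtml.getElementsByTagName('a') |
        ForEach-Object { $_.href }
}
```

You could call it as `Get-PageLink -Uri 'https://example.com'`, where the URL is only a placeholder.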

PROBLEM :

What if the HTML files are stored locally on your machine, or the HTML content is held in a string? Is there any mechanism to parse a local file or a string?

SOLUTION : 

Well, the answer is: yes, we can! 🙂

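Microsoft provides an HTML document class (HTMLDocumentClass) in the .NET Framework class library, which has a write() method for writing HTML into a document using DOM Level 2 (Document Object Model Level 2). In PowerShell, you can create this kind of document through the HTMLFile COM object.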

APPROACH 1 : From a String

Instantiate an object of the HTML document class and pass it the HTML content as a string. Once the content is written, you can access the HTML elements through the DOM.
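Here is a minimal sketch of this approach. The function name `ConvertTo-HtmlDocument` is hypothetical, and the try/catch is an assumption about your environment: `IHTMLDocument2_write` is usually available when the mshtml interop assembly is present, while plain `write()` with UTF-16 bytes is a commonly used fallback when it is not.

```powershell
# Hypothetical helper: turns an HTML string into a DOM document.
# Assumption: runs on Windows, where the HTMLFile COM object exists.
function ConvertTo-HtmlDocument {
    param(
        [Parameter(Mandatory)]
        [string]$Html
    )

    $document = New-Object -ComObject 'HTMLFile'

    try {
        # Usually works when the mshtml interop assembly is installed.
        $document.IHTMLDocument2_write($Html)
    }
    catch {
        # Fallback: hand the markup to write() as UTF-16 encoded bytes.
        $bytes = [System.Text.Encoding]::Unicode.GetBytes($Html)
        $document.write($bytes)
    }

    # The leading comma stops PowerShell from unrolling the returned object.
    , $document
}
```

For example, `(ConvertTo-HtmlDocument '<p>Hello</p>').body.innerText` should give you the text of the paragraph, and `.all.tags('p')` returns the matching elements.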

APPROACH 2 : From a File

Similarly, we can parse an HTML document from a local HTML file: read the file content into a string and write it to the HTML document object.
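The file-based variant could look something like this. Again, `Import-HtmlDocument` is a made-up name, and the same assumptions apply: Windows, the HTMLFile COM object, and a fallback to `write()` when `IHTMLDocument2_write` is not available.

```powershell
# Hypothetical helper: loads a local HTML file into a DOM document.
# Assumption: runs on Windows, where the HTMLFile COM object exists.
function Import-HtmlDocument {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # -Raw reads the whole file as one string instead of an array of lines.
    $source = Get-Content -Path $Path -Raw

    $document = New-Object -ComObject 'HTMLFile'

    try {
        # Usually works when the mshtml interop assembly is installed.
        $document.IHTMLDocument2_write($source)
    }
    catch {
        # Fallback: hand the markup to write() as UTF-16 encoded bytes.
        $bytes = [System.Text.Encoding]::Unicode.GetBytes($source)
        $document.write($bytes)
    }

    # The leading comma stops PowerShell from unrolling the returned object.
    , $document
}
```

A call such as `(Import-HtmlDocument -Path .\page.html).all.tags('a') | ForEach-Object { $_.href }` would then list the links in the file; `page.html` is just a placeholder name.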

NOTE : 

Even the parsed HTML returned by Invoke-WebRequest is of the same type, the HTML document class.
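If you want to check this yourself, a small comparison helper like the one below can do it. The name `Test-SameType` is hypothetical, and what type name you actually see is an assumption about your machine: without the mshtml interop assembly, COM objects may simply report themselves as `System.__ComObject`.

```powershell
# Hypothetical helper: reports whether two objects have the same type name.
function Test-SameType {
    param(
        [Parameter(Mandatory)]
        $First,

        [Parameter(Mandatory)]
        $Second
    )

    # Compare the full type names of both objects.
    $First.GetType().FullName -eq $Second.GetType().FullName
}
```

On Windows PowerShell you could pass it `(New-Object -ComObject 'HTMLFile')` and `(Invoke-WebRequest -Uri 'https://example.com').ParsedHtml`, with the URL being a placeholder, and inspect each object's `GetType().FullName` for the exact type.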

That’s all for today’s #PowerShell tip. Thanks for reading! 🙂
