INTRODUCTION to Parsing HTML :
If you are familiar with the PowerShell Invoke-WebRequest cmdlet, you probably know that it returns parsed HTML for the requested web URL. The DOM structure of the web page is then used to access the page's HTML elements, which is what "parsing HTML" means here.
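As a rough illustration of that workflow, here is a minimal sketch. It assumes Windows PowerShell 5.1 (the ParsedHtml property is not available in PowerShell 6 and later) and uses https://example.com purely as a placeholder URL.

```powershell
# Placeholder URL, used only for illustration.
$response = Invoke-WebRequest -Uri 'https://example.com'

# ParsedHtml exposes the page as a DOM document (Windows PowerShell 5.1 only).
$doc = $response.ParsedHtml

# Read the page title and list the targets of all links on the page.
$doc.title
$doc.getElementsByTagName('a') | ForEach-Object { $_.href }
```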
PROBLEM :
What if the HTML files are present locally on your machine, or the HTML content is in the form of a string? Is there any mechanism to parse a local file or string?
SOLUTION :
Well, the answer is: yes, we can! 🙂
Microsoft provides an HTML document class in the .NET Framework class library, which has a Write() method for writing an HTML document using DOM 2 (Document Object Model Level 2).
APPROACH 1 : From a String
Instantiate an object of the HTML document class and pass the HTML content to it as a string to access the HTML elements.
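One way this can look in practice is sketched below. It assumes a Windows machine where the "HTMLFile" COM ProgID is registered; the sample markup, the id "heading", and the variable names are made up for illustration. The try/catch is there because the IHTMLDocument2_write member is typically only exposed when the mshtml interop assembly is present, so the sketch falls back to write() with a Unicode byte array otherwise.

```powershell
# Sample markup; the id and link targets are invented for this example.
$source = @'
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1 id="heading">Hello</h1>
    <a href="https://example.com/one">One</a>
    <a href="https://example.com/two">Two</a>
  </body>
</html>
'@

# Create an empty HTML document through COM (Windows only).
$html = New-Object -ComObject 'HTMLFile'

try {
    # Usually available when the mshtml interop assembly is installed.
    $html.IHTMLDocument2_write($source)
}
catch {
    # Fallback: write() accepts the markup as a Unicode byte array.
    $html.write([System.Text.Encoding]::Unicode.GetBytes($source))
}

# Access elements through the DOM.
$headingText = $html.getElementById('heading').innerText
$links = @($html.getElementsByTagName('a') | ForEach-Object { $_.href })

$headingText
$links
```

On some systems the DOM members are only reachable under interface-prefixed names such as IHTMLDocument3_getElementById, so treat the member names above as an assumption to verify on your machine.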
APPROACH 2 : From a File
Similarly, we can parse an HTML document from a local HTML file.
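A minimal sketch of the file-based variant follows. It assumes Windows with the "HTMLFile" COM ProgID available; the file name sample.html and its contents are hypothetical and are created here only so the example is self-contained. The idea is simply to read the whole file into one string with Get-Content -Raw and then write it into the document as before.

```powershell
# Hypothetical sample file, created only so the example can run on its own.
$path = Join-Path $env:TEMP 'sample.html'
Set-Content -Path $path -Encoding UTF8 -Value '<html><body><p id="note">From a file</p></body></html>'

# Read the entire file as a single string (requires PowerShell 3.0 or later).
$source = Get-Content -Path $path -Raw

# Create the document and load the markup into it.
$html = New-Object -ComObject 'HTMLFile'
try {
    $html.IHTMLDocument2_write($source)
}
catch {
    # Fallback when the interface-prefixed member is not exposed.
    $html.write([System.Text.Encoding]::Unicode.GetBytes($source))
}

# Collect the text of every paragraph element.
$paragraphs = @($html.getElementsByTagName('p') | ForEach-Object { $_.innerText })
$paragraphs
```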
NOTE :
Even the parsed HTML returned by Invoke-WebRequest (its ParsedHtml property in Windows PowerShell) is of the HTML document class type.
That's all for today’s #PowerShell tip. Thanks for reading! 🙂