My top preference for data munging and harvesting from the web is Internet Explorer. Yes, Internet Explorer! 🙂 That is because I can create an InternetExplorer.Application COM object and access the HTML DOM to scrape web data as and when required.


$InternetExplorer = New-Object -ComObject InternetExplorer.Application -property @{
Navigate = 'http://www.twitter.com/following'
Visible = $true
}

The problem arises when not all of the information on a web page is populated when the page loads, and you have to scroll down Internet Explorer's scroll bar manually to load more content. So how do you programmatically scroll Internet Explorer?

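Luckily, the HTML DOM comes to our rescue: the window object, reachable through the document's parentWindow property, provides a scrollTo(x, y) method that scrolls the page to the given pixel coordinates. Calling it repeatedly populates content that only loads on scroll, which is very handy for web scraping.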

Programmatically Scroll Internet Explorer

Hence this quick post for anyone who may find it useful, since I only found the answer after a lot of research 🙂

The example in the animation above harvests data from Twitter, and you can find the full blog post here.

SCRIPT: 


$InternetExplorer = New-Object -ComObject InternetExplorer.Application -property @{
Navigate = 'http://www.twitter.com/following'
Visible = $true
}
# Wait until IE is no longer busy and the page has finished loading (ReadyState 4 = complete)
While($InternetExplorer.Busy -or $InternetExplorer.ReadyState -ne 4){ Write-Verbose "Internet Explorer is busy, waiting for a few seconds"; Start-Sleep -Seconds 5 }
$start = Get-Date; $VerticalScroll = 0
Write-Host "Scrolling the web page for the next 60 seconds to auto-populate all profiles" -ForegroundColor Yellow
# Keep scrolling for 60 seconds so that items which only load on scroll get populated
While((Get-Date) -lt $start.AddSeconds(60)) {
    $InternetExplorer.Document.parentWindow.scrollTo(0, $VerticalScroll)
    $VerticalScroll = $VerticalScroll + 100
    # Pause briefly so the page can fetch new content instead of spinning the CPU
    Start-Sleep -Milliseconds 100
}
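The fixed 60-second window above either stops too early on a long page or keeps scrolling long after everything has loaded. A minimal sketch of an alternative, assuming the page reports its full height through document.body.scrollHeight (in IE standards mode, document.documentElement.scrollHeight may be the more reliable value), is to jump straight to the bottom and stop once the height has stayed the same for a few consecutive checks. Invoke-ScrollUntilStable and its parameters are hypothetical names, not part of the original script; the page access is passed in as script blocks so the loop itself can be exercised without Internet Explorer.

```powershell
function Invoke-ScrollUntilStable {
    # Hypothetical helper, not part of the original script: scrolls to the bottom of a page
    # repeatedly and stops once the page height has stopped growing (or the timeout expires).
    param(
        [Parameter(Mandatory)] [scriptblock] $GetHeight,  # returns the current page height in pixels
        [Parameter(Mandatory)] [scriptblock] $ScrollTo,   # scrolls to the vertical offset passed as its first argument
        [int] $StableChecks = 3,          # consecutive unchanged heights required before stopping
        [int] $DelayMilliseconds = 1000,  # time given to the page to load more items after each scroll
        [int] $TimeoutSeconds = 120       # hard upper bound on total scrolling time
    )
    $deadline   = (Get-Date).AddSeconds($TimeoutSeconds)
    $lastHeight = -1
    $unchanged  = 0
    while ((Get-Date) -lt $deadline -and $unchanged -lt $StableChecks) {
        $height = [int](& $GetHeight)
        $null = & $ScrollTo $height       # discard any output so it does not leak into the return value
        Start-Sleep -Milliseconds $DelayMilliseconds
        if ($height -eq $lastHeight) {
            $unchanged++                  # no new content appeared since the last check
        } else {
            $unchanged  = 0               # page grew, so reset the counter
            $lastHeight = $height
        }
    }
    return $lastHeight
}

# Usage with the $InternetExplorer object created above (assumes scrollHeight reflects the page length):
#   $document = $InternetExplorer.Document
#   Invoke-ScrollUntilStable -GetHeight { $document.body.scrollHeight } `
#                            -ScrollTo  { param($y) $document.parentWindow.scrollTo(0, $y) }
```

The StableChecks and DelayMilliseconds defaults are illustrative guesses; slow pages such as Twitter timelines may need a longer delay before the height reflects newly loaded items, and TimeoutSeconds keeps a hard cap similar to the original 60-second loop.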

Also, take a look at the web series here to learn more about data munging and harvesting, and where Internet Explorer's scrolling can be used to harvest data from the internet. The series covers:

  1. Automating “From the Blog Archives” Tweets using Powershell
  2. Pumping Reddit user trend to AWS CloudWatch with Powershell
  3. Capturing & Analyzing online users Trend on Reddit with Powershell
  4. Powershell fiddling around Web scraping, Twitter – User Profiles, Images and much more
  5. Get example Sentence’s for a Word using Web scraping on online dictionary
  6. Get-Quote using Powershell
  7. PowerShell: Web-hosted Image Scraping
  8. [ Powershell ] Data Harvesting all dictionary words for each alphabet from Web
  9. PowerShell: Get Synonyms using Online Thesaurus
  10. Powershell: How to get Cricket Live Scores to Your Powershell Console
  11. PowerShell: Import / Query All Windows System Error Codes for Description

Please follow me on Twitter for more interesting PowerShell material, and don't forget to Show-off the cool web scraping techniques you learn to your colleagues. Thanks for reading. Cheers! 😉
