Web Scraping Twitter with PowerShell – User Profiles, Images, Links
I’m a big fan of REST APIs; they are efficient, reliable and fun.
Recently I have been playing with the Twitter REST APIs and wondered: is it possible to get the required information from Twitter without using the API, that is, without setting up the authentication model (OAuth tokens) or connecting to the right endpoint?
The point of this post is to see what we can achieve with shell scripting. It is NOT meant to suggest that APIs are bad in any way.
INITIAL HYPOTHESIS :
All information on a Twitter web page sits under specific HTML tags and can be extracted by parsing the HTML content returned in response to a request for that page. Let’s take https://twitter.com/followers as the example.
But I would need a web control mechanism (a property or method) to keep scrolling the web page so that the full list of user profiles is populated, because more profiles only appear when you scroll down to the bottom of the page. We can’t use Invoke-WebRequest, as it only returns the HTML of the initial response and has no way to scroll the page it requests.
Once we figure this out and the web page is fully populated, we can use web scraping techniques to get information from any such page (Followers, Following, Timeline, etc.).
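To make the hypothesis a bit more concrete, here is a minimal sketch of pulling values out of specific HTML tags and turning them into PowerShell objects. The HTML fragment, its class names and the function name Get-ProfileFromHtml are all made up for illustration; Twitter's real markup is different and changes over time, so the pattern would need adjusting.

```powershell
# Hypothetical markup: these class names are illustrative, NOT Twitter's real HTML.
$html = @'
<div class="profile"><b class="name">Alice</b><span class="handle">@alice</span><p class="bio">PowerShell MVP</p></div>
<div class="profile"><b class="name">Bob</b><span class="handle">@bob</span><p class="bio">Sysadmin</p></div>
'@

function Get-ProfileFromHtml {
    param([Parameter(Mandatory)][string]$Html)

    # One regex match per profile block; named groups capture the text inside each tag.
    $pattern = '<b class="name">(?<Name>[^<]*)</b>\s*' +
               '<span class="handle">(?<Handle>[^<]*)</span>\s*' +
               '<p class="bio">(?<Bio>[^<]*)</p>'

    foreach ($m in [regex]::Matches($Html, $pattern)) {
        # Emit one object per profile so the result can be piped, filtered and counted.
        [pscustomobject]@{
            Name   = $m.Groups['Name'].Value
            Handle = $m.Groups['Handle'].Value
            Bio    = $m.Groups['Bio'].Value
        }
    }
}

$profiles = @(Get-ProfileFromHtml -Html $html)
$profiles | Format-Table -AutoSize
```

A regex is good enough for a small, regular fragment like this one. For a real page, walking the parsed document (for example with getElementsByTagName on the browser's Document object) is usually less brittle.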
STEPS BREAKDOWN :
These are the steps I followed to harvest the user profiles of all my followers on Twitter:
- Create an InternetExplorer.Application COM (Component Object Model) object and navigate to the URL; wait until the page has fully loaded in IE.
- Programmatically scroll to the bottom of the page so that all user profiles are populated.
- Once all profiles are populated in the Internet Explorer window, use the Internet Explorer COM object to access the web page's Document (parsed HTML).
- Filter out the required data sitting in specific HTML tags.
- Convert the raw filtered data to PowerShell objects and generate presentable output on screen.
- As a prerequisite, log in to Twitter in Internet Explorer and check the ‘Remember me’ checkbox, so that IE opens your Twitter profile by default when the script runs.
- The infinite scrolling has to be stopped once all data is populated. For me a window of at most 30 seconds worked, but this may change depending on the length of the page under your profile and the speed of your internet connection.
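The first three steps can be sketched roughly as below. This is not the full script from this post, just a minimal outline under a few assumptions: the function name Get-ScrolledHtml and its parameters are hypothetical, scrolling is done through the document's parentWindow.scrollTo method, and the machine runs Windows with Internet Explorer installed and already logged in to Twitter.

```powershell
function Get-ScrolledHtml {
    param(
        [string]$Url = 'https://twitter.com/followers',
        [int]$MaxScrollSeconds = 30   # the window that worked for me; tune it for your page
    )

    # Needs Windows with Internet Explorer; the COM object drives a real IE window.
    $ie = New-Object -ComObject 'InternetExplorer.Application'
    $ie.Visible = $true
    $ie.Navigate($Url)

    # Wait for the initial load to finish (ReadyState 4 means "complete").
    while ($ie.Busy -or $ie.ReadyState -ne 4) {
        Start-Sleep -Milliseconds 200
    }

    # Keep jumping to the bottom of the page until the time window is used up,
    # giving the page a moment to load more profiles after each jump.
    $deadline = (Get-Date).AddSeconds($MaxScrollSeconds)
    while ((Get-Date) -lt $deadline) {
        $ie.Document.parentWindow.scrollTo(0, $ie.Document.body.scrollHeight)
        Start-Sleep -Milliseconds 500
    }

    # Grab the fully populated HTML, then close the browser window.
    $html = $ie.Document.body.innerHTML
    $ie.Quit()
    $html
}

# Usage (opens an IE window, so it is left commented out here):
# $html = Get-ScrolledHtml -MaxScrollSeconds 30
```

A fixed time window is the simplest stop condition. An alternative would be to stop once scrollHeight stops growing between two passes, which adapts better to slow connections but takes a little more code.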
HOW TO RUN :
Run the function as I did in the animation below and it will scrape user profile information from your Twitter web page.
OK, let’s check how many user profiles I was able to harvest.
Perfect! That looks good 🙂 and it is exactly the number of followers I have (only 286! What a shame 😀).
Now let’s filter out the user profiles of Microsoft MVP (Most Valuable Professional) awardees from my followers. I guess most of them have the “MVP” keyword in their user bio, so a simple “where” filter would do the work for us, as in the following screenshot.
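In text form, the count and the “where” filter look something like the sketch below. The sample objects and the property names (Name, Handle, Bio) are placeholders I made up to stand in for the harvested profiles; use whatever property names your own objects carry.

```powershell
# Hypothetical sample data standing in for the harvested follower profiles.
$followers = @(
    [pscustomobject]@{ Name = 'Alice'; Handle = '@alice'; Bio = 'Microsoft MVP, PowerShell' }
    [pscustomobject]@{ Name = 'Bob';   Handle = '@bob';   Bio = 'Sysadmin and coffee drinker' }
    [pscustomobject]@{ Name = 'Carol'; Handle = '@carol'; Bio = 'Cloud architect | MVP' }
)

# How many profiles were harvested?
$followers.Count

# -match is a case-insensitive regex match; \b keeps "MVP" from matching inside longer words.
$mvps = @($followers | Where-Object { $_.Bio -match '\bMVP\b' })

$mvps | Format-Table Name, Handle, Bio -AutoSize
```

Wrapping the pipeline in @( ) keeps $mvps an array even when only one profile matches, so .Count behaves consistently.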
Though I’m harvesting only four properties from the web page, you can tweak the script as desired to get more information from the page's HTML content.
Since I now know which MVPs follow me on Twitter, how about downloading their profile pictures from Twitter to my local drive using the user profile data we have harvested? To achieve this, follow the steps in the animation below.
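For readers who can't see the animation, here is a rough sketch of the download step. The function names Get-ImageFileName and Save-ProfilePicture, the Handle and ProfileImageUrl property names and the default destination folder are all assumptions for illustration; Invoke-WebRequest with -OutFile does the actual download.

```powershell
function Get-ImageFileName {
    param([string]$Handle, [string]$ImageUrl)

    # Take the extension from the URL path (ignoring any query string); default to .jpg.
    $ext = [System.IO.Path]::GetExtension(([uri]$ImageUrl).AbsolutePath)
    if (-not $ext) { $ext = '.jpg' }

    # Name the file after the handle, without the leading "@".
    $Handle.TrimStart('@') + $ext
}

function Save-ProfilePicture {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]$InputObject,
        [string]$Destination = (Join-Path $HOME 'TwitterPics')   # hypothetical default folder
    )
    begin {
        # Create the target folder once, before the first profile arrives.
        if (-not (Test-Path $Destination)) {
            New-Item -ItemType Directory -Path $Destination | Out-Null
        }
    }
    process {
        $name = Get-ImageFileName -Handle $InputObject.Handle -ImageUrl $InputObject.ProfileImageUrl
        $file = Join-Path $Destination $name
        # -UseBasicParsing avoids the IE parsing engine on Windows PowerShell 5.1.
        Invoke-WebRequest -Uri $InputObject.ProfileImageUrl -OutFile $file -UseBasicParsing
    }
}

# Usage (needs network access, so it is left commented out here):
# $mvps | Save-ProfilePicture -Destination 'C:\Temp\MVPs'
```

Accepting the profiles from the pipeline means the same filtered list used earlier can be piped straight into the download function.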
Hope you’ll find it fun and fiddle around more with PowerShell! 😉 Thanks for stopping by.