In this video, I'll show you an INCREDIBLE shortcut for web scraping HTML tables!!!
This shortcut is a single command that skips the entire process of opening the Developer Tools and locating specific elements on the page!
We will use the Pandas read_html method to extract a list of all the tables found at a given URL (table headers, table rows and table data), with each table returned as a DataFrame.
With this single command, we solve a problem that took more than 15 minutes in the recent Mechanical Soup tutorial!!
(which can be found here: https://youtu.be/MkGQmZoMuRM)
But wait... it sounds too good to be true! Where's the catch??
Unfortunately, this method only works on table elements, and it can't handle traditional web scraping tasks such as logging in, filling in forms or clicking buttons.
However, if scraping HTML tables is exactly what you need, this tutorial will change your life!!! (well... maybe just slightly)
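A minimal sketch of the read_html call described above, assuming pandas and lxml are installed (pip install pandas lxml). To keep it runnable offline, it parses a small made-up HTML snippet instead of the Wikipedia URL from the video; pd.read_html also accepts a URL string directly. The HTML is wrapped in StringIO because recent pandas versions deprecate passing literal HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample page with two tables; in the video a Wikipedia URL is passed instead.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><th>Distribution</th><th>Package manager</th></tr>
    <tr><td>Debian</td><td>apt</td></tr>
    <tr><td>Fedora</td><td>dnf</td></tr>
    <tr><td>Arch Linux</td><td>pacman</td></tr>
  </table>
  <table>
    <tr><th>Package format</th><th>File extension</th></tr>
    <tr><td>Debian package</td><td>.deb</td></tr>
    <tr><td>RPM package</td><td>.rpm</td></tr>
  </table>
</body></html>
"""

# read_html parses every <table> element it finds and returns a list of DataFrames,
# using each table's <th>-only top row as its column headers.
tables = pd.read_html(StringIO(SAMPLE_HTML))

print(len(tables))  # → 2
for table in tables:
    print(table, end="\n\n")
```

No selectors or Developer Tools are involved: every table on the page comes back in one call, and you pick the one you want from the list.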
***************************
⏰ TIME STAMPS ⏰
***************************
00:00 - intro
00:46 - install pandas and lxml
01:34 - read_html method
02:22 - fix SSL Cert Verification Error and URL Error
03:08 - find a single table in read_html
04:26 - advantages and disadvantages of read_html compared to Mechanical Soup
05:01 - credits and thanks for watching!
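For the "find a single table" step listed above, here is a minimal sketch, again assuming pandas and lxml are installed. It uses read_html's match parameter, which keeps only the tables whose text matches a given string or regex, plus plain list indexing. The sample HTML is made up, and matching on "Package manager" is only an illustration, not necessarily what the video does at 03:08.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables, standing in for the Wikipedia URL.
SAMPLE_HTML = """
<table>
  <tr><th>Distribution</th><th>Package manager</th></tr>
  <tr><td>Debian</td><td>apt</td></tr>
  <tr><td>Fedora</td><td>dnf</td></tr>
  <tr><td>Arch Linux</td><td>pacman</td></tr>
</table>
<table>
  <tr><th>Package format</th><th>File extension</th></tr>
  <tr><td>Debian package</td><td>.deb</td></tr>
  <tr><td>RPM package</td><td>.rpm</td></tr>
</table>
"""

# match keeps only tables containing text that matches this string/regex;
# read_html raises ValueError if no table matches.
matches = pd.read_html(StringIO(SAMPLE_HTML), match="Package manager")
package_managers = matches[0]  # the result is still a list, so take the first hit

# If you already know the table's position on the page, indexing the full list works too.
second_table = pd.read_html(StringIO(SAMPLE_HTML))[1]

print(len(matches))  # → 1
print(package_managers)
```

Matching on text is more robust than a fixed index when a page gains or loses tables over time, at the cost of needing a phrase that appears in only one table.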
USEFUL TUTORIALS
***************************
⭐⭐ web scraping databases with SQLite and Mechanical Soup:
https://youtu.be/MkGQmZoMuRM
⭐ web scraping images with Mechanical Soup:
https://youtu.be/drDdb1MBBfI
⭐ web scraping DOM elements with Beautiful Soup:
https://youtu.be/ySNSY7iiBDY
⭐ interacting with LinkedIn using Selenium:
https://youtu.be/7aIb6iQZkDw
⭐ web scraping entire Facebook image gallery with Selenium:
https://youtu.be/SsXcyoevkV0
⭐ web scraping Instagram images with Selenium:
https://youtu.be/iJGvYBH9mcY
USEFUL LINKS
***************************
⭐ complete code on Github:
https://github.com/MariyaSha/WebscrapingDatabases/blob/main/scraper_Pandas.ipynb
⭐ URL used in video:
https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions
⭐ SSL certificate error fix:
https://github.com/prisma-labs/python-graphql-client/issues/13
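The linked fix addresses the SSL certificate verification error (raised as a URLError) that can appear when read_html opens an HTTPS URL. As an alternative sketch, not the approach from the video or the linked issue, the page can be downloaded with the standard library's urllib using an explicit SSL context, and the HTML then handed to read_html. read_tables_from_url and its cafile parameter are hypothetical names; pointing cafile at a CA bundle (for example one from the third-party certifi package) is an assumption about how your certificates are set up. On python.org installs for macOS, running the bundled "Install Certificates.command" is another common fix. Workarounds that switch off verification globally also silence the error, but this sketch keeps verification on.

```python
import ssl
import urllib.request
from io import StringIO

import pandas as pd


def read_tables_from_url(url, cafile=None, **read_html_kwargs):
    """Hypothetical helper: download a page with an explicit SSL context, then parse its tables.

    cafile is an optional path to a CA bundle (an assumption, e.g. certifi.where()).
    Extra keyword arguments (such as match=...) are passed straight to pd.read_html.
    """
    # create_default_context() keeps certificate and hostname verification enabled.
    context = ssl.create_default_context(cafile=cafile)
    # Assumption: some sites reject urllib's default User-Agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset)
    # Hand the downloaded markup to read_html as a file-like object.
    return pd.read_html(StringIO(html), **read_html_kwargs)


# Usage (needs network access):
# tables = read_tables_from_url(
#     "https://en.wikipedia.org/wiki/Comparison_of_Linux_distributions"
# )
```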
THANK YOU
***************************
Thank you so much for watching and for all your support!
An extra special thank you to Ditto for suggesting the read_html method!
Also, thanks to flaticon.com for the lovely icons at the end of the video.
See you soon in a brand new Flask + SQLite tutorial!