Selenium

Selenium is a browser automation framework with bindings for Python. You can use it to grab a webpage's HTML code.

Introduction

BeautifulSoup is well suited to smaller tasks on static pages, while Selenium is the usual choice for JavaScript-heavy websites and can also serve as a standalone web scraper and parser. Both are useful in their own way, and learning to use both will make you a better web scraping developer. With Selenium we can do things that BeautifulSoup cannot, such as simulating clicks or filling out forms.

Install Selenium

To start, install the selenium module for Python from the command line:

pip install selenium

Then import it in your script:

from selenium import webdriver

Start Web Browser

We have to download ChromeDriver from the official ChromeDriver download page, choosing the release that matches the installed version of Chrome. Move it to a location that the script can access, then pass that path when initializing the driver.

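from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('./chromedriver.exe'))  # Selenium 4 style; executable_path= was removed in newer releases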

ChromeDriver is the bridge between Selenium and Google Chrome: it launches the browser and relays the commands from your script to it. Without it, Selenium cannot run test scripts in Chrome or automate a web application there, which is why ChromeDriver is required for any Selenium work on Chrome.
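Since every driver starts a real Chrome process, it helps to make sure the browser is shut down again even when the script fails halfway. The following is a minimal sketch, not an official Selenium recipe: run_with_driver is a hypothetical helper name, and it assumes only that the object it receives has the quit() method that Selenium drivers provide.

```python
def run_with_driver(make_driver, task):
    """Create a driver, run task(driver), and always quit the browser.

    make_driver: zero-argument callable returning a WebDriver-like object
                 (hypothetical parameter; webdriver.Chrome is one candidate).
    task:        callable that receives the driver and returns a result.
    """
    driver = make_driver()
    try:
        return task(driver)
    finally:
        # quit() ends the session and closes every window it opened,
        # whether task() returned normally or raised an exception.
        driver.quit()
```

With Selenium installed, this could be called as run_with_driver(webdriver.Chrome, my_task), where my_task is your own function, provided ChromeDriver can be found on the machine.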

Get HTML Contents

All we have to do is call the get method on the driver and pass the URL as its argument. This opens the page in the browser window that the driver controls.

driver.get("https://en.wikipedia.org")

Then you can use the attribute .page_source to get the HTML code.

html = driver.page_source
print(html)
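Printing a whole page to the console is rarely convenient, so the two steps above can be wrapped in a small function that also writes the HTML to disk for later inspection. This is only a sketch: save_page_source and its out_path parameter are hypothetical names, and the function relies on nothing beyond the get method and page_source attribute already shown.

```python
from pathlib import Path


def save_page_source(driver, url, out_path):
    """Open url in the driver's browser, save the HTML, and return it."""
    driver.get(url)                # load the page in the browser
    html = driver.page_source      # HTML of the page as the browser holds it
    # Write as UTF-8 so non-ASCII characters survive on every platform.
    Path(out_path).write_text(html, encoding="utf-8")
    return html
```

For example, save_page_source(driver, "https://en.wikipedia.org", "wikipedia.html") would leave a copy of the page next to the script.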

Once Selenium has returned the HTML contents, you can still pass them to BeautifulSoup and use its parser and functions as usual.
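As a rough illustration of that hand-off, the sketch below feeds an HTML string to BeautifulSoup and collects the link targets. It assumes the third-party beautifulsoup4 package is installed, and extract_links is a hypothetical helper name chosen for this example.

```python
def extract_links(html):
    """Return the href of every <a> tag found in an HTML string."""
    # Imported inside the function so this sketch still loads where the
    # third-party beautifulsoup4 package (pip install beautifulsoup4) is missing.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    return [a["href"] for a in soup.find_all("a", href=True)]
```

The string returned by page_source can be passed in directly, for example extract_links(html).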

Simulate a Click

Put simply, the click command emulates a click on a link, button, checkbox or radio button. In Selenium WebDriver, you call click on an element after finding it.

This will come in useful in the news scraper that we will explain later, where it clicks the "load more" button to reveal more articles.

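from selenium.webdriver.common.by import By
# find_element_by_class_name was removed in newer Selenium 4 releases; use find_element with a By locator
button = driver.find_element(By.CLASS_NAME, "classname")
button.click()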

You can locate the button by other attributes of the element too, such as its id, its name or a CSS selector; it does not have to be the class.
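For a "load more" button, a single click is often not enough, so one possible approach is to keep clicking until the button is gone. The sketch below is an assumption about how that could look, not the scraper's actual code: click_load_more, max_clicks and the selector in the usage note are hypothetical, and the only Selenium call used is find_elements(by, value), which returns an empty list when nothing matches.

```python
def click_load_more(driver, by, value, max_clicks=5):
    """Click the first matching element until it disappears or max_clicks is reached.

    by, value: a locator pair such as (By.CLASS_NAME, "classname").
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # An empty list means the button is no longer on the page,
        # so no exception handling is needed here.
        matches = driver.find_elements(by, value)
        if not matches:
            break
        matches[0].click()
        clicks += 1
    return clicks
```

A call might look like click_load_more(driver, By.CSS_SELECTOR, "button.load-more"), with By imported from selenium.webdriver.common.by. The sketch does not wait for new content to arrive between clicks, so a real page may need a pause or an explicit wait inside the loop.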
