
Web Scraping

Web Scraping with Selenium and Python: Beginner to Intermediate

QA Knowledge Hub · 2026-04-12 · 7 min read

Web scraping with Selenium is different from scraping with requests or BeautifulSoup. Selenium controls a real browser — it can interact with JavaScript-rendered pages, click buttons, fill forms, and wait for dynamic content to load.

If you already know Selenium for testing, you are halfway to knowing it for scraping. The browser control skills are identical. What changes is the goal: instead of asserting what you see, you extract and store it.

Legal note: Only scrape websites where you have permission or where the data is publicly available and the site's terms of service allow it. Do not use scraping to bypass paywalls or violate privacy.
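A practical first step is to check the site's robots.txt before scraping. Here is a minimal sketch using only Python's standard library; the sample robots.txt body is hypothetical:

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Parse a robots.txt body and check whether `url` may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt body
sample = """
User-agent: *
Disallow: /private/
"""
print(is_allowed(sample, "https://example.com/products"))      # True
print(is_allowed(sample, "https://example.com/private/data"))  # False
```

In real use you would fetch `https://<site>/robots.txt` first; robots.txt is advisory, so you still need to respect the terms of service.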

When to Use Selenium for Scraping

Use Selenium (not requests/BeautifulSoup) when:

  • The page renders content using JavaScript (React, Angular, Vue apps)
  • You need to interact with the page — click "Load More", apply filters, log in
  • The site blocks non-browser requests (checks for User-Agent, cookies, or JavaScript execution)
  • You need to handle infinite scroll or pagination that requires clicks

Use requests + BeautifulSoup when:

  • The content is in the raw HTML (view page source shows the data)
  • Speed matters (requests is 10–100x faster than Selenium)
  • You are scraping at scale (Selenium is resource-heavy)
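The infinite-scroll case mentioned above never gets its own section later, so here is a common pattern: scroll, wait, and stop once the page height stops growing. The pause and round limits are assumptions to tune per site:

```python
import time

def scroll_to_bottom(driver, pause: float = 1.0, max_rounds: int = 20) -> int:
    """Scroll until the page height stops growing (infinite scroll).

    Only uses driver.execute_script(), so it works with any Selenium driver.
    Returns the final page height.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared; we reached the end
        last_height = new_height
    return last_height
```

Call `scroll_to_bottom(driver)` after `driver.get(...)`, then run your `find_elements` calls as usual.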

Setup

pip install selenium pandas

Selenium 4.x includes Selenium Manager — it automatically downloads the correct browser driver. No manual chromedriver setup needed.

Your First Scraper

Scrape the top posts from Hacker News:

# scraper_hn.py
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def scrape_hacker_news():
    options = Options()
    options.add_argument("--headless=new")  # Run without opening a browser window
    options.add_argument("--window-size=1920,1080")
    
    driver = webdriver.Chrome(options=options)
    
    try:
        driver.get("https://news.ycombinator.com")
        
        # Find title links and score spans. Caveat: job posts have no score,
        # so on pages containing them titles and scores can drift out of alignment.
        titles = driver.find_elements(By.CSS_SELECTOR, ".titleline > a")
        scores = driver.find_elements(By.CSS_SELECTOR, ".score")
        
        posts = []
        for i, title in enumerate(titles[:30]):  # Top 30 posts
            posts.append({
                "rank": i + 1,
                "title": title.text,
                "url": title.get_attribute("href"),
                "score": scores[i].text if i < len(scores) else "N/A"
            })
        
        return posts
    
    finally:
        driver.quit()


posts = scrape_hacker_news()
for post in posts[:5]:
    print(f"{post['rank']}. {post['title']} ({post['score']})")

Sample output:

1. Show HN: I built a QA knowledge site (423 points)
2. The art of the possible with LLMs (312 points)
...

Handling Dynamic Content — Waiting for Elements

The most common scraping error: NoSuchElementException when the element exists but hasn't loaded yet. Fix with explicit waits:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def wait_for_element(driver, locator, timeout=10):
    try:
        wait = WebDriverWait(driver, timeout)
        return wait.until(EC.presence_of_element_located(locator))
    except TimeoutException:
        print(f"Element not found: {locator}")
        return None

# Usage
element = wait_for_element(driver, (By.CSS_SELECTOR, ".product-list"))
if element:
    products = driver.find_elements(By.CSS_SELECTOR, ".product-card")

Scraping a Real Product Page

Scrape product names, prices, and ratings from a demo e-commerce site:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json

def scrape_products():
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    
    products = []
    
    try:
        driver.get("https://books.toscrape.com")
        wait = WebDriverWait(driver, 10)
        
        # Wait for products to load
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".product_pod")))
        
        # Get all product cards on the page
        product_cards = driver.find_elements(By.CSS_SELECTOR, ".product_pod")
        
        for card in product_cards:
            title_element = card.find_element(By.CSS_SELECTOR, "h3 > a")
            price_element = card.find_element(By.CSS_SELECTOR, ".price_color")
            rating_element = card.find_element(By.CSS_SELECTOR, ".star-rating")
            
            # Rating is stored in a CSS class like "star-rating Three"
            rating_class = rating_element.get_attribute("class")
            rating_word = rating_class.replace("star-rating ", "")
            
            rating_map = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
            
            products.append({
                "title": title_element.get_attribute("title"),
                "price": price_element.text,
                "rating": rating_map.get(rating_word, 0),
                "url": title_element.get_attribute("href")
            })
        
    finally:
        driver.quit()
    
    return products

results = scrape_products()
print(f"Scraped {len(results)} products")

# Save to JSON
with open("products.json", "w") as f:
    json.dump(results, f, indent=2)

Pagination — Scraping Multiple Pages

Many sites split data across multiple pages. Scrape all pages by following the "Next" link:

def scrape_all_pages():
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    
    all_products = []
    base_url = "https://books.toscrape.com"
    
    try:
        driver.get(base_url)
        page_number = 1
        
        while True:
            print(f"Scraping page {page_number}...")
            
            # Wait for products to load
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, ".product_pod"))
            )
            
            # Scrape current page (reuse the logic above)
            cards = driver.find_elements(By.CSS_SELECTOR, ".product_pod")
            for card in cards:
                title = card.find_element(By.CSS_SELECTOR, "h3 > a").get_attribute("title")
                price = card.find_element(By.CSS_SELECTOR, ".price_color").text
                all_products.append({"title": title, "price": price, "page": page_number})
            
            # Follow the "next" link if present; catch the specific exception
            # (from selenium.common.exceptions import NoSuchElementException)
            try:
                next_button = driver.find_element(By.CSS_SELECTOR, ".next > a")
                next_button.click()
                page_number += 1
            except NoSuchElementException:
                print("No more pages. Scraping complete.")
                break
    
    finally:
        driver.quit()
    
    return all_products


all_products = scrape_all_pages()
print(f"Total products scraped: {len(all_products)}")

Filling Forms and Interacting with the Page

When you need to search, filter, or log in before scraping:

def search_and_scrape(query: str):
    driver = webdriver.Chrome()
    
    try:
        driver.get("https://example.com/products")
        
        # Find and fill the search box
        search_box = driver.find_element(By.CSS_SELECTOR, "input[name='q']")
        search_box.clear()
        search_box.send_keys(query)
        
        # Click search button
        driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
        
        # Wait for results to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".search-results"))
        )
        
        # Scrape results
        results = driver.find_elements(By.CSS_SELECTOR, ".result-item")
        return [r.text for r in results]
    
    finally:
        driver.quit()

Logging In Before Scraping

def login_and_scrape(username: str, password: str):
    driver = webdriver.Chrome()
    
    try:
        # Log in
        driver.get("https://example.com/login")
        driver.find_element(By.ID, "username").send_keys(username)
        driver.find_element(By.ID, "password").send_keys(password)
        driver.find_element(By.ID, "login-btn").click()
        
        # Wait for redirect after login
        WebDriverWait(driver, 10).until(
            EC.url_contains("/dashboard")
        )
        
        # Now navigate to the page that requires login
        driver.get("https://example.com/my-orders")
        
        # Scrape order data
        orders = driver.find_elements(By.CSS_SELECTOR, ".order-row")
        return [{"id": o.find_element(By.CSS_SELECTOR, ".order-id").text,
                 "status": o.find_element(By.CSS_SELECTOR, ".order-status").text}
                for o in orders]
    
    finally:
        driver.quit()

Extracting JavaScript-Rendered Data

When data is loaded via JavaScript after the initial page load:

# Wait for the element to have content (not just exist)
wait = WebDriverWait(driver, 15)
element = wait.until(
    EC.text_to_be_present_in_element(
        (By.CSS_SELECTOR, "#product-count"), "products found"
    )
)

# Execute JavaScript directly to get computed values
price_js = driver.execute_script(
    "return document.querySelector('.final-price').innerText"
)

# Get element attribute not accessible via get_attribute()
data_value = driver.execute_script(
    "return arguments[0].getAttribute('data-product-id')",
    driver.find_element(By.CSS_SELECTOR, ".product")
)
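Many single-page apps also embed their data in a global state object. If you find that object's name in DevTools, you can pull it out as JSON in one call. The variable name `window.__APP_STATE__` below is a placeholder; it differs per site and framework:

```python
import json

def extract_embedded_json(driver, js_expr: str = "window.__APP_STATE__"):
    """Serialize a page-global JS object to JSON and parse it in Python.

    `js_expr` is a placeholder -- inspect the page in DevTools to find
    the real variable name. Returns None if the object is absent.
    """
    raw = driver.execute_script(f"return JSON.stringify({js_expr} || null)")
    return json.loads(raw) if raw is not None else None
```

This often yields the whole dataset in one structured blob, which is far more robust than scraping it field-by-field out of the DOM.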

Saving Data to CSV

import csv

def save_to_csv(data: list, filename: str):
    if not data:
        return
    
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)
    
    print(f"Saved {len(data)} rows to {filename}")


products = scrape_products()
save_to_csv(products, "products.csv")
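Since pandas was installed in the Setup step, here is an equivalent save using it; pandas handles headers, quoting, and encoding for you (the sample rows are made up):

```python
import pandas as pd

def save_with_pandas(data: list, filename: str) -> None:
    """Alternative to csv.DictWriter: build a DataFrame and write it as CSV."""
    df = pd.DataFrame(data)
    df.to_csv(filename, index=False)

# Hypothetical sample rows
save_with_pandas(
    [{"title": "Book A", "price": "£10.00"}, {"title": "Book B", "price": "£12.50"}],
    "products.csv",
)
```

The DataFrame route also makes it easy to clean the data (strip currency symbols, cast types) before writing.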

Avoiding Common Errors

StaleElementReferenceException

Happens when the DOM updates after you found an element. Fetch elements inside the loop iteration, not before it:

# WRONG — element becomes stale after page update
rows = driver.find_elements(By.CSS_SELECTOR, ".row")
for row in rows:
    row.click()  # StaleElementReferenceException on later iterations

# CORRECT — refetch elements each time or index into fresh list
for i in range(len(driver.find_elements(By.CSS_SELECTOR, ".row"))):
    driver.find_elements(By.CSS_SELECTOR, ".row")[i].click()

Rate Limiting

Add small delays between requests to avoid triggering bot detection:

import time
import random

# Random delay between 1 and 3 seconds
time.sleep(random.uniform(1, 3))
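The same idea can be wrapped in a small helper so every page load gets a jittered delay automatically (the function name and default bounds are my own):

```python
import random
import time

def polite_get(driver, url: str, min_delay: float = 1.0, max_delay: float = 3.0) -> float:
    """Sleep a random delay before loading `url`; returns the delay used."""
    delay = random.uniform(min_delay, max_delay)
    time.sleep(delay)
    driver.get(url)
    return delay
```

Replace direct `driver.get(url)` calls inside your pagination loop with `polite_get(driver, url)` to keep the delay logic in one place.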

Missing Data on Some Pages

Wrap optional fields in try/except so one missing element doesn't crash the whole run — and catch the specific exception rather than using a bare except:

from selenium.common.exceptions import NoSuchElementException

try:
    price = card.find_element(By.CSS_SELECTOR, ".price").text
except NoSuchElementException:
    price = "N/A"

Useful CSS Selectors for Scraping

# Select by class
".product-title"

# Select by ID
"#main-content"

# Select by attribute
"a[href^='https']"           # Links starting with https
"img[alt]"                   # Images with alt text
"input[type='checkbox']"     # Checkboxes

# Child selector
".product-card > .title"     # Direct child

# Descendant selector
".product-card .price"       # Any descendant

# nth element
".product-card:nth-child(1)" # First card
".product-card:last-child"   # Last card

# Multiple selectors
".product-title, .product-name"  # Either class

QA + Scraping Career Crossover

For QA engineers, web scraping opens up additional career paths:

  • Test data generation: Scrape real product names, addresses, and prices to seed test databases with realistic data
  • Competitive monitoring: Scrape competitor pricing for e-commerce clients
  • QA for scraping systems: Companies that sell data products need QA engineers who understand the scraping pipeline
  • Data engineering adjacent work: Cleaning and validating scraped datasets

Having both Selenium automation and scraping experience makes your profile unusual in the QA market — most automation engineers know one or the other, not both.

Summary

Selenium scraping uses the same skills as Selenium testing — browser control, locators, waits, and form interaction. The difference is in what you do with the data: instead of asserting values, you extract and store them.

Start with a static page (books.toscrape.com is purpose-built for scraping practice), then graduate to dynamic pages that require interaction. The pagination and login patterns cover the large majority of real-world scraping scenarios.

Recommended Resource

Automation Testing Scenarios Pack

High-quality automation scenarios for UI, API, and microservices systems.

1299 · Get This Guide →
