Thursday, 30 January 2014

Scrape Macy's deals using Beautiful Soup

Let me show you a tiny real example of how to use the bs4 (Beautiful Soup version 4) module in Python. Say we want to collect information about the hot deals from Macy's. The URL is here. You could open the page and copy-paste everything by hand, but that's not our purpose. First you have to get the content of the page using the cute requests module:
import requests

url = 'http://bit.ly/19zWmQT'
r = requests.get(url)
html_content = r.text
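The request above assumes the server answers quickly and with a success status. A slightly more defensive sketch might look like the following; the function name fetch_html and the 10-second timeout are assumptions made for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body as text, raising on HTTP errors.

    fetch_html and the default timeout are illustrative choices,
    not part of the requests library.
    """
    # Without a timeout, requests may wait indefinitely for a slow server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

With this helper, the fetch becomes html_content = fetch_html(url).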
Now start cooking the soup:
from bs4 import BeautifulSoup

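# Name the parser explicitly; otherwise bs4 picks whichever one is installed and warns
soup = BeautifulSoup(html_content, 'html.parser')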
Now look at the HTML source of the page. You will see that each offer is a list item (li) that has 'offer' as one of its CSS class names (along with some other class names). So you can write the code in the following way:
offer_list = soup('li', 'offer')
Or you can write:
offer_list = soup.find_all('li', 'offer')
Another way to write this is:
offer_list = soup.select('li.offer')
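To convince yourself that the three forms are interchangeable, you can try them on a small hand-written snippet. The HTML below is made up for illustration and only mimics the structure described above; it is not copied from the Macy's page.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page is larger and may differ.
sample_html = """
<ul>
  <li class="offer featured"><h3>Deal A</h3></li>
  <li class="other"><h3>Not an offer</h3></li>
  <li class="offer"><h3>Deal B</h3></li>
</ul>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

by_call = soup('li', 'offer')               # calling the soup is shorthand for find_all
by_find_all = soup.find_all('li', 'offer')  # a string second argument is matched as a CSS class
by_select = soup.select('li.offer')         # CSS selector syntax

# Collect the titles of the matched items, in document order.
titles = [li.h3.text for li in by_call]
```

Note that the class match works on individual class names, so the first item is found even though it carries two classes.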
Now run this loop:
for offer in offer_list:
    title = offer.find('h3').text
    # Use a separate name so the page url defined earlier is not overwritten
    offer_url = offer.find('a')['href']
    # find() returns the first span inside the offer
    description = offer.find('span').text
    promo_code = offer.find('span', class_='promo-code').text
    promo_date = offer.find('span', class_='end-date').text
    print(title, offer_url, description, promo_date, promo_code)
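The loop above assumes every offer contains all of the tags it looks for. find returns None when nothing matches, so a single offer without, say, a promo code would stop the loop with an AttributeError. One way to guard against that is a small helper. In this sketch, text_or_default and parse_offer are hypothetical names, and the sample markup is invented for illustration.

```python
from bs4 import BeautifulSoup


def text_or_default(parent, name, class_=None, default=''):
    """Return the stripped text of the first matching tag, or `default`.

    Hypothetical helper, not part of bs4.
    """
    if class_ is None:
        tag = parent.find(name)
    else:
        tag = parent.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else default


def parse_offer(offer):
    """Collect the fields used in the loop above into a dict."""
    link = offer.find('a')
    return {
        'title': text_or_default(offer, 'h3'),
        'url': link.get('href', '') if link is not None else '',
        'description': text_or_default(offer, 'span'),
        'promo_code': text_or_default(offer, 'span', 'promo-code'),
        'promo_date': text_or_default(offer, 'span', 'end-date'),
    }


# Invented markup: the second offer has no link, promo code or end date.
sample_html = """
<li class="offer"><h3>Deal A</h3><a href="/a">go</a>
  <span>20% off</span><span class="promo-code">SAVE20</span>
  <span class="end-date">Feb 2</span></li>
<li class="offer"><h3>Deal B</h3><span>Free shipping</span></li>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
offers = [parse_offer(li) for li in soup.select('li.offer')]
```

Missing fields come back as empty strings here; whether to skip such offers or keep them with blanks depends on what you plan to do with the data.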
You are done! :) 
