Scrape Macy's deals using Beautiful Soup
Let me show you a tiny real-world example of how to use the bs4 (Beautiful Soup version 4) module in Python. Say we want to collect information about the hot deals from Macy's. The URL is http://bit.ly/19zWmQT. Sure, you could open that one page and copy-paste all the info by hand, but that's not our purpose. First, you have to get the content of the page using the cute requests module.
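```python
import requests

url = 'http://bit.ly/19zWmQT'  # short link to the Macy's deals page
r = requests.get(url)          # requests follows the bit.ly redirect automatically
r.raise_for_status()           # stop early if the server returned an HTTP error
html_content = r.text          # the page's HTML as a string
```

Now start cooking the soup: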
```python
from bs4 import BeautifulSoup

# Naming the parser explicitly avoids bs4's "no parser specified" warning
soup = BeautifulSoup(html_content, 'html.parser')
```

Now look at the HTML code (page source) of the URL. You will see that each offer is a list item (li) that has 'offer' as one of its CSS class names (along with some other class names). So you can write the code in the following way:
```python
offer_list = soup('li', 'offer')
```

Or you can write:

```python
offer_list = soup.find_all('li', 'offer')
```

Another way to write this is:

```python
offer_list = soup.select('li.offer')
```

Now run this loop:
```python
for offer in offer_list:
    title = offer.find('h3').text
    offer_url = offer.find('a')['href']     # renamed so it doesn't overwrite the page url
    description = offer.find('span').text   # the first <span> inside the offer
    promo_code = offer.find('span', class_='promo-code').text
    promo_date = offer.find('span', class_='end-date').text
    print(title, offer_url, description, promo_date, promo_code)  # Python 3 print()
```

You are done! :)
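If you want to convince yourself that the three ways of building offer_list shown earlier pick out the same elements, here is a small self-contained sketch. It runs against a made-up HTML snippet (SAMPLE_HTML is hypothetical, not Macy's real markup), so no network access is needed. It also shows that an li whose class attribute is "offer featured" still matches, because Beautiful Soup treats class as a multi-valued attribute and matches each class name individually.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; Macy's actual HTML may differ.
SAMPLE_HTML = """
<ul>
  <li class="offer featured"><h3>Deal A</h3></li>
  <li class="offer"><h3>Deal B</h3></li>
  <li class="banner"><h3>Not a deal</h3></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Calling the soup object is shorthand for find_all().
by_call = soup('li', 'offer')
# A plain string as the second argument filters on CSS class.
by_find_all = soup.find_all('li', 'offer')
# CSS selector syntax; li.offer also matches elements with extra classes.
by_select = soup.select('li.offer')

titles = [li.h3.get_text() for li in by_call]
print(titles)  # ['Deal A', 'Deal B']
```

All three return the same Tag objects from the parse tree; pick whichever reads best to you. select() is handy when you already think in CSS selectors.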
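The loop above assumes every offer contains an h3, an a, and both spans. If one of them is missing, find() returns None and .text raises an AttributeError. A slightly more defensive sketch, assuming the same page structure, collects each offer into a dict and uses None for missing fields. The helper names text_of and parse_offers and the SAMPLE_HTML snippet are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second offer has no promo code or end date.
SAMPLE_HTML = """
<li class="offer">
  <h3>20% off shoes</h3>
  <a href="/shop/shoes">Shop now</a>
  <span>Select styles only.</span>
  <span class="promo-code">SHOES20</span>
  <span class="end-date">Ends 12/31</span>
</li>
<li class="offer">
  <h3>Free shipping</h3>
  <a href="/shop/all">Shop now</a>
  <span>On orders over $50.</span>
</li>
"""


def text_of(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_offers(html):
    """Collect each li.offer into a dict; missing pieces become None."""
    soup = BeautifulSoup(html, 'html.parser')
    offers = []
    for offer in soup.select('li.offer'):
        link = offer.find('a')
        offers.append({
            'title': text_of(offer.find('h3')),
            # .get() returns None instead of raising KeyError if href is absent
            'url': link.get('href') if link is not None else None,
            'description': text_of(offer.find('span')),  # first <span>
            'promo_code': text_of(offer.find('span', class_='promo-code')),
            'promo_date': text_of(offer.find('span', class_='end-date')),
        })
    return offers


for row in parse_offers(SAMPLE_HTML):
    print(row)
```

Returning a list of dicts instead of printing inside the loop makes it easy to write the results to a CSV file or compare runs later.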