Web Scrape with Python Using Just 9 Lines of Code

Written by songthamtung | Published 2019/10/02
Tech Story Tags: python | scraping | tech | startup | productivity | automation | coding | latest-tech-stories

Web scraping is the process of extracting data from websites. In this article, I will show you how to scrape product links from a test e-commerce site with Python 3.
Prerequisites
If you haven't done so already, install beautifulsoup4 and requests.
pip install beautifulsoup4
pip install requests
Start Scraping!
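import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.webscraper.io/test-sites/e-commerce/allinone", timeout=10)
soup = BeautifulSoup(result.content, "html.parser")  # name the parser explicitly to avoid a warning
links = soup.find_all("a", class_="title")  # every <a> tag with the CSS class "title"
data = {}
for link in links:
    title = link.string  # the link text, i.e. the product name
    data[title] = link["href"]  # the product page's path, relative to the site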
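Because the live site's markup can change, here is an offline sketch that runs the same parsing step on a small HTML string. The markup below is hypothetical, loosely modeled on a product listing, and is not copied from the test site. It illustrates two details of the snippet: the second positional argument of find_all filters on the CSS class, so find_all("a", "title") and find_all("a", class_="title") select the same tags; and link.string returns None when a tag has more than one child node, in which case get_text() is the safer way to read the text. It needs beautifulsoup4 installed as described in Prerequisites.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product listing (an assumption).
html = """
<div class="caption">
  <a class="title" href="/test-sites/e-commerce/allinone/product/326">MSI GL62VR 7RFX</a>
  <a class="title" href="/product/999"><b>Demo</b> laptop</a>
  <a class="nav" href="/other">Not a product</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

data = {}
# class_="title" skips the "nav" link; passing "title" positionally is equivalent.
for link in soup.find_all("a", class_="title"):
    # .string would be None for the second link (it has two child nodes),
    # so get_text() joins all nested text with a space and strips whitespace.
    title = link.get_text(" ", strip=True)
    data[title] = link["href"]

print(data)
```

On the real test site the product links appear to contain plain text only, which is why link.string works in the nine-line version.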
That is the full snippet. You can paste it directly into a Python shell in your terminal, your favorite text editor, or a Jupyter notebook.
To check that it worked, inspect data (with print(data), or by evaluating data in a notebook cell). Its contents should look similar to:
{'MSI GL62VR 7RFX': '/test-sites/e-commerce/allinone/product/326', 'Dell Vostro 15…': '/test-sites/e-commerce/allinone/product/283', 'Dell Inspiron 17…': '/test-sites/e-commerce/allinone/product/296'}
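The hrefs collected this way are relative paths. If you need full URLs, for example to request each product page next, one option is to resolve them against the page you scraped with urljoin from Python's standard library. A minimal sketch, assuming data holds the dictionary from the sample output above (copied here so the example runs on its own):

```python
from urllib.parse import urljoin

# The page that was scraped; relative hrefs are resolved against it.
base_url = "https://www.webscraper.io/test-sites/e-commerce/allinone"

# Sample output from above (titles truncated exactly as shown there).
data = {
    "MSI GL62VR 7RFX": "/test-sites/e-commerce/allinone/product/326",
    "Dell Vostro 15…": "/test-sites/e-commerce/allinone/product/283",
    "Dell Inspiron 17…": "/test-sites/e-commerce/allinone/product/296",
}

# An href starting with "/" keeps the base URL's scheme and host
# but replaces its entire path.
absolute = {title: urljoin(base_url, href) for title, href in data.items()}

for title, url in absolute.items():
    print(f"{title}: {url}")
```

For instance, the MSI entry resolves to https://www.webscraper.io/test-sites/e-commerce/allinone/product/326.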

That's it

Web scraping is great and can save you plenty of time when you want to quickly extract data from websites. The example above is meant to help you get started quickly. Of course, there is more to it than what I showed here, e.g. crawling, pagination, inspecting the DOM, authentication, and cookies. This is only the tip of the iceberg 😉.
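As one illustration of going further, here is a hedged sketch of pagination: it repeats the same find_all step on each page and follows a "next" link until there is none. The helper name scrape_paginated, the fetch parameter, and the rel="next" selector are all illustrative assumptions, not part of the test site or of the code above, so check the real pager markup in your browser's developer tools first. Passing fetch in as a parameter keeps the parsing logic testable offline; with requests you could pass lambda url: requests.get(url, timeout=10).text.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def scrape_paginated(start_url, fetch, max_pages=10):
    """Collect {title: absolute URL} across pages by following "next" links.

    Hypothetical helper. fetch is any callable that takes a URL and
    returns that page's HTML as a string.
    """
    data = {}
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # stops loops where pages link back to each other
        soup = BeautifulSoup(fetch(url), "html.parser")
        for link in soup.find_all("a", class_="title"):
            data[link.get_text(" ", strip=True)] = urljoin(url, link["href"])
        # Assumption: the pager marks its next-page link with rel="next".
        next_link = soup.find("a", rel="next")
        url = urljoin(url, next_link["href"]) if next_link else None
    return data


# Offline usage: two made-up pages standing in for real HTTP responses.
pages = {
    "https://example.com/page/1": (
        '<a class="title" href="/item/1">A</a>'
        '<a rel="next" href="/page/2">Next</a>'
    ),
    "https://example.com/page/2": '<a class="title" href="/item/2">B</a>',
}
print(scrape_paginated("https://example.com/page/1", lambda url: pages[url]))
```

The max_pages cap and the seen set are simple safety limits; a real crawler would usually also pause between requests.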
Thanks for reading! Originally published on The Startup.

Published by HackerNoon on 2019/10/02