Scraping Websites That Require Login In Python

Hi, one of the things I enjoy doing with my Python skills is web scraping. I actually find it fun, and I once wrote a blog post here on how I scraped Jumia.

For me, it's like magic, but recently I have encountered websites that require a login, and you can't do much on them until you create a session.

Wait, What is Scraping?

Web scraping refers to the extraction of data from a website. You can later dump this data in the form of JSON, CSV, plain text, etc. Web scraping has been around for a while, but it has gone by different names which, I think, all refer to the same thing.
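To make the "dump this data" part concrete, here is a minimal sketch using only the standard library. The records list is made up for illustration and stands in for whatever your scraper actually extracts; the helper names to_json and to_csv are my own, not part of any library.

```python
import csv
import io
import json

# Hypothetical records standing in for whatever a scraper extracted.
records = [
    {"name": "Console", "price": "499.00"},
    {"name": "Controller", "price": "69.00"},
]


def to_json(rows):
    """Serialise a list of dicts to a JSON string."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialise a non-empty list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)
```

From here you could write json_text or csv_text to a file with a plain open(..., "w") call.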

Some call it web mining or web data extraction, but I believe these terms point to the same thing. Data mining might differ slightly.

There are vast uses of the extracted data.

Solution:

I tried some curl-based options from the Network tab of Chrome DevTools, but they did not yield a viable solution. I later found out that I could use requests to start a session and post a payload.

You could also try out other web scraper apps, Selenium, and other options available online, but for your own learning and production code, I recommend this approach.

Before you begin, you will need two things in order to post your credentials to a site and log in:
The name attributes of the form fields you want to send data to
The URL that the form actually posts the data to on the backend

I will use Walmart to demonstrate this:

Go to the Walmart Sign In page

Right-click and choose Inspect

Get the name attributes for the required fields:

For Walmart, it is email for the email address

And password for the password
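If you would rather not read the name attributes by eye, the same lookup can be sketched in code with the standard library's html.parser. The HTML below is a made-up login form, not Walmart's real markup, and the class name FormFieldCollector is hypothetical. The hidden csrf_token field is also an assumption: many sites include a hidden field like this, and it usually has to be sent along with the login payload.

```python
from html.parser import HTMLParser

# Made-up login form for illustration; a real page will differ.
SAMPLE_HTML = """
<form action="/account/login" method="post">
  <input type="email" name="email">
  <input type="password" name="password">
  <input type="hidden" name="csrf_token" value="abc123">
  <button type="submit">Sign in</button>
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect the form's action URL and the name/value of every named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attrs["name"]] = attrs.get("value", "")


collector = FormFieldCollector()
collector.feed(SAMPLE_HTML)
print(collector.action)
print(collector.fields)
```

The keys of collector.fields are the names your payload needs, and collector.action tells you where the form posts to.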

Code:

import requests

# Start a session so cookies set at login persist across requests
session = requests.Session()

# Create the payload; the keys must match the form's name attributes
payload = {
    'email': '<YOUR EMAIL>',
    'password': '<YOUR PASSWORD>',
}

# Post the payload to the site to log in
login_response = session.post("https://www.walmart.com/account/login?vid=oaoh", data=payload)

# Navigate to the next page and scrape the data
page = session.get('https://www.walmart.com/cp/playstation-5/3475115')
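One thing the snippet above does not do is check whether the login actually worked. A 200 status code alone does not prove it, since many sites return 200 with an error page. Here is a rough sketch of one way to check, under my own assumptions: the function name login_and_fetch is hypothetical, and marker is some text you expect to see only when logged in (for example a "Sign Out" link), which you would have to find for your site yourself. The session argument is anything with post and get methods, such as a requests.Session().

```python
def login_and_fetch(session, login_url, payload, target_url, marker):
    """Log in, fetch target_url, and report whether `marker` appears in the page.

    `session` is expected to behave like requests.Session: post() and get()
    return objects with a .text attribute and a raise_for_status() method.
    """
    login_response = session.post(login_url, data=payload, timeout=10)
    # With requests, this raises requests.HTTPError on a 4xx/5xx status.
    login_response.raise_for_status()

    page = session.get(target_url, timeout=10)
    page.raise_for_status()

    # The marker check is a heuristic, not a guarantee of a valid login.
    return page.text, marker in page.text
```

You would call it with requests.Session() as the first argument, plus the same login URL, payload, and page URL used earlier. If the second return value is False, the site probably did not accept the login, and there is little point in parsing the page.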

GitHub Repo

Now start scraping with your own code. See my Jumia Scraper.

I will be building a simple web crawler in the next post; consider subscribing to stay posted. I will use the power of requests and Beautiful Soup to achieve this.

That's it!

If you enjoyed reading, consider subscribing and showing this post some love by sharing and commenting. Any criticism is very welcome.

Follow me on Twitter:

Ronnie Atuhaire