Instagram Scraper: How to Scrape Data From Instagram [2023]

Written by dameskik | Published 2021/03/02
Tech Story Tags: instagram-marketing | growth-hacking | ecommerce | marketing | instagram | entrepreneur | digital-marketing | how-to-scrape-instagram


In this article, we'll cover how to build your own Instagram data scraping tool.
You should know that building a scraper takes some technical skill. If you're not a tech person, don't have the time or resources, or want to stay 100% on the legal side of things, use a service like influencers.club.
You can simply order targeted emails from the followers of a profile (probably a competitor) or from a relevant hashtag.

You can also use their database of 50M+ Instagram profiles to find people by keywords in bio.

Important note: Please be advised that automatically accessing Instagram is against their terms of service.

What's Instagram Scraping?

Instagram scraping means gathering publicly available data from Instagram users, usually automatically. The process may involve scraping tools, Instagram scraping services, or manual extraction. You can scrape data such as email addresses, phone numbers, images, bios, likes, comments, etc.

Is Instagram Scraping Legal?

While Instagram forbids any kind of crawling, scraping, or caching of its content, the practice is not regulated by law. This means that if you scrape data from Instagram you may get your account banned, but there are no legal repercussions.
So let's begin with a general overview of the components you'll need for Instagram scraping.

1. Scrape Using the Unofficial Instagram API

The official Instagram API was disabled on June 29, 2020, and that's OK because it was useless when you needed data like emails, phone numbers, bios, etc.
Instagram's mobile app, however, talks to Instagram's servers through an unofficial API (known as the mobile endpoints). So, with the help of open-source software and by intercepting traffic, we can see how that API works and use it for data scraping.

2. Instagram Profiles

Next, we need Instagram profiles that will simulate human behavior on Instagram's mobile app while gathering data. The number of profiles you need depends on the amount of data you want to collect. Instagram has a small API call limit (that's constantly decreasing), currently 200 calls per day per profile.
So if you want to scrape the followers of an influencer with 50k fans, you would need 50 Instagram profiles scraping for 5 days (50 profiles × 200 calls × 5 days = 50,000 calls).
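The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes, as the numbers in this section imply, one API call per follower record and a limit of 200 calls per day per profile. The function name `profiles_needed` and its parameters are made up for illustration.

```python
import math


def profiles_needed(followers, days, calls_per_day=200, records_per_call=1):
    """Estimate how many profiles are needed to collect `followers` records in `days` days."""
    # Total calls required (assumption: each call yields `records_per_call` records).
    total_calls = math.ceil(followers / records_per_call)
    # Calls a single profile can make over the whole period.
    calls_per_profile = calls_per_day * days
    return math.ceil(total_calls / calls_per_profile)


print(profiles_needed(50_000, days=5))  # 50
```

If the daily limit drops, pass a lower `calls_per_day` to see how quickly the number of required profiles grows.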
Two important things to remember when purchasing Instagram profiles for scraping:
  • ALWAYS use aged Instagram profiles 
  • NEVER use your personal profile
You can purchase Instagram profiles from:
  • Facebook pages
  • Instagram direct messages
  • and even on dedicated online marketplaces
But even if you manage to buy and log in with all those profiles, you'll still face many challenges. Instagram is pretty smart and can recognize profiles originating from the gray market. However, some sellers are really good at creating fake profiles that are hard to detect. I'd suggest searching for the most expensive sellers on this market.

3. Proxies for Remaining Undetected

A proxy is a third-party server that lets you route your requests through it and use its IP address in the process. When you use a proxy, Instagram no longer sees your IP address but the proxy's, which lets you do all the scraping from one server. Don't put too many profiles behind one IP, though, because logging in more than 5 profiles on the same IP is a huge no-no.
Proxies come with the same problem as Instagram profiles. Instagram detects thousands of proxy providers, and until you find a good one you can face a lot of trouble.
If Instagram bans the proxy you use, the associated Instagram profile is automatically no longer available either. To check whether you are safe and your proxy provider is still off the radar, use this website. If it's a known provider it will be listed there, and since this website knows, trust me, the all-seeing Zuckerberg eye knows too.
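The "no more than 5 profiles per IP" rule is easy to enforce in code. Here is a minimal sketch, assuming profiles and proxies are kept as plain lists of strings. The function name `assign_profiles_to_proxies`, the profile names, and the proxy addresses are all placeholders.

```python
def assign_profiles_to_proxies(profiles, proxies, max_per_proxy=5):
    """Map each proxy to at most `max_per_proxy` profiles, in list order."""
    if len(profiles) > len(proxies) * max_per_proxy:
        raise ValueError("not enough proxies for this many profiles")
    assignment = {}
    for i, proxy in enumerate(proxies):
        # Take the next slice of up to `max_per_proxy` profiles for this proxy.
        chunk = profiles[i * max_per_proxy:(i + 1) * max_per_proxy]
        if chunk:
            assignment[proxy] = chunk
    return assignment


# Placeholder data for illustration only.
profiles = [f"profile_{n}" for n in range(12)]
proxies = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

plan = assign_profiles_to_proxies(profiles, proxies)
for proxy, group in plan.items():
    print(proxy, len(group))
```

Keeping the mapping fixed also means that when a proxy gets banned, you know exactly which profiles went down with it.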

Pros and Cons of Building an Instagram Data Scraper

The benefits of having an Instagram scraper in-house are:
  • Full control of the whole process
  • The contact data you acquire can be resold or rented
  • You can use the data to scale your business
However, there are also some serious drawbacks:
  • No targeting or segmentation once you have the data
  • In clear violation of Instagram’s ToS
  • Fake Accounts and Bots
  • Invalid Emails, Spam Traps, Catch-all
  • Security Risks
  • Very limited data points

How Can You Scrape Data From Instagram Followers or Users?

You can use Python (GitHub) to build your own Instagram scraper or buy Instagram users' data from Influencers Club.

Scraping Instagram With Python (GitHub)

To scrape Instagram with Python you can use a tool like Instagramy. It is built specifically for Instagram and can analyze the data through Pandas.
Instagramy scrapes Instagram quickly and easily. Install the package by running the following command; how fast it scrapes will depend on your network connection.
pip install instagramy
Example 1: Scraping basic details
from instagramy import Instagram

# Connecting the profile
user = Instagram("geeks_for_geeks")

# printing the basic details like
# followers, following, bio
print(user.is_verified())
print(user.popularity())
print(user.get_biography())

# return list of dicts
posts = user.get_posts_details()

print('\n\nLikes', 'Comments')
for post in posts:
    likes = post["likes"]
    comments = post["comment"]
    print(likes, comments)
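Once you have the list of post dicts, a short summary is often more useful than the raw numbers. The sketch below assumes each post is a dict with the `"likes"` and `"comment"` keys used in Example 1. The sample data and the helper name `summarize_posts` are made up for illustration, and the block doesn't touch Instagram at all.

```python
from statistics import mean


def summarize_posts(posts):
    """Return simple engagement figures for a list of post dicts."""
    likes = [post["likes"] for post in posts]
    comments = [post["comment"] for post in posts]
    total_likes = sum(likes)
    return {
        "posts": len(posts),
        "avg_likes": mean(likes),
        "avg_comments": mean(comments),
        # Guard against division by zero when no post has any likes.
        "comments_per_like": sum(comments) / total_likes if total_likes else 0.0,
    }


# Made-up sample shaped like the dicts printed in Example 1.
sample_posts = [
    {"likes": 120, "comment": 8},
    {"likes": 80, "comment": 4},
    {"likes": 100, "comment": 6},
]

summary = summarize_posts(sample_posts)
print(summary)
```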
Example 2: Analysing the data
from instagramy import Instalysis 

# Instagram user_id of IPL teams
teams = ["chennaiipl", "mumbaiindians",
         "royalchallengersbangalore", "kkriders",
         "delhicapitals", "sunrisershyd",
         "kxipofficial"]

data = Instalysis(teams) 

# return the dataframe 
data_frame = data.analyis() 
data_frame

How to Scrape Likes From Instagram

Unfortunately, Instagram doesn't let you export the people who liked a certain post or multiple posts. However, they can be crawled and scraped with this code:
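# Assumption: `api` is a client from the unofficial InstagramAPI package
# (LevPasha's Instagram-API-python), whose method names match the calls below.
from InstagramAPI import InstagramAPI

api = InstagramAPI("your_username", "your_password")  # placeholder credentials

def get_likes_list(username):
    """Return the users who liked the most recent post of `username`."""
    api.login()
    api.searchUsername(username)
    username_id = api.LastJson['user']['pk']  # Get user ID
    api.getUserFeed(username_id)  # Get user feed
    media_id = api.LastJson['items'][0]['id']  # Get most recent post
    api.getMediaLikers(media_id)  # Get users who liked
    users_list = []
    for user in api.LastJson['users']:  # Push users to list
        users_list.append({'pk': user['pk'], 'username': user['username']})
    return users_list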

Scrape Emails From Instagram Accounts | Instagram Email Scraper

To scrape emails from Instagram you need to log in with an Instagram account from a specific proxy. To get the email addresses, request this endpoint for each user: /api/v1/users/{{user_id}}/info/
You can use this GitHub Repo to find all samples.
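To make the endpoint a bit more concrete, here is a small offline sketch that fills in the path for a given user ID and pulls an email out of an already-fetched response. The shape of the response and the `public_email` field name are assumptions, not something documented by Instagram, and the sample data is invented. The helper names `user_info_path` and `extract_email` are hypothetical.

```python
# Path template from the article, written as a Python format string.
USER_INFO_PATH = "/api/v1/users/{user_id}/info/"


def user_info_path(user_id):
    """Fill the numeric user ID into the endpoint path."""
    return USER_INFO_PATH.format(user_id=int(user_id))


def extract_email(response_json, field="public_email"):
    """Return the email from a user-info response, or None if absent.

    Assumption: the response looks like {"user": {...}} and the email,
    when present, sits under `field`.
    """
    user = response_json.get("user", {})
    return user.get(field) or None


# Invented sample response for illustration only.
sample = {"user": {"pk": 12345, "username": "example_shop",
                   "public_email": "hello@example.com"}}

print(user_info_path(12345))
print(extract_email(sample))
```

Profiles without a public contact email simply return `None`, so you can filter them out before doing anything else with the list.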

Scrape Images From Instagram Users

A lot of you folks want to export your own or someone else's Instagram photos. In my experience that is super hard to pull off, since the photos need to be scraped from the web (not the app). But it's doable!
Here's the exact GitHub that you can use to build your own image scraper:

Instagram Scraper Tool Online

Scraping data from Instagram can get very messy, since 95 million profiles on the platform are fake accounts or bots. That's why, if you plan on scraping Insta for contact information like emails or phone numbers, it's best to use a scraping service. These types of services will extract any data you like, but also clean and filter the list so you only end up with people you want to reach out to.
If you're a regular IG user or a small influencer who wants to export your own followers, just search for a cheap scraping tool. But for companies that plan on using the data for advertising purposes, I suggest Influencers Club. They are currently the market leader and offer filtering options that you don't get anywhere else (age, gender, location, interests and more).

Published by HackerNoon on 2021/03/02