Intro to Pandas -1: An absolute beginner's guide to Machine Learning and Data Science.

Written by rakshithvasudev | Published 2017/10/16
Tech Story Tags: data-science | machine-learning | pandas | python-pandas | pandas-series


Pandas is hands down one of the best libraries in Python. It supports reading and writing Excel spreadsheets and CSV files, along with a whole lot of data manipulation. It is close to mandatory if you're dealing with datasets from Excel and CSV files, i.e. for machine learning and data science.

This is part one of the pandas tutorial. I'm not going to cover everything that is possible with pandas; I want to give you a taste of what it is and how you can get started with it. This tutorial is going to be super short, just introducing you to the Series object of pandas.

As with other libraries, you import pandas and give it the alias pd.

import pandas as pd

We're officially telling Python that pandas will henceforth be referred to as pd.

If you like trance music, I'm positive you've heard of the songs in this list.

# let's create a list of songs.
songs = ['In the name of love','Scream','Till the sky falls down','In and out of Love']

# let's also create a list of corresponding artists. FYI: 'MG' stands
# for Martin Garrix, 'TI' for Tiesto, 'DB' for Dash Berlin, 'AV' for
# Armin Van Buuren.
artists = ['MG','TI','DB','AV']

# likewise let's create a dictionary that contains artists and songs.
song_arts = {'MG':'In the name of love','TI':'Scream','DB':'Till the sky falls down','AV':'In and out of Love'}

How do I create a table-like structure using these lists? With pd.Series().

pd.Series() is the constructor that creates a Series object from the data passed to it. The data is supplied through the data parameter.

# create a Series object whose data is coming from songs list.
ser_num = pd.Series(data=songs)
ser_num

====================================================================
0        In the name of love
1                     Scream
2    Till the sky falls down
3         In and out of Love
dtype: object

So, what is a “Series” object in Pandas?

It is a data structure defined by pandas: a one-dimensional array of values, each paired with an index label. When printed, it looks like a small table with rows and two columns.

0        In the name of love
1                     Scream
2    Till the sky falls down
3         In and out of Love

Notice that the numbers in the first column were added automatically by pandas. They serve as the index.

The first column holds the indices of the series and the second column holds its values.
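To make the index/values split concrete, here is a minimal sketch that rebuilds the same series and inspects it. The variable names mirror the ones above; the checks on shape and index are additions for illustration, not part of the original walkthrough.

```python
import pandas as pd

songs = ['In the name of love', 'Scream',
         'Till the sky falls down', 'In and out of Love']

ser_num = pd.Series(data=songs)

# A Series is one-dimensional: one value per index label.
print(ser_num.shape)        # (4,)

# With no index given, pandas numbers the rows 0..n-1.
print(list(ser_num.index))  # [0, 1, 2, 3]

# Each index label maps to exactly one value.
print(ser_num[3])           # In and out of Love
```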

Suppose you want to access 'In and out of Love'. How would you do that?

# get the element that corresponds to index 3.
ser_num[3]
====================================================================
'In and out of Love'

What if you want the artist's name to be the index of the song?

# make artists the index this time.
ser_art = pd.Series(data=songs,index=artists)
ser_art
====================================================================
MG        In the name of love
TI                     Scream
DB    Till the sky falls down
AV         In and out of Love
dtype: object

This time, instead of numbers, the artists' names are used as the index. But how? Notice that this time we additionally passed artists as the index parameter to pd.Series().
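One detail worth knowing about the index parameter is that a list index needs exactly one label per value. The sketch below is an illustration under that assumption; the deliberately short index and the length_error variable are hypothetical additions, not part of the original tutorial.

```python
import pandas as pd

songs = ['In the name of love', 'Scream',
         'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']

# Four values, four labels: this works.
ser_art = pd.Series(data=songs, index=artists)
print(list(ser_art.index))  # ['MG', 'TI', 'DB', 'AV']

# Four values but only two labels: pandas refuses to guess the pairing.
try:
    pd.Series(data=songs, index=['MG', 'TI'])
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)  # ValueError
```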

How do you access elements through the custom index, i.e. how do you get a song by its artist's name?

Just pass the name of the artist and you get their song.

ser_art['MG']
====================================================================
'In the name of love'

ser_art['AV']
====================================================================
'In and out of Love'

ser_art['DB']
====================================================================
'Till the sky falls down'

It is kind of like accessing elements of a dictionary. There you pass the 'key'; here, with a series, you pass the 'index' to retrieve elements.
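The dictionary comparison goes a little further than square brackets. As a rough sketch, the snippet below tries a few dictionary-style operations on the same series; the label 'XX' and the default value 'unknown' are made up for the example.

```python
import pandas as pd

songs = ['In the name of love', 'Scream',
         'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']
ser_art = pd.Series(data=songs, index=artists)

# `in` checks the index labels, just as it checks the keys of a dict.
has_mg = 'MG' in ser_art
has_xx = 'XX' in ser_art
print(has_mg, has_xx)  # True False

# .get() returns a default instead of raising KeyError for a missing label.
missing = ser_art.get('XX', 'unknown')
print(missing)         # unknown

# A Series can be converted back to a plain dictionary.
as_dict = ser_art.to_dict()
print(as_dict['TI'])   # Scream
```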

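Position numbers still work too. On the pandas version used when this was written, ser_art[0] returned the first element; newer releases deprecate integer positions inside square brackets when the index holds labels, so .iloc is the safer spelling.

ser_art.iloc[0]
====================================================================
'In the name of love'

ser_art.iloc[2]
====================================================================
'Till the sky falls down'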
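For readers who want to be explicit about whether they mean a label or a position, pandas provides the .loc and .iloc accessors. The following is a small sketch using them on the same series; the slices are an extra illustration beyond what the tutorial covers.

```python
import pandas as pd

songs = ['In the name of love', 'Scream',
         'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']
ser_art = pd.Series(data=songs, index=artists)

# .loc looks up by label, .iloc by integer position.
by_label = ser_art.loc['DB']
by_position = ser_art.iloc[2]
print(by_label == by_position)  # True

# Negative positions count from the end, as with Python lists.
last_song = ser_art.iloc[-1]
print(last_song)                # In and out of Love

# Label slices include the end label; position slices exclude the end position.
label_slice = ser_art.loc['MG':'DB']   # rows MG, TI, DB
position_slice = ser_art.iloc[0:2]     # rows MG, TI
print(len(label_slice), len(position_slice))  # 3 2
```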

Great! Seems interesting. But how do you create a Series object from a dictionary?

It's as simple as passing the dictionary to pd.Series(), like so:

ser_dict = pd.Series(song_arts)
ser_dict
====================================================================
AV         In and out of Love
DB    Till the sky falls down
MG        In the name of love
TI                     Scream
dtype: object

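pandas elegantly created a Series object by taking the dictionary's keys as the series's indices and its values as the series's values. The keys appear sorted in the output above because older pandas versions sorted them; recent versions keep the dictionary's insertion order.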
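When the data is a dictionary, the index parameter plays a slightly different role: it selects and orders keys rather than supplying new labels. Here is a minimal sketch of that behavior; the subset and with_gap variables and the label 'XX' are hypothetical names chosen for the example.

```python
import pandas as pd

song_arts = {'MG': 'In the name of love', 'TI': 'Scream',
             'DB': 'Till the sky falls down', 'AV': 'In and out of Love'}

# Keys become the index, values become the values.
ser_dict = pd.Series(song_arts)
print(ser_dict['MG'])  # In the name of love

# With a dict, `index` picks which keys to keep and in what order.
subset = pd.Series(song_arts, index=['AV', 'MG'])
print(list(subset.index))  # ['AV', 'MG']

# A label that is not a key in the dict gets a missing value.
with_gap = pd.Series(song_arts, index=['AV', 'XX'])
print(pd.isna(with_gap['XX']))  # True
```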

Accessing elements works just like before.

ser_dict['TI']
====================================================================
'Scream'

ser_dict['DB']
====================================================================
'Till the sky falls down'

What if I want to get all the indices and all the values separately from a Series object?

A Series object has index and values attributes that return just the indices and just the values of that series.

# get the indices only
ser_art.index
====================================================================
Index(['MG', 'TI', 'DB', 'AV'], dtype='object')

# get only values of the series
ser_art.values
====================================================================
array(['In the name of love', 'Scream', 'Till the sky falls down',
       'In and out of Love'], dtype=object)
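If plain Python objects are more convenient than an Index or an array, the same information can be pulled out as lists. This is a small sketch of one way to do it; the labels, values and pairs names are illustrative only.

```python
import pandas as pd

songs = ['In the name of love', 'Scream',
         'Till the sky falls down', 'In and out of Love']
artists = ['MG', 'TI', 'DB', 'AV']
ser_art = pd.Series(data=songs, index=artists)

# Index labels as a plain Python list.
labels = ser_art.index.tolist()
print(labels)    # ['MG', 'TI', 'DB', 'AV']

# Values as a plain Python list.
values = ser_art.tolist()
print(values[1])  # Scream

# items() yields (index, value) pairs, much like dict.items().
pairs = list(ser_art.items())
print(pairs[0])   # ('MG', 'In the name of love')
```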

This is just the tip of the iceberg of what can be done with series. We'll cover more of pandas in the next tutorial.

Here's a video tutorial explaining everything I did, if you prefer to learn by video.

Stay tuned. There's going to be a follow-up tutorial with more content on pandas.

If you want to learn numpy, I wrote an article titled “Introduction to Numpy -1 : An absolute beginners guide to Machine Learning and Data science.” Check it out.

If you liked this article, a clap/recommendation would be really appreciated. It helps me to write more such articles.


Published by HackerNoon on 2017/10/16