WTF are Python Generators?

Written by jeff-ridgeway | Published 2020/05/09
Tech Story Tags: python3 | programming | codenewbie | generator | 100daysofcode | python-top-story | python-generators-explained | what-are-python-generators

TLDR Python generators are a way of returning values as you need them vs. all at once. Generators operate in a similar vein as Python’s readline(size) method: they evaluate values only as needed (for readline(size), “as needed” means at most size characters, or up to the next newline character when size is omitted). When you re-enter a generator, it picks up right where it left off; this is also called saving/accessing the state of the generator object.

While this post seeks to break down the concept of a generator, it assumes you are comfortable with the basics of programming, or in this case Python programming. Thanks!
I would venture to say that generators can make your code more efficient, easier to maintain, and easier on memory. Interestingly enough, I didn’t start using Python generators heavily until after college, in my software engineering career. Up until that point, I mostly loaded everything into memory without caring about efficiency and performance (we were all beginners at some point 😂). Now, the tide has turned and I have seen the glory of generators.

Return as needed, please

Let’s assume you have a file named dummy.txt and you want to read in and print what’s in the file. There are two ways to tackle this problem: you could read the whole file into memory, or you could read it line by line.

# dummy.txt contains the following lines
# Hello, this is dummy.txt
# I want to show you that generators are the way to go!
# You do not have to use generators all the time
# But they sure do help when you need them 

# Option 1: load the whole file into memory at once
with open("dummy.txt", "r") as txtfile:
    lines_in_file = txtfile.readlines()

print(lines_in_file)
# Output:
# ['Hello, this is dummy.txt\n',
#  'I want to show you that generators are the way to go!\n',
#  'You do not have to use generators all the time\n',
#  'But they sure do help when you need them\n']

# Option 2: read only a limited number of characters (here, 10) into memory
with open("dummy.txt", "r") as txtfile2:
    line_in_file = txtfile2.readline(10)

print(line_in_file)
# Output:
# Hello, thi
Looking at this example, the readlines() method loads ALL the contents of the .txt file into memory via the variable lines_in_file. The alternative approach loads only a subset into the variable line_in_file using readline(size). The size parameter is the maximum number of characters to read (bytes, if the file is opened in binary mode); if size is omitted or negative, readline() reads up to and including the next newline character.
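To see these size rules in action without needing dummy.txt on disk, here is a small sketch that uses io.StringIO (an in-memory text stream from the standard library) as a stand-in for the file; its two lines mirror the start of dummy.txt.

```python
import io

# io.StringIO stands in for dummy.txt so this runs without a file on disk
fake_file = io.StringIO(
    "Hello, this is dummy.txt\n"
    "I want to show you that generators are the way to go!\n"
)

first_piece = fake_file.readline(10)  # at most 10 characters
rest_of_line = fake_file.readline()   # no size: read through the next '\n'
nothing = fake_file.readline(0)       # size 0: reads nothing at all
next_line = fake_file.readline()      # the whole second line

print(repr(first_piece))   # 'Hello, thi'
print(repr(rest_of_line))  # 's is dummy.txt\n'
print(repr(nothing))       # ''
print(repr(next_line))     # 'I want to show you that generators are the way to go!\n'
```

Notice that the second call picks up exactly where the first one stopped, in the middle of the line.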
While dummy.txt was only 4 lines of text, what if you had a text file with hundreds of thousands of lines that you had to parse and then do more computation on? Or what about a data structure you are working with whose size you don’t know? Do you want to load all of it into memory?!?! While you could use readlines(), it is safer and more memory-efficient to call readline(size) (for example, in a loop), as you only ever hold a full line, or a piece of a line, in memory at a time.
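As a minimal sketch of the line-at-a-time approach, the loop below calls readline() until it returns an empty string, which signals end of file. The io.StringIO object is only a stand-in for a large file (building it does put everything in memory, which a real file on disk would not), and the 100,000-line size is an arbitrary assumption.

```python
import io

# Stand-in for a large file; with a real file you would use open("some_file.txt")
big_file = io.StringIO("".join(f"line {i}\n" for i in range(100_000)))

line_count = 0
total_chars = 0
while True:
    line = big_file.readline()  # holds only the current line
    if not line:                # '' means end of file
        break
    line_count += 1
    total_chars += len(line)

print(line_count)  # 100000
```

In everyday Python, writing for line in big_file: does the same job more idiomatically, because file objects already hand out their lines one at a time.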
How does this relate to generators? Well, generators operate in a similar vein as readline(size). They evaluate values only as necessary (in our case, “necessary” means reading up to size characters or reaching a terminating newline character), not all at once like readlines().
Note: neither readline(size) nor readlines() is a generator. They are used here for illustration purposes only.
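Even though readline() itself is not a generator, it is easy to wrap it in one. The sketch below uses a hypothetical helper named read_lines (not part of Python) that yields one line per next() call, so nothing is read until you ask for it; io.StringIO again stands in for a real file.

```python
import io

def read_lines(file_obj):
    """Hypothetical helper: yield one line at a time, without the trailing newline."""
    while True:
        line = file_obj.readline()
        if not line:      # '' means end of file
            return        # ending the function ends the generator
        yield line.rstrip("\n")

fake_file = io.StringIO("first\nsecond\nthird\n")
lines = read_lines(fake_file)  # nothing has been read yet

first = next(lines)      # 'first'
second = next(lines)     # 'second'
remaining = list(lines)  # ['third']
print(first, second, remaining)
```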

Bruh, what are generators and please please KISS (Keep it Simple Stupid)?

Generators are a way of returning values as you need them vs. all at once (handy if you have a relatively large data structure). If generators could talk they would say, “hey dude, I’ll give you what you need as you need it, but not everything at one time”. Sometimes it isn’t efficient to load all the contents of a data structure into a variable (especially one of unknown length), as you only have so much memory on your computer. Usually, it’s best to conserve your computational resources when you can, and generators can help a ton in that area. Let’s look at an implementation to get a fuller picture 😃.
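To put a rough number on that memory difference, here is a sketch comparing a list comprehension with the equivalent generator expression. The exact byte counts printed by sys.getsizeof depend on your Python version and platform, so treat them as ballpark figures rather than exact results.

```python
import sys

# The list comprehension builds all one million values up front...
squares_list = [n * n for n in range(1_000_000)]
# ...while the generator expression produces each value only when asked
squares_gen = (n * n for n in range(1_000_000))

# sys.getsizeof measures only the container object itself, and the exact
# numbers vary by Python version and platform
print(sys.getsizeof(squares_list))  # on the order of megabytes
print(sys.getsizeof(squares_gen))   # far smaller: no values are stored yet

# Both can be consumed the same way, e.g. with sum()
print(sum(squares_list) == sum(squares_gen))  # True
```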

Python generators


# generator function
def rand_statements():
    yield "this is the first statement"
    print("first statement has passed")
    yield "this is the second statement"

#create generator object from function
rand_genobject = rand_statements()

l1 = [1, 2]

for number in l1:
    print(number)
    print(next(rand_genobject))

# Output:
# 1
# this is the first statement
# 2
# first statement has passed
# this is the second statement
In the for loop above, the value from list l1 is printed first, and then next() is called on the generator object to print the value from the generator function. Notice that after the last value in l1 is printed (the number 2 in this case), the generator doesn’t start again at the beginning of the function. It kinda starts in the middle, right after the first yield statement.
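If you want to watch the generator pause and resume, the standard library’s inspect.getgeneratorstate() reports where a generator object is in its life cycle. The sketch below repeats rand_statements() from above and records the state after each step.

```python
import inspect

def rand_statements():
    yield "this is the first statement"
    print("first statement has passed")
    yield "this is the second statement"

gen = rand_statements()
state_new = inspect.getgeneratorstate(gen)     # 'GEN_CREATED': no code has run yet

first = next(gen)                              # runs up to the first yield, then pauses
state_paused = inspect.getgeneratorstate(gen)  # 'GEN_SUSPENDED'

second = next(gen)                             # resumes at the print() line
leftover = list(gen)                           # []: the function runs off the end
state_done = inspect.getgeneratorstate(gen)    # 'GEN_CLOSED'

print(state_new, state_paused, state_done)
```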

What’s going on here?

The key idea when creating your generator function is the use of yield. yield basically means “stop here, send the value that follows me to the caller (which in this case is the next(rand_genobject) call inside the for loop), and the next time I’m called, continue the function starting with the next line of code (which would be the print statement in rand_statements())”. This idea is very different from a return statement in a function. Every time a normal function is called, execution starts at the beginning of the function, and a return statement ends it for good, discarding its local variables. A generator function starts from where it left off. Similar to when you yield at a yield sign, the function pauses and then keeps going, giving you what you need when you need it (i.e. if there is another yield statement, the generator object will evaluate that next value). This is also called saving/accessing the state of the function when you re-enter it.
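Here is a small sketch of the return-versus-yield difference. Both functions (hypothetical names, made up for this example) try to count calls: the normal function starts over on every call, while the generator keeps its local variable count between next() calls.

```python
def count_with_return():
    count = 0
    count += 1
    return count        # the function ends; `count` is thrown away

def count_with_yield():
    count = 0
    while True:
        count += 1
        yield count     # pause here; `count` survives until the next next()

print(count_with_return(), count_with_return(), count_with_return())  # 1 1 1

counter = count_with_yield()
print(next(counter), next(counter), next(counter))  # 1 2 3
```

Because count_with_yield() loops forever, it never runs out of values. That is fine for a generator, since each value is only computed when next() asks for it.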
Calling next() on the generator object runs the function up to its next yield and hands back the yielded value, as if to say “hey, this is the value that was yielded, my guy”.
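One thing worth knowing: once a generator has no yield statements left, calling next() on it raises StopIteration. The sketch below reuses rand_statements() to show that, plus two common ways to avoid handling the exception yourself: passing a default value to next() and looping with for.

```python
def rand_statements():
    yield "this is the first statement"
    print("first statement has passed")
    yield "this is the second statement"

gen = rand_statements()
next(gen)
next(gen)

exhausted = False
try:
    next(gen)               # no yield left, so the function runs to its end
except StopIteration:
    exhausted = True
print(exhausted)  # True

fallback = next(gen, "no more statements")  # a default instead of an exception
print(fallback)   # no more statements

# A for loop calls next() for you and stops quietly at StopIteration
for statement in rand_statements():
    print(statement)
```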

Uh…Can we get a recap?!

For my TLDR readers and others who would appreciate a summary on generators:
  • Calling a generator function gives you a generator object (an iterator). Each value it yields is returned to the caller, and if (or when) the generator is called again, it “resumes” the function right after the yield statement.
  • When you approach a yield sign while driving, you stop momentarily (at least you should 🤣) and then keep going. The generator function is doing the same thing. Unlike a regular function, which runs from the top every time it is called and stops for good at return, the generator pauses at each yield and “keeps going” from there.
  • The way to access the next yielded value from the generator is to use next(generator_object_name); a for loop over the generator calls next() for you behind the scenes.
  • At a high level, that’s it! It’s not necessary to use generators for every use case imaginable in your Python programming, but it doesn’t hurt to make your programming more efficient by trying them out! My goal with this blog post was to give a general working knowledge and get the wheels spinning in your head (hopefully I did that 🤞🏾).
    Until next time! ✌🏾

Written by jeff-ridgeway | Software Engineer/Blogger
Published by HackerNoon on 2020/05/09