WTF are Python Generators?

While this post seeks to break down the concept of a generator, it assumes you are comfortable with the basics of programming, or in this case Python programming. Thanks!

I would venture to say that generators can make your code more efficient, easier to maintain, and lighter on memory. Interestingly enough, I didn’t start using Python generators heavily until after college, in my software engineering career. Up until that point, I loaded mostly everything into memory without caring about efficiency or performance (we were all beginners at some point 😂). Now the tide has turned, and I have seen the glory of generators.

Return as needed, please

Let’s assume you have a file named dummy.txt and you want to read in and print what’s in the file. There are two ways to tackle this problem: you could read the whole file into memory or you could read it line by line.


# dummy.txt contains the following lines
# Hello, this is dummy.txt
# I want to show you that generators are the way to go!
# You do not have to use generators all the time
# But they sure do help when you need them 

# Option 1: load the whole file into memory as a list of lines
with open("dummy.txt", "r") as txtfile:
    lines_in_file = txtfile.readlines()

print(lines_in_file)
# Output:
# ['Hello, this is dummy.txt\n',
#  'I want to show you that generators are the way to go!\n',
#  'You do not have to use generators all the time\n',
#  'But they sure do help when you need them\n']

# Option 2: read at most 10 characters of the first line
with open("dummy.txt", "r") as txtfile2:
    line_in_file = txtfile2.readline(10)

print(line_in_file)
# Output:
# Hello, thi

Looking at this example, readlines() loads ALL the contents of the .txt file into memory via the variable lines_in_file. An alternative approach loads a subset into the variable line_in_file using readline(size). The size parameter is the maximum number of characters to read from the current line (bytes, if the file is opened in binary mode). If size is omitted or negative, readline() reads through the end of the line, newline character included.

While dummy.txt was only 4 lines of text, what if you had a text file with hundreds of thousands of lines that you had to parse and then do more computation on? Or what about a data structure you are working with whose size you don’t know? Do you want to load all of it into memory?! While you could use readlines(), it is safer and more memory-friendly to use readline(size), as you only read a full line, or a piece of a line, at a time.

How does this relate to generators? Well, generators operate in a similar vein to readline(size). They produce a value only when it is needed (in our case, “needed” means up to the size limit or the terminating newline character), not all at once like readlines().

Note: neither readline(size) nor readlines() is a generator. They are used here only as an analogy.
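To tie the analogy back to real generators, here is a minimal sketch of a generator function that wraps readline(). The name read_lines_lazily and the temporary stand-in for dummy.txt are made up for illustration, so treat this as one possible way to do it rather than the way.

```python
import os
import tempfile


def read_lines_lazily(path):
    """Hypothetical helper: yield one line at a time from a text file."""
    with open(path, "r") as txtfile:
        while True:
            line = txtfile.readline()
            if line == "":  # readline() returns "" only at end of file
                break
            yield line


# Build a small stand-in for dummy.txt so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello, this is dummy.txt\n")
    tmp.write("But they sure do help when you need them\n")
    dummy_path = tmp.name

line_reader = read_lines_lazily(dummy_path)
first_line = next(line_reader)   # only the first line has been read so far
remaining = list(line_reader)    # pull the rest, one readline() at a time
os.remove(dummy_path)            # clean up the stand-in file

print(repr(first_line))  # 'Hello, this is dummy.txt\n'
print(remaining)         # ['But they sure do help when you need them\n']
```

Only one line is held by the generator at any moment, which is the whole point when the file is far bigger than four lines.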

Bruh, what are generators and please please KISS (Keep it Simple Stupid)?

Generators are a way of returning values as you need them vs. all at once (if you have a relatively large data structure). If generators could talk they would say, “hey dude, I’ll give you what you need as you need it, but not everything at one time”. Sometimes it isn’t efficient to load all the contents of a data structure into a variable (one especially of unknown length), as you only have so much memory on your computer. Usually, it’s best to conserve your computational resources when you can, and generators can help a ton in that area. Let’s look at implementation to get a fuller picture 😃.
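As a rough illustration of “what you need as you need it”, the sketch below compares a list that holds every value up front with a generator that has not computed anything yet. The function name count_up_to and the limit of 100,000 are assumptions picked for the example, and sys.getsizeof only measures the container object itself, so read the comparison as a ballpark and not a precise memory profile.

```python
import sys


def count_up_to(limit):
    """Hypothetical example: yield 0, 1, ..., limit - 1 one value at a time."""
    number = 0
    while number < limit:
        yield number
        number += 1


all_at_once = list(range(100_000))  # every value exists in memory right now
as_needed = count_up_to(100_000)    # nothing has been computed yet

list_size = sys.getsizeof(all_at_once)
generator_size = sys.getsizeof(as_needed)
print(list_size > generator_size)  # True

# Ask for just the first three values; the other 99,997 are never built.
first_three = [next(as_needed), next(as_needed), next(as_needed)]
print(first_three)  # [0, 1, 2]
```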

Python generators


# generator function
def rand_statements():
    yield "this is the first statement"
    print("first statement has passed")
    yield "this is the second statement"

# create a generator object from the function
rand_genobject = rand_statements()

l1 = [1, 2]

for number in l1:
    print(number)
    print(next(rand_genobject))

# Output:
# 1
# this is the first statement
# 2
# first statement has passed
# this is the second statement

In the for loop above, the value from the list l1 is printed first, and then next() is called on the generator object to print the next value from the generator function. Notice that after the last value in l1 is printed (the number 2 in this case), the generator doesn’t start at the beginning of the function. It picks up in the middle, right after the first yield statement.

What’s going on here?

The key idea when creating your generator function is the usage of yield. yield basically means “stop here and send this value to the caller (which in this case is the next(rand_genobject) call inside the for loop), and the next time I’m asked for a value, continue the function starting with the next line of code (which would be the print statement in rand_statements())”. This idea is very different from a return statement in a function. A return statement ends the call, so every time a normal function is called, execution starts over at the beginning of the function. The generator function picks up from where it left off. It is similar to yielding at a yield sign: the function pauses and then keeps going, giving you what you need when you need it (i.e. if there is another yield statement, the generator will run up to it and hand back that next value). This is also called saving the state of the function and restoring it when you re-enter.
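To see the return vs. yield difference side by side, here is a small sketch. The function names greet_with_return and greet_with_yield are invented for this comparison and are not from the example above.

```python
def greet_with_return():
    print("starting from the top")
    return "hello"


def greet_with_yield():
    print("starting from the top")
    yield "hello"
    print("picking up after the first yield")
    yield "hello again"


# Each call to a normal function runs the body from the first line.
first_call = greet_with_return()   # prints: starting from the top
second_call = greet_with_return()  # prints: starting from the top (again)

# A generator object remembers where it paused between next() calls.
greeter = greet_with_yield()
first_value = next(greeter)   # prints: starting from the top
second_value = next(greeter)  # prints: picking up after the first yield

print(first_call, second_call)    # hello hello
print(first_value, second_value)  # hello hello again
```

Note that calling greet_with_yield() by itself prints nothing. The body only starts running on the first next().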

The next() that is called on the generator object is saying “hey, run until your next yield and hand me that value, my guy”. Once there are no more yield statements to reach, next() raises StopIteration.
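Here is a short sketch of what happens when a generator runs out of values, using a trimmed copy of rand_statements (the print call is left out to keep the output short). It also shows that a for loop can do the next() calls for you.

```python
def rand_statements():
    # Trimmed copy of the earlier generator function, without the print.
    yield "this is the first statement"
    yield "this is the second statement"


rand_genobject = rand_statements()
print(next(rand_genobject))  # this is the first statement
print(next(rand_genobject))  # this is the second statement

# The generator is now exhausted, so another next() raises StopIteration.
try:
    next(rand_genobject)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# A for loop calls next() behind the scenes and stops cleanly at StopIteration.
collected = []
for statement in rand_statements():
    collected.append(statement)
print(len(collected))  # 2
```

In everyday code the for loop is usually the more comfortable option, since you never have to handle StopIteration yourself.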

Uh…Can we get a recap?!

For my TLDR readers and others who would appreciate a summary on generators:

Generators, via their objects (which are iterators), yield values back to the caller. If (or when) the generator is asked for another value, it “resumes” the state of the function right after the last yield statement.

When you approach a yield sign while driving, you stop momentarily (at least you should 🤣) and then keep going. The generator function is doing the same thing. Unlike a regular function, which starts from the top on every call, the generator “keeps going” from where it paused.

The way to access the yielded value from the generator is to use next(generator_object_name).

At a high level, that’s it! It’s not necessary to use generators for every use case imaginable in your Python programming, but it doesn’t hurt to try them out and make your programs more efficient! My goal with this blog post was to give you a general working knowledge and get the wheels spinning in your head (hopefully I did that 🤞🏾).

Until next time! ✌🏾