“March calendar” by Charles Deluvio 🇵🇭🇨🇦 on Unsplash
One of the fun parts of using serverless is that you can try out new ideas and provision them with the flick of a finger. I’ve mentioned more than once that S3 is a powerful tool that can be used as more than an elastic persistence layer.
Related post: “S3 the best of 2 worlds” (hackernoon.com). S3 can be used for more than just storing data; view novel ways to extend its functionality.
In this post, I’m going to demonstrate how to use S3 as a scheduling mechanism to execute various tasks.
Overview
Simple S3 flow
S3, alongside a Lambda function, creates a simple event-based flow. For example, attach a Lambda function to an S3 PUT event; whenever a new file is created, the function is called. To create a scheduled event, all you have to do is write the file you want to act upon at the designated time. However, AWS lets you create only recurring events, using a cron or rate expression. What happens when you want to schedule a one-time event? You are stuck.
The s3-scheduler library fills that gap: it uses S3 as a scheduling mechanism for one-time events.
How it works
Each event is a separate file. Behind the scenes, the library uses the recurring mechanism to wake up every minute and scan for the relevant files using S3’s filter capabilities. If a file’s scheduled time has passed, the library moves it to the relevant bucket and key.
In order to function properly, the library has to know the answer to three questions:
- What content to save,
- Where to save it (bucket + key) → will trigger the appropriate Lambda function, and
- When to move it to the appropriate bucket.
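To make the flow concrete, here is a minimal in-memory sketch of a single wake-up pass. It is not the library's code: plain dicts stand in for S3 buckets, and the names ScheduledEvent and run_once are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ScheduledEvent:
    content: bytes   # what to save
    bucket: str      # where to save it: the target bucket...
    key: str         # ...and the target key
    when: datetime   # when to move it


def run_once(pending, buckets, now):
    """Deliver every event whose time has passed; return the ones still waiting.

    `buckets` maps bucket name -> {key: content} and stands in for S3.
    """
    still_waiting = []
    for event in pending:
        if event.when <= now:
            # In real S3, this write is what would fire the Lambda attached to the PUT event.
            buckets.setdefault(event.bucket, {})[event.key] = event.content
        else:
            still_waiting.append(event)
    return still_waiting


buckets = {}
pending = [
    ScheduledEvent(b"hello", "s3-bucket", "s3_important_files/a.txt", datetime(2018, 8, 5)),
    ScheduledEvent(b"later", "s3-bucket", "s3_important_files/b.txt", datetime(2018, 9, 1)),
]
# One pass, one minute after the first event is due: only a.txt is delivered.
pending = run_once(pending, buckets, now=datetime(2018, 8, 5, 0, 1))
```

Calling run_once once a minute with the current time gives one-time delivery on top of a purely recurring trigger.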
Encoding details
The content to save is left unchanged; the where and the when (see above) are encoded in the key’s name, with | as a separator between the parts. For example, to copy the relevant content on the 5th of August to a bucket called s3-bucket and a folder named s3_important_files, the scheduler will produce the following file: 2018-08-05|s3-bucket|s3_important_files. By keeping the metadata outside the actual content, we achieve the following benefits:
- It speeds up the process, since there is no need to read the entire content to decide when and where to copy.
- It allows the content to be binary, not only text-based.
- By using S3’s filter capabilities, it reduces the cost of fetching the correct files.
- Debugging is easier: just view the file name to understand when and where to copy.
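The encoding can be sketched as a pair of helper functions. This is an illustration, not the library's implementation: encode_name and decode_name are hypothetical names, and the minute-resolution timestamp format is my assumption (the example above shows only a date).

```python
from datetime import datetime

SEPARATOR = "|"
# Assumption: minute resolution. The article's example shows only a date.
TIME_FORMAT = "%Y-%m-%dT%H:%M"


def encode_name(when, target_bucket, target_key):
    """Build the scheduling file name: <when>|<bucket>|<key>."""
    if SEPARATOR in target_bucket:
        raise ValueError("bucket name must not contain the separator")
    return SEPARATOR.join([when.strftime(TIME_FORMAT), target_bucket, target_key])


def decode_name(name):
    """Split a scheduling file name back into (when, bucket, key)."""
    # maxsplit=2 keeps any further separators inside the target key intact.
    when_text, target_bucket, target_key = name.split(SEPARATOR, 2)
    return datetime.strptime(when_text, TIME_FORMAT), target_bucket, target_key


name = encode_name(datetime(2018, 8, 5, 9, 30), "s3-bucket", "s3_important_files")
# name == "2018-08-05T09:30|s3-bucket|s3_important_files"
```

Because the timestamp comes first and has a fixed width, names sort chronologically, and a prefix such as 2018-08-05 selects a single day’s events without opening any file.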
Basic Usage
Installation
pip install s3-scheduler
Setting up a recurring flow
The library relies on the built-in AWS capability to run a function every minute. The configuration depends on your framework. For example, with Zappa, the schedule goes in the events list of zappa_settings.json, using a rate(1 minute) expression.
Scheduling
During initialization, the scheduler requires the bucket and the folder in which the actual scheduling details are kept. Remember that each event is a separate file, so the files need to be saved somewhere. The time to schedule is a simple datetime object.
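Conceptually, scheduling boils down to writing one file into the scheduler’s bucket and folder. The sketch below shows that idea with a boto3-style S3 client; the schedule function and its parameters are hypothetical and are not the s3-scheduler API, and the timestamp format is an assumption.

```python
from datetime import datetime


def schedule(s3_client, scheduler_bucket, scheduler_folder, when,
             target_bucket, target_key, content):
    """Write one pending-event file and return its key (illustrative only)."""
    # Assumed name layout: <when>|<target bucket>|<target key>
    name = "|".join([when.strftime("%Y-%m-%dT%H:%M"), target_bucket, target_key])
    pending_key = f"{scheduler_folder.strip('/')}/{name}"
    s3_client.put_object(Bucket=scheduler_bucket, Key=pending_key, Body=content)
    return pending_key


# Example (needs boto3 and AWS credentials; bucket and folder names are made up):
# import boto3
# key = schedule(boto3.client("s3"), "scheduler-bucket", "pending",
#                datetime(2018, 8, 5, 9, 30),
#                "s3-bucket", "s3_important_files/report.csv", b"payload")
```

Returning the key is a design choice of this sketch: the caller needs some handle on the pending file in order to cancel it later.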
Stopping
If you want to cancel a scheduled event before it occurs, do the following:
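Since each event is just a file waiting to be moved, cancelling presumably amounts to deleting that file before the next scan picks it up. Here is a minimal sketch with a boto3-style client; cancel is a hypothetical helper, not the library’s API.

```python
def cancel(s3_client, scheduler_bucket, pending_key):
    """Delete a pending-event file so the periodic scan never finds it."""
    s3_client.delete_object(Bucket=scheduler_bucket, Key=pending_key)


# Example (needs boto3 and AWS credentials; names are made up):
# import boto3
# cancel(boto3.client("s3"), "scheduler-bucket",
#        "pending/2018-08-05T09:30|s3-bucket|s3_important_files/report.csv")
```

As far as I know, S3 does not raise an error when the key is already gone, so this sketch cannot tell you whether the event had already fired.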
Fin
Scheduling in the AWS serverless world is a bit tricky. Right now, AWS provides only cron-like capabilities, but this post has demonstrated one technique that can be used to create a more robust scheduling capability.
Read the “from the trenches” series
- Serverless testing from the trenches
- Serverless multi-cloud from the trenches
- Development flow in serverless environment from the trenches — Follow me to get an update when it’s ready