Handling Job Failures in Quartz with Retries

Written by yaf | Published 2023/02/09
Tech Story Tags: java | task-scheduler | framework | programming | java-programming | tutorial | spring | spring-framework

TLDRQuartz is designed to be a simple and powerful scheduling framework that is easy to use and configure. But there is no built-in feature in the Quartz framework that allows for the automatic retrying of jobs that encounter exceptions during their execution. Here we consider using Quartz in a Spring application and implement an exponential random backoff policy that reduces the number of retry attempts, prevents overloading the system, allows the system to recover slowly over time, and distributes retry attempts evenly.via the TL;DR App

Quartz is an open-source job scheduling framework that can be integrated into a wide variety of Java applications. It is used to schedule jobs that can be executed at a later time or on a repeating basis.

Quartz is designed to be a simple and powerful scheduling framework that is easy to use and configure. But there is no built-in feature in the Quartz framework that allows for the automatic retrying of jobs that encounter exceptions during their execution.

This article is inspired by Harvey Delaney’s blog post about Job Exception Retrying in Quartz.NET. Here we consider using Quartz in a Spring application and implement an exponential random backoff policy that reduces the number of retry attempts, prevents overloading the system, allows the system to recover slowly over time, and distributes retry attempts evenly.

Storing retry parameters

We will use JobDataMap-s that are associated with JobDetail and Trigger objects of our job. JobDetail’s JobDataMap will store exponential random backoff policy parameters MAX_RETRIES, RETRY_INITIAL_INTERVAL_SECS, and RETRY_INTERVAL_MULTIPLIER. JobDataMap associated with Trigger will store RETRY_NUMBER.

Job failure handling listener

Now let’s add a Spring component JobFailureHandlingListener that implements the org.quartz.JobListener interface. JobListener#jobToBeExecuted is called when the job is about to be executed and JobListener#jobWasExecuted is called after job completion. In jobToBeExecuted we will increment the RETRY_NUMBER counter. jobWasExecuted will contain the main code of exception handling and creating a new trigger for the next execution of a job.

Exponential random backoff policy

Exponential random backoff is an algorithm used to progressively introduce delays in retrying an operation that has previously failed. The idea is to increase the delay between retries in an exponential manner, with each delay being chosen randomly from a range of possible values. This helps to avoid the scenario where multiple devices or systems all attempt to retry an operation at the same time, which can further exacerbate the problem that caused the operation to fail in the first place.

Exponential random backoff is often used in distributed systems that need to retry operations that have failed. For example, it might be used in a distributed database to retry transactions that fail due to conflicts with other transactions or in a distributed file system to retry operations that fail due to network errors or the temporary unavailability of servers.

We will use an exponential random backoff algorithm to calculate the time of the next attempt to start the job.

Show me the code

JobFailureHandlingListener.java

package quartzdemo.listeners;

import org.quartz.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import java.util.Date;

@Component
public class JobFailureHandlingListener implements JobListener {

    private final String RETRY_NUMBER_KEY = "RETRY_NUMBER";
    private final String MAX_RETRIES_KEY = "MAX_RETRIES";
    private final int DEFAULT_MAX_RETRIES = 5;
    private final String RETRY_INITIAL_INTERVAL_SECS_KEY = "RETRY_INITIAL_INTERVAL_SECS";
    private final int DEFAULT_RETRY_INITIAL_INTERVAL_SECS = 60;
    private final String RETRY_INTERVAL_MULTIPLIER_KEY = "RETRY_INTERVAL_MULTIPLIER";
    private final double DEFAULT_RETRY_INTERVAL_MULTIPLIER = 1.5;

    private final Logger logger = LoggerFactory.getLogger(getClass());

    @Override
    public String getName() {
        return "FailJobListener";
    }

    @Override
    public void jobToBeExecuted(JobExecutionContext context) {
        context.getTrigger().getJobDataMap().merge(RETRY_NUMBER_KEY, 1,
                (oldValue, initValue) -> ((int) oldValue) + 1);
    }

    @Override
    public void jobExecutionVetoed(JobExecutionContext context) { }

    @Override
    public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
        if (context.getTrigger().getNextFireTime() != null || jobException == null) {
            return;
        }
        int maxRetries = (int) context.getJobDetail().getJobDataMap()
                .computeIfAbsent(MAX_RETRIES_KEY, key -> DEFAULT_MAX_RETRIES);
        int timesRetried = (int) context.getTrigger().getJobDataMap().get(RETRY_NUMBER_KEY);
        if (timesRetried > maxRetries) {
            logger.error("Job with ID and class: " + context.getJobDetail().getKey() +", " + context.getJobDetail().getJobClass() +
                    " has run " + maxRetries + " times and has failed each time.", jobException);
            return;
        }

        TriggerKey triggerKey = context.getTrigger().getKey();
        int initialIntervalSecs = (int) context.getJobDetail().getJobDataMap()
                .computeIfAbsent(RETRY_INITIAL_INTERVAL_SECS_KEY, key -> DEFAULT_RETRY_INITIAL_INTERVAL_SECS);
        double multiplier = (double) context.getJobDetail().getJobDataMap()
                .computeIfAbsent(RETRY_INTERVAL_MULTIPLIER_KEY, key -> DEFAULT_RETRY_INTERVAL_MULTIPLIER);
        Date newStartTime = ExponentialRandomBackoffFixtures.getNextStartDate(timesRetried, initialIntervalSecs, multiplier);
        Trigger newTrigger = TriggerBuilder.newTrigger()
                .withIdentity(triggerKey)
                .startAt(newStartTime)
                .usingJobData(context.getTrigger().getJobDataMap())
                .build();
        newTrigger.getJobDataMap().put(RETRY_NUMBER_KEY, timesRetried);

        try {
            context.getScheduler().rescheduleJob(triggerKey, newTrigger);
        } catch (SchedulerException e) {
            throw new RuntimeException(e);
        }
    }

}

ExponentialRandomBackoffFixtures.java

package quartzdemo.listeners;

import org.apache.commons.lang3.RandomUtils;

import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Date;

public class ExponentialRandomBackoffFixtures {

    public static Date getNextStartDate(int timesRetried, int initialIntervalSecs, double multiplier) {
        double minValue = initialIntervalSecs * Math.pow(multiplier, timesRetried - 1);
        double maxValue = minValue * multiplier;
        Duration duration = Duration.ofMillis((long) (RandomUtils.nextDouble(minValue, maxValue) * 1000));
        LocalDateTime nextDateTime = LocalDateTime.now().plus(duration);
        return Date.from(nextDateTime.atZone(ZoneId.systemDefault()).toInstant());
    }

}

Now we can add the listener to a scheduler

package quartzdemo;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.quartz.SchedulerFactoryBean;
import quartzdemo.listeners.JobFailureHandlingListener;

@Configuration
@EnableScheduling
public class SchedulingConfiguration {

    private final JobFailureHandlingListener jobFailureHandlingListener;

    public SchedulingConfiguration2(JobFailureHandlingListener jobFailureHandlingListener) {
        this.jobFailureHandlingListener = jobFailureHandlingListener;
    }

    @Bean
    public SchedulerFactoryBean scheduler() {
        SchedulerFactoryBean schedulerFactory = new SchedulerFactoryBean();
        // ...
        schedulerFactory.setGlobalJobListeners(jobFailureHandlingListener);
        return schedulerFactory;
    }

}

and run a job with a custom RETRY_INITIAL_INTERVAL_SECS parameter

package quartzdemo.services;

import org.quartz.*;
import org.springframework.stereotype.Service;
import quartzdemo.jobs.SimpleJob;

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.Date;

import static quartzdemo.listeners.JobFailureHandlingListener.RETRY_INITIAL_INTERVAL_SECS_KEY;

@Service
public class SchedulingService {

    private final Scheduler scheduler;

    public SchedulingService(Scheduler scheduler) {
        this.scheduler = scheduler;
    }

    // ...

    private void scheduleJob() throws SchedulerException {
        Date afterFiveSeconds = Date.from(LocalDateTime.now().plusSeconds(5).atZone(ZoneId.systemDefault()).toInstant());

        JobDetail jobDetail = JobBuilder.newJob(SimpleJob.class).usingJobData(RETRY_INITIAL_INTERVAL_SECS_KEY, 30).build();
        Trigger trigger = TriggerBuilder.newTrigger().startAt(afterFiveSeconds).usingJobData("", "").build();
        scheduler.scheduleJob(jobDetail, trigger);
    }
}

Conclusion

Quartz is a robust framework that provides a wide range of capabilities. Even if it lacks something, it is always possible to implement the necessary functionality using the provided API. This allows developers to extend the framework to meet their specific requirements, so Quartz is a great choice for any scheduling needs.


Written by yaf | 10+ years Java developer
Published by HackerNoon on 2023/02/09