Optimizing SQL Queries With JPA Repositories

Written by dstepanov | Published 2022/07/01
Tech Story Tags: debugging | getsentry | monitoring | java | spring-data-jpa | sql | optimizing-sql-queries | jpa-repositories

TLDRDevelopers created one entity class and one repository for each table in the relational data model. Over time, the volume of data grew, entities acquired additional fields, and the number of connections increased. At this point, the developers decided to write native SQL queries to improve performance. This created architectural problems since such queries often use the features of a particular SQL dialect for optimization. They are not validated during the build of the project and their maintenance in a working state depended on the responsibility of specific developers. There is another option, allocate a separate class for a shortened ViewModel that is read-only.via the TL;DR App

Hello everyone, today I would like to talk about how you can optimize work with JPA repositories and improve performance by avoiding native SQL queries in the code.

I quite often encountered a situation in projects, when designing a repository and interacting with a database, developers created one entity class and one repository for each table in the relational data model. Over time, the volume of data grew, entities acquired additional fields, and the number of connections increased, which ultimately led to an increase in query execution time.

Often we do not need to get the whole entity, but only a few fields from the table are required. At this point, the developers decided to write native SQL queries to improve performance. This created architectural problems since such queries often use the features of a particular SQL dialect for optimization. Which in turn limited the possibilities for migration between different databases. And also there were problems accompanying these requests since they are not validated during the build of the project and their maintenance in a working state entirely depended on the responsibility of specific developers.

Consider an example of managing the structure of an organization, where there is a strict hierarchical structure and each department has employees in different positions. Each position can only have one employee at a time.

This structure can look like tables in a relational database, as shown in the picture below. Each organizational unit has a link to a parent. There are also separate tables for the position, employee, and appointment, which determine the period in which the employee worked in this position.

In code, this will look like an entity corresponding to a specific table and a repository.

But let's consider the simple task of drawing an organizational structure tree with the ability to expand the tree to a level below, and also view a list of employees for each organizational module indicating the position in which they work.

To draw a tree of organizational modules is enough for us to be able to get a list of child units. This is a very simple task, just add the findByParent(OrganizationUnit parent) method to the repository. However, in this case, we will query the database for all information about the department, which is redundant. This request will look like this:

SELECT * FROM organization_unit WHERE parent_id=<<parent_id>>;

To draw the hierarchy, we need to get the unit ID and its name only, the rest of the fields are not interested. Accordingly, we could execute the request:

SELECT id, title FROM organization_unit WHERE parent_id=<<parent_id>>;

To execute such a query, you must either write native SQL or use a query builder, such as QueryDSL, which is close to manually writing a query. But there is another option, you can allocate a separate class for a shortened ViewModel that is read-only. You will have to create a separate repository for it, but this will allow you to reuse this ViewModel in various scenarios where only the main fields of the entity are required. The class structure will then look like this:

And directly in the code, it will be something like this:

@Getter
@MappedSuperclass
@SuppressWarnings("unchecked")
@ToString(callSuper = true, onlyExplicitlyIncluded = true)
public abstract class OrganizationUnitBase<T extends OrganizationUnitBase<T>> {
    @Id
    @ToString.Include
    @EqualsAndHashCode.Include
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    protected Long id;

    @NotNull
    @ToString.Include
    @Column(name = "title")
    protected String title;

    //link to parent organization unit for filter by parentId
    @Column(name = "parent_id", insertable = false, updatable = false)
    protected Long parentId;
}

@Getter
@Setter
@Entity
@NoArgsConstructor
@Accessors(chain = true)
@Table(name = "organization_unit")
@EqualsAndHashCode(callSuper = true, onlyExplicitlyIncluded = true)
@ToString(callSuper = true, onlyExplicitlyIncluded = true)
public class OrganizationUnitFull extends OrganizationUnitBase<OrganizationUnitFull> {
    private String address;
    private String phone;    
    private String email;
    private String fax;
    private LocalDate openDate;
    private LocalDate closeDate;
    private boolean active;

    @ManyToOne
    private OrganizationUnit parent;
}

@Entity
@Getter
@Setter
@Immutable
@NoArgsConstructor
@Accessors(chain = true)
@Table(name = "organization_unit")
public class OrganizationUnitMinimalView extends OrganizationUnitBase<OrganizationUnitMinimalView> {
    
}

@Repository
public interface OrganizationUnitMinimalViewRepository extends JpaRepository<OrganizationUnitMinimalView, Long> {
  List<OrganizationUnitMinimalView> findAllByParentId(Long parentId);
}

@Repository
public interface OrganizationUnitFullRepository extends JpaRepository<OrganizationUnitFull, Long> {
}

Now let's look at the second part of the problem. We need to see all the employees of the organizational module, indicating the positions in which they work. To do this, we need to perform at least two JOINs. When using full entities, it would look like this:

SELECT * 
FROM position AS p 
JOIN appointment AS a ON a.position_id=p.id 
JOIN employee AS e ON a.employee_id=e.id 
WHERE p.organization_unit_id=<<organization_unit_id>>;

In this case, we will get all fields from three tables. And if the organizationUnit field in the Position entity has fetch type eager, then we will get another additional JOIN with the organization_unit table. In reality, it is enough for us to select only the fields of interest to us:

SELECT p.id, p.title, e.id, e.first_name, e.last_name, a.id 
FROM position AS p 
JOIN appointment AS a ON a.position_id=p.id 
JOIN employee AS e ON a.employee_id=e.id 
WHERE p.organization_unit_id=<<organization_unit_id>>;

To transform our query into a more optimal view, it is enough to also highlight the truncated ModelViews for the employee entity, position, and assignment. It might look like this:

In conclusion, I would like to say that sometimes such allocations of minified entities have a huge effect on system performance while maintaining code maintainability.


Written by dstepanov | 10+ years of experience in software development including architecture, design, implementation, and maintenance.
Published by HackerNoon on 2022/07/01