51

Query pagination with JPA and Hibernate

 5 years ago
source link: https://www.tuicool.com/articles/hit/EbAZjiB
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

(Last Updated On: October 9, 2018)

Follow @vlad_mihalcea

Introduction

Inspired by this StackOverflow answer I gave recently, I decided it’s time to write an article about query pagination when using JPA and Hibernate.

In this article, you are going to see how to use query pagination to restrict the JDBC ResultSet size and avoid fetching more data than necessary.

Domain Model

Now, let’s assume we defined the following Post and PostComment entity classes in our application:

BfimQvm.png!web

The Post class is the parent entity while the PostComment is the child as it has a @ManyToOne association with the Post entity. Both entities implement the Identifiable interface which provides a contract for accessing the underlying entity identifier.

Next, we are going to save the following Post and PostComment entities in the database:

LocalDateTime timestamp = LocalDateTime.of(
    2018, 10, 9, 12, 0, 0, 0
);

int commentsSize = 5;

LongStream.range(1, 50).forEach(postId -> {
    Post post = new Post();
    post.setId(postId);
    post.setTitle(
        String.format("Post nr. %d", postId)
    );
    post.setCreatedOn(
         Timestamp.valueOf(
            timestamp.plusMinutes(postId)
        )
    );

    LongStream.range(1, commentsSize + 1).forEach(commentOffset -> {
        long commentId = ((postId - 1) * commentsSize) + commentOffset;

        PostComment comment = new PostComment();        
        comment.setId(commentId);
        comment.setReview(
            String.format("Comment nr. %d", comment.getId())
        );
        comment.setCreatedOn(
            Timestamp.valueOf(
                timestamp.plusMinutes(commentId)
            )
        );

        post.addComment(comment);

    });
    
    entityManager.persist(post);
});

Limiting the result set size

To limit the underlying query ResultSet size, the JPA Query interface provides the setMaxResults method .

Therefore, when executing the following JPQL query:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "order by p.createdOn ")
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 1", posts.get(0).getTitle());
assertEquals("Post nr. 10", posts.get(9).getTitle());

Hibernate generates the following SQL statement on PostgreSQL:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
LIMIT 10

On SQL Server 2012 (or newer), Hibernate will execute the following SQL query:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY

Therefore, the SQL pagination query is adapted to the underlying database engine capabilities.

The use of ORDER BY is mandatory when using query pagination because SQL does not guarantee any particular order unless we provide one via the ORDER BY clause.

Using offset to position the result set

If the previous query was typical for the first page of a given pagination query, navigating the following page requires positioning the result set where the last page ended. For this purpose, the JPA Query interface provides the setFirstResult method .

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "order by p.createdOn ")
.setFirstResult(10)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());
assertEquals("Post nr. 11", posts.get(0).getTitle());
assertEquals("Post nr. 20", posts.get(9).getTitle());

When running the previous JPQL query on PostgreSQL, Hibernate executes the following SQL SELECT statements:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
LIMIT 10
OFFSET 10

and on SQL Server 2012 (or newer), Hibernate would generate this SQL query:

SELECT p.id AS id1_0_,
       p.created_on AS created_2_0_,
       p.title AS title3_0_
FROM post p
ORDER BY p.created_on
OFFSET 10 ROWS FETCH NEXT 10 ROWS ONLY

DTO projection queries

The JPA query pagination is not limited to entity queries that return entities only. You can use it for DTO projections as well.

Assuming we have the following PostCommentSummary DTO:

public class PostCommentSummary {

    private Number id;
    private String title;
    private String review;

    public PostCommentSummary(
            Number id, 
            String title, 
            String review) {
        this.id = id;
        this.title = title;
        this.review = review;
    }

    public PostCommentSummary() {}

    //Getters omitted for brevity
}

When running the following DTO projection query:

List<PostCommentSummary> summaries = entityManager
.createQuery(
    "select new " +
    "   com.vladmihalcea.book.hpjp.hibernate.fetching.PostCommentSummary( " +
    "       p.id, p.title, c.review " +
    "   ) " +
    "from PostComment c " +
    "join c.post p " +
    "order by c.createdOn")
.setMaxResults(10)
.getResultList();

assertEquals(10, summaries.size());
assertEquals("Post nr. 1", summaries.get(0).getTitle());
assertEquals("Comment nr. 1", summaries.get(0).getReview());

assertEquals("Post nr. 2", summaries.get(9).getTitle());
assertEquals("Comment nr. 10", summaries.get(9).getReview());

Hibernate appends the pagination clause to the underlying SQL query:

SELECT p.id AS col_0_0_,
       p.title AS col_1_0_,
       c.review AS col_2_0_
FROM post_comment c
INNER JOIN post p ON c.post_id=p.id
ORDER BY c.created_on
LIMIT 10

For more details about DTO projection with JPA and Hibernate, check outthis article.

Native SQL queries

The JPA query pagination is not limited to entity queries, such as JPQL or Criteria API. You can use it for native SQL queries as well.

List<Tuple> posts = entityManager

.createNativeQuery(

"select p.id as id, p.title as title " +

"from post p " +

"order by p.created_on", Tuple.class)

.setFirstResult(10)

.setMaxResults(10)

.getResultList();

assertEquals(10, posts.size());

assertEquals("Post nr. 11", posts.get(0).get("title"));

assertEquals("Post nr. 20", posts.get(9).get("title"));

When running the SQL query above, Hibernate appends the DB-specific pagination clause:

SELECT p.id AS id,
       p.title AS title
FROM post p
ORDER BY p.created_on
LIMIT 10
OFFSET 10

JOIN FETCH and pagination

However, if we try to use the JOIN FETCH clause in the entity query while also using JPA pagination:

List<Post> posts = entityManager.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "order by p.createdOn", Post.class)
.setMaxResults(10)
.getResultList();

assertEquals(10, posts.size());

Hibernate will issue the following warning message:

HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!

And the executed SQL query will lack the pagination clause:

SELECT p.id AS id1_0_0_,
       c.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       c.created_on AS created_2_1_1_,
       c.post_id AS post_id4_1_1_,
       c.review AS review3_1_1_,
       c.post_id AS post_id4_1_0__,
       c.id AS id1_1_0__
FROM post p
LEFT OUTER JOIN post_comment c ON p.id=c.post_id
ORDER BY p.created_on

This is because Hibernate wants to fetch entities fully along with their collections as indicated by the JOIN FETCH clause while the SQL-level pagination could truncate the ResultSet possibly leaving a parent Post entity with fewer elements in the comments collection.

The problem with the HHH000104 warning is that Hibernate will fetch the product of Post and PostComment entities, and due to the result set size, the query response time is going to be significant.

In order to work around this limitation, you have to use aWindow Function query:

List<Post> posts = entityManager
.createNativeQuery(
    "select * " +
    "from (   " +
    "    select *, dense_rank() OVER (ORDER BY post_id) rank " +
    "    from (   " +
    "        select p.*, pc.* " +
    "        from post p  " +
    "        left join post_comment pc on p.id = pc.post_id  " +
    "        order by p.created_on " +
    "    ) p_pc " +
    ") p_pc_r " +
    "where p_pc_r.rank <= :rank", Post.class)
.setParameter("rank", 10)
.unwrap(NativeQuery.class)
.addEntity("p", Post.class)
.addEntity("pc", PostComment.class)
.setResultTransformer(DistinctPostResultTransformer.INSTANCE)
.getResultList();

For more details about using Window Functions to fix the HHH000104 issue as well as the code for DistinctPostResultTransformer , check outthis article.

Why not use query streaming instead?

JPA 2.2 added the getResultStream Query method, which you might think it’s a valid alternative to pagination. However, the stream result will not provide the result set size to the query planner, hence a suboptimal execution plan might be picked. For this reason, it’s much more efficient to use pagination than streaming when it comes to fetching small amounts of data.

For more details about why pagination is more efficient than streaming, check outthis article.

Keyset pagination

Markus Winand, who wrote the SQL Performance Explained book, advocates for Keyset pagination instead of Offset . Although Offset pagination is an SQL standard feature, there are two reasons why you would prefer Keyset pagination:

  • performance (the index must be scanned up to the offset while, for keyset pagination, we can go directly to the first index entry that matches our order by predicate and filtering criteria)
  • correctness (if elements are being added in between, offset pagination will not provide consistent reads)

Even if Hibernate does not support keyset pagination, you can use Blaze Persistence for this purpose.

If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.

IzMJFjb.jpg!web3m2Yrem.jpg!web

Conclusion

Fetching just as much data as you need is one of the most important tips when it comes to data access performance. When fetching data, pagination allows you to control the result set size so that performance is stable even if the underlying dataset grows with time.

While keyset pagination offers better performance for large result sets, if you can narrow the scanned data set using the right filtering predicates, then offset pagination is going to perform rather well. To get consistent reads, you have to make sure that the scanned data set is always sorted in such a way that new entries are appended at the end of the set, rather than being mixed in between old entries.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK