
A hitchhiker’s guide to Spring Boot, Elasticsearch, Logstash, Kibana, PostgreSQL...



Recently I decided to embark on the journey of creating a sample project of the ELK stack that works with Spring Boot and PostgreSQL, all in Docker with docker-compose. No SaaS. While ELK has grown popular enough that there are sample projects for just about any web framework, in my opinion there was no clear path to the finish line for Spring Boot.

My sample Spring Boot app spring-elastic-genie has a REST API that looks for movies in OMDB and saves the results in PostgreSQL. These results then get indexed by Elasticsearch at which point you can visualize the results in Kibana. You should know in advance that this is not a how-to-code tutorial but a high-level architecture overview of using Spring Boot with ELK on-premise, in Docker.

If you are not interested in reading further and would rather look at some code instead, here is the GitHub link to my sample project:

But I strongly suggest you read on, as there are numerous pitfalls, and a lot of tutorials out there don't tell you the whole story, especially if you are coming from a non-Spring background (as I am).

All the tutorials I saw cover how to use Spring Data repositories to write directly to Elasticsearch, but they do not cover what to do with Spring Boot apps in production that save their data to a relational DB like PostgreSQL or MariaDB, where a search index is needed on top for use cases like full-text search. The sections below explain the subtle differences and the reasoning behind them.

Architecture Options

Option 1: What other “How to use Spring Boot with Elasticsearch” tutorials do

[Diagram: Option 1, the Spring Boot app writes directly to Elasticsearch through a Spring Data repository]

Ok, so this is basically what you will find when you browse the Web for tutorials on Elasticsearch and Spring Boot: a Spring Data repository that lets you write directly to Elasticsearch through the ElasticsearchCrudRepository interface, an Elasticsearch-specific extension of the CrudRepository used to write to regular DBs like PostgreSQL. This is all great, but it covers only a small portion of real-world use cases, because a lot of the time you want Elasticsearch to index your primary DB instead, as mentioned above.
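
To make the Option 1 pattern concrete, here is a minimal sketch of what such a repository looks like. The class, field and index names are illustrative and not taken from my project, and it assumes spring-data-elasticsearch 3.x (the line that works with Elasticsearch 6.x):

// MovieDocument.java (illustrative, not from the repo)
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

@Document(indexName = "movies", type = "movie")
public class MovieDocument {
    @Id
    private Integer id;
    private String title;
    // getters and setters omitted
}

// MovieSearchRepository.java (illustrative, not from the repo)
import java.util.List;
import org.springframework.data.elasticsearch.repository.ElasticsearchCrudRepository;

public interface MovieSearchRepository extends ElasticsearchCrudRepository<MovieDocument, Integer> {
    // derived query, translated into an Elasticsearch search under the hood
    List<MovieDocument> findByTitleContaining(String title);
}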

So, the problem comes up when you want to write directly to both Elasticsearch and PostgreSQL. If you want to keep the architecture seen above, there is sadly only one option: have two Spring Data repositories for each model, one for Elasticsearch and one for PostgreSQL. And that is terrible. You are just going to duplicate your business logic code, and Spring cannot do transaction management across the two DBs, so if a transaction fails you could end up with the data saved only in Elasticsearch or only in PostgreSQL (unless you want to manage this stuff manually…), which defeats the purpose of having a search index in the first place.

tl;dr this approach is not good if you want to write to both DBs. There is nothing quite like the Django + Haystack approach, where you essentially tell Elasticsearch to index your Django models and you are done.

Furthermore, you cannot do something like:

public interface MovieRepository extends CrudRepository<Movie, Integer>, ElasticsearchCrudRepository<Movie, Integer> { 
    //...
}

This will instantly give you an error at runtime (not at build time) because you are overriding beans, which is disabled by default. Enabling bean overriding will not help either, since only one of the two definitions can win (it is in the word: override). This is why you get stuck with a separate Spring Data repository for each DB, unless you pick Option 2.

Option 2: Logstash and more abstraction

[Diagram: Option 2, the Spring Boot app writes only to PostgreSQL and Logstash ships the data to Elasticsearch]

This is what Option 2 looks like. Essentially, Spring Boot takes a back seat: we use the standard CrudRepository to save data into PostgreSQL, and then we let Logstash pull the data we want from PostgreSQL into Elasticsearch, which removes all Elasticsearch code from our Spring Boot app along with the need for code duplication. The code needed for Option 1 is commented out on GitHub in each respective file.
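
On the Spring side, Option 2 is just the plain JPA setup. A minimal sketch, assuming a Movie entity and the usual Spring Data JPA defaults (names are illustrative, not the exact ones from the repo):

// Movie.java (illustrative, not from the repo)
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Movie {
    @Id
    @GeneratedValue
    private Integer id;
    private String title;
    // getters and setters omitted
}

// MovieRepository.java (illustrative, not from the repo)
import org.springframework.data.repository.CrudRepository;

public interface MovieRepository extends CrudRepository<Movie, Integer> {
    // no Elasticsearch code here; Logstash picks the rows up from PostgreSQL
}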

This project’s docker-compose includes the Spring Boot app, PostgreSQL, Elasticsearch, Kibana, Logstash and ElasticHQ (ES monitoring service). This setup will get you running with ELK and Docker in no time.
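
For orientation, an abridged docker-compose sketch of that setup could look roughly like this. The service names, image tags and volume paths are assumptions for illustration, not copied from the repository:

# Abridged docker-compose sketch; tags, names and paths are illustrative.
version: "3"
services:
  postgres:
    image: postgres:11
    env_file: core.env
  app:
    build: .                 # runs the jar built with Gradle
    env_file: core.env
    depends_on:
      - postgres
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.0.0
    environment:
      - discovery.type=single-node
  logstash:
    image: docker.elastic.co/logstash/logstash:7.0.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline   # logstash.conf
      - ./logstash/drivers:/opt/logstash/drivers           # PostgreSQL JDBC driver
  kibana:
    image: docker.elastic.co/kibana/kibana:7.0.0
  elastichq:
    image: elastichq/elasticsearch-hq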

Most of this is pretty basic if you are familiar with Docker. Some highlights:

  • Spring Boot waits for PostgreSQL with a health check. This Docker container just runs the jar file that was built with Gradle. The config for the DB connection can be found in application.properties, which reads from the core.env file.
  • Logstash mounts two local folders from my GH project, one for the JDBC driver and one for logstash.conf (a minimal pipeline sketch follows this list)
  • ElasticHQ monitors Elasticsearch in real time and works pretty much out of the box
  • None of this is production-ready; use it for development only! Also, ELK is pretty decent when used as a SaaS on e.g. AWS, but I wanted to showcase the on-premise, open-source path. The config is described further in the README on GitHub.
  • Logstash Disclaimer: currently Logstash is only configured to get data from PostgreSQL with a simple SQL query but it does not have filters for duplicates etc. There are numerous Logstash tutorials you can follow to filter your data.
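
As a starting point, a minimal logstash.conf for the PostgreSQL-to-Elasticsearch pipeline could look roughly like the following. The connection string, credentials, table name and paths are placeholders, not the exact values from my repo:

# Minimal pipeline sketch; values below are placeholders.
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://postgres:5432/movies"
    jdbc_user => "postgres"
    jdbc_password => "postgres"
    jdbc_driver_library => "/opt/logstash/drivers/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    schedule => "* * * * *"                    # poll PostgreSQL once a minute
    statement => "SELECT * FROM movie"
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "movies"
    document_id => "%{id}"                     # reuse the primary key as the document id
  }
}

Setting document_id to the table's primary key is one simple way to keep repeated runs of the query from piling up duplicate documents in the index.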

Random Pitfalls

  • Logstash 7.2.0 is not compatible with a lot of the JDBC drivers you would need to connect Logstash with PostgreSQL. Instead I reverted back to 7.0.0. I wasted so many hours on this, and ended up opening an issue on the pgjdbc GH page only to find out it was a Logstash bug.
  • Elasticsearch 7.x is incompatible with spring-data-elasticsearch, which is used to save data into Elasticsearch through Spring's data repositories. The latest Elasticsearch version supported is 6.7.2, but Kibana 7.x has dark mode and I wanted some dark-mode screenshots in this article. Because dark mode. :) You can follow their releases on GitHub.

The Fortune Cookie Story

Option 2 kind of defeats the purpose of having a dedicated guide for Spring Boot and Elastic, right? Maybe, but maybe not. Since Option 1 ends up getting a little confusing, this should highlight what you can do with existing projects without overhauling your entire Spring project just to write to Elasticsearch. Furthermore, ask yourself whether you even need Elasticsearch. Could another index that works directly with e.g. PostgreSQL get the job done? Could PostgreSQL itself get the job done just fine? If the answer to these is “no”, though, Option 2 should introduce the least amount of headache.

I hope this tutorial will enable you to kickstart your ELK stack journey (maybe even with Docker), so you can worry less about your Spring app and more about getting the most out of Elasticsearch!

And above all: DON’T PANIC ;)

