elasticsearch | Jeroen van Wilgenburg's blog

Using a ConnectableFlux to do background batching on elasticsearch

9 January 2020 Jeroen van Wilgenburg Leave a comment

We have a Project Reactor application that is a bit brittle when refactored. The load on our elasticsearch cluster is pretty high due to many single get/insert-by-id calls. Adding batch reads by id would have been so much work that I looked for a different solution. I eventually came up with one using a ConnectableFlux.
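To make the idea of background batching a bit more concrete, here is a minimal plain-JDK sketch. It is not the ConnectableFlux solution from the article: `IdBatcher`, `get` and `flush` are hypothetical names, and the `bulkFetch` function is only a stand-in for a multi-get on elasticsearch. Callers keep asking for one id at a time, while the actual lookups are grouped into a single bulk call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper (not from the article): collects single get-by-id
// calls and answers them with one bulk lookup per flush.
final class IdBatcher {
    // Stand-in for a multi-get: takes ids, returns id -> document.
    private final Function<List<String>, Map<String, String>> bulkFetch;
    private final Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();

    IdBatcher(Function<List<String>, Map<String, String>> bulkFetch) {
        this.bulkFetch = bulkFetch;
    }

    // Callers keep their single-id API; duplicate ids share one future.
    synchronized CompletableFuture<String> get(String id) {
        return pending.computeIfAbsent(id, key -> new CompletableFuture<>());
    }

    // In a real setup a timer or a size threshold would trigger this.
    void flush() {
        Map<String, CompletableFuture<String>> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return;
            }
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        Map<String, String> found = bulkFetch.apply(new ArrayList<>(batch.keySet()));
        // Ids that are missing from the bulk response complete with null.
        batch.forEach((id, future) -> future.complete(found.get(id)));
    }

    public static void main(String[] args) {
        List<List<String>> bulkCalls = new ArrayList<>();
        IdBatcher batcher = new IdBatcher(ids -> {
            bulkCalls.add(ids);
            Map<String, String> docs = new LinkedHashMap<>();
            for (String id : ids) {
                docs.put(id, "doc-" + id);
            }
            return docs;
        });
        CompletableFuture<String> first = batcher.get("1");
        CompletableFuture<String> second = batcher.get("2");
        batcher.get("1"); // same id again, no extra lookup
        batcher.flush();
        System.out.println(first.join() + ", " + second.join()
                + ", bulk calls: " + bulkCalls.size());
    }
}
```

The design choice here is that the calling code does not change: it still requests one document per id, and only the batcher knows that the requests are combined.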

Categories: English, java, work Tags: elasticsearch, reactor, rxjava, spring

Running your elasticsearch integration tests with JUnit 5, Karate and TestContainers (Docker)

8 July 2019 Jeroen van Wilgenburg 2 comments

Earlier this year I wrote an article on how to run your integration tests with an embedded elasticsearch. When upgrading to elasticsearch 7 this method didn’t work (yet). An alternative (and maybe even better) method is using Testcontainers to run elasticsearch in a Docker container. I will also show how you can leverage Karate to do your integration testing.

Categories: English, java, work Tags: docker, elasticsearch, junit, karate, vertx

Running your JUnit 5 integration test with an embedded elasticsearch on a random port (and optionally Spring Boot)

22 January 2019 Jeroen van Wilgenburg 17 comments

With recent versions of elasticsearch (5+) the learning curve for an integration test became a bit steeper, but the end result is a cleaner solution. In this article I will describe how to set up JUnit 5 to run your elasticsearch integration tests. I will also discuss how to make it work with Spring Boot Test.

Categories: English, java, Uncategorized, work Tags: elasticsearch, junit, spring

Understanding Spark parameters – A step by step guide to tune your Spark job

15 February 2015 Jeroen van Wilgenburg 1 comment

After using Spark for a few months we thought we had a pretty good grip on how to use it. The Spark documentation appeared pretty decent and we had acceptable performance on most jobs. On one job we kept hitting limits that were much lower than with that job’s predecessor (Storm). When we did some research we found out we didn’t understand Spark as well as we thought.
My colleague Jethro pointed me to an article by Gerard Maas and I found another great article by Michael Noll. Combined with the Spark docs and some Googlin’ I wrote this article to help you tune your Spark job. We improved our throughput by 600% (and then the elasticsearch cluster became the new bottleneck).

Categories: English, java, work Tags: elasticsearch, hadoop, kafka, scala, spark, yarn

Boosting with wildcards in elasticsearch

9 November 2013 Jeroen van Wilgenburg Leave a comment

While preparing my presentation I discovered that boosting with wildcards wasn’t working. I kind of promised to blog about it, so here it is 🙂

Categories: English, java, work Tags: elasticsearch

My NLJUG presentation on elasticsearch

9 November 2013 Jeroen van Wilgenburg Leave a comment

Last Thursday I gave a talk about elasticsearch at the Dutch Java User Group conference (JFall) called “Full text search met ElasticSearch in de praktijk”. Yes, it’s in Dutch, but I still want to share it with you; it might contain some useful tips and tricks.

Categories: English, java, Nederlands, Uncategorized, work Tags: elasticsearch

How sharding in elasticsearch makes scoring a little less accurate and what to do about it

11 September 2013 Jeroen van Wilgenburg 3 comments

Currently I’m using a small dataset (about 3500 records) on ElasticSearch and saw some strange scoring. Hits that should have exactly the same score had _almost_ the same score. Almost the same is a problem, since we sort our data on score and name. After some research it turned out sharding was the issue. With only one shard the problem is solved. If you want to know how I found out and what alternatives you have, please stay tuned.

updated @ Sep 13: Zachary Tong pointed me to an article with a much better solution (read: the right solution 😉 ). I added the second to last paragraph to explain it.
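The almost-equal scores can be illustrated with a small sketch. It assumes the classic Lucene TF-IDF similarity, where idf = 1 + ln(numDocs / (docFreq + 1)), and it assumes each shard computes that value from its own local counts. The class name `ShardIdfDemo` and the per-shard numbers are made up for illustration; only the total of 3500 records comes from the text above.

```java
// Hypothetical demo class: shows how per-shard term statistics
// lead to slightly different idf values for the same term.
final class ShardIdfDemo {

    // Classic Lucene TF-IDF inverse document frequency.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // Assumed split: 3500 documents over two shards of 1750 each.
        // The term occurs in 40 documents in total: 18 on shard A, 22 on shard B.
        double shardA = idf(18, 1750);
        double shardB = idf(22, 1750);
        // With a single shard every hit is scored with the same statistics.
        double singleShard = idf(40, 3500);

        System.out.printf("idf on shard A:      %.4f%n", shardA);
        System.out.printf("idf on shard B:      %.4f%n", shardB);
        System.out.printf("idf on single shard: %.4f%n", singleShard);
    }
}
```

Two documents that match the term in exactly the same way get a slightly different score when they live on different shards, which is enough to disturb a sort on score and name.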

Categories: English, java, work Tags: elasticsearch, lucene

Diacritics in ElasticSearch – Fix it in two places and also watch your encoding while you’re at it

3 August 2013 Jeroen van Wilgenburg 3 comments

A few weeks ago I solved an issue with ElasticSearch where some search results with diacritics didn’t show up. At least, I thought I did. There are quite a few places where it can go wrong. In this article I’ll explain these places and how to fix the issues.

Categories: English, java, work Tags: elasticsearch, lucene

Add search field priority to Elasticsearch (works for every Lucene based framework)

31 March 2013 Jeroen van Wilgenburg Leave a comment

Last week a ‘bug’ was filed: the end users of our application wanted search results with a match on the name to get more priority than a match on the address (we’re talking about searching for a company). Since I’ve used Lucene a lot I thought it was just a matter of ‘boosting’ the name field. It turned out to be a bit more difficult. Maybe because Elasticsearch behaves differently, but probably because my Lucene knowledge has gotten a bit rusty.

Categories: English, java, work Tags: elasticsearch, lucene

JPA with versioning and full text search – Mixing Hibernate Envers with elasticsearch

18 March 2013 Jeroen van Wilgenburg 3 comments

I’m working on a project where we need to search the data the ‘Google way’ and keep a history of every change in the data. Since a requirement is that we store the data in an SQL database, I started with Hibernate JPA. Hibernate Envers was added for versioning. For the Google search (or just full text search) I needed something with Lucene in the background. Hibernate Search seemed like a good combination.

Pretty soon I found out that Envers and Search don’t mix very well and a little search on the Hibernate forum confirmed it [1]. Envers and Search are great products, don’t get me wrong, but this time it didn’t work out.

Furthermore it’s good to know that I’m also using Spring, which can mess things up pretty badly. Again, it’s the cocktail that is lethal; it has nothing to do with the quality of the frameworks.

Categories: English, java, work Tags: elasticsearch, hibernate, jpa, lucene
