Showing posts with label http. Show all posts

Friday, December 19, 2025

DuckDB: very useful where least expected

If you haven't heard about DuckDB yet, you definitely have to check it out. It is a fascinating piece of technology for quick data exploration and analysis. But today we are going to talk about a somewhat surprising but exceptionally useful area where DuckDB could be tremendously helpful: dealing with chatty HTTP/REST services.

One of the best things about DuckDB is that it is just a single binary (per OS/arch), with no additional dependencies, so the installation process is a breeze.

Let me set the stage here. I have been working with OpenSearch (and Elasticsearch) for years; those are great search engines with very rich HTTP/REST APIs. Production-grade clusters may consist of hundreds of nodes, and at this scale, almost every cluster-wide HTTP/REST endpoint invocation returns unmanageable JSON blobs. Wouldn't it be cool to transform such JSON blobs into a structured, queryable form? Like a relational table, for example, so you could run SQL queries over it without writing a single line of code? It is absolutely doable with DuckDB and its JSON Processing Functions.

As an exercise, we are going to play with the Nodes API, which returns a detailed per-node response with a deeply nested JSON structure:

{
  "cluster_name" : "...",
  "_nodes" : {
     ...
  },
  "nodes" : {
    <node1> : {
        ...
    },
    <node2> : {
        ...
    },
    ...
    <nodeN> : {
        ...
    }
  }
}

Ideally, what we want is to flatten this structure into a table where each row represents an individual node and each JSON key becomes an individual column. To put things in context, each node structure contains nested arrays and objects; we will not recursively traverse them (it is possible, but needs more complex transformations). With that, let us start our exploration journey!
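To make the goal concrete before bringing in DuckDB, here is the transformation we are after, sketched in plain Python over the standard library's json module (the node ids and fields below are invented for illustration; real responses are far larger):

```python
import json

# A tiny stand-in for a Nodes API response.
response = json.loads("""
{
  "cluster_name": "test-cluster",
  "_nodes": {"total": 2},
  "nodes": {
    "node-1": {"name": "opensearch-node1", "version": "3.0.0"},
    "node-2": {"name": "opensearch-node2", "version": "3.0.0"}
  }
}
""")

# Flatten: one row per node, one column per top-level JSON key,
# plus the node id itself as a separate column.
rows = [{"id": node_id, **node} for node_id, node in response["nodes"].items()]

for row in rows:
    print(row)
```

DuckDB lets us do exactly this, but declaratively, in SQL, and without knowing the keys upfront.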

The first step is the simplest: extract the nodes collection of objects from the Nodes API response and feed it directly into DuckDB.

$ curl "https://localhost:9200/_nodes?pretty" -u admin:<password> -k --raw -s | duckdb -c "
  WITH nodes AS (
    SELECT key as id, value FROM read_json_auto('/dev/stdin') AS r, json_each(r, '$.nodes')
  )
  )
  SELECT * FROM nodes"

We get back something like this (the OpenSearch cluster I use for tests has only two nodes, hence only two rows):

┌──────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│          id          │                                                                 value                                                                  │
│       varchar        │                                                                  json                                                                  │
├──────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ YQzVQ_kHT-WnYTReh0…  │ {"name":"opensearch-node2","transport_address":"10.89.0.3:9300","host":"10.89.0.3","ip":"10.89.0.3","version":"3.0.0","build_type":"…  │
│ J9a5OM8STainCdkaLm…  │ {"name":"opensearch-node1","transport_address":"10.89.0.2:9300","host":"10.89.0.2","ip":"10.89.0.2","version":"3.0.0","build_type":"…  │
└──────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Although we could stop right here (since DuckDB lets you query over JSON values easily), we would like to do something more useful with the value blob: we want it to become a table with real columns. There are many ways this could be accomplished in DuckDB; some require the precise JSON structure (schema) to be provided, others require a predefined list of keys upfront. We will stick to dynamic exploration instead and extract all keys from the actual JSON data.

$ curl "https://localhost:9200/_nodes?pretty" -u admin:<password> -k -s | duckdb -c "
  WITH nodes AS (
    SELECT key as id, value FROM read_json_auto('/dev/stdin') AS r, json_each(r, '$.nodes')
  ),
  ),
  all_keys AS (
    SELECT distinct(unnest(json_keys(value))) AS key FROM nodes
  )
  SELECT * FROM all_keys"

In the version of OpenSearch I am running, there are 22 unique keys (JSON field names) returned; an example of the output is below:

┌────────────────────────────────┐
│              key               │
│            varchar             │
├────────────────────────────────┤
│ plugins                        │
│ jvm                            │
│ host                           │
│ version                        │
│ build_hash                     │
│ ...                            │
│ modules                        │
│ build_type                     │
│ os                             │
│ transport                      │
│ search_pipelines               │
│ attributes                     │
├────────────────────────────────┤
│            22 rows             │
└────────────────────────────────┘

Good progress so far, but we need to go the last mile and build a relational table out of these pieces. This is where DuckDB's powerful PIVOT statement comes in very handy.

$ curl "https://localhost:9200/_nodes?pretty" -u admin:<password> -k -s | duckdb -c "
  WITH nodes AS (
    SELECT key as id, value FROM read_json_auto('/dev/stdin') AS r, json_each(r, '$.nodes')
  ),
  ),
  all_keys AS (
    SELECT distinct(unnest(json_keys(value))) AS key FROM nodes
  ),
  keys AS (
    SELECT * FROM all_keys WHERE key not in ['plugins', 'modules']
  )
  SELECT id, node.* FROM nodes, (PIVOT keys ON(key) USING first(json_extract(value, '$.' || key))) as node"

And here we are:

┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬───┬──────────────────────┬──────────────────────┬──────────────────────┬───────────────────┬─────────┐
│          id          │     aggregations     │      attributes      │      build_hash      │ … │     thread_pool      │ total_indexing_buf…  │      transport       │ transport_address │ version │
│       varchar        │         json         │         json         │         json         │   │         json         │         json         │         json         │       json        │  json   │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼───┼──────────────────────┼──────────────────────┼──────────────────────┼───────────────────┼─────────┤
│ YQzVQ_kHT-WnYTReh0…  │ {"adjacency_matrix…  │ {"shard_indexing_p…  │ "dc4efa821904cc2d7…  │ … │ {"remote_refresh_r…  │ 53687091             │ {"bound_address":[…  │ "10.89.0.3:9300"  │ "3.0.0" │
│ J9a5OM8STainCdkaLm…  │ {"adjacency_matrix…  │ {"shard_indexing_p…  │ "dc4efa821904cc2d7…  │ … │ {"remote_refresh_r…  │ 53687091             │ {"bound_address":[…  │ "10.89.0.2:9300"  │ "3.0.0" │
├──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴───┴──────────────────────┴──────────────────────┴──────────────────────┴───────────────────┴─────────┤
│ 2 rows                                                                                                                                                                      21 columns (9 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

As we set out to do, we have transformed each node from JSON into a structured table. There is one subtle quirk to mention: the additional step that filters out the plugins and modules fields from the transformation, since DuckDB seems to have difficulties pivoting those:

Binder Error:
PIVOT is not supported in correlated subqueries yet

I hope you find this useful: typical enterprise-grade HTTP/REST services often throw a pile of JSON at you, so go try to make sense of it!

I 🇺🇦 stand 🇺🇦 with 🇺🇦 Ukraine.

Thursday, December 26, 2024

Simple is finally easy: bootstrapping JAX-RS applications in Java SE environments

It has been a while since Jakarta EE 10 was released, but the ecosystem is slowly (yet steadily!) catching up. The Apache CXF project landed a new 4.1.0 release very recently that delivers Jakarta EE 10 compatibility, specifically an implementation of the Jakarta RESTful Web Services 3.1 specification (also known as JAX-RS).

One of the most exciting (in my opinion) features that Jakarta RESTful Web Services 3.1 includes is bootstrapping JAX-RS applications in Java SE environments. From now on, creating full-fledged RESTful web services on the JVM becomes not only easy, but very straightforward! In today's post, we are going to build a sample RESTful web service and host it inside a Java SE application, with a catch: no boilerplate allowed.

The PeopleRestService, presented in the snippet below, is a minimalistic example of a typical Jakarta RESTful web service: for the sake of simplicity, it does not do anything useful besides returning predefined data.

import java.util.Collection;
import java.util.List;

import com.example.jakarta.restful.bootstrap.model.Person;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/people")
public class PeopleRestService {
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Collection<Person> getPeople() {
        return List.of(new Person("[email protected]", "John", "Smith"));
    }
}

The Person class contains only three fields: email, firstName and lastName.

  
public class Person {
    private String email;
    private String firstName;
    private String lastName;
    // Skipping the constructor, getters and setters for brevity
}

Essentially, this is all we need at this point. Now the hardest part: how do we expose the PeopleRestService to the outside world? Here is the moment for SeBootstrap to take the stage. Its entire purpose is to allow starting up a JAX-RS application in Java SE environments, without (explicitly) requiring the presence of a web container or application server. What does it look like in practice?

  
import java.util.Set;
import java.util.concurrent.CompletionStage;

import com.example.jakarta.restful.bootstrap.rs.PeopleRestService;

import jakarta.ws.rs.ApplicationPath;
import jakarta.ws.rs.SeBootstrap;
import jakarta.ws.rs.core.Application;

public class BootstrapRunner {
    @ApplicationPath("/api")
    public static final class JakartaRestfulApplication extends Application {
        @Override
        public Set<Object> getSingletons() {
            return Set.of(new PeopleRestService());
        }
    }

    public static void main(String[] args) {
        final SeBootstrap.Configuration configuration = SeBootstrap.Configuration
            .builder()
            .property(SeBootstrap.Configuration.PROTOCOL, "http")
            .property(SeBootstrap.Configuration.PORT, 10800)
            .property(SeBootstrap.Configuration.ROOT_PATH, "/")
            .build();
    
        SeBootstrap
            .start(new JakartaRestfulApplication(), configuration)
            .toCompletableFuture()
            .join();
    }
}

As simple as that: pass the port (10800), protocol (HTTP) and root path (/) through SeBootstrap.Configuration, along with an Application subclass (JakartaRestfulApplication) instance, to the SeBootstrap::start method. To complete the puzzle, here are all the dependencies required by our Java SE application (taken from the project's Apache Maven pom.xml file).

  
<dependencies>
	<dependency>
		<groupId>org.apache.cxf</groupId>
		<artifactId>cxf-rt-frontend-jaxrs</artifactId>
		<version>4.1.0</version>
	</dependency>
	<dependency>
		<groupId>org.apache.cxf</groupId>
		<artifactId>cxf-rt-rs-extension-providers</artifactId>
		<version>4.1.0</version>
	</dependency>
	<dependency>
		<groupId>org.apache.cxf</groupId>
		<artifactId>cxf-rt-transports-http-jetty</artifactId>
		<version>4.1.0</version>
	</dependency>
	<dependency>
		<groupId>jakarta.json</groupId>
		<artifactId>jakarta.json-api</artifactId>
		<version>2.1.3</version>
	</dependency>
	<dependency>
		<groupId>jakarta.json.bind</groupId>
		<artifactId>jakarta.json.bind-api</artifactId>
		<version>3.0.1</version>
	</dependency>
	<dependency>
		<groupId>ch.qos.logback</groupId>
		<artifactId>logback-classic</artifactId>
		<version>1.5.15</version>
	</dependency>
	<dependency>
		<groupId>org.eclipse</groupId>
		<artifactId>yasson</artifactId>
		<version>3.0.4</version>
	</dependency>
</dependencies>

Nothing special, except maybe Eclipse Yasson, the JSON-B implementation provider. It is time to make sure everything actually works!

$ mvn clean package

...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
...

$ java -jar target/cxf-jakarta-restful-3.1-bootstrap-0.0.1-SNAPSHOT.jar
Dec 24, 2024 2:43:45 P.M. org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://:10800/api
14:43:46.129 [onPool-worker-1] INFO rg.eclipse.jetty.server.Server - jetty-12.0.15; built: 2024-11-05T19:44:57.623Z; git: 8281ae9740d4b4225e8166cc476bad237c70213a; jvm 23.0.1+8-FR
14:43:46.288 [onPool-worker-1] INFO jetty.server.AbstractConnector - Started ServerConnector@7b66a8d{HTTP/1.1, (http/1.1)}{:10800}
14:43:46.302 [onPool-worker-1] INFO rg.eclipse.jetty.server.Server - Started oejs.Server@7683694f{STARTING}[12.0.15,sto=0] @1701ms
14:43:46.349 [onPool-worker-1] INFO .server.handler.ContextHandler - Started oeje10s.ServletContextHandler@4b04a638{ROOT,/,b=null,a=AVAILABLE,h=oeje10s.ServletHandler@23ae88f1{STARTED}}

With the application up and running, we are ready to invoke the http://localhost:10800/api/people HTTP endpoint (the only one our Jakarta RESTful web service exposes).

$ curl http://localhost:10800/api/people -iv
* Host localhost:10800 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:10800...
*   Trying 127.0.0.1:10800...
* Connected to localhost (127.0.0.1) port 10800
* using HTTP/1.x
> GET /api/people HTTP/1.1
> Host: localhost:10800
> User-Agent: curl/8.11.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: Jetty(12.0.15)
Server: Jetty(12.0.15)
< Date: Tue, 24 Dec 2024 19:52:10 GMT
Date: Tue, 24 Dec 2024 19:52:10 GMT
< Content-Type: application/json
Content-Type: application/json
< Transfer-Encoding: chunked
Transfer-Encoding: chunked
<

[{"email":"[email protected]","firstName":"John","lastName":"Smith"}]

And here we are: we get the response with our hardcoded list of people, no surprises! I hope you will agree that the bootstrapping process is very simple and easy to follow. Even better, you could integrate SeBootstrap into your test suites as well, thanks to its flexible configuration capabilities, for example:

  
final SeBootstrap.Configuration configuration = SeBootstrap.Configuration
    .builder()
    // Use random free port
    .property(SeBootstrap.Configuration.PORT, SeBootstrap.Configuration.FREE_PORT)
    ...
    .build();

// Start the instance
final Instance instance = SeBootstrap
    .start(new JakartaRestfulApplication(), configuration)
    .toCompletableFuture()
    .join();
    
final SeBootstrap.Configuration actual = instance.configuration();
// Use actual.port(), actual.host(), ...
...

// Stop the instance
instance
    .stop()
    .toCompletableFuture()
    .join();

It is worth mentioning that bootstrapping secure Jakarta RESTful Web Services over the HTTPS protocol is also supported, for example:

 
final SeBootstrap.Configuration configuration = SeBootstrap.Configuration
    .builder()
    .property(SeBootstrap.Configuration.PROTOCOL, "https")
    .property(SeBootstrap.Configuration.PORT, 10843)
    .property(SeBootstrap.Configuration.ROOT_PATH, "/")
    .sslContext(SSLContext.getDefault()) /* or supply your own */
    .build();

The complete source code of the project is available on GitHub.

I 🇺🇦 stand 🇺🇦 with 🇺🇦 Ukraine.

Tuesday, August 30, 2022

Quick, somewhat naïve but still useful: benchmarking HTTP services from the command line

Quite often, while developing HTTP services, you may find yourself looking for a quick and easy way to throw some load at them. Tools like Gatling and JMeter are the gold standard in the open-source world, but developing meaningful load test suites in either of them may take some time. It is very likely you are going to end up with one of these eventually, but during development, more approachable tooling gives invaluable feedback much faster.

Command-line HTTP load testing tools are what we are going to talk about. There are a lot of them, but we will focus on just a few: ab, vegeta, wrk, wrk2, and rewrk. And since HTTP/2 is getting more and more traction, the tools that support it will be highlighted and awarded bonus points. The sample HTTP service we are going to run tests against exposes only a single GET /services/catalog endpoint over HTTP/1.1, HTTP/2 and HTTP/2 over cleartext (h2c).
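Before looking at the tools themselves, it helps to see what they all do at their core: fire requests, record per-request latency, and summarize the distribution as percentiles. Here is a deliberately naive, sequential sketch using only the Python standard library; the throwaway stub server and request count are made up for illustration, and real tools add concurrency, warm-up, and coordinated-omission correction on top of this:

```python
import statistics
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# A throwaway stub standing in for the GET /services/catalog endpoint.
class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"items": []}')

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), CatalogHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/services/catalog"

# Fire a fixed number of requests and record per-request latency in milliseconds.
latencies = []
for _ in range(50):
    started = time.perf_counter()
    with urlopen(url) as response:
        response.read()
    latencies.append((time.perf_counter() - started) * 1000)

# Summarize the latency distribution, just like the reports below do.
cuts = statistics.quantiles(latencies, n=100)  # 99 cut points: 1st..99th percentile
p50, p90, p99 = cuts[49], cuts[89], cuts[98]
print(f"p50={p50:.2f}ms p90={p90:.2f}ms p99={p99:.2f}ms")
server.shutdown()
```

Keep this mental model in mind as we read the reports that follow; the tools differ mostly in how they generate load and how honest their latency accounting is.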

Let us kick off with ab, the Apache HTTP server benchmarking tool: one of the oldest HTTP load testing tools out there. It is available on most Linux distributions and supports only HTTP/1.x (in fact, it does not implement HTTP/1.x fully). The standard set of parameters like desired number of requests, concurrency and timeout is supported.

$> ab -n 1000 -c 10 -s 1 -k http://localhost:19091/services/catalog

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
...
Completed 1000 requests
Finished 1000 requests

Server Software:
Server Hostname:        localhost
Server Port:            19091

Document Path:          /services/catalog
Document Length:        41 bytes

Concurrency Level:      10
Time taken for tests:   51.031 seconds
Complete requests:      1000
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      146000 bytes
HTML transferred:       41000 bytes
Requests per second:    19.60 [#/sec] (mean)
Time per request:       510.310 [ms] (mean)
Time per request:       51.031 [ms] (mean, across all concurrent requests)
Transfer rate:          2.79 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       5
Processing:     3  498 289.3    497    1004
Waiting:        2  498 289.2    496    1003
Total:          3  499 289.3    497    1004

Percentage of the requests served within a certain time (ms)
  50%    497
  66%    645
  75%    744
  80%    803
  90%    914
  95%    955
  98%    979
  99%    994
 100%   1004 (longest request)

The report is pretty comprehensive but if your service is talking over HTTP/2 or is using HTTPS with self-signed certificates (not unusual in development), you are out of luck.

Let us move on to somewhat more advanced tools, or to be precise, a family of tools inspired by wrk: a modern HTTP benchmarking tool. There are no official binary releases of wrk available, so you are better off building the bits from the sources yourself.

$> wrk -c 50 -d 10 -t 5 --latency --timeout 5s https://localhost:19093/services/catalog

Running 10s test @ https://localhost:19093/services/catalog
  5 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   490.83ms  295.99ms   1.04s    57.06%
    Req/Sec    20.79     11.78    60.00     81.29%
  Latency Distribution
     50%  474.64ms
     75%  747.03ms
     90%  903.44ms
     99%  999.52ms
  978 requests in 10.02s, 165.23KB read
Requests/sec:     97.62
Transfer/sec:     16.49KB

Capability-wise, it is very close to ab, with slightly better HTTPS support. The report is as minimal as it could get; however, the distinguishing feature of wrk is the ability to use LuaJIT scripting for HTTP request generation. No HTTP/2 support though.

wrk2 is an improved version of wrk (and is based mostly on its codebase) that was modified to produce a constant-throughput load and accurate latency details. Unsurprisingly, you have to build this one from the sources as well (and the binary name is kept as wrk).

$> wrk -c 50 -d 10 -t 5 -L -R 100 --timeout 5s https://localhost:19093/services/catalog

Running 10s test @ https://localhost:19093/services/catalog
  5 threads and 50 connections
  Thread calibration: mean lat.: 821.804ms, rate sampling interval: 2693ms
  Thread calibration: mean lat.: 1077.276ms, rate sampling interval: 3698ms
  Thread calibration: mean lat.: 993.376ms, rate sampling interval: 3282ms
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   933.61ms  565.76ms   3.35s    70.42%
    Req/Sec       -nan      -nan   0.00      0.00%
  Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  865.79ms
 75.000%    1.22s
 90.000%    1.69s
 99.000%    2.61s
 99.900%    3.35s
 99.990%    3.35s
 99.999%    3.35s
100.000%    3.35s

  Detailed Percentile spectrum:
       Value   Percentile   TotalCount 1/(1-Percentile)

      28.143     0.000000            1         1.00
     278.015     0.100000           36         1.11
     426.751     0.200000           71         1.25
                       ....
    2969.599     0.996875          354       320.00
    3348.479     0.997266          355       365.71
    3348.479     1.000000          355          inf
#[Mean    =      933.606, StdDeviation   =      565.759]
#[Max     =     3346.432, Total count    =          355]
#[Buckets =           27, SubBuckets     =         2048]
----------------------------------------------------------
  893 requests in 10.05s, 150.87KB read
Requests/sec:     88.85
Transfer/sec:     15.01KB

Besides these noticeable enhancements, the feature set of wrk2 largely matches wrk's, so HTTP/2 is out of the picture. But do not give up just yet, we are not done.

The most recent addition to the wrk family is rewrk, a more modern HTTP framework benchmark utility, which could be thought of as wrk rewritten in beloved Rust with HTTP/2 support baked in.

$> rewrk -c 50 -d 10s -t 5 --http2 --pct --host https://localhost:19093/services/catalog

Beginning round 1...
Benchmarking 50 connections @ https://localhost:19093/services/catalog for 10 second(s)
  Latencies:
    Avg      Stdev    Min      Max
    494.62ms  281.78ms  5.56ms   1038.28ms
  Requests:
    Total:   972   Req/Sec:  97.26
  Transfer:
    Total: 192.20 KB Transfer Rate: 19.23 KB/Sec
+ --------------- + --------------- +
|   Percentile    |   Avg Latency   |
+ --------------- + --------------- +
|      99.9%      |    1038.28ms    |
|       99%       |    1006.26ms    |
|       95%       |    975.33ms     |
|       90%       |    942.94ms     |
|       75%       |    859.73ms     |
|       50%       |    737.45ms     |
+ --------------- + --------------- +

As you may have noticed, the report is very similar to the one produced by wrk. In my opinion, this is a tool which has a perfect balance of features, simplicity and insights, at least while you are in the middle of development.

The last one we are going to look at is vegeta, an HTTP load testing tool and library written in Go. It supports not only HTTP/2 but also HTTP/2 over cleartext, and has powerful reporting built in. It heavily uses pipes for composing different steps together, for example:

$> echo "GET https://localhost:19093/services/catalog" | 
   vegeta attack -http2 -timeout 5s -workers 10 -insecure -duration 10s | 
   vegeta report

Requests      [total, rate, throughput]         500, 50.10, 45.96
Duration      [total, attack, wait]             10.88s, 9.98s, 899.836ms
Latencies     [min, mean, 50, 90, 95, 99, max]  14.244ms, 529.886ms, 537.689ms, 918.294ms, 962.448ms, 1.007s, 1.068s
Bytes In      [total, mean]                     20500, 41.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:500
Error Set:

You could also ask for histograms (with customizable reporting buckets), not to mention beautiful plots with the plot command:

$> echo "GET http://localhost:19092/services/catalog" | 
   vegeta attack -h2c -timeout 5s -workers 10 -insecure -duration 10s | 
   vegeta report  -type="hist[0,200ms,400ms,600ms,800ms]"

Bucket           #    %       Histogram
[0s,     200ms]  86   17.20%  ############
[200ms,  400ms]  102  20.40%  ###############
[400ms,  600ms]  98   19.60%  ##############
[600ms,  800ms]  113  22.60%  ################
[800ms,  +Inf]   101  20.20%  ###############

Of all the tools above, vegeta is clearly the most powerful in terms of capabilities and reporting. It has all the chances to become your one-stop HTTP benchmarking harness, even for production.

So we have walked through a good number of tools; which one is yours? My advice would be: for HTTP/1.x, use ab; for basic HTTP/2, look at rewrk; and if none of these fit, turn to vegeta. Once your demands and the sophistication of your load tests grow, consider Gatling or JMeter.

With that, Happy HTTP Load Testing!

Sunday, June 17, 2018

In the shoes of the consumer: do you really need to provide the client libraries for your APIs?

The beauty of RESTful web services and APIs is that any consumer that speaks the HTTP protocol will be able to understand and use them. Nonetheless, the same dilemma pops up over and over again: should you accompany your web APIs with client libraries or not? If yes, what languages and/or frameworks should you support?

Quite often this is not an easy question to answer. So let us take a step back and think about the overall idea for a moment: what value do client libraries bring to the consumers?

Some may say they lower the barrier for adoption. Indeed, specifically in the case of strongly typed languages, exploring the API contracts from your favorite IDE (syntax highlighting and auto-completion, please!) is quite handy. But by and large, RESTful web APIs are simple enough to start with, and good documentation would certainly be more valuable here.

Others may say it is good to shield the consumers from dealing with multiple API versions or rough edges. That also kind of makes sense, but I would argue it just hides flaws in the way the web APIs in question are designed and evolved over time.

All in all, no matter how many clients you decide to bundle, the APIs are still going to be accessible to any generic HTTP consumer (curl, HttpClient, RestTemplate, you name it). Giving a choice is great, but the price to pay for maintenance could be really high. Could we do better? As you may have guessed already, we certainly have quite a few options, hence this post.
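To underline the point about generic consumers, here is how little it takes to consume such an API with nothing but a standard library HTTP client. The sketch below is illustrative: a throwaway stub server stands in for the real people management service, and the sample record is made up.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# A throwaway stub of the people management API; the record is sample data.
class PeopleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/people":
            body = json.dumps([{"email": "[email protected]",
                                "firstName": "John", "lastName": "Smith"}]).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), PeopleHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The whole "client library": one generic HTTP call plus JSON decoding.
with urlopen(f"http://127.0.0.1:{server.server_port}/api/people") as response:
    people = json.load(response)

print(people)
server.shutdown()
```

Two lines of consumption code; that is the bar any bundled client library has to justify itself against.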

The key ingredient of success here is to maintain an accurate specification of your RESTful web APIs, using OpenAPI v3.0 or even its predecessor, Swagger/OpenAPI 2.0 (or RAML, or API Blueprint, it does not really matter much). In the case of OpenAPI/Swagger, tooling is king: one could use Swagger Codegen, a template-driven engine, to generate API clients (and even server stubs) in many different languages, and this is what we are going to talk about in this post.
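For reference, a minimal, hand-written fragment of what such an OpenAPI v3.0 specification might look like for one operation of a people management API (the schema below is an illustrative sketch inferred from the generated client shown later, not the actual contract file):

```yaml
openapi: 3.0.0
info:
  title: People Management API
  version: 0.0.1
paths:
  /api/people:
    get:
      operationId: getPeople
      responses:
        '200':
          description: The list of people
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Person'
components:
  schemas:
    Person:
      type: object
      properties:
        email:
          type: string
        firstName:
          type: string
        lastName:
          type: string
```

Everything the code generator needs, operations, parameters and models, lives in this one document.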

To simplify things, we are going to implement a consumer of the people management web API which we built in the previous post. To begin with, we need to get its OpenAPI v3.0 specification in the YAML (or JSON) format.

java -jar server-openapi/target/server-openapi-0.0.1-SNAPSHOT.jar

And then:

wget http://localhost:8080/api/openapi.yaml

Awesome, half of the job is done, literally. Now, let us allow Swagger Codegen to take the lead. In order not to complicate the matter, let's assume that the consumer is also a Java application, so we can follow the mechanics without any difficulties (but Java is only one of the options; the list of supported languages and frameworks is astonishing).

Along this post we are going to use OpenFeign, one of the most advanced Java HTTP client binders. Not only is it exceptionally simple to use, it also offers quite a few integrations we are going to benefit from soon.

<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-core</artifactId>
    <version>9.7.0</version>
</dependency>

<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-jackson</artifactId>
    <version>9.7.0</version>
</dependency>

Swagger Codegen could be run as a stand-alone application from the command line, or as an Apache Maven plugin (the latter is what we are going to use).

<plugin>
    <groupId>io.swagger</groupId>
    <artifactId>swagger-codegen-maven-plugin</artifactId>
    <version>3.0.0-rc1</version>
    <executions>
        <execution>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <inputSpec>/contract/openapi.yaml</inputSpec>
                <apiPackage>com.example.people.api</apiPackage>
                <language>java</language>
                <library>feign</library>
                <modelPackage>com.example.people.model</modelPackage>
                <generateApiDocumentation>false</generateApiDocumentation>
                <generateSupportingFiles>false</generateSupportingFiles>
                <generateApiTests>false</generateApiTests>
                <generateApiDocs>false</generateApiDocs>
                <addCompileSourceRoot>true</addCompileSourceRoot>
                <configOptions>
                    <sourceFolder>/</sourceFolder>
                    <java8>true</java8>
                    <dateLibrary>java8</dateLibrary>
                    <useTags>true</useTags>
                </configOptions>
            </configuration>
        </execution>
    </executions>
</plugin>

If some of the options are not very clear, Swagger Codegen has pretty good documentation to consult for clarifications. The important ones to pay attention to are language and library, which are set to java and feign respectively. One thing to note though: the support for the OpenAPI v3.0 specification is mostly complete, but you may encounter some issues nonetheless (as you noticed, the version is 3.0.0-rc1).

What you get when the build finishes is a plain old Java interface, PeopleApi, annotated with OpenFeign annotations, which is a direct projection of the people management web API specification (which comes from /contract/openapi.yaml). Please notice that all model classes are generated as well.

@javax.annotation.Generated(
    value = "io.swagger.codegen.languages.java.JavaClientCodegen",
    date = "2018-06-17T14:04:23.031Z[Etc/UTC]"
)
public interface PeopleApi extends ApiClient.Api {
    @RequestLine("POST /api/people")
    @Headers({"Content-Type: application/json", "Accept: application/json"})
    Person addPerson(Person body);

    @RequestLine("DELETE /api/people/{email}")
    @Headers({"Content-Type: application/json"})
    void deletePerson(@Param("email") String email);

    @RequestLine("GET /api/people/{email}")
    @Headers({"Accept: application/json"})
    Person findPerson(@Param("email") String email);

    @RequestLine("GET /api/people")
    @Headers({"Accept: application/json"})
    List<Person> getPeople();
}
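For reference, the generated model classes follow the fluent accessor style used later in this post. Below is a trimmed-down sketch of what the generated Person might look like; the field names are assumed from the usage in this post, and the real Swagger Codegen output also carries Jackson annotations, equals/hashCode and toString.

```java
// Hypothetical, trimmed-down version of the generated Person model class
// (declared package-private here only to keep the sketch self-contained).
class Person {
    private String email;
    private String firstName;
    private String lastName;

    // Fluent setters, as produced by the codegen, allow chained construction
    public Person email(String email) {
        this.email = email;
        return this;
    }

    public Person firstName(String firstName) {
        this.firstName = firstName;
        return this;
    }

    public Person lastName(String lastName) {
        this.lastName = lastName;
        return this;
    }

    public String getEmail() { return email; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
}
```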

Let us compare it with the Swagger UI interpretation of the same specification, available at http://localhost:8080/api/api-docs?url=/api/openapi.json:

It looks right at first glance, but we had better ensure things work out as expected. Once we have the OpenFeign-annotated interface, it can be made functional (in this case, implemented through proxies) using the family of Feign builders, for example:

final PeopleApi api = Feign
    .builder()
    .client(new OkHttpClient())
    .encoder(new JacksonEncoder())
    .decoder(new JacksonDecoder())
    .logLevel(Logger.Level.HEADERS)
    .options(new Request.Options(1000, 2000))
    .target(PeopleApi.class, "http://localhost:8080/");

Great, the fluent builder style rocks. Assuming our people management web API server is up and running (by default, it is going to be available at http://localhost:8080/):

java -jar server-openapi/target/server-openapi-0.0.1-SNAPSHOT.jar

We can communicate with it by calling the freshly built PeopleApi instance's methods, as in the code snippet below:

final Person person = api.addPerson(
        new Person()
            .email("[email protected]")
            .firstName("John")
            .lastName("Smith"));

It is really cool: if we rewind a bit, we actually did nothing ourselves. Everything is given to us for free, with only the web API specification available! But let us not stop here and remind ourselves that using Java interfaces does not eliminate the reality that we are dealing with remote systems. And things are going to fail here, sooner or later, no doubt.
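Before we move on, it is worth demystifying the "implemented through proxies" part. Conceptually, Feign builds a java.lang.reflect.Proxy over the interface and intercepts each annotated method invocation, turning it into an HTTP request. Here is a toy illustration of the mechanism; this is not Feign's actual code, and the annotation, interface and class names below are made up for the sketch:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Toy stand-in for Feign's @RequestLine annotation
@Retention(RetentionPolicy.RUNTIME)
@interface RequestLine {
    String value();
}

interface ToyApi {
    @RequestLine("GET /api/people")
    String getPeople();
}

class ToyFeign {
    // Build a dynamic proxy: each call is translated into its request line.
    // A real client would execute the HTTP request and decode the response;
    // here we simply return the request line to show the interception idea.
    @SuppressWarnings("unchecked")
    static <T> T target(Class<T> api, String baseUrl) {
        InvocationHandler handler = (proxy, method, args) -> {
            RequestLine line = method.getAnnotation(RequestLine.class);
            return baseUrl + " -> " + line.value();
        };
        return (T) Proxy.newProxyInstance(
            api.getClassLoader(), new Class<?>[]{api}, handler);
    }
}
```

Calling ToyFeign.target(ToyApi.class, "http://localhost:8080").getPeople() would yield the request line rather than a real response; Feign performs the same interception, then actually executes the request through the configured client, encoder and decoder.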

Not so long ago we learned about circuit breakers and how useful they are when properly applied in the context of distributed systems. It would be really awesome to somehow introduce this feature into our OpenFeign-based client. Please welcome another member of the family, the HystrixFeign builder, a seamless integration with the Hystrix library:

final PeopleApi api = HystrixFeign
    .builder()
    .client(new OkHttpClient())
    .encoder(new JacksonEncoder())
    .decoder(new JacksonDecoder())
    .logLevel(Logger.Level.HEADERS)
    .options(new Request.Options(1000, 2000))
    .target(PeopleApi.class, "http://localhost:8080/");

The only thing we need to do is add these two dependencies to the consumer's pom.xml file (strictly speaking, hystrix-core is not really needed if you do not mind staying on an older version).

<dependency>
    <groupId>io.github.openfeign</groupId>
    <artifactId>feign-hystrix</artifactId>
    <version>9.7.0</version>
</dependency>

<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.5.12</version>
</dependency>
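To recap what Hystrix gives us behind that one builder swap: a circuit breaker tracks failures and, once a threshold is crossed, short-circuits further calls instead of letting them pile up against a struggling service. Below is a deliberately simplified, not-thread-safe sketch of the pattern; it is an illustration only, not Hystrix code, and the class name is made up:

```java
import java.util.function.Supplier;

// Minimal closed/open circuit breaker: after enough consecutive failures
// it "opens" and fails fast without invoking the underlying call at all.
// Real implementations (like Hystrix) add half-open probing, rolling time
// windows, thread isolation and richer fallback semantics.
class ToyCircuitBreaker {
    private final int failureThreshold;
    private int consecutiveFailures;

    ToyCircuitBreaker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    boolean isOpen() {
        return consecutiveFailures >= failureThreshold;
    }

    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // fail fast, skip the remote call entirely
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0; // a success resets the failure counter
            return result;
        } catch (RuntimeException ex) {
            consecutiveFailures++;
            return fallback.get();
        }
    }
}
```

HystrixFeign wraps the production-grade version of this pattern around every method of the generated PeopleApi; the builder can also accept a fallback implementation of the interface as an extra argument to target(...).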

Arguably, this is one of the best examples of how easy and straightforward an integration could be. But even that is not the end of the story. Observability in distributed systems is as important as ever, and as we learned a while ago, distributed tracing is tremendously useful in helping us out here. And again, OpenFeign has support for it right out of the box, so let us take a look.

OpenFeign integrates with any OpenTracing-compatible tracer. The Jaeger tracer is one of those, and among other things it has a really nice web UI front-end to explore traces and dependencies. Let us run it first; luckily, it is fully Docker-ized.

docker run -d \
    -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
    -p 5775:5775/udp \
    -p 6831:6831/udp \
    -p 6832:6832/udp \
    -p 5778:5778 \
    -p 16686:16686 \
    -p 14268:14268 \
    -p 9411:9411 \
    jaegertracing/all-in-one:latest

A couple of additional dependencies have to be introduced in order for the OpenFeign client to become aware of the OpenTracing capabilities.

<dependency>
    <groupId>io.github.openfeign.opentracing</groupId>
    <artifactId>feign-opentracing</artifactId>
    <version>0.1.0</version>
</dependency>

<dependency>
    <groupId>io.jaegertracing</groupId>
    <artifactId>jaeger-core</artifactId>
    <version>0.29.0</version>
</dependency>

From the Feign builder side, the only change (besides the introduction of a tracer instance) is to wrap the client into a TracingClient, as the snippet below demonstrates:

final Tracer tracer = new Configuration("consumer-openapi")
    .withSampler(
        new SamplerConfiguration()
            .withType(ConstSampler.TYPE)
            .withParam(new Float(1.0f)))
    .withReporter(
        new ReporterConfiguration()
            .withSender(
                new SenderConfiguration()
                    .withEndpoint("http://localhost:14268/api/traces")))
    .getTracer();
            
final PeopleApi api = Feign
    .builder()
    .client(new TracingClient(new OkHttpClient(), tracer))
    .encoder(new JacksonEncoder())
    .decoder(new JacksonDecoder())
    .logLevel(Logger.Level.HEADERS)
    .options(new Request.Options(1000, 2000))
    .target(PeopleApi.class, "http://localhost:8080/");

On the server side we need to integrate with OpenTracing as well. Apache CXF has first-class support for it, bundled into the cxf-integration-tracing-opentracing module. Let us include it as a dependency, this time in the server's pom.xml.

<dependency>
    <groupId>org.apache.cxf</groupId>
    <artifactId>cxf-integration-tracing-opentracing</artifactId>
    <version>3.2.4</version>
</dependency>

Depending on the way you configure your applications, there should be an instance of the tracer available, which should later be passed to the OpenTracingFeature, for example:

// Create tracer
final Tracer tracer = new Configuration(
        "server-openapi", 
        new SamplerConfiguration(ConstSampler.TYPE, 1),
        new ReporterConfiguration(new HttpSender("http://localhost:14268/api/traces"))
    ).getTracer();

// Include the OpenTracingFeature feature
final JAXRSServerFactoryBean factory = new JAXRSServerFactoryBean();
factory.setProvider(new OpenTracingFeature(tracer));
...
factory.create();

From now on, the invocation of any people management API endpoint through the generated OpenFeign client will be fully traceable in the Jaeger web UI, available at http://localhost:16686/search (assuming your Docker host is localhost).

Our scenario is pretty simple, but imagine real applications where dozens of external service calls could happen while a single request travels through the system. Without distributed tracing in place, every issue has a chance to turn into a mystery.

As a side note, if you look closer at the trace in the picture, you may notice that the server and the consumer use different versions of the Jaeger API. This is not a mistake: the latest released version of Apache CXF is using an older OpenTracing API version (and as such, an older Jaeger client API), but it does not prevent things from working as expected.

With that, it is time to wrap up. Hopefully, the benefits of contract-based (or even better, contract-first) development in the world of RESTful web services and APIs are becoming more and more apparent: generation of smart clients, consumer-driven contract tests, discoverability and rich documentation are just a few to mention. Please make use of it!

The complete project sources are available on GitHub.