Variant Server 0.10 Benchmark


By Igor Urisman. February 22, 2019.

Introduction

These days, most companies never run benchmarks: their software never runs on a customer’s system, so why bother? Not so for Variant: our customers run Variant server on their own instances. Recently, we conducted the first systematic benchmark of Variant Experience Server. We had always been quite confident in its capabilities, given the highly performant underlying Akka stack, but it had gotten to the point where prospective customers were asking for hard numbers.

Naturally, we chose the Amazon Web Services infrastructure for this benchmark, given its ubiquity, the fact that many of our existing and prospective customers run there, and that we already publish a public Amazon machine image.

We ran three series of benchmark runs, each series against a server on a particular instance type, varying the client load with the goal of identifying the maximum server throughput on that instance type. All server and client configurations ran on the Java 8 runtime on Ubuntu 18.04 Linux.

Each benchmark run proceeded as follows:

To remove the time variability associated with instance startup and ensure that all clients started at the same time, each client started by blocking, waiting for a “green flag” message on an SQS queue. The control script enqueued the green flag message after all clients were up and running.
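The green-flag barrier can be sketched as follows. This is a simplified, illustrative sketch: an in-process queue stands in for SQS (the real clients long-poll an actual SQS queue), and all names here are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

object GreenFlagDemo {
  // In-process stand-in for the SQS control queue; the real clients
  // long-poll SQS until the control script enqueues the flag message.
  val queue   = new LinkedBlockingQueue[String]()
  val started = new CountDownLatch(3)

  def client(id: Int): Thread = new Thread(() => {
    // Block until the green flag is visible, then begin the run.
    while (queue.peek() != "green-flag") Thread.sleep(10)
    started.countDown()
  })

  def main(args: Array[String]): Unit = {
    val clients = (1 to 3).map(client)
    clients.foreach(_.start())
    Thread.sleep(50)         // all clients are up and blocking
    queue.put("green-flag")  // the control script releases them at once
    clients.foreach(_.join())
  }
}
```

Because the flag is only peeked at, not dequeued, every waiting client observes it, which gives the broadcast, start-together semantics the benchmark needs.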

Server Setup

The server instances were bootstrapped from the public ami-0c638a830551f3ac7 on the following instance types:

Instance Type   vCPUs   Memory, GiB   Network            EC2 Price per Hour
T2.micro        1       1             Low to Moderate    $0.0116
M5.large        2       8             Up to 10 Gigabit   $0.096
M5.xlarge       4       16            Up to 10 Gigabit   $0.192

Variant server was started in the default, out-of-the-box configuration, with one exception: an unpublished configuration parameter was set that caused the server to include its elapsed processing time in each response.

Client Setup

Each client spawns p threads, where p is the number of available processors, as returned by Runtime.getRuntime.availableProcessors. Each thread runs through a circular, never-ending series of steps, each of which is one of the following:

val OP_CREATE_SESSION = "Create Session"
val OP_TARGET_SESSION = "Target Session"
val OP_COMMIT_REQUEST = "Commit State Request"
val OP_FAIL_REQUEST   = "Fail State Request"
val OP_READ_ATTR      = "Read Session Attribute"
val OP_WRITE_ATTR     = "Write Session Attribute"
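A single thread’s loop might look like the following simplified sketch. The actual server call is stubbed out, the loop is bounded here for illustration (the real one is never-ending), and the accumulator structure and names are assumptions, not the actual client code.

```scala
object ClientSketch {
  // The six step types, as listed above.
  val Steps = Vector(
    "Create Session", "Target Session", "Commit State Request",
    "Fail State Request", "Read Session Attribute", "Write Session Attribute")

  // One thread's run: cycle through the steps with no pause, accumulating
  // per-step elapsed-time sums and counts, from which per-step averages
  // are computed before the aggregates are written to DynamoDB.
  def runSteps(n: Int): Map[String, (Long, Long)] = {
    val acc = scala.collection.mutable.Map.empty[String, (Long, Long)]
    for (i <- 0 until n) {
      val step = Steps(i % Steps.size)        // circular series of steps
      val t0 = System.nanoTime()
      // ... issue the actual server call for `step` here (stubbed out) ...
      val dt = System.nanoTime() - t0
      val (sum, cnt) = acc.getOrElse(step, (0L, 0L))
      acc(step) = (sum + dt, cnt + 1)         // no pause before the next step
    }
    acc.toMap
  }
}
```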

As soon as a step completes, the next one begins: a thread does not pause between steps, and threads do not synchronize with each other in any way. Each client accumulates measurements as it runs through the steps and, after the run completes, records the following aggregate data to a DynamoDB table:

  • The average server elapsed time for each step, excluding the socket queuing time.
  • The average client elapsed time for each step, including network related latencies, but excluding the above server elapsed time.

The sum of these two numbers is the overall round-trip duration, as seen by the client code.
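Concretely, the accounting is a simple sum; the function name is illustrative, and the example numbers are taken from the M5.large series reported below:

```scala
object RoundTrip {
  // Overall round trip as seen by the client code = server elapsed time
  // (reported by the server in each response, excluding socket queuing)
  // + client elapsed time (network latencies etc., measured around the
  // call with the server elapsed time subtracted out).
  def roundTripMs(serverElapsedMs: Double, clientElapsedMs: Double): Double =
    serverElapsedMs + clientElapsedMs
}
```

For example, roughly 0.2 ms of server-side processing plus roughly 3.3 ms of network and client-side time yields the 3.5 ms round trip observed on M5.large at 75% of maximum throughput.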

The Numbers

Four independent runs were made against the T2.micro server with a progressively greater degree of client parallelism. The server throughput plateaued at 224 requests per second, and the server-side processing time remained nearly constant at around 0.2 milliseconds, even after reaching saturation. The overall round-trip time remained under 4 milliseconds, most of it attributable to the instance’s low network speed.

Four independent runs were made against the M5.large server with a progressively greater degree of client parallelism. The server throughput plateaued at 943 requests per second, and the server-side processing time remained nearly constant at around 0.2 milliseconds, even after reaching saturation. The overall round-trip time rose from under 3 milliseconds at low client parallelism to over 21 milliseconds at high client parallelism, most of it attributable to network queuing by the server. At 748 requests per second (75% of maximum), the overall client round-trip time was only 3.5 milliseconds.

Four independent runs were made against the M5.xlarge server with a progressively greater degree of client parallelism. The server throughput plateaued at 1930 requests per second, and the server-side processing time remained nearly constant at around 0.2 milliseconds, even after reaching saturation. The overall round-trip time rose from under 2 milliseconds at low client parallelism to 21 milliseconds at high client parallelism, most of it attributable to network queuing by the server. At 1496 requests per second (78% of maximum), the overall client round-trip time was only 2.5 milliseconds.

Conclusions

The maximum throughput of Variant server reached nearly 2,000 client requests per second on the relatively modest 4-vCPU M5.xlarge instance. More importantly, maximum server throughput scaled very well with the number of available vCPUs, as shown in Figure 4 below.
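The scaling claim can be checked directly from the three series’ numbers; the per-vCPU figures below are derived from the reported maxima (the object and method names are illustrative):

```scala
object Scaling {
  // (instance, vCPUs, max requests/sec) from the three benchmark series.
  val results = Seq(
    ("T2.micro",  1,  224.0),
    ("M5.large",  2,  943.0),
    ("M5.xlarge", 4, 1930.0))

  // Throughput per vCPU. On the M5 class it stays nearly constant
  // (~472 vs ~483 req/sec per vCPU), i.e. throughput scales roughly
  // linearly with core count; the burstable T2.micro is a different
  // instance class and is not directly comparable.
  def perVcpu: Seq[Double] = results.map { case (_, v, rps) => rps / v }
}
```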

The average server-side elapsed time remained constant at near 0.2 milliseconds.