Variant AIM Server User Guide

Variant Application Iteration Server User Guide

Release 0.10.2, October 2019

Related documentation: Variant API JavaDoc | Variant Server Reference

1 Variant Server Overview

1.1 Application Iteration Management (AIM)

Software application development is accelerating. Many leading teams release new code continuously, with each independently deployable code delta released as soon as it is ready, unbundled from other such deltas, sometimes multiple times per second. In such a dynamic operational environment, code variations play an instrumental role in de-risking the software development lifecycle (SDLC). A code variation arises when an application must provide one or more alternate code paths, intended to co-exist with an existing code path. Several use cases call for instrumenting code variations; they are described below.

1.1.1 Online Controlled Experiments

In an online controlled experiment, the candidate user experience(s) are validated against the existing experience in the form of a randomized controlled trial. The existing experience serves as the control and the candidate experience(s) as the treatment(s). User traffic is split randomly (though not necessarily equally) among all the experiences, so that any observed difference between them with respect to some metric can be interpreted as caused by the difference in treatment. The experiment runs until the measurements reach statistical significance: enough traffic has passed through the experiences to give a degree of confidence that the observed difference is unlikely to be attributable to chance alone.
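To make statistical significance concrete, here is a minimal Java sketch of one common test for a conversion-rate metric, the two-proportion z-test. It is not part of Variant; the class and method names are hypothetical, and the sketch assumes a two-sided test at the 5% significance level (critical value 1.96).

```java
/** Hypothetical helper: two-proportion z-test for control vs. treatment conversion rates. */
final class ConversionZTest {
    private ConversionZTest() {}

    /** Returns the z statistic for control (c1 conversions of n1 sessions) vs. treatment (c2 of n2). */
    static double zStatistic(long c1, long n1, long c2, long n2) {
        double p1 = (double) c1 / n1;
        double p2 = (double) c2 / n2;
        // Pooled conversion rate under the null hypothesis that there is no difference.
        double pooled = (double) (c1 + c2) / (n1 + n2);
        double stdErr = Math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2));
        return (p2 - p1) / stdErr;
    }

    /** Two-sided test at the 5% level: |z| must exceed 1.96. */
    static boolean isSignificantAt5Percent(long c1, long n1, long c2, long n2) {
        return Math.abs(zStatistic(c1, n1, c2, n2)) > 1.96;
    }
}
```

For example, 100 conversions out of 1,000 sessions in control versus 130 out of 1,000 in treatment gives z ≈ 2.10, which clears the 1.96 threshold, while 110 out of 1,000 in treatment gives z ≈ 0.73, which does not. A continuous metric such as revenue per session would call for a different test, e.g. Welch's t-test.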

For example, you may want to run an experiment to find the optimal order amount that entitles your customers to free shipping. In such an experiment you offer several experiences, each promoting a different minimum order amount, and randomly target your user traffic to these experiences. As your customers pass through these experiences, you can compare the revenue lift generated by each free-shipping offer.

Note that in online controlled experiments session targeting must be random if you are to interpret correlation as causation, because randomness is a natural control for everything other than the difference in user experience. Refer to Appendix A for further details on the statistical analysis of Variant experiments.

1.1.2 Managed Feature Roll-Outs

The other use case for code variations is feature flags, a software delivery practice in which a new product feature is rolled out gradually to a carefully controlled group of customers before it is made generally available. Whenever you roll out a new product feature, a feature flag enables you to first publish it to a limited population of users, while sending all others into the stable existing experience. If all goes well, you gradually increase traffic into the new code path until you reach full production, at which point the existing code path can be discarded. If a defect is discovered, the new feature can be temporarily toggled off until the problem is fixed.

In contrast with online controlled experiments, when instrumenting feature flags you will likely use deterministic targeting rules for your user traffic. For example, you may start by admitting users into the new code path by their ZIP code, or customers by their organization ID.

1.2 Key Features

1.2.1 Client-Server Architecture

Variant AIM Server is deployed on the same network as the host application(s), either on premises or in the cloud, facilitating low network latency and proximity to operational data. The server manages variation schemata, which contain the code variation metadata. Each variation schema is a human-readable file, written in a superset of JSON, containing complete definitions of all related code variations. A single server instance can manage an unlimited number of variation schemata.

Each component of the host application that needs to participate in a code variation communicates with Variant server via a native client library. At the time of this writing, the following Variant client libraries are available:

  • Java: a fully functional Variant client with complete support for all Variant server functionality. Any component of the host application written in Java or another JVM language can integrate with this client. Several higher-level adapters are also available to take advantage of a particular interactive framework, e.g. the servlet adapter.
  • JavaScript: a partial Variant client supporting remote trace events from a Web browser.

In addition to the client API, Variant server also exposes a server-side Extension API, used to extend the server’s default behavior with custom semantics.

1.2.2 Separation of Instrumentation and Implementation

At the core of Variant’s philosophy is the idea of strict separation between variation instrumentation and experience implementation. Variation schemata enable developers to define variations as abstract ideas about the behavior of the host application, leaving the implementation of that behavior to the application developer.

The application developer uses familiar tools to implement new application behavior, unconcerned with how it will be instrumented as code variations. Variant server, on the other hand, handles the complexity of managing code variations as defined in variation schemata, hiding it from the application developer.

This clean separation dramatically reduces the amount of client code the application developer must write (and, in most cases, later remove) in order to instrument code variations.

1.2.3 Distributed Session Management

Variant maintains its own user sessions instead of relying on the native sessions maintained by the host application. Variant user sessions are distributed, with Variant server managing their shared state. Any component of the host application connected to a Variant server can obtain a user session by its ID. Variant guarantees a consistent view of the session’s shared state to all concurrent clients.

1.2.4 Concurrent Variations

Variation concurrency refers to cases when two or more code variations affect the same code segments. Concurrent code variations are more likely than it may first seem because of the Pareto principle, which, as applied to interactive computer applications, states that users spend 80% of their time in 20% of the application’s code. These few high-contention code paths will be instrumented by multiple concurrent experiments and feature rollouts, and Variant gives you a cogent abstraction to manage this concurrency.

1.2.5 Targeting and Qualification Longevity

Once a user session has been targeted for an experience, it is typically desirable that the user continues to see the same experience, i.e. that the targeting remains stable over time. Variant supports three longevity levels: unstable, stable and durable. They provide, respectively, no longevity guarantee (the session may be retargeted multiple times), a session-scoped guarantee (the session is targeted only once and the targeting remains in effect for the entire duration of the session), and a permanent guarantee (a recognized returning user sees the same experience as on their previous visit).

This longevity model applies consistently to both session targeting and session qualification.
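The difference between the three longevity levels can be sketched in a few lines of Java. This is a hypothetical model, not Variant's implementation: `LongevitySketch` and its method are illustrative names, the fresh targeting decision is passed in as a `Supplier`, and the two in-memory maps stand in for session-scoped and permanent storage (a real server would persist durable targeting).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** The three longevity levels described above. */
enum Longevity { UNSTABLE, STABLE, DURABLE }

/** Hypothetical sketch: how a longevity level decides whether to reuse a past targeting decision. */
final class LongevitySketch {
    private final Longevity longevity;
    private final Map<String, String> bySession = new HashMap<>(); // session-scoped memory
    private final Map<String, String> byUser = new HashMap<>();    // stand-in for permanent storage

    LongevitySketch(Longevity longevity) {
        this.longevity = longevity;
    }

    /** Returns the experience for this request, targeting afresh only when the level allows it. */
    String target(String sessionId, String userId, Supplier<String> freshTargeting) {
        switch (longevity) {
            case UNSTABLE:
                return freshTargeting.get(); // may change on every request
            case STABLE:
                return bySession.computeIfAbsent(sessionId, k -> freshTargeting.get());
            case DURABLE:
                return byUser.computeIfAbsent(userId, k -> freshTargeting.get());
            default:
                throw new IllegalStateException("Unknown longevity " + longevity);
        }
    }
}
```

With `STABLE`, a second request in the same session gets the same experience while a new session is targeted afresh; with `DURABLE`, the same user gets the same experience across sessions.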

1.2.6 Extensibility

Variant AIM Server’s default behavior can be extended via the server-side Extension API (ExtAPI). It supports the creation and configuration of user code which runs in the server’s address space, augmenting the server’s default behavior with custom semantics. ExtAPI exposes two principal extension mechanisms: lifecycle hooks and trace event flushers. Lifecycle hooks are listeners for lifecycle events raised by Variant server, such as session qualification or session targeting events. They are configured in the variation schema and are made available to the server at run time via the /ext directory. Lifecycle hooks can be chained to help you modularize and reuse your code.

Event flushers handle the terminal ingestion of Variant trace events. Variant server comes with a few standard event flushers for saving trace events in popular databases, such as PostgreSQL and MySQL, but you may want to create your own, suited to your operating environment; doing so is simply a matter of implementing a Java interface.
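Implementing a custom flusher might look roughly like the following. The `SketchEventFlusher` interface and `SketchTraceEvent` class are hypothetical stand-ins for the real ExtAPI types, whose names and signatures are not given here, and the event names in the usage are illustrative. What the sketch shows is the shape of the contract: the server hands a batch of trace events to your code, which is responsible for persisting them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical minimal trace event; real Variant trace events carry more attributes. */
final class SketchTraceEvent {
    final String sessionId;
    final String name;
    final String value;

    SketchTraceEvent(String sessionId, String name, String value) {
        this.sessionId = sessionId;
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical flusher contract: persist a batch of events handed over by the server. */
interface SketchEventFlusher {
    void flush(Collection<SketchTraceEvent> events) throws Exception;
}

/** Example flusher that formats each event as a CSV line; a real one would write to a database. */
final class CsvLinesFlusher implements SketchEventFlusher {
    private final List<String> sink = new ArrayList<>();

    @Override
    public void flush(Collection<SketchTraceEvent> events) {
        for (SketchTraceEvent e : events) {
            // No escaping: values containing commas would need quoting in a real CSV writer.
            sink.add(String.join(",", e.sessionId, e.name, e.value));
        }
    }

    List<String> lines() {
        return Collections.unmodifiableList(sink);
    }
}
```

A production flusher would typically write each batch with a single bulk insert or transaction, so that back-pressure from the database stays inside the server rather than the host application.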

2 Variant Architecture

2.1 Overview

Variant AIM Server handles all the work related to managing code variations. Host applications access it via native client libraries suitable for their language.

Variant server is deployed on the network local to the host application(s) and the operational database, facilitating low network latency and real-time integration with the host application’s operational data. This architecture is particularly attractive to modern distributed applications, which are composed of multiple service components. Each component communicates with Variant server independently, with the server responsible for maintaining the shared session state.

The following diagram presents a high-level overview of the different components of Variant software platform:

 

2.2 Server Configuration

Variant server is configured using the Lightbend Config library. At startup, Variant server looks for configuration in the file conf/variant.conf. If it is found, its contents override the defaults. You may further override this configuration at run time by providing an alternative configuration file, or override an individual config key with a JVM system property.
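The override order can be illustrated without the Lightbend Config library itself, which a real deployment would use. In this hypothetical sketch each layer is a plain map and later layers win key by key; as a simplifying choice, system properties are applied only to keys already defined by a lower layer, so that unrelated JVM properties such as java.version do not leak in. The `example.*` key names are illustrative and are not Variant's actual configuration keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of layered configuration: defaults < config file < system properties. */
final class LayeredConfigSketch {
    private LayeredConfigSketch() {}

    /** Merges the layers; later layers override earlier ones key by key. */
    static Map<String, String> resolve(Map<String, String> defaults,
                                       Map<String, String> configFile,
                                       Properties systemProperties) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(configFile); // the config file overrides built-in defaults
        for (String key : systemProperties.stringPropertyNames()) {
            if (merged.containsKey(key)) { // only override keys a lower layer defines
                merged.put(key, systemProperties.getProperty(key));
            }
        }
        return merged;
    }
}
```

In a running JVM the third argument would be `System.getProperties()`, so starting the process with `-Dexample.session.timeout=300` would override both lower layers for that key.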

Refer to the Variant AIM Server Reference for further information.

2.3 Integration With the Host Application

A host application communicates with Variant server via a native client library, supplied by Variant and suitable for the application’s language. Variant release 0.10 ships with a fully functional Java client and a partial JavaScript client, suitable for deployment to Web browsers.

2.3.1 Variant Java Client

Variant Java client is consumable by any host application running on a Java Virtual Machine (JVM) release 8 or later. It makes no assumptions about the host application’s other technology details, which makes it applicable to any interactive JVM host application, e.g. an HTTP server or an IVR call center. This flexibility inevitably comes at the cost of some runtime environment dependencies, such as the mechanism for tracking the Variant session ID, which had to be abstracted out and surfaced in the client’s API. The host application must provide implementations of these dependencies at runtime.
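To illustrate what such an environment-dependent abstraction looks like, here is a hypothetical Java sketch of a session ID tracker. `SessionIdTrackerSketch`, `MapBackedTracker` and `SessionIds` are illustrative names, not the Variant client's actual interface (see the Variant Java Client User Guide for that); the map-backed implementation stands in for whatever the environment offers for remembering a value between user interactions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical environment-neutral contract for remembering the Variant session ID. */
interface SessionIdTrackerSketch {
    Optional<String> get();
    void set(String sessionId);
}

/** Map-backed stand-in for an environment-specific store such as an HTTP cookie. */
final class MapBackedTracker implements SessionIdTrackerSketch {
    private final Map<String, String> store;
    private final String key;

    MapBackedTracker(Map<String, String> store, String key) {
        this.store = store;
        this.key = key;
    }

    @Override public Optional<String> get() { return Optional.ofNullable(store.get(key)); }
    @Override public void set(String sessionId) { store.put(key, sessionId); }
}

final class SessionIds {
    private SessionIds() {}

    /** Reuses the tracked session ID if present; otherwise creates and records a new one. */
    static String getOrCreate(SessionIdTrackerSketch tracker) {
        Optional<String> existing = tracker.get();
        if (existing.isPresent()) {
            return existing.get();
        }
        String fresh = UUID.randomUUID().toString();
        tracker.set(fresh);
        return fresh;
    }
}
```

In a servlet environment, the same contract would plausibly be implemented on top of an HTTP cookie read from the request and written to the response.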

Most JVM Web applications are written on top of a Web framework, like Java Servlets or Play!. Such applications should take advantage of the available Servlet adapter or Play! adapter for Variant Java client. (Get in touch if you want to contribute a different adapter.) These adapters wrap the Variant Java client API with a functionally equivalent API, which rewrites environment-dependent method signatures in terms of particular framework classes, such as javax.servlet.http.HttpServletRequest and play.mvc.Http.Request, and provides framework-specific implementations of all environment-dependent objects.

Refer to the Variant Java Client User Guide for further information.

2.3.2 Variant JavaScript Client

Variant.js supports triggering of trace events from a Web browser environment. For more information, refer to the Variant JavaScript Client User Guide.

3 Code Variation Model (CVM)

3.1 Introduction

3.1.1 Interactive Application as a Finite State Machine

The only assumption Variant makes about the host application is that it is interactive, i.e. that it responds to real-time user input. Its control flow cycles between two states, as illustrated in Figure 2 below: the processing state, where the application reads, processes and responds to the user’s input, and the interface state, where the application waits for it.


Irrespective of the user interface mechanism, the host application pauses in an interface state while waiting for the user’s response. An interface state presents the application’s response and gives the user the means to reply to it. Depending on the type of the host application, an interface state may be manifested as a desktop window (desktop application, e.g. MFC), an HTML page (Web application), an activity (Android mobile app), a phone menu (IVR application), an XML document (RESTful API), etc. These details are not relevant to the CVM.

A user experience then can be thought of as a traversal of a set of interface states, e.g. transitions from one Web page or one telephone menu to the next. The order of traversal is not important to the CVM and is left up to the host application.

3.1.2 Code Variations

Suppose now that some interface state(s) exist in more than one variation: the base state and one or more state variants, which the host application may choose in place of the base state. The control user experience is then one that traverses the base states, while a variant user experience is one that traverses variant states.

The control experience and one or more related variant experiences form a code variation. A feature toggle and an online experiment are both examples of code variations: each comprises the control experience (the current code path) and one or more variant experiences (new code paths), which co-exist for a time.

3.1.3 Variant State Request

Whenever the host application is in a processing state, it must decide which interface state to present to the user next, as shown in Figure 3 below:

Variant State Request

In the regular, uninstrumented case (A), the application simply figures out the next state based on the user’s input, carries out the requisite computations, renders the state to the user, and pauses. However, if the next state is instrumented by one or more code variations (B), the host application has a set of additional state variants to choose from. It is exactly this task of choosing the particular state variant that the host application delegates to Variant server, just as it delegates the task of storing data on disk to a database server.

The step of the processing state where the host application turns to Variant for targeting, followed by the step where the host application carries out the computations needed to render the targeted state variant, is called a state request. A Variant session is, in a nutshell, a succession of state requests plus the common state preserved between the requests.

3.1.4 Modeling Code Variations

Code Variation Model (CVM) is a domain model for code variations. It offers a formal framework for defining code variations and for reasoning about them. Its key practical benefit is that it provides a way to externalize the metadata for a set of related code variations in human-readable documents called variation schemata. Schemata are managed centrally by Variant server, outside the host application(s).

This, in turn, enables these two important benefits:

  • Separation of instrumentation from implementation.
    Variation schemata enable developers to define variations as abstract ideas about the behavior of the host application, leaving the implementation of that behavior to the application developer. The application developer uses familiar tools to implement new application behavior, unconcerned with how it will be instrumented as code variations. Variant server, on the other hand, handles the complexity of managing code variations as defined in variation schemata, hiding it from the application developer.
  • Separation of lifecycles.
    Externalizing variation metadata out of the host application and onto Variant server keeps the compute overhead on the host application very low. All the actual overhead associated with instrumenting code variations, such as computation, persistence of targeting and qualification information, and handling trace event back-pressure, is borne by Variant server.

3.2 Variation Schemata

Each variation schema is a human-readable file, written in a superset of JSON, containing complete definitions of all related code variations. A single Variant server can manage any number of such schema files, located in the server’s /schemata directory. This section introduces CVM’s concepts by example. For a complete reference, refer to the Variant AIM Server Reference.

The two top-level entities in CVM are states and variations. States represent the host application’s interface states. They are a rather opaque concept to CVM: all it needs to know about a state is its name and an optional set of state parameters. State parameters are simple key/value pairs of strings whose meaning is opaque to CVM and significant only to the host application.

The variations, on the other hand, are complex structures entirely managed by CVM. At a minimum, a variation must have:

  • Name
  • Exactly one control experience (typically mapped to the existing user experience) and at least one variant experience, which represents some alternate code path(s).
  • A list of on-state objects (onStates), one for each state instrumented by this variation.
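These minimum requirements are mechanical enough to check in code. The following Java sketch validates one variation against them; `ExperienceSpec`, `VariationSpec` and `VariationChecks` are hypothetical classes that mirror the schema keys used in this guide (name, experiences, isControl, onStates/stateRef) and are not part of Variant's API. The authoritative schema rules are in the Variant AIM Server Reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of one experience entry in a variation schema. */
final class ExperienceSpec {
    final String name;
    final boolean isControl;

    ExperienceSpec(String name, boolean isControl) {
        this.name = name;
        this.isControl = isControl;
    }
}

/** Hypothetical model of one variation: its name, experiences and onStates references. */
final class VariationSpec {
    final String name;
    final List<ExperienceSpec> experiences;
    final List<String> stateRefs;

    VariationSpec(String name, List<ExperienceSpec> experiences, List<String> stateRefs) {
        this.name = name;
        this.experiences = experiences;
        this.stateRefs = stateRefs;
    }
}

final class VariationChecks {
    private VariationChecks() {}

    /** Returns the minimum-requirement violations for one variation; empty if none. */
    static List<String> check(VariationSpec v, Set<String> declaredStates) {
        List<String> errors = new ArrayList<>();
        if (v.name == null || v.name.isEmpty()) {
            errors.add("variation has no name");
        }
        long controls = v.experiences.stream().filter(e -> e.isControl).count();
        if (controls != 1) {
            errors.add(v.name + ": expected exactly one control experience, found " + controls);
        }
        if (v.experiences.size() - controls < 1) {
            errors.add(v.name + ": needs at least one variant experience");
        }
        if (v.stateRefs.isEmpty()) {
            errors.add(v.name + ": instruments no states");
        }
        for (String ref : v.stateRefs) {
            if (!declaredStates.contains(ref)) {
                errors.add(v.name + ": onStates refers to undeclared state " + ref);
            }
        }
        return errors;
    }
}
```

Checked against the minimal schema in the next section, `RecaptchaOnPasswordReset` yields no errors; a variation with no control experience that references an undeclared state yields two.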

3.2.1 Minimal Valid Schema

A minimal valid variation schema consists of a single state, instrumented by a single variation with one variant experience, like the one below, where we model a feature rollout that adds Recaptcha to the existing password reset page.


// A very simple variation schema with a single state,
// instrumented by a single variation.
{
  'meta':{
    'name':'MinimalSchema'
  },
  'states':[{'name':'passwordResetPage'}],
  'variations':[
    {
      'name':'RecaptchaOnPasswordReset',
      'experiences':[
        {'name':'noRecaptcha', 'isControl':true},
        {'name':'withRecaptcha'}
      ],
      'onStates':[{'stateRef':'passwordResetPage'}]
    }
  ]
}

The meta section contains the schema’s name by which it is known to the connecting clients. The states clause contains the sole state definition and the variations clause contains the sole variation RecaptchaOnPasswordReset with two experiences, noRecaptcha and withRecaptcha.

3.2.2 Minimal Practical Schema

In the next example, borrowed verbatim from the Variant Demo Application, we introduce a number of important new concepts:

  • Concurrent variations.
  • A variation can span multiple states.
  • Lifecycle Hooks.


/*
 * Variant Java client + servlet adapter demo application.
 * Demonstrates instrumentation of an experiment and a concurrent feature toggle.
 * See https://github.com/getvariant/variant-java-demo for details.
 *
 * Copyright © 2015-2018 Variant, Inc. All Rights Reserved.
 */
{
  'meta':{
    'name':'petclinic',
    'comment':'Variant schema for the Pet Clinic demo application'
  },
  'states':[
    {'name':'vets'}, 
    {'name':'newVisit'}
  ],
  'variations':[
    /*
     * Vet's hourly rate feature toggle on the vets page only.
     * Demonstrates lazy instrumentation.
     */
    {
      'name':'VetsHourlyRateFeature',
      'experiences':[
        {
          'name':'existing',
          'weight':1,
          'isControl':true
        },
        {
          'name':'rateColumn',
          'weight':3
        }
      ],
      'onStates':[
        {'stateRef':'vets'}
      ]
    },
    /*
     * The Schedule-a-Visit Experiment on 2 pages.
     * Demonstrates eager instrumentation and conjoint variation concurrency.
     */
    {
      'name':'ScheduleVisitTest',
      'conjointVariationRefs':['VetsHourlyRateFeature'],
      'experiences':[
        {
          'name':'noLink',
          'weight':1,
          'isControl':true
        },
        {
          'name':'withLink',
          'weight':3
        }
      ],
      'onStates':[
        {'stateRef':'vets'}, 
        {'stateRef':'newVisit'}
      ],
      'hooks': [
        {
          // Disqualify blacklisted users.
          'class':'com.variant.extapi.std.demo.UserQualifyingHook',
          'init': {'blackList':['Nikita Krushchev']}
        } 
      ]
    }
  ]
}

This schema contains two variations: the feature flag VetsHourlyRateFeature, which exposes an early release of a new feature on the vets page, and the experiment ScheduleVisitTest on the vets and newVisit pages. The new feature adds the hourly rate column to the vets table. The experiment tests the hypothesis that a new Schedule visit link on the vets table, leading to the newVisit page, lifts new appointment bookings.
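The `weight` properties in this schema presumably act as relative weights for random targeting, so that `existing` and `rateColumn` (weights 1 and 3) receive roughly 1/4 and 3/4 of the traffic. Under that assumption, weighted random selection can be sketched as follows; `WeightedTargeting` is a hypothetical helper, not part of Variant's API.

```java
import java.util.Map;
import java.util.Random;

/** Hypothetical helper: weighted random choice among a variation's experiences. */
final class WeightedTargeting {
    private WeightedTargeting() {}

    /**
     * Picks an experience name with probability proportional to its weight.
     * The map's iteration order decides which experience owns which slice of [0, total).
     */
    static String pick(Map<String, Double> weights, Random random) {
        double total = 0;
        for (double w : weights.values()) {
            if (w < 0) throw new IllegalArgumentException("Negative weight: " + w);
            total += w;
        }
        if (total <= 0) throw new IllegalArgumentException("Weights must sum to a positive value");
        double r = random.nextDouble() * total; // uniform in [0, total)
        String chosen = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            chosen = e.getKey();
            r -= e.getValue();
            if (r < 0) break; // r fell inside this experience's slice
        }
        return chosen; // falling through returns the last entry, guarding against round-off
    }
}
```

Over many sessions, the share of `rateColumn` converges to 3 / (1 + 3) = 0.75.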

The ScheduleVisitTest experiment has a lifecycle hook disqualifying black-listed customers from this test. Variant server posts this hook whenever it must determine if a new user session qualifies for the experiment. Lifecycle hooks are managed via the server-side Extension API. For more information, refer to Section 5.
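The black-list hook and hook chaining can be sketched as follows. The interface, the class names and the chaining rule (hooks are posted in order, the first hook that returns a definite answer wins, and a session qualifies if no hook objects) are assumptions made for illustration; refer to Section 5 for the actual Extension API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical qualification hook: an empty result means "no opinion, ask the next hook". */
interface QualificationHookSketch {
    Optional<Boolean> qualify(String userName);
}

/** Disqualifies black-listed users, in the spirit of the demo's UserQualifyingHook. */
final class BlackListHook implements QualificationHookSketch {
    private final Set<String> blackList;

    BlackListHook(List<String> blackList) {
        this.blackList = new HashSet<>(blackList);
    }

    @Override
    public Optional<Boolean> qualify(String userName) {
        if (blackList.contains(userName)) {
            return Optional.of(false);
        }
        return Optional.empty();
    }
}

final class HookChain {
    private HookChain() {}

    /** Posts hooks in order; the first definite answer wins, otherwise the session qualifies. */
    static boolean qualifies(List<QualificationHookSketch> chain, String userName) {
        for (QualificationHookSketch hook : chain) {
            Optional<Boolean> result = hook.qualify(userName);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return true;
    }
}
```

Keeping each hook narrow (one rule, "no opinion" otherwise) is what makes chaining useful: rules can be reused across variations and combined in the schema without editing each other's code.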

The fact that both variations in the petclinic schema are instrumented on the vets page makes them concurrent. By default, Variant assumes concurrent variations to be disjoint, which is to say that a user session can be targeted to a non-control experience in at most one of them. This convenient default shields application developers working on new features from what other developers are doing: if two independent features happen to overlap, Variant will never target a user session for both of them, unless instructed otherwise.

If the two developers, each working on their own feature, join forces and provide the hybrid code path that implements both features, they can direct Variant to treat these two code variations as conjointly concurrent by providing the conjointVariationRefs property, just as in the schema in Listing 2 above. Concurrent variations are considered in detail in Section 3.6.

3.3 State Variants

Whenever a state is instrumented by a variation, this instrumentation constitutes an obligation, on the part of the host application, to provide an implementation of any experience defined by the variation. Or, more formally, the host application must provide a state variant corresponding to any experience declared by any code variation instrumented on a particular state.

By default, Variant can infer state variants from the variation schema, as is the case in Listing 2 above. However, there are cases when the application developer must specify state variants explicitly, as discussed in the following two sections.

3.4 State Parameters

Although the Code Variation Model makes no assumptions about the technology or the semantics of the host application, it provides a mechanism for the host application to enrich the variation schema with application-specific state through state parameters: simple key/value pairs whose meaning is completely opaque to Variant.

State parameters can be specified either at the state level or at the state variant level, as illustrated in the listing below.


{
  ...
  'states':[
  ...
  {
    'name':'state1',
    /*
     * State parameters specified at the state level
     * provide the base values for all variants of this state.
     */
    'parameters': [
      {'key':'key1', 'value':'value1'},
      {'key':'key2', 'value':'value2'}
    ]
  },
  ...
  ],
  'variations':[
    {
      'name':'variation1',
      'experiences':[
        {
          'name':'existing',
          'isControl':true
        },
        {
          'name':'variant'
        }
      ],
      'onStates':[{'stateRef':'state1'}],
      'variants': [
        {
          'experienceRef':'variant',
          /*
           * State parameters specified at the state variant level
           * at runtime override the identically keyed base values within
           * the scope of the enclosing state variant. 
           */
          'parameters': [
            {'key':'key2', 'value':'value2 in state variant'},
            {'key':'key3', 'value':'value3 in state variant'}
          ]
          ...
        }
      ]
    },
    ...
  ]
}

Each state parameter is a key/value pair, where both the key and the value are strings. State parameters specified at the state level provide the base values, which have global scope. At runtime, these parameters are available to the host application within the scope of every state variant based on this state, across all variations.

State parameters specified at the state variant level have the scope of that state variant only, and within that scope they override the identically named base parameter values. In other words, at runtime, in a session targeted to the variant experience of variation1 on state1, these client calls return the following results:


stateRequest.getResolvedStateParameters().get("key1");  // "value1"
stateRequest.getResolvedStateParameters().get("key2");  // "value2 in state variant"
stateRequest.getResolvedStateParameters().get("key3");  // "value3 in state variant"

This mechanism of state parameter overrides is a convenient way for the developer to introduce application state into the schema at both global and local scopes.

3.5 Mixed Instrumentation

Typically, all experiences in a variation will instrument the same set of states. But there are use cases where this assumption does not hold. For example, you may want to split a busy page in two, or to consolidate two sparse pages into one, as illustrated in Figure 4 below.


Variant Mixed Instrumentation

The type of instrumentation where a state is instrumented by some, but not all, experiences in a variation is referred to as mixed instrumentation. Whenever a state variant is undefined in some experience, it is referred to as a phantom state variant in that experience. A phantom state variant constitutes an obligation on the part of the host application not to enter this state in a session targeted to the experience in which the variant is phantom. For example, in Figure 4.A above, if a session targeted to the control experience attempts to enter state S2, Variant will throw a runtime exception.
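The phantom-variant rule amounts to a lookup before the host application enters a state. The sketch below models it for a single variation, using the page split in Figure 4.A as the example (state S2 is phantom in the control experience). `PhantomRegistry` is hypothetical, and `IllegalStateException` stands in for whatever runtime exception Variant actually throws.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical record of which (experience, state) pairs are phantom in one variation. */
final class PhantomRegistry {
    private final Map<String, Set<String>> phantomStatesByExperience = new HashMap<>();

    /** Declares that the given experience has no state variant on the given state. */
    void declarePhantom(String experience, String state) {
        phantomStatesByExperience.computeIfAbsent(experience, k -> new HashSet<>()).add(state);
    }

    /** Throws if a session targeted to this experience tries to enter a state phantom in it. */
    void checkCanEnter(String experience, String state) {
        Set<String> phantoms =
            phantomStatesByExperience.getOrDefault(experience, Collections.emptySet());
        if (phantoms.contains(state)) {
            throw new IllegalStateException(
                "State " + state + " is phantom in experience " + experience);
        }
    }
}
```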

Refer to Listing 4 in the next section for a practical example of mixed instrumentation.

3.6 Variation Concurrency

3.6.1 Motivation

If two variations instrument no states in common, they are referred to as serial, meaning that their concurrent execution is equivalent to a serial execution because they do not interfere with each other. Conversely, if two variations have one or more states in common, they are referred to as concurrent. Variant Code Variation Model offers full support for variation concurrency; any possible interleaving of two concurrent variations can be defined in the variation schema.

In Figure 5 below, the Blue and the Green variations are serial, but the Red variation is concurrent with both Blue and Green.


Concurrent Experiments

When a user session targets a state that is instrumented by two or more variations, there is a state variant space of possible experience combinations from which a state variant can be chosen. For example, in Figure 5 above, state S2 is instrumented by the Blue and Red variations. Blue has one variant experience and Red has two, so, counting the control experiences, the complete variant space of state S2 has (1 + 1) × (1 + 2) = 6 cells:


Variant Space
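The variant space of a state is the Cartesian product of the experience lists of the variations instrumenting it. The hypothetical helper below enumerates it; for state S2, using the experience names from the Tricolor schema later in this section (Blue: grey, blue; Red: grey, red1, red2), it yields the 2 × 3 = 6 cells counted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerates the variant space of a state. */
final class VariantSpace {
    private VariantSpace() {}

    /** Cartesian product: each cell picks one experience from each instrumenting variation. */
    static List<List<String>> cells(List<List<String>> experiencesPerVariation) {
        List<List<String>> cells = new ArrayList<>();
        cells.add(new ArrayList<>()); // start from the single empty combination
        for (List<String> experiences : experiencesPerVariation) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : cells) {
                for (String experience : experiences) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(experience);
                    next.add(extended);
                }
            }
            cells = next;
        }
        return cells;
    }
}
```

The cells come out in the order [grey, grey], [grey, red1], [grey, red2], [blue, grey], [blue, red1], [blue, red2]; the last two are the hybrid cells.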

The relationship of concurrence between two variations V1 and V2 has the following properties:

  • Symmetric: If variation V1 is concurrent with variation V2, then V2 is concurrent with V1.
  • Not Reflexive: a variation cannot be concurrent with itself.
  • Not Transitive: If V1 is concurrent with V2 and V2 is concurrent with V3, then V1 and V3 need not be concurrent.

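Variant server supports two runtime strategies for managing concurrent variations: a simplified, pseudo-serial strategy, which requires less work on the part of the experiment designer, and an unconstrained strategy, which requires the application developer to implement hybrid state variants. They are known, respectively, as disjoint and conjoint concurrency, and are discussed next.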

3.6.2 Disjoint Concurrency

First, let’s consider pseudo-serial execution, when the two variations are traversed in isolation. To support the Blue variation by itself, the application developer needs to implement the S2blue state variant. Similarly, to support the Red variation in isolation, (probably some other) application developer needs to implement its two variant experiences, S21red and S22red. This is a perfectly acceptable scenario, so long as no user session ends up targeted to variant experiences in both variations. If that were to happen, the host application would have no code path implementing both the S2blue and S21red state variants.

This type of constrained concurrency is referred to as disjoint concurrency and is the default behavior. Variant will not target a user session for two variant experiences in two concurrent variations, unless instructed to do so in the variation schema. This default makes sense: application developers should not have to communicate with each other simply because they work on overlapping features.

The price of this convenience is the potential starvation of downstream variations of user traffic, which is frequently acceptable.
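The targeting constraint behind disjoint concurrency can be expressed as a predicate over the cells of the variant space: a cell is admissible only if every pair of variations that are both in non-control experiences has been declared conjoint. The sketch below is a hypothetical model, not Variant's implementation, and the "A|B" pair encoding is a convention of the sketch. With no conjoint declarations it admits 4 of the 6 cells of state S2, excluding the two hybrid cells; declaring the pair conjoint (in the schema, via conjointVariationRefs) admits all six.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of the disjoint/conjoint targeting constraint. */
final class ConcurrencyRules {
    private ConcurrencyRules() {}

    /**
     * A cell assigns one experience to each concurrent variation; isControl.get(i)
     * tells whether variation i is in its control experience. The cell is admissible
     * if every pair of variations that are both in non-control experiences is conjoint.
     */
    static boolean admissible(List<String> variations, List<Boolean> isControl,
                              Set<String> conjointPairs) {
        for (int i = 0; i < variations.size(); i++) {
            for (int j = i + 1; j < variations.size(); j++) {
                boolean bothVariant = !isControl.get(i) && !isControl.get(j);
                if (bothVariant
                        && !conjointPairs.contains(pairKey(variations.get(i), variations.get(j)))) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Order-independent key, since concurrence (and conjointness) is symmetric. */
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }
}
```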

3.6.3 Conjoint Concurrency

The unconstrained concurrency mode, where a session’s ability to participate in Red variation is not constrained by its participation in Blue variation, and vice versa, is referred to as conjoint concurrency. To instrument two conjointly concurrent variations, the application developer has to do the following:

  • Implement all hybrid experiences, e.g. the two hybrid state variants shaded in two colors in Figure 6 above.
  • Tell Variant to treat the two variations as conjoint by using the conjointVariationRefs schema property, as we did in the petclinic schema in Listing 2.

Listing 4 below is the complete variation schema for the Blue, Red and Green variations from Figure 5 above. To illustrate both concurrency modes, the Red and Blue variations are defined as conjoint, and the Green and Red variations as disjoint.


{
  'meta':{
    'name':'Tricolor',
    'comment':'Schema for Red, Green, Blue variations on Figure 5'
  },

  'states':[{'name':'S1'}, {'name':'S2'}, {'name':'S3'}, {'name':'S4'}],

  'variations':[
    {
      'name':'Blue',
      'experiences':[
        {'name':'grey', 'isControl':true},
        {'name':'blue'}
      ],
      'onStates':[{'stateRef':'S1'}, {'stateRef':'S2'}]
    },
    {
      'name':'Red',
      'conjointVariationRefs':['Blue'], // Conjointly concurrent with Blue
      'experiences':[
        {'name':'grey', 'isControl':true},
        {'name':'red1'},
        {'name':'red2'}
      ],
      'onStates':[{'stateRef':'S2'}, {'stateRef':'S3'}]
    },
    {
      'name':'Green', // Serial with Blue and disjointly concurrent with Red
      'experiences':[
        {'name':'grey', 'isControl':true},
        {'name':'green'}
      ],
      'onStates':[
        {'stateRef':'S3'},
        {
          'stateRef':'S4',
          'variants':[
            {
              // Explicit phantom variant definition.
              'experienceRef': 'grey',
              'isPhantom': true
            }
          ]
        }
      ]
    }
  ]
}

Note the explicit state variant for the Green variation's control experience on state S4. It declares that state variant phantom, because there is no control implementation of S4: a user session that has been targeted to the control experience of the Green variation is not allowed to target for S4.

4 Variant Runtime

4.1 The Lifecycle of a State Request

As already explained, the Code Variation Model treats interactive applications as finite state machines. Each user session traverses some state graph, whose nodes are the interface states, where the host application pauses for user input. In the real world, state nodes may be traditional HTTP pages, Angular views, IVR menus, Android activities, etc.

Whenever the host application is about to present a particular interface state to a user session, it must determine whether that state exists in more than one variant (i.e. whether it is instrumented by any code variations) and, if so, which of these variants to present. Both tasks are accomplished by the Session.targetForState(state) method. It returns a StateRequest object, which may be further examined for the list of live experiences in all variations instrumenting this state.

Thus, a Variant session can be thought of as a succession of state requests, united into a single user experience. Since the session must be created before any state targeting can happen, we consider it first, in the next section, followed by a closer look at the state request.

4.1.1 Variant Session

To communicate with Variant server, the host application must connect to it and create a Variant session as follows:


// Arguments are environment-dependent
Session variantSession = variantConnection.getOrCreateSession(...);

The arguments to the getOrCreateSession() method are environment-dependent and are discussed in detail in the Java Client User Guide.

Variant sessions provide:

  • A way to identify a user across multiple state requests;
  • Storage for the session state that must be preserved between state requests;
  • Metadata isolation context.

Variant server acts as the centralized session repository, accessible to any Variant client by session ID. All clients sharing a session are guaranteed a consistent view of the session state. Variant hides any changes to the variation schema from active sessions, which continue to see the variation metadata as it was when they were created. This isolation guarantee is critical to protecting user experiences from fatal inconsistencies. For example, if a variation were taken offline, or one of its variant experiences dropped, sessions currently traversing that variation would be thrown out of their experiences if the change were visible to them.

Note that Variant sessions are completely separate from the host application's own native sessions. Variant sessions are configured independently and do not even require that the host application have a native notion of a session.

Variant server expires sessions after a certain configurable period of inactivity.
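To make the repository semantics above concrete, here is a minimal in-memory sketch of a session store offering lookup by session ID, per-session state, and expiration after a configurable period of inactivity. It is a hypothetical model, not Variant's implementation: the `SessionStore` class and its methods are illustrative, and the current time is passed in explicitly so the behavior is easy to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of a centralized session repository with inactivity expiration. */
class SessionStore {

  /** A stored session: its mutable state plus the time it was last accessed. */
  static final class StoredSession {
    final String id;
    final Map<String, String> state = new HashMap<>();
    long lastAccessMillis;

    StoredSession(String id, long nowMillis) {
      this.id = id;
      this.lastAccessMillis = nowMillis;
    }
  }

  private final Map<String, StoredSession> sessions = new HashMap<>();
  private final long timeoutMillis;

  SessionStore(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /** Returns the live session with the given ID, creating a new one if absent or expired. */
  StoredSession getOrCreate(String id, long nowMillis) {
    StoredSession s = get(id, nowMillis).orElse(null);
    if (s == null) {
      s = new StoredSession(id, nowMillis);
      sessions.put(id, s);
    }
    return s;
  }

  /** Returns the live session and refreshes its access time; an expired session is evicted. */
  Optional<StoredSession> get(String id, long nowMillis) {
    StoredSession s = sessions.get(id);
    if (s == null) {
      return Optional.empty();
    }
    if (nowMillis - s.lastAccessMillis > timeoutMillis) {
      sessions.remove(id); // Inactive for longer than the timeout: expire it.
      return Optional.empty();
    }
    s.lastAccessMillis = nowMillis;
    return Optional.of(s);
  }
}
```

With a 1,000 ms timeout, for instance, a session created at time 0 and not accessed again is evicted by a lookup at 1,500 ms, and the next getOrCreate() call starts a fresh session with empty state.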

4.1.2 State Request

Whenever the host application is about to serve a user session an interface state, potentially instrumented by one or more code variations, it consults Variant server for the targeting information by calling the Session.targetForState(state) method, which returns a StateRequest object.

Continuing with our Tricolor schema from Listing 4, this is how a Variant session gets targeted for state S2:


// Obtain the state from the variation schema.
State s2 = variantSession.getSchema().getState("S2").orElseThrow(
  () -> new RuntimeException("State S2 is not in schema!"));

// Target the current session for the state.
StateRequest variantStateRequest = variantSession.targetForState(s2);

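Much of the complexity that Variant server hides from the application developer is inside the targetForState(state) method. Indeed, for each variation instrumented on the given state, Variant server must first (re)establish the session's qualification for the variation and then, if the session is qualified, target it for one of the variation's experiences. These steps are discussed in Sections 4.2 and 4.3.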

The StateRequest object has methods the host application can call to find out which experience in a particular variation the session has been targeted for. For example, here is how to determine the session's experience in the Red variation and act on it:


// Obtain the variation from the variation schema.
Variation redVar = variantSession.getSchema().getVariation("Red").orElseThrow(
  () -> new RuntimeException("Variation Red is not in schema!"));

// Obtain the experience in variation Red to which this session is targeted.
Variation.Experience redVarExp = variantStateRequest.getLiveExperience(redVar).orElseThrow(
  () -> new RuntimeException("No live experience in variation Red!"));

if (redVarExp.equals(redVar.getExperience("grey").get())) {
  // Do control experience "grey"
}
else if (redVarExp.equals(redVar.getExperience("red1").get())) {
  // Do experience "red1"
}
else if (redVarExp.equals(redVar.getExperience("red2").get())) {
  // Do experience "red2"
}
else {
  throw new RuntimeException("Don't know what to do for experience " + redVarExp);
}

4.2 Session Qualification

4.2.1 How Variant Qualifies Sessions

Qualification is a distinct idea from targeting: before a session can be targeted for an experience in a variation, it must be qualified for that variation. Suppose, for example, that a newspaper wants to test promotional rates offered on its website. This promotion cannot be combined with any other promotion, so traffic coming from other promotional offers must be disqualified from the experiment.

Whenever, inside the Session.targetForState(state) method, Variant determines that the calling session's qualification for a particular variation must be (re)established, it raises the VariationQualificationLifecycleEvent lifecycle event, which posts the chain of eligible lifecycle hooks. All eligible lifecycle hooks are posted serially. If none were defined or none returned a usable result, the default built-in qualification hook is posted, which blindly qualifies all sessions for all variations.
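The qualification protocol described above (hooks posted serially, with a built-in fallback that qualifies every session) can be sketched as follows. This is a simplified, hypothetical model rather than Variant's ExtAPI: the `QualificationHook` interface, the `PromoSourceHook` and `Qualifier` classes, and the `"promo"` session attribute are illustrative, and a hook signals "no usable result" by returning `null`. The example hook implements the newspaper scenario above, disqualifying traffic that arrived from a different promotion.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical qualification hook: TRUE qualifies, FALSE disqualifies, null means "no opinion". */
interface QualificationHook {
  Boolean post(Map<String, String> sessionAttributes);
}

/** Disqualifies sessions that arrived via a promotion other than the one under test. */
class PromoSourceHook implements QualificationHook {
  private final String promoUnderTest;

  PromoSourceHook(String promoUnderTest) {
    this.promoUnderTest = promoUnderTest;
  }

  @Override
  public Boolean post(Map<String, String> attrs) {
    String source = attrs.get("promo");
    if (source == null) {
      return null;                         // No promotion info: defer to the next hook.
    }
    return source.equals(promoUnderTest);  // Traffic from other promotions is disqualified.
  }
}

class Qualifier {
  /** Posts the hooks serially; if none returns a result, qualifies the session by default. */
  static boolean qualify(List<QualificationHook> chain, Map<String, String> attrs) {
    for (QualificationHook hook : chain) {
      Boolean result = hook.post(attrs);
      if (result != null) {
        return result;
      }
    }
    return true; // Models the built-in hook, which qualifies all sessions.
  }
}
```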

For more information on lifecycle hooks, refer to Section 5.1.

If the session is qualified for a variation, Variant proceeds to the targeting step, discussed in Section 4.3. If the session is disqualified, the following semantics apply:

4.2.2 Qualification Longevity

Once a session has been (dis)qualified for a variation, the natural question is how long this qualification decision should remain in effect and when it should be re-evaluated. Variant supports three longevity levels: unstable, stable and durable, which correspond to the three temporal scopes: request, session and variation.

Unstable qualification means that the session is re-qualified on each state request. In other words, each time the host application calls Session.targetForState(), the session is re-qualified for all variations instrumented on that state.

Stable qualification means that the session is qualified only once for each code variation it traverses, and continues to reuse this qualification decision until the session expires. This is the default qualification longevity.

Finally, durable qualification implies that once a recognized user is (dis)qualified for a particular variation, he will continue to be (dis)qualified for it for as long as this variation is defined in the schema.
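One way to picture the three longevity levels is as decisions cached for three different lifetimes: not at all (request), per session, or per recognized user. The sketch below is a hypothetical model, not Variant's implementation; the `Longevity` enum, the `DecisionCache` class, and the `decide` supplier (standing in for an actual qualification or targeting decision) are illustrative. Durable decisions are keyed by user ID, which is why durable longevity requires the host application to identify the user.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** The three longevity levels, annotated with their temporal scopes. */
enum Longevity { UNSTABLE /* request */, STABLE /* session */, DURABLE /* variation */ }

/** Hypothetical model: reuses a per-variation decision for the scope implied by its longevity. */
class DecisionCache<T> {
  private final Map<String, T> bySession = new HashMap<>(); // Key: sessionId + variation.
  private final Map<String, T> byUser = new HashMap<>();    // Key: userId + variation.

  T resolve(Longevity longevity, String variation, String sessionId, String userId,
            Supplier<T> decide) {
    switch (longevity) {
      case UNSTABLE:
        return decide.get(); // Re-evaluated on every state request.
      case STABLE:
        return bySession.computeIfAbsent(sessionId + "/" + variation, k -> decide.get());
      case DURABLE:
        // This sketch rejects durable lookups without a user ID; Variant's behavior may differ.
        if (userId == null) {
          throw new IllegalStateException("Durable longevity requires an identified user");
        }
        return byUser.computeIfAbsent(userId + "/" + variation, k -> decide.get());
      default:
        throw new AssertionError(longevity);
    }
  }
}
```

The same model applies to targeting longevity, discussed in Section 4.3.2, since both follow the same rules.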

Qualification longevity is defined in the variation schema on the per-variation basis as follows:


{
  'meta':{
    'name':'Tricolor',
    'comment':'The revised tri-color schema with qualification longevity'
  },
  'states':[...],
  'variations':[
    {
      'name':'Blue',
      'qualification':'unstable', // Or 'stable' or 'durable'
      ...
    },
    ...
  ]
}

Unstable and stable qualification are provided by Variant automatically: all you have to do is declare the desired longevity in the schema. (If you don't declare any, Variant defaults to stable qualification.) Durable qualification, however, requires that the host application identify each session with a unique key, such as a user ID, by which the user can be recognized across sessions:


// Get or create session.
Session variantSession = variantConnection.getOrCreateSession(...);
// Let Variant know the user's unique ID.
variantSession.identify(userId);

It is okay to call identify() multiple times with the same user ID.

4.3 Session Targeting

4.3.1 How Variant Targets Sessions

When a qualified session first comes in contact with a variation, Variant must assign it to some experience in that variation, i.e., target the session for that variation. Even in the serial case, when the requested state is instrumented by only one variation, the targeting decision is not trivial and involves, among others, the following considerations:

Is there an existing targeting that must be honored? As discussed in the next section, targeting decisions are subject to the same longevity rules as qualification decisions (Section 4.2.2). Consequently, even if this session is just coming in contact with a variation, it may already carry targeting information that must be honored.

Is this state phantom in any of the experiences? If so, these experiences must be excluded from the set of possible targets.

The complexity of the targeting decision grows dramatically for concurrent variations, where a session must be targeted to multiple variations at once.

Whenever, inside the targetForState(state) method, Variant determines that the calling session must be targeted for a particular variation, it raises the VariationTargetingLifecycleEvent lifecycle event, which posts the chain of eligible lifecycle hooks. All eligible lifecycle hooks are posted serially. If none were defined or none returned a usable result, the default built-in targeting hook is posted, which targets randomly, according to the weights provided in the schema, as explained in Listing 1.
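Random targeting by weight, combined with the phantom-state consideration above, amounts to a weighted random draw over the experiences that are not phantom on the requested state. The sketch below illustrates that idea under stated assumptions: the `WeightedTargeter` and `Experience` classes are hypothetical, weights are plain doubles, and the random source is injected for testability. It is not the code of Variant's built-in targeting hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class WeightedTargeter {

  /** Hypothetical experience descriptor: name, schema weight, and whether it is phantom on the state. */
  static final class Experience {
    final String name;
    final double weight;
    final boolean phantomOnState;

    Experience(String name, double weight, boolean phantomOnState) {
      this.name = name;
      this.weight = weight;
      this.phantomOnState = phantomOnState;
    }
  }

  /** Picks a non-phantom experience with probability proportional to its weight. */
  static Experience target(List<Experience> experiences, Random random) {
    List<Experience> eligible = new ArrayList<>();
    double total = 0;
    for (Experience e : experiences) {
      // Phantom experiences (and, in this sketch, zero-weight ones) are never targeted.
      if (!e.phantomOnState && e.weight > 0) {
        eligible.add(e);
        total += e.weight;
      }
    }
    if (eligible.isEmpty()) {
      throw new IllegalStateException("No eligible experience for this state");
    }
    double point = random.nextDouble() * total; // Uniform point in [0, total).
    for (Experience e : eligible) {
      point -= e.weight;
      if (point < 0) {
        return e;
      }
    }
    return eligible.get(eligible.size() - 1);   // Guards against floating-point rounding.
  }
}
```

In the Tricolor schema, for instance, the Green variation's grey experience is phantom on S4, so a session first targeted for Green on S4 can only land in green.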

For more information on lifecycle hooks, refer to Section 3.6.

4.3.2 Targeting Longevity

The longevity of a targeting assignment is subject to the same rules as that of qualification, already considered in Section 4.2.2. You can specify one of three longevity levels: unstable, stable and durable, which correspond to the three temporal scopes: request, session and variation.

Unstable targeting means that the session is re-targeted on each state request. In other words, each time the host application calls Session.targetForState(), all pre-existing targeting information is discarded. Although unstable targeting makes sense in some cases, use it with care: if your variation is instrumented on more than one state, your users may see different experiences within a single session.

Stable targeting means that the session is targeted exactly once, when it first requests a state instrumented by a particular variation, and continues to reuse this targeting until the session expires. This is the default targeting longevity. It guarantees a consistent experience within each user session, but a returning user may see a different experience.

Finally, durable targeting means that once a user is targeted for a particular experience, the targeting persists across sessions, and the returning user will see the same experience for as long as the variation is defined in the schema. Note, however, that this guarantee is subject to certain conditions, described in the next section.

Targeting longevity is defined in the variation schema on the per-variation basis as follows:


{
  'meta':{
    'name':'Tricolor',
    'comment':'The revised tri-color schema with targeting longevity'
  },
  'states':[...],
  'variations':[
    {
      'name':'Blue',
      'targeting':'unstable', // Or 'stable' or 'durable'
      ...
    },
    ...
  ]
}

Unstable and stable targeting are provided by Variant automatically: all you have to do is declare the desired longevity in the schema. (If you don't declare any, Variant defaults to stable targeting.) Durable targeting, however, requires that the host application identify each session with a unique key, such as a user ID, by which the user can be recognized across sessions:


// Get or create session.
Session variantSession = variantConnection.getOrCreateSession(...);
// Let Variant know the user's unique ID.
variantSession.identify(userId);

It is okay to call identify() multiple times with the same user ID.

In most cases stable targeting is sufficient. However, some use cases require that targeting information be preserved across sessions as well, so that a returning user sees the same experience. Consider, for example, an IVR call center application: in most cases it is okay if a user calling today is greeted with a different menu than on his previous call yesterday. However, if your user is filling out an online loan application, his experience cannot change if he switches from his home laptop to his work computer.

4.3.3 Metadata Modifications

(Un)stable targeting is guaranteed by Variant unconditionally, because a Variant session is isolated from any schema changes. However, because the variation schema may have changed between a user's two consecutive sessions, durable targeting cannot be guaranteed unconditionally. Consider the following scenario:

  1. Your schema contains the variation V1, defined with durable targeting;
  2. Some user has traversed V1 in the past and has been targeted to a variant experience;
  3. You have just added a new variation V2, disjointly concurrent with V1;
  4. The same user visits again and hits V2 first, where he is targeted to a variant experience;
  5. The user then hits V1. Variant cannot honor the existing durable targeting for V1, because V1 is not conjointly concurrent with V2.

When cases like this arise, Variant discards the least recently used targeting information (in this example, the V1 experience) and re-targets the session (in this example, to the control experience).
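The conflict-resolution rule above can be sketched as follows. This is a simplified, hypothetical model, not Variant's implementation: each durable assignment carries a last-used timestamp, the `combinable` predicate stands in for the schema's concurrency rules (in this sketch it returns `true` when a user may hold variant experiences in both variations at once, e.g. conjointly concurrent ones), and the losing variation is re-targeted to its control experience, as in the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical model of one user's durable targeting assignments. */
class DurableTargeting {

  static final class Assignment {
    final String experience;
    final boolean isControl;
    final long lastUsed; // When this assignment was last honored.

    Assignment(String experience, boolean isControl, long lastUsed) {
      this.experience = experience;
      this.isControl = isControl;
      this.lastUsed = lastUsed;
    }
  }

  /** Variation name -> durable assignment. */
  final Map<String, Assignment> byVariation = new HashMap<>();

  /**
   * If variations a and b both hold variant (non-control) assignments but may not be
   * combined, discards the least recently used assignment and re-targets that variation
   * to its control experience.
   */
  void reconcile(String a, String b, BiPredicate<String, String> combinable,
                 Map<String, String> controlOf, long now) {
    Assignment ra = byVariation.get(a);
    Assignment rb = byVariation.get(b);
    if (ra == null || rb == null || ra.isControl || rb.isControl || combinable.test(a, b)) {
      return; // No conflict: both assignments can be honored.
    }
    String lru = ra.lastUsed <= rb.lastUsed ? a : b;
    byVariation.put(lru, new Assignment(controlOf.get(lru), true, now));
  }
}
```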

4.4 Schema Management

When Variant server starts, it looks for variation schema files in the schemata directory and attempts to deploy them sequentially. A schema file must contain exactly one uniquely named Variant schema. There is no requirement that the schema file name match that of the schema it contains, but it is recommended that you name each schema file similarly to the schema therein.

For each schema file in the schemata directory Variant server takes these steps:

  1. Parse the schema file.
    Any messages emitted by the parser are written to the server log file.
  2. Deploy if no parse errors.
    If any parser errors were encountered, Variant server skips this schema file. Otherwise, provided no currently deployed schema has the same name, Variant deploys this schema.

To (re)deploy a variation schema on a running Variant server, simply place the (updated) schema file in the schemata directory. A running server detects the new (or updated) file and attempts to deploy the schema from it by following these steps:

  1. Parse the schema.
    Any messages emitted by the parser are written to the server log file.
  2. Deploy if no parse errors.
    If any parser errors were encountered, Variant server skips this schema file. Otherwise, if no parser errors, Variant will attempt to deploy this schema, subject to the following conditions:

    1. If no currently deployed schema has the same name as this schema, this schema is deployed.
    2. If a currently deployed schema has the same name as this schema, their respective file names must also be the same. If they are, the currently deployed schema is undeployed and the new one is deployed in its place.

To undeploy a currently deployed schema, simply remove the corresponding schema file.
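The (re)deployment rules above reduce to a small decision function. The sketch below is hypothetical (the `SchemaDeployer` class, its `Outcome` enum, and the file names in the usage example are illustrative, not part of Variant's API): it keeps a map from deployed schema names to the files they came from and decides what happens when a file in the schemata directory yields a schema.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the schemata-directory deployment rules. */
class SchemaDeployer {

  enum Outcome { SKIPPED_PARSE_ERRORS, DEPLOYED, REDEPLOYED, REJECTED_NAME_CLASH }

  /** Deployed schema name -> the file it was deployed from. */
  final Map<String, String> deployed = new HashMap<>();

  Outcome onFile(String fileName, String schemaName, boolean hasParseErrors) {
    if (hasParseErrors) {
      return Outcome.SKIPPED_PARSE_ERRORS;  // Parser messages go to the server log.
    }
    String existingFile = deployed.get(schemaName);
    if (existingFile == null) {
      deployed.put(schemaName, fileName);   // No deployed schema has this name.
      return Outcome.DEPLOYED;
    }
    if (existingFile.equals(fileName)) {
      return Outcome.REDEPLOYED;            // Same name, same file: replace the old one.
    }
    return Outcome.REJECTED_NAME_CLASH;     // Same name, different file: not deployed.
  }

  /** Removing a schema file undeploys the schema it contained. */
  void onFileRemoved(String fileName) {
    deployed.values().remove(fileName);
  }
}
```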

Whenever a schema is undeployed, Variant server keeps it around while the sessions connected to it drain, i.e., until the last active session connected to that schema generation naturally expires. All new sessions are created against the current generation. Session draining isolates live sessions from schema updates, which is instrumental to Variant's ability to provide qualification and targeting stability. In practice, this means that you can, for instance, turn a feature toggle off at any time without worrying about the user sessions that are already in the experience.
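Session draining can be modeled as reference counting on schema generations. In the hypothetical sketch below (not Variant's implementation; `GenerationRegistry` and its methods are illustrative), each new session is pinned to the current generation for its lifetime, and a replaced generation is retained only until its last session expires.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of schema generations drained by live-session count. */
class GenerationRegistry {
  private int current = 0;                                      // ID of the current generation.
  private final Map<Integer, Integer> sessionCounts = new HashMap<>();

  GenerationRegistry() {
    sessionCounts.put(current, 0);
  }

  /** Deploys a new generation; the old one is kept only while it still has sessions. */
  void redeploy() {
    int old = current;
    current++;
    sessionCounts.put(current, 0);
    if (sessionCounts.get(old) == 0) {
      sessionCounts.remove(old);
    }
  }

  /** New sessions always see the current generation and keep it until they expire. */
  int createSession() {
    sessionCounts.merge(current, 1, Integer::sum);
    return current;
  }

  /** When a session expires, a drained, non-current generation is discarded. */
  void expireSession(int generation) {
    int left = sessionCounts.merge(generation, -1, Integer::sum);
    if (left == 0 && generation != current) {
      sessionCounts.remove(generation);
    }
  }

  boolean isRetained(int generation) {
    return sessionCounts.containsKey(generation);
  }
}
```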

4.5 Trace Event Logging

Variant trace events are the elementary data points generated by user traffic as it flows through Variant variations, intended for subsequent analysis by a downstream process. Trace events can be triggered implicitly, by Variant, or explicitly, by the host application. In either case, the host application can attach attributes to these events to aid in the downstream analysis.

The only implicit trace event is the state visited event. It is created, but not yet triggered, at the start of the state request (Figure 3). This gives the host application a chance to attach custom attributes to the event. For example, if the host application caught an exception, it may wish to set the status of the event to error and add the name of the class that threw the exception. This information can be used downstream to exclude this session from the statistical analysis (if this is an experiment), or to shut off the variation (if this is a feature toggle).

Custom events can be triggered by calling an appropriate client API method, e.g. Session.triggerTraceEvent() in the Java client.

Trace events are made available to the downstream analysis via Trace Event Flushers which are part of the Extension API, discussed next.

5 Extending Variant Server

Variant AIM Server’s default behavior can be extended via the server-side Extension API. It supports creation and configuration of user code which runs in the server’s address space, augmenting the server's default behavior with custom semantics. ExtAPI exposes two principal extension mechanisms: lifecycle hooks and trace event flushers. They are configured in the variation schema and made available to the server at run time via the /ext directory.

See Variant AIM Server Reference for complete details.

5.1 Lifecycle Hooks

The ScheduleVisitTest from Listing 2 above defined a lifecycle hook class UserQualifyingHook, which disqualifies black-listed users from the experiment. Below is the relevant section from Listing 2:


...
      'hooks': [
        {
          // Disqualify blacklisted users.
          'class':'com.variant.extapi.std.demo.UserQualifyingHook',
          'init': {'blackList':['Nikita Krushchev']}
        } 
      ]
...

Lifecycle event hooks are callback methods, executed by Variant server when the corresponding lifecycle events are raised. For example, when a user session must be qualified or targeted for a particular variation, two corresponding lifecycle events are raised: the session qualification event and the session targeting event. If you have defined custom hooks for these events, Variant will post them by calling their post() method.

Lifecycle hooks provide a way to extend Variant server’s default behavior with application-specific semantics. They are executed in the server process’s address space and are highly reusable modules that encapsulate common semantics outside of the host application's code base, where they are easily accessible to different variations.

Depending on where a hook is defined in the schema, it may have the global (meta) scope, a state scope or a variation scope. Global hooks are defined in the meta section and apply to all states and all variations in the schema. A state-scoped hook applies only to the state with which it is defined, and a variation-scoped hook applies only to the variation with which it is defined.

For more information, refer to the Variant Server Reference.

Lifecycle hooks are listeners for various lifecycle events raised by the server, such as VariationQualificationLifecycleEvent or VariationTargetingLifecycleEvent.

Hooks are configured in the variation schema and have different scopes, depending on where in the schema they are defined. Hooks defined inside the meta section have global scope and apply to all the states and all the variations in the schema. Hooks defined with a particular state apply only to that state, and hooks defined with a particular variation apply only to that variation.

In any scope, any number of hooks can be defined. If more than one lifecycle hook is eligible to be posted by a lifecycle event at runtime, they form a hook chain. More locally defined hooks are posted before the global ones, and within a scope, hooks are posted in ordinal order. The hooks are posted serially until a hook's post() method returns a non-null value. If no custom hooks have been defined for a lifecycle event, or all of them returned null, the default built-in hook for the event is posted, which is guaranteed to return a non-null value.
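The chain-building and posting rules above can be sketched as follows. This is a hypothetical model, not the ExtAPI: hooks are plain suppliers returning `null` for "no result", each carries a flag saying whether it is global and its ordinal within its scope, and the built-in default is passed in separately. Because the text does not specify the relative order of state-scoped and variation-scoped hooks, this sketch collapses them into a single local group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical model of a lifecycle hook chain. */
class HookChain {

  static final class Hook<R> {
    final boolean global;     // Defined in the meta section?
    final int ordinal;        // Position of the definition within its scope.
    final Supplier<R> post;   // Returns null when the hook has no result.

    Hook(boolean global, int ordinal, Supplier<R> post) {
      this.global = global;
      this.ordinal = ordinal;
      this.post = post;
    }
  }

  /** Posts local hooks before global ones, each group in ordinal order, until one returns non-null. */
  static <R> R post(List<Hook<R>> eligible, Supplier<R> builtInDefault) {
    List<Hook<R>> chain = new ArrayList<>(eligible);
    // Boolean.FALSE sorts before Boolean.TRUE, so local hooks come first.
    chain.sort(Comparator.comparing((Hook<R> h) -> h.global).thenComparingInt(h -> h.ordinal));
    for (Hook<R> hook : chain) {
      R result = hook.post.get();
      if (result != null) {
        return result;
      }
    }
    return builtInDefault.get(); // The built-in hook always produces a result.
  }
}
```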

5.2 Trace Event Flushers

Event flushers handle the terminal ingestion of Variant trace events. A typical event flusher writes them to a persistent storage mechanism, such as an external database. Whenever a Variant event is triggered (implicitly by Variant server or explicitly by user code), it is picked up by the asynchronous event writer and held in a memory buffer until a dedicated flusher thread wakes up and turns the buffered events over to the event flusher by calling its flush() method.

The size of the trace event buffer is configured by the variant.event.writer.buffer.size server config key. The larger the buffer, the better the event writer will cope with bursty inputs, but at the price of additional memory footprint. Whenever the event writer is not keeping up with the event load, it will discard new events (with an error message to the server log) until there's again room for them in the event buffer.
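The buffering behavior described above (a bounded buffer that drops new events when full and is periodically drained by a flusher thread) can be sketched with standard `java.util.concurrent` classes. This is a hypothetical model, not Variant's event writer: the `EventBuffer` class is illustrative, events are plain strings, and the `Consumer<List<String>>` stands in for a real event flusher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical model of an asynchronous trace event writer with a bounded buffer. */
class EventBuffer {
  private final BlockingQueue<String> buffer;
  private final Consumer<List<String>> flusher;
  private final AtomicLong discarded = new AtomicLong();

  /** @param capacity plays the role of variant.event.writer.buffer.size */
  EventBuffer(int capacity, Consumer<List<String>> flusher) {
    this.buffer = new ArrayBlockingQueue<>(capacity);
    this.flusher = flusher;
  }

  /** Called whenever an event is triggered; never blocks the caller. */
  void write(String event) {
    if (!buffer.offer(event)) {
      // Buffer full: the new event is dropped; stderr stands in for the server log.
      discarded.incrementAndGet();
      System.err.println("Trace event buffer full; discarding event " + event);
    }
  }

  /** Called periodically by the flusher thread: hands all buffered events to the flusher. */
  void flushOnce() {
    List<String> batch = new ArrayList<>();
    buffer.drainTo(batch);
    if (!batch.isEmpty()) {
      flusher.accept(batch);
    }
  }

  long discardedCount() {
    return discarded.get();
  }
}
```

In a running server, flushOnce() would be invoked from a background thread, for example one scheduled with `ScheduledExecutorService.scheduleAtFixedRate`.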

A few ready-made event flushers, intended for saving trace events in popular databases, such as PostgreSQL and MySQL, are included in Variant server's standard extension. These can be configured and used directly.

Most likely, however, you will want to create your own, suited to your operating environment. Creating a custom event flusher is a matter of implementing the TraceEventFlusher interface.

See Variant Server Reference Guide for more information.

Appendix A Analyzing Variant Controlled Experiments

A.1 Trace Event Data Aggregation

Each Variant experiment is designed with particular target metric(s) in mind. But regardless of the target metric(s), the starting point is always a time series of trace events, such as state visited events, which must be aggregated into a time series of measurements, such as revenue as a function of the number of users through the experiment. The details of this aggregation step depend entirely on the persistence mechanism you've chosen for your trace events. If your flusher inserts them into a relational database, you will likely use SQL. A distributed data processing framework, like Apache Hadoop, can also be successfully deployed for the persistence and aggregation of Variant trace events.
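As an illustration of the aggregation step, the sketch below counts, for each experience, the distinct sessions that visited a given state. It is a hypothetical example operating on in-memory records: the `VisitAggregator` class and the `TraceRecord` fields are illustrative and do not reflect the schema of any particular Variant flusher. With a relational flusher, the same aggregation would typically be a `GROUP BY` query.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class VisitAggregator {

  /** Hypothetical flattened trace event: one state visit by one session. */
  static final class TraceRecord {
    final String sessionId;
    final String state;
    final String experience;

    TraceRecord(String sessionId, String state, String experience) {
      this.sessionId = sessionId;
      this.state = state;
      this.experience = experience;
    }
  }

  /** Returns experience -> number of distinct sessions that visited the given state. */
  static Map<String, Integer> uniqueVisitors(List<TraceRecord> events, String state) {
    Map<String, Set<String>> sessions = new HashMap<>();
    for (TraceRecord e : events) {
      if (e.state.equals(state)) {
        sessions.computeIfAbsent(e.experience, k -> new HashSet<>()).add(e.sessionId);
      }
    }
    Map<String, Integer> counts = new HashMap<>();
    sessions.forEach((experience, ids) -> counts.put(experience, ids.size()));
    return counts;
  }
}
```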

A.2 Statistical Analysis

The goal of an experiment is to:

  • Discover if there is a difference between control and variant experience(s) with respect to the target metric of interest;
  • Assess how certain we can be that this difference is not just random noise.

The latter can be accomplished with some well-known mathematical formulas developed in the field of statistical hypothesis testing. The fundamental idea there is to develop a procedure that will enable the researcher to make a claim about the entire population with a given degree of certainty, based on a set of sample observations. Refer to the Statistical Analysis of Variant Experiments white paper for more information.
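For a conversion-rate metric, one standard procedure from hypothesis testing is the two-proportion z-test, which compares the control and variant conversion rates using a pooled standard error. The sketch below is illustrative and is not taken from the white paper referenced above; the `ProportionZTest` class is hypothetical. It computes the z statistic, which is then compared against a critical value (about 1.96 for a two-sided test at the 5% significance level).

```java
/** Two-proportion z-test with a pooled standard error (illustrative sketch). */
final class ProportionZTest {

  private ProportionZTest() {}

  /**
   * @param convControl conversions in the control experience
   * @param nControl    sessions in the control experience (must be positive)
   * @param convVariant conversions in the variant experience
   * @param nVariant    sessions in the variant experience (must be positive)
   * @return z statistic; positive when the variant converts better than control
   */
  static double zStatistic(long convControl, long nControl, long convVariant, long nVariant) {
    double pControl = (double) convControl / nControl;
    double pVariant = (double) convVariant / nVariant;
    // Pooled conversion rate under the null hypothesis of no difference.
    double pooled = (double) (convControl + convVariant) / (nControl + nVariant);
    double se = Math.sqrt(pooled * (1 - pooled) * (1.0 / nControl + 1.0 / nVariant));
    return (pVariant - pControl) / se;
  }

  /** Two-sided test at the 5% significance level. */
  static boolean significantAt5Percent(double z) {
    return Math.abs(z) > 1.959964;
  }
}
```

For example, 100 conversions out of 1,000 control sessions versus 120 out of 1,000 variant sessions give z of roughly 1.43, which is not significant at the 5% level; the normal approximation behind this test also assumes reasonably large samples in both groups.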