How to Enable Khepri
As of RabbitMQ 4.0, Mnesia is still the default metadata store backend. Khepri has to be explicitly enabled using the khepri_db feature flag.
This page demonstrates how to enable Khepri in various situations and what to be aware of when doing so.
While Khepri is fully supported in RabbitMQ 4.0.x, it does not have the 17 years of extensive use that Mnesia has.
We encourage all RabbitMQ users to test Khepri thoroughly before adopting it in production.
It will be possible to upgrade from 4.0.x to future releases with Khepri enabled.
Terminology
The feature flags subsystem uses the words stable and experimental to qualify the maturity of feature flags.
An experimental feature flag is used in two situations:
- To introduce changes and get feedback early during development. These changes could be reverted, upgrading a RabbitMQ node with such a feature flag enabled may not be possible, and support may not be provided.
- For features the RabbitMQ team has committed to and provides support for, until they are ready to be enabled by default, possibly replacing an older system.
Khepri in RabbitMQ 3.13.x was in the first group. Rest assured that Khepri in RabbitMQ 4.0.0 and onward is in the second group and is therefore fully supported.
On a Brand New RabbitMQ Node
Using the CLI
- Start the new RabbitMQ node using a method of your choice. The example below executes the rabbitmq-server(8) command directly:
  bash:
  rabbitmq-server
  PowerShell:
  rabbitmq-server.bat
  At that point, the node is using Mnesia as the metadata store backend.
- Enable the khepri_db feature flag:
  bash:
  # Opt-in to enable Khepri
  rabbitmqctl enable_feature_flag --experimental khepri_db
  PowerShell:
  # Opt-in to enable Khepri
  rabbitmqctl.bat enable_feature_flag --experimental khepri_db
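To double-check the result from a script, the following minimal bash sketch filters the output of `rabbitmqctl list_feature_flags name state`. It assumes the command prints one feature flag per line with the requested columns separated by whitespace (a header line, if present, is simply ignored). The `khepri_enabled` function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exits with status 0 only if stdin contains a row whose
# first column is khepri_db and whose second column is "enabled".
# Expected input: the output of `rabbitmqctl -q list_feature_flags name state`.
khepri_enabled() {
  awk '$1 == "khepri_db" && $2 == "enabled" { found = 1 } END { exit !found }'
}

# Usage against a running node:
#   rabbitmqctl -q list_feature_flags name state | khepri_enabled && echo "Khepri is enabled"
```

Matching on the column values rather than on the whole line keeps the check independent of whether a header row is printed.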
See the next page to learn more about what happens when nodes with Mnesia and nodes with Khepri are clustered together.
Using the Management UI
- Start the new RabbitMQ node using a method of your choice, as in the CLI example above.
  At that point, the node is using Mnesia as the metadata store backend.
- Enable the management plugin:
  bash:
  rabbitmq-plugins enable rabbitmq_management
  PowerShell:
  rabbitmq-plugins.bat enable rabbitmq_management
- Open the management UI and log in.
- Navigate to "Admin > Feature Flags".
- Tick "I understand the risk" and click the "Enable" button.
Using an Environment Variable
Use the $RABBITMQ_FEATURE_FLAGS environment variable to set the list of feature flags to enable at boot time on a new node.
This variable requires caution: because it takes an exhaustive list, all feature flags that must be enabled in a given cluster must be listed, not only khepri_db.
::: important
This variable is considered on the very first boot only; it is ignored afterwards.
:::
Start the new RabbitMQ node using a method of your choice, setting the $RABBITMQ_FEATURE_FLAGS variable in the process. The example below executes the rabbitmq-server(8) command directly:
bash:
env RABBITMQ_FEATURE_FLAGS="khepri_db,..." rabbitmq-server
PowerShell:
$Env:RABBITMQ_FEATURE_FLAGS = 'khepri_db,...'
rabbitmq-server.bat
For brevity, this example does not list the other feature flags: you need to complete the list yourself.
The RabbitMQ node will use Khepri right from the beginning.
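Because the list must be exhaustive, it can help to generate it rather than type it. The following bash sketch, built around a hypothetical `build_feature_flags` function, reads feature flag names (one per line) and prints a comma-separated value that always includes khepri_db. Where the names come from is up to you: the usage comment assumes a reference node running the same RabbitMQ version with all required feature flags enabled, and `flags.txt` is a hypothetical file name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read feature flag names on stdin (one per line, blank
# lines ignored) and print them comma-separated, always including khepri_db.
build_feature_flags() {
  # sort -u is only used to drop duplicate names (e.g. khepri_db listed twice).
  { cat; echo khepri_db; } | awk 'NF { print $1 }' | LC_ALL=C sort -u | paste -sd, -
}

# Usage (assumes a reference node of the same version with the required flags enabled):
#   rabbitmqctl -q list_feature_flags name state | awk '$2 == "enabled" { print $1 }' > flags.txt
#   env RABBITMQ_FEATURE_FLAGS="$(build_feature_flags < flags.txt)" rabbitmq-server
```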
On an Existing Standalone Node or Cluster
Like any other feature flag, Khepri can be enabled only when all cluster nodes are online and the cluster is healthy.
::: important
Khepri cannot be enabled while a node or the entire cluster is stopped.
:::
To enable Khepri, use either the CLI or the management UI method described above.
The migration of the existing data from Mnesia to Khepri runs in parallel with the regular activities of RabbitMQ. However, the migration consumes resources and pauses other activities for a short period near the end of the process. Therefore, perform the migration away from peak load.
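Because every node must be online before the flag can be enabled, a script can wait for the expected number of nodes first. The bash sketch below assumes that `rabbitmqctl await_online_nodes <count>` is available in your version and returns a non-zero status if it does not succeed. It only checks how many nodes are online, not the overall health of the cluster, so it complements rather than replaces your usual health checks. The `enable_khepri_when_online` function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: enable khepri_db only once the expected number of
# cluster nodes is online. Run it away from peak load.
enable_khepri_when_online() {
  local expected_nodes="$1"
  if ! rabbitmqctl await_online_nodes "$expected_nodes"; then
    echo "await_online_nodes $expected_nodes failed; not enabling Khepri" >&2
    return 1
  fi
  # Same command as in the CLI method above.
  rabbitmqctl enable_feature_flag --experimental khepri_db
}

# Usage for a three-node cluster:
#   enable_khepri_when_online 3
```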
What Happens When Khepri is Enabled?
The migration from Mnesia to Khepri is the responsibility of the khepri_mnesia_migration library.
This library performs the migration in two phases:
- It synchronizes the cluster membership from Mnesia to Khepri.
- It copies records from Mnesia tables to the Khepri store.
Step One: Cluster Membership Synchronization
In the common case, Khepri is enabled in a Mnesia-based cluster, so from Khepri's point of view, every node involved is a single, isolated node.
To be extra safe and avoid losing data in case some nodes were already clustered at the Khepri level too, khepri_mnesia_migration uses several criteria to make sure the resulting Khepri cluster is deterministic. Here are the steps it goes through:
- It queries the list of members of the Mnesia cluster. This is the baseline list of nodes we want to cluster in Khepri too.
- It queries each node for the members of its Khepri cluster. Usually, Khepri is not clustered yet, so each node just returns itself.
- It sorts the list of Khepri "clusters" according to the following criteria:
  - the cluster size (i.e. the number of members)
  - the number of records in the Khepri store
  - the node uptime
  - the node name
  Therefore, if some nodes were already clustered at the Khepri level, the largest cluster (set of nodes) comes first. Usually, though, nodes are unclustered and are thus sorted by node uptime and name.
- It selects the largest Khepri "cluster" according to the criteria above and adds all other nodes to it.
- If some nodes were clustered at the Khepri level but were not members of the Mnesia cluster, they are removed from Khepri.
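To make the selection and removal steps concrete, here is a minimal bash sketch. It is not the code of khepri_mnesia_migration, which is an Erlang library. The tab-separated input format, the function names `pick_seed_cluster` and `nodes_to_remove`, and the direction of each tie-breaker (more records first, longer uptime first, then node names in ascending order) are assumptions for illustration; the text above only confirms that the largest cluster comes first.

```bash
#!/usr/bin/env bash
# Illustrative sketch of the membership sync, NOT the library's implementation.
# Assumed tie-breakers: more records first, longer uptime first, then the
# node name in ascending byte order.

tab="$(printf '\t')"

# stdin: one line per Khepri cluster (one representative node), tab-separated:
#   <cluster_size> <record_count> <uptime_seconds> <node_name>
# stdout: the node whose cluster every other node will join.
pick_seed_cluster() {
  LC_ALL=C sort -t "$tab" -k1,1nr -k2,2nr -k3,3nr -k4,4 | head -n 1 | cut -f4
}

# $1: file listing the Mnesia cluster members, one per line.
# $2: file listing the Khepri cluster members, one per line.
# stdout: Khepri members unknown to Mnesia, which are removed from Khepri.
nodes_to_remove() {
  grep -vxF -f "$1" "$2"
}

# Usage: three unclustered nodes with the same number of records.
printf '1\t120\t3600\trabbit@node-b\n1\t120\t7200\trabbit@node-c\n1\t120\t7200\trabbit@node-a\n' | pick_seed_cluster
# prints rabbit@node-a (longest uptime, then the smallest name)
```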
Step Two: Schema Records Copy
Once the cluster membership view is the same in Mnesia and Khepri, khepri_mnesia_migration can proceed with the actual migration of the data.
It performs the copy while permitting writes in Mnesia until the very last moment.
The copy relies on callback modules provided by RabbitMQ. These callback modules are responsible for telling khepri_mnesia_migration that record $record from table $table goes into Khepri path $path, possibly after some record conversion.
Here are the steps of the data copying algorithm:
- khepri_mnesia_migration marks the migration as in progress by storing a value in Khepri.
- It subscribes to all Mnesia updates.
- It performs the initial copy from Mnesia to Khepri using the Mnesia backup and restore API. This copy is based on a Mnesia checkpoint, so the view is consistent.
- It marks all Mnesia tables as read-only. This is where activities in RabbitMQ are paused. Client operations may time out as a consequence.
- It consumes all updates received through the Mnesia subscription set up in step 2 and writes them to Khepri. Because the tables are read-only, the stream of updates is guaranteed to end.
- It marks the migration as complete. RabbitMQ can resume activities: they will use Khepri from now on.
- It proceeds with the cleanup: the Mnesia tables are deleted.
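The copy steps above can be illustrated with a toy model. In this bash sketch, each store is a plain file of key=value lines where the last line for a key wins; the file layout and the `write`, `start_migration`, `finish_migration` and `lookup` functions are hypothetical and do not reflect how khepri_mnesia_migration or Mnesia work internally. What it does show is why replaying the updates captured by the subscription makes the copy complete even though writes continue after the checkpoint-based copy, and where the read-only freeze sits in the sequence.

```bash
#!/usr/bin/env bash
# Toy model of the copy steps; NOT how khepri_mnesia_migration is implemented.
# Each store is a file of key=value lines; the last line for a key wins.

dir=$(mktemp -d)
mnesia="$dir/mnesia"      # stands in for the Mnesia tables
khepri="$dir/khepri"      # stands in for the Khepri store
events="$dir/events"      # stands in for the Mnesia subscription (step 2)
state="$dir/state"        # migration marker stored "in Khepri" (steps 1 and 6)
frozen="$dir/read_only"   # present while Mnesia tables are read-only (step 4)
touch "$mnesia" "$khepri"

# Route a write the way the text describes for each stage of the migration.
write() {
  if [ "$(cat "$state" 2>/dev/null)" = complete ]; then
    echo "$1=$2" >> "$khepri"            # after step 6, Khepri is used
  elif [ -e "$frozen" ]; then
    echo "rejected: $1 (tables are read-only)" >&2
    return 1                             # in RabbitMQ, clients may time out here
  else
    echo "$1=$2" >> "$mnesia"
    # Once subscribed, every Mnesia update is also delivered to the migration.
    if [ -e "$events" ]; then echo "$1=$2" >> "$events"; fi
  fi
}

start_migration() {
  echo in_progress > "$state"   # step 1: mark the migration as in progress
  : > "$events"                 # step 2: subscribe to Mnesia updates
  cp "$mnesia" "$khepri"        # step 3: point-in-time copy of the tables
}

finish_migration() {
  touch "$frozen"               # step 4: Mnesia tables become read-only
  cat "$events" >> "$khepri"    # step 5: replay captured updates (a finite stream)
  echo complete > "$state"      # step 6: RabbitMQ resumes, using Khepri
  rm -f "$mnesia" "$events" "$frozen"   # step 7: cleanup
}

# Print the latest value of a key in the toy Khepri store.
lookup() {
  awk -F= -v k="$1" '$1 == k { v = $2 } END { print v }' "$khepri"
}

# Usage: one write before, one during and one after the migration.
write queue.a 1
start_migration
write queue.b 2    # arrives after the copy, captured by the subscription
finish_migration
write queue.c 3    # goes straight to Khepri
lookup queue.b     # prints 2
```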
Rollback In Case of an Error
If an error occurs during this process, everything is rolled back and RabbitMQ resumes its activities using Mnesia, as before.