Cassandra Configuration

Note: Cassandra is only available with Guice wiring (cassandra-guice and cassandra-guice-ldap).

Consult cassandra.properties to get some examples and hints.

cassandra.nodes
List of some of the nodes of the Cassandra cluster, each given as host:port or host. If the port is not specified, 9042 is used.
cassandra.keyspace.create
Indicates whether the keyspace should be created by James. Optional, default value: false
If set to true James will attempt to create the keyspace when starting up.
cassandra.keyspace
Name of the keyspace used by James. Optional, default value: apache_james
cassandra.user
Username used as a credential for contacting the Cassandra cluster. Optional, absent by default, required if cassandra.password is supplied.
cassandra.password
Password used as a credential for contacting the Cassandra cluster. Optional, absent by default, required if cassandra.user is supplied.
cassandra.ssl
Whether SSL should be enabled for communications with the Cassandra cluster. Optional, defaults to false.
The keystore used for trusting the SSL server socket can be set via JSSE system properties, as explained in the Cassandra driver manual.
cassandra.replication.factor
Replication factor used upon keyspace creation. Modifying this property once the keyspace exists has no effect. Optional, default value: 1
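
As an illustration of the connection and keyspace properties described so far, a minimal cassandra.properties fragment could look like the sketch below. The host names and credentials are placeholders, and the comma used to separate nodes is an assumption that is not stated in this document.

```properties
# Contact points. The second entry omits the port, so 9042 is used.
# Host names are placeholders; the comma separator is an assumption.
cassandra.nodes=cassandra-1.example.org:9042,cassandra-2.example.org
cassandra.keyspace=apache_james
# Let James attempt to create the keyspace at startup,
# using the replication factor given below.
cassandra.keyspace.create=true
cassandra.replication.factor=1
# Credentials must be supplied together. Both values are placeholders.
cassandra.user=james
cassandra.password=changeit
cassandra.ssl=false
```
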
cassandra.query.logger.constant.threshold
Optional. If specified, all queries that take longer than the given integer, in milliseconds, are considered slow and logged. If not specified, a DynamicThresholdQueryLogger is used by default (see cassandra.query.slow.query.latency.threshold.percentile).
cassandra.query.slow.query.latency.threshold.percentile
Default is com.datastax.driver.core.QueryLogger.DEFAULT_SLOW_QUERY_THRESHOLD_PERCENTILE. The latency percentile beyond which queries are considered 'slow' and will be logged. If you specify cassandra.query.logger.constant.threshold, you should not specify this property
cassandra.query.logger.max.query.string.length
Default is com.datastax.driver.core.QueryLogger.DEFAULT_MAX_QUERY_STRING_LENGTH. The maximum length of a CQL query string that can be logged verbatim by the Cassandra driver
cassandra.query.logger.max.logged.parameters
Default is com.datastax.driver.core.QueryLogger.DEFAULT_MAX_LOGGED_PARAMETERS. The maximum number of query parameters that can be logged by the Cassandra driver
cassandra.query.logger.max.parameter.value.length
Default is com.datastax.driver.core.QueryLogger.DEFAULT_MAX_PARAMETER_VALUE_LENGTH. The maximum length of a query parameter value that can be logged by the Cassandra driver
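
The query logger properties could be combined as in the following sketch. All numeric values are illustrative choices, not recommendations or driver defaults. The percentile property is left commented out because it should not be specified together with the constant threshold.

```properties
# Log every query slower than 200 ms (illustrative value).
cassandra.query.logger.constant.threshold=200
# Left unset on purpose: the percentile threshold should not be
# combined with the constant threshold.
# cassandra.query.slow.query.latency.threshold.percentile=99
# Illustrative limits on what the driver writes to the log.
cassandra.query.logger.max.query.string.length=500
cassandra.query.logger.max.logged.parameters=50
cassandra.query.logger.max.parameter.value.length=50
```
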
cassandra.readTimeoutMillis
Optional. If specified, defines the Cassandra driver read timeout, in milliseconds.
# See com.datastax.driver.core.PoolingOptions for the default values
# A property left unset falls back to the driver's default value

# cassandra.pooling.local.max.connections=8
# cassandra.pooling.local.max.requests=128
## In ms. Should be higher than socket read timeout
# cassandra.pooling.timeout=5000
## In seconds.
# cassandra.pooling.heartbeat.timeout=30
# cassandra.pooling.max.queue.size=256
cassandra.pooling.local.max.connections
Optional. Defaults to 8.
If specified defines the Cassandra maximum number of connections to hosts (remote and local).
cassandra.pooling.local.max.requests
Optional. Defaults to 128.
If specified defines the Cassandra maximum number of concurrent requests per connection.
cassandra.pooling.timeout
Optional. Defaults to 5000 (ms).
If specified defines the Cassandra timeout for waiting in the pool queue. Should be higher than the socket read timeout.
cassandra.pooling.heartbeat.timeout
Optional. Defaults to 30 (s).
If specified defines the Cassandra heartbeat timeout.
cassandra.pooling.max.queue.size
Optional. Defaults to 256.
If specified defines the Cassandra maximum size of the connection pool queue.
mailbox.read.repair.chance
Optional. Defaults to 0.1 (10% chance).
Must be between 0 and 1 (inclusive). Controls the probability of doing a read-repair upon mailbox read.
mailbox.counters.read.repair.chance.max
Optional. Defaults to 0.1 (10% chance).
Must be between 0 and 1 (inclusive). Upper bound on the probability of doing a read-repair upon mailbox counters read.
Formula: read_repair_chance = min(mailbox.counters.read.repair.chance.max, (100/unseens)*mailbox.counters.read.repair.chance.one.hundred)
mailbox.counters.read.repair.chance.one.hundred
Optional. Defaults to 0.01 (1% chance).
Must be between 0 and 1 (inclusive). Probability of doing a read-repair upon mailbox counters read for a mailbox with 100 unseen messages; the chance scales inversely with the number of unseen messages, up to the maximum.
Formula: read_repair_chance = min(mailbox.counters.read.repair.chance.max, (100/unseens)*mailbox.counters.read.repair.chance.one.hundred)
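
To make the counters formula concrete, the sketch below sets the three read-repair properties to their documented defaults and works through the formula in comments. The unseen-message counts are arbitrary examples.

```properties
# Values shown are the documented defaults.
mailbox.read.repair.chance=0.1
mailbox.counters.read.repair.chance.max=0.1
mailbox.counters.read.repair.chance.one.hundred=0.01
# Applying the formula with these values:
#   unseens = 1000 -> min(0.1, (100/1000) * 0.01) = 0.001
#   unseens = 100  -> min(0.1, (100/100)  * 0.01) = 0.01
#   unseens = 5    -> min(0.1, (100/5)    * 0.01) = 0.1 (capped by the max)
```

Mailboxes with few unseen messages are thus repaired more often, while the maximum keeps the chance bounded.
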
mailbox.max.retry.acl
Optional. Defaults to 1000.
Controls the number of retries upon Cassandra ACL updates.
mailbox.max.retry.modseq
Optional. Defaults to 100000.
Controls the number of retries upon Cassandra ModSeq generation.
mailbox.max.retry.uid
Optional. Defaults to 100000.
Controls the number of retries upon Cassandra Uid generation.
mailbox.max.retry.message.flags.update
Optional. Defaults to 1000.
Controls the number of retries upon Cassandra flags update, in MessageMapper.
mailbox.max.retry.message.id.flags.update
Optional. Defaults to 1000.
Controls the number of retries upon Cassandra flags update, in MessageIdMapper.
fetch.advance.row.count
Optional. Defaults to 1000.
Controls how many rows may remain in the current page before the next page is prefetched when paging.
chunk.size.message.read
Optional. Defaults to 100.
Controls the number of messages to be retrieved in parallel.
chunk.size.expunge
Optional. Defaults to 50.
Controls the number of messages to be expunged in parallel.
mailbox.blob.part.size
Optional. Defaults to 102400 (100KB).
Controls the size of blob parts used to store messages.
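
As a rough sizing aid, the fragment below keeps the default part size and estimates the number of parts for one message in a comment. It assumes that a message is split into consecutive fixed-size parts, with the last part possibly smaller; that splitting rule is an assumption, not something this document states.

```properties
# Documented default: 102400 bytes (100KB) per blob part.
mailbox.blob.part.size=102400
# Under the fixed-size splitting assumption, a 1048576 byte (1 MiB) message
# needs ceil(1048576 / 102400) = ceil(10.24) = 11 parts.
```
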
mailbox.read.strong.consistency
Optional. Boolean, defaults to true. Disabling should be considered experimental. When disabled, the regular consistency level is used for mailbox read transactions. This might result in stale reads, as the system.paxos table will not be checked for the latest updates. Better performance is expected when it is turned off. Note that reads performed as part of write transactions are always performed with strong consistency.
message.read.strong.consistency
Optional. Boolean, defaults to true. Disabling should be considered experimental. When disabled, the regular consistency level is used for message read transactions. This might result in stale reads, as the system.paxos table will not be checked for the latest updates. Better performance is expected when it is turned off. Note that reads performed as part of write transactions are always performed with strong consistency.
message.write.strong.consistency.unsafe
Optional. Boolean, defaults to true. Disabling should be considered experimental and unsafe. If disabled, lightweight transactions are no longer used for message operations (table `imapUidTable`). As message flag updates so far rely on a read-before-write model, this exposes you to data races that can lead to lost updates. Better performance is expected when it is turned off. Reads performed as part of write transactions are then also performed with a relaxed consistency.
cassandra.consistency_level.regular
Optional. Defaults to QUORUM.
Allows specifying the driver default consistency level. Accepted values: QUORUM, LOCAL_QUORUM, or EACH_QUORUM.
cassandra.consistency_level.lightweight_transaction
Optional. Defaults to SERIAL.
Accepted values: SERIAL or LOCAL_SERIAL.
cassandra.local.dc
Optional. Allows specifying the local DC as part of the load balancing policy. Specifying it results in the use of new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().withLocalDc(value).build()) as the LoadBalancingPolicy. This value is useful in a multi-DC Cassandra setup. Be aware of the limitations of multi-DC setups for James. Not specifying this value results in the driver's default load balancing policy being used.
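
For a multi-DC cluster, the local DC and the consistency level properties might be combined as sketched below. The data center name dc1 is a placeholder, and pairing the LOCAL_ variants with a local DC is only one possible choice; whether it suits a given deployment depends on the multi-DC limitations mentioned above.

```properties
# "dc1" is a placeholder for the data center local to this James node.
cassandra.local.dc=dc1
# Keep regular and lightweight transaction operations within the local data center.
cassandra.consistency_level.regular=LOCAL_QUORUM
cassandra.consistency_level.lightweight_transaction=LOCAL_SERIAL
```
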
optimistic.consistency.level.enabled
Optional. Defaults to false. Allows specifying consistency level ONE for reads in Cassandra BlobStore. Falls back to default read consistency level if the blob is missing.

If you want more explanation about Cassandra configuration, you should visit the dedicated documentation.

Cassandra migration process

Cassandra schema upgrades imply the creation of new tables. Restarting James is thus needed, as new tables are created on restart.

Once done, we ship code that tries to read from the new tables and, if that is not possible, falls back to the old tables. You can thus safely run without running additional migrations.

On-the-fly migration can be enabled. However, one might want to force the migration in a controlled fashion and automatically update the schema version in use (asserting in the database that old versions are no longer used, as the corresponding tables are empty). Note that this process is safe: we ensure that the migration is not running concurrently on this James instance, that it does not bump the version upon partial failures, that races between version upgrades are idempotent, etc.

These schema updates can be triggered by webadmin using the Cassandra backend.

Note that progress can currently be tracked through the logs.

Here are the implemented migrations:

From V1 to V2

Last supported in release 3.5.0

Migration tag on git repository: cassandra_migration_v1_to_v2

The goal is to create a messageV2 table that replaces the message table. The message table stores both message metadata and blobs, which has proven inefficient. Version 2 instead chunks message blobs and stores them in another table. The migration process moves all messages from the message table to the messageV2 table (which contains only metadata) and to the blobs / blobParts tables.

Read more about this migration here.

Summary of available options for this migration:

migration.v1.v2.on.the.fly
Only available on tag cassandra_migration_v1_to_v2. Optional. Defaults to false.
Controls whether the v1 to v2 migration should be run on the fly.
migration.v1.v2.thread.count
Only available on tag cassandra_migration_v1_to_v2. Optional. Defaults to 2.
Controls the number of threads used to asynchronously migrate from v1 to v2.
migration.v1.v2.queue.length
Only available on tag cassandra_migration_v1_to_v2. Optional. Defaults to 1000.
Controls the queue size of the v1 to v2 migration task. New entries are dropped when the queue is full.
migration.v1.read.fetch.size
Only available on tag cassandra_migration_v1_to_v2. Optional. Defaults to 10.
Controls the fetch size of the request to retrieve all messages stored in V1 during the migration process.
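
A fragment enabling the on-the-fly v1 to v2 migration could look like this sketch. It only applies to the cassandra_migration_v1_to_v2 tag, and apart from the first property every value is the documented default.

```properties
# Run the v1 to v2 migration on the fly (defaults to false).
migration.v1.v2.on.the.fly=true
# The remaining values are the documented defaults.
migration.v1.v2.thread.count=2
migration.v1.v2.queue.length=1000
migration.v1.read.fetch.size=10
```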

From V2 to V3

Last supported in release 3.5.0

Migration tag on git repository: cassandra_migration_v2_to_v3

Goal is to drop message table. After this migration, one can manually delete this table.

From V3 to V4

Last supported in release 3.5.0

Migration tag on git repository: cassandra_migration_v3_to_v4

Goal is to store attachments in the blob tables.

Summary of available options for this migration:

attachment.v2.migration.read.timeout
Optional. Defaults to one day.
Controls the timeout, in milliseconds, of reads on attachment v1.
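
Since the value is expressed in milliseconds, a shorter timeout than the one-day default could be configured as in this sketch. The one-hour value is an arbitrary illustration.

```properties
# One day, the documented default, is 24 * 60 * 60 * 1000 = 86400000 ms.
# One hour (illustrative) is 60 * 60 * 1000 = 3600000 ms.
attachment.v2.migration.read.timeout=3600000
```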

From V4 to V5

Last supported in release 3.5.0

Migration tag on git repository: cassandra_migration_v4_to_v5

The goal is to store attachment ids in a separate AttachmentMessageId table.

Summary of available options for this migration:

message.attachmentids.read.timeout
Optional. Defaults to one day.
Controls the timeout, in milliseconds, of reads of attachment ids on messages.

From V5 to V6

Last supported in releases 3.6.x

The goal is to no longer rely on a UDT partition key for mailboxPath tables. Entries will be migrated to the mailboxPathV2 table, which relies on a composite primary key.

From V6 to V7

Last supported in releases 3.6.x

The goal is to populate the mapping_sources projection table. This table allows finding the sources of a given redirection, which is handy for things like mail aliases (for example, listing the aliases that rewrite to bob). Without this projection table (i.e. with schema version 6 or less), such information is obtained through an unoptimized full table scan. From schema version 7 onward, the optimized projection can safely be used.

From V7 to V8

Last supported in releases 3.6.x

Add UID_VALIDITY to mailboxPath table in order not to mandate mailbox table reads.

From V8 to V9

Adopt a more compact representation for message properties.

From V9 to V10

Handles Mailbox ACL transactionality with event sourcing. We got rid of SERIAL consistency upon reads, thus unlocking a major performance enhancement.

Adding threadId column to message metadata tables

Add threadId column to messageIdTable and imapUidTable in order to get a message's threadId.