BlobStore Configuration
BlobStore is the dedicated component for storing blobs, i.e. non-indexable content.
James uses the BlobStore for storing blobs, which are usually mail contents, attachments, deleted mails, etc.
You can choose the underlying implementation of BlobStore to fit your James setup.
It can be an implementation on top of Cassandra, or an object storage service such as OpenStack Swift or AWS S3.
This configuration is only applicable with Guice products.
Consult blob.properties
in GIT to get some examples and hints.
Blobs storing configuration
- implementation
- cassandra: use a Cassandra based BlobStore
- s3: use an AWS S3 based BlobStore
- file: (experimental) directly use the file system. Useful for legacy architectures based on shared iSCSI SANs and/or
distributed file systems with no object store available.
- WARNING: JAMES-3591 Cassandra is not made to store large binary content; its use will be suboptimal compared to
alternatives (namely S3 compatible BlobStores backed by, for instance, S3, MinIO or Ozone).
The generated startup warning log can be deactivated via the cassandra.blob.store.disable.startup.warning
environment variable being set to false.
- deduplication.enable
- Mandatory. Supported values: true and false.
- If you choose to enable deduplication, the mails with the same content will be stored only once.
- Warning: Once this feature is enabled, there is no turning back, as turning it off will lead to the deletion of all
the mails sharing the same content once one of them is deleted.
- This feature also requires a garbage collector mechanism to effectively drop blobs. A first implementation
based on bloom filters can be used and triggered using the WebAdmin REST API. See
Running blob garbage collection.
In order to avoid concurrency issues upon garbage collection, we slice the blobs into generations; the two most recent
generations are not garbage collected.
- deduplication.gc.generation.duration
- Allows controlling the duration of one generation. A longer duration implies better deduplication,
but deleted blobs will live longer. Duration, defaults to 30 days; the default unit is days.
- deduplication.gc.generation.family
- Every time the duration is changed, this integer counter must be incremented to avoid
conflicts. Defaults to 1.
- Upgrade note: If you are upgrading from James 3.5 or older, deduplication was enabled.
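For illustration, a minimal blob.properties excerpt combining the options above could look like the snippet below. The S3 backend choice and the generation values are examples rather than recommendations, and the exact duration syntax should be checked against the sample blob.properties in GIT.
# Example blob.properties excerpt (illustrative values)
implementation=s3
deduplication.enable=true
# duration syntax is an assumption; verify it against the sample blob.properties
deduplication.gc.generation.duration=30days
deduplication.gc.generation.family=1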
Cassandra BlobStore Cache
A Cassandra cache can be enabled to reduce latency when reading small blobs frequently.
A dedicated keyspace with a replication factor of one is then used.
Cache eviction policy is TTL based.
Only blobs below a given threshold will be stored.
Note that blobs are stored within a single Cassandra row; hence a low threshold should be used.
- cache.enable
- DEFAULT: false, optional, must be a boolean. Whether the cache should be enabled.
- cache.cassandra.ttl
- DEFAULT: 7 days, optional, must be a duration. Cache eviction policy is TTL based.
- cache.sizeThresholdInBytes
- DEFAULT: 8192, optional, must be a positive integer. Unit: bytes.
Supported units: bytes, Kib, MiB, GiB, TiB
Maximum size of stored objects expressed in bytes.
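As an illustration, enabling the cache could look as follows; the values simply restate the documented defaults, and the TTL syntax is an assumption to be verified against the sample blob.properties.
# Example Cassandra blob cache settings (illustrative, restating the defaults)
cache.enable=true
# TTL syntax is an assumption; verify it against the sample blob.properties
cache.cassandra.ttl=7days
cache.sizeThresholdInBytes=8192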
Encryption choice
Data can be optionally encrypted with a symmetric key using AES before being stored in the BlobStore. As many users rely
on third parties for object storage, a compromised third party will not escalate to a data disclosure. Of course, a
performance price has to be paid, as encryption takes resources.
- encryption.aes.enable
- Optional boolean, defaults to false
If AES encryption is enabled, then the following properties MUST be present:
- encryption.aes.password
- String
- encryption.aes.salt
- Hexadecimal string.
If AES encryption is enabled, then the following properties MAY be present:
- encryption.aes.private.key.algorithm
- String, defaulting to PBKDF2WithHmacSHA512. Previously was
PBKDF2WithHmacSHA1.
WARNING: Once chosen, this choice cannot be reverted; all the data is either clear or encrypted. Mixed encryption
is not supported.
Here is an example of how you can generate the above values (be mindful to customize the byte lengths in order to add
enough entropy):
# Password generation
openssl rand -base64 64
# Salt generation
openssl rand -hex 16
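Putting it together, an illustrative encryption block could look like the following, where the password and salt placeholders stand for the values produced by the commands above:
# Example AES encryption settings (placeholders, replace with your own generated values)
encryption.aes.enable=true
encryption.aes.password=<base64 output of: openssl rand -base64 64>
encryption.aes.salt=<hex output of: openssl rand -hex 16>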
ObjectStorage BlobStore Buckets Configuration
- objectstorage.bucketPrefix
- A bucket is a concept in James, similar to containers in Swift or buckets in AWS S3.
BucketPrefix is the prefix of bucket names in the James BlobStore.
- objectstorage.namespace
- BlobStore default bucket name. Most blobs stored in the BlobStore are inside the default bucket,
except for special cases like storing blobs of deleted messages.
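For example, the following illustrative settings would prefix every bucket name with prod- and use james as the default bucket; both values are placeholders to adapt to your deployment.
# Example bucket naming (placeholder values)
objectstorage.bucketPrefix=prod-
objectstorage.namespace=james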
ObjectStorage Underlying Service Configuration
ObjectStorage AWS S3 Configuration
- objectstorage.s3.endPoint
- S3 service endpoint
- objectstorage.s3.region
- S3 region
- objectstorage.s3.accessKeyId
- S3 access key id
- objectstorage.s3.secretKey
- S3 access key secret
- objectstorage.s3.http.concurrency
- Allows setting the number of concurrent HTTP requests allowed by the Netty driver.
- objectstorage.s3.truststore.path
- optional: Verify the S3 server certificate against this trust store file.
- objectstorage.s3.truststore.type
- optional: Specify the type of the trust store, e.g. JKS, PKCS12
- objectstorage.s3.truststore.secret
- optional: Use this secret/password to access the trust store; default none
- objectstorage.s3.truststore.algorithm
- optional: Use this specific trust store algorithm; default SunX509
- objectstorage.s3.trustall
- optional: boolean. Defaults to false. Cannot be set to true together with other truststore options. When set to true,
James will not validate S3 endpoint SSL certificates.
- objectstorage.s3.read.timeout
- optional: HTTP read timeout. Duration; the default unit is seconds. Leaving it empty relies on S3 driver defaults.
- objectstorage.s3.write.timeout
- optional: HTTP write timeout. Duration; the default unit is seconds. Leaving it empty relies on S3 driver defaults.
- objectstorage.s3.connection.timeout
- optional: HTTP connection timeout. Duration; the default unit is seconds. Leaving it empty relies on S3 driver defaults.
- objectstorage.s3.in.read.limit
- optional: Objects read into memory will be rejected if they exceed the size limit exposed here. Size, for example `100M`.
Supported units: K, M, G; defaults to B if no unit is specified. If unspecified, big objects won't be prevented
from being loaded into memory. This setting complements protocol limits.
- objectstorage.s3.upload.retry.maxAttempts
- optional: Integer. Default is zero. This property specifies the maximum number of retry attempts allowed for failed upload operations.
- objectstorage.s3.upload.retry.backoffDurationMillis
- optional: Long (milliseconds). Default is 10 (milliseconds).
Only takes effect when the "objectstorage.s3.upload.retry.maxAttempts" property is declared.
This property determines the duration (in milliseconds) to wait between retry attempts for failed upload operations.
This delay is known as backoff. The jitter factor is 0.5
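To tie the S3 settings together, an illustrative (non-authoritative) configuration could look like the excerpt below; the endpoint, region, credentials and concurrency value are placeholders to replace with your own.
# Example S3 connection settings (placeholder values)
objectstorage.s3.endPoint=http://s3.example.com:9000
objectstorage.s3.region=eu-west-1
objectstorage.s3.accessKeyId=<access key id>
objectstorage.s3.secretKey=<secret key>
# optional tuning, value is illustrative
objectstorage.s3.http.concurrency=100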