Name: PlikShare
Author: Damian Krychowski

This is a narrative summary aimed at engineers evaluating the design. The canonical reference is docs/managed-encryption.md in the PlikShare repo. Anything that conflicts with it has drifted and the docs win.

What managed encryption is, and what it isn't

Managed encryption is the mode where the server holds the keys. Pick it on a storage at creation time, and every file written to that storage gets encrypted with AES-256-GCM before any byte leaves the server. The keys that decrypt those files live in the database, themselves wrapped under a master password loaded from an environment variable at process start.

It is not end-to-end encryption. The server can read every file in every managed storage at any moment; it has to, to serve the bytes back to a browser asking for them. The point is narrower: an attacker who steals just the file storage - an exposed S3 bucket, a leaked backup of the blob layer, a stolen disk volume - sees only ciphertext. The keys live somewhere else.

The threat model, explicitly

Managed encryption is built around two properties:

Resilient to file-storage compromise. A bucket leak alone is not enough: the IKMs that derive per-file AES keys are wrapped in the database under the master password.
Recoverable from a recovery code alone. Database gone, master password gone, only the bucket survives? A 24-word BIP-39 mnemonic generated at storage creation regenerates the same keys deterministically. No DB row, no admin assistance.

What it explicitly does not defend against:

Database + master password leak together. Anyone with both can re-derive every IKM (PBKDF2 with the per-row salt baked into the wrapped blob) and decrypt every managed file. The master password lives in the deployment's environment. A leaked .env, a misconfigured CI log, or a compromised Docker host hands it over.
Live access to the server process. The master password sits in pinned + mlocked memory for the entire process lifetime, because every IKM unwrap needs it. An attacker who can dump that memory has the password.

That's the deliberate trade: convenience and zero per-user setup, paid for with full server access to every file at every moment. If the threat model includes the server itself, full encryption is the mode for that.

The server master password

The master password is provided through the PlikShare_EncryptionPasswords environment variable, read once at startup. After that it lives only inside MasterEncryptionKeyProvider as a pinned + mlocked SecureBytes, never as a plain string on the GC heap, never logged, never serialized.

The variable can carry multiple passwords separated by commas. The last one is the current key; older entries (id 1, id 2, …) exist only to decrypt rows written under previous deployments. This is the rotation handle for the password itself: append a new password, redeploy, re-encrypt rows under the new id over time, eventually drop the old entry once nothing references it.
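As a rough illustration of the comma-separated convention, the sketch below splits the variable into numbered entries and picks the last one as current. The function names are hypothetical, and two details are assumptions not confirmed above: that ids are 1-based positions in the list, and that entries are taken verbatim without trimming whitespace.

```python
import os

ENV_VAR = "PlikShare_EncryptionPasswords"


def parse_master_passwords(raw: str) -> dict[int, bytes]:
    """Map key id -> password bytes.

    Assumption: the id of an entry is its 1-based position in the list.
    """
    entries = raw.split(",")
    if any(entry == "" for entry in entries):
        raise ValueError("empty password entry in " + ENV_VAR)
    return {key_id: entry.encode("utf-8") for key_id, entry in enumerate(entries, start=1)}


def current_key_id(passwords: dict[int, bytes]) -> int:
    """The last entry in the list is the one used for new writes."""
    return max(passwords)


if __name__ == "__main__":
    raw_value = os.environ.get(ENV_VAR, "")
    if raw_value:
        loaded = parse_master_passwords(raw_value)
        print("loaded", len(loaded), "passwords; current id =", current_key_id(loaded))
```

Under these assumptions, appending a password never renumbers the older entries, which is what lets rows written under id 1 keep decrypting after a rotation.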

Wrapping a database row

The master password never encrypts anything directly. Each row that needs encryption gets its own AES-GCM key derived via PBKDF2-SHA256 (650 000 iterations) with a freshly random per-row salt:

MasterKey.PasswordBytes: SecureBytes, pinned + mlocked
  → PBKDF2-SHA256, salt = 16 random bytes per row, 650 000 iterations
Per-row AES-GCM key: 32 bytes, ephemeral
  → AES-GCM, fresh nonce per encrypted value

Frame layout (one BLOB per encrypted column):

MasterKeyId: 1 B
IterationsFactor: 2 B
Salt: 16 B
Nonce: 12 B
Tag: 16 B
Ciphertext: N B

The salt is embedded in the frame, so the read path re-derives the same AES key from the same password without consulting any external state. IterationsFactor records the iteration count used at write time, so a future deployment can raise the cost without breaking older rows.

The IKM hierarchy

Each managed storage carries a list of input key materials (IKMs): 32-byte secrets, one of which is selected per file at write time and recorded in the file's header. The list lives in s_encryption_details_encrypted on the storage row, as a JSON document {"Ikms": ["base64...", "base64..."]}, wrapped under the per-row scheme above.

Both the IKMs and the recovery code descend from one 32-byte random recovery seed generated at storage creation:

Recovery seed: 32 random bytes, ephemeral, returned once as 24-word BIP-39 mnemonic
  → HKDF-SHA256, info = ("plikshare-dek\0" || version_be32)
IKM v{version}: 32 bytes
  → HKDF-SHA256, salt = FILE_SALT
File AES key: 32 bytes, ephemeral

Holding the recovery seed reaches every file in the storage. Holding a single IKM reaches every file written under that IKM's version: the IKM is not derived per file but shared by all files written under the same version, and only the per-file salt in each header separates their AES keys.
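The two-level derivation can be sketched with a small HKDF (RFC 5869) built on the standard library. The info string for the first step comes from the diagram. The empty salt in the seed-to-IKM step and the empty info in the IKM-to-file-key step are assumptions, since neither is stated here, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 as in RFC 5869: extract, then expand."""
    # Extract: an empty salt is treated as 32 zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_ikm(seed: bytes, version: int) -> bytes:
    """Recovery seed -> IKM for one version."""
    info = b"plikshare-dek\x00" + struct.pack(">I", version)  # version as big-endian uint32
    return hkdf_sha256(seed, salt=b"", info=info)  # assumption: no salt at this level


def derive_file_key(ikm: bytes, file_salt: bytes) -> bytes:
    """IKM + per-file salt from the header -> file AES key."""
    return hkdf_sha256(ikm, salt=file_salt, info=b"")  # assumption: empty info
```

Both steps are deterministic, which is the property the recovery path relies on: the same seed and version always produce the same IKM, and the same IKM and file salt always produce the same file key.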

IKMs in memory

The PBKDF2 derivation that unwraps the IKM blob is paid only once per storage. At application startup, every row in s_storages is read, its s_encryption_details_encrypted is unwrapped (one PBKDF2 against the salt in the frame), and the resulting IKMs are decoded into a version → byte[] map held inside the storage's ManagedStorageEncryption instance. From that point on, file operations on a running server read the IKM directly from memory. No further PBKDF2, no further AES-GCM unwraps of the storage row.

Startup time grows linearly with the number of managed-encryption storages (one 650 000-iteration PBKDF2 each). The hot path stays free of master-password derivation entirely.
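As a sketch of what the in-memory side might look like once the storage row has been unwrapped, the snippet below decodes the JSON document into a version-to-bytes map and serves lookups from it. ManagedStorageKeys is a hypothetical stand-in, not the real ManagedStorageEncryption class, and treating the list index as the IKM version is an assumption.

```python
import base64
import json


def load_ikm_map(decrypted_json: str) -> dict[int, bytes]:
    """Decode {"Ikms": ["base64...", ...]} into version -> 32-byte IKM.

    Assumption: the position in the list is the IKM version.
    """
    doc = json.loads(decrypted_json)
    ikms = {version: base64.b64decode(b64) for version, b64 in enumerate(doc["Ikms"])}
    for version, ikm in ikms.items():
        if len(ikm) != 32:
            raise ValueError(f"IKM v{version} is {len(ikm)} bytes, expected 32")
    return ikms


class ManagedStorageKeys:
    """Hypothetical per-storage cache, filled once at startup."""

    def __init__(self, ikms: dict[int, bytes]):
        self._ikms = dict(ikms)

    def latest_version(self) -> int:
        """Version to use for new writes."""
        return max(self._ikms)

    def ikm_for(self, version: int) -> bytes:
        """Hot-path lookup: a dictionary read, no PBKDF2 and no AES-GCM."""
        return self._ikms[version]
```

The expensive unwrap happens before load_ikm_map is ever called; everything after it is a plain dictionary access keyed by the KEY_VERSION found in the file header.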

Storage creation flow

Creating a managed-encryption storage is the moment the first IKM comes into existence, and the only moment the recovery code is returned to the creator.

Recovery seed: 32 random bytes, ephemeral
  → HKDF-SHA256, info = ("plikshare-dek\0" || version_be32(0))
IKM v0: 32 bytes
  → AesGcmMasterDataEncryption.Encrypt, AES-GCM under PBKDF2(master pwd, row salt)
s_encryption_details_encrypted: persisted on storage row

Recovery seed (the same 32 bytes)
  → RecoveryCodeCodec (BIP-39, 24 words)
Recovery code: returned to creator exactly once

The seed feeds two independent derivations and is then zeroed. It is never persisted. What survives in the database is the encrypted IKM blob; the only copies of the seed in existence are the ones the creator writes down.
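To make the 24-word figure concrete: under standard BIP-39, 32 bytes of entropy (256 bits) get an 8-bit checksum appended, and the resulting 264 bits split into 24 groups of 11 bits, each an index into a 2048-word list. The sketch below does that bit arithmetic without the wordlist. It assumes RecoveryCodeCodec follows standard BIP-39 checksum rules, which is not confirmed here, and the function names are hypothetical.

```python
import hashlib


def seed_to_word_indices(seed: bytes) -> list[int]:
    """32-byte seed -> 24 wordlist indices (0..2047), BIP-39 style."""
    if len(seed) != 32:
        raise ValueError("seed must be 32 bytes")
    checksum = hashlib.sha256(seed).digest()[0]  # first 8 bits of SHA-256(seed)
    bits = (int.from_bytes(seed, "big") << 8) | checksum  # 256 + 8 = 264 bits
    return [(bits >> (11 * (23 - i))) & 0x7FF for i in range(24)]


def word_indices_to_seed(indices: list[int]) -> bytes:
    """24 wordlist indices -> 32-byte seed, rejecting a bad checksum."""
    if len(indices) != 24 or any(not 0 <= i < 2048 for i in indices):
        raise ValueError("expected 24 indices in range 0..2047")
    bits = 0
    for index in indices:
        bits = (bits << 11) | index
    seed = (bits >> 8).to_bytes(32, "big")
    if hashlib.sha256(seed).digest()[0] != bits & 0xFF:
        raise ValueError("checksum mismatch")
    return seed
```

The checksum is why a mistyped word is usually caught at decode time rather than silently producing a different seed and therefore different keys.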

A creator does not need any per-user encryption material to make a managed-encryption storage. No public key to seal to, no encryption password to set, no session to unlock. Every user with permission to create a storage can create a managed one, regardless of whether they have a full-encryption identity set up.

The file frame

Files use a streaming AEAD adapted from Google Tink. Managed-encryption storages use the V1 frame format throughout. The header is fixed-size and assumes a single server-managed key context.

First segment: HEADER (41 B) | CIPHERTEXT (N B) | TAG (16 B)
Subsequent segments: CIPHERTEXT (N B) | TAG (16 B)

HEADER breakdown (41 bytes total):

SIZE: 1 B
KEY_VERSION: 1 B
SALT: 32 B
NONCE_PREFIX: 7 B

Segments are 1 MiB of plaintext each. Each IV is constructed deterministically from the file-level NONCE_PREFIX, the segment index (4-byte big-endian), and a final flag byte that distinguishes the last segment from the rest. The flag byte stops truncation attacks: an attacker cannot drop trailing segments and have the result re-authenticate, because the final segment's IV is shaped differently from any intermediate one.

Why streaming, and why range reads work

Three reasons for streaming AEAD: large files, HTTP Range: requests, and parallel uploads. A single AES-GCM call would force the whole plaintext through memory and give no random access. Segmenting lets any segment be authenticated and decrypted independently, which is what makes video seek, resumable downloads, and S3-style multi-part upload all work on encrypted blobs.

A plaintext byte range maps to an encrypted byte range that includes the full segment containing each endpoint. The server doesn't even need to fetch the file header on the hot path: KEY_VERSION, SALT, and NONCE_PREFIX are duplicated onto the file's database row, so a range read computes which segments cover the range, pulls just those segment ranges from storage, and decrypts each independently into the response stream.
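The range mapping can be sketched as plain arithmetic. This assumes a layout where the 41-byte header sits in front of the first segment and every segment carries up to 1 MiB of plaintext followed by a 16-byte tag; the exact on-disk accounting is an assumption, and the function names are hypothetical.

```python
SEGMENT_PLAINTEXT = 1024 * 1024  # 1 MiB of plaintext per segment
TAG_SIZE = 16
HEADER_SIZE = 41
SEGMENT_ENCRYPTED = SEGMENT_PLAINTEXT + TAG_SIZE


def encrypted_size(plaintext_size: int) -> int:
    """Total stored size: header, plaintext-sized ciphertext, one tag per segment."""
    segments = max(1, -(-plaintext_size // SEGMENT_PLAINTEXT))  # ceiling division
    return HEADER_SIZE + plaintext_size + segments * TAG_SIZE


def encrypted_range(start: int, end: int, plaintext_size: int) -> tuple[int, int, int, int]:
    """Map an inclusive plaintext range to whole covering segments.

    Returns (first_segment, last_segment, enc_start, enc_end_exclusive).
    """
    if not 0 <= start <= end < plaintext_size:
        raise ValueError("range outside file")
    first = start // SEGMENT_PLAINTEXT
    last = end // SEGMENT_PLAINTEXT
    enc_start = HEADER_SIZE + first * SEGMENT_ENCRYPTED
    # The final segment may be shorter than a full one, so clamp to the file end.
    enc_end = min(HEADER_SIZE + (last + 1) * SEGMENT_ENCRYPTED, encrypted_size(plaintext_size))
    return first, last, enc_start, enc_end
```

After decrypting the covering segments, the server would drop start % SEGMENT_PLAINTEXT bytes from the front of the first one and trim the tail to match the requested end.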

The header copy in the file itself still matters. It's what lets an offline recovery tool decrypt a file with no database at all. On the hot path, the database row is the source of truth.

What is not encrypted in managed mode

Managed encryption protects file contents. Everything else stays in the database in plaintext: file names, folder names, workspace and box names, comment bodies, link names, audit log details, storage configuration (bucket names, S3 endpoints, hard-drive paths).

The reason is consistent with the threat model. The database is already inside the trust boundary, the server reads it constantly to operate, and the master password is already on the server. Encrypting metadata under that same password would produce a cosmetic layer that any attacker who already holds the password can lift. The work is only worthwhile when there is a key the server itself does not hold, which is exactly the construction full encryption introduces.

The recovery code, and why it matters

The 24-word mnemonic returned at storage creation is not used during normal operation. Day-to-day decryption goes through the wrapped IKM blob in the database. The recovery code matters in two scenarios:

The database is gone, and with it every encrypted IKM blob.
The master password is gone - the env var was rotated and no operator kept a copy, or the deployment was migrated and the password was lost in transit - making the IKM blobs unreadable even though the database is intact.

Either way the file storage may still be intact, full of encrypted file frames nobody can read. Every V1 header carries the IKM version, the per-file salt, and the per-file nonce prefix, so the only missing input is the IKM itself. The recovery code is what regenerates it:

24-word BIP-39 mnemonic: held by the creator
  → RecoveryCodeCodec.TryDecode
Recovery seed: 32 bytes, ephemeral
  → HKDF-SHA256, info = ("plikshare-dek\0" || version_be32(v))
IKM v{v}
  → for each file: read header, HKDF(IKM, FILE_SALT) → file AES key
Per-file AES key → decrypt frame

The same derivation that produced IKM v0 at creation time produces it again from the same seed. A file plus the recovery seed is enough to derive its AES key without consulting any external state.

This recovery path is not yet exposed in the product. No endpoint accepts a recovery code, no UI prompts for it. The intended consumer is an out-of-band tool that operates directly on the file storage: point it at a bucket, give it the 24-word mnemonic, and let it walk every encrypted frame, deriving keys from headers and emitting plaintext. No database, no running server, no master password. Only the storage and the seed.

What the recovery code cannot reconstruct is anything that lived only in the database: workspace structure, file names and folder paths, audit history, share links, user accounts. Recovery is a path back to the bytes, not back to the application state.

Choosing between managed and full encryption

Managed encryption is the right choice when the threat being defended against is storage-only compromise: a leaked S3 backup, an exposed disk volume, a hosting provider with read access to the blob layer but not the server. It is the wrong choice when the threat model includes the server itself: a hosting compromise, a malicious admin, a subpoena reaching the running process.

Full encryption is the answer to that broader threat model, at the cost of per-user setup and a degraded experience for users who lose their encryption password without their recovery code. Managed encryption is what most installations want by default; full encryption is what installations with explicit confidentiality requirements reach for.

Where to go next

For the format byte layouts, the rotation story, and the parts of the code that connect to the rest of PlikShare, the canonical reference is docs/managed-encryption.md in the repo. The companion piece on full encryption (per-user keypairs, sealed-box sharing, encrypted file names and audit logs) lives at /blog/full-encryption.
