Configuring data storage with AWS

When enabled, the Carbon Engine stores user data (transactions, question responses and account preferences) alongside the footprints and savings calculated by the engine. Currently, AWS is the only supported cloud provider. If you'd like to use a different storage platform, contact your Cogo representative with your requirements.

Storage options

There are two main storage options for running Carbon Engine:

  1. AWS S3 + DynamoDB
  2. Locally mounted volume + AWS DynamoDB

For both options, data is stored as one file per user in an optimised binary format for performance.

AWS S3 + DynamoDB

This uses S3 to store user data files and is our recommended approach.

Locally mounted volume + AWS DynamoDB

Instead of storing files in S3, they can be stored in a locally mounted volume. As Carbon Engine creates one file per user, the file system must be able to efficiently handle a large number of files.

If hosting in AWS, EFS is a natural choice for providing the mounted volume. EFS provides a true directory structure, which can be useful if you have other applications (such as backups) that operate directly on directory structures.

DynamoDB is still required in order to provide file locking capability between Carbon Engine instances.

DynamoDB synchronises writes

To provide file locking between separate Carbon Engine instances, DynamoDB is used to synchronise writes. Before updating a file, the Carbon Engine creates a locking record in DynamoDB, using the user_id to generate the primary key. Once the operation is complete, the locking record is deleted. An expiry timestamp and a unique operation id ensure that a file does not remain locked if a process dies before deleting the locking record it created.

{
    "id": "string" // based on the user id; is used as the primary key
    ...
}

If an instance of the Carbon Engine attempts to perform an operation on a locked file, it will wait and retry the operation, meaning that failures due to locking conflicts should be rare.
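The acquire/retry/release flow described above can be sketched as follows. This is an illustrative model only: the field names, key derivation, TTL, and retry timings are assumptions, not the engine's actual schema, and an in-memory dict stands in for the DynamoDB table so the sketch is self-contained (the real engine would use a conditional PutItem/DeleteItem instead).

```python
import time
import uuid

# In-memory stand-in for the DynamoDB table, so this sketch runs without AWS.
lock_table = {}

LOCK_TTL_SECONDS = 5  # illustrative expiry window

def acquire_lock(user_id, retries=3, wait=0.05):
    """Try to create a locking record keyed on the user id, retrying on conflict."""
    operation_id = str(uuid.uuid4())  # unique per operation, as described above
    key = f"lock-{user_id}"          # illustrative primary-key derivation
    for _ in range(retries):
        now = time.time()
        existing = lock_table.get(key)
        # The write succeeds if no record exists or the previous one has expired.
        if existing is None or existing["expires_at"] < now:
            lock_table[key] = {
                "id": key,
                "operation_id": operation_id,
                "expires_at": now + LOCK_TTL_SECONDS,
            }
            return operation_id
        time.sleep(wait)  # wait and retry, mirroring the engine's behaviour
    raise TimeoutError(f"could not lock file for user {user_id}")

def release_lock(user_id, operation_id):
    """Delete the locking record, but only if this operation still owns it."""
    key = f"lock-{user_id}"
    record = lock_table.get(key)
    if record is not None and record["operation_id"] == operation_id:
        del lock_table[key]

op = acquire_lock("user-123")
# ... update the user's data file here ...
release_lock("user-123", op)
```

The ownership check in release_lock is what makes the unique operation id necessary: without it, an instance whose lock had expired could delete a newer lock created by another instance.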

Records in DynamoDB only last a matter of milliseconds and are very small, so the cost will be minimal.

Setup

Using AWS S3 + DynamoDB

Steps

  1. Select an AWS region. Your S3 bucket and DynamoDB table must be in the same region.
  2. Create an "S3 standard" bucket. Public access should be disabled; versioning is not recommended as the files change frequently.
  3. Create a DynamoDB table. Name the partition key id. We recommend you enable "on-demand" capacity mode until you have a good sense of your transaction throughput.
  4. Create an IAM policy as follows, and attach it to an IAM user, an ECS task role, or the Carbon Engine's instance role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query",
        "dynamodb:Scan",
        "dynamodb:UpdateItem",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKET-NAME",
        "arn:aws:s3:::BUCKET-NAME/*",
        "arn:aws:dynamodb:REGION:ACCOUNT:table/TABLE-NAME",
        "arn:aws:dynamodb:REGION:ACCOUNT:table/TABLE-NAME/index/*"
      ]
    }
  ]
}
  5. Configure the following container environment variables:

    • AWS_REGION set to the AWS region you selected above.
    • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set to the credentials of the IAM user the policy is attached to. If you attached the policy to an ECS task role or instance role instead, leave these unset.
    • COGO_STORAGE set to s3:bucket-name?locking=dynamo&table=table-name, where bucket-name is the name of the S3 bucket and table-name is the name of the DynamoDB table you set up above.
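Assuming the AWS CLI is installed and configured, steps 2 and 3 above might be scripted roughly like this. The region, bucket, and table names are placeholders; substitute your own.

```shell
#!/bin/sh
# Placeholder values -- replace with your own region and resource names.
REGION="eu-west-1"
BUCKET="my-carbon-engine-data"
TABLE="carbon-engine-locks"

# Step 2: S3 standard bucket with all public access blocked.
aws s3api create-bucket \
  --bucket "$BUCKET" \
  --region "$REGION" \
  --create-bucket-configuration LocationConstraint="$REGION"
aws s3api put-public-access-block \
  --bucket "$BUCKET" \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Step 3: DynamoDB table with partition key "id" and on-demand capacity.
aws dynamodb create-table \
  --table-name "$TABLE" \
  --region "$REGION" \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```

With these names, the resulting COGO_STORAGE value for step 5 would be s3:my-carbon-engine-data?locking=dynamo&table=carbon-engine-locks.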

Using locally mounted volume + AWS DynamoDB

Steps

  1. Select an AWS region.
  2. Create a DynamoDB table. Name the partition key id. We recommend you enable "on-demand" capacity mode until you have a good sense of your transaction throughput.
  3. Create an IAM policy as follows, and attach it to an IAM user, an ECS task role, or the Carbon Engine's instance role.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query",
        "dynamodb:Scan",
        "dynamodb:UpdateItem"
      ],
      "Resource": [
        "arn:aws:dynamodb:REGION:ACCOUNT:table/TABLE-NAME",
        "arn:aws:dynamodb:REGION:ACCOUNT:table/TABLE-NAME/index/*"
      ]
    }
  ]
}
  4. Configure the following container environment variables:

    • AWS_REGION set to the AWS region you selected above.
    • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set to the credentials of the IAM user the policy is attached to. If you attached the policy to an ECS task role or instance role instead, leave these unset.
    • COGO_STORAGE set to path:///path/to/storage?locking=dynamo&table=table-name, where /path/to/storage is the directory accessible from inside the container where you'd like user files to be stored and table-name is the name of the DynamoDB table you configured above. Note that the URI contains three slashes in a row.
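Putting the pieces together, a container launch for this option might look roughly like the following. The host path, table name, region, and image name (cogo/carbon-engine) are all placeholders, as the actual image name and deployment method depend on your Cogo setup.

```shell
#!/bin/sh
# Placeholder values -- the host mount path (e.g. an EFS mount point), region,
# table name, and image name must all come from your own environment.
docker run \
  -v /mnt/efs/carbon-data:/data \
  -e AWS_REGION="eu-west-1" \
  -e AWS_ACCESS_KEY_ID="..." \
  -e AWS_SECRET_ACCESS_KEY="..." \
  -e COGO_STORAGE="path:///data?locking=dynamo&table=carbon-engine-locks" \
  cogo/carbon-engine
```

Note that the path inside COGO_STORAGE refers to the path inside the container (/data here), not the host path, and retains the three consecutive slashes of the path:// scheme followed by an absolute path.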