SQS Demo

[Interactive demo: a task-entry form for Mary, a panel with the SQS queue statistics (number of visible messages, number of invisible messages, dead letter queue messages), and task panels with "Get a task" / "Finish this task" buttons for Sally, Jack and Ashley.]
This is a demo of an SQS queue that is used to manage household "chores". The mother of the family, Mary, can add tasks to the queue; example tasks would be "Walk the dog" or "Load the dishwasher". Each of the three children, Sally, Jack and Ashley, can click their respective "Get a task" button to pull a task from the queue. When they are done, they click "Finish this task" so that SQS knows the task is done and the message can be deleted from the queue.
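The send / receive / delete flow described above can be sketched with a minimal in-memory stand-in for the queue. This is not the demo's actual code (the real page calls SQS through the AWS SDK); the class and method names are illustrative:

```javascript
// Minimal in-memory sketch of the demo's flow. A "receipt handle" is
// returned on receive and must be presented again to delete the message,
// just like in SQS.
class ChoreQueue {
  constructor() {
    this.messages = [];        // visible tasks
    this.inFlight = new Map(); // receiptHandle -> task being worked on
    this.nextHandle = 1;
  }
  // Mary: "Add a task"
  send(task) {
    this.messages.push(task);
  }
  // A child: "Get a task" -- hides the task and returns a receipt handle
  receive() {
    const task = this.messages.shift();
    if (task === undefined) return null;
    const handle = `rh-${this.nextHandle++}`;
    this.inFlight.set(handle, task);
    return { task, handle };
  }
  // A child: "Finish this task" -- deletes the message for good
  finish(handle) {
    return this.inFlight.delete(handle);
  }
}

const q = new ChoreQueue();
q.send("Walk the dog");
q.send("Load the dishwasher");
const got = q.receive();
console.log(got.task);             // "Walk the dog"
console.log(q.finish(got.handle)); // true
```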
Messages in an SQS queue are not stored on a single backend system. Instead, Amazon uses a cluster of systems spread over multiple availability zones to store and process messages. For this reason the number of messages in the queue is not always accurate: Amazon samples only a few of the systems in the cluster and uses this to create a weighted estimate that is usually very close to the actual number, but not entirely exact. You can see this during the demo: sometimes the number of messages jumps around a bit, for no apparent reason. (Note that the polling interval for the queue statistics is set to two seconds.)
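A toy version of such sampling shows why the reported count can jump between polls. This is purely illustrative and not Amazon's actual algorithm; the idea is just "sample a few hosts, scale up":

```javascript
// Illustrative sketch: estimate the queue depth by sampling a few
// storage hosts and scaling up. Repeated polls pick different hosts,
// so the estimate varies around the true count.
function approximateCount(hosts, sampleSize) {
  // pick `sampleSize` hosts at random (crude shuffle, fine for a demo)
  const picked = [...hosts].sort(() => Math.random() - 0.5).slice(0, sampleSize);
  const sampled = picked.reduce((sum, h) => sum + h.length, 0);
  // scale the sample up to an estimate for the whole cluster
  return Math.round(sampled * (hosts.length / sampleSize));
}

// 12 messages spread unevenly over 4 hosts
const hosts = [["a", "b"], ["c", "d", "e", "f"], ["g"], ["h", "i", "j", "k", "l"]];
for (let i = 0; i < 3; i++) {
  console.log(approximateCount(hosts, 2)); // varies from poll to poll
}
```

Sampling all hosts would give the exact count of 12, but at cluster scale that would be far too expensive, which is why SQS settles for an estimate.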
By its very nature, a (standard) SQS queue does not guarantee that messages are delivered in the order in which they were sent. You can easily see this when you enter numbered messages into the queue: they will not be received in the same order.
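Because messages live on different hosts, the receive order depends on which host happens to answer first. A random pick is a reasonable stand-in for that behaviour (again, an illustrative sketch, not SQS internals):

```javascript
// Sketch: numbered messages come back in whatever order the storage
// hosts return them. A random draw stands in for SQS's distributed
// storage; the *set* of messages is preserved, the order is not.
function shuffledReceive(sent) {
  const pool = [...sent];
  const received = [];
  while (pool.length > 0) {
    const i = Math.floor(Math.random() * pool.length);
    received.push(pool.splice(i, 1)[0]);
  }
  return received;
}

const sent = [1, 2, 3, 4, 5];
const received = shuffledReceive(sent);
console.log(received); // the same five messages, usually not in order 1..5
```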
Once a message has been "received" by a worker, the visibility timeout starts. In this demo, the visibility timeout is set to 30 seconds. During this period the message is not visible to other workers. This is done so that messages will, in principle, only be processed once. However, if the worker crashes on the message, or takes too long to process it, the message will become visible again after the timeout expires, so that another worker can try to process it.
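The visibility-timeout mechanics boil down to a timestamp on each message. A sketch with a fake clock (seconds as plain numbers; real SQS does all of this server-side):

```javascript
// Sketch of visibility-timeout mechanics with a fake clock.
const VISIBILITY_TIMEOUT = 30; // seconds, as in the demo

function makeQueue() {
  return { items: [{ body: "Walk the dog", invisibleUntil: 0 }] };
}

// receive(): hand out the first currently-visible message and hide it
// for VISIBILITY_TIMEOUT seconds
function receive(queue, now) {
  const msg = queue.items.find(m => m.invisibleUntil <= now);
  if (!msg) return null;
  msg.invisibleUntil = now + VISIBILITY_TIMEOUT;
  return msg;
}

const q = makeQueue();
console.log(receive(q, 0)?.body);  // "Walk the dog" -- worker #1 gets it
console.log(receive(q, 10));       // null -- still invisible at t=10s
console.log(receive(q, 31)?.body); // "Walk the dog" -- visible again after 30s
```

Note that the message is never removed here: only an explicit delete (the "Finish this task" button) takes it out of the queue for good.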
Obviously that other worker most likely uses the same code to process the message. So if the message caused worker #1 to crash, then worker #2 will most likely crash on it as well. To prevent a faulty message from tying up all resources, this queue uses a "redrive policy": if a message has been delivered to a worker more than a certain number of times and still has not been deleted, it is moved to a "dead letter queue" (DLQ).
The redrive policy for this queue has a maxReceiveCount of two. This means that after two unsuccessful receives, the message will be sent to the DLQ. How this works is a little counterintuitive, though: the message is NOT sent to the DLQ when the second visibility timeout expires. Rather, it is sent to the DLQ when the third receive is attempted on the message.
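That "moved on the attempted third receive" behaviour can be made concrete in a few lines. A sketch under the same assumption of a maxReceiveCount of two (names are illustrative):

```javascript
// Sketch of the redrive policy: the move to the DLQ happens on the
// attempted receive that would exceed maxReceiveCount, not when the
// previous visibility timeout expires.
const MAX_RECEIVE_COUNT = 2;

function receiveWithRedrive(queue, dlq) {
  const msg = queue.shift();
  if (!msg) return null;
  msg.receiveCount += 1;
  if (msg.receiveCount > MAX_RECEIVE_COUNT) {
    dlq.push(msg); // third attempt: straight to the DLQ
    return null;   // the worker never sees the message
  }
  queue.push(msg); // simulate a crashed worker: message becomes visible again
  return msg;
}

const queue = [{ body: "Poison task", receiveCount: 0 }];
const dlq = [];
receiveWithRedrive(queue, dlq); // receive #1, worker "crashes"
receiveWithRedrive(queue, dlq); // receive #2, worker "crashes" again
receiveWithRedrive(queue, dlq); // attempted receive #3 -> moved to DLQ
console.log(queue.length, dlq.length); // 0 1
```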
Under normal circumstances, messages that end up in the DLQ need to be looked at by a human operator, for instance to determine the reason that the message could not be processed. For this particular demo, I have set the message retention period of the DLQ to one day, so messages are automatically deleted after that.
This demo is written in JavaScript. Look at the source of this page, at the end of the document, to view the code. It is very straightforward.
All JavaScript routines use the same backend IAM user, SQSDemo. This user has the following policy attached:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "sqs:sendMessage",
                    "sqs:getQueueAttributes",
                    "sqs:receiveMessage",
                    "sqs:deleteMessage"
                ],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:sqs:eu-central-1:973674585612:SQSDemo",
                    "arn:aws:sqs:eu-central-1:973674585612:SQSDemoDLQ"
                ]
            }
        ]
    }
In a production environment, you may want to make a distinction between three roles (Observer, Submitter and Receiver) and set up a separate user account for each. It is also considered bad practice to store the user account credentials (access key, secret key) in the code, as I have done here. In this particular case the policy attached to this user is very limited, so I'll take my chances. In a production environment, however, you would use something like Amazon Cognito to provide credentials to the user; but that would make the demo a lot more complicated and would distract from its purpose.