In this article by Tanmay Deshpande, the author of the book DynamoDB Cookbook, we will cover the following topics:
We are going to talk about DynamoDB implementation best practices, which will help you improve performance while reducing operational cost. So let's get started.
In this recipe, we will see how to use a standalone cache for frequently accessed items. A cache is a temporary data store that keeps items in memory and serves them from memory instead of making a DynamoDB call. Note that this should be used only for items that you do not expect to change frequently.
We will perform this recipe using Java libraries, so the prerequisite is that you have completed the recipes that use the AWS SDK for Java.
Here, we will be using the AWS SDK for Java, so create a Maven project with the SDK dependency. Apart from the SDK, we will also use one of the most widely used open source caches, EhCache. To learn more about EhCache, refer to http://ehcache.org/.
Let’s use a standalone cache for frequently accessed items:
<repositories>
  <repository>
    <id>sourceforge</id>
    <name>sourceforge</name>
    <url>https://oss.sonatype.org/content/repositories/sourceforge-releases/</url>
  </repository>
</repositories>

<dependency>
  <groupId>net.sf.ehcache</groupId>
  <artifactId>ehcache</artifactId>
  <version>2.9.0</version>
</dependency>
public class ProductCacheManager {
    // EhCache cache manager
    CacheManager cacheManager = CacheManager.getInstance();
    private Cache productCache;

    // Create an instance of cache using the cache manager
    public ProductCacheManager() {
        cacheManager.addCache("productCache");
        this.productCache = cacheManager.getCache("productCache");
    }

    public Cache getProductCache() {
        return productCache;
    }

    public void shutdown() {
        cacheManager.shutdown();
    }
}
static ProductCacheManager cacheManager = new ProductCacheManager();

private static Item getItem(int id, String type) {
    Item product = null;
    if (cacheManager.getProductCache().isKeyInCache(id + ":" + type)) {
        // Cache hit: return the item from memory
        Element prod = cacheManager.getProductCache().get(id + ":" + type);
        product = (Item) prod.getObjectValue();
        System.out.println("Returning from Cache");
    } else {
        // Cache miss: fetch the item from DynamoDB and cache it
        AmazonDynamoDBClient client = new AmazonDynamoDBClient(
                new ProfileCredentialsProvider());
        client.setRegion(Region.getRegion(Regions.US_EAST_1));
        DynamoDB dynamoDB = new DynamoDB(client);
        Table table = dynamoDB.getTable("product");
        product = table.getItem(new PrimaryKey("id", id, "type", type));
        cacheManager.getProductCache().put(
                new Element(id + ":" + type, product));
        System.out.println("Making DynamoDB Call for getting the item");
    }
    return product;
}
Item product = getItem(10, "book");
System.out.println("First call :Item: " + product);
Item product1 = getItem(10, "book");
System.out.println("Second call :Item: " + product1);
cacheManager.shutdown();
EhCache is one of the most popular standalone caches used in the industry. Here, we use it to store frequently accessed items from the product table. A cache keeps all of its data in memory, and each item is saved against its key. Since the product table has a composite primary key, we store each item against a key built from both the hash key and the range key (id:type).
Note that caching should be used only for tables that expect few updates, that is, tables that hold mostly static data. If you use a cache for tables that are not so static, you will get stale data. You can also go to the next level and implement a time-based cache, which holds data for a certain time and then evicts it. You can additionally implement eviction algorithms, such as Least Recently Used (LRU) or First In First Out (FIFO), to make the cache more efficient.
This way, we make comparatively fewer calls to DynamoDB and, ultimately, save some cost for ourselves.
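The LRU eviction idea mentioned above can be sketched with plain JDK classes. The following is a minimal, illustrative LRU cache (the class name and capacity are hypothetical, not part of the recipe's code): `LinkedHashMap` with access order enabled keeps the most recently used entries last, so evicting the eldest entry gives LRU behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache sketch using the JDK's LinkedHashMap.
// Keys could be the "id:type" strings used in this recipe.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public SimpleLruCache(int maxEntries) {
        // accessOrder = true makes iteration order reflect recency of access
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once capacity is exceeded
        return size() > maxEntries;
    }
}
```

With a capacity of two, putting a third key evicts whichever of the first two was touched least recently.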
In this recipe, we will do the same thing that we did in the previous recipe. The only change is that we will use a cloud-hosted, distributed caching solution instead of a local standalone cache.
ElastiCache is a hosted caching solution provided by Amazon Web Services. You have two caching technologies to choose from: Memcached and Redis. Depending on your requirements, you can decide which one to use. Here are links that will help you with more information on the two options:
To get started with this recipe, we will need to have an ElastiCache cluster launched. If you are not aware of how to do it, you can refer to http://aws.amazon.com/elasticache/.
Here, I am using a Memcached cluster. You can choose the instance size as you wish. We will need a Memcached client to access the cluster. Amazon has provided a compiled version of the Memcached client, which can be downloaded from https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java.
Once the JAR download is complete, you can add it to your Java Project class path:
static String configEndpoint = "my-elastic-cache.mlvymb.cfg.usw2.cache.amazonaws.com";
static Integer clusterPort = 11211;
static MemcachedClient client;

static {
    try {
        client = new MemcachedClient(
                new InetSocketAddress(configEndpoint, clusterPort));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
private static Item getItem(int id, String type) {
    // Try the distributed cache first; fetch once and reuse the result
    Item product = (Item) client.get(id + ":" + type);
    if (product != null) {
        System.out.println("Returning from Cache");
        return product;
    }
    // Cache miss: fetch from DynamoDB. Use a different variable name so
    // it does not shadow the Memcached client
    AmazonDynamoDBClient dynamoDBClient = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
    dynamoDBClient.setRegion(Region.getRegion(Regions.US_EAST_1));
    DynamoDB dynamoDB = new DynamoDB(dynamoDBClient);
    Table table = dynamoDB.getTable("product");
    product = table.getItem(new PrimaryKey("id", id, "type", type));
    System.out.println("Making DynamoDB Call for getting the item");
    // Cache the item for one hour (3600 seconds)
    client.add(id + ":" + type, 3600, product);
    return product;
}
A distributed cache works in the same fashion as a local one: it keeps data in memory and returns a value if it finds the key. The difference is that a distributed cache has multiple nodes, and the keys are spread across them. Keys are divided among the nodes based on their hash values, so when a request comes in, it is routed to the node that owns the key, and the value is returned from there.
Note that ElastiCache gives you faster retrieval of items at the additional cost of the ElastiCache cluster. Also note that the preceding code will work only if you execute the application from an EC2 instance; if you try to execute it on a local machine, you will get connection errors.
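The key-to-node routing described above can be illustrated with a small sketch. This is not how the Memcached client actually does it (real clients use more robust schemes such as consistent hashing); the modulo mapping below, with hypothetical names, only demonstrates the idea that a key's hash deterministically selects a node.

```java
// Illustrative sketch: map a cache key to one of N nodes by its hash.
class KeyToNodeMapper {
    public static int nodeFor(String key, int numNodes) {
        // Math.floorMod keeps the result non-negative even when
        // String.hashCode() returns a negative value
        return Math.floorMod(key.hashCode(), numNodes);
    }
}
```

The same key always lands on the same node, which is what lets every client in the fleet find a cached item without coordination.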
We are all aware of DynamoDB's limitation on item size. Suppose we get into a situation where storing large attributes in an item is a must. In that case, it's always a good choice to compress these attributes before saving them in DynamoDB. In this recipe, we are going to see how to compress large items before storing them.
To get started with this recipe, you should have your workstation ready with Eclipse or any other IDE of your choice.
There are numerous algorithms with which we can compress large items, for example, GZIP, LZO, BZ2, and so on. Each algorithm trades compression time against compression ratio, so it's your choice whether to go with a faster algorithm or one that provides a higher compression ratio.
Consider a scenario in our e-commerce website where we need to save the product reviews written by various users. For this, we created a ProductReviews table, where we save the reviewer's name, their detailed product review, and the time when the review was submitted. Here, the product review messages can be large, and it would not be a good idea to store them as they are. So, it is important to understand how to compress these messages before storing them.
Let’s see how to compress large data:
private static ByteBuffer compressString(String input)
        throws UnsupportedEncodingException, IOException {
    // Write the input through a GZIP output stream using UTF-8 encoding
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPOutputStream os = new GZIPOutputStream(baos);
    os.write(input.getBytes("UTF-8"));
    os.finish();
    byte[] compressedBytes = baos.toByteArray();

    // Wrap the compressed bytes in a byte buffer
    ByteBuffer buffer = ByteBuffer.allocate(compressedBytes.length);
    buffer.put(compressedBytes, 0, compressedBytes.length);
    buffer.position(0);
    return buffer;
}
private static void putReviewItem() throws
        UnsupportedEncodingException, IOException {
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
    client.setRegion(Region.getRegion(Regions.US_EAST_1));
    DynamoDB dynamoDB = new DynamoDB(client);
    Table table = dynamoDB.getTable("ProductReviews");

    Item product = new Item()
            .withPrimaryKey(new PrimaryKey("id", 10))
            .withString("reviewerName", "John White")
            .withString("dateTime", "20-06-2015T08:09:30")
            .withBinary("reviewMessage",
                    compressString("My Review Message"));

    PutItemOutcome outcome = table.putItem(product);
    System.out.println(outcome.getPutItemResult());
}
private static String uncompressString(ByteBuffer input) throws IOException {
    byte[] bytes = input.array();
    ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    GZIPInputStream is = new GZIPInputStream(bais);

    // Read the decompressed data in chunks
    int chunkSize = 1024;
    byte[] buffer = new byte[chunkSize];
    int length = 0;
    while ((length = is.read(buffer, 0, chunkSize)) != -1) {
        baos.write(buffer, 0, length);
    }
    return new String(baos.toByteArray(), "UTF-8");
}
Compressing data at the client side has numerous advantages. A smaller size means less use of network and disk resources. Compression algorithms generally maintain a dictionary of words. While compressing, if they see a word getting repeated, that word is replaced by its position in the dictionary. In this way, the redundant data is eliminated and only references to it are kept in the compressed string. While uncompressing the same data, the word references are replaced with the actual words, and we get our normal string back.
Various compression algorithms contain various compression techniques. Therefore, the compression algorithm you choose will depend on your need.
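The compress/uncompress pair above can be exercised as a round trip with only JDK classes. The sketch below mirrors the recipe's `compressString`/`uncompressString` logic; the class and method names are illustrative, and checked `IOException`s are wrapped so the helpers can be called without a throws clause.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Self-contained GZIP round trip, mirroring the recipe's helper pair.
class GzipRoundTrip {
    public static ByteBuffer compress(String input) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(baos);
            gzip.write(input.getBytes("UTF-8"));
            gzip.close(); // close() also finishes the GZIP stream
            return ByteBuffer.wrap(baos.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static String uncompress(ByteBuffer input) {
        try {
            GZIPInputStream gzip = new GZIPInputStream(
                    new ByteArrayInputStream(input.array()));
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int length;
            while ((length = gzip.read(buffer)) != -1) {
                baos.write(buffer, 0, length);
            }
            return new String(baos.toByteArray(), "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For repetitive text such as review messages, the compressed buffer is noticeably smaller than the original UTF-8 bytes, which is exactly what saves item size in DynamoDB.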
Sometimes, storing data in a compressed format might not be sufficient. Consider a case where we need to store large images or binaries that exceed DynamoDB's per-item storage limit. In this case, we can use AWS S3 to store such objects and save only the S3 location in our DynamoDB table.
AWS S3 (Simple Storage Service) allows us to store data in a cheap and efficient manner. To know more about AWS S3, you can visit http://aws.amazon.com/s3/.
To get started with this recipe, you should have your workstation ready with the Eclipse IDE.
Consider a case in our e-commerce website where we would like to store the product images along with the product data. So, we will save the images on AWS S3, and only store their locations along with the product information in the product table:
private static void uploadFileToS3() {
    String bucketName = "e-commerce-product-images";
    String keyName = "phone/apple/iphone6/iphone.jpg";
    // Backslashes in Windows paths must be escaped in Java string literals
    String uploadFileName = "C:\\tmp\\iphone.jpg";

    // Create an instance of the S3 client
    AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());

    // Start the file upload
    File file = new File(uploadFileName);
    s3client.putObject(new PutObjectRequest(bucketName, keyName, file));
}
private static void putItemWithS3Link() {
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
    client.setRegion(Region.getRegion(Regions.US_EAST_1));
    DynamoDB dynamoDB = new DynamoDB(client);
    Table table = dynamoDB.getTable("productTable");

    Map<String, String> features = new HashMap<String, String>();
    features.put("camera", "13MP");
    features.put("intMem", "16GB");
    features.put("processor", "Dual-Core 1.4 GHz Cyclone (ARM v8-based)");

    // Store only the S3 URL of the image, not the image itself
    Set<String> imagesSet = new HashSet<String>();
    imagesSet.add("https://s3-us-west-2.amazonaws.com/e-commerce-product-images/phone/apple/iphone6/iphone.jpg");

    Item product = new Item()
            .withPrimaryKey(new PrimaryKey("id", 250, "type", "phone"))
            .withString("mnfr", "Apple").withNumber("stock", 15)
            .withString("name", "iPhone 6").withNumber("price", 45)
            .withMap("features", features)
            .withStringSet("productImages", imagesSet);

    PutItemOutcome outcome = table.putItem(product);
    System.out.println(outcome.getPutItemResult());
}
AWS S3 provides storage services at very cheap rates. It's like a flat data dumping ground where we can store any type of file. So, it's always a good option to store large datasets in S3 and keep only their URL references in DynamoDB attributes. The URL reference is the connecting link between the DynamoDB item and the S3 file.
If your file is too large to be sent in one S3 client call, you may want to explore S3's multipart upload API, which allows you to send the file in chunks.
Till now, we discussed how to perform various operations in DynamoDB. We saw how to use the AWS-provided SDK and play around with DynamoDB items and attributes. Amazon claims that AWS provides high availability and reliability, which is quite true considering my years of experience using their services, but we still cannot deny the possibility that services such as DynamoDB might not perform as expected. So, it's important to have a proper error-catching mechanism in place to ensure that the disaster recovery system works. In this recipe, we are going to see how to catch such errors.
To get started with this recipe, you should have your workstation ready with the Eclipse IDE.
Catching errors in DynamoDB is quite easy. Whenever we perform any operations, we need to put them in the try block. Along with it, we need to put a couple of catch blocks in order to catch the errors.
Here, we will consider a simple operation to put an item into the DynamoDB table:
try {
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
    client.setRegion(Region.getRegion(Regions.US_EAST_1));
    DynamoDB dynamoDB = new DynamoDB(client);
    Table table = dynamoDB.getTable("productTable");

    Item product = new Item()
            .withPrimaryKey(new PrimaryKey("id", 10, "type", "mobile"))
            .withString("mnfr", "Samsung").withNumber("stock", 15)
            .withBoolean("isProductionStopped", true)
            .withNumber("price", 45);

    PutItemOutcome outcome = table.putItem(product);
    System.out.println(outcome.getPutItemResult());
} catch (AmazonServiceException ase) {
    // The service received the request but could not process it
    System.out.println("Error Message: " + ase.getMessage());
    System.out.println("HTTP Status Code: " + ase.getStatusCode());
    System.out.println("AWS Error Code: " + ase.getErrorCode());
    System.out.println("Error Type: " + ase.getErrorType());
    System.out.println("Request ID: " + ase.getRequestId());
} catch (AmazonClientException e) {
    // The request never reached the service, or the response was unreadable
    System.out.println("Amazon Client Exception :" + e.getMessage());
}
We should first catch AmazonServiceException, which is thrown when the service you are trying to access returns an error. AmazonClientException should be put last in order to catch any client-side exceptions.
Amazon assigns a unique request ID to each and every request it receives. Keeping this request ID is very important: if something goes wrong and you would like to know what happened, this request ID is the only source of information, and you will need to contact Amazon with it to find out more.
There are two types of errors in AWS: client errors (HTTP 4xx status codes), which indicate a problem with the request itself, and server errors (HTTP 5xx status codes), which indicate a problem on the service side.
You can read more about DynamoDB specific errors at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ErrorHandling.html.
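The distinction between client and server errors can be sketched with the status code and error code that AmazonServiceException exposes. The classification below is a simplified illustration (the class is hypothetical); the two error-code strings are real DynamoDB/AWS codes, and the point is that server errors and throttling are the cases worth retrying.

```java
// Illustrative sketch: classify AWS errors by HTTP status code.
class AwsErrorClassifier {
    public static boolean isClientError(int statusCode) {
        // 4xx: the request itself was bad (validation, auth, throttling)
        return statusCode >= 400 && statusCode < 500;
    }

    public static boolean isServerError(int statusCode) {
        // 5xx: something went wrong on the service side
        return statusCode >= 500 && statusCode < 600;
    }

    public static boolean isRetryable(int statusCode, String errorCode) {
        // Server errors are usually transient; throttling arrives as an
        // HTTP 400 but should still be retried with back-off
        return isServerError(statusCode)
                || "ProvisionedThroughputExceededException".equals(errorCode)
                || "ThrottlingException".equals(errorCode);
    }
}
```

In the catch block shown earlier, `ase.getStatusCode()` and `ase.getErrorCode()` supply exactly these two inputs.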
As mentioned in the previous recipe, we can perform auto-retries on DynamoDB requests if we get errors. In this recipe, we are going to see how to perform auto-retries.
To get started with this recipe, you should have your workstation ready with the Eclipse IDE.
Auto-retries are required if we get errors on the first request. We can use the Amazon client configuration to set our retry strategy. By default, the DynamoDB client retries a failed request up to three times. If we think that this is not suitable for us, we can define our own strategy, as follows:
public class CustomRetryCondition implements RetryCondition {
    public boolean shouldRetry(AmazonWebServiceRequest originalRequest,
            AmazonClientException exception, int retriesAttempted) {
        // Retry at most three times, and only for retryable errors
        return retriesAttempted < 3 && exception.isRetryable();
    }
}
public class CustomBackoffStrategy implements BackoffStrategy {
    /** Base sleep time (milliseconds) */
    private static final int SCALE_FACTOR = 25;

    /** Maximum exponential back-off time before retrying a request */
    private static final int MAX_BACKOFF_IN_MILLISECONDS = 20 * 1000;

    public long delayBeforeNextRetry(AmazonWebServiceRequest originalRequest,
            AmazonClientException exception, int retriesAttempted) {
        if (retriesAttempted < 0)
            return 0;
        // Double the delay on every retry, up to the maximum
        long delay = (1 << retriesAttempted) * SCALE_FACTOR;
        delay = Math.min(delay, MAX_BACKOFF_IN_MILLISECONDS);
        return delay;
    }
}
RetryCondition customRetryCondition = new CustomRetryCondition();
BackoffStrategy customBackoffStrategy = new CustomBackoffStrategy();
RetryPolicy retryPolicy = new RetryPolicy(customRetryCondition,
        customBackoffStrategy, 3, false);
ClientConfiguration clientConfiguration = new ClientConfiguration();
clientConfiguration.setRetryPolicy(retryPolicy);
AmazonDynamoDBClient client = new AmazonDynamoDBClient(
        new ProfileCredentialsProvider(), clientConfiguration);
Auto-retries are quite handy when we receive a sudden burst of DynamoDB requests. If there are more requests than the provisioned throughput allows, auto-retries with an exponential back-off strategy will definitely help handle the load. If the client gets an exception, the request is automatically retried after some time; if the load has dropped by then, the application loses nothing.
The Amazon DynamoDB client internally uses HttpClient, a popular and reliable implementation, to make its calls. So if you need to handle such transient failures, this kind of retry implementation is a must.
In the case of batch operations, if any failure occurs, DynamoDB does not fail the complete operation. For batch write operations, if a particular operation fails, DynamoDB returns the unprocessed items, which can be retried.
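To see what the exponential back-off in CustomBackoffStrategy actually produces, the delay computation can be isolated into a small standalone sketch with the same constants as the recipe (25 ms base, 20 s cap); the class name here is illustrative.

```java
// Standalone version of the recipe's back-off computation.
class BackoffDelays {
    private static final int SCALE_FACTOR = 25;                 // base, ms
    private static final int MAX_BACKOFF_IN_MILLISECONDS = 20 * 1000;

    public static long delayFor(int retriesAttempted) {
        if (retriesAttempted < 0) {
            return 0;
        }
        // 25, 50, 100, 200, ... doubling each retry, capped at 20 s
        long delay = (1L << retriesAttempted) * SCALE_FACTOR;
        return Math.min(delay, MAX_BACKOFF_IN_MILLISECONDS);
    }
}
```

So the first few retries wait 25 ms, 50 ms, 100 ms, and so on, and by around the tenth retry the cap of 20 seconds is reached, keeping the client from backing off indefinitely.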
I hope we are all aware that operations in DynamoDB are eventually consistent. Given this nature, it obviously does not support transactions the way we do in an RDBMS. A transaction is a group of operations that need to be performed in one go, and they should be handled atomically: if one operation fails, the complete transaction should be rolled back.
There might be use cases where you need to perform transactions in your application. Considering this need, AWS has provided an open source, client-side transaction library, which helps us achieve atomic transactions in DynamoDB. In this recipe, we are going to see how to perform transactions on DynamoDB.
To get started with this recipe, you should have your workstation ready with the Eclipse IDE.
To get started, we will first need to download the source code of the library from GitHub and build the code to generate the JAR file. You can download the code from https://github.com/awslabs/dynamodb-transactions/archive/master.zip.
Next, extract the code and run the following command to generate the JAR file:
mvn clean install -DskipTests
On a successful build, you will see a JAR generated file in the target folder. Add this JAR to the project by choosing a configure build path in Eclipse:
AmazonDynamoDBClient client = new AmazonDynamoDBClient(
        new ProfileCredentialsProvider());
client.setRegion(Region.getRegion(Regions.US_EAST_1));

// Create the transaction table
TransactionManager.verifyOrCreateTransactionTable(client,
        "Transactions", 10, 10, (long) (10 * 60));

// Create the transaction images table
TransactionManager.verifyOrCreateTransactionImagesTable(client,
        "TransactionImages", 10, 10, (long) (60 * 10));

TransactionManager txManager = new TransactionManager(client,
        "Transactions", "TransactionImages");
Transaction t1 = txManager.newTransaction();

Map<String, AttributeValue> product = new HashMap<String, AttributeValue>();
AttributeValue id = new AttributeValue();
id.setN("250");
product.put("id", id);
product.put("type", new AttributeValue("phone"));
product.put("name", new AttributeValue("MI4"));
t1.putItem(new PutItemRequest("productTable", product));

Map<String, AttributeValue> product1 = new HashMap<String, AttributeValue>();
// Use a fresh AttributeValue here; mutating and reusing the previous one
// would also change the id of the first item
AttributeValue id1 = new AttributeValue();
id1.setN("350");
product1.put("id", id1);
product1.put("type", new AttributeValue("phone"));
product1.put("name", new AttributeValue("MI3"));
t1.putItem(new PutItemRequest("productTable", product1));

t1.commit();
The transaction library when invoked, first writes the changes to the Transaction table, and then to the actual table. If we perform any update item operation, then it keeps the old values of that item in the TransactionImages table. It also supports multi-attribute and multi-table transactions. This way, we can use the transaction library and perform atomic writes. It also supports isolated reads. You can refer to the code and examples for more details at https://github.com/awslabs/dynamodb-transactions.
Till now, we have used a synchronous DynamoDB client to make requests. Synchronous requests block the calling thread until the operation completes, and due to network issues, an operation can sometimes take a while to finish. In that case, we can use asynchronous client requests, so that we submit a request and carry on with other work.
To get started with this recipe, you should have your workstation ready with the Eclipse IDE.
The asynchronous client is easy to use:
AmazonDynamoDBAsync dynamoDBAsync = new AmazonDynamoDBAsyncClient(
        new ProfileCredentialsProvider());

Map<String, AttributeValue> key = new HashMap<String, AttributeValue>();
AttributeValue id = new AttributeValue();
id.setN("10");
key.put("id", id);
key.put("type", new AttributeValue("phone"));

DeleteItemRequest deleteItemRequest = new DeleteItemRequest(
        "productTable", key);

// Submit the request; the handler runs when the call completes
dynamoDBAsync.deleteItemAsync(deleteItemRequest,
        new AsyncHandler<DeleteItemRequest, DeleteItemResult>() {
            public void onSuccess(DeleteItemRequest request,
                    DeleteItemResult result) {
                System.out.println("Item deleted successfully: "
                        + System.currentTimeMillis());
            }

            public void onError(Exception exception) {
                System.out.println("Error deleting item in async way");
            }
        });

// The main thread continues without waiting for the delete
System.out.println("Delete item initiated " + System.currentTimeMillis());
Asynchronous clients use AsyncHttpClient to invoke the DynamoDB APIs. This is a wrapper implementation on top of Java's asynchronous APIs, and hence it is quite easy to use and understand. The AsyncHandler is an optional configuration you can supply in order to use the results of the asynchronous calls. We can also use the Java Future object to handle the response.
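The submit-then-continue pattern above can be shown with only JDK classes. The sketch below uses `CompletableFuture` as a stand-in for the SDK's async call: the method name and the pretend "delete" are hypothetical, but the shape (submit the work, attach a callback or join on the result, keep the caller unblocked) is the same as with `deleteItemAsync`.

```java
import java.util.concurrent.CompletableFuture;

// Illustrative stand-in for an asynchronous DynamoDB call.
class AsyncPatternDemo {
    public static CompletableFuture<String> deleteItemAsync(String key) {
        return CompletableFuture.supplyAsync(() -> {
            // Pretend this is the network call to DynamoDB
            return "Deleted " + key;
        });
    }
}
```

A caller can attach a continuation with `thenAccept(...)` (the analogue of AsyncHandler) or block with `join()` (the analogue of `Future.get()`), while the submitting thread stays free in between.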
We have covered various recipes on the cost- and performance-efficient use of DynamoDB. Recipes such as error handling and auto-retries help readers make their applications robust. This article also highlighted the use of the transaction library in order to implement atomic transactions on DynamoDB.