Thursday, October 18, 2012

My CloudWatch Monitor for AWS

Amazon EC2 offers the CloudWatch service to monitor cloud instances as well as load balancers. While this service comes at some cost ($0.015 per instance per hour), it offers useful metrics about the performance of your EC2 infrastructure. There are commercial and free tools that provide this service, but you might not want to invest in them or add another tool to your monitoring infrastructure. This post provides step-by-step guidance on how to extend your own monitoring solution to retrieve cloud metrics. The code sample is based on the free and open-source dynaTrace plugin for agent-less cloud monitoring; some parts, however, have been simplified or omitted in this tutorial. The major omissions are dynamic discovery of EC2 instances and a more reliable and accurate algorithm for retrieving monitoring data.

Step 1 – Basic Infrastructure

So let’s get started by setting up our basic infrastructure. First we need to download the Java Library for Amazon CloudWatch. Alternatively, you can create your own web service stubs or simply use the REST interface. For simplicity we rely on the ready-to-use library provided by Amazon. Then we create a Java class for our cloud monitor which implements the basic functionality we need. For brevity I will omit the imports; in Eclipse, CTRL-SHIFT-O will do the job :-)
public class CloudWatchMonitor {
 
 private static class MeasureSet implements Comparable<MeasureSet> {
   public Calendar timestamp;
   public HashMap<String, Double> measures = new HashMap<String, Double>();
 
   @Override
   public int compareTo(MeasureSet compare) {
     // Calendar is itself Comparable, so no lossy cast of a millisecond difference is needed
     return timestamp.compareTo(compare.timestamp);
   }
 
   public void setMeasure(String measureName, double value) {
     measures.put(measureName, value);
   }
 
   public Set<String> getMeasureNames() {
     return measures.keySet();
   }
 
   public double getMeasure(String measureName) {
     return measures.get(measureName);
   }
 }
 
 private String instanceId;
 
 private AmazonCloudWatchClient cloudWatchClient;
 
 public static void main(String... args) throws Exception {
   // Replace the placeholder with the ID of the EC2 instance to monitor
   CloudWatchMonitor monitor = new CloudWatchMonitor("<instanceId>", Credentials.accessKeyId, Credentials.secretAccessKey);
   for (;;) {
     // measureNames holds the metric names chosen in Step 4
     MeasureSet measureSet = monitor.retrieveMeasureSet(measureNames);
     if (measureSet != null) {
       printMeasureSet(measureSet);
     }
     Thread.sleep(60000); // poll once per minute
   }
 }
 
 public CloudWatchMonitor(String instanceId, String accessKeyId, String secretAccessKey) {
   cloudWatchClient = new AmazonCloudWatchClient(accessKeyId, secretAccessKey);
   this.instanceId = instanceId;
 }
}
So what have we done? We defined the CloudWatchMonitor class, which will contain all our logic. The main method simply queries for new measures every minute and prints them. We have chosen an interval of one minute because CloudWatch provides data at one-minute granularity. Additionally, we defined the inner MeasureSet class, which represents a set of measures collected for a given timestamp. We have used a HashMap to make the implementation more generic. The same is true for the retrieveMeasureSet method, which takes the names of the measures it retrieves as an input. Finally we defined the constructor of our monitor to create an instance of AmazonCloudWatchClient (supplied by the Amazon library we use) and store the instanceId of the EC2 instance to monitor. The accessKeyId and secretAccessKey are the credentials provided for your Amazon EC2 account.

Step 2 – Retrieve Monitoring Information

Now we have to implement the retrieveMeasureSet method, which is the core of our implementation. As there are quite a number of things to do, I will split the implementation of this method into several parts. We start by creating a GetMetricStatisticsRequest object which describes the data we are going to request. First we set the namespace of the metrics, which in our case is AWS/EC2 (for load balancer metrics it would be AWS/ELB). Then we define the period of the monitoring data. In our case this is one minute (60 seconds); if you want aggregated data you can specify any multiple of 60. Next we define which statistics we want to retrieve. CloudWatch offers, among others, average, minimum and maximum values. As our aggregation will only contain one data point, all of them will be the same, so we only retrieve the average. Finally we restrict the request to our instance by adding an InstanceId dimension.
public MeasureSet retrieveMeasureSet(ArrayList<String> measureNames) throws AmazonCloudWatchException, ParseException {
 
  GetMetricStatisticsRequest getMetricRequest = new GetMetricStatisticsRequest();
  getMetricRequest.setNamespace("AWS/EC2");
  getMetricRequest.setPeriod(60);
  ArrayList<String> stats = new ArrayList<String>();
  stats.add("Average");
  getMetricRequest.setStatistics(stats);
  ArrayList<Dimension> dimensions = new ArrayList<Dimension>();
  dimensions.add(new Dimension("InstanceId", instanceId));
  getMetricRequest.setDimensions(dimensions);
Next we have to define the time frame for which we want to retrieve monitoring data. This code looks a bit complex simply because we have to do some date formatting here. CloudWatch expects all date values in ISO 8601 format in UTC, which looks like this: 2010-04-22T19:12:59Z. Therefore, we have to get the current UTC time and format the date strings accordingly. We take the current time, truncated to the full minute, as the end time; the start time is 10 minutes in the past. Why are we doing this? The reason is that CloudWatch data is written asynchronously, so the latest metrics we get will be a couple of minutes old. If we set the start time to just one minute in the past, we would not get any metrics.
 // every specifier refers to argument 1 (the calendar); the trailing Z marks UTC
 String dateFormatString = "%1$tY-%1$tm-%1$tdT%1$tH:%1$tM:%1$tSZ";
 GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
 // truncate to the start of the current minute
 calendar.add(GregorianCalendar.SECOND, -1 * calendar.get(GregorianCalendar.SECOND));
 getMetricRequest.setEndTime(String.format(dateFormatString, calendar));
 // the window starts 10 minutes before the end time
 calendar.add(GregorianCalendar.MINUTE, -10);
 getMetricRequest.setStartTime(String.format(dateFormatString, calendar));
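The window calculation can also be tried out in isolation, without the Amazon library. The following is a minimal sketch that uses only JDK classes; the class name UtcQueryWindow and the method window are illustrative and not part of the original plugin. It uses a SimpleDateFormat pinned to UTC instead of String.format, which is an alternative way to produce the same strings, not what the listing above does.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class UtcQueryWindow {
    // ISO 8601 pattern; 'Z' is a literal because the formatter is pinned to UTC.
    static final String ISO_8601_UTC = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    /** Returns {startTime, endTime}; the window ends at the start of the minute containing 'now'. */
    static String[] window(Date now, int minutesBack) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat iso = new SimpleDateFormat(ISO_8601_UTC, Locale.ROOT);
        iso.setTimeZone(utc);

        Calendar calendar = new GregorianCalendar(utc);
        calendar.setTime(now);
        // Truncate to the full minute, as the tutorial code does for the end time.
        calendar.set(Calendar.SECOND, 0);
        calendar.set(Calendar.MILLISECOND, 0);
        String end = iso.format(calendar.getTime());

        // Go back far enough that asynchronously written datapoints are included.
        calendar.add(Calendar.MINUTE, -minutesBack);
        String start = iso.format(calendar.getTime());
        return new String[] { start, end };
    }

    public static void main(String[] args) {
        String[] w = window(new Date(), 10);
        System.out.println(w[0] + " .. " + w[1]);
    }
}
```

Passing the current time in as a parameter keeps the calculation easy to check with a fixed date.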
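Additionally we have to add the following code to the constructor to calculate our UTC offset, and define timeOffset as an int member field. Note that the integer division yields whole hours only, so this simplification assumes a time zone with a whole-hour offset.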
  TimeZone zone = TimeZone.getDefault();
  timeOffset = zone.getOffset(new Date().getTime()) / (1000 * 3600);
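A possible alternative to carrying a manual hour offset, shown here only as a hedged sketch and not as what the original plugin does, is to let the date formatter itself interpret the timestamps CloudWatch returns as UTC. The class name UtcTimestampParser and the method parseUtc are made up for illustration, and the timestamp layout is assumed to be the ISO 8601 form shown earlier.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class UtcTimestampParser {
    /** Parses a timestamp such as 2010-04-22T19:12:59Z as a UTC instant. */
    static Date parseUtc(String timestamp) {
        SimpleDateFormat formatter =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // Without this call the string would be read in the JVM's default time zone.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return formatter.parse(timestamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not an ISO 8601 UTC timestamp: " + timestamp, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUtc("2010-04-22T19:12:59Z").getTime());
    }
}
```

Because the result is an absolute point in time, this approach also works in time zones whose offset is not a whole number of hours.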
The next thing we have to do is retrieve the actual metrics. As we will get more than one measurement, we have to store them so that we can later select the latest one. The inconvenient part here is that the CloudWatch API does not allow us to retrieve more than one metric at once. Therefore we have to make a request for each metric we want to retrieve. Additionally you will notice some possibly cryptic date parsing and calculation. What we do here is parse the date string we get back from Amazon and create a calendar object. The tricky part is that we have to add (or subtract) the offset of our current time zone to UTC. The formatter is defined as private DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
HashMap<Long, MeasureSet> measureSets = new HashMap<Long, MeasureSet>();
for (String measureName : measureNames) {
  // one request per metric
  getMetricRequest.setMeasureName(measureName);
  GetMetricStatisticsResponse metricStatistics = cloudWatchClient.getMetricStatistics(getMetricRequest);
  if (metricStatistics.isSetGetMetricStatisticsResult()) {
    List<Datapoint> datapoints = metricStatistics.getGetMetricStatisticsResult().getDatapoints();
    for (Datapoint point : datapoints) {
      Calendar cal = new GregorianCalendar();
      cal.setTime(formatter.parse(point.getTimestamp()));
      cal.add(GregorianCalendar.HOUR, timeOffset);
      // measures of different metrics with the same timestamp share one MeasureSet
      MeasureSet measureSet = measureSets.get(cal.getTimeInMillis());
      if (measureSet == null) {
        measureSet = new MeasureSet();
        measureSet.timestamp = cal;
        measureSets.put(cal.getTimeInMillis(), measureSet);
      }
      measureSet.setMeasure(measureName, point.getAverage());
    }
  }
}
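The grouping step can be illustrated without any calls to CloudWatch. In the sketch below, the Sample class is a hypothetical stand-in for one datapoint of one metric and the values in main are invented. A TreeMap is used instead of the HashMap from the listing, purely as a design variation: it keeps the timestamps ordered, so the latest set is available directly.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DatapointGrouping {
    /** Hypothetical stand-in for one CloudWatch datapoint of one metric. */
    static class Sample {
        final String metric;
        final long timestampMillis;
        final double average;

        Sample(String metric, long timestampMillis, double average) {
            this.metric = metric;
            this.timestampMillis = timestampMillis;
            this.average = average;
        }
    }

    /** Groups samples by timestamp so each entry holds all metrics reported for that minute. */
    static TreeMap<Long, Map<String, Double>> groupByTimestamp(Iterable<Sample> samples) {
        TreeMap<Long, Map<String, Double>> sets = new TreeMap<Long, Map<String, Double>>();
        for (Sample s : samples) {
            Map<String, Double> set = sets.get(s.timestampMillis);
            if (set == null) {
                set = new HashMap<String, Double>();
                sets.put(s.timestampMillis, set);
            }
            set.put(s.metric, s.average);
        }
        return sets;
    }

    public static void main(String[] args) {
        TreeMap<Long, Map<String, Double>> sets = groupByTimestamp(Arrays.asList(
                new Sample("CPUUtilization", 60000L, 12.5),
                new Sample("NetworkIn", 60000L, 2048.0),
                new Sample("CPUUtilization", 120000L, 14.0)));
        // The last entry of the sorted map is the most recent measure set.
        System.out.println(sets.lastEntry());
    }
}
```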
The last part is to retrieve the latest available measurements and return them. To do so we simply sort the measure sets and return the latest one.
 ArrayList<MeasureSet> sortedMeasureSets = new ArrayList<MeasureSet>(measureSets.values());
 if (sortedMeasureSets.size() == 0) {
   return null;
 } else {
   Collections.sort(sortedMeasureSets);
   return sortedMeasureSets.get(sortedMeasureSets.size() - 1);
 }
For sorting to work, MeasureSet has to implement Comparable<MeasureSet>:
private static class MeasureSet implements Comparable<MeasureSet> {
 
  @Override
  public int compareTo(MeasureSet compare) {
    // Calendar is itself Comparable, so no lossy cast of a millisecond difference is needed
    return timestamp.compareTo(compare.timestamp);
  }
 
  // other code omitted
}
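One caveat when writing compareTo for timestamps: subtracting two millisecond values and casting the difference to int gives the wrong sign once the timestamps are far enough apart (from about 24.8 days, i.e. 2^31 milliseconds). Within the 10-minute window used here this cannot happen, but the sketch below shows the effect and a safe alternative. The class and method names are illustrative only.

```java
class TimestampCompare {
    /** Comparison by casting the difference to int; overflows for large differences. */
    static int byCast(long a, long b) {
        return (int) (a - b);
    }

    /** Overflow-safe comparison of two millisecond timestamps. */
    static int safe(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long tenMinutes = 600000L;
        long twoToThe31 = 2147483648L; // roughly 24.8 days in milliseconds

        System.out.println(byCast(tenMinutes, 0L) > 0); // small differences compare correctly
        System.out.println(byCast(twoToThe31, 0L) > 0); // the cast wraps around to a negative value
        System.out.println(safe(twoToThe31, 0L) > 0);   // Long.compare keeps the correct sign
    }
}
```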

Step 3 – Printing the Results

Last we have to print the results to the console. The code for this is pretty straightforward:
public static void printMeasureSet(MeasureSet measureSet) {
  System.out.println(String.format("%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS", measureSet.timestamp));
  for (String measureName : measureSet.getMeasureNames()) {
    System.out.println(measureName + ": " + measureSet.getMeasure(measureName));
  }
}

Step 4 – Defining the Metrics to Retrieve

Our code is now nearly complete. The only thing left to do is define which metrics we want to retrieve. We can either pass them as command-line parameters or specify them explicitly in the code. CloudWatch supports the following metrics for EC2 instances:
  • CPUUtilization
  • NetworkIn
  • NetworkOut
  • DiskReadBytes
  • DiskWriteBytes
  • DiskReadOps
  • DiskWriteOps
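Both options can be combined in a few lines. The following sketch takes the metric names from the command line and falls back to a default selection when none are given. The class name MeasureNames, the method fromArgs and the choice of defaults are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;

class MeasureNames {
    // Illustrative default selection; any of the metric names listed above can be used.
    static final String[] DEFAULTS = { "CPUUtilization", "NetworkIn", "NetworkOut" };

    /** Uses the command-line arguments as metric names, or the defaults if none are given. */
    static ArrayList<String> fromArgs(String[] args) {
        String[] source = (args == null || args.length == 0) ? DEFAULTS : args;
        return new ArrayList<String>(Arrays.asList(source));
    }

    public static void main(String[] args) {
        System.out.println(fromArgs(args));
    }
}
```

The returned list has the type that retrieveMeasureSet expects, so in the monitor's main method it could be assigned to measureNames before the polling loop starts.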

Step 5 – Visualizing and Storing the Data

As you most likely do not want to look at the data on your console, the final step is to visualize and store the data. How to implement this depends on the monitoring infrastructure you are using. Below you can see a sample of how this data looks in dynaTrace.
[Figure: CloudWatch-based Instance Monitoring in dynaTrace]
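If no monitoring product is available to receive the data, one simple way to store it is as CSV rows that a spreadsheet or charting tool can read. This is only a sketch under that assumption; the class CsvRows, the method toCsvRows and the "timestamp,metric,value" layout are invented for illustration and are not related to how dynaTrace stores the data.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.TreeMap;

class CsvRows {
    /** Builds one "timestamp,metric,value" line per measure, metrics in alphabetical order. */
    static List<String> toCsvRows(long timestampMillis, Map<String, Double> measures) {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        String timestamp = iso.format(new Date(timestampMillis));

        List<String> rows = new ArrayList<String>();
        // Copying into a TreeMap gives a stable, sorted order of metric names.
        for (Map.Entry<String, Double> e : new TreeMap<String, Double>(measures).entrySet()) {
            rows.add(timestamp + "," + e.getKey() + "," + e.getValue());
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Double> measures = new HashMap<String, Double>();
        measures.put("CPUUtilization", 12.5); // invented sample values
        measures.put("NetworkIn", 2048.0);
        for (String row : toCsvRows(System.currentTimeMillis(), measures)) {
            System.out.println(row);
        }
    }
}
```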

Conclusion

Building your own CloudWatch monitoring is pretty easy. The metrics provided enable an initial understanding of how your EC2 infrastructure is behaving. These metrics are also input to the Amazon EC2 Auto Scaling infrastructure.
If you want to read more articles like this, visit the dynaTrace 2010 Application Performance Almanac.
