George Washington Clouds

The 5th and 6th episodes were fun to record. We didn’t really have anything scripted and went more or less off the cuff based on where Brittany wanted to drive the conversation.

I’ve done a fair amount of thinking and analysis around strategy for applications and data, and how they relate to cloud and other infrastructure considerations. Data gravity, the tendency of large datasets to pull applications and services toward wherever the data lives, is a real thing and a serious consideration when developing a cloud strategy. Applications, by contrast, generally have established patterns that let them move between clouds. Broadly, they fall into one of two patterns: buildpack-based and containerized.

In the buildpack-based pattern, you define the buildpack at deployment time (or allow the platform to infer it). A buildpack essentially tells the platform how to build and run the code. So, if I’m deploying a Python app, either I explicitly declare a Python buildpack, or the platform inspects the code, recognizes it as Python, and uses its default Python buildpack. This arrangement is the fundamental underpinning of Heroku, Cloud Foundry, Google App Engine, and others.
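To illustrate the "declare it or let the platform infer it" choice, here is a minimal sketch of buildpack selection. It assumes detection works by looking for marker files in the app directory (for example, `requirements.txt` for Python), which is a common convention; the rule table and the `select_buildpack` function are hypothetical and do not reproduce any particular platform's detection logic.

```python
from pathlib import Path
from typing import Optional
import tempfile

# Hypothetical detection rules: buildpack name -> marker files that suggest it.
# Real platforms ship their own rules; this table is illustrative only.
DETECTION_RULES = [
    ("python", ("requirements.txt", "setup.py", "Pipfile", "pyproject.toml")),
    ("nodejs", ("package.json",)),
    ("go", ("go.mod",)),
    ("java", ("pom.xml", "build.gradle")),
]


def select_buildpack(app_dir: str, declared: Optional[str] = None) -> str:
    """Return the buildpack to use for the app in app_dir.

    An explicitly declared buildpack always wins; otherwise the first rule
    whose marker file exists in the app directory is used.
    """
    if declared is not None:
        return declared
    root = Path(app_dir)
    for buildpack, markers in DETECTION_RULES:
        if any((root / marker).exists() for marker in markers):
            return buildpack
    raise ValueError(f"no buildpack detected for {app_dir}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "requirements.txt").write_text("flask\n")
        print(select_buildpack(tmp))            # inferred from requirements.txt
        print(select_buildpack(tmp, "custom"))  # explicit declaration wins
```

The explicit declaration short-circuits detection, mirroring the two options described above: you either tell the platform what the app is, or it works it out from the code it sees.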

Data is a broader problem. Individual applications can manage their data in much the same way as the apps described above. Enterprise data warehouse data is a different animal. It is not easy to simply pick up terabytes or petabytes of data and move it from one environment to another, and it is rarely sound, either operationally or fiscally, to replicate all data across all environments. So, as companies move to the cloud, they generally incur some up-front cost to move their data into the selected cloud provider. In parallel, they have to decide how to keep data in sync, and they must be disciplined about keeping the set of data that requires syncing as small as possible.
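To make the data gravity point concrete, here is a back-of-the-envelope estimate of how long a bulk transfer takes over a network link. The 10 Gbps link and 80% sustained utilization are assumptions chosen for illustration, not measurements, and the estimate ignores protocol overhead, retries, and provider throttling, so real transfers will be slower.

```python
def transfer_days(data_bytes: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate days needed to copy data_bytes over a link of link_gbps.

    Assumes a sustained fraction `utilization` of the nominal link rate and
    ignores protocol overhead, retries, and throttling.
    """
    bits = data_bytes * 8
    bits_per_second = link_gbps * 1e9 * utilization
    return bits / bits_per_second / 86_400  # 86,400 seconds per day


TB = 10**12  # decimal terabyte
PB = 10**15  # decimal petabyte

if __name__ == "__main__":
    for label, size in [("100 TB", 100 * TB), ("1 PB", PB)]:
        days = transfer_days(size, link_gbps=10, utilization=0.8)
        print(f"{label} over 10 Gbps at 80%: {days:.1f} days")
    # → 100 TB over 10 Gbps at 80%: 1.2 days
    # → 1 PB over 10 Gbps at 80%: 11.6 days
```

Even under these optimistic assumptions, a petabyte-scale warehouse ties up a dedicated link for more than a week, which is why replicating everything everywhere rarely makes sense.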

As you evaluate your cloud options, spend a lot of time on your data strategy. Three questions make a good starting point:

  1. What do I do with my data warehouse up front?
  2. What do I need to keep in sync across infrastructure silos?
  3. What tradeoffs am I willing to make between real-time and analytical data?

These are far from the only topics you will need to consider; hopefully you’ll discover the others as you sift through the options and define your overall strategy and tactical execution plans.