DevOps Cafe Podcast
DevOps Cafe hosted by John Willis and Damon Edwards

Hack Day in China...

Minimalist pipeline.. CI loop, CD model... 

Customer was CentOS based.... 

SVN, Jenkins, Rundeck, JBoss and MySQL... (5 servers).  

Due to cultural differences....

Make a change to the code and see the CI loop, delivery loop take place.  

Then do a rollback....

Talk about Rundeck and the mechanics - Jenkins Rundeck Plugin.  Drives the Rundeck job automatically.  Also Rundeck pulls the latest, last and history of builds.  

Normally we would use Chef or Puppet on the back end of Rundeck... 

Point about manual vs automated process


A Contentious Question by Chris Hoff (@beaker)

Contentious question by @beaker.. no,  actually if you guys don’t follow @beaker you should.  He is our #devops mirror for networking and security (imho).  He tries to keep us honest...

“Given the recent influx of virtual networking solutions, many of which are OpenFlow-based, what possible in-roads and value can they hope to offer in heavily virtualized enterprise environments wherein the virtual networking is owned and controlled by VMware?”

Specifically, if the only third-party VMware virtual switch to date is Cisco’s and access to this platform is limited (if at all available) to startup players, how on Earth do BigSwitch, Nicira, vCider, etc. plan to insert themselves into an already contentious environment effectively doing mindshare and relevance battle with the likes of mainline infrastructure networking giants and VMware?

If your answer is “OpenFlow and OpenStack will enable this access,...

Not meaning to piss anyone off, but many of these startups’ business plans are shrouded in the mystical veil of “wait and see.”


Pro Puppet [Kindle Edition]


Ask about BigDrops?

 Learning and Hadoop

September 2011 – HUG– Atlanta, GA

Machine Learning With Hadoop

Josh Patterson | Sr Solution Architect

Open source work at


“After the refining process, one barrel of crude oil yielded more than 40% gasoline and only 3% kerosene, creating large quantities of waste gasoline for disposal.”

--- Excerpt from the book “The American Gas Station”

Hadoop Today: The Oil Industry Circa 1900

Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year

Speed @ Scale is the new Killer App

Results that previously took 1 day to process can gain new value when created in 10 minutes.

ML Focused on in Mahout

An algorithm that looks at a user’s past actions and suggests




What is time series data?

What is ISAX Lumberyard?


Packages For Hadoop


UDFs in Pig

Used at LinkedIn in many off-line workflows for data-derived products

“People You May Know”





Hadoop Not Good At in Data Mining

We’ve talked about this before with Twitter's Storm...

Not everything fits great in MapReduce

Mahout as evidence of this


Common Challenges in DevOps Change Management

Matt Ray

Starts out with meat and potatoes .. everything in source control

Talks about Spiceweasel .. YAML or JSON..

Think roles not nodes... nodes can be ephemeral.. 

Start from scratch don’t reuse nodes...

Don’t hard code IP addresses .. m&p
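A quick sketch of the "roles not nodes" idea, for illustration only. The inventory dict, the role names, and both helper functions are hypothetical stand-ins for whatever a Chef search (or similar lookup) would return; nothing here comes from Matt's talk.

```python
# Hypothetical inventory: in practice this would come from a Chef server
# search or a similar service, not a hand-written dict.
INVENTORY = {
    "node-a1": {"roles": ["webserver"], "ip": "10.0.0.11"},
    "node-b7": {"roles": ["database"], "ip": "10.0.0.21"},
    "node-c3": {"roles": ["webserver", "monitoring"], "ip": "10.0.0.12"},
}


def nodes_with_role(inventory, role):
    """Return the sorted IPs of every node currently holding the role."""
    return sorted(
        attrs["ip"] for attrs in inventory.values() if role in attrs["roles"]
    )


def render_backend_list(inventory, role="webserver", port=8080):
    """Build a load-balancer backend list from roles, not hard-coded IPs."""
    return [f"{ip}:{port}" for ip in nodes_with_role(inventory, role)]


print(render_backend_list(INVENTORY))
```

Because the backend list is derived from the role at render time, a node can be thrown away and rebuilt with a new address without anyone editing a config file.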

Golden images are an anti-pattern.  <They don’t have to be...


Three Drunk in SysAdmins....

Continuous Delivery of Server Configurations


Puppet, Git, MCollective, Jenkins, and Capistrano

Three part series... 



Announcing the MongoDB Monitoring Service (MMS)

MongoDB Monitoring Service (MMS) is now open to the public for free. MMS is a SaaS based tool that monitors your MongoDB cluster and makes it easy for you to see what’s going on in a production deployment.


Devops Chicago and Devops Camp

Interview w/Martin J. Logan

Oct 22 - 23 in Chicago

Camp Devops 


Devopsdays Goteborg 2011

The conference will be a two day event on Friday 14 and Saturday 15 of October 2011

Direct download: Devops_Drop_023.mp3
Category:general -- posted at: 5:14pm PDT

Gartner Cites Application Release Automation Tools as Key to DevOps

So many things wrong with this...   (ARA) Nolio 

 As an emerging movement, DevOps may have improved communication and collaboration between development and IT operations teams but it still hasn’t absolutely mastered its ultimate goal of unifying their work. 

 In his report, Ronni J. Colville argues that DevOps can be greatly enhanced by the use of application release automation (ARA) tools and specifically cites Nolio as one such solution.

Can tools help “Culture”?  This question was asked at PuppetConf w/Luke... 

Visibility also requires the establishment of a model of the application and its configuration for each environment. ARA tools can provide a mechanism to create application models for each environment, with externalized configuration settings that typically vary by environment.


Selenium and Nagios

I've implemented a Nagios check for Selenium test cases. With this check it is possible to put your recorded test cases from your Selenium IDE into Nagios to use them for monitoring.

Test-->Selenium IDE-->Export-->check_selenium (nagios plugin)-->Selenium Remote Control


DevOps in Milliseconds

AppNexus engineers have it good. 

They don’t lie awake at night wondering if we can handle the next increase of impressions. 

They don’t worry that our systems are down and we don’t know it. 

They don’t develop in a bubble, toss their code over the wall to a mysterious group of people, and wash their hands clean.

Monitoring: Nagios - 1200 services

Metrics: Graphite - 1 million datapoints every minute.

Nagios plugin that queries Graphite and alerts if values of certain metrics go above or below specified thresholds.
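Here is a minimal sketch of the threshold side of such a check. It is not the AppNexus plugin. It assumes the datapoints have already been fetched and look like the `[value, timestamp]` pairs that Graphite's JSON render output uses; the function names and messages are made up for illustration. The exit codes follow the usual Nagios plugin convention.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def latest_value(datapoints):
    """Return the most recent non-null value from [value, timestamp] pairs."""
    for value, _timestamp in reversed(datapoints):
        if value is not None:
            return value
    return None


def check_threshold(datapoints, warn, crit):
    """Map the latest metric value to a Nagios status code and message.

    Alerts when the value rises above the thresholds; a "below" check
    would flip the comparisons.
    """
    value = latest_value(datapoints)
    if value is None:
        return UNKNOWN, "UNKNOWN - no datapoints"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING - value {value} >= {warn}"
    return OK, f"OK - value {value}"
```

A real plugin would print the message and call `sys.exit()` with the status code so Nagios can pick it up.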

Deployment: Puppet and Maestro

Puppet backed by a MySQL database and fronted by an in-house application we call Maestro.

At AppNexus there is no wall between engineers and operations, and automation is crucial to scaling our infrastructure. 

Engineers control their own destiny, and we give them the tools to dive deep into production problems, make fixes, and improve their products as quickly as they can code.


CI vs Zombies

Runaway builds.

--A runaway build occurs when not all processes created by the build exit cleanly. 

--Zombies – may hang the build, or simply stay around in the background waiting to wreak havoc. 

--They interfere with test isolation. If processes can hang around from an earlier build (or earlier test within the same build) they may affect unrelated tests.

--difficult-to-diagnose failures.

-- eventually leading to exhaustion.

--Manual intervention is required to kill them and clean up. 
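One common way to avoid the manual cleanup is to run each build step in its own process group and kill the whole group when the step ends. The sketch below is an assumption about how that could look, not what the article's CI server does; `run_build_step` is a hypothetical name and the approach is POSIX only.

```python
import os
import signal
import subprocess


def run_build_step(cmd, timeout=None):
    """Run cmd in its own process group, then kill any stragglers.

    Returns the step's exit code, or None if it hung past the timeout.
    POSIX only: relies on sessions/process groups and os.killpg.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    pgid = proc.pid  # the leader of a new session also leads its process group
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        returncode = None  # the step hung; it is killed below
    finally:
        try:
            os.killpg(pgid, signal.SIGKILL)  # leftover children die here
        except (ProcessLookupError, PermissionError):
            pass  # nothing left in the group
        proc.wait()  # reap the leader so it does not linger as a zombie
    return returncode
```

Anything the build leaves running in the background goes away with the group. Processes that deliberately start their own session escape this, so it narrows the problem rather than eliminating it.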


Openstack Compute API v1.1 support

Implement fog support for the Openstack Compute API v1.1. Includes support for legacy v1.0 style auth and v2.0 keystone auth.


Mean time to pretty chart- DevOps meets data porn

Alex Benik is a principal at Battery Ventures. Battery Ventures is an investor in DataDog and Tracelytics. 

The current mantra in Web operations is to track, record and monitor everything. Data is valuable and storage is cheap.

Favorite Velocity speakers: John Rauser at Amazon and Kellan Elliott-McCrea from Etsy.

Mean Time to Pretty Chart (MTPC). For full buzzword compliance, let’s say that WebOps + BigData + Information/Graphic Design = MTPC.

MTPC attempts to quantify the amount of time required to determine the root cause of an operational issue and depict it in an eye-catching way. The MTPC metric is challenging because it encompasses a number of challenges spanning large volumes of data acquisition, storage, correlation and design/representation.

A highly incomplete list of relevant commercial and open source tools would include Ganglia, Nagios, Cacti, Graphite, Munin, Splunk, New Relic, Tracelytics (see disclosure), DataDog (see disclosure), and AppDynamics.

Enter the Data Scientist. While correlation doesn’t imply causation, with large enough sample sizes the old adage “where there is smoke there is usually fire” often applies. When you can visualize that smoke in a pretty chart, it’s easier to pinpoint the fire.


Jesse Robbins interview on DevOps Cafe #19 (w/ full transcript!)

Direct download: Devops_Drop_022.mp3
Category:general -- posted at: 4:35pm PDT

Puppet Conf Recap ...

Great show.. first class.. venue, food, content...

“Operations as a Strategic Weapon”

Damon and I did our combined talks right after Luke’s Keynote.  I thought we rocked.  They will be posting the videos

Devops Cafe Roundtable with Luke, Teyo, James and Scott  ..

Basically the management team at Puppet Labs

Scott story about joining Puppetlabs... His Loudcloud experience. 

Damon killed.  We talked about Service Orchestration, PaaS, culture patterns.. great stuff... We will post the audio on Devops Cafe site and the Video should be up in a week or two...


Puppet Enterprise 2.0

A lot of new integration with Mcollective and the GUI...

New GUI, right out-of-the-box PE 2.0 automatically discovers all resources – packages, hosts, groups, and users.  Uses Mcollective to discover. 

Visually Clone Resources To Scale Quickly, Efficiently, and Reliably (from the GUI)

With PE 2.0’s new  compliance capability, you now can visually monitor for any unauthorized changes against your desired-state baseline. Can run compliance reports once a day and watch for changing trends...  Give auditors GUI control to see what they need to see...

PE 2.0’s new provisioning capability allows you to quickly and easily create new instances of VMware and Amazon EC2.  Kind of like “Knife” with the added bare metal sauce... 


“Operating at Scale”

Pedro Canahuati

SRE Manager... 

Dealing with issues at “SCALE” and I mean scale....

Switched from Xen to LXC due to overhead at scale...

Been using CFEngine for years... About to change to Chef or Puppet.. Looking at both. 

All the #devops things are going on at FB: CD, Agile in operations, collect and store everything.  Like Google, they had to build a lot of their own stuff.  

They built their own TSDB, kind of like OpenTSDB.  They have built their own monitoring framework and logging framework (they use Scribe).

ODS tool that abstracts and visualizes all events (very cool) 

I was able to talk to Pedro at the speakers dinner and the following day.  I am a junkie and groupie for guys like this and stuff like this .. we talked about CEP and monitoring.  Also about Chef and Puppet.  


Beyond the Node: Arkestration with Noah

John Vincent


Puppet and Juju, scaling the cloud

Marc Cluet & Adam Gandelman

These boys showed up to a gun fight with a knife... 

Slideware of how you can use Puppet and Juju together.  I am not a mean guy unless you propose something that you can’t explain in a presentation...

Split brain... Needs to be a hackday .. talked to Dan Bodie about this... Interesting...


Mårten Mickos


CEO Eucalyptus

Great presentation... Talked about what the cloud has done to operations.  Also acknowledges cloud needs devops.... 

My Zing question ... great answer....

We also had some one on one podcasts with the Red Hat guys about OpenShift and how it works.  

Ended up with an interview with Jay Lyman of 451 Group... Post on DTO....

Oh yeah, on the way to have drinks with Gene Kim I got to get my picture taken with Merle Haggard.  

Direct download: Devops_Drop_021.mp3
Category:general -- posted at: 7:44am PDT

Goteborg 2011 - program

Friday 14 October and Saturday 15 October

Yours truly doing the keynote...


Announcing Xeround Cloud Database API

Xeround is an elastic, always-on database-as-a-service
for your MySQL applications.

AWS, Rackspace and Heroku

Benchmarks against an RDS Large at  $0.44 vs the $0.08 standard instance Xeround


How GitHub Uses GitHub to Build GitHub

Everyone can push, everyone can deploy 

Master is always deployable

Deploy 10 to 40 times a day

Pull requests are our code review

Master -> Branch -> Pull request -> Master

Pull requests are RAD: no meetings, email is your interface, non-techs get involved

Culture...   Hack days... make things fun... 

Hubot, our valiant Campfire bot, has continued to grow in complexity. A tiny list of his (current) capabilities:

-unlock the door to our office

-print out a list of the people currently in the office based on their wifi presence

-find an apartment in the area to rent

-deploy GitHub

-say an arbitrary string over the office speakers

-play an audio sample of deadmau5 to everyone through hacked Propane HTML5

-give you a quote from any movie or TV show

-tell you the build status of any git branch

-track and map packages

-SMS any GitHubber from Campfire

-embed a seven day weather forecast


PuppetConf as a Service (PCaaS): Sign up for the Free Live Stream

Mårten Mickos

SRE’s from Facebook and Google

John Vincent @lusis Noah dude

Luke of course

Adrian Cole jClouds

Chad Metcalf Cloudera

Jinesh Varia AWS

Mark Hinkle @mrhinkle


Puppet Change Management for DevOps

What is Puppet?

At Atlassian, we use Puppet extensively with our internal systems, our Hosted products, and our build engineering infrastructure. Here's how we do it in build engineering.

Jira, Bamboo,  Greenhopper Rapid Board

Bamboo with puppet...


IBM Infrastructure as a Service (IaaS) -

From September 12 – November 11, you can provision select virtual machines at the Toronto, Ehningen, Tokyo and Singapore IBM SmartCloud data centers—subject to availability—at no charge. You can access:

Virtual machines to run Linux® (Red Hat or Novell SUSE) or Microsoft® Windows® Server 2003/2008

1 block (256 gigabytes) of persistent storage


DataStax gets $11M, fuses NoSQL and Hadoop

Brisk, Hadoop based on Cassandra

Neo raises $10.6M for Neo4j as graph DBs take off


Building Scalable Systems: an Asynchronous Approach

Node.js and RabbitMQ

Direct download: Devops_Drop_020.mp3
Category:general -- posted at: 10:11am PDT

John and Damon are back with a long form interview with Jesse Robbins of Opscode and Amazon fame.

Direct download: DevOpsCafe19.mp3
Category:podcasts -- posted at: 11:25pm PDT

Building data science teams

Data science teams need people with the skills and curiosity to ask the big questions.

People You May Know (PYMK)  LinkedIn, Facebook

Netflix and Zynga

Google, Amazon, 

A recent report from the McKinsey Global Institute says that by 2018 the U.S. could face a shortage of up to 190,000 workers with analytical skills.


New CycleCloud HPC Cluster Is a Triple Threat: 30000 cores, $1279/Hour, & Grill monitoring GUI for Chef

We have now launched a cluster 3 times the size of Tanuki, or 30,000 cores, which cost $1279/hour to operate for a Top 5 Pharma. It performed genuine scientific work -- in this case molecular modeling -- and a ton of it. The complexity of this environment did not necessarily scale linearly with the cores.

c1.xlarge instances 3,809

cores 30,472

RAM 26.7 TB

AWS Regions 3    ( us-east, us-west, eu-west )

 Compute Years of Work 10.9 years

 Spot Instances at an average cost of 0.286 USD / instance / hour (0.036 USD / core / hour). Compare that to the 0.68 USD / instance / hour for the same On Demand instance. That’s 57% savings!
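The quoted numbers are easy to sanity check. The sketch below only re-derives figures already given above; the 8 cores per instance is inferred from the quoted instance and core totals, and the variable names are mine.

```python
INSTANCES = 3809
CORES_PER_INSTANCE = 8    # inferred: 30,472 cores / 3,809 instances
SPOT_PRICE = 0.286        # USD / instance / hour (quoted average)
ON_DEMAND_PRICE = 0.68    # USD / instance / hour (quoted)

total_cores = INSTANCES * CORES_PER_INSTANCE
spot_per_core = SPOT_PRICE / CORES_PER_INSTANCE
savings = 1 - SPOT_PRICE / ON_DEMAND_PRICE

print(f"cores: {total_cores}")
print(f"spot USD/core/hour: {spot_per_core:.3f}")
print(f"savings vs on demand: {savings:.1%}")
```

The core count and per-core price line up with the post. The savings come out a shade under 58%, so the quoted 57% looks like it was rounded down.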


What Exactly is Complex Event Processing Today?

Colin Clark...


Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use!

The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem.


Building a Devops team

Brian Henerey, from Sony Computer Entertainment Europe.

First interview - remote technical test


EC2 instance .. install WordPress with a broken MySQL install 

Tomcat log scraping...

Using screen to watch them...

Round 2 - Face to face interview

Whiteboard test

Pair programming 


How to Think Like a Computer Scientist

Learning with Python


Node.js and MongoDB on Ubuntu

haproxy to catch inbound web traffic and route it to our node.js app cluster

mongodb for app storage

With a sample application....


First steps with Cloud Foundry on Amazon EC2

Setting up an IP address and domain name

Making it start the right modules at boot

Direct download: Devops_Drop_019.mp3
Category:general -- posted at: 8:33am PDT

NoSQL Benchmark

Yahoo! Cloud Serving Benchmark (YCSB)

Basic operations are Insert, Update, Read, and Scan. There are basic workload sets that combine the basic operations, but new additional workloads can also be created.
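To make the "workload = mix of basic operations" idea concrete, here is a small sketch. It is not YCSB code and the mixes are illustrative, not the benchmark's own workload definitions; `generate_workload` is a hypothetical helper.

```python
import random
from collections import Counter

OPERATIONS = ("insert", "update", "read", "scan")


def generate_workload(mix, count, seed=0):
    """Return `count` operation names drawn according to `mix` proportions.

    mix maps operation name -> relative weight,
    e.g. {"read": 0.5, "update": 0.5}.
    """
    unknown = set(mix) - set(OPERATIONS)
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    rng = random.Random(seed)  # seeded so a run can be repeated
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


# Illustrative mixes only.
read_only = generate_workload({"read": 1.0}, 1000)
read_update = generate_workload({"read": 0.5, "update": 0.5}, 1000)
print(Counter(read_update))
```

A new workload is just another weight dict, which is roughly the sense in which additional workloads can be created from the same four operations.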

This article contains tests conducted on the following products and versions.


Although Cassandra’s latest version is 0.8.0, we decided to use the previous version, which is known to be stable, because when testing with the 0.8 version the gossip protocol between nodes malfunctioned and the node up/down information was incorrect.

HBase-0.90.2 (Hadoop-0.20-append)
HBase-0.90.2 (Hadoop-0.20-append) was selected because, without the Hadoop-append version, durability in HDFS may be reduced.


Insert, Read Only and Read and Update

Insert - Cassandra kills 

Read and update - Cassandra beats HBase by a little 

Read - HBase wins of course but only by a little against Cassandra 

Mongo gets blown out...

Which leads me into .. why I would love to make this event...


Using Cassandra, Brisk, and Mahout to Manage Time Series, and Predict Future Events

DataStax ... Brisk, a Cassandra-based Hadoop...


What is glu?

glu is a free/open source deployment and monitoring automation platform.

a glu agent  is running on each of those nodes

ZooKeeper is used to maintain the live state as reported by the glu agents (blue arrows)

the glu orchestration engine is the heart of the system

Glu Script is a Groovy Class with named closures for the actions... (can be groovy or java)

install, configure, start, stop, unconfigure and uninstall

The doc is pretty cool .. however, when I started getting into the state machine stuff I had to stop...

Orchestration .. ZooKeeper to build live state, compare live and desired state.

generate delta 
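The compare-and-delta step is simple to picture in code. This is a sketch of the general idea only, not glu's implementation: the state shape (entry key mapped to a version string), the `compute_delta` name, and the "upgrade" action are all assumptions for illustration.

```python
def compute_delta(desired, live):
    """Compare desired and live state; return the actions that reconcile them.

    Both arguments map an entry key (e.g. "agent-1:/webapp") to a version
    string. The result is a sorted list of (action, key) tuples.
    """
    actions = []
    for key, version in desired.items():
        if key not in live:
            actions.append(("install", key))
        elif live[key] != version:
            actions.append(("upgrade", key))
    for key in live:
        if key not in desired:
            actions.append(("uninstall", key))
    return sorted(actions)


desired = {"agent-1:/webapp": "1.1", "agent-2:/webapp": "1.1"}
live = {"agent-1:/webapp": "1.0", "agent-3:/webapp": "1.0"}
for action, key in compute_delta(desired, live):
    print(action, key)
```

When live already matches desired, the delta is empty and there is nothing to do, which is the property that makes re-running the comparison safe.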



Three months ago, we decided to tear down the framework we were using for our dashboard, Python’s Django, and rebuild it entirely in server-side JavaScript, using node.js. (If there is ever a time in a start-up’s life to remodel parts of your infrastructure, it’s early on, when your range of motion is highest.)

This decision was driven by a realization: the LAMP stack is dead. 

1991-1999: The HTML Age.

2000-2009: The LAMP Age.

2010-??: The JavaScript Age.


From $0-100million with no sales people. The Atlassian 10 commandments for startups.

Jira, Confluence 

3 ppl to 300 ppl... 

Start with two founders..  50/50 

Bootstrapping .. first round is 60M

-Sell itself, affordable, global, open 

-Use your own product.... Passionately use your own product...

-Measure everything... Capture everything.... even if you can’t analyze 

-Test everything... 5 users free .. raised money for charity 

-ABM...  ... always sponsor the beer at conference.. like Dyninc...

-Send stuff in the mail.. t-shirts... 

-Make everything into a campaign.. Turned hiring into a marketing campaign - .. send only 4 resumes otherwise you are black listed...

-Don’t be afraid that your first product will fail.. 


Devops Dude of the Week....

Jordan Sissel

FPM and Logstash and now...


Jordan Sissel.. 

This project contains two EventMachine extensions.

First, it adds an event-driven file-following similar to the unix ‘tail -f’
command. For example, you could use it to follow /var/log/messages the same way
tail -f would.

Second, it adds event-driven file patterns allowing you to watch a given file
pattern for new or removed files. For example, you could watch /var/log/*.log
for new/deleted files.

For logstash, the log agents were
event-driven using EventMachine. The log agents mainly get their data from
logfiles. To that end, we needed a way to treat log files as a stream.

There’s a ruby gem ‘file-tail’ that implements tailing, but not in an
event-driven way. This makes it hard to use in EventMachine programs like

Thus, eventmachine-tail was born.

Further, the usage patterns for logstash required the ability to watch a
directory (or a file pattern) for new log files.

rtail -x "*.gz" "/var/log/**/*"
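To show what "treat log files as a stream" and "watch a file pattern" mean in practice, here is a plain polling sketch in Python. It is not how eventmachine-tail works (that is event-driven Ruby on EventMachine); the `FileFollower` class is hypothetical. Unlike `tail -f`, it reads each file from the beginning the first time it sees it.

```python
import glob
import os


class FileFollower:
    """Remember a read offset per file and return only newly appended lines."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.offsets = {}  # path -> byte offset already consumed

    def poll(self):
        """Return {path: [new lines]} for files matching the pattern."""
        fresh = {}
        for path in sorted(glob.glob(self.pattern, recursive=True)):
            if not os.path.isfile(path):
                continue
            start = self.offsets.get(path, 0)
            if os.path.getsize(path) < start:
                start = 0  # file was truncated or rotated; start over
            with open(path, "rb") as handle:
                handle.seek(start)
                data = handle.read()
            # Only consume complete lines; keep a partial tail for next poll.
            end = data.rfind(b"\n") + 1
            self.offsets[path] = start + end
            lines = data[:end].decode("utf-8", errors="replace").splitlines()
            if lines:
                fresh[path] = lines
        return fresh
```

Calling `poll()` in a loop with a short sleep gives a crude follower; new files that match the pattern are picked up automatically on the next poll because the glob is re-evaluated each time.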


Direct download: Devops_Drop_018.mp3
Category:general -- posted at: 7:27am PDT

$3m Wellington rail project behind schedule

 KiwiRail said Project Sirius, a $3 million project to install an IBM asset management system, is six months behind schedule.


LexisNexis Releases Code for Its Hadoop-Killer

LexisNexis Risk Solutions' division HPCC Systems has announced that it is open sourcing the code for its High Performance Computing Cluster (HPCC) software. HPCC is a data-processing-and-delivery solution that the company is marketing as an alternative to Hadoop. HPCC includes two major components: Thor, which analyzes large datasets in a manner similar to Hadoop, and Roxie, which is closer to a traditional RDBMS or a data warehouse.


The Apache Software Foundation Announces Apache Whirr as a Top-Level Project

Apache Whirr provides a Cloud-neutral way to run a properly-configured system quickly through libraries, common service API, smart defaults, and command line tool. Whirr is being used for proof of concepts and a way to try out new Cloud services utilizing a variety of Apache products that include Hadoop, HBase, Cassandra, and ZooKeeper. An example of this is enterprise software providers Cloudera, who use Whirr to make it easy to try out their CDH product and run distributed clustered services.


Chef Hack Day - Seattle

Saturday, September 24, 2011 from 9:00 AM to 5:00 PM (PT)

Seattle, WA


Joyent arms cloud for death match with Amazon

The Pixar of cloud.  Jason Hoffman: Chief Scientist, Founder - PhD in Molecular Pathology

Mark Mayo: CTO... 

A month after open-sourcing what it calls "the first major hypervisor" to arrive in half a decade, cloud computing pioneer Joyent has added this hypervisor to its flagship service, allowing Linux and Windows applications onto the Joyent Cloud for the first time.

Joyent and its firebrand CTO told the world they had ported the KVM hypervisor from Linux to SmartOS. They promptly open-sourced the code in an effort to "make the world a better place", and now they've rolled the hypervisor into a new incarnation of the Joyent Cloud

The company claims that its SmartOS virtual machines are up to 14 times faster than comparable Amazon server instances


Rundeck And Nagios Nrpe Checks

I’ve played with a few different jobs so far, including triggering Puppet runs across machines triggered by a Jenkins plugin. I’ve also been looking at running all my monitoring tasks at the click of a button (or again as part of a smoke test triggered by Jenkins) and I thought that might make a nice simple example.

My checks are written as Nagios plugins, and run periodically by Nagios. I also trigger them manually, using Dean’s NRPE runner script.

Direct download: Devops_Drop_017.mp3
Category:general -- posted at: 8:12am PDT

VMware vsphere provider to Fog

Libvirt integration for fog

 added cloudstack support

Full AWS support: sqs, sns, rds, elb, dns, cloudformation, cloud_watch


Patrick is going crazy over there.. 

This is some crazy shit essay..

Automated VMware ESX Installation - Bonus in VMware Fusion

Using kickstart

After this exercise you should be able to completely script the installation of a VMware ESX virtual machine and make it run inside VMware Fusion.


Murder: Fast datacenter code deploys using BitTorrent

Twitter eng.. BitTorrent... Murder takes a 40 minute deploy down to 12 seconds! In Ruby and Python, also a great video on the blog.


Ruby for Jenkins Goes Pre-Alpha

The project was started to make Jenkins fit the Ruby community style...

not forced into using JRuby, or Maven. 

Boot a plugin written in pure Ruby into a Jenkins server w/o Java or Java knowledge.  


Installing on RHEL/CentOS 5

Decomposed the script-based install, refactored it to work with CentOS, and laid out the steps to use yum

4 ways to install

  1. bash script that you can invoke from a curl command
  2. in the vcap repo there is a vcap_dev directory with chef cookbooks to install w/chef solo
  3. Keith Hudgins created a chef server install for the barclamp PIT
  4. Canonical now has Debian packages... 


LexisNexis Releases Code for Its Hadoop-Killer

HPCC Systems, a division of LexisNexis Risk Solutions.

open sourcing the code for its High Performance Computing Cluster (HPCC) 

an alternative to Hadoop. 

Thor, which analyzes large datasets in a manner similar to Hadoop, 

Roxie, which is closer to a traditional RDBMS or a data warehouse.


Ensemble gets some Juju!

 We figured it should represent the complexities and mystery that often surround those skilled in the DevOps field, and be something that played on the same “u” sound and etymology as Ubuntu.  Thus, “Juju” was born!

Direct download: Devops_Drop_016.mp3
Category:general -- posted at: 9:03am PDT

Devops Hackday with Cloud Foundry at VMware, about 300 ppl showed up.  

First thing we did was everyone got the CF Micro on USB stick (cloud on a stick)

Two teams of about 9 guys. One did a Puppet, Git to Jenkins to CF automation pipeline.  

Second team was Git to Jenkins to Tomcat with Zenoss

Takeaway for VMware was that it’s not just about the cloud.  They thought


Podcast with Chris Pinkham, CEO of Nimbula, during VMworld 2011

Chris was formerly Vice President, IT Infrastructure at

Nimbula Director 

Like I said that area is a blood bath.  I know this from when I was at Canonical trying to sell UEC and that was before HP, Dell, Citrix were in the game... 

all the start ups...

10gen raises $20M for MongoDB in maturing NoSQL space

10gen has raised $20 million in a Series D funding round (Sequoia Capital), bringing total venture backing to more than $31 million.

The post points out that Couchbase has raised around $30M as well. Damien Katz is the creator of CouchDB... was originally a Lotus Notes developer, then started the CouchDB project, then IBM hired him to work on CouchDB and now he is part of Couchbase...


$3m Wellington rail project behind schedule

 a $3 million project to install an IBM asset management system, is six months behind schedule.

When will this madness stop... I think about flightcast... Built an airline flight delay predictor that is in the high 80% range, based out of Y Combinator, with three guys and total capital probably less than 1/2 million... 

No excuse for this kind of behavior...


dnsxd is an Erlang DNS server with a focus on DNS Service Discovery.

dnsxd's default datastore module is an interface to CouchDB


Dustin Kirkland has two posts on installing Cloud Foundry on Ubuntu

sudo apt-get install cloudfoundry-server


Devops Book List

The Visible Ops Handbook: Starting ITIL in 4 Practical Steps

Gene Kim

Release It!: Design and Deploy Production-Ready Software (Pragmatic Programmers)

Michael Nygard

Cloud Application Architectures: Building Applications and Infrastructure in the Cloud (Theory in Practice (O'Reilly))

George Reese of enStratus

Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation (Addison-Wesley Signature Series (Fowler))

Jez Humble

Web Operations: Keeping the Data On Time

John Allspaw

Pulling Strings with Puppet: Configuration Management Made Easy

James Turnbull

Not on the list...

The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses by Eric Ries

Test-Driven Infrastructure with Chef by Stephen Nelson-Smith

Direct download: Devops_Drop_015.mp3
Category:general -- posted at: 6:56am PDT

The Last Lean Startup Bundle: 48 hours to claim $3,000,000 in prizes

One book 750 worth of stuff.  the 5 and 10 books kinda not so good...

HP Launches Private Beta Of Cloud Compute, Storage Services

Started a Twitter discussion.  I pointed out that I am pretty sure (certain) that they did not use Opsware as the deployment tool of choice... 

meta point is that second gen CM tools are dead.. Tivoli, Bladelogic, Opsware... #puppet and #chef rule the cloud... no turning back... 

Fault Tolerance and Protection


Thinking outside the box... 


Autometrics: Self-service metrics collection

LinkedIn system 

uses ZooKeeper for coordination. 


  • 500k+ metrics collected in a production data center every minute or about 8800 per second.
  • The average number of metrics per service is about 400, although some services have thousands
  • 1 minute resolution is maintained for 30 days, 5 minute for 90, 2 years of 1 hour resolution.
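A rough worked example of what that retention policy means per metric. The tier numbers come from the bullets above; treating "2 years" as 730 days is my approximation, and whether the tiers overlap in time is not stated, so the sum is an upper bound.

```python
MINUTES_PER_DAY = 24 * 60

# (resolution in minutes, retention in days), from the bullets above;
# "2 years" is approximated as 730 days.
TIERS = [(1, 30), (5, 90), (60, 730)]


def points_per_metric(tiers):
    """Number of stored datapoints per metric for each retention tier."""
    return [days * MINUTES_PER_DAY // resolution for resolution, days in tiers]


per_tier = points_per_metric(TIERS)
print(per_tier, sum(per_tier))

# Ingest rate if exactly 500k metrics arrive per minute.
print(round(500_000 / 60))
```

Exactly 500k per minute works out to roughly 8,300 per second, so the quoted 8,800 per second implies the real per-minute figure is a bit above 500k, consistent with the "500k+".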

Amazon deploys every 11.6 seconds

Jon Jenkins

June 16, 2011

some gems in here.

Nov 10 2010 all Amazon web servers went EC2... 

Makes a great argument for utility computing (i.e. cloud)

Apollo deployment system.  

One month stats: a deploy every 11.6 seconds, peak 1079 in one hr,

10k avg sim deploys

30k peak. 

But even their tooling reflects decoupling. Every tool follows the self-service model ("YOU do what you WANT to do with YOUR stuff"). Their deployment system (named Apollo, mentioned in the slides) and their build system, and their many other tooling, all reflect this model.

Cons. What happens is that you might be reinventing the wheel at Amazon. Often. Code reuse is very low across teams. So there's no shared cost of ownership at Amazon, more often than not. It's the complete opposite at Google w.r.t. code reuse. There are many very high-quality libraries at Google that are designed to be shared. Guava (the Java library) is a great example.

Another con. You may not know what you're doing. But as a team you will still build a rickety solution that gets you to a working solution. This is the result of giving a team complete ownership: they'll build what they know with what they have. Amazon is slowly correcting some of these problems by having teams own specific Hard Problems. A good example is storage systems.

And a lack of consistency is a common issue across Amazon. Code quality and conventions fluctuate wildly across teams.

Overall, Amazon has figured out how to decouple things very well.


Data scientist: The hot new gig in tech

The gig which requires the specialist to capture, sort, and figure out what data are relevant is one part statistician, one part forensic scientist, and one part hacker.

A recent report from the McKinsey Global Institute says that by 2018 the U.S. could face a shortage of up to 190,000 workers with analytical skills.

Direct download: Devops_Drop_014.mp3
Category:general -- posted at: 6:42am PDT

 Rails 3 / Active Model support

openstack / swift

Guide to Writing Chef Cookbooks

How Josh writes cookbooks (readme driven dev)

Moving an Elephant: Large Scale Hadoop Data Migration at Facebook

In 2010, Facebook had the largest Hadoop cluster in the world, with over 20 PB of storage. By March 2011, the cluster had grown to 30 PB — that’s 3,000 times the size of the Library of Congress! At that point, we had run out of power and space to add more nodes, necessitating the move to a larger data center. 

edelight / chef-solo-search

10.4 added data bag search for solo.. 

chefsolo-search adds library routines for search and 

ClojureScript and Node.js

Cloud Breakup: Why CloudSpokes Chose Over Azure

entered ga this week... 

Crowbar modularization work begins

Direct download: Devops_Drop_013.mp3
Category:general -- posted at: 10:04am PDT

Camp Devops Oct 22-23 in Chicago

CI for the world, Agile 2011 ... Patrick and Julian Simpson (The Build Doctor)

Why Vagrant is cool - Presentation at Devopsdays MountainView 2011

Puppet Labs has released version 1.2 of their Puppet Enterprise product. It now ships with even more compliance related tools as well as the managed installer for all the components and professional support as before.

DevOps and Labor Day

Cameron Haight

rbtrace: like strace, but for ruby code

rbtrace shows you method calls happening inside another ruby process in real time.

5 Startups to Watch at VMworld

Automated Configuration Management With Opscode Chef: The Basic Moving Parts

Can any cloud catch Amazon Web Services? (part 2)

Direct download: Devops_Drop_012.mp3
Category:general -- posted at: 7:51am PDT