Operations teams need to provide eight critical services to the developers and users of their environment.  At my current employer, I use open source software to provide these services, which lets our developers be more productive and gives our customers a stable, responsive service.



Source Code Management

Keep all of our bespoke software, configurations and notes under strict version control.

Software: [Git](http://git-scm.com)
Pros: Fast; stable; many developers are familiar with it thanks to GitHub's popularity
Cons: Steep learning curve; somewhat cryptic commands
Option: [Subversion](http://subversion.tigris.org)

Continuous Integration

Build, test, version and package our software so that it may be quickly and safely deployed to our staging environment

Software: [Jenkins](http://jenkins-ci.org)
Pros: Easy integration with Git; nice GUI; flexible enough to meet our needs
Cons: Configuration limited to the GUI; written in Java*
Option: [CruiseControl](http://cruisecontrol.sourceforge.net)
 

Provisioning

Spin up nodes to become part of the processing farm and decommission nodes no longer required

Software: Custom scripts using [Fog](http://fog.io)
Pros: Simple scripts; easy to customize; supports multiple cloud providers
Cons: Custom tool
Option: [Cobbler](http://fedorahosted.org/cobbler/), RightAWS
 

Configuration Management

Ensure that all nodes are automatically and correctly configured and remain in a known configured state

Software: [Puppet](http://puppetlabs.com)
Pros: Easy configuration language; well supported; active community
Cons: Have to learn said configuration language; requires a serious investment of time
Option: [Chef](http://wiki.opscode.com/display/chef/Home)
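
The idea behind keeping nodes "in a known configured state" is idempotent convergence: describe the desired state, change the node only when it differs, and report whether anything changed. Here is a minimal Python sketch of that idea. It is an illustration only, not how Puppet is implemented; the function name `ensure_line` and the example setting are hypothetical.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was
    already in the desired state, so repeated runs are harmless.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()

    if line in content.splitlines():
        return False  # already converged, nothing to do

    # Start on a fresh line if the file does not end with a newline.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a") as handle:
        handle.write(separator + line + "\n")
    return True


# Demonstration against a throwaway file (hypothetical setting).
demo_path = os.path.join(tempfile.mkdtemp(), "example.conf")
first_run = ensure_line(demo_path, "vm.swappiness = 10")
second_run = ensure_line(demo_path, "vm.swappiness = 10")
print(first_run, second_run)  # prints: True False
```

The second run changes nothing, which is the property that lets a configuration management agent run on a schedule without side effects.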
 

Monitoring

Check on services and nodes to ensure that things are behaving as expected before the customer notices

 

Software: [Icinga](http://www.icinga.org)
Pros: Can be easily auto-configured by Puppet; well-understood Nagios syntax; works well with Nagios checks and plugins
Cons: Requires a serious investment of time and constant care
Option: [Nagios](http://www.nagios.org), [Zenoss](http://www.zenoss.com)
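
Icinga can reuse Nagios checks because a check plugin is simply a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following Python sketch shows that convention with a disk usage check. The thresholds, function names and message wording are my own assumptions, not taken from any shipped plugin.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit_code, message) pair."""
    if percent_used >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % percent_used
    if percent_used >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % percent_used
    return OK, "DISK OK - %.1f%% used" % percent_used


def check_disk(path="/"):
    """Measure disk usage for `path` and evaluate it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, "DISK UNKNOWN - %s" % error
    return evaluate(100.0 * usage.used / usage.total)


def main():
    """Print the status line and return the exit code."""
    code, message = check_disk()
    print(message)
    return code

# When installed as a plugin, finish the script with:
#     raise SystemExit(main())
```

Keeping the threshold logic in `evaluate` separate from the measurement makes the check easy to test without touching a real disk.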
 

Capacity/Performance Management

Collect system metrics for assessing performance and for capacity planning.  Some organizations have their monitoring system perform this role, but I feel strongly that the two should be kept separate.

 

Software: [Collectd](http://collectd.org)/[Visage](http://auxesis.github.com/visage/)
Pros: Light, fast daemon on each box; flexible server; many plugins available
Cons: Separate process to run; requires a lot of disk and disk I/O
Option: [Ganglia](http://ganglia.sourceforge.net)
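
When no existing plugin covers a metric, one option is a small script that writes values in collectd's plain-text protocol, for example through its Exec plugin, which reads `PUTVAL` lines from the script's standard output. The Python sketch below only builds such a line. The host `web01`, the plugin instance `exec-app` and the metric name `queue_depth` are hypothetical.

```python
def format_putval(host, plugin, type_name, value, interval=10, type_instance=None):
    """Build one PUTVAL line in collectd's plain-text protocol.

    The identifier has the form host/plugin/type or
    host/plugin/type-type_instance.
    """
    type_part = type_name
    if type_instance is not None:
        type_part = "%s-%s" % (type_name, type_instance)
    identifier = "%s/%s/%s" % (host, plugin, type_part)
    # "N" tells collectd to use the current time as the timestamp.
    return 'PUTVAL "%s" interval=%d N:%s' % (identifier, interval, value)


line = format_putval("web01", "exec-app", "gauge", 42, type_instance="queue_depth")
print(line)  # prints: PUTVAL "web01/exec-app/gauge-queue_depth" interval=10 N:42
```

A real script would print one such line per interval in a loop and flush standard output after each one so that the daemon sees the value promptly.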
 

Log Collection

Centrally collect, store and monitor system and application logs

Software: [Rsyslog](http://www.rsyslog.com)/[Graylog2](http://graylog2.org/)
Pros: Rsyslog provides flexible configs; MongoDB-backed server performs well; easy front end for log viewing
Cons: Takes a while to learn Mongo; harder to pull/back up than text logfiles
Options: [Syslog-ng](http://freshmeat.net/projects/syslog-ng/), [Logstash](http://code.google.com/p/logstash/)
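
Applications can feed the same pipeline by logging to syslog instead of private files. As a rough sketch, Python's standard `logging.handlers.SysLogHandler` can forward records over UDP. This assumes a syslog daemon such as Rsyslog is listening for UDP on the given host and port; the logger name `billing-app`, the `local0` facility and the message are hypothetical.

```python
import logging
import logging.handlers


def build_syslog_logger(name, host="localhost", port=514):
    """Return a logger that forwards records to a syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # Tag each message with the application name and level so the
    # central server can filter on them.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


app_log = build_syslog_logger("billing-app")
app_log.info("order processed id=%s", 1234)
```

UDP delivery is fire-and-forget, so a message is silently lost if nothing is listening; that trade-off is worth weighing against a TCP transport for logs that must not be dropped.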
 

Deployment Management

Allow developers and technical staff to deploy applications and monitor their activity.  Since each infrastructure is unique, it makes sense to build a custom solution to this problem.

 

Software: [Mcollective](http://www.puppetlabs.com/mcollective/introduction/)/[Sinatra](http://www.sinatrarb.com/)/[ActiveMQ](http://activemq.apache.org/)
Pros: Sinatra makes it easy to write simple web applications; Mcollective is extremely fast; ActiveMQ is very flexible and resilient
Cons: Sinatra is not as full-featured as Rails; Mcollective requires a change of thinking about command/control; ActiveMQ is Java*
Options: [Control Tier](http://doc36.controltier.org/wiki/Main_Page)
 
\* I list Java as a con because we do not have extensive in-house Java expertise and it requires us to install something we would not normally have.