A Story of DevOps >>Episode 11. The Flux Capacitor (Performance Beyond Measure)
Mankind has ambition: the ambition to achieve great things, both in terms of tangible objectives and in terms of performance. The human race strives to push boundaries and better itself. Some of the entrepreneurs of our time, such as Elon Musk and Jeff Bezos, don’t just want to deliver great products and services; they aspire to travel to space and even colonise it.
We have examples today where we can specifically distinguish greatness: in companies like Toyota, Google, Apple, Amazon, Spotify and Netflix, and in people like Bezos, Churchill, Ronaldo, Farah and Bolt. In previous articles in #astoryofdevops I have shared what brings people and organisations to become great.
But what is it about some companies that makes them stand out? What is it about the unique combination of generative culture and engineering excellence that delivers breath-taking performance?
In the early days of my DevOps journey, after reading The Phoenix Project, I struggled to understand how the Unicorns were achieving thousands of deployments a day whilst preserving the stability, performance and security of the live service.
In The Phoenix Project, Gene Kim wrote: “High performing organisations such as Amazon, Google, Twitter, Etsy and Netflix have adopted a set of techniques we now call DevOps and they are routinely deploying hundreds or even thousands of production changes a day, while preserving world-class reliability, stability and security. By instituting a set of cultural norms, processes and practices, these high performers are achieving breath-taking performance.”
How was this possible? Surely this was not achievable in our organisation. How would we go about improving the stability and security of our production services AND increasing the rate of change? This was a contradiction in terms, two opposing poles. We already struggled to deliver high-quality services, and each time we changed the portfolio of services (probably a few dozen times a year) it was a drama and often resulted in a further degradation of the live service.
There was one time we upgraded a time and attendance application and it failed. This would impact the ability to record time against projects, charge customers and pay thousands of employees. As the service lead, I made the call to roll back – under extreme circumstances. A few weeks later we tried again. The drama!
All the king’s Prince2 and all the king’s ITIL were not putting Humpty Dumpty back together again. The checks, balances, reviews, meetings, documentation, decision gates, project plans, project and service checklists, senior stakeholders, experience, etc. did not prevent projects from disrupting the live service time and time again.
But after reading The Phoenix Project came the realisation that this was not the norm. That companies were deploying thousands of times a day, whilst preserving live services with user volumes in the millions or even billions, was incomprehensible.
I felt I needed a magic device – I felt we needed the Flux Capacitor! The search began…
Over the last few years, PuppetLabs and Thoughtworks have published a number of papers known as the State of DevOps Reports. These reports helped me realise the difference between high performance IT teams and their low performing cousins.
Key points that helped me realise that we needed to change included:
- Business performance is predicted by IT performance, and IT performance is predicted by organisational culture. My studies in previous articles on topics like Kryptonian and Assemble focus on the environment of the Knowledge Worker and how productivity is optimised by the psychological safety, trust and openness of a Generative culture.
- The right leadership is essential and characteristics of great leaders include inspiring communication, vision, intellectual stimulation, servant leadership, humility and professional will. My studies on this are shared in Freedom.
- High performance teams deploy 200 times more frequently, have 22% less unplanned work and have double the morale of low performing IT teams. In our story, we improved deployment frequency by 300 times, saw 35% less unplanned work and doubled the morale of the team, with a strong correlation between deployment frequency and employee morale.
However, despite the improvement, we were not seeing the same impact seen in the outside world. I still had not found the Flux Capacitor.
The Flux Capacitor
Like the Flux Capacitor, high performance teams have 3 prongs that come together:
- Leadership >> See Freedom
- Generative Culture >> See #AStoryofDevOps
- Engineering Excellence >> See below
Unlike the Back to the Future Flux Capacitor, which is powered by a plutonium-fuelled nuclear reactor generating 1.21 gigawatts of electricity, our DevOps Flux Capacitor is people-powered, which generates limitless potential and opportunities (see Kryptonian).
My first 11 articles in #AStoryofDevOps have been focussed on the Leadership and Cultural sides of DevOps. These alone can significantly influence the power of your organisation, teams and individuals. However, without engineering excellence, we cannot ‘time travel’ – we cannot out-pace the competition.
So, where do we start? At the beginning…
In The Phoenix Project, Gene Kim shares with us the 3 ways of DevOps:
This simple graphic is steeped in engineering history and you can learn a lot more in Beyond the Phoenix Project.
The First Way
The first way is about the full lifecycle from concept to cash, building on the work of Eli Goldratt, which includes Total Systems Thinking and the Theory of Constraints. We need to look at the total flow of work through the ‘production line’; our Kanban boards, WIP limits and Cumulative Flow Diagrams come into play here. Read this superb article on Professional Scrum with Kanban. This shows how you take the information collected through our Jedi DevOps practices to improve the flow of work.
Taking the philosophies of Goldratt, optimising anything other than the bottleneck is a waste. With the right actionable metrics, a team will soon see the bottlenecks in the flow. There are then 5 steps to optimise the bottleneck. The Theory of Constraints Institute explains the following 5 steps really well, and the DevOpsGroup have reviewed them in light of delivering software products.
- Identify the constraint >> Use your data represented in cumulative flow diagrams and burn down charts to identify the constraint
- Exploit the constraint >> Do not let the constraint run idle, make it deliver the best possible quality and improve the output of the constraint.
- Subordinate everything to the constraint >> Slow the entire system down to the performance of the constraint. Not doing this will result in bottlenecks, inventory build-up and the overall system slowing down.
- Elevate the constraint >> After the first three steps, invest in improving the constraint with more people, more computing power, automation, orchestration, etc.
- Prevent inertia becoming the constraint >> Re-evaluate the whole system, identify the next constraint and then go through the above steps again. This becomes an ongoing process of improvement (see Toyota Kata, Toyota Production System, Lean Manufacturing; I would encourage reading this article on using the Toyota Production System in DevOps).
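To make the first and fourth steps concrete, here is a minimal sketch in Python. It assumes a simple serial flow where each stage has a measured throughput and a count of work in progress. The `Stage` class, the stage names and all the numbers are hypothetical and invented purely for illustration; in practice the figures would come from your own board data.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    throughput_per_week: float  # items the stage can finish per week
    wip: int                    # items currently queued or in progress


def find_constraint(stages):
    """Step 1: treat the stage with the lowest throughput as the constraint."""
    return min(stages, key=lambda s: s.throughput_per_week)


def system_throughput(stages):
    """A serial flow can deliver no faster than its slowest stage."""
    return find_constraint(stages).throughput_per_week


def weeks_to_clear_queue(stage):
    """Rough queue length in weeks: work in progress divided by throughput."""
    return stage.wip / stage.throughput_per_week


# Hypothetical board data, invented for illustration only.
board = [
    Stage("Analysis", throughput_per_week=12, wip=6),
    Stage("Development", throughput_per_week=10, wip=15),
    Stage("Test", throughput_per_week=4, wip=20),
    Stage("Release", throughput_per_week=8, wip=2),
]

constraint = find_constraint(board)
print(f"Constraint: {constraint.name}")
print(f"System throughput: {system_throughput(board)} items/week")
print(f"Queue at constraint: {weeks_to_clear_queue(constraint):.1f} weeks")

# Step 4 (elevate): investing in Test moves the constraint elsewhere,
# which is why step 5 asks us to re-evaluate the whole system.
board[2].throughput_per_week = 9
print(f"Constraint after elevating Test: {find_constraint(board).name}")
```

With these made-up numbers, improving Development or Analysis would change nothing, because Test limits the whole flow. Once Test is elevated, Release becomes the next constraint and the cycle starts again.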
A key part of the first way is Total Quality Management which is heavily reliant on Safety Culture. This is best described by the Toyota Production System where defects are identified at the point of creation and all employees have the authority to pull the Andon cord. This cord pull slows the whole production line down for a short period of time. If the defect is not corrected, the whole line stops and members of the production line swarm to fix the problem. This could be a product defect or a delay in the process – anything exceeding the tolerances results in a cord pull and corrective action.
The Andon cord is pulled 3,500 times a day per production line. With repetition, this results in continuous improvement and as the number of cord pulls reduces, the tolerances are tightened up. This is the evidence of a culture where failure leads to learning which results in improvement.
So, to optimise the first way, Total Systems Thinking, Theory of Constraints and Total Quality Management are key influencers for delivering high quality services as fast as possible from concept to cash.
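The same mechanism can be sketched for a software delivery pipeline. The toy model below, in Python, assumes a single measurement (a build duration in minutes) checked against a tolerance, with the tolerance tightened at each review when there were few cord pulls. The `AndonLine` class, the 10% tightening factor and the sample values are all assumptions made for illustration, not a description of how Toyota or any particular tool does it.

```python
class AndonLine:
    """Toy model of a line that flags any measurement exceeding tolerance."""

    def __init__(self, tolerance, tighten_factor=0.9, quiet_threshold=2):
        self.tolerance = tolerance              # maximum acceptable value
        self.tighten_factor = tighten_factor    # assumed: tighten by 10% at a time
        self.quiet_threshold = quiet_threshold  # pulls allowed before we hold steady
        self.pulls = 0

    def check(self, measurement):
        """Return True (cord pulled) if the measurement exceeds tolerance."""
        if measurement > self.tolerance:
            self.pulls += 1
            return True
        return False

    def review(self):
        """End-of-period review: few pulls means the tolerance can tighten."""
        if self.pulls <= self.quiet_threshold:
            self.tolerance *= self.tighten_factor
        self.pulls = 0
        return self.tolerance


# Hypothetical build durations in minutes, invented for illustration.
line = AndonLine(tolerance=10.0)
for minutes in [8.0, 9.5, 12.0, 7.0]:
    if line.check(minutes):
        print(f"Cord pulled: build took {minutes} min (limit {line.tolerance})")

print(f"Tolerance after review: {line.review():.1f} min")
```

The point of the sketch is the feedback between the two methods: every breach is surfaced immediately, and a quiet period is treated as a reason to raise the bar rather than to relax.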
The Second Way
The second way is all about the amplified feedback loops from the users of the service and the live system. We can do this using A/B testing, customer satisfaction, systems monitoring and business indicators such as income, profit and market share.
Building the right telemetry into the system will enable the operators of the service to have real-time visibility of the performance of the service with the ability to respond quickly and effectively to the customer need.
In The Lean Startup by Eric Ries, there is the Build, Measure, Learn loop. Organisations that define their strategy based on a hypothesis need to test this with data to allow them to decide whether to pivot their strategy or persevere.
In our engineering practices, we need to apply the same. In IT, after applying Total Systems Thinking, Theory of Constraints and Total Quality Management, we need the right user and systems performance metrics to inform the evolution of the product. Data which includes items such as service performance, customer satisfaction, results from A/B testing and systems performance should feed into the team, and this should inform the future pipeline of work and improvement activities.
The key is to shorten this loop as much as possible, and we do this by reducing batch sizes and addressing constraints.
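As one example of turning such data into a decision, the sketch below applies a standard two-proportion z-test to the results of an A/B test, using only the Python standard library. The function names, the 5% significance level and the mapping of the result onto "pivot" or "persevere" are assumptions made for illustration; a real product decision would weigh far more than one metric.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def p_value_two_sided(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def decide(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Map the test result onto a (simplified) Build, Measure, Learn decision."""
    z = two_proportion_z(conv_a, n_a, conv_b, n_b)
    if p_value_two_sided(z) >= alpha:
        return "keep measuring"  # not enough evidence either way
    return "persevere with B" if z > 0 else "pivot away from B"


# Hypothetical results: conversions and visitors for variants A and B.
print(decide(conv_a=200, n_a=5000, conv_b=260, n_b=5000))
```

Note that the sketch assumes at least one conversion across the two variants; with none, the standard error is zero and the division fails.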
The Third Way
This is all about creating a culture which allows for experimentation and continual learning. It’s about taking risks and learning from failure where the team continuously reflects on what it has learnt and improves. Gene Kim says “Improving daily work is more important than doing daily work.”
I use the analogy whereby you are pushing a cart up a hill, but the cart has square wheels. At some point, one should reflect and consider a different shape of wheel. Eventually, through experimentation, we find that round wheels work best. We do this for a while and then we stop and consider pulling rather than pushing the cart. Then we place an engine and a gear system at the top to automate the pulling, then we create a system of loading and unloading the cart and eventually the whole process is easier and automated. The focus of the team is then on creating the valuable items that go into the cart and consuming these at the top of the hill. In one of my teams, we have a great cake baker, Donna, and we used Donna’s cake as the valuable item in the cart. The message is that Donna can then spend more time making delicious cakes rather than pushing a cart of cakes up a hill.
The below is the cake Donna made for the SuccessFactors Product Team after they won an internal competition to get to zero open incidents.
Multiple Deployments a Day.
So, how does one get to multiple deployments a day? By repeating the above over and over again:
- First Way >> Think about the total system, optimise constraints and build quality in
- Second Way >> Develop feedback loops through increasing telemetry and shortening the feedback loop through improved monitoring and reduced batch sizes
- Third Way >> Experiment, fail, learn and try new things to improve the flow of work – give your teams time to make mistakes, push boundaries and get to the place where they are creating and releasing value as fast as possible – i.e. making more cake!
"The secret of being a good scientist, I believe, lies not in our brain power. We have enough. We simply need to look at reality and think logically and precisely about what we see. The key ingredient is to have the courage to face inconsistencies between what we see and deduce and the way things are done. This challenging of basic assumptions is essential to breakthroughs," ― Eliyahu M. Goldratt, The Goal: A Process of Ongoing Improvement
Read more of what we achieved in the Story of DevOps.
- Episode 1 >> Origins (Traversing the Change Curve)
- Episode 2 >> One Ring (Alignment and Empowerment)
- Episode 3 >> Freedom (Leadership)
- Episode 4 >> Assemble (Productive and Teams)
- Episode 5 >> Shield (Tools of the Trade)
- Episode 6 >> Kryptonian (Value, Flow, Quality in a Complex World)
- Episode 7 >> Jedi (Mastery)
- Episode 8 >> Balrog (Confront the Brutal Facts)
- Episode 9 >> Kryptonite (Anti-Patterns of DevOps)
- Episode 10 >> The Suit (Digital Transformation)
- Episode 11 >> Flux Capacitor (Automation and Orchestration)
- Episode 12 >> Resurrection
#AStoryofDevOps #DevOps #FluxCapacitor