Sunday, 11 June 2017

Why Company Culture Enabled Us To Win The OpenStack Super User Award

When I joined Betfair back in 2014 I was amazed and surprised in equal measure. Looking around the business, it was full of intelligent, articulate and driven people. That was not the part that surprised me, of course; it was a key reason I joined the company. The technology the development teams were using was absolutely fantastic, the scale was immense and every day they were pushing the boundaries of what was possible from a product perspective.

Joining a new company is always slightly intimidating. As a newbie you always question: Am I really smart enough to work here? What can I possibly bring to this organisation? Betfair was engineering led: when engineers on the ground had an idea, they were given the freedom to implement it and they were in charge.

However, the one thing that surprised me was that the infrastructure the development teams were using was more than a little dated. It had served them well, but the infrastructure processes weren't very innovative. Put simply, developers had to fill out lengthy system designs to get new infrastructure and raise tickets for network and infrastructure changes, which slowed them down greatly.

My personal opinion was that the developers were delivering fantastic products to market in spite of the internal IT processes that were hindering them. My main cause for surprise came on my first day: I asked if I could have some virtual machines to start working on some proofs of concept and was told that one of my colleagues had some servers underneath his desk that I could use.

I really couldn't believe that in 2014 a company known for its bleeding-edge tech products couldn't provide its engineers with infrastructure to try out new ideas on. However, initiatives to change this were very much underway. When I joined, Betfair had very recently moved to having a single CTO in charge of all technology; before that it had a CTO for development and a CIO for infrastructure, which had created a traditional dev and ops wall, the very cultural issue that DevOps initiatives look to solve. It is fair to say there was an appetite for change, and this included the removal of the shadow IT that had worked round these known impediments.


In early 2015 we started our i2 (infrastructure 2) project, an initiative to apply development principles to infrastructure. A major hardware refresh was long overdue, so we would use this opportunity to choose a new technology stack and create a new infrastructure platform that respected an everything-as-code mandate: nothing would be done manually, and continuous integration and continuous delivery processes would be applied to all infrastructure.

At the same time we would provide a self-service framework for developers to consume that infrastructure, giving us a consistent set of delivery tooling and workflow actions to create VMs, networks and storage in the same way, using Ansible playbooks. This would give us the complete consistency in our deployment methodology that we had previously lacked. It would make the software release process more predictable, manageable and, whisper it, boring.

This brought us to our i2 reference architecture, with OpenStack at the centre of it and Nuage Networks used for software-defined networking. We would use OpenStack as the API middleware that would programmatically control our infrastructure: OpenStack White Paper

At the time, during the selection process, we heard from many detractors that "OpenStack was dead", "it wouldn't scale", "it wouldn't be able to support the performance we required" and "we needed proprietary software". This was all of course nonsense, fake news if you will, and we knew it. However, convincing everyone in the business of this was a monumental battle, and I am proud to say we won it by making data-based arguments. It is hard to argue against hard facts. Always fight for what you believe is right and never give up; winning hearts and minds is the biggest challenge. For a company like Betfair that was engineering led, innovative, Linux based and loved open source, OpenStack was a perfect fit, and I still believe that.


However, implementing great technology such as OpenStack and Nuage is not enough to be successful. We were very lucky at Betfair: we had top-down sponsorship from CTO level to make the necessary culture changes, along with the trust and support of management. If you do not couple technology and culture together you will not create the desired business benefits.

For instance, at Betfair we put our developers on call and they wrote the Chef recipes that installed the applications they owned. This stopped the throw-it-over-the-wall mentality towards the operations team. Instead it encouraged developers to care about monitoring their applications, improving them sufficiently and taking operational ownership. They would make sure applications didn't have operational issues so that they weren't paged and woken up late at night. Ownership encouraged the right kind of behaviours from teams.


On the infrastructure side, when I joined there were cross-country silos, so we set up DevOps roadshows to break down those boundaries. Infrastructure teams would present what they were doing to each other and share knowledge. We sought to encourage collaboration across countries, improve team spirit and share ideas to create cross-country initiatives. We empowered engineers; we didn't dictate to them.

Keeping people in three countries engaged, motivated and aligned is very difficult at any time, but this made an amazing difference. If you facilitate the right kind of behaviours with continual learning and knowledge sharing, you ultimately create an engaged and happy workforce and the correct culture.


Now at Paddy Power Betfair we have teams from 4 countries (Ireland was added post-merger) peer reviewing each other's merge requests daily on our i2 framework and collaborating on new features. This really wouldn't have happened in 2014, when each country deployed its applications in a completely different way using different tooling and processes. Teams were not aligned at all and we had silos everywhere.

These new cultural initiatives meant that when we started the i2 project in earnest we had sorted the majority of the cultural issues and managed to create a cross-functional team, made up of people who had bought into these new ways of working, to create the core automation. This took people from development, operations and networking backgrounds and put them in a single team to create self-service processes that automated the pain points on the incumbent infrastructure. This team was made up of people from the United Kingdom, Porto and Romania.

This created T-shaped teams whose members brought deep knowledge of a particular discipline and then, through working together, created automation for:
  • Base images using Packer to create CentOS 6, CentOS 7 and Windows 2012 R2
  • OpenStack VM provisioning
  • OpenStack/Nuage network provisioning and ACL rules
  • Load Balancer configuration
  • Storage Provisioning
  • Bare Metal provisioning
  • Switch provisioning
They each learned brand new sets of skills, pushing them outside their comfort zones and enabling them to support the full technology stack. It made us all better engineers and, as a result, we were able to achieve pretty amazing things with the automation we put in place. The core team that initially built the i2 project was made up of around 12 people; by automating everything, we didn't need huge teams to manage the infrastructure. I loved turning up to work and seeing what the teams would achieve next.

If we had not instigated the necessary cultural changes and had the backing of the organisation and the trust of our managers to implement them, we would never have built a successful i2 project. We would likely have been just another failed OpenStack or private cloud initiative.
To date our automation initiatives have won the i2 project a Red Hat innovation award and an HP innovation award, and most recently the OpenStack Super User Award at the latest OpenStack Summit in Boston, which we are immensely proud of.

Allowing engineers to be creative, innovate and try new things continually will bring huge business benefits. We now do around 1000 deployments on any given working day. When a development team checks in code, it triggers the continuous delivery cycle for one of the 200 applications that have been onboarded onto the platform so far. Each application uses the self-service automation and we provision about 3000 VMs a day across our two data centres.

Our deployments are completely immutable: every deployment creates new flavours (CPU, RAM and disk) and host aggregates, organises hypervisors, creates networks, load balancing and virtual machines, then installs software on them based on the YAML files developers have filled in and added to Git.
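
To make the idea of an immutable deployment a little more concrete, here is a rough Python sketch. It is not the i2 code: the `AppSpec` fields, the `plan_release` function and the naming scheme are all hypothetical, and the spec is written as a Python object rather than parsed from a real YAML file. It only illustrates the principle that each release gets freshly named resources rather than changing existing ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSpec:
    """Hypothetical stand-in for the YAML file a developer commits to Git."""
    name: str
    vcpus: int
    ram_mb: int
    disk_gb: int
    instances: int


def plan_release(spec: AppSpec, release: int) -> list[str]:
    """Return the ordered steps for one immutable release.

    Every resource name carries the release number, so a new release
    always plans fresh resources instead of modifying existing ones.
    """
    tag = f"{spec.name}-r{release}"
    steps = [
        f"create flavour {tag} ({spec.vcpus} vCPU, {spec.ram_mb} MB RAM, {spec.disk_gb} GB disk)",
        f"create host aggregate {tag}",
        f"create network {tag}",
        f"create load balancer {tag}",
    ]
    # One step per virtual machine requested in the spec.
    steps += [f"create vm {tag}-{i}" for i in range(1, spec.instances + 1)]
    steps.append(f"install software on {spec.instances} vms for {tag}")
    return steps


if __name__ == "__main__":
    spec = AppSpec(name="pricing", vcpus=2, ram_mb=4096, disk_gb=40, instances=3)
    for step in plan_release(spec, release=42):
        print(step)
```

Because the release number is baked into every name, planning release 43 of the same application shares no steps with release 42, which is the property that makes a deployment immutable in this sketch.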

As a comparison, in 2014 I couldn't get 1 VM for a POC; today I watch as the platform spins up over 3000 VMs a day, allows our developers to build new products and supports those products in production on OpenStack. To say we have come on a massive journey is an understatement, and even now I think it is taken for granted somewhat at times. This is human nature, as people always want better and faster.

But through our continuous improvement model we are incrementally improving the self-service framework and adding new features. When adding new features it is important not to compromise the integrity of the platform, so we won't say yes to everything; scope creep is one of the reasons successful projects die.

Company culture allowed us to achieve the i2 project; sometimes I think people don't understand what an important factor this was. We were brave, innovative and we achieved great things. So if you have great people at your company, I plead with every manager: empower them. Steve Jobs once said:
“It doesn’t make sense to hire smart people and then tell them what to do; we hire smart people so they can tell us what to do.”
I think every manager can learn from this statement: the people you hire will make your company successful if you hire people better than those you already have.

Always strive to raise the bar with every hire, and hire people who may not agree with you all the time. The bravest thing a manager can do is empower their staff. Those who are fearful will micro-manage their staff and will ultimately fail in their initiatives, as they can't know everything and are a level of abstraction too far away from the detail to make the correct decisions.


In conclusion, it is a manager's job to remove blockers and impediments that hinder their engineers from doing their jobs and to help create a company culture that allows them to be successful. OpenStack and Nuage have been a huge technology enabler, but without sorting the necessary company culture before building the i2 platform we wouldn't have got anywhere. Don't underestimate how important company culture is at your organisation; the majority of cloud initiatives fail because of culture, not technology.

So, to anyone wanting to go on a similar journey: you will encounter detractors, non-believers and critics, but don't give up. It isn't impossible; you need to disrupt the notion that anything is impossible. Tear up the script, throw away the rulebook and create a new one. You only push technology forward by doing something that hasn't been done before.

So prototype new solutions, be brave and be unapologetic. Not everything will work, so treat failures as learning experiences. OpenStack has been a massive success for us and could be for others. I for one believe in open source technology, community and continual learning, and so should you.

Videos of Paddy Power Betfair's OpenStack Summit talks from the most recent Summit in Boston are below:

Lessons learned from running 1000 Application Deployments a Day on OpenStack:

https://www.openstack.org/videos/boston-2017/lessons-learned-from-running-1000-application-deployments-a-day-on-openstack-at-paddy-power-betfair

Immutable OpenStack Infrastructure:

https://www.openstack.org/videos/boston-2017/immutable-openstack-infrastructure

How Paddy Power Betfair uses OpenStack Manila to manage stateful data in the DevOps process:

https://www.openstack.org/videos/boston-2017/how-paddy-power-betfair-uses-openstack-manila-to-manage-stateful-data-in-the-devops-process

Continuous Delivery Of Stateful applications using Cinder at Paddy Power Betfair:

https://www.openstack.org/videos/boston-2017/continuous-delivery-of-stateful-applications-using-cinder-at-paddy-power-betfair