Sunday, 11 June 2017

Why Company Culture Enabled Us To Win The OpenStack Super User Award

When I joined Betfair back in 2014 I was amazed and surprised in equal measure. Looking around, the business was full of intelligent, articulate and driven people. That was not the bit that surprised me of course; it was a key reason I joined the company. The technology the development teams were using was absolutely fantastic, the scale was immense and every day they were pushing the boundaries of what was possible from a product perspective.

Joining a new company is always slightly intimidating. As a newbie you always question yourself: am I really smart enough to work here? What can I possibly bring to this organisation? Betfair was engineering led: when engineers on the ground had an idea, they were allowed the freedom to implement it and they were in charge.

However, the one thing that surprised me was that the infrastructure the development teams were using was more than a little dated. It had served them well, but the infrastructure processes weren't very innovative. Put simply, developers had to fill out lengthy system designs to get new infrastructure and raise tickets for network and infrastructure changes, which slowed them down greatly.

My personal opinion was that the developers were delivering fantastic products to market in spite of the internal IT processes that were hindering them. My main cause for surprise came on my first day: I asked if I could have some virtual machines to start working on some proofs of concept, and I was told that one of my colleagues had some servers underneath his desk that I could use.

I really couldn't believe that, in 2014, a company known for its bleeding-edge tech products couldn't provide its engineers with infrastructure to try out new ideas on. However, initiatives to change this were very much underway. When I joined, Betfair had very recently moved to having a single CTO in charge of all technology, where before it had a CTO for development and a CIO for infrastructure, which had created a traditional dev and ops wall: the very cultural issue that DevOps initiatives look to solve. It is fair to say there was an appetite for change, and this included the removal of the shadow IT that had worked round these known impediments.

In early 2015 we started our i2 (infrastructure 2) project, an initiative to apply development principles to infrastructure. A major hardware refresh was long overdue, so we would use this opportunity to choose a new technology stack and create a new infrastructure platform that respected an everything-as-code mandate: nothing would be done manually, and continuous integration and continuous delivery processes would be applied to all infrastructure.

At the same time we would provide a self-service framework for developers to consume that infrastructure. This would give us a consistent set of delivery tooling and workflow actions to create VMs, networks and storage in the same way, using Ansible playbooks, so we would have the complete consistency in our deployment methodology that we had previously lacked. This would make the software release process more predictable, manageable and, whisper it, boring.

This brought us to our i2 reference architecture, with OpenStack at the centre of it and Nuage Networks used for software-defined networking. We would use OpenStack as the API middleware that would programmatically control our infrastructure: OpenStack White Paper

At the time, during the selection process, we heard from many detractors that "OpenStack was dead", "it wouldn't scale", "it wouldn't be able to support the performance we required" and "we needed proprietary software". This was all of course nonsense, fake news if you will, and we knew it. However, convincing everyone in the business that this was the case was a monumental battle, and I am proud to say we won by making data-based arguments. It is hard to argue against hard facts. Always fight for what you believe is right and never give up; winning hearts and minds is the biggest challenge. For a company like Betfair that was engineering led, innovative, loved open source and was Linux based, OpenStack was a perfect fit and I still believe that.

However, implementing great technology such as OpenStack and Nuage is not enough to be successful. We were very lucky at Betfair: we had top-down sponsorship from CTO level to make the necessary culture changes, along with the trust and support of management. If you do not couple technology and culture together you will not create the desired business benefits.

For instance, at Betfair we put our developers on call and they wrote the Chef recipes that installed the applications they owned. This stopped the throw-it-over-the-wall mentality towards the operations team. Instead it encouraged developers to care about monitoring their applications, improving them sufficiently and taking operational ownership. They would make sure applications didn't have operational issues so they weren't paged and woken up late at night. Ownership encouraged the right kind of behaviours from teams.

On the infrastructure side there were cross-country silos when I joined, so we set up DevOps roadshows to break down those boundaries. Infrastructure teams would present to each other what they were all doing and share knowledge. We sought to encourage collaboration across countries, improve team spirit and share ideas to create cross-country initiatives. We empowered engineers, we didn't dictate to them.

Keeping people in three countries engaged, motivated and aligned is very difficult at any time, but this made an amazing difference. If you facilitate the right kind of behaviours with continual learning and knowledge sharing, you ultimately create an engaged and happy workforce and the correct culture.

Now at Paddy Power Betfair we have teams from 4 countries (Ireland was added post-merger) peer reviewing each other's merge requests daily on our i2 framework and collaborating on new features. This really wouldn't have happened in 2014, when each country deployed its applications in a completely different way using different tooling and processes. Teams were not aligned at all and we had silos everywhere.

These new cultural initiatives meant that when we started the i2 project in earnest, we had sorted the majority of the cultural issues and managed to create a cross-functional team, made up of people who were bought into these new ways of working, to create the core automation. This took people from development, operations and networking backgrounds and put them in a single team to create self-service processes that automated the pain points on the incumbent infrastructure. This team was made up of people from the United Kingdom, Porto and Romania.

This created T-shaped teams: each member brought deep knowledge of a particular discipline and then, through working in the team, helped create automation for:
  • Base images using Packer to create CentOS 6, CentOS 7 and Windows 2012 R2
  • OpenStack VM provisioning
  • OpenStack/Nuage Network provisioning and ACL rules
  • Load Balancer configuration
  • Storage Provisioning
  • Bare Metal provisioning
  • Switch provisioning
They each learned brand new sets of skills, pushing them outside their comfort zones and enabling them to support the full technology stack. It made us all better engineers and as a result we were able to achieve pretty amazing things with the automation we put in place. The core team that initially built the i2 project was made up of around 12 people; by automating everything, we didn't need huge teams to manage the infrastructure. I loved turning up to work and seeing what the teams would achieve next.

If we had not instigated the necessary cultural changes and had the backing of the organisation and the trust of our managers to implement them, we would never have built a successful i2 project. We would likely have had just another failed OpenStack or private cloud initiative.
To date our automation initiatives have won the i2 project a Red Hat innovation award, an HP innovation award and, most recently, the OpenStack Super User Award at the latest OpenStack Summit in Boston, which we are immensely proud of.

Allowing engineers to be creative, innovate and try new things continually will bring huge business benefits. We now do around 1000 deployments on any given working day. When a development team checks in code, it triggers the continuous delivery cycle for one of the 200 applications that have been onboarded onto the platform so far. Each application uses the self-service automation and we provision about 3000 VMs a day over our two data centres.

Our deployments are completely immutable. Every deployment creates new flavours (CPU, RAM and disk) and host aggregates, organises hypervisors, creates networks, load balancing and virtual machines, then installs software on them based on the YAML files developers have filled in and added to Git.
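To make the idea of an immutable deployment concrete, here is a minimal sketch in Python. It is not the i2 code: the spec fields, the step names and the `plan_release` function are all hypothetical, and the dictionary simply stands in for whatever a developer's YAML file would contain once parsed. The point it illustrates is that every release plans a complete set of brand-new, release-scoped resources rather than modifying what is already running.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical spec, shaped like the dict a YAML parser might return for a
# developer's deployment file. The field names are assumptions for
# illustration, not the real i2 schema.
SPEC = {
    "app": "pricing-api",
    "flavour": {"vcpus": 4, "ram_mb": 8192, "disk_gb": 40},
    "instances": 3,
    "network": "pricing-net",
}


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical action name, e.g. "create_flavour"
    name: str    # release-scoped resource name


def plan_release(spec: dict, release: int) -> list[Step]:
    """Return the ordered steps that build a brand-new stack for one release.

    Nothing from an earlier release is reused: every resource name carries
    the release number, so two releases never share a resource.
    """
    suffix = f"{spec['app']}-r{release}"
    steps = [
        Step("create_flavour", f"flavour-{suffix}"),
        Step("create_host_aggregate", f"aggregate-{suffix}"),
        Step("organise_hypervisors", f"aggregate-{suffix}-hosts"),
        Step("create_network", f"{spec['network']}-r{release}"),
        Step("create_load_balancer", f"lb-{suffix}"),
    ]
    # One VM step per requested instance, numbered from 1.
    steps += [
        Step("create_vm", f"vm-{suffix}-{i}")
        for i in range(1, spec["instances"] + 1)
    ]
    # Software is installed last, once the VMs exist.
    steps.append(Step("install_software", suffix))
    return steps


if __name__ == "__main__":
    for step in plan_release(SPEC, release=42):
        print(f"{step.action:24} {step.name}")
```

Because the plan for release 2 shares no resource names with the plan for release 1, the old stack can be left untouched until the new one is ready and then removed, which is the property that makes a deployment immutable in this sketch.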

As a comparison, in 2014 I couldn't get 1 VM for a POC; today I watch as the platform spins up over 3000 VMs a day, allows our developers to build new products and supports those products in production on OpenStack. To say we have come on a massive journey is an understatement, and even now I think it is taken for granted somewhat at times. This is human nature, as people always want better and faster.

But through the continuous improvement model we implement we are incrementally improving the self-service framework and adding new features. When adding new features it is important to not compromise the integrity of the platform so we won't say yes to everything as scope creep is one of the reasons successful projects die.

Company culture allowed us to achieve the i2 project, sometimes I think people don't understand what an important factor this was. We were brave, innovative and we achieved great things. So if you have great people at your company, I plead with every manager, empower them. Steve Jobs once said:
“It doesn’t make sense to hire smart people and then tell them what to do; we hire smart people so they can tell us what to do.”
I think every manager can learn from this statement: the people you hire will make your company successful if you hire people better than those you already have.

Always strive to raise the bar with every hire, and hire people who may not agree with you all the time. The bravest thing a manager can do is empower their staff. Those who are fearful will micro-manage their staff and will ultimately fail in their initiatives, as they can't know everything and are a level of abstraction too far away from the detail to make the correct decisions.

In conclusion, it is a manager's job to remove the blockers and impediments that hinder their engineers from doing their jobs, and to help create a company culture that allows them to be successful. OpenStack and Nuage have been a huge technology enabler, but without sorting the necessary company culture before building the i2 platform we wouldn't have got anywhere. Don't underestimate how important company culture is at your organisation: the majority of cloud initiatives fail because of culture, not technology.

So anyone wanting to go on a similar journey, you will encounter detractors, non-believers, critics, but don't give up, it isn't impossible, you need to disrupt the notion that anything is impossible. Tear up the script, throw away the rulebook and create a new one. You only push technology forward by doing something that hasn't been done before.

So prototype new solutions, be brave and be unapologetic. Not everything will work, so treat failures as learning experiences. OpenStack has been a massive success for us and could be for others. I for one believe in open source technology, community and continual learning, and so should you.

Videos of Paddy Power Betfair's OpenStack Summit talks from the most recent Summit in Boston are below:

Lessons learned from running 1000 Application Deployments a Day on OpenStack:

Immutable OpenStack Infrastructure:

How Paddy Power Betfair uses OpenStack Manila to manage stateful data in the DevOps process

Continuous Delivery Of Stateful applications using Cinder at Paddy Power Betfair

Sunday, 2 April 2017

DevOps Does Not Mean Automation

In the past few months I have attended some DevOps meet-ups, something I hadn’t had the chance to do for a year or so. It is always really refreshing to sit and listen to others’ experiences and war stories at companies other than your own. It also gives a sense of perspective, allowing you to gauge how mature your organisation is in comparison to others that are on their own DevOps or continuous delivery journey.

I also love conferences and meet-ups as I see it as an opportunity to go out and talk to people and see if anyone is doing something interesting that we could bring to our own organisation, be that a new way of working, a new technology or just something we hadn’t come across yet that would bring value.

However, I found myself a little disappointed by the general understanding of DevOps shown by some pretty high-powered companies at recent meet-ups. DevOps isn’t really a new thing, and it still seems that some haven’t grasped that it is about culture. What does “culture” really mean? Oh no, DevOps buzzword bingo alert….

For me it is pretty simple, it means having a supportive organisation that does not break IT departments into traditional silos. It means having T-shaped teams that are put together to deliver different projects. It means developing a culture where a manager’s job isn’t about barking orders anymore or dictating how things are done, instead managers are there to remove blockers or impediments stopping their teams from doing their jobs.

This DevOps culture gives engineers the ability to collaborate, share ideas and talk as necessary with other teams to solve everyday problems, as they don’t have line managers stating that their staff don’t have time or are too busy to speak and work with others when they aren’t. It requires people to champion this new way of working, and it allows a workplace where innovation is at the forefront, not fear of change or of moving too fast.

A DevOps culture should allow the word “impossible” to be replaced with “how can we” or “when can we achieve this by”. If there is an issue or engineering challenge, it can be incrementally improved on or fixed by empowering engineers to do this.

DevOps culture is about challenging everything, prototyping new solutions, creating feedback loops in processes and implementing a continuous improvement model on every process and removing waste.

I can’t stress this enough: DEVOPS DOES NOT MEAN AUTOMATION. IT leaders still seem to confuse continuous delivery or automation with DevOps. The DevOps culture you create in your organisation can, for the above reasons, facilitate continuous integration, continuous delivery and automation.

But DevOps and automation are not the same thing: you can have automation and not be doing DevOps, and it seems loads of companies are doing automation and not DevOps.

At a recent London DevOps meet-up I watched a panel of so-called “experts” talk utter drivel for 30 minutes, unchallenged on anything they were saying, while some newbies to DevOps scribbled down these so-called words of wisdom on notepads. Eventually I posed a question to the panel via an interactive question wall (oh the irony: a DevOps meet-up, communicating via question wall):

“Does the fact that there seems to be a common misconception that: DevOps = (an automation team) not mean we see a constant creation of 'DevOps Teams' in industry and yet another silo?”

Despite the question receiving the most up-votes to be asked next, from an equally agitated and disillusioned audience (newbies aside), it was promptly deleted by the organisers and so never put to the panel. The shock, the horror, the (insert expletive).

Why did that happen? Probably because half the panel were talking about DevOps teams they had set up in their organisations. At a DevOps meet-up, they had failed to embrace the challenge-everything mantra by asking for questions and then censoring some. Upsetting? Yes. Unexpected? No.

To quote a current colleague of mine, most big organisations will “give the current ops team a raise, rename their ops team the DevOps team and go out to DevOps meet-ups and tell the world they are doing DevOps so they can get a pat on the back and show they are a forward thinking company”.

Technology leaders and managers in the main want to work with other managers who agree with them, so they can work in harmony, pat each other on the back and all agree how well they are doing. Herein lies the status quo and the confirmation bias: my peers agree, so it must be right. In truth, that's because you picked them. This is where complacency sets in and companies rot. A variety of opinions is required for organisations to be successful; differences in opinion, conflicting ideas and challenged preconceptions make us better, as long as they are channelled in a non-destructive manner.

In a way, we need to disrupt the notion that a harmonious group of managers who agree on everything is a good thing before we can move forward in IT. Sometimes it isn’t pretty and sometimes it makes for difficult conversations. But if you have recruited intelligent people, every person’s opinion in the organisation should carry the same weight; only when we accept that can we move forward as a group.

Another annoyance for me, from the same panel at the very same meet-up, came when they broached the subject of talent retention: they talked about locking graduates or staff into the company.

Oh how this made me mad. In my opinion, when you take on graduates as a technology company, you have a duty to help these individuals grow and mature by giving them the skills and training necessary to be successful in the IT industry.

This means pushing graduates outside their comfort zones at all times, coaching them daily and giving them fun and interesting work that you yourself would love to do, not crap you don't want to do. Hiring graduates shouldn't mean locking them into the company.

After graduates have done in theory their apprenticeship of 2-3 years at a company, I would actively encourage them to try something different if they don’t feel they are progressing at the current company, which could mean looking elsewhere in the company or moving jobs to a different company. By that time the next wave of graduates should be doing their apprenticeship at the organisation and be ready to replace them if they do leave.
However, my main point here is that if companies are meeting individuals’ needs in terms of personal development and have created a good company culture that empowers their staff and makes them happy, they shouldn’t need to worry about talent retention; that will take care of itself.

Ideally we should be setting graduates up to be the next technology leaders in organisations and developing them all the time. It is not a company’s duty to lock in talent to the point that people stay at the company despite being unhappy because they have been told they won’t get a job anywhere else.

Companies that are fearful of losing talent have a far bigger cultural issue, which is a culture of fear and uncertainty from management. There should be no bus factors of one in your organisations and if your daily operations are completely automated then what are you worried about? Or was that all bluster just for the sake of those at the meetup and not in fact true? It is time for some companies to stop all the talk and learn how to walk the walk.