Monday 2 May 2016

A Week At The OpenStack Summit In Texas

The OpenStack Summit in Texas was the third OpenStack Summit I have attended, the first two being Vancouver and Tokyo last year. A lot had changed in that time for Betfair. At the first summit I attended, in Vancouver, we were simply fact finding, trying to use the conference as a way to see if some of our initial OpenStack theories and designs were valid. In Tokyo we were checking for new developments that could help speed up our pilot, and for project updates on Ironic (Bare Metal) and Manila (File Share As A Service), which we would implement as part of phase two of our OpenStack implementation.


Betfair sent us to those two OpenStack summits as they wanted to ensure we made the correct architectural decisions when implementing our own OpenStack solution, while avoiding any glaring mistakes other vendors had already made and learned from. We also used the conferences to evaluate the software defined networking (SDN) solutions on offer and to answer many of our early queries. Should we be using Ceph? Should we use local disk or centralised storage? Which OpenStack projects were actually mature enough to use yet? What was the actual situation with Ironic bare metal provisioning?

The OpenStack Summit, put simply, is a very technical and developer-led conference, which sets it apart from most vendor-specific summits. It is probably the most self-deprecating conference I have ever attended, as the people presenting are honest and will share mistakes and war stories to help the community and others improve. It doesn't say things are great if it doesn't believe they are, which is a helpful cultural shift that many vendor-specific conferences could learn from.


The OpenStack Summit runs around 10 sessions in parallel in 40 minute windows from 9am through to 6pm, using one main conference center supplemented by two hotels, so you can gather the enormity of the set-up, with around 7,500 people attending in Austin. This, remember, is a conference for an open source project, which in itself is very impressive and something of a huge cultural shift in its own right.

Who'd have thought a project like OpenStack, which Gartner called a science experiment in 2013, would attract so many big hitters, with every vendor you can think of attending? Gartner incidentally gave a keynote at the OpenStack Summit last week saying OpenStack was ready for production use and a great platform. At one point people said the world was flat, so Gartner too can make mistakes and let's not be too hard on them, despite the fact I have always felt that a magic quadrant sounds like something a salesman would sell you alongside a pyramid scheme, but I digress.


One of the key themes to come out of the OpenStack Summit in Austin was something that has been obvious to me for many years while implementing configuration management processes in industry. Technology rarely causes cloud projects to fail, with only around 6% of projects failing due to technology. What causes projects to fail is the company culture, those implementing the projects not understanding where the value add is, and a lack of focus on the initial requirements.

A cloud platform is useless to a developer if they have to raise a ticket to an infrastructure team to create a VM and cannot get the specification they want. In that case there is no value add, so failure to change a company's operational model is the main reason cloud projects fail. The value add for Betfair is not implementing OpenStack or a Nuage SDN as standalone initiatives or, as Gartner put it, "science projects". The value add for Betfair is using this fantastic technology as an enabler and advantage to speed up time to market, to automate our whole platform so we can easily roll back or recover from failure, and to give our developers a platform to easily test and innovate on. We use OpenStack as our infrastructure middleware, utilising a consistent set of APIs to control the infrastructure via automated workflows.

So if a cloud platform fails or isn't adopted, it's down to the people who implemented it or didn't change the operational processes and the siloed culture, and very rarely is the failure down to the underlying technology. If you don't fix the process and people issues, then it doesn't matter what your platform is or how good it is. Companies using ticketing systems and promoting siloed teams and communication, beware: you will most likely fail with your cloud initiative.

The OpenStack Summit in Texas was very different from Vancouver and Tokyo for Betfair. The Betfair team had been selected by the OpenStack community to give 2 of the 400 breakout talks, as well as being nominated for the OpenStack super user award. Although we were ultimately unsuccessful in taking home the super user award, even being nominated was a great achievement given we are very early in our implementation, only 7 months in to date.


Betfair was one of four finalists for the super user award, beating some very impressive OpenStack users along the way. The OpenStack super user nomination was for the pace at which Betfair had implemented an active-active data center with OpenStack and Nuage SDN technology at its heart. We had also created an automated template for on-boarding applications by automating everything against the OpenStack and Nuage software defined networking APIs. This has enabled deployment of code, platform image, networks, ACL rules and load balancing using common workflows, all with open source tooling.

I make no apologies for saying it is mightily impressive to go from a 4 week POC, to a pilot, to actually running applications in production, all in 6 months on a brand new platform, while automating every step of the deployment. This was while at the same time trying to change the opinions and ways of working of people who weren't used to the pace at which we had to work, or to using automation in their daily tasks. Sure, we are still iterating bits of the implementation, but we always will be. We will keep improving the automation and learning until we reach near perfection. It's been a herculean effort by the team given the time frames. The proudest part for me is having teams in the UK, Porto and Cluj all collaborate to make this possible. It wasn't siloed teams that made this happen, it was a DevOps initiative in the truest sense across multiple countries, and we need to keep that collaboration going to continue the success.


I understand there will be questions about whether we deserve this super user award nomination yet. In my honest opinion I don't think we would have been worthy winners... YET.
But having attended the conference and spoken to our peers, I do think we were worthy nominees. When people from companies of the calibre of Walmart approach us and say that what we have done with our automated deployment process is what they have been trying to do for years, it really hits home what a huge project this is, what we have achieved in a short space of time, and what a key differentiator it can be for the business.


The purist in me wants the time we win the OpenStack super user award (I like to believe we could one day) to be when we are running all of Paddy Power Betfair's applications in production on OpenStack, and have fully automated everything in the data center top to bottom. By then the quantifiable benefits will be that we have reduced the time to market for our Paddy Power Betfair applications, as well as the time to recover from failure, while giving our developers infrastructure to actually innovate on and facilitate their fantastic ideas, all the while contributing back everything we do to the open source community to help others.

Only then, when we finish our migration project, will we have done what we set out to achieve. We may not catch the eyes of our peers like we have in our first 6 months, or be nominated again for an OpenStack super user award, but achieving all the initial requirements is the measuring stick for our success. The ideal scenario, and the future, is one where our developers are writing code for the best applications possible without worrying about infrastructure issues, while our infrastructure and network engineers are developing code for OpenStack and the open source community to improve and optimise the infrastructure and network we run those applications on.


Why can't Paddy Power Betfair be the new Etsy? I see no reason why we shouldn't be leading the way from now on and setting a new benchmark using the fantastic platform we have built for our developers. As highlighted at the OpenStack Summit, the only thing that could stop us is culture, and I'm quietly confident we will do just fine. Rapid on-boarding, bare metal and containers as a service are our next stop, and I am just a little bit excited about what we can achieve next...

The two Betfair sessions from the Austin OpenStack Summit can be watched below:

DevOps at Betfair using Openstack and SDN:
https://www.youtube.com/watch?v=aKa2idHhk94

Why Betfair chose Openstack - the Road to Their Production Private Cloud:
https://www.youtube.com/watch?v=-Tmuph-vUWU


Sunday 3 January 2016

Stop Confirmation Bias And Become A Better Engineer

Confirmation bias is when "people tend to seek out information that confirms their existing opinions and overlook or ignore information that refutes their beliefs". As a result, decision making is hindered and useful information that could benefit a business is ignored. Humans are notoriously creatures of habit. Take someone out to a restaurant and they will tend to order what they know from the menu, rather than choose something they haven't had before and risk the unknown. This is despite the menu having a variety of different options. The new options are seen as a risk, as they may not be as good as what we normally have or are used to having.

This same situation is prominent every day in the IT industry when selecting vendors or tooling, and we need to change this as it actually makes us bad engineers. More importantly, having confirmation bias for one particular vendor or tool can have an adverse impact on a business. Instead of picking the correct tools for the job, staff will simply check whether their current vendor of choice can do it, and say it's either possible or not based on that one vendor or tool. When it comes to vendors or specific tools, it really concerns me that companies hire staff and put a particular tool name in their job title. It equally concerns me that people in IT still try to market themselves as an expert in only one technology, as all technology has a shelf life that is normally shorter than a career. Luckily we have seen this trend in the job market change from "x vendor admin" to more vendor and tools agnostic terms such as "engineer", which suggests companies are starting to see the error of their ways.

Anyone who calls themselves "x vendor expert" or "x vendor guy or girl" has, one would argue, already shown they put vendors and tooling at the forefront of importance, rather than the end goals of the business they are working for or the process they need to implement, which is a worrying trend. Those individuals in practice tend to be less open minded about the technology stack and are probably more likely to exhibit traits of confirmation bias in favour of "x vendor or tool" when asked to make a technology decision. They will promote a particular technology to the extreme, while rubbishing competitors' tooling without having taken the time to understand it or even use it. They have a tendency to ignore alternative or potentially better solutions due to the confirmation bias. For any new vendor or tool that may compete, they seek out negative information or blogs that justify their decision to promote a particular vendor. They will also seek out like-minded individuals who share their confirmation bias to justify it, rather than road testing all the options.

This is very unhealthy behaviour, and falling in love with vendors or tooling is very dangerous. I jokingly call this symptom "falling in love with the stripper syndrome", as I feel this best sums it up for those looking in from the outside. The vendor becomes like a stripper to the client. The vendor will tell the client they love them, that they are great all the time and very, very important, to the point the client believes it. By now the client is doing all the certifications the vendor provides, buying their software licenses and more than likely buying the books they have published. The rule here is never fall in love with the vendor. Their job is to make as much money as they can. They don't love the client, it's their job to make money from the client! This may sound cynical, but this is sales routine after all, and business. It's not a criticism of vendors.


What the DevOps initiative should have taught us by now is that processes and ways of working are actually more important than the vendors or tooling, which should in theory be interchangeable around the process. My personal opinion is that the best job title for people implementing continuous integration or continuous delivery is "process engineers", as this is essentially what they are doing. People need to be open to substituting out any vendor or tool if a better one comes along. Some will obviously be harder to swap out, but as long as it benefits the business then it should be possible.
Vendors and tooling should always come secondary to good processes that facilitate business needs. If using a particular vendor over another gives an actual business advantage, I am all for it, as long as desired processes are not compromised.


At all costs we need to avoid the tail (vendor) wagging the dog (company). If that happens the processes will suffer due to the shortcomings of tooling, which may not be the best available, and this leads to vendor lock-in. All software is flawed, and vendors' best practice guides tend to be a little divorced from the reality of everyday project needs and delivery, so the approaches that they provide under "best practice" are very rarely the way customers are using the tooling in production. This is why it is important that the vendors that are chosen listen to clients' needs rather than setting a road map of what they think the client wants. There is a very important lesson in this: the biggest company in the market will very frequently not be the best partner. Take the football analogy of Ronaldinho, who was for a period the greatest football player in the world. Within a few years Ronaldinho won the World Cup with Brazil, World Footballer Of The Year and the European Cup with Barcelona. Afterwards, Ronaldinho was never the same player. The issue was that he was no longer motivated, as he had won everything and had therefore lost his hunger to go that extra mile. In vendor partnerships it is important for companies to partner with vendors that are hungry for success and will go that extra mile to support their business needs, and sometimes that means going for those that are up and coming and still have something to prove.

Before making tooling choices we should always write down a mission statement of what needs to be achieved, map out the current process flow, map out with our peers the new desired optimum process flow, making it as lean as possible, and then finally choose and map the best tools for the job to implement the desired process. I am still astounded that this never seems to be the ordering in IT when selecting tooling to implement process or automation. Typically I have seen individuals first select their favourite tool, or a vendor they are familiar with, then try to use the "best practice" method recommended by the vendor in their certification course to implement it. This scenario makes me want to cry, as it allows vendors or tools to dictate process to companies, so companies build their processes around tools, which is simply wrong. It may be a controversial statement, but I actually believe that in 2016 vendor certifications are worthless. I have benefited more from going to technology conferences. I do believe vendor certifications kick the inspiration and creative thinking out of people, the same way I would argue exams at university weren't as valuable as doing practical course work.



Rather than vendors telling us what we should be doing, we need to be pushing and moulding software vendors' road-maps into what we need to achieve to meet our companies' needs. This is why open source has come to the forefront, and why I believe any company that doesn't open source their code will eventually have to do so to survive. When a new tool is championed or recommended, the first thing the people I work with do is go to GitHub and have a look at the source code. Is it scalable? Is the code base well written? The days of having one vendor or knowing only one technology with closed source code are gone, and such vendors are actually treated with suspicion. It's not open source? What bad code are they actually hiding? Vendors all need to adapt to open source to survive, and on the flip side companies need to make business decisions based on process and start becoming completely vendor and tools agnostic. Easier said than done, but 2016 is a good time to start if you haven't already. We need to start seeing vendors or tools as a facilitator of processes and not the dictator of process. The worth IT staff bring to companies is first and foremost engineering expertise and problem solving skills, not tools knowledge, so please don't make yourself a one trick pony. Instead challenge everything, build new skills and make yourself a better engineer.