DevOps A Misinterpreted Buzz Word: August 2015

Sunday, 23 August 2015

Why Agile Is a Key Component In Facilitating Continuous Delivery

When we look at the aims of continuous integration and continuous delivery, automating the build and deployment steps is never enough on its own. Don’t get me wrong, it is of course a great start, but it does not solve all the daily problems teams will encounter. A continuous integration process in its purest form takes a code change, compiles it, runs unit tests on it, packages it, and then uploads it to an artifact repository. The continuous integration process then hands over to continuous delivery. The continuous delivery process starts by polling the artifact repository for new artifacts, and when a new artifact is available it triggers the first phase of the deployment pipeline. Each phase of the deployment pipeline has an associated deployment run book to deploy the application to an environment. When the deployment has completed, the test pack associated with that phase of the pipeline is executed, and it returns a pass if all tests complete successfully or a fail if any test has failed. The continuous delivery pipeline and all its test phases create feedback loops for deployment and test cycles, which allow process improvement and refinement. A deployment pipeline works on the principle that we are trying to prove that a release candidate is broken rather than proving it works. We can't ever truly prove a release is perfect, so it is always a best effort.
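To make the flow above a little more concrete, here is a minimal sketch in Python of a pipeline that deploys a release candidate phase by phase and stops at the first failing test pack. The names `Phase`, `run_pipeline` and `noop_deploy`, the phase names and the artifact name are all my own illustrative assumptions, not taken from any particular CI tool, and the deploy and test steps are stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    """One pipeline phase: a deployment followed by its test pack (hypothetical model)."""
    name: str
    deploy: Callable[[str], None]      # stands in for the deployment run book
    test_pack: Callable[[str], bool]   # returns True only if every test passed


def run_pipeline(artifact: str, phases: List[Phase]) -> List[str]:
    """Push one release candidate through the phases, stopping at the first failure.

    The returned list is the feedback loop: one pass/fail entry per phase reached.
    """
    feedback = []
    for phase in phases:
        phase.deploy(artifact)
        if phase.test_pack(artifact):
            feedback.append(f"{phase.name}: pass")
        else:
            feedback.append(f"{phase.name}: fail")
            break  # the candidate has been proven broken, so go no further
    return feedback


def noop_deploy(artifact: str) -> None:
    """Placeholder for a real deployment run book."""


# Illustrative phases with stubbed test packs; the second one fails.
phases = [
    Phase("phase 1", noop_deploy, lambda artifact: True),
    Phase("phase 2", noop_deploy, lambda artifact: False),
    Phase("phase 3", noop_deploy, lambda artifact: True),
]
result = run_pipeline("myapp-1.0.42", phases)
print(result)  # ['phase 1: pass', 'phase 2: fail']
```

The pipeline never reports that a candidate "works". It only records that no phase managed to break it, which matches the best-effort principle described above.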

A continuous delivery pipeline should have a static run book of deployment activities which is common across all environments, with the only difference between deployments being the environment file or potentially the number of servers, which will be scaled up as we near production. The environment file’s job is to hold environment-specific information, that is, everything that makes an environment unique. This information is used to transform the application configuration files at deployment time so that they are configured correctly for that environment. The transformed application configuration files allow an application to connect to environment-specific database end-points or other environment-specific applications, or simply define the URL for the application or its run-time settings in that environment.
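As a rough illustration of the environment file idea, the sketch below uses Python's standard `string.Template` to render one shared configuration template with per-environment values. The template keys, host names and environment names are invented for the example. A real deployment tool would have its own templating and file formats.

```python
from string import Template

# Hypothetical application configuration template, identical for every environment.
CONFIG_TEMPLATE = Template(
    "db.endpoint=$db_endpoint\n"
    "app.url=$app_url\n"
    "app.max_threads=$max_threads\n"
)

# Hypothetical environment files: the only thing that differs between deployments.
ENVIRONMENTS = {
    "qa-component": {
        "db_endpoint": "db.qa-component.example.internal:5432",
        "app_url": "https://qa-component.example.internal",
        "max_threads": "4",
    },
    "production": {
        "db_endpoint": "db.prod.example.internal:5432",
        "app_url": "https://www.example.com",
        "max_threads": "64",
    },
}


def render_config(environment: str) -> str:
    """Transform the shared template into an environment-specific configuration.

    Template.substitute raises KeyError if the environment file is missing a
    value, so an incomplete environment file fails at deployment time rather
    than producing a half-configured application.
    """
    return CONFIG_TEMPLATE.substitute(ENVIRONMENTS[environment])


print(render_config("production"))
```

The run book stays the same for every environment. Only the dictionary passed to the template changes.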

The environments and phases an application will traverse before being released to production can vary. Typically a “Quality Assurance Component Test” environment, which tests the application component in isolation and stubs out other components, is used as the first quality gate before the application is integrated with other components. When that gate has passed it automatically triggers the second quality gate, which deploys the application to a “Quality Assurance Integration Test” environment. Here the application is placed alongside other applications and a set of integration tests is run to check that the end-to-end integrations between the applications are working correctly. When that has passed the application may then be deployed to a third quality gate, the “Performance Test” environment, which typically mimics the storage that is used in production, but at a lesser scale, and runs a series of performance tests to make sure the new code change hasn’t introduced a performance degradation in the system. After continuous deployment to the automated test environments is complete, the release candidate awaits the manual button push to deploy it to production. To cut down time to market, some or all of these testing phases can be run in parallel, and other test phases may be introduced for security, compliance or exploratory manual testing while the release candidate is awaiting production deployment. Later, when the test gates have been hardened enough and confidence has been built up in the processes, the manual button push before the production deployment can be removed and a company can implement continuous deployment instead of continuous delivery.
Continuous deployment means that once the performance testing quality gate has completed successfully, it automatically triggers a deployment to production. The code check-in that started continuous integration is in effect the production deployment trigger, rather than a manual button press from a user after all test cycles have successfully completed.
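The difference between continuous delivery and continuous deployment comes down to one decision at the end of the pipeline. The sketch below is my own simplification in Python. The function name `ready_for_production` and the gate names are assumptions made for the example.

```python
from typing import Dict


def ready_for_production(
    gate_results: Dict[str, bool],
    continuous_deployment: bool,
    manual_approval: bool = False,
) -> bool:
    """Decide whether a release candidate may be deployed to production.

    Every quality gate must be green in both models. Under continuous
    delivery a person must also press the button; under continuous
    deployment the green gates are themselves the trigger.
    """
    if not gate_results or not all(gate_results.values()):
        return False
    return continuous_deployment or manual_approval


# Hypothetical gate results for one release candidate.
gates = {"component": True, "integration": True, "performance": True}

print(ready_for_production(gates, continuous_deployment=False))                        # False
print(ready_for_production(gates, continuous_deployment=False, manual_approval=True))  # True
print(ready_for_production(gates, continuous_deployment=True))                         # True
```

Removing the manual step is just flipping one flag, which is why it only makes sense once the gates themselves are trusted.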

So how can agile help continuous integration and delivery? The truth is it can facilitate interactions between teams, allowing teams to see what other teams are working on. This enables them to work in a pro-active rather than reactive fashion, promotes collaboration and forward planning, and eliminates team silos. It also helps to show where blockers reside across teams. Illustrating blockers can help with streamlining processes, and will help delivery teams only push a feature when everything has been lined up, so the automated processes don’t break. In industry we wouldn’t knowingly break a car manufacturing pipeline by pushing broken changes down it, so why should continuous delivery pipelines for software be any different? If we look at the software delivery life-cycle that continuous delivery or deployment helps facilitate, it encompasses contributions from teams from all parts of a business. In order for it to work correctly developers, quality assurance testers, performance testers, infrastructure teams, storage teams, hardware vendors, network engineers and security staff must all be involved in this process and be key stakeholders. The main concepts that agile processes promote are realistic planning and consideration of others’ work, so this should be a perfect fit for cross-functional teams.

First let’s look at the infrastructure team as an example of how agile processes can help deliver software to market quicker. In order to build or deploy code we first need infrastructure, so the infrastructure team will need to know how much compute is required in a private cloud and be able to measure how much capacity they will need over time. Once a request has been made to the infrastructure team they may need to coordinate with the hardware vendors and raise a purchase order, or talk with the storage team to make sure there is enough space in the existing racks for the request. Even if you are buying more capacity in AWS or some other public cloud, that means increasing the capacity quota, so public clouds don’t make this interaction go away. Someone still needs to look after and approve company spend. Public clouds may make the need for more capacity quicker to resolve, but that doesn't fix the broken process in the team interaction. Just because a problem takes less time to fix when using public cloud in a reactive manner doesn't make it any less of a process issue. When purchasing more kit for a private cloud, a procurement process or purchase order will need to be completed, which isn’t a quick process, as it has associated hardware delivery timelines.

So how can agile help with this process and quicken time to market? If we have a scrum team just full of developers, they will typically think about developing the application and focus on the coding aspect, which is natural, as it is their bread and butter. However, if someone from the infrastructure team was a part of that scrum team too, they would be able to ask questions from another perspective that may not be considered by developers. An infrastructure engineer is more likely to ask: if we add this new feature, will we need to scale out the infrastructure to support it? How much kit would you need for test environments? Would scaling out this application mean we need to scale out some of its dependencies too? This would help avoid the scenario where developers finish coding the new feature, check it into source control, it hits the continuous delivery pipeline and they realise they don’t have enough kit to test the feature or deliver it to production, so they have to send an email to the infrastructure team demanding new kit right away. This will only block the delivery of that feature to market, making everything wait on the slowest component, which will be the order time of the kit from the hardware vendor. This in turn will frustrate the infrastructure team, who will complain they weren’t notified early enough, while the developers, who just want their feature to go live, will complain about the infrastructure team being a blocker. It will also cause strain between the Dev and Ops sides of the business and potentially involve an escalation to management.

None of this is really necessary. To mitigate it, an infrastructure engineer could be part of the scrum team and involved in planning the feature, so while the feature is being developed they can line up the necessary compute kit. However, in large businesses it sometimes isn’t practical or feasible for infrastructure engineers to join all the development teams’ scrums and stand-ups. Instead a better use of time would be for scrum masters to train developers to consider infrastructure requirements from the outset in every user story. If more infrastructure is required they can assign a task on that feature’s user story to the infrastructure team, talk with the team and have them estimate the duration. The user story will then have a dependency on the infrastructure resource, who can carry out the necessary kit order for new compute while the feature is still being developed. This moves the business into pro-active mode as opposed to reactive mode. This way a user story and feature can’t be set to done until the necessary kit is available to deploy it to test environments or production. A user story is then “Done” in agile terms when it is ready for production deployment. The feature isn’t checked in to trunk until the kit has been made available for the test and production environments, because until then we don’t want to trigger the continuous delivery process and block the pipeline with something that isn’t ready to be delivered yet. This is a very simple solution that will keep everyone informed and make sure the time to market is as quick as possible, but it needs the buy-in from all the teams to work.
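One way to picture this definition of “Done” is as a user story whose tasks belong to different teams, where the story cannot be closed, and so should not be checked in to trunk, until every task is complete. The classes and example tasks below are a hypothetical model of my own in Python, not the data model of any real agile tracking tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """A unit of work on a user story, owned by one team (hypothetical model)."""
    description: str
    team: str
    done: bool = False


@dataclass
class UserStory:
    title: str
    tasks: List[Task] = field(default_factory=list)

    def is_done(self) -> bool:
        """A story is "Done" only when every task, from every team, is complete."""
        return bool(self.tasks) and all(task.done for task in self.tasks)

    def blockers(self) -> List[str]:
        """List outstanding tasks so blockers are visible to everyone."""
        return [f"{t.team}: {t.description}" for t in self.tasks if not t.done]


story = UserStory(
    "New search feature",
    [
        Task("Implement search endpoint", team="development", done=True),
        Task("Order compute for test and production", team="infrastructure"),
    ],
)

print(story.is_done())   # False
print(story.blockers())  # ['infrastructure: Order compute for test and production']
```

Here the code is finished but the story is not, so the outstanding kit order shows up as a visible blocker during the sprint rather than as a surprise in the pipeline.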

Agile can help quality assurance teams in a similar way by involving testers in sprint planning, instead of having a developer write a piece of code and then hand it off to a tester to test. This hand-over in waterfall processes is sometimes referred to as “throwing it over the wall” in technology. The typical scenario is that developers write code or features in isolation and check them in, the change hits the first quality assurance environment and the test gate fails with lots of quality assurance tests broken. This is because the quality assurance team’s test packs haven’t been adequately updated to cater for the new feature, as they didn’t know what was coming down the pipeline. This leads to lots of broken tests in the component and integration test packs, so the feature is blocked from being delivered to production until those tests are retrospectively fixed. Even more frustratingly for the developer, the new feature may even be operating correctly, but the quality assurance team need a green test run before they can sign off the release.

Operating in this manner means the quality assurance team don’t actually know which errors are genuine and which are caused by out-of-date test packs, and this completely compromises the value of the test and feedback loop. To mitigate this, quality assurance testers should work as part of the scrum team, so they can create tests while the developers are developing the new feature. As a by-product this also helps cross-skill developers to write automated tests for quality assurance test packs. When a user story is created, tasks to create the feature and the associated feature tests are created together, so testing is not an afterthought. This way the quality assurance testers are well informed of the new features and can adequately update their quality assurance test packs before the feature is checked in to the trunk branch in source control. A check-in to the trunk branch will trigger the continuous integration process and associated continuous delivery pipeline, so before this happens a quality assurance engineer will need time to remove any out-of-date regression tests or tweak them accordingly, which is all planned work as part of the sprint. The quality assurance tester will also iteratively run the feature tests while the feature is being developed by the developer, to make sure quality assurance is done prior to check-in, so once the feature is ready it only needs to pass all regression tests as part of the continuous delivery pipeline. This makes testing an integral piece of software delivery.

Agile can work the same way for network engineers, security engineers or any other team for that matter. Assigning tasks to network engineers as part of a sprint will involve them earlier in the process, and the same can be said of security. A group of security champions can monitor user stories that are being planned and raise any security concerns as part of the design process, as opposed to blocking changes going live once features have already been developed and being seen as giving developers more work. It is a common sense approach to involve every team you need up-front, as opposed to sitting in a silo and not discussing requirements and needs with other teams. Involving everyone in sprint planning, so that all requirements and considerations are raised at the start of the process, is integral for continuous delivery to be successful. This means additional work doesn’t need to be done in an unplanned fashion and all work is considered from the outset. This helps avoid delays delivering the product to market and minimises work in progress (WIP). All WIP is tracked as part of sprints and encapsulated by the scrum teams, and is therefore visible to the business. Any tasks not feeding into sprints are hidden WIP, which will slow down delivery of features to market. If it isn’t transparent, it can’t be streamlined and improved upon.

As humans we like to know what is going on and what is planned, and agile is a delivery pipeline in its own right, as it stores up all the WIP that is required to allow code to be pushed to live using continuous delivery pipelines. But it has to include everyone in a business or it just won’t work. Really the principles and process to deliver code to production are very simple when done correctly. People are complex and they make mistakes, but iterating these processes over time will make everyone consider the end-to-end process and become better at their job. If someone in technology cannot explain to you the development life-cycle process of the business, then how can they be part of any team to make it better and easier? That is the first step: go back to basics, think about what we are trying to achieve and throw away all the unnecessary bureaucratic processes that just don’t matter. If you don’t do this your competitor might, and then you won’t be able to compete, which is why lean processes are so important for companies to remain competitive. If two companies of equal ability are competing to deliver a new product to market and one has fully waterfall business processes while the other has truly agile business processes not confined to just the development teams, I know which one my money would be on to deliver the product to market faster. Agile and continuous delivery aren’t just for developers, they are for everyone. They are truthful and open, and that's why big businesses were scared of them for so long: they show what everyone is doing in detail, which can be scary at first for many. However, if your company isn't embracing agile and continuous delivery then it is time to start. It just makes everything so much easier, and to fix everything we need to be honest.

Thursday, 13 August 2015

DevSecCon London's Unique DevOps and Security Conference

When discussing DevOps processes such as “continuous delivery” and associating them with “security standards”, analogies normally spring to mind like Leonardo DiCaprio and an Oscar win, or Jon Snow and knowledge of everything, or for those well-read individuals Einstein's theory of gravity and quantum mechanics, or to the everyday person square pegs in round holes. They are two ideologies that are deemed not to mesh, or even an anti-pattern. But let’s investigate the reasons for this perception: why do people think this concept doesn't mesh and why do people deem it something that won't work?...

So what is the perception of security teams in the DevOps community? Ironically, the one thing development and operations staff will tend to agree on, if nothing else, is security teams and how irritating they are, with analogies of a mosquito that won’t go away often perpetuated. When discussing security teams with IT and playing the word association game, some of the common words you will hear will be “bureaucratic”, “blockers”, “process monkeys” or just plain “annoying”. This summation needs to be taken with a pinch of salt like any stereotype, but is this an unfair assessment? Does everyone really need to hate security teams? Security teams are surely no different from other teams, and a security team can be as poorly run and managed as the next team. Tin hat on, there are good security teams and security practitioners out there. I have met them and worked with them at Betfair and other companies I have worked for. I don’t believe for a minute anyone comes to work every morning to do a bad job, or sets out to annoy people by setting meaningless tasks. The security team have different priorities, such as passing audits, ISO accreditations, detecting vulnerabilities and preventing loss of data, to name a few, and most importantly keeping the company in business by proving the business is complying with the necessary rules and regulations. It is a very important role and not an easy one, given the security team’s priorities sometimes don't match the other teams’ priorities.

So if we flip the coin, how are DevOps practitioners viewed from a security team’s perspective when they implement processes such as continuous delivery? Security staff generally look upon IT staff with suspicion, as they believe IT staff don’t give a damn about security, do little to give them any visibility of what is going on and are hiding what they are doing. Typically security tooling is bolted onto existing processes as an afterthought, in an attempt to give the security team the information they need to do their job. Security staff often have to beg for information to gain visibility of what is going on, which eats up IT people’s time. Playing the same word association game, the common words associated with developers and operations teams by security staff will be “cowboys”, “unhelpful”, “uncaring”, “naive” or just “idiots” when talking in the context of appreciating security requirements. The issue is that IT staff have different priorities from the security team. They want to deliver a quality product to market as quickly as possible, maintain up-time of the platform and make changes in a repeatable, reproducible way, and the general consensus is that they don’t have time for a multitude of meetings around security.

Are continuous delivery and deployment something that cannot sit happily alongside security? Let’s take a step back and analyse what continuous delivery and deployment set out to do. The main reason for these processes is that they encourage IT staff to develop processes that are repeatable and reproducible, which in turn allows products to be delivered to market faster. Is this not exactly what security teams want: consistent processes that can be measured and are transparent and visible, so that they can audit those processes and continually improve them?

If we re-trace our steps as to why a DevOps philosophy was originally adopted the sole reason was not tooling, automation or anything else, it was in fact allowing developers and operations staff to collaborate daily and remove the “chucking it over the fence” mentality from daily routines. It was viewed that working in silos was no longer productive and engagement earlier on in the development life-cycle was beneficial for all parties. So by involving Operations staff in scrum meetings and properly assigning work rather than having a reactive model, this allowed operations staff to know what changes were coming, adequately plan any necessary infrastructure changes that were required, rather than finding out at the last minute and having a massive fight and associated finger pointing exercise which delayed the product reaching customers. It also worked exactly the same way with developers and QA testers, involving test teams in scrum teams to start writing tests while developers were writing code earlier in the development life-cycle, allowed testers to not have code “chucked over the fence” that they had no idea how to test. This cut out delays incurred by developers having to spend valuable time explaining to a tester what the feature was meant to do and how to test it and a tester explaining to a developer how something actually needed to be tested. Up front discussions and collaboration between teams helped solve these issues, instead teams could plan features and necessary testing as part of one scrum team. So to mitigate having issues in all these areas, we have learned that if we involve the stakeholders in sprint planning sessions then we have happier staff and more joined up processes.

So today we still have two silos, the IT team and the security team, so how can we solve this issue? The solution is already there and has been proven to work. That solution is early engagement and collaboration, which could be as simple as inviting your security team to be part of your scrum team or to attend stand-ups, so teams can build processes with security in mind from the outset and collaborate with security practitioners earlier on in the development cycle. This keeps everyone informed and happy, allowing teams to appreciate each other’s goals. Collaborative projects could be created that even go as far as including vulnerability detection or security steps in a quality gate on deployment pipelines, and sharing knowledge on how processes could be automated. The possibilities to improve this relationship and help security staff meet their goals are endless, and the winner will be the business.
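To give a flavour of what a security step in a quality gate might look like, here is a small Python sketch that fails the gate when a scan reports a finding at or above an agreed severity. The `Finding` class, the severity scale and the finding identifiers are placeholders I have made up. Real vulnerability scanners have their own report formats, and the threshold is something the development and security teams would agree together.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity scale, lowest to highest; real scanners may differ.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class Finding:
    """One vulnerability reported by a scan (hypothetical shape)."""
    identifier: str
    severity: str


def security_gate_passes(findings: List[Finding], fail_at: str = "high") -> bool:
    """Pass the gate only if no finding is at or above the agreed severity."""
    threshold = SEVERITY_ORDER.index(fail_at)
    return all(SEVERITY_ORDER.index(f.severity) < threshold for f in findings)


# Made-up scan results for one release candidate.
scan = [Finding("EXAMPLE-001", "medium"), Finding("EXAMPLE-002", "critical")]

print(security_gate_passes(scan))                        # False
print(security_gate_passes([Finding("EXAMPLE-001", "medium")]))  # True
```

Because the gate is just another phase with a pass or fail result, the security team gets the same transparent feedback loop as everyone else on the pipeline.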

This is why I am proud to be a part of a new conference in London called DevSecCon, which aims to promote DevOps and SecOps collaboration. This is a fairly niche conference that hasn’t been done before, so it should produce some very interesting talks. At DevSecCon we will discuss topics such as integrating vulnerability scanning into your continuous delivery process and adopting a mind-set of inclusiveness and collaboration with regard to security and DevOps processes. The mission statement is that in a continuous delivery process teams should have nothing to hide, so there is no reason not to include security and bring them into the circle of trust. So if you are not familiar with this topic, I urge you to come along or even take part: http://www.devseccon.com/

As well as the sponsorship from my company Betfair and the security sponsorship from Qualys and MWR, we have some really cutting edge companies that have really embraced a DevOps mind-set and pushed forward automation in their respective fields.

At DevSecCon in the networking space, Nuage Networks will be presenting their software defined networking and how it allows secure and automated networking in private and public clouds using its overlay technology, which gives both DevOps practitioners and security teams peace of mind that network changes can be made rapidly without compromising security standards. Arista will also be talking about how they have automated their top of rack switches, which can be racked and cabled in the datacenter and then set up using a fully automated process. Both companies are really setting an example of how to change the perception of their disciplines, when just a few years ago you would have shuddered at having networking and hardware vendors presenting at an event associated with promoting DevOps processes. So things can change, and one day we will look back at the time when security processes weren’t seamlessly integrated into continuous delivery processes and laugh, or potentially cry, at the stupidity of it and all the time we wasted. But today could be the day we change things for the better. It’s too important for your business to wait and play catch-up, so instead of fighting security we should be embracing it, as DevOps practitioners should really have nothing to hide. So let's help push through the next evolution in DevOps, which has security integrated from the outset, and let's not allow it to remain an afterthought.

Monday, 3 August 2015

Changing A Team’s Mentality And The Use Of Negative Language

Self-destructive behaviour within a team is probably one of the biggest inhibitors of people and teams reaching their full potential. In a nutshell the main loser is the employer. A bold statement, but fear stops people from trying something new and breeds a culture of uncertainty, causing teams to worry too much about making mistakes. But what happens if the team isn’t even aware they are doing it, and how can we break them out of this trend? It is a topic I have pondered recently. In the office I will rarely stick on a pair of headphones. I put this down to my natural tendency to be nosy, as I like to know what is going on at all times. I also love to people watch, and just listening to others’ conversations can help you appreciate why some problems are occurring in a business and also help you propose solutions. Through listening to office conversations, I have noticed a very common trait in teams that are deemed to not be as “high performing” as others, and that trait is the use of “negative language”. I don’t just mean the more obvious chestnuts such as “I don’t have time to look at this as I am too busy”, “not our team’s problem”, “not our priority” and constantly saying “no”, which are often associated with teams passing the buck, blame culture or not caring about the business as a whole. I have noticed a general tendency for those teams to use negative language in their daily operations within their team, while “higher performing” teams use it less and it isn't the norm. Is this a coincidence?

What do I mean by negative language? Teams that use negative language generally report that they have “multiple issues” or “problems” when reporting to management or senior management. This, in my mind, is overly focusing on the negative aspects of a problem. Such language suggests that the team has encountered an absolute blocker and that there are no solutions or alternatives. Or so it would seem. When prompted, these individuals or teams normally tend to have solutions or alternatives, but they aren’t focusing on this aspect. So why are those teams and people reporting issues in such a negative fashion, and why are they generally being very vocal when doing it? Do they not see it lays the foundations for a negative team culture? At times those reporting the issues also seem happy that they have discovered the said issue, even gleeful when they report it. This is the main point that baffled me: are they really happy something is failing? So it made me question a few things....

So let’s look at a scenario. A new joiner joins a company and sees their line manager reporting problems to senior management in a loud and vocal way. This manager is viewed as the benchmark or measuring stick, so it sets a precedent for that team’s behaviour, as the manager is viewed as the team's role model. The junior or mid-level team members view this as the normal way issues should be reported, as their role model behaves this way, and they want to climb the corporate ladder and one day be in a team lead or manager position too. So when junior or mid-level engineers discover issues they tend to be happy or gleeful that they can report this up to management, as they see it as adding value and warranting their place in the team. In truth they aren’t happy that something is failing, although it appears that way to outsiders. Instead they are happy because finding issues is considered “adding value” to the team by their management. Now this makes more sense....

Let’s take a step back and look at the effect of this type of behaviour and why it is actually destructive. Being so vocal about problems in reality makes other team members worry that if they create an issue or do something wrong in the team it will be loudly reported too. Psychologically this causes the team not to take risks and to play it safe, to avoid the indignity of having a mistake loudly reported to senior management. Immediately a negative mindset has been established in the team that “problems” are reported loudly and always focused on vigorously. This will hardly encourage a team member to experiment or try something new. Instead the individuals in the team will be scared of things going wrong and being the cause of an issue. Consequently, it does the polar opposite of what a “DevOps mindset” aims to do, which is to encourage team members to share their failures or things they have tried. This means issues are covered up in the team or hidden for fear of the embarrassment that failure would bring. It does little to encourage openness, knowledge sharing and transparency. Healthy teams should learn to talk about successes and failures in equal measure, as there is a lesson to be learnt from both, some would argue more so when discussing failures.

I am not for a moment suggesting we don’t talk or highlight “issues” or “problems” to management but when explaining to others it would be more productive to focus on “alternate solutions” or “potential solutions” rather than stop at “issues” or “problems” which make them sound insurmountable or something to worry about. Problems are commonplace in technology and should be treated as such, I haven’t met an insurmountable issue yet, there is always a solution to every issue even if some are not initially perfect. It is important for management to illustrate that teams will have issues and problems daily but they need to be dealt with, but do so in a productive and positive way. The use of negative language and apportioning blame to other teams by team leads or managers tends to trickle down to junior members of the team who inherit the same language and behaviour which is very damaging. The use of negative language tends to make teams very negative towards problems, fearing the worst, escalating them quickly with the tendency to panic under pressure. This in turn will cause the escalation of small problems up the management chain rather than remaining positive and calm and solving them without shouting them from the rooftops as this has been seen as a way of “adding value”. It also sometimes makes sister teams doubt the engineering capability of that team as they don’t view them as a team with any answers and a team that reacts badly to problems as this is the image they perpetuate.

Adding value should focus on fixing the problem and finding solutions rather than reporting it. I once had a manager who said to me that everyone's job is to make their direct manager's job as easy as possible, not more difficult, so escalating problems should be seen as the last resort. In practice escalating every issue up to senior management is not productive, and leads to a similar situation to the boy who cried wolf: when you have an actual issue that needs to be dealt with, it just seems like more white noise. Dealing with failure is what makes us all better engineers. Without problems or failures we cannot learn or improve, so they should be seen as a natural learning experience and not something to fear. I think a lot more can be done by team leads and senior engineers to correct the use of “negative language” or “blame culture” in their team on a daily basis. So replace every “problem” with “numerous potential solutions”, try to keep calm and set an example to your team.

One of the main virtues of a "DevOps mindset" is to operate without fear of failure, and this will give workers the freedom to be creative and try new things. It will also make for a better working environment and happier workers, which will in turn move your team into the “high performing” space. It will nurture new ideas, knowledge sharing and openness that will slowly help the team improve. Enough is enough, it’s time for a positive change, so the next time you are about to be negative in front of your team please think about the consequences. Instead be calm, try to put yourself in a new graduate’s position or another team’s shoes, and see things from a different viewpoint. Only by considering a different viewpoint can we understand why people are behaving in a particular way. Only when we have a full understanding of that viewpoint and the motivation for the behaviour can we start to help them change their ways.