Why Did We Choose OpenStack and SDN?
When we first embarked on building a new private cloud in 2015, we needed technology that would last at least five years, so we chose the Red Hat distribution of OpenStack and software-defined networking with Nuage Networks.
https://betsandbits.com/2016/10/25/openstack-reference-architecture/
OpenStack was chosen because we wanted API middleware to which we could connect networking, compute and storage and control them programmatically. This makes the private cloud solution vendor agnostic: if at a later date we want to introduce a new compute, storage or network vendor, we can substitute it in without having to write new automation orchestration each time.
Using OpenStack as our automation middleware meant that we could build our continuous delivery pipeline orchestration against the OpenStack APIs, keeping it compatible with any OpenStack distribution. The alternative was using vendor APIs directly, which could suddenly change in the next major release, meaning rework, or worse still, becoming stuck on older versions of software that would eventually go out of support.
For our automation pipeline orchestration we rely heavily on the OpenStack Shade library https://github.com/openstack-infra/shade/tree/master/shade and the Ansible OpenStack modules http://docs.ansible.com/ansible/list_of_cloud_modules.html#openstack
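To make this concrete, here is a minimal sketch of the kind of Shade call our pipelines build on; the cloud name, image, flavor and network are hypothetical placeholders rather than our actual configuration:

```python
# Minimal sketch: boot a VM through Shade rather than a vendor-specific API.
# The cloud name, image, flavor and network below are illustrative only.
import shade

# Credentials are resolved by name from a standard clouds.yaml file.
cloud = shade.openstack_cloud(cloud='private-cloud')

server = cloud.create_server(
    name='app-node-01',
    image='rhel7-base',      # hypothetical Glance image
    flavor='m1.medium',      # hypothetical Nova flavor
    network='app-network',   # hypothetical Neutron network
    wait=True,               # block until the VM reaches ACTIVE
)
print(server.status)
```

Because the same call works against any OpenStack distribution, the pipeline does not care which compute, storage or network vendor sits underneath.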

Software-defined networking was also deemed a must on the project, as it would help us simplify network operations and allow us to mutate the network in a completely programmatic fashion. At the time, the OpenStack Neutron component was not nearly as mature as it is today; as of the Kilo release it struggled to support OpenStack clouds at the scale of 650 compute nodes in a single cloud.
A good article on scaling the Neutron (Networking) service in the latest Newton release can be found here; it has come a long way since the Kilo release and now works at massive scale: https://t.co/ywe9cK4upC
https://www.youtube.com/watch?v=oqF6ezq3eWE
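To give a flavour of what mutating the network programmatically looks like in practice, here is a sketch, again using Shade, that stands up a network, subnet and router on demand; the names and CIDR are hypothetical:

```python
# Sketch: create an application network end-to-end, with no manual VLAN work.
# Names and the CIDR are illustrative; the SDN plug-in realises the resources.
import shade

cloud = shade.openstack_cloud(cloud='private-cloud')

net = cloud.create_network('app-network')
subnet = cloud.create_subnet(net['id'], cidr='10.42.0.0/24',
                             subnet_name='app-subnet')
router = cloud.create_router(name='app-router')
cloud.add_router_interface(router, subnet_id=subnet['id'])
```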
The Nuage VSP solution also meant that we could easily bridge back to our legacy network for application dependencies that had not yet been migrated into OpenStack. This was achieved by making the legacy network routable using the Nuage VSG hardware gateway, which meant we didn't have to fiddle with individual VLANs each time an application needed to talk to the native network. Application access permissions are locked down using Nuage ACL ingress and egress firewall policies on a least-privilege basis, meaning each application is set up with the minimum set of ACL policies it needs to operate.
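From the OpenStack side, one way to express this least-privilege model is through Neutron security groups, which the Nuage plug-in can translate into VSP ACL entries; the group name, port and source range below are hypothetical:

```python
# Sketch: a least-privilege policy expressed as a Neutron security group.
# With the Nuage plug-in such rules can be realised as VSP ingress ACLs.
import shade

cloud = shade.openstack_cloud(cloud='private-cloud')

sg = cloud.create_security_group(
    'app-frontend', 'Only the access the application actually needs')

# Allow only the application port, and only from the load balancer subnet.
cloud.create_security_group_rule(
    sg['id'],
    direction='ingress',
    protocol='tcp',
    port_range_min=8080,
    port_range_max=8080,
    remote_ip_prefix='10.10.0.0/24',  # hypothetical load balancer subnet
)
```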
What Challenges Have We Faced With OpenStack or Nuage VSP SDN?
One of the initial challenges we had was updating the OpenStack installer. Red Hat had recently gone from using the Foreman installer in the Juno release to the new Red Hat OpenStack Installer (OSP Director) in the Kilo release, a hardened version of the upstream RDO/TripleO project, so we needed to work out how we would integrate Nuage into this installer.
The Red Hat OpenStack Installer uses Heat (OpenStack's orchestration service) templates to install OpenStack, but at the time these only worked for installing pure OpenStack services, not SDN plug-ins, so initially a lot of development work had to go into integrating Nuage into the installer.
Nuage have since integrated these features https://github.com/dttocs/nuage-ospdirector/wiki which will save users from having to do the custom work we had to do when we started our OpenStack implementation. It is important to bear in mind that any service needs a Heat template before it can be installed with OpenStack, as the installer is a full life-cycle management tool. If a service isn't integrated in a Heat template it will be overwritten the next time the installer is run. There is no room for manual tweaks in this world, or you will come unstuck as the infrastructure is scaled out.
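To make the idea concrete, here is a generic sketch of what declaring a resource in a Heat (HOT) template looks like, driven through Shade; this is an ordinary stack for illustration, not one of the installer's own overcloud templates, and the image, flavor and network names are hypothetical:

```python
# Sketch: anything the installer manages must be declared in a Heat template.
# This generic HOT stack declares a single server; OSP Director's templates
# use the same mechanism at a far larger scale.
import shade

TEMPLATE = """
heat_template_version: 2015-04-30
description: Minimal illustrative stack
resources:
  demo_server:
    type: OS::Nova::Server
    properties:
      image: rhel7-base         # hypothetical image
      flavor: m1.small          # hypothetical flavor
      networks:
        - network: app-network  # hypothetical network
"""

with open('/tmp/minimal-stack.yaml', 'w') as f:
    f.write(TEMPLATE)

cloud = shade.openstack_cloud(cloud='private-cloud')
stack = cloud.create_stack('demo-stack',
                           template_file='/tmp/minimal-stack.yaml',
                           wait=True)
```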
Another feature we required was support for the CLOS leaf-spine architecture provided by our Arista switches, making the solution rack aware. This customisation is now going into the upstream OpenStack project https://review.openstack.org/#/c/377088/
This will make our OpenStack upgrades far easier, as the Nuage plug-in and leaf-spine support are integrated in the installer without the need for bespoke customisations every time a new Red Hat distribution is released.
A real game changer has been our partnership with Red Hat, who invited us to be part of their high-touch programme. This has allowed us to collaborate with Red Hat on a monthly basis on features we would like to see in OpenStack. The CLOS leaf-spine support is one such feature developed as a result of our feedback: Red Hat created a blueprint to implement the features we required in the TripleO/RDO/Red Hat OpenStack Installer so it can support modern networking needs.
Most of the day-to-day issues we hit were around default timeouts and limits under heavy concurrent API load. I would advise users to understand how the API calls work and how they traverse each service, so that when an issue is encountered they can track it via the OpenStack request ID in the logs.
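Every OpenStack API call is stamped with a request ID of the form req-<uuid>, which each service echoes into its logs. A crude but effective way to follow one call across services is simply to search each service's logs for that ID; the log paths below are typical defaults and may differ per deployment:

```python
# Sketch: follow a single OpenStack request ID across Nova and Neutron logs.
# Log locations are typical defaults; adjust them for your deployment.
import glob

REQUEST_ID = 'req-3f5e41b2-0000-0000-0000-000000000000'  # hypothetical ID

log_files = (glob.glob('/var/log/nova/*.log') +
             glob.glob('/var/log/neutron/*.log'))
for path in log_files:
    with open(path, errors='replace') as log:
        for line in log:
            if REQUEST_ID in line:
                print('%s: %s' % (path, line.rstrip()))
```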
When increasing timeouts on Nova and Neutron it is important to make sure the Nova, Neutron and HAProxy configurations are all raised to sensible, consistent values. Another default value to look out for is the file descriptor limit on RabbitMQ; in OpenStack this was exceptionally low, meaning that once multiple concurrent API calls came in it hit the limit and maxed out. Upping these settings was really very easy to do once we found out where to look; the sketch below shows one way to check the current limit. I do think OpenStack would really benefit from more sensible default settings, and I know it is something being discussed in the OpenStack community.
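As an illustration, here is a quick way to compare a RabbitMQ process's open file descriptors against its soft limit via /proc; matching on the Erlang VM's beam process name is an assumption about how the broker shows up in the process table:

```python
# Sketch: compare RabbitMQ's (beam.smp) open FDs against its soft limit.
# Requires enough privilege to read another user's /proc entries.
import os

for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open('/proc/%s/comm' % pid) as f:
            if 'beam' not in f.read():
                continue
        open_fds = len(os.listdir('/proc/%s/fd' % pid))
        with open('/proc/%s/limits' % pid) as f:
            for line in f:
                if line.startswith('Max open files'):
                    print('pid %s: %s open FDs, soft limit %s'
                          % (pid, open_fds, line.split()[3]))
    except (IOError, OSError):
        continue  # process exited or access denied; skip it
```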
We also hit some minor bugs with the Pacemaker and RabbitMQ versions in the Kilo release, which were fixed by moving to the latest versions that were already available. These are all minor niggles that are natural when dealing with any new technology, so there was really nothing out of the ordinary on this front, and no show-stoppers.
The other main challenge we encountered with Nuage VSP was the sheer volume of our Nuage API calls: we were spinning up over 1,000 virtual machines a day with a huge number of ACLs, which was very CPU intensive on the Nuage VSDs, so we had to move to the Nuage 3.2 R10 release. But Nuage, like Red Hat through our partnership programme, analysed how we were using the platform, replicated our set-up in their lab and made some great optimisations for us, ensuring that as we scaled out the platform all requests for new virtual machines continued to become active within a few seconds, as opposed to waiting 30 seconds during busy periods. It really does pay to choose your partners with care; I haven't seen this degree of support from many vendors I have worked with over the years.
What Are the Quantifiable Benefits of OpenStack and Nuage VSP SDN?
So what are the quantifiable benefits of using OpenStack and Nuage VSP SDN together? In our first year of running OpenStack in production, taking this approach, we have achieved the following:
- Developers can now self-serve the on-boarding of applications and receive compute, networking and storage on demand
- 82 production microservice applications have been migrated onto the OpenStack platform so far using our automated continuous delivery pipelines and are live in production
- We do on average 500 deploys a day to the test and production environments on OpenStack
- We provision over 1,100 virtual machines each day across the two OpenStack clouds (one in each of our datacentres), as all our virtual machines are immutable
- We deployed 50 hypervisors using the Red Hat OpenStack Installer across the two OpenStack clouds in a single business day
- We peaked at 650 deployments and 2,000+ virtual machines provisioned in a single day prior to Christmas
- We now do on average 220 production releases a week on OpenStack
- We currently have 2,207 active virtual machines deployed in OpenStack, which is just % of the end estate for the newly merged Paddy Power Betfair company
- We now run 120 KVM hypervisors per datacentre (240 in total), with an end state of 1,300 KVM hypervisors
- We are currently running 17,280 cores on OpenStack with 384 terabytes of storage (the end state is 100,000 cores and 2.08 petabytes of storage)
All in all, not a bad first year running OpenStack in production, with some pretty impressive landmarks. In the new year we will move to the Ocata release of OpenStack, and we are pretty excited about implementing the Manila and Ironic projects to offer managed NFS and bare-metal provisioning.
Shameless Book Plug
I have also written a book called DevOps for Networking, which looks at some of the techniques that can be applied to automate the data centre, and networking in particular. It also shows how to approach building a DevOps model at a company and how to encourage network teams to automate their jobs. It covers topics such as continuous integration and continuous deployment, along with OpenStack, Nuage, AWS and Ansible. It can be purchased from Packt's website:
https://www.packtpub.com/networking-and-servers/devops-networking
or alternatively on Amazon:
https://www.amazon.co.uk/DevOps-Networking-Steven-Armstrong/dp/1786464853/ref=tmm_pap_swatch_0?_encoding=UTF8&qid=&sr=