Troubleshooting best practices for DevOps teams – strategies to recover quickly from downtime


It’s Saturday night – your system is down, customers can’t access the application any more, and your key developers are out of reach. Sounds like a rather uncomfortable situation. Read here what you can do to prepare for such events – and recover as quickly as possible from outages and downtime.

The usual steps to remedy problems are clear: understand the issue, fix the root cause. Sounds very straightforward. However, what if the person on call is not the experienced developer and doesn’t know right away what to do? In DevOps teams with shared responsibility and distributed 24/7 support that may happen sooner or later. To be on the safe side you need an approach that enables the people on call to remedy the most common problems without deep expert knowledge. How to prepare for that? Here are some best practices.

(1) Know the usual suspects

Chances are that this is not the first time the service has failed. That may be due to some known but not yet fixed problem, or due to a dependency on a service outside of the team’s control. Such potential “known” issues should be documented prominently, along with step-by-step instructions on how to get up and running again. Ideally this should be part of the ‘troubleshooting’ section of your runbook (see below).

(2) Provide quick diagnostics via monitoring boards

A good monitoring board is the starting point for efficient troubleshooting. Each service should have its ‘health’ board where the status of each major component is displayed, e.g. via green/red status panels. Make sure the overall situation can be perceived at a glance. Where finer-grained information is available, make it accessible as a drill-down from these panels. An example: for service latency you can use time series plots over the last few hours. For such plots it may be helpful to draw horizontal lines within the chart to indicate the ‘normal’ range of the value.
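
As a toy illustration of such a ‘normal’ range check, a panel could classify the latest latency reading like this (the threshold values are invented for illustration):

```python
# Toy sketch: classify a latency reading against an assumed 'normal' band.
# The threshold values are invented for illustration.

def panel_status(latency_ms, warn_ms=250.0, crit_ms=500.0):
    """Return the panel color for a single latency reading."""
    if latency_ms <= warn_ms:
        return "green"   # within the normal range
    if latency_ms <= crit_ms:
        return "yellow"  # above normal - keep an eye on it
    return "red"         # clearly out of range - investigate

print(panel_status(120.0))  # green
print(panel_status(900.0))  # red
```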

Monitoring Board

The board should also show the status of each required external service. This will immediately indicate whether your own service or some external dependency – e.g. a database – is the cause of the problem.

Building good monitoring boards takes time and effort. For each component you need to come up with a reliable status test. However, the work will pay off sooner or later. Running production systems without such monitors is like driving a car at night without the lights on.
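
To make the idea concrete, here is a minimal sketch of how individual status tests could roll up into one board status; the component names and checks are invented for illustration:

```python
# Toy sketch: roll individual component checks up into one board status.
# Each check is a callable returning True (healthy) or False.

def board_status(checks):
    """Run all checks and return (overall_ok, per_component_results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:          # a failing probe counts as 'red'
            results[name] = False
    return all(results.values()), results

checks = {
    "api": lambda: True,
    "worker": lambda: True,
    "database (external)": lambda: False,   # simulated outage
}
ok, details = board_status(checks)
print(ok)  # False -> the board shows red
print(details)
```

In a real board each lambda would be replaced by an actual probe, e.g. an HTTP call against a health endpoint.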

Grafana is a widely used open source tool for building such boards. There are also lots of other tools, including commercial systems that automatically take care of code instrumentation for health and latency measurement.

(3) Set up symptom based fix procedures

This is the most underrated approach to speed up system recovery. It will take some time and effort to prepare but will most likely provide good learnings for the team and put you in a much better position if problems occur. How does it work?

As engineers we are used to starting our reasoning about system behaviour from the viewpoint of individual components:

“if the database index is corrupt => service xyz will have high latency”

However, in an outage situation such information is not very helpful, especially for non-experts. The people on call will not see the database index problem – they will see the high service latency. And they want to know what to do about it. So let’s analyse the system and set up instructions that start from exactly such observable symptoms. This is how it may look:

“high latency of service xyz => may be caused by an overloaded database”

symptom-cause-fix

Imagine your 24/7 support had a complete list of possible system problems (symptoms) – and for each of them a corresponding fix procedure. Troubleshooting would be a lot easier. Of course there may be more than one potential root cause. Or additional checks may be required to find out which of the possible causes is the culprit. Here’s how to do this analysis in a systematic way. For best outcomes do it with the entire team:

Phase 1: Problem brainstorming

  • Brainstorm possible system problems and failure symptoms
  • Ask yourself: what can go wrong and how would that become visible in system behaviour?
  • Try to make this list as exhaustive as possible

Phase 2: Assign root causes and verification checks

  • For each symptom list the possible root causes
  • If required or useful: add instructions to verify or exclude the suspected root cause – these are the verification checks

Phase 3: Write down fix procedures

  • For each root cause write down the required steps to bring the system back up to normal operation
  • If possible include verification instructions – how would you check that the procedure solved the problem?

Congratulations: you just created a troubleshooting guideline : )
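
The outcome of the three phases can be captured as plain data: each observable symptom maps to possible root causes, each with a verification check and a fix procedure. A minimal sketch (all entries invented for illustration):

```python
# Toy sketch of a troubleshooting guideline as data. Each symptom maps to
# candidate root causes with verification checks and fix steps.
GUIDELINE = {
    "high latency of service xyz": [
        {
            "cause": "overloaded database",
            "verify": "check DB CPU and connection count on the board",
            "fix": ["scale up the DB instance", "re-run the smoke test"],
        },
        {
            "cause": "corrupt database index",
            "verify": "run the index consistency check",
            "fix": ["rebuild the index", "re-run the smoke test"],
        },
    ],
}

def lookup(symptom):
    """Return the candidate causes and fixes for an observed symptom."""
    return GUIDELINE.get(symptom, [])

for candidate in lookup("high latency of service xyz"):
    print(candidate["cause"])
```

Whether this lives as code, a wiki table or a markdown file matters less than the structure: start from the symptom, not from the component.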

Do this exercise with the team, and repeat it every few weeks or months to make it more complete over time – and to adapt it to modified system behaviour or new features. The troubleshooting guideline is also an essential part of the fourth best practice:

(4) Keep a runbook

Set up and maintain a runbook for each of your applications and services. The runbook contains the basic operational data for the service:

  • Name and short description
  • SLA (service level agreement or target)
  • Involved team members
  • Involved artefacts and libraries – and corresponding links to the repositories
  • Consumed external services (dependencies)
  • Build and deployment approach
  • Smoke tests (how would you quickly verify that the service is up and running?)
  • Monitoring KPIs and strategies
  • Troubleshooting guideline (see above)
  • …and everything else that may be helpful
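
Parts of the runbook can even live next to the code as structured data – e.g. the smoke test as something executable. A toy sketch with invented placeholder values:

```python
# Toy sketch: a runbook kept as structured data next to the code, with a
# callable smoke test. All field values are invented placeholders.

def smoke_test():
    """Minimal check: would normally call the service's health endpoint."""
    return True  # replaced by a real HTTP check in practice

RUNBOOK = {
    "name": "order-service",
    "description": "Accepts and tracks customer orders",
    "sla": "99.9% monthly availability",
    "dependencies": ["payment-gateway", "customer-db"],
    "smoke_test": smoke_test,
}

print(RUNBOOK["smoke_test"]())  # True -> service considered up
```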

Keep the runbook up to date – and make sure it is easily accessible for whoever may need the related information.

And how about logging?

Logs are important, no doubt about that. However, you should not rely only on logs to find out about your system’s health status. Set up monitoring boards for that purpose. And have your logs ready and easily accessible for verification checks – or for situations where approaches 1–3 did not help and you need to dive one level deeper.

Fast software release cycles – how to avoid accidents at high speed

Why are fast release cycles so important for software development – and what strategies can help to avoid accidents while the team is producing at high speed?

Blur – fast software releases
Photo by chuttersnap on Unsplash

Fast release cycles create customer value

The goal of every software development team should be to deliver new functionality to the users as soon as possible. Why? Finished software that sits on the shelf waiting for the next release is not usable. It is incomplete work, wasted effort and money. To add value you need to put that shiny new feature into the hands of the customer. Only then do the new features make a difference in the real world. This means your software is only complete after release and deployment. The entire process from development and testing to deployment needs to be optimized for speed.

Fast release cycles enable flexibility

Or think about a situation where your tests have discovered a security problem in your software. Now you need to be able to fix it quickly. Or you may need to adapt to a breaking change in some other consumed service that is not even in your own hands. Things happen, and in cloud world you need to be flexible and able to adapt quickly. Once again – you need to be able to fix fast, but this only helps if you are also fast at testing and deployment. However, nobody wants to be reckless. Jump and see how it goes? You want to be sure that your fix works.

Fast release cycles - but no reckless jump into the unknown
Photo by Victor Rodriguez on Unsplash

Why incremental changes are your friend

The good news is that you won’t change the entire product from one day to the next. If planned accordingly, the team can break down the work into small steps. Ideally these can be tested individually to get immediate feedback. It works? Great. There’s a new problem? Ok, we should know pretty well where it comes from since only a small number of changes occurred since the last good version. And the developers will have these changes fresh in their minds. Fixing the problem should be much easier compared to yesterday’s approach where many changes came to test a long time after implementation – and all at the same time. So let’s assume the new small change is implemented and tested separately.

Incremental step wise changes help to shorten release cycles
Photo by Lindsay Henwood on Unsplash

The next and final step is to deploy this incremental change and we’re done? Sounds too good to be true, and indeed… How can you assure that the small change didn’t cause any side effects and break something within the existing overall system? Such a breakage is called a regression.

The new bottleneck: regression testing

So you need to test for regressions. And this basically means that you need an overall test of the entire system, which often is a huge effort. If you want to be on the safe side you will have to repeat this exercise over and over again for each small incremental change. Now if such an overall test took days or weeks it would kill the nice-and-small incremental approach. It would just be too slow and too expensive.

Software test lab
Photo by Ani Kolleshi on Unsplash

The only way out of this dilemma is…

Test automation – the enabler for high speed releases

Imagine a setup where you could prove with a click on a button that your software is doing what it is supposed to do. That today’s changes did not introduce any regression. Test automation aims at achieving exactly this. Manually clicking through an application may still have its place within an overall testing concept. But in no way should this be your normal approach. Put the test procedures in code and execute them automatically. This is what enables quick feedback on code changes – and therefore fast release cycles. This automated approach has the added benefit of repeatability – and of test status reports that the test framework will create automatically if set up accordingly.
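
As a minimal illustration of putting test procedures in code, a regression test pins the expected behaviour down with assertions, so any change that breaks it is caught at the push of a button. The function and values here are invented:

```python
# Toy sketch of an automated regression test for a small pricing function.
# Function name and test values are invented for illustration.

def order_total(items, discount=0.0):
    """Sum item prices and apply a fractional discount."""
    subtotal = sum(price for _, price in items)
    return round(subtotal * (1.0 - discount), 2)

def test_order_total():
    items = [("book", 20.0), ("pen", 2.5)]
    assert order_total(items) == 22.5                  # no discount
    assert order_total(items, discount=0.1) == 20.25   # 10% off
    assert order_total([]) == 0.0                      # empty order

test_order_total()
print("all regression tests passed")
```

In practice a test runner such as pytest would discover and execute such tests automatically on every change.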

Does this mean that testers are not required any more? Not at all – rather the opposite is true. Test automation won’t save cost – this is about saving time and improving quality. Or in other words: about avoiding regressions although the team is going at high speed. This is where testers play a key role. However, with test automation the tester’s focus and know-how profile change completely. Yesterday testing meant manually executing test procedures over and over again. Today it means development and coding of tests. Testers are becoming developers – or developers take more and more responsibility for testing. Welcome once more to DevOps world. (more here).

Fast software release – mission accomplished?

So let’s assume the team works with incremental changes. You have automation in place to quickly test the changes for functionality and regressions. We are good to go – now what needs to happen to put the new version into production – into the hands of the users? This will be covered in the next article about deployment automation. Stay tuned.

Cattle or Pet – what IaC means and why you shouldn’t use admin UIs

Why manually installed servers are like pets

Before we look at new approaches, let’s see how IT infrastructure was managed in the past. The life of any IT system usually started with basic server setup. Joe the admin would plug in the new hardware, configure hard drives and network and then install the operating system – and on top of that whatever software or applications were required. He would do that manually, via scripts that got adapted to the new infrastructure because names, IP addresses etc. had to be changed for each new system. Then Joe would check if everything worked, maybe fine-tune and add whatever was required before putting the shiny new machine into production. In case of a problem he would troubleshoot it and correct his setup. And over time Joe would take care of his machine: patch it with newer versions of the OS and system software, look after backups, maybe extend disks or memory. The server would be like Joe’s dog – a pet.

Pets are unique

If Joe’s pet had a problem, Joe would find out the cause and fix it. Maybe he would have to experiment with one or two settings, look left and right. But in the end it always worked. The machine, however, would get more unique over time – more unlike any other server in the world. A pet. Much needed to happen before Joe would set up his beloved server again from scratch. A disaster, like maybe a virus. Then poor Joe would have to go through all his setup steps again, trying not to forget anything.

Fast forward to cloud world. Remember, you don’t own the hardware any more? You basically just rent it. Or you don’t even rent the hardware but just consume services (see: IaaS vs. SaaS). In no way can you continue to handle your infrastructure as Joe did. Well –

You can do that, but then the sky shall fall onto your head!

Why? First of all, you probably need to set up your cloud infrastructure more than once. You will need a production system, but you won’t use that for testing during development. So you need another environment for development. Or you may need to set up your entire infrastructure in another region, or with another provider. Or one day you may want to experiment with a new approach or run separate tests in parallel – yet another system required. It is crucial that all these systems have the exact same configuration. Otherwise be prepared for big surprises during release…

Cloud infrastructure is cattle

The only reasonable way to handle this is by automated infrastructure setup. Don’t go down Joe’s road. He could do it manually and survive because he owned the hardware – in cloud world you don’t. Yours is more like this:

A herd of cattle. Your machines and all other cloud building blocks are standardized, and there are potentially many available. They don’t have their own personality – at least they should not. If one disappears some other will take its place. This means you need to be prepared to replace it quickly, and this is where automated infrastructure setup comes in. You will have scripts to set it up and to configure it without ever touching an admin UI. Your infrastructure becomes code. Infrastructure as Code: IaC.

What infrastructure as code means

People often use the term “cattle vs. pets” when talking about IaC. This goes hand in hand with the “immutable server” concept: you will never change infrastructure configuration once it is in place. Consider it immutable. Instead, fix your IaC code and run it again to set up a new machine. Delete the old one. This is fast and reliable and always gets you to the exact same state – 100% guaranteed. You can do this as often as required, and in the end you have a piece of (IaC) code that is tested and can be reused whenever required later on. You will probably have parameters that are specific to a certain environment, e.g. the URLs for your development and your production system. Make sure to keep them separate from your IaC code. You want to run the identical code for each variant of your system – this is how you ensure they are all identical.

IaC also solves another challenge: manual changes to your production system. Consider these a “high risk” activity – don’t do it. Instead, pre-test and run your updated IaC code. This will reduce the chances of human error, ensure reproducible results and at the same time provide full traceability of production changes.
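
The ‘identical code, separate parameters’ idea can be sketched in a few lines. All resource fields and environment values below are invented placeholders – real IaC would use a tool such as Terraform or CloudFormation:

```python
# Toy sketch: one function describes the infrastructure, per-environment
# values live in their own dictionaries. All names are invented.

ENVIRONMENTS = {
    "dev":  {"env": "dev",  "instance_size": "small", "url": "dev.example.com"},
    "prod": {"env": "prod", "instance_size": "large", "url": "www.example.com"},
}

def render_stack(params):
    """Produce the (toy) resource description from environment parameters."""
    return {
        "app_server": {"size": params["instance_size"]},
        "dns_record": {"host": params["url"]},
        "tags": {"environment": params["env"]},
    }

# The same code runs for every environment - only the parameters differ.
dev_stack = render_stack(ENVIRONMENTS["dev"])
prod_stack = render_stack(ENVIRONMENTS["prod"])
print(dev_stack["app_server"]["size"])   # small
print(prod_stack["app_server"]["size"])  # large
```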

How to write IaC code

How would you manage and run this IaC code? Since it’s code you will have it under version control, and you should execute it via a build pipeline – see this article on tooling. And how exactly would you write the IaC code? There are several approaches. The large cloud providers each have their own standards, e.g. CloudFormation for AWS or ARM Templates for Azure. Or take a look at Terraform, which has some nice additional features. If you need to automate the setup of individual virtual machines (which you should avoid – consider serverless instead) then Puppet, Chef or Ansible are probably the most popular options.



IaaS vs. SaaS – why the difference is very relevant for cloud software

Find out what the difference between IaaS (Infrastructure as a Service) and SaaS (Software as a Service) is – and what this has to do with cloud software architecture.

Lift and shift?

You may have heard of the term lift and shift – which means you take some existing system or application that has been living happily in a data center for years, rip it out and move it over to some cloud provider. Usually the idea behind this is saving on data center infrastructure and management cost. And usually lift & shift means that you spin up the required number of virtual machines in the cloud and reconfigure their connections and backup settings. Sounds simple, but is this giving you the benefits that true cloud solutions could provide? Most likely not. VMs are great, and they have revolutionized app and service provisioning. They have decoupled the 1:1 relationship between software and hardware and allowed for easy sharing of servers between different systems. And they are still going to be around for a long time. Using VMs from some cloud provider means consuming Infrastructure as a Service (IaaS).

However, we are in 2019, and VMs are not the latest-and-greatest technology any more – rather look at SaaS (Software as a Service) as the new mainstream. What does that mean?

Assume you need a database to store your customer and sales data. In ancient times you would have bought a server and installed your operating system and database software. You would have planned for regular updates of your systems to cater for security patches and bug fixes. If your server broke down you were in trouble and hopefully had a disaster recovery plan that allowed for fast re-installation once the hardware was fixed or replaced. Then the VM concept entered the arena. You could just back up your entire server to a single file and move it to another machine if required. And the virtual machine concept enabled better use of existing hardware by running several VMs on the same physical machine – a huge step forward.

The IaaS approach

With the cloud you can now use the exact same technology without owning any physical hardware yourself. Just sign up with a cloud provider and book as many cloud based VMs as you need – with size and performance as required. This approach is called IaaS – Infrastructure as a Service. But it still leaves you with maintenance work for the VMs’ operating system and database. You still need to maintain and manage those setup procedures and maybe fine-tune your database system parameters. If you need high availability you’ll have to set up several VMs and manage the cluster. And if your database is sitting idle because there’s not much activity during non-office hours, you still pay for every single minute all your VMs are up and running. You could do better: don’t buy the VM for the database – buy the database service itself! Skip that middle layer. The cloud providers offer a broad range of database services – from classical relational DBs to NoSQL and caches, all fully managed, with high availability and data backup available via rather simple configuration options.

The SaaS approach

This is the SaaS approach – software as a service. It will be much simpler to set up and maintain, and it will typically be more cost efficient. And easier to scale up if required. Can you do this for existing legacy software that you can’t or don’t want to touch? Probably not, at least not if your legacy software requires a specific version of database xyz with specific settings and configuration. Can you go that way for a new development? Yes of course – pick the most suitable option for your use case, and don’t forget to take a look at resource pricing before you make your choice. Your system architecture and your selection of cloud components as building blocks will have a huge impact on your future operation cost.

For many use cases the SaaS approach will be more interesting. Go that way if you can. Especially for new developments consider using SaaS over IaaS approaches, and if possible serverless over containers. The ‘lift & shift’ approach for existing applications could for many companies be the first step towards cloud based IT. However don’t stop there – at least you may want to investigate a more in-depth approach over time, where the existing application is restructured and optimized to leverage the cloud capabilities.



Cloud Native? What it means to develop software for the cloud

What is cloud native – and what does it mean to develop software for the cloud? Find out what makes a software team’s life very different in cloud world. What is so special about cloud software?

Is there anything special at all? If ‘cloud’ only meant booking a virtual server and installing the application there – then the answer would be no. However, this would only leverage a small part of what is possible. Hidden within the term “cloud native” is the assumption that you want to realize on-demand scalability, flexible update capabilities and low administration overhead, and be able to adapt infrastructure cost dynamically to performance demands – just to name a few. Sounds wonderful… however these things won’t come all by themselves. Your software needs to be designed, built and deployed according to…

The cloud-rules-of-the-game

Huh, what’s this? First of all, don’t get me wrong: there is nothing bad about having a monolithic application residing on a single server or cluster – and from a development perspective this may be the simplest way to realize it. However, this monolith may get difficult to extend over time without side effects, it may be difficult to set it up for high availability and to scale it with a growing number of users, data, or whatever may be the reason for your scaling needs. And it may be hard to update your monolith to a newer version without impacting the users during the update.

Software monolith vs microservices

Now consider the same application based on what is called a microservice architecture: the monolith gets split up into smaller, decoupled services that interact with each other as providers and/or consumers of stable APIs.

What makes microservices well suited for cloud applications?

Let’s assume that each service may exist not only once but with multiple instances up and running simultaneously. And let’s assume the consuming services are fault tolerant and can handle situations where a provider service doesn’t respond to a call. Wouldn’t that be cool? The overall system would be robust, and it would be very well suited to run on cloud infrastructure.

  • Because now if service xyz is starting to become a bottleneck you can simply create more instances of that service to handle the extra load. Or even better, this would happen automatically according to rules that you have configured up-front. This approach is called “scaling out” (compared to the old-school “scale-up” approach where you would get a bigger server to handle more load).
  • Next up imagine that you need to update service xyz to a newer version. One way of doing this would be to create additional service instances with the new version and remove the old ones over time, an approach called “rolling update“.
  • Or you decide to add a new feature to your application. Since only 2 of your 5 services are impacted you will only need to update these 2. Less change is easier to handle and means less risk.

Getting microservice architecture right is not easy, but once you have it the advantages are huge. Note that microservice architecture as such has nothing to do with the cloud. However, both go very well together, because in a cloud environment you need to be prepared for micro-outages of single services anyway. You don’t really control the hardware any more – at least you shouldn’t want to. In order to be cost efficient with a cloud approach, each service should be able to run on commodity infrastructure. From that, take and pay for as much as required.
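
The fault tolerance assumed above – consumers that survive a non-responding provider – can be sketched as a simple retry-with-fallback pattern. The flaky provider below is simulated, and all names are invented:

```python
# Toy sketch of a fault-tolerant consumer: retry a provider call a few
# times, then degrade gracefully to a fallback instead of failing outright.

def call_with_retry(provider, retries=3, fallback=None):
    """Call provider(); retry on failure, return fallback if all fail."""
    for _ in range(retries):
        try:
            return provider()
        except ConnectionError:
            continue  # provider instance may be restarting - try again
    return fallback

attempts = {"n": 0}
def flaky_provider():
    """Simulated provider that only answers on the third call."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("instance not responding")
    return "fresh data"

def always_down():
    raise ConnectionError("provider is down")

print(call_with_retry(flaky_provider))                       # fresh data
print(call_with_retry(always_down, fallback="cached data"))  # cached data
```

Real systems add timeouts, backoff and circuit breakers on top, but the principle is the same.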

Where will your services live? – serverless vs. containers

For hosting your workers or compute loads, consider a serverless approach over a VM or container based one. This basically means that you only write the service code as such and determine when and how the logic will be triggered for execution. All the rest is handled by cloud infrastructure. At Amazon’s AWS this technology is called Lambda, Microsoft named it Azure Functions, Google calls it Cloud Functions. The principle is always the same. There’s no virtual machine any more, not even Docker or Kubernetes containers – which means fewer things to manage and look after, and less operations and maintenance effort. And you only pay for the execution time. If nothing happens there’s no cost. If you suddenly require high compute performance your serverless setup will be scaled automatically – if you have done your architecture homework. Serverless will e.g. require that your services are stateless, which means whatever information they need to keep between two executions of the service must be stored externally, e.g. in a cache or database service. As with microservices, the advantages need to be earned. Serverless will not solve all problems, but make sure your team (at least the architect) understands the concepts and can make informed decisions about its use.
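
A stateless handler in the AWS Lambda calling style (`def handler(event, context)`) might look like the toy sketch below. The in-memory dict stands in for a real external cache or database service, and the event shape is invented:

```python
# Toy sketch of a stateless serverless handler: any state kept between
# invocations lives in an external store, never inside the function.

EXTERNAL_STORE = {}  # stand-in for e.g. a cache or database service

def handler(event, context=None):
    """Count requests per user without keeping state in the function."""
    user = event["user"]
    count = EXTERNAL_STORE.get(user, 0) + 1
    EXTERNAL_STORE[user] = count          # state lives outside the handler
    return {"user": user, "requests": count}

print(handler({"user": "alice"}))  # {'user': 'alice', 'requests': 1}
print(handler({"user": "alice"}))  # {'user': 'alice', 'requests': 2}
```

Because the function itself holds no state, the platform can run any number of instances in parallel and replace them at will.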

Other game changers in cloud world

What else has changed in a world where dedicated physical servers seem to have disappeared? Very trivial things need to be handled differently.

  • There’s no local hard drive any more. If your service runs in a VM or Docker container it may feel so, but remember: in cloud world machines are cattle. A VM/container may die or disappear and will be replaced by a new one. Bad luck if you had data on a local drive of that machine. Now you’ll need to think about alternative ways for storing away your data or settings. In cloud world you may want to use a storage service for that purpose, a central configuration service, a database, environment variables… the choice depends as always on the requirements. Make sure the team knows the available cloud building blocks.
  • If you are a Microsoft shop: there’s no registry any more. See above comments.
  • For logging there are no local files any more. These would anyway not make much sense in a world where services are distributed. You’ll rather send log output to a central logging service. That will consolidate all logs of the various services in one central place, making troubleshooting much easier. There are many open source solutions for logging, or you may just use the one your cloud provider offers.
  • And finally, the term infrastructure gets a whole new meaning in cloud world. Infrastructure still exists, but now it needs to be managed very differently. You should strive to set it up automatically, based on scripts that you can re-run any number of times. This is crucial because you will need more than one cloud system. At least you should have one for development and testing which is separate from the real one – your production system. The two environments should be as identical as possible, otherwise your test results are not meaningful and you will chase after phantom problems that are just caused by some infrastructure misconfiguration. Those scripts will set up your required cloud resources. This means you describe the infrastructure in code, just like your service logic. Infrastructure as Code (IaC) is the term for that. Check out this article for more.
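
For the first two points – no local drive, no registry – reading settings from the environment instead of a local file can be sketched like this (the variable names are invented examples):

```python
# Toy sketch: read settings from environment variables instead of a local
# file or registry, with an explicit default as fallback.
import os

def get_setting(name, default):
    """Look up a setting from the environment, falling back to a default."""
    return os.environ.get(name, default)

os.environ["DEMO_LOG_ENDPOINT"] = "https://logs.example.com"  # set by the platform
print(get_setting("DEMO_LOG_ENDPOINT", "http://localhost:9200"))
print(get_setting("DEMO_LOG_LEVEL", "INFO"))  # falls back to the default
```

The same pattern applies when the values come from a central configuration service instead of environment variables.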

So what is Cloud Native after all?

Hopefully it became clear that software needs quite a few specific considerations to feel comfortable in cloud world. Which is the reason why “lift and shift” for existing legacy software is a valid approach, but won’t leverage the full cloud potential. In order to run efficiently in the cloud, software must be designed for that purpose – this is the meaning of cloud native – software that is architected and optimized for a cloud environment. For legacy software that usually means: for efficient shifting to the cloud major refactoring or even a complete rewrite may be required.

So welcome to cloud world. Tremendous power and flexibility is at your disposal. However you’ll need to architect your software with the cloud and its building blocks in mind. Decompose your application into services. Consider serverless approaches to reduce operation effort and improve scalability and availability. Focus on your domain knowledge and your specific value-add. For the basics use existing building blocks wherever it makes sense.



What does cloud mean – and what are the real advantages?

We’re in 2019 and it seems like new software projects are designed for the cloud. Seems like. Maybe this is not true yet despite all the hype – but what does ‘cloud’ mean? What are the drivers to use it, and what are the benefits?

Once more the internet is the big game changer. Network bandwidth at close-to-zero cost and with high availability has enabled the shift of IT workloads. Software that had been running on company servers and internal data centers is gradually being transferred to cloud providers – e.g. Amazon Web Services (AWS), Microsoft Azure and Google Cloud, to name the 3 largest ones.

Data Center
Photo by Tanner Boriack on Unsplash

What is “the cloud”?

These companies run huge data centers and sell their compute power in small slices to the masses. Of course IT and software still run on server hardware – but for the users it doesn’t feel like that any more. Users of cloud services are completely shielded from the physical hardware. They don’t need to think about all the basic installation and maintenance work that was usually required to get a large number of computers up and running – to back up their data, upgrade and patch the operating system etc. The ‘cloud’ has established a high-level abstraction for all this and made it easy to consume compute power as required.

What started more than 10 years ago with some simple storage services as a sideline business of an online bookstore has since grown into an incredibly versatile web of services ranging from virtual machines and networks to databases, caches, load balancers, streaming engines and many more basic building blocks of current cloud systems. All of this is available on demand within seconds or minutes. Performance KPIs can be selected and scaled as required. The cloud provider takes care of all the necessary heavy lifting in the background – highly automated and with redundant infrastructure that usually spans multiple data centers. The user just consumes, and only pays for what he needs.

Unlimited scalability

Need to set up a website for 200 users? 5 minutes and we’re online. Need to scale up to 1 million users? Just a few more minutes and here we go. Need to add a terabyte-size database cluster? Minutes again and the system is ready. Compare this to the weeks of planning, ordering of hardware, system installation and configuration that would have been required to set everything up locally. And now assume that the system was only needed for a 3-week marketing campaign. No problem, let’s stop and remove all services when it’s over, and the cost goes back down to zero immediately.

This is the power of the cloud. Virtually unlimited resources and flexibility, on-demand consumption of services and pay-as-you-go pricing models.

Are Cloud systems cost efficient?

Is the use of cloud resources always cost efficient? It depends. The huge system for the 3-week marketing campaign is most likely very cost efficient. But if you think about replacing your well-managed local company servers and databases with cloud resources, then depending on how it is done you may end up with lower or even higher cost than today. Adopting the cloud usually means much more than just relocating existing servers as they are – that would be called lift & shift. Check out this article for more. One thing that is not always easy with cloud services is exact pricing. Since you pay for resources 'as required', and resource cost may depend on load, data volume and other dynamic factors, the overall system cost may vary over time. Use the cost estimation tool of your favorite cloud provider and fill it with 2 or 3 typical load scenarios. This should give you a good idea about monthly cost and also highlight your major cost drivers.
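To make the scenario comparison concrete, here is a minimal back-of-the-envelope sketch in Python. All unit prices and scenario numbers below are invented for illustration – real rates come from your provider's pricing calculator:

```python
# Rough monthly cost comparison across load scenarios.
# All unit prices are made up for this sketch -- not any provider's real rates.

PRICE_PER_VM_HOUR = 0.10      # assumed $/hour per virtual machine
PRICE_PER_GB_MONTH = 0.02     # assumed $/GB-month of storage
HOURS_PER_MONTH = 730

def monthly_cost(vm_count, storage_gb):
    """Estimate monthly cost for one load scenario."""
    compute = vm_count * PRICE_PER_VM_HOUR * HOURS_PER_MONTH
    storage = storage_gb * PRICE_PER_GB_MONTH
    return round(compute + storage, 2)

scenarios = {
    "normal load": monthly_cost(vm_count=2, storage_gb=500),
    "marketing campaign": monthly_cost(vm_count=20, storage_gb=500),
    "idle / after campaign": monthly_cost(vm_count=0, storage_gb=500),
}

for name, cost in scenarios.items():
    print(f"{name}: ${cost}/month")
```

Even a toy model like this highlights the main point: when the campaign servers are switched off, compute cost drops to zero and only the storage remains.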

Should all software run in the Cloud?

Cost aside, is each and every workload suitable for the cloud? Again, it depends. Despite highly reliable networks and globally distributed redundant infrastructure, it is nearly impossible to achieve the latency and robustness of a local system with a cloud-based approach. And you would rather not consider a cloud-based approach if your compute requirements are extremely simple and need to be very low cost. You would most likely not shift the control program of your washing machine to some cloud – just put it onto a $5 chip and use it for years to come. Which also eliminates the need to connect your washing machine in one way or another to the cloud…

Are Cloud Systems secure?

Finally, let's talk about security. For many decision makers this is the most critical question when deciding for or against a cloud-based solution. Is critical data safe in some provider's cloud data center, whose exact location you don't even know? And where your data is stored side by side with the data of millions of other customers? The answer is most likely yes, but as always when it comes to security, it depends. Security always requires measures on multiple levels. It starts with the operating system of the servers, the configuration and administration of the various application layers, the approach for authentication and authorization, backup, encryption etc.

Security on autopilot?

When consuming cloud services you can count on your provider to do its part of the job. But you still have to look after yours. An example: the cloud database service you are using may provide encryption and backup features. But it's up to you to turn them on and manage the keys and access permissions appropriately.

In any case, all capabilities for building highly secure solutions exist – but they need to be used appropriately. Compared to some locally managed PC, your data will most likely be much more secure in the cloud. And the overall system can have higher availability and automated backup if configured correctly.

When it comes to sharing and redistributing data, cloud solutions really shine. It is far easier to provide managed access to data located within some cloud resource than to securely open up your local system to outside access via the internet. So security is an area of concern, but nothing that should keep you away from using cloud systems.



Tool checklist for cloud development – set your team up for productivity

What tools are required for teams that are developing software for the cloud? Go through this checklist to find out if your team has the basics for productive software development in place.

Versioning system

You know that you
(1) have one, (2) it is set up correctly and (3) your team is using it
… if you can answer the following questions with YES (in case you’re not sure if you need a versioning system read here)

  • Each team member is able to revert source code to a former version at any time.
  • You are 100% sure that this works because you have tried and tested it
  • You are able to quickly find out what has changed between versions
  • You are treating configuration files, documentation and any other artifacts like your source code – everything is under version control
  • Team members check in modified source code on a regular basis
  • Your team has agreed on a common branching strategy
  • If your versioning system failed completely today there would be no panic. You would just set it up again from scratch and reload yesterday's backup. You know that this would work because you have tested it at least once.
  • If yesterday's backup was not created there is a notification in your team mailbox

Issue tracking

You know that you
(1) have one, (2) it is set up correctly and (3) your team is using it
… if you can answer the following questions with YES (in case you’re not sure if you need an issue tracking system read here)

  • The team has a complete list of all currently known bugs and issues
  • The list is accessible to each team member and everybody is able to work on issues (e.g. add comments)
  • Each team member can see “his/her” issues with one single click, and ideally get automatically notified about status changes for “his/her” issues
  • The team has a clear policy on issue status management: what issue status values exist and who is allowed to change issue states? An example: many teams follow the principle that an issue should be verified and closed by the person who initially opened it. Your team may want to handle that differently – just ensure that everybody agrees on the same standard.
  • Each issue in the list has at least:
    • a clear title and a description that each team member can understand without needing to ask whoever wrote the issue
    • a status and an owner within the team
    • additional information required to tackle the issue (screenshots, steps to reproduce, logs etc.) and meta information that helps to manage the issue and understand its history (time created, component or service concerned, version, …)
  • Daily backups are created and stored on a separate system. The backup / restore procedure has been tested at least once. This is not required if you use the cloud service of some provider (see article)

Build and Deployment System

You know that you
(1) have one, (2) it is set up correctly and (3) your team is using it
… if you can answer the following questions with YES (in case you’re not sure if you need a build / deployment system read here)

  • Each developer can create a new build with a single command or click
  • Build status is visualized and team members get notified about build problems
  • New builds can be deployed to the desired environment with a single command or click (at least for the productive environment most teams will set up rules regarding who is allowed to do this and when)
  • You can always tell which version is installed on what environment
  • You have a track record of builds and deployments

Team Collaboration and Knowledge Base

You know that you have what you need if you can answer the following questions with YES (in case you’re not sure if you need tooling for team collaboration read here)

  • Each team member can access a common system and find the 5 most relevant documents via direct short link
  • Each team member can add or modify content 
  • Each team member can search for information or documents via keywords


Tools for agile development? These are the must-haves for every dev team

What tools are essential for agile development? A code editor may be enough to get started. But what are the must-haves for every dev team?

(1) Software versioning

Never develop software without a versioning system. Why? Sooner or later you will need to know what you changed since that last version that actually worked and did not have that strange bug you have now been tracking down for hours. Or you may need to fix a minor problem in the productive version but don't want to roll out all your latest untested changes. Or you work in a team and several people contribute to a common product or service.

A version control system is a central code repository and provides what you need to combine work results in a controlled way. Don't try to get along by copying your sources around in different folders or zip archives. There are many great tools for proper version management – pick one and use it. GitHub is one of the favorites. It even takes care of permanently and securely storing your sources. Left your notebook in the coffee shop and now it's gone forever? Well, bad luck, but at least you still have your source code available. And while we're at it: put everything under version control, not only your source code. Version control your configuration files, your specs, your user documentation – whatever you and your team are producing. It's just like car insurance – sooner or later you will be grateful for having it.

(2) Issue tracking 

Software comes with bugs, and you need a system to keep track of them. You'll want to know what problems have been found in which version of your software and what has been fixed already. Team members will need to add comments, logs or screenshots to an incident. Ideally you may want to link the issue to the code in your central repository that fixes the problem. And since you work in a team you want a system that supports simultaneous multi-user access. Don't waste your time trying to do all this with some spreadsheet software. Yes, you could do that, but it won't really work. There are many good incident management and bug tracking systems available. You may want to take a look at the issue tracking that comes along with GitHub, or check out Jira – these are probably the most popular ones.

(3) Build and deployment automation 

Software is born on one or several developer machines, but this is (hopefully) not where it is going to run in production. Various steps need to happen before the checked-in code arrives there. Depending on your development environment, programming language, components used etc., the artifacts for the prod system need to be built and versioned. You may want to apply static code analysis. Deployment to a test environment comes next. If all tests have passed, you or product management or whoever is responsible may decide to put that shiny new version out for productive use by your customers or users.

Note that most of these steps need to be automated in order to get reproducible, high-quality results. The system behind this is often called the build pipeline because things happen sequentially in a predefined order. If any of the steps fails the pipeline stops – the team needs to fix the problem and restart the pipeline. A basic build pipeline system is available as part of GitHub, for example. Then there are many open source and commercial solutions out in the market, and each of the major cloud providers offers a solution. If you're not sure what to use, start with GitHub – you can always move to some other system later on. Just make sure that you have an automated build and deployment approach right from the beginning. Improve it over time.
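The 'stop at the first failing step' behavior of a build pipeline can be sketched in a few lines of Python. The step names below are hypothetical placeholders – a real pipeline would invoke your actual build, test and deployment commands:

```python
# Minimal build-pipeline sketch: steps run in a fixed order,
# and the pipeline stops at the first failing step.

def run_pipeline(steps):
    """Run steps in order; return (succeeded, names of completed steps)."""
    completed = []
    for name, step in steps:
        if not step():  # a real system would run a shell command here
            print(f"Pipeline stopped: step '{name}' failed")
            return False, completed
        completed.append(name)
    return True, completed

# Hypothetical steps -- each would normally invoke compiler, tests, deploy tools.
steps = [
    ("build", lambda: True),
    ("static analysis", lambda: True),
    ("unit tests", lambda: False),   # simulate a failing test stage
    ("deploy to test", lambda: True),
]

ok, done = run_pipeline(steps)
# With the simulated test failure, the deploy step is never reached.
```

This is exactly why the order of steps matters: cheap, fast checks go first, so a failure is detected before any deployment happens.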

(4) Team collaboration 

This is a crucial topic, and the perfect approach for team collaboration has yet to be discovered. And of course no tooling will ever replace person-to-person communication and the willingness to engage, to share information, to work together. Check out this [] blog post to dive into this. Having said all that – collaboration is still an area where good tooling really can make a huge difference. What you want is a central place for all your team's information sharing needs. Documents, drawings, ideas, team calendars, whatever. You will want to search for stuff, and create links between various pieces of information.

Of course you could just store stuff on some shared folder or Dropbox. However, this will be far from good. Who will want to browse around in some obscure folder structure? Admit it – this does not even work for your own stuff on your notebook, am I right? We are in the new millennium and wikis have been invented – if you're not using one already, please start now. Once you have used one you'll wonder how a team could ever have survived without it. It will of course only be accessible by your team, and each member can set up new pages and content with a few clicks. Documents become much more lightweight and can easily be interlinked. Updates are done in seconds and you'll have only one valid version per document. Paradise. Who knows, maybe even your most no-documentation-required style people will start to add stuff to existing pages.

There are many hosted wiki flavors out there; setting one up for your team is a matter of minutes. A good choice is Confluence (by Atlassian – no, I'm not getting paid by them, I just like their software). If you happen to use their Jira for issue tracking you can take advantage of the integration between both. In my opinion a great choice, and all you need to cover your team's basic tooling needs.

(5) Is there a point 5?

Well, there are many more tools out there for life-cycle management, test automation, code analysis and so on – but these are somewhat more advanced topics, and your needs will depend on the type of software you develop, on your programming languages, target system and so on. For now we wanted to stay with the indispensable basics. You can always add to your tool-set; just make sure you're not missing out on versioning, build automation, issue tracking and a wiki – the basic tools for efficient agile development.



What is DevOps? How development for the cloud changes a dev team's life

What DevOps means is quickly explained: Development + Operations together. But what does DevOps really mean for development teams and their day-to-day work? And what is 'operations' to begin with…?

“Operations” explained

What is operations, and does all software need to be operated? To explain this, let's take your local Word and Excel, or whatever local software you have installed, as an example. It just sits on your notebook. Once in a while you'll probably update it to a newer version, but that's it. It is your personal software on your own machine – no real operations involved.

Compare that to your email. Here again you may use some local client, or just the browser. No operations. But then there is your email provider and all that is required to manage your mail account and transmit your mails. This is done by services that run in some data center, and you can count on a team of experts to look after that software. They'll make sure that the system runs smoothly; they apply the latest security patches, protect it against hacking attacks and securely back up your data, to name just a few of the tasks. This is what an operations team does. And you need to rely on it because you are using services that are not under your own control. In short: whatever software runs in a cloud or data center will need operations.

Software development vs. operation

Traditionally there has been a very clear separation between teams that develop software and the ones that operate it:

DevOps separation of development and operation

Photo by Raj Eiamworakul on Unsplash (stuff in red by Tom)

The developers would write their code and maybe even test it 🙂 but as soon as possible throw it over the wall to the operations team. Then it would be up to them to figure out how to install and run it. Okay, maybe there are companies with good collaboration between these groups, but still there may be some conflict of interest. Developers want their new versions out on the productive system as soon and as frequently as possible to bring new features and bug fixes to their users. The ops team will try to slow things down a bit and play it safe, since every update is considered a risk and may require some system downtime.

Operation for cloud software

Now fast forward to the cloud world with an agile team in Scrum mode. Software sitting idle waiting to be deployed is considered 'waste'. The infrastructure does not consist of physical servers owned by administrators any more. Now infrastructure is code, and the dev team's architectural design decisions have a huge impact on the required cloud building blocks and corresponding cost. The ongoing operations effort is also to a large part determined by the architecture. System changes may require modification of the infrastructure code, too: adapted configuration of the cloud provider services used, extensions to the system monitoring etc. The strict separation between development and operations does not make sense any more. As they say: you can do it, but it's not good.

DevOps to the rescue

Instead let’s put everybody in one team. Ensure that operational concerns are considered during development and when designing the system architecture. The team should consider the overall lifecycle cost and minimize effort accordingly. This is what DevOps is supposed to mean. No wall any more, not even two distinct groups, hopefully.

development team
Photo by rawpixel on Unsplash


What is SCRUM – setting up highly productive development teams

What is the best team setup for efficient development? How the Scrum methodology and agile principles can help to increase software development team productivity. 

The challenging way to version 1.0

Let's assume that you have a good-enough idea of what you want to build, at least for the first version of the new baby. How do you start off? Just let the team work – after all they are professionals and they should know… You can do that, but it won't work. Why? First of all, despite all the lengthy discussions you still don't know exactly what you want to build. Most likely every member of your team has a different version of that clear goal. And the tricky thing is that you are not aware of these subtle but relevant differences. Between your idea and the first version of tangible software you'll have to cross the land of realization. These are dangerous grounds where orcs roam, and you'd better not send out the team unprepared. As long as you are safe in Rivendell, make a plan that everybody can follow even when walking alone for some time. Slice the mammoth. Break the work down into small pieces and write them down, one part at a time. Make sure all parts fit together and that the sum of the parts makes up what you have in mind for that next version of your software. Put the parts in order, with the most important ones first, and assign each part to one of your team members. Sounds a lot better than just heading out into the wilderness, doesn't it?

No Master Plan required?

No. You won't create that 200-page master plan for all that may need to be done within the next 3 years. Don't waste your time; things will change, and if we're honest you don't even know what's going to happen 3 months down the road. But the next 2 weeks should be doable. That seems to be a time range that humans can actually oversee. 2 weeks are still soon enough to review the first achievements. And just long enough to give the team the time required to create results in the first place. Feedback and intermediate steps are what you are looking for when navigating the uncertain plains of software development. An incredibly high percentage of software projects fail, and that should not happen to yours. Don't be afraid, but also don't be careless. The orcs are out there and your team needs a structured approach to succeed. Many methodologies exist, and the best choice depends among other things on what you want to build. If you're about to prepare the next mission to Mars you may want to stick to rather old-school waterfall-style approaches. For all others: take a close look at

Scrum and agile principles

This may sound very abstract at first but is very straightforward once you have a basic understanding. Much of this is about accepting that humans seem to have great difficulty defining upfront how exactly that final software should look, work and behave. Of course you can try to define it to the last detail before building it – but it won't be good, at least not nearly as good as it may get with more degrees of freedom to change and adapt. Think about a cook inventing a new recipe. Will he do it all upfront at his desk? The basics, sure, but the difference between good and great happens during preparation in the kitchen. So it is with software. Learning usually happens over time, and the team's growing understanding of the domain unlocks new possibilities. However, if change hits too often, chances are that nothing ever gets done. Some balance seems to be required, and this is where Scrum comes in. It strikes a balance between not planning at all (no, not good) and overdoing the planning part. It allows for enough flexibility to factor in changes but provides the basic guide rails and structure to keep chaos at bay. And it provides some insight into work and team progress over time. Sounds great, doesn't it? Let's take a closer look. If you want the full story go to scrum.org. What you read here is a high-level summary.

Epics, User Stories and the Backlog

Did we mention the 2-week time range and that a breakdown of the vision into smaller parts is required? In Scrum these small parts are called epics and user stories. A user story describes a functionality, a feature, some requirement from a specific stakeholder's perspective. It should be a very tangible thing with a clearly defined outcome, with just enough information so that the people in the team understand what is expected. The team will order the user stories by priority – they all go into a central list called the backlog. And this is what the team is working on within the 2-week period – which is by the way called a sprint. Before the sprint starts, the team has agreed on the user stories to be tackled and their priority. The backlog for this sprint is then fixed and the team can focus on working on it undisturbed. No change of priorities, no sudden additional tasks. Let the team do its job. In time before the next sprint starts there'll be another planning for that sprint. Then the game starts over again.
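A tiny Python sketch of the idea – the story titles and priorities are invented examples, but they show how the prioritized backlog drives sprint planning:

```python
# Sketch of a product backlog: user stories ordered by priority.
# Story titles and priorities are invented examples.

backlog = [
    {"title": "User can reset password", "priority": 1, "status": "open"},
    {"title": "Show order history",      "priority": 3, "status": "open"},
    {"title": "Export data as CSV",      "priority": 2, "status": "open"},
]

# The team plans the sprint from the top of the prioritized list;
# here we assume the team has capacity for 2 stories in this sprint.
sprint_backlog = sorted(backlog, key=lambda s: s["priority"])[:2]
print([s["title"] for s in sprint_backlog])
```

The point is simply that ordering is explicit: whatever sits at the top of the backlog is what gets pulled into the next sprint.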

Why 2 week sprint cycles?

This is some sort of balance between structure and flexibility: the current sprint is fixed, but you can plan the next one as required, factoring in whatever changes came up. So the backlog is your definition of the work to be done, and it will change over time. However, the team will work on stable ground and produce results every 2 weeks. Note that there is no fixed deadline and no fixed overall goal any more – it just doesn't make sense in this ever-changing environment. Instead there is the backlog, and each completed user story drives the team towards the next, better version. Some forecast about what will be available when is still possible, but with less binding guarantee. This approach may sound very challenging in an enterprise, customer, contract context, and it somehow is. By the way – how reliable was your binding guarantee in the past anyway?

Overcome lack of transparency

This is another major challenge in software development. Where does the team stand? How much time is still required to get that next version ready? When will it be done? If the answer is 'one more week' or 'one more month' you know you're doomed. These answers mean in fact: we are optimistic and quite sure that we can get the job done – but have zero clue about the remaining effort. Just leave me/us alone and come back again in one week/month. And then, by the way, you'll get the exact same answer again. Sounds familiar? This is not bad intention, this is how humans are wired. We tend to underestimate larger work packages. Back to Scrum and the user stories. A story should be small and have a clear, tangible outcome. Something you could see or ideally test once it is done. Like an updated version of that UI with the new feature xyz. Small means it should be feasible to complete it within 1…3 days.

Break things down into smaller parts

If this sounds impossible, break your story down into several parts – think about a stepwise approach. Sounds like a bit of extra work, but the team will get many benefits in return. Each small part will either be not started at all, in progress or done. That's it. Nothing in between. No it's-almost-done-come-back-in-a-week any more. The team will see that progress is visible on a day-by-day basis. Check out this blog post [] for more details. Ah, how will the team see that, by the way? The backlog will be visible to the team at any time. You could do it via post-it notes on a wall, one per user story, as a very basic approach. Or, probably much better, via one of the many available Scrum tools with online access for the team and a virtual board for each sprint. You can find some recommendations in the tools article.
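The three-states-only rule can be illustrated with a small Python sketch (the story titles are invented):

```python
# Sprint board sketch: each story is in exactly one of three states.
# No percentages, no "almost done". Story titles are invented examples.

VALID_STATES = {"not started", "in progress", "done"}

stories = {
    "Add login button": "done",
    "Validate email field": "in progress",
    "Write logout tests": "not started",
}

# Every story must be in a valid state -- nothing in between exists.
assert all(state in VALID_STATES for state in stories.values())

done_count = sum(1 for state in stories.values() if state == "done")
print(f"{done_count}/{len(stories)} stories done")
```

Because each part is small and the states are binary-ish, "2/3 stories done" is an honest progress statement – unlike "80% done" on one big task.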

Streamline team communication

This is the next critical element. No lengthy status meetings any more; from now on there are only short 'stand-ups', but these happen each and every day. The team meets standing up. Don't get too comfortable – this should be a quick thing, just a few minutes. Everybody tells what his/her current topics are – any problems or help needed? Then on to the next one, no lengthy discussions. And over, back to work. This approach is much more efficient than weekly status meetings that nobody follows over the entire time anyway. For any detail clarifications people can meet one-on-one as required.

Learning and improvement

Good is never good enough. At the end of each sprint the team will change the perspective to the meta level and meet for a ‘retrospective’. The purpose of this meeting is to think about the team as such, reflect on what worked well within the sprint, what did not, what could be improved? The team should capture and follow up on proposals and ideas. Make team members address impediments together. Get stronger and more efficient over time. And by the way this is it. Stand ups, backlog grooming and planning, a retrospective. No other meetings. Leave the team alone, give them the time to do their work.

And how about roles?

Scrum articles often start with role descriptions, as if this was the most important thing. I think the idea is the most important: the team working together, ever improving. Yes, somebody should own the definition of the software product to be built, collect feedback and bring it back into the team, and in general act as the interface between the team and the outside world. The outside world will be first of all customers and users, but also most likely other internal stakeholders like upper management, sales, marketing, whatever – depending on company size. This is the Product Owner (PO). The team as such is responsible for contributing to and delivering on the backlog, but the PO is the one who writes the user stories and prioritizes them. This puts him in the driver's seat for steering the product and ensuring fit with the overall vision and market requirements. However, a smart PO will always collect as much buy-in and feedback from his team as possible. And then somebody needs to look after the process as such: invite to backlog planning, ensure stand-ups are short, and remove impediments for the team. This is what the 'scrum master' does. The team could designate one of its members, or rotate the assignment. In larger organisations there may be full-time scrum masters that look after several teams. And that's it.



The new developer Onboarding Checklist

Ensure a smooth and efficient start for your new team member so he/she feels comfortable and can contribute to your team's results as early as possible.

4 weeks before day one

Plan for a place within your office and verify that the basics are there (desk, chair, power supply, network…)
Order required equipment:
– notebook + docking station
– keyboard & mouse
– LCD monitor
Depending on your company: order ID card(s) and organize whatever entries in your company systems may be required (company ID and directory, email, etc.)
Block time in your calendar for day one. You should have enough time to look after things, introduce the new team member and go through the onboarding plan together.
Block another hour 3 days after day one.

1 week before day one

Make sure ID card and equipment have arrived and are complete.
Depending on your company: have notebook set up with your standard enterprise applications
Prepare the onboarding plan – what will your new team member need to know about your company (you can find a template here: [*]). Make sure to review that plan with the team.
Sit with the team and plan the initial assignments for the first 2…3 weeks. Make sure to leave enough room for startup and learning.

DAY ONE

Take some time to chat and introduce the new one to the team
Explain your company's basics – where is the coffee machine, what are the usual office hours (if any), how do you handle work from home, overtime, holidays, business travel etc.
Explain your work context – what is your group's position within the company, who are your customers, what are your interfaces. You have covered some of that during your hiring interview (see [*]) already.
Give a broad overview of the new one's role. Maybe leave the details for later. Go through the onboarding plan together. Hand over that plan – your new team member will own it from now on.
Note that onboarding is not done after day one – it is a process that takes much longer.

3 days after day one

Now that the dust has settled, take some time to discuss what has happened so far. Answer questions. Are there any issues to solve? How does the new position feel? Talk about the role and why it is important. What are the major success factors? Why does your group exist? What is the new one's contribution to that?
Fix a date for the next follow-up meeting 3…4 weeks later

3…4 weeks after day one

Collect feedback. What is good? Any help needed? How’s the team? How’s the work context in your place compared to the new one’s past experience? This is also a learning opportunity for yourself.
While you sit together write down 2…4 high level goals and/or desired outcomes. Focus on goals and outcomes, not on tasks. E.g. “ensure that the developed services are highly available” is something concrete and tangible, it could even be measured. You’re not doing this to control and measure but to ensure a good mutual understanding, and to get priorities clear.
Explain your approach for ongoing communication and follow-up on this exercise. Fix a meeting 3…6 months in the future where you will review results, give credit for achievements, and talk about experiences and expectations, improvement potentials or whatever needs to be addressed outside of the day-to-day work context. You can find a template in the download section.
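As an illustration of how a goal like "ensure that the developed services are highly available" can actually be measured, availability can be expressed as a simple percentage. The sketch below is a generic calculation; the downtime figure is an invented example:

```python
# Sketch: turning the goal "highly available" into a measurable number.
# Availability = uptime / total time, over one (simplified 30-day) month.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

def availability_percent(downtime_minutes):
    """Availability over one month, given total downtime in minutes."""
    up = MINUTES_PER_MONTH - downtime_minutes
    return round(100 * up / MINUTES_PER_MONTH, 3)

# Invented example: ~43 minutes of downtime corresponds to a "99.9%" month.
print(availability_percent(43.2))
```

Agreeing on a concrete number like this makes the goal reviewable at the follow-up meeting instead of a matter of opinion.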



How to set up a high-performance team (part3 – onboarding)

So let’s assume all is arranged and your new team member will start within the next few days or weeks. Time to think ahead about onboarding.

Why is well organized onboarding so important?

"Let's just wait and see what needs to be done whenever that person shows up." It's amazing how many new employees need to wait for days and weeks for equipment and access rights until they can finally really start. What a waste of time, what a frustrating experience – something that just does not happen in a well-managed team. So you'll think in advance about your new team member's equipment. You have a notebook already on the shelf, or it's agreed that he/she brings their own along? Good. Otherwise: get one ahead of time – you want to have it ready on day 1, and delivery may take some time. Is the team informed and ready to welcome the additional help? Of course you have thought about a desk and other basic stuff that may be relevant – like entries in your company directory, access rights to your systems, ID cards, and so on? You need to organize all that anyway, so do it up front and provide a smooth and positive start.

Planning for knowledge ramp-up

In addition to these practical things you need to organize the knowledge ramp-up. Sit down with your team and create a list of topics that the new team member will need to know or learn, and prioritize it. For each topic, note the source of information. It may be yourself, one of the team members, or a link to a team wiki page – or set up an ‘onboarding’ wiki page for later re-use. Make sure the team has thought about a first entry-level task that can be tackled alongside or after the initial topics, to kick off team integration and interaction. Think back to your first day in a new office, a new team – it feels good to be welcomed, to see that people have been waiting for you, and to sense that you made the right choice with this position.

You may want to use this Onboarding checklist to plan for this.



How to set up a high-performance team (part 2 – the job interview)

So let’s assume you have published your job offering. What are the next steps to get your dream team together?

If after a few days nobody has contacted you, it may be time to revisit your job offering. Is it clear enough? Can normal human beings fulfill your requirements, or are you looking for Superman? Have you checked whether a quick Google search brings up your offering? Put yourself in the shoes of the great people you are looking for – would you find that job, and would you apply for it?

(5) Select job interview candidates

Let’s assume the first candidates have submitted their CVs. Scan them. Does your candidate fit your job description and profile? Skip the section with the list of skills – anybody can write up such lists. Instead look for former positions where the skills you are looking for were learned or used. If you are looking at somebody fresh from college, the combination of courses may reveal areas of interest. You want people in your team who are interested in what they are doing – this is where they will achieve remarkable results. Don’t be afraid to hire people who are better than you. You want a team of A-players where the combination of expert skills is more than the sum of the individuals.

There may also be a cover letter. What you want to see in a cover letter is that the person is able to connect his or her CV to your job offering. However, if writing skills are not at the very top of your priority list, don’t be overly strict. In general, don’t overrate documents – never make a decision based on this information alone. Use it rather to filter out people who don’t seem to fit at all, and as a starting point for the next step. So invite whoever passes this first scan… If it’s a development position, ask your candidate to bring some sample source code to talk about. If it’s a technical writer position, ask for some sample documentation – whatever is uncritical from an IP perspective.

(6) What to ask during the job interview

So here you are, sitting at a table with your candidate – if at all possible. Skype interviews are only the second-best option: without the person-to-person contact you will miss a lot of information. Bring in another colleague or someone from HR to get a second opinion later on. Make sure enough time is planned without external disturbance – you’ll probably need 1…2 hours. Don’t forget that the candidate is deciding about you just as much as you are deciding about him/her. Lead the conversation: explain what your company is doing, explain the job, and how the two fit together. This should only take a few minutes. Find out what your candidate knows about your company – the A-players will have looked up your website and other available information.

Then let your candidate talk. Listen. Ask questions, get into a conversation. The CV is your guideline; try to understand the history of the person in front of you. You may ask about achievements your candidate is proud of and the related success factors. Or ask about major challenges in former positions and what was learned from them. Make sure to get at least a few proof points for the skills the person claims to have. Look at whatever your candidate brought along and have a focused discussion around it. Don’t forget that good conversation skills are not the only thing you are looking for – this is why finding the right people is so hard. Try to get a feeling for how that person would do the job and how he/she would integrate into your existing team. Take notes, and ask for questions. Make sure your candidate gets an impression of what’s going on within your local company setup.

(7) Post interview reflection

After the interview, ask yourself whether this person – to be more precise: this person’s personality – would fit into your team. Discuss it with colleagues. Only if you are comfortable at that level, think about the skill set and experience and how they match up with your profile. You may need a second interview. Follow your head, and follow your gut feeling. Talent may be more valuable than some specific piece of knowledge: knowledge and experience can be acquired over time, talent cannot. If you are not sure, it may be better to continue searching. Wrong people decisions have a very negative impact on your team, and they are difficult and lengthy to correct. However, if you have interviewed several candidates and none of them seems to fit, your standards may be unrealistically high.



How to set up a high-performance team (part 1 – roles and skills)

Creating software is all about people. Yes, you need tools and a strategy and so on – but the people you work with define the baseline from where you are starting and how fast you can go.

The level of creativity, experience, and motivation in your team is the result of the personality mix of your people. Good leadership will build on that and make the team stronger over time – much of this blog is about how to do this. However, the team’s limits, as well as its unlocked potential, are to a large degree defined by who is part of the group. So choose your setup wisely if you can. If you are not flexible and don’t have any choice: stop reading here and focus on the many other topics that can help to optimize whatever team you have.

When I had to hire people for the first time and did some reading on the topic, I found this frightening statement:

”…only 20 to 30% of all hirings are considered successful after the first 6 months by both parties”

Huh – if this is true and only 3 out of my 10 new hires turn out well, I’m set up for trouble. It means that sooner or later I’ll have to spend time finding and training new people all over again. And most likely my project won’t run as successfully as it could. In the end all went well, but this statement (whether true or not) motivated me to spend the time and effort required to find the right people. What does that mean?

Think about your dream team setup

First of all, think about what you need. Imagine your dream team setup as tangibly and concretely as possible. Hint: you are not looking for “5 good developers”. Each team will have some role split, whether on purpose or not. Compare that to a football team. Each player can run and hit the ball, hopefully. But great teams will have an ‘expert’ goalkeeper and another player up front who is specialized in scoring goals. Have you ever seen a world-class team where everybody does everything? It seldom happens. Humans tend to be particularly good at certain things and less good at others – this is just how it is. Accept it and work with it, at least as a starting point. If over time the team members broaden their scope and work at peak level in various areas, that is great and an amazing work experience. Just don’t count on it happening from day one.

Rather, think about what expert skills you need. This may be someone good at front-end or UX, or someone with back-end development experience. Or architecture. Or requirements management… If your team is small, some skills may need to be combined into a single role, which will make it a bit more difficult to find the perfect candidate. If it is larger, you may have several positions with the same or similar skills. Then for each position take a sheet of paper (or use the template from the download section) and follow these steps:

(1) Write down team roles

Write down what the person will do within the team. What are the responsibilities and goals? What will make up the usual work day? This is one of the most important steps in your team setup process and lays the groundwork for whatever comes next. Note that you don’t start by drafting a hiring offer. You are not writing down what skills or experience you are looking for – that comes later. Don’t take the shortcut; invest the time and think about the position as such. Remember, your team members are among the most important success factors, and you need to get this right. Do this for each position you need, on a separate sheet. Write down the number of open positions on each sheet.

(2) Look at all roles in context

Put all the sheets next to each other and look at them in context. Do they match up? Is everything covered to set up a working team? Talk it over with colleagues. Sleep on it and check again a day later.

(3) Define skillsets per role

Now think about the required skills for each of the positions. What will that person need to know in order to be successful? Try to find a balance between being too general (“needs programming experience”) and being too specific (“must know AwesomeLibrary V2.3 because this is what we plan to use”). This will go into your job offering description. It’s important that whoever reads it gets a good idea of what the position is about. You want to attract talent – make sure to put in whatever makes your position interesting. Note down the desired experience level for the major areas. If certain skills are good to have but not absolutely required, note that too.

(4) The final touches for your job offering

Add a general overview of your company, your setup, and why it is fun to work with you. Then get your job offering published on the platform of your choice. If you are looking for contracted resources for a limited time period, there are freelancer project portals, or companies specialized in contracting out staff. And if you work for a larger company you’ll involve HR anyway, and they’ll place the job offering for you.

Permanent hiring or contracting?

How about permanent hiring vs. contracting? You may wonder why there has been no differentiation so far. Isn’t offering somebody permanent employment totally different from contracting for 6 months? Well, yes and no. In 6 months (or whenever the contract ends) it will feel very different, and filling a permanent position may take some extra lead time. Otherwise, for your team it will feel very similar: there will be a new person who needs to be onboarded, go through the ramp-up phase, and go through the team’s storming and norming phases until finally the first tangible results show up. You don’t want to go through this more often than required, so choose your team members wisely.

When should you try to hire permanently? In today’s volatile world, companies are more and more reluctant to make long-term commitments. Not good, but this is often how it is – so you may not be allowed to hire anyway. If you have a choice, think about this: your software will most likely never be ‘done’ once and for all. Whatever software is used in real business life will need at least some maintenance, and new requirements will come up as surely as day follows night. Each person leaving takes knowledge away from the team and reduces the team’s performance, and new people will need to go through the learning curve again. It usually takes teams several months in a stable setup to reach peak performance. Whenever possible you should go for permanent positions, unless you have a project with a definite end date at hand or need very special skills for a short time period. Today, ‘permanent’ no longer means ‘forever’ anyway.



Team productivity and cloud software development

This blog is for you if you want to set up your software development team for success. Produce steady, high quality outcomes based on best practice processes. Leverage the cloud to your advantage.

We’ll talk about team setup. How to organize work, track progress and avoid disasters. You’ll find some basic introductions, best practices, guidelines and checklists.

What I write about has been covered in books, articles, and conferences. This blog condenses my personal views and learnings about what has worked and what hasn’t. You may look at it as sharing some practical experience. My views will not hold true for everybody – take whatever is helpful for you and leave the rest aside. This blog will not make you an expert in any of the topics covered, but it can provide an overview and guide you toward further reading.

This blog is for the software adventurers of our time. Join the ride.

Start on the path – towards team productivity, cloud software and other adventures of our time


Photo by Lawrence Walters on Unsplash