One of the questions we are frequently asked is: “How can I guarantee that my IT systems will never go down?” There really isn't a way to completely prevent downtime. Even giants like Apple, Facebook, Amazon, and Microsoft deal with downtime.
The realistic goal is to have a combination of tools in place so that when systems go down (and they will), you can get back to work quickly.
What is system uptime?
Basically, uptime is a term used in the IT industry to indicate a time in which IT systems are operational. Taking it a step further, system uptime also includes the accessibility of the system. When systems are up and accessible, business can run smoothly.
Uptime often refers only to the system being "up," which does not account for the times when an application is "up" but still isn't working properly. The value of that uptime degrades because the application is effectively not usable for the organization. That's why we think it is important to define system uptime as including accessibility.
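To make the distinction concrete, here is a minimal sketch of how the two definitions can diverge. It assumes a hypothetical monitor that probes a system at regular intervals and records whether it responded and whether the application was actually usable; the `Probe` type and the sample data are illustrative only, not output from any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    responding: bool  # the system answered at all ("up")
    usable: bool      # the application actually worked for the user


def uptime_percentages(probes: list[Probe]) -> tuple[float, float]:
    """Return (up-only %, up-and-usable %) over equally spaced probes."""
    total = len(probes)
    if total == 0:
        raise ValueError("no probes to evaluate")
    up = sum(p.responding for p in probes)
    accessible = sum(p.responding and p.usable for p in probes)
    return 100 * up / total, 100 * accessible / total


# Hypothetical data: 8 healthy probes, 1 "up but broken", 1 fully down.
samples = [Probe(True, True)] * 8 + [Probe(True, False), Probe(False, False)]
print(uptime_percentages(samples))
```

With this sample data, the "up only" figure comes out higher than the "up and usable" figure, which is the gap the broader definition is meant to expose.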
Calculating system uptime
Many service level agreements include conditions for uptime percentages. Often, that percentage is expressed in "nines," meaning the number of nines in the figure. For example, five nines means a system is fully operational 99.999% of the time. In terms of downtime, the system could be down for less than six minutes in total over the course of a year.
Even though the number only changes by one nine, that change can make a huge difference in terms of downtime. While five nines uptime averages less than six minutes of downtime per year, four nines translates to about 53 minutes, and three nines to nearly nine hours.
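The arithmetic behind these figures can be sketched in a few lines. This is a simple illustration that assumes a 365-day year and treats the uptime percentage as an average over that year; the function name is ours, not part of any standard tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime, in minutes, for an uptime target with `nines` nines."""
    unavailability = 10 ** -nines  # e.g. five nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    uptime_pct = 100 * (1 - 10 ** -n)
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nines ({uptime_pct:.3f}%): {minutes:,.1f} minutes per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why a one-digit change in the agreement matters so much.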
A few hours may not seem like a lot of time, but every minute of downtime is time when your organization cannot run properly. With the constant threat of cyberattacks and an increasing reliance on technology for business operations, downtime can be detrimental to a business.
Increasing system uptime
When it comes to increasing system uptime, there is one word you need to know: redundancy. According to Argonne National Laboratory, redundancy can be defined as "building multiple resources that serve the same function and can replace each other in the event of the loss of primary system resources."
Even the best systems require periodic patching, upgrades, reboots, etc. In fact, regular updates and patches are something we recommend to increase system uptimes, in part because updated systems are more difficult targets for cybercriminals. But during these updates, patches, and upgrades, there is a period of downtime.
And that doesn’t factor in possible power outages, network connectivity disruptions, and hardware failures. There are ways to work around these disruptions (battery backup units, next-generation operating systems, etc.), but you will still be left with periodic maintenance.
Redundancy allows for system A to be offline while system B stays up and running. From a service perspective, that means no downtime. Plus, redundancy often has the benefit of balancing the load across all active systems.
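As a rough illustration of why redundancy helps, the combined availability of several redundant systems can be estimated with a standard probability formula. This sketch assumes the systems fail independently and that any single working copy is enough to keep the service running; both are simplifications, and the 99% starting figure is just an example.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant systems.

    Assumes independent failures and that one working copy is
    enough to keep the service up (an idealized model).
    """
    return 1 - (1 - single) ** copies


for copies in (1, 2, 3):
    print(f"{copies} system(s): {combined_availability(0.99, copies):.4%}")
```

Under these assumptions, two systems that are each available 99% of the time together reach roughly 99.99%. Real-world results are usually lower, because redundant systems often share a failure point such as the same power feed or network path.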
Redundancy comes in many different forms, such as:
- multiple network cards.
- multiple power supplies.
- multiple, fault-tolerant hard drives.
- multiple servers.
- multiple network appliances.
- multiple internet access paths.
- multiple geographic locations.
As you can imagine, each of these redundancy options has associated costs, and greater uptime requires greater expenditure. That is why every organization must decide for itself how much downtime it can tolerate.
And this isn't just downtime in general, but downtime for each system and application. Thirty minutes of downtime for an internal site might be acceptable, but any downtime for a public-facing, revenue-generating site would bring business to a screeching halt.
How Anteris can help
We have the systems in place to reduce downtime and increase redundancy. As a strategic IT partner, we will help you decide what redundancy options will work best for your organization and strike a balance between cost and downtime.
Let us make your technology freeing, not frustrating.
Frequently Asked Questions
What are the specific challenges or limitations of implementing redundancy in different types of organizations, such as small businesses versus large enterprises?
Implementing redundancy presents different challenges depending on an organization's size. Small businesses may struggle with limited budgets and expertise, while large enterprises might face complexities in coordinating redundancy across multiple systems and locations.
How do cost considerations impact the choice of redundancy options, and what are some cost-effective strategies for improving system uptime on a tight budget?
Cost considerations are crucial in redundancy planning. Organizations often seek cost-effective strategies like cloud-based solutions, which offer scalability and reliability without the need for extensive on-premises infrastructure. Prioritizing critical systems for redundancy can also help manage costs.