Victorian Times Down in the Data Center

May 29, 2013 | admin | Articles

The shift from preventive to predictive maintenance

Over the last 15 years the face of the data industry as we know it today has not only changed but actually come into being. Way back in March of 1995, “Yahoo!” had only been heard in cowboy movies, eBay would not come into existence for another 7 months, and three whole years (a virtual eternity in this industry) would go by before Google made its debut.

Certainly, the financial industry had always known the necessity of storing all of our details and transactions, but the concept that a profitable business could not just rely on but actually consist solely of data and data storage was science fiction back then.

The level of operating hardware and software grew with the demand. The hardware became by turns faster, larger and more power hungry. While capabilities grew by leaps and bounds, the intricacies of both hardware and software meant that merely rebooting the computer at the first sign of a problem was often no longer an option. Even a minor “glitch” could result in a catastrophic loss of data, which began to mean a catastrophic loss of business.

A multitude of industries enjoyed growth over this period, all reliant on the idea that a computer must be shut down in a controlled fashion. Many companies learned the potential for data loss the hard way, and the market for backup storage devices subsequently grew exponentially, with data center space dominated by huge tape storage machines of enormous complexity and cost.

While backup is still important, the business downtime incurred while a backup is brought into play has become so expensive to these new companies, in lost revenue and lost customers, that the emphasis over the last 8 or so years has been on avoiding the use of these behemoths wherever possible by preventing power loss to the computers in the first place.

Power loss has always been notoriously difficult to predict. (One Fortune 500 company I visited several years back concluded from its historical data that 98% of unplanned power outages were the result of personnel being in the battery room.) Dual redundancy became the flavor of the day and is still used today. Even with dual redundancy, one part of the critical power feed remains the Cinderella of power continuity: the batteries.

Arguably the only piece of Victorian technology to be found in any data center, the humble lead acid battery still stands virtually unchallenged in its ability to provide the instantaneous power required when the AC feed drops out and the uninterruptible power supply needs to draw from somewhere to allow it to continue supporting its all-important computers.

Hi-tech data centers spend untold millions on multiple UPS systems and multiple generators that are maintained at start temperature, 24×7, to be ready the instant they are required. Typically the generators are started once a month to make sure they will work when required, but what about the batteries? Very often they are lucky to get a quarterly or even a semi-annual test or inspection to validate their integrity. This weak link is an area that needs improvement in over 90% of critical power sites in the US today. Regular servicing of batteries is perceived as preventive, but it is not. Batteries are not like generators; they are chemical devices and their state can change virtually overnight. Testing a battery one week does not prevent it from failing the next, any more than a personal medical exam one week prevents illness the next. Having a fixed battery monitoring system is essential for the reliable uptime of a data center.

So you have established that UPS backup batteries are critical to data security and that battery monitoring needs to be part of the power chain security system. That is a big step, and kudos for getting this far. But as always, there is more to know.

What do you need to consider when looking for a proper and effective battery monitoring system?

Battery monitoring has come as far as the machines it now supports. In the 80s there was much debate about whether measuring the internal resistance of a battery told you anything about its state of health and whether it would or could perform. Over the last 20 years it has been categorically proven that it is possible to detect many of the failure mechanisms of a lead acid battery using measurement techniques involving the “ohmic value” of a battery.

There are several companies that have specialized in the techniques for measuring lead acid battery state of health and, although their techniques differ, most of their products do the same basic job of informing the user about battery state of health. There are many factors to consider, so how do you know which one to pick?

Price:

As always this is going to be a factor but what is the cost compared to one unplanned power outage?

An expensive battery with no monitoring is not as good as a low cost battery with monitoring. If well chosen, your monitor will last over the life of many batteries, so consider carefully where you spend your money.

Ease of installation:

This is extremely important and leads to two other critical factors when choosing a monitoring system for your UPS batteries.

(a) The downtime: It can be difficult to schedule downtime, and the costs involved are very real, not to mention the potential risks both to personnel and to the infrastructure supported by the batteries that remain online (assuming the user can switch strings out while installation is going on). Because of the way a lead acid battery behaves, a two-string system that provides 15 minutes of run time will not give the user 7.5 minutes if only one string is operational. In many cases it may provide less than one minute. Obviously, the disruption and the time installation takes can have a significant impact on the overall operation of the business.
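The reason one string gives less than half the run time can be sketched with Peukert's law, a standard simplified model in which run time falls faster than in proportion as discharge current rises. The function name and the exponent values below are illustrative assumptions, not figures from any particular battery data sheet.

```python
def estimated_runtime_minutes(rated_minutes, strings_rated, strings_online, peukert_k=1.2):
    """Scale a rated run time to fewer parallel strings using Peukert's law.

    Peukert's law says run time is proportional to current ** (-k). With the
    load unchanged, the current each remaining string must supply grows by
    strings_rated / strings_online.

    peukert_k is an assumed exponent; k = 1.0 would be an ideal battery.
    """
    if strings_rated < 1 or not 1 <= strings_online <= strings_rated:
        raise ValueError("need 1 <= strings_online <= strings_rated")
    if peukert_k < 1.0:
        raise ValueError("Peukert exponent must be at least 1.0")
    per_string_current_ratio = strings_rated / strings_online
    return rated_minutes * per_string_current_ratio ** (-peukert_k)


if __name__ == "__main__":
    # Two strings rated for 15 minutes, one string switched out.
    for k in (1.0, 1.2, 1.4):  # assumed exponents, for illustration only
        minutes = estimated_runtime_minutes(15, strings_rated=2, strings_online=1, peukert_k=k)
        print(f"k={k}: about {minutes:.1f} minutes on one string")
```

Only the ideal case (k = 1.0) returns 7.5 minutes; any larger exponent returns less. This model is likely to be optimistic for a UPS, because it assumes constant current and ignores the end-of-discharge voltage limit, which a single string carrying the full load reaches much sooner. That is consistent with the real-world figure of under one minute.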

(b) The complexity: The more complex a system and the more wires it has, the more likely it is to go wrong and the more maintenance and inspections it will require to keep it running. A battery monitoring system needs to be more reliable than the battery system it is monitoring; otherwise it becomes a source of nuisance false alarms, which very quickly get ignored. Make sure the system uses fiber optics wherever possible rather than cumbersome wiring harnesses.

Expandability:

A good battery monitoring system is going to last over the lifetime of many batteries and during these lifetimes the expandability of the system will become an important consideration. Make sure that the system chosen can expand with minimum expense. Systems requiring entirely new control boxes and without adequate connectivity should be avoided. Look for systems with modularity that will only require adding a few components to the present system and where the addition of these components is simple and quick.

Backwards compatibility:

Make sure that the system you choose has backwards compatibility. This may not seem important or evident in the beginning, but it matters if the system is to last over the life of several batteries. When you come to add several more strings of batteries to your 30 cabinets of 12 jars, you will be glad you considered it.

Versatility:

Ensure that the system you choose can cope with differing voltages of jars in your strings of batteries, rather than having to buy a completely separate system just because you want to add a couple of strings of 12 volt jars to your 4 strings of 240 2 volt wet cells. A well thought out battery monitoring system should be able to cope with all voltages of jars on one system. Your chosen battery monitoring system should last for years and for the life of many battery banks, so don’t “paint yourself into a corner” on day one.

Measurement frequency:

If you are considering a battery monitoring system then you already know that quarterly maintenance is not going to give you the nine 9s of reliability your business is most likely demanding. So an “ohmic” test every three months won’t do it, but what will?

It has consistently been shown since their introduction that valve regulated lead acid batteries can double their ohmic value in as little as two or three days. A quarterly maintenance schedule may assure your battery health for around two weeks after each visit as an absolute maximum, giving you a total of 8 weeks of possible “safe” time but leaving you totally vulnerable for the remaining 44 weeks of the year.
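The coverage arithmetic above can be written as a short sketch. The function name and the idea of a fixed "validity window" per test are simplifying assumptions made for illustration; the windows are assumed never to overlap.

```python
def coverage_weeks(tests_per_year, valid_weeks_per_test, weeks_per_year=52):
    """Return (covered, exposed) weeks per year for a given test schedule.

    Assumes each test vouches for the battery for a fixed window afterwards
    and that the windows do not overlap.
    """
    covered = min(weeks_per_year, tests_per_year * valid_weeks_per_test)
    return covered, weeks_per_year - covered


if __name__ == "__main__":
    # Quarterly tests, two-week window (the "absolute maximum" in the text).
    covered, exposed = coverage_weeks(tests_per_year=4, valid_weeks_per_test=2)
    print(f"quarterly, 2-week window: {covered} weeks covered, {exposed} exposed")

    # Same schedules with an assumed 3-day window, closer to the doubling time.
    three_days = 3 / 7
    for name, tests in [("quarterly", 4), ("monthly", 12),
                        ("every two weeks", 26), ("daily", 365)]:
        covered, exposed = coverage_weeks(tests, three_days)
        print(f"{name}: {covered:.1f} weeks covered, {exposed:.1f} exposed")
```

With the two-week window the quarterly schedule gives the 8 and 44 weeks quoted above. If the window is taken to be nearer the two-to-three-day doubling time, only the daily schedule leaves no exposed weeks.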

It follows that a battery monitoring system that recommends measuring ohmic value every two weeks, or worse, every month, is going to fall well short of ensuring the nine 9s of reliability that you require. Make sure the system you choose measures daily for optimum reliability. The less current a system uses to test the ohmic value of a battery, the more often it can test. Be careful with this, however: once a day is quite adequate for this crucial measurement.
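As a rough illustration of what a daily measurement makes possible, the sketch below compares each jar's latest ohmic reading with a baseline and flags large rises. The data layout, the names, and the 50% alarm limit are all hypothetical; a real monitoring product defines its own thresholds and units. The readings are assumed to be resistance-type values, where a rise indicates deterioration.

```python
from dataclasses import dataclass


@dataclass
class JarReading:
    jar_id: str
    baseline_ohmic: float  # reference value recorded when the jar was known healthy
    latest_ohmic: float    # most recent daily measurement, same units as baseline


def jars_over_limit(readings, rise_limit=0.5):
    """Return (jar_id, fractional_rise) for jars at or above the rise limit.

    rise_limit=0.5 means a 50% rise over baseline; this is an assumed value.
    """
    flagged = []
    for reading in readings:
        if reading.baseline_ohmic <= 0:
            raise ValueError(f"invalid baseline for jar {reading.jar_id}")
        rise = (reading.latest_ohmic - reading.baseline_ohmic) / reading.baseline_ohmic
        if rise >= rise_limit:
            flagged.append((reading.jar_id, rise))
    return flagged


if __name__ == "__main__":
    todays_readings = [
        JarReading("string1-jar07", baseline_ohmic=100.0, latest_ohmic=110.0),
        JarReading("string1-jar08", baseline_ohmic=100.0, latest_ohmic=200.0),  # doubled
    ]
    for jar_id, rise in jars_over_limit(todays_readings):
        print(f"{jar_id}: ohmic value up {rise:.0%} over baseline")
```

Run once a day, a check like this would catch a jar whose ohmic value doubles within two or three days, which a two-week or monthly schedule could miss entirely.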

Technical support:

However good a battery monitoring system is, technical support from the manufacturer may become important, not only during the initial stages but also later on when staff members move on or if further training is required. A battery monitoring system is a long-term decision; make sure the company chosen is in it for the long haul as well.

Global coverage:

Your choice of system at your site may become part of a larger requirement for your company. Make sure that the company you choose has reasonable global presence, not only for ordering and shipping of new orders, but also so that unwanted or damaged product does not have to be shipped halfway around the world in order to get a replacement.

Finally:

As with all technologies there is more to battery monitoring than first meets the eye, so choose a company that is not new to the technology it uses but one that is not stuck in the dark ages either. Make sure there are plenty of the systems out there and avoid being the “guinea pig” testing new products on your valuable site(s). Make sure the technology is up-to-date but established.