How to approach the development of a disaster recovery strategy for your Oracle or SQL Server database systems

March 8th, 2010

Developing a Disaster Recovery plan is a task that can often be swept aside as a ‘value vacuum’, when in fact it could prove to be one of the most valuable aspects of your IT infrastructure…

System downtime and data loss through hardware or disk failure, system corruption, theft, fire or flood damage, power failure or human error are things that you hope your organisation will never have to deal with. However, they happen far more regularly than many think and can prove incredibly costly for organisations without a fully operational Disaster Recovery strategy in place.

How the organisation backs up data is core to the Disaster Recovery strategy, and it is around this issue that most of the big choices will be made. These choices will of course be influenced by the level of Disaster Recovery that the organisation needs: the size, importance and ‘replaceability’ of the system and its data, and how much of that data the organisation can afford to lose.

1) Consider the impact that system corruption, failure or damage would have on the organisation

The first question to ask during the development of a Disaster Recovery strategy is ‘how much can we afford to lose?’ Consider how a system failure would affect the business and how quickly the system would need to be restored to full capacity. This should help you set criteria based on the two core aspects of your strategy:

  • ‘Recovery Point Objective’ [RPO] – the amount of data loss that is acceptable to the company, or that the company can cope with.
  • ‘Recovery Time Objective’ [RTO] – the duration of time and service level within which the system must be restored after a disaster in order to avoid unacceptable levels of damage to business continuity.

Once RPO and RTO values are ascertained, you should be able to conclude how much should be invested in protection for IT systems against the risk of failure.

For example, a company selling high-value products such as cars will probably have a very low RPO. If each individual transaction carries a high value, keeping accurate copies of the data as it stood just before a system failure or other disaster is a valuable exercise. Conversely, supermarket systems handle a high volume of low-value transactions and would have a higher RPO, as the organisation can probably afford to lose up to a few days’ worth of transactions.
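As a rough illustration of how RPO and RTO can be turned into a check on a proposed backup schedule, the sketch below assumes that the worst-case data loss equals one full backup interval. The class name, function name and the dealership figures are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum acceptable window of lost data
    rto: timedelta  # maximum acceptable time to restore service


def meets_objectives(objectives, backup_interval, estimated_restore_time):
    """Return (rpo_ok, rto_ok) for a proposed backup schedule.

    Assumption: in the worst case a failure happens just before the next
    backup runs, so one full interval of changes is lost.
    """
    rpo_ok = backup_interval <= objectives.rpo
    rto_ok = estimated_restore_time <= objectives.rto
    return rpo_ok, rto_ok


# Hypothetical car dealership: very low RPO, so nightly backups fall short
# even though a two-hour restore fits comfortably inside the RTO.
dealership = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
print(meets_objectives(dealership, timedelta(hours=24), timedelta(hours=2)))
```

Running the same check against each candidate backup method gives a quick first filter before cost is considered.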

2) Decide whether the cost of a Disaster Recovery strategy is justified

If the cost of backing up systems and data is more than the value of the systems and data themselves, it makes more sense to simply redo any lost work rather than strive to back it all up. However, this approach to Disaster Recovery is becoming increasingly rare, as most organisations are coming to understand that lost data is not always replaceable and data loss has more than just a financial impact.
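The comparison can be sketched as a simple calculation. This is only one way to frame it: the idea of weighting the cost of redoing work by an expected number of incidents per year is an assumption added here, the function names and figures are hypothetical, and the non-financial impact mentioned above is deliberately left out.

```python
def expected_annual_loss(cost_to_redo_lost_work, incidents_per_year):
    """Rough yearly cost of having no backups at all (financial impact only)."""
    return cost_to_redo_lost_work * incidents_per_year


def backup_is_justified(annual_backup_cost, cost_to_redo_lost_work, incidents_per_year):
    """True when protecting the data costs less than expecting to lose it."""
    return annual_backup_cost < expected_annual_loss(
        cost_to_redo_lost_work, incidents_per_year
    )


# Hypothetical figures, all in the same currency.
print(backup_is_justified(5_000, 200_000, 0.1))  # costly data: worth protecting
print(backup_is_justified(5_000, 10_000, 0.1))   # cheap to redo: backup costs more
```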

3) Consider your backup options

Once the organisation has established a policy around the amount of data it can afford to lose and the time within which restoration is required, choices regarding the mode of backup can be made. An appropriate balance must be struck between the speed at which systems must be restored to normal, the complexity of protection required and the level of investment that can be made.

  • Standby Servers

Standby servers protect against the loss of data transactions and system downtime through server failure.

They are secondary servers that are loaded with an exact copy of the database sitting on the primary server and can be brought online in the event of primary production server failures or scheduled maintenance work. Users can continue working on the database with minimal disruption.

The implementation of a standby server involves: the creation of the database and continual log backups on the primary server; the creation and maintenance of the standby server through backup and restoration of the database sitting on the primary server; and the ability to bring the standby server online in the event of a primary server failure.
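The three steps above can be sketched with a small in-memory model. This is a toy simulation, not Oracle or SQL Server code: the `Primary` and `Standby` classes and their methods are hypothetical stand-ins for a full backup, a log backup, a restore and a failover.

```python
import copy


class Primary:
    def __init__(self):
        self.data = {}
        self._pending_log = []  # changes made since the last log backup

    def write(self, key, value):
        self.data[key] = value
        self._pending_log.append((key, value))

    def full_backup(self):
        # Everything up to now is captured by the full backup itself.
        self._pending_log = []
        return copy.deepcopy(self.data)

    def log_backup(self):
        # Hand over the changes since the previous backup and start a new log.
        shipped, self._pending_log = self._pending_log, []
        return shipped


class Standby:
    def __init__(self, full_backup):
        self.data = copy.deepcopy(full_backup)
        self.online = False

    def restore_log(self, log_backup):
        if self.online:
            raise RuntimeError("cannot apply logs once the standby is online")
        for key, value in log_backup:
            self.data[key] = value

    def bring_online(self):
        self.online = True
        return self.data


primary = Primary()
primary.write("order-1", "paid")
standby = Standby(primary.full_backup())   # create the standby from a backup

primary.write("order-2", "paid")
standby.restore_log(primary.log_backup())  # keep it current with log backups

primary.write("order-3", "pending")        # not yet shipped when the primary fails
recovered = standby.bring_online()
print(sorted(recovered))                   # order-3 was never in a log backup
```

In this model the data lost at failover is whatever was written after the last log backup, so the interval between log backups plays the role of the RPO.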

  • Tape Backups

Offsite tape backup is still one of the most common approaches to disaster recovery. Magnetic backup tapes are run at stipulated intervals and then physically transferred to a remote location for safety. Tapes can be recalled in the event of an on-site system failure and are used to recover the database system to the point at which that tape was used for backup.

Tapes are small and portable, which means they lend themselves to simple Disaster Recovery strategies such as taking backups home each evening. They also require a fairly low level of initial investment. However, recovery can be an issue: the equipment used for recovery has to replicate the equipment used in production and the recovery procedure needs to be tested regularly to ensure that backups are being performed correctly and that recording is even taking place.
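One inexpensive check that something usable was actually recorded is to store a checksum when the backup is written and compare it when the media is read back. The sketch below assumes the backup is readable as an ordinary file; the function names and the `nightly.bak` file name are hypothetical, and a checksum match is no substitute for a full test restore on the recovery equipment.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_path, expected_checksum):
    """Return True only if the backup exists, is non-empty and is unchanged."""
    path = Path(backup_path)
    if not path.is_file() or path.stat().st_size == 0:
        return False  # nothing was recorded at all
    return sha256_of(path) == expected_checksum


with tempfile.TemporaryDirectory() as workdir:
    backup_file = Path(workdir) / "nightly.bak"   # hypothetical backup file
    backup_file.write_bytes(b"database pages")
    recorded = sha256_of(backup_file)             # stored at backup time
    intact = verify_backup(backup_file, recorded)
    backup_file.write_bytes(b"database pagez")    # simulate corrupted media
    corrupted = verify_backup(backup_file, recorded)

print(intact, corrupted)
```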

  • Disk-based Production Data Backups

Using disk-based storage to take direct copies of production data and sending them to a different, remote site protects against on-site disaster: it effectively means that all of your eggs are not in one basket.

With a disk backup strategy, a complete copy of the existing database systems is taken and stored off-site at stipulated times, or when a certain number of changes have been made. In the event of system failure, this exact system replica can be used to get the systems back up and running very quickly without the need for any reinstallation.

Periodic replication involves disk backups in the form of ‘snapshots’ being taken periodically, transferred to an off-site location and archived as new snapshots come in. Much like a system restore for the average PC, you can choose to restore the database systems to a point at which everything was known to be working without issue. This method offers protection against any on-site issues or errors. However, depending on the gap in time between the last snapshot and system failure, a significant amount of work can be lost.
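The trade-off can be made concrete with a short sketch that picks the newest archived snapshot taken at or before a known-good time and reports how much work falls in the gap. The six-hourly schedule, the timestamps and the function names are hypothetical.

```python
from datetime import datetime


def choose_snapshot(snapshot_times, last_known_good):
    """Return the newest snapshot taken at or before last_known_good, or None."""
    candidates = [taken for taken in snapshot_times if taken <= last_known_good]
    return max(candidates) if candidates else None


def work_lost(snapshot_time, failure_time):
    """Window of changes that the chosen snapshot does not contain."""
    return failure_time - snapshot_time


# Hypothetical schedule: a snapshot every six hours.
snapshots = [datetime(2010, 3, 8, hour) for hour in (0, 6, 12, 18)]
failure = datetime(2010, 3, 8, 17, 30)

chosen = choose_snapshot(snapshots, failure)
print(chosen, work_lost(chosen, failure))
```

Here the failure lands shortly before the next scheduled snapshot, so almost a full interval of changes is missing from the restored system.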

Continuous data replication means that the data copy off-site is almost a ‘real-time’ copy. In the event of a server failure or other unforeseen incident, systems can quickly be returned to full working order using the remote disk backup. Even if your critical applications typically process a high volume of transactions, only a small amount of data will ever be lost, simply because the data copies are so current.

The regularity of data copies that continuous replication involves is also responsible for a negative of this method. Any mistakes made at the primary site will be replicated. There are ways to overcome this issue, but ‘fixes’ such as the implementation of a tape-based system running parallel to the disk backups are rather impractical.
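A toy model shows why. In the sketch below every change to the primary is mirrored to the replica immediately, while an older periodic copy is left untouched; the dictionaries and the `apply_change` helper are hypothetical and stand in for real databases and replication software.

```python
import copy

primary = {"customers": ["Ann", "Bob"], "orders": [101, 102]}
snapshot = copy.deepcopy(primary)  # periodic copy, taken before the mistake
replica = copy.deepcopy(primary)   # continuous copy, kept in step


def apply_change(change):
    """Apply a change to the primary and immediately mirror it to the replica."""
    change(primary)
    change(replica)


apply_change(lambda db: db["orders"].append(103))  # legitimate transaction
apply_change(lambda db: db.pop("customers"))       # human error

print("customers" in replica)   # the mistake was replicated
print("customers" in snapshot)  # the older copy survives, but lacks order 103
```

The replica is current but faithfully wrong, while the snapshot is intact but stale, which is the trade-off between the two approaches.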

  • Hosted Server Backups

Remote hosted backup services are an increasingly popular method of system and data backup. System information can be transferred to a remote location quickly and easily, and without the need to purchase any extra hardware or licences. Hosted backup services offer protection against fires, floods and other such ‘disasters’, as all system information is stored in a secure off-site facility.

The risk associated with trusting a third party with the privacy and integrity of backed up data is minimal. Service providers dedicate all of their resources to ensuring data is transferred and kept in the most secure conditions. The in-house skills of hosted backup service providers make hardware failures incredibly unlikely, as any potential issues are pre-empted and resolved long before they threaten hosted data environments.

Accurate system copies and the added protection of off-site storage are a given with hosted server backups. Provided you take care in choosing your hosted backup provider and ensure that the company you choose has the correct security measures in place, the up-to-date system copy will be fit to facilitate a full system restore with hardly any data loss, usually within minutes.

4) Other Quick Tips

  • If you are using tape backups, always verify your storage media, preferably using the backup software as it is written to the tape. If you are unable to verify it as it is written to tape, schedule a check after a certain number of backups to ensure that the data is being backed up correctly. Also, make sure tapes are stored properly, at separate locations in protective containers.
  • Again, when using backup tapes, remember that there is much more to consider in an emergency restore than just the backup tapes themselves. You will need boot disks or CDs for all of your key systems with the correct OS components and network/tape drivers to make the system functional and able to extract the contents of backup tapes. Would you be able to find all of these at short notice?
  • Whatever strategy you choose, make sure that everyone is aware of the backup strategy and is comfortable with their role in it. For example, if you are using tape backups, keep a detailed log of tapes’ locations and contents, and designate certain individuals to source them if required. If you are using a remote hosted backup service, designate someone to manage that relationship. This person should have the service provider’s details and will simply need to call them to instigate a full system restore.
  • Contact Xynomix for further information and advice on backup strategies for your IT systems, or visit http://www.xynomix.com/oracle-consultancy/xynomix-disaster-recovery
