Difference between revisions of "DIY Computer backup"

From DIYWiki

Revision as of 23:19, 5 October 2023

**** CAUTION - INCOMPLETE ARTICLE - WORK IN PROGRESS ****

As a DIYer, the chances are you have accumulated a vast collection of files over the years: plans and designs, drawings, photos and loads of other stuff, along with the normal pile of documents, videos, scans and recordings. Now it is said you can basically divide computer users into two groups: those who have lost important information or documents on their computer, and those that are going to!

With that in mind, this article will cover some of the ways to make that unwanted loss easier to recover from, by having a good working backup solution that safeguards your information.

In theory...

In theory this is not a difficult problem - just keep another copy of the data somewhere, so that if your computer dies, gets stolen, or goes up in smoke, you can get all your stuff back from your copy. However the reality is a bit more complex. What happens if you have not actually lost your data, but realise that some of it was corrupted some time ago? The file is still there, but when you try to open it, it won't open; you just get an error message. Having a faithful copy of a corrupted file does not help much. And how do you make sure that your backup itself does not get destroyed or stolen?

Computer backup is a "deep" subject, with lots of options, and one size most definitely does not fit all.

Requirements

One of the hardest bits to get right is actually working out what your requirements are. Your requirements will need to factor in how much information you need to store, what you are prepared to pay to do it, how long you are prepared to wait to store and recover it, and what kind of incidents you need to protect against.
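The "how long you are prepared to wait" part of the requirements can be estimated with some simple arithmetic. The following is a minimal sketch only: the function name `transfer_hours`, the default `efficiency` of 0.8 and the example sizes and speeds are illustrative assumptions, not measurements of any real link or device.

```python
def transfer_hours(size_gb: float, speed_mbit_per_s: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_gb gigabytes over a link.

    speed_mbit_per_s is the nominal speed in megabits per second.
    efficiency is an ASSUMED fraction of that speed achieved in practice.
    """
    if size_gb < 0 or speed_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 MB = 8 megabits
    seconds = megabits / (speed_mbit_per_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative figures only: 2 TB over a 50 Mbit/s and a 1000 Mbit/s link.
    for speed in (50, 1000):
        hours = transfer_hours(2000, speed)
        print(f"2000 GB at {speed} Mbit/s: about {hours:.1f} hours")
```

Running the same sum for both the backup (upload) and the restore (download) direction is worthwhile, since many home internet connections are much slower one way than the other.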

So first, some terminology:

Type: Disaster Recovery / Bare metal backup
What is it: This is a backup that makes a complete copy of everything on your computer: all the data and files, all the applications, the operating system and all the configuration. The idea is that if something goes wrong, like your hard drive failing without warning, you replace the drive, boot from your backup recovery CD/USB thumb drive, and restore the last backup. Once that is done you can carry on exactly where you left off.
Pros: Typically very fast, and can fix many types of failure. Saves lots of time if you need to do a full recovery and don't want to have to reinstall your operating system and all your applications.
Cons: Often it is not very granular - if you just accidentally deleted a single file, having to restore the entire computer back to where it was the last time you did a backup might be overkill, and might actually lose newer data that has not been backed up recently. A bare metal backup may also only be easy to restore to exactly the same or very similar hardware. It is also unable to deal with a file that was corrupted some time ago, and has been backed up many times since then (see Generational backup below).

Type: Full backup
What is it: The process of making a full copy of all of the files that you want to back up.
Pros: The backup is complete and does not depend on any other backup.
Cons: It might take a long time, and might require lots of storage space.

Type: Incremental backup
What is it: A backup that is a follow on activity from another backup, and that captures only the changes since the last backup.
Pros: Quick to do, and often requires little storage space.
Cons: Restoration is more complex, and may require restoring a number of backups in sequence to get back to the most recent state.

Type: Generational backup
What is it: A backup that keeps not just the current version of each file, but also a number (or possibly all) of the previous versions as well.
Pros: This lets you not only recover from total loss of your file, but also step back through time to find the desired version of a file - even if it is not the latest.
Cons: Takes more storage space and is often more difficult to administer. May make recovery more complicated.

Type: Online backup
What is it: Not to be confused with a backup to somewhere on the internet. Simply a backup that is stored somewhere that is always accessible to the computer. This could be a USB thumb drive or external hard drive plugged into your computer, or perhaps another computer or network attached storage device on your computer network.
Pros: Easy and rapid access, no manual intervention required.
Cons: The backup itself is vulnerable - it could be overwritten or destroyed by the computer it is attached to (through user or software error, hardware failure, or the malicious activity of users or hackers).

Type: Offline backup
What is it: One stored somewhere that is not immediately accessible, say on an external hard drive that is disconnected after the backup.
Pros: While the backup medium is offline, it is immutable.
Cons: Becomes vulnerable when re-attached.

Type: Offsite backup
What is it: An offline backup that is stored in a physically different place to the original data. This might be cloud hosted storage, or it might be a hard drive that you have stored at a friend's house, at the office, or even in your bank safe deposit box!
Pros: Offsite backups are essential to mitigate against some kinds of disaster. They are safe from fire or theft of the original equipment. Since they are not accessible (i.e. online) to the computer they protect, they can't easily be overwritten, corrupted or deleted, even by a bit of malicious software running on your computer or by the actions of a malicious individual.
Cons: Take more time and effort to maintain and administer. The off site copy now needs to be secured to prevent it being a data leak problem!

Type: Immutable backup
What is it: One that can't be changed: it can't be overwritten, corrupted, or deleted. Can include backups written to "write once, read many" (aka WORM) media like a DVD-R.
Pros: The ultimate in rock solid protection.
Cons: Can be expensive to maintain since it can swallow an almost infinite amount of storage capacity. Can be very slow to access.

Type: Cloud / Internet backup
What is it: Storing backup data on someone else's server.
Pros: Offsite, and can be arranged to be immutable (at least from the end that is being backed up).
Cons: Often incurs monthly costs - the amount depending on the type of storage and how "accessible" it needs to be.

Type: Fault tolerant backup
What is it: Backups need to be fault tolerant. That typically means having multiple copies of data. Fault tolerant destinations are also good places to store backups, for example hardware that is itself fault tolerant, or that can be easily swapped out for something compatible.
Pros: Can protect against hardware failure stopping you from backing up or restoring data.
Cons: More expensive, can require ongoing maintenance.
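
The relationship between a full backup and the incremental backups that follow it can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real backup tool: the function names `backup_changes` and `restore` are made up for this example, changes are detected purely by file modification time, and deleted files are not tracked at all.

```python
import shutil
from pathlib import Path


def backup_changes(source: Path, dest: Path, since: float) -> list[Path]:
    """Copy files under source modified after `since` (epoch seconds) into dest.

    Passing since=0 copies everything, i.e. a full backup. Each run should
    use a fresh dest folder. Returns the relative paths that were copied.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src_file in sorted(source.rglob("*")):
        if src_file.is_file() and src_file.stat().st_mtime > since:
            rel = src_file.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)  # copy2 also keeps the modification time
            copied.append(rel)
    return copied


def restore(backups: list[Path], target: Path) -> None:
    """Replay the full backup first, then each incremental in order.

    Later backups overwrite earlier copies of the same file.
    """
    for backup in backups:
        shutil.copytree(backup, target, dirs_exist_ok=True)
```

A restore therefore needs the full backup plus every incremental taken since, in the right order, which is the extra complexity noted in the Cons for incremental backups.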

The laws of backup

There are some "laws" of backup you should follow - good practices that can help you avoid traps that jeopardise your chances of successfully recovering your data from a backup.

Law: Test your backup
Why? It is sometimes very easy to think you are fully protected because you have a backup system in place. However, finding out that it does not actually work like you expected at the very moment you actually need it is not a good feeling!

Check you have actually backed up what you thought you had:

  • Did you include all the right folders, including those which are normally hidden by the operating system but include important application or configuration data?
  • Were you able to back up those files that were actually open at the time of your backup? You know, things like that critical database or email folder, or perhaps even a virtual machine?

Have you actually backed up enough?

It might be tempting to only back up the stuff you "need", and not worry about the OS or your applications - after all, the operating system and applications can be reinstalled from their original sources, in theory at least. The practicalities can be very different. If you chose not to back up the OS and applications:

  • Do you still have those install CDs?
  • What about the activation keys required to install the software that came with them?
  • Did the application require online activation? Does that still work, and will it work on new hardware?
  • How many hours did it actually take to install all your applications? Then all the updates? Then all the extra add-ons and downloads that you added in the many months after?
  • Did you originally install from a download? Do you still have a backup of that? Could you find it again?
  • If you download again, can you get the same version you had installed? Do you remember which version you had? If you use the latest one, is it compatible with all your files?

Are you able to actually recover all the files?

  • Are you able to recover the information you want to a new location, and not just overwrite the working copy?
  • If you have a bare metal disaster recovery backup, can you restore data to an incompatible hardware platform?
  • Will it let you also recover individual files and not just the whole lot?

Can you do a restore in a realistic timeframe?

  • Cloud / internet backups can be nifty, but have you got the available bandwidth to download several TB of data quickly enough to be useful?
  • Can you physically get to your off site copy when you need to?
  • Even local storage like a USB thumb drive in your pocket may actually have relatively low read speeds, and take many hours to copy from.

Can you recover the actual version of the files you need, and not just the latest copy?

  • The actual version of a file is often more important than just the most recent.

Law: Don't destroy your only working backup
Why? It might seem like a good idea to send your next backup to the same device as your last one. But that means you are probably destroying or at least corrupting your one and only working backup before you have completed a new one. Traditional backup solutions will use multiple sets of backup media for several reasons, and this is one.

Law: Do it often enough
Why? Doing a backup can be a tedious process, so the temptation is to only do it when "you have enough changes to make it worth while". The problem here is that you tend to forget or underestimate the number of changes, or how much work it will be to redo them. It is bad enough finding out that you have lost the last few days of work, but worse when you realise it was actually several weeks, and now you can't even remember exactly what you have lost.

Law: Automate
Why? See "Do it often enough". The way to make sure that it happens often enough is to automate the process. If you don't have to think about doing it, it won't get forgotten. However, check that the automation is actually working when it should.

Law: Encrypt
Why? If your backup might be accessible to others (say offsite, in the cloud etc) then consider an encrypted backup, so that you don't risk data leaks should a malign actor access your backup data. (Make sure you have a hard copy of the decryption key or passphrase that is not held just on the computer.)
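
One way to act on "Test your backup" is to restore to a new location and compare the result against the originals. The sketch below is one possible approach rather than a feature of any particular backup product: the names `file_digest` and `verify_restore` are invented for this example, and it assumes the originals have not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Compare every file under original with its counterpart under restored.

    Returns the relative paths that are missing from the restored copy and
    those whose contents differ.
    """
    report: dict[str, list[str]] = {"missing": [], "different": []}
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        copy = restored / rel
        if not copy.is_file():
            report["missing"].append(rel.as_posix())
        elif file_digest(src) != file_digest(copy):
            report["different"].append(rel.as_posix())
    return report
```

An empty report for both lists is the result to hope for. Anything listed under "missing" points at folders that were never included in the backup, and anything under "different" at files that were open, changed, or damaged along the way.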

Backup Destinations

There are a vast number of options for protecting your data. Which will work for you will depend on your requirements. It is common to need several different options to fully achieve your requirements.

Destination: Another partition on your hard drive or SSD
Good for: Operating systems allow storage devices to be partitioned into multiple logical volumes. That can make a lot of sense with the massive storage capacity of modern disks. This can allow you to keep your OS and applications separate from your data, and your data more logically organised. A backup on another partition can be a handy convenience - allowing very quick and easy restoration of files.
Limitations: Backups on alternate partitions are not really proper backups, as they are highly vulnerable:
  • If the drive fails, you lose your main copy and backup.
  • If the backup is online all the time, it is easy to corrupt - ransomware can encrypt it at the same time as your main copy.
  • You can accidentally delete it just as easily.
  • So can a careless or malicious family member, or member of staff.
  • A stolen, failed, or burnt computer will lose your backup.
Cost: Low

Destination: A different hard drive or SSD in the same computer
Good for: Same advantages as above.
Limitations: Mostly the same disadvantages, with the exception that you will still be protected should only the first drive fail. You have some limited "fault tolerance".
Cost: Low

Destination: An external hard drive
Good for: Also very quick to access, and has the advantage that if disconnected from the computer after the backup, it is now "offline" and less easy to corrupt. Being offline, it can also be moved off site for extra security.
Limitations:
  • Easy to leave online when not intended, reducing protection to just that of a different drive in the same computer.
  • Being off site can be good, although that might make your data less secure if others can access it and you have not used encryption.
  • HDDs might not be good long term storage options - ones unused for long periods may fail to start up reliably.
Cost: Low

Destination: A USB thumb drive
Good for: Small, cheap and convenient.
Limitations:
  • Typically slow to read and write.
  • Reliability not always good.
  • Can accidentally be left "online" when not intended.
  • Easily lost or damaged (and a trip through the washing machine never does them any favours!)
Cost: Very Low

Destination: Fault tolerant disk systems
Good for: Using multiple physical drives in a fault tolerant arrangement (like RAID, Storage Pools etc) can mitigate the risk of losing data as a result of a device failure. Access speed can be very good (and in some cases faster than "ordinary" storage).
Limitations: Used alone, not a complete backup option, but can often form the basis of a first line of defence.
Cost: Medium

Destination: Network attached storage (NAS)
Good for: Network storage devices (i.e. anything with storage and a computer, such as an off the shelf NAS, a home built one, or a "real" server) can add resilience to your backup:
  • Often have fault tolerant storage systems.
  • May take care of "snapshots", making sure you have generational backups.
  • Can be fast enough to use as primary storage - i.e. mapping your normal working folders onto the NAS/server so that it is the primary storage location, making your data easy to access from multiple computers and devices, and letting you play media to multiple destinations.
Limitations:
  • Slightly slower than internal native storage (typically limited by local area network performance).
  • If in the same building, still vulnerable to disaster damage.
  • Typically still online and so vulnerable to accidental deletion / corruption / ransomware. However, with a bit of planning they can also host storage used for backup that is not directly accessible to the systems using them. They can "pull" information from the protected system, rather than permit it to push data to them.
Cost: Medium to High

Destination: Online internet / Cloud storage
Good for: Can be highly convenient, allowing real time backup of your data - keeping every change as soon as it happens. Sharing and restoring to other systems is often easy. There are a number of distinct services available in this space. See cloud backup offerings below.
Limitations: Ranges from free to quite pricey - much depends on how much space you need and how long you can wait for recovery.
Cost: Varies

Destination: Tape
Good for: In many cases the "go to" solution for larger business users. Masses of storage at relatively low cost per TB. Relatively fast. Good for keeping offline backups and also offsite backup.
Limitations: May need some manual intervention. Also some maintenance. Initial equipment costs can be very high. Tape is also a serial access medium - so if you want just a few files recovered from a backup, it might take longer to spool through the tape to the right place to start recovery.
Cost: High initial cost

Destination: Optical disks (CD/DVD)
Good for: Cheap, ideal for offline and offsite storage, can be immutable.
Limitations: Less well suited to modern quantities of data. Some question over the long term lifespan of media in storage. May require significant amounts of disk handling with large amounts of data. Relatively slow performance compared to other local options. With time, access to suitable mechanisms may become more limited, since many systems no longer include a DVD re-writer as standard.
Cost: Very Low

Fault tolerant technologies

There are a number of technologies, like RAID ("Redundant Array of Inexpensive Disks") and Windows Storage Spaces, that can make data storage more reliable. They typically combine a number of storage devices together, and use them in combination to improve reliability. This might just be by "mirroring", i.e. keeping two or more copies of everything on a set of matched disks, so that if one fails there is still a working copy that can be used to regenerate the content onto a replacement disk. More elaborate systems use check or parity disks to store error checking and error correcting data that can be used to detect and fix problems that arise from failing or corrupted disks. Many technologies also allow you to grow your storage capacity later, by adding more physical disks and then joining them to your storage pool. Note that JBOD ("Just a Bunch of Disks") is often mentioned alongside these, but on its own it only combines disks into one larger volume and adds no redundancy.
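
The idea behind a parity disk can be shown with a toy example. The sketch below uses plain XOR parity over three tiny in-memory "disks", which is the principle behind single-parity RAID levels. It is a simplification for illustration only: the helper name `xor_blocks` and the sample data are made up, and real systems work on striped blocks with far more bookkeeping.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three "data disks" holding equal-sized blocks (sample data for illustration).
data_disks = [b"plans.dwg", b"photo.jpg", b"notes.txt"]

# The parity disk stores the XOR of all the data disks.
parity = xor_blocks(data_disks)

# Simulate losing the second disk, then rebuild it from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_disks[1]
```

Because XOR-ing a value in twice cancels it out, any single missing disk can be rebuilt from the others. Losing two disks at once is beyond what a single parity disk can recover, which is one reason such systems are not a complete backup on their own.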

Fault tolerant systems might even have dual power supplies and dual network interfaces to make sure that any common single hardware failure can't stop you being able to access your backup.

Home built / DIY fault tolerant systems can be very easy to build, with lots of off the shelf software solutions available.

NAS Options

There are many commercial off the shelf NAS vendors like QNAP, Synology, and ReadyNAS. These can not only form the starting point of a good backup system, they can also do many other things like media serving to all your home devices, TV recording, and surveillance camera recording, all in a discreet and power efficient box.

DIY NAS Options

Assembling your own NAS is a very DIYable thing. This could range from something as simple as a Raspberry Pi on a network connected to an external USB hard drive, to a full blown build using server class components and multiple SSDs and HDDs controlled by something like FreeNAS (now known as TrueNAS).

Cloud Storage Options

There are a number of different cloud storage facilities that work in a variety of ways.

Standalone Storage

You can buy standalone storage from the "big 3" cloud providers. Storage of this type may not be directly accessible as a mounted "drive" or disk, but can often be accessed via a web site, or more commonly by software that uses their cloud storage API (Application Programming Interface). There are many software providers that integrate access to these platforms into their products, or you can roll your own (for example see Bob's notes on manipulating S3 storage from the command line).

Amazon AWS

Amazon's vast cloud infrastructure allows you to purchase storage in many different forms (but always in "buckets"), ranging from high performance storage for real time access, to slower access options more suited to long term backup. Their "S3 Glacier Deep Archive - For long-term data archiving that is accessed once or twice in a year and can be restored within 12 hours" is currently priced at under $0.001 per GB/month, so storing 1 TB (1,000 GB) would cost under $1 per month, before any retrieval or request charges. See the full price list.

Azure

Microsoft will sell you "blobs" of storage. Like Amazon, they have various tiers of storage for different applications. See their price list.

Google Cloud

Google will also sell you storage by the bucket. See their price list.

Storage integrated into other security products

There are many security products out there (Anti-virus, Ransomware detection, intrusion detection etc) that also have facilities for cloud backup either built in or available as an extra service. While typically more expensive for large amounts of data, they can be ideal for easy and reliable protection for critical data.

Acronis

Acronis, a popular vendor of disk cloning and backup tools, also has fully integrated cloud backup facilities. See their range here.

Avast Cloud Backup

Avast (incorporating AVG) have a backup module for their Cloud Care security suite. See details here.

Dedicated backup solutions

Some vendors specialise just in backup.

Backblaze

Backblaze made a big splash in the storage and backup market with a first product that took a novel approach to system backup. They built out their own storage farm of servers and disks, and then started selling complete backup services priced per PC (or Mac), including unlimited storage. They now have a range of products that use their own storage platforms as well as other cloud providers' storage networks. Details here.

iDrive

Another dedicated backup provider that can use their own infrastructure or integrate with other providers' storage options. They also offer a cloud storage product that can be accessed using an Amazon S3 compatible API, which allows their storage to be used by the many applications that can use S3 storage. More details here.

Drive sharing and syncing products

There are a number of drive sharing products that can maintain a complete copy of selected folders on your computer, and sync them in near real time to the cloud. This allows files to be kept in sync across multiple computers and other devices. Save a file on your desktop, and within seconds it will be accessible from your laptop or on your phone. They can also make sending large files to others or sharing documents very easy, since you can simply post a link to the file on the service, rather than the file itself.

Note that these are not true backup products, in the sense that they replicate what happens on your computer - so deleting or changing a file locally, can also delete or change it in the cloud (depending on settings). Some do have generational storage allowing easy recovery of earlier versions of overwritten or corrupted files. They do however integrate very well with the OS file system, and keep near real time copies of your data.
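
The difference that generational storage makes can be sketched as follows. A plain sync keeps only one copy, so a corrupted save replaces the good one, whereas the sketch below keeps numbered older copies alongside it. This is an illustration only: the function names, the `name.N` file naming scheme and the default of keeping five generations are assumptions for this example, not how any particular sync product works.

```python
import shutil
from pathlib import Path


def list_generations(name: str, store: Path) -> list[Path]:
    """Return the stored copies of `name` (name.1, name.2, ...), oldest first."""
    found = []
    if store.is_dir():
        for path in store.iterdir():
            stem, _, number = path.name.rpartition(".")
            if stem == name and number.isdigit():
                found.append((int(number), path))
    return [path for _, path in sorted(found)]


def save_generation(source_file: Path, store: Path, keep: int = 5) -> Path:
    """Store a new numbered copy of source_file, keeping only the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    store.mkdir(parents=True, exist_ok=True)
    existing = list_generations(source_file.name, store)
    last_number = int(existing[-1].name.rpartition(".")[2]) if existing else 0
    target = store / f"{source_file.name}.{last_number + 1}"
    shutil.copy2(source_file, target)
    # Prune the oldest copies so that at most `keep` generations remain.
    for old in (existing + [target])[:-keep]:
        old.unlink()
    return target
```

With something like this in place, overwriting or corrupting the working file only adds one more generation, and the earlier numbered copies remain available to step back to until they are pruned.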

The best known products being:


Backup Strategies

Here are some layered strategies that could be considered for your backup needs.

Solution:
  1. Bare metal to external HDD
  2. User data mapped to NAS storage with fault tolerance
  3. NAS backed up to AWS long term storage
Notes: The bare metal backup to HDD allows fast recovery of OS and apps from an offline locally held drive. Since the NAS is being used as the primary storage location for your files, there is no actual recovery required - they will still be there as soon as you get the machine running again. The fault tolerant disks will protect against drive failure without any interruption to your work. The NAS may provide generational snapshots of older file versions. The cloud backup copy gives reassurance that you can recover from NAS failure or theft or destruction.

Solution:
  1. As above
  2. But with the NAS replicating itself to another NAS in a different location using a protocol like rsync
Notes: If you have two locations with good internet connectivity, this can give many of the same advantages as above but without the additional costs of cloud backup services.

Solution:
  1. PC mirrored to Backblaze account
  2. Encrypted local backup to a collection of external HDDs/SSDs
  3. One backup kept offline at a second location
Notes: The mirrored account allows for easy recovery of all files and apps (Backblaze also have a "mail a disk out" option to recover large backups when you don't have adequate internet speed). The second copy on your own hardware gives protection should your online account ever be stolen or hacked.

Solution:
  1. Disaster recovery backup to a set of DVD-R discs
  2. Disaster recovery backup repeated a few times per year and a set of DVDs stored offsite
  3. Weekly backups of your important files burnt to DVD-R
  4. Copies of previous backup DVDs stored off site
  5. Free Google Drive account storing main documents folder
Notes: A fairly cheap solution for smaller quantities of data (e.g. under 10 GB). The disaster recovery disks will get your machine up and running quickly after failure or replacement, and the backup DVDs will get back all but the most recent week's work. The Google Drive account offers a chance to get back more recently changed files. This is not as rock solid as some solutions, but is certainly a big step up from doing nothing and hoping for the best.