What is Data Storage?
Data storage refers to the technologies used to record and retain digital information on computer components so that it can be retrieved at a later point in time.
Technically, data storage signifies the optical, magnetic, or mechanical media that record and hold digital data and information for current or future use.
- Data storage rests on two basic foundations: the form of the data and the device used to store it.
- Data storage allows faster operation and data continuity because the desired data can simply be pulled from storage rather than typed in again every time.
- Data storage is connected to the computer, either directly or through a network, so that the computer can fetch and access the desired data quickly.
- There are three main forms of data storage, namely file storage, block storage and object storage.
- There are two main types of data storage, namely direct attached storage and network based storage.
Understanding Data Storage
Data storage is an important part of any computer system and needs to be foolproof and reliable.
However, in order to understand it, you need to know a few related concepts.
As you may know, digital information can be of two specific types, namely:
- Input data and
- Output data.
The input data is provided by the users through different input devices such as a keyboard, a mouse and others.
The computer processes this data in the Central Processing Unit or the CPU to produce the output.
This means that the computer cannot process anything or produce any output if there is no input.
Entering data into the computer manually every time is quite time consuming and laborious for the users.
One easy but short-term solution is to use the computer memory, namely the Random Access Memory or RAM and the Read Only Memory or ROM.
These typically control the basic functionalities of a computer.
However, RAM offers limited capacity and is volatile, which means that it retains data only while it is powered.
As for the ROM, just as the name indicates, it only allows its contents to be read and does not allow them to be edited.
Though significant progress has been made in computer memory in the form of Dynamic RAM or DRAM and Synchronous DRAM or SDRAM, these too are limited in terms of retention, space, and cost.
More importantly, these too lose their contents when the system powers down.
Therefore, in such a situation, the only effective solution is data storage.
Data storage devices are built specifically to store data and to retain it even when the system powers down.
At the most basic level, data storage has two specific foundations. These are:
- The form of the data and
- The device that stores it.
The most significant benefit is that the users do not need to input data manually every time into the computer.
Instead, they can simply instruct the computer itself to pull the desired data out from the storage device.
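The difference between volatile memory and persistent storage can be illustrated with a minimal Python sketch. The file name `notes.txt`, the temporary directory, and the helper names `save_text` and `load_text` are assumptions made for this example only.

```python
import tempfile
from pathlib import Path


def save_text(path: Path, text: str) -> None:
    """Write text to a file so that it outlives the running program."""
    path.write_text(text, encoding="utf-8")


def load_text(path: Path) -> str:
    """Read stored text back instead of asking the user to type it again."""
    return path.read_text(encoding="utf-8")


# Hypothetical location; a temporary directory keeps the example self-contained.
storage_dir = Path(tempfile.mkdtemp())
notes_file = storage_dir / "notes.txt"

in_memory_copy = "quarterly sales figures"  # this copy lives in RAM only
save_text(notes_file, in_memory_copy)       # this copy lives on the storage device
del in_memory_copy                          # stands in for losing volatile memory

restored = load_text(notes_file)            # the data is pulled back from storage
```

The variable that lived only in memory is gone after `del`, while the copy written to the file can still be read back.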
Importance of Data Storage
The importance of data storage has increased even more with the advent of advanced data analytics and big data.
In addition, the large number of Internet of Things or IoT devices has also contributed to the importance of data storage.
According to a recent report from IDC, an IT analyst firm, about 64.2 Zettabytes or ZB of data was either created or replicated in 2020 alone, and it is expected that by the end of 2025 there will be about 163 ZB of newly created data.
That is roughly a tenfold increase over 2016, when the amount was just 16 ZB.
With the amount of data growing drastically with every passing minute and hour, it is important to organize it in the best possible way so that it is easily available and recoverable.
Therefore, scalable and highly dense data storage systems are the need of the hour.
Modern storage systems therefore need to be more sophisticated and higher performing so that they can support Artificial Intelligence or AI, machine learning and other related technologies.
This helps in analyzing data easily during real-time database analytics and in deriving the maximum value from it, whether the storage takes the form of any of the following:
- Object storage platforms
- Storage Area Networks or SANs
- Scale-up and scale-out Network Attached Storage or NAS or
- Converged, hyper-converged and composable infrastructure.
Digital data and information are written to the target storage media by using software commands.
In modern computers, data storage works by establishing a connection between the storage device and the computer, either directly or through a network.
The computer fetches the data from the storage device and accesses it for the necessary processing.
The computer system can not only read the data from there but can also create an output from it and save it in the same location or in any other.
The users can also share the data storage with others.
Data storage typically involves integrated hardware and software systems that capture, manage, secure and prioritize the data.
This data can be in any form such as:
- Backups and
- Data warehouses.
It may also be stored in different places such as:
- On premises
- At data collection facilities
- In edge computing environments
- Mobile devices and
- On cloud platforms.
It can be one or a combination of the above as well. It primarily depends on the storage capacity required for the particular data.
For example, simple documents will need only a few kilobytes of space, but graphics intensive files and digital photographs may need megabytes of storage space, and video files may take up gigabytes of space.
Computer applications usually list such capacity requirements, but these are just a small part of the planning process.
The storage administrators then take it up from there and determine the following:
- The amount of data that needs to be retained
- The length of time for which the data must be retained
- The compliance regulations that are applicable
- The specific data reduction methods to be used
- The probable issues during the process which may affect the capacity and
- The disaster recovery or DR requirements.
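The factors listed above can be combined into a rough capacity estimate. The following is only a sketch under simple assumptions: the function name, its parameters, the linear model, and the sample figures are all hypothetical and are not taken from any particular planning tool.

```python
def estimate_capacity_gb(daily_data_gb: float,
                         retention_days: int,
                         reduction_ratio: float = 1.0,
                         dr_copies: int = 0,
                         headroom: float = 0.2) -> float:
    """Return a rough capacity estimate in gigabytes.

    daily_data_gb   -- assumed amount of new data retained per day
    retention_days  -- how long the data has to be kept
    reduction_ratio -- effect of data reduction, e.g. 2.0 halves the footprint
    dr_copies       -- extra full copies kept for disaster recovery
    headroom        -- spare fraction reserved for unexpected growth or issues
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    retained = daily_data_gb * retention_days   # amount multiplied by time
    reduced = retained / reduction_ratio        # after data reduction
    with_copies = reduced * (1 + dr_copies)     # primary copy plus DR copies
    return with_copies * (1 + headroom)         # safety margin on top


# Hypothetical workload: 50 GB per day kept for 90 days, 2:1 reduction, one DR copy.
needed_gb = estimate_capacity_gb(50, 90, reduction_ratio=2.0, dr_copies=1)
```

Real planning would also account for compliance rules and growth over time, which this linear model leaves out.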
Forms of Data Storage
Broadly, there are three basic forms in which data can be recorded and stored.
- File storage – Also called file-based storage or file-level storage, this is a hierarchical approach to storing and organizing data. Data is stored in files, the files are organized in folders, and the folders are arranged in directories and subdirectories under a hierarchy.
- Block storage – Also referred to as block-level storage, this is the method in which data is split into blocks that are stored as independent pieces. Each of these blocks has a unique identifier. Typically, block storage facilitates faster, more efficient and more reliable data transfer and is therefore often favored by developers.
- Object storage – Commonly referred to as object-based storage, this data storage architecture handles large volumes of unstructured data. Typically, this data cannot be easily arranged into, or does not conform to, a traditional relational database with rows and columns. Examples include emails, photos, videos, audio files, sensor data, web pages, and other types of textual and non-textual web and media content.
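The three forms differ mainly in how a piece of data is addressed. The toy Python sketch below models each one with an in-memory dictionary. The names, the 4-byte block size, and the use of random UUIDs as object identifiers are assumptions chosen for illustration, and real systems are far more elaborate.

```python
import uuid

# File storage: data is addressed by a hierarchical path of directories and a file name.
file_store = {"/reports/2024/summary.txt": b"annual summary"}

# Block storage: data is cut into fixed-size blocks, each with its own identifier.
BLOCK_SIZE = 4  # bytes; unrealistically small, chosen only for illustration


def write_blocks(data: bytes, block_store: dict, start_id: int = 0) -> list:
    """Split data into blocks, store each under a numeric id, return the ids."""
    ids = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id = start_id + offset // BLOCK_SIZE
        block_store[block_id] = data[offset:offset + BLOCK_SIZE]
        ids.append(block_id)
    return ids


def read_blocks(ids: list, block_store: dict) -> bytes:
    """Reassemble the original data from its blocks."""
    return b"".join(block_store[block_id] for block_id in ids)


# Object storage: a flat namespace where each object carries data plus metadata.
def put_object(object_store: dict, data: bytes, metadata: dict) -> str:
    """Store an object under a generated identifier and return that identifier."""
    object_id = uuid.uuid4().hex
    object_store[object_id] = {"data": data, "metadata": metadata}
    return object_id


blocks = {}
block_ids = write_blocks(b"hello world", blocks)
objects = {}
photo_id = put_object(objects, b"<image bytes>", {"type": "photo", "camera": "unknown"})
```

Here the file is found by its path, the blocks by their numeric identifiers, and the object by its identifier together with whatever metadata was attached to it.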
Benefits of Data Storage
Efficient data storage offers a lot of benefits to individuals and businesses alike. Some of these are:
- Data continuity
- Easier and faster data accessibility and recovery
- Reliability in data preservation and
- Better security for the stored and protected files.
Most importantly, data storage offers more flexible capacity options and price points.
Data Storage Devices
There are different types of data storage devices used today and all of these devices come with their own pros and cons.
It can be a semiconductor memory that uses semiconductor-based Integrated Circuits or ICs, where the data is stored in Metal Oxide Semiconductor or MOS memory cells.
It can also be a non-volatile magnetic storage that records data as different patterns of magnetization.
Here a magnetically coated surface holds the information, which is accessed with one or more read/write heads that may each contain one or more recording transducers.
Non-volatile optical storage devices are also used. These store information as deformities on the surface of a circular optical disc, and the data is read by illuminating the surface with a laser diode and observing the reflections.
Here is a brief description of all of them.
- SSD and flash storage – This type of data storage uses solid-state flash technology, in which data is written to and stored on flash memory chips. Unlike a Hard Disk Drive or HDD, a Solid State Drive or SSD has no moving parts and therefore offers lower latency, so fewer SSDs are required for a given level of performance. SSDs also produce less heat and noise and are faster and more reliable.
- HDD storage – These types of storage devices come with a circular platter or disc that is mounted on a spindle and coated with a thin layer of magnetic material. The disc spins at a high speed of up to 15,000 revolutions per minute, and as it spins, data is written on its surface by magnetic recording heads. A high-speed actuator arm positions these heads over the first available space on the disc. Data here is written in circular tracks.
- Hybrid storage – These types of data storage devices combine the features of both SSD flash storage and HDDs. They offer the speed of an SSD and the larger storage space of an HDD. The balanced infrastructure of these storage devices uses the appropriate technology depending on the different storage requirements. These storage devices are also quite economical compared to using a high capacity SSD or an HDD in isolation.
- Cloud storage – This is a more cost-effective and more scalable data storage option than on-premise storage networks and hard drives. Right from the storage equipment and its maintenance to storage management and security, everything is provided by the cloud service provider in exchange for a fee. You can access your data from anywhere, through any system and over any available network, as and when required.
- Hybrid cloud storage – This is also a cost-effective cloud storage option for your data where both public and private cloud elements are combined. You can choose the particular cloud in which to store your data. For example, highly sensitive data can be stored in the private cloud environment and less sensitive data can be stored in the public cloud setting.
Apart from these options, you can also choose to use backup software and appliances to store your data and protect it from loss, damage, fraud and failure.
Copies of the data are made periodically and stored on a secondary storage device, which can be anything such as:
- An HDD
- An SSD
- Tape drives or
Backup storage is usually offered as a service which is also referred to as BaaS or Backup-as-a-Service.
Just like other as-a-service solutions, BaaS is a cost-effective option for storing data more reliably in a remote location that offers scalability.
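The idea of periodically copying data to a secondary location can be sketched in a few lines of Python. This is only an illustration and not how any particular backup product or BaaS offering works: the `back_up` helper, the file name `ledger.csv`, and the timestamp format are assumptions, and a real setup would run the copy on a schedule and send it to a separate device or remote site.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir under a timestamped name and return the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


# Hypothetical primary data, kept in a temporary directory for the example.
work_dir = Path(tempfile.mkdtemp())
primary = work_dir / "ledger.csv"
primary.write_text("id,amount\n1,100\n", encoding="utf-8")

# In practice the backup directory would sit on a secondary device.
backup_copy = back_up(primary, work_dir / "backups")
```

Each run produces a new timestamped copy, so earlier versions remain available for recovery.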
Additionally, businesses today also use a Virtual Tape Library or VTL to store their data and files.
However, in contrast to its name, no tape is used in this type of storage system at all.
Instead, there is a disk on which the data is written sequentially.
It is so called because these disks emulate the properties and characteristics of a tape while allowing more scalability and quicker recovery.
Charts and Units
The amount of data stored is measured in bits, the bit being the smallest unit of measure of computer memory.
A bit has a binary value of either 1 or 0.
In a memory cell, the value of the bit is determined by the level of electrical voltage present in a single capacitor.
The next higher unit for measuring memory is the byte, which is equal to 8 bits.
The measuring units of storage in computers and network systems are based on two particular standards, namely:
- A base-10 decimal system and
- A base-2 binary system.
The discrepancy between the two standards is small for lower storage amounts but becomes more prominent as the storage capacity increases.
For example, whether the measurement is in bits or bytes, one kilo-unit is 1000 units under the base-10 standard and 1024 units under the base-2 standard.
This can be expressed differently based on the storage capacity of the devices such as in bits, bytes, KB, MB, GB, TB, PB, and EB.
Different data measurement charts are used for this such as:
- The data measurement size charts
- The connection speed charts
- The computer technology charts
- The video format charts
- The audio/video charts and
- Data comparison charts.
All these charts have different uses and therefore, all of them may not be used every time.
Using the binary convention, in which each unit is 1024 times the previous one, the units of data storage can be arranged from the smallest to the largest as follows:
- Bit or b = 1 or 0 (on or off)
- Byte or B = 8 bits
- Kilobyte or KB = 1024 bytes
- Megabyte or MB = 1024 kilobytes
- Gigabyte or GB = 1024 megabytes
- Terabyte or TB = 1024 gigabytes
- Petabyte or PB = 1024 Terabytes
- Exabyte or EB = 1024 Petabytes
- Zettabyte or ZB = 1024 Exabytes and
- Yottabyte or YB = 1024 Zettabytes.
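The ladder of units above lends itself to a small conversion helper. The Python sketch below assumes the 1024-based steps from the list, and the function name `format_size` and the two-decimal formatting are choices made for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
BITS_PER_BYTE = 8


def format_size(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1024."""
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {UNITS[-1]}"  # not reached; kept for completeness


def bytes_to_bits(num_bytes: int) -> int:
    """One byte is eight bits."""
    return num_bytes * BITS_PER_BYTE


small_document = format_size(1536)      # "1.50 KB"
one_gigabyte = format_size(1024 ** 3)   # "1.00 GB"
```

Dividing by 1024 at each step mirrors the list above, and a decimal version would divide by 1000 instead.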
Also, the binary or base-2 units of each can be represented as follows:
- Kibibyte or KiB = 2^10 bytes
- Mebibyte or MiB = 2^20 bytes
- Gibibyte or GiB = 2^30 bytes
- Tebibyte or TiB = 2^40 bytes
- Pebibyte or PiB = 2^50 bytes and
- Exbibyte or EiB = 2^60 bytes.
And, the decimal or base-10 units of each can be represented as follows:
- Kilobyte or KB = 10^3 bytes
- Megabyte or MB = 10^6 bytes
- Gigabyte or GB = 10^9 bytes
- Terabyte or TB = 10^12 bytes
- Petabyte or PB = 10^15 bytes and
- Exabyte or EB = 10^18 bytes.
Based on the above two charts, the percentage difference between base-10 and base-2 units can also be determined as follows:
- For the decimal value of 100 Kilobytes the binary equivalent will be 97.65 Kibibytes and the percentage difference between the two values will be 2.35%
- For the decimal value of 100 Megabytes the binary equivalent will be 95.36 Mebibytes and the percentage difference between the two values will be 4.64%
- For the decimal value of 100 Gigabytes the binary equivalent will be 93.13 Gibibytes and the percentage difference between the two values will be 6.87%
- For the decimal value of 100 Terabytes the binary equivalent will be 90.94 Tebibytes and the percentage difference between the two values will be 9.06%
- For the decimal value of 100 Petabytes the binary equivalent will be 88.81 Pebibytes and the percentage difference between the two values will be 11.19% and
- For the decimal value of 100 Exabytes the binary equivalent will be 86.73 Exbibytes and the percentage difference between the two values will be 13.27%.
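These figures follow from dividing the decimal size by the binary size at each step. A short Python sketch of the arithmetic is given below, with function names chosen for this example. The printed values may differ from the figures above in the last decimal place because of rounding.

```python
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa"]


def decimal_to_binary_units(value: float, power: int) -> float:
    """Convert `value` decimal units of 10^(3*power) bytes each into
    binary units of 2^(10*power) bytes each."""
    return value * 10 ** (3 * power) / 2 ** (10 * power)


def percent_difference(value: float, power: int) -> float:
    """Shortfall of the binary figure relative to the decimal figure, in percent."""
    binary = decimal_to_binary_units(value, power)
    return (value - binary) / value * 100


for power, prefix in enumerate(PREFIXES, start=1):
    binary = decimal_to_binary_units(100, power)
    diff = percent_difference(100, power)
    print(f"100 {prefix}bytes = {binary:.2f} binary units ({diff:.2f}% difference)")
```

The gap widens with each step because the ratio 1000/1024 is applied once more for every larger prefix.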
Some systems distinguish between the two standards and report measurements in both.
Types and Solutions
There are two main types of data storage namely direct attached storage and network based storage.
Each of these two types of data storage comes with its characteristic pros and cons and there are different types of devices that fall under these two categories as explained below.
Direct Attached Storage
Also known as DAS, direct attached storage is, just as the name signifies, storage that sits in the immediate area and is connected directly to the computing system that accesses it.
More often than not, that computer is the only machine connected to it.
DAS offers quite respectable local backup services but its sharing ability is limited.
The most common DAS devices are:
- Floppy disks
- Optical discs
- Compact discs or CDs
- Digital Video Discs or DVDs
- Hard Disk Drives or HDDs
- Flash drives and Solid State Drives or SSDs.
Network Based Storage
It refers to storage that more than one computer can connect to and access through a network.
This, therefore, allows much better collaboration and data sharing.
It stores data off-site and is therefore more suitable for creating backups and for data security.
There are two common types of network based storage setups namely:
- Network Attached Storage or NAS and
- Storage Area Network or SAN.
NAS is often a single storage device built from a Redundant Array of Independent Disks or RAID.
SAN storage, on the other hand, refers to a network of several devices connected together. These can be of different types such as:
- SSD and flash storage
- Backup software and appliances
- Hybrid storage
- Cloud storage and
- Hybrid cloud storage.
All of these types of data storage devices are already explained above.
The different characteristics of NAS are:
- It is a file storage system
- It allows limited users
- It uses a TCP/IP Ethernet network
- It operates at limited speed
- It offers limited expansion options
- It is easy to set up and
- It is a low-cost option.
On the other hand, the characteristics of SAN are:
- It is a block storage system
- It uses a Fiber Channel network
- It allows multiple users
- It is faster in performance
- It is highly expandable
- It is complicated to set up and
- It is a high-cost option.
One of the main benefits of NAS is that the data is centralized and access can be controlled by setting permission levels.
Depending on the available IT infrastructure, there are different solutions available for data storage such as:
- One-platform flash storage solution that simplifies management of data on premises or on cloud by eliminating different silos
- Storage virtualization solution that reduces complexity and cost by centralizing management and simplifying diverse environments to expose hidden capacity
- Tape storage solution which is low cost in comparison to other media and is reliable with air gap, cyber resilience, long term retention, and energy efficiency
- Smarter SDS or Software Defined Storage solution that offers both functionality and intelligence to allow best storage configuration
- HCS or Hyper Converged Storage solution that uses the cloud for computing and functioning and allows managing storage and virtualization as a physical unit in a single system and offers higher storage capacity and efficiency for much improved user experience
- Cloud storage solution that is faster, low-cost and secure to access from any device from anywhere and
- Artificial Intelligence or AI solutions that make handling repetitive tasks for specific types of data sets easier.
Properties of Data Storage
Data storage and its different technologies at all levels can be characterized based on the core properties.
Volatility of the memory is one of the most important properties. Non-volatile storage is preferred since it can retain the stored data and information without needing a constant supply of power.
Mutability is another important property which can be further classified as:
- Mutable storage or read/write storage which allows overwriting information at any time
- Slow write, fast read storage which allows overwriting information several times but at a slower rate than the read operation such as in SSD and CD-RW
- Write Once Read Many or WORM storage such as CD-R and other semiconductor programmable ROMs and
- Read only storage that contains the information loaded during the time of manufacture such as in the CD-ROM and the mask ROM ICs.
Accessibility is an important property that can be further categorized as:
- Random access, which allows any location in the storage to be accessed at any moment in roughly the same amount of time, such as in the primary and secondary storage and
- Sequential access, where pieces of information are accessed one after the other in serial order, such as in off-line storage.
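The contrast between the two access methods can be shown with a small Python sketch that treats an in-memory byte stream as a stand-in for a storage medium. The record layout, the 8-byte record size, and the function names are assumptions made for this example.

```python
import io

RECORD_SIZE = 8  # bytes per fixed-size record, chosen for illustration


def make_device(num_records: int) -> io.BytesIO:
    """Build an in-memory stand-in for a medium holding numbered records."""
    records = (f"rec{i:05d}".encode("ascii") for i in range(num_records))
    return io.BytesIO(b"".join(records))


def read_random(device: io.BytesIO, index: int) -> bytes:
    """Random access: jump straight to the record by computing its offset."""
    device.seek(index * RECORD_SIZE)
    return device.read(RECORD_SIZE)


def read_sequential(device: io.BytesIO, index: int) -> tuple:
    """Sequential access: read from the start until the wanted record is reached.

    Returns the record together with the number of reads that were needed.
    """
    device.seek(0)
    record = b""
    reads = 0
    for _ in range(index + 1):
        record = device.read(RECORD_SIZE)
        reads += 1
    return record, reads


device = make_device(100)
direct = read_random(device, 42)             # one read, wherever the record is
serial, reads_needed = read_sequential(device, 42)
```

Both calls return the same record, but the sequential version has to pass over every record that comes before it.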
Addressability is another important property that can be categorized further into three types, namely:
- Location addressable, which allows accessing every unit of information in the storage individually with the help of its numerical memory address
- File addressable, where the information is divided into files of different lengths, and a particular file is selected with the help of a human-readable directory and file name while the underlying device remains location addressable and
- Content addressable, where each individual unit of information is selected on the basis of part of the contents stored there, which can be implemented either in software, as a computer program, or in hardware, as a computer device.
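The three addressing schemes can be contrasted in a few lines of Python. Note that the content addressable part below uses a SHA-256 hash of the whole contents as the key, which is one common software approach and is an assumption of this sketch rather than something prescribed above. The class and variable names are also hypothetical.

```python
import hashlib

# Location addressable: each unit is reached through a numeric address.
memory = [b"alpha", b"beta", b"gamma"]
by_location = memory[1]

# File addressable: each unit is reached through a human-readable name.
files = {"/docs/readme.txt": b"beta"}
by_name = files["/docs/readme.txt"]


class ContentAddressableStore:
    """Toy store in which the key is derived from the data itself."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, data: bytes) -> str:
        """Store data under the SHA-256 digest of its contents."""
        key = hashlib.sha256(data).hexdigest()
        self._items[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Fetch data by its content-derived key."""
        return self._items[key]

    def __len__(self) -> int:
        return len(self._items)


store = ContentAddressableStore()
first_key = store.put(b"beta")
second_key = store.put(b"beta")  # identical contents map to the same key
```

Because the key depends only on the contents, storing the same data twice does not create a second entry.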
Next, the capacity of the storage can be described by two measures, namely:
- Raw capacity, which indicates the total amount of information the storage medium or device can hold and is usually expressed in bits or bytes and
- Memory storage density, which indicates the compactness of information stored in the device which is evaluated by dividing the storage capacity of the medium by unit length, area or volume.
Performance is an important property of the storage devices which is evaluated on the basis of different parameters such as:
- Latency, which refers to the time taken to access a specific location within the storage device and is measured in nanoseconds for primary storage, in milliseconds for secondary storage, and in seconds for tertiary storage
- Throughput, which indicates the rate at which information can be read from or written to a storage device and is typically expressed in Megabytes per second, although bit rate may also be used
- Granularity, which indicates the size of the largest block of data that can be accessed efficiently as one single unit without introducing additional latency
- Reliability, which refers to the probability of a spontaneous change in bit value under various conditions, or the overall failure rate of the system and
- Utilities, which refers to the usefulness of the storage offered with its performance.
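Latency and throughput together determine how long a transfer takes. The Python sketch below uses a simple first-order model, access latency plus size divided by throughput. The model itself and the sample figures of 10 milliseconds and 100 Megabytes per second are assumptions for illustration, not measurements of any device.

```python
def transfer_time_seconds(size_bytes: float,
                          latency_seconds: float,
                          throughput_bytes_per_second: float) -> float:
    """Estimate transfer time as access latency plus streaming time."""
    if throughput_bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return latency_seconds + size_bytes / throughput_bytes_per_second


# Hypothetical device: 10 ms latency, 100 MB/s throughput (decimal megabytes).
LATENCY = 0.010
THROUGHPUT = 100 * 10 ** 6

small_read = transfer_time_seconds(4 * 10 ** 3, LATENCY, THROUGHPUT)  # 4 KB
large_read = transfer_time_seconds(10 ** 9, LATENCY, THROUGHPUT)      # 1 GB

# Share of the total time that is spent waiting on latency.
small_latency_share = LATENCY / small_read
large_latency_share = LATENCY / large_read
```

Under this model the small read is dominated by latency, while the large read is dominated by throughput.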
Consumption of energy also affects the performance of the storage device overall and therefore is also considered as an important property of it.
Efficient use of energy may result in performance benefits such as:
- Reduced fan usage
- Shutting down automatically when inactive and
- Lower heat generation.
Security is also a vital property to consider, and it mainly involves encryption in different forms such as:
- Full disk encryption
- File or folder encryption and
- Volume or virtual disk encryption.
Susceptibility and reliability are also important because different types of data storage may come with different types of vulnerabilities or points of failure.
There are also different techniques of predictive failure analysis, and neglecting them may result in total loss of data.
Finally, error detection ability counts in terms of performance of the storage device.
Accumulating errors can reduce the rate of data transfer and even cause data corruption.
Some useful ways to detect or estimate errors are S.M.A.R.T. diagnostics and error scanning.
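The general idea behind error detection can be illustrated with a checksum. The Python sketch below uses CRC-32 from the standard `zlib` module. It is a simplified stand-in and not how S.M.A.R.T. diagnostics or a drive's own error scanning work internally, and the helper names are assumptions made for this example.

```python
import zlib


def store_with_checksum(data: bytes) -> tuple:
    """Return the data together with a CRC-32 checksum computed at write time."""
    return data, zlib.crc32(data)


def is_intact(data: bytes, checksum: int) -> bool:
    """Recompute the checksum at read time and compare it with the stored one."""
    return zlib.crc32(data) == checksum


def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Simulate a spontaneous bit change in the stored data."""
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << bit_index
    return bytes(corrupted)


payload, checksum = store_with_checksum(b"customer records, batch 7")
damaged = flip_bit(payload, byte_index=3, bit_index=0)

clean_read_ok = is_intact(payload, checksum)    # True
damaged_read_ok = is_intact(damaged, checksum)  # False: the flipped bit is detected
```

A checksum like this only detects that something changed. Correcting the error needs redundancy, such as a backup copy.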
Modern Data Storage Technologies
Modern storage technologies are more complex than before but are at the same time more reliable, faster, and more beneficial.
For example, the more advanced NAND flash technology has not only opened up the possibilities of software-based storage but has also made it much better and more reliable.
In order to use this technology and configuration, one has to install an SSD on an x86-based server and use a custom open-source code or third-party storage software for storage management.
NVMe or Non Volatile Memory Express is another good technology. It is an industry-standard protocol designed especially for flash-based SSDs.
With this technology, NVMe flash devices communicate directly with the Central Processing Unit or CPU through PCIe or Peripheral Component Interconnect Express links.
This removes the need to transmit SCSI or Small Computer System Interface command sets through the network host bus adapter.
As such, NVMe makes much better use of SSD technology than the SAS or Serial Attached SCSI and SATA or Serial Advanced Technology Attachment interfaces, which were developed for the slower HDDs.
The more developed NVMe over Fabrics or NVMe-oF further optimizes communications between the SSDs and other systems through a network fabric such as Fiber Channel, Ethernet, or InfiniBand.
Then there is the NVDIMM or Non Volatile Dual In-line Memory Module, which is a hybrid technology that comprises NAND and DRAM devices.
It is integrated with backup power and can be plugged into any regular DIMM slot on a memory bus.
It helps in processing basic calculations in the DRAM and for the other operations it uses flash.
Any host computer with the necessary BIOS or Basic Input Output System drivers can recognize it.
The NVDIMMs are best used for extending memory of the system and improving storage performance rather than adding to the capacity.
Usually, the regular NVDIMMs that are available in the market as of now offer a storage capacity of up to 32 GB.
Then there is the SAN which, as explained above, is a high speed network behind the servers.
It is specialized to connect storage devices with the servers.
It comes with a special communication infrastructure that provides the physical connections and bridges the interconnected elements in the network such as the directors and switches.
It is also considered an extension of the storage bus concept, one that allows servers and storage devices to be interconnected using elements similar to those found in:
- LAN or Local Area Networks and
- WAN or Wide Area Networks.
There are different components of SAN that allow faster data accessibility, improved network availability, and better system manageability, such as:
- Fiber Channel
- Storage appliances and
- Networking hardware and software.
There is also a management layer in SAN that organizes the storage elements, the connections, and the computer systems.
It also ensures more robust and secure data transfers.
So, those are the major aspects of data storage in computers.
With the knowledge gained from this article, you now know where this article will end up if you choose to save it on your computer, and how it will be stored.