Knowledgebase: Job handling
What is parallelism and how is it supported
Posted by Sven Koester, Last modified by Ibrahim Tannir on 11 August 2009 10:45
Q: What kinds of parallelism or concurrency does PresSTORE support?
A: PresSTORE supports job parallelism, device parallelism and source (client) parallelism, in any combination.
Q: What is device parallelism?
A: Device parallelism denotes the concurrent distribution and writing of a single data stream to several devices. PresSTORE takes a single source of data (e.g. a client file system) and writes it to several media using multiple drives at the same time. The data is separated on a file-by-file basis: a single file, including all of its components (Helios, Xinet or HFS forks, Windows data streams), is never split and is always contained on a single medium.
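The file-by-file separation can be illustrated with a minimal Python sketch. It is not PresSTORE code: the function name `distribute_files` and the "least-loaded drive first" policy are assumptions made for illustration, since the actual scheduling policy is not described here. The only property taken from the answer above is that a file always lands on exactly one drive.

```python
import heapq

def distribute_files(files, drive_count):
    """Assign whole files to drives; a file is never split across drives.

    files: list of (name, size) tuples.
    Returns one list of file names per drive.
    The least-loaded-first policy is a hypothetical choice for this sketch.
    """
    # Heap of (amount_assigned, drive_index): the least-loaded drive pops first.
    heap = [(0, i) for i in range(drive_count)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(drive_count)]
    for name, size in files:
        load, drive = heapq.heappop(heap)
        assignment[drive].append(name)  # the whole file goes to one drive
        heapq.heappush(heap, (load + size, drive))
    return assignment

files = [("a.psd", 700), ("b.tif", 300), ("c.mov", 500), ("d.txt", 100)]
print(distribute_files(files, 2))
# → [['a.psd', 'd.txt'], ['b.tif', 'c.mov']]
```

Every file name appears in exactly one per-drive list, which mirrors the guarantee that a file and its forks stay together on one medium.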
Q: What is client or source parallelism?
A: While saving data to a medium, several data sources (data from different clients, file systems or jobs) can be combined into a single data stream to be saved on a single medium. Normally, PresSTORE divides a file into blocks sized to best fit the medium to which the file is being saved. When more than one source is feeding the medium, the blocks from the different sources are multiplexed (interleaved) into a single data stream.
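The idea of multiplexing blocks from several sources can be sketched in Python as follows. This is an illustration only: the round-robin order, the tagging of each block with its source name, and the helper names `split_into_blocks`, `multiplex` and `demultiplex` are assumptions, not the on-medium format PresSTORE uses.

```python
from itertools import zip_longest

def split_into_blocks(data, block_size):
    """Cut a byte string into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def multiplex(sources, block_size):
    """Interleave the blocks of all sources round-robin into one stream.

    sources: dict mapping a source name to its bytes.
    Each stream entry is (source_name, block) so it can be separated later.
    """
    per_source = [
        [(name, block) for block in split_into_blocks(data, block_size)]
        for name, data in sources.items()
    ]
    stream = []
    for round_of_blocks in zip_longest(*per_source):
        # Sources that have run out of data contribute None; skip those.
        stream.extend(item for item in round_of_blocks if item is not None)
    return stream

def demultiplex(stream):
    """Reassemble each source's data from the interleaved stream."""
    restored = {}
    for name, block in stream:
        restored[name] = restored.get(name, b"") + block
    return restored

sources = {"clientA": b"AAAAAA", "clientB": b"BBB"}
stream = multiplex(sources, block_size=2)
print(stream)
assert demultiplex(stream) == sources
```

Restoring one source means reading past the blocks of all the others, which is why heavily multiplexed media restore more slowly.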
Q: What is job parallelism?
A: PresSTORE allows multiple jobs of any kind (backup, archive, synchronize, label or restore) to run simultaneously by distributing the hardware resources among them. Such jobs may simultaneously employ the same pools, volumes, drives, clients and file systems.
Q: When does device parallelism make sense?
A: Device parallelism makes sense when the data stream arriving from the source has a greater bandwidth than the drive to which it is being saved. In this case, it pays to involve further drives to distribute the load. The side effect of device parallelism is that the data from a single source (e.g. the files of a single directory) will occupy several media, so all of those media may be required for a restore. (See also backup-to-disk parallelism.)
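As a rough way to apply this rule, the number of drives can be estimated by dividing the source bandwidth by the bandwidth of one drive and rounding up. The Python sketch below is a back-of-the-envelope aid, not a PresSTORE function; the name `drives_needed` and the example figures are hypothetical.

```python
import math

def drives_needed(source_mb_per_s, drive_mb_per_s):
    """Estimate how many drives are needed to absorb one source stream.

    Both arguments are sustained throughputs in MB/s (assumed figures).
    """
    if source_mb_per_s <= 0 or drive_mb_per_s <= 0:
        raise ValueError("bandwidths must be positive")
    # One drive is always needed; more only if the source outruns a drive.
    return max(1, math.ceil(source_mb_per_s / drive_mb_per_s))

print(drives_needed(200, 80))  # → 3
print(drives_needed(50, 80))   # → 1
```

A result of 1 means a single drive already keeps up with the source, so device parallelism would only spread the data over more media without gaining speed.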
Q: When does backup-to-disk parallelism make sense?
A: (See also device parallelism.) Backing up to disk in parallel through more than one virtual drive only makes sense if the drives write their files to different disks, since only then can more bandwidth be achieved.
Q: When does source or client parallelism make sense?
A: Source parallelism makes sense when the incoming data stream cannot saturate the device it is being written to. This may be due to extremely fast devices, which are becoming very common, or to low network bandwidth when saving data from clients. In this case, it makes sense to multiplex the incoming streams into a single data stream to the writing device. The caveat is that the more streams are multiplexed, the slower a later restore will be. As a rule of thumb, about 4 clients should be multiplexed to one device.
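The trade-off can be put into numbers with a small Python sketch: estimate how many equally fast sources are needed to saturate the device, then cap the result at the rule-of-thumb value of 4. The function name `sources_to_multiplex`, the assumption that all sources deliver the same bandwidth, and the example figures are hypothetical.

```python
import math

def sources_to_multiplex(device_mb_per_s, source_mb_per_s, max_sources=4):
    """Estimate how many sources to multiplex onto one device.

    Assumes every source delivers the same sustained throughput in MB/s.
    max_sources reflects the rule of thumb of about 4 clients per device.
    """
    if device_mb_per_s <= 0 or source_mb_per_s <= 0:
        raise ValueError("bandwidths must be positive")
    needed = math.ceil(device_mb_per_s / source_mb_per_s)
    # Never fewer than one source, never more than the restore-friendly cap.
    return min(max(1, needed), max_sources)

print(sources_to_multiplex(120, 40))   # → 3
print(sources_to_multiplex(120, 10))   # → 4 (capped)
print(sources_to_multiplex(120, 200))  # → 1
```

In the second call, 12 slow sources would be needed to saturate the device, but the cap keeps restore times reasonable at the cost of leaving some device bandwidth unused.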
Q: What is device cloning?
A: Device cloning is the process in which two block-level identical PresSTORE volumes are written simultaneously to protect against a failure of the medium the data is written to. Since failures are quite common with tape media, cloning is always recommended for long-term archived data, as well as for data that is removed from primary storage.
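What "block-level identical" means can be shown with a minimal Python sketch in which every block is written to two targets. The in-memory `io.BytesIO` objects stand in for the two volumes, and the function name `write_cloned` is hypothetical. The sketch writes the two copies one after the other, whereas real cloning drives both devices at the same time.

```python
import io

def write_cloned(blocks, primary, clone):
    """Write every block to both volumes so they stay block-level identical."""
    for block in blocks:
        primary.write(block)
        clone.write(block)

# Stand-ins for two volumes (assumption: any writable binary stream works).
primary, clone = io.BytesIO(), io.BytesIO()
write_cloned([b"block-1", b"block-2", b"block-3"], primary, clone)

# Either volume alone is sufficient for a restore if the other one fails.
assert primary.getvalue() == clone.getvalue()
print(len(primary.getvalue()))  # → 21
```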