Comparing File Contents

The Challenge

    It would be impractical to read every file in a group of suspected duplicates and compare it byte-by-byte with every other file in the group. The process is simple for two files, but the number of pairwise comparisons grows roughly with the square of the number of files (100 files already mean 4,950 pairs), so it quickly becomes unmanageable and far too time-consuming for larger groups.

The Solution

    The solution is to read each file once and use its contents to compute a "digest": a short, fixed-size fingerprint of the file (16 bytes for MD5) that requires very little memory. These small digests can then be compared very quickly. Because a digest is not an exact copy of what the file contains, it is theoretically possible for two files with different contents to produce the same digest (and hence be falsely labeled duplicates), but the MD5 algorithm used to generate the digests makes the probability of that happening by chance statistically insignificant.
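
    As an illustration of the idea only, and not the program's actual implementation, the following Python sketch computes an MD5 digest for each file and groups files whose digests match. The function names file_digest and group_by_digest and the 64 KB chunk size are assumptions made for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the MD5 digest of a file as a hex string.

    The file is read in chunks so that large files never have to be
    held in memory all at once.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def group_by_digest(paths):
    """Group paths by digest and return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(Path(path))
    # Each file is read once; only the short digests are compared.
    return [files for files in groups.values() if len(files) > 1]
```

    With this approach each file is read exactly once, however many files are in the group, and only the short digests are compared with one another.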

Just In Case

    However, if there is any doubt, the Compare Contents command in the Action drop-down menu can be used to compare the contents of any two files in a file list window.
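
    For readers who want to perform the same kind of check outside the program, a full byte-by-byte comparison of two files can be sketched in Python as follows. This is a minimal example using the standard filecmp module, not a description of how the Compare Contents command works internally; the function name same_contents is hypothetical.

```python
import filecmp


def same_contents(path_a, path_b):
    """Return True if the two files have identical contents.

    shallow=False forces filecmp to compare the actual file contents
    instead of trusting matching size and modification time.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```

    Because this reads both files in full, it is best reserved for confirming individual pairs that the digests have already flagged as duplicates.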

 

