Content Recognition by Media
This document explains the steps performed by RetroPlatform Player to recognize unknown titles by their media.
The RetroPlatform framework (as used in players like Amiga Forever and C64 Forever) employs a score-based mechanism to try to identify unknown content by analyzing its media. Recognition is desirable because it allows the player to display the proper publisher name, title, etc., and to preset an appropriate configuration.
The database of known titles is called RetroPlatform Library, and it may be a local "Library.rp-lib" file, or a live service. When installed, the file may be at a location like "C:\ProgramData\Cloanto\RetroPlatform". After a library update, the previous version is preserved as "Library.rp-lib.backup".
A numerical score is used to express similarity between media image files and known data stored in the RetroPlatform Library. A value of 100 indicates "full match", while 0 indicates "completely unrecognized".
As a first step, the media is divided into "slices", and a SHA-1 checksum is calculated for each slice. Floppy disks, tapes and executables are sliced into 10 parts. CD images (e.g. ISO and CUE+ISO/WAV/MP3) are sliced into 9 parts for the first data track, plus one remaining part for the additional tracks (usually audio). Some data, such as certain boot block and root block properties, are processed separately. If the checksums of all 10 slices match an image in the database, the recognition score is set to 100 and the medium is considered to be the same as the reference one.
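The slicing step can be illustrated with a minimal Python sketch. It is not the player's actual code: the function names, the even placement of slice boundaries, and the handling of the additional CD tracks as a single blob are all assumptions made for illustration.

```python
import hashlib

def slice_checksums(data: bytes, parts: int = 10) -> list[str]:
    """Split data into `parts` contiguous slices and return one SHA-1 hex digest per slice."""
    # Assumption: slice boundaries are spread as evenly as integer division allows.
    bounds = [len(data) * i // parts for i in range(parts + 1)]
    return [hashlib.sha1(data[bounds[i]:bounds[i + 1]]).hexdigest()
            for i in range(parts)]

def cd_slice_checksums(first_data_track: bytes, other_tracks: bytes) -> list[str]:
    """9 slices for the first data track, plus 1 slice for the additional tracks."""
    # Assumption: all additional tracks are hashed together as one blob.
    return slice_checksums(first_data_track, 9) + [hashlib.sha1(other_tracks).hexdigest()]

# Simulated 901,120-byte floppy image (the size of an 880 KB ADF file).
original = bytes(range(256)) * 3520
modified = bytearray(original)
modified[100] ^= 0xFF  # flip one byte near the start of the image

before = slice_checksums(original)
after = slice_checksums(bytes(modified))
changed = [i for i in range(10) if before[i] != after[i]]
print(changed)  # prints [0]
```

Only the slice containing the modified byte gets a different checksum, which is what makes a partial match possible.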
If fewer than 10 slices match, each matching slice is assigned 9 points, and individual 1-point scores are assigned to certain matching properties such as volume name, creation date and modification date.
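The scoring rule described above can be sketched as follows. The function name and parameters are hypothetical, and the number of 1-point properties is passed in as a plain count because the full list of properties is not given here.

```python
def recognition_score(candidate: list[str], reference: list[str],
                      matching_properties: int = 0) -> int:
    """Return a similarity score from per-slice checksums (hypothetical helper).

    candidate and reference hold one checksum per slice, in the same order.
    matching_properties is the count of 1-point properties that match
    (e.g. volume name, creation date, modification date).
    """
    matches = sum(1 for c, r in zip(candidate, reference) if c == r)
    if reference and len(candidate) == len(reference) == matches:
        return 100  # every slice matches: full match
    return 9 * matches + matching_properties

reference = [f"checksum-{i}" for i in range(10)]   # placeholder checksums
candidate = reference[:9] + ["different"]          # one slice differs

print(recognition_score(reference, reference))     # prints 100
print(recognition_score(candidate, reference, 2))  # prints 83
```

In the second call, 9 matching slices contribute 81 points and 2 matching properties contribute 1 point each.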
For different Amiga disk image formats (e.g. ADF, DMS, etc.), the player tries to map the image to a common (non-copy-protected) format.
The mechanism of dividing media into "slices" is highly effective for fixed-size media like floppy disk images, because it can help recognize the media even after small changes to the content. For best results, portions of the disk which are recognized as blank do not contribute to the score (or else two formatted disks with only a few small files each would have a high similarity even if the files were different).
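The effect of skipping blank portions can be shown with a small sketch. How the player decides that a portion is blank is not specified here; the sketch assumes, purely for illustration, that a slice is blank when it consists entirely of zero bytes. All function names are hypothetical.

```python
import hashlib

def split(data: bytes, parts: int = 10) -> list[bytes]:
    """Split data into `parts` contiguous slices of near-equal size."""
    bounds = [len(data) * i // parts for i in range(parts + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(parts)]

def is_blank(chunk: bytes) -> bool:
    # Assumption: "blank" means every byte is zero.
    return not any(chunk)

def matching_slices(a: bytes, b: bytes, skip_blank: bool) -> int:
    """Count slices with equal SHA-1 checksums, optionally ignoring blank slices."""
    count = 0
    for x, y in zip(split(a), split(b)):
        if skip_blank and (is_blank(x) or is_blank(y)):
            continue
        if hashlib.sha1(x).digest() == hashlib.sha1(y).digest():
            count += 1
    return count

# Two mostly empty disks, each holding one small but different file.
disk_a = b"AAAA" + bytes(996)
disk_b = b"BBBB" + bytes(996)

print(matching_slices(disk_a, disk_b, skip_blank=False))  # prints 9
print(matching_slices(disk_a, disk_b, skip_blank=True))   # prints 0
```

Without the blank check, the nine empty slices would make the two unrelated disks look almost identical.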
Variable-size content, such as tape and CD images, does not require processing of empty portions (because there are no empty parts). For variable-size content such as tape data, the mechanism is less effective, but still at least as good as other techniques (e.g. CRC-32 over the full content).
As a practical example, if a small high score file is written to a game floppy disk, it is likely to affect 1 of the 10 slices (and therefore 1 of the 10 SHA-1 checksums), and possibly some fields in the root block. The 9 unchanged slices still contribute 9 points each (81 in total), and the properties that still match add 1 point each, so overall the modified disk will still have a similarity score of about 90 compared to the original disk.
The recognition logic used by player tools such as RP9 Toolbox can be adjusted under Tools/Options/Media Recognition Scores. Default settings are:
- "Do not prompt" if the score is at least 70 points
- "Always prompt" if similar items are within 20 points
- Ignore media content if under 50 points
When opening or importing media, the player may ask for confirmation if the score is below the threshold considered reliable for automatic recognition, even if only one match is found. The default threshold is 70 (valid range: 1-99). Regardless of this value, when an exact match is found in the library the player always stops searching for further matches, and never prompts for details.
Even a high score (e.g. above 70) may require further disambiguation if more than one match is found. The default score difference between two items that are considered similar is 20. So, for example, if one match has a score of 78 and another a score of 85, the player will prompt for manual disambiguation (because the two matches are within 20 points of each other).
Another adjustable value indicates what score is always considered too low for media recognition purposes, causing the player to revert to an analysis of the file name alone (e.g. RP9 or TOSEC file name data). The default is 50. The valid range is 0-100 (0 = "never use file name", 100 = "never use library data").
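The interplay of the three settings can be summarized in a short sketch. This is one possible reading of the rules above, not the player's actual logic: the names are hypothetical, the order of the checks is an assumption, and so is the treatment of a difference of exactly 20 points as "within 20 points".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    no_prompt_min: int = 70   # "Do not prompt" if the score is at least this
    similar_within: int = 20  # "Always prompt" if similar items are within this
    ignore_below: int = 50    # Ignore media content if under this

def decide(scores: list[int], t: Thresholds = Thresholds()) -> str:
    """Return 'accept', 'prompt' or 'use_file_name' for a list of match scores."""
    if not scores:
        return "use_file_name"      # nothing in the library to compare with
    ranked = sorted(scores, reverse=True)
    best = ranked[0]
    if best == 100:
        return "accept"             # exact match: never prompt (assumed to be checked first)
    if best < t.ignore_below:
        return "use_file_name"      # too low: fall back to file name analysis
    if best < t.no_prompt_min:
        return "prompt"             # not reliable enough for automatic recognition
    if len(ranked) > 1 and best - ranked[1] <= t.similar_within:
        return "prompt"             # a second match is too close to the best one
    return "accept"

print(decide([78, 85]))  # prints prompt
print(decide([85]))      # prints accept
print(decide([40]))      # prints use_file_name
```

The first call reproduces the example given above: scores of 78 and 85 are both high, but too close to each other to choose automatically.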
It is safe to link to this page.