It is perfectly legitimate to have two different DICOM files with an identical SOP Instance UID. This happens a lot when losslessly compressing a DICOM DataSet.
Since the compression is lossless, the professional interpretation of the Pixel Data contained in the DICOM file cannot possibly be affected, so it is legal to preserve the exact same SOP Instance UID.
An application is required to change the SOP Instance UID whenever the professional interpretation of the Pixel Data may be affected (e.g. lossy compression).
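The rule above can be sketched in a few lines of plain Python. This is only an illustration, not code from GDCM or any other toolkit: the function names and the set of lossy Transfer Syntax UIDs (a subset picked for the example) are assumptions. JPEG 2000 (.91) may be lossless or lossy, so the sketch conservatively treats it as lossy.

```python
import uuid

# Subset of standard Transfer Syntax UIDs that are (or may be) lossy.
# Chosen for illustration only; a real application would cover all of them.
LOSSY_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2.4.50",  # JPEG Baseline (Process 1)
    "1.2.840.10008.1.2.4.51",  # JPEG Extended (Process 2 & 4)
    "1.2.840.10008.1.2.4.81",  # JPEG-LS Lossy (Near-Lossless)
    "1.2.840.10008.1.2.4.91",  # JPEG 2000 (may be lossy, treated as lossy here)
}


def new_uid():
    """Hypothetical helper: derive a UID from a UUID under the 2.25 root."""
    return "2.25." + str(uuid.uuid4().int)


def sop_instance_uid_after_transcoding(current_uid, target_transfer_syntax):
    """Return the SOP Instance UID to store after transcoding.

    Lossy target: the interpretation of the Pixel Data may change,
    so a new UID is generated. Otherwise the original UID is preserved.
    """
    if target_transfer_syntax in LOSSY_TRANSFER_SYNTAXES:
        return new_uid()
    return current_uid
```

For example, transcoding to JPEG Lossless (1.2.840.10008.1.2.4.70) returns the UID unchanged, while transcoding to JPEG Baseline (1.2.840.10008.1.2.4.50) returns a fresh one.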
You may find a minimal explanation of the derivation mechanism in DICOM on the GDCM wiki:
In any case, when in doubt, you should always refer to the DICOM standard.
As a side note, Media Storage SOP Instance UID and SOP Instance UID are identical by definition. The information in group 0x0002 is simply derived from the DICOM DataSet in order to generate a valid Part-10 DICOM File.
Also, Referenced SOP Instance UID In File is a special case since it belongs to group 0x0004. It may therefore only be present within a DICOMDIR DataSet, which is not a typical DICOM DataSet. A DICOMDIR is only needed to index other DICOM Files on media (e.g. CD-ROM).
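To make the derivation of group 0x0002 concrete, here is a minimal sketch that models a DataSet as a plain Python dict keyed by (group, element) tags. The dict model and the build_file_meta function are assumptions made for the example, not the API of any DICOM toolkit; only the tag numbers come from the standard.

```python
# Tags from the main DataSet (group 0x0008).
SOP_CLASS_UID = (0x0008, 0x0016)
SOP_INSTANCE_UID = (0x0008, 0x0018)

# Tags from the File Meta Information (group 0x0002).
MEDIA_STORAGE_SOP_CLASS_UID = (0x0002, 0x0002)
MEDIA_STORAGE_SOP_INSTANCE_UID = (0x0002, 0x0003)
TRANSFER_SYNTAX_UID = (0x0002, 0x0010)


def build_file_meta(dataset, transfer_syntax_uid):
    """Hypothetical helper: derive the group 0x0002 UIDs from the DataSet."""
    return {
        # Copied verbatim from the DataSet, hence identical by construction.
        MEDIA_STORAGE_SOP_CLASS_UID: dataset[SOP_CLASS_UID],
        MEDIA_STORAGE_SOP_INSTANCE_UID: dataset[SOP_INSTANCE_UID],
        # Describes how the DataSet that follows is encoded in the file.
        TRANSFER_SYNTAX_UID: transfer_syntax_uid,
    }


dataset = {
    SOP_CLASS_UID: "1.2.840.10008.5.1.4.1.1.2",  # CT Image Storage
    SOP_INSTANCE_UID: "1.2.3.4.5",               # made-up UID for the example
}
file_meta = build_file_meta(dataset, "1.2.840.10008.1.2.1")  # Explicit VR Little Endian
```

Because the value is copied rather than generated, the two UIDs can never disagree in a file written this way.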
Failed SOP Instance UID List is also not present in a typical DICOM DataSet, since it should only be present in the response DataSet of a C-MOVE or C-GET, where it lists the instances whose C-STORE sub-operation failed.
And clearly, Referenced SOP Instance UID cannot have the same value as SOP Instance UID, since that would create a self-referencing loop in the DICOM DataSet.
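If you want to verify this on your own data, the check can be sketched as follows. Again the DataSet is modelled as a plain dict keyed by (group, element) tags, with sequences as lists of nested dicts; that model and the has_self_reference function are assumptions for the example, not a toolkit API.

```python
SOP_INSTANCE_UID = (0x0008, 0x0018)
REFERENCED_SOP_INSTANCE_UID = (0x0008, 0x1155)
REFERENCED_IMAGE_SEQUENCE = (0x0008, 0x1140)


def has_self_reference(dataset):
    """Return True if any Referenced SOP Instance UID equals the DataSet's own UID."""
    own_uid = dataset.get(SOP_INSTANCE_UID)
    if own_uid is None:
        return False

    def walk(ds):
        for tag, value in ds.items():
            if tag == REFERENCED_SOP_INSTANCE_UID and value == own_uid:
                return True
            # A sequence is modelled as a list of nested DataSets (dicts).
            if isinstance(value, list):
                for item in value:
                    if isinstance(item, dict) and walk(item):
                        return True
        return False

    return walk(dataset)


good = {
    SOP_INSTANCE_UID: "1.2.3.4.5",
    REFERENCED_IMAGE_SEQUENCE: [{REFERENCED_SOP_INSTANCE_UID: "1.2.3.4.6"}],
}
bad = {
    SOP_INSTANCE_UID: "1.2.3.4.5",
    REFERENCED_IMAGE_SEQUENCE: [{REFERENCED_SOP_INSTANCE_UID: "1.2.3.4.5"}],
}
```

Here has_self_reference(good) is False and has_self_reference(bad) is True.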