Posts: 1 md5 hash issue Options 9m ago
Hi all I'm a SAS newbie and need some help with the following issue. I'm experimenting with the md5 hash function and testing it on the sashelp.shoes dataset.
In a datastep, for each input observation, I'm creating a new variable that is a concatenation of each field using catx with comma as the field separator. I then use that variable as input to the md5 function and sure enough I get a hash value back which I print out to the log using PUT. I repeated this for each input line. However I then exported the shoes dataset to a CSV text file. The export enclosed the fields in double quotes and included $ and commas for the currency fields which I removed manually from the CSV file. So visually the the lines of the CSV looked exactly like the lines output from SAS. Next I wrote a little python script to read each line of the CSV text file and calculate a md5 hash for that. Unfortunately none of the hashes for the CSV file matched the hashes from SAS. Has anybody done something similar and if so can you tell me where I'm going wrong. I know the python code is correct as I checked the results using the built-in md5 checker in windows.
When I get into work again tomorrow I'll post some of the code I'm using if that helps. Meanwhile if anyone can help that would be appreciated
I'm using enterprise guide V4.3
As promised , here is the SAS code I'm using and the first few output records I'm getting.
/* concatenate all fields of a dataset and compute a checksum */
proc sql;
select name into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff(drop=check all);
format check $hex32.;
set sashelp.shoes end=end1;
newvar2 = catx(',',&varstr2);
all = catx(',',&varstr2);
check = md5(all);
put all;
put check;
run;
Africa,Boot,Addis Ababa,12,29761,191821,769
0F7503F59119E8248D89ED645F886871
Africa,Men's Casual,Addis Ababa,4,67242,118036,2284
8066D31E7C2A254EAB127C121B526DF7
Africa,Men's Dress,Addis Ababa,7,76793,136273,2433
653E4A1DF8B5708DF9C8B97587A1E981
Africa,Sandal,Addis Ababa,10,62819,204284,1861
D59E63E5319B4E3018F28D46A4CED9F9
Africa,Slipper,Addis Ababa,14,68641,279795,1771
1612FC1FE23B55078B7693ECE1E6D028
Now here is the python code and the same output records I'm getting for that:
import hashlib
filename = "f:/test/shoes.csv"
md5_hash = hashlib.md5()
with open(filename,"r") as f:
for x in f:
result=hashlib.md5(x.encode('utf-8'))
print (x)
print(result.hexdigest())
Africa,Boot,Addis Ababa,12,29761,191821,769
7001aaebd146b10aaed951cb692c6c4b
Africa,Men's Casual,Addis Ababa,4,67242,118036,2284
916a0c39554b70d691d03c71e8daa763
Africa,Men's Dress,Addis Ababa,7,76793,136273,2433
ea9e85e9843d3bb02206bc0ba7c3d5d4
Africa,Sandal,Addis Ababa,10,62819,204284,1861
5865cfc5d443b5a2e0038c573b5b6fb9
Africa,Slipper,Addis Ababa,14,68641,279795,1771
0226115fb928f326044ca43e186ae23a2
Update. I was thinking it might be something to do with newlines/linefeeds on the python side so changed my code to just look at the first input string in isolation.
import hashlib
x="Africa,Boot,Addis Ababa,12,29761,191821,769"
md5_hash = hashlib.md5()
result=hashlib.md5(x.encode('utf-8'))
print (x)
print(result.hexdigest())
Africa,Boot,Addis Ababa,12,29761,191821,769
65d38fa13c098fc3959b1eb0c19b0427
Hmmm, still doesn't match with the SAS version