I found this post on the MathWorks website, which explains how to get a hash from a string by calling .NET from MATLAB. My understanding is that it converts every character in the string to an unsigned 8-bit integer and then computes a hash from the resulting uint8 array. At first I blindly applied this code to double-precision numerical arrays, but I noticed that I was sometimes getting identical hashes for non-identical input arrays. After some troubleshooting, I found that the hasher treats all input as integers even if uint8 is not called explicitly. For example, defining the hasher and calling it on 0.1 and 0.2 results in the same hash:
hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA1');
double(hasher.ComputeHash(.1))
ans =
91 169 60 157 176 207 249 63 82 181 33 215 66 14 67 246 237 162 120 79
double(hasher.ComputeHash(.2))
ans =
91 169 60 157 176 207 249 63 82 181 33 215 66 14 67 246 237 162 120 79
I'm wondering what the best approach would be to rework the code so that it works on double-precision numerical arrays. I can think of two ways off the top of my head: (1) convert the numerical arrays to character arrays, or (2) multiply the arrays by 10^n, where n is large enough that every element of the array becomes an integer. Neither of these feels very elegant. This stuff is a bit outside my usual wheelhouse because I don't have any formal background in programming.
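A third option, sketched here as an assumption rather than something taken from the MathWorks post, is to hash the raw bytes of each double so that no rounding happens. The helper names toHex and hashDoubles are hypothetical, and SHA-256 is just one possible choice of algorithm.

```matlab
% Sketch: hash the 8 raw bytes of every double instead of its rounded value.
% toHex and hashDoubles are hypothetical helper names used for illustration.
hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA256');

% Turn a uint8 vector into a single row of hex characters (2 per byte).
toHex = @(bytes) reshape(dec2hex(bytes, 2).', 1, []);

% typecast reinterprets the memory of the doubles as uint8 without rounding.
% A(:).' flattens the array to a row vector, since typecast expects a vector.
hashDoubles = @(A) toHex(uint8(hasher.ComputeHash(typecast(double(A(:)).', 'uint8'))));

h1 = hashDoubles(0.1);
h2 = hashDoubles(0.2);
disp(isequal(h1, h2))   % 0 (false): the two byte patterns differ
```

Because this compares exact byte patterns, two arrays only match if they are bit-for-bit identical: values that differ by floating-point round-off, or 0 versus -0, produce different hashes. The array shape is also not part of the hash in this sketch, so a 2-by-3 and a 3-by-2 array holding the same values in the same column-major order would collide unless size(A) is hashed as well.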
For context, I'm setting up hashing to identify unique entries in a database. It's stored as a MATLAB table in which the elements of certain columns are numerical arrays and the other columns are metadata. The metadata is human-entered, and I'm trying to clean up the database by checking for data entries that may have been entered in duplicate with different metadata.
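To make the use case concrete, here is a minimal sketch of how such a duplicate check might look once each array has a hash. The table T, its variable names Data and Operator, and the sample values are all made up for illustration, and the byte-level hashing helper is an assumption rather than the method from the MathWorks post.

```matlab
% Sketch with made-up data: T, Data, Operator and Hash are hypothetical names.
hasher = System.Security.Cryptography.HashAlgorithm.Create('SHA256');
toHex = @(bytes) reshape(dec2hex(bytes, 2).', 1, []);
hashDoubles = @(A) toHex(uint8(hasher.ComputeHash(typecast(double(A(:)).', 'uint8'))));

% Rows 1 and 3 hold the same numerical array but different metadata.
T = table({[0.1 0.2 0.3]; [1 2 3]; [0.1 0.2 0.3]}, {'alice'; 'bob'; 'carol'}, ...
    'VariableNames', {'Data', 'Operator'});

% One hex string per row, stored as a cell array of character vectors.
T.Hash = cellfun(hashDoubles, T.Data, 'UniformOutput', false);

% Group rows by hash; any group with more than one row is a candidate duplicate.
[~, ~, groupIdx] = unique(T.Hash);
groupSize = accumarray(groupIdx, 1);
isDuplicated = groupSize(groupIdx) > 1;

disp(T(isDuplicated, :))   % shows the rows whose Data arrays share a hash
```

Rows flagged this way are only candidates: it is probably worth confirming each group with isequal on the underlying arrays before deleting anything.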