Source: https://blog.j2i.net/2023/05/30/hashing-string-data-in-javascript-c-and-c/

Hashing String Data in JavaScript, C#, C++, and SQL Server

Posted on May 30, 2023 | January 24, 2023 | Author: j2inet | Categories: Development | Tags: algorithm, Kotlin

I’m working with some data that needs to be hashed in both C# and JavaScript. Converting an algorithm between languages is usually trivial. In JavaScript, however, the regular numeric type is a 64-bit double-precision floating-point number. That sounds large enough, but when it is used as an integer it only provides 53 bits of precision. A 53-bit numeric type on one system and a 64-bit type on another will produce different results, so data hashed by the two functions would be incompatible. To avoid this, I needed a different type. I used BigInt.
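The 53-bit limit is easy to see in a few lines of JavaScript. The sketch below is only an illustration and is not part of the hash. The value 54059 is simply the first multiplier used by the hash further down. The last line also shows that bitwise operators on plain Numbers work on 32-bit integers, which is a second reason a Number-based port would not match.

```javascript
// Number is an IEEE 754 double: integers are exact only up to 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991

// Above 2^53, neighbouring integers collapse onto the same double.
console.log(2 ** 53 + 1 === 2 ** 53); // true

// 54059^4 is odd and larger than 2^53, so no double can hold it exactly;
// BigInt keeps every bit.
const asNumber = 54059 ** 4;
const asBigInt = 54059n ** 4n;
console.log(BigInt(asNumber) === asBigInt); // false

// Bitwise operators on Numbers are narrower still: they convert to 32-bit integers.
console.log((2 ** 32 + 5) ^ 0); // 5
```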

A potential issue with BigInt is that it has arbitrary precision, so its values can grow without limit. That isn’t usually a problem, but the hash function must behave identically in every language to produce identical results. The fix is simple: a bitwise AND after each step discards every bit above the lowest 64. I originally found this hash function on StackOverflow. It might not be the final hash function I use, but for now it works.
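The sketch below runs the post's recurrence twice, once with the 64-bit AND and once without, and reports how many bits each result needs. The helpers `bitLength` and `run` are hypothetical and written only for this illustration. The final check shows that masking after every step keeps the same low 64 bits that a single mask at the end would keep.

```javascript
const A = 54059n;
const B = 76963n;
const FIRSTH = 37n;
const MASK64 = (1n << 64n) - 1n; // same value as 0xFFFFFFFFFFFFFFFFn

// Hypothetical helper: number of bits needed to write a non-negative BigInt.
function bitLength(x) {
  return x === 0n ? 0 : x.toString(2).length;
}

// The post's recurrence, optionally masked to 64 bits after every step.
function run(s, masked) {
  let h = FIRSTH;
  for (let i = 0; i < s.length; ++i) {
    const c = BigInt(s.charCodeAt(i));
    h = (h * A) ^ (c * B);
    if (masked) h &= MASK64;
  }
  return h;
}

const text = "the quick brown fox jumps over the lazy dog";
console.log(bitLength(run(text, false))); // several hundred bits, growing with the input
console.log(bitLength(run(text, true)));  // at most 64

// Masking every step keeps exactly the low 64 bits that masking once at the end would.
console.log((run(text, false) & MASK64) === run(text, true)); // true
```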

A key thing to note in the JavaScript implementation is the n suffix on the numbers. It makes them BigInt literals. This matters because JavaScript throws a TypeError rather than mix BigInt and Number operands in arithmetic. Also note the bitwise AND with 0xFFFFFFFFFFFFFFFFn, which truncates the result so it behaves like a 64-bit unsigned integer.

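// JavaScript: BigInt arithmetic, truncated to 64 bits after every step
function hashString(s) {
  const A = 54059n;      // a prime
  const B = 76963n;      // another prime
  const C = 86969n;      // yet another prime (unused here)
  const FIRSTH = 37n;    // also prime; the starting value
  let h = FIRSTH;
  for (let i = 0; i < s.length; ++i) {
    // charCodeAt returns a UTF-16 code unit; BigInt() converts it so it can mix with h
    const c = BigInt(s.charCodeAt(i));
    h = ((h * A) ^ (c * B)) & 0xFFFFFFFFFFFFFFFFn;
  }
  return h;
}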
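The two BigInt details in this function can be checked directly. The sketch below shows that a Number operand throws, and that the built-in `BigInt.asUintN(64, x)` keeps the same low 64 bits as the AND with 0xFFFFFFFFFFFFFFFFn. The variable names are illustrative only.

```javascript
// Mixing BigInt and Number operands is a TypeError, so every constant needs the n suffix.
let mixingThrows = false;
try {
  const h = 37n;
  const product = h * 54059; // Number operand: throws before assigning
} catch (e) {
  mixingThrows = e instanceof TypeError;
}
console.log(mixingThrows); // true

// BigInt.asUintN(64, x) reduces x modulo 2^64, the same as AND-ing with the 64-bit mask.
const big = 54059n ** 6n; // well beyond 64 bits
console.log(BigInt.asUintN(64, big) === (big & 0xFFFFFFFFFFFFFFFFn)); // true
```

Either form works inside the loop. `BigInt.asUintN` states the intent as "wrap at 64 bits" without spelling out the mask literal.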

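The C++ implementation (used for the Arduino) follows. With a fixed-width 64-bit integer, nothing special needs to be done. On many Arduino boards, including the AVR-based ones, unsigned long is only 32 bits, so this version uses uint64_t to keep the arithmetic at 64 bits.

#include <stdint.h>

#define A 54059   /* a prime */
#define B 76963   /* another prime */
#define C 86969   /* yet another prime (unused here) */
#define FIRSTH 37 /* also prime */

// uint64_t is exactly 64 bits on every board; unsigned long may be only 32
uint64_t hash_str(const String &s) {
  uint64_t h = FIRSTH;
  for (unsigned int i = 0; i < s.length(); ++i) {
    // Cast through uint8_t so characters are treated as unsigned byte values
    uint64_t c = (uint8_t)s[i];
    // Unsigned arithmetic wraps modulo 2^64, so the mask only mirrors the other versions
    h = ((h * A) ^ (c * B)) & 0xFFFFFFFFFFFFFFFFULL;
  }
  return h;
}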
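On a board where unsigned long is 32 bits, a C++ version built on unsigned long effectively computes the hash modulo 2^32. The JavaScript sketch below simulates both widths with `BigInt.asUintN`. `hashBits` is a hypothetical helper and is not part of the original code.

```javascript
// Hypothetical helper: the post's recurrence, reduced modulo 2^bits after each step.
// bits = 64 mimics uint64_t; bits = 32 mimics a 32-bit unsigned long.
function hashBits(s, bits) {
  let h = 37n;
  for (let i = 0; i < s.length; ++i) {
    const c = BigInt(s.charCodeAt(i));
    h = BigInt.asUintN(bits, (h * 54059n) ^ (c * 76963n));
  }
  return h;
}

// After two characters the 64-bit value already needs more than 32 bits,
// so the two widths disagree.
console.log(hashBits("ab", 64) === hashBits("ab", 32)); // false

// The 32-bit result is always just the low 32 bits of the 64-bit one.
const text = "hello, arduino";
console.log(hashBits(text, 32) === BigInt.asUintN(32, hashBits(text, 64))); // true
```

A 32-bit build therefore does not produce random garbage. Its output matches the low half of the 64-bit hash, but it does not match the 64-bit hash itself.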

The differences between the C# and C++ versions of the code are only notational. C#’s ulong is always 64 bits and wraps on overflow in the default unchecked context, so no special tricks are needed.

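// requires: using System.Text;  (for Encoding)
ulong hashString(string s) {
    const ulong A = 54059ul;      // a prime
    const ulong B = 76963ul;      // another prime
    const ulong C = 86969ul;      // yet another prime (unused here)
    const ulong FIRSTH = 37ul;    // also prime; the starting value
    var h = FIRSTH;
    // Encoding.ASCII turns non-ASCII characters into '?', so only ASCII input matches the other versions
    var stringBytes = Encoding.ASCII.GetBytes(s);
    for (var i = 0; i < stringBytes.Length; ++i) {
        var c = stringBytes[i];
        // ulong arithmetic wraps modulo 2^64 in the default unchecked context
        h = ((h * A) ^ (c * B)) & 0xFFFFFFFFFFFFFFFFul;
    }
    return h;
}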
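`Encoding.ASCII.GetBytes` replaces every non-ASCII character with `?`. As a result, different strings can hash identically in C# but differently in JavaScript. The sketch below models that in JavaScript. `asciiBytes` is a hypothetical approximation of the .NET behaviour and is only meant for text in the Basic Multilingual Plane. `hashCodes` and `codeUnits` are also illustrative helpers.

```javascript
// Hypothetical helper mimicking Encoding.ASCII.GetBytes for BMP text:
// every UTF-16 code unit above 127 becomes '?' (63).
function asciiBytes(s) {
  const out = [];
  for (let i = 0; i < s.length; ++i) {
    const u = s.charCodeAt(i);
    out.push(u < 128 ? u : 63);
  }
  return out;
}

// Hypothetical helper: UTF-16 code units, which is what charCodeAt sees.
function codeUnits(s) {
  return Array.from({ length: s.length }, (_, i) => s.charCodeAt(i));
}

// The post's recurrence over an array of integer codes, masked to 64 bits.
function hashCodes(codes) {
  let h = 37n;
  for (const code of codes) {
    h = ((h * 54059n) ^ (BigInt(code) * 76963n)) & 0xFFFFFFFFFFFFFFFFn;
  }
  return h;
}

// "café" and "cafü" differ only in one non-ASCII letter.
const cafe = "caf\u00e9";
const cafu = "caf\u00fc";
console.log(hashCodes(asciiBytes(cafe)) === hashCodes(asciiBytes(cafu))); // true: both become "caf?"
console.log(hashCodes(codeUnits(cafe)) === hashCodes(codeUnits(cafu)));   // false
console.log(hashCodes(asciiBytes("hello")) === hashCodes(codeUnits("hello"))); // true
```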

The Kotlin differences are also notational, but its bitwise operators look quite different from those in C# and C++: Kotlin uses the infix functions xor and and instead of ^ and &.

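fun hashString(s: String): ULong {
    val A: ULong = 54059u       // a prime
    val B: ULong = 76963u       // another prime
    val C: ULong = 86969u       // yet another prime (unused here)
    val FIRSTH: ULong = 37u     // also prime; the starting value
    var h = FIRSTH
    // toByteArray() encodes as UTF-8, so only ASCII input matches the other versions
    val stringBytes = s.toByteArray()
    for (i in stringBytes.indices) {
        // Kotlin's Byte is signed; going through UByte avoids sign extension
        val c = stringBytes[i].toUByte().toULong()
        h = ((h * A) xor (c * B)) and 0xFFFFFFFFFFFFFFFFu
    }
    return h
}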
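Kotlin’s `String.toByteArray()` encodes text as UTF-8, while JavaScript’s `charCodeAt` returns UTF-16 code units. The two feed the same values into the hash only for ASCII text. Kotlin’s `Byte` is also signed, so passing a non-ASCII byte straight to `toULong()` sign-extends it. The JavaScript sketch below shows both effects. `TextEncoder` stands in for Kotlin’s UTF-8 encoding, and the shift trick reproduces a signed byte.

```javascript
// TextEncoder produces UTF-8, like Kotlin's String.toByteArray() default.
const utf8 = Array.from(new TextEncoder().encode("\u00e9")); // "é"
// charCodeAt sees a single UTF-16 code unit instead.
const utf16 = ["\u00e9".charCodeAt(0)];
console.log(utf8.join(","));  // 195,169
console.log(utf16.join(",")); // 233

// As a Kotlin Byte, 0xC3 is -61; Byte.toULong() sign-extends it to 2^64 - 61.
// (0xC3 << 24) >> 24 reinterprets the byte as signed, just like Kotlin does.
const signExtended = BigInt.asUintN(64, BigInt((0xc3 << 24) >> 24));
console.log(String(signExtended)); // 18446744073709551555
```

Because the inputs to the recurrence differ, the hashes will not match. Restricting input to ASCII, or hashing the same encoded bytes in every language, avoids the mismatch.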

After writing this post, I was working in SQL Server. I wanted to store some of this hashed data there, so I decided to try implementing the hash function in T-SQL as well. Everything started out the same, but I ran into a notable problem: declaring the mask 0xFFFFFFFFFFFFFFFF caused arithmetic overflow errors. In the C#, C++, and Kotlin versions the mask isn’t strictly necessary, because unsigned 64-bit arithmetic wraps on its own. (In JavaScript it is what keeps the BigInt at 64 bits.) I placed it there in case I use one of these implementations to hash to a smaller data type.

I was using the BIGINT data type. BIGINT is signed, though, so a positive value has only 63 bits of precision, not 64. SQL Server also raises an error on overflow instead of wrapping, so the mask must keep @H * @A within range as well. A 47-bit mask (0x7FFFFFFFFFFF) does that for A = 54059. Because masking keeps only the low bits, the result equals the 64-bit hash from the other implementations ANDed with the same mask. If you’d like to try it out, the SQL Server implementation follows.

CREATE FUNCTION HashString
(
    @SourceString AS VARCHAR(15)  -- inputs are limited to 15 characters
)
RETURNS BIGINT
AS
BEGIN
    DECLARE @A BIGINT = 54059      -- a prime
    DECLARE @B BIGINT = 76963      -- another prime
    DECLARE @C BIGINT = 86969      -- yet another prime (unused here)
    DECLARE @FIRSTH BIGINT = 37    -- also prime; the starting value
    DECLARE @StrLen BIGINT = LEN(@SourceString)  -- note: LEN ignores trailing spaces
    DECLARE @Index BIGINT = 1
    -- 0x7FFFFFFFFFFF (2^47 - 1): keeps @H * @A below the BIGINT maximum (2^63 - 1),
    -- because SQL Server raises an overflow error instead of wrapping
    DECLARE @MASK BIGINT = 140737488355327
    DECLARE @Letter CHAR(1)
    DECLARE @LetterCode BIGINT
    DECLARE @H BIGINT = @FIRSTH
    WHILE @Index <= @StrLen
    BEGIN
        SET @Letter = SUBSTRING(@SourceString, @Index, 1)
        SET @LetterCode = UNICODE(@Letter)
        SET @H = ((@H * @A) ^ (@LetterCode * @B)) & @MASK
        SET @Index = @Index + 1
    END
    RETURN @H;
END
GO
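SQL Server’s BIGINT is a signed 64-bit type, and the engine raises an arithmetic overflow error instead of wrapping. The mask therefore has to keep @H * @A inside the BIGINT range, not just trim the result. The JavaScript sketch below models this. `sqlStyleHash` and `hash64` are hypothetical helpers, and the thrown `RangeError` stands in for SQL Server’s overflow error.

```javascript
const A = 54059n, B = 76963n, FIRSTH = 37n;
const BIGINT_MAX = (1n << 63n) - 1n; // largest SQL Server BIGINT: 9223372036854775807

// Reference: the post's 64-bit recurrence.
function hash64(s) {
  let h = FIRSTH;
  for (let i = 0; i < s.length; ++i) {
    h = ((h * A) ^ (BigInt(s.charCodeAt(i)) * B)) & 0xFFFFFFFFFFFFFFFFn;
  }
  return h;
}

// Hypothetical model of the T-SQL loop: same steps, but a product above
// BIGINT_MAX throws, the way SQL Server raises an overflow error.
function sqlStyleHash(s, mask) {
  let h = FIRSTH;
  for (let i = 0; i < s.length; ++i) {
    const product = h * A;
    if (product > BIGINT_MAX) throw new RangeError("arithmetic overflow");
    h = (product ^ (BigInt(s.charCodeAt(i)) * B)) & mask;
  }
  return h;
}

const MASK48 = 0xFFFFFFFFFFFFn;
const MASK47 = 0x7FFFFFFFFFFFn;

// Can the largest masked value still be multiplied by A inside a BIGINT?
console.log(MASK48 * A > BIGINT_MAX); // true: a 48-bit mask can overflow
console.log(MASK47 * A > BIGINT_MAX); // false: a 47-bit mask cannot

// With the 47-bit mask, the result is the low 47 bits of the 64-bit hash.
const sample = "HelloWorld12345";
console.log(sqlStyleHash(sample, MASK47) === (hash64(sample) & MASK47)); // true

// A short test vector for comparing ports against each other.
console.log(String(hash64("a"))); // 7302388
```

For the single character 'a', the intermediate value stays far below either mask. `SELECT dbo.HashString('a')` should therefore return 7302388 as well, which makes it a convenient first cross-check.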

