

Anyone else experienced opposition to randomized algorithms?
source link: https://lobste.rs/s/1t4rap/anyone_else_experienced_opposition

Has anyone else experienced this: whenever you mention that code uses a randomized algorithm, at least one person immediately insists it would be better to use a deterministic algorithm? It seems like a large part of the profession assumes that randomization is complex and determinism is simple, but in general my experience is the opposite: randomized algorithms tend to be easier to get right and to keep working right.
I’m thinking of one case where I had randomized property testing and people freaked out (and later asked me how to implement it), and another now, where my boss is saying “why don’t we just transfer all these days in order” and doesn’t seem to grasp the substantial increase in complexity and fragility, or the substantial decrease in throughput, that this represents.
I guess I’m asking for war stories or advice here.
-
No no no, you just gotta phrase it the right way. “For extra robustness, input is perturbed by a stochastic noise function generating Gaussian noise from an input nonce. The function was selected to ensure its statistical soundness to p < 0.01…”
For automated testing, randomness seems ideal to me. Just make sure to save the input seed. If anyone gives you guff, ask them how a hashtable works, and whether the standard hashtable implementation in their language-of-choice protects against HashDoS.
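Something along these lines is usually enough (a rough sketch; the harness, the SEED variable, and the example property are all made up for illustration): generate a fresh seed per run, print it when a case fails, and the test becomes replayable instead of “flaky”.

```python
import os
import random

def run_randomized_test(test_fn, seed=None):
    # Pick a fresh seed unless one was supplied (e.g. via a hypothetical
    # SEED environment variable), and log it so any failure can be replayed.
    if seed is None:
        seed = int(os.environ.get("SEED", random.randrange(2**32)))
    rng = random.Random(seed)
    try:
        test_fn(rng)
    except AssertionError:
        print(f"randomized test failed; re-run with SEED={seed} to reproduce")
        raise

def sorting_is_idempotent(rng):
    data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 100))]
    assert sorted(sorted(data)) == sorted(data)

for _ in range(100):
    run_randomized_test(sorting_is_idempotent)
```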
-
Applications of Randomness in Systems Performance Measurement, Trevor Blackwell’s PhD thesis, makes a strong argument that systems should use randomness to avoid deterministic failures and reduce configuration sensitivity.
Here are some excerpts from the abstract.
This thesis presents and analyzes a simple principle for building systems: that there should be a random component in all arbitrary decisions. If no randomness is used, system performance can vary widely and unpredictably due to small changes in the system workload or configuration. This makes measurements hard to reproduce and less meaningful as predictors of performance that could be expected in similar situations.
For TCP/IP, changes of a few percent in link propagation delays and other parameters caused order of magnitude shifts in bandwidth allocation between competing connections. For memory systems, changes in the essentially arbitrary order in which functions were arranged in memory caused changes in runtime of tens of percent for single benchmarks, and of a few percent when averaged across a suite of benchmarks. In both applications the measured variability is larger than performance increases often reported for new improved designs, suggesting that many published measurements of the benefits of new schemes may be erroneous or at least irreproducible.
To make TCP/IP and memory systems measurable enough to make benchmark results meaningful and convincing, randomness must be added. Methods for adding randomness to conventional program linkers, to linkers which try to optimize memory system performance by avoiding cache conflicts, and to TCP/IP are presented and analyzed. In all of the systems, various amounts of randomness can be added in many different places. We show how to choose reasonable amounts of randomness based on measuring configuration sensitivity, and propose specific recipes for randomizing TCP/IP and memory systems. Substantial reductions in the configuration sensitivity are demonstrated, making measurements much more robust and meaningful. The accuracy of the results increases with the number of runs and thus is limited only by the available computing resources.
When the overall performance of a system is strongly influenced by the worst case behavior, reducing the sensitivity of the system can also make it perform better. Using average waiting time as a metric, TCP/IP performance is shown to improve significantly when randomization is added to the sending host’s congestion window calculations. Although the improvements are less than those achieved by previously proposed schemes using randomized packet discard algorithms inside the network, the proposed modifications can be implemented entirely in the sending host and so can be deployed more easily.
-
This is very good, thank you.
-
-
I’ve definitely faced this in the context of property-based testing. The resistance dissipated when we hit a bug that would clearly have been caught by a natural property-based test, FWIW.
-
Unless you actually want the testing to be slightly different on every run, you can make it randomised and deterministic - use seed=1. That’s a pretty common approach in quite a few places.
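A minimal sketch of that (the property here is just a placeholder): seed the generator once and every run exercises exactly the same cases.

```python
import random

rng = random.Random(1)  # seed=1: the "random" cases are identical on every run
for _ in range(100):
    data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
    assert list(reversed(list(reversed(data)))) == data
```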
-
Yes, when I worked in a university research lab I suggested implementing a relational database DISTINCT operator with a cryptographic hash function, so that only the keys’ hash codes had to be retained rather than the full keys; the hash was demonstrably sufficient to reduce the collision probability to a negligible level. Some world-famous faculty reacted quite negatively and irrationally to the idea; apparently they couldn’t emotionally accept even a negligible probability of failure.
After making the argument as quantitative as possible, and pointing out that cryptographic systems and content-addressable storage (like git) rely on the same assumptions, I finally won them over, but it was a lot harder than I had expected, given such an intelligent and knowledgeable audience.
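For illustration, a minimal sketch of the idea in Python (not the lab’s actual code; repr() stands in for a real key serialization):

```python
import hashlib

def distinct(rows, key=lambda row: row):
    # Stream rows, dropping duplicates, while retaining only key digests.
    # A collision would silently merge two distinct keys, but at SHA-256's
    # collision probability that risk is negligible for any realistic table --
    # the same bet git and other content-addressable stores make.
    seen = set()
    for row in rows:
        digest = hashlib.sha256(repr(key(row)).encode()).digest()
        if digest not in seen:
            seen.add(digest)
            yield row

print(list(distinct([("a", 1), ("b", 2), ("a", 1)])))  # [('a', 1), ('b', 2)]
```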
-
-
I don’t think nondeterminism is a goal; it’s that determinism is not always a goal. For instance, if you need to shed 20% of your traffic once it goes over some threshold, you could deterministically drop every 5th request, keeping the state and coordination necessary to do so, or you could drop each request with a 20% probability. The latter is much simpler. It’s not that nondeterminism is desirable, it’s that determinism adds more complexity and we don’t really need it for our goals here.
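A rough sketch of the random version (every name here is invented for illustration); the entire “shedding state” is one constant:

```python
import random

SHED_PROBABILITY = 0.20  # fraction of traffic to drop while overloaded

def process(request):  # stand-in for the real handler
    return ("200 OK", request)

def shed(request):     # stand-in for whatever "drop" means here (503, close, ...)
    return ("503 Shed", request)

def handle(request, overloaded: bool):
    # No counter, no shared state, no coordination between workers: each
    # request is dropped independently with probability 0.2, which sheds
    # ~20% of traffic in expectation.
    if overloaded and random.random() < SHED_PROBABILITY:
        return shed(request)
    return process(request)
```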
Or take a recommender that scores everything you’re likely to buy. There’s probably something the algorithm thinks you’re very likely to buy but that you never will, so it gets stacked at the top of the list, and every time you see recommendations it’s the same top item. (If we remove things after you’ve bought them, this outcome is virtually guaranteed.) We could recompute the scores often, track what you’ve already seen and how often we re-show the same product, and enforce diversity by occasionally promoting items lower in the list. That would work. Or we could multiply each score by a random float, so higher-scoring items still tend to rank higher but the list looks fresh every time you look, and we get the diversity that way. If you need determinism, this quick hack isn’t available to you; if determinism isn’t something you actually need, it is.
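That quick hack is only a few lines (a sketch; the item names and function name are made up):

```python
import random

def jittered_ranking(scored_items, rng=random):
    # scored_items: iterable of (item, score) pairs from the model.
    # Multiplying each score by a uniform random float before sorting keeps
    # high-scoring items near the top on average, but reshuffles the exact
    # order on every call, so no single item stays pinned to slot 1.
    return sorted(scored_items, key=lambda p: p[1] * rng.random(), reverse=True)

print(jittered_ranking([("blender", 0.92), ("kettle", 0.40), ("toaster", 0.35)]))
```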
-
Property testing will generate random cases. If you run a test 100 times with random data, you might generate data which breaks a test; that’s good!
-
-
I should start a list of fallacies about complexity theory, because this is one I’ve encountered before. The complexity class BPP contains the problems that can be solved efficiently with randomness; whether they can all be solved efficiently without it (i.e., whether BPP = P) is still open. And if an approach that derandomizes the entirety of BPP down to P exists, it most likely works by constructing specific pseudorandom generators that efficient algorithms cannot predict.