

Random notes, digital art, and pairwise comparisons is polynomial
source link: https://andrewpwheeler.com/2022/12/09/random-notes-digital-art-and-pairwise-comparisons-is-polynomial/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Random notes, digital art, and pairwise comparisons is polynomial
So not too much in the hopper for the blog at the moment. Have just a bunch of half-baked ideas (random python tips, maybe some crime analysis using osmnx, scraping javascript apps using selenium, normal nerd data science stuff).
Still continuing my blog series on the American Society of Evidence Based Policing, and will have a new post out next week on officer use of force.
If you have any suggestions for topics always feel free to ask me anything!
Working on some random digital art (somewhat focused on maps but not entirely). For other random suggestions I like OptArt and Rick Wicklin’s posts.

Dall-E is impressive, and since it has an explicit goal of creating artwork I think it is a neat idea. Chat bots I have nothing good to say. Computer scientists working on them seem to be under the impression that if you build a large/good enough language model out pops general intelligence. Wee bit skeptical of that.
At work a co-worker was working on timing applications for a particular graph-database/edge-detection project. Initial timings on fake data were not looking so good. Here we have number of nodes and timings for the application:
Nodes Minutes
1000 0.16
10000 0.25
100000 1.5
1000000 51
Offhand people often speak about exponential functions (or growth), but here what I expect is we are really looking at is pairwise comparisons (not totally familiar with the tech the other data scientist is using, so I am guessing the algorithmic complexity). So this likely scales something like (where n
is the number of nodes in the graph):
Time = Fixed + C1*(n) + C2*(n choose 2) + e
Fixed is just a small constant, C1 is setting up the initial node database, and C2 is the edge detection which I am guessing uses pairwise comparisons, (n choose 2)
. We can rewrite this to show that the binomial coefficient is really polynomial time (not exponential) in terms of just the number of nodes.
C2*[n choose 2] = C2*[{n*(n-1)}/2]
C2*[ (n^2 - n)/2 ]
C2/2*[n^2 - n]
C2/2*n^2 - C2/2*n
And so we can rewrite our original equation in terms of simply n
:
Time = Fixed + (C1 - C2/2)*N + C2/2*N^2
Doing some simple R code, we can estimate our equation:
n <- 10^(3:6)
m <- c(0.16,0.25,1.5,51)
poly_mod <- lm(m ~ n + I(n^2))
Since this fits 3 parameters with only 4 observations, the fit is (not surprisingly) quite good. Which to be clear does not mean much, if really cared would do much more sampling (or read the docs more closely about the underlying tech involved):
> pred <- predict(poly_mod)
> cbind(n,m,pred)
n m pred
1 1e+03 0.16 0.1608911
2 1e+04 0.25 0.2490109
3 1e+05 1.50 1.5000989
4 1e+06 51.00 50.9999991
And if you do instead poly_2 <- lm(m ~ n + choose(n,2))
you get a change in scale of the coefficients, but the same predictions.
We really need this to scale in our application at work to maybe over 100 million records, so what would we predict in terms of minutes based on these initial timings?
> nd = data.frame(n=10^(7:8))
> predict(poly_mod,nd)/60 # convert to hours
1 2
70.74835 6934.56850
So doing 10 million records will take a few days, and doing 100 million will be close to 300 days.
With only 4 observations not much to chew over (really it is too few to say it should be a different model). I am wondering though how to best handle errors for these types of extrapolations. Errors are probably not homoskedastic for such timing models (error will be larger for larger number of nodes). Maybe better to use quantile regression (and model the median?). I am not sure (and that advice I think will also apply to modeling exponential growth as well).
Recommend
-
39
Or maybe I should have called this blog post, “I’m seeing an excessive number of DRS initiated vMotions on my newly upgraded 6.5 environment”. Recently I was part of a few conversations about the nature of DRS load balanci...
-
11
pairwise and startWith The pairwise operator is a great tool if you also need the previous value for every element. It uses a buffer with a size of 2, making it memory efficient as we...
-
11
John Fremlin's blog: WPA supplicant and TKIP pairwise cipherWaiting for updates: connectedPosted 2011-10-03 08:49:00 GMTOn some wifi routers, the CCMP WPA pairwise c...
-
8
Use a bar chart to visualize pairwise correlations 2 Visualizing th...
-
10
Like Article Convert given Array to 0 by reducing elements pairwise with any positive valueLast Updated : 12 Jan, 2022Given an
-
9
Maximum number of pairwise non-acute vectors in R^n May 21, 2017...
-
6
Computer Science > Computational Complexity [Submitted on 6 Sep 2022 (
-
15
论文标题:Pairwise Adversarial Training for Unsupervised Class-imbalanced Domain Adaptation论文作者:Weili Shi, Ronghang Zhu, Sheng Li论文来源:KDD 2022论文地址:
-
4
Speeding up pairwise comparisons to 2.8 million/sec in Java on a regular laptop Notes on coding strategies to speed up pairwise comparisons for 200,000 journals Pairwise comparisons go something like:
-
8
Pairwise Swap Elements of a Linked ListOctober 08, 2023 |370 ViewsGFG POTD: 07/10/2023 | Pairwise Swap Elements of a Linked ListProblem of the Day, linked-list, Data Structure and Algorithm
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK