4

Explanation needed for Boyer-Moore Majority Element, and one of its extensions.

 1 year ago
source link: https://codeforces.com/blog/entry/58594?f0a28=1
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
By prac64, history, 5 years ago, In English

Hey CodeForces, recently I came accross a problem, finding an element in an array which occurs more than n/3 times, while using constant space and linear time. Now I admit, this is a well known interview problem and plenty of articles exist which cover this, however I have struggled to find one that offers proof of correctness or any explanation.

Please help me understand the correctness.

Article: https://www.geeksforgeeks.org/given-an-array-of-of-size-n-finds-all-the-elements-that-appear-more-than-nk-times/

Basic Idea: Like we do in Boyer-Moore algorithm, however instead of keeping one element and counter, keep 2 , then update them similar to the original algorithm.

Thank you CodeForces!

5 years ago, # |

Rev. 3  

+24

You can have a randomized algorithm. That is -
Take a random number from the array, and check if it appears more than e8f8616e518b67af73332593c4c73ae9f207dc04.png times. If you repeat this x times, then you have 7654f18ceb3fcaa80fe9fe0a50f584f228cfa4f7.png probability of hitting a number with frequency more than e8f8616e518b67af73332593c4c73ae9f207dc04.png.
If n = 105 and k = 3, then if you check only 40 random numbers, you have 99% probability of hitting such a number.
It may have complexity O(nx) or O(n + x) depending on the bounds of the numbers of array, and linear space. However, this may not be fesible always if x needs to be high.

This idea can be used to solve this problem — 840D - Destiny

  • 5 years ago, # ^ |

    Do you have the prove of the probability?

    • 5 years ago, # ^ |

      Hint: Try finding the probability that you don't hit such number in all (x - 1) guesses.

  • 5 years ago, # ^ |

    Is it not important that you hit 40 distinct random numbers, in which case you will have to use auxillary memory to keep track of what you've already checked. Also can you prove how many calls to the rand() function you require to get 40 distinct values ?

    Anyway, thank you for giving me a new problem solving method ! :)

    • 5 years ago, # ^ |

      You don't need to keep track of which numbers you have visited. The point is, if you keep taking a random number and check, you'll find one in around 40 checks. So, just keep trying untill you find one. If you are going to use C/C++, the rand() % n shuold be enough. You may safely assume that it doesn't generate same number over and over again in close calls.
      But in case, if the data is generated based on the numbers generated by rand(), then you can just reseed it by some random prime. Or use rand() * 1ll * rand() % n or something like that.

      • 5 years ago, # ^ |

        Makes much more sense now, thank you !

      • 5 years ago, # ^ |

        You can also have an array A, initialized to contain [1....N] and random_shuffle() it and then choose the first however many elements you need as indices.

        • 5 years ago, # ^ |

          Forgive me if I am wrong , but is it not more complex, I mean that will definitely take O(n) operations, but Rezwan's method is expected to take much fewer.

          Anyway I was primarily looking for proof of correctness of the said algorithm.Since I primarily like understanding why algorithms are correct, even though I am not very good at it.

          Solving problems is a different game, for ex, in this problem one could probably use quickselect on multiples of k and still get correct answer in expected O(kn)time, but that is not the point.

          • 5 years ago, # ^ |

            It's the same complexity

            • 5 years ago, # ^ |

              oh yeah.. my bad !

  • 64 minutes ago, # ^ |

    just tried to implement it but not getting where i am wrong? anyone help it !! My Code:: if i put if(num>=ans) continue in my code; then it accepts solution why?? if i remove equal to sign then also it should work . which case i am missing?

5 years ago, # |

Rev. 2  

0

Problem: Find all elements of A, |A| = n, that appear more than e8f8616e518b67af73332593c4c73ae9f207dc04.png times.

Let's split A in two smaller subproblems A1, |A1| = n1, and A2, |A2| = n2, such that n = n1 + n2. We can define Ki = (a1, a2, ..., ak) as the k greatest elements by frequency in subproblem Ai and F(x, Ai) as the frequency of element x in Ai.

Then we argue that the elements of K1 and K2 are the only candidates to appear more than e8f8616e518b67af73332593c4c73ae9f207dc04.png times in the original problem (K).

Let's do a proof by contradiction and assume that there is an element x such that d646ea44eaa03065c216b63e724694b9b44959ad.png, 9406e3be205df45006b76cd0d4b33c9bc7abc390.png and 19debdbefd20b3ccb8310dd638de76e6757bcd59.png. We know that 4b1840f60b1b2468295bd1a15bcefef123c0ed27.png. Using the converse we know that d9d9b9d76ac477cc060952eebd23c55af3f7fb31.png and d5469fac3ece36cb839299dc665c6f4469acd2c2.png then:

F(x, A) = F(x, A1) + F(x, A2)

207427e698717aec5579e80adf7a5681799ce1e5.png

372cc1b131188cc23d2dc85f6dd44ccd354174eb.png Contradiction!!

So, the elements of K1 and K2 are the only candidates to appear more than e8f8616e518b67af73332593c4c73ae9f207dc04.png times in the original problem (K).

For a more formal proof, use strong induction on |A1| + |A2| within the induction on |A|. Regards.

5 years ago, # |

Nevermind thought out a rough proof:

Consider k-1 stacks(for now, we will reduce to counters later) and the remaining array that is to be seen, now instead of incrementing the counter, imagine you push the same value of the element on the respective stack. similarly for decrementing, imagine you remove one element from every stack.

Now define solution set as elements in stack+ remaining array. we can remove elements from solution set only when it contains k distinct elements.. that is k-1 distinct elements in the stacks and a new element from the remaining array. so now we discard k distinct elements from the solution set.

After we are finished with the entire array, we would have discarded only tuples of k distinct elements. It is now very clear that elements with frequency greater(strictly) than n/k remain in the stack. If the array did not contain any greater-than-n/k-elements, the stack will have garbage values. Hence we need to check once more in (k-1)n time.

Instead of having stacks with same elements he compressed it into value+num_instances pairs.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK