3

Variation of a standard heap problem

 1 year ago
source link: http://codeforces.com/blog/entry/103711
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

We are given a stream of integers. We have to find median of the last K elements of the stream.

In other words

Constraints:
nums , nums is the integer given in the stream.
K , atleast K elements have been given in the stream.

29 hours ago, # |

I will describe a solution that works in for add query and for median query. I believe that this solution can be optimized to for both queries, but I doubt that this problem is solvable in significantly faster (like, in pure logarithmic time), even though the naive bound of is the best lower bound I provide.

So, the solution. Build a segment tree over your array with BST's storing all elements in a segment in each vertex. The addition operation simply adds a number into BST's, thus, it works in time. The median query is now essentially a problem of finding a median in a union of BST's. The simple way to do it is to perform a binary search over the answer. Binary search gives one logarithm, lower_bound on BST's gives two more. In total, this operation works in time.

I think that the binary search part can be done in time, however, I am not sure how.

  • 7 hours ago, # ^ |

    UPD. The classical solution of the problem of finding nth_element on a segment works there as well. And if you use the solution with persistent segment trees, it would work in time on each operation.

    • 7 hours ago, # ^ |

      Rev. 2  

      0

      Does this work online?

      EDIT: It seems I misread your solution initially, please disregard my message.

6 hours ago, # |

Rev. 2  

0

You can solve this problem in O(log(n)) time per operation without having to implement your own data structures. Use an order statistic tree T of pairs of integers and a queue Q of integers. If LeetCode doesn't use libstdc++, you will need to implement a treap to replace the order statistic tree.

Maintain a counter i which keeps track of the index of the current number we are adding. To add the number x, add the pair (x, i) to T and add x to Q. If i >= k, we must remove y = Q.front() from both Q and T, since we only care about the most recent k numbers. To do this, get an iterator to y in T using T.lower_bound(make_pair(y, 0)), erase it from T, and pop y from Q.

To find the median, just use the given formula directly. In particular, if k is odd, return T.find_by_order(k / 2), and if k is even, return 0.5 * (T.find_by_order(k / 2 - 1) + T.find_by_order(k / 2)).

Order statistic trees are discussed at https://www.geeksforgeeks.org/ordered-set-gnu-c-pbds/. If order statistic trees aren't supported by the standard library implementation used by LeetCode and you need to implement a treap instead, see https://www.geeksforgeeks.org/treap-a-randomized-binary-search-tree/.

51 minute(s) ago, # |

What's wrong with this approach — Keep two sets (smaller half and larger half), so that the median can be computed in time. Also keep a map to indicate which set each number lies in. After every input from stream, use the map to remove the last number, and then insert the new number and move 0 or 1 elements around (also reflecting changes in the map) so that both sets are either equal in size or off by 1.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK