Saturday, 15 June 2013

In this article , we will discuss about Binary Indexed Trees and solutions to some problems of spoj to put lights on the powerfull data structure.

Let's begin with an example: HAYBALE


Problem Task: You are initially given a zero initialized array of size N. You are then asked to sequence of k instructions ,each of the form "A B" which means that you have to increment the array from A to B by 1.And at last you have to report the median of the array.

Note: I encourage you to think atleast one solution to the problem before procedding to the solution.

Solutions:

1.Brute Force: For each query update the array from A to B by 1.Time Complexity is the difference between the sum of all B's with the sum of all A's plus the time needed to find the median . Worst Case Time complexity: O(N*k+K). As median of the array can be calculated in O(N+K) time(Reason: Maximum value of an element is k).This Solution is for sure going to get Time Out.

2. Start and end Index:  First of all , i would like to give credits to to my friend Satvik Gupta who came up with this solution. Store all the start Index in an array(i.e say start[] ) and store the end Index in another array (i.e. say end[]) . Sort Both arrays in increasing order. Then Find each element of the array one by one. array[i]=Binary_Search(start,i)-Binary_Search(end,i-1),where Binary_Search(array,x) searches for the index of the array which is just smaller than or equal to x. Now find the median of the array.Logic: Finding out number of instructions which crossed the ith element. Time Complexity: (K log K + N log K + N +K) which is an acceptable time complexity.

3. Using Segment Tree:  Update each nodes for every instructions. After executing all instructions , find the value of each element of the array. Then find the median of it. Time complexity : O(K Log N + N log N + N+ K). This is also an acceptable time complexity.More information on segment tree can be found here: SegTrees

4. Update Method:  For each Query, add 1 to array at position A and add -1 to array at position at B+1. After executing all the queries ,update the array by , array[i]=array[i]+array[i-1]. Now find the median of the array. Time Complexity: O(N+K). Fastest Solution to the problem is by this method.

Note that there is no method called "Update Method" in our dictionary of known algorithms. It's idea is used in Binary Indexed trees.

Now consider another problem as follows:
Problem Task: You are initially given a 0 initialized array of size N. Now you are given a mixed variety of two types of queries:
1. I A B   -> Increment the array indexed from A to B by 1.
2. R A     -> Report the value of array[A]
For Ex:
Input:
I 1 2
I 3 4
R 2
I 1 4
I 4 7
I 3 8
R 8
I 1 2
R 2
R 4
Output:
1
1
3
4

Methods  that we can still use it to solve this problem is only 3 as all other method needs a heavy pre-computation of order N or Order K log K as soon as a report query is encountered.However,we can still use Method 2 by mantaining sorted dynamic arrays(You may use set in C++ to make your life comfortable). However the method which looked so much powerfull seems to have been underestimated in our discussion. If we think for the reasons about rejecting the 4th method, we will say that it requires a pre-computaion of order N as soon as report query is insitigated. Can we do update and report opertation using Method 4 in just Log N time. Yes,we can and here comes the concept of Binary Indexed Tree. 

 Pre-Requisites needed to Understand BIT

1.  Consider any number x represented in Binary Format(Base-2) as  a1b where b is all zero and our target is to get a0b . Think about how can we get a0b from a1b. For example Consider 6: 110 ,our aim is to get 100(a=1,b=0).
Solution: x=x -  (x&(-x))
Let's analyze x&(-x):
x is a1b
-x would be (a1b)¯ + 1 = a¯0b¯ + 1 = a¯1b (Reason : b consists of all zeros ,so b complement consists of all 1,thus adding 1 to b complement will make b and pass a carry ). 
x&-x would be 1b.
x - x&-x = a1b-1b = a0b


2.Structure of the Tree: It is basically an array which stores sum of some particular indices of the array(will be discussed,skip ahead if you don't get this).

3.BIT computes the sum of all the elements upto a  given index.

Idea Behind the BIT

 In our whole Discussion , N will always be denoted by the size of the array.
 All numbers can be represented in powers of 2 is the main idea of the structure. Now I will provide you the method to update the tree. 

What actually are we doing by updating index i by x, we will actually mean the Increment Query from index i to the end of the array by x. So, we now aim to keep values in the array in such a way that the query can be solved in order log N. So here goes the method:

Recursion(i<Length of the Array):
             Array[i]+=x
             i= i + (i&-i)
             Recursion(i)

Even If you did not understand how it works and how can it help us in finding the sum of all the elements upto a index i,you will get the reasons as you procede further in this blog.

Before we travel a little more deeper in this, I want you to solve a simple exercise by hand. Let N=15, and i=5 , which values of the array are disturbed using the loop above(Do it in Binary Form Only).
Answer: 101,110,1000.

Now, Consider the number 8(Powers of 2),which update values affect this value.
Try doing it by hand for all N from 1 to 8 as if N>8 then it cannot effect the value of array[8] by the Recurion we saw. After you have done some litlle bit of effort you will notice that every element from 1 to 8 is added up in 8. Try doing for any different N and for any Power of 2(say y) ,you will notice that all the numbers upto y effects y in the update process. Indices which are powers of 2 contain complete sum of all the numbers upto that index.Others would contain partial sum.


How to Read From such a Structure

Read(i)
      sum<-0
      while(i>0):
          sum=sum+Array[i];
          i=i - (i&-i)

Try finding out the steps from where the algorithm goes for 100100100 for N quite large than the given value.
Answer(All Numbers in Binary Form):
100100100
100100000
100000000

Now Try Finding Out the values which 100000000 contains:Answer is All the numbers less than or equal to 100000000.(By the Conclusion we arrived just above)
Now Try Finding Out for value 100100000. This would contain sum from(see carefully) :
100000001 to 100100000

Now Try finding out the value
100100100. This would contain sum from(see carefully):
100100001 to 100100100

Now when we sum all three ,we will notice that we had actually found the sum to all elements upto 100100100.

How to report the index with a given sum:
 
First of all note that the method which i am going to explain is only for those in which sum for greater index is more than the sum to smaller index.

We go through all bits (starting with the highest one), make the index, compare the sum of the current index and given value and, according to the outcome, take the lower or higher half of the interval (just like in binary search).

bitMask is the highest bit of the number N and MaxVal=(bitMask<<1) -1. For ex: bitMask=1000 for N=1010.

int find(int sum){
    int idx = 0;
    while ((bitMask != 0) && (idx < MaxVal)){
        int tIdx = idx + bitMask;     // we make midpoint of interval
        if (sum == tree[tIdx])         // if it is equal, we just return idx
            return tIdx;
        else if (sum > tree[tIdx]){
            idx = tIdx;             // update index ,sum is more so go to higher interval.
            sum -= tree[tIdx];         // set frequency for next loop
        }
        bitMask >>= 1;                 // half current interval
    }
    if (sum != 0) // maybe given cumulative frequency doesn't exist
        return -1;
    else
        return idx;


2D BIT  works in a similar way as 1D does.

It's Update Function is as Follows:

Update(int x , int y , int val):
    while (x <= max_x):
        updatey(x , y , val);
        x += (x & -x);

Updatey(int x , int y , int val):

    while (y <= max_y):
        Array[x][y] += val;
        y += (y & -y);


  Some Illustration of Using this Data Structure

 Problem 1: spoj.com/problems/MSE06H

 Solution:

Store the given pair of highways and sort them according to the city on the                  East Coast. Apply Binary Indexed tree on West Coast City as follows:
for(int i=k;i>=1;i--):
for(j=i;j>=1 && a[j].x==a[i].x;j--)  ans+=read(a[j].y-1);
for(j=i;j>=1 && a[j].x==a[i].x;j--)         update(a[j].y,1);
i=j+1;
a[i].x stores the city of East Coast connected to a[i].y ,the city on the West Coast.read and update functions are the same as described above.

Problem 2: spoj.com/problems/ORDERSET/

Solution:

Imp Idea is to store all queries to get different x's used in the queries. Now,Sort all x's and apply BIT on it's index.
For Ex:
Input:
8
I 10
I 20
I -20
I -40
D 30
D 20
K 2
C 3

Processing of x: 10 20 -20 -40 30 3 are diff x's used in the input.
Sort x:              -20 -40 3 10 20 30

BIT array initially: 0 0 0 0 0 0
BIT array after I 10:
0 0 0 1 0 0
Beware of some of the conditions mentioned in the problem,such as delete only if it is present and insert only if it is not present.This can easily be done by mantaining an array which keeps record of it.

Problem 3: spoj.com/problems/CTRICK/

Solution:

Apply BIT on an array of size n. Update all elements of the array in the begining by 1. Now, Start rotating in the array, say for rotation  value rot and the number of remaining cards i, process it like this:
rot=rot%i + 1
seacrh for rot from the last visited position and update it by -1.

Problem 4: spoj.com/problems/CCOST/

Solution:

The First Intution that comes to mind after reading the problem is 2D BIT but if you the memory requirement , you will be able to figure it out that the problem is not for 2D BIT. The problem can be solved by 1D BIT only.
Store all the Tree Updates and all the queries and sort both of them w.r.t to y cordinate. Mantain an BIT for the x-cordinate ( compress the Array to it's Rank to maximum size of 10^5 instead of 10^7). Now Update the BIT and process the query simultaneously like this:
For a particular y cordinate(say y1) Update the BIT First for all points whose y cordinate is  less than or equal to y1 and then process all the queries in which  the y cordinate is less than or equal to the y1.


Homework Problems:
1.spoj.com/problems/ASSIST/
2.spoj.com/problems/MCHAOS/
3.spoj.com/problems/CPAIR/
4.spoj.com/problems/KQUERY/
5.spoj.com/problems/KPMATRIX/
6.spoj.com/problems/WINDVANE/
7.spoj.com/problems/RRSCHED/
8.spoj.com/problems/CVJETICI/
9.spoj.com/problems/INCSEQ/

More Places to Study BIT:
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees


Acknowledgement:

1. Utkarsh Lath.
2. Praveen Dhinwa.
3.Topcoder Tutorials.

Contributors:

1. Devendra Agarwal
2. Praveen Dhinwa.

Hello World