Saturday, June 30, 2012

3SUM algorithm

Problem is:
Given an array of integers, find subset of 3 elements of the array such that sum of those three elements is zero.
Brute force:
1. Find all 3-element subsets in the set S.
2. For each subset, find some of each subset
3. Print the one whose sum is zero
Alternative and improved:
1. Sort the list.
2. Following is the algorithm

-25 -10 -7 -3 2 4 8 10  (a+b+c==-25)
 -25 -10 -7 -3 2 4 8 10  (a+b+c==-22)
 . . .
 -25 -10 -7 -3 2 4 8 10  (a+b+c==-7)
 -25 -10 -7 -3 2 4 8 10  (a+b+c==-7)
 -25 -10 -7 -3 2 4 8 10  (a+b+c==-3)
 -25 -10 -7 -3 2 4 8 10  (a+b+c==2)
 -25 -10 -7 -3 2 4 8 10  (a+b+c==0)


where bold one is 'a' and red ones are 'b' and 'c' - shown in the parenthesis on the right.

Following is the working code:

class tuple
{
private:
int i;
int j;
int k;
public:
tuple() { i = 0; j = 0; k = 0; }
tuple(int a, int b, int c) { i = a; j = b; k = c; }
int GetI() { return i; }
int GetJ() { return j; }
int GetK() { return k; }
void SetI(int a) { i = a; }
void SetJ(int b) { j = b; }
void SetK(int c) { k = c; }
};

int compare (const void* a, const void *b)
{
return (*(int*)a - *(int*)b);
}

tuple* SUM (int *a, int n)
{
if (a == NULL) return NULL;
if (n <= 2) return NULL;

qsort (a, n, sizeof(int), compare);
cout << "Sorting complete \n";
cout << "Sorted array is:\n";
for (int i = 0; i < n; i++)
cout << a[i] << '\t';

cout << endl;

for (int i = 0; i < n - 3; i++)
{
int j = i + 1;
int k = n - 1;
while (j < k)
{
if (a[i] + a[j] + a[k] == 0)
return new tuple(i, j, k);
if (a[i] + a[j] + a[k] < 0)
j++;

if (a[i] + a[j] + a[k] > 0)
k--;
}
}

return NULL;
}


Tuesday, June 26, 2012

Find number of inversions in a given list of integers

Inversion: pairs (i,j) of array indices with i<j and A[i] > A[j] 
The task is to find number of such pairs.

A) Brute force way is to start from the left and find out for each element, the number of pairs. At the end, we will have the total count.

B) Another way is to use device-and-conquer paradigm. Divide the list into two. Keep dividing till we reach single element. Then the merge comes for sorting the array. While merging, if the left element is more than the element from the right then we know that the all the elements in the right are smaller than the bigger element, hence we have that count of inversions.
I have outlined the procedure on paper as below:
The code for the same is below:
int MergeAndCount(int *arr, int l, int m, int r);
int MergeSort(int *arr, int l, int r)
{
if (arr == NULL) return 0;
if (l >= r) return 0;

int mid = (l+r)/2;

int left = MergeSort(arr, l, mid);
int right = MergeSort(arr, mid + 1, r);
int mc = MergeAndCount(arr, l, mid, r);
return (left + right + mc);
}

int MergeAndCount(int *arr, int l, int m, int r)
{
if (arr == NULL) return 0;
if (l >= r) return 0;

int icounter = 0;
int tl = l;
int tm = m + 1;
int tr = r;
int temp = 0;
int *tarr = new int [r - l + 1];
while (tl <=m && tm <= r)
{
if (arr[tl] <= arr[tm])
{
tarr[temp++] = arr[tl++];
}
else
{
icounter += m - tl + 1;
tarr [temp++] = arr[tm++];
}
}

while (tl <= m)
tarr[temp++] = arr[tl++];
while (tr <= r)
tarr[temp++] = arr[tr++];

// Now copy back the array
temp = 0;
while (l <= r)
{
arr[l++] = tarr[temp++];
}

delete tarr;
return icounter;
}

I hope it helps.

Monday, June 25, 2012

Longest Palindrome Substring size

In this post, I am going to concentrate on longest palindrome substring's size.
Here is the example:
Let the string be "GEEGGEEROREEGOF"
There are two palindromes in this string:
1. GEEG
2. GEEROREEG

A) One approach is to find all substrings --> Filter them for the one that are palindromes --> Find which one of them is longest and we have the answer.
B) The previous one was brute force. Then I come across this link: http://www.geeksforgeeks.org/archives/19155. The guy has explained beautifully. And it worked.
The idea is that start from the two ends and find out if the characters match. Once we have a match, we start finding recursively if that's a palindrome and if we have the longest one.
Here is the code in C++:

#include<iostream>
#include<string>
using namespace std;

int max (int x, int y) { return (x > y)? x : y; }

// Returns the length of the longest palindromic subsequence in seq
int lps(string seq, int i, int j)
{
   // Base Case 1: If there is only 1 character
   if (i == j)
     return 1;

   // Base Case 2: If there are only 2 characters and both are same
   if (seq[i] == seq[j] && i + 1 == j)
     return 2;

   // If the first and last characters match
   if (seq[i] == seq[j])
      return lps (seq, i+1, j-1) + 2;

   // If the first and last characters do not match
   return max( lps(seq, i, j-1), lps(seq, i+1, j) );
}

Friday, June 15, 2012

Discuss and implement phone book

In this post, I will discuss how to design data structure for a phone book and then implement the same.
A phone book, or some times called the contacts book is a collection of people information on the phone or any other device. So, for the purpose of simplicity, I will use the phone book as a collection of a set {Name, phone number, email} where any of them can be empty. The fastest implementation I found is using TRIE. So, I decided to implement the same.  Following is the implementation, which is pretty straight forward and self explanatory:


// This shows TRIE data structure implementation in C++
// The TRIE is implemented using two different classes.
// 1. Class for each node - this contains a vector of pointers to the same class and a character value
// 2. The container class that holds the root of the TRIE.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

class Node
{
public:
Node() { c = ' ';}
~Node();
inline vector <Node *> GetChildren() { return child; }
Node *FindChild(char);
void AddChild(char);

private:
vector <Node *> child;
char c;
};

class TrieRoot
{
public:
void AddWord(string s);
bool IsWordPresent(string s);
TrieRoot() { node = new Node(); }
~TrieRoot() { delete node; }

private:
Node *node;
};

Node * Node::FindChild(char c)
{
if (child.empty()) return NULL;
for (unsigned int i = 0; i < child.size(); i++)
{
if ( child[i]->c == c ) 
return child[i];
}
return NULL;
}

void Node::AddChild(char c)
{
Node *temp = new Node();
temp->c = c;
child.push_back(temp);
}

void TrieRoot::AddWord (string s)
{
if (s.empty()) return;
Node *temp = node;
for(unsigned int i = 0; i < s.size(); i++)
{
Node *temp1 = temp->FindChild(s[i]);
if (temp1 == NULL)
{
temp->AddChild(s[i]);
temp = temp->FindChild(s[i]);
}
else
{
temp = temp1;
}
}

return;
}

// Breadth first search
bool TrieRoot::IsWordPresent(string s)
{
if (s.empty()) return false;
Node *temp = node;
for (unsigned int i = 0; i < s.size(); i++)
{
Node *temp1 = temp->FindChild(s[i]);
cout << "Charater to search for: " << s[i] << endl;
if (temp1 == NULL
return false;
}
temp = temp1;
}

return true;
}

Node::~Node()
{
if (!child.empty())
for (unsigned int i = 0; i < child.size(); i++)
delete child[i];
}

Tuesday, June 12, 2012

Boyer-Moore algorithm

This is an efficient algorithm for string search. On wikipedia, I found it difficult to understand. But then, I finally learnt it. What I found is that a good way of learning is to see how does different cases work - learn by example. So, I decided to have a blog where I present examples followed by algo/code. Note that I am assuming here that the string from which strings search is performed to be made has ASCII characters only. This is just to make the algorithm simpler and easier to understand.
Let me first give an insight in layman terms, on how this algorithm works:
The string is searched starting from the end towards the beginning.
Before start searching there is a preprocessing done on the string to be searched (S). There is a small table created (I call it jump table). As the name suggests, the table is used for jumping to a new location in the Original string (T) in case the string is not matched at the current location.
Let's assume S: LAP
And T: THIS IS A LAPTOP
To create the jump table, we will give each character in S as the position value:
LAP
210


i.e., L is given 2, A as 1 and P as 0.
The algorithm starts comparing from ^ and moves towards left as shown below:
T: THIS IS A LAPTOP
S: LAP
 <-- ^


In this case, the match is found at I in string T. 
Now since I is not in the jump table, string S starting location becomes 'S', i.e., at the next location to last mismatched location.

T: THIS IS A LAPTOP
S:    LAP
   <--  ^
In this iteration also, the same follows, so the next iteration point is:
T: THIS IS A LAPTOP
S:       LAP
      <--  ^
And the next iteration follows the same:
T: THIS IS A LAPTOP
S:          LAP
         <--  ^
Now, in this iteration, the mismatch character in T is 'A', which is in the jump table - has a value 1. This means that the jump for the next iteration is 1 character. Hence the next iteration looks like.

T: THIS IS A LAPTOP
S:           LAP
          <--  ^

For this iteration, the match is found, hence the algorithm terminates.
If there is no mismatch, the algorithm terminates when the string T ends.
I hope, I am able to clearly state the algorithm using the above example.
I have tried to capture the same in the C/C++ code as follows. Note that for the jump table I have used an array of 256 entries, one for each ASCII character. This can be replaced with different implementations, but the idea remains the same.
Now the code:

// The algorithm here is boyer-moore's string matching algorithm
#include <iostream>
#include <cstring>
#define MAX 256
using namespace std;


// Assuming that the array has been initialized to appropriated values
void PreprocessString(char *str, int size, int *arr, int sarr)
{
if (str == NULL)
{
cout << "String passed is null" << endl;
return;
}

if (size <= 0)
{
cout << "String can't be processed as its size is zero or negative" << endl;
return;
}

if (arr == NULL)
{
cout << "Int array is NULL" << endl;
return;
}

if (sarr <= 0)
{
cout << "Array size is zero or negative" << endl;
return;
}

for (int i = 0; i < size; i++)
if (arr[str[i]] == -1)
arr[str[i]] = size - i - 1;

return;
}

int BoyerMoore(char *T, char *S, int lt, int ls)
{
if (T == NULL)
{
cout << "String to be searched is NULL" << endl;
return -1;
}
if (S == NULL)
{
cout << "Pattern string is NULL" << endl;
return -1;
}

if (lt <= 0)
{
cout << "Actual string can't be processed as its size is zero or negative" << endl;
return -1;
}
if (ls <= 0)
{
cout << "Pattern string can't be processed as its size is zero or negative" << endl;
return -1;
}

// Initialize the array assuming the ascii characters in the array
int arr[MAX];

for (int i = 0; i < MAX; i++)
{
arr[i] = -1;
}

PreprocessString(S, ls, arr, MAX);

// This will tell how much to jump for next iteration
int jump = 0;

for(int counter = 0; counter <= lt - 1;)
{
// Now perform the backward search
int j;
for (j = ls - 1; j >= 0; j--)
{
if (S[j] != T[counter + j])
break;
}

// if j < 0, we have a match at counter location
if (j < 0) return counter;

// Else we need to adjust jump
if (arr[T[counter + j]] != -1)
jump = arr[T[counter + j]];
else
jump = j == 0 ? 1 : j;

counter += jump;
}

return -1;
}


I appreciate any feedback!