Coders Coffee Corner

Monday, June 12, 2006

Phone screens/ Job interviews Etc

Folks,
This is a long time overdue.
I graduated in December 03 after completing my masters in computer science.

People scared me enough to extend my graduation by six months because of the economy.But trust me, if you have the right credentials, jobs will come calling.

I have , in the last six months interviewed with quite a few companies. I made it past the phone screens in quite a few of them. The interviews are posted in chronological order.

I have some background in Unix systems programming; Linux/BSD systems administration;Compiler design;distributed systems design;enterprise software design(J2EE-based); Python/Web based - data mining programming; strong C++/Java/OO design skills and a passion for OS and most things open source. I have started writing an OS and i will be posting my experiences. Remember, I had no real-world experince. Most of my experience was in good old academic/research lands.

May 2003: Bloomberg (Phone screen)
December 2003: Morning Star, Chicago (Phone Screen)
December 2003: Expedia (includes phone round + final rounds)
Januray 2004 : Oracle India (Phone screen - 2 rounds)
February 2004: Amazon (Phone screen - 2 rounds)
March 2004 : Microsoft (includes phone round + final rounds)
May 2004: Google (Phone screen - 2 rounds)

May 2003: Bloomberg (Phone screen)
I hardly remember this. This was first of my interviews and my inexperience showed up.
I was grilled with questions based on Unix and shell scripting. It lasted for about half-an-hour. Bllomberg is yet to get back to me! Bloomberg has(from what i heard) a properitery software system and all new recruits are to go through a mandatory training program. This happens only once a year(summer).
I was quizzed on fork, awk/pattern matching , memory management in C/C++ , new/malloc/free , sql/refrential integrity

December 2003: Morning Star(Phone screen)
1. C++ memory model (new , placement_new , delete/delete[] , std::bad_alloc and ways to tackle low memory situations(Scott Meyers book on C++ has good refrences to this)
2. Design a Sql Engine
3. C++ inheritence
4. C++ casts (with emphasis on inheritence)

December 2003: Expedia (Phone Screen)
Expedia is an excellent company to work for. It has a bunch of extremely smart and focused people.
I was interviewed by a manager (managers usually dont do a phone screen! shows the technical orientation of the company)
1.Algorithm to find a single duplicate in a set of sequential numbers (p,p+1,p+2 ... p+q). One of the numbers in this set repeat and the question was to come up with the best solution (both in terms of memory and running time). No assumptions can be made on the ordering of data.
2.Algorithm to determine if a linked list has cycles
3. Resume based questions
4. Was asked to code sorted_insert for a linked list (the function is to insert a node in an appropriate position, assuming that the head_ptr passed pointed to a sorted linked list)


December 2003: Expedia (On-site interviews at Bellevue, WA)
Expedia is closely linked with Microsoft hiring philosophy. Be prepared to stand throug out the interview, holding a marker and to write C/C++ code on white boards
Round 1:
1. C++ inling and performance issues.
2. Algorithm to remove white spaces from strings
Things to keeps in mind :
a. Overflows (extremely long strings)
b. prefer your functions over library calls
3. Design an ultra-efficient hash for 3-character airport codes terminated by nulls
(Hint: the solution offered had something to do with unions and longs)

Round 2:
Made a very grave mistaking of stating "I hate UI programming" to a UI designer :(
1. Discussion of my areas of interest in computer science and my programming preferences
2. Program to compute all combinations of a given string
(Hint: the most optimal solution uses bit vectors)

Round 3:
This was the dreaded lunch round! Lunch rounds suck especially for vegetarians like me. People hardly know which places offer veggie food! And of course, you will not know when to eat and when to answer your recruiter!
During lunch, we had some small talk about weather, my background, places to visit in Seattle. The interviewer then took me to his office.
1. Discussion about hash tables (when to use, when have i used them , advantages/disadvantages , hash functions and hashing strategies)
2. Design and code a parser that correctly computes in-order expressions
(Hint: trivial if you have done a course in compiler-designer; easy otherwise ;)

Round 4:
1. Write a program to rotate objects in an array.
For eg, if the vector was a->b->c->d
rotating this once to the right yields
d->a->b->c
the constraints where that the objects are huge, and i use very little extra space (preferably just another temp obj)

Round 5:
1. The same rotation question, but this time it was for linked lists. This is much simpler compared to one in round 4

Notes:
My thoughts after the interviews (Disclaimer : might be wrong)
0. I was interviewed for a software design engineer position
1. Expedia looks at design/algorithm and code
2. I was seldom asked to test my code



January 2004: Oracle India (banglore) (phone screens - 2 rounds)
Oracle , afaik does most systems oriented stuff in banglore and app-oriented stuff in hyderbad, India.
I was interviewed for a new team that would port Oracle 64-bit solaris version to Linux. The pre-requisite was reasonable knowledge of linux systems/ systems programming.
Was basically quized on IPCs, process based sys calls (fork , wait , exec etc) and file-system based sys calls.
Was offered a position (my first ever job :)

February 2004: Amazon (Phone screen - 2 rounds)
Round 1:

1. Strategies to sort files >1 GB!
2. Find cycles in linked list

Round 2:
1. c++ questions (pointers , memory , polymorphism)
2. puzzle! - derive a formula to compute the angle between the horu hand and the minute hand in an analog clock. The only input is time in hh:mm:ss

March 2004: Microsoft (Phone screen + 3 on campus rounds)
Phone Screen: (With SQL Server Team)
1. C++ questions (virtual pointers, v-tables and v-ptrs)
2. Test a vending machine (what else?)
3. Define joins and various types of joins
4. Define referential intergrity

HR round:-
1. What is your favorite language?
2. Describe the most hallenging project? What were the challenges faced and how did you overcome them?
3. Test a stapler

Round 1:
1. Derive and test a formula that divides a NxN choclate bars into single individual units
2. If given a chance, what product feature (in any microsoft product), will you change? Why? How do you think you can rectify the glitch?

Round 2:
1. write and test atoi(string)
remeber delimiters, recursion, very long strings

Round 3:
1. Merge two linked lists.
2. Merge linked lists so that the end result is a sorted list
3. given two tree nodes, find the common ancestor node to them
4. puzzle: two parachutes land x distant apart on a perfectly straight rail road. Once the para-troopers land, they pick a direction at random and start walking. Devise a hueristic that will make the paratroopers meet

May 2004: Google (two rounds of phone screen)
Introduction : I was interviewed by the site reliability engineering team. They were looking for developers with sys administration experience

Round 1:
1. How do you go about  renaming all files ending in .java to .new and vice-versa
2. Devise an algorithm for finding the 1000 closest points to origin in a 3-d matrix with billion points. Discuss the time complexity of this algorithm
3. Discuss the design of a chat system with file transfer capability.

Round 2:
This was totally on sys administration. I dont remember all the questions.
1. Difference between fopen and open
2. nfs based questions - what does it mean to get a stale  nfs handle?
3. lots of questions on various sys calls ....dont remember them :(
4. routing using /etc/hosts , nsswitch , ip routing/dns
5. write a sql query to implement the set difference operators between two tables
6.describe in detail what happens when I type in the name of an executable at unix prompt  (went into details like fork , exec , loader etc etc)

Functional Testing

I am going to list out all generic test cases for the following function signatures
(blackbox testing)

func(char*) // note not a const char* , eg : strlen
func(char* , char*) // eg: strcpy, strstr , strcp etc etc
func(int*) // eg: sort , find duplicates in an arr etc etc

func(char* s)
1. s == null
2. s == "" //not empty
3. very large strings (strlen > int_max, to check for interger overflows)
4. char* of nulls
5. non-null terminated (strange case...in c all strs are null terminated...terminating condition violated)
6. char* made of lower case chars only
7. char* made of upper case chars only
8. char* made of combo of upper/lower case chars
9. char* made of special chars
10. char* made of non-printable chars
11. char* made of numerals
12. char* made of numerals/chars
13. char* made of patterns like '\ n' (shouldnt be intepreted as \n)

how can func(char*) be implemented?
14. recursion (check for stack overflow)
15. check for manipulation of passed char* (can char* be modified as per reqs?)
16. pass read-only mem as char* (check for clean termination with proper err msgs)

security checks:
17. buffer overflow (pass very long strings and something else? ...need to fill in the blanks)
since char* can be interpreted as a pointer to any mem location, pass pointers to other datatypes


err msg checks (usability)
18. check for proper err msgs/codes

clean termination tests:
19. see how cleanly the function terminates
a. when err occurs
b. when there are no errs

performance tests:
20. memory footprint
21. resource usage (file desc? , io device usage, interrupts/signals?)
22. runtime characteristics ( big-Oh and worst case analysis)
23. system calls and context switches (why should anyone invoke sys calls in a func like strlen(), but still a valid test case. for eg, if u need to find strlen of a str that is 10^100 long, one might be tempted to use files for intermediate processing, and opening, reading and writing to files are sys calls even if u use fread/fwrite and fopen)

packaging
24. does the function depend on external libaries? are the libs statically bound. dynamically bound (perf issues)
25. protability (does it use code that doesnt compile in other archs)...not ok for generic lib calls like strlen

documentation
26. check for dumnetation (eg: check manpage for strlen)


Most test cases will repeat for func(char* s1 , char* s2)
1. s1 == null
2. s1 == ""
3. s2 == null
4. s2 == ""
5. s1 and s2 null
6. s1 and s2 ""
7. s1 >> s2 (s1 much longer than s2)
8. s2 >> s1 (s2 much longer than s1)
9. s1, s2 overlap (important case)
10. s1 == s2 (pointer wise)
11. s1 == s2 (content wise)
12. s1 subset of s2 content-wise
13. s2 subset of s1 content-wise

Sunday, April 03, 2005

The intricacies of strstr

the deceptively simply sounding and oft-used function strstr has more than what meets the eye.

How many times is that you have searched a web-page/word document/filesystem for mactching strings? Probably more than once a day. The function that does string matching for you is strstr

This has the signature
char* strstr(const char* text , const char* pattern_tomatch)

It returns a pointer to the located string, else a null pointer.

Lets go about implementing this function.

The first and most trivial implementation has a O(n2) complexity


Heres the code for this

char* strstr(const char* s1 , const char* s2){
// set tmp pointers to intial location
char* tmp = s2;
char* tmp2 = s1;

// null and empty checks
if (0 == s2) return s1;
// returns s1 if s2 is ""
if (0 == *s2) return s1;
if (0 == s1) return 0;
if (0 == *s1) return 0;

while(*s1){
if (*s1 == *s2){
if (0 == *(++s2)){
return (s1-(s2-tmp-1));
}
}
else {
s2 = tmp;
s1 = ++tmp2;
if (*s1 == *s2) s2++;
}
s1++;
}
return 0;
}

In order to simplify the above "obfuscated" code and to prove its O(n2), lets rewrite using a for loop



char* strstr(const char* s1, const char* s2){

//skipping validations

// bad practise, size of s1 and/or s2 can be > long
unsigned long len_s1 = strlen(s1);
unsigned long len_s2 = strlen(s2);

unsigned long j = 0;
for (unsigned long i=0;i
if (s1[i] == s2[j]){
for (j=1;j if (s1[i+j] != s2[j])
break;
}
}
if (j == len_s2+1) return s1+i+(j-len_s2);
}

return (char*)0;
}



I always prefer not using ints/longs as they are always susceptible to overflows and lots of of (consequent) security bugs

I will now go about writing developing an algortihm for speeding up the whole process.

The most common thought that comes to the mind is to somehow use a hash table (hash table is logical to me because the problem involves a lookup of individual charaters)
Hash tables arent suitable for this problem because hashes dont store ordering information.
For eg,
aaxbaa , ab are to strstr'ed
a hash table would like
key | # occurences
a 4
b 1
x 1
the string to be matched "ab", will match the input str because both a and b exist in the hash table.

Obviously, this approach is no good. There should be a means to store the place of occurence of a character and not the number of occurences. So a hash table can look like this

key | linked list of positions
a 0->1->4->5
b 3
x 2

Now the algorithm can be modified to do look for s2 ("ab" in our case) such that the lookups are in increasing, consecutive order.

so, a look of "a" will yield 0 and b will yield 2. o and 2 are in consecutive, so its not a match.
Suppose, we look fo "xb", lookup for x yields 2 and lookup for b yields 3.
So (2,3) is an increasing, consecutive order and we have a match.

Now, what if s2 had many occurences of a character. In order to scale, the lookup should be engineered to take this into account.....
to be contd(with other possible heuristics) ......