Ferozeh dot Com



Expert advice on tech interviews !

about ferozeh


Ferozeh !

Read Solved Interview Questions !

www.imo.im is a small start-up based in California. The guy heading the venture is a former Google engineer who probably cashed in on his stock options and left. Surprisingly they were looking to hire people inspite of the recession. I had a phone interview with Marc Sherry the technical lead at imo and it went very well. I don't remember the questions correctly since it was sometime ago so I can't document them. However I was cleared to the next stage and they asked me to complete a programming exercise and submit it. I will document that exercise for you.

Write a program that reads N files from disk and removes any fragments that occur in all files, where a fragment is 3 or more consecutive words.

Example:
INPUT
-----------
File1.txt
It is snowing and I want to drive home.
File2.txt
It is snowing and I want to go skiing.
File3.txt
It is hot and I want to go swimming.
OUTPUT
--------------
Out.File1.txt
It is snowing drive home.
Out.File2.txt
It is snowing go skiing.
Out.File3.txt
It is hot go swimming.

Example command line: ./program File1.txt File2.txt File3.txt The output files should be the input filename with "Out." prepended as seen in the example above.

Just as a note, your code will be tested on more and larger files than the ones used in the example, so efficiency will be taken into consideration.

We suggest that you concentrate on design and coding of fragment removal first and then file I/O. Do not spend too much time on file I/O



Please include comments, a short design document, and instructions on how to compile and run your code under Linux. The program should take input file names from the command line. You are free to use any standard libraries (such as glibc and/or STL).

Compress all your submissions into a single zip file.

Assumptions:

  1. You can ignore capitalization when matching fragments, but preserve it in the output.
  2. You can ignore punctuation, i.e. you can strip it out.
  3. You can make any further assumptions you'd like, but please specify them.

This exercise is not as simple as it seems Consider the following string:

IIIIIIamamamamamam

Say we are to remove the string "Iam" then when we reach the middle of the string we hit the first "Iam", we remove it but wait on, notice that the removel of "Iam" in the middle will result in another "Iam" sub-string being created. However since we have skipped over all the previous 'I' this problem becomes very interesting. I doubt this problem would have an elegant solution especially since in the original question the file is distributed over a number of files.

I didn't complete the exercise as I had my finals going on but by the time I had time I already had been hired elsewhere. However I feel this is a very good design question for interviews and provides good foor for thought for candidates. Go ahead try your luck at it !