Introduction
As part of our standard recruitment procedures, we ask programming candidates
to undertake a small programming exercise. This should be written in Python.
The exercise consists of:
- Part 1: writing a program to solve a problem
- Part 2: writing an analysis of other solutions to the problem.
Please design and code this program alone. While you will have colleagues in
your normal working environment, we need to see how good you are, not how good
your current friends or colleagues are. If we hire you and your level of
competence is not as indicated by this test, we will have to carefully consider
whether your employment should continue past the probation period. If you do
copy code from elsewhere, for any reason (e.g. a library routine), please
clearly indicate its source.
You should be able to finish this exercise in 4 hours. Some of the best
people can complete it in much less time, but the time that you take will
depend ultimately on your skill and the amount of care you use.
All code, comments and documentation should be in English (ideally using
British spelling conventions).
Submission
You should submit your complete source code and the analysis document (please
note the names required for the files) in a zip file with the name
AblingPythonTest_<your name>.zip - for example,
AblingPythonTest_ChiMo.zip. You should email it to
recruit@abling.com with the subject line "Python test results from
<your name>".
Part 1: Programming Problem
Write a program called wordcount.py that reads a file that
contains ASCII encoded text and counts the occurrences of each word. Please
code the program in a single file. Please use standard Python libraries only.
Make sure your code is compatible with Python 2.4.
The program will not require any user intervention to operate. It will take
the first argument from the command line as the full path to the file
containing the words. The second argument will be the full path to the file in
which to write the results. Always overwrite the results file without asking.
For example, if the command:
c:\>wordcount.py words.txt c:\test_results\countxyz.txt
is entered in a command box (also known as cmd or DOS box) then the program
wordcount.py will execute (assuming it has been put into the folder c:\), read
the words from the words.txt file in the local folder and write the results
into the file countxyz.txt in the folder c:\test_results.
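The argument handling described above could be sketched as follows. This is a minimal illustration, not a reference solution: the function name parse_args and the usage message are hypothetical, and the code was written for a current Python interpreter using only long-standing syntax, without being checked against Python 2.4.

```python
import sys


def parse_args(argv):
    """Return (input_path, output_path) taken from the command line.

    parse_args is a hypothetical helper name used for illustration only.
    """
    # argv[0] is the script name; the two file paths follow it.
    if len(argv) < 3:
        raise ValueError("usage: wordcount.py <input file> <output file>")
    return argv[1], argv[2]


if __name__ == "__main__":
    try:
        input_path, output_path = parse_args(sys.argv)
    except ValueError:
        # No prompting: report the problem and stop.
        sys.stderr.write("usage: wordcount.py <input file> <output file>\n")
        sys.exit(1)
```

Raising an error instead of prompting keeps the program free of user intervention, as the text above requires.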
A word is a sequence of characters in the range [a-zA-Z]. Any other character
is treated as a word separator.
A word may appear with mixed upper or lower case characters in the text file.
Upper case characters should be converted to lower case before the word is
counted.
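The word definition above maps naturally onto a regular expression. The sketch below is one possible reading of it, assuming the file contents are already available as a string; the names WORD_PATTERN and split_words are illustrative and do not come from the exercise.

```python
import re

# A word is a maximal run of ASCII letters; every other character
# (digits, punctuation, whitespace, line terminators) separates words.
WORD_PATTERN = re.compile(r"[a-zA-Z]+")


def split_words(text):
    """Return the lower-cased words found in text, in order of appearance."""
    return [word.lower() for word in WORD_PATTERN.findall(text)]
```

For example, split_words("Hot2hat") returns ["hot", "hat"], because the digit 2 acts as a separator.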
Lines in the input file may be terminated in either the *nix style
(line-feed) or the DOS style (carriage-return, line-feed), so your program
should deal sensibly with either type.
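Because carriage-return and line-feed both fall outside [a-zA-Z], one way to handle either line style is to read the file in binary mode and treat both characters as ordinary separators. The sketch below assumes the raw bytes have already been read (for instance from a file opened with mode "rb"); count_words is a hypothetical name, and decoding as Latin-1 is an assumption made so that an unexpected non-ASCII byte becomes a separator instead of raising an error.

```python
import re

WORD_PATTERN = re.compile(r"[a-zA-Z]+")


def count_words(data):
    """Count the lower-cased words in the raw bytes of an input file."""
    # Latin-1 maps every byte to exactly one character, so decoding
    # cannot fail; non-letters simply end up between words.
    text = data.decode("latin-1")
    counts = {}
    for word in WORD_PATTERN.findall(text):
        word = word.lower()
        counts[word] = counts.get(word, 0) + 1
    return counts
```

With this approach the same words in a line-feed file and in a carriage-return, line-feed file produce identical counts, with no special-case code for either style.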
The result file is to have the words listed in alphabetical order, with one
word per line, followed by ": " (colon and a space), the word count, and line
separator. The line separator should be carriage-return, line-feed (i.e. the
ASCII characters 0x0d 0x0a). So the word file:
Hot2hat
not/hot
nat hat-hot
would give the results:
hat: 2
hot: 3
nat: 1
not: 1
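The output format above can be illustrated with a short sketch. It assumes the counts are held in a dictionary mapping each word to its count; format_results is a hypothetical helper name, not part of the exercise.

```python
def format_results(counts):
    """Return the result file contents as one string with CRLF line endings."""
    lines = []
    # Lower-case ASCII words sort alphabetically under plain string ordering.
    for word in sorted(counts):
        lines.append("%s: %d\r\n" % (word, counts[word]))
    return "".join(lines)
```

On Windows, a file opened in text mode translates each "\n" to "\r\n" when writing, so "\r\n" would come out as "\r\r\n". Opening the results file in binary mode ("wb") and writing the ASCII-encoded string is one way to avoid this.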
Guidelines
The guidelines below are indications of maximum size only. Your
program should cope with longer words, lines and files. However, you can use
this information to help you select your algorithm.
| Word length: | 100 characters maximum (ASCII encoding only) |
| Line length: | 1000 characters maximum |
| Number of words in word list: | 100,000 maximum |
Quality
Quality is paramount. You should make sure that your program is coded in a
professional manner and is thoroughly commented. While
running time is important to us, we do not need you to spend a lot of time
tuning and we would like to see your first correct effort. Spend more time
getting it right than getting it fast.
Test files
To help you, we have provided some test files in
PythonTestFiles.zip. It contains two files:
- words.txt - a sample set of words
- count.txt - the expected output from words.txt
Please ensure that your program reproduces the count.txt output exactly. We
will use binary comparison to see that your program is correct.
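When a binary comparison fails, it helps to know where the two files first differ, since a stray carriage-return or a missing final line separator is hard to spot by eye. The sketch below is a hypothetical debugging aid, not the comparison tool used to assess submissions; it assumes both files have been read in binary mode and works on their contents.

```python
def first_difference(expected, actual):
    """Return the offset of the first differing byte, or -1 if identical."""
    shortest = min(len(expected), len(actual))
    for index in range(shortest):
        if expected[index] != actual[index]:
            return index
    # One input is a strict prefix of the other: they differ where
    # the shorter one ends.
    if len(expected) != len(actual):
        return shortest
    return -1
```

A result of -1 means the two byte sequences match exactly; any other value is the zero-based offset to inspect.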
Part 2: Analysis Document
In addition to the program, write an analysis of the algorithm you have
chosen and other possible algorithms to solve the problem. Look at the expected
running time of the different algorithms as a function of the number of words and
the number of duplicate words.
This should be no more than 4 KB of plain ASCII text. It should be stored in a
file called analysis.txt and submitted along with your
program.
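As an illustration of the kind of comparison the analysis might cover, the sketch below contrasts two common counting strategies. Both function names are hypothetical, and neither is presented as the expected answer. With n words in total and u distinct words, counting in a dictionary and then sorting the distinct words takes roughly O(n + u log u) expected time, while sorting all the words first and counting runs of equal neighbours takes O(n log n).

```python
from itertools import groupby


def count_with_dict(words):
    """Count in a hash table, then sort only the distinct words."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return sorted(counts.items())


def count_with_sort(words):
    """Sort every word, then count runs of equal neighbours."""
    return [(word, len(list(group))) for word, group in groupby(sorted(words))]
```

The two approaches return the same list of (word, count) pairs. The gap between them grows with the number of duplicates, because the dictionary version sorts only u items.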