You must write a program which reads, processes and reports on the contents of a text file.
Your program should:
1. Read the name of the text file from the console.
2. Read the text file incrementally, not all at once (for example line by line, word by word or character by character).
3. The file content must be converted to a sequence of words, discarding punctuation and folding all letters
into lower case.
4. Store the unique words and maintain a count of each different word.
5. The words should be ordered by decreasing count and, where several words have the same count,
alphabetically. (This ordering may be achieved as the words are read in, partially as they are read, or
at the end of all input processing.)
6. Output the total number of words and the number of “unique words” (distinct words).
7. Output the first fifteen words in the sorted list, along with their counts.
8. Output the last fifteen words in the sorted list, along with their counts.
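Steps 3 and 5 to 8 can be sketched in Python, one of the languages allowed in note 6. This is a minimal illustration under stated assumptions, not a required design: the helper names `normalise`, `words_from_lines`, `comes_before`, `merge_sort` and `report` are hypothetical; a word is assumed to be a whitespace-separated token with every non-letter character removed (so digits are dropped too, which the specification does not state); and the sort is written out by hand rather than calling a library sort, in line with note 9.

```python
# Sketch of steps 3 and 5-8. Assumption: a word is a whitespace-separated
# token with all non-letter characters removed, so "it's" -> "its" and
# "loop-hole" -> "loophole"; digits are dropped as well.

def normalise(token):
    """Keep letters only, folded to lower case: "It's" -> "its"."""
    return "".join(ch.lower() for ch in token if ch.isalpha())

def words_from_lines(lines):
    """Yield words one line at a time instead of reading the whole file."""
    for line in lines:
        for token in line.split():
            word = normalise(token)
            if word:  # tokens made only of punctuation disappear
                yield word

def comes_before(a, b):
    """a and b are (word, count) pairs: higher count first, then A to Z."""
    if a[1] != b[1]:
        return a[1] > b[1]
    return a[0] < b[0]

def merge_sort(items):
    """Hand-written O(n log n) merge sort ordered by comes_before."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if comes_before(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

def report(pairs, total, k=15):
    """Sort (word, count) pairs and print steps 6-8; returns the sorted list."""
    ordered = merge_sort(pairs)
    print(f"Total words: {total}, unique words: {len(ordered)}")
    print(f"First {k}:")
    for word, count in ordered[:k]:
        print(f"  {word} {count}")
    print(f"Last {k}:")
    for word, count in ordered[-k:]:
        print(f"  {word} {count}")
    return ordered
```

Inside a `with open(input("File name: ")) as f:` block, a loop over `words_from_lines(f)` reads the file name from the console (step 1) and streams the file line by line (step 2). The (word, count) pairs passed to `report` are assumed to come from whatever counting structure the program uses for step 4.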
You must choose appropriate data structures and algorithms to accomplish this task. Note that
1) In the context of this assignment, appropriate choices will be efficient and will not use excessive
instructions or data.
2) Where a punctuation mark appears between two letters, the sequence is to be treated as a single word.
Thus, “it’s” becomes “its”, “you’ll” becomes “youll” and “loop-hole” becomes “loophole”.
3) You can assume that the input file contains no more than 50,000 different words.
4) Two sample input files, “sample-short.txt” and “sample-long.txt”, are provided for you to
test your program and produce output.
5) You may use any data structures or algorithms that have been presented in class up to the end of week 4.
If you use other data structures or algorithms, appropriate references must be provided.
6) Programs must compile and run under gcc (C programs), g++ (C++ programs), java or python.
Programs which do not compile and run will receive no marks.
7) Programs should be appropriately documented with comments.
8) All coding must be your own work.
9) Standard libraries of data structures and algorithms, such as the STL, should not be used.
10) Code sourced from textbooks, the internet, etc. may not be used unless it is correctly credited. In the
event that you use code sourced in this way, you will not receive marks for that part of the program.
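Note 3 caps the number of distinct words at 50,000, which makes a fixed-size hash table a natural fit for step 4. The sketch below is in Python and, in keeping with note 9, builds the table from plain lists instead of a built-in dictionary. It is an illustration under stated assumptions, not a prescribed solution: the class `WordCounter`, the constant `TABLE_SIZE` and the base-31 hash are hypothetical choices, and words are assumed to arrive already normalised (lower case, punctuation removed).

```python
# Sketch of a word-count table built from plain lists, assuming note 3's
# bound of at most 50,000 distinct words. The table never resizes.

TABLE_SIZE = 100003  # a prime just over 2 * 50,000, so load factor <= ~0.5

class WordCounter:
    """Open addressing with linear probing; keys are normalised words."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.keys = [None] * size
        self.counts = [0] * size
        self.unique = 0  # distinct words stored
        self.total = 0   # every word seen, duplicates included

    def _hash(self, word):
        # Polynomial rolling hash (base 31), reduced modulo the table size.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def add(self, word):
        """Count one occurrence of word."""
        i = self._hash(word)
        for _ in range(self.size):
            if self.keys[i] is None:      # empty slot: word not yet stored
                self.keys[i] = word
                self.counts[i] = 1
                self.unique += 1
                self.total += 1
                return
            if self.keys[i] == word:      # word already stored: bump count
                self.counts[i] += 1
                self.total += 1
                return
            i = (i + 1) % self.size       # linear probe to the next slot
        raise OverflowError("more distinct words than the table can hold")

    def items(self):
        """Return the stored (word, count) pairs in table order (unsorted)."""
        return [(self.keys[i], self.counts[i])
                for i in range(self.size) if self.keys[i] is not None]
```

Because no word is ever deleted, the first empty slot reached while probing shows the word is absent, so each `add` takes expected constant time at this load factor, assuming the hash spreads words reasonably evenly. `items()` then supplies the (word, count) pairs that still have to be sorted for steps 5 to 8.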
A PDF file describing your solution and the program output should be produced. This file should contain:
1. A high‐level description of the overall solution strategy.
2. A list of all of the data structures used, where they are used and the reasons for their choice.
3. A list of any standard algorithms used, where they are used and why they are used.
4. The output produced by your program on the provided “sample-long.txt” file.
5. The report should be no more than 2 pages. If it is longer than 2 pages, marking will be based only on the
first two pages.
6. The report pdf file should be called <your login>-a1.pdf