Java Assignment Help | Fall 2018 Assignment 4 | CS Assignment Help | Java Text Processing

This assignment requires using Java IO to read the full text of Alice in Wonderland from a TXT file, and to compare and count selected words.

1. Material Covered

  • Files and Exceptions
  • Classes and inheritance
  • String processing

2. Notes and Instructions

  • Follow the programming standards posted on the course website to avoid losing marks.
  • To submit the assignment you will upload the required files to the Assignment 4 folder on the course website.
  • The markers will compile and execute your code, and will run several different tests on it (not all of which will be given in the assignment). To be eligible to earn full marks, your Java programs must compile and run as downloaded, without requiring any modifications.
  • Hand in only and exactly the files specified in this assignment, in the specified format. Failure to do so will result in lost marks.

Background:

This assignment will create a concordance for a book. A “concordance” is a list of the words that appear in a book, with a reference to where the words appear. You will be given the entire text of the book Alice in Wonderland to use as test data (but you should also create smaller test files of your own).

The assignment specifies many of the required methods, but you can and should add more of your own design.

Part 1: The WordData class

Create a WordData class which will store all the information about one particular word in the concordance, including:

  • The word itself (a String of lowercase letters only)
  • A list of all of the line numbers on which the word appears. This must be stored as an ArrayList<Integer>
  • The number of times the word appears. (This may be different from the length of the ArrayList because a word might appear more than once on the same line.)

Provide suitable constructor(s) and other methods to create and manage these objects. You must have:

  • A standard toString method that will show the word, its frequency, and the lines on which it appears. If there is a large list of lines, show only the first 15 of them, followed by … and then the last one. For example:

alice appears 386 times on 384 lines:
13,16,25,31,37,41,57,65,70,84,89,99,102,110,123…3301

  • A standard equals method boolean equals(Object other) which will return true if the other Object is a WordData object that contains the same word. The frequency and list of lines are ignored. Note that the parameter must be type Object and not type WordData (this is required so that the method will override the equals method in the Object class).
  • A standard compareTo method int compareTo(Object other) which will compare the frequencies of two WordData objects (the word itself and the list of lines are ignored). It should return a negative value if this word has a larger frequency than the other word, 0 if the frequencies are the same, and a positive value otherwise. (This will be used to sort words into descending order of frequency, so the sign is the reverse of the usual compareTo convention.) If the other Object is not a WordData object at all, return 0.
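
The three required methods could be sketched as follows. This is only one possible design, not the expected solution: the field names, the addLine helper, the hashCode override, and the choice to implement Comparable<Object> (so that compareTo takes an Object) are all assumptions. The sketch also assumes that line numbers arrive in non-decreasing order and that the list of lines is printed on the line after the header.

```java
import java.util.ArrayList;

// Sketch of WordData. Names other than toString, equals and compareTo
// are illustrative assumptions, not part of the assignment.
class WordData implements Comparable<Object> {
    private static final int MAX_SHOWN = 15; // lines listed before the ellipsis

    private final String word;
    private final ArrayList<Integer> lines = new ArrayList<>();
    private int frequency = 0;

    WordData(String word) {
        this.word = word;
    }

    // Hypothetical helper: record one more occurrence of the word.
    // A line number is stored once even if the word repeats on that line
    // (assumes lines are reported in non-decreasing order).
    void addLine(int line) {
        frequency++;
        if (lines.isEmpty() || lines.get(lines.size() - 1) != line) {
            lines.add(line);
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append(word).append(" appears ").append(frequency)
          .append(" times on ").append(lines.size()).append(" lines:\n");
        int shown = Math.min(lines.size(), MAX_SHOWN);
        for (int i = 0; i < shown; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(lines.get(i));
        }
        if (lines.size() > MAX_SHOWN) {
            // \u2026 is the ellipsis character used in the sample output
            sb.append('\u2026').append(lines.get(lines.size() - 1));
        }
        return sb.toString();
    }

    // Two WordData objects are equal when they hold the same word.
    @Override
    public boolean equals(Object other) {
        return other instanceof WordData && word.equals(((WordData) other).word);
    }

    // Not required by the assignment; kept consistent with equals.
    @Override
    public int hashCode() {
        return word.hashCode();
    }

    // Reversed comparison: a larger frequency gives a negative result.
    @Override
    public int compareTo(Object other) {
        if (!(other instanceof WordData)) {
            return 0;
        }
        return Integer.compare(((WordData) other).frequency, frequency);
    }
}

class WordDataDemo {
    public static void main(String[] args) {
        WordData w = new WordData("alice");
        w.addLine(13);
        w.addLine(13); // second occurrence on the same line
        w.addLine(16);
        System.out.println(w);
    }
}
```

In this sketch the demo word has a frequency of 3 but only 2 stored lines, which is the distinction the third bullet in the field list describes.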

Run your own tests on this class before continuing.

Part 2: The WordDataArrayList class

The concordance will consist of a list of WordData objects, so create a class to hold such a list. An ArrayList<WordData> would do almost what we need, but not quite. The standard add methods for the ArrayList class don’t work the correct way, and so we will need to implement a special one.

  • Create a WordDataArrayList class which is a subclass of ArrayList<WordData>. This means that a WordDataArrayList object will not contain an ArrayList instance variable. It will be an ArrayList. Make sure you understand this!
  • Provide a method void add(int line, String word) which indicates that the given word has appeared on the given line. The list should be adjusted accordingly. If a WordData object for that word already exists, it should be modified, otherwise a new one should be created.
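
A minimal sketch of the special add method is shown below. It relies on the inherited indexOf method, which uses equals to find an existing entry. The WordData class here is a cut-down stand-in so that the block compiles on its own; its addLine, getFrequency and getLineCount methods are hypothetical names, not ones the assignment specifies.

```java
import java.util.ArrayList;

// Cut-down stand-in for WordData: only what add(int, String) needs.
class WordData {
    private final String word;
    private final ArrayList<Integer> lines = new ArrayList<>();
    private int frequency = 0;

    WordData(String word) {
        this.word = word;
    }

    // Hypothetical helper: count one occurrence, storing each line once
    // (assumes lines are reported in non-decreasing order).
    void addLine(int line) {
        frequency++;
        if (lines.isEmpty() || lines.get(lines.size() - 1) != line) {
            lines.add(line);
        }
    }

    int getFrequency() { return frequency; }
    int getLineCount() { return lines.size(); }

    @Override
    public boolean equals(Object other) {
        return other instanceof WordData && word.equals(((WordData) other).word);
    }

    @Override
    public int hashCode() { return word.hashCode(); }
}

// The list IS an ArrayList<WordData>; it has no ArrayList field.
class WordDataArrayList extends ArrayList<WordData> {
    // An overload of the inherited add methods, not an override.
    public void add(int line, String word) {
        WordData probe = new WordData(word);
        int index = indexOf(probe); // linear search using WordData.equals
        if (index >= 0) {
            get(index).addLine(line); // word already known: update it
        } else {
            probe.addLine(line);
            add(probe);               // inherited ArrayList.add(E)
        }
    }
}

class WordDataArrayListDemo {
    public static void main(String[] args) {
        WordDataArrayList list = new WordDataArrayList();
        list.add(1, "alice");
        list.add(1, "alice");
        list.add(2, "rabbit");
        System.out.println(list.size() + " distinct words");
    }
}
```

The search is a simple linear scan. A faster lookup structure would be possible, but whether it is needed or permitted under the course rules is not stated in the assignment.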

Run your own tests on this class before continuing.

Part 3: The WordScanner class

You will need to pick out all of the individual words in a file that contains a book. The Scanner class can find “tokens” (sequences of non-blank characters) but that is not the same as finding “words”. So once again a customized class should be created.

  • Create a WordScanner class. It would be nice to make this a subclass of Scanner, but that’s not possible because, unlike ArrayList, Scanner is a final class and that means that it won’t allow subclasses. Instead, simply use a Scanner in the ordinary way.
  • Provide a constructor public WordScanner(String line, int minSize) which will prepare to find all of the words in the given line that contain minSize letters or more.
  • Provide a public method String nextWord() which will return the next word on the line which contains at least the minimum number of letters, or null if no such word can be found.
  • Here’s how “words” will be defined:
    1. Replace all hyphens (-) in the line by blanks. This is because in the old style of English used in Alice in Wonderland, many things like “waistcoat-pocket” are hyphenated, but this program should treat those as separate words, as they would be in modern English.
    2. Convert all letters to lower case letters.
    3. In any remaining sequence of non-blank characters, remove all non-letters. Whatever is left will be a “word”. In some places in the book there might be punctuation marks all by themselves, which would give words of size 0. That’s OK.
    4. Accept only words that meet the minimum length requirement.
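
The four steps above can be sketched as a thin wrapper around a Scanner. This sketch assumes that "letters" means the characters a to z after lower-casing, and it uses Locale.ROOT so the conversion does not depend on the machine's locale; neither detail is stated in the assignment.

```java
import java.util.Locale;
import java.util.Scanner;

class WordScanner {
    private final Scanner tokens;
    private final int minSize;

    public WordScanner(String line, int minSize) {
        // Steps 1 and 2: hyphens become blanks, letters become lower case.
        String prepared = line.replace('-', ' ').toLowerCase(Locale.ROOT);
        this.tokens = new Scanner(prepared);
        this.minSize = minSize;
    }

    // Returns the next word of at least minSize letters, or null if none is left.
    public String nextWord() {
        while (tokens.hasNext()) {
            // Step 3: drop every character that is not a letter a-z (assumption).
            String word = tokens.next().replaceAll("[^a-z]", "");
            // Step 4: skip words that are too short, including empty ones
            // when minSize is at least 1.
            if (word.length() >= minSize) {
                return word;
            }
        }
        return null;
    }
}

class WordScannerDemo {
    public static void main(String[] args) {
        WordScanner ws = new WordScanner("Alice's waistcoat-pocket -- oh!", 5);
        String word = ws.nextWord();
        while (word != null) {
            System.out.println(word);
            word = ws.nextWord();
        }
    }
}
```

With a minimum size of 5, the demo line yields alices, waistcoat and pocket. Note that under these rules the apostrophe in "Alice's" is simply removed, so the result is "alices" rather than "alice".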

Run your own tests on this class before continuing.

Part 4: The Concordance class

Define a Concordance class containing a standard main method which will be used to test your program. This method should:

  1. Use a JFileChooser to allow the user to pick a text file. Be sure to use new JFileChooser(".") to create it, with the "." argument. This will ensure that it begins at the “current directory”. Do not use the path to a folder on your own computer! When the marker tries to run it, there will be no such folder, and the program will not run!
  2. Allow the user to specify the minimum word length to be used, and the number of common words to be listed in the output. You can use a simple Scanner, or use JOptionPane.
  3. Construct a Concordance for that file, containing all of the words of the minimum length or longer.
  4. Sort the concordance so that the words are in descending order of frequency. You don’t have to write your own sorting method. Make use of the following two built-in methods:
    1. void Arrays.sort(Object[]) will sort any array of objects, based on the results of the compareTo method of those objects. (This is why WordData has a compareTo method.) You will need to import java.util.Arrays.
    2. But that will sort only arrays, not ArrayList objects. Never fear! The ArrayList instance method Object[] toArray() will create an Object[] array out of any ArrayList.
  5. Print the total number of words found. Print out the most frequent words in the concordance, as requested by the user. (You will be printing from the sorted array, not the unsorted WordDataArrayList.)
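
Steps 4 and 5 might look roughly like the sketch below. CountedWord is a hypothetical stand-in for WordData that holds only a word and a frequency, and the three entries reuse frequencies from the sample run. Arrays.sort(Object[]) requires the elements to implement Comparable, which is why the stand-in declares Comparable<Object>; that declaration is an assumption about how compareTo(Object) would be wired up.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical stand-in for WordData: a word and its frequency,
// with compareTo reversed so larger frequencies sort first.
class CountedWord implements Comparable<Object> {
    final String word;
    final int frequency;

    CountedWord(String word, int frequency) {
        this.word = word;
        this.frequency = frequency;
    }

    @Override
    public int compareTo(Object other) {
        if (!(other instanceof CountedWord)) {
            return 0;
        }
        return Integer.compare(((CountedWord) other).frequency, frequency);
    }

    @Override
    public String toString() {
        return word + " appears " + frequency + " times";
    }
}

class SortSketch {
    // Copies the list into an Object[] and sorts the copy;
    // the original list keeps its order.
    static Object[] sortedCopy(ArrayList<CountedWord> list) {
        Object[] sorted = list.toArray();
        Arrays.sort(sorted); // uses each element's compareTo
        return sorted;
    }

    public static void main(String[] args) {
        ArrayList<CountedWord> list = new ArrayList<>();
        list.add(new CountedWord("queen", 68));
        list.add(new CountedWord("alice", 386));
        list.add(new CountedWord("little", 128));

        Object[] sorted = sortedCopy(list);
        int requested = 2; // would come from the user in the real program
        int howMany = Math.min(requested, sorted.length); // never read past the end
        System.out.println("Number of words: " + sorted.length);
        for (int i = 0; i < howMany; i++) {
            System.out.println(sorted[i]);
        }
    }
}
```

Capping the loop at the array length guards against a user asking for more words than the concordance contains.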

For example, here is a sample test run (using the file AliceFullText.txt):

> run Concordance
What is the minimum length word to list?  5
List how many of the most common words?  10
Number of words with 5 letters or more: 1977
alice appears 386 times on 384 lines:
13,16,25,31,37,41,57,65,70,84,89,99,102,110,123…3301
little appears 128 times on 127 lines:
81,114,120,121,129,135,138,143,146,170,172,180,184,197,204…3334
about appears 94 times on 93 lines:
42,46,59,69,120,126,146,172,232,258,311,324,347,352,362…3334
would appears 83 times on 78 lines:
21,62,64,118,128,129,147,236,316,358,378,393,434,435,443…3336
again appears 83 times on 82 lines:
38,73,85,112,253,308,312,322,331,372,388,394,405,409,430…3321
herself appears 83 times on 80 lines:
42,57,90,106,173,184,187,189,190,192,204,236,273,279,323…3332
could appears 77 times on 74 lines:
19,80,127,129,130,175,181,182,251,270,280,323,382,383,406…3311
there appears 75 times on 72 lines:
25,50,67,76,84,87,102,109,115,135,258,398,417,437,503…3258
thought appears 74 times on 74 lines:
16,27,57,71,116,128,169,194,233,237,321,333,353,358,362…3113
queen appears 68 times on 67 lines:
1239,1241,1393,1491,1693,1713,1896,1916,1920,1922,1935,1945,1948,1955,1964…3312

HAND IN: Your Concordance, WordData, WordDataArrayList, and WordScanner classes, as separate .java files.

TEST DATA: The file AliceFullText.txt will contain the entire book Alice in Wonderland. The file AliceChapter1.txt will contain only Chapter 1. You may wish to create even smaller test data files of your own while debugging your program. Even the full book should take no more than a few seconds to create the concordance. If your program is very slow, you’ve done something very inefficient (or you have an extremely slow computer).
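
When debugging with a tiny file, it can help to check the line numbering on its own before any words are counted. The sketch below writes a two-line temporary file and reads it back with line numbers. The class and method names are hypothetical, numbering from 1 is an assumption, and BufferedReader is only one of several ways to read a file line by line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SmallFileSketch {
    // Reads a text file and returns "lineNumber: text" for every line.
    // Lines are numbered from 1 (an assumption, not stated in the assignment).
    static List<String> numberedLines(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the reader even if readLine throws
        try (BufferedReader in = Files.newBufferedReader(file)) {
            int lineNumber = 0;
            String line = in.readLine();
            while (line != null) {
                lineNumber++;
                result.add(lineNumber + ": " + line);
                line = in.readLine();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        try {
            // A tiny test file created on the fly, then removed again.
            Path tiny = Files.createTempFile("tiny", ".txt");
            Files.write(tiny, Arrays.asList("Alice was beginning", "to get very tired"));
            for (String numbered : numberedLines(tiny)) {
                System.out.println(numbered);
            }
            Files.delete(tiny);
        } catch (IOException e) {
            System.out.println("Could not read the test file: " + e.getMessage());
        }
    }
}
```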

MARKING: The markers will run your code several times, using different test cases. Many of the marks will be based on obtaining the correct results. One set of correct results is given above. Run your own tests until you’re confident that your results will be correct.