Training the TextGenerator

You have been provided with a template for the TextGenerator class in TextGenerator.java, with some method implementations missing. You will have to fill in the implementations for the train and generateText methods. First, let’s get an idea of how the TextGenerator will do its job.

You’ll be creating what is called a Markov model to generate random text. The idea is to take each word and determine how often every other word follows it. For instance, the word “the” is far more likely to be followed by “dog” than by “ran.” On the other hand, “dog” is far more likely to be followed by “ran” than by “dog.” Once we have these frequencies, we can generate text by generating a word, then generating a word to follow that word, then a word to follow that one, and so on.
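To make the generation idea concrete, here is a minimal sketch of choosing a following word in proportion to how often it was seen. It is only an illustration under stated assumptions: the class name NextWordSketch, the method pickNext, and the counts used in main are hypothetical, and it uses java.util maps rather than the ChainingHashMap from this assignment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of the provided TextGenerator template.
class NextWordSketch {

    // Picks one key from counts with probability proportional to its count.
    // Assumes counts is non-empty and every count is positive.
    static String pickNext(Map<String, Integer> counts, Random rng) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        int r = rng.nextInt(total); // uniform in 0 .. total-1
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("counts must contain positive values");
    }

    public static void main(String[] args) {
        // Made-up counts for words seen after "the".
        Map<String, Integer> afterThe = new LinkedHashMap<>();
        afterThe.put("dog", 9);
        afterThe.put("ran", 1);
        // Over many calls, "dog" should come back about nine times as often as "ran".
        System.out.println(pickNext(afterThe, new Random()));
    }
}
```

Repeating this step, each time using the counts for the word just produced, yields the chain of words described above.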

First, you will fill in the train method. This method takes a body of text (as a string) and uses it to count how often each word follows each other word, storing the counts in hash maps. Note the three instance variables initialized in the constructor of TextGenerator:

  • totalWords – the total number of words that the TextGenerator has been trained on (not the number of distinct words, just words total)
  • totalTable – this hash map uses strings for keys (words) and stores numbers as values (counts). It stores the total number of times each word was encountered in the training
  • countTable – this hash map uses strings for keys (words). It will store hash maps as values. Each of these inner hash maps will use strings for keys (words) and store numbers (counts). This hash map of hash maps is for storing the number of times each word follows each other word. So

ChainingHashMap m = (ChainingHashMap) countTable.get("the");

is a hash map containing the counts of the words following “the” and

Integer c = (Integer) m.get("dog");

should be the number of times “dog” followed “the” in the training text.

Note: If you have not been able to get your ChainingHashMap working the way you’d like, you may alter TextGenerator to work with standard Java HashMaps instead.
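As a rough sketch of the same two-step lookup with standard Java HashMaps, the example below reads one count out of a map of maps. The class name CountTableLookupSketch and the helper followCount are hypothetical, and returning 0 for a missing word is an assumption made for the illustration, not something the template requires.

```java
import java.util.HashMap;

// Hypothetical illustration, not part of the provided template.
class CountTableLookupSketch {

    // Returns how many times next followed word, or 0 if either is missing.
    static int followCount(HashMap<String, HashMap<String, Integer>> countTable,
                           String word, String next) {
        HashMap<String, Integer> followers = countTable.get(word);
        if (followers == null) {
            return 0; // word never appeared with a following word
        }
        Integer c = followers.get(next);
        return c == null ? 0 : c;
    }

    public static void main(String[] args) {
        // Made-up data: "dog" followed "the" three times.
        HashMap<String, Integer> afterThe = new HashMap<>();
        afterThe.put("dog", 3);
        HashMap<String, HashMap<String, Integer>> countTable = new HashMap<>();
        countTable.put("the", afterThe);

        System.out.println(followCount(countTable, "the", "dog")); // 3
        System.out.println(followCount(countTable, "the", "ran")); // 0
    }
}
```

With generic type parameters on the maps, the casts shown above are no longer needed.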

Your train method should:

  • Split the training text into a list of words.
  • For each word in the list (except the last one):
    • Increase totalWords by 1.
    • Increase the totalTable entry for that word by 1.
      • Note: if the word is not already in the table, you must add it and initialize its count.
    • Increase the countTable entry for that word and the next word in the list by 1.
      • Note: if the word is not already in countTable, you must add it, setting its associated data entry to a new ChainingHashMap.
      • Note: if the word is in countTable, but the next word is not in the inner hash map associated with the word, you must add the next word and initialize its count.
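The steps above could be sketched roughly as follows. This is one possible shape, not the expected solution: it assumes standard java.util.HashMap in place of ChainingHashMap, assumes words are separated by whitespace, and treats a missing entry as a count of zero before adding one. The class name TrainSketch is hypothetical; the three field names follow the handout.

```java
import java.util.HashMap;

// Hypothetical stand-in for TextGenerator, using java.util.HashMap.
class TrainSketch {
    int totalWords = 0;
    HashMap<String, Integer> totalTable = new HashMap<>();
    HashMap<String, HashMap<String, Integer>> countTable = new HashMap<>();

    void train(String text) {
        // Assumption: words are separated by runs of whitespace.
        String[] words = text.trim().split("\\s+");

        // Stop one short of the end: the last word has no following word.
        for (int i = 0; i < words.length - 1; i++) {
            String word = words[i];
            String next = words[i + 1];

            totalWords++;

            // Add the word to totalTable if needed, then bump its count.
            Integer total = totalTable.get(word);
            totalTable.put(word, total == null ? 1 : total + 1);

            // Add an inner map for the word if it has none yet.
            HashMap<String, Integer> followers = countTable.get(word);
            if (followers == null) {
                followers = new HashMap<>();
                countTable.put(word, followers);
            }

            // Add the next word to the inner map if needed, then bump its count.
            Integer count = followers.get(next);
            followers.put(next, count == null ? 1 : count + 1);
        }
    }

    public static void main(String[] args) {
        TrainSketch g = new TrainSketch();
        g.train("a b a b a");
        // Temporary debugging output; the exact formatting is up to you.
        System.out.println("totalWords: " + g.totalWords);
        System.out.println("totalTable: " + g.totalTable);
        System.out.println("countTable: " + g.countTable);
    }
}
```

If you are using your own ChainingHashMap, the same structure should carry over, with its own methods for adding and looking up entries in place of the HashMap calls.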

To test your train method, temporarily add some print statements at the end to print out totalWords, totalTable, and countTable, and try training on simple examples to make sure they look the way they should. For instance, if you train with the string ‘a b a b a’ you should get something like:

totalWords: 4

totalTable: {“a”:2, “b”:2}

countTable: {“a”:{“b”:2}, “b”:{“a”:2}}

(Obviously, your formatting may vary.) Note: Once you are convinced that your method works, don’t forget to remove the print statements!