

Question

This is the test class code that I need help with:

    @Test
    public void testTask3example()
    {
        MarkovModel model = new MarkovModel(2,"aabcabaacaac");
        ModelMatcher match = new ModelMatcher(model,"aabbcaac");
        assertEquals(0,1); //TODO replace with test code
    }

The details for this task and code are below:

Task 3 – Matching test strings to a model (ModelMatcher)

Link to task 1: https://www.chegg.com/homework-help/questions-and-answers/task-1-analysing-n-grams-sample-text-ngramanalyser-task-need-complete-ngramanalyser-class--q22251885

Link to task 2: https://www.chegg.com/homework-help/questions-and-answers/task-2-building-markov-model-text-sample-markovmodel-task-need-complete-markovmodel-class--q22255515

For this task, you will need to complete the ModelMatcher class, which determines which of two Markov models better matches a test string. Given two Markov models built from text samples taken from two different sources, we can use them to estimate which source a test string is more likely to have been drawn from. Even using only zeroth-order models constructed from English and Russian text, we should be able to tell, for instance, that the string

“ ”

is more likely to be Russian than English.

We will compute a measure of fit of test strings against models, called the likelihood of a sequence under the model. The likelihood of a sequence under a k-th order Markov model is calculated as follows:

For each symbol c in the sequence, compute the probability of observing c under the model, given its k-letter context p (assuming the sequence "wraps around", as described above), using the Laplace-smoothed estimate of probability we used for the MarkovModel class.

Compute the likelihood of the entire sequence as the product of the likelihoods of each character.

In mathematical notation:

Let s be an input sequence of length n, and let M be a k-th order Markov model. In order to calculate the likelihood of the sequence s under the model M, for each symbol c_i in s (where 1 ≤ i ≤ n), let p_i be the k-length context of the symbol c_i (assuming wraparound). The likelihood of the sequence s under the model is

laplace(c_1) × laplace(c_2) × … × laplace(c_n) = ∏_{i=1}^{n} laplace(c_i)

where laplace(c_i) is the Laplace-smoothed probability of c_i occurring (given its context p_i), as described in the previous task.

The probability we obtain from this calculation may be very small – in fact, potentially so small as to be indistinguishable from zero when using Java's built-in floating-point arithmetic. Therefore, we will calculate and express the likelihood using log probabilities, which do not suffer from this problem. (A weakness of log probabilities is that they cannot straightforwardly represent probabilities of zero, but our use of Laplace smoothing allows us to avoid this problem.) The product of two probabilities p and q can be calculated by adding their log probabilities, log p and log q.
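As a quick standalone illustration (not part of the required classes) of why log probabilities are used: multiplying many small probabilities directly underflows to zero in double arithmetic, while summing their log10 values does not.

public class LogProbDemo
{
    public static void main(String[] args)
    {
        // Multiplying 1000 probabilities of 0.1 each underflows double precision,
        // but the equivalent sum of log probabilities stays perfectly usable.
        double product = 1.0;
        double logSum = 0.0;
        for (int i = 0; i < 1000; i++) {
            product *= 0.1;
            logSum += Math.log10(0.1);
        }
        System.out.println(product);  // 0.0 (underflow)
        System.out.println(logSum);   // about -1000.0: the log of the true product
    }
}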

By way of example, suppose we have constructed a 2nd-order Markov model using the input string “aabcabaacaac”, as described in the example for Task 2. If we are then given a test string "aabbcaac", we can compute its log likelihood as follows:

For each character in the test string, obtain its length-2 context (assuming wraparound). Note that the tri-gram "caa" occurs twice in the (wrapped) test string.

For each character in the test string, calculate its Laplace-smoothed probability of occurring (given its context), and then take the log of this.

Then

total log likelihood = sum of log probabilities = -3.0792, and

average log likelihood = (sum of log probabilities) / (number of characters in the test string, here 8) = -0.3849
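To see where these figures come from, here is a small standalone sketch (the class name is just a placeholder) that recomputes them, assuming MarkovModel.laplaceEstimate behaves as in Task 2, i.e. (count(sequence) + 1) / (count(context) + alphabet size):

public class Task3ExampleCheck
{
    public static void main(String[] args)
    {
        MarkovModel model = new MarkovModel(2, "aabcabaacaac");
        String test = "aabbcaac";
        double total = 0.0;
        for (int i = 0; i < test.length(); i++) {
            // Build the wrapped trigram starting at position i (context + character)
            String seq = "";
            for (int j = 0; j <= 2; j++) {
                seq += test.charAt((i + j) % test.length());
            }
            total += Math.log10(model.laplaceEstimate(seq));
        }
        System.out.println(total);                  // expected: about -3.0792
        System.out.println(total / test.length());  // expected: about -0.3849
    }
}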

If we have two Markov models and one test string, then we can select the best model as the one that maximizes the likelihood of the string under the model – i.e. we choose the model with the higher average log likelihood. Notice that the example above lists ngrams from the one most in agreement with the model (highest log likelihood) to the least likely. This helps us to explain why a particular test string matches a model or not.
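In code, the selection rule amounts to something like the following hypothetical helper (not part of the required API; the method name is only illustrative):

    /** Return whichever of the two models gives the higher average log likelihood. */
    public static MarkovModel betterModel(MarkovModel a, MarkovModel b, String testString)
    {
        ModelMatcher matchA = new ModelMatcher(a, testString);
        ModelMatcher matchB = new ModelMatcher(b, testString);
        return (matchA.getAverageLogLikelihood() >= matchB.getAverageLogLikelihood()) ? a : b;
    }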

You will need to write code in the ModelMatcher class that takes as input one Markov model and a “test string” for which we want to identify how well it matches the given model. Complete the ModelMatcher class as follows. You may wish to complete sub-task (c) before sub-task (b).

(a) Fill in the @author and @version fields in the header comments with all authors' names, student numbers and the date.

(b) Complete the constructor, which calculates and sets the fields in the class. Getter methods for these fields are already complete.

(c) Complete the totalLogLikelihood method, a private helper method which sums a map containing log likelihoods for strings, and the averageLogLikelihood helper method, which calculates the average.

(d) Complete the toString method for the ModelMatcher class. This method should give a summary of the results. You may choose how to arrange the string, but it should include at least the name, the averageLogLikelihood and a table of the log likelihood probabilities as shown in the example above.

(e) Ensure you complete Javadoc comments for each method and class, including @param, @return and @throws fields as required (a brief example is sketched just below).
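For instance, a method-level Javadoc comment for part (e) might look like the sketch below (the wording and the @throws detail are only illustrative):

    /**
     * Look up the log (base 10) likelihood recorded for one ngram of the test string.
     *
     * @param ngram a String of length k+1 taken from the test string
     * @return the log likelihood stored for that ngram
     * @throws NullPointerException if the ngram was not found in the map
     */
    public double getLogLikelihood(String ngram)
    {
        return logLikelihoodMap.get(ngram);
    }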

ModelMatcher.java

import java.util.HashMap;
import java.util.Collection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;


public class ModelMatcher
{

    /** log likelihoods for a teststring under a given model */
    private HashMap<String,Double> logLikelihoodMap;
    /** summary statistic for this setting */
    private double averageLogLikelihood;
      
    /**
     * Constructor to initialise the fields for the log likelihood map for
     * a test string and a given Markov model and
     * the average log likelihood summary statistic
     * @param MarkovModel model a given Markov model object
     * @param String teststring
     */
    public ModelMatcher(MarkovModel model, String testString) {

        int modelOrder = model.getK();
        this.logLikelihoodMap = new HashMap<String, Double>();

        double laplaceEstimate;
        double logLikelihood;
        String sequence;

        NgramAnalyser stringNgram = new NgramAnalyser(modelOrder + 1,testString);

        Set<String> distinctNgrams = stringNgram.getDistinctNgrams();

        for (String ngram : distinctNgrams) {
            laplaceEstimate = model.laplaceEstimate(ngram);
            //Use change of base formula to find log(10) likelihood
            logLikelihood = Math.log(laplaceEstimate) / Math.log(10);

            this.logLikelihoodMap.put(ngram, logLikelihood);
        }

        this.averageLogLikelihood =
        this.averageLogLikelihood(this.logLikelihoodMap, stringNgram.getNgramCount());      

    }

    /** Helper method that calculates the average log likelihood statistic
     * given a HashMap of strings and their Laplace probabilities
     * and the total number of ngrams in the model.
     *
     * @param logs map of ngram strings and their log likelihood
     * @param ngramCount int number of ngrams in the original test string
     * @return average log likelihood: the total of loglikelihoods
     *    divided by the ngramCount
     */
    private double averageLogLikelihood(HashMap<String,Double> logs, int ngramCount) {

        double logSum = this.totalLogLikelihood(logs);

        return (logSum / ngramCount);
    }
  
    /** Helper method to calculate the total log likelihood statistic
     * given a HashMap of strings and their Laplace probabilities
     * and the total number of ngrams in the model.
     *
     * @param logs map of ngram strings and their log likelihood
     * @return total log likelihood: the sum of loglikelihoods in logs
     */
    private double totalLogLikelihood(HashMap<String,Double> logs) {
        double logSum = 0;

        for (Map.Entry<String, Double> entry : logs.entrySet()) {
            logSum += entry.getValue();
        }

        return (logSum);
    }

  
    /**
     * @return the average log likelihood statistic
     */
    public double getAverageLogLikelihood() {
        return averageLogLikelihood;
    }
  
    /**
     * @return the log likelihood value for a given ngram from the input string
     */
    public double getLogLikelihood(String ngram) {
        return (logLikelihoodMap.get(ngram));
    }
  
  
    /**
     * Make a String summarising the log likelihood map and its statistics
     * @return Header lines containing the testString, the averageLogLikelihood
     *         and a sorted table of ngrams and their loglikelihoods    
     * The likelihood table should be ordered from highest to lowest likelihood
     */
    public String toString() {
       String returnString = "";

       returnString = returnString + Double.toString(this.averageLogLikelihood);
       returnString = returnString + NgramAnalyser.printHashMap(this.logLikelihoodMap);

       return returnString;
    }


}

ProjectTest.java

import static org.junit.Assert.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.util.Set;


public class ProjectTest
{
    /**
     * Default constructor for test class ProjectTest
     */
    public ProjectTest() {

    }

    /**
     * Sets up the test fixture.
     *
     * Called before every test case method.
     */
    @Before
    public void setUp() {
    }

    /**
     * Tears down the test fixture.
     *
     * Called after every test case method.
     */
    @After
    public void tearDown() {
  
    }
  
    //TODO add new test cases from here include brief documentation
  
    /**
     * Test that the number of lines in the string output from NgramAnalyser.toString()
     * is sensible (i.e. at least the size of the alphabet of that analyser).
     * Also exercises the sort, splice and constructor behaviour needed
     * to produce the required comparison.
     */
    @Test(timeout=1000)
    public void testSensibleToStringSize() {
     
        String[] stringsToTest = {"Hello my friend",
        "be",
        "Have a nice day you filthy animal",
        "asdfghjkl$$sdfghj%%",
        "2",
        "adadadadaaaaa",
        "    "};

        Integer[] ngramSizesToTest = {1, 2, 3, 4, 5};

        NgramAnalyser analysis;
        String analysisString;

        int i = ngramSizesToTest[0];
        String s = stringsToTest[5];

        if (i > s.length()) {
            try {
                analysis = new NgramAnalyser(i, s);
            } catch (IllegalArgumentException e) {
                assertEquals(0, 0);
            }
        } else {
            analysis = new NgramAnalyser(i, s);
            analysisString = analysis.toString();

            //Number of lines is equal to the number of '\n' characters plus 1
            int numberofLines = analysisString.length() -
                                analysisString.replace("\n", "").length() + 1;

            assert(numberofLines >= analysis.getAlphabetSize());

        }
    }

    /**
     * Tests various aspects of the getDistinctNgrams function
     * including set length, ...
     */

    @Test(timeout=1000)
    public void testGetDistinctNgrams() {
         String[] stringsToTest = {
            "123!@#123!@#",
            "adadadadadadadad",
            "cadadcdaadcdbed",
            "aaaaaa",
            "HOWWEYVUFXBINEF"
         };

         String stringToTest = stringsToTest[0];
         int ngramSize = 2;
         NgramAnalyser analysis = new NgramAnalyser(ngramSize, stringToTest);

         Set<String> distinctNgrams = analysis.getDistinctNgrams();
         int distinctNgramCount = analysis.getDistinctNgramCount();
         int totalNgramCount = analysis.getNgramCount();

         //Test that there are fewer or equal distinct Ngrams than total Ngrams
         assert(distinctNgramCount <= totalNgramCount);

         //Test that the alphabet size is smaller than
         //or equal to the number of distinct NGrams
         assert(analysis.getAlphabetSize() <= distinctNgramCount);

    }
  
    @Test(timeout=1000)
    public void testLaplaceExample() {
        assertEquals(0,1); //TODO replace with test code
    }
  
    @Test(timeout=1000)
    public void testSimpleExample() {
        assertEquals(0,1); //TODO replace with test code
    }


    @Test
    public void testTask3example()
    {
        MarkovModel model = new MarkovModel(2,"aabcabaacaac");
        ModelMatcher match = new ModelMatcher(model,"aabbcaac");
        assertEquals(0,1); //TODO replace with test code
    }
}

Characters and contexts in test string "aabbcaac" (with wraparound):

    Context   Character   Frequency
    "aa"      'b'         1
    "ab"      'b'         1
    "bb"      'c'         1
    "bc"      'a'         1
    "ca"      'a'         2
    "aa"      'c'         1
    "ac"      'a'         1

Explanation / Answer

ProjectTest.java

import static org.junit.Assert.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.util.Set;


public class ProjectTest
{
    /**
     * Default constructor for test class ProjectTest
     */
    public ProjectTest() {

    }

    /**
     * Sets up the test fixture.
     *
     * Called before every test case method.
     */
    @Before
    public void setUp() {
    }

    /**
     * Tears down the test fixture.
     *
     * Called after every test case method.
     */
    @After
    public void tearDown() {
  
    }
  
    //TODO add new test cases from here include brief documentation
  
    /**
     * Test that the number of lines in the string output from NgramAnalyser.toString()
     * is sensible (i.e. at least the size of the alphabet of that analyser).
     * Also exercises the sort, splice and constructor behaviour needed
     * to produce the required comparison.
     */
    @Test(timeout=1000)
    public void testSensibleToStringSize() {
     
        String[] stringsToTest = {"Hello my friend",
        "be",
        "Have a nice day you filthy animal",
        "asdfghjkl$$sdfghj%%",
        "2",
        "adadadadaaaaa",
        "    "};

        Integer[] ngramSizesToTest = {1, 2, 3, 4, 5};

        NgramAnalyser analysis;
        String analysisString;

        int i = ngramSizesToTest[0];
        String s = stringsToTest[5];

        if (i > s.length()) {
            try {
                analysis = new NgramAnalyser(i, s);
            } catch (IllegalArgumentException e) {
                assertEquals(0, 0);
            }
        } else {
            analysis = new NgramAnalyser(i, s);
            analysisString = analysis.toString();

            //Number of lines is equal to the number of '\n' characters plus 1
            int numberOfLines = analysisString.length() -
                                analysisString.replace("\n", "").length() + 1;

            assertTrue(numberOfLines >= analysis.getAlphabetSize());

        }
    }

    /**
     * Tests various aspects of the getDistinctNgrams method,
     * including set length compared with some basic boundaries
     */

    @Test(timeout=1000)
    public void testGetDistinctNgrams() {
         String[] stringsToTest = {
            "123!@#123!@#",
            "adadadadadadadad",
            "cadadcdaadcdbed",
            "aaaaaa",
            "HOWWEYVUFXBINEF"
         };

         String stringToTest = stringsToTest[0];
         int ngramSize = 2;
         NgramAnalyser analysis = new NgramAnalyser(ngramSize, stringToTest);

         Set<String> distinctNgrams = analysis.getDistinctNgrams();
         int distinctNgramCount = analysis.getDistinctNgramCount();
         int totalNgramCount = analysis.getNgramCount();

         //Test that there are no more distinct ngrams than total ngrams
         assertTrue(distinctNgramCount <= totalNgramCount);

         //Test that there are no more distinct ngrams than the length
         //of the analysed string (which equals the total ngram count with wraparound)
         assertTrue(distinctNgramCount <= stringToTest.length());

         //Test that the alphabet size is smaller than
         //or equal to the number of distinct ngrams
         assertTrue(analysis.getAlphabetSize() <= distinctNgramCount);

    }

    /**
     * Tests the NgramAnalyser function for more complicated and longer ngrams
     *
     */
    @Test(timeout=1000)
    public void testNgramAnalyser() {

        String stringToTest = "baaaaaaaaaamsdbfajeduhgtraaaab";
        int ngramSize = 16;
        NgramAnalyser analysis = new NgramAnalyser(ngramSize, stringToTest);

        //Test toString method
        String toString = analysis.toString();
        //System.out.println(toString); //REMOVE BEFORE SUBMITTING!!!!!

        //Test that ngramCount equals the length of the string (ngrams wrap around)
        assertEquals(stringToTest.length(), analysis.getNgramCount());

    }
  
    @Test(timeout=1000)
    public void testLaplaceExample() {
        assertEquals(0,1); //TODO replace with test code
    }
  
    @Test(timeout=1000)
    public void testSimpleExample() {
        assertEquals(0,1); //TODO replace with test code
    }


    @Test
    public void testTask3example()
    {
        MarkovModel model = new MarkovModel(2,"aabcabaacaac");
        ModelMatcher match = new ModelMatcher(model,"aabbcaac");
        assertEquals(0,1); //TODO replace with test code
    }
}
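The placeholder assertions above still need real test code. One possible shape for testTask3example (a sketch only; it assumes the ModelMatcher implementation below reproduces the figures from the worked example) is:

    @Test
    public void testTask3example()
    {
        MarkovModel model = new MarkovModel(2, "aabcabaacaac");
        ModelMatcher match = new ModelMatcher(model, "aabbcaac");
        // Average log likelihood from the worked example; the tolerance
        // allows for rounding in the quoted figure of -0.3849.
        assertEquals(-0.3849, match.getAverageLogLikelihood(), 0.001);
    }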


MarkovModel.java


public class MarkovModel
{

   /** Markov model order parameter */
   int k;
   /** ngram model of order k */
   NgramAnalyser ngram;
   /** ngram model of order k+1 */
   NgramAnalyser n1gram;

   /**
   * Construct an order-k Markov model from string s
   * @param k int order of the Markov model
   * @param s String input to be modelled
   */
   public MarkovModel(int k, String s)
   {
       this.k = k;   // record the model order so that getK() works
       ngram = new NgramAnalyser(k, s);
       n1gram = new NgramAnalyser((k+1), s);
   }

   /**
   * @return order of this Markov model
   */
   public int getK()
   {
       return k;
   }

   /** Estimate the probability of a sequence appearing in the text
   * using simple estimate of freq seq / frequency front(seq).
   * @param sequence String of length k+1
   * @return double probability of the last letter occurring in the
   * context of the first ones or 0 if front(seq) does not occur.
   */
   public double simpleEstimate(String sequence) {
       String seqNotLast = sequence.substring(0, sequence.length()-1);

       if (ngram.getDistinctNgrams().contains(seqNotLast))
       {
           double n1g = n1gram.getNgramFrequency(sequence);
           double ng = ngram.getNgramFrequency(seqNotLast);
           // front(seq) is known to occur, so ng >= 1 and the division is safe
           return n1g / ng;
       }
       else
       {
           return 0.0;
       }
   }
   /**
   * Calculate the Laplace-smoothed probability of a sequence given this Markov model
   * @param sequence String of length k+1
   * @return the Laplace-smoothed probability estimate
   */
   public double laplaceEstimate(String sequence)
   {
       String context = sequence.substring(0, sequence.length()-1);
       double npc = n1gram.getNgramFrequency(sequence);  // count of the full (k+1)-gram
       double np = ngram.getNgramFrequency(context);     // count of its k-length context
       // Laplace smoothing: (count(sequence) + 1) / (count(context) + alphabet size)
       return (npc + 1) / (np + ngram.getAlphabetSize());
   }

   /**
   * @return String representing this Markov model
   */
   public String toString()
   {
       String toRet = "";
       toRet += (Integer.toString(getK()) + " ");
       toRet += (Integer.toString(ngram.getAlphabetSize()) + " ");
       toRet += ngram.toString() + n1gram.toString();
       return toRet;
   }

}
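As a quick sanity check of laplaceEstimate (a throwaway sketch; it assumes the Task 1 NgramAnalyser counts ngrams with wraparound, as in the earlier examples):

public class LaplaceCheck
{
    public static void main(String[] args)
    {
        MarkovModel model = new MarkovModel(2, "aabcabaacaac");
        // With wraparound, the context "aa" occurs 3 times, the trigram "aab"
        // occurs once, and the alphabet {a, b, c} has size 3, so the estimate is
        // (1 + 1) / (3 + 3) = 0.333...
        System.out.println(model.laplaceEstimate("aab"));
    }
}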

ModelMatcher.java

import java.util.ArrayList;
import java.util.HashMap;

public class ModelMatcher
{

   /** log likelihoods for a teststring under a given model */
   private HashMap<String,Double> logLikelihoodMap;
   /** summary statistic for this setting */
   private double averageLogLikelihood;

   /**
   * Constructor to initialise the fields for the log likelihood map for
   * a test string and a given Markov model and
   * the average log likelihood summary statistic
   * @param model a given Markov model object
   * @param testString the string to check for compatibility with the model
   */
   public ModelMatcher(MarkovModel model, String testString)
   {
       logLikelihoodMap = new HashMap<String, Double>();
       int kVal = model.getK();
       String seq;
       // Visit every position in the test string; with wraparound there are
       // exactly testString.length() ngrams of length k+1 to consider.
       for (int curPos = 0; curPos < testString.length(); curPos++)
       {
           if (curPos + (kVal + 1) <= testString.length())
           {
               seq = testString.substring(curPos, (curPos + kVal + 1));
           }
           else
           {
               // Wrap around: take the tail of the string, then continue from the start
               String fromEnd = testString.substring(curPos, (testString.length()));
               String fromStart = testString.substring(0, ((kVal + 1) - fromEnd.length()));
               seq = fromEnd + fromStart;
           }

           // seq has length k+1: a k-length context followed by one character
           double loggedProb = Math.log10(model.laplaceEstimate(seq));
           if (logLikelihoodMap.containsKey(seq))
           {
               // The same ngram may occur several times; accumulate its contribution
               logLikelihoodMap.put(seq, (logLikelihoodMap.get(seq) + loggedProb));
           }
           else
           {
               logLikelihoodMap.put(seq, loggedProb);
           }
       }
       averageLogLikelihood = averageLogLikelihood(logLikelihoodMap, testString.length());
   }

   /** Helper method that calculates the average log likelihood statistic
   * given a HashMap of strings and their Laplace probabilities
   * and the total number of ngrams in the model.
   *
   * @param logs map of ngram strings and their log likelihood
   * @param ngramCount int number of ngrams in the original test string
   * @return average log likelihood: the total of loglikelihoods
   *    divided by the ngramCount
   */
   private double averageLogLikelihood(HashMap<String,Double> logs, int ngramCount)
   {
       // Average = total of the log likelihoods divided by the number of ngrams
       return totalLogLikelihood(logs) / ngramCount;
   }

   /** Helper method to calculate the total log likelihood statistic
   * given a HashMap of strings and their Laplace probabilities
   * and the total number of ngrams in the model.
   *
   * @param logs map of ngram strings and their log likelihood
   * @return total log likelihood: the sum of loglikelihoods in logs
   */
   private double totalLogLikelihood(HashMap<String,Double> logs)
   {
       double totalLogs = 0.0;
       for (String i : logs.keySet())
       {
           totalLogs += logs.get(i);
       }
       return totalLogs;
   }


   /**
   * @return the average log likelihood statistic
   */
   public double getAverageLogLikelihood()
   {
       return averageLogLikelihood;
   }

   /**
   * @param ngram String the ngram to find the log likelihood of.
   * @return the log likelihood value for a given ngram from the input string
   */
   public double getLogLikelihood(String ngram)
   {
       return (logLikelihoodMap.get(ngram));
   }


   /**
   * Make a String summarising the log likelihood map and its statistics
   * @return the average log likelihood followed by a table of ngrams and
   *         their log likelihoods, ordered from highest to lowest likelihood
   */
   public String toString()
   {
       // Order the ngrams from highest to lowest log likelihood, as required
       ArrayList<String> keys = new ArrayList<String>(logLikelihoodMap.keySet());
       keys.sort((a, b) -> logLikelihoodMap.get(b).compareTo(logLikelihoodMap.get(a)));

       StringBuilder toRet = new StringBuilder();
       toRet.append("Average log likelihood: " + averageLogLikelihood + "\n");
       for (String key : keys)
       {
           toRet.append(key + "     " + logLikelihoodMap.get(key) + "\n");
       }
       return toRet.toString();
   }


}
