I need help Implementing assignment using c++. Assignement: Purpose To explore a
ID: 3918263 • Letter: I
Question
I need help Implementing assignment using c++.
Assignement:
Purpose
To explore a meaningful use of complex binary trees
To implement a real algorithm still in use today
Background and Instructions
Lossless data compression remains a key interest in computer science. The goal is to represent data using fewer bits than the human-readable data normally requires, as compressed data is more easily stored and transmitted. Decompression can later restore the data to its original state.
It is almost a guarantee that you have used the ZIP archive file format at some point in your life; it is natively supported in Microsoft Windows, Mac OS X, and most free operating systems. It's quite possible you have used it earlier today. Although the file format accepts many compression algorithms, the most common is DEFLATE. This algorithm uses the LZ77 algorithm and Huffman coding to compress data. Huffman coding creates a prefix code for each character based on the frequency of the character, with the most used characters receiving the shortest codes.
For this assignment, you will implement Huffman coding. You can learn more about the process here, but be warned that the algorithm has some trivial ambiguities that are resolved below.
Huffman Coding Algorithm
The algorithm begins by accepting an input string and counting the number of times each character is used. Each character-count pair is stored in a leaf node.
Add these leaf nodes to a priority queue, giving lower counts higher priorities and removing the highest priority node first.
If two leaf nodes have equal count, the alphabetically smaller character has lower priority.
If a leaf node and a compound node (see below) have equal count, the leaf node has lower priority.
If two compound nodes have equal count, the alphabetically smaller compound node has lower priority.
Remove the first two nodes from the priority queue, combine them into a compound node, and add the new node back into the priority queue. Repeat this process until only one node is left.
A compound node has the following properties:
Its right child is the first node removed from the priority queue
Its left child is the second node removed from the priority queue
Its count is equal to the sum of the counts of its two children
Its "character" is a concatenation of its left child's character and its right child's character (compound nodes store a string representing the characters its descendent leaves store)
The single remaining compound node in the priority queue is the root of the Huffman tree and is ready for encoding and decoding messages.
To encode a message, encode each character individually and concatenate the results together. For each character, follow the path from the root of the Huffman tree to the leaf node storing that character; record each "left turn" as a 0 and each "right turn" as a 1. The character's encoding is the resulting bitstream.
If the message you are asked to encode contains a character not found in your tree, return an empty string.
If asked to encode the empty string, return an empty string.
To decode a message, begin at the root of the Huffman tree. For each 0 or 1 in the encoded message, move down the tree left or right, respectively. When a leaf node is reached, record the character it is storing and return to the root of the tree. Continue this process until you reach the end of the encoded message; the last 0 or 1 in the encoded message should take you to the leaf node storing the last character in the decoded message.
If the message you are asked to decode ends at a compound node instead of a leaf node, return an empty string.
If asked to decode the empty string or any message containing characters other than 0 and 1, return an empty string.
Requirement Notes
As stated above, there are a number of trivial ambiguities in Huffman coding, but in order to facilitate automated testing, you'll need to adhere to the algorithm outlined above.
You must make your own priority queue using a maximum heap. This can be tree-based or array/vector-based.
"Alphabetically smaller" for this assignment means using the < operator on strings.
As an example, the input string "Secret" produces the following encodings:
S = 000
c = 001
e = 1
r = 010
t = 011
IMPORTANT NOTES FROM PROFESSOR:
- use array
- Must build own priority queue. STL queue not allowed.
- You can use a map class to burn a tree (whenever you fine a lead node delete.)
- SPACE complexity is what we are worried about. The program will not be graded on run time efficiency.
- Manage memory correctly. Do not have memory leaks!
- Capital letters have highest priority of all
- no main needed - only use for your testing purposes.
// steps for program
step 1: Priority Queue, burn when tree is built. Step 2: tree builds code to map, burn tree when finished. Step 3: map
functions given by professor that can't be changed:
Coder(string sample_text);
string encode(string message);
string decode(string encoded_message);
coder.h
#pragma once
#include <iostream>
using namespace std;
class Coder
{
public:
/**
* Constructor; uses the provided sample text as an input string to create
* a Huffman tree, which is used for encoding and decoding messages. You may
* assume the given sample text will contain at least two distinct characters.
*
* See the lab specs for full details.
*/
Coder(string sample_text);
/**
* Encodes the given string based on the Huffman tree created from the sample text
* (provided in the constructor of this class).
*
* See the lab specs for some details on this method's operations.
*/
string encode(string message);
/**
* Decodes the given string based on the Huffman tree created from the sample text
* (provided in the constructor of this class).
*
* See the lab specs for some details on this method's operations.
*/
string decode(string encoded_message);
};
Explanation / Answer
//Node.h
#pragma once
#include<iostream>
class Node {
public:
char data;
int freq;
Node *left, *right;
};
//Heap.h
#pragma once
#include "Node.h"
class Heap {
public:
int curSize;
int capacity;
Node** ar;
Node* newNode(char data, int freq);
void addNode(Heap* PQ, Node* n);
void swapNode(Node** n1, Node** n2);
void heapify(Heap* PQ, int idx);
Node* extractMin(Heap* PQ);
Heap* initPQ(int capacity);
void buildPQ(Heap* PQ);
Heap* createPQ(char data[], int freq[], int x);
Node* makeTree(char data[], int freq[], int sz);
void getCode(Node* root, int arr[], int top);
void printTree(Node* n);
};
//Heap.cpp
#include "Heap.h"
#include <iostream>
Node* Heap::newNode(char ch, int f) {
Node* temp = (Node*)malloc(sizeof(Node));
temp->left = temp->right = NULL;
temp->data = ch;
temp->freq = f;
return temp;
};
void Heap::addNode(Heap* PQ, Node* n) {
PQ->curSize = PQ->curSize + 1;
int i = PQ->curSize - 1;
while (i && n->freq < PQ->ar[(i - 1) / 2]->freq) {
PQ->ar[i] = PQ->ar[(i - 1) / 2];
i = (i - 1) / 2;
}
PQ->ar[i] = n;
}
void Heap::swapNode(Node** n1, Node** n2) {
Node* temp = *n1;
*n1 = *n2;
*n2 = temp;
};
void Heap::heapify(Heap* PQ, int idx) {
int smallest = idx;
int left = 2 * idx + 1;
int right = 2 * idx + 2;
if ((left < PQ->curSize) && (PQ->ar[left]->freq < PQ->ar[smallest]->freq))
smallest = left;
if ((right < PQ->curSize) && (PQ->ar[right]->freq < PQ->ar[smallest]->freq))
smallest = right;
if (smallest != idx) {
swapNode(&PQ->ar[smallest], &PQ->ar[idx]);
heapify(PQ, smallest);
}
};
Node* Heap::extractMin(Heap* PQ) {
Node* temp = PQ->ar[0];
PQ->ar[0] = PQ->ar[PQ->curSize - 1];
PQ->curSize = PQ->curSize-1;
heapify(PQ, 0);
return temp;
};
Heap* Heap::initPQ(int c) {
Heap* PQ = (Heap*)malloc(sizeof(Heap));
PQ->curSize = 0;
PQ->capacity = c;
PQ->ar = (Node**)malloc(PQ->capacity * sizeof(Node*));
return PQ;
};
void Heap::buildPQ(Heap* PQ) {
int n = PQ->curSize - 1;
for (int i = (n - 1) / 2; i >= 0; --i)
heapify(PQ, i);
};
Heap* Heap::createPQ(char data[], int freq[], int sz) {
Heap* PQ = initPQ(sz);
for (int i = 0; i < sz; ++i)
PQ->ar[i] = newNode(data[i], freq[i]);
PQ->curSize = sz;
buildPQ(PQ);
return PQ;
};
Node* Heap::makeTree(char data[], int freq[], int sz) {
Node *left, *right, *top;
Heap* PQ = createPQ(data, freq, sz);
while (PQ->curSize != 1) {
left = extractMin(PQ);
right = extractMin(PQ);
top = newNode('$', left->freq + right->freq);
top->left = left;
top->right = right;
addNode(PQ, top);
}
return extractMin(PQ);
};
void Heap::getCode(Node* root, int arr[], int top) {
//0 for left edge
if (root->left) {
arr[top] = 0;
code(root->left, arr, top + 1);
}
//1 for right edge
if (root->right) {
arr[top] = 1;
code(root->right, arr, top + 1);
}
//leaf
if (!(root->left) && !(root->right)) {
printf("%c: ", root->data);
for (int i = 0; i < top; ++i)
printf("%d", arr[i]);
printf(" ");
}
};
void Heap::printTree(Node* n) {
int arr[1000];
getCode(n, arr, 0);
};
//Coder.cpp
#include "Coder.h";
#include "Heap.h";
#include<string>
Coder::Coder(string s) {
char charar[52];
int freqar[52];
for(int i = 0; i < 26; i++)
charar[i] = (char)(65+i);
for(int i = 26; i < 52; i++)
charar[i] = (char)(97+i);
for(int i = 0; i < 52; i++)
freqar[i] = 0;
for(int i = 0; i < s.length(); i++) {
if(s[i] >= 65 && s[i] <= 90)
freqar[s[i]-65]++;
else if(s[i] >= 97)
freqar[26+s[i]-97]++;
}
Heap pq;
pq.printTree(charar, freqar, 52);
};
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.