Question
READ THE INSTRUCTIONS CAREFULLY BEFORE STARTING
There are many real-world applications that require efficient storage and querying of spatial data (points, lines, rectangles, etc.). In this assignment you'll implement a data structure that stores 2D points in a way that supports efficient "what's nearby" queries. For example, if the points represent restaurants, this data structure could quickly answer the query "which restaurants are within walking distance?"
The data structure is a binary tree that is similar to the binary search tree we discussed in class. The reason we can't just use a binary search tree is that there are many ways to compare two points. Should we order them by x? order them by y? order them by some mix?
Our data structure will have two types of nodes that differ in how they compare points (implemented as a single node class with a boolean variable indicating which type it is). x-nodes compare points by their x coordinate, while y-nodes compare points by their y coordinate. An x-node's children must be y-nodes, and a y-node's children must be x-nodes.
With these types of nodes, we can define the modified binary-search-property for our data structure:
For every x-node, all descendants in the left subtree are to the left (geometrically) of the point stored at the node, and all descendants in the right subtree are to the right of the point stored at the node.
For every y-node, all descendants in the left subtree have a smaller y coordinate than the point stored at the node (in the Swing coordinate system they are actually "above" visually), and all descendants in the right subtree have a greater y coordinate than the point stored at the node.
This property allows us to efficiently perform queries asking "which points are within a given radius of a target."
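The property above can be checked mechanically, which may help when debugging a tree. The following is a minimal sketch, not the required design: KdNode, PropertyCheck, and isValid are hypothetical names, and sending points with an equal coordinate to the right subtree is an assumption, since the assignment does not say how ties are handled.

```java
import java.awt.Point;

// Hypothetical node type for illustration; these names are not from the assignment.
class KdNode {
    final Point point;
    final boolean isXNode;   // true: compares by x, false: compares by y
    KdNode left, right;

    KdNode(Point point, boolean isXNode) {
        this.point = point;
        this.isXNode = isXNode;
    }
}

class PropertyCheck {
    // Coordinate that a node of the given type compares on.
    static int key(Point p, boolean byX) {
        return byX ? p.x : p.y;
    }

    // True if node types alternate and every descendant lies on the correct
    // side of each ancestor's split.
    static boolean isValid(KdNode n) {
        if (n == null) return true;
        boolean typesAlternate =
                (n.left == null || n.left.isXNode != n.isXNode)
             && (n.right == null || n.right.isXNode != n.isXNode);
        return typesAlternate
            && allOnSide(n.left, n, true)
            && allOnSide(n.right, n, false)
            && isValid(n.left)
            && isValid(n.right);
    }

    // Checks every point in the subtree 'sub' against the split at 'ancestor'.
    static boolean allOnSide(KdNode sub, KdNode ancestor, boolean leftSide) {
        if (sub == null) return true;
        int k = key(sub.point, ancestor.isXNode);
        int split = key(ancestor.point, ancestor.isXNode);
        boolean ok = leftSide ? k < split : k >= split;  // assumption: ties go right
        return ok
            && allOnSide(sub.left, ancestor, leftSide)
            && allOnSide(sub.right, ancestor, leftSide);
    }
}
```

This checker revisits descendants once per ancestor, so it is meant for small test trees rather than for use inside the query itself.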
You need to implement the following:
A SpatialTree class (which will have an inner class for storing the nodes of the tree). It should contain the following:
A constructor that takes an ArrayList of Point objects (use the Point class that Swing uses, java.awt.Point). It should construct a tree from the points using the algorithm described below.
A draw method. This should draw all the points stored in the tree, as well as the split lines stored at each node. Each node divides the points stored in its descendants into two groups; draw this dividing line. Hint: when drawing a node, it will be helpful to know the rectangular region that the node and its descendants occupy. This region will basically be one half of its parent's region (and depends on where its parent is).
A query method that takes a query point and a radius. It should return an ArrayList containing all points within the radius of the query point. This method is the purpose of creating this data structure. You should only explore nodes that might contain points near the query point. Draw a picture to determine how you can tell whether or not a node needs to be explored.
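One way to decide whether a node needs to be explored is to compare the query circle with the node's split line: a child subtree can only hold matches if the circle reaches its side of the line. The following is a sketch under assumptions, not the required solution: QNode, RadiusQuery, and the visited counter are hypothetical names, the nodes are linked by hand for brevity, and points with an equal coordinate are assumed to live in the right subtree.

```java
import java.awt.Point;
import java.util.ArrayList;

// Hypothetical node type; in the assignment this would be SpatialTree's inner class.
class QNode {
    final Point point;
    final boolean isXNode;   // true: splits on x, false: splits on y
    QNode left, right;

    QNode(Point point, boolean isXNode) {
        this.point = point;
        this.isXNode = isXNode;
    }
}

class RadiusQuery {
    int visited;  // number of nodes visited by the most recent query

    ArrayList<Point> query(QNode root, Point target, double radius) {
        visited = 0;
        ArrayList<Point> found = new ArrayList<>();
        search(root, target, radius, found);
        return found;
    }

    private void search(QNode n, Point target, double radius, ArrayList<Point> found) {
        if (n == null) return;
        visited++;
        if (n.point.distance(target) <= radius) found.add(n.point);

        // Signed distance from the target to this node's split line.
        int diff = n.isXNode ? target.x - n.point.x : target.y - n.point.y;

        // The circle covers coordinates from (target - radius) to (target + radius).
        // The left subtree holds coordinates below the split, the right subtree
        // holds coordinates at or above it (assumption: ties go right).
        if (diff - radius < 0) search(n.left, target, radius, found);
        if (diff + radius >= 0) search(n.right, target, radius, found);
    }
}
```

When the circle straddles the split line both conditions hold and both children are explored; otherwise one whole subtree is skipped, which is where the savings come from.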
We'll be using Swing to draw our tree and the queried points.
You'll need to create a Canvas class that inherits from JPanel and:
draws your tree using the draw method described above
calls the query method based on mouse input. Specifically, when you click, drag, and release, the program should query your tree for points that are near the click location. The radius of the query should be the distance between the mouse down and mouse up points (Point has a distance() method that will be useful here). After the query is completed, draw the selected points in a different color (and probably larger) to visualize the results.
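The click, drag, and release behaviour can be captured with a mouse listener that remembers where the press happened and measures the distance to the release point. This is only a sketch: DragTracker and its begin and end helpers are hypothetical names, and how the listener is wired into the Canvas class is left open.

```java
import java.awt.Point;
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;

// Hypothetical helper: records the query center and radius from a mouse drag.
class DragTracker extends MouseAdapter {
    Point center;    // mouse-down location, used as the query point
    double radius;   // distance from mouse-down to mouse-up

    @Override
    public void mousePressed(MouseEvent e) {
        begin(e.getPoint());
    }

    @Override
    public void mouseReleased(MouseEvent e) {
        end(e.getPoint());
    }

    // Separated from the event handlers so the logic can be tested without a window.
    void begin(Point p) {
        center = p;
        radius = 0;
    }

    void end(Point p) {
        if (center != null) radius = center.distance(p);
    }
}
```

A JPanel subclass could register such a listener with addMouseListener, then run the query with center and radius after the release and call repaint() so the selected points are redrawn.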
Constructing your tree
The tree is constructed top-down in a recursive manner. To create a node, you must pass an ArrayList of points. Store the first element of the ArrayList in the node, then partition the remaining points into two ArrayLists that satisfy the property described above. Recursively create the child nodes from the two ArrayLists you just created. Note that this means the Node constructor is a recursive method that builds the SpatialTree from the ArrayList of points. To construct the tree, just construct the root and pass it the ArrayList of points; when the constructor returns, the entire tree is built.
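The recursive construction described above could look roughly like the following. This is a sketch under stated assumptions rather than the expected solution: BuildNode and height are hypothetical names, the root is assumed to be an x-node, and points whose coordinate equals the split value are assumed to go to the right list, since the assignment leaves both choices open.

```java
import java.awt.Point;
import java.util.ArrayList;

// Hypothetical node class whose constructor builds the whole subtree.
class BuildNode {
    final Point point;
    final boolean isXNode;
    final BuildNode left, right;

    // Precondition: points is non-empty.
    BuildNode(ArrayList<Point> points, boolean isXNode) {
        this.isXNode = isXNode;
        this.point = points.get(0);   // the first element is stored at this node

        ArrayList<Point> smaller = new ArrayList<>();
        ArrayList<Point> larger = new ArrayList<>();
        int split = isXNode ? point.x : point.y;
        for (int i = 1; i < points.size(); i++) {
            Point p = points.get(i);
            int key = isXNode ? p.x : p.y;
            if (key < split) smaller.add(p);
            else larger.add(p);       // assumption: ties go right
        }

        // Children use the opposite comparison; an empty list means no child.
        left = smaller.isEmpty() ? null : new BuildNode(smaller, !isXNode);
        right = larger.isEmpty() ? null : new BuildNode(larger, !isXNode);
    }

    // Number of nodes on the longest path from n down to a leaf.
    static int height(BuildNode n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }
}
```

With this sketch, the input (0,0), (1,1), (2,2), ... sends every remaining point to the right list at each step, so the tree degenerates into a chain whose height equals the number of points.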
Test cases
Test your code with the following inputs:
Random points in the window
The points (0,0), (1,1), (2,2), ... in order
A random set of points that you sort by x-coordinate. You'll need to use the ArrayList sort method, which takes a Comparator object saying how to compare points. Your Comparator object should compare the x coordinates.
For each test input, try performing similar queries (draw circles that are approximately the same size), and print how many nodes are visited during the query (you'll probably want to add another private member variable to help keep track of this).
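The three test inputs listed above could be generated along these lines. This is a sketch only: TestInputs and its method names are hypothetical, and the fixed random seed is an assumption added so that runs are repeatable.

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Random;

// Hypothetical helper that produces the three kinds of test input.
class TestInputs {
    // n random points with 0 <= x < width and 0 <= y < height.
    static ArrayList<Point> randomPoints(int n, int width, int height, long seed) {
        Random rng = new Random(seed);
        ArrayList<Point> pts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pts.add(new Point(rng.nextInt(width), rng.nextInt(height)));
        }
        return pts;
    }

    // The points (0,0), (1,1), (2,2), ... in order.
    static ArrayList<Point> diagonal(int n) {
        ArrayList<Point> pts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            pts.add(new Point(i, i));
        }
        return pts;
    }

    // Random points sorted by x coordinate using a Comparator.
    static ArrayList<Point> sortedByX(int n, int width, int height, long seed) {
        ArrayList<Point> pts = randomPoints(n, width, height, seed);
        pts.sort(Comparator.comparingInt((Point p) -> p.x));
        return pts;
    }
}
```

Each list can then be passed to the tree constructor, followed by queries of similar size so the visited-node counts are comparable across the three inputs.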
output:
Explanation / Answer
1.
Summary of the article:
Big Data extends across domains such as the biological, physical, and biomedical sciences, since it is concerned with large-volume, complex, growing data sets from multiple autonomous sources. The article develops the HACE Theorem to address these concerns and describes how a Big Data processing framework provides data accessing, data privacy, and domain knowledge in three tiers (Tier I, Tier II, and Tier III). The tiers are explained with many examples, and the HACE Theorem finally suggests the key characteristics of Big Data: heterogeneous and diverse data sources, autonomous sources with distributed and decentralized control, and complex, evolving data. With these characteristics, Big Data can provide accurate results in real time across engineering and the sciences.
2.
5 lessons learnt from reading this article:
i. Big Data processing requires information that comes from multiple, heterogeneous, autonomous sources with complex and evolving relationships, and such information keeps growing.
ii. Online discussions are being used as a new means to sense public interest and generate feedback in real time, and they are more appealing than generic media such as radio or TV broadcasting.
iii. There are unprecedented data volumes, which require an effective data analysis and prediction platform to achieve fast-response and real-time classification for such Big Data.
iv. Big Data is diverse and heterogeneous in nature because different information collectors use their own schemata for data recording, and the nature of different applications also results in diverse representations of the data.
v. A main characteristic of Big Data applications is autonomous data sources with distributed and decentralized control.
3.
The concerns of Big Data are:
i. Large volume
ii. Complexity
iii. Growing data sets with multiple, autonomous sources
iv. Sensitive information
v. Privacy
4.
This article was written to address the above concerns.
5.
Big Data characteristics using the HACE Theorem are:
i. Large volume
ii. Heterogeneous data
iii. Autonomous sources with distributed and decentralized control
iv. Complex and evolving relationships among data, which Big Data seeks to explore
6.
Fundamental Challenges in Big Data Mining:
7.
Blind Men and Giant Elephant:
Blind men need to draw a picture of a giant elephant. A blind man cannot see the giant elephant, but he tries to draw the picture, to a limited extent, based on the information collected from the others.
The points below are challenging for the blind men:
Therefore, the blind men constantly collect information from one another and extract the knowledge of interest to draw the picture of the giant elephant.
Big Data mining is a similar process of collecting and exploring large volumes of data and extracting the useful information.