Question
Hi can anyone give me some advice for my Data Structures final? This is the synopsis:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Of course, one of the most important things computers do is find things very quickly. All of the topics (lists, sorting, graphs, hashes, etc.) are designed to make finding information very quick. This final assignment is NOT a programming assignment. Each class you take should have at least one writing assignment (and this is the writing assignment for this class). You will be doing some web research, and a great place to start is the Topic page: https://cop3530.pbworks.com/w/page/96946899/Topic%20-%20Search%20techniques
You will be doing a bit of research for a web company. They want to be able to quickly find which users of an app have visited a geographical location within a certain period of time. The app has the user information and periodically polls and stores the user's location using the phone's GPS. The app collects and stores millions of pieces of information an hour, so search efficiency is extremely important.
The company wants you to design a storage scheme for the data that will allow you to quickly and easily determine if a user has visited a specific store or location, how often, and for how long. You must determine how you will store and retrieve the data, and what types of data structures and organizational schemes you will use for the data.
You will write a one-page paper covering the "plan" for the data storage and search techniques that you will use to ensure incredibly fast retrieval of the information for the proposed application. Please save your paper in PDF format before you submit. (Be sure to back up your plan with web links and resources, so you will want to do some research. Also, you will have to develop an algorithm to effectively determine if someone has actually visited a location, so you will have to think about that too.)
----------------------------------------------------------------------------------------------------------------------------------------------------------------
I'm honestly not sure how best to approach this scenario. What would be the best way to store and retrieve the geo-locational app data? Is there a specific algorithm that works best, and how can it be implemented? Thank you in advance!
Explanation / Answer
>In the emerging big data ecosystem, storage providers offer the infrastructure on which all analytic tools run, and by far the most common system for storing and batch processing enterprise big data is HDFS, the Hadoop Distributed File System.
>“Hadoop is a unifying element for people using big data because it is a standard to store and retrieve large data sets. It is like a big data parking lot,” said Abe Usher, chief innovation officer of the HumanGEO Group, where he works with defense and intelligence agencies.
>Hadoop is an open-source framework that breaks up large data sets and distributes the processing work across a cluster of servers. Once the data is loaded into the cluster, a user queries the data with the MapReduce framework, which "maps" the query to the proper node where it is processed, then "reduces" the results from the queries on the distributed machines to one answer. Commercial versions of Hadoop from companies such as Cloudera, Hortonworks and IBM are available.
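To make the "map" and "reduce" steps above concrete, here is a minimal sketch of the idea in plain Python (no Hadoop cluster needed). The record format (user_id, location_id, timestamp) is an assumption for illustration, not something the assignment specifies:

```python
# Minimal sketch of the MapReduce idea in plain Python.
# Assumed record format: (user_id, location_id, timestamp) -- hypothetical.
from collections import defaultdict

records = [
    ("alice", "store_42", "2023-05-01T10:02:00"),
    ("alice", "store_42", "2023-05-01T10:07:00"),
    ("bob",   "store_42", "2023-05-01T11:15:00"),
]

def map_phase(record):
    """Map: emit a (key, value) pair keyed by (user, location)."""
    user, loc, _ts = record
    return ((user, loc), 1)

def reduce_phase(pairs):
    """Reduce: sum the emitted counts for each (user, location) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

visit_counts = reduce_phase(map_phase(r) for r in records)
print(visit_counts)  # {('alice', 'store_42'): 2, ('bob', 'store_42'): 1}
```

On a real cluster the map calls run in parallel on the nodes that hold the data and the framework shuffles the keyed pairs to the reducers, but the logic is just this.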
>Data integration and retrieval
>Traditional relational databases weren’t designed to cope with the variety, velocity and volume of unstructured data coming from audio devices, machine-to-machine communications, cell phones, sensors, social media platforms and video. Instead, NoSQL databases are built to write data much faster than an RDBMS and deliver fast query speeds across large volumes. They are distributed tools that manage unstructured and semi-structured data that requires frequent access. Some examples include:
>MongoDB leverages in-memory computing and is built for scalability, performance and high availability, scaling from single-server deployments to large, complex multisite architectures (see the query sketch after this list).
>Apache Cassandra handles big data workloads across multiple data centers with no single point of failure, providing enterprises with high database performance and availability.
>Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's BigTable data storage system.
>Other NoSQL systems were built with government security sensitivities in mind. MarkLogic's enterprise-grade platform can integrate diverse data from legacy databases, open-source technologies and Web information sources. The government-grade security NoSQL database has been used for fraud detection, risk analysis and vendor and bid management.
>And in 2008, the National Security Agency created Accumulo and contributed it to the Apache Foundation as an incubator project in September 2011. Because it includes cell-level security, the tool can restrict users’ access to only particular fields of the database. This enables data of various security levels to be stored within the same row, and users of varying degrees of access to query the same table, while preserving data confidentiality. According to the NSA, hundreds of developers are currently using Accumulo.
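Since MongoDB ships with geospatial indexing, here is a hedged sketch of how the app's GPS pings might be stored and queried with pymongo. The database, collection, and field names are assumptions for illustration, and it assumes a MongoDB instance is running locally:

```python
# Sketch: storing GPS pings in MongoDB and querying with a geospatial index.
# Database/collection/field names are assumptions, not from the assignment.
from datetime import datetime
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("mongodb://localhost:27017")
pings = client["geo_app"]["pings"]

# A 2dsphere index makes "who was near this point?" queries fast;
# the compound index supports per-user, per-time-window lookups.
pings.create_index([("loc", GEOSPHERE)])
pings.create_index([("user_id", 1), ("ts", 1)])

pings.insert_one({
    "user_id": "alice",
    "ts": datetime(2023, 5, 1, 10, 2),
    "loc": {"type": "Point", "coordinates": [-82.3248, 29.6516]},  # [lng, lat]
})

# Find all pings within ~100 m of a store during one day.
EARTH_RADIUS_M = 6378100  # $centerSphere takes its radius in radians
for doc in pings.find({
    "loc": {"$geoWithin": {"$centerSphere": [[-82.3248, 29.6516],
                                             100 / EARTH_RADIUS_M]}},
    "ts": {"$gte": datetime(2023, 5, 1), "$lt": datetime(2023, 5, 2)},
}):
    print(doc["user_id"], doc["ts"])
```

Whichever store you pick, the same point applies: a spatial index (2dsphere here, or a quadtree/geohash scheme elsewhere) is what keeps retrieval fast at millions of pings per hour.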
>Extraction, transformation and loading tools
>Extraction, Transformation and Loading (ETL) processes are critical components for migrating data from one database to another or for feeding a data warehouse or business intelligence system. An ETL tool retrieves data from all operational systems and prepares it for further analysis by reformatting, cleaning, mapping and standardizing it. As ETL tools mature, they increasingly support integration with Hadoop.
>Talend provides traditional ETL capabilities but also simplifies big data integration. The company's Open Studio for Big Data offers a unified open-source environment that simplifies the loading, extraction, transformation and processing of large and diverse data sets. Pentaho's enterprise Kettle ETL engine – called Pentaho Data Integration – consists of a core data integration engine and GUI applications that allow the user to define data integration jobs and transformations.
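To make the Extract-Transform-Load pattern concrete, here is a toy sketch in plain Python. The input file name, column names, and cleaning rules are assumptions for illustration; a real pipeline would use a tool like Talend or Pentaho as described above:

```python
# Toy Extract-Transform-Load pipeline. Input file, schema, and cleaning
# rules are illustrative assumptions, not part of the original answer.
import csv
import sqlite3
from datetime import datetime, timezone

def extract(path):
    """Extract: read raw rows from an operational CSV export."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Transform: drop malformed rows, standardize timestamps to UTC."""
    for row in rows:
        try:
            ts = datetime.fromtimestamp(float(row["epoch"]), tz=timezone.utc)
            yield (row["user_id"], ts.isoformat(),
                   float(row["lat"]), float(row["lng"]))
        except (KeyError, ValueError):
            continue  # skip bad records instead of loading them

def load(rows, db_path="warehouse.db"):
    """Load: write cleaned rows into the analytics store."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS pings "
                "(user_id TEXT, ts TEXT, lat REAL, lng REAL)")
    con.executemany("INSERT INTO pings VALUES (?, ?, ?, ?)", rows)
    con.commit()
    con.close()

load(transform(extract("raw_pings.csv")))
```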
>Universal information access (UIA) is another emerging area of big data that combines elements of database and search technologies, giving users a single point of access to all data, regardless of source, format or location. UIA offers the reporting and visualization features commonly found in business intelligence applications.
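One more piece the assignment asks for is an algorithm to decide whether someone actually visited a location, rather than just passing by. A common approach, sketched here under my own assumed thresholds, is a distance cutoff plus a minimum dwell time over consecutive GPS pings:

```python
# Sketch of a visit-detection rule: a user "visited" a location if their
# consecutive pings stay within RADIUS_M of it for at least MIN_DWELL_S.
# Both thresholds are illustrative assumptions.
from math import radians, sin, cos, asin, sqrt

RADIUS_M = 75       # how close counts as "at" the location (GPS error margin)
MIN_DWELL_S = 300   # stay at least 5 minutes to count as a visit

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def visits(pings, loc):
    """pings: [(epoch_seconds, lat, lng)] sorted by time; loc: (lat, lng).
    Yields (start, end) spans where the user dwelled at loc long enough."""
    start = prev = None
    for ts, lat, lng in pings:
        if haversine_m(lat, lng, loc[0], loc[1]) <= RADIUS_M:
            start = ts if start is None else start
            prev = ts
        else:
            if start is not None and prev - start >= MIN_DWELL_S:
                yield (start, prev)
            start = prev = None
    if start is not None and prev - start >= MIN_DWELL_S:
        yield (start, prev)
```

Tuning RADIUS_M to typical phone GPS error and MIN_DWELL_S to what the company considers a "visit" is exactly the kind of design decision your paper should justify, with sources.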