Question
Case Study
Can We Trust Big Data?
Today’s companies are dealing with an avalanche of data from social media, search, and sensors as well as from traditional sources. According to one estimate, 2.5 quintillion bytes of data per day are generated around the world. Making sense of “big data” to improve decision making and business performance has become one of the primary opportunities for organizations of all shapes and sizes, but it also represents big challenges.
Big data helps streaming music service Spotify create a service that feels personal to each of its 75 million global users. Spotify uses the big data it collects on user listening habits (more than 600 gigabytes daily) to design highly individualized products that captivate its users around a particular mood or moment in time rather than offering the same tired genres. Users can constantly enhance their listening experience with data-driven features such as the Discovery tool for new music, a Running tool that curates music timed to the beat of their workout, and Taste Rewind, which tells users what they would have listened to in the past by analyzing what they listen to now. By constantly using big data to fine-tune its services, Spotify hopes to create the perfect user experience.
A number of services have emerged to analyze big data to help consumers. There are now online services to enable consumers to check thousands of different flight and hotel options and book their own reservations, tasks previously handled by travel agents. For instance, a mobile app from Skyscanner shows deals from all over the web in one list—sorted by price, duration, or airline—so travelers don’t have to scour multiple sites to book within their budget. Skyscanner uses information from more than 300 airlines, travel agents, and timetables and shapes the data into at-a-glance formats with algorithms to keep pricing current and make predictions about who will have the best deal for a given market.
Big data is also providing benefits in law enforcement (see this chapter’s Interactive Session on Organizations), sports, education, science, and healthcare. A recent McKinsey Global Institute report estimated that the U.S. healthcare system could save $300 billion each year—$1,000 per American—through better integration and analysis of the data produced by everything from clinical trials to health insurance transactions to “smart” running shoes. Healthcare companies are currently analyzing big data to determine the most effective and economical treatments for chronic illnesses and common diseases and provide personalized care recommendations to patients.
There are limits to using big data. A number of companies have rushed to start big data projects without first establishing a business goal for this new information. Swimming in numbers and other data doesn’t necessarily mean that the right information is being collected or that people will make smarter decisions.
Experts in big data analysis believe too many companies, seduced by the promise of big data, jump into big data projects and end up with nothing to show for their efforts. They start amassing and analyzing mountains of data without a clear objective or understanding of exactly how analyzing big data will achieve their goal or what questions they are trying to answer. Darian Shirazi, the founder of Radius Intelligence Inc., likens this to haystacks without needles: companies don’t know what they’re looking for because they think big data alone will solve their problem.
According to Michael Walker of Rose Business Technologies, which helps companies build big data systems, a significant majority of big data projects aren’t producing any valuable, actionable results. A recent report from Gartner, Inc. stated that through 2017, 60 percent of big data projects will fail to go beyond piloting and experimentation and will eventually be abandoned. This is especially true for very large-scale big data projects. Companies are often better off starting with smaller projects with narrower goals.
Hadoop has emerged as a major technology for handling big data because it allows distributed processing of large unstructured as well as structured data sets across clusters of inexpensive computers. However, Hadoop is not easy to use, involves a considerable learning curve, and does not always work well for all corporate big data tasks. For example, when Bank of New York Mellon used Hadoop to locate glitches in a trading system, Hadoop worked well on a small scale, but it slowed to a crawl when many employees tried to access it at once. Very few of the company’s 13,000 IT specialists had the expertise to troubleshoot this problem. David Gleason, the bank’s chief data officer at the time, said he liked Hadoop but felt it still wasn’t ready for prime time. According to Gartner, Inc. research director for information management Nick Heudecker, technology originally built to index the web may not be sufficient for corporate big data tasks.
It often takes a lot of work for a company to combine data stored in legacy systems with data stored in Hadoop. Although Hadoop can be much faster than traditional databases for some tasks, it often isn’t fast enough to respond to queries immediately or to process incoming data in real time (such as using smartphone location data to generate just-in-time offers).
Hadoop vendors are responding with improvements and enhancements. For example, Hortonworks produced a tool that lets other applications run on top of Hadoop. Other companies are offering tools as Hadoop substitutes. Databricks developed Spark open source software that is more adept than Hadoop at handling real-time data, and the Google spinoff Metanautix is trying to supplant Hadoop entirely.
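To make the contrast concrete, here is a minimal, hypothetical PySpark sketch of the kind of aggregation these newer engines run quickly over large event logs; the input path and the column names (user_id, track_id, ts) are illustrative assumptions, not any vendor's actual code, and real-time workloads would use Spark's streaming interfaces instead of this batch read.

# Illustrative PySpark sketch: summarise listening events per user.
# The HDFS path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("listening-summary").getOrCreate()
events = spark.read.json("hdfs:///data/listening_events/")   # one JSON record per play

summary = (events
           .groupBy("user_id")
           .agg(F.count("*").alias("plays"),          # total plays per user
                F.max("ts").alias("last_play")))      # most recent play timestamp
summary.show(10)
spark.stop()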
It is difficult to find enough technical IT specialists with expertise in big data analytical tools such as Hive, Pig, Cassandra, MongoDB, and Hadoop. On top of that, many business managers lack the numerical and statistical skills required for finding, manipulating, managing, and interpreting data.
Even with big data expertise, data analysts need some business knowledge of the problem they are trying to solve with big data. For example, if a pharmaceutical company monitoring point-of-sale data in real time sees a spike in aspirin sales in January, it might think that the flu season is intensifying. However, before pouring sales resources into a big campaign and increasing flu medication production, the company would do well to compare sales patterns to past years. People might also be buying aspirin to nurse their hangovers following New Year’s Eve parties. In other words, analysts need to know the business and the right questions to ask of the data.
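As a hedged illustration of that comparison, the sketch below checks a January spike against prior Januaries before treating it as a flu signal. The file name, column names, and the 20 percent threshold are invented purely for the example.

# Compare this January's aspirin sales with prior Januaries before acting.
import pandas as pd

sales = pd.read_csv("aspirin_sales.csv", parse_dates=["date"])  # hypothetical columns: date, units

monthly = (sales.set_index("date")["units"]
                .resample("MS").sum())          # monthly totals

january = monthly[monthly.index.month == 1]     # every January on record
this_jan = january.iloc[-1]
baseline = january.iloc[:-1].mean()             # average of earlier Januaries

print(f"This January: {this_jan:,} units; prior-January average: {baseline:,.0f}")
if this_jan > 1.2 * baseline:
    print("Spike exceeds the usual New Year bump; investigate the flu signal further.")
else:
    print("Within the normal January pattern (e.g., post-New Year purchases).")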
Just because something can be measured doesn’t mean it should be measured. Suppose, for instance, that a large company wants to measure its website traffic in relation to the number of mentions on Twitter. It builds a digital dashboard to display the results continuously. In the past, the company had generated most of its sales leads and eventual sales from trade shows and conferences. Switching to Twitter mentions as the key metric changes the sales department’s focus. The department pours its energy and resources into monitoring website clicks and social media traffic, which produce many unqualified leads that never lead to sales.
Although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, big data analysis doesn’t necessarily show causation or which correlations are meaningful. For example, examining big data might show that from 2006 to 2011 the U.S. murder rate was highly correlated with the market share of Internet Explorer, since both declined sharply. But that doesn’t necessarily mean there is any meaningful connection between the two phenomena.
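A toy sketch makes the point: any two series that merely trend in the same direction will show a strong Pearson correlation, regardless of any causal link. The numbers below are randomly generated, not real crime or browser statistics.

# Two unrelated but jointly declining series still correlate strongly.
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2006, 2012)
series_a = 6.0 - 0.25 * (years - 2006) + rng.normal(0, 0.05, years.size)   # e.g. a crime rate
series_b = 68.0 - 6.0 * (years - 2006) + rng.normal(0, 1.0, years.size)    # e.g. a browser's market share

r = np.corrcoef(series_a, series_b)[0, 1]
print(f"Pearson r = {r:.2f}")   # close to 1 simply because both trend downward together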
Several years ago, Google developed what it thought was a leading-edge algorithm using data it collected from web searches to determine exactly how many people had influenza and how the disease was spreading. It tried to calculate the number of people with flu in the United States by relating people’s location to flu-related search queries on Google. The service consistently overestimated flu rates when compared to conventional data collected afterward by the U.S. Centers for Disease Control and Prevention (CDC). According to Google Flu Trends, nearly 11 percent of the U.S. population was supposed to have had influenza at the flu season’s peak in mid-January 2013. However, an article in the science journal Nature stated that Google’s results were nearly twice the actual amount estimated by the CDC, which had 6 percent of the population coming down with the disease. Why did this happen? Several scientists suggested that Google was “tricked” by widespread media coverage of that year’s severe flu season in the United States, which was further amplified by social media coverage. Google’s algorithm looked only at the number of flu-related search requests, not the context of the searches.
Big data can also provide a distorted picture of the problem. Boston’s Street Bump app uses a smartphone’s accelerometer to detect potholes without the need for city workers to patrol the streets. Users of this mobile app collect road condition data while they drive and automatically provide city government with real-time information to fix problems and plan long-term investments. However, what Street Bump actually produces is a map of potholes that favors young, affluent areas where more people own smartphones. The capability to record every road bump or pothole from every enabled phone is not the same as recording every pothole. Data often contain systematic biases, and it takes careful thought to spot and correct for those biases.
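The sketch below simulates that bias with invented numbers: if app adoption is highest where the roads are best, the reported pothole counts can rank neighborhoods in roughly the opposite order of actual need.

# Toy simulation of the Street Bump sampling bias. All figures are invented
# purely to illustrate the effect, not taken from the real program.
import numpy as np

rng = np.random.default_rng(1)
neighborhoods = ["affluent", "middle", "low-income"]
true_potholes = np.array([80, 100, 120])      # the actual problem is worst here ->
app_adoption  = np.array([0.60, 0.30, 0.10])  # <- but app coverage is lowest here

reported = rng.binomial(true_potholes, app_adoption)   # only app users generate reports
for name, t, r in zip(neighborhoods, true_potholes, reported):
    print(f"{name:>10}: true={int(t):3d}  reported={int(r):3d}")
# Unless the analyst corrects for adoption rates, the reported counts
# understate need exactly where the roads are worst.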
And let’s not forget that big data poses some challenges to information security and privacy. As Chapter 4 pointed out, companies are now aggressively collecting and mining massive data sets on people’s shopping habits, incomes, hobbies, residences, and (via mobile devices) movements from place to place. They are using such big data to discover new facts about people, to classify them based on subtle patterns, to flag them as “risks” (for example, loan default risks or health risks), to predict their behavior, and to manipulate them for maximum profit.
When you combine someone’s personal information with pieces of data from many different sources, you can infer new facts about that person (such as the fact that they are showing early signs of Parkinson’s disease or are unconsciously drawn toward products that are colored blue or green). If asked, most people might not want to disclose such information, but they might not even know such information about them exists. Privacy experts worry that people will be tagged and suffer adverse consequences without due process, the ability to fight back, or even knowledge that they have been discriminated against or manipulated in the marketplace.
1. What business benefits did the companies and services described in this case achieve by analyzing and using big data?
2. List and describe the limitations to using big data.
3. Should all organizations try to analyze big data? Why or why not? What people, organization, and technology issues should be addressed before a company decides to work with big data?
Explanation / Answer
A big data environment is built in layers, each of which moves data one stage closer from raw, unstructured input to structured, actionable output.
These layers are what turn raw material into information that decision makers can act on.
The layers of a typical big data architecture are as follows:
Data source layer
Data storage layer
Data processing (analysis) layer
Data output layer
Data source layer
This is the first layer, where data arrives. The data reaching this layer is largely unstructured and comes from many feeds, such as sales records, the customer database, and customer feedback. Assessing what data enters the organisation is one of the first steps in setting up a data strategy, and it is at this layer that new or previously untapped sources can be identified.
Data storage layer
This layer holds all of the big data the organisation receives; once data enters the organisation, it is stored here. Purpose-built tools have been developed for this kind of storage, with the Apache Hadoop Distributed File System (HDFS) and the Google File System widely used for storing big data. Each platform also tends to have its own database layered on top, such as Hadoop's HBase, and most of these follow the popular NoSQL architecture.
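As a minimal sketch of this storage layer, and assuming the third-party Python package hdfs (HdfsCLI) and a WebHDFS endpoint are available, landing a raw file in HDFS might look like the following; the host, user, and paths are placeholders.

# Land a local export in HDFS via WebHDFS, assuming the `hdfs` (HdfsCLI) package.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:9870", user="etl")   # hypothetical endpoint

client.makedirs("/data/raw/sales")                       # create the landing directory
with open("sales_2024.csv", "rb") as fh:                 # a local export to be archived
    client.write("/data/raw/sales/sales_2024.csv", fh, overwrite=True)

print(client.list("/data/raw/sales"))                    # confirm the file landed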
Data processing layer
In this layer the data is processed and analysed. The most common tool for this work is MapReduce, which breaks a large job into many small map and reduce tasks that run across the cluster (a minimal sketch of the pattern follows below). Larger organisations have invested in their own analytics tools and dedicated teams as part of their big data programmes, along with pattern-detection algorithms for better data mining; conclusions can also still be drawn through manual analysis.
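The sketch below shows the MapReduce pattern itself, stripped of any cluster: each record is mapped to (key, 1) pairs and the reduce step sums the counts per key. It is a teaching sketch in plain Python, not Hadoop code.

# Cluster-free illustration of the map and reduce phases.
from collections import defaultdict

records = [
    "returned damaged item",
    "damaged packaging",
    "late delivery damaged box",
]

def map_phase(record):
    # Emit a (word, 1) pair for every word in the record.
    for word in record.split():
        yield word, 1

def reduce_phase(pairs):
    # Sum the counts emitted for each key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

mapped = (pair for record in records for pair in map_phase(record))
print(reduce_phase(mapped))   # e.g. {'damaged': 3, 'returned': 1, ...}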
Data output layer
This layer delivers the processed data to the teams and people who need it. Clear, concise communication is essential at this stage, and the output can take the form of charts, reports, figures, or recommendations. It is the final stage of big data analysis: the decisions taken on the basis of the analysis should produce a measurable improvement in at least one key performance indicator (KPI).
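A small sketch of what a measurable KPI improvement means in practice: condense the analysis into one decision-ready number and compare it with the prior period. The figures are invented for illustration.

# Report a single KPI (lead-to-sale conversion) before and after a change.
leads_before, sales_before = 1200, 96     # quarter before the change (invented)
leads_after,  sales_after  = 1500, 135    # quarter after the change (invented)

kpi_before = sales_before / leads_before
kpi_after  = sales_after / leads_after

print(f"Lead-to-sale conversion: {kpi_before:.1%} -> {kpi_after:.1%}")
print("Improved" if kpi_after > kpi_before else "No measurable improvement")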
Predictive modelling is a process in which data mining and big data are used to build a model that forecasts future outcomes. A set of predictors (input variables) is chosen, data on those predictors is collected, and a statistical model or machine-learning algorithm is built with the help of specialised software; the model is then validated against held-out data and revised as needed.
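A hedged sketch of that workflow using scikit-learn on synthetic data: choose predictors, fit a model, validate it on held-out data, and revise if the result is unsatisfactory.

# Fit and validate a simple predictive model on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # fit on the training split
accuracy = accuracy_score(y_test, model.predict(X_test))          # validate on held-out data

print(f"Held-out accuracy: {accuracy:.2f}")   # if unsatisfactory, revise the predictors or the model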
Offshore data warehousing is widely used by organisations to meet growing data warehousing needs while reducing cost. Offshore development provides extensive staffing support and raises productivity by lowering and containing the costs that onshore data warehousing systems demand. Offshore systems are also flexible in terms of acquiring additional capacity and bandwidth. When a project's requirements are well defined, the offshore model is an appropriate choice, because it supports the organisation's overall structure while directly improving the total capacity the organisation can draw on.
A basic limitation of offshore data warehousing is that it responds poorly to change. Making any change to an offshore data warehouse requires intensive effort, which flows directly into the change management process.
Offshore models are typically developed when an organisation's data warehousing needs outgrow what can be handled on-site. By keeping the cost and development of these systems offshore, an organisation can meet multiple requirements at once, including storing very large volumes of big data in warehouses built offshore for its benefit.
Offshore data warehousing also carries its own risks: the maintenance burden is high, and any repair to the system is costly. If the link between the offshore data warehouse and the home site is damaged, locating and resolving the connection problem becomes one of the biggest issues with this type of data warehousing.
All in all, offshore data warehousing has proven effective in helping organisations reduce the expense of big data storage. By lowering those costs through offshore development, organisations can concentrate on better strategies for managing data and improving their market position, while the added flexibility helps them handle the change management that today's business demands.
Not every organisation should try to analyse big data. Many smaller organisations do not have the funding to run a big data analysis programme at a positive margin. For them, implementing big data analysis would burden the overall operating structure; the same funds could be invested elsewhere with greater effect, whereas a big data programme would weigh on the business rather than benefit it.
Before committing to any kind of big data work, a company should assess its existing electronic records systems, its style of operation, and the availability of simpler alternatives, such as conventional market analysis using external methods, which may serve it better than a full big data effort.
Implementing big data also carries heavy costs and demands certain operating standards, such as hardware upgrades and continuous monitoring and supervision. For a firm that cannot sustain these, the effort limits rather than expands its opportunities. Hence big data is recommended mainly for larger firms, not for every firm, and not where this type of data analysis is simply not required.