Question

1) Choose four different companies (Amazon, YouTube, Facebook, Yahoo, Google) and find out what database systems they use. See whether each uses one single database system or a combination of many. Comment on how their choice aligns (or does not) with their needs and interests. (Cite all your sources.)

2) List the popular NoSQL database systems that are out there. Find examples of companies that have implemented them. Based on your findings, explain what benefits NoSQL database systems offer over relational database systems. (Cite all your sources.)

                                                                                                                 

(Please type your answer; I can't read handwriting.)

Explanation / Answer

GOOGLE:
Google uses a number of databases, but the primary one is BigTable. BigTable itself has never been released as software outside Google, but you can use it through Cloud Bigtable, which is part of Google Cloud Platform. Google also uses MySQL for some smaller applications. BigTable is not a relational database (like MySQL); Google describes it as a sparse, distributed, persistent, multidimensional sorted map. BigTable lets Google scale its storage to enormous sizes. At the lowest level, BigTable stores its data files in the Google File System (GFS), which spreads them across a very large number of chunkservers. Alongside BigTable, Google also uses object-based storage and distributed file systems.
Features of BigTable:
a) It stores data in tables.
b) Each row is identified by a row key; in Google's web table, for example, the row key is a page's URL (with the hostname reversed, e.g. com.cnn.www).
c) Row ranges are broken up into partitions called tablets.
d) These tablets are distributed across multiple servers for load balancing.
e) It is a fast, extremely large-scale DBMS.
f) Tables are adapted to GFS by being split into multiple tablets: segments of the table, split at a row boundary chosen so that each tablet is roughly 100 to 200 MB in size.
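To make the tablet idea in the feature list concrete, here is a minimal Python sketch under stated assumptions: cells are keyed by (row key, column key, timestamp) as in Google's published BigTable data model, tablets are contiguous ranges of sorted row keys, and each tablet is handed to a server. The server names, the row-count split (real tablets are split by data size), and the round-robin assignment are illustrative simplifications, not how Google's code works.

```python
from bisect import bisect_right

# Toy BigTable data model: a sorted map from (row key, column key, timestamp)
# to an uninterpreted byte string. Row keys are reversed URLs, as in Google's
# web-table example. All helper names here are hypothetical, not Google's API.
cells = {
    ("com.cnn.www", "contents:", 6): b"<html>v6",
    ("com.cnn.www", "contents:", 5): b"<html>v5",
    ("com.cnn.www", "anchor:cnnsi.com", 9): b"CNN",
    ("com.example.www", "contents:", 3): b"<html>ex",
    ("org.wikipedia.en", "contents:", 7): b"<html>wiki",
}

def read_latest(cells, row, column):
    """Return the newest version of one cell, or None if it does not exist."""
    versions = [(ts, value) for (r, c, ts), value in cells.items() if r == row and c == column]
    return max(versions)[1] if versions else None

def split_into_tablets(row_keys, max_rows_per_tablet):
    """Split sorted row keys into contiguous row ranges (tablets).
    Real BigTable splits by data size; this sketch splits by row count."""
    rows = sorted(set(row_keys))
    return [rows[i:i + max_rows_per_tablet] for i in range(0, len(rows), max_rows_per_tablet)]

def find_tablet(tablets, row_key):
    """Binary-search the tablets' start keys for the range containing row_key."""
    starts = [tablet[0] for tablet in tablets]
    return max(bisect_right(starts, row_key) - 1, 0)

tablets = split_into_tablets([row for (row, _, _) in cells], max_rows_per_tablet=2)
servers = ["tablet-server-1", "tablet-server-2"]  # hypothetical servers
# Simplified round-robin placement; the real master balances by server load.
assignment = {i: servers[i % len(servers)] for i in range(len(tablets))}

print(tablets)  # [['com.cnn.www', 'com.example.www'], ['org.wikipedia.en']]
print(assignment[find_tablet(tablets, "org.wikipedia.en")])  # tablet-server-2
print(read_latest(cells, "com.cnn.www", "contents:"))  # b'<html>v6'
```

Because rows are kept sorted, a lookup only has to find the one tablet whose row range covers the key, which is what lets tablets be moved between servers independently.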

Sources: https://cloud.google.com/datastore/docs/concepts/overview
https://cloud.google.com/sql/docs/
https://cloud.google.com/spanner/docs/true-time-external-consistency

FACEBOOK:
Facebook uses different kinds of databases:
1) MySQL: Facebook uses MySQL to manage many petabytes of data from activities such as likes, comments, shares, wall posts, and timelines. This data is replicated across many servers.

2) Memcached: Memcached is an in-memory caching system that speeds up a website by caching data and objects in RAM, which reduces read time. This caching layer is a large part of why Facebook can recall your data so quickly.
3) Haystack: At the time of the note cited below, Facebook users had uploaded over 15 billion photos, making Facebook the biggest photo-sharing website. For each photo, Facebook generates and stores four images of different sizes, which added up to a total of 1.5 PB of storage. Facebook implemented an HTTP-based photo server that stores these photos in a generic object store called Haystack.
4) Cassandra (a distributed, multi-dimensional key-value store): originally developed at Facebook to power Inbox Search, i.e., searching users' private messages.
5) HipHop for PHP: not a database but a set of PHP execution engines that make Facebook's PHP code run faster. It lets Facebook move fast while keeping a large number of engineers able to work across the entire codebase.
6) Hive (a data warehouse built on Hadoop that supports tables and a variant of SQL called HiveQL): used for simple summarization jobs, machine learning, and many other applications.
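The MySQL-plus-Memcached combination above usually follows a look-aside (cache-aside) pattern: reads try the cache first and fall back to the database on a miss, and writes update the database and then invalidate the cached copy. The sketch below assumes that pattern; plain dictionaries stand in for Memcached and MySQL, and the class and key names are hypothetical.

```python
class CacheAsideStore:
    """Look-aside cache in front of a database (dicts stand in for both)."""

    def __init__(self, database):
        self.database = database  # stands in for MySQL, the source of truth
        self.cache = {}           # stands in for Memcached (data held in RAM)
        self.db_reads = 0         # counts how often we had to hit the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: no database round trip
        self.db_reads += 1
        value = self.database.get(key)  # cache miss: read from the database
        if value is not None:
            self.cache[key] = value     # fill the cache for the next reader
        return value

    def update(self, key, value):
        self.database[key] = value      # write the source of truth first
        self.cache.pop(key, None)       # then invalidate the stale cached copy

store = CacheAsideStore({"user:42:likes": 17})
store.get("user:42:likes")             # miss: read from the database
store.get("user:42:likes")             # hit: served from RAM
store.update("user:42:likes", 18)
latest = store.get("user:42:likes")    # miss again after invalidation
print(latest, store.db_reads)          # 18 2
```

Deleting the cached entry on update (rather than writing the new value into the cache) keeps the database as the single source of truth and avoids caching a value that a concurrent writer may already have replaced.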
Sources: http://www.facebook.com/press/info.php?statistics
http://hadoop.apache.org/hive/
https://www.facebook.com/note.php?note_id=24413138919
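Haystack (item 3 in the Facebook list) was described by Facebook as packing many photos into large files and keeping a small in-memory index of where each one lives, so serving a photo takes a single read instead of several filesystem metadata lookups. The sketch below illustrates only that idea; `io.BytesIO` stands in for a large on-disk volume file, and the class name and photo keys are hypothetical.

```python
import io

class HaystackVolume:
    """Toy append-only photo store: photos ("needles") are appended to one
    large file, and an in-memory index maps photo key -> (offset, size)."""

    def __init__(self):
        self.volume = io.BytesIO()  # stands in for one large on-disk file
        self.index = {}             # photo key -> (offset, size), kept in RAM

    def write(self, photo_key, data: bytes):
        offset = self.volume.seek(0, io.SEEK_END)  # always append at the end
        self.volume.write(data)
        self.index[photo_key] = (offset, len(data))

    def read(self, photo_key):
        offset, size = self.index[photo_key]  # RAM lookup, no filesystem metadata
        self.volume.seek(offset)              # one seek ...
        return self.volume.read(size)         # ... and one read

vol = HaystackVolume()
# Four stored sizes per uploaded photo, as described above (payloads are dummies).
for size_name, payload in [("thumb", b"T" * 10), ("small", b"S" * 50),
                           ("medium", b"M" * 200), ("large", b"L" * 800)]:
    vol.write(("photo-1", size_name), payload)

print(len(vol.read(("photo-1", "medium"))))  # 200
```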

YOUTUBE:
YouTube uses MySQL to store users, playlists, channels, video metadata, and so on. However, it does not store the videos themselves in the database as BLOBs.
YouTube gives each uploaded video file a unique name, sends it in a batch for conversion (transcoding) and thumbnail generation, and stores its metadata in the database. The video is then published to many servers.
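The YouTube approach (metadata in MySQL, video bytes outside the database) can be sketched as follows, assuming a hypothetical `videos` table whose `file_path` column points at the stored file instead of holding a BLOB. Python's built-in `sqlite3` stands in for MySQL, and the path layout and status values are illustrative only.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")  # stands in for MySQL
conn.execute("""
    CREATE TABLE videos (
        video_id  TEXT PRIMARY KEY,  -- unique name given to the upload
        channel   TEXT NOT NULL,
        title     TEXT NOT NULL,
        status    TEXT NOT NULL,     -- 'queued' until conversion finishes
        file_path TEXT NOT NULL      -- where the file lives (not a BLOB)
    )
""")

def register_upload(conn, channel, title):
    """Give the upload a unique name and record its metadata row."""
    video_id = uuid.uuid4().hex                     # unique file name
    file_path = f"/videos/incoming/{video_id}.raw"  # hypothetical storage path
    with conn:  # commit the insert as one transaction
        conn.execute("INSERT INTO videos VALUES (?, ?, ?, ?, ?)",
                     (video_id, channel, title, "queued", file_path))
    return video_id

def mark_published(conn, video_id):
    """Called once conversion and thumbnail generation are done."""
    with conn:
        conn.execute("UPDATE videos SET status = 'published' WHERE video_id = ?",
                     (video_id,))

vid = register_upload(conn, "demo-channel", "My first video")
mark_published(conn, vid)
print(conn.execute("SELECT status, file_path FROM videos WHERE video_id = ?",
                   (vid,)).fetchone())
```

Keeping only a path in the row keeps the table small and fast to query, while the large files can be copied to many servers independently of the database.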

AMAZON:
Amazon also uses different kinds of databases for its operations.
Order-processing systems are built on relational databases (RDBMS), which store users, their details, and their transactions.
Product-related data, such as product details and sellers, is stored in NoSQL databases.
Static pages can be served from an in-memory NoSQL store.
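A minimal sketch of this mix of relational and NoSQL storage, under assumptions the source does not spell out: order placement runs as a relational transaction (here `sqlite3`, with a CHECK constraint that rolls the order back when stock would go negative), while product data lives in schemaless documents (here a plain dict keyed by a made-up SKU). Table names, SKUs, and fields are hypothetical and do not describe Amazon's actual systems.

```python
import sqlite3

# Orders: relational and transactional (sqlite3 stands in for an RDBMS).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stock  (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0));
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL, sku TEXT NOT NULL);
    INSERT INTO stock VALUES ('B000123', 1);
""")

# Product catalog: schemaless documents (a dict stands in for a NoSQL store).
# Note the two products carry different fields.
catalog = {
    "B000123": {"title": "Paperback novel", "sellers": ["seller-a", "seller-b"], "pages": 320},
    "B000456": {"title": "USB cable", "length_cm": 100},
}

def place_order(customer, sku):
    """Decrement stock and record the order atomically; roll back on failure."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute("UPDATE stock SET qty = qty - 1 WHERE sku = ?", (sku,))
            db.execute("INSERT INTO orders (customer, sku) VALUES (?, ?)", (customer, sku))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed (out of stock): nothing was written

ok_first = place_order("alice", "B000123")   # succeeds
ok_second = place_order("bob", "B000123")    # rejected: stock would go negative
print(ok_first, ok_second, catalog["B000123"]["title"])
```

The relational side gives all-or-nothing order placement, while the catalog side lets each product carry whatever attributes it needs.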


Most popular NoSQL databases and companies using them:
1. MongoDB: MongoDB is a document-store database. Instead of a relational schema, it stores data as JSON-like documents and supports dynamic schemas. It provides load balancing, replication, indexing, querying, and more. It is free to use: it was originally released as open source under the AGPL, and since 2018 its server has been licensed under the Server Side Public License (SSPL).
2. Cassandra: Cassandra is a decentralized, distributed, wide-column (column-family) database engine. It is optimized for clusters, especially those spanning multiple datacenters, and provides low-latency client access. It is free and open source (Apache License 2.0). In Cassandra, a single row can hold a very large number of columns stored together in sorted order, so data that a relational table would spread across many rows (what we usually think of as vertical data) can sit consecutively in one wide row. As a result, some types of lookups become very fast. It is used by tech giants such as Facebook, as mentioned above.

3. HBase: Another column-family database, HBase is a free and open-source database modeled on Google's BigTable. While HBase is a legitimate piece of software in its own right, some of its popularity and widespread use comes from its close association with Hadoop: it runs on top of Hadoop's HDFS and, like Hadoop, is an Apache Software Foundation project. It enables efficient lookups of sparse, distributed data, which is one of its strongest selling points.
4. Redis: Redis is one of the most popular and widely used key-value stores. Redis holds its key-value pairs in memory, which makes access fast. Over the years, client libraries have been developed for an incredibly wide variety of languages, making Redis an easy choice for developers.

5. DynamoDB: Offered by Amazon Web Services, DynamoDB is used heavily by Amazon itself and by many companies that depend on AWS. It supports high-throughput operations (well over 1,000 transactions per second), which is essential for the very high traffic the Amazon site experiences, especially during peak holiday shopping seasons.
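The "dynamic schema" point about MongoDB can be illustrated in plain Python: documents in the same collection carry different fields, and a new field can appear without any migration. The `find` helper below is hypothetical and supports only exact-match filters; with the real PyMongo driver the comparable call would be `collection.find({"country": "US"})` against a running MongoDB server.

```python
# A list of dicts stands in for a MongoDB collection.
users = [
    {"_id": 1, "name": "Ana", "country": "BR"},
    {"_id": 2, "name": "Ben", "country": "US", "interests": ["chess", "go"]},
    {"_id": 3, "name": "Chen", "email": "chen@example.com"},  # no country field at all
]

def find(collection, query):
    """Return the documents whose fields equal every key/value in query."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in query.items())]

# A brand-new field ("premium") needs no ALTER TABLE or schema migration.
users.append({"_id": 4, "name": "Dee", "country": "US", "premium": True})

print([doc["name"] for doc in find(users, {"country": "US"})])  # ['Ben', 'Dee']
```

A relational table would need every row to share the same columns (with NULLs for missing values); here each document simply stores what it has.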

Advantages of NoSQL over relational databases:
1) Elastic scaling: For years, database administrators have relied on scaling up (buying bigger servers as database load increases) rather than scaling out (distributing the database across multiple hosts as load increases).
An RDBMS may not scale out easily on commodity clusters, but the newer breed of NoSQL databases is designed to expand transparently to take advantage of new nodes, and these databases are usually designed with low-cost commodity hardware in mind.

2) Big data:
The volumes of data being stored have also increased massively over recent years.
RDBMS capacity has been growing to match these increases, but the volume of data that a single RDBMS can practically manage is becoming an intolerable constraint for some enterprises. Today, the volumes of "big data" that can be handled by NoSQL systems, such as Hadoop, outstrip what even the biggest RDBMS can handle.
3) NoSQL databases can store large volumes of structured, semi-structured, and unstructured data.
4) NoSQL databases allow quick iteration and frequent code pushes, because dynamic schemas let new fields be added without a schema migration.
5) They offer an efficient, scale-out architecture instead of an expensive, monolithic one.
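The "expand transparently to take advantage of new nodes" claim in point 1 is often implemented with consistent hashing; Cassandra, for example, places nodes on a hash ring and assigns each key to the next node along the ring. The sketch below is a simplified ring with virtual nodes, assuming MD5 as the hash function and hypothetical node names; it shows that adding a node moves only a fraction of the keys, and only onto the new node.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # Stable hash; Python's built-in hash() for str is randomized per process.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hashing ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node gets many ring positions so load spreads evenly.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key):
        # Owner = first virtual node at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self.ring, (ring_position(key),))
        return self.ring[idx % len(self.ring)][1]

keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one commodity node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"moved {len(moved) / len(keys):.0%} of keys")  # roughly a quarter; exact value depends on the hashes
print(all(after[k] == "node-d" for k in moved))       # True
```

By contrast, placing keys by a hash of the key modulo N and growing from 3 to 4 nodes would move about three quarters of all keys, because a key stays put only when its hash leaves the same remainder modulo 3 and modulo 4.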
Sources:
1. Wikipedia