


Question

Objectives:

Give students the chance to practice a programming language that will be needed in the course

Handle and understand semi-structured data.

Extract the required information when SQL queries or other database techniques cannot be used.

Find patterns of data.

Find significant entities that have special characteristics.

Give students the chance to perform some data analytics steps.

You will be given a file named hobbies.txt.

This file contains a group of fictitious Facebook users and their hobbies.

Each line in the file contains a user/username and a list of hobbies of that user.

The data in each line is delimited by commas.

For instance in the line: 2254,reading,coding,swimming,playing soccer,

i.The user/username is: 2254

ii.The hobbies are: reading, coding, swimming, and playing soccer

iii.The number and type of hobbies may differ from one user to another.
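
The line format described above can be read with a few lines of Python. This is a minimal sketch, not the required solution: the function names parse_line and load_hobbies are made up for illustration, and it assumes that empty fields (such as the one left by the trailing comma) should be ignored and that a username repeated on several lines should have its hobbies merged.

```python
def parse_line(line):
    """Split one comma-delimited line into (username, set of hobbies)."""
    # Strip whitespace and drop empty fields, e.g. the one after a trailing comma.
    fields = [field.strip() for field in line.split(",")]
    fields = [field for field in fields if field]
    if not fields:
        return None  # blank line
    return fields[0], set(fields[1:])


def load_hobbies(path="hobbies.txt"):
    """Read the whole file into a dict mapping username -> set of hobbies."""
    users = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parsed = parse_line(line)
            if parsed is not None:
                username, hobbies = parsed
                # Assumption: merge hobbies if a username appears more than once.
                users.setdefault(username, set()).update(hobbies)
    return users


# The sample line from the assignment text.
print(parse_line("2254,reading,coding,swimming,playing soccer,"))
```

Storing each user's hobbies as a set makes the later "shared hobbies" step a plain set intersection.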

This file is your data set. Your program must read it and do the following:

Finding circles/networks of friends:

i.In each circle you report, all the users should share at least x hobbies.

ii.x is a variable that a user can input to the program.

iii.Circles of friends should be written to a file named circles.txt.

iv.Each line should have the usernames in the circle/network you found, tab character, and list of shared hobbies.

v.For example, a line may look like: 2254,552,1258               reading,swimming,hiking
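
The assignment does not define a circle precisely, so the sketch below picks one possible interpretation and should be read as an assumption: for every pair of users that shares at least x hobbies, the circle is everyone whose hobbies include that whole shared set. The names find_circles, format_circle, and write_circles are hypothetical, x is assumed to be at least 1, and the sample data is made up rather than taken from hobbies.txt.

```python
from itertools import combinations


def find_circles(users, x):
    """Return a dict mapping frozenset(shared hobbies) -> sorted list of usernames.

    users: dict mapping username -> set of hobbies.
    """
    circles = {}
    for (_, hobbies_a), (_, hobbies_b) in combinations(users.items(), 2):
        shared = frozenset(hobbies_a & hobbies_b)
        if len(shared) < x or shared in circles:
            continue
        # Everyone whose hobbies include the whole shared set joins this circle.
        circles[shared] = sorted(
            name for name, hobbies in users.items() if shared <= hobbies
        )
    return circles


def format_circle(members, shared):
    """One output line: usernames, a tab character, then the shared hobbies."""
    return ",".join(members) + "\t" + ",".join(sorted(shared))


def write_circles(circles, path="circles.txt"):
    with open(path, "w", encoding="utf-8") as handle:
        for shared, members in circles.items():
            handle.write(format_circle(members, shared) + "\n")


# Made-up sample data for illustration only.
sample = {
    "2254": {"reading", "swimming", "hiking", "coding"},
    "552": {"reading", "swimming", "hiking"},
    "1258": {"reading", "swimming", "hiking", "chess"},
    "77": {"chess", "coding"},
}
circles = find_circles(sample, 2)
for shared, members in circles.items():
    print(format_circle(members, shared))
```

Usernames are kept as strings, so they sort alphabetically rather than numerically. The pairwise loop is quadratic in the number of users, which is usually acceptable for a small class data set.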

Finding popular users:

i.Popularity is based on being part of at least y circles/networks.

ii.y is a variable that a user can input to the program.

iii.Popular users should be written to a file named popular.txt. Each user and the number of circles/networks the user belongs to should be on a separate line, separated by a tab character.

iv.For instance: 2254           5

v.This step should occur after step (a.).

vi.Hint: You may want to save the circles you found in part (a.) in some data structure so that you can use them in this part.
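
The popularity step can be sketched by counting how many saved circles each user appears in. This is only an illustration under assumptions: the circles are taken to be stored as lists of usernames, the names count_memberships, popular_users, and write_popular are made up, and the sample circles are invented.

```python
from collections import Counter


def count_memberships(circles):
    """circles: iterable of member lists. Returns username -> number of circles."""
    counts = Counter()
    for members in circles:
        # set() guards against a name listed twice inside one circle.
        counts.update(set(members))
    return counts


def popular_users(circles, y):
    """Return sorted (username, circle count) pairs for users in at least y circles."""
    counts = count_memberships(circles)
    return sorted((user, n) for user, n in counts.items() if n >= y)


def write_popular(popular, path="popular.txt"):
    with open(path, "w", encoding="utf-8") as handle:
        for user, n in popular:
            handle.write(f"{user}\t{n}\n")


# Invented circles for illustration only.
sample_circles = [
    ["2254", "552", "1258"],
    ["2254", "77"],
    ["2254", "552"],
]
popular = popular_users(sample_circles, 2)
print(popular)
```

Keeping the circles in memory after part (a.), as the hint suggests, means this step never has to re-read circles.txt.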

Message me for the File "HOBBIES.txt".

Explanation / Answer

XML provides a natural representation for hierarchical structures and repeating fields or structures. Further, XML document type definitions (DTDs) and schemas allow fine-grained control over how much variation to allow in the data: Vocabulary designers can require XML data to be perfectly regular, or they can allow a little variation, or a lot. In the extreme case, an XML vocabulary can effectively say that there are no rules at all beyond those required of all well-formed XML. Because XML syntax records only what is present, not everything that might be present, sparse data does not make the XML representation awkward; XML storage systems are typically built to handle sparse data gracefully.
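
As a small illustration of the point about repeating fields and sparse data, the sketch below builds an XML document in which each user carries as many hobby elements as needed, including none. The element and attribute names (users, user, name, hobby) and the helper user_to_xml are assumptions chosen for this example, not a vocabulary defined anywhere in the question, and the sample records are made up.

```python
import xml.etree.ElementTree as ET


def user_to_xml(username, hobbies):
    """Build a <user> element with one <hobby> child per hobby."""
    user = ET.Element("user", {"name": username})
    for hobby in hobbies:
        ET.SubElement(user, "hobby").text = hobby
    return user


root = ET.Element("users")
root.append(user_to_xml("2254", ["reading", "coding", "swimming", "playing soccer"]))
# A user with no hobbies simply has no <hobby> children; nothing is padded out.
root.append(user_to_xml("552", []))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Only the hobbies that are present get recorded, which is the sense in which sparse data does not make the representation awkward.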

The most important contribution XML makes to the problem of semi-structured data, however, is to call into question the nature and existence of the problem. As the description makes clear, semi-structured data is just data that does not fit neatly into the relational model. Referring to “the problem of semi-structured data” suggests subliminally that the problem lies in the failure of the data to live up fully to the relational model, rather than in the model and its failure fully to support the natural structure of the data.