Question

UFIDs must be unique across all possible distributed file systems. UFIDs can be made unique by including the address of the host that creates them and a logical clock that is increased whenever a UFID is created.

(a) True or False: The host address is included only for uniqueness, not for addressing purposes. Briefly explain your answer.

(b) In NFS, client caching is used in the client module. What is the major motivation for using client caching? Briefly justify your answer.

(c) However, a problem arises when we use client caching: the client caches must be kept consistent with the file server.

a. What mechanism is used in NFS to achieve this consistency? Please describe how it operates.

b. What interaction between the client module and the server may be needed to support the above mechanism? What is the resulting performance penalty? Can you suggest a way to reduce this performance penalty?

AFS uses callbacks to ensure cache consistency. How does AFS deal with the risk that callback messages may be lost?

In the Google File System (GFS), there is a single Master server, which maintains all metadata, e.g., name space, access control, file-to-chunk mappings, garbage collection, chunk migration, etc. All GFS clients consult the Master for metadata before accessing files. This design may lead to heavy traffic load around the Master as well as heavy processing load at the Master. Please describe what techniques are used in GFS to ease such potential problems.

What is the mechanism used in NFS and AFS, respectively, to maintain the consistency of client caches? Describe how each mechanism operates. Which one is more scalable and why?

To what extent does Sun NFS deviate from one-copy file update semantics? Construct a scenario in which two user-level processes sharing a file would operate correctly in a single UNIX host but would observe inconsistencies when running in different hosts.

Explanation / Answer

a)

The statement “The host address is included only for uniqueness, not for addressing purposes” is true.

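A distributed system contains many servers, so every file needs an identifier that is unambiguous across all of them. Including the host address and a logical clock value in the UFID achieves this: the host address is unique within the distributed system, and the logical clock is increased each time the host creates a UFID, so no two UFIDs created on the same host are alike. The address records only where the UFID was created. Because a file may later be moved to another server, the address cannot be relied on to locate the file.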
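As a rough illustration of the scheme described in the question, the sketch below builds a UFID from the creating host's address and a per-host logical clock. The class name `UfidGenerator`, the tuple layout, the example addresses, and the separate `location` table are all assumptions made for this example and are not taken from any particular file system.

```python
import itertools
import threading
from typing import Dict, Tuple

Ufid = Tuple[str, int]  # (creating host address, logical clock value)


class UfidGenerator:
    """Hypothetical per-host UFID generator."""

    def __init__(self, host_address: str) -> None:
        self.host_address = host_address
        self._clock = itertools.count(1)   # logical clock, advanced per UFID
        self._lock = threading.Lock()      # keep clock updates atomic

    def new_ufid(self) -> Ufid:
        with self._lock:
            return (self.host_address, next(self._clock))


host_a = UfidGenerator("10.0.0.1")
host_b = UfidGenerator("10.0.0.2")
ufids = [host_a.new_ufid(), host_a.new_ufid(), host_b.new_ufid()]

# Locating a file goes through a separate (assumed) mapping, so a file can
# move to another server while its UFID stays the same.
location: Dict[Ufid, str] = {ufids[0]: "10.0.0.1"}
location[ufids[0]] = "10.0.0.9"   # file migrated; UFID unchanged
```

The address inside the UFID still names the creating host after the move, which is why it serves uniqueness rather than addressing.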

b)

Client caching reduces the number of requests sent to servers. The client module caches the results of operations such as read, write, and lookup, so repeated requests can be satisfied locally. This lowers the load on the server and the network and shortens the application's response time.

c)

a. When client caching is used, file data held by the server is copied into the client's local cache.

When one client updates the file, the change is not immediately visible at the server, nor at other clients that hold the same file in their caches.

NFS uses a timestamp-based method to limit this inconsistency.

Two timestamps are associated with each cache block:

Tc is the time when the cache block was last validated.

Tm is the time when the block was last modified at the server.

At time T, a cache block is considered valid if T - Tc is less than the freshness interval t, or if the Tm recorded at the client equals the Tm held at the server:

(T - Tc < t) or (Tm(client) = Tm(server))
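The validity condition above can be written as a small predicate. This is only a sketch: the function and parameter names are hypothetical, and `fetch_tm_server` stands in for whatever request the client uses to obtain the server's modification time.

```python
from typing import Callable


def is_cache_block_valid(
    now: float,
    tc: float,
    tm_client: float,
    fetch_tm_server: Callable[[], float],
    freshness: float,
) -> bool:
    """Return True if the cached block may be used at time `now`."""
    if now - tc < freshness:
        # Validated recently enough: accept without contacting the server.
        return True
    # Freshness interval expired: compare modification times.
    return tm_client == fetch_tm_server()


recent = is_cache_block_valid(10.0, 8.0, 5.0, lambda: 7.0, freshness=3.0)
unchanged = is_cache_block_valid(12.0, 8.0, 5.0, lambda: 5.0, freshness=3.0)
changed = is_cache_block_valid(12.0, 8.0, 5.0, lambda: 7.0, freshness=3.0)
```

Passing the server lookup as a callable makes the point that the second term is evaluated only when the first one fails. In the `recent` case the block is accepted even though the server copy has changed, which is the window of inconsistency that the freshness interval allows.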

b.

The server records the last-modified time of each file, and the client stores the Tm value it received with each cached block. When the client module needs to validate a cached block, it first checks T - Tc. If this is less than the freshness interval, the block is used without contacting the server. Otherwise, the client asks the server for Tm(server) (in NFS, a getattr request) and compares it with Tm(client). If the two are equal, the block is still valid and Tc is updated to the current time.

If they differ, the client module discards the cached copy and requests the updated block from the server.
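The interaction between client module and server can be sketched with a toy in-memory server. Everything here (`FakeServer`, `Client`, the single cached block, the explicit `now` argument in place of a real clock) is a simplifying assumption for illustration and not the actual NFS implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class FakeServer:
    """Toy server holding one block and its last-modified time Tm."""

    def __init__(self) -> None:
        self.data = b"v1"
        self.tm = 100.0
        self.getattr_calls = 0
        self.read_calls = 0

    def getattr(self) -> float:
        self.getattr_calls += 1
        return self.tm

    def read(self) -> Tuple[bytes, float]:
        self.read_calls += 1
        return self.data, self.tm

    def write(self, data: bytes, now: float) -> None:
        self.data, self.tm = data, now


@dataclass
class CacheEntry:
    data: bytes
    tc: float   # time of last validation
    tm: float   # server modification time seen at fetch


class Client:
    def __init__(self, server: FakeServer, freshness: float) -> None:
        self.server = server
        self.freshness = freshness
        self.entry: Optional[CacheEntry] = None

    def read(self, now: float) -> bytes:
        e = self.entry
        if e is not None:
            if now - e.tc < self.freshness:
                return e.data               # fresh: no server contact
            if self.server.getattr() == e.tm:
                e.tc = now                  # revalidated: reset Tc
                return e.data
        data, tm = self.server.read()       # miss or stale: fetch block
        self.entry = CacheEntry(data, tc=now, tm=tm)
        return data


server = FakeServer()
client = Client(server, freshness=3.0)
first = client.read(now=101.0)     # cache miss, fetched from server
server.write(b"v2", now=102.5)     # another client updates the file
stale = client.read(now=103.0)     # still within freshness interval
fresh = client.read(now=105.0)     # interval expired, Tm differs, refetch
```

In this run the read at time 103.0 returns the old contents because the entry is still within its freshness interval. The read at 105.0 costs one validation call plus one block fetch.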

Performance:

If the freshness interval is set very short, or the file is modified frequently at the server, performance degrades noticeably. Each time the freshness check fails, the client sends a validation request to the server, which adds load on the server. The application must also wait for the validation round trip, and for the block transfer if the block has changed, which increases response time.

The penalty can be reduced by choosing the freshness interval carefully and by having the server return the current Tm(server) to the client together with the results of other requests, so that the client learns of changes without issuing a separate validation call.
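To get a feel for how the freshness interval affects server load, the sketch below counts how often a client would have to contact the server for one cached block. It assumes the block never changes at the server, so every contact is either the initial fetch or a successful validation. The function name and the access pattern are hypothetical.

```python
from typing import Iterable, Optional


def count_server_contacts(access_times: Iterable[float], freshness: float) -> int:
    """Count server contacts (initial fetch plus validations) for one block."""
    contacts = 0
    tc: Optional[float] = None        # time of last validation
    for now in access_times:
        if tc is None or now - tc >= freshness:
            contacts += 1             # fetch or validate at the server
            tc = now
    return contacts


# One access per second for a minute.
accesses = [float(t) for t in range(60)]
short_interval = count_server_contacts(accesses, freshness=1.0)
long_interval = count_server_contacts(accesses, freshness=30.0)
```

With a 1-second interval every one of the 60 accesses reaches the server, while a 30-second interval needs only 2 contacts. The price of the longer interval is that a change made at the server can go unnoticed for up to 30 seconds.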