TechLog: Google File System

Here is my understanding of Google File System from a YouTube video.

Google File System
- Provides a file system abstraction for scalable systems
- Google File System has a Namenode (GFS itself calls it the "master"; Namenode is the HDFS term for the same role)
- The Namenode works as the master server

- Files are divided into 64 MB chunks (why? Because large systems have a very high read load, and the regular case is a large sequential read. With 64 MB chunks, which are pretty big, a typical read stays within one chunk and the client rarely has to go back to the Namenode for more chunk locations)
- Due to the high frequency of hard disk failures, data needs to be replicated, so every chunk is stored on several chunk servers
- The locations of the chunks are stored by the Namenode, and this metadata fits entirely in memory
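
To make the chunk idea concrete, here is a minimal sketch of how a byte range maps to chunk indexes and how an in-memory location table could answer a lookup. The class `NamenodeMetadata`, its methods, the file name, and the server names are all made up for illustration; this is not the real GFS interface.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size mentioned above


def chunk_range(offset, length):
    """Return the chunk indexes touched by reading `length` (> 0) bytes at `offset`."""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))


class NamenodeMetadata:
    """Hypothetical in-memory table: (file name, chunk index) -> replica locations."""

    def __init__(self):
        self.locations = {}

    def add_chunk(self, filename, index, servers):
        self.locations[(filename, index)] = list(servers)

    def lookup(self, filename, offset, length):
        # One lookup returns every chunk the read needs; the data itself
        # would then be fetched from the chunk servers, not from the Namenode.
        return {i: self.locations[(filename, i)] for i in chunk_range(offset, length)}


meta = NamenodeMetadata()
meta.add_chunk("/logs/web.log", 0, ["cs1", "cs2", "cs3"])
meta.add_chunk("/logs/web.log", 1, ["cs2", "cs4", "cs5"])

# An 8 MB read starting at offset 60 MB crosses from chunk 0 into chunk 1.
result = meta.lookup("/logs/web.log", 60 * 1024 * 1024, 8 * 1024 * 1024)
print(result)
```

Because the table is a plain dictionary keyed by file name and chunk index, bigger chunks mean fewer entries, which is one reason the metadata can stay in memory.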

Writing Files
- Coordinated but not executed by the Namenode
- The Namenode chooses the primary replica to do the writes
- All writes to a chunk go through the same primary replica for as long as it holds the lease
- The Namenode grants a lease for mutations to the primary replica (the lease goes to the primary, not to the client)
- The Namenode does write-ahead logging for metadata changes; the file data itself does not pass through the Namenode
- Each chunk server sends an ack back to the primary replica server
- Once the data has been received at every chunk server, the primary replica server commands them to execute the write
- When the writes are done, acks are sent back to the primary replica, which then replies to the client (not to the Namenode)
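
The write steps above can be sketched as a tiny in-process simulation. This is only a toy model of my notes, not the real protocol: `ChunkServer`, `Namenode`, `grant_lease`, and `client_write` are hypothetical names, there is no network, no lease timeout, and no failure handling.

```python
class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data received but not yet applied
        self.chunk = []   # applied writes, in order

    def push(self, write_id, data):
        self.buffer[write_id] = data
        return True  # ack: data received

    def apply(self, write_id):
        self.chunk.append(self.buffer.pop(write_id))
        return True  # ack: write executed


class Namenode:
    def __init__(self, replicas):
        self.replicas = replicas
        self.primary = None

    def grant_lease(self):
        # Pick one replica as primary; later writes reuse the same primary.
        if self.primary is None:
            self.primary = self.replicas[0]
        secondaries = [r for r in self.replicas if r is not self.primary]
        return self.primary, secondaries


def client_write(namenode, write_id, data):
    # The Namenode only coordinates: it tells the client who the primary is.
    primary, secondaries = namenode.grant_lease()
    # 1. The data goes to every replica and each one acks receipt.
    if not all(s.push(write_id, data) for s in [primary] + secondaries):
        return False
    # 2. The primary executes the write and commands the secondaries to do so.
    acks = [primary.apply(write_id)] + [s.apply(write_id) for s in secondaries]
    # 3. The result goes back to the client, not to the Namenode.
    return all(acks)


servers = [ChunkServer(n) for n in ("cs1", "cs2", "cs3")]
namenode = Namenode(servers)
ok = client_write(namenode, "w1", b"hello")
print(ok, [s.chunk for s in servers])
```

Note that the Namenode object never sees `data` at all, which matches the "coordinated but not executed" point.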

Appending Records at the end of the file:
- Record append functionality is provided
- A record can be appended at the end of the file

Deleting Files
- Deleting files works as lazy deletes
- Files are marked as deleted and then garbage collected later
- Deletion is not considered a main operation
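
A minimal sketch of the lazy delete idea follows. The `Namespace` class and the `grace_seconds` parameter are assumptions made for this example; the notes above only say that files are marked first and collected later.

```python
class Namespace:
    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds  # assumed waiting time before cleanup
        self.files = {}    # live files: name -> chunk ids
        self.deleted = {}  # marked files: name -> (chunk ids, deletion time)

    def create(self, name, chunks):
        self.files[name] = list(chunks)

    def delete(self, name, now):
        # Cheap: only metadata changes, no chunk is touched yet.
        self.deleted[name] = (self.files.pop(name), now)

    def garbage_collect(self, now):
        """Forget marked files whose grace period has passed; return freed chunk ids."""
        freed = []
        for name, (chunks, deleted_at) in list(self.deleted.items()):
            if now - deleted_at >= self.grace_seconds:
                freed.extend(chunks)
                del self.deleted[name]
        return freed


ns = Namespace(grace_seconds=100)
ns.create("/tmp/old.log", ["c1", "c2"])
ns.delete("/tmp/old.log", now=0)
too_early = ns.garbage_collect(now=50)   # still inside the grace period
collected = ns.garbage_collect(now=100)  # grace period is over
print(too_early, collected)
```

The delete call itself does almost nothing, which fits the point that deletion is not treated as a main operation.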

GFS also puts some responsibility on clients / programs. Appending records may create multiple copies of a record in the file (for example when an append is retried), and those copies are returned to the client / program when it reads the file. The program reads the file, sees the duplicate / stale records, recognizes them, and handles the discrepancy. Consider this a trade-off between consistency and scalability.
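
One way a program could handle such records is sketched below, assuming the writer tags every record with a unique id and a checksum. The record layout and the helper names `make_record` and `read_records` are hypothetical and chosen only for this example.

```python
import zlib


def make_record(record_id, payload):
    """Writer side: tag each record with an id and a checksum."""
    return {"id": record_id, "payload": payload, "crc": zlib.crc32(payload)}


def read_records(raw_records):
    """Reader side: skip damaged records and ids that were already seen."""
    seen = set()
    for rec in raw_records:
        if zlib.crc32(rec["payload"]) != rec["crc"]:
            continue  # damaged or partial record
        if rec["id"] in seen:
            continue  # duplicate of a record already returned
        seen.add(rec["id"])
        yield rec["payload"]


a = make_record(1, b"login alice")
b = make_record(2, b"login bob")
damaged = dict(a, payload=b"garbage")  # checksum no longer matches the payload

# Record 2 appears twice, as it might after a repeated append.
file_contents = [a, damaged, b, b]
clean = list(read_records(file_contents))
print(clean)
```

The file system stays simple and the reader pays a small cost, which is the consistency versus scalability trade-off described above.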

For more information refer to this Google File System.

Saturday, December 25, 2010
