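This is an open source file system I once tried out. Why try it? The reasons are as follows:
1. Open source, under the Apache license <--- This matters a great deal for commercial use.
2. It is purely HIGH PERFORMANCE SCALABLE STORAGE: it exists to solve the storage problem!
3. Installation and configuration are very simple
4. It can be integrated with Hadoop <--- Nice!
Next, let's look at how it works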
KFS consists of three components:
- Meta server: This provides the global namespace for the filesystem. It keeps the directory structure in-memory in a B+ tree.
- Chunkserver: Files in KFS are split into _chunks_. Each chunk is 64MB in size. Chunks are replicated and striped across chunkservers.
- Client library: The client library is linked with applications. This enables applications to read/write files stored in KFS.
Terminology:
1. Chunk size is 64MB
2. File consists of a set of chunks
3. Each chunk is fixed in size
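Since every chunk has the same fixed size, finding the chunk that holds a given byte of a file is plain integer arithmetic. The following is a minimal sketch of that mapping; the function names `locate` and `chunk_count` are made up for illustration and are not part of the KFS client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the fixed chunk size stated above


def locate(offset):
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunk_count(file_size):
    """Number of chunks needed to hold file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division


if __name__ == "__main__":
    # Byte 5 of the second chunk of a file.
    print(locate(CHUNK_SIZE + 5))
    print(chunk_count(200 * 1024 * 1024))
```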
KFS System Architecture
• Single meta-data server that maintains the global namespace
• Multiple chunkservers that enable access to data
• Client library linked with applications for accessing files in KFS
• System implemented in C++
Meta-data Server
• Maintains the directory tree in-memory using a B+ tree
– Tree records the chunks that belong to a file and file attributes
– For each chunk
• Record the file offset/chunk version
– Meta-server logs mutations to the tree to a log
– Log is periodically rolled over (once every 10 mins)
– Offline process compacts logs to produce a checkpoint file
• Chunk locations are tracked by the metaserver in an in-core table
• Chunks are versioned to handle chunkserver failures
• Periodically sends heartbeats to chunkservers to determine load information as well as responsiveness
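The logging scheme in the list above (log every mutation, roll the log over, compact closed logs into a checkpoint offline) can be sketched roughly as follows. This is a toy model under several assumptions: a plain dict stands in for the B+ tree, mutations are JSON lines, and the names `MetaServerSketch`, `apply` and `compact` are invented here. The real KFS log and checkpoint formats are not described in this post.

```python
import json


def apply(tree, op):
    """Apply one logged mutation to a tree (dict: path -> list of [chunk_id, version])."""
    kind = op["op"]
    if kind == "create":
        tree[op["path"]] = []
    elif kind == "add_chunk":
        tree[op["path"]].append([op["chunk_id"], op["version"]])
    elif kind == "remove":
        tree.pop(op["path"], None)
    else:
        raise ValueError(f"unknown op {kind!r}")


class MetaServerSketch:
    """Hypothetical stand-in for the metaserver; a dict replaces the B+ tree."""

    def __init__(self):
        self.tree = {}
        self.log = []     # current log segment, one JSON line per mutation
        self.rolled = []  # closed segments waiting for offline compaction

    def mutate(self, op):
        self.log.append(json.dumps(op))  # record the mutation in the log
        apply(self.tree, op)             # then update the in-memory tree

    def roll_log(self):
        """Close the current segment (per the notes above, about every 10 minutes)."""
        if self.log:
            self.rolled.append(self.log)
            self.log = []


def compact(checkpoint, segments):
    """Offline step: replay closed segments onto a checkpoint, return the new checkpoint."""
    tree = json.loads(checkpoint)
    for segment in segments:
        for line in segment:
            apply(tree, json.loads(line))
    return json.dumps(tree, sort_keys=True)
```

Replaying a log onto a checkpoint is the same operation whether it runs offline to produce a newer checkpoint or at startup to rebuild the tree, which is why `compact` takes the checkpoint as an input rather than reading live server state.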
How Meta-data Server Crash Recovery is handled (clearly there is still room for improvement here!!!)
• Following crash, restart metaserver
• Rebuild tree using last checkpoint+log files
• Chunkservers connect back to the metaserver and send chunk information
– Metaserver rebuilds the chunk location information
– Metaserver identifies stale chunks and notifies appropriate chunkservers
• Meta-server is a single point of failure
– To protect the filesystem, back up logs/checkpoint files to remote nodes
– Will be addressed in a future release
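The step where chunkservers reconnect and the metaserver rebuilds chunk locations and flags stale chunks can be illustrated with a small sketch. It assumes that a chunk counts as stale when the metaserver has no record of it or when the reported version is older than the recorded one; the post only says that chunks are versioned, so the exact rule is an assumption, and `find_stale` and `rebuild_locations` are hypothetical names.

```python
def find_stale(expected_versions, reported):
    """expected_versions: chunk_id -> version recorded by the metaserver.
    reported: chunk_id -> version a chunkserver says it holds.
    Returns the sorted chunk ids that chunkserver should discard."""
    stale = []
    for chunk_id, version in reported.items():
        current = expected_versions.get(chunk_id)
        # Assumed rule: unknown chunk, or older than the recorded version.
        if current is None or version < current:
            stale.append(chunk_id)
    return sorted(stale)


def rebuild_locations(expected_versions, reports):
    """reports: server name -> {chunk_id: version}.
    Returns (chunk_id -> set of servers holding a good copy, server -> stale ids)."""
    locations = {}
    stale_by_server = {}
    for server, reported in reports.items():
        stale = find_stale(expected_versions, reported)
        stale_by_server[server] = stale
        for chunk_id in reported:
            if chunk_id not in stale:
                locations.setdefault(chunk_id, set()).add(server)
    return locations, stale_by_server
```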
Chunk Server
• Stores chunks as files in the underlying filesystem (such as XFS or ZFS)
– Chunk size is fixed at 64MB
• To handle disk corruptions,
– Adler-32 checksum is computed on 64K blocks
– Checksums are validated on each read
• Chunk file has a fixed length header (~5K) for storing checksums and other meta info
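To make the checksum scheme concrete, here is a minimal sketch that computes one Adler-32 value per 64K block and validates the blocks touched by a read. It uses `zlib.adler32` from the Python standard library; the helper names are invented, and the checksums are kept in a plain list rather than in the on-disk chunk header, whose layout this post does not describe.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64K checksum blocks, as stated above


def block_checksums(data):
    """Adler-32 of each 64K block of a chunk's data."""
    return [zlib.adler32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def verified_read(data, checksums, offset, length):
    """Return data[offset:offset+length] after validating every block the read touches."""
    end = min(offset + length, len(data))
    if end <= offset:
        return b""
    first = offset // BLOCK_SIZE
    last = (end - 1) // BLOCK_SIZE
    for block in range(first, last + 1):
        start = block * BLOCK_SIZE
        if zlib.adler32(data[start:start + BLOCK_SIZE]) != checksums[block]:
            raise IOError(f"checksum mismatch in block {block}")
    return data[offset:end]
```

Only the blocks overlapping the requested range are checked, so a small read pays for at most two block checksums.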
Recovery after a Chunk Server crash
• Following a crash, restart chunkserver
• Chunkserver scans the chunk directory to determine chunks it has
– Chunk filename identifies the owning file-id, chunk-id, chunk version
• Chunkserver connects to metaserver and tells it the chunks/versions it has
• Metaserver responds with stale chunk IDs (if any)
• Stale chunks are moved to lost+found
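The restart sequence above can be sketched as a directory scan followed by a move into lost+found. The sketch assumes chunk files are named `<file-id>.<chunk-id>.<version>` and that lost+found is a sub-directory of the chunk directory; the post says the filename identifies those three values but does not give the exact format or the lost+found location, so both are assumptions, as are the function names.

```python
import os
import shutil


def parse_chunk_name(name):
    """Split an assumed '<file-id>.<chunk-id>.<version>' filename into three ints."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"not a chunk filename: {name!r}")
    file_id, chunk_id, version = (int(p) for p in parts)
    return file_id, chunk_id, version


def scan_chunk_dir(chunk_dir):
    """Return {chunk_id: (version, filename)} for every chunk file in chunk_dir."""
    found = {}
    for name in sorted(os.listdir(chunk_dir)):
        path = os.path.join(chunk_dir, name)
        if not os.path.isfile(path):
            continue  # skips sub-directories such as lost+found
        try:
            _, chunk_id, version = parse_chunk_name(name)
        except ValueError:
            continue  # ignore files that are not chunks
        found[chunk_id] = (version, name)
    return found


def quarantine(chunk_dir, found, stale_ids):
    """Move the files for the given stale chunk ids into chunk_dir/lost+found."""
    lost = os.path.join(chunk_dir, "lost+found")
    os.makedirs(lost, exist_ok=True)
    for chunk_id in stale_ids:
        _, name = found[chunk_id]
        shutil.move(os.path.join(chunk_dir, name), os.path.join(lost, name))
```

In this sketch the result of `scan_chunk_dir` is what the chunkserver would report to the metaserver, and the list of stale ids passed to `quarantine` is what the metaserver would send back.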
Data Scrubbing on the Chunk Server
• Package contains a tool, chunkscrubber, that can be used to scrub chunks
– Scrubber verifies checksums and identifies corrupted blocks
• Support for periodic scrubbing will be added in a future release
– Scrubber will identify corrupted blocks and they will be moved to lost+found
– Metaserver will use re-replication to proactively recover lost chunks
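What a scrub pass does can be sketched in a few lines: walk every 64K block of a chunk, recompute its Adler-32, and collect the indices that no longer match. This is not the chunkscrubber tool itself, whose interface is not shown here; `scrub` is a hypothetical helper that takes the chunk data and its stored checksums as in-memory values.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64K checksum blocks


def scrub(data, stored_checksums):
    """Return the indices of blocks whose Adler-32 no longer matches the stored value."""
    bad = []
    for index, expected in enumerate(stored_checksums):
        block = data[index * BLOCK_SIZE:(index + 1) * BLOCK_SIZE]
        if zlib.adler32(block) != expected:
            bad.append(index)
    return bad
```

Unlike validation on read, a scrub visits blocks that no client has requested recently, which is how corruption in rarely read data would be noticed.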