Introduction to HDFS in Hadoop @ 克理斯在 Baning~

2008-11-03 02:06:01 | Views: 5,231 | Replies: 0

Overview

HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster consists primarily of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. The architecture of HDFS is described in detail in the HDFS architecture document. This user guide deals mainly with how users and administrators interact with HDFS clusters. The diagram in the HDFS architecture document depicts the basic interactions among the NameNode, the DataNodes, and the clients: clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes.
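To make the split between metadata and data a bit more concrete, here is a small toy model in Python. It is only a sketch and not how HDFS is actually implemented: the names `ToyNameNode`, `ToyDataNode`, `write_file`, and `read_file`, the 4-byte block size, and the round-robin placement are all assumptions made for illustration, and replication is left out entirely. The point it shows is that the NameNode only answers "which blocks, on which nodes", while the bytes themselves move between the client and the DataNodes.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # toy block size in bytes; real HDFS blocks are many megabytes


@dataclass
class ToyDataNode:
    """Stores raw block contents, keyed by block id."""
    blocks: dict = field(default_factory=dict)


@dataclass
class ToyNameNode:
    """Holds only metadata: which blocks make up a file and where each one lives."""
    files: dict = field(default_factory=dict)  # path -> [(block_id, datanode_index), ...]
    next_block_id: int = 0

    def allocate(self, path, num_blocks, num_datanodes):
        """Record a new file and pick a DataNode for each block (round-robin, hypothetical)."""
        placements = []
        for _ in range(num_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            placements.append((block_id, block_id % num_datanodes))
        self.files[path] = placements
        return placements

    def locate(self, path):
        """Return the block list for a file; no file data passes through here."""
        return self.files[path]


def write_file(namenode, datanodes, path, data):
    """Client write: ask the NameNode for placements, then send bytes to DataNodes."""
    chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placements = namenode.allocate(path, len(chunks), len(datanodes))
    for (block_id, dn_index), chunk in zip(placements, chunks):
        datanodes[dn_index].blocks[block_id] = chunk


def read_file(namenode, datanodes, path):
    """Client read: look up block locations, then fetch each block from its DataNode."""
    return b"".join(datanodes[dn].blocks[bid] for bid, dn in namenode.locate(path))
```

For example, writing an 11-byte file into a model with three DataNodes splits it into three blocks, one per node, and `read_file` puts them back together in order.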

The following are some of the salient features of HDFS that are likely to interest many users.

Hadoop, including HDFS, is well suited for distributed storage and distributed processing on commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. Map-Reduce, well known for its simplicity and its applicability to a large set of distributed applications, is an integral part of Hadoop.

HDFS is highly configurable, with a default configuration well suited to many installations. Most of the time, the configuration needs to be tuned only for very large clusters.

Hadoop is written in Java and is supported on all major platforms.
Hadoop supports shell-like commands to interact with HDFS directly.
The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.
New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
- File permissions and authentication.
- Rack awareness: takes a node's physical location into account when scheduling tasks and allocating storage.
- Safemode: an administrative mode for maintenance.
- fsck: a utility to diagnose the health of the file system and to find missing files or blocks.
- Rebalancer: a tool to balance the cluster when the data is unevenly distributed among DataNodes.
- Upgrade and rollback: after a software upgrade, it is possible to roll back to the state HDFS was in before the upgrade in case of unexpected problems.
- Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
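The checkpoint idea behind the Secondary NameNode can be sketched in a few lines of Python. This is a simplified illustration only: the dictionary-based namespace image, the tuple format of the log records, and the function names `apply_edit` and `checkpoint` are assumptions invented for this example, not the real on-disk formats or APIs. What it shows is the general mechanism: replay the log of modifications onto the last saved image of the namespace, save the result as the new image, and start over with an empty log so the log file does not grow without bound.

```python
def apply_edit(namespace, edit):
    """Apply one logged modification to the in-memory namespace (hypothetical record format)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]            # edit[2] is the file's metadata
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)  # edit[2] is the new path
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Merge the edit log into a copy of the image; return the new image and an empty log."""
    new_image = dict(image)                  # leave the previous image untouched
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


# Usage: three logged modifications are folded into the image and the log is reset.
old_image = {"/a": 1}
log = [("create", "/b", 3), ("rename", "/a", "/c"), ("delete", "/b")]
new_image, new_log = checkpoint(old_image, log)
```

Keeping the old image unchanged until the new one is complete is a deliberate choice in this sketch, so that a failed checkpoint would not damage the last good copy.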

#concept#hdfs#hadoop

Blog owner: 克理斯在 Internet!
