2008-11-03 02:06:01

An Introduction to HDFS in Hadoop


Source: http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html

Translator: Chris Lin (Hes Sin, Lin) -- Chris @ Internet

----------------------------------------------------------------------------------------------------

Purpose:

This document is a starting point for users working with the Hadoop Distributed File System (HDFS), either as part of a Hadoop cluster or as a stand-alone, general-purpose distributed file system. While HDFS is designed to "just work" in many environments, a working knowledge of HDFS helps greatly with tuning the configuration and diagnosing problems on a specific cluster.

Overview

HDFS is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. The architecture of HDFS is described in detail in the HDFS architecture document. This user guide primarily deals with how users and administrators interact with HDFS clusters. The diagram in the HDFS architecture document depicts the basic interactions among the NameNode, the DataNodes, and the clients. Clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes.
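The split between metadata and data described above can be sketched with a toy in-memory model. This is only an illustration, not the Hadoop API: the class names `NameNode`, `DataNode`, and `Client`, their methods, the round-robin placement, and the tiny block size are all assumptions made for the example, and replication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode (hypothetical): stores block contents keyed by block ID."""
    name: str
    blocks: dict = field(default_factory=dict)

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Toy NameNode (hypothetical): holds only metadata, never file contents."""
    # Maps a path to an ordered list of (block_id, datanode_name) pairs.
    files: dict = field(default_factory=dict)
    next_block_id: int = 0

    def allocate_block(self, path, datanode_name):
        # A file modification: record a new block for the file.
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(path, []).append((block_id, datanode_name))
        return block_id

    def get_block_locations(self, path):
        # A metadata lookup: which blocks make up the file, and where they live.
        return list(self.files[path])


class Client:
    """Toy client: asks the NameNode for metadata, moves bytes via DataNodes."""

    def __init__(self, namenode, datanodes, block_size=4):
        self.namenode = namenode
        self.datanodes = {dn.name: dn for dn in datanodes}
        self.block_size = block_size  # unrealistically small, for illustration

    def write(self, path, data):
        names = sorted(self.datanodes)
        for offset in range(0, len(data), self.block_size):
            # Round-robin placement is an assumption of this sketch.
            target = names[(offset // self.block_size) % len(names)]
            block_id = self.namenode.allocate_block(path, target)
            chunk = data[offset:offset + self.block_size]
            self.datanodes[target].write_block(block_id, chunk)

    def read(self, path):
        parts = []
        for block_id, dn_name in self.namenode.get_block_locations(path):
            # The bytes come straight from the DataNode, not the NameNode.
            parts.append(self.datanodes[dn_name].read_block(block_id))
        return b"".join(parts)
```

In this sketch, a `Client` built over one `NameNode` and two `DataNode` objects can `write("/a.txt", b"hello hdfs")` and read the same bytes back, while the `NameNode` itself only ever holds the list of block locations.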

The following are some of the salient features that are likely to interest many users.

Hadoop, including HDFS, is well suited for distributed storage and distributed processing on commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. Map-Reduce, well known for its simplicity and its applicability to a large set of distributed applications, is an integral part of Hadoop.

  • HDFS is highly configurable, with a default configuration that is well suited for many installations. Most of the time, the configuration needs to be tuned only for very large clusters.
  • Hadoop is written in Java and is supported on all major platforms.
  • Hadoop supports shell-like commands to interact with HDFS directly.
  • The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.
  • New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:
    • File permissions and authentication.
    • Rack awareness: takes a node's physical location into account when scheduling tasks and allocating storage.
    • Safemode: an administrative mode for maintenance.
    • fsck: a utility to diagnose the health of the file system and to find missing files or blocks.
    • Rebalancer: a tool to balance the cluster when data is unevenly distributed among DataNodes.
    • Upgrade and rollback: after a software upgrade, it is possible to roll back to the state HDFS was in before the upgrade in case of unexpected problems.
    • Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
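As one illustration of the features listed above, the checkpointing role of the Secondary NameNode can be sketched as folding a log of modifications into a saved namespace image and then emptying the log. The sketch below is a simplified model under stated assumptions: `ToyNamespace`, `apply_edits`, and `checkpoint` are hypothetical names, the namespace is reduced to a path-to-size mapping, and nothing here reflects the actual on-disk formats or the Hadoop API.

```python
def apply_edits(image, edits):
    """Return a new namespace image with the logged edits applied in order."""
    result = dict(image)
    for op in edits:
        if op[0] == "create":
            _, path, size = op
            result[path] = size
        elif op[0] == "delete":
            _, path = op
            result.pop(path, None)
    return result


class ToyNamespace:
    """Hypothetical stand-in for the NameNode's namespace state."""

    def __init__(self):
        self.image = {}      # namespace as of the last checkpoint: path -> size
        self.edit_log = []   # modifications recorded since that checkpoint

    def create(self, path, size):
        self.edit_log.append(("create", path, size))

    def delete(self, path):
        self.edit_log.append(("delete", path))

    def current_view(self):
        # The live namespace is the image plus every edit logged since.
        return apply_edits(self.image, self.edit_log)


def checkpoint(ns):
    """Merge the edit log into the image, then truncate the log."""
    ns.image = apply_edits(ns.image, ns.edit_log)
    ns.edit_log = []
```

The point of the sketch is that `current_view()` returns the same namespace before and after `checkpoint(ns)`, while the edit log shrinks back to empty, which is how periodic checkpoints keep the log from growing without bound.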

 
