# Hadoop Get Started

## Hadoop Get Started

### Installing Java

```
[ec2-user@ip-172-31-44-80 ~]$ java -version
java version "1.7.0_171"
OpenJDK Runtime Environment (amzn-2.6.13.0.76.amzn1-x86_64 u171-b01)
OpenJDK 64-Bit Server VM (build 24.171-b01, mixed mode)
```

### Creating Unix user accounts

To keep the Hadoop processes separate from one another, it is good practice to create dedicated Unix user accounts.

```
[ec2-user@ip-172-31-44-80 ~]$ sudo su
[root@ip-172-31-44-80 ec2-user]# groupadd hadoop
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop hadoop
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop hdfs
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop mapred
[root@ip-172-31-44-80 ec2-user]# useradd -g hadoop yarn
[root@ip-172-31-44-80 ec2-user]# ls -al /home/
total 28
drwxr-xr-x  7 root     root     4096 Apr  7 06:55 .
dr-xr-xr-x 25 root     root     4096 Apr  7 06:42 ..
drwx------  3 ec2-user ec2-user 4096 Apr  7 06:42 ec2-user
drwx------  2 hadoop   hadoop   4096 Apr  7 06:55 hadoop
drwx------  2 hdfs     hadoop   4096 Apr  7 06:48 hdfs
drwx------  2 mapred   hadoop   4096 Apr  7 06:48 mapred
drwx------  2 yarn     hadoop   4096 Apr  7 06:48 yarn
```

Set the passwords.

```
[ec2-user@ip-172-31-44-80 local]$ sudo passwd hadoop
Changing password for user hadoop.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd hdfs
Changing password for user hdfs.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd mapred
Changing password for user mapred.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[ec2-user@ip-172-31-44-80 local]$ sudo passwd yarn
Changing password for user yarn.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
```

Grant sudo privileges to the newly created users.

```
[ec2-user@ip-172-31-44-80 local]$ sudo visudo
[ec2-user@ip-172-31-44-80 local]$ sudo groupadd sudo
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo hadoop
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo hdfs
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo mapred
[ec2-user@ip-172-31-44-80 local]$ sudo usermod -G sudo yarn
```

When running visudo, add the following lines:

```
hadoop  ALL=(ALL)       ALL
hdfs    ALL=(ALL)       ALL
mapred  ALL=(ALL)       ALL
yarn    ALL=(ALL)       ALL
```

### Installing Hadoop

```
[ec2-user@ip-172-31-44-80 ~]$ cd /usr/local
[ec2-user@ip-172-31-44-80 local]$ sudo wget http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
--2018-04-07 07:00:46--  http://ftp.tsukuba.wide.ad.jp/software/apache/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
Resolving ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)... 203.178.132.80, 2001:200:0:7c06::9393
Connecting to ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)|203.178.132.80|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 244469481 (233M) [application/x-gzip]
Saving to: ‘hadoop-2.8.3.tar.gz.1’

hadoop-2.8.3.tar.gz.1                              100%[==============================================================================================================>] 233.14M  11.0MB/s    in 31s     

2018-04-07 07:01:17 (7.48 MB/s) - ‘hadoop-2.8.3.tar.gz.1’ saved [244469481/244469481]
[ec2-user@ip-172-31-44-80 local]$ sudo tar xzf hadoop-2.8.3.tar.gz
[ec2-user@ip-172-31-44-80 local]$ sudo chown -R hadoop:hadoop hadoop-2.8.3
[ec2-user@ip-172-31-44-80 ~]$ sudo vim /etc/bashrc
```

Add the following to /etc/bashrc:

```
export HADOOP_HOME=/usr/local/hadoop-2.8.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
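
To confirm that the two exports took effect in a given shell, a small check like the one below may help. This is a minimal sketch: `path_has_dir` is a hypothetical helper name (not part of Hadoop), and the fallback install path is simply the one used in these notes.

```shell
# Hypothetical helper (not part of Hadoop): succeed if directory $1 is an
# entry of the colon-separated list $2 (defaults to the current $PATH).
path_has_dir() {
  dir=$1
  list=${2:-$PATH}
  case ":$list:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report any Hadoop directory that is missing from PATH.
# HADOOP_HOME falls back to the install location used in these notes.
hadoop_home=${HADOOP_HOME:-/usr/local/hadoop-2.8.3}
for sub in bin sbin; do
  path_has_dir "$hadoop_home/$sub" || echo "not on PATH: $hadoop_home/$sub"
done
```

If the loop prints nothing, both `bin` and `sbin` are already on the search path.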

Reload the edited settings and confirm that Hadoop was installed.

```
[ec2-user@ip-172-31-44-80 ~]$ . ~/.bashrc
[ec2-user@ip-172-31-44-80 local]$ hadoop version
Hadoop 2.8.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65cb1d1436a2
Compiled by jdu on 2017-12-05T03:43Z
Compiled with protoc 2.5.0
From source with checksum 9ff4856d824e983fa510d3f843e3f19d
This command was run using /usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3.jar
```

### Configuring SSH

The hdfs and yarn users must be set up so that they can log in without a password from the machines in the cluster. When generating the SSH key, enter a passphrase such as Test1234.

```
[ec2-user@ip-172-31-44-80 local]$ su hdfs
Password: 
[hdfs@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hdfs/.ssh/id_rsa.
Your public key has been saved in /home/hdfs/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:b0iKwqpe1Vl6ziDjn1KC0M1HSnfKODL1od1AuI/7lAU hdfs@ip-172-31-44-80
The key's randomart image is:
+---[RSA 2048]----+
|      o.         |
|     + = .       |
|  . = @E*.       |
| . + O.*=.       |
|  . ++==So       |
| . .o+o=Bo       |
|  o...=o.oo      |
| ... oo ..       |
|=.    o+         |
+----[SHA256]-----+
[hdfs@ip-172-31-44-80 local]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hdfs@ip-172-31-44-80 local]$ exit
exit
[ec2-user@ip-172-31-44-80 local]$ su yarn
Password: 
[yarn@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Saving key "/home/yarn/.ssh/id_rsa" failed: passphrase is too short (minimum five characters)
[yarn@ip-172-31-44-80 local]$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/yarn/.ssh/id_rsa.
Your public key has been saved in /home/yarn/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:EgaMuDWE64ANH3u4mE6bXY5H8Sk699ah0x+Trg3FJQo yarn@ip-172-31-44-80
The key's randomart image is:
+---[RSA 2048]----+
| +.o.            |
|+ = ..           |
|.B =  oE   . .   |
|= = .o .. o o    |
|oo o  + S. o     |
|oo.  + + .. .    |
|o + * . +..+     |
| + = + + o+ o    |
|    + o...o+     |
+----[SHA256]-----+
[yarn@ip-172-31-44-80 local]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```
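
Running `cat ... >> ~/.ssh/authorized_keys` more than once appends the same key again, and sshd's `StrictModes` check can reject keys when the files are writable by other users. The sketch below makes the step repeatable and tightens the permissions. `authorize_key` is a hypothetical helper name, and the 700/600 modes are a common convention rather than something these notes prescribe.

```shell
# Hypothetical helper: append the public key in file $1 to
# $2/authorized_keys unless an identical line is already there, then
# restrict permissions on the directory and the file.
authorize_key() {
  pub_key_file=$1
  ssh_dir=$2
  auth_file="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  touch "$auth_file"

  # -x: match whole lines, -F: fixed strings, -f: read patterns from file.
  if ! grep -qxFf "$pub_key_file" "$auth_file"; then
    cat "$pub_key_file" >> "$auth_file"
  fi

  chmod 700 "$ssh_dir"
  chmod 600 "$auth_file"
}

# Usage sketch, run as the hdfs or yarn user:
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh
```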

Use ssh-agent so that SSH logins work.

```
[ec2-user@ip-172-31-44-80 local]$ su hdfs
Password: 
[hdfs@ip-172-31-44-80 local]$ eval `ssh-agent`
Agent pid 23016
[hdfs@ip-172-31-44-80 local]$ ssh-add ~/.ssh/id_rsa
Enter passphrase for /home/hdfs/.ssh/id_rsa: 
Identity added: /home/hdfs/.ssh/id_rsa (/home/hdfs/.ssh/id_rsa)
[hdfs@ip-172-31-44-80 local]$ exit
exit
[ec2-user@ip-172-31-44-80 local]$ su yarn
Password: 
[yarn@ip-172-31-44-80 local]$ eval `ssh-agent`
Agent pid 23037
[yarn@ip-172-31-44-80 local]$ ssh-add ~/.ssh/id_rsa
Enter passphrase for /home/yarn/.ssh/id_rsa: 
Identity added: /home/yarn/.ssh/id_rsa (/home/yarn/.ssh/id_rsa)
```

### Configuring Hadoop

```
[hdfs@ip-172-31-44-80 ~]$ cd $HADOOP_HOME/sbin
[hdfs@ip-172-31-44-80 sbin]$ ls
distribute-exclude.sh  hdfs-config.cmd  kms.sh                   slaves.sh      start-balancer.sh  start-secure-dns.sh  stop-all.cmd      stop-dfs.cmd        stop-yarn.cmd   yarn-daemons.sh
hadoop-daemon.sh       hdfs-config.sh   mr-jobhistory-daemon.sh  start-all.cmd  start-dfs.cmd      start-yarn.cmd       stop-all.sh       stop-dfs.sh         stop-yarn.sh
hadoop-daemons.sh      httpfs.sh        refresh-namenodes.sh     start-all.sh   start-dfs.sh       start-yarn.sh        stop-balancer.sh  stop-secure-dns.sh  yarn-daemon.sh
```

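### Formatting the HDFS filesystem

A new HDFS installation must be formatted before it is used.\
The namenode manages all of the filesystem's metadata, and datanodes can join or leave the cluster dynamically, so datanodes are not involved in the formatting process.\
The size of the filesystem is determined by the number of datanodes in the cluster, so it does not need to be decided at format time.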

```
[ec2-user@ip-172-31-44-80 ~]$ su hdfs
Password: 
[hdfs@ip-172-31-44-80 ec2-user]$ hdfs namenode -format
18/04/07 08:03:50 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   user = hdfs
STARTUP_MSG:   host = ip-172-31-44-80.us-west-2.compute.internal/172.31.44.80
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.8.3
STARTUP_MSG:   classpath = /usr/local/hadoop-2.8.3/etc/hadoop:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jsch-0.1.54.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/hadoop-annotations-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/nimbus-jose-jwt-3.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/httpclient-4.5.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/json-smart-1.1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib
/curator-client-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jetty-sslengine-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/hadoop-auth-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/htrace-core4-4.0.1-incubating.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/netty-3.6.2.Fi
nal.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/httpcore-4.4.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/jcip-annotations-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-nfs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/common/hadoop-common-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/okio-1.4.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/hadoop-hdfs-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/okhttp-2.4.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib
/jsr305-3.0.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/htrace-core4-4.0.1-incubating.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-native-client-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-native-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-nfs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/hdfs/hadoop-hdfs-client-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/le
veldbjni-all-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/curator-test-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/javassist-3.18.1-GA.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/curator-client-2.7.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/json-io-2.5.1.jar:/usr/local/hadoop-2.8.3/sha
re/hadoop/yarn/lib/fst-2.50.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/commons-math-2.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/java-util-1.9.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-registry-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-tests-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-client-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-api-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/hadoop-annotations-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapre
duce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3-tests.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.8.3.jar:/usr/local/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.8.3.jar:/usr/local/hadoop-2.8.3/contrib/ca
pacity-scheduler/*.jar
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65cb1d1436a2; compiled by 'jdu' on 2017-12-05T03:43Z
STARTUP_MSG:   java = 1.7.0_171
************************************************************/
18/04/07 08:03:50 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/04/07 08:03:50 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-e3ff70d9-024d-4a8c-b199-7a39526f4ee6
18/04/07 08:03:51 INFO namenode.FSEditLog: Edit logging is async:true
18/04/07 08:03:51 INFO namenode.FSNamesystem: KeyProvider: null
18/04/07 08:03:51 INFO namenode.FSNamesystem: fsLock is fair: true
18/04/07 08:03:51 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/04/07 08:03:51 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/04/07 08:03:51 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/04/07 08:03:51 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/04/07 08:03:51 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Apr 07 08:03:51
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map BlocksMap
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^21 = 2097152 entries
18/04/07 08:03:51 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/04/07 08:03:51 INFO blockmanagement.BlockManager: defaultReplication         = 3
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxReplication             = 512
18/04/07 08:03:51 INFO blockmanagement.BlockManager: minReplication             = 1
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
18/04/07 08:03:51 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/04/07 08:03:51 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
18/04/07 08:03:51 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
18/04/07 08:03:51 INFO namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
18/04/07 08:03:51 INFO namenode.FSNamesystem: supergroup          = supergroup
18/04/07 08:03:51 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/04/07 08:03:51 INFO namenode.FSNamesystem: HA Enabled: false
18/04/07 08:03:51 INFO namenode.FSNamesystem: Append Enabled: true
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map INodeMap
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^20 = 1048576 entries
18/04/07 08:03:51 INFO namenode.FSDirectory: ACLs enabled? false
18/04/07 08:03:51 INFO namenode.FSDirectory: XAttrs enabled? true
18/04/07 08:03:51 INFO namenode.NameNode: Caching file names occurring more than 10 times
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map cachedBlocks
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^18 = 262144 entries
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/04/07 08:03:51 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/04/07 08:03:51 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/04/07 08:03:51 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/04/07 08:03:51 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/04/07 08:03:51 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/04/07 08:03:51 INFO util.GSet: VM type       = 64-bit
18/04/07 08:03:51 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
18/04/07 08:03:51 INFO util.GSet: capacity      = 2^15 = 32768 entries
18/04/07 08:03:51 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1797533508-172.31.44.80-1523088231605
18/04/07 08:03:51 INFO common.Storage: Storage directory /tmp/hadoop-hdfs/dfs/name has been successfully formatted.
18/04/07 08:03:51 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/hadoop-hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/04/07 08:03:51 INFO namenode.FSImageFormatProtobuf: Image file /tmp/hadoop-hdfs/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/04/07 08:03:51 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/04/07 08:03:51 INFO util.ExitUtil: Exiting with status 0
18/04/07 08:03:51 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-172-31-44-80.us-west-2.compute.internal/172.31.44.80
************************************************************/
```
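
Formatting is meant for a new installation. Running it again on a storage directory that already holds metadata would discard that metadata. A guard such as the following can make the step safer to repeat. It is a minimal sketch: `namenode_is_formatted` is a hypothetical helper name, and treating the presence of `current/VERSION` as the sign of a formatted directory is an assumption of this sketch. The path in the usage line is the storage directory reported in the log above.

```shell
# Hypothetical helper: succeed if the namenode storage directory $1 already
# looks formatted, judged by the presence of current/VERSION (an assumption
# of this sketch).
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage sketch (left commented out because it needs a Hadoop install):
# namenode_is_formatted /tmp/hadoop-hdfs/dfs/name || hdfs namenode -format
```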

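### Starting and stopping the daemons

Start the HDFS daemons with start-dfs.sh.

* The script starts a namenode on each machine returned by running `hdfs getconf -namenodes`.
* It starts a datanode on each machine listed in the slaves file.
* It starts a secondary namenode on each machine returned by running `hdfs getconf -secondaryNameNodes`.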
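
The three steps above can be sketched as a dry run that only prints which daemon would start on which host. This is not the real script: `plan_dfs_start` is a hypothetical helper name, the host lists are passed in by hand instead of being read from `hdfs getconf`, and skipping comments and blank lines in the slaves file is an assumption of the sketch.

```shell
# Hypothetical dry run: print "<host> <daemon>" for each daemon that the
# three steps would launch.
#   $1: space-separated namenode hosts (as from `hdfs getconf -namenodes`)
#   $2: path to the slaves file
#   $3: space-separated secondary namenode hosts
plan_dfs_start() {
  for host in $1; do
    echo "$host namenode"
  done

  # Drop comments and empty lines, then emit one datanode per listed host.
  sed 's/#.*$//;/^[[:space:]]*$/d' "$2" | while read -r host; do
    echo "$host datanode"
  done

  for host in $3; do
    echo "$host secondarynamenode"
  done
}

# Usage sketch with the values seen on this single-node setup:
# plan_dfs_start "localhost" "$HADOOP_HOME/etc/hadoop/slaves" "0.0.0.0"
```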

```
[ec2-user@ip-172-31-44-80 sbin]$ su hdfs
[hdfs@ip-172-31-44-80 sbin]$ sudo mkdir /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ sudo chmod 775 /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ sudo chown -Rf hadoop:hadoop /usr/local/hadoop-2.8.3/logs/
[hdfs@ip-172-31-44-80 sbin]$ ls -al /usr/local/hadoop-2.8.3
total 160
drwxrwxr-x 10 hadoop hadoop  4096 Apr  7 08:36 .
drwxr-xr-x 13 root   root    4096 Apr  7 07:04 ..
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 bin
drwxr-xr-x  3 hadoop hadoop  4096 Dec  5 04:28 etc
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 include
drwxr-xr-x  3 hadoop hadoop  4096 Dec  5 04:28 lib
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 libexec
-rw-r--r--  1 hadoop hadoop 99253 Dec  5 04:28 LICENSE.txt
drwxrwxr-x  2 hadoop hadoop  4096 Apr  7 08:36 logs
-rw-r--r--  1 hadoop hadoop 15915 Dec  5 04:28 NOTICE.txt
-rw-r--r--  1 hadoop hadoop  1366 Dec  5 04:28 README.txt
drwxr-xr-x  2 hadoop hadoop  4096 Dec  5 04:28 sbin
drwxr-xr-x  4 hadoop hadoop  4096 Dec  5 04:28 share
```

Edit core-site.xml.

```
[hdfs@ip-172-31-44-80 ec2-user]$ sudo vim /usr/local/hadoop-2.8.3/etc/hadoop/core-site.xml
```

core-site.xml

```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```

Run start-dfs.sh.

```
[hdfs@ip-172-31-44-80 ec2-user]$ start-dfs.sh
Starting namenodes on [localhost]
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
localhost: starting namenode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-namenode-ip-172-31-44-80.out
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
localhost: starting datanode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-datanode-ip-172-31-44-80.out
Starting secondary namenodes [0.0.0.0]
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
Enter passphrase for key '/home/hdfs/.ssh/id_rsa': 
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.8.3/logs/hadoop-hdfs-secondarynamenode-ip-172-31-44-80.out
```

Get the namenode information.

```
[hdfs@ip-172-31-44-80 sbin]$ hdfs getconf -namenodes
localhost
```

Get the secondary namenode information.

```
[hdfs@ip-172-31-44-80 sbin]$ hdfs getconf -secondaryNameNodes
0.0.0.0
```

The slaves file is shown below.

```
[hdfs@ip-172-31-44-80 hadoop-2.8.3]$ pwd
/usr/local/hadoop-2.8.3
[hdfs@ip-172-31-44-80 hadoop-2.8.3]$ cat etc/hadoop/slaves 
localhost
```

Start the YARN daemons with start-yarn.sh.

* The script starts a resource manager on the local machine.
* It starts a node manager on each machine listed in the slaves file.

```
[hdfs@ip-172-31-44-80 sbin]$ su yarn
Password: 
[yarn@ip-172-31-44-80 sbin]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.8.3/logs/yarn-yarn-resourcemanager-ip-172-31-44-80.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JKwwVSwxYDwyPu0fSeyRd7+/TEDDw9JZxSQQSMjhCr8.
ECDSA key fingerprint is MD5:68:09:14:01:ae:9f:14:5f:ec:79:bd:f9:c8:93:9e:ce.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Enter passphrase for key '/home/yarn/.ssh/id_rsa': 
localhost: starting nodemanager, logging to /usr/local/hadoop-2.8.3/logs/yarn-yarn-nodemanager-ip-172-31-44-80.out

```

Start the job history server, which is the MapReduce daemon.

```
[yarn@ip-172-31-44-80 sbin]$ su mapred
Password: 
[mapred@ip-172-31-44-80 sbin]$ mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /usr/local/hadoop-2.8.3/logs/mapred-mapred-historyserver-ip-172-31-44-80.out
```
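
Once all three scripts have run, it can be useful to check that every expected daemon is actually up. The sketch below reads `jps`-style output ("<pid> <class name>" per line) on standard input and prints the daemons that are missing. `missing_daemons` is a hypothetical helper name, and the list of class names reflects this single-node setup. Because the daemons here run as different users, `jps` would probably need to be run as root to see all of them, which is also an assumption.

```shell
# Hypothetical helper: read `jps`-style lines ("<pid> <ClassName>") on
# stdin and print each expected daemon that does not appear.
missing_daemons() {
  running=$(awk '{print $2}')
  for daemon in NameNode DataNode SecondaryNameNode \
                ResourceManager NodeManager JobHistoryServer; do
    # -x matches whole lines, so NameNode does not match SecondaryNameNode.
    echo "$running" | grep -qx "$daemon" || echo "$daemon"
  done
}

# Usage sketch (needs a JDK that provides jps):
# sudo jps | missing_daemons
```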

Once the Hadoop cluster is up and running, give users access to it.

```
[ec2-user@ip-172-31-44-80 ~]$ sudo su
[root@ip-172-31-44-80 ec2-user]# hadoop fs -mkdir -p /user/ec2-user
[root@ip-172-31-44-80 ec2-user]# hadoop fs -chown ec2-user:ec2-user /user/ec2-user/
```

It is also a good idea to set a space quota on the user directory. Command: `hdfs dfsadmin -setSpaceQuota 1t /user/ec2-user/`

Run the benchmarks.
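
An HDFS space quota counts every replica of a block, so the amount of file data that fits under a quota depends on the replication factor. The arithmetic can be sketched as follows. `usable_bytes` is a hypothetical helper name, and the replication factor of 3 is taken from the `defaultReplication` value in the format log.

```shell
# Hypothetical helper: bytes of file data that fit under a space quota when
# each replica counts against it. $1: quota in bytes, $2: replication factor.
usable_bytes() {
  echo $(( $1 / $2 ))
}

# 1t = 1024^4 bytes; with replication 3 this prints 366503875925,
# that is, roughly 341 GiB of file data.
usable_bytes $(( 1024 * 1024 * 1024 * 1024 )) 3
```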

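## Hadoop configuration

Files that control the configuration of a Hadoop installation:

* hadoop-env.sh
  * Environment variables used by the scripts that run Hadoop
* mapred-env.sh
  * Environment variables used by the scripts that run MapReduce. Overrides hadoop-env.sh.
* yarn-env.sh
  * Environment variables used by the scripts that run YARN. Overrides hadoop-env.sh.
* core-site.xml
  * Settings common to HDFS, MapReduce, and YARN, such as I/O settings
* hdfs-site.xml
  * Settings for the HDFS daemons (namenode, secondary namenode, and datanodes)
* mapred-site.xml
  * Settings for the MapReduce daemon (job history server)
* yarn-site.xml
  * Settings for the YARN daemons (resource manager, web app proxy server, and node managers)
* slaves
  * List of the machines that run a datanode and a node manager
* hadoop-metrics2.properties
  * How Hadoop publishes metrics
* log4j.properties
  * Settings for the system log files, the namenode audit log, and the task logs of the task JVM processes
* hadoop-policy.xml
  * Access control list settings for running Hadoop in secure mode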

```
$ ls /usr/local/hadoop-2.8.3/etc/hadoop
capacity-scheduler.xml  hadoop-env.sh               httpfs-env.sh            kms-env.sh            mapred-env.sh               ssl-server.xml.example
configuration.xsl       hadoop-metrics2.properties  httpfs-log4j.properties  kms-log4j.properties  mapred-queues.xml.template  yarn-env.cmd
container-executor.cfg  hadoop-metrics.properties   httpfs-signature.secret  kms-site.xml          mapred-site.xml.template    yarn-env.sh
core-site.xml           hadoop-policy.xml           httpfs-site.xml          log4j.properties      slaves                      yarn-site.xml
hadoop-env.cmd          hdfs-site.xml               kms-acls.xml             mapred-env.cmd        ssl-client.xml.example
```
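A quick way to compare a configuration directory against the file list above is to loop over the expected names and report the ones that are absent. The sketch below assumes the directory path from the listing as its default. `missing_conf_files` is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/sh
# Hypothetical helper: prints each expected configuration file that is
# absent from the given directory. The names come from the list above.
missing_conf_files() {
  dir=$1
  for f in hadoop-env.sh mapred-env.sh yarn-env.sh core-site.xml \
           hdfs-site.xml mapred-site.xml yarn-site.xml slaves \
           hadoop-metrics2.properties log4j.properties hadoop-policy.xml
  do
    [ -e "$dir/$f" ] || echo "$f"
  done
}

# Assumed default path, taken from the listing above.
missing_conf_files "${HADOOP_CONF_DIR:-/usr/local/hadoop-2.8.3/etc/hadoop}"
```

Run against the directory listed above, a check like this would report `mapred-site.xml`, because only `mapred-site.xml.template` is present there.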

## Benchmarking a Hadoop cluster

The benchmarks are packaged in the tests JAR file.

* How to list the available benchmarks

```
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar
An example program must be given as the first argument.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  DistributedFSCheck: Distributed checkup of the file system consistency.
  JHLogAnalyzer: Job History Log analyzer.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  NNdataGenerator: Generate the data to be used by NNloadGenerator
  NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR
  NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job
  NNstructureGenerator: Generate the structure to be used by NNdataGenerator
  SliveTest: HDFS Stress Test and Live Data Verification.
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
  largesorter: Large-Sort tester
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  minicluster: Single process HDFS and MR cluster.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timlineserver performance.
```

* How to check the usage of a benchmark

```
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-*-tests.jar \
> TestDFSIO
18/04/15 04:24:01 INFO fs.TestDFSIO: TestDFSIO.1.8
Missing arguments.
Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -truncate | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
```

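### Benchmarking MapReduce with TeraSort

* How to generate terabyte-scale data using 1,000 maps (the numeric argument to `teragen` is a row count, and each row is 100 bytes)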

```
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
teragen -Dmapreduce.job.maps=1000 10t random-data
```
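Because `teragen` takes a number of rows rather than a size in bytes, it helps to work out the resulting data volume before starting a large run. The sketch below is a hypothetical helper, not part of Hadoop. It assumes 100-byte rows and decimal suffixes (`k`, `m`, `b`, `t`); only the `m` suffix is confirmed by the run log further down, where `10m` is reported as 10000000 rows.

```shell
#!/bin/sh
# Hypothetical helper: converts a teragen row-count argument such as
# "10m" into the number of bytes it would generate (rows x 100).
# Assumption: suffixes are decimal (k=10^3, m=10^6, b=10^9, t=10^12).
teragen_bytes() {
  spec=$1
  case $spec in
    *k) mult=1000 ;;
    *m) mult=1000000 ;;
    *b) mult=1000000000 ;;
    *t) mult=1000000000000 ;;
    *)  mult=1 ;;
  esac
  num=${spec%[kmbt]}                 # strip the suffix, if any
  echo $(( num * mult * 100 ))
}

teragen_bytes 10m    # 10 million rows
teragen_bytes 10t    # 10 trillion rows
```

By this arithmetic `10m` comes to 1,000,000,000 bytes (about 1 GB), while `10t` comes to roughly 1 PB. A data set of about one terabyte would correspond to ten billion rows (`10b`, if the suffix assumption holds).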

* Run terasort

```
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
terasort random-data sorted-data

```

* Sanity check: validate that the output is sorted with teravalidate

```
$ hadoop jar \
$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
teravalidate sorted-data report
```
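The three steps above can be chained so that each one runs only if the previous one succeeded. This is a minimal sketch: `run` and `terasort_pipeline` are hypothetical helper names, the default `HADOOP_HOME` is an assumption based on the install path used in these notes, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Hypothetical helper: generate, sort, then validate. The && chain stops
# at the first step that fails.
# Arguments: examples jar, row count, input dir, output dir.
terasort_pipeline() {
  jar=$1 rows=$2 input=$3 output=$4
  run hadoop jar "$jar" teragen "$rows" "$input" &&
  run hadoop jar "$jar" terasort "$input" "$output" &&
  run hadoop jar "$jar" teravalidate "$output" report
}

# The glob is left unquoted on purpose so that the shell resolves the
# versioned jar file name when it exists.
terasort_pipeline \
  "${HADOOP_HOME:-/usr/local/hadoop-2.8.3}"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  10m random-data sorted-data
```

The `-Dmapreduce.job.maps=1000` option is omitted here to keep the sketch short; it would go right after `teragen`, as in the command above.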

Results of an actual run with `10m` (10 million rows) instead:

Data generation

```
[mapred@ip-172-31-44-80 ec2-user]$ HADOOP_USER_NAME=hdfs JAVA_HOME=/usr/lib/jvm/jre /usr/local/hadoop-2.8.3/bin/hadoop fs -chown mapred:hadoop /
[mapred@ip-172-31-44-80 ec2-user]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teragen -Dmapreduce.job.maps=1000 10m random-data
18/04/15 04:44:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 04:44:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 04:44:28 INFO terasort.TeraGen: Generating 10000000 using 1
18/04/15 04:44:28 INFO mapreduce.JobSubmitter: number of splits:1
18/04/15 04:44:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local339502331_0001
18/04/15 04:44:29 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 04:44:29 INFO mapreduce.Job: Running job: job_local339502331_0001
18/04/15 04:44:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 04:44:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 04:44:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 04:44:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 04:44:29 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 04:44:29 INFO mapred.LocalJobRunner: Starting task: attempt_local339502331_0001_m_000000_0
18/04/15 04:44:29 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 04:44:29 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 04:44:29 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 04:44:29 INFO mapred.MapTask: Processing split: org.apache.hadoop.examples.terasort.TeraGen$RangeInputFormat$RangeInputSplit@756a69a3
18/04/15 04:44:30 INFO mapreduce.Job: Job job_local339502331_0001 running in uber mode : false
18/04/15 04:44:30 INFO mapreduce.Job:  map 0% reduce 0%
18/04/15 04:44:41 INFO mapred.LocalJobRunner: 
18/04/15 04:44:41 INFO mapred.Task: Task:attempt_local339502331_0001_m_000000_0 is done. And is in the process of committing
18/04/15 04:44:41 INFO mapred.LocalJobRunner: 
18/04/15 04:44:41 INFO mapred.Task: Task attempt_local339502331_0001_m_000000_0 is allowed to commit now
18/04/15 04:44:41 INFO output.FileOutputCommitter: Saved output of task 'attempt_local339502331_0001_m_000000_0' to hdfs://localhost:8020/user/mapred/random-data/_temporary/0/task_local339502331_0001_m_000000
18/04/15 04:44:41 INFO mapred.LocalJobRunner: map
18/04/15 04:44:41 INFO mapred.Task: Task 'attempt_local339502331_0001_m_000000_0' done.
18/04/15 04:44:41 INFO mapred.Task: Final Counters for attempt_local339502331_0001_m_000000_0: Counters: 21
	File System Counters
		FILE: Number of bytes read=302059
		FILE: Number of bytes written=672763
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=0
		HDFS: Number of bytes written=1000000000
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		Input split bytes=82
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=96
		Total committed heap usage (bytes)=137363456
	org.apache.hadoop.examples.terasort.TeraGen$Counters
		CHECKSUM=21472776955442690
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=1000000000
18/04/15 04:44:41 INFO mapred.LocalJobRunner: Finishing task: attempt_local339502331_0001_m_000000_0
18/04/15 04:44:41 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 04:44:42 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 04:44:42 INFO mapreduce.Job: Job job_local339502331_0001 completed successfully
18/04/15 04:44:42 INFO mapreduce.Job: Counters: 21
	File System Counters
		FILE: Number of bytes read=302059
		FILE: Number of bytes written=672763
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=0
		HDFS: Number of bytes written=1000000000
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		Input split bytes=82
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=96
		Total committed heap usage (bytes)=137363456
	org.apache.hadoop.examples.terasort.TeraGen$Counters
		CHECKSUM=21472776955442690
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=1000000000
```
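Job output like the above is long, so it can help to pull a single counter out of a saved log instead of reading it by eye. The sketch below uses a hypothetical helper, `counter_value`, which prints the last value of a named counter. It assumes the log was captured to a file first (Hadoop's client log typically goes to stderr, so something like `2> teragen.log` would be needed).

```shell
#!/bin/sh
# Hypothetical helper: prints the last value of a named counter found in
# job output read from stdin, e.g. "HDFS: Number of bytes written".
counter_value() {
  awk -v name="$1" '
    index($0, name "=") {               # line contains "<name>="
      v = $0
      sub(/^.*=/, "", v)                # keep only the text after "="
      last = v
    }
    END { if (last != "") print last }'
}

# Example with two lines copied from the output above; a saved log works
# the same way:  counter_value "Map output records" < teragen.log
printf '\t\tHDFS: Number of bytes written=1000000000\n\t\tMap output records=10000000\n' |
  counter_value "HDFS: Number of bytes written"
```

Taking the last occurrence matters because the same counter appears twice in the output: once in the per-task summary and once in the job summary.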

Run terasort

```
[mapred@ip-172-31-44-80 ~]$ hadoop jar \
> $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
> terasort random-data sorted-data
18/04/15 05:01:57 INFO terasort.TeraSort: starting
18/04/15 05:01:58 INFO input.FileInputFormat: Total input files to process : 1
Spent 79ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 81ms
Sampling 8 splits of 8
Making 1 from 100000 sampled records
Computing parititions took 707ms
Spent 790ms computing partitions.
18/04/15 05:01:59 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 05:01:59 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 05:01:59 INFO mapreduce.JobSubmitter: number of splits:8
18/04/15 05:01:59 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local2107661221_0001
18/04/15 05:02:00 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-mapred/mapred/local/1523768520025/_partition.lst <- /home/mapred/_partition.lst
18/04/15 05:02:00 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:8020/user/mapred/sorted-data/_partition.lst as file:/tmp/hadoop-mapred/mapred/local/1523768520025/_partition.lst
18/04/15 05:02:00 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 05:02:00 INFO mapreduce.Job: Running job: job_local2107661221_0001
18/04/15 05:02:00 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 05:02:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:00 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 05:02:00 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 05:02:00 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:00 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:00 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:00 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:00 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:0+134217728
18/04/15 05:02:00 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:00 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:00 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:00 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:00 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:00 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:01 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:01 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:01 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:01 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:01 INFO mapreduce.Job: Job job_local2107661221_0001 running in uber mode : false
18/04/15 05:02:01 INFO mapreduce.Job:  map 0% reduce 0%
18/04/15 05:02:03 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:03 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:03 INFO mapred.LocalJobRunner: 
18/04/15 05:02:03 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:03 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:03 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
18/04/15 05:02:03 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
18/04/15 05:02:05 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:05 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:05 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586498 bytes
18/04/15 05:02:06 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000000_0 is done. And is in the process of committing
18/04/15 05:02:06 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:06 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000000_0' done.
18/04/15 05:02:06 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000000_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=139889898
		FILE: Number of bytes written=279851724
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=144217800
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=27
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342178
		Map output records=1342178
		Map output bytes=136902156
		Map output materialized bytes=139586518
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684356
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=46
		Total committed heap usage (bytes)=355991552
	File Input Format Counters 
		Bytes Read=134217800
18/04/15 05:02:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:06 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:06 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:06 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:06 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:06 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:134217728+134217728
18/04/15 05:02:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:06 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:07 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:07 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:07 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:07 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:07 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:09 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:09 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:09 INFO mapred.LocalJobRunner: 
18/04/15 05:02:09 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:09 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:09 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:09 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:10 INFO mapreduce.Job:  map 13% reduce 0%
18/04/15 05:02:10 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:10 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:10 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:12 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000001_0 is done. And is in the process of committing
18/04/15 05:02:12 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:12 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000001_0' done.
18/04/15 05:02:12 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000001_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=279477325
		FILE: Number of bytes written=559024590
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=278435500
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=29
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342177
		Map output records=1342177
		Map output bytes=136902054
		Map output materialized bytes=139586414
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684354
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=226
		Total committed heap usage (bytes)=423100416
	File Input Format Counters 
		Bytes Read=134217700
18/04/15 05:02:12 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:12 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:12 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:12 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:12 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:12 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:268435456+134217728
18/04/15 05:02:12 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:12 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:12 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:12 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:12 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:12 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:12 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:12 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:12 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:12 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:13 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:14 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:14 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:14 INFO mapred.LocalJobRunner: 
18/04/15 05:02:14 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:14 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:14 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:14 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:15 INFO mapreduce.Job:  map 25% reduce 0%
18/04/15 05:02:16 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:16 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:16 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:17 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000002_0 is done. And is in the process of committing
18/04/15 05:02:17 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:17 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000002_0' done.
18/04/15 05:02:17 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000002_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=419064752
		FILE: Number of bytes written=838197456
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=412653200
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=31
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342177
		Map output records=1342177
		Map output bytes=136902054
		Map output materialized bytes=139586414
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684354
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=83
		Total committed heap usage (bytes)=413663232
	File Input Format Counters 
		Bytes Read=134217700
18/04/15 05:02:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:17 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:17 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:17 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:17 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:17 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:402653184+134217728
18/04/15 05:02:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:17 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:18 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:18 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:18 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:18 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:18 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:19 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:19 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:20 INFO mapred.LocalJobRunner: 
18/04/15 05:02:20 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:20 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:20 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888140; bufvoid = 104857600
18/04/15 05:02:20 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313704(65254816); length = 2525113/6553600
18/04/15 05:02:20 INFO mapreduce.Job:  map 38% reduce 0%
18/04/15 05:02:21 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:21 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:21 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586498 bytes
18/04/15 05:02:22 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000003_0 is done. And is in the process of committing
18/04/15 05:02:22 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:22 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000003_0' done.
18/04/15 05:02:22 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000003_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=558652283
		FILE: Number of bytes written=1117370530
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=546871000
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=33
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342178
		Map output records=1342178
		Map output bytes=136902156
		Map output materialized bytes=139586518
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684356
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=82
		Total committed heap usage (bytes)=430440448
	File Input Format Counters 
		Bytes Read=134217800
18/04/15 05:02:22 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:22 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:22 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:22 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:22 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:805306368+134217728
18/04/15 05:02:22 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:22 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:22 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:22 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:22 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:22 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:23 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:23 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:23 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:23 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:23 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:25 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:25 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:25 INFO mapred.LocalJobRunner: 
18/04/15 05:02:25 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:25 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:25 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:25 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:26 INFO mapreduce.Job:  map 50% reduce 0%
18/04/15 05:02:26 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:26 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:26 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:28 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000004_0 is done. And is in the process of committing
18/04/15 05:02:28 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:28 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000004_0' done.
18/04/15 05:02:28 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000004_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=698239710
		FILE: Number of bytes written=1396543396
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=681088700
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=35
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342177
		Map output records=1342177
		Map output bytes=136902054
		Map output materialized bytes=139586414
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684354
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=147
		Total committed heap usage (bytes)=492306432
	File Input Format Counters 
		Bytes Read=134217700
18/04/15 05:02:28 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:28 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:28 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:28 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:28 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:28 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:671088640+134217728
18/04/15 05:02:28 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:28 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:28 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:28 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:28 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:28 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:28 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:28 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:28 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:28 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:28 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:30 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:30 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:30 INFO mapred.LocalJobRunner: 
18/04/15 05:02:30 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:30 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:30 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:30 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:31 INFO mapreduce.Job:  map 63% reduce 0%
18/04/15 05:02:32 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:32 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:32 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:33 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000005_0 is done. And is in the process of committing
18/04/15 05:02:33 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:33 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000005_0' done.
18/04/15 05:02:33 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000005_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=837826625
		FILE: Number of bytes written=1675716262
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=815306400
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=37
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342177
		Map output records=1342177
		Map output bytes=136902054
		Map output materialized bytes=139586414
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684354
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=82
		Total committed heap usage (bytes)=488636416
	File Input Format Counters 
		Bytes Read=134217700
18/04/15 05:02:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:33 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:33 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:33 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:33 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:33 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:536870912+134217728
18/04/15 05:02:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:33 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:33 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:34 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:34 INFO mapred.MapTask: bufstart = 0; bufend = 72511698; bufvoid = 104857600
18/04/15 05:02:34 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23370804(93483216); length = 2843593/6553600
18/04/15 05:02:34 INFO mapred.MapTask: (EQUATOR) 75355282 kvi 18838816(75355264)
18/04/15 05:02:34 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:35 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:35 INFO mapred.MapTask: (RESET) equator 75355282 kv 18838816(75355264) kvi 18127932(72511728)
18/04/15 05:02:35 INFO mapred.LocalJobRunner: 
18/04/15 05:02:35 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:35 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:35 INFO mapred.MapTask: bufstart = 75355282; bufend = 34888038; bufvoid = 104857600
18/04/15 05:02:35 INFO mapred.MapTask: kvstart = 18838816(75355264); kvend = 16313708(65254832); length = 2525109/6553600
18/04/15 05:02:36 INFO mapreduce.Job:  map 75% reduce 0%
18/04/15 05:02:37 INFO mapred.MapTask: Finished spill 1
18/04/15 05:02:37 INFO mapred.Merger: Merging 2 sorted segments
18/04/15 05:02:37 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 139586394 bytes
18/04/15 05:02:38 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000006_0 is done. And is in the process of committing
18/04/15 05:02:38 INFO mapred.LocalJobRunner: map > sort
18/04/15 05:02:38 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000006_0' done.
18/04/15 05:02:38 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000006_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=977413540
		FILE: Number of bytes written=1954889128
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=949524100
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=39
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=1342177
		Map output records=1342177
		Map output bytes=136902054
		Map output materialized bytes=139586414
		Input split bytes=123
		Combine input records=0
		Spilled Records=2684354
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=79
		Total committed heap usage (bytes)=488112128
	File Input Format Counters 
		Bytes Read=134217700
18/04/15 05:02:38 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:38 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:38 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:38 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:38 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/random-data/part-m-00000:939524096+60475904
18/04/15 05:02:38 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:02:38 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:02:38 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:02:38 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:02:38 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:02:38 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:02:39 INFO mapred.LocalJobRunner: 
18/04/15 05:02:39 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:02:39 INFO mapred.MapTask: Spilling map output
18/04/15 05:02:39 INFO mapred.MapTask: bufstart = 0; bufend = 61685418; bufvoid = 104857600
18/04/15 05:02:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 23795364(95181456); length = 2419033/6553600
18/04/15 05:02:39 INFO mapreduce.Job:  map 88% reduce 0%
18/04/15 05:02:40 INFO mapred.MapTask: Finished spill 0
18/04/15 05:02:40 INFO mapred.Task: Task:attempt_local2107661221_0001_m_000007_0 is done. And is in the process of committing
18/04/15 05:02:40 INFO mapred.LocalJobRunner: map
18/04/15 05:02:40 INFO mapred.Task: Task 'attempt_local2107661221_0001_m_000007_0' done.
18/04/15 05:02:40 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_m_000007_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=977414035
		FILE: Number of bytes written=2017784102
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1010000000
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=41
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Map-Reduce Framework
		Map input records=604759
		Map output records=604759
		Map output bytes=61685418
		Map output materialized bytes=62894942
		Input split bytes=123
		Combine input records=0
		Spilled Records=604759
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=76
		Total committed heap usage (bytes)=492306432
	File Input Format Counters 
		Bytes Read=60475900
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:40 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/04/15 05:02:40 INFO mapred.LocalJobRunner: Starting task: attempt_local2107661221_0001_r_000000_0
18/04/15 05:02:40 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:02:40 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:02:40 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:02:40 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@6cd08990
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=344614496, maxSingleShuffleLimit=86153624, mergeThreshold=227445584, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/04/15 05:02:40 INFO reduce.EventFetcher: attempt_local2107661221_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000005_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000005_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:40 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000005_0
18/04/15 05:02:40 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000002_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:40 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000002_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000002_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000001_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000001_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000001_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000004_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000004_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:41 INFO mapreduce.Job:  map 100% reduce 0%
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000004_0
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000007_0 decomp: 62894938 len: 62894942 to MEMORY
18/04/15 05:02:41 INFO reduce.InMemoryMapOutput: Read 62894938 bytes from map-output for attempt_local2107661221_0001_m_000007_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 62894938, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->62894938
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000000_0: Shuffling to disk since 139586514 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000000_0 decomp: 139586514 len: 139586518 to DISK
18/04/15 05:02:41 INFO reduce.OnDiskMapOutput: Read 139586518 bytes from map-output for attempt_local2107661221_0001_m_000000_0
18/04/15 05:02:41 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000003_0: Shuffling to disk since 139586514 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:41 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000003_0 decomp: 139586514 len: 139586518 to DISK
18/04/15 05:02:42 INFO reduce.OnDiskMapOutput: Read 139586518 bytes from map-output for attempt_local2107661221_0001_m_000003_0
18/04/15 05:02:42 INFO reduce.MergeManagerImpl: attempt_local2107661221_0001_m_000006_0: Shuffling to disk since 139586410 is greater than maxSingleShuffleLimit (86153624)
18/04/15 05:02:42 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local2107661221_0001_m_000006_0 decomp: 139586410 len: 139586414 to DISK
18/04/15 05:02:43 INFO reduce.OnDiskMapOutput: Read 139586414 bytes from map-output for attempt_local2107661221_0001_m_000006_0
18/04/15 05:02:43 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/04/15 05:02:43 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 7 on-disk map-outputs
18/04/15 05:02:43 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:02:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 62894925 bytes
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merged 1 segments, 62894938 bytes to disk to satisfy reduce memory limit
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merging 8 files, 1040000048 bytes from disk
18/04/15 05:02:43 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/04/15 05:02:43 INFO mapred.Merger: Merging 8 sorted segments
18/04/15 05:02:43 INFO mapred.Merger: Down to the last merge-pass, with 8 segments left of total size: 1039999912 bytes
18/04/15 05:02:43 INFO mapred.LocalJobRunner: 8 / 8 copied.
18/04/15 05:02:43 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/15 05:02:52 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:53 INFO mapreduce.Job:  map 100% reduce 87%
18/04/15 05:02:57 INFO mapred.Task: Task:attempt_local2107661221_0001_r_000000_0 is done. And is in the process of committing
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:57 INFO mapred.Task: Task attempt_local2107661221_0001_r_000000_0 is allowed to commit now
18/04/15 05:02:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local2107661221_0001_r_000000_0' to hdfs://localhost:8020/user/mapred/sorted-data/_temporary/0/task_local2107661221_0001_r_000000
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:02:57 INFO mapred.Task: Task 'attempt_local2107661221_0001_r_000000_0' done.
18/04/15 05:02:57 INFO mapred.Task: Final Counters for attempt_local2107661221_0001_r_000000_0: Counters: 29
	File System Counters
		FILE: Number of bytes read=3057414387
		FILE: Number of bytes written=3057784150
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1010000000
		HDFS: Number of bytes written=1000000000
		HDFS: Number of read operations=44
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=10000000
		Reduce shuffle bytes=1040000048
		Reduce input records=10000000
		Reduce output records=10000000
		Spilled Records=10000000
		Shuffled Maps =8
		Failed Shuffles=0
		Merged Map outputs=8
		GC time elapsed (ms)=69
		Total committed heap usage (bytes)=500170752
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=1000000000
18/04/15 05:02:57 INFO mapred.LocalJobRunner: Finishing task: attempt_local2107661221_0001_r_000000_0
18/04/15 05:02:57 INFO mapred.LocalJobRunner: reduce task executor complete.
18/04/15 05:02:58 INFO mapreduce.Job:  map 100% reduce 100%
18/04/15 05:02:58 INFO mapreduce.Job: Job job_local2107661221_0001 completed successfully
18/04/15 05:02:58 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=7945392555
		FILE: Number of bytes written=12897161338
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=5848096700
		HDFS: Number of bytes written=1000000000
		HDFS: Number of read operations=316
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=20
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		Map output bytes=1020000000
		Map output materialized bytes=1040000048
		Input split bytes=984
		Combine input records=0
		Combine output records=0
		Reduce input groups=10000000
		Reduce shuffle bytes=1040000048
		Reduce input records=10000000
		Reduce output records=10000000
		Spilled Records=29395241
		Shuffled Maps =8
		Failed Shuffles=0
		Merged Map outputs=8
		GC time elapsed (ms)=890
		Total committed heap usage (bytes)=4084727808
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1000000000
	File Output Format Counters 
		Bytes Written=1000000000
18/04/15 05:02:58 INFO terasort.TeraSort: done
```
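Several numbers in the log above can be checked with simple arithmetic. The following Python sketch is only an illustration and not part of Hadoop: `split_lengths` is a hypothetical helper, the 128 MiB block size is an assumption inferred from the `+134217728` split lengths, and the 80% spill threshold is an assumption inferred from `soft limit at 83886080`. It also ignores finer details of how Hadoop's input formats handle a small trailing split.

```python
# Illustrative arithmetic only; helper names are hypothetical, not Hadoop APIs.

def split_lengths(file_size, block_size):
    """Cut a file into block-sized splits; the last one holds the remainder."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

INPUT_BYTES = 1_000_000_000   # "Bytes Read=1000000000" in the job counters
BLOCK_SIZE = 134_217_728      # assumed 128 MiB block, taken from the split lengths in the log
RECORDS = 10_000_000          # "Map input records=10000000"

splits = split_lengths(INPUT_BYTES, BLOCK_SIZE)
print(len(splits))            # 8 map tasks (m_000000 .. m_000007)
print(splits[-1])             # 60475904, the length of the final split
print(INPUT_BYTES // RECORDS) # 100 bytes per record

# Sort buffer: mapreduce.task.io.sort.mb is 100, spilled at an assumed 80% threshold.
sort_buffer = 100 * 1024 * 1024
soft_limit = sort_buffer * 80 // 100
print(sort_buffer, soft_limit)  # 104857600 83886080
```

This matches the log: the last map task processes `939524096+60475904`, and each full split reports `bufvoid = 104857600` with `soft limit at 83886080`.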

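Sanity check: run `teravalidate` on `sorted-data` to confirm that the output is correctly sorted. The result is written to `report`.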

```
[mapred@ip-172-31-44-80 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar teravalidate sorted-data report
18/04/15 05:06:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/04/15 05:06:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/04/15 05:06:36 INFO input.FileInputFormat: Total input files to process : 1
Spent 37ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
18/04/15 05:06:36 INFO mapreduce.JobSubmitter: number of splits:1
18/04/15 05:06:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1823691639_0001
18/04/15 05:06:36 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/04/15 05:06:36 INFO mapreduce.Job: Running job: job_local1823691639_0001
18/04/15 05:06:36 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/04/15 05:06:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:36 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/04/15 05:06:36 INFO mapred.LocalJobRunner: Waiting for map tasks
18/04/15 05:06:36 INFO mapred.LocalJobRunner: Starting task: attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:06:36 INFO mapred.MapTask: Processing split: hdfs://localhost:8020/user/mapred/sorted-data/part-r-00000:0+1000000000
18/04/15 05:06:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/04/15 05:06:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/04/15 05:06:36 INFO mapred.MapTask: soft limit at 83886080
18/04/15 05:06:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/04/15 05:06:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/04/15 05:06:36 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/04/15 05:06:37 INFO mapreduce.Job: Job job_local1823691639_0001 running in uber mode : false
18/04/15 05:06:37 INFO mapreduce.Job:  map 0% reduce 0%
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 
18/04/15 05:06:42 INFO mapred.MapTask: Starting flush of map output
18/04/15 05:06:42 INFO mapred.MapTask: Spilling map output
18/04/15 05:06:42 INFO mapred.MapTask: bufstart = 0; bufend = 82; bufvoid = 104857600
18/04/15 05:06:42 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
18/04/15 05:06:42 INFO mapred.MapTask: Finished spill 0
18/04/15 05:06:42 INFO mapred.Task: Task:attempt_local1823691639_0001_m_000000_0 is done. And is in the process of committing
18/04/15 05:06:42 INFO mapred.LocalJobRunner: map
18/04/15 05:06:42 INFO mapred.Task: Task 'attempt_local1823691639_0001_m_000000_0' done.
18/04/15 05:06:42 INFO mapred.Task: Final Counters for attempt_local1823691639_0001_m_000000_0: Counters: 22
	File System Counters
		FILE: Number of bytes read=302147
		FILE: Number of bytes written=675681
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1000000000
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=5
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
	Map-Reduce Framework
		Map input records=10000000
		Map output records=3
		Map output bytes=82
		Map output materialized bytes=94
		Input split bytes=123
		Combine input records=0
		Spilled Records=3
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=74
		Total committed heap usage (bytes)=296747008
	File Input Format Counters 
		Bytes Read=1000000000
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:42 INFO mapred.LocalJobRunner: map task executor complete.
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Starting task: attempt_local1823691639_0001_r_000000_0
18/04/15 05:06:42 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/04/15 05:06:42 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
18/04/15 05:06:42 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
18/04/15 05:06:42 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@5cde5e61
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=333971456, maxSingleShuffleLimit=83492864, mergeThreshold=220421168, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/04/15 05:06:42 INFO reduce.EventFetcher: attempt_local1823691639_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/04/15 05:06:42 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1823691639_0001_m_000000_0 decomp: 90 len: 94 to MEMORY
18/04/15 05:06:42 INFO reduce.InMemoryMapOutput: Read 90 bytes from map-output for attempt_local1823691639_0001_m_000000_0
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 90, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->90
18/04/15 05:06:42 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/04/15 05:06:42 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:06:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merged 1 segments, 90 bytes to disk to satisfy reduce memory limit
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merging 1 files, 94 bytes from disk
18/04/15 05:06:42 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/04/15 05:06:42 INFO mapred.Merger: Merging 1 sorted segments
18/04/15 05:06:42 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/04/15 05:06:42 INFO mapred.Task: Task:attempt_local1823691639_0001_r_000000_0 is done. And is in the process of committing
18/04/15 05:06:42 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/04/15 05:06:42 INFO mapred.Task: Task attempt_local1823691639_0001_r_000000_0 is allowed to commit now
18/04/15 05:06:42 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1823691639_0001_r_000000_0' to hdfs://localhost:8020/user/mapred/report/_temporary/0/task_local1823691639_0001_r_000000
18/04/15 05:06:42 INFO mapred.LocalJobRunner: reduce > reduce
18/04/15 05:06:42 INFO mapred.Task: Task 'attempt_local1823691639_0001_r_000000_0' done.
18/04/15 05:06:42 INFO mapred.Task: Final Counters for attempt_local1823691639_0001_r_000000_0: Counters: 29
	File System Counters
		FILE: Number of bytes read=302367
		FILE: Number of bytes written=675775
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1000000000
		HDFS: Number of bytes written=24
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=3
		Reduce shuffle bytes=94
		Reduce input records=3
		Reduce output records=1
		Spilled Records=3
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=296747008
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=24
18/04/15 05:06:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local1823691639_0001_r_000000_0
18/04/15 05:06:42 INFO mapred.LocalJobRunner: reduce task executor complete.
18/04/15 05:06:43 INFO mapreduce.Job:  map 100% reduce 100%
18/04/15 05:06:43 INFO mapreduce.Job: Job job_local1823691639_0001 completed successfully
18/04/15 05:06:43 INFO mapreduce.Job: Counters: 35
	File System Counters
		FILE: Number of bytes read=604514
		FILE: Number of bytes written=1351456
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2000000000
		HDFS: Number of bytes written=24
		HDFS: Number of read operations=13
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=4
	Map-Reduce Framework
		Map input records=10000000
		Map output records=3
		Map output bytes=82
		Map output materialized bytes=94
		Input split bytes=123
		Combine input records=0
		Combine output records=0
		Reduce input groups=3
		Reduce shuffle bytes=94
		Reduce input records=3
		Reduce output records=1
		Spilled Records=6
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=74
		Total committed heap usage (bytes)=593494016
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1000000000
	File Output Format Counters 
		Bytes Written=24
```
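Conceptually, the validation comes down to confirming that record keys never decrease from one record to the next. The sketch below illustrates that idea in Python on in-memory data. It is not TeraValidate's implementation, which runs as a MapReduce job over HDFS; `is_sorted_by_key` is a hypothetical helper, and the record layout (100-byte records whose first 10 bytes are the key) is an assumption based on the TeraSort benchmark format.

```python
import os

RECORD_LEN = 100   # assumed TeraSort record size in bytes
KEY_LEN = 10       # assumed key size: the first 10 bytes of each record

def is_sorted_by_key(data: bytes) -> bool:
    """Return True if the keys of consecutive records are in non-decreasing order."""
    if len(data) % RECORD_LEN != 0:
        raise ValueError("data is not a whole number of records")
    previous = None
    for offset in range(0, len(data), RECORD_LEN):
        key = data[offset:offset + KEY_LEN]
        # bytes objects compare lexicographically, byte by byte
        if previous is not None and key < previous:
            return False
        previous = key
    return True

# Build a few random records, then sort them by key as a sort job would.
records = [os.urandom(RECORD_LEN) for _ in range(1000)]
sorted_records = sorted(records, key=lambda r: r[:KEY_LEN])

print(is_sorted_by_key(b"".join(sorted_records)))   # True
```

In the run above, the job finishing with `Reduce output records=1` and only 24 bytes written is consistent with a short summary line rather than a list of misordered keys.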

## References

* Tom White, *Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale* (Amazon)\
  <https://www.amazon.co.jp/Hadoop-Definitive-Storage-Analysis-Internet/dp/1491901632>

