Pseudo-distributed installation of Hadoop. Reference:
Configuration files
# All of Hadoop's configuration files are listed below; both pseudo-distributed and fully distributed setups are driven by these same files
# $HADOOP_HOME/etc/hadoop/
├── capacity-scheduler.xml
├── configuration.xsl
├── container-executor.cfg
├── core-site.xml
├── hadoop-env.cmd
├── hadoop-env.sh
├── hadoop-metrics2.properties
├── hadoop-metrics.properties
├── hadoop-policy.xml
├── hdfs-site.xml
├── httpfs-env.sh
├── httpfs-log4j.properties
├── httpfs-signature.secret
├── httpfs-site.xml
├── kms-acls.xml
├── kms-env.sh
├── kms-log4j.properties
├── kms-site.xml
├── log4j.properties
├── mapred-env.cmd
├── mapred-env.sh
├── mapred-queues.xml.template
├── mapred-site.xml.template
├── slaves
├── ssl-client.xml.example
├── ssl-server.xml.example
├── yarn-env.sh
└── yarn-site.xml
Configuring HDFS
# etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://v108.zlikun.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/tmp</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
# etc/hadoop/hdfs-site.xml (its properties are listed at the end of this section)
# For the full list of options available in these two files, see:
# http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/core-default.xml
# http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
# The start and stop scripts (such as sbin/start-dfs.sh) use SSH to launch and stop the daemons on each node, so SSH between the nodes has to work without a password prompt
# Set up passwordless SSH login
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
# Format the NameNode; when the line `Storage directory /var/hadoop/tmp/dfs/name has been successfully formatted.` appears in the output, formatting succeeded
$ bin/hdfs namenode -format
18/01/30 08:50:38 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = v108.zlikun.com/192.168.1.108
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.5
STARTUP_MSG: classpath = /opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/opt/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/opt/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/opt/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/opt/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/opt/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/opt/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/opt/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/opt/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.5.jar:/opt/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/opt/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/opt/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/opt/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop/share/
hadoop/common/lib/jersey-json-1.9.jar:/opt/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/opt/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/opt/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/opt/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/opt/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/opt/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/opt/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/opt/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/opt/hadoop/share/hadoop/common/lib/activation-1.1.jar:/opt/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/opt/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.5.jar:/opt/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/opt/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/opt/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/opt/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/hadoop/share/hadoop/common/lib/jsch-0.1.54.jar:/opt/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/opt/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/opt/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/opt/hadoop/share/hadoop/common/lib/xz-1.0.jar:/opt/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/opt/hadoop/share/hadoop/common/lib/jetty-sslengine-6.1.26.jar:/opt/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/opt/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/opt/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/opt/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/opt/hadoop/share/hadoop/common/lib/junit-4.
11.jar:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/opt/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/opt/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/opt/hadoop/share/hadoop/common/lib/asm-3.2.jar:/opt/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/opt/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/opt/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/hadoop/share/hadoop/common/hadoop-common-2.7.5.jar:/opt/hadoop/share/hadoop/common/hadoop-common-2.7.5-tests.jar:/opt/hadoop/share/hadoop/common/hadoop-nfs-2.7.5.jar:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/opt/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/opt/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/opt/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/opt/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/opt/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/opt/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/opt/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/opt/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/opt/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/opt/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/opt/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/opt/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/opt/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/opt/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/opt/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/opt/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/opt/hadoop/share/hadoop/hdfs/l
ib/netty-3.6.2.Final.jar:/opt/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/opt/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/opt/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/opt/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/opt/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.5.jar:/opt/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.5-tests.jar:/opt/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/opt/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/opt/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/opt/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/opt/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/opt/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/opt/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/opt/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/opt/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/opt/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/opt/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/opt/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/opt/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/opt/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/opt/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/opt/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/opt/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/opt/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/opt/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/opt/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/opt/hadoop/share/hadoop/yarn/lib/jaxb-api
-2.2.2.jar:/opt/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/opt/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/opt/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/opt/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/opt/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/opt/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/opt/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/opt/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.5.jar:/opt/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/opt/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/opt/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/opt/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/opt/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/opt/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/opt/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/opt/hadoop/share/hadoop/mapreduce/lib/guice-ser
vlet-3.0.jar:/opt/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/opt/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/opt/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/opt/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/opt/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/opt/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/opt/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/opt/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/opt/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/opt/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/opt/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/opt/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.5-tests.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.5.jar:/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.5.jar:/opt/hadoop/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075; compiled by 'kshvachk' on 2017-12-16T01:06Z
STARTUP_MSG: java = 1.8.0_151
************************************************************/
18/01/30 08:50:38 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/01/30 08:50:38 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-a8cec172-6f1b-4aa5-8f73-f99ad2bb29b2
18/01/30 08:50:38 INFO namenode.FSNamesystem: No KeyProvider found.
18/01/30 08:50:38 INFO namenode.FSNamesystem: fsLock is fair: true
18/01/30 08:50:38 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
18/01/30 08:50:39 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/01/30 08:50:39 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/01/30 08:50:39 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/01/30 08:50:39 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Jan 30 08:50:39
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map BlocksMap
18/01/30 08:50:39 INFO util.GSet: VM type = 64-bit
18/01/30 08:50:39 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
18/01/30 08:50:39 INFO util.GSet: capacity = 2^21 = 2097152 entries
18/01/30 08:50:39 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/01/30 08:50:39 INFO blockmanagement.BlockManager: defaultReplication = 1
18/01/30 08:50:39 INFO blockmanagement.BlockManager: maxReplication = 512
18/01/30 08:50:39 INFO blockmanagement.BlockManager: minReplication = 1
18/01/30 08:50:39 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
18/01/30 08:50:39 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/01/30 08:50:39 INFO blockmanagement.BlockManager: encryptDataTransfer = false
18/01/30 08:50:39 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
18/01/30 08:50:39 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
18/01/30 08:50:39 INFO namenode.FSNamesystem: supergroup = supergroup
18/01/30 08:50:39 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/01/30 08:50:39 INFO namenode.FSNamesystem: HA Enabled: false
18/01/30 08:50:39 INFO namenode.FSNamesystem: Append Enabled: true
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map INodeMap
18/01/30 08:50:39 INFO util.GSet: VM type = 64-bit
18/01/30 08:50:39 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
18/01/30 08:50:39 INFO util.GSet: capacity = 2^20 = 1048576 entries
18/01/30 08:50:39 INFO namenode.FSDirectory: ACLs enabled? false
18/01/30 08:50:39 INFO namenode.FSDirectory: XAttrs enabled? true
18/01/30 08:50:39 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
18/01/30 08:50:39 INFO namenode.NameNode: Caching file names occuring more than 10 times
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map cachedBlocks
18/01/30 08:50:39 INFO util.GSet: VM type = 64-bit
18/01/30 08:50:39 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
18/01/30 08:50:39 INFO util.GSet: capacity = 2^18 = 262144 entries
18/01/30 08:50:39 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/01/30 08:50:39 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/01/30 08:50:39 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
18/01/30 08:50:39 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/01/30 08:50:39 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/01/30 08:50:39 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/01/30 08:50:39 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/01/30 08:50:39 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/01/30 08:50:39 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/01/30 08:50:39 INFO util.GSet: VM type = 64-bit
18/01/30 08:50:39 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
18/01/30 08:50:39 INFO util.GSet: capacity = 2^15 = 32768 entries
18/01/30 08:50:39 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1709412250-192.168.1.108-1517320239405
18/01/30 08:50:39 INFO common.Storage: Storage directory /var/hadoop/tmp/dfs/name has been successfully formatted.
18/01/30 08:50:39 INFO namenode.FSImageFormatProtobuf: Saving image file /var/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
18/01/30 08:50:39 INFO namenode.FSImageFormatProtobuf: Image file /var/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
18/01/30 08:50:39 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/01/30 08:50:39 INFO util.ExitUtil: Exiting with status 0
18/01/30 08:50:39 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at v108.zlikun.com/192.168.1.108
************************************************************/
# Inspect the formatted directory
$ ls -l /var/hadoop/tmp/
total 0
drwxr-xr-x. 3 root root 18 Jan 30 08:50 dfs
# etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address</name>
        <value>v108.zlikun.com:9000</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-bind-host</name>
        <value>0.0.0.0</value>
    </property>
</configuration>
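The success check on the format output can also be scripted instead of read by eye. What follows is only a sketch: the function name `format_succeeded` is made up for illustration, the sample line is copied from the log above, and the remark about stderr reflects Hadoop's default console logging as I understand it, so verify it on your own installation.

```shell
# Hypothetical helper: succeeds if the text on stdin contains the
# "successfully formatted" message that the NameNode prints after formatting.
format_succeeded() {
    grep -q 'has been successfully formatted'
}

# One line copied from the log above. Against a real installation one might
# instead pipe the command itself (the log normally goes to stderr):
#   bin/hdfs namenode -format 2>&1 | format_succeeded
sample='18/01/30 08:50:39 INFO common.Storage: Storage directory /var/hadoop/tmp/dfs/name has been successfully formatted.'

if printf '%s\n' "$sample" | format_succeeded; then
    echo "format OK"
else
    echo "format FAILED"
fi
```

Matching on the message text is a deliberate simplification; the `Exiting with status 0` line in the same log is another signal one could check.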
Running HDFS
# Start HDFS; three processes should be running afterwards (in pseudo-distributed mode all nodes live on the same machine)
$ sbin/start-dfs.sh
$ jps
9091 DataNode
9242 SecondaryNameNode
8973 NameNode
# The HDFS status page should now be reachable in a browser at:
# http://192.168.1.108:50070/
# If it is not reachable, the firewall may be blocking port 50070; here the firewall is simply stopped (do not do this in production)
$ firewall-cmd --state
running
$ systemctl stop firewalld
$ firewall-cmd --state 
not running
# Also disable the firewall service so that it does not start at boot
$ systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
# Create a directory in HDFS (a user home directory; HDFS home directories live under /user, and the root account is used here)
$ bin/hdfs dfs -mkdir -p /user/root
# Upload a local file to HDFS
$ bin/hdfs dfs -put input/lang.txt lang.txt
# List the uploaded file
$ bin/hdfs dfs -ls /user/root
Found 1 items
-rw-r--r-- 1 root supergroup 59 2018-01-30 09:01 /user/root/lang.txt
# Run the word-count example; this time the input file is read from HDFS (the job_local147097953_0001 ID in the log shows that the job itself still runs in the local runner)
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount lang.txt output
18/01/30 09:06:01 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
18/01/30 09:06:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
18/01/30 09:06:01 INFO input.FileInputFormat: Total input paths to process : 1
18/01/30 09:06:01 INFO mapreduce.JobSubmitter: number of splits:1
18/01/30 09:06:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local147097953_0001
18/01/30 09:06:02 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
18/01/30 09:06:02 INFO mapreduce.Job: Running job: job_local147097953_0001
18/01/30 09:06:02 INFO mapred.LocalJobRunner: OutputCommitter set in config null
18/01/30 09:06:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 09:06:02 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Waiting for map tasks
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Starting task: attempt_local147097953_0001_m_000000_0
18/01/30 09:06:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 09:06:02 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/01/30 09:06:02 INFO mapred.MapTask: Processing split: hdfs://v108.zlikun.com:9000/user/root/lang.txt:0+59
18/01/30 09:06:02 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
18/01/30 09:06:02 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
18/01/30 09:06:02 INFO mapred.MapTask: soft limit at 83886080
18/01/30 09:06:02 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
18/01/30 09:06:02 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
18/01/30 09:06:02 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 
18/01/30 09:06:02 INFO mapred.MapTask: Starting flush of map output
18/01/30 09:06:02 INFO mapred.MapTask: Spilling map output
18/01/30 09:06:02 INFO mapred.MapTask: bufstart = 0; bufend = 99; bufvoid = 104857600
18/01/30 09:06:02 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
18/01/30 09:06:02 INFO mapred.MapTask: Finished spill 0
18/01/30 09:06:02 INFO mapred.Task: Task:attempt_local147097953_0001_m_000000_0 is done. And is in the process of committing
18/01/30 09:06:02 INFO mapred.LocalJobRunner: map
18/01/30 09:06:02 INFO mapred.Task: Task 'attempt_local147097953_0001_m_000000_0' done.
18/01/30 09:06:02 INFO mapred.Task: Final Counters for attempt_local147097953_0001_m_000000_0: Counters: 23
    File System Counters
        FILE: Number of bytes read=296004
        FILE: Number of bytes written=586165
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=59
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=1
    Map-Reduce Framework
        Map input records=1
        Map output records=10
        Map output bytes=99
        Map output materialized bytes=92
        Input split bytes=111
        Combine input records=10
        Combine output records=7
        Spilled Records=7
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=18
        Total committed heap usage (bytes)=165744640
    File Input Format Counters
        Bytes Read=59
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local147097953_0001_m_000000_0
18/01/30 09:06:02 INFO mapred.LocalJobRunner: map task executor complete.
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Waiting for reduce tasks
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Starting task: attempt_local147097953_0001_r_000000_0
18/01/30 09:06:02 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
18/01/30 09:06:02 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
18/01/30 09:06:02 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@8fe6e1c
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
18/01/30 09:06:02 INFO reduce.EventFetcher: attempt_local147097953_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
18/01/30 09:06:02 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local147097953_0001_m_000000_0 decomp: 88 len: 92 to MEMORY
18/01/30 09:06:02 INFO reduce.InMemoryMapOutput: Read 88 bytes from map-output for attempt_local147097953_0001_m_000000_0
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 88, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->88
18/01/30 09:06:02 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
18/01/30 09:06:02 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 09:06:02 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: Merged 1 segments, 88 bytes to disk to satisfy reduce memory limit
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: Merging 1 files, 92 bytes from disk
18/01/30 09:06:02 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
18/01/30 09:06:02 INFO mapred.Merger: Merging 1 sorted segments
18/01/30 09:06:02 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
    at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
18/01/30 09:06:02 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 79 bytes
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 09:06:02 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
18/01/30 09:06:02 INFO mapred.Task: Task:attempt_local147097953_0001_r_000000_0 is done. And is in the process of committing
18/01/30 09:06:02 INFO mapred.LocalJobRunner: 1 / 1 copied.
18/01/30 09:06:02 INFO mapred.Task: Task attempt_local147097953_0001_r_000000_0 is allowed to commit now
18/01/30 09:06:02 INFO output.FileOutputCommitter: Saved output of task 'attempt_local147097953_0001_r_000000_0' to hdfs://v108.zlikun.com:9000/user/root/output/_temporary/0/task_local147097953_0001_r_000000
18/01/30 09:06:02 INFO mapred.LocalJobRunner: reduce > reduce
18/01/30 09:06:02 INFO mapred.Task: Task 'attempt_local147097953_0001_r_000000_0' done.
18/01/30 09:06:02 INFO mapred.Task: Final Counters for attempt_local147097953_0001_r_000000_0: Counters: 29
    File System Counters
        FILE: Number of bytes read=296220
        FILE: Number of bytes written=586257
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=59
        HDFS: Number of bytes written=58
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=7
        Reduce shuffle bytes=92
        Reduce input records=7
        Reduce output records=7
        Spilled Records=7
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=5
        Total committed heap usage (bytes)=165744640
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters
        Bytes Written=58
18/01/30 09:06:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local147097953_0001_r_000000_0
18/01/30 09:06:02 INFO mapred.LocalJobRunner: reduce task executor complete.
18/01/30 09:06:03 INFO mapreduce.Job: Job job_local147097953_0001 running in uber mode : false
18/01/30 09:06:03 INFO mapreduce.Job: map 100% reduce 100%
18/01/30 09:06:03 INFO mapreduce.Job: Job job_local147097953_0001 completed successfully
18/01/30 09:06:03 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=592224
        FILE: Number of bytes written=1172422
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=118
        HDFS: Number of bytes written=58
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=1
        Map output records=10
        Map output bytes=99
        Map output materialized bytes=92
        Input split bytes=111
        Combine input records=10
        Combine output records=7
        Reduce input groups=7
        Reduce shuffle bytes=92
        Reduce input records=7
        Reduce output records=7
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=23
        Total committed heap usage (bytes)=331489280
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=59
    File Output Format Counters
        Bytes Written=58
# Show the word-count result
$ bin/hdfs dfs -cat output/*
erlang 1
golang 1
java 3
javascript 1
lua 1
ruby 1
rust 2
# Stop HDFS
$ sbin/stop-dfs.sh 
Stopping namenodes on [v108.zlikun.com]
v108.zlikun.com: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
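To see what the wordcount example computes, the same result can be reproduced locally with standard Unix tools. This is a sketch for comparison, not how Hadoop implements the job: `count_words` is a hypothetical helper, and the input line is invented so that it yields the counts shown above (the real contents of input/lang.txt are not reproduced in these notes).

```shell
# Local stand-in for the wordcount example: split stdin on whitespace,
# count each distinct word, and print "word<TAB>count" sorted by word.
count_words() {
    tr -s '[:space:]' '\n' |   # one word per line
        grep -v '^$' |         # drop the empty line left by leading whitespace
        LC_ALL=C sort |        # byte-order sort so equal words are adjacent
        uniq -c |              # prefix each distinct word with its count
        awk '{ printf "%s\t%d\n", $2, $1 }'
}

# Hypothetical input; only the resulting counts are taken from the notes above.
printf 'java golang rust java javascript lua ruby erlang rust java\n' | count_words
```

With this input the function prints the same seven word/count pairs as `bin/hdfs dfs -cat output/*`, separated by a tab. The sort-then-count pipeline loosely mirrors the job's shape: MapReduce also sorts the map output by key before the reducer sums the values for each word.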
Configuring YARN
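# Copy mapred-site.xml.template to mapred-site.xml
# etc/hadoop/mapred-site.xml: have MapReduce jobs scheduled by YARN
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
# For the full list of options available in these two files, see:
# http://hadoop.apache.org/docs/r2.7.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
# http://hadoop.apache.org/docs/r2.7.5/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
# etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
    </property>
</configuration>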
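The file listing at the top shows only mapred-site.xml.template, so the copy step mentioned above has to happen before the setting can take effect. A minimal sketch that makes the step repeatable, assuming a POSIX shell; `ensure_mapred_site` is a hypothetical helper name, and the demonstration runs in a throwaway directory standing in for etc/hadoop:

```shell
# Hypothetical helper: create mapred-site.xml from the shipped template
# unless it already exists. $1 is the Hadoop configuration directory.
ensure_mapred_site() {
    conf_dir=$1
    if [ -f "$conf_dir/mapred-site.xml" ]; then
        echo "mapred-site.xml already present"
    elif [ -f "$conf_dir/mapred-site.xml.template" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
        echo "created mapred-site.xml from template"
    else
        echo "no template found in $conf_dir" >&2
        return 1
    fi
}

# Demonstration in a temporary directory (not a real Hadoop installation).
demo_dir=$(mktemp -d)
printf '<configuration>\n</configuration>\n' > "$demo_dir/mapred-site.xml.template"
ensure_mapred_site "$demo_dir"   # first call copies the template
ensure_mapred_site "$demo_dir"   # second call leaves the existing file alone
rm -rf "$demo_dir"
```

On a real installation the call would presumably be `ensure_mapred_site "$HADOOP_HOME/etc/hadoop"`. Refusing to overwrite an existing file is a deliberate choice here, so that re-running the setup does not discard the `mapreduce.framework.name` edit.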
Running YARN
# Start HDFS and YARN
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
# Check the processes (there should be five)
$ jps
10977 NameNode
11523 NodeManager
11101 DataNode
11262 SecondaryNameNode
11407 ResourceManager
# YARN's status page is likewise available in a browser at:
# http://192.168.1.108:8088/cluster
# Delete the output generated earlier (a MapReduce job's output directory must not already exist) before re-running the word-count example
$ bin/hdfs dfs -rm -r output
18/01/30 09:10:18 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted output
# Re-run the word-count example; this time YARN schedules the job
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar wordcount lang.txt output
18/01/30 10:18:28 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/01/30 10:18:29 INFO input.FileInputFormat: Total input paths to process : 1
18/01/30 10:18:29 INFO mapreduce.JobSubmitter: number of splits:1
18/01/30 10:18:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517321368440_0002
18/01/30 10:18:29 INFO impl.YarnClientImpl: Submitted application application_1517321368440_0002
18/01/30 10:18:29 INFO mapreduce.Job: The url to track the job: http://v108.zlikun.com:8088/proxy/application_1517321368440_0002/
18/01/30 10:18:29 INFO mapreduce.Job: Running job: job_1517321368440_0002
18/01/30 10:18:38 INFO mapreduce.Job: Job job_1517321368440_0002 running in uber mode : false
18/01/30 10:18:38 INFO mapreduce.Job: map 0% reduce 0%
18/01/30 10:18:44 INFO mapreduce.Job: map 100% reduce 0%
18/01/30 10:18:49 INFO mapreduce.Job: map 100% reduce 100%
18/01/30 10:18:50 INFO mapreduce.Job: Job job_1517321368440_0002 completed successfully
18/01/30 10:18:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        ... ...
    Job Counters
        ... ...
    Map-Reduce Framework
        Map input records=1
        Map output records=10
        Map output bytes=99
        Map output materialized bytes=92
        Input split bytes=111
        Combine input records=10
        Combine output records=7
        Reduce input groups=7
        Reduce shuffle bytes=92
        Reduce input records=7
        Reduce output records=7
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=146
        CPU time spent (ms)=1900
        Physical memory (bytes) snapshot=328867840
        Virtual memory (bytes) snapshot=4159520768
        Total committed heap usage (bytes)=219676672
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=59
    File Output Format Counters
        Bytes Written=58
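The process check near the start of this section (five daemons expected in the `jps` listing) can be scripted as well. A rough sketch under stated assumptions: `missing_daemons` is a hypothetical helper, it only compares names in the second column of `jps` output, and the sample text is the listing from these notes, not live output.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon name (given as arguments) that is not listed. Returns non-zero if
# at least one daemon is missing.
missing_daemons() {
    running=$(awk '{ print $2 }')   # second column of jps output is the class name
    rc=0
    for daemon in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "$daemon"
            rc=1
        fi
    done
    return $rc
}

# Sample copied from the jps listing above. On a live machine one might run:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
sample='10977 NameNode
11523 NodeManager
11101 DataNode
11262 SecondaryNameNode
11407 ResourceManager'

if printf '%s\n' "$sample" |
    missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager; then
    echo "all five daemons are running"
fi
```

The same function covers the HDFS-only case by passing just `NameNode DataNode SecondaryNameNode`. A process appearing in `jps` only means the JVM is up; the web pages on ports 50070 and 8088 remain the better health check.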
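UI preview
HDFS administration UI
HDFS file browser
YARN monitoring UI
Since this post is really a set of personal notes, it contains a lot of comments. If you find any mistakes in them, please point them out in a comment so that I can correct them.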