sillycat (Chengdu)
Hadoop Docker 2019 Version 3.2.1


I am trying to set up HDFS in Docker so that it can run on a single server and provide a distributed file system. That is it: the files there can easily be shared among multiple machines.

Exception:
> systemctl start sshd
Failed to get D-Bus connection: Operation not permitted

Solution:
I could not fix that on CentOS, so I switched to Ubuntu instead.

Set Up Client and Try
> wget http://apache-mirror.8birdsvideo.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
> tar zxvf hadoop-3.2.1.tar.gz
> mv hadoop-3.2.1 ~/tool/
Place it in the working directory, then add the bin directory to PATH:
PATH=$PATH:/opt/hadoop/bin
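To make this persistent across shells, the same settings can go in ~/.bashrc. A minimal sketch, assuming the tarball was unpacked to ~/tool/hadoop-3.2.1 as in the steps above (HADOOP_HOME here is my own convenience variable, not required by the client):

```shell
# Convenience variable pointing at the unpacked release
# (assumption: the tarball was moved to ~/tool/ as above)
export HADOOP_HOME="$HOME/tool/hadoop-3.2.1"
# Put the hdfs/hadoop client binaries on PATH
export PATH="$PATH:$HADOOP_HOME/bin"
```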

Check the version:
> hdfs version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /home/carl/tool/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar

List the files:
> hdfs dfs -ls hdfs://localhost:9000/
Found 1 items
drwxr-xr-x   - dr.who supergroup          0 2019-12-07 16:25 hdfs://localhost:9000/hello

The put command works:
> hdfs dfs -put ./README.txt hdfs://localhost:9000/hello/README.txt
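To confirm the upload from the client, the file can be read back. A small sketch, wrapped in a function so the NameNode address is easy to swap; the /hello path and README.txt names follow the commands above:

```shell
# Read an uploaded file back from HDFS: print it, then fetch a local copy
verify_upload() {
    namenode="${1:-hdfs://localhost:9000}"
    hdfs dfs -cat "$namenode/hello/README.txt"
    hdfs dfs -get "$namenode/hello/README.txt" ./README.copy.txt
}
# usage: verify_upload hdfs://localhost:9000
```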

But I cannot upload or download files from the web console. Checking the browser developer tools, I found that the console redirects to the Docker container hostname on port 9864, which the client machine cannot resolve.
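One possible client-side workaround (an assumption on my side, not verified in this post): since port 9864 is published on the Docker host, mapping the container hostname to that address in the client's /etc/hosts lets the browser follow the DataNode redirect. Sketched as a small idempotent helper:

```shell
# Append a hosts entry for the container hostname unless it already exists
# $1 = hosts file (default /etc/hosts), $2 = address the ports are published on
add_hosts_entry() {
    hosts_file="${1:-/etc/hosts}"
    address="${2:-127.0.0.1}"
    grep -q "rancher-worker1" "$hosts_file" \
        || echo "$address rancher-worker1" >> "$hosts_file"
}
# usage (as root on the client machine): add_hosts_entry
```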
https://my.oschina.net/u/3163032/blog/1622221

Official Website
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html
https://note.louyj.com/blog/post/louyj/Authentication-for-Hadoop-HTTP-web-consoles
<property>
    <name>hadoop.http.filter.initializers</name>
    <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
    <name>hadoop.http.authentication.type</name>
    <value>simple</value>
</property>
<property>
    <name>hadoop.http.authentication.token.validity</name>
    <value>12000</value>
</property>
<property>
    <name>hadoop.http.authentication.simple.anonymous.allowed</name>
    <value>false</value>
</property>
<property>
    <name>hadoop.http.authentication.signature.secret.file</name>
    <value>/tool/hadoop/hadoop-http-auth-signature-secret</value>
</property>
<property>
    <name>hadoop.http.authentication.cookie.domain</name>
    <value></value>
</property>

The hadoop-http-auth-signature-secret file is a plain-text file with the content hello!123.
This works:
http://rancher-worker1:9870/explorer.html?user.name=hello!123#/
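With anonymous access disabled, every HTTP request needs a user.name query parameter. The same applies to the WebHDFS REST API (enabled by default in Hadoop 3); a sketch, where the base URL and user name follow the working URL above:

```shell
# List an HDFS directory over WebHDFS with simple authentication
# $1 = web UI base URL, $2 = HDFS path (no leading slash), $3 = user name
webhdfs_ls() {
    curl -s "$1/webhdfs/v1/$2?op=LISTSTATUS&user.name=$3"
}
# usage: webhdfs_ls http://rancher-worker1:9870 hello carl
```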

Warning:
2019-12-08 01:56:21,717 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

It is only an INFO-level message; I do not know how to disable it right now.

The most important file is conf/core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://0.0.0.0:9000</value>
    </property>
    <property>
        <name>hadoop.http.filter.initializers</name>
        <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
    </property>
    <property>
        <name>hadoop.http.authentication.type</name>
        <value>simple</value>
    </property>
    <property>
        <name>hadoop.http.authentication.token.validity</name>
        <value>12000</value>
    </property>
    <property>
        <name>hadoop.http.authentication.simple.anonymous.allowed</name>
        <value>false</value>
    </property>
    <property>
        <name>hadoop.http.authentication.signature.secret.file</name>
        <value>/tool/hadoop/hadoop-http-auth-signature-secret</value>
    </property>
    <property>
        <name>hadoop.http.authentication.cookie.domain</name>
        <value></value>
    </property>
</configuration>

Nothing special in conf/hadoop-env.sh
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
case ${HADOOP_OS_TYPE} in
  Darwin*)
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.kdc= "
    export HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf= "
  ;;
esac

Secret password file conf/hadoop-http-auth-signature-secret
hello123
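A hard-coded secret is fine for a test setup; for anything shared, a random secret is safer. A sketch that generates one from /dev/urandom (the output path is a parameter, to match the location configured above):

```shell
# Write a 64-character random hex secret to the given path
gen_http_secret() {
    od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$1"
}
# usage: gen_http_secret /tool/hadoop/hadoop-http-auth-signature-secret
```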

Configuration file conf/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>0.0.0.0:9870</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9864</value>
    </property>
</configuration>
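After editing these files it is worth checking which values the daemons will actually pick up; `hdfs getconf` prints the effective configuration. A small sketch:

```shell
# Print the effective value of a configuration key
check_conf() {
    hdfs getconf -confKey "$1"
}
# usage: check_conf dfs.replication
#        check_conf dfs.namenode.http-address
```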

All the steps are in the Dockerfile:
#Run a Hadoop HDFS server

#Prepare the OS
FROM            ubuntu:16.04
MAINTAINER      Carl Luo <luohuazju@gmail.com>

ENV DEBIAN_FRONTEND noninteractive
ENV JAVA_HOME       /usr/lib/jvm/java-8-openjdk-amd64
ENV LANG            en_US.UTF-8
ENV LC_ALL          en_US.UTF-8

RUN apt-get -qq update
RUN apt-get -qqy dist-upgrade

#Prepare the dependencies
RUN apt-get install -qy wget unzip vim
RUN apt-get install -qy iputils-ping

#Install JAVA
RUN apt-get update && \
    apt-get install -y --no-install-recommends locales && \
    locale-gen en_US.UTF-8 && \
    apt-get dist-upgrade -y && \
    apt-get install -qy openjdk-8-jdk

#Prepare for hadoop and spark
RUN apt-get install -y openssh-server
RUN mkdir /var/run/sshd
RUN ssh-keygen -q -t rsa -N '' -f /root/.ssh/id_rsa
RUN cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

RUN            mkdir /tool/
WORKDIR        /tool/
RUN            wget http://apache-mirror.8birdsvideo.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
RUN            tar zxvf hadoop-3.2.1.tar.gz
RUN            ln -s /tool/hadoop-3.2.1 /tool/hadoop
ADD            conf/core-site.xml /tool/hadoop/etc/hadoop/
ADD            conf/hdfs-site.xml /tool/hadoop/etc/hadoop/
ADD            conf/hadoop-env.sh /tool/hadoop/etc/hadoop/
ADD            conf/hadoop-http-auth-signature-secret /tool/hadoop/hadoop-http-auth-signature-secret

#set up the app
EXPOSE  9870 9000 9864
RUN     mkdir -p /app/
ADD     start.sh /app/
WORKDIR /app/
CMD    [ "./start.sh" ]

Makefile to help me build and run the image
IMAGE=sillycat/public
TAG=ubuntu-hadoop-1.0
NAME=ubuntu-hadoop-1.0
HOSTNAME=rancher-worker1
   
docker-context:

build: docker-context
    docker build -t $(IMAGE):$(TAG) .

run:
    docker run -d -p 9870:9870 -p 9000:9000 -p 9864:9864 --hostname ${HOSTNAME} --name $(NAME) $(IMAGE):$(TAG)

debug:
    docker run -ti -p 9870:9870 -p 9000:9000 -p 9864:9864 --hostname ${HOSTNAME} --name $(NAME) $(IMAGE):$(TAG) /bin/bash

clean:
    docker stop ${NAME}
    docker rm ${NAME}

logs:
    docker logs ${NAME}

publish:
    docker push ${IMAGE}

Shell script to start the service, start.sh
#!/bin/sh -ex

#prepare ENV
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

#start ssh service
nohup /usr/sbin/sshd -D >/dev/stdout &

#start the service
cd /tool/hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
tail -f /dev/null
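One caveat in this script: `hdfs namenode -format` runs on every container start, which wipes the HDFS metadata on each restart. A guard sketch that formats only on first start; the metadata path is an assumption based on the default hadoop.tmp.dir (/tmp/hadoop-root):

```shell
# Format the NameNode only if its metadata directory does not exist yet
# $1 = NameNode metadata directory (default follows hadoop.tmp.dir)
format_once() {
    name_dir="${1:-/tmp/hadoop-root/dfs/name/current}"
    if [ ! -d "$name_dir" ]; then
        bin/hdfs namenode -format -nonInteractive
    fi
}
# in start.sh, replace "bin/hdfs namenode -format" with: format_once
```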


References:
https://phoenixnap.com/kb/how-to-enable-ssh-centos-7
https://serverfault.com/questions/824975/failed-to-get-d-bus-connection-operation-not-permitted
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
https://serverfault.com/questions/562756/how-to-remove-the-path-with-an-nginx-proxy-pass

Security
https://www.jianshu.com/p/51c39dfecff2
https://www.twblogs.net/a/5cfed4aebd9eee14029f459f/zh-cn

