Hadoop and Hive Notes

1. Package preparation
hadoop-2.5.0-cdh5.3.0.tar.gz
zookeeper-3.4.5-cdh5.3.0.tar.gz
hive-0.13.1-cdh5.3.0.tar.gz
jdk1.7
2. Environment preparation
1) Passwordless SSH
ssh-keygen -t rsa -P ""
cat id_rsa.pub >> authorized_keys
chmod 700 .ssh
chmod 600 authorized_keys
2) Hostname-to-IP mapping
vi /etc/sysconfig/network
vim /etc/hosts
3) Clock synchronization
4) Disable the firewall
3. Configuration files
1) Hadoop Core (core-site.xml)
Common I/O settings shared by HDFS and MapReduce:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9001</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
2) hadoop-env.sh: runtime environment variables
3) hdfs-site.xml
Settings for the Hadoop daemons: the NameNode, Secondary NameNode, DataNodes, etc.
<configuration>
<property>
<!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
<name>dfs.name.dir</name>
<value>/opt/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hadoop/hdfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/opt/hadoop/hdfs/namesecondary</value>
</property>
</configuration>
4) mapred-site.xml
JobTracker and TaskTracker settings
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/opt/hadoop/mapred/local</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/opt/hadoop/mapred/system</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>7</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>7</value>
</property>
</configuration>
 
4. HDFS
1) Format the NameNode
bin/hdfs namenode -format
Note: formatting repeatedly can leave the DataNodes unable to start; in that case delete
the dfs name/data directories configured in core-site.xml and format again.
2) Start the NameNode and DataNode
sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory
(defaults to $HADOOP_HOME/logs)
 
or
 
$ ./hadoop-daemon.sh start namenode
$ ./hadoop-daemon.sh start secondarynamenode
$ ./hadoop-daemon.sh start datanode
 
5. Relationship among the NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker
Note: on a heavily loaded cluster, MapReduce jobs cause the JobTracker to consume a lot of
memory and CPU, so it is best run on its own dedicated node.
Relationship between the TaskTracker and map/reduce tasks
Problems:
1)Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
Cause 1: a 32-bit native library is being used on a 64-bit platform.
 
6. HDFS

Default daemon addresses and ports (from hdfs-default.xml):

dfs.namenode.http-address             0.0.0.0:50070   The namenode HTTP server address and port.
dfs.namenode.secondary.http-address   0.0.0.0:50090   The secondary namenode HTTP server address and port.
dfs.namenode.secondary.https-address  0.0.0.0:50091   The secondary namenode HTTPS server address and port.
dfs.datanode.address                  0.0.0.0:50010   The datanode server address and port for data transfer.
dfs.datanode.http.address             0.0.0.0:50075   The datanode HTTP server address and port.
dfs.datanode.ipc.address              0.0.0.0:50020   The datanode IPC server address and port.
 
 
dfs.namenode.name.dir file://${hadoop.tmp.dir}/dfs/name Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
 
dfs.namenode.edits.dir ${dfs.namenode.name.dir} Determines where on the local filesystem the DFS name node should store the transaction (edits) file. If this is a comma-delimited list of directories then the transaction file is replicated in all of the directories, for redundancy. Default value is same as dfs.namenode.name.dir
 
dfs.datanode.data.dir file://${hadoop.tmp.dir}/dfs/data Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.
 
dfs.namenode.checkpoint.dir file://${hadoop.tmp.dir}/dfs/namesecondary Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
dfs.namenode.checkpoint.edits.dir ${dfs.namenode.checkpoint.dir} Determines where on the local filesystem the DFS secondary name node should store the temporary edits to merge. If this is a comma-delimited list of directories then the edits is replicated in all of the directories for redundancy. Default value is same as dfs.namenode.checkpoint.dir
 
Equivalent: bin/hdfs dfs == bin/hadoop fs
List the files under the root of the HDFS filesystem:
$> bin/hdfs dfs -ls /
or: $> bin/hadoop fs -ls /
Create the directory /user under the root:
$> bin/hdfs dfs -mkdir /user
Upload a local file to HDFS:
$> bin/hdfs dfs -put ../etc/hadoop /user/
or use:
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]  upload
Download from HDFS to the local filesystem:
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]  download
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]  download
Delete files:
$> bin/hdfs dfs -rm /user/hadoop/*
View the contents of an HDFS file:
$> bin/hdfs dfs -cat /user/core-site.xml
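The same operations can also be done programmatically through the FileSystem API. Below is a minimal sketch (the class name HdfsPutAndListApp is illustrative, and the NameNode address hdfs://192.168.121.200:9001 is the one used elsewhere in these notes) that uploads a local file and then lists /user.

// Minimal FileSystem sketch: upload a local file and list /user.
// Assumes the NameNode address hdfs://192.168.121.200:9001 used elsewhere in these notes.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutAndListApp {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.121.200:9001"), conf);
        // equivalent to: bin/hdfs dfs -put core-site.xml /user/
        fs.copyFromLocalFile(new Path("core-site.xml"), new Path("/user/core-site.xml"));
        // equivalent to: bin/hdfs dfs -ls /user
        for (FileStatus status : fs.listStatus(new Path("/user"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}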
 
//================= How the HDFS Configuration object loads parameters =================
1. Resources loaded from the classpath
loaded in-order from the classpath:
1) core-default.xml: Read-only defaults for hadoop.
2) core-site.xml: Site-specific configuration for a given hadoop installation.
2. Final parameters
Configuration parameters may be declared final. Once a resource
declares a value final,
no subsequently-loaded resource can alter that value.
For example, one might define a final parameter with:
<property>
<name>dfs.hosts.include</name>
<value>/etc/hadoop/conf/hosts.include</value>
<final>true</final>
</property>
3. Variable expansion
Value strings are first processed for variable expansion. The available properties are:
1) Other properties defined in this Configuration; and, if a name is undefined here
2) Properties in System.getProperties().
For example, if a configuration resource contains the following property definitions:
<property>
<name>basedir</name>
<value>/user/${user.name}</value>
</property>
<property>
<name>tempdir</name>
<value>${basedir}/tmp</value>
</property>
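A small sketch of how these rules behave when reading properties through Configuration; the class name ConfExpansionDemo is illustrative and the property names mirror the example above.

// Sketch: variable expansion when reading properties via Configuration.
import org.apache.hadoop.conf.Configuration;

public class ConfExpansionDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();   // loads core-default.xml, then core-site.xml
        conf.set("basedir", "/user/${user.name}");  // ${user.name} resolves from System.getProperties()
        conf.set("tempdir", "${basedir}/tmp");      // ${basedir} resolves from this Configuration
        System.out.println(conf.get("tempdir"));    // e.g. /user/root/tmp when running as root
        // conf.getRaw("tempdir") would return the unexpanded value ${basedir}/tmp
    }
}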
//=================== HDFS interfaces =======================
Common ways to interact with HDFS:
1. HTTP (not tied to a specific HDFS version)
HftpFileSystem/HsftpFileSystem
2. FTP (not yet implemented)
3. Thrift
4. Java (the command-line interpreter is built on FileSystem)
5. C
Java interfaces:
1. Reading data through a Hadoop URL
This approach has a serious limitation: URL.setURLStreamHandlerFactory can only be called once
per JVM, and there is no way to stop a third-party component from calling it first, in which
case data cannot be read from Hadoop this way.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class App {
    static {
        // May only be called once per JVM; a third-party library that has
        // already set a factory will break this approach.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) {
        InputStream in = null;
        try {
            in = new URL("hdfs://192.168.121.200:9001/user/core-site.xml").openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
2. Reading data with the FileSystem API
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemReadHadoopApp {
    public static void main(String[] args) {
        // see the Configuration description above
        Configuration conf = new Configuration();
        InputStream in = null;
        try {
            // uses the default filesystem from core-site.xml on the classpath
            FileSystem _fs = FileSystem.get(conf);
            // to skip checksum verification (.crc files; chunk size set by io.bytes.per.checksum):
            // _fs.setVerifyChecksum(false);
            in = _fs.open(new Path(new URI("hdfs://192.168.121.200:9001/user/core-site.xml")));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
 
7. Hadoop I/O
7.1 HDFS integrity checking: each data file has a hidden .filename.crc checksum file in the
same directory. Checksums are computed per chunk; the chunk size is controlled by
io.bytes.per.checksum and defaults to 512 bytes.
Switches for checksum verification:
1) In code, call FileSystem.setVerifyChecksum(false)
2) On the command line, pass -ignoreCrc together with -get (or the equivalent -copyToLocal)
Checksummed vs. raw local filesystems (LocalFileSystem/RawLocalFileSystem):
1) FileSystem _fs2 = new RawLocalFileSystem();
_fs2.initialize(null, conf);
2) Or set the global property fs.file.impl to org.apache.hadoop.fs.RawLocalFileSystem
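A small sketch contrasting the two local filesystems named above: writing through the checksummed LocalFileSystem produces a .crc sidecar file, while the underlying RawLocalFileSystem does not. It uses getRawFileSystem() rather than the explicit initialize(null, conf) shown in 1); the class name and paths are illustrative.

// Contrast: LocalFileSystem writes a .crc sidecar, RawLocalFileSystem does not.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        LocalFileSystem checksummed = FileSystem.getLocal(conf);
        FSDataOutputStream out1 = checksummed.create(new Path("/tmp/with-crc.txt"));
        out1.writeUTF("hello");
        out1.close();   // also writes the sidecar /tmp/.with-crc.txt.crc

        FileSystem raw = checksummed.getRawFileSystem();   // the underlying RawLocalFileSystem
        FSDataOutputStream out2 = raw.create(new Path("/tmp/no-crc.txt"));
        out2.writeUTF("hello");
        out2.close();   // no .crc file is written
    }
}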
 
7.2 Compression
1) Compression ratio: bzip2 > gzip > lzo
2) Compression speed: lzo > gzip > bzip2
3) Decompression speed: lzo > gzip > bzip2
7.2.1 The CompressionCodec interface
Hadoop codec implementations:
#   Format   Hadoop CompressionCodec
1   DEFLATE  org.apache.hadoop.io.compress.DefaultCodec
2   gzip     org.apache.hadoop.io.compress.GzipCodec
3   bzip2    org.apache.hadoop.io.compress.BZip2Codec
4   lzo      com.hadoop.compression.lzo.LzopCodec
CompressionCodec interface methods (see the sketch after this list):
1) Compress: CompressionOutputStream createOutputStream(OutputStream out)
2) Decompress: CompressionInputStream createInputStream(InputStream in)
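A minimal sketch of createOutputStream(): compress standard input to standard output with GzipCodec, so that e.g. echo hello | java StreamCompressor | gunzip round-trips. The class name is illustrative; only standard Hadoop classes are used.

// Compress stdin to stdout with GzipCodec via createOutputStream().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);
        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        out.finish();   // flush the compressor without closing System.out
    }
}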
 
CompressionCodecFactory infers the CompressionCodec from the file name extension:
CompressionCodecFactory looks codecs up in the list defined by the io.compression.codecs
property. By default Hadoop registers all of its codecs, so the property only needs changing
when adding a custom codec.
For performance, prefer the "native" libraries for compression and decompression. Setting
hadoop.native.lib=false disables the native code libraries; java.library.path (set in the
Hadoop configuration scripts) controls where the native libraries are found.
A sketch of extension-based codec lookup follows.
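A minimal sketch of extension-based lookup with CompressionCodecFactory: pick the codec from the file name and decompress to standard output. The class name and the input path /tmp/data.gz are illustrative.

// Pick a codec from the file extension and decompress the file to stdout.
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class FileDecompressor {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path input = new Path("/tmp/data.gz");   // illustrative path

        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(input);   // chosen by extension (.gz -> GzipCodec)
        if (codec == null) {
            System.err.println("No codec found for " + input);
            return;
        }
        InputStream in = null;
        try {
            in = codec.createInputStream(fs.open(input));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}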
 
Choosing a compression format:
1. Raw, uncompressed files
2. A compression format that supports splitting, such as bzip2
3. Splitting handled in the application
4. Sequence files
5. Avro data files
 
MapReduce output compression (a sketch of setting these on a job configuration follows below):
mapred.output.compress=true
mapred.output.compression.codec=<codec class>
or, for SequenceFile output, the compression granularity:
mapred.output.compression.type=RECORD
Map output compression:
mapred.compress.map.output=true
mapred.map.output.compression.codec=<codec class>
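A minimal sketch of setting the old-style (mapred.*) properties above on a job Configuration; the class name is illustrative and the actual job setup is omitted.

// Enable job output compression and map output compression
// using the old mapred.* property names quoted above.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;

public class CompressionConfDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // final job output
        conf.setBoolean("mapred.output.compress", true);
        conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);

        // intermediate map output
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);

        System.out.println(conf.get("mapred.output.compression.codec"));
    }
}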
 
8. YARN on a Single Node
8.1 Configuration
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
8.2 Startup
$ ./yarn-daemon.sh start resourcemanager
$ ./yarn-daemon.sh start nodemanager
 
 
Configuring console logging:
hadoop-env.sh
export HADOOP_ROOT_LOGGER=DEBUG,console
 
fs.default.name
  • For MRv1:
<property> <name>fs.default.name</name> <value>hdfs://mycluster</value> </property>
  • For YARN:
<property> <name>fs.defaultFS</name> <value>hdfs://mycluster</value> </property>
 
Configuring the log directory (outside the Hadoop installation):
export HADOOP_LOG_DIR=/var/log/hadoop
 
 
Memory configuration:

By default Hadoop allocates 1 GB of memory to each daemon, controlled by the
HADOOP_HEAPSIZE setting in hadoop-env.sh:
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
 
The tasktracker launches separate child JVMs to run map and reduce tasks; the memory given
to each task is controlled by mapred.child.java.opts and defaults to 200 MB.
 
The maximum number of map tasks a tasktracker can run at once is set by
mapred.tasktracker.map.tasks.maximum (default 2); correspondingly, the maximum number of
reduce tasks is set by mapred.tasktracker.reduce.tasks.maximum (default 2).
 
How many tasks a tasktracker should run at once depends on how many processors the machine
has. Since MapReduce jobs are usually I/O-bound, a rule of thumb is to keep the ratio of
tasks (map plus reduce) to processors between 1 and 2.
 
The namenode, secondarynamenode and jobtracker daemons default to 1 GB each. A rule of thumb
is to allow about 1 GB of namenode memory per million blocks. For example, a 200-node cluster
with 4 TB of disk per node, a 128 MB block size and a replication factor of 3 holds roughly
2 million blocks (or more): 200 * 4,000,000 MB / (128 MB * 3). So in this example the
namenode (and likewise the secondarynamenode) would be given 2 GB.
 
# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
 
 
Creating a user directory and granting ownership
 
#./hadoop fs -mkdir /user/oy
#./hadoop fs -ls /user
-rw-r--r-- 3 root supergroup 1063 2017-03-17 07:24 /user/core-site.xml
drwxr-xr-x - root supergroup 0 2017-03-17 07:14 /user/hadoop
drwxr-xr-x - root supergroup 0 2017-06-16 19:39 /user/oy
 
#./hadoop fs [-chown [-R] [OWNER][:[GROUP]] PATH...]
#./hadoop fs -chown oy:oy /user/oy
 
 
//======================= Hadoop environment variables ===========================

At startup, Hadoop reads the variables set in hadoop-config.sh:

Location: hadoop-2.6.0-cdh5.4.0/libexec/hadoop-config.sh
 
 
 
 
 
//====================== Hadoop problems ===============================

Problems
1)NativeCodeLoader
NativeCodeLoader: Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
 
 
2) Formatting warnings
17/06/19 06:50:43 WARN common.Util: Path /opt/hadoop/hfiles/hdfs/name should be
specified as a URI in configuration files. Please update hdfs configuration.
17/06/19 06:50:43 WARN common.Util: Path /opt/hadoop/hfiles/hdfs/name should be
specified as a URI in configuration files. Please update hdfs configuration.
Add the file: prefix to the paths:
<property>
<!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
<name>dfs.name.dir</name>
<value>file:/opt/hadoop/hfiles/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:/opt/hadoop/hfiles/hdfs/data</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:/opt/hadoop/hfiles/hdfs/namesecondary</value>
</property>
 
Formatting using clusterid: CID-009ca18e-cedc-4f93-b40e-c1c1107b4164
17/06/19 06:50:43 INFO namenode.FSNamesystem: No KeyProvider found.
17/06/19 06:50:43 DEBUG crypto.OpensslCipher: Failed to load OpenSSL Cipher.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl(Native Method)
at org.apache.hadoop.crypto.OpensslCipher.<clinit>(OpensslCipher.java:84)
at org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.<init>(OpensslAesCtrCryptoCodec.java:50)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
at org.apache.hadoop.crypto.CryptoCodec.getInstance(CryptoCodec.java:68)
at org.apache.hadoop.crypto.CryptoCodec.getInstance(CryptoCodec.java:101)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:802)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:778)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:980)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1425)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1550)
 
3) YARN job hangs
 
[root@hadoop1 conf]# jps
2776 DataNode
5722 RunJar
6427 Jps
3397 ResourceManager
5478 RunJar
2943 SecondaryNameNode
2669 NameNode
 
0: jdbc:hive2://localhost:10000/testdb> select count(*) from t1;
INFO : Number of reduce tasks determined at compile time: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
WARN : Hadoop command-line option parsing not performed. Implement the Tool
interface and execute your application with ToolRunner to remedy this.
INFO : Starting Job = job_1498049139033_0001, Tracking URL =
http://hadoop1:8088/proxy/application_1498049139033_0001/
INFO : Kill Command = /opt/hadoop/hadoop-2.6.0-cdh5.4.0/bin/hadoop job -kill job_1498049139033_0001
 
No NodeManager was running, which left the job hanging (note that jps above shows no NodeManager process).

Kill the YARN job:
[root@hadoop1 bin]# yarn application -kill application_1498049139033_0001
 
 
 
//================================ HIVE ===========================

hive.metastore.local: a configuration property no longer used since version 0.10.

1) Hive local metastore mode

If hive.metastore.uris is empty, Hive runs in local mode; starting the hive CLI does not
require starting metastore, hiveserver or hiveserver2.
 
2) Hive remote metastore mode
Point clients at the metastore server:
<property>
<name>hive.metastore.uris</name>
<value>thrift://ip:port</value>
</property>
 
3) Hive ports
hive --service metastore &      ---> listens on port 9083 by default
hive --service hiveserver2 &    ---> Thrift listens on port 10000 by default
 
4) hive CLI (deprecated in newer releases)

The hive CLI depends on Hadoop.
After downloading the Hive package, configure conf/hive-env.sh:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/hadoop/hadoop-2.6.0-cdh5.4.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/hadoop/hive-1.1.0-cdh5.4.0/conf
 
Local (embedded) mode:
leave hive.metastore.uris empty
 
5) beeline (similar to the hive CLI)

Works in both embedded and remote modes.
It depends on hiveserver2, which must be started first; the default listening port is 10000,
but a custom port can be given:
hive --service hiveserver2 -p 11000 &

Connect to hiveserver2:
jdbc:hive2://ip:port/dbname
For example:
[root@hadoop1 bin]# ./beeline
17/06/21 06:28:52 DEBUG util.VersionInfo: version: 2.6.0-cdh5.4.0
Beeline version 1.1.0-cdh5.4.0 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/testdb
scan complete in 12ms
Connecting to jdbc:hive2://localhost:10000/testdb
Enter username for jdbc:hive2://localhost:10000/testdb:
Enter password for jdbc:hive2://localhost:10000/testdb:
Connected to: Apache Hive (version 1.1.0-cdh5.4.0)
Driver: Hive JDBC (version 1.1.0-cdh5.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/testdb> show tables;
+-----------+--+
| tab_name |
+-----------+--+
| t1 |
+-----------+--+
1 row selected (2.72 seconds)
0: jdbc:hive2://localhost:10000/testdb>
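beeline itself talks to HiveServer2 through the Hive JDBC driver, so the same connection can be made from Java. A minimal sketch, assuming the hive-jdbc jar and its dependencies are on the classpath and the testdb database from the session above; the class name is illustrative.

// Minimal Hive JDBC sketch against HiveServer2 (port 10000, database testdb).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/testdb", "", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("show tables");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}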
 
Check the port:
[root@hadoop1 bin]# netstat -anp | grep 10000
 
 
6) Initializing the Hive metastore schema with PostgreSQL
 
[root@hadoop1 bin]# ./schematool -dbType postgres -initSchema
 
The initialization scripts for the supported databases (oracle/mysql/postgres/derby/mssql)
are located under $HIVE_HOME/scripts/metastore/upgrade/
 
 
7)FATAL: no pg_hba.conf entry for host "192.168.110.166",
user "jcbk", database "jcbk", SSL off
 
Edit PostgreSQL\9.5\data\pg_hba.conf:
# TYPE DATABASE USER ADDRESS METHOD
# IPv4 local connections:
host all all 127.0.0.1/32 md5
host all all 192.168.110.166/32 trust
# IPv6 local connections:
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
#host replication postgres 127.0.0.1/32 md5
#host replication postgres ::1/128 md5
 
Edit PostgreSQL\9.5\data\postgresql.conf so that it listens for requests from all hosts:
listen_addresses = '*'
 
8)Unable to open a test connection to the given database
 
Caused by: java.sql.SQLException: Unable to open a test connection to the
given database. JDBC url = jdbc:postgresql://localhost:5432/jcbk, username
= jcbk. Terminating connection pool (set lazyInit to true if you expect to start
your database after your app). Original Exception: ------
org.postgresql.util.PSQLException: Connection refused. Check that the hostname
and port are correct and that the postmaster is accepting TCP/IP connections.
Cause: the IP address in the JDBC URL is incorrect.
 
9) Failed to get schema version.
 
Metastore connection URL: jdbc:postgresql://localhost:5432/jcbk?createDatabaseIfNotExist=true
Metastore Connection Driver : org.postgresql.Driver
Metastore connection User: jcbk
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
*** schemaTool failed ***
Cause: the metastore database schema has not been initialized.
 
 
10)MissingTableException
Caused by: org.datanucleus.store.rdbms.exceptions.MissingTableException:
Required table missing : ""VERSION"" in Catalog "" Schema "". DataNucleus
requires this table to perform its persistence operations. Either your MetaData
is incorrect, or you need to enable "datanucleus.autoCreateTables"
 
Modify hive-site.xml as follows:
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>

11)Failed to start database 'metastore_db'
Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with
class loader sun.misc.Launcher$AppClassLoader@2bb0bf9a, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown Source)
... 61 more
Caused by: java.sql.SQLException: Database at /opt/hadoop/hive-1.1.0-cdh5.4.0/bin/metastore_db
has an incompatible format with the current version of the software. The database
was created by or upgraded by version 10.11.
 
Fix: delete $HIVE_HOME/bin/metastore_db
 
12) Could not connect to meta store using any of the URIs
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/
CDH-5.3.0-1.cdh5.3.0.p0.30/jars/hive-common-0.13.1-cdh5.3.0.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 
Caused by: MetaException(message:Could not connect to meta store using any of the
URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection refused
 
The metastore service (hive server, port 9083) was not started; start it with:
#./hive --service metastore &
 
13) Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
the Hive properties to implicitly create or alter the existing schema are disabled by default.
Hive will not attempt to change the metastore schema implicitly. When you execute a
Hive query against an old schema, it will fail to access the metastore;
 
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:367)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
 
Add the following to the hive configuration file hive-site.xml:
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
 
14)HiveServer2 Security Configuration
HiveServer2 supports authentication of the Thrift client using either of these methods:
  • Kerberos authentication
 
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hive/_HOST@YOUR-REALM.COM</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/hive/conf/hive.keytab</value>
</property>
jdbc:hive2://node1:10000/default;principal=hive/HiveServer2Host@YOUR-REALM.COM
 
  • LDAP authentication
<property>
<name>hive.server2.authentication</name>
<value>LDAP</value>
</property>
<property>
<name>hive.server2.authentication.ldap.url</name>
<value>LDAP_URL</value>
</property>
<property>
<name>hive.server2.authentication.ldap.baseDN</name>
<value>LDAP_BaseDN</value>
</property>
 
jdbc:hive2://node1:10000/default;user=LDAP_Userid;password=LDAP_Password
 
15) Hive JDBC client and server version mismatch
 
Could not establish connection to jdbc:hive2://192.168.121.200:10000/default:
Required field 'client_protocol'
 
Fix: replace the jline jar in the following Hadoop directory with the jline version from $HIVE_HOME/lib:
E:\openjar\hadoop\hadoop\hadoop-2.5.0-cdh5.3.0\share\hadoop\yarn\lib
 
 
16) Using beeline on Windows to access Hive running on Linux

Steps:
(1) Download and install Hadoop and set the HADOOP_HOME environment variable.
(2) Download and install a JDK and set JAVA_HOME.
(3) Download and install Hive. $HIVE_HOME/bin does not include a beeline.cmd, so create a
beeline.cmd file in the bin directory and copy the contents below into it.
(4) If the jline version shipped with Hadoop conflicts with the one in Hive, standardize on
the jline version from Hive.
(5) When connecting to the remote Hive server with the beeline JDBC client, if the location
of an external table requires permissions, supply a username and password when connecting;
otherwise the external table cannot be created.
(6) Likewise, if the files backing an external table require permissions, a username and
password must be provided when connecting to Hive to query the table.
// ===============================beeline.cmd=================================
@echo off
@rem Licensed to the Apache Software Foundation (ASF) under one or more
@rem contributor license agreements. See the NOTICE file distributed with
@rem this work for additional information regarding copyright ownership.
@rem The ASF licenses this file to You under the Apache License, Version 2.0
@rem (the "License"); you may not use this file except in compliance with
@rem the License. You may obtain a copy of the License at
@rem
@rem http://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
SetLocal EnableDelayedExpansion
 
pushd %CD%\..
if not defined HIVE_HOME (
set HIVE_HOME=%CD%
)
popd
 
if "%HADOOP_BIN_PATH:~-1%" == "\" (
set HADOOP_BIN_PATH=%HADOOP_BIN_PATH:~0,-1%
)
 
if not defined JAVA_HOME (
echo Error: JAVA_HOME is not set.
goto :eof
)
 
@rem get the hadoop envrionment
if not exist %HADOOP_HOME%\libexec\hadoop-config.cmd (
@echo +================================================================+
@echo ^| Error: HADOOP_HOME is not set correctly ^|
@echo +----------------------------------------------------------------+
@echo ^| Please set your HADOOP_HOME variable to the absolute path of ^|
@echo ^| the directory that contains \libexec\hadoop-config.cmd ^|
@echo +================================================================+
exit /b 1
)
@rem supress the HADOOP_HOME warnings in 1.x.x
set HADOOP_HOME_WARN_SUPPRESS=true
call %HADOOP_HOME%\libexec\hadoop-config.cmd
 
@rem include only the beeline client jar and its dependencies
pushd %HIVE_HOME%\lib
for /f %%a IN ('dir /b hive-beeline-**.jar') do (
set CLASSPATH=%CLASSPATH%;%HIVE_HOME%\lib\%%a
)
for /f %%a IN ('dir /b super-csv-**.jar') do (
set CLASSPATH=%CLASSPATH%;%HIVE_HOME%\lib\%%a
)
for /f %%a IN ('dir /b jline-**.jar') do (
set CLASSPATH=%CLASSPATH%;%HIVE_HOME%\lib\%%a
)
for /f %%a IN ('dir /b hive-jdbc-**-standalone.jar') do (
set CLASSPATH=%CLASSPATH%;%HIVE_HOME%\lib\%%a
)
popd
 
call %JAVA_HOME%\bin\java %JAVA_HEAP_MAX% %HADOOP_OPTS% -classpath %CLASSPATH% org.apache.hive.beeline.BeeLine %*
 
endlocal
//===========================================================================

17) Accessing a remote HDFS from the Windows command line

(1) Download Hadoop and set the HADOOP_HOME environment variable.
(2) In core-site.xml, point fs.default.name at the remote HDFS namenode ip:port:
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.121.200:8020</value>
</property>
(3) Download winutils.exe and place it in $HADOOP_HOME/bin to resolve the following error:
17/06/22 16:09:14 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable E:\openjar\hadoop\hadoop\hadoop-2.5.0-cdh5.3.0\bin\winutils.exe in the Hadoop binaries.
(4) Uploading files from Windows failed with a block-split exception, while uploading to the
same remote HDFS server from Linux worked.

18) HDFS client: Permission denied

1. Set HADOOP_USER_NAME in the system environment variables or as a Java JVM property; its value should be the Linux user that will run jobs on the Hadoop cluster (restart Eclipse after changing it, or it may not take effect). See the sketch below.
2. Change the account the local process runs as to hadoop.
3. Use the HDFS command-line interface to relax the permissions of the target directory, e.g. hadoop fs -chmod 777 /user, where /user is the path being uploaded to. If the target is hdfs://namenode/user/xxx.doc this is enough; if it is hdfs://namenode/java/xxx.doc, run hadoop fs -chmod 777 /java (creating the /java directory in HDFS first) or hadoop fs -chmod 777 / to adjust the root directory.
4. In the HDFS configuration, set dfs.permissions to false.
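A minimal sketch of option 1, setting HADOOP_USER_NAME as a JVM system property before the first FileSystem call (it must be set before Hadoop resolves the login user). The class name and file names are illustrative; the user hadoop and the NameNode address are the ones used elsewhere in these notes.

// Option 1 sketch: act as the "hadoop" user via the HADOOP_USER_NAME property.
// Must be set before the first FileSystem/UserGroupInformation call in the JVM.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadAsHadoopUser {
    public static void main(String[] args) throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hadoop");   // or pass -DHADOOP_USER_NAME=hadoop
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.121.200:9001"), conf);
        fs.copyFromLocalFile(new Path("xxx.doc"), new Path("/user/xxx.doc"));
        fs.close();
    }
}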
 
19) HDFS authentication modes

Hadoop supports two authentication modes, configured through the hadoop.security.authentication property (a small sketch for checking the effective user follows below):
  • simple
  In simple mode, a user's identity is the login user of the client operating system; on Unix-like systems the HDFS user name is the same as the output of the whoami command.
  • kerberos
  In kerberos mode, the HDFS user identity is determined by Kerberos credentials. Kerberos is more secure but noticeably harder to configure, and is used less often.
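A minimal sketch for checking the configured mode and the user identity HDFS will see; the class name is illustrative, and in kerberos mode the result additionally depends on the Kerberos login.

// Print the configured authentication mode and the user HDFS will see.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class WhoAmIDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "simple" unless hadoop.security.authentication is set to "kerberos"
        System.out.println("auth mode: " + conf.get("hadoop.security.authentication", "simple"));
        UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
        System.out.println("effective user: " + ugi.getUserName());
    }
}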
 
20) Using Hive with Existing Files on S3
 
Notes

Install a JDK and set the JAVA_HOME environment variable; the remaining work is configuring the $HIVE_OPTS parameters.

Updating the configuration
The following parameters must be set first. This can be done via HIVE_OPTS, configuration files ($HIVE_HOME/conf/hive-site.xml), or via the Hive CLI's SET command.
Here are the configuration parameters:
Name                        Value
fs.s3n.awsAccessKeyId       Your S3 access key
fs.s3n.awsSecretAccessKey   Your S3 secret access key

Creating a Hive table over data in S3:
CREATE EXTERNAL TABLE mydata (key STRING, value INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '=' LOCATION 's3n://mys3bucket/';
 
Note: don’t forget the trailing slash in the LOCATION clause!
 
Here we’ve created a Hive table named mydata that has two columns: a key and a value. The FIELDS TERMINATED clause tells Hive that the two columns are separated by the ‘=’ character in the data files. The LOCATION clause points to our external data in mys3bucket.
Now, we can query the data:
 
SELECT * FROM mydata ORDER BY key;
 
 
21) Hive file storage formats
1. textfile
The default format. Row-oriented storage; high disk and parsing overhead. Hive cannot split or merge compressed text files. Lowest query efficiency, fastest data loading.
2. sequencefile
A binary format that serializes records into the file as <key,value> pairs. Row-oriented storage; splittable and compressible (block compression is the usual choice); its advantage is compatibility with MapFile in the Hadoop API. Largest storage footprint; compressed files can be split and merged; high query efficiency; data must be loaded by converting from text files.
3. rcfile
Data is partitioned into row groups, and each group is stored column by column. Fast compression and fast column access; a read touches as few blocks as possible, and reading a subset of columns only needs each row group's header. Reading the full data set may show no clear advantage over sequencefile. Smallest storage footprint and highest query efficiency, but data must be loaded by converting from text files and loading is the slowest.
4. orc
Same layout idea as rcfile (row groups stored column by column, fast compression and column access) but more efficient; an improved version of rcfile.
5. Custom formats
Users can define their own input and output formats by implementing InputFormat and OutputFormat.
 
22) hiveserver vs hiveserver2
Since Hive 1.0, hiveserver has been replaced by hiveserver2:
[root@hadoop1 bin]# ./hive --service hiveserver --help
17/07/05 23:25:36 DEBUG util.VersionInfo: version: 2.6.0-cdh5.4.0
Starting Hive Thrift Server
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)