1. Extract the archive
tar -zxvf Downloads/apache-hive-3.1.1-bin.tar.gz -C applications/
2. Create a symbolic link
ln -s apache-hive-3.1.1-bin hive
3. Put the MySQL JDBC driver jar (mysql-connector-java-5.1.46.jar here) into Hive's lib/ directory
cp ~/Downloads/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46.jar ~/applications/apache-hive-3.1.1-bin/lib/
4. Configure the environment variables
Add to /etc/profile:
export HIVE_HOME=/opt/applications/hive
export PATH=$HIVE_HOME/bin:$PATH
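After `source /etc/profile`, the two export lines can be sanity-checked; a minimal sketch (paths assumed from the steps above, and Hive itself need not be installed for the check to run):

```shell
# Sketch: verify that HIVE_HOME/bin ends up on PATH (paths from the steps above).
export HIVE_HOME=/opt/applications/hive
export PATH=$HIVE_HOME/bin:$PATH

# POSIX-safe containment check on PATH.
case ":$PATH:" in
  *":$HIVE_HOME/bin:"*) echo "PATH ok" ;;
  *)                    echo "PATH missing $HIVE_HOME/bin" ;;
esac
```

If the check fails after editing /etc/profile, the usual cause is forgetting to re-source the file in the current shell.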
5. Create the Hive MySQL database and user
First create the hive account:
mysql> create user 'hive' identified by '123456';
Grant all MySQL privileges to the hive account:
mysql> grant all on *.* to 'hive'@'%' identified by '123456';
mysql> flush privileges;
Log in to MySQL as the hive user:
mysql -h localhost -u hive -p
Create the hive database:
mysql> create database hive;
Query OK, 1 row affected (0.05 sec)
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| confluence |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
6 rows in set (0.00 sec)
6. Configure hive-site.xml
First create hive-env.sh from its template:
cp conf/hive-env.sh.template conf/hive-env.sh
Configure hive-env.sh:
HADOOP_HOME=/opt/applications/hadoop
export HIVE_CONF_DIR=/opt/applications/hive/conf
export HIVE_AUX_JARS_PATH=/opt/applications/hive/lib
hive-site.xml does not exist by default, so copy one from the template:
[wls81@master applications]$ cd hive/
[wls81@master hive]$ cd conf/
[wls81@master conf]$ ls -lrt
total 332
-rw-r--r-- 1 wls81 wls81 2662 Apr 4 2018 parquet-logging.properties
-rw-r--r-- 1 wls81 wls81 2060 Apr 4 2018 ivysettings.xml
-rw-r--r-- 1 wls81 wls81 2365 Apr 4 2018 hive-env.sh.template
-rw-r--r-- 1 wls81 wls81 1596 Apr 4 2018 beeline-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81 2274 Apr 4 2018 hive-exec-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81 3086 Oct 24 07:49 hive-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81 7163 Oct 24 07:49 llap-daemon-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81 3558 Oct 24 07:49 llap-cli-log4j2.properties.template
-rw-r--r-- 1 wls81 wls81 299970 Oct 24 08:19 hive-default.xml.template
[wls81@master conf]$ cp hive-default.xml.template hive-site.xml
Default values:
<property>
<name>hive.metastore.db.type</name>
<value>DERBY</value>
<description>
Expects one of [derby, oracle, mysql, mssql, postgres].
Type of database used by the metastore. Information schema & JDBCStorageHandler depend on it.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=metastore_db;create=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>APP</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mine</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value/>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
Change them as follows:
<property>
<name>hive.metastore.db.type</name>
<value>mysql</value>
<description>
Expects one of [derby, oracle, mysql, mssql, postgres].
Type of database used by the metastore. Information schema & JDBCStorageHandler depend on it.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
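Note that `&` is a reserved character in XML, so inside the `<value>` above it must be written as `&amp;`. A small sketch that escapes a raw JDBC URL for pasting into hive-site.xml:

```shell
# '&' must appear as '&amp;' inside an XML value; escape the raw URL with sed.
url='jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
# In a sed replacement '&' means "the whole match", so a literal '&' is written '\&'.
escaped=$(printf '%s' "$url" | sed 's/&/\&amp;/g')
echo "$escaped"
# prints: jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false
```

Leaving the raw `&` in place makes the XML parser reject the file at startup.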
Add the following settings for remote metastore mode
(see https://blog.csdn.net/dufufd/article/details/78614958 for the three MySQL deployment modes):
<property>
<name>hive.metastore.local</name>
<value>false</value> <!-- true means local mode -->
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://master:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
7. Change the Hive data directories
Edit the configuration file (vi hive-site.xml) and change the data-directory settings.
Default values:
<property>
<name>hive.querylog.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
Changed values:
<property>
<name>hive.querylog.location</name>
<value>/wls/log/hive/logs</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/Data/hive/scratchdir</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/Data/hive/resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/wls/log/hive/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
sudo mkdir -p /wls/log/hive/logs
sudo mkdir -p /wls/log/hive/operation_logs
sudo mkdir -p /Data/hive/scratchdir
sudo mkdir -p /Data/hive/resources
sudo chown -R wls81:wls81 /wls/log/hive/logs
sudo chown -R wls81:wls81 /Data/hive/scratchdir
sudo chown -R wls81:wls81 /Data/hive/resources
sudo chown -R wls81:wls81 /wls/log/hive/operation_logs
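The four mkdir/chown pairs above can be collapsed into one loop; a sketch (run here under a throwaway temporary root so it is safe to try anywhere — drop the prefix and add sudo for the real run):

```shell
# Create the local Hive directories from above in one loop.
# mktemp gives a sandbox prefix so this sketch runs without root.
root=$(mktemp -d)
for d in /wls/log/hive/logs /wls/log/hive/operation_logs \
         /Data/hive/scratchdir /Data/hive/resources; do
  mkdir -p "$root$d"
  # chown -R wls81:wls81 "$root$d"   # needs sudo in the real run
done
ls "$root/Data/hive"
```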
Note that the default warehouse location is configured by:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
Create it in HDFS:
hdfs dfs -mkdir -p /user/hive/warehouse
8. Initialize the Hive metastore
./bin/schematool -dbType mysql -initSchema
Initialization failed with an error:
Metastore connection URL: jdbc:mysql//localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
Underlying cause: java.sql.SQLException : No suitable driver found for jdbc:mysql//localhost:3306/hive?createDatabaseIfNotExist=true&useSSL=false
SQL Error code: 0
Use --verbose for detailed stacktrace.
The cause: the URL read jdbc:mysql// instead of jdbc:mysql://, i.e. the ':' before '//' was missing. The correct property is:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
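The typo is easy to miss; a small sketch that checks the URL prefix before running schematool (the check_url helper is hypothetical, just for illustration):

```shell
# Reject connection URLs that drop the '://' after jdbc:mysql.
check_url() {
  case "$1" in
    jdbc:mysql://*) echo "ok" ;;
    *)              echo "bad prefix: $1" ;;
  esac
}

check_url 'jdbc:mysql//localhost:3306/hive'   # the broken form from the error above
check_url 'jdbc:mysql://master:3306/hive'     # the corrected form
```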
After fixing the URL, initialization succeeded; the hive database now contains 74 tables.
9. Start Hive
The first start failed with:
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8
The cause is an illegal character entity (&#8;) that hive-default.xml.template ships with inside the description of hive.txn.xlock.iow in conf/hive-site.xml, sitting between "for" and "transactional"; delete the offending character from this property's description:
<property>
<name>hive.txn.xlock.iow</name>
<value>true</value>
<description>
Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks for transactional tables. This ensures that inserts (w/o overwrite) running concurrently
are not hidden by the INSERT OVERWRITE.
</description>
</property>
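To locate the line to fix, grep hive-site.xml for the entity; a sketch (a stand-in file is used here so it runs anywhere — point grep/sed at conf/hive-site.xml in the real run):

```shell
# Find and remove the illegal &#8; character entity the template ships with.
printf '%s\n' 'locks for&#8;transactional tables' > demo-hive-site.xml  # stand-in file
grep -n '&#8;' demo-hive-site.xml       # shows the offending line number
sed -i 's/&#8;/ /g' demo-hive-site.xml  # replace the entity with a space
grep -c '&#8;' demo-hive-site.xml || true   # 0 matches remain
```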
Starting again now works.
Preface
As a data-warehouse tool, Hive provides two ways to run ETL: the Hive command line and the beeline client.
The command-line mode enters an interactive shell via the hive command and runs HQL statements to produce results. It is effectively a thick-client mode: the client machine needs a JRE and the Hive program installed.
The beeline client is a thin-client mode: it uses JDBC, via the Hive Thrift service, to access the Hive data warehouse.
HiveThrift (HiveServer) is one of Hive's components, designed for lightweight, cross-language access to the Hive warehouse. It has two versions, HiveServer and HiveServer2, which are incompatible; take care to distinguish them, both in the parameters used to start the server and in the jdbc:hive vs jdbc:hive2 connection URL.
Thrift server settings for beeline
These are mainly the hive.server2.thrift.* options in hive/conf/hive-site.xml; keep them mutually consistent:
<property>
<name>hive.server2.thrift.bind.host</name>
<value>master</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
</property>
After connecting through beeline, the files being accessed live on HDFS and their paths carry access-permission restrictions, so the thrift client user here must be set to the Hadoop username, which in this example is 'wls81'.
Using any other username may produce a permission-denied error. Alternatively, relax the restriction by setting Hadoop's hadoop.proxyuser.XX properties to "*":
<property>
<name>hive.server2.thrift.client.user</name>
<value>wls81</value>
<description>Username to use against thrift client</description>
</property>
<property>
<name>hive.server2.thrift.client.password</name>
<value>123456</value>
<description>Password to use against thrift client</description>
</property>
In hadoop/etc/hadoop/core-site.xml:
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
Explanation of these settings:
Setting hadoop.proxyuser.hadoop.hosts to * means the proxy user hadoop can reach the HDFS cluster from any node; hadoop.proxyuser.hadoop.groups declares which groups the proxied users may belong to.
If the proxy user is wls81 instead, change these to hadoop.proxyuser.wls81.hosts and hadoop.proxyuser.wls81.groups.
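Since the property names embed the user name, a tiny sketch that prints the pair to add for any proxy user (wls81 taken from the example above):

```shell
# Generate the core-site.xml proxyuser property names for a given user.
user=wls81
for suffix in hosts groups; do
  printf 'hadoop.proxyuser.%s.%s\n' "$user" "$suffix"
done
# prints:
#   hadoop.proxyuser.wls81.hosts
#   hadoop.proxyuser.wls81.groups
```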
Start beeline and access Hive
On master, start the metastore service and HiveServer2:
nohup hive --service metastore &    # start the metastore service
nohup hive --service hiveserver2 &
ps -ef | grep hive shows that HiveServer2 is running.
Then start the client with
beeline
or with
beeline -u jdbc:hive2:
Some beeline operations:
!help     // show help
!close    // close the current (JDBC) connection
!tables   // list tables
!sh clear // run a shell command
!quit     // exit the beeline terminal
Run aggregate functions and more advanced queries in beeline:
select count(*) from t1;    // row count
select max(id) from t1;     // maximum (aggregates take a column, not *)
select min(id) from t1;     // minimum
select sum(id) from t1;     // sum
select avg(id) from t1;     // average
select * from t1 order by id limit 5,5;     // pagination (skip 5, take 5)
select * from (select id, name from t1) a;  // subquery (nested query)
select name, case when id < 3 then 'small' when id = 3 then 'true' else 'big' end from t1;
// case when is the equivalent of Java's if/else or switch/case
select count(*), sum(id) from t1 group by city having sum(id) > 10;
Difference between like and rlike
Both are typically used for fuzzy matching.
(Suppose we want to find, in the employees table, the name and street address of every employee whose street name contains the word Chicago or Ontario.)
like example:
select name, address from employees
where address like '%Chicago%' or address like '%Ontario%';
rlike example:
select name, address from employees
where address rlike '.*(Chicago|Ontario).*';
As you can see, rlike is a stronger form of like: it supports Java regular expressions, which is more convenient and keeps the predicate shorter.
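The rlike pattern is an ordinary alternation regex; grep -E demonstrates the same matching idea on sample lines (the addresses here are made up for the sketch; grep matches substrings, which is why the .* wrapping that rlike needs can be dropped):

```shell
# Same alternation regex as the rlike example, applied with grep -E.
printf '%s\n' '100 Chicago Ave' '22 Ontario St' '9 Main St' \
  | grep -E 'Chicago|Ontario'
# prints:
#   100 Chicago Ave
#   22 Ontario St
```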