Hadoop HBase Hive
Startup:
$HADOOP_HOME/bin/start-all.sh
$HBASE_HOME/bin/start-hbase.sh
$HIVE_HOME/bin/hive start
Environment setup
1. Install the JDK
2. Configure SSH
3. Set environment variables
Add the following to /etc/profile:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${JRE_HOME}/lib/rt.jar
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/lib
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:${HADOOP_HOME}/bin:${HBASE_HOME}/bin:$HIVE_HOME/bin:$PATH
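After sourcing /etc/profile, it can help to confirm that the exports took effect. The following is a minimal Python sketch and not part of the original setup: the function name check_environment is hypothetical, and it only checks the four *_HOME variables and whether each one's bin directory appears on PATH.

```python
import os

# The *_HOME variables exported in /etc/profile above.
REQUIRED_HOMES = ["JAVA_HOME", "HADOOP_HOME", "HBASE_HOME", "HIVE_HOME"]


def check_environment(env):
    """Return a list of human-readable problems found in the mapping `env`."""
    problems = []
    path_entries = env.get("PATH", "").split(os.pathsep)
    for name in REQUIRED_HOMES:
        home = env.get(name)
        if not home:
            problems.append(name + " is not set")
            continue
        # Each home directory is expected to contribute <home>/bin to PATH.
        bin_dir = home.rstrip("/") + "/bin"
        if bin_dir not in path_entries:
            problems.append(bin_dir + " is not on PATH")
    return problems


if __name__ == "__main__":
    # Check the environment of the current shell session.
    for problem in check_environment(os.environ):
        print(problem)
```

An empty result means all four variables are set and their bin directories are on PATH; it says nothing about whether the directories actually contain a working installation.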
Hadoop
Hadoop version: 1.1.2
Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
Required Software
Required software for Linux and Windows includes:
1. Java™ 1.6.x, preferably from Sun, must be installed.
2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
1. Cygwin - Required for shell support in addition to the required software above.
Installing Software
If your cluster doesn't have the requisite software you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:
· openssh - the Net category
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
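Pseudo-Distributed Operation
Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following. First set the machine's hostname to myhadoop:
hostname myhadoop
Make the name permanent by editing /etc/hostname so that it contains:
vi /etc/hostname
myhadoop
Then edit /etc/hosts and add a line that maps this machine's IP address to the hostname (replace ip with the actual address):
vi /etc/hosts
ip myhadoop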
conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://myhadoop:9000</value>
</property>
</configuration>
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/hadoopdata/dfsname</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/hadoopdata/dfsdata</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>myhadoop:9001</value>
</property>
</configuration>
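The three files above share one layout: a <configuration> root holding <property> elements with a <name> and a <value>. As an illustration only, here is a small Python sketch that renders and re-reads that layout with the standard library; the helper names build_site_xml and read_site_xml are hypothetical, and the values are the ones used in this document.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of settings as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse a <configuration> document back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


# Values taken from the configuration files shown above.
core_site = build_site_xml({"fs.default.name": "hdfs://myhadoop:9000"})
hdfs_site = build_site_xml({
    "dfs.replication": 1,
    "dfs.name.dir": "/usr/local/hadoop/hadoopdata/dfsname",
    "dfs.data.dir": "/usr/local/hadoop/hadoopdata/dfsdata",
})
mapred_site = build_site_xml({"mapred.job.tracker": "myhadoop:9001"})

if __name__ == "__main__":
    print(core_site)
```

Reading a file back with read_site_xml is a quick way to spot a misspelled property name before starting the daemons.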
Set up passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
· NameNode - http://localhost:50070/
· JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
HBase
HBase version: 0.94.7
hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://myhadoop:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>myhadoop</value>
</property>
bin/start-hbase.sh
Hive
1. Hive configuration (run in $HIVE_HOME/conf)
cp hive-default.xml.template hive-site.xml
cp hive-log4j.properties.template hive-log4j.properties
cp hive-env.sh.template hive-env.sh
2. Edit hive-env.sh
Set the HADOOP_HOME path.
3. Edit hive-site.xml so that Hive stores its metadata in MySQL
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://myhadoop:3306/hive_metadata?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.7-security.jar,file:///usr/local/hive/lib/protobuf-java-2.4.0a.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>
Delete hbase-0.92.0.jar, hbase-0.92.0-tests.jar, and zookeeper-3.4.3.jar from /usr/local/hive/lib.
Copy hbase-0.94.7-security.jar, zookeeper-3.4.5.jar, and protobuf-java-2.4.0a.jar from HBase into hive/lib.
4. Edit hive-log4j.properties
#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
5. Create the directories on HDFS
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
6. Manually copy the MySQL JDBC driver into hive/lib
~ ls /home/cos/toolkit/hive-0.9.0/lib
mysql-connector-java-5.1.22-bin.jar
7. Start Hive
Option 1:
bin/hive start
Option 2:
# Start the metastore service
~ bin/hive --service metastore &
Starting Hive Metastore Server
# Start the hiveserver service
~ bin/hive --service hiveserver &
Starting Hive Thrift Server
# Start the Hive client
~ bin/hive shell
Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt
hive> show tables;
OK
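Hive functions and access to complex types
Hive provides the following complex data types:
Structs: fields inside a struct are accessed with dot (.) notation. For example, if column c has type STRUCT{a INT; b INT}, field a is accessed as c.a.
Maps (key-value pairs): a value is accessed with ['key name']. For example, if map M contains the mapping 'group' -> gid, the gid value is retrieved with M['group'].
Arrays: all elements in an array have the same type, and indexing starts at 0. For example, if array A contains ['a','b','c'], then A[1] is 'b'.
Array:
Create the table: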
create table class_test(name string, student_id_list array<INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':';
Load the data:
vim /opt/test.txt
034,1:2:3:4
035,5:6
036,7:8:9:10
LOAD DATA LOCAL INPATH '/opt/test.txt' INTO TABLE class_test ;
Query:
select student_id_list[3] from class_test;
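To see what this query is expected to return for the three sample rows, the delimiters from the table definition can be mimicked in a few lines of Python. This is only an illustration of the parsing rules, not Hive code: the helper names are hypothetical, and it assumes that an out-of-range array index yields NULL (shown here as None).

```python
def parse_class_row(line, field_sep=",", item_sep=":"):
    """Split one text row into (name, student_id_list) using the table's delimiters."""
    name, raw_ids = line.split(field_sep, 1)
    return name, [int(item) for item in raw_ids.split(item_sep)]


def element_at(items, index):
    """Mimic array indexing in the query: 0-based, None when the index is out of range."""
    return items[index] if 0 <= index < len(items) else None


# The three rows from test.txt above.
rows = ["034,1:2:3:4", "035,5:6", "036,7:8:9:10"]
results = [element_at(parse_class_row(row)[1], 3) for row in rows]

if __name__ == "__main__":
    print(results)  # [4, None, 10]
```

Row 035 has only two elements, so index 3 has nothing to return for it.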
Map:
Create the table:
create table employee(id string, perf map<string, int>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';
Load the data (the id and the map are separated by a tab character):
vim /opt/test2.txt
1 job:80,team:60,person:70
2 job:60,team:80
3 job:90,team:70,person:100
LOAD DATA LOCAL INPATH '/opt/test2.txt' INTO TABLE employee;
Query:
select perf['person'] from employee;
select perf['person'] from employee where perf['person'] is not null;
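The difference between the two queries shows up on row 2, which has no 'person' key. A short Python sketch of the same lookup, using the three delimiters from the table definition; the function name parse_employee_row is hypothetical and None stands in for NULL.

```python
def parse_employee_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Split one text row into (id, perf) where perf is a dict of string -> int."""
    emp_id, raw_map = line.split(field_sep, 1)
    perf = {}
    for item in raw_map.split(item_sep):
        key, value = item.split(key_sep, 1)
        perf[key] = int(value)
    return emp_id, perf


# The three rows from test2.txt above, with a tab between the two fields.
rows = [
    "1\tjob:80,team:60,person:70",
    "2\tjob:60,team:80",
    "3\tjob:90,team:70,person:100",
]

# First query: one result per row, None where the key is missing.
all_values = [parse_employee_row(row)[1].get("person") for row in rows]
# Second query: rows with a missing key are filtered out.
non_null_values = [value for value in all_values if value is not None]

if __name__ == "__main__":
    print(all_values)       # [70, None, 100]
    print(non_null_values)  # [70, 100]
```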
Struct:
Create the table:
create table student_test(id INT, info struct<name:STRING, age:INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':';
'FIELDS TERMINATED BY': the delimiter between fields
'COLLECTION ITEMS TERMINATED BY': the delimiter between the items within one field
Load the data:
cat /opt/test3.txt
1,zhou:30
2,yan:30
3,chen:20
4,li:80
LOAD DATA LOCAL INPATH '/opt/test3.txt' INTO TABLE student_test;
Query:
select info.age from student_test;
Query the ID card numbers for the National Day holiday period (September 30 through October 7) of every year:
select substr(rzsj,1,4) as year, sfzmhm from jnlk where substr(rzsj,6,5)>='09-30' and substr(rzsj,6,5)<='10-07' and rzsj is not null order by year,sfzmhm;
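The filter works because substr uses 1-based positions and the month-day strings compare correctly as text. Below is a rough Python equivalent of the WHERE clause, assuming rzsj is a string that starts with a yyyy-MM-dd date (the query implies this but the table definition is not shown); the function names and sample values are made up for illustration.

```python
def hive_substr(text, start, length):
    """Equivalent of substr(str, start, len) for a positive, 1-based start."""
    return text[start - 1:start - 1 + length]


def in_national_day_window(rzsj):
    """Return True when the month-day part of rzsj falls in 09-30 .. 10-07."""
    if rzsj is None:
        return False
    month_day = hive_substr(rzsj, 6, 5)
    # Zero-padded MM-dd strings sort the same way as the dates they represent.
    return "09-30" <= month_day <= "10-07"


# Hypothetical sample values for the rzsj column.
samples = [
    "2013-09-29 23:59:59",
    "2013-09-30 08:00:00",
    "2014-10-07 12:30:00",
    "2014-10-08 00:00:00",
    None,
]
kept = [value for value in samples if in_national_day_window(value)]

if __name__ == "__main__":
    for value in kept:
        print(hive_substr(value, 1, 4), value)
```

The text comparison is only safe because the window does not cross a year boundary; a range such as 12-30 to 01-03 would need two separate conditions.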
Exercises
Create a table in HBase:
create 'blog','article','author'
Insert data into HBase:
put 'blog','1','article:title','Head First HBase'
put 'blog','1','article:content','HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.'
put 'blog','1','article:tags','Hadoop,HBase,NoSQL'
put 'blog','1','author:name','hujinjun'
put 'blog','1','author:nickname','一叶渡江'
put 'blog','10','article:tags','Hadoop'
put 'blog','10','author:nickname','heyun'
put 'blog','100','article:tags','hbase,nosql'
put 'blog','100','author:nickname','shenxiu'
hive:
CREATE EXTERNAL TABLE blog(key int,title string,content string,tags string,name string,nickname string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "article:title,article:content,article:tags,author:name,author:nickname") TBLPROPERTIES("hbase.table.name" = "blog");
hive> create table wyp (id int, name string, age int, tel string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
vim /opt/wyp.txt
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
hive> load data local inpath '/opt/wyp.txt' into table wyp;
vim /opt/add.txt
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355
$HADOOP_HOME/bin/hadoop fs -mkdir /wyp
$HADOOP_HOME/bin/hadoop fs -copyFromLocal /opt/add.txt /wyp/add.txt
hive> load data inpath '/wyp/add.txt' into table wyp;
hive> select * from wyp;
FAQ
Hive HWI startup error
Error log:
INFO hwi.HWIServer: HWI is starting up
WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /usr/local/hive/conf/hive-default.xml
FATAL hwi.HWIServer: HWI WAR file not found at /usr/local/hive/usr/local/hive/lib/hive-hwi-0.9.0.war
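Solution:
The fix is simple. The value of hive.hwi.war.file is resolved relative to ${HIVE_HOME}, so set a relative path in hive-site.xml: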
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi-0.9.0.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
</property>
Otherwise the resolved path is wrong, as the FATAL line in the error log shows.
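The doubled prefix in the FATAL log line is what you get when an absolute path is appended to ${HIVE_HOME}. The Python sketch below only reproduces that string arithmetic to make the failure easier to recognize; resolve_hwi_war is a hypothetical helper, and the assumption that the original setting was the absolute path is inferred from the log rather than stated in it.

```python
def resolve_hwi_war(hive_home, war_file):
    """Append war_file to hive_home by plain concatenation, as the log line suggests."""
    return hive_home.rstrip("/") + "/" + war_file.lstrip("/")


# Assumed original (absolute) setting: the prefix ends up in the path twice.
broken = resolve_hwi_war("/usr/local/hive", "/usr/local/hive/lib/hive-hwi-0.9.0.war")
# Relative setting from the fix above.
fixed = resolve_hwi_war("/usr/local/hive", "lib/hive-hwi-0.9.0.war")

if __name__ == "__main__":
    print(broken)
    print(fixed)
```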