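Chapter 2: Getting Started
1. What are Hive's biggest limitations?
First, it does not support row-level insert, update, or delete.
Second, query latency is high because queries run as Hadoop MapReduce jobs, so Hive is a poor fit for low-latency interactive work.
Third, it does not support transactions.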
2. What does the Hive metastore do?
Hive persists table schemas and other system metadata in the metastore.
The information required for table schema, partition information, etc., is small, typically much smaller than the large quantity of data stored in Hive. As a result, you typically don’t need a powerful dedicated database server for the metastore. However, because it represents a Single Point of Failure (SPOF), it is strongly recommended that you replicate and back up this database using the standard techniques you would normally use with other relational database instances. We won’t discuss those techniques here.
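3. In a distributed Hadoop cluster, Hive submits MapReduce jobs to the cluster.
(1) Does Hive need to be installed on every machine in the cluster? No. A single Hive instance is enough; Hive can be thought of as a client that submits MapReduce jobs.
(2) Does Hive need to be installed on the machine that runs the Hadoop master node? No. Hive can submit jobs to the Hadoop master remotely.
4. Word count in Hive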
CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'docs' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
(SELECT explode(split(line, '\\s')) AS word FROM docs) w
GROUP BY word
ORDER BY word;
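For readers who think more easily in a general-purpose language, the logic of the query above can be mirrored in a few lines of Python. This is only a sketch of what the query computes (split each line on whitespace, flatten into one word per row, group, count, sort); the function name word_counts and the sample lines are made up for illustration, and the sketch says nothing about how Hive actually executes the query.

```python
import re
from collections import Counter


def word_counts(lines):
    """Mirror of the HiveQL query: split + explode, GROUP BY word, ORDER BY word."""
    words = []
    for line in lines:
        # split(line, '\\s') followed by explode(): one output row per token
        words.extend(re.split(r"\s", line))
    counts = Counter(words)        # GROUP BY word with count(1)
    return sorted(counts.items())  # ORDER BY word


if __name__ == "__main__":
    # Hypothetical contents of the docs table, one string per row
    docs = ["hive on hadoop", "hadoop mapreduce", "hive"]
    for word, count in word_counts(docs):
        print(word, count)
```

As in the query, splitting on a single whitespace character means that consecutive spaces would produce empty-string "words"; the sample input avoids that case.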
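Chapter 3: Data Types and File Formats
1. After using Hive, a metastore_db directory appeared under Hive's bin directory. Where did it come from?
By default Hive uses Derby as the metastore database, and Derby is a file-based database. When Derby is used, it creates a metastore_db directory in the current working directory of the hive process and uses it as the metastore database directory.
In other words, Derby creates metastore_db in whichever directory the hive command is run from.
2. The ./hive command is equivalent to ./hive --service cli. cli is the default service started by the hive command; that is, hive is really a command for launching services.
3. Starting the hwi service reports an error; see [reference missing].
4. Row, column, and collection-item delimiters in CREATE TABLE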
CREATE TABLE employees (
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
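ROW FORMAT DELIMITED introduces the FIELDS TERMINATED BY, COLLECTION ITEMS TERMINATED BY, MAP KEYS TERMINATED BY, and LINES TERMINATED BY clauses. Whenever any of these delimiters is specified, even just one, the clauses must be preceded by ROW FORMAT DELIMITED.
FIELDS TERMINATED BY sets the column (field) delimiter.
COLLECTION ITEMS TERMINATED BY sets the delimiter between STRUCT fields, between ARRAY elements, and between MAP entries (key-value pairs).
MAP KEYS TERMINATED BY sets the delimiter between the key and the value inside a MAP entry.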
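To make the three delimiter levels concrete, here is a minimal Python sketch that splits one text row of the employees table by hand. It is not Hive's own parsing code; the function parse_employee, the constant names, and the sample row are assumptions made up for illustration, and the row is assumed to be well formed.

```python
FIELD_SEP = "\x01"  # FIELDS TERMINATED BY '\001'
ITEM_SEP = "\x02"   # COLLECTION ITEMS TERMINATED BY '\002'
KEY_SEP = "\x03"    # MAP KEYS TERMINATED BY '\003'
LINE_SEP = "\n"     # LINES TERMINATED BY '\n'


def parse_employee(line):
    """Split one well-formed text row into the five columns of employees."""
    name, salary, subs, deds, addr = line.rstrip(LINE_SEP).split(FIELD_SEP)

    # ARRAY<STRING>: elements separated by ITEM_SEP
    subordinates = subs.split(ITEM_SEP) if subs else []

    # MAP<STRING, FLOAT>: entries separated by ITEM_SEP, key and value by KEY_SEP
    deductions = {}
    entries = deds.split(ITEM_SEP) if deds else []
    for entry in entries:
        key, value = entry.split(KEY_SEP)
        deductions[key] = float(value)

    # STRUCT<street, city, state, zip>: fields separated by ITEM_SEP
    street, city, state, zip_code = addr.split(ITEM_SEP)

    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinates,
        "deductions": deductions,
        "address": {"street": street, "city": city,
                    "state": state, "zip": int(zip_code)},
    }


# Hypothetical sample row, built with the same delimiters
SAMPLE_LINE = FIELD_SEP.join([
    "John Doe",
    "100000.0",
    ITEM_SEP.join(["Mary Smith", "Todd Jones"]),
    ITEM_SEP.join(["Federal Taxes" + KEY_SEP + "0.2",
                   "State Taxes" + KEY_SEP + "0.05"]),
    ITEM_SEP.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
]) + LINE_SEP

if __name__ == "__main__":
    print(parse_employee(SAMPLE_LINE))
```

Note how the same ITEM_SEP character separates array elements, map entries, and struct fields, while KEY_SEP only appears inside map entries.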
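5. Integrity checks
Hive does not check data against the schema when writing, for example during LOAD DATA. (Checking data integrity on write is one of the core capabilities of a relational database: schema on write.) Instead, Hive applies the schema when reading: each row is split and parsed according to the schema, with as much fault tolerance as possible.
For example, if a row has too few columns, the missing ones are filled with NULL; if it has too many, the extras are discarded.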
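The padding and truncation behaviour described above can be sketched in Python as follows. This only illustrates the rule for column counts and is not Hive's implementation; the function name read_row and the default separator are assumptions chosen for the example, and None stands in for NULL.

```python
def read_row(line, num_columns, sep="\x01"):
    """Apply a schema with num_columns columns to one raw text row at read time."""
    fields = line.rstrip("\n").split(sep)
    fields = fields[:num_columns]                   # too many columns: extras are dropped
    fields += [None] * (num_columns - len(fields))  # too few columns: pad with NULL
    return fields


if __name__ == "__main__":
    print(read_row("a\x01b", 3))              # short row
    print(read_row("a\x01b\x01c\x01d", 3))    # long row
```

Nothing is rejected at load time in this model; a malformed row only shows up as NULLs or missing trailing values when it is queried.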
Chapter 4: HiveQL: Data Definition
1. HiveQL is perhaps closest to MySQL’s dialect, but with significant differences. Hive offers no support for row-level inserts, updates, and deletes.
2. Hive doesn’t support transactions.
3. HiveQL DDL statements are used for creating, altering, and dropping databases, tables, views, functions, and indexes. (No mention of partitions or buckets?)
4. The SHOW and DESCRIBE commands list and describe these items.
5. Databases in Hive
The Hive concept of a database is essentially just a catalog or namespace of tables. However, they are very useful for larger clusters with multiple teams and users, as a way of avoiding table name collisions. It’s also common to use databases to organize production tables into logical groups. If you don’t specify a database, the default database is used.
4.1 Database operations
4.1.1 Creating a database
hive> CREATE DATABASE financials;
./hdfs dfs -ls /user/hive/warehouse
drwxr-xr-x - hadoop supergroup 0 2015-04-04 04:35 /user/hive/warehouse/financials.db
4.1.2 Showing the name of the current database in the Hive CLI prompt
hive> set hive.cli.print.current.db=true;
hive (financials)>
4.1.3 Basic database operations
hive> CREATE DATABASE financials COMMENT 'Holds all financial tables'; -- create a database and attach a COMMENT
hive> DESCRIBE DATABASE financials; -- describe the database
hive> USE financials; -- switch to the database
hive> DROP DATABASE IF EXISTS financials; -- the drop is refused if the database still contains tables
hive> DROP DATABASE IF EXISTS financials CASCADE; -- if the database contains tables, drop them as well
hive> CREATE DATABASE financials WITH DBPROPERTIES ('creator' = 'Mark Moneybags', 'date' = '2012-01-02'); -- set key-value properties when creating the database
hive> DESCRIBE DATABASE EXTENDED financials; -- EXTENDED also shows the key-value properties set at creation
hive> ALTER DATABASE financials SET DBPROPERTIES ('creator' = 'Joe Dba'); -- the database name and data location cannot be changed, but the key-value properties can
4.2 Table operations
4.2.1 Creating a table
CREATE TABLE IF NOT EXISTS mydb.employees (
  name STRING COMMENT 'Employee name',
  salary FLOAT COMMENT 'Employee salary',
  subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
  deductions MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages',
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address')
COMMENT 'Description of the table'
TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...)
LOCATION '/user/hive/warehouse/mydb.db/employees';
A COMMENT can be attached to each column as well as to the table as a whole.
LOCATION sets the path where the table's data is stored. The default is the warehouse path, plus the database directory (the database name with a .db suffix), plus the table name, which is exactly the path used above.
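The default-path rule can be written down as a small Python helper. This is a sketch under stated assumptions: the function name default_table_location is made up, the warehouse root is taken from the listings above (in a real installation it comes from the hive.metastore.warehouse.dir property), and the special case for the default database, whose tables sit directly under the warehouse root, is added from general Hive behaviour rather than from these notes.

```python
import posixpath


def default_table_location(db, table, warehouse="/user/hive/warehouse"):
    """Return the HDFS path Hive would use when no LOCATION clause is given."""
    if db == "default":
        # Tables in the default database have no <db>.db directory level
        return posixpath.join(warehouse, table)
    return posixpath.join(warehouse, db + ".db", table)


if __name__ == "__main__":
    print(default_table_location("mydb", "employees"))
```

For mydb.employees this yields the same path that the CREATE TABLE statement above spells out explicitly in its LOCATION clause.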
4.2.2 Describing a table
hive> DESCRIBE EXTENDED mydb.employees;
name string Employee name
salary float Employee salary
subordinates array<string> Names of subordinates
deductions map<string,float> Keys are deductions names, values are percentages
address struct<street:string,city:string,state:string,zip:int> Home address
Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me, ... location:hdfs://master-server/user/hive/warehouse/mydb.db/employees, parameters:{creator=me, created_at='2012-01-02 10:00:00', last_modified_user=me, last_modified_time=1337544510, comment:Description of the table, ...}, ...)