Sqoop is an open-source tool for transferring data between Hadoop and traditional relational databases. The Sqoop user guide describes it as follows:
"Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS."
This post mainly walks through the installation process.
1. Download the software
I was running the official Apache Hadoop 0.20.2, but Sqoop kept failing against it. After some reading I found that Sqoop does not support this version; the usual recommendation is CDH3. That said, after copying the required jar into sqoop-1.2.0-CDH3B4/lib it still worked. Of course, you can simply use CDH3 directly instead.
Download links for CDH3 and Sqoop 1.2.0:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4.tar.gz
http://archive.cloudera.com/cdh/3/sqoop-1.2.0-CDH3B4.tar.gz
sqoop-1.2.0-CDH3B4 depends on hadoop-core-0.20.2-CDH3B4.jar, so download hadoop-0.20.2-CDH3B4.tar.gz, unpack it, and copy hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar into sqoop-1.2.0-CDH3B4/lib.
In addition, importing from MySQL requires mysql-connector-java-*.jar at runtime, so download that as well and copy it into sqoop-1.2.0-CDH3B4/lib.
2. Edit Sqoop's configure-sqoop script and comment out the HBase and ZooKeeper checks (unless you plan to use HBase or other such Hadoop components):
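The two copy steps above can be sketched as plain shell commands. The layout is simulated here with placeholder files, and the connector version 5.1.18 is purely illustrative:

```shell
# Simulate the unpacked tarball layout with placeholder files, then stage
# the jars into Sqoop's lib/ directory (a real run copies the actual jars).
mkdir -p hadoop-0.20.2-CDH3B4 sqoop-1.2.0-CDH3B4/lib
touch hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar
touch mysql-connector-java-5.1.18-bin.jar   # version number is illustrative
cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib/
cp mysql-connector-java-*.jar sqoop-1.2.0-CDH3B4/lib/
ls sqoop-1.2.0-CDH3B4/lib
```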
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
3. Start Hadoop and set the relevant environment variables (e.g. $HADOOP_HOME), and Sqoop is ready to use.
Here is an example that imports a table from the database into files on HDFS:
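A minimal sketch of the environment setup, assuming install locations like those used later in this post (adjust the paths to wherever you unpacked the tarballs):

```shell
# Hypothetical install locations; substitute your own unpack directories.
export HADOOP_HOME="$HOME/hadoop/hadoop-0.20.2"
export SQOOP_HOME="$HOME/cloudera/sqoop-1.2.0-CDH3B4"
export PATH="$SQOOP_HOME/bin:$HADOOP_HOME/bin:$PATH"
echo "HADOOP_HOME=$HADOOP_HOME"
```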
[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ bin/sqoop import --connect jdbc:mysql://XXXX:XX/crm --username crm --password 123456 --table company -m 1
11/09/21 15:45:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/21 15:45:26 INFO tool.CodeGenTool: Beginning code generation
11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:26 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/21 15:45:26 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/21 15:45:26 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/./company.java
11/09/21 15:45:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.jar
11/09/21 15:45:26 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/21 15:45:26 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/21 15:45:26 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/21 15:45:26 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/21 15:45:26 INFO mapreduce.ImportJobBase: Beginning import of company
11/09/21 15:45:27 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:28 INFO mapred.JobClient: Running job: job_201109211521_0001
11/09/21 15:45:29 INFO mapred.JobClient: map 0% reduce 0%
11/09/21 15:45:40 INFO mapred.JobClient: map 100% reduce 0%
11/09/21 15:45:42 INFO mapred.JobClient: Job complete: job_201109211521_0001
11/09/21 15:45:42 INFO mapred.JobClient: Counters: 5
11/09/21 15:45:42 INFO mapred.JobClient: Job Counters
11/09/21 15:45:42 INFO mapred.JobClient: Launched map tasks=1
11/09/21 15:45:42 INFO mapred.JobClient: FileSystemCounters
11/09/21 15:45:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=44
11/09/21 15:45:42 INFO mapred.JobClient: Map-Reduce Framework
11/09/21 15:45:42 INFO mapred.JobClient: Map input records=8
11/09/21 15:45:42 INFO mapred.JobClient: Spilled Records=0
11/09/21 15:45:42 INFO mapred.JobClient: Map output records=8
11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 15.0061 seconds (2.9321 bytes/sec)
11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Retrieved 8 records.
Inspect the imported data:
[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ hadoop fs -cat /user/wanghai01/company/part-m-00000
1,xx
2,eee
1,xx
2,eee
1,xx
2,eee
1,xx
2,eee
Verify against the database:
mysql> select * from company;
+------+------+
| id | name |
+------+------+
| 1 | xx |
| 2 | eee |
| 1 | xx |
| 2 | eee |
| 1 | xx |
| 2 | eee |
| 1 | xx |
| 2 | eee |
+------+------+
8 rows in set (0.00 sec)
OK, the data matches. If you look closely at the command output above you will notice an ERROR line: a previous run of this command had failed, and its temporary data was not cleaned up before the retry.
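To clear that ERROR before retrying, it should be enough to remove the leftover generated sources from the failed run. A sketch, with placeholder files standing in for the real paths:

```shell
# company.java and the Sqoop temp compile dir are left behind by a failed run;
# simulate them with placeholders, then remove them as a clean retry requires.
mkdir -p tmp-sqoop-compile
touch company.java tmp-sqoop-compile/company.jar
rm -f company.java
rm -rf tmp-sqoop-compile
```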
This article originally appeared on the Linuxidc site (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2011-10/45081.htm
===================================================
Sqoop is an open-source tool for transferring data between Hadoop and traditional relational databases. The Sqoop user guide describes it as follows:
"Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS."
Sqoop is software open-sourced by Cloudera for moving data between HDFS and databases. Internally it connects Hadoop to the database over JDBC, so in principle any JDBC-capable database is compatible with Sqoop. Moreover, Sqoop can not only land data on HDFS as files, it can also load data directly into HBase or Hive.
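Since the transport is plain JDBC, switching databases is mostly a matter of the `--connect` URL. A small hypothetical helper sketch (host, port, and database names are placeholders):

```shell
# Hypothetical helper that assembles a MySQL JDBC URL for sqoop --connect.
jdbc_url() {
  # $1 = host, $2 = port, $3 = database
  printf 'jdbc:mysql://%s:%s/%s\n' "$1" "$2" "$3"
}

jdbc_url dbhost 3306 crm
```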
Some performance numbers, for reference only:
Table: tb_keywords
Rows: 11,628,209
Data file size: 1.4 GB
                    HDFS -> DB    DB -> HDFS
SQOOP               428s          166s
HDFS<->FILE<->DB    209s          105s
From these results, using a flat file as the intermediary outperforms Sqoop. The reasons:
1. Sqoop is ultimately JDBC-based, so it cannot be more efficient than MySQL's own import/export tooling.
2. Taking import into the DB as an example, Sqoop is designed around staged commits: for a table of, say, 1K rows, it reads 100 rows (the default), inserts and commits them, reads the next 100, and so on.
Even so, Sqoop has its advantages, such as ease of use and fault tolerance in job execution. In test environments it is worth considering as a tool when the need arises.
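The staged-commit behaviour described in point 2 can be mimicked in shell: process the input 100 rows at a time and "commit" after each chunk. The INSERT/COMMIT is only a placeholder comment here; just the chunking is real:

```shell
# Split 1000 fake rows into 100-row chunks; each chunk stands in for one
# read-insert-commit cycle, so we expect 10 "commits" in total.
seq 1 1000 > rows.txt
split -l 100 rows.txt chunk_
commits=0
for f in chunk_*; do
  # INSERT rows from "$f" ...; COMMIT   (placeholder for the DB round-trip)
  commits=$((commits + 1))
done
echo "$commits"
```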
Some session transcripts follow:
[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh export.sh
Fri Sep 23 20:15:47 CST 2011
11/09/23 20:15:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:15:48 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:48 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:15:48 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:15:49 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:15:49 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/eb16aae87a119b93acb3bc6ea74b5e97/tb_keyword_data_201104.jar
11/09/23 20:15:49 INFO mapreduce.ExportJobBase: Beginning export of tb_keyword_data_201104
11/09/23 20:15:49 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO input.FileInputFormat: Total input paths to process : 1
11/09/23 20:15:49 INFO mapred.JobClient: Running job: job_201109211521_0012
11/09/23 20:15:50 INFO mapred.JobClient: map 0% reduce 0%
11/09/23 20:16:04 INFO mapred.JobClient: map 1% reduce 0%
11/09/23 20:16:10 INFO mapred.JobClient: map 2% reduce 0%
[... map progress lines for 3%-97% elided ...]
11/09/23 20:22:41 INFO mapred.JobClient: map 98% reduce 0%
11/09/23 20:22:47 INFO mapred.JobClient: map 99% reduce 0%
11/09/23 20:22:53 INFO mapred.JobClient: map 100% reduce 0%
11/09/23 20:22:55 INFO mapred.JobClient: Job complete: job_201109211521_0012
11/09/23 20:22:55 INFO mapred.JobClient: Counters: 6
11/09/23 20:22:55 INFO mapred.JobClient: Job Counters
11/09/23 20:22:55 INFO mapred.JobClient: Launched map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient: Data-local map tasks=4
11/09/23 20:22:55 INFO mapred.JobClient: FileSystemCounters
11/09/23 20:22:55 INFO mapred.JobClient: HDFS_BYTES_READ=1392402240
11/09/23 20:22:55 INFO mapred.JobClient: Map-Reduce Framework
11/09/23 20:22:55 INFO mapred.JobClient: Map input records=11628209
11/09/23 20:22:55 INFO mapred.JobClient: Spilled Records=0
11/09/23 20:22:55 INFO mapred.JobClient: Map output records=11628209
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Transferred 1.2968 GB in 425.642 seconds (3.1198 MB/sec)
11/09/23 20:22:55 INFO mapreduce.ExportJobBase: Exported 11628209 records.
Fri Sep 23 20:22:55 CST 2011
###############
[wanghai01@tc-crm-rd01.tc.baidu.com bin]$ sh import.sh
Fri Sep 23 20:40:33 CST 2011
11/09/23 20:40:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/23 20:40:33 INFO tool.CodeGenTool: Beginning code generation
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:33 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/23 20:40:33 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/23 20:40:34 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/./tb_keyword_data_201104.java
11/09/23 20:40:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/a913cede5621df95376a26c1af737ee2/tb_keyword_data_201104.jar
11/09/23 20:40:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/23 20:40:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/23 20:40:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/23 20:40:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/23 20:40:34 INFO mapreduce.ImportJobBase: Beginning import of tb_keyword_data_201104
11/09/23 20:40:34 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `tb_keyword_data_201104` AS t LIMIT 1
11/09/23 20:40:40 INFO mapred.JobClient: Running job: job_201109211521_0014
11/09/23 20:40:41 INFO mapred.JobClient: map 0% reduce 0%
11/09/23 20:40:54 INFO mapred.JobClient: map 25% reduce 0%
11/09/23 20:40:57 INFO mapred.JobClient: map 50% reduce 0%
11/09/23 20:41:36 INFO mapred.JobClient: map 75% reduce 0%
11/09/23 20:42:00 INFO mapred.JobClient: map 100% reduce 0%
11/09/23 20:43:19 INFO mapred.JobClient: Job complete: job_201109211521_0014
11/09/23 20:43:19 INFO mapred.JobClient: Counters: 5
11/09/23 20:43:19 INFO mapred.JobClient: Job Counters
11/09/23 20:43:19 INFO mapred.JobClient: Launched map tasks=4
11/09/23 20:43:19 INFO mapred.JobClient: FileSystemCounters
11/09/23 20:43:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1601269219
11/09/23 20:43:19 INFO mapred.JobClient: Map-Reduce Framework
11/09/23 20:43:19 INFO mapred.JobClient: Map input records=11628209
11/09/23 20:43:19 INFO mapred.JobClient: Spilled Records=0
11/09/23 20:43:19 INFO mapred.JobClient: Map output records=11628209
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Transferred 1.4913 GB in 165.0126 seconds (9.2544 MB/sec)
11/09/23 20:43:19 INFO mapreduce.ImportJobBase: Retrieved 11628209 records.
Fri Sep 23 20:43:19 CST 2011
The key commands in import.sh and export.sh are:
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop import --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --split-by winfo_id --target-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
/home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/bin/sqoop export --connect jdbc:mysql://XXXX/crm --username XX --password XX --table tb_keyword_data_201104 --export-dir /user/wanghai01/data/ --fields-terminated-by '\t' --lines-terminated-by '\n' --input-null-string '' --input-null-non-string ''
This article originally appeared on the Linuxidc site (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2011-10/45080.htm
=============================================
Sqoop is a tool for moving data back and forth between Hadoop and relational databases: it can import data from an RDBMS (e.g. MySQL, Oracle, Postgres) into HDFS, and export data from HDFS back into an RDBMS.
Sqoop User Guide: http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_introduction
1: tar zxvf sqoop-1.1.0.tar.gz
2: Edit the configuration file /home/hadoopuser/sqoop-1.1.0/conf/sqoop-site.xml
Usually only the following properties need changing:
sqoop.metastore.client.enable.autoconnect
sqoop.metastore.client.autoconnect.url
sqoop.metastore.client.autoconnect.username
sqoop.metastore.client.autoconnect.password
sqoop.metastore.server.location
sqoop.metastore.server.port
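For instance, pointing the client at a shared metastore might look like the fragment below in sqoop-site.xml. The values (and the hsqldb URL in particular) are placeholder assumptions, not recommended settings:

```xml
<!-- Sketch: enable autoconnect to a hypothetical shared metastore. -->
<property>
  <name>sqoop.metastore.client.enable.autoconnect</name>
  <value>true</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metastore-host:16000/sqoop</value>
</property>
```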
3: Check the built-in help:
bin/sqoop help
bin/sqoop help import
4: A first import attempt fails with a shim error:
[hadoopuser@master sqoop-1.1.0]$ bin/sqoop import --connect jdbc:mysql://localhost/ppc --table data_ip --username kwps -P
Enter password:
11/02/18 10:51:58 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.2
java.lang.RuntimeException: Could not find appropriate Hadoop shim for 0.20.2
at com.cloudera.sqoop.shims.ShimLoader.loadShim(ShimLoader.java:190)
at com.cloudera.sqoop.shims.ShimLoader.getHadoopShim(ShimLoader.java:109)
at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:173)
at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)
Fix:
By default, ./hadoop-0.20.2/conf/hadoop-env.sh contains:
# Extra Java runtime options. Empty by default.
# export HADOOP_OPTS=-server
Change it to:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsqoop.shim.jar.dir=/home/hadoopuser/sqoop-1.1.0/shims"
Note in particular:
Sqoop currently cannot run on the Apache release of Hadoop 0.20.2.
The only supported platform is CDH 3 beta 2, so if you want to use Sqoop you will have to upgrade to CDH 3 beta 2. As the release notes put it:
"Sqoop does not run with Apache Hadoop 0.20.2. The only supported platform is CDH 3 beta 2. It requires features of MapReduce not available in the Apache 0.20.2 release of Hadoop. You should upgrade to CDH 3 beta 2 if you want to run Sqoop 1.0.0."
Cloudera has flagged this as a Major bug; hopefully it gets resolved soon.
This article originally appeared on the Linuxidc site (www.linuxidc.com). Original link: http://www.linuxidc.com/Linux/2011-08/41442.htm