
MySQL Applier for Hadoop

Replication via the Hadoop Applier is implemented by connecting to the MySQL master and reading binary log events as soon as they are committed, and writing them into a file in HDFS. "Events" describe database changes such as table creation operations or changes to table data.
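The flow above can be sketched as a small loop: read committed events, pick out the row inserts, and append each row as a line of text. This is only an illustration of the idea, with hard-coded dicts standing in for decoded binlog events and a local file standing in for HDFS (the real Applier reads the binary log over a connection to the master and writes via libhdfs):

```python
import os
import tempfile

# Hypothetical records standing in for decoded binlog events;
# the real Applier receives these from the MySQL master's binary log.
events = [
    {"type": "create_table", "table": "users"},
    {"type": "insert", "table": "users", "row": (1, "alice")},
    {"type": "insert", "table": "users", "row": (2, "bob")},
]

def apply_events(events, out_path):
    """Append each row-insert event to a text file (local stand-in for HDFS)."""
    with open(out_path, "a") as f:
        for ev in events:
            if ev["type"] == "insert":
                f.write(",".join(str(v) for v in ev["row"]) + "\n")

out = os.path.join(tempfile.mkdtemp(), "datafile1.txt")
apply_events(events, out)
print(open(out).read())
# -> 1,alice
#    2,bob
```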

(Figure: MySQL to HDFS Integration)

The Hadoop Applier uses an API provided by libhdfs, a C library to manipulate files in HDFS. The library comes precompiled with Hadoop distributions.

It connects to the MySQL master to read the binary log and then:

  • Fetches the row insert events occurring on the master
  • Decodes these events, extracts the data inserted into each field of the row, and uses content handlers to convert it into the required format
  • Appends it to a text file in HDFS
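The decode-and-format step can be sketched as a small content handler that renders decoded field values as one delimited text line. The NULL rendering and date formatting below are illustrative assumptions, not the Applier's documented defaults:

```python
import datetime

def format_row(fields, delimiter=","):
    """Hypothetical content handler: render a decoded row's field
    values as a single delimited line of text."""
    def render(v):
        if v is None:
            return "NULL"  # assumption: render SQL NULL as the literal string NULL
        if isinstance(v, (datetime.date, datetime.datetime)):
            return v.isoformat()
        return str(v)
    return delimiter.join(render(v) for v in fields)

print(format_row([42, "alice", datetime.date(2013, 7, 1), None]))
# -> 42,alice,2013-07-01,NULL
```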

Databases are mapped as separate directories, with their tables mapped as sub-directories, under a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive/HDFS. The data can be written in comma-separated format, or any other format, configurable by command-line arguments.
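That database-to-directory, table-to-subdirectory mapping can be sketched as a simple path builder. The warehouse root used here is the common Hive default and is an assumption, as is the helper's name:

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # assumption: default Hive warehouse directory

def hdfs_path(database, table, datafile="datafile1.txt"):
    """Map a MySQL database/table to the HDFS text file its rows land in:
    one directory per database, one sub-directory per table."""
    return posixpath.join(WAREHOUSE, database, table, datafile)

print(hdfs_path("shop", "orders"))
# -> /user/hive/warehouse/shop/orders/datafile1.txt
```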

(Figure: Mapping between MySQL and HDFS Schema)

Download the Hadoop Applier from http://labs.mysql.com/


References:

http://dev.mysql.com/tech-resources/articles/mysql-hadoop-applier.html

http://www.tuicool.com/articles/NfArA3i

 

A similar project is https://github.com/noplay/python-mysql-replication, a pure-Python reader for the MySQL binary log.
