
Some important optimization advice for HBase 0.94.x

 

The following gives you a list to run through when you encounter problems with your cluster setup.

1. Basic setup checklist

This section provides a checklist of things you should confirm for your cluster, before going into a deeper analysis in case of problems or performance issues.

File handles.

HBase is a database, so it uses a lot of files at the same time. The default ulimit -n of 1024 on most Unix or other Unix-like systems is insufficient. Any significant amount of loading will lead to I/O errors stating the obvious: java.io.IOException: Too many open files. You may also notice errors such as the following:

2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException

2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901 

The ulimit -n for the DataNode processes and the HBase processes should be set high. To verify the current ulimit setting you can also run the following:

$ cat /proc/<PID of JVM>/limits

You should see that the limit on the number of open files is set reasonably high; it is safest to just bump this up to 32000, or even more. “File handles and process limits” on page 49 has the full details on how to configure this value.

TODO: I found that the value reported here can differ from the one set in the OS; the value shown in /proc was always 4096 (i.e., four times the OS's 1024).
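A minimal sketch of how to raise the limit, assuming the daemons run as a user named hadoop (adjust the user name and file locations to your distribution), followed by a verification against the live region server JVM:

$ cat >> /etc/security/limits.conf <<'EOF'
# raise open-file limits for the user running the HDFS/HBase daemons
hadoop  soft  nofile  32768
hadoop  hard  nofile  32768
EOF

# after logging in again and restarting the daemons, verify:
$ grep 'open files' /proc/$(pgrep -f HRegionServer | head -1)/limits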

 

 

DataNode connections (dfs.datanode.max.xcievers)

The DataNodes should be configured with a large number of transceivers: at least 4,096, but potentially more. There’s no particular harm in setting it up to as high as 16,000 or so. See “Datanode handlers” on page 51 for more information.

Not having this configuration in place makes for strange-looking failures. Eventually you will see a complaint in the DataNode logs about the xcievers limit being exceeded, but in the run-up to this, one manifestation is a complaint about missing blocks. For example:

10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry... 
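A minimal sketch of the corresponding entry in hdfs-site.xml on every DataNode (note that the property name really is spelled “xcievers” in this Hadoop generation; restart the DataNodes after changing it):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>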

Compression. Compression should almost always be on, unless you are storing precompressed data. “Compression” on page 424 discusses the details. Make sure that you have verified the installation so that all region servers can load the required compression libraries. If not, you will see errors like this:

hbase(main):007:0> create 'testtable', { NAME => 'colfam1', COMPRESSION => 'LZO' }

ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: \
No server address listed in .META. for region \
testtable2,,1309713043529.8ec02f811f75d2178ad098dc40b4efcf.

In the logfiles of the servers, you will see the root cause for this problem (abbreviated and line-wrapped to fit the available width):

2011-07-03 19:10:43,725 INFO org.apache.hadoop.hbase.regionserver.HRegion: \
  Setting up tabledescriptor config now ...

2011-07-03 19:10:43,725 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: \
  Instantiated testtable,,1309713043529.8ec02f811f75d2178ad098dc40b4efcf.

2011-07-03 19:10:43,839 ERROR org.apache.hadoop.hbase.regionserver.handler. \
  OpenRegionHandler: Failed open of region=testtable,,1309713043529. \
  8ec02f811f75d2178ad098dc40b4efcf.
java.io.IOException: java.lang.RuntimeException: \
  java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
  at org.apache.hadoop.hbase.util.CompressionTest.testCompression
  at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs ...

The missing compression library triggers an error when the region server tries to open the region with the column family configured to use LZO compression.
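You can verify codec availability on each region server ahead of time with the CompressionTest tool that ships with HBase (the HDFS path below is just a scratch location, and the namenode address is a placeholder):

$ hbase org.apache.hadoop.hbase.util.CompressionTest \
    hdfs://namenode:8020/tmp/compression-test lzo

If the codec is installed correctly the tool reports SUCCESS; otherwise it fails with the same ClassNotFoundException shown above.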

Garbage collection/memory tuning. We discussed the common Java garbage collector settings in “Garbage Collection Tuning” on page 419. If enough memory is available, you should increase the region server heap up to at least 4 GB, preferably more like 8 GB. The recommended garbage collection settings ought to work for any heap size.
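As a sketch, the heap size and GC settings go into hbase-env.sh; treat the exact values below as starting points, not hard rules:

# hbase-env.sh
export HBASE_HEAPSIZE=8192   # region server heap in MB (8 GB)
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+CMSParallelRemarkEnabled"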

Also, if you are colocating the region server and MapReduce task tracker, be mindful of resource contention on the shared system. Edit the mapred-site.xml file to reduce the number of slots for nodes running with ZooKeeper, so you can allocate a good share of memory to the region server. Do the math on memory allocation, accounting for memory allocated to the task tracker and region server, as well as memory allocated for each child task (from mapred-site.xml and hadoop-env.sh) to make sure you are leaving enough memory for the region server but you’re not oversubscribing the system. Refer to the discussion in “Requirements” on page 34. You might want to consider separating MapReduce and HBase functionality if you are otherwise strapped for resources.
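A rough worked example under assumed numbers (a hypothetical 24 GB worker node; the slot counts and heap sizes are illustrative only):

<!-- mapred-site.xml on nodes that also run a region server -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value>
</property>

With these values the budget is roughly 8 GB region server + 1 GB DataNode + 1 GB TaskTracker + (4 + 2) x 1 GB child tasks = 16 GB, leaving about 8 GB for the OS and its buffer cache.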

Lastly, HBase is also CPU-intensive. So even if you have enough memory, check your CPU utilization to determine if slots need to be reduced, using a simple Unix command such as top, or the monitoring described in Chapter 10.

 

 

2. Stability issues

In rare cases, a region server may shut itself down, or its process may be terminated unexpectedly. You can check the following:

• Double-check that the JVM version is not 1.6.0u18 (which is known to have detrimental effects on running HBase processes).

• Check the last lines of the region server logs; they probably have a message containing the word "aborting" (or "abort"), hopefully with a reason.

The latter is often an issue when the server is losing its ZooKeeper session. If that is the case, you can look into the following:

2.1 ZooKeeper problems. It is vital to ensure that ZooKeeper can perform its tasks as the coordination service for HBase. It is also important for the HBase processes to be able to communicate with ZooKeeper on a regular basis. Here is a checklist you can use to ensure that you do not run into commonly known problems with ZooKeeper:

 

 

Check that the region server and ZooKeeper machines do not swap
If machines start swapping, certain resources start to time out and the region servers will lose their ZooKeeper session, causing them to abort themselves. You can use Ganglia, for example, to graph the machines’ swap usage, or execute

$ vmstat 20

 


(Note: roughly speaking, the kernel's swappiness value discussed below acts as a threshold: with swappiness set to 20, swap starts being used once about (1 - 20%) = 80% of memory is in use.)

 

on the server(s) while running load against the cluster (e.g., a MapReduce job): make sure the "si" and "so" columns stay at 0. These columns show the amount of data swapped in or out. Also execute

$ free -m

to make sure that no swap space is used (the swap column should state 0). Also consider tuning the kernel’s swappiness value (/proc/sys/vm/swappiness) down to 5 or 10. This should help if the total memory allocation adds up to less than the box’s available memory, yet swap is happening anyway.
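For example, a sketch of lowering swappiness both at runtime and persistently:

$ sysctl -w vm.swappiness=5
# or, equivalently:
$ echo 5 > /proc/sys/vm/swappiness

# persist across reboots:
$ echo "vm.swappiness=5" >> /etc/sysctl.conf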

Check network issues
If the network is flaky, region servers will lose their connections to ZooKeeper and abort.
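A quick sanity check of connectivity from a region server host to the ensemble is ZooKeeper's "ruok" four-letter command (zkhost is a placeholder); a healthy peer answers imok:

$ echo ruok | nc zkhost 2181
imok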

Check ZooKeeper machine deployment
ZooKeeper should never be co-deployed with task trackers or data nodes. It is permissible to deploy ZooKeeper with the name node, secondary name node, and job tracker on small clusters (e.g., fewer than 40 nodes).

It is better to deploy just one ZooKeeper peer shared with the name node/job tracker than to deploy three that are colocated with other processes: the other processes will stress the machine and ZooKeeper will start timing out.

Check pauses related to garbage collection
Check the region server’s logfiles for a message containing "slept"; for example, you might see something like "We slept 65000ms instead of 10000ms". If you see this, it is probably due to either garbage collection pauses or heavy swapping. If they are garbage collection pauses, refer to the tuning options mentioned in “Basic setup checklist” on page 471.
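A quick way to scan for these pauses (the log location is an assumption; adjust the path to your installation):

$ grep "slept" /var/log/hbase/hbase-*-regionserver-*.log | tail -5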

Monitor slow disks
HBase does not degrade well when reading or writing a block on a data node with a slow disk. This problem can affect the entire cluster if the block holds data from the META region, causing compactions to slow and back up. Again, use monitoring to carefully keep these vital metrics under control.

Slow disks are hard to find unless you use a disk-checking tool such as hdparm.
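For example, a simple sequential read benchmark per device (run as root; /dev/sda is a placeholder for each data disk):

$ hdparm -t /dev/sda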

References:

hbase-guide

swappiness

HBase performance tuning (HBase性能调优)
