`
huobengluantiao8
  • 浏览: 1061370 次
文章分类
社区版块
存档分类
最新评论

TROUBLESHOOTING GUIDE (TSG) - ORA-1555

 
阅读更多
TROUBLESHOOTING GUIDE (TSG) - ORA-1555 [ID 467872.1]

修改时间 05-OCT-2011 类型 HOWTO 状态 PUBLISHED

In this Document
Goal
Solution
References


Applies to:

Oracle Server - Enterprise Edition - Version: 9.0.1.0 to 11.2.0.1.0 - Release: 9.0.1 to 11.2
Oracle Server - Enterprise Edition - Version: 9.0.1.0 to 11.2.0.1.0 [Release: 9.0.1 to 11.2]
Information in this document applies to any platform.

Goal

A clear, step by step, method to diagnose an ORA-1555 issue using either

UNDO_MANAGEMENT = AUTO ... Automatic Undo Management (AUM)

OR

UNDO_MANAGEMENT = MANUAL ... Manual Undo (Rollback)

Solution

There are many good notes on ORA-1555 causes and solutions. In order to be able to resolve a problem, we must first understand the problem.

We will examine an extended example of the most common cause of the ORA-1555.

EXAMPLE:

Time 1: Session #1 starts a query on table A
Time 2: Session #2 updates row X in table A
Time 3: The query from session #1 arrives at row X and discovers the last updated time
(based on the SCN number) of the row to be later than Time 1 ... so the UNDO or
LOB segments are used to get the read consistent view (before image) of the row
and then the query proceeds
Time 4: Session #2 updates row Y in table A and then COMMITs (thus making it possible
for this transaction slot to be overwritten)
Time 5: Session #2 updates row Z in table B and COMMITs ... either due to space pressure
(using AUM) or bad luck (using rollback) in the read consistent view for the update
of row Y in table A at time 4 ... is overwritten (we wont examine why the overwrite
occurs at this point)
Time 6: The query from session #1 arrives at row Y and and discovers the last updated time
(based on the SCN number) of the row to be later than Time 1 ... so the UNDO or
LOB segments are examined to find the read consistent view ... BUT ... the
transaction slot containing the view ... was overwritten at time 5 ... so no read
consistent view is available ... so an ORA-1555 occurs

NOTE ... UNDO is a generic term that can refer to either UNDO (AUM) or Rollback segments

==============================================================================================

How can AUM allow a read consistent view of a transaction slot for a committed row to be overwritten?

1) The transaction slot for the row has expired.

This means that the current time minus the commit time of the row is greater than the
UNDO_RETENTION.
Once a transaction slot of a committed row is 'expired', it is available for
reuse. Often we get customers who ask:

Why do my queries sometimes run so much longer than my UNDO_RETENTION before
encountering an ORA-1555 and other times it occurs soon after?


The answer is all left up to chance and how much pressure the undo tablespace is under.

2) The transaction slot for the row has NOT expired and yet was still overwritten.

This condition occurs under two conditions:

a) The undo tablespace has become full and must 'steal' transaction slots from committed, unexpired
rows

b) A bug

3) The LOB segment read consistent copy of the LOB is no longer available

Investigate how the LOB column is configured, in-row or out-of-row. In-row LOBs should be utilizing
normal UNDO algorithms in the UNDO tablespace. Out-of-row LOBs use the information below.

Read consistency with LOBs is controlled by 2 methods

a) Use of PCTVERSION (old method)

Specify the maximum percentage of overall LOB storage space used for maintaining old versions of
the LOB. The default value is 10, meaning that older versions of the LOB data are not overwritten
until they consume 10% of the overall LOB storage space.

The downfall to this method is customers who do frequent updates or deletes of tables with LOBs
often exceed even 100% PCTVERSION ...

b) Use of RETENTION (current method for use with AUM)

Use this clause to indicate that Oracle Database should retain old versions of this LOB column.
Oracle Database uses the value of the UNDO_RETENTION initialization parameter to determine
the amount of committed undo data to retain in the database.

This method uses the same expiration as undo segments

If an ORA-1555 occurs with a LOB segment. then this means that either:

PCTVERSION was exceeded and the read consistent view of the LOB was overwritten

OR

the LOB has exceeded RETENTION and the row LOB was overwritten sometime during
the query that generated the ORA-1555

New information

LOB problems with ORA-1555 can be very difficult to fix. Updates to out-of-row LOBs
should be minimized. Deletes and inserts have much less impact to UNDO operations than
updates. PCTVERSION is the preferred approach with 10.2.x and AUM. However,
100% will mean that lots of disk space is required to keep up with UNDO "copies" of LOB
information in environments with frequent LOB updates.

In some cases it has helped to move the LOB column to a tablespace where the Segment
Space Management is manual.

NOTE: transaction slots for UNCOMMITTED rows cannot be overwritten

Note 422826.1 How To Identify LOB Segment Use PCTVERSION Or RETENTION
Note 563470.1 Lob retention not changing when undo_retention is changed
Note 800386.1 ORA-1555 - UNDO_RETENTION is silently ignored if the LOB
resides in a MSSM tablespace
=======================================================================================

Now that we understand why the ORA-1555 occurs and some aspects about how they can occur we need to examine the following:

How can we logically determine and resolve what has occurred to cause the ORA-1555?

1) Determine if UNDO_MANAGEMENT is MANUAL or AUTO

If set to MANUAL, it is best to move to AUM. If it is not feasible to switch to AUM see

Note 69464.1 Rollback Segment Configuration & Tips

to attempt to tune around the ORA-1555 using V$ROLLSTAT

If set to AUTO, proceed to #2

2) Gather the basic data

a) Acquire both the error message from the user / client ... and the message in the alert log

User / Client session example:

ORA-01555: snapshot too old: rollback segment number 9 with name "_SYSSMU1$" too small

Alert log example

ORA-01555 caused by SQL statement below (Query Duration=9999 sec, SCN:0x000.008a7c2d)

b) Determine the QUERY DURATION from the message in the alert log

From our example above ... this would be 9999

c) Determine the undo segment name from the user / client message

From our example above ... this would be _SYSSMU1$

d) Determine the UNDO_RETENTION of the undo tablespace

show parameter undo_retention

3) Determine if the ORA-1555 is occurring with an UNDO or a LOB segment

If the undo segment name is null ...

ORA-01555: snapshot too old: rollback segment number with name "" too small

or the undo segment is unknown

ORA-01555: snapshot too old: rollback segment number # with name "???" too small

then this means this is a read consistent failure on a LOB segment

If the segment_name or the undo segment is known the error is occurring with an UNDO segment.

==============================================================================================

What to do if an ORA-1555 is occurring with an UNDO segment
-------------------------------------------------------------------------------------------------


1) QUERY DURATION > UNDO_RETENTION

There are no guarantees that read consistency can be maintained after the transaction slot for the
committed row has expired (exceeded UNDO_RETENTION)

Why would one think that the transaction slot's time has exceeded UNDO_RETENTION?

Lets answer this with an example

If UNDO_RETENTION = 900 seconds ... but our QUERY DURATION is 2000 seconds ...

This says that our query has most likely encountered a row that was committed more than 900
seconds ago ... and has been overwritten as we KNOW that the transaction slot being examined
no longer matches the row we are looking for

The reason we say "most likely" is that it is possible that an unexpired committed transaction slot was
overwritten due to either space pressure on the undo segment or this is a bug

SOLUTION:

The best solution is to tune the query can to reduce its duration. If that cannot be done then increase
UNDO_RETENTION
based on QUERY DURATION to allow it to protect the committed
transaction slots for a longer period of time

NOTE: Increasing UNDO_RETENTION requires additional space in the UNDO tablespace. Make
sure to accommodate for this space. One method of doing this is to set AUTOEXTEND on one or
more of the UNDO tablespace datafiles for a period of time to allow for the increased space. Once the
size has stabilized, AUTOEXTEND can be removed.

See the solution for #2 below for more options

2) QUERY DURATION <= UNDO_RETENTION

This case is most often due to the UNDO tablespace becoming full sometime during the time when the
query was running

How do we tell if the UNDO tablespace has become full during the query?

Examine V$UNDOSTAT.UNXPSTEALCNT for the period while the query that generated the
ORA-1555 occurred.

This column shows committed transaction slots that have not exceeded UNDO_RETENTION but
were overwritten due to space pressure on the undo tablespace (IE became full).

If UNEXPSTEACNT > 0 for the time period during which the query was running then this shows
that the undo tablespace was too small to be able to maintain UNDO_RETENTION. Unexpired
blocks were over written, thus ending read consistency for those blocks for that time period.

set pagesize 25
set linesize 120

select inst_id,
to_char(begin_time,'MM/DD/YYYY HH24:MI') begin_time,
UNXPSTEALCNT "# Unexpired|Stolen",
EXPSTEALCNT "# Expired|Reused",
SSOLDERRCNT "ORA-1555|Error",
NOSPACEERRCNT "Out-Of-space|Error",
MAXQUERYLEN "Max Query|Length"
from gv$undostat
where begin_time between
to_date('<start time of the ORA-1555 query>','MM/DD/YYYY HH24:MI:SS')
and
to_date('<time when ORA-1555 occured>','MM/DD/YYYY HH24:MI:SS')
order by inst_id, begin_time;

NOTE: Start time of the query can be determined by subtracting the query duration from the timestamp
of the ORA-1555

EXAMPLE:

Tue May 26 16:16:57 2009
ORA-01555 caused by SQL statement below (SQL ID: 54yn3n36w24ft, Query Duration=922 sec, SCN: 0x0007.8a55f4e3)

922 seconds is 15 min 22 seconds

So the start time of the query would be May 26 16:01:35


If after researching, you find no existing bugs seem to match your case, then additional info will be
needed to file a bug.

Starting with release 9.2.0.4 an event, 10442, can be set to dump all the necessary diagnostics
information to one trace file. NOTE: Event 10442 does not trigger diagnostics in cases where the
ORA-1555 error is associated with out-of-row LOB undo.

The event can be enabled using the following settting in init.ora

event="10442 trace name context forever, level 10"

or

Alter system set events '10442 trace name context forever , level 10';

Reproduce the ORA-1555 error with a new connection to trigger the trace diagnostics file.

NOTE: There have been issues with 10g and 11g on RAC using 10442. The event can contribute to factors leading to a hang on one or more nodes on the RAC. Refer to Bug 10051817 for more fixes for these problems on RAC.


Another thing to consider is whether Oracle is behaving as designed as far as "steal logic" is concerned

Here is the logic for allocating space for a new transaction--based on Bug:4070480

* Allocate new extent in the undo tablespace
* If failed, steal an expired extent from another undo segment. This involves shrinking from the
other segment and adding to the current undo segment.
* If failed, autoextend a datafile (if enabled)
* If failed, reuse an unexpired extent from the current undo segment.
* If failed, steal an unexpired extent from another undo segment.
* If failed, report an "Out-Of-Space" error.

What to do if an ORA-1555 is occurring with a LOB segment
---------------------------------------------------------------------------------------------

1) Often the ORA-1555 with the null undo segment is accompanied by an ORA-22924

Error: ORA-22924
Text: snapshot too old
---------------------------------------------------------------------------
Cause: The version of the LOB value needed for the consistent read was already overwritten by another
writer.

2) Determine if PCTVERSION or RETENTION is used

Examine DBA_LOBS.PCTVERSION and DBA_LOBS.RETENTION
(or investigate through the data dictionary--see below)

NOTE: Only one of these may be set, so this will determine which method of undo management is
used for this LOB

There is a bug with these two columns prior to 10.2 and as such the data dictionary must be
examined to determine which of these is being used

Note 422826.1 How To Identify LOB Segment Use PCTVERSION Or RETENTION From
Data Dictionary

If PCTVERSION is used
------------------------------------

This method of LOB undo management is akin to use of rollback segments and as such there is little
control to tune undo usage.

The only parameter we have control over with this method is PCTVERSION itself.

SOLUTION:

Increase PCTVERSION

ALTER TABLE **table name** MODIFY LOB (**lob column name**) (PCTVERSION 100)

If the parameter is at 100% then the only solution will be to move to use of RETENTION

ALTER TABLE **table name** MODIFY LOB (**lob column name**) (RETENTION)

If RETENTION is used
---------------------------------

You may have RETENTION in place, but it may not be working as you expect.

See Note 800386.1 ORA-1555 - UNDO_RETENTION is silently ignored if the LOB
resides in a MSSM tablespace
Note 563470.1 Lob retention not changing when undo_retention is changed

If RETENTION is used the same rules apply for the LOB segments as do the UNDO segments

1) QUERY DURATION > UNDO_RETENTION

There are no guarantees that read consistency can be maintained after the old lob segment for
the committed row has expired (exceeded UNDO_RETENTION)

SOLUTION:

The best solution is to tune the query to reduce its duration. If that cannot be done then
increase UNDO_RETENTION based on QUERY DURATION to allow it to protect the
committed transaction slots for a longer period of time. A change to UNDO_RETENTION
does not automatically change RETENTION. See Note 563470.1 Lob retention not
changing when undo_retention is changed.

NOTE 1:
Increasing UNDO_RETENTION requires additional space in the LOB segments
tablespace. Make sure to accommodate for this. One method of doing this is to set
AUTOEXTEND on one of more of the lob segments tablespace datafiles. There may or
may not be a stabilization of size like the UNDO tablespace. If it does stabilize then the
AUTOEXTEND can be removed. See the solution for #2 below for more options.

NOTE 2:
RETENTION for LOBs does not function the same as it does for undo. LOB segments
will not automatically extend to allow for retention of read consistent copies of a lob. As
such, at times like these, PCTVERSION 100 is a better solution.

2) QUERY DURATION <= UNDO_RETENTION

This case is most often due to the LOB segments tablespace becoming full sometime during the time
when the query was running.

How do we tell if the LOB segment tablespace has become full during the query?

Unfortunately there is no easy way to do this. V$UNDOSTAT does not contain info about the LOB
segments. There may be a way to use Enterprise Manager and examine the high watermark
(HWM) of the tablespace in question.

There is an enhancement request for this info to be included in the future

Bug:3200789 Abstract: VISIBILITY OF LOB SEGMENT USAGE FOR UNDO

SOLUTION:

Provide more space in the LOB segments tablespace so that transaction slots do not have to be
stolen. This can be done by:

* Using AUTOEXTEND on an existing LOB segments tablespace datafile
* Increase the size of an existing LOB segments tablespace datafile
* Add a datafile to the LOB segments tablespace

3) Occasionally we encounter situations that do not fit into either #1 or #2 above. We have to consider the possibility of a bug in those cases.

EXAMPLES:

Note 253131.1 Concurrent Writes May Corrupt LOB Segment When Using Auto Segment
Space Management (ORA-1555)

Bug:5636728 FALSE ORA-1555 WHEN READING LOBS AFTER SHRINK

NOTE: If RAC is in use, changes must be made to AT LEAST the instance that raised the ORA-1555

The following query will determine the instance on which the ORA-1555 occurred

select stat.inst_id, seg.tablespace_name
from dba_rollback_segs seg, gv$rollstat stat
where seg.segment_id = stat.usn
and seg.segment_name='<UNDO SEGMENT NAME REPORTED IN THE ALERT LOG MESSAGE>';


@ Former title: ORA-1555 - YET STILL ANOTHER APPROACH


References

BUG:5636728 - FALSE ORA-1555 WHEN READING LOBS AFTER SHRINK
BUG:6919859 - ORA-22924, ORA-1555: SNAPSHOT TOO OLD WITH RETENTION BASED LOB
NOTE:104008.1 - ORA-01555: "SNAPSHOT TOO OLD" - Additional Solutions
NOTE:10630.1 - ORA-01555: "Snapshot too old" - Overview
NOTE:253131.1 - Concurrent Writes May Corrupt LOB Segment When Using Auto Segment Space Management (ORA-1555)
NOTE:422826.1 - How To Identify LOB Segment Use PCTVERSION Or RETENTION From Data Dictionary
NOTE:452341.1 - ORA-01555 And Other Errors while Exporting Table With LOBs, How To Detect Lob Corruption.
NOTE:563470.1 - Lob retention not changing when undo_retention is changed
NOTE:800386.1 - ORA-1555 - UNDO_RETENTION is silently ignored if the LOB resides in a MSSM tablespace
NOTE:877613.1 - AUM Common Analysis/Diagnostic Scripts

分享到:
评论

相关推荐

    Python库 | troubleshooting-framework-1.3.0.3.tar.gz

    《Python库:troubleshooting-framework-1.3.0.3.tar.gz详解》 在信息技术领域,Python作为一种强大且易学的开发语言,被广泛应用于各种项目中。它丰富的库生态系统是其的一大亮点,其中“troubleshooting-...

    NetBackup Appliance Troubleshooting Guide - 3.1 and 3.1.1.pdf

    NetBackup Appliance Troubleshooting Guide - 3.1 and 3.1.1

    NetBackup-52xx-5030-Troubleshooting-Guide-2604

    《NetBackup 52xx/5030 故障排除指南》是Symantec公司为用户提供的一份详细的故障诊断和解决手册,适用于NetBackup 52xx和5030型号的设备。该指南主要关注如何处理在使用NetBackup Appliance过程中遇到的各种问题,...

    Openstack-troubleshooting-field-survival-guide

    - [官方网站](https://docs.openstack.org/operations-guide/ops-maintenance.html) 2. **了解系统结构**: - **组件定位**:清楚哪些组件参与了问题的产生。 - **工作原理**:理解这些组件是如何协同工作的。 ...

    Troubleshooting SQL Server - A Guide for the Accidental DBA

    本书名为《Troubleshooting SQL Server - A Guide for the Accidental DBA》,是一本专注于SQL Server性能调优和故障排除的指南,非常适合那些偶然担任数据库管理员(DBA)角色的读者。书中包含了丰富的实践知识和...

    NetBackup Appliance Troubleshooting Guide - 3.1 and 3.1.x.pdf

    《NetBackup Appliance Troubleshooting Guide - 3.1 and 3.1.x》是针对Veritas NetBackup设备的一份详尽的故障排查指南,旨在帮助用户解决在使用Veritas NetBackup 3.1和3.1.x版本时可能遇到的问题。这份文档由...

    NetBackup 52xx 5330 Appliance Troubleshooting Guide-271

    总之,《NetBackup 52xx 5330 Appliance Troubleshooting Guide-271》是一份全面的指南,为用户提供了一整套处理NetBackup Appliance故障的工具和资源,确保用户能够有效地维护和优化他们的备份和恢复系统。...

    JDK9-JSE-Troubleshooting Guide-244.pdf

    总之,《JDK9-JSE-Troubleshooting Guide-244》是一本针对Java 9开发者的重要参考资料,它提供了解决各种技术问题的方法,帮助开发者提高代码质量和运行效率。通过深入学习和应用这份指南,开发者可以更好地掌握Java...

    java troubleshooting guide for HP-UX.pdf

    ### Java Troubleshooting Guide for HP-UX #### 关于文档 本文档主要为HP-UX系统上的Java应用程序提供了一系列故障排除指南与工具介绍。HP-UX作为一种基于UNIX的操作系统,在运行Java应用程序时可能会遇到各种问题...

    vmware vsphere 6.5安装配置资料(中文版).zip

    vsphere-esxi-vcenter-server-65-troubleshooting-guide vsphere-esxi-vcenter-server-65-upgrade-guide vsphere-esxi-vcenter-server-65-virtual-machine-admin-guide vsphere-html-host-client-18-guide

    NetBackup Appliance Troubleshooting Guide - 3.1.x,

    Veritas NetBackup Appliance Troubleshooting Guide 是一份详细的指南,主要针对Veritas NetBackup设备的3.1.x、3.2和3.3.0.1版本。该文档旨在帮助用户解决在使用NetBackup Appliance过程中遇到的各种问题,提供...

    HACMP6.1 Troubleshooting guide-120

    《HACMP6.1故障排查指南-120》是针对AIX操作系统上的高可用性集群多处理(High Availability Cluster Multi-Processing for AIX)的专门技术文档。本指南旨在帮助用户识别并解决在使用HACMP 6.1版本时遇到的问题,确保...

    TroubleShooting (BIOS-Xwindow)

    ### 故障排查(BIOS-Xwindow)详细解析 #### 实验背景与目的 本实验旨在通过对BIOS至Xwindow启动过程中的故障排查,帮助学习者掌握如何快速且有效地解决开机过程中遇到的各种问题,确保系统能够正常运行。...

    JDK18-troubleshooting-guide.pdf

    Java Troubleshooting Guide Java 是一种广泛使用的编程语言,由 Oracle 公司开发和维护。Java Platform, Standard Edition 是 Java 的一个版本,提供了完整的 Java 环境,包括虚拟机、类库、开发工具等。为了帮助...

    NetBackup Appliance Troubleshooting Guide - 3.1.pdf

    《NetBackup Appliance 故障排查指南 - 3.1》是Veritas Technologies LLC发布的一份详细的技术文档,专门针对Veritas NetBackup设备的故障诊断和解决。此指南的版本为3.1,旨在帮助用户识别和处理在使用NetBackup...

    GTFM-troubleshooting-deep-neural-networks-OpenAI专家指导你改模型-141.pdf

    Josh Tobin等人撰写的一篇名为《Troubleshooting Deep Neural Networks》的文章(141页PDF),为解决这一难题提供了宝贵的见解。本文将基于给定文件的信息,深入探讨该文档中涉及的关键知识点。 #### 二、为何讨论...

    Troubleshooting with Wireshark - 分析网站服务器慢的原因

    在"Troubleshooting with Wireshark - Analyzing Slow Web Servers"这个主题中,我们将深入探讨如何使用Wireshark来诊断和解决网站服务器响应缓慢的问题。 首先,了解网站服务器慢的原因至关重要。这可能是由于多种...

    故障处理-troubleshooting-and-optimization.zip

    【项目简介】 Hadoop性能调优 主要涉及hdfs、yarn、mapreduce、hive等的调优,同时也包括一些linux系统产生调优 Hadoop常见故障处理 记录工作中所遇到的以及搜索时所看到的常见故障解决办法 Spark性能调优 ...

Global site tag (gtag.js) - Google Analytics