The Message ID SUN4V-8001-7R indicates a problem in the interconnect
between the identified memory DIMM(s) and the memory controller as a
result of exceeding the threshold for correctable memory link errors.
No data has been lost. However, a memory data lane has been taken
out of service, which may impact system performance. Until the
problem is resolved, the system is at increased risk of incurring an
uncorrectable error, which will cause a service interruption.
See also the NOTE below for a description of an existing issue with
this diagnosis (CR 6618751).
Use the command
    fmdump -v -u <EVENT_ID>
with the <EVENT_ID> from the PSH console message to locate the
suspected faulty components, as shown in the example below.
# fmdump -v -u d505971c-f3dd-e27e-9e05-faed155c39bd
TIME                 UUID                                 SUNW-MSG-ID
Oct 16 15:10:53.2312 d505971c-f3dd-e27e-9e05-faed155c39bd SUN4V-8001-7R
  70%  fault.memory.link-c
        Problem in: mem:///unum=MB/CMP0/BR0/CH0/D0
           Affects: mem:///unum=MB/CMP0/BR0/CH0/D0
               FRU: hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0704BB5053:server-id=wgs48-100:serial=22ab471//motherboard=0/chip=0/branch=0/dram-channel=0/dimm=0
          Location: MB/CMP0/BR0/CH0/D0
  30%  fault.memory.link-c
        Problem in: hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0704BB5053:server-id=wgs48-100:serial=101083:part=541215101/motherboard=0
           Affects: hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0704BB5053:server-id=wgs48-100:serial=101083:part=541215101/motherboard=0
               FRU: hc://:product-id=SUNW,SPARC-Enterprise-T5220:chassis-id=0704BB5053:server-id=wgs48-100:serial=101083:part=541215101/motherboard=0
          Location: MB
In the above example, two suspected components are listed:
a memory DIMM and the system motherboard. Depending on the system
configuration, several memory DIMMs may be included in the suspect
list. Note also that the diagnosis assigns a higher probability to
the memory DIMM (70%) than to the motherboard (30%).
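As a convenience, the suspect list can be condensed to one line per
suspect. The following is a minimal sketch, not part of the fmdump
tooling: the function name list_suspects is hypothetical, and it assumes
the output layout shown in the example above (a certainty line such as
"70% fault.memory.link-c" followed later by a "Location:" line) and a
POSIX awk (on Solaris this may mean nawk or /usr/xpg4/bin/awk).

```shell
# list_suspects: hypothetical helper that reads `fmdump -v -u <EVENT_ID>`
# output on stdin and prints "<certainty> <fault class> <location>"
# for each suspect. Assumes the layout shown in the example above.
list_suspects() {
    awk '
        # A suspect entry starts with a certainty such as "70%".
        $1 ~ /^[0-9]+%$/ { pct = $1; cls = $2 }
        # The Location line closes the entry for that suspect.
        $1 == "Location:" { print pct, cls, $2 }
    '
}

# Usage (on the affected system):
#   fmdump -v -u <EVENT_ID> | list_suspects
```

For the example output above, this would print one line for the DIMM
at MB/CMP0/BR0/CH0/D0 (70%) and one line for the motherboard at MB (30%).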
A possible cause of the connection problem is improper seating of a
memory DIMM. Use the following steps to verify correct DIMM seating.
1.) Locate the suspected DIMM(s)
The location of the suspected DIMM(s) is specified in the "Location"
field. In the example above, the DIMM is located at:
Location: MB/CMP0/BR0/CH0/D0
Refer to the Service Manual or the Service Label attached to the
server chassis to find the physical location of the DIMM.
Some platforms also provide fault indicators for individual DIMMs
to validate the location.
2.) Reseat the DIMM(s)
Follow the instructions in the Service Manual to ensure the DIMM is
seated correctly. The Service Manual for your platform is available
through the Sun Documentation Center.
3.) Clear the fault
Clear the original fault to ensure that any new problem will be noted.
Use the command
    fmadm repair <EVENT_ID>
with the <EVENT_ID> from the PSH console message, as shown in the
example below.
# fmadm repair d505971c-f3dd-e27e-9e05-faed155c39bd
fmadm: recorded repair to d505971c-f3dd-e27e-9e05-faed155c39bd
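To confirm that the fault was actually cleared, the repair can be
followed by a check of the current fault list. The following is a
minimal sketch under stated assumptions: the function name
repair_and_check is hypothetical, and it assumes that the output of
fmadm faulty includes the event UUID for any fault that is still open.

```shell
# repair_and_check: hypothetical helper that clears a fault and then
# verifies that the event no longer appears in the fault list.
# Assumption: `fmadm faulty` prints the event UUID of each open fault.
repair_and_check() {
    event_id=$1

    # Record the repair; stop if fmadm reports an error.
    fmadm repair "$event_id" || return 1

    # The event ID should no longer appear among the open faults.
    if fmadm faulty | grep "$event_id" >/dev/null; then
        echo "event $event_id is still listed as faulty" >&2
        return 1
    fi
    echo "event $event_id cleared"
}

# Usage (as root on the affected system):
#   repair_and_check <EVENT_ID>
```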
If the problem occurs again after reseating the DIMM(s), then the
recommended action is to replace the DIMM(s) called out in the
suspect list.
1.) Locate and replace the suspected DIMM(s)
The DIMM(s) to replace are identified by the command
fmdump -v -u <EVENT_ID>
as shown in the example above.
2.) Clear the fault
After replacing the DIMM(s), clear the original fault to ensure
that any new problem will be noted.
Use the command
    fmadm repair <EVENT_ID>
with the <EVENT_ID> from the PSH console message, as shown in the
example above.
If the problem occurs again after replacing the DIMM(s), contact your service provider.
NOTE: For the T5440, a suspect FRU for this fault may be a
Memory Module, and for the T5240 a suspect FRU may be the optional
Memory Mezzanine. Neither of these FRUs will be included in the
suspect list for this fault (CR 6618751).
If this fault occurs on either of these platforms, follow the
instructions above to reseat or replace the DIMMs identified in
the suspect list. Note that the identified DIMMs may be located on a
T5440 Memory Module or T5240 Memory Mezzanine; refer to your
server's Service Manual or Service Label to find the physical
location of the DIMMs. If these actions do not resolve the problem,
contact your service provider.
A patch is not yet available to resolve this issue. When the patch
becomes available, it will be posted in this message.
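To check whether the NOTE above applies to a given system, the
product-id field of the FRU lines in the fmdump output can be
inspected. The following is a rough sketch only: both function names
are hypothetical, and it assumes that the product-id string on the
affected platforms contains "T5440" or "T5240", which this message
does not confirm.

```shell
# product_id: hypothetical helper that prints the first product-id
# value found in `fmdump -v -u <EVENT_ID>` output read from stdin.
product_id() {
    sed -n 's/.*product-id=\([^:]*\):.*/\1/p' | head -n 1
}

# note_applies: hypothetical check for the platforms named in the NOTE.
# Assumption: the product-id contains the model name (T5440 or T5240).
note_applies() {
    case "$1" in
        *T5440*|*T5240*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (on the affected system):
#   pid=$(fmdump -v -u <EVENT_ID> | product_id)
#   if note_applies "$pid"; then
#       echo "Suspect list may omit a Memory Module or Memory Mezzanine"
#   fi
```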