Base RAM ECC RAS Test

Overview

The Neoverse N2 reference design platform supports both Secure and Non-Secure RAM. Secure RAM occupies the address range 0x0400_0000 - 0x04ff_ffff, and Non-Secure RAM occupies the address range 0x0600_0000 - 0x07ff_ffff. Both regions are part of the Application Processor (AP) memory map and are backed by ECC to support error detection and correction.
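
As a minimal sketch of the memory map described above, the helper below classifies a physical address as Secure RAM, Non-Secure RAM, or neither. Only the four range boundaries come from this page; the macro names, the enum, and the function name are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Range boundaries quoted in the overview; the names are illustrative. */
#define SECURE_RAM_BASE     UINT64_C(0x04000000)
#define SECURE_RAM_LIMIT    UINT64_C(0x04ffffff)
#define NONSECURE_RAM_BASE  UINT64_C(0x06000000)
#define NONSECURE_RAM_LIMIT UINT64_C(0x07ffffff)

/* The two regions must not overlap for the classification to be unambiguous. */
static_assert(SECURE_RAM_LIMIT < NONSECURE_RAM_BASE,
              "Secure and Non-Secure RAM ranges must not overlap");

enum base_ram_region {
    BASE_RAM_NONE,      /* address is outside both Base RAM regions */
    BASE_RAM_SECURE,    /* address is inside Secure RAM */
    BASE_RAM_NONSECURE  /* address is inside Non-Secure RAM */
};

/* Return the Base RAM region that contains the physical address pa. */
static enum base_ram_region base_ram_classify(uint64_t pa)
{
    if (pa >= SECURE_RAM_BASE && pa <= SECURE_RAM_LIMIT) {
        return BASE_RAM_SECURE;
    }
    if (pa >= NONSECURE_RAM_BASE && pa <= NONSECURE_RAM_LIMIT) {
        return BASE_RAM_NONSECURE;
    }
    return BASE_RAM_NONE;
}
```

For example, base_ram_classify(0x04030500) yields BASE_RAM_SECURE, while an address in the gap between the two regions, such as 0x05000000, yields BASE_RAM_NONE.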

This test uses the Secure RAM ECC feature to demonstrate injection and handling of a 1-bit corrected error (CE). It is based on the firmware-first approach to software error handling.

Note

This test is only supported on Neoverse N2 based reference design platforms.

1-bit CE error injection on Secure RAM

The ECC module on the Base RAM can detect and correct 1-bit errors (CE) that occur in the RAM. The Base element RAM implements six banks of RAS registers. One pair of banks is assigned to each of the AP, SCP and MCP components. Within each pair, one bank implements the ECC RAS registers for Secure RAM and the other implements the registers for Non-Secure RAM.

The RAS registers within each bank include an ErrCtrl register. This register can be programmed to generate Corrected Errors (CE) or Uncorrected Errors (UE) in the RAM. After ErrCtrl is programmed to inject the desired error, the injection software must initiate a read transaction to any Secure RAM location for the injection to take effect. The error records are then populated with the appropriate values and the corresponding fault or error interrupt is generated.

The software sequence to inject a 1-bit CE into Secure RAM is as follows:

  • Map the device memory space of the AP Secure RAM ECC RAS registers.

  • Program the ErrCtrl register to inject CE.

  • Program the ErrCtrl register to enable ECC support.
    • mmio_write_32((AP_S_RAM_ECC_RAS_BASE + 0x004), INJECT_CE_BIT | ENABLE_ECC)

  • Read any Secure RAM region.
    • data = *(volatile uint32_t *)SECURE_RAM_ADDR
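
The sequence above can be sketched as a single C function. This is an illustration under stated assumptions, not the platform firmware's implementation: the bit positions given to INJECT_CE_BIT and ENABLE_ECC are placeholders (this page does not state them), the mmio helpers are local stand-ins for the firmware's accessors, and the register bank base and Secure RAM address are passed in as parameters because their values and the mapping step depend on the platform. Only the 0x004 ErrCtrl offset and the write-then-read order come from the steps above.

```c
#include <assert.h>
#include <stdint.h>

#define ERRCTRL_OFFSET 0x004u     /* ErrCtrl offset used in the sequence above */
#define ENABLE_ECC     (1u << 0)  /* placeholder bit position (assumption) */
#define INJECT_CE_BIT  (1u << 1)  /* placeholder bit position (assumption) */

/* Local stand-in for a 32-bit MMIO write accessor. */
static inline void mmio_write_32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Local stand-in for a 32-bit MMIO read accessor. */
static inline uint32_t mmio_read_32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/*
 * Arm a 1-bit CE injection and trigger it.
 * ras_base:        already-mapped base of the AP Secure RAM ECC RAS bank.
 * secure_ram_addr: any already-mapped Secure RAM location.
 * Returns the value read back from secure_ram_addr.
 */
static uint32_t inject_secure_ram_ce(uintptr_t ras_base,
                                     uintptr_t secure_ram_addr)
{
    assert(ras_base != 0 && secure_ram_addr != 0);

    /* Enable ECC and request CE injection with one ErrCtrl write. */
    mmio_write_32(ras_base + ERRCTRL_OFFSET, INJECT_CE_BIT | ENABLE_ECC);

    /* The injection takes effect on the next read of Secure RAM. */
    return mmio_read_32(secure_ram_addr);
}
```

Because the addresses are parameters, the function can be exercised on a host by pointing ras_base at an ordinary array that stands in for the register bank; on the target, the caller would first map the register bank as device memory, as in the first step of the list.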

Download the platform software

Skip this section if the required sources have already been downloaded.

To obtain the required sources for the platform, follow the steps listed on the Setup Workspace page. Ensure that the platform software is downloaded before proceeding with the steps listed below. Also note the host machine requirements listed on that page, which are essential to build and execute the platform software stack.

Procedure to perform 1-bit CE injection and handling on Secure RAM

Boot up to Busybox

Refer to the Busybox Boot page to build the reference design platform software stack and boot into Busybox on the Neoverse RD FVP.

Secure RAM error handling test

Run the commands below to inject a 1-bit CE into Secure RAM. The test injects the error at address 0x0403_0500 and uses the EINJ ACPI table to perform the injection. Base element RAM is not one of the standard error types defined for the EINJ ACPI table, so the vendor-defined error type is used. Bit 31 of the error_type field indicates a vendor-defined error type. Use the error_type value 0x8002_0000 to represent Secure RAM errors.

echo 0x80020000 > /sys/kernel/debug/apei/einj/error_type
echo 1 > /sys/kernel/debug/apei/einj/error_inject
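
To make the error_type encoding concrete, the sketch below checks bit 31 of a value and compares it against the Secure RAM error type written above. The macro and function names are hypothetical; only the bit 31 meaning and the value 0x8002_0000 come from this page, and the lower bits are treated as an opaque platform-defined encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of error_type marks a vendor-defined error type. */
#define EINJ_VENDOR_DEFINED    (UINT32_C(1) << 31)

/* error_type value this platform uses for Secure RAM errors. */
#define EINJ_SECURE_RAM_ERROR  UINT32_C(0x80020000)

/* The Secure RAM error type must itself carry the vendor-defined bit. */
static_assert((EINJ_SECURE_RAM_ERROR & EINJ_VENDOR_DEFINED) != 0,
              "Secure RAM error type must be vendor defined");

/* True if error_type has the vendor-defined bit set. */
static bool einj_is_vendor_defined(uint32_t error_type)
{
    return (error_type & EINJ_VENDOR_DEFINED) != 0;
}

/* True if error_type selects the Secure RAM error on this platform. */
static bool einj_is_secure_ram_error(uint32_t error_type)
{
    return error_type == EINJ_SECURE_RAM_ERROR;
}
```

A standard error type such as 0x8 leaves bit 31 clear, so einj_is_vendor_defined(0x8) is false, while 0x80020000 satisfies both checks.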

The firmware reports this error to OSPM using the standard Common Platform Error Record (CPER) format for memory errors. On receiving the error information, the kernel logs it on the console.

{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 20
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]:  Error 0, type: corrected
{1}[Hardware Error]:   section_type: memory error
{1}[Hardware Error]:   physical_address: 0x0000000004030500
{1}[Hardware Error]:   physical_address_mask: 0x0000ffffffffffff
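
One way to confirm that the logged record refers to the injected location is to apply the reported mask to the reported address and compare the result with the injection address. The sketch below does this under the assumption that physical_address_mask marks the valid bits of physical_address; the macro and function names are hypothetical, and the numeric values are taken from the log and the test description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address at which this test injects the error. */
#define INJECTED_ADDR UINT64_C(0x04030500)

/* The injection address is expected to lie inside Secure RAM. */
static_assert(INJECTED_ADDR >= UINT64_C(0x04000000) &&
              INJECTED_ADDR <= UINT64_C(0x04ffffff),
              "injection address must be in Secure RAM");

/*
 * Compare a reported physical address with an expected one,
 * considering only the bits that the mask marks as valid.
 */
static bool cper_addr_matches(uint64_t physical_address,
                              uint64_t physical_address_mask,
                              uint64_t expected)
{
    return (physical_address & physical_address_mask) ==
           (expected & physical_address_mask);
}
```

With the values from the log, cper_addr_matches(0x0000000004030500, 0x0000ffffffffffff, INJECTED_ADDR) evaluates to true.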

Copyright (c) 2022-2023, Arm Limited. All rights reserved.