Shared RAM ECC RAS Test

Overview

The RD Fremont platform supports Shared RAM that is shared between the AP, MCP, SCP and RSS. The Shared RAM is protected with SECDED (Single Error Correct, Double Error Detect) ECC. The platform defines ECC RAS registers to log any ECC error that occurs during a Shared RAM access from a master (AP, SCP, MCP or RSS). Four sets of ECC RAS registers are defined for each master, and an error is logged in the set that matches the master’s PAS (Physical Address Space). The Shared RAM ECC RAS register sets are listed below:

  • AP Secure RAM ECC RAS registers

  • AP Non-Secure RAM ECC RAS registers

  • AP Realm RAM ECC RAS registers

  • AP Root RAM ECC RAS registers

  • SCP Secure RAM ECC RAS registers

  • SCP Non-Secure RAM ECC RAS registers

  • SCP Realm RAM ECC RAS registers

  • SCP Root RAM ECC RAS registers

  • MCP Secure RAM ECC RAS registers

  • MCP Non-Secure RAM ECC RAS registers

  • MCP Realm RAM ECC RAS registers

  • MCP Root RAM ECC RAS registers

For instance, any error that occurs during an SRAM access from the AP while the AP is executing in root PAS is logged in the “AP Root RAM ECC RAS registers”. This document demonstrates error logging for a 1-bit CE (Corrected Error) that occurs during an SRAM access from the AP executing in root PAS.
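
As an illustration of how a (master, PAS) pair selects one register set, the list above can be modelled as a small lookup table. This is a minimal sketch only: the enum names, the function names and the flat index are hypothetical, chosen to mirror the order of the list, and do not describe the hardware register layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enums; their order mirrors the register-set list above. */
enum sram_master { MASTER_AP, MASTER_SCP, MASTER_MCP, MASTER_COUNT };
enum sram_pas { PAS_SECURE, PAS_NON_SECURE, PAS_REALM, PAS_ROOT, PAS_COUNT };

/* Register-set names in document order: four PAS entries per master. */
static const char *const ras_set_names[] = {
    "AP Secure RAM ECC RAS registers",
    "AP Non-Secure RAM ECC RAS registers",
    "AP Realm RAM ECC RAS registers",
    "AP Root RAM ECC RAS registers",
    "SCP Secure RAM ECC RAS registers",
    "SCP Non-Secure RAM ECC RAS registers",
    "SCP Realm RAM ECC RAS registers",
    "SCP Root RAM ECC RAS registers",
    "MCP Secure RAM ECC RAS registers",
    "MCP Non-Secure RAM ECC RAS registers",
    "MCP Realm RAM ECC RAS registers",
    "MCP Root RAM ECC RAS registers",
};

static_assert(sizeof(ras_set_names) / sizeof(ras_set_names[0]) ==
              (size_t)(MASTER_COUNT * PAS_COUNT),
              "one name per (master, PAS) pair");

/* Returns the illustrative index of the set, or -1 for an invalid pair. */
int sram_ras_set_index(enum sram_master master, enum sram_pas pas)
{
    if ((unsigned)master >= MASTER_COUNT || (unsigned)pas >= PAS_COUNT)
        return -1;
    return (int)master * PAS_COUNT + (int)pas;
}

/* Returns the name of the set that logs the error, or NULL if invalid. */
const char *sram_ras_set_name(enum sram_master master, enum sram_pas pas)
{
    int idx = sram_ras_set_index(master, pas);
    return (idx < 0) ? NULL : ras_set_names[idx];
}
```

For example, sram_ras_set_name(MASTER_AP, PAS_ROOT) yields the “AP Root RAM ECC RAS registers” entry, which is the set used in the rest of this document.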

Note

This test is supported only on the RD-Fremont-Cfg1 platform. The test is limited to error logging at EL3 and does not involve the host OS, as explained in the “Firmware First Error Handling” section of the RAS document.

1-bit CE error injection on Shared RAM

Each ECC RAS register set implements the SRAMECC_ERRMISC1 register, which provides a way to inject a Corrected Error (CE) or an Uncorrected Error (UE) into the Shared RAM. The error injection takes effect only if the register programming is followed by a read access to the Shared RAM. If the injection is successful, the error records pertaining to the master and the respective access are populated with the error information, and an error interrupt is delivered to the master.

The following software sequence injects a 1-bit CE into the Shared RAM from the AP executing in root PAS.

  • Add memory map for the Shared RAM ECC RAS registers memory space.

  • Add memory map for the Shared memory space.

  • Program the SRAMECC_ERRMISC1 register to inject CE.
    • mmio_write_32((AP_RT_RAM_ECC_RAS_BASE + SRAM_ERR_MISC1_OFFSET),
                    SRAM_INJECT_ERROR_CE);

  • Read any Shared RAM region.
    • data = *(volatile uint32_t *) SHARED_RAM_ADDR;
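
The sequence above can be sketched as a small host-side simulation. This is a sketch under stated assumptions, not the platform firmware: the two static arrays stand in for the mapped ECC RAS register frame and the mapped Shared RAM, mmio_write_32 is a local stand-in with the same shape as the helper used above, and the values given to SRAM_ERR_MISC1_OFFSET and SRAM_INJECT_ERROR_CE are placeholders (the real values come from the platform headers). The function name inject_sram_ce is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only; not the real register encoding. */
#define SRAM_ERR_MISC1_OFFSET  0x18u
#define SRAM_INJECT_ERROR_CE   0x1u

/* Host-side stand-ins for the two regions mapped in the first two steps. */
static uint32_t fake_ras_frame[16];
static volatile uint32_t fake_shared_ram[4];

static_assert(SRAM_ERR_MISC1_OFFSET % 4u == 0u &&
              SRAM_ERR_MISC1_OFFSET < sizeof(fake_ras_frame),
              "placeholder offset must be word aligned and inside the frame");

/* Local stand-in for the firmware's 32-bit MMIO write helper. */
static void mmio_write_32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Program the injection register, then read Shared RAM to trigger it. */
uint32_t inject_sram_ce(uintptr_t ras_base, uintptr_t shared_ram_addr)
{
    mmio_write_32(ras_base + SRAM_ERR_MISC1_OFFSET, SRAM_INJECT_ERROR_CE);

    /* The injection only takes effect once a read access follows. */
    return *(volatile uint32_t *)shared_ram_addr;
}
```

On the host, a call such as inject_sram_ce((uintptr_t)fake_ras_frame, (uintptr_t)&fake_shared_ram[0]) simply returns the stored word. On the target, the same read is the access that triggers the injected error and the resulting error interrupt.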

Download the platform software

Skip this section if the required sources have been downloaded.

To obtain the required sources for the platform, follow the steps listed on the Setup Workspace page. Ensure that the platform software is downloaded before proceeding with the steps listed below. Also note the host machine requirements listed on that page, which are essential to build and execute the platform software stack.

Procedure to perform 1-bit CE injection and handling on Shared RAM

Boot up to Busybox

Refer to the Busybox Boot or Buildroot Boot page to build the reference design platform software stack and boot into Busybox on the Neoverse RD FVP.

Shared RAM error handling test

Run the commands below to inject a 1-bit CE into the Shared RAM. This test uses the EINJ ACPI table to perform the error injection. Shared RAM is not a standard error type in the EINJ ACPI table, so a vendor-defined error type is used. Bit 31 of the error_type field marks a vendor-defined error type. Use the error_type value 0x8002_0000 to represent Shared RAM errors.

mount -t debugfs none /sys/kernel/debug    # Needed for Buildroot
echo 0x80020000 > /sys/kernel/debug/apei/einj/error_type
echo 1 > /sys/kernel/debug/apei/einj/oem-einj/sel-firmware-first
echo 1 > /sys/kernel/debug/apei/einj/oem-einj/sel-component
echo 1 > /sys/kernel/debug/apei/einj/oem-einj/sel-error-type
echo 1 > /sys/kernel/debug/apei/einj/error_inject
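
The error_type value written above can be decomposed to show the vendor-defined flag. This is a minimal sketch: the macro and function names are hypothetical, and the remaining bits (0x00020000) are treated as an opaque platform-specific value, since only bit 31 is described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of the EINJ error_type field marks a vendor-defined error type. */
#define EINJ_VENDOR_DEFINED  (UINT32_C(1) << 31)

/* The value used for Shared RAM errors is bit 31 plus a platform value. */
static_assert((EINJ_VENDOR_DEFINED | UINT32_C(0x00020000)) ==
              UINT32_C(0x80020000),
              "0x80020000 is the vendor flag plus 0x00020000");

/* True when the error type is vendor defined. */
bool einj_is_vendor_type(uint32_t error_type)
{
    return (error_type & EINJ_VENDOR_DEFINED) != 0;
}

/* Returns the error type with the vendor flag cleared. */
uint32_t einj_vendor_payload(uint32_t error_type)
{
    return error_type & ~EINJ_VENDOR_DEFINED;
}
```

With these helpers, einj_is_vendor_type(0x80020000) is true and einj_vendor_payload(0x80020000) is 0x00020000.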

Shared RAM error handling happens in firmware-first mode. The EL3 firmware receives the fault handling interrupt (FHI) for the detected corrected error and logs the error on the secure console.

INFO:    SGI: Base element RAM interrupt [85] handler
INFO:    ErrStatus = 0x86000000
INFO:    ErrAddr = 0x19100

Copyright (c) 2024, Arm Limited. All rights reserved.