Fremont CPU RAS Test

Overview

The RD Fremont platform has support for two error nodes. The presence of these nodes enables the RAS extension on the RD Fremont core.

  • Node 0: Includes the L3 memory system in the DSU.

  • Node 1: Includes the private L1 and L2 memory systems in the core.

The RAMs in the RD Fremont core support SED (Single Error Detect) parity and SECDED (Single Error Correct, Double Error Detect) ECC capabilities.

RD Fremont also supports inserting errors in the error detection logic to verify error handling software.

Note

The RD Fremont platform is based on a direct connect configuration and has no DSU. Hence, the RD Fremont reference design platform supports only one error node, i.e. Node 0.

DE error injection on RD Fremont

The Poseidon core implements Pseudo Fault Generation (PFG) registers. Using these registers, software can inject a CE (Corrected Error), DE (Deferred Error) or UE (Uncorrected Error) into the cache RAMs.

The software sequence to inject a 1-bit DE is detailed below.

  • Select the error record for the L1 and L2 memory systems, i.e. Node 0.
    • write_errselr_el1 (0)

  • Program the Error Control Register to enable error detection and the FHI (Fault Handling Interrupt) for CE, DE and UE.
    • write_erxctlr_el1 (0x109) (Note: to enable the ERI (Error Recovery Interrupt) on UE, write 0x10D)

  • Program the PFG Control Register to 0.
    • write_cpu_pfg_ctrl_register (0)

  • Clear the Error Status Register by writing 1s to its write-one-to-clear status bits.
    • write_erxstatus_el1 (0xFFC00000)

  • Set PFG countdown register to 1.
    • write_cpu_pfg_cdn_register (1)

  • For Deferred Error injection, write
    • write_cpu_pfg_ctrl_register (0x80000020) // Generates FHI interrupt
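
The register values used in the sequence above can be cross-checked by composing them from individual bit fields. The following is a minimal Python sketch, not the platform's implementation. The bit positions are assumptions taken from the generic Arm RAS extension layouts of the error control, error status and PFG control registers, and the constant names are illustrative, so verify them against the core's Technical Reference Manual.

```python
# Bit positions below are assumptions based on the generic Arm RAS extension
# register layouts; confirm them against the core's TRM before relying on them.

def bit(n):
    """Return a mask with only bit n set."""
    return 1 << n

# Error Control Register (ERXCTLR_EL1) fields.
CTLR_ED = bit(0)    # error detection enable
CTLR_UI = bit(2)    # ERI on uncorrected errors
CTLR_FI = bit(3)    # FHI on deferred and uncorrected errors
CTLR_CFI = bit(8)   # FHI on corrected errors

# PFG Control Register fields.
PFG_CDNEN = bit(31)  # countdown enable
PFG_DE = bit(5)      # generate a deferred error

# Error Status Register: bits [31:22] are assumed to be the
# write-one-to-clear status flags.
STATUS_CLEAR = sum(bit(n) for n in range(22, 32))

erxctlr = CTLR_ED | CTLR_FI | CTLR_CFI
erxctlr_with_eri = erxctlr | CTLR_UI
pfg_ctrl_de = PFG_CDNEN | PFG_DE

print(hex(erxctlr))           # 0x109
print(hex(erxctlr_with_eri))  # 0x10d
print(hex(STATUS_CLEAR))      # 0xffc00000
print(hex(pfg_ctrl_de))       # 0x80000020
```

Under these assumptions, each composed value matches the constant written in the corresponding step of the sequence.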

Download the platform software

Skip this section if the required sources have been downloaded.

To obtain the required sources for the platform, follow the steps listed on the Setup Workspace page. Ensure that the platform software is downloaded before proceeding with the steps listed below. Also, note the host machine requirements listed on that page, which are essential to build and execute the platform software stack.

Select the Build option

The RD Fremont CPU supports both Firmware First and Kernel First error handling. Only one of the two can be enabled at any given time. Firmware First support is enabled by default. To enable Kernel First support, set the build option TF_A_RAS_FW_FIRST to 0. Navigate to your workspace and edit the platform configuration file as follows:

  • For Firmware First
    • vim build-scripts/configs/rdfremontcfg1/rdfremontcfg1

    • Set TF_A_RAS_FW_FIRST=1

  • For Kernel First
    • vim build-scripts/configs/rdfremontcfg1/rdfremontcfg1

    • Set TF_A_RAS_FW_FIRST=0

Note

Perform a clean build after switching the error handling mode.
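
Switching modes only requires changing one assignment in the configuration file. As an illustration, the following Python sketch rewrites that assignment in a block of text. It assumes the file uses plain `TF_A_RAS_FW_FIRST=<value>` lines; the helper name `set_fw_first` and the `OTHER_OPTION` entry in the sample are hypothetical and not part of the build scripts.

```python
import re

def set_fw_first(config_text, enabled):
    """Return config_text with TF_A_RAS_FW_FIRST set to 1 or 0."""
    new_line = "TF_A_RAS_FW_FIRST=" + ("1" if enabled else "0")
    pattern = re.compile(r"^TF_A_RAS_FW_FIRST=.*$", re.MULTILINE)
    if pattern.search(config_text):
        # Replace the existing assignment in place.
        return pattern.sub(new_line, config_text)
    # Option not present: append it as a new line.
    sep = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + sep + new_line + "\n"

# Hypothetical file content used only for demonstration.
sample = "OTHER_OPTION=1\nTF_A_RAS_FW_FIRST=1\n"
print(set_fw_first(sample, enabled=False))
```

The sketch only edits text; the clean rebuild still has to be run separately.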

Procedure to perform DE injection and handling on RD Fremont CPU

Boot up to Busybox

Refer to the Busybox Boot or Buildroot Boot page to build the reference design platform software stack and boot into busybox on the Neoverse RD FVP.

Fremont CPU error handling test

After the Busybox boot is complete, use the commands below to inject a 1-bit DE on RD Fremont. The EINJ table debugfs entries are used to inject the error. The “sel-firmware-first” field in oem-einj toggles firmware first error injection; the default is kernel first error injection. The “sel-error-type” field selects the type of error to inject, and the current implementation supports deferred errors.

  • Firmware First Error Injection

mount -t debugfs none /sys/kernel/debug   # Needed for buildroot
echo 0x80020000 > /sys/kernel/debug/apei/einj/error_type
echo 1 > /sys/kernel/debug/apei/einj/oem-einj/sel-firmware-first
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-component
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-error-type
echo 1 > /sys/kernel/debug/apei/einj/error_inject

  • Kernel First Error Injection

mount -t debugfs none /sys/kernel/debug   # Needed for buildroot
echo 0x80020000 > /sys/kernel/debug/apei/einj/error_type
echo 0 > /sys/kernel/debug/apei/einj/oem-einj/sel-firmware-first
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-component
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-error-type
echo 1 > /sys/kernel/debug/apei/einj/error_inject
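
The two command sequences differ only in the value written to sel-firmware-first. A minimal Python sketch of that observation follows: it builds the ordered list of debugfs writes for either mode. The function names are hypothetical, and running `inject_de` assumes root access, a mounted debugfs and a Python interpreter on the target, which the Busybox image may not provide.

```python
EINJ_DIR = "/sys/kernel/debug/apei/einj"

def einj_de_writes(firmware_first, einj_dir=EINJ_DIR):
    """Return the ordered (path, value) writes for a 1-bit DE injection."""
    oem_dir = f"{einj_dir}/oem-einj"
    return [
        (f"{einj_dir}/error_type", "0x80020000"),
        (f"{oem_dir}/sel-firmware-first", "1" if firmware_first else "0"),
        (f"{oem_dir}/sel-component", "2"),
        (f"{oem_dir}/sel-error-type", "2"),
        (f"{einj_dir}/error_inject", "1"),
    ]

def inject_de(firmware_first, einj_dir=EINJ_DIR):
    """Perform the writes in order; the target files must already exist."""
    for path, value in einj_de_writes(firmware_first, einj_dir):
        with open(path, "w") as f:
            f.write(value + "\n")

# Print the equivalent shell commands instead of touching debugfs.
for path, value in einj_de_writes(firmware_first=True):
    print(f"echo {value} > {path}")
```

Keeping the write list separate from the code that performs the writes lets the sequence be inspected on a host machine before it is run on the target.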

Note

Error injection is initiated from the kernel in both the firmware-first and kernel-first cases.

Firmware First Error Handling

On successful error injection, the firmware receives the error and logs the error information on the console.

INFO:    [CPU RAS] CPU intr received = 17 on cpu_id = 2
INFO:    [CPU RAS] ERXMISC0_EL1 = 0x0
INFO:    [CPU RAS] ERXSTATUS_EL1 = 0x40800000
INFO:    [CPU RAS] ERXADDR_EL1 = 0x0

Kernel First Error Handling

On successful error injection, the kernel receives an error event, which is handled in the IRQ handler. The handler traverses the error record information and logs the error. The following logs are from the kernel first error handling test.

[ 2365.760926] Injecting DE-
[ 2365.760928] ARM RAS: error from CPU7
[ 2365.760930] ERR0STATUS: 0x40800000
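
The status value 0x40800000 reported in the logs above can be decoded to confirm that a deferred error was recorded. The following Python sketch does this; the field positions are assumptions based on the generic Arm RAS extension error status register layout, and the helper name `decode_status` is hypothetical, so check the positions against the core's Technical Reference Manual.

```python
# Single-bit flags and their assumed bit positions in the error status register.
STATUS_FLAGS = {
    "AV": 31,  # address valid
    "V": 30,   # status valid
    "UE": 29,  # uncorrected error
    "ER": 28,  # error reported
    "OF": 27,  # overflow
    "MV": 26,  # miscellaneous registers valid
    "DE": 23,  # deferred error
    "PN": 22,  # poison
}

def decode_status(value):
    """Split a status value into its named fields."""
    fields = {name: (value >> pos) & 1 for name, pos in STATUS_FLAGS.items()}
    fields["CE"] = (value >> 24) & 0x3  # two-bit corrected error field
    fields["SERR"] = value & 0xFF       # primary error code
    return fields

decoded = decode_status(0x40800000)
print([name for name, field in decoded.items() if field])  # ['V', 'DE']
```

Under these assumptions, only the valid (V) and deferred error (DE) flags are set, which is consistent with a 1-bit DE injection.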

Copyright (c) 2022-2024, Arm Limited. All rights reserved.