Fremont CPU RAS Test
Overview
The RD Fremont platform has support for 2 error nodes. The presence of these nodes enables the RAS extension on the RD Fremont core.
Node 0: Includes the L3 memory system in the DSU.
Node 1: Includes the private L1 and L2 memory systems in the core.
The RAMs in the RD Fremont core support SED parity (Single Error Detect) and SECDED ECC (Single Error Correct Double Error Detect) capabilities.
RD Fremont also supports inserting errors in the error detection logic to verify error handling software.
Note
The RD Fremont platform is based on a direct connect configuration and has no DSU. Hence the RD Fremont reference design platform supports only one error node, i.e. Node 0.
DE error injection on RD Fremont
The Poseidon core implements Pseudo Fault Generation (PFG) registers. With the help of these registers, software can inject a CE, DE or UE into the cache RAMs.
The software sequence below injects a 1-bit DE.
- Select the error record for the L1 and L2 memory systems, i.e. Node 0.
write_errselr_el1 (0)
- Program the Error Control Register to enable Error Detection and FHI for CE, DE and UE.
write_erxctlr_el1 (0x109) (Note: To enable ERI on UE write 0x10D)
- Program the PFG Control Register to 0.
write_cpu_pfg_ctrl_register (0)
- Clear the Error Status Register by writing 1 to its write-one-to-clear status bits.
write_erxstatus_el1 (0xFFC00000)
- Set PFG countdown register to 1.
write_cpu_pfg_cdn_register (1)
- For Deferred Error injection, write
write_cpu_pfg_ctrl_register (0x80000020) // Generates FHI interrupt
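The register values used in the sequence above can be cross-checked by composing them from individual bit fields. The following is a minimal sketch only: the bit positions are assumptions based on the Arm RAS extension register layouts (ED, UI, FI and CFI in the Error Control Register, DE and CDNEN in the PFG Control Register, status bits [31:22] in the Error Status Register) and should be verified against the core's Technical Reference Manual. All variable names are illustrative.

```shell
#!/bin/sh
# Assumed bit positions (verify against the core TRM before relying on them).

# Error Control Register
ED=$((1 << 0))     # error detection enable
UI=$((1 << 2))     # ERI on uncorrected errors
FI=$((1 << 3))     # fault handling interrupt (FHI) enable
CFI=$((1 << 8))    # FHI on corrected errors

# PFG Control Register
PFG_DE=$((1 << 5))      # inject a deferred error
PFG_CDNEN=$((1 << 31))  # enable the countdown

# Error Status Register: bits [31:22] are treated as write-one-to-clear
STATUS_CLEAR=$((0x3FF << 22))

CTLR_FHI=$(($ED | $FI | $CFI))
CTLR_FHI_ERI=$(($CTLR_FHI | $UI))
PFG_INJECT_DE=$(($PFG_CDNEN | $PFG_DE))

printf 'Error control (FHI)       = 0x%X\n' "$CTLR_FHI"       # 0x109
printf 'Error control (FHI + ERI) = 0x%X\n' "$CTLR_FHI_ERI"   # 0x10D
printf 'PFG control (inject DE)   = 0x%X\n' "$PFG_INJECT_DE"  # 0x80000020
printf 'Status clear mask         = 0x%X\n' "$STATUS_CLEAR"   # 0xFFC00000
```

Under these assumptions, the composed values match the constants written in the sequence, including the alternative 0x10D value that additionally enables ERI on UE.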
Download the platform software
Skip this section if the required sources have been downloaded.
To obtain the required sources for the platform, follow the steps listed on the Setup Workspace page. Ensure that the platform software is downloaded before proceeding with the steps listed below. Also, note the host machine requirements listed on that page, which are essential to build and execute the platform software stack.
Select the Build option
The RD Fremont CPU supports both Firmware First and Kernel First error handling. Only one of the two can be enabled at any given time. Firmware First support is enabled by default. To enable Kernel First support, set the build option TF_A_RAS_FW_FIRST to 0. Navigate to your workspace and edit the platform configuration file:
- For Firmware First
vim build-scripts/configs/rdfremontcfg1/rdfremontcfg1
Set TF_A_RAS_FW_FIRST=1
- For Kernel First
vim build-scripts/configs/rdfremontcfg1/rdfremontcfg1
Set TF_A_RAS_FW_FIRST=0
Note
Clean and build once you switch error handling.
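As an alternative to editing the file by hand, the switch can be scripted. The sketch below assumes that the configuration file already contains a line of the form `TF_A_RAS_FW_FIRST=<value>` at the start of a line; the function name `set_ras_fw_first` is hypothetical and not part of the platform software.

```shell
#!/bin/sh
# Hypothetical helper: set TF_A_RAS_FW_FIRST in a build configuration file.
# Assumes the option is already present as "TF_A_RAS_FW_FIRST=<value>".
set_ras_fw_first() {
    cfg=$1
    value=$2

    # Only 0 (Kernel First) and 1 (Firmware First) are accepted.
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac

    # Refuse to continue if the option is not present in the file.
    if ! grep -q '^TF_A_RAS_FW_FIRST=' "$cfg"; then
        echo "TF_A_RAS_FW_FIRST not found in $cfg" >&2
        return 1
    fi

    # Rewrite through a temporary file to avoid depending on "sed -i".
    tmp="${cfg}.tmp.$$"
    sed "s/^TF_A_RAS_FW_FIRST=.*/TF_A_RAS_FW_FIRST=${value}/" "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}

# Example (run from the workspace root):
#   set_ras_fw_first build-scripts/configs/rdfremontcfg1/rdfremontcfg1 0
```

A clean build is still required after the value changes.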
Procedure to perform DE injection and handling on RD Fremont CPU
Boot up to Busybox
Refer to the Busybox Boot or Buildroot Boot page to build the reference design platform software stack and boot into busybox on the Neoverse RD FVP.
Fremont CPU error handling test
After the busybox boot is complete, use the commands below to inject a 1-bit DE on the RD Fremont. The EINJ table debugfs entries are used to inject the error. The “sel-firmware-first” field in oem-einj toggles firmware first error injection, with the default being kernel first error injection. “sel-error-type” selects the type of error to inject, and the current implementation supports deferred errors.
Firmware First Error Injection
mount -t debugfs none /sys/kernel/debug  # Needed for buildroot
echo 0x80020000 > /sys/kernel/debug/apei/einj/error_type
echo 1 > /sys/kernel/debug/apei/einj/oem-einj/sel-firmware-first
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-component
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-error-type
echo 1 > /sys/kernel/debug/apei/einj/error_inject
Kernel First Error Injection
mount -t debugfs none /sys/kernel/debug  # Needed for buildroot
echo 0x80020000 > /sys/kernel/debug/apei/einj/error_type
echo 0 > /sys/kernel/debug/apei/einj/oem-einj/sel-firmware-first
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-component
echo 2 > /sys/kernel/debug/apei/einj/oem-einj/sel-error-type
echo 1 > /sys/kernel/debug/apei/einj/error_inject
Note
Error injection, whether firmware-first or kernel-first, is always initiated from the kernel.
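Because the two sequences differ only in the value written to sel-firmware-first, they can be wrapped in a single helper. This is a sketch under stated assumptions: the function name `inject_cpu_de` and the `EINJ_DIR` override are hypothetical, debugfs is assumed to be mounted already, and the values written are the ones listed in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper around the EINJ debugfs writes shown above.
# $1 = 1 for firmware first, 0 for kernel first.
# EINJ_DIR may be overridden, for example to dry-run against a scratch directory.
inject_cpu_de() {
    fw_first=$1
    einj=${EINJ_DIR:-/sys/kernel/debug/apei/einj}

    case "$fw_first" in
        0|1) ;;
        *) echo "usage: inject_cpu_de <0|1>" >&2; return 1 ;;
    esac

    # Same order and values as the manual sequence; stop at the first failure.
    echo 0x80020000 > "$einj/error_type" &&
    echo "$fw_first" > "$einj/oem-einj/sel-firmware-first" &&
    echo 2 > "$einj/oem-einj/sel-component" &&
    echo 2 > "$einj/oem-einj/sel-error-type" &&
    echo 1 > "$einj/error_inject"
}

# Example:
#   inject_cpu_de 1   # firmware first
#   inject_cpu_de 0   # kernel first
```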
Firmware First Error Handling
On successful error injection, the firmware receives the error and logs the error information on the console.
INFO: [CPU RAS] CPU intr received = 17 on cpu_id = 2
INFO: [CPU RAS] ERXMISC0_EL1 = 0x0
INFO: [CPU RAS] ERXSTATUS_EL1 = 0x40800000
INFO: [CPU RAS] ERXADDR_EL1 = 0x0
Kernel First Error Handling
On successful error injection, the kernel receives an error event, which is handled in the irq handler. The handler traverses the error record info and logs the error. The logs from the kernel first error handling test are shown below.
[ 2365.760926] Injecting DE-
[ 2365.760928] ARM RAS: error from CPU7
[ 2365.760930] ERR0STATUS: 0x40800000
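The status value 0x40800000 reported in the logs can be decoded to confirm that a deferred error was recorded. The sketch below assumes the Arm RAS extension layout of the Error Status Register (AV[31], V[30], UE[29], ER[28], OF[27], MV[26], CE[25:24], DE[23], PN[22]); the function name `decode_erxstatus` is hypothetical.

```shell
#!/bin/sh
# Hypothetical decoder for an Error Status Register value.
# Field positions are assumptions; verify them against the core TRM.
decode_erxstatus() {
    status=$(($1))
    flags=""

    # Single-bit fields, listed as NAME:BIT.
    for field in AV:31 V:30 UE:29 ER:28 OF:27 MV:26 DE:23 PN:22; do
        name=${field%%:*}
        bit=${field##*:}
        if [ $(( ($status >> $bit) & 1 )) -eq 1 ]; then
            flags="${flags:+$flags }$name"
        fi
    done

    # CE is a two-bit field at [25:24].
    ce=$(( ($status >> 24) & 3 ))
    if [ "$ce" -ne 0 ]; then
        flags="${flags:+$flags }CE=$ce"
    fi

    echo "${flags:-none}"
}

decode_erxstatus 0x40800000   # prints "V DE"
```

Under these assumptions, the value has only the V (record valid) and DE (deferred error) bits set, which is consistent with a 1-bit DE injection.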
Copyright (c) 2022-2024, Arm Limited. All rights reserved.