Skiboot Versions

OPAL boot and runtime firmware for POWER

v5.10.4

6 years ago

skiboot-5.10.4


skiboot 5.10.4 was released on Wednesday April 4th, 2018. It replaces skiboot-5.10.3 as the current stable release in the 5.10.x series.

It is recommended that 5.10.4 be used instead of any previous 5.10.x version due to the bug fixes and debugging enhancements in it.

Over skiboot-5.10.3, we have one bug fix:

  • xive: disable store EOI support

    Hardware has limitations which would require putting a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This is a killer for performance and the PHBs do not support the sync. So remove the store EOI for the moment, until hardware is improved.

    Also, while we are changing the XIVE source flags, let’s fix the settings for the PHB4s, which should follow these rules:

    • SHIFT_BUG for DD10

    • STORE_EOI for DD20 and if enabled

    • TRIGGER_PAGE for DDx0 and if not STORE_EOI
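The three PHB4 rules above can be read as a small decision function. The following is only a sketch of that reading: the flag values, the `phb4_rev` enum and `phb4_src_flags()` are hypothetical names made up for illustration and are not taken from the skiboot source. It also assumes that "DDx0" means any DD level.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SRC_SHIFT_BUG     0x1u
#define SRC_STORE_EOI     0x2u
#define SRC_TRIGGER_PAGE  0x4u

/* Hypothetical revision identifiers. */
enum phb4_rev { PHB4_DD10, PHB4_DD20 };

/* Pick XIVE source flags for a PHB4 following the three rules. */
static uint32_t phb4_src_flags(enum phb4_rev rev, bool store_eoi_enabled)
{
	uint32_t flags = 0;

	/* Rule 1: SHIFT_BUG for DD10 */
	if (rev == PHB4_DD10)
		flags |= SRC_SHIFT_BUG;

	/* Rule 2: STORE_EOI for DD20, and only if enabled */
	if (rev == PHB4_DD20 && store_eoi_enabled)
		flags |= SRC_STORE_EOI;

	/* Rule 3: TRIGGER_PAGE whenever STORE_EOI was not selected */
	if (!(flags & SRC_STORE_EOI))
		flags |= SRC_TRIGGER_PAGE;

	return flags;
}
```

With store EOI disabled, as in this release, the sketch yields SHIFT_BUG plus TRIGGER_PAGE on DD10 and TRIGGER_PAGE alone on DD20.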

v5.10.3

6 years ago

skiboot-5.10.3


skiboot 5.10.3 was released on Wednesday March 28th, 2018. It replaces skiboot-5.10.2 as the current stable release in the 5.10.x series.

It is recommended that 5.10.3 be used instead of any previous 5.10.x version due to the bug fixes and debugging enhancements in it.

Over skiboot-5.10.2, we have a few improvements and bug fixes:

  • NPU2: dump NPU2 registers on npu2 HMI

    Due to the nature of debugging npu2 issues, folk are wanting the full list of NPU2 registers dumped when there’s a problem.

    This is different than the solution introduced in 5.10.1 as there we would dump the registers in a way that would trigger a FIR bit that would confuse PRD.

  • npu2: Add performance tuning SCOM inits

    Peer-to-peer GPU bandwidth latency testing has produced some tunable values that improve performance. Add them to our device initialization.

    File these under things that need to be cleaned up with nice #defines for the register names and bitfields when we get time.

    A few of the settings are dependent on the system’s particular NVLink topology, so introduce a helper to determine how many links go to a single GPU.

  • hw/npu2: Assign a unique LPARSHORTID per GPU

    This gets used elsewhere to index items in the XTS tables.

  • occ: Set up OCC messaging even if we fail to setup pstates

    This means that we no longer hit this bug if we fail to get valid pstates from the OCC.

    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 10.318805] Disabling lock debugging due to kernel taint
    [ 10.318808] Severe Machine check interrupt [Not recovered]
    [ 10.318812] NIP [000000003003e434]: 0x3003e434
    [ 10.318813] Initiator: CPU
    [ 10.318815] Error type: Real address [Load/Store (foreign)]
    [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
    [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
    [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
    [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
    [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
    [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1

  • core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors

    This disables fast reboot in several more cases where serious errors like lock corruption or call re-entrancy are detected.

  • core/opal: allow some re-entrant calls

    This allows a small number of OPAL calls to succeed despite re-entering the firmware, and rejects others rather than aborting.

    This allows a system reset interrupt that interrupts OPAL to do something useful: sreset other CPUs, use the console (which allows xmon to work or stack traces to be printed), or reboot the system.

    Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is used for many other things that do not indicate a serious permanent error.

  • core/opal: abort in case of re-entrant OPAL call

    The stack is already destroyed by the time we get here, so there is not much point continuing.

  • npu2: Disable fast reboot

    Fast reboot does not yet work right with the NPU. It’s been disabled on NVLink and OpenCAPI machines. Do the same for NVLink2.

    This amounts to a port of 3e4577939bbf (“npu: Fix broken fast reset”) from the npu code to npu2.
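The policy described under "core/opal: allow some re-entrant calls" above can be sketched as an entry check. This is an illustration under stated assumptions, not the skiboot implementation: the `opal_call` identifiers, the whitelist in `reentry_allowed()` and the `check_entry()` helper are hypothetical. The return code values are the ones used by the OPAL API.

```c
#include <stdbool.h>

#define OPAL_SUCCESS          0
#define OPAL_BUSY            -2   /* transient condition: caller may retry */
#define OPAL_INTERNAL_ERROR -11   /* serious error: retrying will not help */

/* Hypothetical call identifiers, for illustration only. */
enum opal_call {
	CALL_CONSOLE_WRITE,
	CALL_SIGNAL_SRESET,
	CALL_REBOOT,
	CALL_PCI_CONFIG,
};

/* Assumed whitelist: calls that are useful from a system reset handler. */
static bool reentry_allowed(enum opal_call c)
{
	return c == CALL_CONSOLE_WRITE || c == CALL_SIGNAL_SRESET ||
	       c == CALL_REBOOT;
}

/* Decide up front whether a call entering firmware may proceed. */
static int check_entry(enum opal_call c, bool cpu_already_in_opal)
{
	if (!cpu_already_in_opal || reentry_allowed(c))
		return OPAL_SUCCESS;

	/* Not OPAL_BUSY: the caller must not treat this as "try again". */
	return OPAL_INTERNAL_ERROR;
}
```

Returning a distinct error code lets the caller tell a rejected re-entrant call apart from the ordinary "busy, try again" case.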

v5.10.2

6 years ago

skiboot-5.10.2


skiboot 5.10.2 was released on Tuesday March 6th, 2018. It replaces skiboot-5.10.1 as the current stable release in the 5.10.x series.

Over skiboot-5.10.1, we have one improvement:

  • Tie tm-suspend fw-feature and opal_reinit_cpus() together

    Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) always returns OPAL_UNSUPPORTED.

    This ties the tm suspend fw-feature to the opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when tm suspend is disabled, we correctly report it to the kernel. For backwards compatibility, it’s assumed tm suspend is available if the fw-feature is not present.

    Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and below has TM disabled completely (not just suspend).

    We are using opal_reinit_cpus() to determine this setting (rather than the device tree/HDAT) as some future firmware may let us change this dynamically after boot. That is not the case currently though.
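The behaviour described above can be sketched as follows. The enum and function names are hypothetical and exist only for this illustration; the return code values are those of the OPAL API. The sketch assumes a request for "TM suspend disabled" mode succeeds only when suspend really is disabled, and that a missing fw-feature is treated as "suspend available".

```c
#include <stdbool.h>

#define OPAL_SUCCESS       0
#define OPAL_UNSUPPORTED  -7

/* Hypothetical view of the tm-suspend fw-feature as seen by firmware. */
enum tm_suspend_feature {
	TM_FEATURE_ABSENT,    /* older hostboot: property not present */
	TM_FEATURE_ENABLED,
	TM_FEATURE_DISABLED,
};

static bool tm_suspend_available(enum tm_suspend_feature f)
{
	/* Backwards compatibility: no fw-feature means suspend is available. */
	return f != TM_FEATURE_DISABLED;
}

/* Result of a reinit request that asks for "TM suspend disabled" mode. */
static int reinit_tm_suspend_disabled(enum tm_suspend_feature f)
{
	return tm_suspend_available(f) ? OPAL_UNSUPPORTED : OPAL_SUCCESS;
}
```

A kernel can then use the return value as a probe: success means it is running with TM suspend disabled.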

v5.10.1

6 years ago

skiboot-5.10.1


skiboot 5.10.1 was released on Thursday March 1st, 2018. It replaces skiboot-5.10 as the current stable release in the 5.10.x series.

Over skiboot-5.10, we have an improvement for debugging NPU2/NVLink problems and a bug fix. These changes are:

  • NPU2 HMIs: dump out a LOT of npu2 registers for debugging

  • libflash/blocklevel: Correct miscalculation in blocklevel_smart_erase()

    This fixes a bug in pflash.

    If blocklevel_smart_erase() detects that the smart erase fits entirely in one erase block, it has an early bail path. In this path it miscalculates where in the buffer the backend needs to read from to perform the final write.

    Fixes: https://github.com/open-power/skiboot/issues/151

v6.0-rc1

6 years ago

skiboot-6.0-rc1


skiboot v6.0-rc1 was released on Tuesday May 1st 2018. It is the first release candidate of skiboot 6.0, which will become the new stable release of skiboot following the 5.11 release, first released April 6th 2018.

Skiboot 6.0 will mark the basis for op-build v2.0 and will be required for POWER9 systems.

skiboot v6.0-rc1 contains all bug fixes as of skiboot-5.11, skiboot-5.10.5, and skiboot-5.4.9 (the currently maintained stable releases). Once 6.0 is released, we do not expect any further stable releases in the 5.10.x series, nor in the 5.11.x series.

For details on how the skiboot stable releases work, see Skiboot stable tree rules and releases.

The current plan is to cut the final 6.0 in early May, with skiboot 6.0 being for all POWER8 and POWER9 platforms in op-build v2.0.

Over skiboot-5.11, we have the following changes:

New Features

  • Disable stop states from OPAL

    On ZZ, stop4,5,11 are enabled for PowerVM, even though doing so may cause problems with OPAL due to bugs in hcode.

    For other platforms, this isn’t so much of an issue as we can just control stop states by the MRW. However the rebuild-the-world approach to changing values there is a bit annoying if you just want to rule out a specific stop state from being problematic.

    Provide an nvram option to override what’s disabled in OPAL.

    The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)

    You can set an NVRAM override with:

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF

    This nvram override will disable all stop states.

  • interrupts: Create an “interrupts” property in the OPAL node

    Deprecate the old “opal-interrupts” property. It’s still there, but the new property follows the standard and allows us to specify whether an interrupt is level or edge sensitive.

    Similarly create “interrupt-names” whose content is identical to “opal-interrupts-names”.

  • SBE: Add timer support on POWER9

    SBE on P9 provides a one-shot programmable timer facility. We can use this to implement OPAL timers and hence limit the reliance on the Linux heartbeat (similar to the HW timer facility provided by SLW on P8).

  • Add SBE driver support

    SBE (Self Boot Engine) on P9 has two different jobs:

    • Boot the chip up to the point the core is functional

    • Provide various services like timer, scom, stash MPIPL, etc., at runtime

    We will use SBE for various purposes like timer, MPIPL, etc.

  • opal:hmi: Add missing processor recovery reason string.

    With this patch now we see reason string printed for CORE_WOF[43] bit.

    [ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
    [ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
    [ 477.352242181,7] HMI: PC - Thread hang recovery

  • Add DIMM actual speed to device tree

    Recent HDAT provides the DIMM actual speed. Let’s add this to the device tree.

  • Fix DIMM size property

    Today we parse the VPD blob to get DIMM size information. This is limited to FSP based systems. HDAT provides the DIMM size value, so let’s use that to populate the device tree and get size information on BMC based systems as well.

  • PCI: Set slot power limit when supported

    The PCIe slot capability can be implemented in a root or switch downstream port to set the maximum power a card is allowed to draw from the system. This patch adds support for setting the power limit when the platform has defined one.

  • hdata/spira: parse vpd to add part-number and serial-number to xscom@ node

    Expected by FWTS and associates our processor with the part/serial number, which is obviously a good thing for one’s own sanity.
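For the "Disable stop states from OPAL" item above, the relationship between the mask and individual stop levels can be sketched as below. The bit layout is an assumption inferred from the note that ~0xE0000000 means "all but stop 0,1,2", i.e. stop level N is taken to map to bit 0x80000000 >> N. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default mask quoted in the notes: everything except stop 0, 1 and 2. */
#define DEFAULT_DISABLE_MASK  (~UINT32_C(0xE0000000))

/*
 * Assumed layout: stop level N corresponds to bit (0x80000000 >> N).
 * A set bit means the stop level is disabled.
 */
static bool stop_level_disabled(uint32_t mask, unsigned int level)
{
	if (level > 31)
		return true; /* outside the mask: treat as disabled */
	return (mask & (UINT32_C(0x80000000) >> level)) != 0;
}
```

Under this reading, the default mask leaves stop 0, 1 and 2 usable and disables every deeper level, including stop 4, 5 and 11.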

Improved HMI Handling

  • opal/hmi: Add documentation for opal_handle_hmi2 call

  • opal/hmi: Generate hmi event for recovered HDEC parity error.

  • opal/hmi: check thread 0 tfmr to validate latched tfmr errors.

    Due to P9 errata, HDEC parity and TB residue errors are latched for non-zero threads 1-3 even if they are cleared. But these are not latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr value and ignore them on non-zero threads if they are not present on thread 0.

  • opal/hmi: Print additional debug information in rendezvous.

  • opal/hmi: Fix handling of TFMR parity/corrupt error.

    While testing TFMR parity/corrupt errors, it has been observed that HMIs are delivered twice for this error:

    • First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.

    • Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.

    On second HMI we end up throwing “HMI: TB invalid without core error reported” even though TB is in a valid state.

  • opal/hmi: Stop flooding HMI event for TOD errors.

    Fix the issue where every thread on the chip sends HMI event to host for TOD errors. TOD errors are reported to all the core/threads on the chip. Any one thread can fix the error and send event. Rest of the threads don’t need to send HMI event unnecessarily.

  • opal/hmi: Fix soft lockups during TOD errors

    There are some TOD errors which do not affect the working of TOD and TB; they stay in a valid state. Hence we don’t need a rendez-vous for TOD errors that do not affect TB working.

    TOD errors that affect TOD/TB will report a global error on TFMR[44] along with bit 51, and they will go down the rendez-vous path as expected.

    But TOD errors that do not affect the TB register set only TFMR bit 51. TFMR bit 51 is cleared when any single thread clears the TOD error. Once cleared, bit 51 is reflected to all the cores on that chip. Any thread that reads the TFMR register after the error is cleared will see TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through the rendez-vous path, while threads that see TFMR[51]=0 return doing nothing. This ends up causing soft lockups in the host kernel.

    This patch fixes the issue by not considering the TOD interrupt (TFMR[51]) a core-global error, thereby avoiding the rendez-vous path completely. Instead, threads that see TFMR[51]=1 will now take a different path that just does the TOD error recovery.

  • opal/hmi: Do not send HMI event if no errors are found.

    For TOD errors, all the cores in the chip get HMIs. Any one thread from any core can fix the issue and TFMR will have error conditions cleared. The rest of the threads need not take any action if TOD errors are already cleared. Hence thread 0 of every core should get a fresh copy of TFMR before going ahead with the recovery path. Initialize recover = -1, so that if no errors are found, that thread need not send an HMI event to Linux. This helps to stop every thread flooding the host with HMI events even when no errors are found.

  • opal/hmi: Initialize the hmi event with old value of HMER.

    Do this before we check for TFAC errors. Otherwise the event at host console shows no error reported in HMER register.

    Without this patch the console event shows HMER with all zeros:

    [ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 216.753498] Error detail: Timer facility experienced an error
    [ 216.753509] HMER: 0000000000000000
    [ 216.753518] TFMR: 3c12000870e04000

    After this patch it shows old HMER values on host console:

    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 2237.652651] Error detail: Timer facility experienced an error
    [ 2237.652766] HMER: 0840000000000000
    [ 2237.652837] TFMR: 3c12000870e04000

  • opal/hmi: Rework HMI handling of TFAC errors

    This patch reworks the HMI handling for TFAC errors by introducing 4 rendez-vous points to improve thread synchronization while handling timebase errors that require all threads to clear dirty data from the TB/HDEC registers before clearing the errors.

  • opal/hmi: Don’t bother passing HMER to pre-recovery cleanup

    The test for TFAC error is now redundant so we remove it and remove the HMER argument.

  • opal/hmi: Move timer related error handling to a separate function

    Currently no functional change. This is a first step to completely rewriting how these things are handled.

  • opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux

    It returns a 64-bit flags mask currently set to provide info about which timer facilities were lost, and whether an event was generated.

  • opal/hmi: Remove races in clearing HMER

    Writing to HMER acts as an “AND”. The current code writes back the value we originally read with the bits we handled cleared. This is racy: if a new bit gets set in HW after the original read, we’ll end up clearing it without handling it.

    Instead, use an all 1’s mask with only the bit handled cleared.

  • opal/hmi: Don’t re-read HMER multiple times

    We want to make sure all reporting and actions are based upon the same snapshot of HMER in case bits get added by HW while we are in OPAL.
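The race described under "opal/hmi: Remove races in clearing HMER" above can be shown with a simulated register. This is a sketch only: `hmer`, `hmer_write()` and the two helpers are stand-ins invented for the illustration, and the only property taken from the notes is that a write to HMER ANDs the written value into the register.

```c
#include <stdint.h>

/* Simulated HMER: a write ANDs the written value into the register. */
static uint64_t hmer;

static void hmer_write(uint64_t val)
{
	hmer &= val;
}

/* Racy: also clears any bit that hardware set after 'snapshot' was read. */
static void clear_handled_racy(uint64_t snapshot, uint64_t handled)
{
	hmer_write(snapshot & ~handled);
}

/* Safe: all ones except the handled bits, so every other bit survives. */
static void clear_handled_safe(uint64_t handled)
{
	hmer_write(~handled);
}
```

If a new bit appears between the read and the write, the racy variant silently drops it because that bit is zero in the snapshot, while the safe variant leaves it set for a later pass to handle.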

libflash and ffspart

Many improvements to the ffspart utility and libflash have come in this release, making ffspart suitable for building bit-identical PNOR images as the existing tooling used by op-build. The plan is to switch op-build to use this infrastructure in the not too distant future.

  • libflash/blocklevel: Make read/write be ECC agnostic for callers

    The blocklevel abstraction allows for regions of the backing store to be marked as ECC protected so that blocklevel can decode/encode the ECC bytes into the buffer automatically without the caller having to be ECC aware.

    Unfortunately this abstraction is far from perfect; it is only useful if reads and writes are performed at the start of the ECC region or in some circumstances at an ECC aligned position, which requires the caller to be aware of the ECC regions.

    The problem that has arisen is that the blocklevel abstraction is initialised somewhere but when it is later called the caller is unaware if ECC exists in the region it wants to arbitrarily read and write to. This should not have been a problem since blocklevel knows. Currently misaligned reads will fail ECC checks and misaligned writes will overwrite ECC bytes and the backing store will become corrupted.

    This patch adds the smarts to blocklevel_read() and blocklevel_write() to cope with the problem. Note that ECC can always be bypassed by calling blocklevel_raw_*() functions.

    All this work means that the gard tool can safely call blocklevel_read() and blocklevel_write() and as long as the blocklevel knows of the presence of ECC then it will deal with all cases.

    This commit also removes code in the gard tool which compensated for inadequacies no longer present in blocklevel.

  • libflash/blocklevel: Return region start from ecc_protected()

    Currently all ecc_protected() does is say if a region is ECC protected or not. Knowing a region is ECC protected is one thing but there isn’t much that can be done afterwards if this is the only known fact. A lot more can be done if the caller is told where the ECC region begins.

    Knowing where the ECC region starts allows the caller to align its reads and writes. This allows for more flexibility when calling read and write without knowing exactly how the backing store is organised.

  • libflash/ecc: Add helpers to align a position within an ecc buffer

    As part of ongoing work to make ECC invisible to higher levels up the stack this function converts a ‘position’ which should be ECC agnostic to the equivalent position within an ECC region starting at a specified location.

  • libflash/ecc: Add functions to deal with unaligned ECC memcpy

  • external/ffspart: Improve error output

  • libffs: Fix bad checks for partition overlap

    Not all TOCs are written at zero

  • libflash/libffs: Allow caller to specify header partition

    An FFS TOC comprises two parts. The first is a small header which has a magic number and very minimal information about the TOC that is common to all partitions, such as the number of partitions and block sizes. Following this small header is a series of entries. Importantly, there is always an entry which encompasses the TOC itself; this is usually called the ‘part’ partition.

    Currently libffs always assumes that the ‘part’ partition is at zero. While in practice there is always a TOC at zero, there doesn’t actually have to be. PNORs may have multiple TOCs within them, therefore libffs needs to be flexible enough to allow callers to specify TOCs not at zero.

    The ‘part’ partition is otherwise a regular partition which may have flags associated with it. libffs should allow the user to set the flags for the ‘part’ partition.

    This patch achieves both by allowing the caller to specify the ‘part’ partition. The caller can also choose not to, in which case libffs will provide a sensible default.

  • libflash/libffs: Refcount ffs entries

    Currently consumers can add a new ffs entry to multiple headers. This is fine, but freeing any of the headers will cause the entry to be freed, which causes double free problems.

    Even if only one header is used, the consumer of the library still has a reference to the entry, which they may well reuse at some other point.

    libffs will now refcount entries and only free when there are no more references.

    This patch also removes the pointless return value of ffs_hdr_free()

  • libflash/libffs: Switch to storing header entries in an array

    Since the libffs no longer needs to sort the entries as they get added it makes little sense to have the complexity of a linked list when an array will suffice.

  • libflash/libffs: Remove backup partition from TOC generation code

    It turns out this code was messy and not all that reliable. Doing it at the library level adds complexity to the library and restrictions to the caller.

    A simpler approach can be achieved by just instantiating multiple ffs_header structures pointing to different parts of the same file.

  • libflash/libffs: Remove the ‘sides’ from the FFS TOC generation code

    It turns out this code was messy and not all that reliable. Doing it at the library level adds complexity to the library and restrictions to the caller.

    A simpler approach can be achieved by just instantiating multiple ffs_header structures pointing to different parts of the same file.

  • libflash/libffs: Always add entries to the end of the TOC

    It turns out that sorted order isn’t the best idea. This removes flexibility from the caller. If the user wants their partitions in sorted order, they should insert them in sorted order.

  • external/ffspart: Remove side, order and backup options

    These options are currently flaky in libflash/libffs so there isn’t much point to being able to use them in ffspart.

    Future reworks planned for libflash/libffs will render these options redundant anyway.

  • libflash/libffs: ffs_close() should use ffs_hdr_free()

  • libflash/libffs: Add setter for a partition’s actual size

  • pflash: Use ffs_entry_user_to_string() to standardise flag strings

  • libffs: Standardise ffs partition flags

    It seems we’ve developed a character representation for ffs partition flags. Currently only pflash really prints them so it hasn’t been a problem, but now ffspart wants to read them in from user input.

    It is important that what libffs reads and what pflash prints remain consistent, so we should move the code into libffs to avoid problems.

  • external/ffspart: Allow # comments in input file
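The reference counting described under "libflash/libffs: Refcount ffs entries" above follows a common pattern, sketched here. The `struct entry` type and the `entry_new()`, `entry_get()` and `entry_put()` helpers are hypothetical stand-ins and are not the libffs API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an FFS entry shared between headers. */
struct entry {
	unsigned int refs;
};

/* Allocate an entry; the creator holds the first reference. */
static struct entry *entry_new(void)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (e)
		e->refs = 1;
	return e;
}

/* Take an extra reference, e.g. when a header stores the entry. */
static struct entry *entry_get(struct entry *e)
{
	e->refs++;
	return e;
}

/* Drop one reference; returns 1 if this was the last one and e was freed. */
static int entry_put(struct entry *e)
{
	if (--e->refs > 0)
		return 0;
	free(e);
	return 1;
}
```

With this scheme, freeing one header only drops that header's reference, so an entry that is still held by another header or by the caller stays valid and is freed exactly once.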

p9dsu Platform changes

The p9dsu platform from SuperMicro (also known as ‘Boston’) has received a number of updates, and the patches once carried by SuperMicro are now upstream.

  • p9dsu: detect p9dsu variant even when hostboot doesn’t tell us

    The SuperMicro BMC can tell us what riser type we have, which dictates the PCI slot tables. Usually, in an environment that a customer would experience, Hostboot will do the query with an SMC specific patch (not upstream as there’s no platform specific code in hostboot) and skiboot knows what variant it is based on the compatible string.

    However, if you’re using upstream hostboot, you only get the bare ‘p9dsu’ compatible type. We can work around this by asking the BMC ourselves and setting the slot table appropriately. We do this synchronously in platform init so that we don’t start probing PCI before we set up the slot table.

  • p9dsu: add slot power limit.

  • p9dsu: add pci slot table for Boston LC 1U/2U and Boston LA/ESS.

  • p9dsu HACK: fix system-vpd eeprom

  • p9dsu: change esel command from AMI to IBM 0x3a.

ZZ Platform Changes

  • hdata/i2c: Fix up pci hotplug labels

    These labels are used on the devices used to do PCIe slot power control for implementing PCIe hotplug. I’m not sure how they ended up as “eeprom-pgood” and “eeprom-controller” since that doesn’t make any sense.

  • hdata/i2c: Ignore multi-port I2C devices

    Recent FSP firmware builds add support for multi-port I2C devices such as the GPIO expanders used for the presence detect of OpenCAPI devices and the PCIe hotplug controllers used to power cycle PCIe slots on ZZ.

    The OpenCAPI driver inside of skiboot currently uses a platform-specific method to talk to the relevant I2C device rather than relying on HDAT since not all platforms correctly report the I2C devices (hello Zaius). Additionally, the nature of multi-port devices requires that we have a device-specific handler so that we generate the correct DT bindings. Currently we don’t and there is no immediate need for this support so just ignore the multi-port devices for now.

  • hdata/i2c: Replace i2c_ prefix with dev_

    The current naming scheme makes it easy to conflate “i2cm_port” and “i2c_port.” The latter is used to describe multi-port I2C devices such as GPIO expanders and multi-channel PCIe hotplug controllers. Rename i2c_port to dev_port to make the two a bit more distinct.

    Also rename i2c_addr to dev_addr for consistency.

  • hdata/i2c: Ignore CFAM I2C master

    Recent FSP firmware builds put in information about the CFAM I2C master in addition to the host I2C masters accessible via XSCOM. Odds are this information should not be there since there’s no handshaking between the FSP/BMC and the host over who controls that I2C master, but it is so we need to deal with it.

    This patch adds filtering to the HDAT parser so it ignores the CFAM I2C master. Without this it will create a bogus i2cm@ which might cause issues.

  • ZZ: hw/imc: Add support to load imc catalog lid file

    Add support to load the imc catalog from a lid file packaged as part of the system firmware. Lid number allocated is 0x80f00103.lid.

Bugs Fixed

  • core: Fix iteration condition to skip garded cpu

  • uart: fix uart_opal_flush to take console lock over uart_con_flush

    This bug meant that OPAL_CONSOLE_FLUSH didn’t take the appropriate locks. Luckily, this call is currently only used in the crash path.

  • xive: fix missing unlock in error path

  • OPAL_PCI_SET_POWER_STATE: fix locking in error paths

    Otherwise we could exit OPAL holding locks, potentially leading to all sorts of problems later on.

  • hw/slw: Don’t assert on an unknown chip

    For some reason skiboot populates nodes in /cpus/ for the cores on chips that are deconfigured. As a result Linux includes the threads of those cores in its set of possible CPUs in the system and attempts to set the SPR values that should be used when waking a thread from a deep sleep state.

    However, in the case where we have a deconfigured chip we don’t create an xscom node for that chip and as a result we don’t have a proc_chip structure for that chip either. In turn, this results in an assertion failure when calling opal_slw_set_reg() since it expects the chip structure to exist. Fix this up and print an error instead.

  • opal/hmi: Generate one event per core for processor recovery.

    Processor recovery is a per-core error. All threads on that core receive the HMI, but they don’t all need to generate an HMI event for the same error.

    Let thread 0 only generate the event.

  • sensors: Dont add DTS sensors when OCC inband sensors are available

    There are two sets of core temperature sensors today. One is DTS scom based core temperature sensors and the second group is the sensors provided by OCC. DTS is the highest temperature among the different temperature zones in the core while OCC core temperature sensors are the average temperature of the core. DTS sensors are read directly by the host by SCOMing the DTS sensors while OCC sensors are read and updated by OCC to main memory.

    Reading DTS sensors by SCOMing is a heavier and slower operation than reading OCC sensors, which is as good as reading memory. So don’t add DTS sensors when OCC sensors are available.

  • core/fast-reboot: Increase timeout for dctl sreset to 1sec

    Direct control xscom can take more time to complete. We seem to wait too little on Boston, failing fast-reboot for no good reason.

    Increase timeout to 1 sec as a reasonable value for sreset to be delivered and core to start executing instructions.

  • occ: sensors-groups: Add DT properties to mark HWMON sensor groups

    Fix the sensor type to match HWMON sensor types. Add compatible flag to indicate the environmental sensor groups so that operations on these groups can be handled by HWMON linux interface.

  • core: Correctly load initramfs in stb container

    Skiboot does not calculate the actual size and start location of the initramfs if it is wrapped by an STB container (for example if loading an initramfs from the ROOTFS partition).

    Check if the initramfs is in an STB container and determine the size and location correctly in the same manner as the kernel. Since load_initramfs() is called after load_kernel() move the call to trustedboot_exit_boot_services() into load_and_boot_kernel() so it is called after both of these.

  • hdat/i2c.c: quieten “v2 found, parsing as v1”

  • hw/imc: Check for pause_microcode_at_boot() return status

    pause_microcode_at_boot() loops through all the chips’ ucode control blocks and pauses the ucode if it is in the running state. But it does not fail if any chip’s ucode is not initialised.

    Add code to return a failure if the ucode is not initialised on any of the chips. Since pause_microcode_at_boot() is called just before attaching the IMC device nodes in imc_init(), add code to check the function’s return value.

Slot location code fixes:

  • npu2: Use ibm,loc-code rather than ibm,slot-label

    The ibm,slot-label property is to name the slot that appears under a PCIe bridge. In the past we (ab)used the slot tables to attach names to GPU devices and their corresponding NVLinks which resulted in npu2.c using slot-label as a location code rather than as a way to name slots.

    Fix this up since it’s confusing.

  • hdata/slots: Apply slot label to the parent slot

    Slot names only really make sense when applied to an actual slot rather than a device. On witherspoon the GPU devices have a name associated with the device rather than the slot for the GPUs. Add a hack that moves the slot label to the parent slot rather than on the device itself.

  • pci-dt-slot: Big ol’ cleanup

    The underlying data that we get from HDAT can only really describe a PCIe system. As such we can simplify the devicetree slot lookup code by only caring about the important cases, namely, root ports and switch downstream ports.

    This also fixes a bug where the root port didn’t get a slot label applied, which results in devices under that port not having ibm,loc-code set. This results in the EEH core being unable to report the location of EEHed devices under that port.

opal-prd

  • opal-prd: Insert powernv_flash module

    Explicitly load the powernv_flash module on BMC based systems so that we are sure that the flash device is created before starting the opal-prd daemon.

    Note that I have replaced the pnor_available() check with is_fsp_system(), as we want to load the module on BMC systems only. Also, pnor_init has enough logic to detect the flash device, hence pnor_available() becomes a redundant check.

NPU2/NVLINK2

  • npu2/hw-procedures: fence bricks on GPU reset

    The NPU workbook defines a way of fencing a brick and getting the brick out of fence state. We do have an implementation of bringing the brick out of fenced/quiesced state. We do the latter in our procedures, but to support run time reset we need to do the former.

    The fencing ensures that access to memory behind the links will not lead to HMI’s, but instead SUE’s will be populated in cache (in the case of speculation). The expectation is then that prior to and after reset, the operating system components will flush the cache for the region of memory behind the GPU.

    This patch does the following:

    1. Implements a npu2_dev_fence_brick() function to set/clear fence state

    2. Clears FIR bits prior to clearing the fence status

    3. Clears the fence status

    4. Takes the powerbus out of CQ fence much later, in credits_check(), which is the last hardware procedure called after link training.

  • hw/npu2.c: Remove static configuration of NPU2 register

    The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to select NVLink mode, however Hostboot should configure other bits in this register.

    For some reason Skiboot was explicitly clearing bit-6 (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared as recent Hostboot versions explicitly set it to the correct value based on the specific system configuration. Therefore Skiboot should not alter it.

    Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or not. Hostboot does not configure this bit so Skiboot should continue to configure it.
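
    The read-modify-write this implies can be sketched as below. This is a minimal illustration rather than the skiboot code: PPC_BIT() is defined locally using IBM bit numbering (bit 0 is the most significant bit of the 64-bit register), and npu_sm_config0_enable_nvlink() is a hypothetical helper name.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit value. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

#define CONFIG_DISABLE_VG_NOT_SYS PPC_BIT(6)   /* left to Hostboot */
#define CONFIG_NVLINK_MODE        PPC_BIT(58)  /* still set by Skiboot */

/* Hypothetical helper: set NVLink mode and leave every other bit,
 * including bit 6, exactly as it was read from the register. */
static uint64_t npu_sm_config0_enable_nvlink(uint64_t val)
{
	return val | CONFIG_NVLINK_MODE;
}
```

    The point of the sketch is only that bit 6 is never written with a value of Skiboot's choosing; whatever Hostboot programmed is carried through unchanged.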

  • npu2: Improve log output of GPU-to-link mapping

    Debugging issues related to unconnected NVLinks can be a little less irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog().

    In short, change this:

    NPU2: comparing GPU 'GPU2' and NPU2 'GPU1'
    NPU2: comparing GPU 'GPU3' and NPU2 'GPU1'
    NPU2: comparing GPU 'GPU4' and NPU2 'GPU1'
    NPU2: comparing GPU 'GPU5' and NPU2 'GPU1'
      :
    npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem.

    to this:

    NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1'
    NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1'
    NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1'
    NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1'
      :
    NPU6:0:1.0 No PCI device found for slot 'GPU1'

  • npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init

    A bad GPU or other condition may leave us with a subset of links that never get initialized. If an ATSD is sent to one of those bricks, it will never complete, leaving us waiting forever for a response:

    watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
    ...
    Modules linked in: nvidia_uvm(O) nvidia(O)
    CPU: 23 PID: 2050 Comm: acos Tainted: G W O 4.14.0 #2
    task: c0000000285cfc00 task.stack: c000001fea860000
    NIP: c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
    REGS: c000001fea863550 TRAP: 0901 Tainted: G W O (4.14.0)
    MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004484 XER: 20040000
    CFAR: c0000000000abdf4 SOFTE: 1
    GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
    GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
    GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
    GPR12: 0000000000008000 c000000003167e80
    NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
    LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370

    ATSDs are only sent to bricks which have a valid entry in the XTS_BDF table. So to prevent the hang, don’t set NPU2_XTS_BDF_MAP_VALID unless we make it all the way to creating a context for the BDF.

Secure and Trusted Boot

  • hdata/tpmrel: detect tpm not present by looking up the stinfo->status

    Skiboot detects if tpm is present by checking if a secureboot_tpm_info entry exists. However, if a tpm is not present, hostboot also creates a secureboot_tpm_info entry. In this case, hostboot creates an empty entry, but setting the field tpm_status to TPM_NOT_PRESENT.

    This detects if tpm is not present by looking up the stinfo->status.

    This fixes the “TPMREL: TPM node not found for chip_id=0 (HB bug)” issue, reproduced when skiboot is running on a system that has no tpm.

PCI

  • phb4: Restore bus numbers after CRS

    Currently we restore PCIe bus numbers right after the link is up. Unfortunately, at this point we haven’t done CRS, so config space may not be accessible.

    This moves the bus number restore till after CRS has happened.

  • romulus: Add a barebones slot table

  • phb4: Quieten and improve “Timeout waiting for electrical link”

    This happens normally if a slot doesn’t have a working HW presence detect and relies instead on inband presence detect.

    The message we display is scary and not very useful unless you are debugging, so quieten it and change it to something more meaningful.

  • pcie-slot: Don’t fail powering on an already on switch

    If the power state is already the required value, return OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurious errors during boot.
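
    A minimal sketch of that early return, assuming a simplified slot model. The struct, the function name and the numeric return codes below are placeholders for illustration; the real definitions live in skiboot's PCI slot code and OPAL API header.

```c
#include <stdint.h>

/* Placeholder return codes for this sketch. */
#define OPAL_SUCCESS   0
#define OPAL_PARAMETER (-1)

/* Hypothetical, simplified slot model. */
struct slot_model {
	uint8_t power_state;   /* 0 = off, 1 = on */
	int     power_changes; /* counts real power transitions */
};

static int64_t slot_set_power_state(struct slot_model *slot, uint8_t val)
{
	if (val > 1)
		return OPAL_PARAMETER;

	/* Already in the requested state: nothing to do, and not an error. */
	if (slot->power_state == val)
		return OPAL_SUCCESS;

	slot->power_state = val;
	slot->power_changes++;
	return OPAL_SUCCESS;
}
```

    Powering on a switch that is already on then succeeds without touching the hardware, while a genuinely invalid request is still rejected.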

CAPI/OpenCAPI

  • capi: Keep the current mmio windows in the mbt cache table.

    When the phb is used as a CAPI interface, the current mmio windows list is cleaned before adding the capi and the prefetchable memory (M64) windows, which implies that the non-prefetchable BAR is no longer configured. This patch sets only the mbt bar needed to pass the capi mmio window and keeps the other mmio values (M32 and M64) as defined.

  • npu2-opencapi: Fix ‘link internal error’ FIR, take 2

    When setting up an opencapi link, we set the transport muxes first, then set the PHY training config register, which includes disabling nvlink mode for the bricks. That’s the order of the init sequence, as found in the NPU workbook.

    In reality, doing so works, but it raises 2 FIR bits in the PowerBus OLL FIR Register for the 2 links when we configure the transport muxes. Presumably because nvlink is not disabled yet and we are configuring the transport muxes for opencapi.

    bit 60: link0 internal error

    bit 61: link1 internal error

    Overall the current setup ends up being correct and everything works, but we raise 2 FIR bits.

    So tweak the order of operations to disable nvlink before configuring the transport muxes. Incidentally, this is what the scripts from the opencapi enablement team were doing all along.

  • npu2-opencapi: Fix ‘link internal error’ FIR, take 1

    When we set up a link, we always enable ODL0 and ODL1 at the same time in the PHY training config register, even though we are setting up only one OTL/ODL, so it raises a “link internal error” FIR bit in the PowerBus OLL FIR Register for the second link. The error is harmless, as we’ll eventually set up the second link, but there’s no reason to raise that FIR bit.

    The fix is simply to only enable the ODL we are using for the link.

  • phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.

    The cxl driver will set the capi value, like other drivers already do.

  • phb4: set TVT1 for tunneled operations in capi mode

    The ASN indication is used for tunneled operations (as_notify and atomics). Tunneled operation messages can be sent in PCI mode as well as CAPI mode.

    The address field of as_notify messages is hijacked to encode the LPID/PID/TID of the target thread, so those messages should not go through address translation. Therefore bit 59 is part of the ASN indication.

    This patch sets TVT#1 in bypass mode when capi mode is enabled, to prevent as_notify messages from being dropped.

Debugging/Testing improvements

  • core/stack: backtrace unwind basic OPAL call details

    Put OPAL callers’ r1 into the stack back chain, and then use that to unwind back to the OPAL entry frame (as opposed to boot entry, which has a 0 back chain).

    From there, dump the OPAL call token and the caller’s r1. A backtrace looks like this:

    CPU 0000 Backtrace:
     S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c
     S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c
     S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4
     S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c
     --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---

    This is pretty basic for the moment, but it does give you the bottom of the Linux stack. It will allow some interesting improvements in future.

    First, with the eframe, all the call’s parameters can be printed out as well. The ___backtrace / ___print_backtrace API needs to be reworked in order to support this, but it’s otherwise very simple (see opal_trace_entry()).

    Second, it will allow Linux’s stack to be passed back to Linux via a debugging opal call. This will allow Linux’s BUG() or xmon to also print the Linux back trace in case of a NMI or MCE or watchdog lockup that hits in OPAL.

  • asm/head: implement quiescing without stack or clobbering regs

    Quiescing is currently implemented in C in opal_entry before the opal call handler is called. This works well enough for simple cases like fast reset when one CPU wants all others out of the way.

    Linux would like to use it to prevent an sreset IPI from interrupting firmware, which could lead to deadlocks when crash dumping or entering the debugger. Linux interrupts do not recover well when returning back to general OPAL code, due to r13 not being restored. OPAL also can’t be re-entered, which may happen e.g., from the debugger.

    So move the quiesce hold/reject to entry code, before the stack or r1 or r13 registers are switched. OPAL can be interrupted and returned to or re-entered during this period.

    This does not completely solve all such problems. OPAL will be interrupted with sreset if the quiesce times out, and it can be interrupted by MCEs as well. These still have the issues above.

  • core/opal: Allow poller re-entry if OPAL was re-entered

    If an NMI interrupts the middle of running pollers and the OS invokes pollers again (e.g., for console output), the poller re-entrancy check will prevent it from running and spam the console.

    That check was designed to catch a poller calling opal_run_pollers. OPAL re-entrancy is something different and is detected elsewhere. Avoid the poller recursion check if OPAL has been re-entered. This is a best-effort attempt to cope with errors.
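
    The change in the check can be sketched roughly as follows. The per-CPU struct and its field names are hypothetical stand-ins, and the counters exist only to make the behaviour observable; this is not the skiboot implementation.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for this sketch. */
struct cpu_model {
	bool in_poller;          /* set while pollers are running */
	bool opal_reentered;     /* set elsewhere when OPAL re-entry is detected */
	int  poller_runs;
	int  recursion_warnings;
};

static void run_pollers(struct cpu_model *cpu)
{
	/* Only treat this as poller recursion if OPAL was not re-entered. */
	if (cpu->in_poller && !cpu->opal_reentered) {
		cpu->recursion_warnings++;
		return;
	}

	bool was_in_poller = cpu->in_poller;

	cpu->in_poller = true;
	cpu->poller_runs++;              /* stand-in for calling each poller */
	cpu->in_poller = was_in_poller;  /* restore the interrupted state */
}
```

    With opal_reentered clear, a nested call is refused and counted as a warning; with it set, the pollers run even though an interrupted poller pass is still marked as in progress.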

  • core/opal: Emergency stack for re-entry

    This detects OPAL being re-entered by the OS, and switches to an emergency stack if it was. This protects the firmware’s main stack from re-entrancy and allows the OS to use NMI facilities for crash / debug functionality.

    Further nested re-entry will destroy the previous emergency stack and prevent returning, but those should be rare cases.

    This stack is sized at 16kB, which doubles the size of CPU stacks, so as not to introduce a regression in primary stack size. The 16kB stack originally had a 4kB machine check stack at the top, which was removed by 80eee1946 (“opal: Remove machine check interrupt patching in OPAL.”). So it is possible the size could be tightened again, but that would require further analysis.

  • hdat_to_dt: hash_prop the same on all platforms

    Fixes this unit test on ppc64le hosts.

  • mambo: Add persistent memory disk support

    This adds support for mapping disk images using persistent memory. Disks can be added by setting this ENV variable:

    PMEM_DISK="/mydisks/disk1.img,/mydisks/disk2.img"

    These will show up in Linux as /dev/pmem0 and /dev/pmem1.

    This uses a new feature in mambo “mysim memory mmap ..” which is only available since mambo commit 0131f0fc08 (from 24/4/2018).

    This also needs the of_pmem.c driver in Linux which is only available since v4.17. It works with powernv_defconfig + CONFIG_OF_PMEM.

  • external/mambo: Add di command to decode instructions

    By default you get 16 instructions but you can specify the number you want. For example:

    systemsim % di 0x100 4
    0x0000000000000100: Enc:0xA64BB17D : mtspr HSPRG1,r13
    0x0000000000000104: Enc:0xA64AB07D : mfspr r13,HSPRG0
    0x0000000000000108: Enc:0xF0092DF9 : std r9,0x9F0(r13)
    0x000000000000010C: Enc:0xA6E2207D : mfspr r9,PPR

    Using di since it’s what xmon uses.

  • mambo/mambo_utils.tcl: Inject an MCE at a specified address

    Currently we don’t support injecting an MCE on a specific address. This is useful for testing functionality like memcpy_mcsafe() (see https://patchwork.ozlabs.org/cover/893339/)

    The core of the functionality is a routine called inject_mce_ue_on_addr, which takes an addr argument and injects an MCE (load/store with UE) when the specified address is accessed by code. This functionality can easily be enhanced to cover instruction UE’s as well.

    A sample use case to create an MCE on stack access would be

    set addr [mysim display gpr 1]
    inject_mce_ue_on_addr $addr

    This would cause an MCE on any r1 or r1-based access.

  • external/mambo: improve helper for machine checks

    Improve workarounds for stop injection, because mambo often will trigger on 0x104/204 when injecting sreset/mces.

    This also adds a workaround to skip injecting on reservations to avoid infinite loops when doing inject_mce_step.

  • travis: Enable ppc64le builds

    At least on the IBM Travis Enterprise instance, we can now do ppc64le builds!

    We can only build a subset of our matrix due to availability of ppc64le distros. The Dockerfiles need some tweaking to only attempt to install (x86_64 only) Mambo binaries, as well as the build scripts.

  • external: Add “lpc” tool

    This is a little front-end to the lpc debugfs files to access the LPC bus from userspace on the host.

  • core/test/run-trace: fix on ppc64el

skiboot-5.4.8

6 years ago

skiboot-5.4.8

skiboot-5.4.8 was released on Wednesday October 11th, 2017. It replaces skiboot-5.4.7 as the current stable release in the 5.4.x series.

Over skiboot-5.4.7, we have a few bug fixes for FSP platforms:

  • libflash/file: Handle short read()s and write()s correctly

    Currently we don't move the buffer along for a short read() or write(), nor do we request only the remaining amount.
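
    The usual pattern for handling a short read() is sketched below: advance the buffer by what was transferred and ask only for what remains. read_full() is a hypothetical name used for illustration, not the libflash function; the write() case is symmetric.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly 'len' bytes unless EOF or an error occurs.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	while (done < len) {
		/* Move the buffer along and request only the remainder. */
		ssize_t rc = read(fd, p + done, len - done);

		if (rc < 0) {
			if (errno == EINTR)
				continue;   /* interrupted: retry */
			return -1;
		}
		if (rc == 0)
			break;              /* end of file */
		done += (size_t)rc;
	}
	return (ssize_t)done;
}
```

    A caller that needs the whole range can compare the return value against the requested length to distinguish a complete transfer from an early end of file.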

  • FSP/NVRAM: Handle "get vNVRAM statistics" command

    FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM statistics. OPAL doesn't maintain any such statistics. Hence return FSP_STATUS_INVALID_SUBCMD.

    Sample OPAL log:

    [16944.384670488,3] FSP: Unhandled message eb0500
    [16944.474110465,3] FSP: Unhandled message eb0500
    [16945.111280784,3] FSP: Unhandled message eb0500
    [16945.293393485,3] FSP: Unhandled message eb0500
    
  • FSP/CONSOLE: Limit number of error logging

    Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon, added in skiboot 5.4.6 and 5.7-rc1) added error logging when the buffer is full. In some corner cases the kernel may call this function multiple times and we may end up logging the error again and again.

    This patch fixes it by generating error log only once.

  • FSP/CONSOLE: Fix fsp_console_write_buffer_space() call

    Kernel calls fsp_console_write_buffer_space() to check console buffer space availability. If there is enough buffer space to write data, then kernel will call fsp_console_write() to write actual data.

    In some extreme corner cases (like one explained in commit c8a7535f) console becomes full and this function returns 0 to kernel (or space available in console buffer < next incoming data size). Kernel will continue retrying until it gets enough space. So we will start seeing RCU stalls.

    This patch keeps track of the previously available space. If the previous space is the same as the current space, there is not enough space in the console buffer to write the incoming data. This may be due to a very high rate of console writes combined with a slow response from the FSP, or because the FSP has stopped processing data (e.g. because the ipmi daemon died). At this point we start a timer with a timeout of SER_BUFFER_OUT_TIMEOUT (10 secs). If the situation has not improved within 10 seconds, something has gone wrong. Let's return OPAL_RESOURCE so that the kernel can drop the console write and continue.
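
    The bookkeeping described above might look roughly like this. It is a sketch that follows the description only: the struct, the function signature, the use of seconds as a time unit and the numeric value of OPAL_RESOURCE are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPAL_SUCCESS   0
#define OPAL_RESOURCE  (-10)  /* placeholder value for this sketch */
#define SER_BUFFER_OUT_TIMEOUT_SECS 10

/* Hypothetical per-console bookkeeping. */
struct console_model {
	uint64_t prev_space;   /* space seen on the previous call */
	uint64_t stall_start;  /* time (seconds) the space stopped changing */
	bool     stalled;
};

/* Report free space; give up with OPAL_RESOURCE once the space has not
 * changed for the timeout period. 'now' is the current time in seconds. */
static int64_t write_buffer_space(struct console_model *c, uint64_t space,
				  uint64_t now, uint64_t *out_space)
{
	*out_space = space;

	if (space != c->prev_space) {
		/* Progress was made: remember it and clear any stall. */
		c->prev_space = space;
		c->stalled = false;
		return OPAL_SUCCESS;
	}
	if (!c->stalled) {
		/* First call with unchanged space: start the timer. */
		c->stalled = true;
		c->stall_start = now;
		return OPAL_SUCCESS;
	}
	if (now - c->stall_start >= SER_BUFFER_OUT_TIMEOUT_SECS)
		return OPAL_RESOURCE;
	return OPAL_SUCCESS;
}
```

    Any change in the reported space resets the timer, so a slow but live FSP never trips the timeout; only a buffer that stays stuck for the full period makes the call fail.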

  • FSP/CONSOLE: Close SOL session during R/R

    Presently we are not closing SOL and FW console sessions during R/R. The host will continue to write to the SOL buffer during FSP R/R. If there is a heavy console write operation happening during FSP R/R (like running the top command inside the console), then at some point the console buffer becomes full. fsp_console_write_buffer_space() returns 0 (or less than the space required to write data) to the host. While one thread is busy writing to the console, if other threads try to write data to the console we may see RCU stalls (like below) in the kernel.

    Kernel call trace:

    [ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
    [ 2082.828365] Task dump for CPU 32:
    [ 2082.828368] kworker/32:3    R  running task        0  4637      2 0x00000884
    [ 2082.828375] Workqueue: events dump_work_fn
    [ 2082.828376] Call Trace:
    [ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
    [ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
    [ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
    [ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
    [ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
    [ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
    [ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
    [ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68
    

    Hence let's close SOL (and FW console) during FSP R/R.

  • FSP/CONSOLE: Do not associate unavailable console

    Presently OPAL sends the associate/unassociate MBOX command for all FSP serial consoles (see the OPAL messages below). We have to check whether the console is available before sending this message.

    OPAL log:

    [ 5013.227994012,7] FSP: Reassociating HVSI console 1
    [ 5013.227997540,7] FSP: Reassociating HVSI console 2
    
  • FSP: Disable PSI link whenever FSP tells OPAL about impending Reset/Reload

    Commit 42d5d047 fixed the scenario where DPO has been initiated, but the FSP went into reset before the CEC power down came in. But this is a generic issue that can happen in the normal shutdown path as well.

    Hence disable PSI link as soon as we detect FSP impending R/R.

  • fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM

    Also, return OPAL_BUSY_EVENT on failure sending FSP_CMD_REBOOT / DEEP_REBOOT.

    We had a race condition between FSP Reset/Reload and powering down the system from the host:

    Roughly:

    #   FSP                       Host
    1   Power on
    2                             Power on
    3   (inject EPOW)
    4   (trigger FSP R/R)
    5                             Processes EPOW event, starts shutting down
    6                             calls OPAL_CEC_POWER_DOWN
    7   (is still in R/R)
    8                             gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
    9   (FSP comes back)
    10                            spinning in opal_poll_events
    11  (thinks host is running)

    The call to OPAL_CEC_POWER_DOWN is only made once, as the reset/reload error path for fsp_sync_msg() is to return -1, which means we give the OS OPAL_INTERNAL_ERROR. That is fine, except that our own API docs give us the opportunity to return OPAL_BUSY when trying again later may be successful, and we're ambiguous as to whether you should retry on OPAL_INTERNAL_ERROR.

    For reference, the Linux code looks like this:

    static void __noreturn pnv_power_off(void)
    {
            long rc = OPAL_BUSY;
    
            pnv_prepare_going_down();
    
            while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
                    rc = opal_cec_power_down(0);
                    if (rc == OPAL_BUSY_EVENT)
                            opal_poll_events(NULL);
                    else
                            mdelay(10);
            }
            for (;;)
                    opal_poll_events(NULL);
    }
    

    Which means that practically our only option is to return OPAL_BUSY or OPAL_BUSY_EVENT.

    We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're running pollers to communicate with the FSP and do the final bits of Reset/Reload handling before we power off the system.

v5.9-rc1

6 years ago

skiboot-5.9-rc1

skiboot v5.9-rc1 was released on Wednesday October 11th 2017. It is the first release candidate of skiboot 5.9, which will become the new stable release of skiboot following the 5.8 release, first released August 31st 2017.

skiboot v5.9-rc1 contains all bug fixes as of skiboot-5.4.7 and skiboot-5.1.21 (the currently maintained stable releases). We do not currently expect to do any 5.8.x stable releases.

For how the skiboot stable releases work, see stable-rules for details.

The current plan is to cut the final 5.9 by October 17th, with skiboot 5.9 being for all POWER8 and POWER9 platforms in op-build v1.20 (due October 18th). This release will be targeted to early POWER9 systems.

Over skiboot-5.8, we have the following changes:

New Features

POWER8

  • fast-reset by default (if possible)

    Currently, this is limited to POWER8 systems.

    A normal reboot will, rather than doing a full IPL, go through a fast reboot procedure. This reduces the "reboot to petitboot" time from minutes to a handful of seconds.

POWER9

  • POWER9 power management during boot

    Less power should be consumed during boot.

  • OPAL_SIGNAL_SYSTEM_RESET for POWER9

    This implements OPAL_SIGNAL_SYSTEM_RESET, using scom registers to quiesce the target thread and raise a system reset exception on it. It has been tested on DD2 with stop0 ESL=0 and ESL=1 shallow power saving modes.

    DD1 is not implemented because it is sufficiently different as to make support difficult.

  • Enable deep idle states for POWER9

    • SLW: Add support for p9_stop_api

      p9_stop_api's are used to set SPR state on a core wakeup from a deeper low power state. p9_stop_api uses low level platform firmware and self-restore microcode to restore the SPRs to requested values.

      Code is taken from : https://github.com/open-power/hostboot/tree/master/src/import/chips/p9/procedures/utils/stopreg

    • SLW: Removing timebase related flags for stop4

      When a core enters stop4, it does not lose decrementer and time base. Hence removing flags OPAL_PM_DEC_STOP and OPAL_PM_TIMEBASE_STOP.

    • SLW: Allow deep states if homer address is known

      Use a common variable has_wakeup_engine instead of has_slw to tell if the:

      • SLW image is populated in case of power8
      • CME image is populated in case of power9

      Currently we expect CME to be loaded if the homer address is known (except for simulators).

    • SLW: Configure self-restore for HRMOR

      Make a stop api call using libpore to restore the HRMOR register. HRMOR needs to be cleared so that when threads exit stop, they arrive at the Linux system_reset vector (0x100).

    • SLW: Add opal_slw_set_reg support for power9

      This OPAL call is made from Linux to OPAL to configure values in various SPRs after wakeup from a deep idle state.

  • PHB4: CAPP recovery

    CAPP recovery is initiated when a CAPP Machine Check is detected. The capp recovery procedure is initiated via a Hypervisor Maintenance interrupt (HMI).

    CAPP Machine Check may arise from either an error that results in a PHB freeze or from an internal CAPP error with CAPP checkstop FIR action. An error that causes a PHB freeze will result in the link down signal being asserted. The system continues running and the CAPP and PSL will be re-initialized.

    This implements CAPP recovery for POWER9 systems

  • Add wafer-location property for POWER9

    Extract wafer-location from ECID and add property under xscom node.

    • bits 64:71 are the chip x location (7:0)
    • bits 72:79 are the chip y location (7:0)

    Sample output:

    [root@wsp xscom@623fc00000000]# lsprop ecid
    ecid             019a00d4 03100718 852c0000 00fd7911
    [root@wsp xscom@623fc00000000]# lsprop wafer-location
    wafer-location   00000085 0000002c
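
    As a sketch of the extraction, assuming the ecid property is held as four 32-bit cells and that bit 0 is the most significant bit of the 128-bit ECID (IBM numbering). wafer_location() is a hypothetical helper name, not the skiboot function.

```c
#include <stdint.h>

/* 'ecid' holds the four 32-bit cells of the ecid property, so ECID bits
 * 64:95 (bit 0 being the most significant) are in ecid[2]. */
static void wafer_location(const uint32_t ecid[4], uint32_t *x, uint32_t *y)
{
	*x = (ecid[2] >> 24) & 0xff;  /* bits 64:71, chip x location */
	*y = (ecid[2] >> 16) & 0xff;  /* bits 72:79, chip y location */
}
```

    Applied to the sample ecid above (third cell 852c0000), this gives x = 0x85 and y = 0x2c, which matches the wafer-location property shown.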
    
  • Add wafer-id property for POWER9

    Wafer id is derived from ECID data.

    • bits 4:63 are the wafer id ( ten 6 bit fields each containing a code)

    Sample output:

    [root@wsp xscom@623fc00000000]# lsprop ecid
    ecid             019a00d4 03100718 852c0000 00fd7911
    [root@wsp xscom@623fc00000000]# lsprop wafer-id
    wafer-id         "6Q0DG340SO"
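
    A sketch of the decoding under stated assumptions: the code table used here (0-9 map to '0'-'9', 10-35 map to 'A'-'Z') is inferred from the sample output above and may not be the complete table, and both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed code table: 0-9 map to '0'-'9', 10-35 map to 'A'-'Z'.
 * Anything else is shown as '?' in this sketch. */
static char wafer_code_to_char(unsigned int code)
{
	if (code < 10)
		return (char)('0' + code);
	if (code < 36)
		return (char)('A' + (code - 10));
	return '?';
}

/* 'ecid_hi' is the first 64 bits of the ECID (cells 0 and 1).
 * Bits 4:63 hold ten 6-bit codes, most significant first.
 * 'out' must have room for 11 bytes. */
static void wafer_id(uint64_t ecid_hi, char out[11])
{
	for (int i = 0; i < 10; i++) {
		/* Field i covers bits 4+6i .. 9+6i in IBM numbering. */
		unsigned int shift = (unsigned int)(54 - 6 * i);

		out[i] = wafer_code_to_char((unsigned int)((ecid_hi >> shift) & 0x3f));
	}
	out[10] = '\0';
}
```

    With the first two cells of the sample ecid (019a00d4 03100718) passed as 0x019a00d403100718, this produces "6Q0DG340SO", the wafer-id shown in the sample output.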
    
  • Add ecid property under xscom node for POWER9

    Sample output:

    [root@wsp xscom@623fc00000000]# lsprop ecid
    ecid             019a00d4 03100718 852c0000 00fd7911
    
  • Add ibm,firmware-versions device tree node

    In P8, hostboot provides a mini device tree. It contains the /ibm,firmware-versions node, which has version details for various firmware components.

    In P9, OPAL builds the device tree. This patch adds support to parse the VERSION section of PNOR and create the /ibm,firmware-versions device tree node.

    Sample output:

    /sys/firmware/devicetree/base/ibm,firmware-versions # lsprop .
    occ              "6a00709"
    skiboot          "v5.7-rc1-p344fb62"
    buildroot        "2017.02.2-7-g23118ce"
    capp-ucode       "9c73e9f"
    petitboot        "v1.4.3-p98b6d83"
    sbe              "02021c6"
    open-power       "witherspoon-v1.17-128-gf1b53c7-dirty"
    ....
    ....
    

POWER9

  • Disable Transactional Memory on Power9 DD 2.1

    Update pa_features_p9[] to disable TM (Transactional Memory). On DD 2.1 TM is not usable by Linux without other workarounds, so skiboot must disable it.

  • xscom: Do not print error message for 'chiplet offline' return values

    xscom_read/write operations return CHIPLET_OFFLINE when a chiplet is offline. Some multicast xscom_read/write requests from HBRT result in xscom operations on offline chiplet(s) and print the warnings below in the OPAL console:

    [ 135.036327572,3] XSCOM: Read failed, ret = -14
    [ 135.092689829,3] XSCOM: Read failed, ret = -14
    

    Some SCOM users can deal correctly with this error code (notably opal-prd), so the error message is (in practice) erroneous.

  • IMC: Fix the core_imc_event_mask

    CORE_IMC_EVENT_MASK is a scom that contains bits to control event sampling for different machine states for core imc. The current event-mask setting samples events only in the host kernel (hypervisor) and host userspace.

    This patch enables the sampling of events in other machine states (like guest kernel and guest userspace).

  • IMC: Update the nest_pmus array with occ/gpe microcode uav updates

    OCC/gpe nest microcode maintains the list of individual nest units supported. Sync the recent updates to the UAV with the nest_pmus array.

    For reference, the occ/gpe microcode link for the UAV: https://github.com/open-power/occ/blob/master/src/occ_gpe1/gpe1_24x7.h

  • Parse IOSLOT information from HDAT

    Add structure definitions that describe the physical PCIe topology of a system and parse them into the device-tree based PCIe slot description.

  • idle: user context state loss flags fix for stop states

    The "lite" stop variants with PSSCR[ESL]=PSSCR[EC]=1 do not lose user context, while the non-lite variants do (ESL: enable state loss).

    Some of the POWER9 idle states had these wrong.

CAPI

  • POWER9 DD2 update

    The CAPI initialization sequence has been updated in DD2. This patch adapts to the changes, retaining compatibility with DD1. The patch includes some changes to DD1 fix-ups as well.

  • Load CAPP microcode for POWER9 DD2.0 and DD2.1

  • capi: Mask Psl Credit timeout error for POWER9

    Mask the PSL credit timeout error in CAPP FIR Mask register bit(46). As per the h/w team this error is now deprecated and shouldn't cause any fir-action for P9.

NVLINK2

A notable change is that we now generate the device tree description of NVLINK based on the HDAT we get from hostboot. Since Hostboot will generate HDAT based on VPD, you now MUST have correct VPD programmed or we will default to a Sequoia layout, which will lead to random problems if you are not booting a Sequoia Witherspoon planar. In the case of booting with old VPD and/or Hostboot, we print a giant scary warning in order to scare you.

  • npu2: Read slot label from the HDAT link node

    Binding GPUs to emulated NPU PCI devices is done using the slot labels. Since the NPU devices do not have a matching slot node, we need to copy the label in here.

  • npu2: Copy link speed from the npu HDAT node

    This needs to be in the PCI device node so the speed of the NVLink can be passed to the GPU driver.

  • npu2: hw-procedures: Add settings to PHY_RESET

    Set a few new values in the PHY_RESET procedure, as specified by our updated programming guide documentation.

  • Parse NVLink information from HDAT

    Add the per-chip structures that describe how the A-Bus/NVLink/OpenCAPI phy is configured. This generates the npu@xyz nodes for each chip on systems that support it.

  • npu2: Add vendor cap for IRQ testing

    Provide a way to test recoverable data link interrupts via a new vendor capability byte.

  • npu2: Enable recoverable data link (no-stall) interrupts

    Allow the NPU2 to trigger "recoverable data link" interrupts.

  • npu2: Implement basic FLR (Function Level Reset)

  • npu2: hw-procedures: Update PHY DC calibration procedure

  • npu2: hw-procedures: Change rx_pr_phase_step value

XIVE

  • xive: Fix opal_xive_dump_tm() to access W2 properly

    The HW only supported limited access sizes.

  • xive: Make opal_xive_allocate_irq() properly try all chips

    When requested via OPAL_XIVE_ANY_CHIP, we need to try all chips. We first try the current one (on which the caller sits) and if that fails, we iterate all chips until the allocation succeeds.
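
    The search order can be sketched as follows. This is an illustration under simplified assumptions, not the XIVE code: each chip is modelled as a count of free interrupts, ANY_CHIP stands in for OPAL_XIVE_ANY_CHIP, and all names are hypothetical.

```c
#include <stdint.h>

#define NCHIPS    4
#define ANY_CHIP  0xffffffffu  /* stand-in for OPAL_XIVE_ANY_CHIP */
#define NO_IRQ    (-1)

/* Hypothetical model: each chip just has a count of free interrupts. */
struct chip_model {
	int free_irqs;
};

/* Try to take one interrupt from a chip; returns the chip index or NO_IRQ. */
static int alloc_on_chip(struct chip_model *chips, uint32_t id)
{
	if (id >= NCHIPS || chips[id].free_irqs == 0)
		return NO_IRQ;
	chips[id].free_irqs--;
	return (int)id;
}

/* Returns the chip that satisfied the request, or NO_IRQ. */
static int allocate_irq(struct chip_model *chips, uint32_t chip_id,
			uint32_t this_chip)
{
	int rc;

	if (chip_id != ANY_CHIP)
		return alloc_on_chip(chips, chip_id);

	/* Prefer the chip the caller is running on... */
	rc = alloc_on_chip(chips, this_chip);
	if (rc != NO_IRQ)
		return rc;

	/* ...then fall back to every chip until one succeeds. */
	for (uint32_t i = 0; i < NCHIPS; i++) {
		rc = alloc_on_chip(chips, i);
		if (rc != NO_IRQ)
			return rc;
	}
	return NO_IRQ;
}
```

    A request for a specific chip still fails if that chip is exhausted; only the "any chip" request walks the remaining chips.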

  • xive: Fix initialization & cleanup of HW thread contexts

    Instead of trying to "pull" everything and clear VT (which didn't work and caused some FIRs to be set), just clear and then set the PTER thread enable bit. This has the side effect of completely resetting the corresponding thread context.

    This fixes the spurious XIVE FIRs reported by PRD and fircheck.

  • xive: Add debug option for detecting misrouted IPI in emulation

    This is high overhead so we don't enable it by default even in debug builds, it's also a bit messy, but it allowed me to detect and debug a locking issue earlier so it can be useful.

  • xive: Increase the interrupt "gap" on debug builds

    We normally allocate IPIs from 0x10. Make that 0x1000 on debug builds to limit the chances of overlapping with Linux interrupt numbers which makes debugging code that confuses them easier.

    Also add a warning in emulation if we get an interrupt in the queue whose number is below the gap.

  • xive: Fix locking around cache scrub & watch

    Thankfully the missing locking only affects debug code and init code that doesn't run concurrently. Also adds a DEBUG option that checks the lock is properly held.

  • xive: Workaround HW issue with scrub facility

    Without this, we sometimes don't observe from a CPU the values written to the ENDs or NVTs via the cache watch.

  • xive: Add exerciser for cache watch/scrub facility in DEBUG builds

  • xive: Make assertion in xive_eq_for_target() more informative

  • xive: Add debug code to check initial cache updates

  • xive: Ensure pressure relief interrupts are disabled

    We don't use them and we hijack the VP field with their configuration to store the EQ reference, so make sure the kernel or guest can't turn them back on by doing MMIO writes to ACK#.

  • xive: Don't try setting the reserved ACK# field in VPs

    That doesn't work, the HW doesn't implement it in the cache watch facility anyway.

  • xive: Remove useless memory barriers in VP/EQ inits

    We no longer update "live" memory structures, we use a temporary copy on the stack and update the actual memory structure using the cache watch, so those barriers are pointless.

PHB4

  • phb4: Mask RXE_ARB: DEC Stage Valid Error

    Change the inits to mask out the RXE ARB: DEC Stage Valid Error (bit 370). This has been a fatal error but should be informational only.

    This update will be in the next version of the phb4 workbook.

  • phb4: Add additional adapter to retrain whitelist

    The single port version of the ConnectX-5 has a different device ID 0x1017. Updated descriptions to match pciutils database.

  • PHB4: Default to PCIe GEN3 on POWER9 DD2.00

    You can use the NVRAM override for DD2.00 screened parts.

  • phb4: Retrain link if degraded

    On P9 Scale Out (Nimbus) DD2.0 and Scale Up (Cumulus) DD1.0 (and below) the PCIe PHY can lock up, causing training issues. This can cause a degradation in speed or width in ~5% of training cases (depending on the card). This is fixed in later chip revisions. This issue can also cause PCIe links to not train at all, but this case is already handled.

    This patch checks if the PCIe link has trained optimally and if not, does a full PHB reset (to fix the PHY lockup) and retrain.

    One complication is that some devices are known to train degraded unless device specific configuration is performed. Because of this, we only retrain when the device is in a whitelist. All devices in the current whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon.

    We always gather information on the link and print it in the logs even if the card is not in the whitelist.

    For testing purposes, there's an NVRAM option to retry on all PCIe cards and all P9 chips when a degraded link is detected. The new option is 'pci-retry-all=true', which can be set using: nvram -p ibm,skiboot --update-config pci-retry-all=true. This option may increase the boot time if used on a badly behaving card.

IBM FSP platforms

  • FSP/NVRAM: Handle "get vNVRAM statistics" command

    FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM statistics. OPAL doesn't maintain any such statistics. Hence return FSP_STATUS_INVALID_SUBCMD.

    Fixes these messages appearing in the OPAL log:

    [16944.384670488,3] FSP: Unhandled message eb0500
    [16944.474110465,3] FSP: Unhandled message eb0500
    [16945.111280784,3] FSP: Unhandled message eb0500
    [16945.293393485,3] FSP: Unhandled message eb0500
    
  • fsp: Move common prints to trace

    These two prints just end up filling the skiboot logs on any machine that's been booted for more than a few hours.

    They have never been useful, so make them trace level. They were:

    SURV: Received heartbeat acknowledge from FSP
    SURV: Sending the heartbeat command to FSP

BMC based systems

  • hw/lpc-uart: read from RBR to clear character timeout interrupts

    When using the aspeed SUART, we see a condition where the UART sends continuous character timeout interrupts. This change adds a (heavily commented) dummy read from the RBR to clear the interrupt condition on init.

    This was observed on p9dsu systems, but likely applies to other systems using the SUART.

  • astbmc: Add methods for handling Device Tree based slots e.g. ones from HDAT on POWER9.

General

  • ipmi: Convert common debug prints to trace

    OPAL logs messages for every IPMI request from the host. Sometimes the OPAL console is filled with only these messages. This path is pretty stable now and we have enough logs to cover the bad path. Hence let's convert these debug messages to trace/info messages. Examples are:

    [ 1356.423958816,7] opal_ipmi_recv(cmd: 0xf0 netfn: 0x3b resp_size: 0x02)
    [ 1356.430774496,7] opal_ipmi_send(cmd: 0xf0 netfn: 0x3a len: 0x3b)
    [ 1356.430797392,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: Message sent to host
    [ 1356.431668496,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: IPMI MSG done
    
  • libflash/file: Handle short read()s and write()s correctly

    Currently we don't move the buffer along for a short read() or write(), nor do we request only the remaining amount.
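    The short read handling described above can be sketched as a loop that advances the buffer and asks only for the bytes still missing. This is a minimal illustration, not the libflash code: read_all() is a hypothetical helper name, and retrying on EINTR is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read up to len bytes from fd into buf, looping over short reads.
 * Returns the number of bytes read (less than len only at end of file),
 * or -1 on error. read_all() is a hypothetical name for illustration.
 */
ssize_t read_all(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);

	while (done < len) {
		/* Request only the remainder, at the advanced buffer offset. */
		ssize_t rc = read(fd, p + done, len - done);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: try again */
			return -1;
		}
		if (rc == 0)
			break;			/* end of file */
		done += (size_t)rc;
	}
	return (ssize_t)done;
}
```

    The same pattern applies to write(): keep a running count, pass the offset pointer and the remaining length on each iteration.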

  • hw/p8-i2c: Rework timeout handling

    Currently we treat a timeout as a hard failure and will automatically fail any transactions that hit their timeout. This results in unnecessarily failing I2C requests if interrupts are dropped, etc. Although these are bad things that we should log, we can handle them better by checking the actual hardware status and completing the transaction if there are no real errors. This patch reworks the timeout handling to check the status and continue the transaction if it can, while logging an error if it detects a timeout due to a dropped interrupt.

  • core/flash: Only expect ELF header for BOOTKERNEL partition flash resource

    When loading a flash resource which isn't signed (secure and trusted boot) and which doesn't have a subpartition, we assume it's the BOOTKERNEL since previously this was the only such resource. Thus we also assumed it had an ELF header which we parsed to get the size of the partition rather than trusting the actual_size field in the FFS header. A previous commit (9727fe3 DT: Add ibm,firmware-versions node) added the version resource which isn't signed and also doesn't have a subpartition, thus we expect it to have an ELF header. It doesn't so we print the error message "FLASH: Invalid ELF header part VERSION".

    It is a fluke that this works currently, since we load the secure boot header unconditionally and this happens to be the same size as the version partition. We also don't update the return code on error, so we happen to return OPAL_SUCCESS.

    To make this explicitly correct, only check for an ELF header if we are loading the BOOTKERNEL resource, otherwise use the partition size from the FFS header. Also set the return code on error so we don't erroneously return OPAL_SUCCESS. Add a check that the resource will fit in the supplied buffer to prevent buffer overrun.
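    The three rules in the paragraph above (ELF check only for BOOTKERNEL, an explicit error code on failure, and a fit check against the caller's buffer) can be sketched as follows. Every name here is hypothetical: resource_load_size(), has_elf_magic() and the SKETCH_* codes are illustrative stand-ins rather than skiboot functions or OPAL return codes, and the ELF-derived size is simply passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative return codes only; not the real OPAL values. */
#define SKETCH_SUCCESS		0
#define SKETCH_ERR_PARAM	(-1)
#define SKETCH_ERR_SIZE		(-2)

/* Hypothetical stand-in for the ELF header parsing done on BOOTKERNEL. */
bool has_elf_magic(const uint8_t *data, size_t len)
{
	static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

	return len >= sizeof(magic) && memcmp(data, magic, sizeof(magic)) == 0;
}

/*
 * Pick the number of bytes to load for a flash resource.
 *   is_bootkernel: only this resource is expected to carry an ELF header
 *   elf_size:      size derived from the ELF header (supplied by the caller)
 *   ffs_size:      actual size recorded in the FFS header
 *   buf_len:       capacity of the destination buffer
 */
int resource_load_size(bool is_bootkernel, const uint8_t *hdr, size_t hdr_len,
		       size_t elf_size, size_t ffs_size, size_t buf_len,
		       size_t *out_size)
{
	size_t size;

	assert(out_size != NULL);

	if (is_bootkernel) {
		if (!has_elf_magic(hdr, hdr_len))
			return SKETCH_ERR_PARAM;	/* report the error, don't fall through */
		size = elf_size;
	} else {
		size = ffs_size;	/* e.g. VERSION: trust the FFS header */
	}

	if (size > buf_len)
		return SKETCH_ERR_SIZE;	/* would overrun the supplied buffer */

	*out_size = size;
	return SKETCH_SUCCESS;
}
```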

  • flash: Support adding the no-erase property to flash

    The mbox protocol explicitly states that an erase is not required before a write. This means that issuing an erase from userspace, through the mtd device, and back returns a successful operation that does nothing. Unfortunately, this makes userspace tools unhappy. Linux MTD devices support the MTD_NO_ERASE flag which conveys that writes do not require erases on the underlying flash devices. We should set this property on all of our devices which do not require erases to be performed.

    NOTE: This still requires a linux kernel component to set the MTD_NO_ERASE flag from the device tree property.

Utilities

  • external/gard: Clear entire guard partition instead of entry by entry

    When the current implementation of the gard tool is used to ecc clear the entire GUARD partition, it does so one gard record at a time. While this may be OK when accessing the actual flash, it is very slow when done from the host over the mbox protocol (on the order of 4 minutes) because the BMC side is required to do many read, erase, write cycles under the hood.

    Fix this by rewriting the gard tool reset_partition() function. Now we allocate all the erased guard entries and (if required) apply ecc to the entire buffer. Then we can do one big erase and write of the entire partition. This reduces the time to clear the guard partition to on the order of 4 seconds.

  • opal-prd: Fix opal-prd command line options

    HBRT OCC reset interface depends on service processor type.

    • FSP: reset_pm_complex()
    • BMC: process_occ_reset()

    We have both occ and pm-complex command line interfaces. This patch adds support to display an appropriate message depending on system type.

    SP   Command              Action
    ---  -------------------  --------------------------
    FSP  opal-prd occ         display error message
    FSP  opal-prd pm-complex  Call pm_complex_reset()
    BMC  opal-prd occ         Call process_occ_reset()
    BMC  opal-prd pm-complex  display error message

  • opal-prd: detect service processor type and then make appropriate occ reset call.

  • pflash: Fix erase command for unaligned start address

    The erase_range() function handles erasing the flash for a given start address and length, and can handle an unaligned start address and length. However in the unaligned start address case we are incorrectly calculating the remaining size which can lead to incomplete erases.

    If we're going to update the remaining size based on what the start address was then we probably want to do that before we override the original start address. So rearrange the code so that this is indeed the case.
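    The ordering issue described above can be illustrated with a small sketch. It is not the pflash erase_range() code: plan_erase() and struct erase_plan are hypothetical names, and the sketch only computes how an unaligned request splits into a head piece and a remainder.

```c
#include <assert.h>
#include <stdint.h>

struct erase_plan {
	uint32_t head_len;	/* bytes from an unaligned start up to the next block boundary */
	uint32_t next_start;	/* address where the erase continues after the head */
	uint32_t remaining;	/* bytes still to erase from next_start */
};

/* block_size must be a power of two. plan_erase() is a hypothetical helper. */
struct erase_plan plan_erase(uint32_t start, uint32_t size, uint32_t block_size)
{
	struct erase_plan p = { 0, start, size };
	uint32_t mask = block_size - 1;

	assert(block_size != 0 && (block_size & mask) == 0);

	if (start & mask) {
		p.head_len = block_size - (start & mask);
		if (p.head_len > size)
			p.head_len = size;
		/* Shrink the remaining size while 'start' still holds the original value... */
		p.remaining = size - p.head_len;
		/* ...and only then move the start address forward. */
		p.next_start = start + p.head_len;
	}
	return p;
}
```

    For example, with a 0x1000 byte block, a request starting at 0x1100 for 0x3000 bytes splits into a 0xF00 byte head and 0x2100 bytes remaining from 0x2000; the two pieces always add up to the requested size.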

  • external/gard: Print an error if run on an FSP system

Simulators

  • mambo: Add mambo socket program

    This adds a program that can be run inside a mambo simulator in linux userspace which enables TCP sockets to be proxied in and out of the simulator to the host.

    Unlike mambo bogusnet, it requires no linux or skiboot specific drivers/infrastructure to run.

    Run inside the simulator:

    • to forward host ssh connections to sim ssh server: ./mambo-socket-proxy -h 10022 -s 22, then connect to port 10022 on your host with ssh -p 10022 localhost
    • to allow http proxy access from inside the sim to local http proxy: ./mambo-socket-proxy -b proxy.mynetwork -h 3128 -s 3128

    Multiple connections are supported.

  • idle: disable stop*_lite POWER9 idle states for Mambo platform

    Mambo prior to Mambo.7.8.21 had a bug where the stop idle instruction with PSSCR[ESL]=PSSCR[EC]=0 would resume with MSR set as though it had taken a system reset interrupt.

    Linux currently executes this instruction with MSR already set that way, so the problem went unnoticed. A proposed patch to Linux changes that, and causes the idle code to crash. Work around this by disabling lite stop states for the mambo platform for now.

v5.9-rc2

6 years ago

skiboot-5.9-rc2

skiboot v5.9-rc2 was released on Monday October 16th 2017. It is the second release candidate of skiboot 5.9, which will become the new stable release of skiboot following the 5.8 release, first released August 31st 2017.

skiboot v5.9-rc2 contains all bug fixes as of skiboot-5.4.8 and skiboot-5.1.21 (the currently maintained stable releases). We do not currently expect to do any 5.8.x stable releases.

For how the skiboot stable releases work, see stable-rules for details.

The current plan is to cut the final 5.9 by October 17th, with skiboot 5.9 being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October 18th). This release will be targeted to early POWER9 systems.

Over skiboot-5.9-rc1, we have the following changes:

  • opal-prd: Fix memory leak

  • hdata/i2c: update the list of known i2c devs

    This updates the list of known i2c devices - as of HDAT spec v10.5e - so that they can be properly identified during the hdat parsing.

  • hdata/i2c: log unknown i2c devices

    An i2c device is unknown if either the i2c device list is outdated or the device is marked as unknown (0xFF) in the hdat.

  • opal/cpu: Mark the core as bad while disabling threads of the core.

    If any core fails to sync its TB during chipTOD initialization, all the threads of that core are disabled. But this does not make the Linux kernel ignore the core/cpus. It crashes while bringing them up with the backtrace below:

    [   38.883898] kexec_core: Starting new kernel
    cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
        pc: c0000000001b9890: internal_create_group+0x30/0x304
        lr: c0000000001b9880: internal_create_group+0x20/0x304
        sp: c0000003f277b9b0
       msr: 900000000280b033
       dar: 40
     dsisr: 40000000
      current = 0xc0000003f9f41000
      paca    = 0xc00000000fe00000   softe: 0        irq_happened: 0x01
        pid   = 2572, comm = kexec
    Linux version 4.13.2-openpower1 (jenkins@p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
    enter ? for help
    [c0000003f277b9b0] c0000000008a8780 (unreliable)
    [c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
    [c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
    [c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
    [c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
    [c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
    [c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
    [c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
    [c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
    [c0000003f277be30] c00000000000b118 system_call+0x58/0x6c
    
  • hw/imc: pause microcode at boot

    IMC nest counters have both in-band (ucode access) and out of band access. Since not all nest counter configurations are supported by ucode, out of band tools are used to characterize other configurations.

    So it is preferable to pause the nest microcode at boot to aid the nest out of band tools. If the ucode is not paused and the OS does not have IMC driver support, then out of band tools will race with ucode and end up getting undesirable values. This patch checks and pauses the ucode at boot.

    OPAL provides APIs to control IMC counters. OPAL_IMC_COUNTERS_INIT is used to initialize these counters at boot. OPAL_IMC_COUNTERS_START and OPAL_IMC_COUNTERS_STOP API calls should be used to start and pause these IMC engines. doc/opal-api/opal-imc-counters.rst details the OPAL APIs and their usage.

  • xive: Fix VP free block group mode false-positive parameter check

    The check to ensure the buddy allocation idx is aligned to its allocation order was not taking into account the allocation split. This would result in opal_xive_free_vp_block failures despite giving the same value as returned by opal_xive_alloc_vp_block.

    E.g., starting then stopping 4 KVM guests gives the following pattern in the host:

    opal_xive_alloc_vp_block(5)=0x45000020
    opal_xive_alloc_vp_block(5)=0x45000040
    opal_xive_alloc_vp_block(5)=0x45000060
    opal_xive_alloc_vp_block(5)=0x45000080
    opal_xive_free_vp_block(0x45000020)=-1
    opal_xive_free_vp_block(0x45000040)=0
    opal_xive_free_vp_block(0x45000060)=-1
    opal_xive_free_vp_block(0x45000080)=0
    
  • hw/p8-i2c: Fix deadlock in p9_i2c_bus_owner_change

    When debugging a system where Linux was taking soft lockup errors with two CPUs stuck in OPAL:

    CPU0                    CPU1
    ----------------------  -----------------------
    lock
    p8_i2c_recover
    opal_handle_interrupt
                            sync_timer
                            cancel_timer
                            p9_i2c_bus_owner_change
                            occ_p9_interrupt
                            xive_source_interrupt
                            opal_handle_interrupt

    p8_i2c_recover() is a timer, and is stuck trying to take master->lock. p9_i2c_bus_owner_change() has taken master->lock, but then is stuck waiting for all timers to complete. We deadlock.

    Fix this by using cancel_timer_async().

  • FSP/CONSOLE: Limit number of error logging

    Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added error logging when the buffer is full. In some corner cases the kernel may call this function multiple times and we may end up logging the error again and again.

    This patch fixes it by generating error log only once.

  • FSP/CONSOLE: Fix fsp_console_write_buffer_space() call

    Kernel calls fsp_console_write_buffer_space() to check console buffer space availability. If there is enough buffer space to write data, then kernel will call fsp_console_write() to write actual data.

    In some extreme corner cases (like one explained in commit c8a7535f) console becomes full and this function returns 0 to kernel (or space available in console buffer < next incoming data size). Kernel will continue retrying until it gets enough space. So we will start seeing RCU stalls.

    This patch keeps track of the previously available space. If the previous space is the same as the current space, there is not enough room in the console buffer to write incoming data. This may be due to very heavy console write activity and a slow response from the FSP, or because the FSP has stopped processing data (e.g. because the ipmi daemon died). At this point we start a timer with a timeout of SER_BUFFER_OUT_TIMEOUT (10 secs). If the situation has not improved within 10 seconds, something has gone wrong, so we return OPAL_RESOURCE so that the kernel can drop the console write and continue.
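    The stall detection described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the skiboot implementation: struct console_state, console_buffer_space() and the SKETCH_* codes are hypothetical, the current time is passed in by the caller instead of being read from a timer, and any unchanged space value is treated as "no progress".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SUCCESS		0
#define SKETCH_RESOURCE		(-1)	/* stands in for OPAL_RESOURCE */
#define BUFFER_OUT_TIMEOUT_MS	10000	/* the 10 second timeout from the text */

struct console_state {
	size_t   prev_space;	/* space reported on the previous call */
	bool     stalled;	/* true while the stall timer is running */
	uint64_t stall_start_ms;	/* when the stall timer was started */
};

/*
 * Report free buffer space. Returns SKETCH_RESOURCE once the space has
 * stayed unchanged for BUFFER_OUT_TIMEOUT_MS after the timer started.
 */
int console_buffer_space(struct console_state *cs, size_t space_now,
			 uint64_t now_ms, size_t *out_space)
{
	assert(cs != NULL && out_space != NULL);

	*out_space = space_now;

	if (space_now != cs->prev_space) {
		/* Progress was made: remember the new level, stop the timer. */
		cs->prev_space = space_now;
		cs->stalled = false;
		return SKETCH_SUCCESS;
	}

	if (!cs->stalled) {
		/* First call with no progress: start the timer. */
		cs->stalled = true;
		cs->stall_start_ms = now_ms;
		return SKETCH_SUCCESS;
	}

	if (now_ms - cs->stall_start_ms >= BUFFER_OUT_TIMEOUT_MS)
		return SKETCH_RESOURCE;	/* let the caller drop the write */

	return SKETCH_SUCCESS;
}
```

    Returning an error only after the timeout lets short bursts of console traffic drain normally, while a caller that retries in a loop is eventually released instead of spinning.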

  • FSP/CONSOLE: Close SOL session during R/R

    Presently we are not closing SOL and FW console sessions during R/R. The host will continue to write to the SOL buffer during FSP R/R. If heavy console write activity is happening during FSP R/R (like running the top command inside the console), then at some point the console buffer becomes full. fsp_console_write_buffer_space() returns 0 (or less than the space required to write data) to the host. While one thread is busy writing to the console, if another thread tries to write data to the console we may see RCU stalls (like below) in the kernel:

    [ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
    [ 2082.828365] Task dump for CPU 32:
    [ 2082.828368] kworker/32:3    R  running task        0  4637      2 0x00000884
    [ 2082.828375] Workqueue: events dump_work_fn
    [ 2082.828376] Call Trace:
    [ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
    [ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
    [ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
    [ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
    [ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
    [ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
    [ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
    [ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68
    

    Hence let's close SOL (and FW console) during FSP R/R.

  • FSP/CONSOLE: Do not associate unavailable console

    Presently OPAL sends the associate/unassociate MBOX command for all FSP serial consoles (like the OPAL messages below). We should check whether the console is available before sending this message:

    [ 5013.227994012,7] FSP: Reassociating HVSI console 1
    [ 5013.227997540,7] FSP: Reassociating HVSI console 2
    
  • FSP: Disable PSI link whenever FSP tells OPAL about impending R/R

    Commit 42d5d047 fixed a scenario where DPO has been initiated, but the FSP went into reset before the CEC power down came in. But this is a generic issue that can happen in the normal shutdown path as well.

    Hence disable the PSI link as soon as we detect an impending FSP R/R.

  • fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM

    Also, return OPAL_BUSY_EVENT on failure sending FSP_CMD_REBOOT / DEEP_REBOOT.

    We had a race condition between FSP Reset/Reload and powering down the system from the host:

    Roughly:

    #   FSP                       Host
    --  ------------------------  ---------------------------------------------------
    1   Power on
    2                             Power on
    3   (inject EPOW)
    4   (trigger FSP R/R)
    5                             Processes EPOW event, starts shutting down
    6                             calls OPAL_CEC_POWER_DOWN
    7   (is still in R/R)
    8                             gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
    9   (FSP comes back)
    10                            spinning in opal_poll_events
    11  (thinks host is running)

    The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload error path for fsp_sync_msg() is to return -1, which means we give the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API docs give us the opportunity to return OPAL_BUSY when trying again later may be successful, and we're ambiguous as to if you should retry on OPAL_INTERNAL_ERROR.

    For reference, the linux code looks like this:

    static void __noreturn pnv_power_off(void)
    {
            long rc = OPAL_BUSY;
    
            pnv_prepare_going_down();
    
            while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
                    rc = opal_cec_power_down(0);
                    if (rc == OPAL_BUSY_EVENT)
                            opal_poll_events(NULL);
                    else
                            mdelay(10);
            }
            for (;;)
                    opal_poll_events(NULL);
    }
    

    Which means that practically our only option is to return OPAL_BUSY or OPAL_BUSY_EVENT.

    We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're running pollers to communicate with the FSP and do the final bits of Reset/Reload handling before we power off the system.

v5.9-rc3

6 years ago

skiboot-5.9-rc3

skiboot v5.9-rc3 was released on Wednesday October 18th 2017. It is the third release candidate of skiboot 5.9, which will become the new stable release of skiboot following the 5.8 release, first released August 31st 2017.

skiboot v5.9-rc3 contains all bug fixes as of skiboot-5.4.8 and skiboot-5.1.21 (the currently maintained stable releases). We do not currently expect to do any 5.8.x stable releases.

For how the skiboot stable releases work, see stable-rules for details.

The current plan is to cut the final 5.9 by October 20th, with skiboot 5.9 being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October 18th). This release will be targeted to early POWER9 systems.

Over skiboot-5.9-rc2, we have the following changes:

  • Improvements to vpd device tree entries

    Previously we would miss some properties

  • Revert "npu2: Add vendor cap for IRQ testing"

    This reverts commit 9817c9e29b6fe00daa3a0e4420e69a97c90eb373 which seems to break setting the PCI dev flag and the link number in the PCIe vendor specific config space. This leads to the device driver attempting to re-init the DL when it shouldn't, which can cause HMIs.

  • hw/imc: Fix IMC Catalog load for DD2.X processors

  • cpu: Add OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED

    Add a new CPU reinit flag, "TM Suspend Disabled", which requests that CPUs be configured so that TM (Transactional Memory) suspend mode is disabled.

    Currently this always fails, because skiboot has no way to query the state. A future hostboot change will add a mechanism for skiboot to determine the status and return an appropriate error code.

v5.9-rc4

6 years ago

skiboot-5.9-rc4

skiboot v5.9-rc4 was released on Thursday October 19th 2017. It is the fourth release candidate of skiboot 5.9, which will become the new stable release of skiboot following the 5.8 release, first released August 31st 2017.

skiboot v5.9-rc4 contains all bug fixes as of skiboot-5.4.8 and skiboot-5.1.21 (the currently maintained stable releases). We do not currently expect to do any 5.8.x stable releases.

For how the skiboot stable releases work, see stable-rules for details.

The current plan is to cut the final 5.9 by October 20th, with skiboot 5.9 being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October 18th, so we're running a bit behind there). This release will be targeted to early POWER9 systems.

Over skiboot-5.9-rc3, we have the following changes:

  • phb4: Fix PCIe GEN4 on DD2.1 and above

    In this change:

    eef0e197ab PHB4: Default to PCIe GEN3 on POWER9 DD2.00

    We clamped DD2.00 parts to GEN3 but unfortunately this change also applies to DD2.1 and above.

    This fixes this to only apply to DD2.00.

  • occ-sensors : Add OCC inband sensor region to exports (useful for debugging)

Two SRESET fixes:

  • core: direct-controls: Fix clearing of special wakeup

    'special_wakeup_count' is incremented on successfully asserting special wakeup. So we will never clear the special wakeup if we check for 'special_wakeup_count' being zero. Fix this issue by checking for 'special_wakeup_count' being 1 in dctl_clear_special_wakeup().

  • core/direct-controls: increase special wakeup timeout on POWER9

    Some instances have been observed where the special wakeup assert times out. The current timeout is too short for deeper sleep states. Hostboot uses 100ms, so match that.