Motherboard Failure..?

I built a rig back in December, and it was working great up until two weeks ago. I have the system set up to dual-boot Windows and Ubuntu Linux. All of a sudden, Windows started having spurious Blue Screens, and Linux would just freeze and reboot. Over the past two weeks, the number of blue screens and kernel panics has increased to the point where neither OS will boot. In fact, not even a Windows install USB or an Arch Linux install USB will boot. The tests below lead me to believe that I’m dealing with a motherboard failure, but the incredibly strange thing is that Windows Safe Mode (with networking) boots and doesn’t show any signs of a problem. No other OS will boot or stay booted.

Info about my build:
- CPU: Intel Core i7-9700K
- Motherboard: Gigabyte Z390 UD
- RAM: G.Skill Ripjaws V 2x8GB DDR4-3200 (F4-3200C16D-16GKVB)
- GPU: MSI Armor GeForce RTX 2060 Super
- SSD: Samsung 970 EVO M.2 NVMe 500GB

Testing steps:
- Tested GPU: ran with motherboard’s onboard graphics and disconnected NVIDIA GPU - problem persists.
- Swapped RAM sockets - problem persists.
- Tried updating BIOS to version F10e - problem persists.
- Tested SSD: took the SSD out and tried to boot from both the Windows install USB and the Arch Linux install USB - neither will boot.
- Tested RAM: ran MemTest86+ from an Arch Linux install USB - all tests passed.
- Tested CPU: booted into Windows Safe Mode and ran Intel Processor Diagnostic Tool - all tests passed.
- Tested SSD: ran `chkdsk /f /r` from Windows Safe Mode - says everything is fine.
- Tested SSD: ran Samsung Magician from Windows Safe Mode - says SSD health is OK.
- Unplugged, cleaned, and replugged power connections for CPU, GPU, and motherboard - no effect.

More specific behavior I’ve noticed:
- When I could still boot, I was noticing that some programs would behave erratically. For example, when trying to compile a large library from source, GCC would sometimes have a segmentation fault, but sometimes it would succeed at compiling the same library. Git (both Linux and Windows) would sometimes report that the current directory wasn’t a git repository, when it indeed was.
- Windows Blue Screen errors I’ve gotten:
  - PAGE_FAULT_IN_NONPAGED_AREA
  - MACHINE_CHECK_EXCEPTION
  - IRQL_NOT_LESS_OR_EQUAL
  - KERNEL_SECURITY_CHECK_FAILURE
  - SYSTEM_SERVICE_EXCEPTION
- Part of a Linux kernel panic traceback:
  - kernel tried to execute NX-protected page - exploit attempt? (uid:0)
  - BUG: unable to handle page fault for address:
  - #PF: supervisor instruction fetch in kernel mode
  - #PF: error_code(0x0011) - permissions violation
  - Hardware name: Gigabyte Technology Co., Ltd. Z390 UD/Z390 UD, BIOS F8
  - ucsi_ccg 21-0008: failed to reset PPM
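
The intermittent behavior above (the same GCC build sometimes segfaulting and sometimes succeeding) suggests results that are not reproducible from run to run. One rough way to probe for that is to repeat a deterministic computation and compare the outputs, since healthy hardware should give the same digest every time. This is only a minimal sketch: the function name `repeatability_check`, the buffer size, and the round count are illustrative assumptions, not anything from the tests listed above, and a clean result does not rule out a hardware fault.

```python
import hashlib


def repeatability_check(rounds=50, size=1 << 20):
    """Hash the same data `rounds` times; return how many digests differ from the first."""
    # Deterministic test buffer: the byte values 0..255 repeated (hypothetical choice)
    data = bytes(range(256)) * (size // 256)
    reference = hashlib.sha256(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        # Copy the buffer so each round reads from a freshly written allocation
        copy = bytearray(data)
        if hashlib.sha256(copy).hexdigest() != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print(f"{repeatability_check()} mismatching digests")
```

Any nonzero count would mean the same input produced different results, which points at hardware (CPU, RAM, or memory controller) rather than at the software being run.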

Anyone have any thoughts about what might be going on? To me, everything looks like it’s pointing to a motherboard failure *except* the fact that it boots successfully into Windows Safe Mode, and I want to get some other opinions before I take my build apart to exchange the motherboard.

Thanks so much!

Comments

  • TSTonyV
    Hello @emily! Welcome to the Community. 

    Your testing looks pretty thorough, and I agree that this is probably a motherboard issue. The only thing I didn't see you mention is trying another set of RAM, or running a single stick of memory at a time (assuming you have two). Besides that, you've done everything else except actually changing your components.

    While it's odd that Windows can boot in Safe Mode specifically, it's likely that it's just not utilizing whatever resource/hardware is causing the issue when in Safe Mode. That's the only explanation I can really think of. If you purchased a replacement plan on the motherboard you can bring it back to us, otherwise I would reach out to Gigabyte for an RMA. 
  • PowerSpec_MikeW (PowerSpec Engineer)
    Everything about the description of your issue points towards RAM. Whether it's the RAM or the memory controller is more difficult to say given what you've tried. Safe Mode is a bare-minimum boot in which only essential services load. It's odd that it consistently works and never crashes, but it's not uncommon for a system with faulty RAM to load into Safe Mode fine. PE environments, by comparison, create and live in writable RAM volumes. With that in mind, crashing when booting from the Windows and Linux media makes sense. That assumes the reason you couldn't boot from the media is that the installation would blue screen relatively quickly. Please confirm.

    I would suggest this: test the RAM sticks individually. Mainly, try booting into the installation media with one stick installed at a time and see whether you can get into the installation process. If so, run MemTest86+ again, looping tests 3 and 5 for at least several hours.
  • @TSTonyV @TSMikeW So it turns out it was actually the CPU. I replaced the motherboard under the 2-year replacement plan and the problem persisted, and I also swapped out the RAM sticks per @TSMikeW's suggestion. On a whim, I tried disabling one of the 8 cores on the i7-9700K in the BIOS, and my system proceeded to boot normally into all operating systems.
  • TSTonyV
    Very strange, but glad you were able to get a solution, even if it's temporary. If you have a replacement plan on the CPU you could bring that in to us, or of course check with Intel for an RMA.
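
The fix in this thread came from isolating a single CPU core in the BIOS. On a Linux system that still boots, a similar isolation could in principle be approximated in software by pinning a deterministic workload to one logical CPU at a time and comparing the results. The sketch below is an illustration only, not the method anyone in the thread used: `checksum_on_core` and `find_suspect_cores` are hypothetical names, the hash-chain workload is an arbitrary choice, and `os.sched_setaffinity` is only available on Linux. A faulty core may also crash the process or the whole machine instead of returning a wrong value, so a pass proves little.

```python
import hashlib
import os
from collections import Counter


def checksum_on_core(core, rounds=20):
    """Pin this process to one logical CPU and run a deterministic hash chain."""
    os.sched_setaffinity(0, {core})  # Linux-only; pid 0 means "this process"
    digest = b"seed"
    for _ in range(rounds):
        # Each step feeds the next, so one bad result changes the final value
        digest = hashlib.sha256(digest * 1024).digest()
    return digest.hex()


def find_suspect_cores(rounds=20):
    """Return logical CPUs whose result differs from the most common result."""
    allowed = os.sched_getaffinity(0)  # CPUs this process is allowed to run on
    results = {}
    try:
        for core in sorted(allowed):
            results[core] = checksum_on_core(core, rounds)
    finally:
        os.sched_setaffinity(0, allowed)  # restore the original affinity
    majority, _ = Counter(results.values()).most_common(1)[0]
    return [core for core, value in results.items() if value != majority]


if __name__ == "__main__":
    print("Suspect logical CPUs:", find_suspect_cores() or "none found")
```

If one logical CPU repeatedly disagrees with the rest, or the run only ever crashes while pinned to it, that CPU would be the first candidate to disable in the BIOS before pursuing an RMA.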