Hi all. First, a caveat: I have never built a PC before and just did so for the first time, so I am still learning how things work. If I say anything wrong, please forgive me. I did build it with a friend who has done it many times before, so I am not too worried that I did something drastically wrong.
The issue I am having is that my setup seems to randomly crash to a black screen. Sometimes this happens before I can even log in; usually it takes around 15-20 minutes, and sometimes it takes 1-2 hours. I have to manually press the power button to turn it off and then power it back on again. Sometimes the audio from the computer itself is distorted or delayed when this happens. The interesting thing is that this does not happen when I am playing anything intensive. In fact, it only seems to happen while I am idling, surfing the web, watching videos, calling a friend, etc. I have been checking the temperature of the graphics card and CPU regularly and they are definitely not overheating. One other random thing to note, which is probably not relevant: when I turn the power supply itself on (not the computer), the LEDs on the fans flash for a brief moment and then turn off. Everything then runs as normal after that. Windows Event Viewer always returns the same errors when this happens: Event ID 63, Event ID 41, keyword (70368744177664),(2) and bugcheckCode 278.
I have tried making sure drivers are up to date, unplugging and replugging all the cords to the power supply and the socket itself, monitoring the temperatures, and turning off automatic restarts in Windows (I am planning on testing with a different power supply soon as well). There is only one fix that seems to solve the issue - opening the NVIDIA Control Panel and setting the power setting to 'prefer max performance'. This leads me to believe the problem is with the GPU, but I wanted to consult and see what people suggest here before I RMA anything. I could just leave it on 'prefer max performance' at all times, I guess, but that seems like a band-aid fix, and it would stress the system long term.
Thanks for the help ahead of time!
Processor: 13th Gen Intel(R) Core(TM) i5-13600K 3.50 GHz
Graphics Card: Asus GeForce RTX 4090
PSU: EVGA SuperNOVA 1000 GT 80 Plus 1000W Fully Modular Power Supply
RAM: Trident Z5 RGB x 2 (32 GB total)
Storage: 980 Pro PCIe 4.0 NVMe M.2 SSD
Motherboard: Gigabyte Z790 Aero G
Cooler: TH240 ARGB Sync
Case: Ceres 500 TG ARGB
Sounds like a power curve issue. The first thing I'd check is whether ASUS has a VBIOS update available for the card. It's odd that the system restarts rather than the GPU crashing and recovering, though. I suspect there's a BSOD, but the video card isn't functioning when this happens, so you never see it. Do you have files in C:\Windows\minidump?
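For anyone following along, here is a minimal Python sketch for checking whether that folder holds any dump files. The function name `list_minidumps` is made up for illustration, the default path is the one from the post above, and reading the folder may need an elevated prompt on some systems. It only lists files; it does not analyze them.

```python
from datetime import datetime
from pathlib import Path


def list_minidumps(dump_dir=r"C:\Windows\Minidump"):
    """Return (name, size_in_bytes, modified_time) for each .dmp file, newest first.

    Returns an empty list if the folder does not exist.
    """
    folder = Path(dump_dir)
    if not folder.is_dir():
        return []
    dumps = [p for p in folder.iterdir()
             if p.is_file() and p.suffix.lower() == ".dmp"]
    # Newest first, so the crash that just happened is at the top.
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [(p.name, p.stat().st_size, datetime.fromtimestamp(p.stat().st_mtime))
            for p in dumps]


if __name__ == "__main__":
    entries = list_minidumps()
    if not entries:
        print("No minidump files found (or the folder does not exist).")
    for name, size, modified in entries:
        print(f"{modified:%Y-%m-%d %H:%M}  {size:>10} bytes  {name}")
```

If the list is empty after a crash, that by itself suggests the system lost power or hung before Windows could write a dump.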
Ok, some updates:
None of this resolved the issue. However, rather than crashing to a black screen, the computer now freezes and I have to manually restart it, sometimes using the power button but other times by restarting the PSU itself. I think your intuition was a good one, though, because I checked the minidump folder and I do indeed have files in there. I switched the power setting back to normal from performance mode and let it freeze a number of times to see if there were differences in the files being generated (I read them using the Windows debugger). They all show the same error, involving 'GenuineIntel.sys' with bugcheck code 124. All my searches indicate that this probably has to do with the CPU, but let me know what you think. I am attaching a text file with the full readout from the debugger analysis of that minidump file.
Thanks again for your help.
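A side note on the two codes mentioned so far, since they look unrelated at first glance. The sketch below assumes that Event Viewer reports bugcheckCode in decimal (so 278 is 0x116) and that the debugger reports its code in hexadecimal (so 124 means 0x124). The lookup table is deliberately tiny and only covers these two values; `describe_bugcheck` is a made-up helper name, not part of any Windows tool.

```python
# Small lookup limited to the two codes seen in this thread.
KNOWN_BUGCHECKS = {
    0x116: "VIDEO_TDR_FAILURE",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}


def describe_bugcheck(code):
    """Format a bugcheck code as hex and decimal, with its name if it is in the table."""
    name = KNOWN_BUGCHECKS.get(code, "not in this small table")
    return f"0x{code:X} ({code} decimal): {name}"


# Event ID 41 value from the first post, assumed to be decimal.
print(describe_bugcheck(278))    # 0x116 (278 decimal): VIDEO_TDR_FAILURE
# Debugger value from the minidumps, assumed to be hexadecimal.
print(describe_bugcheck(0x124))  # 0x124 (292 decimal): WHEA_UNCORRECTABLE_ERROR
```

Under those assumptions, the earlier black-screen crashes point at a display driver timeout and the later freezes at a hardware error report, which may be two symptoms of the same underlying problem rather than two separate faults.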
WHEA uncorrectable, error on the PCIe bus. No surprises there since you've confirmed the GPU is the issue. There's a setting under System - Display - Graphics Settings - Hardware-Accelerated GPU Scheduling. Is this on or off?
This setting is currently on.
What if you try HAGS off with the power setting on normal?
Tried that this morning as per your advice. I switched HAGS off, put the power setting back to normal, and restarted the computer. About 10 minutes later I got the freeze and had to reset the PSU to get the system back on.
Not related then. Just seems to be an issue related to the voltage curve on the normal setting. If it's not a bug with the VBIOS then it's likely the card itself. You could tweak it with Afterburner, but if this is a new card under warranty, I'd have it repaired or replaced. I feel if you did tweak it, it's likely to get gradually worse over time, or outright fail.
I registered here just so I can confirm that I too am having this exact same problem. But the problem seems to occur only while using my 4090 FE on my 13700K/Z690 Hero build. No such problems occur with my 4090 installed on my 9700K/Z390 Gaming Pro Carbon build. Also, I did try my 3090 on my 13700K build and I have no such issues. So it appears to happen only when using the 4090 with my Z690 Hero.
What jumps out at me is the Z390 being PCIe 3.0 and the Z690 supporting up to 5.0. Try the 4090 FE on the Z690 with PCIe set to Gen3 in the BIOS.
Just as an update, I RMA'd the card and got a new one (this time a Zotac 4090) and the exact same issue happened immediately. So unless I got two faulty cards in a row, the GPU hardware doesn't seem to be the problem. I'm at a loss about what to try at this point.
Found evidence online of other users encountering the issue with other cards in combination with a Gigabyte Z790 board. Do you have any motherboard management or RGB software installed from Gigabyte? Have you tried without that software installed?
Yes, I do have some motherboard management/RGB software installed. Are you just saying I should try uninstalling Gigabyte Control Center and other Gigabyte management software, or should I go beyond that and uninstall drivers?
It's worth trying to uninstall the motherboard management/RGB software, or doing a clean install and not installing that software at all, to see if it's in any way a factor.
@PowerSpec_MikeW Reimaged the entire system yesterday and then did not install anything except a web browser to see what would happen. Looks like the same issue is occurring, but instead of freezing, the computer goes to a black screen and has to be powered on again (or it restarts on its own; not clear which yet, will update after more monitoring). This was the initial behavior, before I updated all the drivers and upgraded to Windows 11, after which the freezing started. The Windows event logs show the same error irrespective of whether it was a freeze or an automatic restart.
Tells us the board is losing power very briefly and restarting. Seems to be a power saving feature that's dropping a component below its tolerance. You're basically disabling power saving on the card and that fixes it, but it may not be the card. It may be keeping something else 'alive'. I don't know that board and I haven't messed with Gigabyte in a long time, so I can't provide too much help here. I'd try looking for a static VCore option. Maybe set a static voltage on your CPU instead of auto and see if the issue reoccurs. I'd say it's a worse solution than the one you already found, but at least it would confirm more specifically what the problem is.
What would be a good static voltage to set the CPU at? I am not sure exactly how high to go to try to circumvent the issue.
For a 13600K, it won't take much. I'd recommend dialing in a static voltage of 1.3V and seeing where that takes you. I wouldn't push beyond 1.35V, since thermals might become a concern with a 240 AIO.
While you are at it, dial in 1.2V on the VCCSA (System Agent) voltage rail. The fact that you were seeing WHEA errors on the PCIe bus before leads me to believe something might be up with the system agent. The System Agent is related to your PCIe subdomain, so a little extra voltage might help there. Still, if you have a warranty on the processor or board, I'd probably recommend replacing the components. At the end of the day, you shouldn't have to dial in static voltages to remain stable. Something is likely up.
What do you all think about testing out the stability of the system using the integrated graphics from the CPU? I still can't tell whether the problem is an interaction between the GPU and the board or if it has to do with the CPU. Would using it through the integrated graphics help isolate the issue?
And in line with your suggestions, my next step will be to test it out with the static voltages. Will update you all. I have a store warranty on the CPU, so it would be easy to replace. The motherboard warranty is with the manufacturer, so it would be a pain to replace, but I will do it if all else fails.
It's possible. If static VCCSA/VCore solves the issue, then it would seem to be an idle power issue with the CPU. However, keep in mind the IGP is on die, so using it may increase the CPU package power and potentially mask the issue. I think it's an interesting test, but keep the GPU installed. There should be an option in the BIOS like IGD or IGFX Multi-Monitor. Enable that, put a second monitor on the IGP, and see if that keeps it 'alive'.