NVIDIA TDR error? Anyone seen this before?

bnemec
Posts: 1946
Joined: Tue Mar 09, 2021 9:22 am
Answers: 10
Location: Wisconsin USA
x 2552
x 1401

NVIDIA TDR error? Anyone seen this before?

Unread post by bnemec »

Three times today my display went blank and then said no video input. I tried different ports on the card (P2200) and even the integrated ports; nothing worked. Music was still playing, though, so I thought maybe it was just a display issue. So I VNC'd into the computer and got a 1/4 VGA display, tough going. I gathered some screenshots. This happened twice, then I installed an OLDER driver, 443.66 specifically. Shortly after that it happened again, so I downloaded the latest 460 driver. I tried 470 but it will not install; also weird that MS kept blocking the download as unsafe. MS should block SW install if they're worried about "unsafe" software. I digress.

Edit: maybe coincidence, but each time I was editing sketch in SW. Adding 4th screen shot of task manager.
Edit again: task manager image is second now; it bumped the other two down. I struggle with inserting images, sorry.

Screenshots from VNC while the video output was down. The first was up when I opened VNC. The second is from trying to open the NVIDIA Control Panel. I'm wondering if something crashed the GPU (is that possible?). I looked at system info and it appears the P2200 must be offline. I tried a few things each time this happened; the only way I've found to get the card back online is restarting the computer. A web search suggested Ctrl + Win + Shift + B, which only made a beep.


2021-11-08 14_57_06-.png
2021-11-08 09_18_09-Settings.png
Attachments
2021-11-08 15_01_53-.png
2021-11-08 15_04_12-Greenshot.png

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by bnemec »

Just happened again, but this time the display came back just in time to show the "SolidWorks Is Busy" dialog, then it closed. The error log window did not come up. The CXPA files for today are all >10MB; I'll send those to the VAR.

I had installed GPU driver 463.15 after the previous blackout; I'm assuming the newer driver helped the display come back without restarting.
Certified
Posts: 8
Joined: Fri Oct 29, 2021 2:19 pm
Answers: 1
x 1
x 5

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by Certified »

I just visited the link [1] in the error message and found this explanation: "the application was unable to continue rendering because the Microsoft Windows imposed time limit (TDR) was exceeded. This is normally the case when the workload sent to the graphics card is greater than what the graphics card can process in the normal allotted time of two seconds."

Luckily, Windows 10 will let you increase your TDR timeout. The Windows TDR registry keys article [2] shows that it is controlled by the "TdrDelay" registry value. Press Win + R, type "regedit" into the Run box, and press Enter. In the Registry Editor window that pops up, navigate to "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers" and double-click the "TdrDelay" entry. Select the Decimal radio button instead of the default Hexadecimal and enter something a few seconds more forgiving than your current setting (like 10 if you had 2) into the Value data box. Press OK and reboot. Let me know if you are still running into your issue after changing that value.

[1] https://nvidia.custhelp.com/app/answers ... /a_id/3633
[2] https://docs.microsoft.com/en-us/window ... istry-keys
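
For anyone who would rather script the regedit steps above, here is a minimal Python sketch. It assumes TdrDelay is stored as a REG_DWORD holding seconds under the GraphicsDrivers key named in the post. The helper names (tdr_reg_file, read_tdr_delay) and the 2 to 60 second sanity range are made up for illustration. The value may not exist at all on a machine where nothing has set it, in which case the reader returns None and Windows uses its default.

```python
import sys

# Key path from the post, relative to HKEY_LOCAL_MACHINE.
KEY_PATH = r"System\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_file(seconds: int) -> str:
    """Return .reg file text that sets TdrDelay to `seconds` as a DWORD."""
    if not 2 <= seconds <= 60:  # arbitrary sanity range for this sketch
        raise ValueError("pick a delay between 2 and 60 seconds")
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[HKEY_LOCAL_MACHINE\\{KEY_PATH}]\n"
        # .reg files write DWORDs as 8 hex digits, so 10 becomes 0000000a.
        f'"TdrDelay"=dword:{seconds:08x}\n'
    )


def read_tdr_delay():
    """Return the current TdrDelay in seconds, or None if it is not set."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, "TdrDelay")
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("current TdrDelay:", read_tdr_delay())
    print(tdr_reg_file(10))
```

Saving the printed text as a .reg file and importing it (as administrator) should have the same effect as the manual edit; a reboot is still needed. The hex formatting is also why the Decimal radio button matters in regedit: typing 10 with Hexadecimal selected would store 16.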

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by bnemec »

Certified wrote: Tue Nov 09, 2021 9:55 am I just visited the link [1] in the error message and found this explanation: "the application was unable to continue rendering because the Microsoft Windows imposed time limit (TDR) was exceeded. This is normally the case when the workload sent to the graphics card is greater than what the graphics card can process in the normal allotted time of two seconds."

Luckily, Windows 10 will let you increase your TDR timeout. The Windows TDR registry keys article [2] shows that it is controlled by the "TdrDelay" registry value. Press Win + R, type "regedit" into the Run box, and press Enter. In the Registry Editor window that pops up, navigate to "HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers" and double-click the "TdrDelay" entry. Select the Decimal radio button instead of the default Hexadecimal and enter something a few seconds more forgiving than your current setting (like 10 if you had 2) into the Value data box. Press OK and reboot. Let me know if you are still running into your issue after changing that value.

[1] https://nvidia.custhelp.com/app/answers ... /a_id/3633
[2] https://docs.microsoft.com/en-us/window ... istry-keys
I read that too. It feels like increasing that time is more of a band-aid than a solution. There wasn't much going on in the viewport when this happened, just editing a sketch with a couple dozen elements on a small sheet metal part with two flanges. I guess if this is "normal" behavior and everyone increases this (kinda like the GDI Object limit) then I suppose it's the fix.

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by Certified »

On my system I had not manually changed it, but TdrDelay was already set to 10 seconds instead of the default of 2 listed in the Microsoft article. I've had both NVIDIA and AMD pro cards in this workstation, so I suspect one of the gfx vendors' drivers bumps it up from the default on install. The worst that changing it to 10 seconds will do is make you wait an additional 8 seconds for recovery in the event the gfx driver actually does crash hard. In the best case, it will stop the driver from being force-crashed when the card is busy with actual computations for more than 2 seconds. You are right that it is probably a band-aid, though. The gfx driver should never hang for more than 2 seconds on a simple sketch under normal circumstances. Since it is happening even under relatively light load, see if there is a heat-based cause. I've mostly seen it on thin laptops that have their heatsinks caked in dust, but if the card isn't getting the cooling it needs, it could be thermal throttling to protect itself. A can of compressed air can do wonders.

Another thing to check: is your dedicated VRAM near capacity in Task Manager right before it happens? The new SolidWorks "performance" pipeline is hungry for VRAM and you "only" have 5 GB on the P2200. If it ran out of VRAM and is swapping from VRAM to system memory, that could be a 2+ second delay. As I understand it, that swap out of VRAM is a blocking operation, so it prevents simultaneous gfx driver output for as long as the memory transfer takes. Back in the Windows 7 days when Microsoft set the default TdrDelay, large graphics cards came with 768 MB of VRAM and 2 seconds was a reasonable maximum time for swap operations. Now cards come with 5 GB to 48 GB and that swap can take much longer. If this is what you are running into, upping TdrDelay is not the band-aid but the actual fix. If you do identify swaps as the root cause, you would also want to make sure your card is not running into memory bottlenecks. Make sure your P2200 is using a full x16 PCIe 3.0 link (GPU-Z can check) and your RAM is clocked as high as it is stable (XMP profile).
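
If watching Task Manager and GPU-Z at the right moment is awkward, the same two numbers (VRAM headroom and PCIe link) can be polled from the command line. The sketch below assumes the nvidia-smi tool that ships with the NVIDIA driver is on the PATH and uses its --query-gpu CSV output. The function names and the sample line in the usage note are made up for illustration, not captured from a real P2200.

```python
import subprocess

# nvidia-smi query fields: VRAM in MiB, plus current PCIe generation and width.
QUERY = "memory.used,memory.total,pcie.link.gen.current,pcie.link.width.current"


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line produced by nvidia-smi for the fields in QUERY."""
    used, total, gen, width = (int(field.strip()) for field in line.split(","))
    return {
        "vram_used_mib": used,
        "vram_total_mib": total,
        "vram_used_frac": used / total,  # close to 1.0 means little headroom
        "pcie_gen": gen,
        "pcie_width": width,
    }


def query_gpu() -> dict:
    """Run nvidia-smi once and parse the line for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_line(out.splitlines()[0])
```

For example, parse_gpu_line("4870, 5059, 3, 16") reports a card that is about 96% full on a Gen 3 x16 link. Calling query_gpu() in a loop while sketching would show whether VRAM climbs toward the limit before a blackout. The link generation can drop at idle for power saving, so it is probably best read while the card is under load.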

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by bnemec »

Certified wrote: Wed Nov 10, 2021 9:40 am On my system I had not manually changed it, but TdrDelay was already set to 10 seconds instead of the default of 2 listed in the Microsoft article. I've had both NVIDIA and AMD pro cards in this workstation, so I suspect one of the gfx vendors' drivers bumps it up from the default on install. The worst that changing it to 10 seconds will do is make you wait an additional 8 seconds for recovery in the event the gfx driver actually does crash hard. In the best case, it will stop the driver from being force-crashed when the card is busy with actual computations for more than 2 seconds. You are right that it is probably a band-aid, though. The gfx driver should never hang for more than 2 seconds on a simple sketch under normal circumstances. Since it is happening even under relatively light load, see if there is a heat-based cause. I've mostly seen it on thin laptops that have their heatsinks caked in dust, but if the card isn't getting the cooling it needs, it could be thermal throttling to protect itself. A can of compressed air can do wonders.

Another thing to check: is your dedicated VRAM near capacity in Task Manager right before it happens? The new SolidWorks "performance" pipeline is hungry for VRAM and you "only" have 5 GB on the P2200. If it ran out of VRAM and is swapping from VRAM to system memory, that could be a 2+ second delay. As I understand it, that swap out of VRAM is a blocking operation, so it prevents simultaneous gfx driver output for as long as the memory transfer takes. Back in the Windows 7 days when Microsoft set the default TdrDelay, large graphics cards came with 768 MB of VRAM and 2 seconds was a reasonable maximum time for swap operations. Now cards come with 5 GB to 48 GB and that swap can take much longer. If this is what you are running into, upping TdrDelay is not the band-aid but the actual fix. If you do identify swaps as the root cause, you would also want to make sure your card is not running into memory bottlenecks. Make sure your P2200 is using a full x16 PCIe 3.0 link (GPU-Z can check) and your RAM is clocked as high as it is stable (XMP profile).
Thank you. That's a bunch to digest. Checking for dust and fan operation is simple and common, and it could explain why I'm having the problem while other identical machines are not, so I started there. The fan on the card seems to never run. I installed HWiNFO to monitor it. I don't know what temps the GPU should operate at, but I found some webpages comparing the performance of several cards, and they showed the P2200 at 65C "under load" while consuming 73 watts. I set an alarm at 70C, did the normal sketch stuff I was doing before, and the alarm went off: max temp 75.6C, max power 23.8 W, fan speed all zeros. Is this normal? I'm trying to find info about max chip temp or temp/fan speed data for this card and not finding much more than this:
https://www.servethehome.com/nvidia-qua ... -review/6/

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by bnemec »

The fan on the P2200 was not running. HWiNFO showed 0 rpm even though GPU Fan was calling for 80%. IT is going to see about the warranty.
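
The symptom described here (the card requesting fan speed while the tach reads zero) is easy to turn into an automatic check. This is only a sketch: the GpuReading type, the function names, and the 100 rpm cutoff are made up for illustration, the 70C alarm is the one mentioned earlier in the thread, and the readings would have to be supplied by whatever monitoring tool is in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuReading:
    temp_c: float        # GPU core temperature in Celsius
    fan_target_pct: int  # fan speed the card is requesting, in percent
    fan_rpm: int         # fan speed actually measured


def fan_looks_stalled(r: GpuReading, min_rpm: int = 100) -> bool:
    """True when the card asks for airflow but the fan is not turning."""
    return r.fan_target_pct > 0 and r.fan_rpm < min_rpm


def check(r: GpuReading, alarm_c: float = 70.0) -> List[str]:
    """Return human-readable warnings for one reading."""
    warnings = []
    if fan_looks_stalled(r):
        warnings.append(
            f"fan requested at {r.fan_target_pct}% but measured {r.fan_rpm} rpm"
        )
    if r.temp_c >= alarm_c:
        warnings.append(f"temperature {r.temp_c:.1f}C is at or above {alarm_c:.0f}C")
    return warnings
```

Feeding it the numbers reported in this thread, check(GpuReading(temp_c=75.6, fan_target_pct=80, fan_rpm=0)), raises both warnings, while a cool card with a spinning fan returns an empty list.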

It does throttle back when the temp gets up into the mid 90s, but even with that SW will cause the temp to continue to climb until the board turns off.
ryan-feeley
Posts: 82
Joined: Thu Jan 20, 2022 3:35 pm
Answers: 1
x 31
x 91

Re: NVIDIA TDR error? Anyone seen this before?

Unread post by ryan-feeley »

I saw those VIDEO_TDR_FAILURE errors back in 2019 with a P5000 card (same family as your P2200). Tech replaced it and the MoBo, and it still happened.

I messed around with different solutions. Some involved changing the power management settings of the card when running SolidWorks; others involved disabling hardware acceleration in certain programs.

I ended up just changing the power plan on my computer to "Ultimate Performance". That seems to prevent any of the buses from going into whatever low-power mode causes this latency issue. Or maybe it just runs the fans more aggressively. Your issue may just be bum hardware, but check back on this thread if you replace it and still have problems.