I am getting the following error while trying to apply IBM Large Model Support with TensorFlow 2.1 on an NVIDIA A100. I do not understand the error, and I have not been able to find a way to resolve it. I would appreciate any help. The code can be found here: https://github.com/junaidjawaid1/3d_U-Net-TFLMS/tree/main
Error
2023-09-20 15:05:14.369391: I tensorflow/core/common_runtime/bfc_allocator.cc:1356] Sum Total of in-use chunks: 27.38GiB
2023-09-20 15:05:14.369396: I tensorflow/core/common_runtime/bfc_allocator.cc:1358] total_region_allocated_bytes_: 39737245696 memory_limit_: 39737245696 available bytes: 0 curr_region_allocation_bytes_: 137438953472
2023-09-20 15:05:14.369413: I tensorflow/core/common_runtime/bfc_allocator.cc:1364] Stats:
Limit: 39737245696
InUse: 29395621376
MaxInUse: 38731654400
NumAllocs: 10381
MaxAllocSize: 9730785280
BytesInactive: 0
BytesActive: 29395621376
PeakBytesActive: 29395621376
BytesReclaimed: 109406002160
NumSingleReclaims: 6
NumFullReclaims: 4
NumDefrags: 0
BytesDefragged: 0
2023-09-20 15:05:14.369432: W tensorflow/core/common_runtime/bfc_allocator.cc:806] ********************************************************___*******************________________
2023-09-20 15:05:14.369564: F tensorflow/stream_executor/cuda/cuda_driver.cc:216] Check failed: is_host_ptr == points_to_host_memory (0 vs. 1)dst pointer is not actually on GPU: (nil)
/opt/gridengine/default/spool/compute-0-3/job_scripts/108639: line 13: 99471 Aborted python $PYTHON_SCRIPT
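To make the allocator numbers easier to read, here is a minimal sketch that converts a few of the byte counts from the Stats block above into GiB. The dictionary and the `to_gib` helper are my own names for illustration, not part of TensorFlow; the values are copied from the log. InUse comes out at about 27.38 GiB, which matches the "Sum Total of in-use chunks" line.

```python
# Byte counts copied from the "Stats:" block of the log above.
stats = {
    "Limit": 39737245696,
    "InUse": 29395621376,
    "MaxInUse": 38731654400,
    "MaxAllocSize": 9730785280,
}

GIB = 1024 ** 3  # bytes per GiB


def to_gib(num_bytes):
    """Convert a byte count to GiB (hypothetical helper, not a TensorFlow API)."""
    return num_bytes / GIB


# Rounded values for display.
stats_gib = {name: round(to_gib(value), 2) for name, value in stats.items()}

# Difference between the allocator limit and what was in use at the time of the dump.
free_gib = round(to_gib(stats["Limit"] - stats["InUse"]), 2)

for name, value in stats_gib.items():
    print(f"{name}: {value} GiB")
print(f"Limit - InUse: {free_gib} GiB")
```

This only restates what the log already reports; it does not by itself explain the `Check failed: is_host_ptr == points_to_host_memory` abort.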