![]() That host-side code gets compiled with cl.exe (Microsoft) rather than icl.exe (Intel), which causes later link errors because the Intel compiler (or its libs, such as libmmds.lib) has its own implementations of various things like _ceil and _floor.īasically I just need a way to tell nvcc to use icl.exe rather than cl.exe (they take the same args so this should work fine). cu file, and math ops such as ceil() and floor(). Why do I need to do this? Why not just mix cl.exe and icl.exe-compiled CPU code? On Windows, with VS2005, if I use (for instance) the thrust template lib, it uses std::string in a few places on the host side in a. I don’t build with an IDE, so I just need to know what command-line args to pass to things. I use the Intel C/C++ compiler for our CPU code, and nvcc for the gpu code. When interfacing C and Fortran, it is important to remember that while arguments in C are passed by values, in Fortran they are passed by reference.I’ve read various threads in this forum, but none of them seem to actually point to a solution, so here’s my take on it. Since we are using standard Fortran, we will need to write the computation on the GPU using CUDA C. ! computing the reference solution on the CPU ! Allocate B using standard allocate call ! From this point on, we can use A and C as normal Fortran array If (err > 0) print *,"Error in allocating C with cuda HostAlloc =",err If (err > 0) print *,"Error in allocating A with cuda HostAlloc =",errĮrr = cudaHostAlloc(cptr_C,N*sizeof(fp_kind),cudaHostAllocMapped) ![]() If (err > 0) print *,"Error in setting cudaSetDeviceFlags=",errĮrr = cudaHostAlloc(cptr_A,N*sizeof(fp_kind),cudaHostAllocMapped) ! Allocate A and C using cudaHostAlloc and then map the C pointer to Fortran arraysĮrr=cudaSetDeviceFlags(cudaDeviceMapHost) Real(fp_kind) ,allocatable, dimension (:) :: B Real(fp_kind) ,pointer, dimension (:) :: A,C Integer, parameter :: fp_kind = kind(0.0d0) ! Double precision We will use the standard Fortran allocator for this one. B is an array that we will use to compute a reference solution on the CPU. Since we want to use the zero copy features on these two, we will allocate them with cudaHostAlloc. We need to do a couple of extra steps: call the CUDA allocator in C, and then pass the C pointer to Fortran using the function C_F_Pointer provided by the iso C bindings.Ī is the input array, C is the output array from the GPU computation. Since we are using a standard Fortran 90 compiler, we can't use the built in allocator ( it has no knowledge of pinned memory). This is achieved with calls to cudaHostGetDevicePointer. These are the pointers that we will pass to the CUDA kernels.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |