Improve performance of join #2417
Comments
I would be interested in this if the new implementation would be faster for either afcuda or afcpu. Would it be?
This would be faster on afcuda. I haven't looked into afcpu's implementation. This will require you to modify the CUDA kernel implementation.
I don't think there will be a big improvement on the CPU backend.
OK, a speedup for afcuda would be good enough. Please add more details here and I will see if I can do it.
The join function seems to be calling the join kernel for each buffer that needs to be joined. It would be better if we performed this operation in one kernel call instead of multiple calls. Here are the steps you can take to do this:
Is this issue still open? If it is, I would like to work on it.
The current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of one call per input buffer.
This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
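For anyone picking this up, here is a rough host-side sketch of the indexing a single-call join could use. It is not ArrayFire's code: `JoinInput`, `joinElement`, and `joinRowsSinglePass` are hypothetical names, and it assumes 2D column-major `float` buffers joined along the first dimension. Each call to `joinElement` stands in for what one GPU thread would do inside a single kernel launch: find which input owns the output row, then copy that element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one input buffer (not an ArrayFire type).
struct JoinInput {
    const float* data;  // column-major storage, rows x cols
    std::size_t rows;   // extent along the join dimension
};

// Emulates one "thread" of a fused join: produces output element (r, c).
// rowOffsets[k] is the first output row owned by input k; the final entry
// is the total number of output rows.
inline float joinElement(const std::vector<JoinInput>& inputs,
                         const std::vector<std::size_t>& rowOffsets,
                         std::size_t r, std::size_t c) {
    // Linear search for the input k with rowOffsets[k] <= r < rowOffsets[k + 1].
    std::size_t k = 0;
    while (r >= rowOffsets[k + 1]) ++k;
    const JoinInput& in = inputs[k];
    return in.data[(r - rowOffsets[k]) + c * in.rows];
}

// Joins all inputs along dimension 0 in one pass over the output.
// On a GPU, the two loops below would become the thread grid of one launch.
std::vector<float> joinRowsSinglePass(const std::vector<JoinInput>& inputs,
                                      std::size_t cols) {
    // Prefix sum of row counts gives each input's starting row in the output.
    std::vector<std::size_t> rowOffsets(inputs.size() + 1, 0);
    for (std::size_t k = 0; k < inputs.size(); ++k) {
        rowOffsets[k + 1] = rowOffsets[k] + inputs[k].rows;
    }
    const std::size_t outRows = rowOffsets.back();

    std::vector<float> out(outRows * cols);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < outRows; ++r) {
            out[r + c * outRows] = joinElement(inputs, rowOffsets, r, c);
        }
    }
    return out;
}
```

With this layout, the launch count no longer depends on the number of inputs; the cost moves into the per-element lookup, which a real kernel might replace with a binary search or a per-block precomputed index when there are many inputs.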