The Wayback Machine - https://web.archive.org./web/20201208163924/https://github.com/arrayfire/arrayfire/issues/2417

Improve performance of join #2417

Open
umar456 opened this issue on Jan 21, 2019 · 6 comments

Comments

@umar456 (Member) commented on Jan 21, 2019

Current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.

This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
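To make the "multiple calls" pattern concrete, here is a minimal CPU-side sketch in C++. It assumes 2-D column-major float arrays joined along dimension 0. `join_one_kernel` and `join_rows_per_buffer` are hypothetical names, not ArrayFire's actual code. Each call to `join_one_kernel` stands in for one kernel launch, so the host loop issues one launch per input buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates ONE launch of a per-buffer join kernel: copies a single 2-D
// column-major input into the output, shifted by row_offset along dim 0.
void join_one_kernel(float* out, int out_rows, const float* in, int in_rows,
                     int cols, int row_offset) {
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < in_rows; ++r)
            out[c * out_rows + row_offset + r] = in[c * in_rows + r];
}

// Host loop: one "launch" per input buffer, the pattern the issue
// proposes to replace with a single call.
std::vector<float> join_rows_per_buffer(const std::vector<std::vector<float>>& bufs,
                                        const std::vector<int>& rows, int cols,
                                        int* launches) {
    assert(bufs.size() == rows.size());
    int total = 0;
    for (int n : rows) total += n;
    std::vector<float> out(static_cast<std::size_t>(total) * cols);
    int offset = 0;  // running position of the current input along dim 0
    for (std::size_t i = 0; i < bufs.size(); ++i) {
        assert(bufs[i].size() == static_cast<std::size_t>(rows[i]) * cols);
        join_one_kernel(out.data(), total, bufs[i].data(), rows[i], cols, offset);
        offset += rows[i];
        ++*launches;  // every input costs a separate launch
    }
    return out;
}
```

With N inputs this issues N launches; for small arrays the per-launch overhead can dominate the actual copy work, which is what a single fused call would avoid.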

@WilliamTambellini (Contributor) commented on Jan 22, 2019

I could be interested if the new implementation(s) would be faster for either afcuda or afcpu. Would they be?

@umar456 (Member, Author) commented on Jan 22, 2019

This would be faster on afcuda. I haven't looked into afcpu's implementation. This will require you to modify the CUDA kernel implementation.

@umar456 (Member, Author) commented on Jan 22, 2019

I don't think there will be a big improvement on the CPU backend.

@WilliamTambellini (Contributor) commented on Jan 22, 2019

OK, a speedup for afcuda would be good enough. Please add more details here and I will see if I can do it.

@umar456 (Member, Author) commented on Jan 26, 2019

The join function seems to be calling the join kernel for each buffer that needs to be joined. It would be better if we performed this operation in one kernel call instead of multiple calls. Here are the steps you can take to do this:

  1. Allocate a buffer to store the pointers to buffers that need to be joined.
  2. Allocate a buffer for the offsets. You could do this with the previous buffer but a separate buffer may be easier.
  3. Modify the kernel so that you loop over each of the arrays. This may be a simple modification based on what I saw. You probably just need to add another loop to iterate over each of the buffers.
  4. Modify the host code to call the new function. Remove the join wrapper, as it's now unnecessary.
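The four steps above can be sketched as a CPU emulation in C++, again assuming 2-D column-major float arrays joined along dimension 0. `Input`, `fused_join_kernel` and `join_rows_fused` are hypothetical names, not ArrayFire's API. In a real CUDA version the pointer and offset arrays (steps 1 and 2) would live in device memory and the two outer loops would be the thread grid. Sizing the grid to the largest input is one possible choice, not necessarily what ArrayFire would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one 2-D column-major input buffer.
struct Input {
    const float* data;  // would be a device pointer in a CUDA implementation
    int rows;           // size along dim 0, the join dimension here
};

// Emulates the body of ONE fused kernel for thread (r, c): the thread loops
// over every input (step 3) and copies its element if it lies inside it.
void fused_join_kernel(float* out, int out_rows,
                       const Input* inputs,     // step 1: array of buffer pointers
                       const int* row_offsets,  // step 2: array of offsets
                       int n_inputs, int r, int c) {
    for (int i = 0; i < n_inputs; ++i) {
        if (r < inputs[i].rows) {
            out[c * out_rows + row_offsets[i] + r] =
                inputs[i].data[c * inputs[i].rows + r];
        }
    }
}

// Host side (step 4): build the pointer and offset arrays, then "launch"
// once with a grid sized to the largest input.
std::vector<float> join_rows_fused(const std::vector<std::vector<float>>& bufs,
                                   const std::vector<int>& rows, int cols,
                                   int* launches) {
    assert(bufs.size() == rows.size());
    std::vector<Input> inputs;
    std::vector<int> offsets;
    int total = 0, max_rows = 0;
    for (std::size_t i = 0; i < bufs.size(); ++i) {
        assert(bufs[i].size() == static_cast<std::size_t>(rows[i]) * cols);
        inputs.push_back({bufs[i].data(), rows[i]});
        offsets.push_back(total);  // exclusive prefix sum of row counts
        total += rows[i];
        max_rows = std::max(max_rows, rows[i]);
    }
    std::vector<float> out(static_cast<std::size_t>(total) * cols);
    // A single launch: on a GPU these loops would be blockIdx/threadIdx.
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < max_rows; ++r)
            fused_join_kernel(out.data(), total, inputs.data(), offsets.data(),
                              static_cast<int>(inputs.size()), r, c);
    ++*launches;
    return out;
}
```

Threads whose row index exceeds a smaller input's row count simply skip that input. An alternative is to size the grid by the output and look up the owning input from the offsets, which avoids idle threads when input sizes differ a lot.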

@UmashankarTriforce commented on Sep 30, 2019

Is this issue still open? If it is, I would like to work on it.

Projects: None yet
Linked pull requests: None yet
3 participants