[ARM] cache align destination pointer when copying memory for some processors
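As the subject line describes, the copy routine first advances the destination pointer to a cache-line boundary and only then starts its main loop. After that point, most of the stores fill whole, aligned cache lines. The block below is a minimal, hypothetical C sketch of that general idea under stated assumptions. It is not the kernel's ARM assembly implementation. The function name `copy_cache_aligned_dst` and the fixed 32-byte `CACHE_LINE_SIZE` are illustrative assumptions, and the sketch does not say which processors benefit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a 32-byte cache line; real ARM cores differ (e.g. 32 or 64). */
#define CACHE_LINE_SIZE 32u

/*
 * Hypothetical memcpy variant: copy leading bytes until dst sits on a
 * cache-line boundary, then copy whole cache lines, then the leftover tail.
 */
void *copy_cache_aligned_dst(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bytes needed to bring d up to the next cache-line boundary. */
    size_t misalign = (uintptr_t)d & (CACHE_LINE_SIZE - 1);
    size_t head = misalign ? CACHE_LINE_SIZE - misalign : 0;
    if (head > n)
        head = n;

    /* Head: byte-by-byte copy until the destination is aligned. */
    for (size_t i = 0; i < head; i++)
        *d++ = *s++;
    n -= head;

    /* Body: each iteration writes one full, aligned cache line of dst. */
    while (n >= CACHE_LINE_SIZE) {
        memcpy(d, s, CACHE_LINE_SIZE);
        d += CACHE_LINE_SIZE;
        s += CACHE_LINE_SIZE;
        n -= CACHE_LINE_SIZE;
    }

    /* Tail: whatever is left after the last full cache line. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Only the destination is aligned here. The source may remain misaligned, and the sketch accepts that so that the stores, rather than the loads, are the ones grouped into whole cache lines.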