* Add predicate support to DeviceStream fetches, making them even uglier
* Add `store_per_thread` to PTX stdlib