Generally, disk, NVMe, eMMC, SPI flash, etc. are all so much slower than your CPU that loading something uncompressed is much slower than loading a compressed version and then decompressing it. Decompression can usually run in parallel with the read, provided the storage can be queried asynchronously and the algorithm can work on partial data, i.e. in a streaming fashion. Modern storage is usually accessed asynchronously, and LZ4 can be streamed.
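To make "streaming" concrete, here is a minimal Python sketch that decompresses data chunk by chunk as it arrives instead of waiting for the whole file. It uses `zlib` from the standard library purely as a stand-in, because LZ4 bindings are not in the standard library; the helper name `stream_decompress` and the chunk size are my own choices for illustration.

```python
import io
import zlib


def stream_decompress(source, chunk_size=64 * 1024):
    """Yield decompressed pieces as compressed chunks are read from `source`.

    `source` is any object with a read(n) method (file, socket wrapper, ...).
    zlib is only a stand-in here; an LZ4 streaming decoder would be used the
    same way.
    """
    decomp = zlib.decompressobj()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        piece = decomp.decompress(chunk)
        if piece:
            yield piece
    # Emit anything the decoder was still holding back.
    tail = decomp.flush()
    if tail:
        yield tail


# Made-up payload standing in for a kernel image.
payload = b"kernel image bytes " * 10_000
compressed = zlib.compress(payload)

# Small chunks force many partial-input steps through the decoder.
restored = b"".join(stream_decompress(io.BytesIO(compressed), chunk_size=512))
```

Because each chunk is processed as soon as it is read, the CPU can work on one chunk while the storage is still delivering the next, assuming the reads themselves are issued asynchronously.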
For example, loading an LZ4-compressed kernel from NVMe is faster than loading an uncompressed kernel on my single Cortex-A53 @ 1.4GHz by a factor that is almost the same as the compression ratio. That is, in my testing, you can’t easily detect any latency added by decompression.
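A rough back-of-the-envelope model shows why the speedup ends up close to the compression ratio. This is only a sketch: it assumes reading and decompression overlap perfectly, and every number in the example call is made up for illustration, not measured on my board.

```python
def load_times(image_bytes, storage_bps, ratio, decompress_bps):
    """Return (uncompressed_load_s, compressed_load_s) for a simple model.

    image_bytes    -- size of the uncompressed image
    storage_bps    -- sustained read throughput of the storage, bytes/s
    ratio          -- compression ratio (uncompressed size / compressed size)
    decompress_bps -- decompressor throughput in *output* bytes/s

    Assumes reading and decompressing overlap fully, so the slower of the
    two stages determines the total time.
    """
    raw = image_bytes / storage_bps
    read_compressed = image_bytes / ratio / storage_bps
    decompress = image_bytes / decompress_bps
    return raw, max(read_compressed, decompress)


# Hypothetical numbers: 20 MB image, 50 MB/s storage, 2:1 compression,
# 500 MB/s decompression.
raw, packed = load_times(20e6, 50e6, 2.0, 500e6)
speedup = raw / packed
```

In this model the speedup is `min(ratio, decompress_bps / storage_bps)`, so as long as the decompressor is faster than the storage by more than the compression ratio, the storage stays the bottleneck and you get the full ratio.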
Still, I encourage you to do your own testing, as you may have a setup that does not benefit from compression (offhand, I can’t think of one). Be warned that Linux normally keeps recently read file data cached in RAM (the page cache) because RAM is so much faster to access, so any test has to flush this cache before each measurement.
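If you want a starting point for such a test, something like the following sketch may help. It assumes Linux and root privileges for the cache flush (it writes to `/proc/sys/vm/drop_caches`); the function names are my own, and it only times the raw read, not decompression.

```python
import os
import time


def drop_page_cache():
    """Linux only, needs root: write back dirty pages, then drop caches."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")  # 3 = page cache plus dentries and inodes


def time_read(path, chunk_size=1 << 20):
    """Read `path` to the end and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Call `drop_page_cache()` before every `time_read(...)`; otherwise the second and later runs are served from RAM and tell you nothing about the storage.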
TL;DR: modern decompression algorithms can be orders of magnitude faster than today’s non-volatile storage.