The first request can be slow because the local runtime may need to load the model into memory before generation starts. Once the model is loaded, later requests to the same model are usually faster, since they skip the load step.
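The cold-start versus warm-request behavior can be illustrated with a minimal sketch. It assumes a runtime that keeps loaded models in an in-memory cache. `ModelCache`, `fake_load`, and `timed_request` are hypothetical names, and `fake_load` only sleeps to simulate reading weights; none of this reflects any particular runtime's API.

```python
import time


class ModelCache:
    """Hypothetical in-memory cache: loads a model on first use, then reuses it."""

    def __init__(self, load_fn):
        self._load_fn = load_fn  # hypothetical loader, e.g. reads weights from disk
        self._models = {}

    def get(self, name):
        # Cold path: the model is not resident yet, so pay the load cost once.
        if name not in self._models:
            self._models[name] = self._load_fn(name)
        # Warm path: return the model that is already in memory.
        return self._models[name]


def fake_load(name, delay=0.2):
    # Stand-in for loading model weights; the sleep simulates the load time.
    time.sleep(delay)
    return f"<model {name}>"


def timed_request(cache, name):
    # Measures only how long it takes to obtain the model; a real runtime
    # would run generation after this step.
    start = time.perf_counter()
    model = cache.get(name)
    return model, time.perf_counter() - start


if __name__ == "__main__":
    cache = ModelCache(fake_load)
    for i in range(3):
        _, elapsed = timed_request(cache, "example-model")
        print(f"request {i + 1}: {elapsed:.3f}s")
```

Running it should show the first request taking roughly the simulated load time and the later ones completing almost instantly. Some runtimes also unload idle models after a timeout, in which case the next request pays the load cost again.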