If we ignore NFB and just concentrate on the virtual current source, then deriving it from the speaker signal is not going to give you as precise a result as, say, a true constant current source. The reasons are component tolerance variations, together with the frequency and phase response of the succeeding stages. How much these matter depends on your design goals, of course.
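To put rough numbers on the tolerance and frequency/phase point, here is a minimal sketch. It assumes, purely for illustration, that the stages after the sensing point can be lumped into a gain error of 5 % plus a single first-order pole at 50 kHz; both figures and the function names are hypothetical, not taken from any particular design. An ideal current source would have a response of exactly 1 at all frequencies, so anything else shows up as magnitude and phase error.

```python
import cmath
import math


def stage_response(f_hz, gain_error=0.05, pole_hz=50e3):
    """Hypothetical lumped model of the succeeding stages:
    a gain tolerance (gain_error) and one first-order low-pass pole."""
    return (1.0 + gain_error) / (1.0 + 1j * f_hz / pole_hz)


def current_error(f_hz, gain_error=0.05, pole_hz=50e3):
    """Return (magnitude error in %, phase error in degrees) of the
    derived current relative to an ideal source (response = 1)."""
    h = stage_response(f_hz, gain_error, pole_hz)
    mag_err_pct = (abs(h) - 1.0) * 100.0
    phase_err_deg = math.degrees(cmath.phase(h))
    return mag_err_pct, phase_err_deg


if __name__ == "__main__":
    # Sweep a few audio frequencies and show how the error changes.
    for f in (100.0, 1e3, 10e3, 20e3):
        m, p = current_error(f)
        print(f"{f:>8.0f} Hz: {m:+6.2f} % magnitude, {p:+6.2f} deg phase")
```

At low frequencies the error is dominated by the tolerance term; towards the top of the audio band the pole adds phase shift and pulls the magnitude down, which is why a constant current source that doesn't depend on those stages comes out more precise.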
Afterthought: The whole idea falls flat when you start to clip. Whether that is a good or a bad thing is a whole new subject.