We tested with a text file about 1 GB in size. The computer had 2 GB of RAM, so the OS could keep the entire file in its cache.
With the Windows STL implementation, the program ran for 40 seconds even when the cache was hot. The Linux version ran for 40 seconds with a cold cache and 4 seconds with a hot cache. He wrote a C version of the program that also took 4 seconds to complete on Linux. On Windows, the C version ran for 10 seconds, still much slower than on Linux.
Here is the test source code:
He later rewrote the C++ version to use a class member function instead of the global function from <string>, which brought the run time down to 23 seconds. That is still much slower than on Linux.
The Windows C++ runtime was the one shipped with Visual Studio 2005.