Michael Wojcik wrote:
> Look, however the copy happens, it's going to map every page of the new
> file at some point. Doesn't matter whether you're using private-buffer
> I/O with kernel copies (conventional read(2)/write(2) I/O), or
> memory-mapping and copying. Running a cumulative checksum over those
> pages will take very little additional time; you're already pulling them
> into cache. It's not like CVS can DMA from one file to another.

It's just not that simple. The file isn't read like that: bits of it are
skipped over while reading, and there's no single point at which a
checksum would even make sense if you had one. If you can't check a
checksum, why spend the time writing it in the first place?

When it writes, it just remembers the last point it read from and does a
block copy of the rest; *however*, the headers are normally
reconstructed. That reconstruction is distributed all over the RCS code,
so there is no single point at which a checksum could be calculated.
Checksumming individual revisions is slated for 3.0/3.1 if the technical
difficulties can be overcome, but those versions won't use RCS files
anyway.

> I wasn't suggesting that. I'd run a checksum over the whole file, then
> append the result as another RCS section.

That would also kill performance. You would need to put any new data at
the beginning, or in the header area, since it is a major performance
drag under many operating systems to seek to the end of the file - this
is why the older revisions are stored at the end, btw. Storing the
checksum in the RCS file at all would negate the point of it anyway - it
would always be invalid, because writing it into the file changes the
file.

> If a checksum takes noticeable time, that's an algorithm problem. CVSNT
> shouldn't have any problem getting on the order of tens of megabytes per
> second (or better) throughput on a checksum on standard hardware. It
> wouldn't be noticeable.

I wouldn't accept an algorithm that couldn't process *hundreds* of
megabytes a second. RCS files get *big*. And it definitely would be
noticeable if it took more than a second. 1000 files at 1 second each
adds more than 16 minutes to your checkin time. Not acceptable.

Believe me, I *have* worked on this, and the tradeoff simply isn't worth
it.

Tony
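
As an illustration of the "checksum while you copy" idea Michael
describes above, here is a minimal sketch that streams a file with
conventional read(2)/write(2) I/O and folds each buffer into zlib's
crc32() as it goes. The function name, buffer size and use of CRC-32 are
assumptions for the example only; this is not the CVSNT/RCS code path.

/* Illustrative sketch only: checksum a file while copying it linearly.
 * Assumes the whole file is streamed front to back, which is exactly
 * the premise disputed above for RCS rewrites. */
#include <unistd.h>
#include <zlib.h>

static int copy_with_crc(int in_fd, int out_fd, unsigned long *crc_out)
{
    unsigned char buf[65536];
    unsigned long crc = crc32(0L, Z_NULL, 0);  /* initial CRC value */
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        crc = crc32(crc, buf, (uInt)n);          /* checksum the block... */
        if (write(out_fd, buf, (size_t)n) != n)  /* ...as it is copied    */
            return -1;
    }
    if (n < 0)
        return -1;

    *crc_out = crc;
    return 0;
}

The sketch only works because it streams the whole file linearly; the
objection in the reply is that an RCS rewrite skips around and
reconstructs the headers in many places, so there is no single loop like
this into which a checksum could be folded. The 16-minute figure also
follows directly: 1000 files at 1 second each is 1000 seconds, roughly
16.7 minutes of extra checkin time.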