The community technical support mailing list was retired in 2010 and replaced with a professional technical support team. For assistance please contact pre-sales technical support via email to sales@march-hare.com.
> From: cvsnt-bounces at cvsnt.org
> [mailto:cvsnt-bounces at cvsnt.org] On Behalf Of Tony Hoyle
> Sent: Friday, 19 May, 2006 13:25
>
> Michael Wojcik wrote:
> > I believe that's highly dubious. CVS has to rewrite the whole file for
> > most operations, since the RCS file format is plain-text (and
>
> Actually for the most part it's simply a fast copy routine.
> It's designed to be *very* fast.

It still has to do just as much disk I/O.

> At no time is the entire file or anything resembling it in
> memory.

Nor would it need to be in order to checksum it. However the copy happens, it is going to map every page of the new file at some point. It doesn't matter whether you're using private-buffer I/O with kernel copies (conventional read(2)/write(2) I/O) or memory-mapping and copying. Running a cumulative checksum over those pages will take very little additional time; you're already pulling them into cache. It's not as though CVS can DMA from one file to another.

> Checksumming the individual revisions won't work due to keyword expansion

I wasn't suggesting that. I'd run a checksum over the whole file, then append the result as another RCS section.

> The only way you could checksum would be to do it to binary revisions, and
> even then you'd need a size threshold - the calculation is very CPU intensive
> compared to everything else.

Sorry, Tony, I don't believe that. Calculating any reasonable checksum has negligible CPU cost, particularly compared to the disk and network I/O time. If a checksum takes noticeable time, that's an algorithm problem. CVSNT shouldn't have any problem getting a checksum throughput on the order of tens of megabytes per second (or better) on standard hardware. It wouldn't be noticeable. I'm tempted now to implement it just to prove the point.

> Really it's not worth it. The only thing that could corrupt an RCS file is
> actual hardware failure - and then you routinely recover from backups anyway..

*If* you detect it.
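To illustrate the point: a minimal sketch (not CVSNT's actual copy routine; the function names here are hypothetical) of folding a checksum into a buffered file copy. Adler-32 is used purely for brevity; the data is already in the buffer for the write, so the checksum adds arithmetic over bytes that are in cache anyway, with no extra I/O.

```c
#include <stdio.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/* Cumulative Adler-32: fold a buffer into a running checksum.
 * Pass 1 as the initial value; feed successive buffers in order. */
static uint32_t adler32_update(uint32_t adler, const unsigned char *buf,
                               size_t len)
{
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Copy src to dst, checksumming each buffer as it passes through.
 * Returns the Adler-32 of the copied data, or 0 on a write error. */
static uint32_t copy_with_checksum(FILE *src, FILE *dst)
{
    unsigned char buf[65536];
    uint32_t sum = 1;   /* Adler-32 initial value */
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n)
            return 0;
        /* The bytes are already in our buffer; this is pure CPU work. */
        sum = adler32_update(sum, buf, n);
    }
    return sum;
}
```

The inner loop is a few arithmetic operations per byte, which is why checksum throughput on ordinary hardware dwarfs disk I/O rates.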
A checksum is cheap insurance. Also, recovering the entire repository from backup is potentially more intrusive than recovering only those files that have been corrupted.

> I wouldn't trust a file stored on such a device checksummed or not.

You have to trust something at some point. My inclination would be to let CVS administrators decide on their own threat models.

Now, it's perfectly reasonable for March Hare to say "checksums are not a priority for us", whether that's because you don't believe they're very useful or because there isn't significant demand; I have no argument with that. But I'm very suspicious of performance arguments without data to back them up.

--
Michael Wojcik
Principal Software Systems Developer, Micro Focus