"Performance" is perhaps the area where you'll get the biggest volume of FUD from various tool vendors regarding source control tools, which in many cases isn't expected because it is difficult to characterize the various projects. The project I am working on has a source code pull of "404 File(s) 19,362,615 bytes" As the current version control for the team is non-client/server, it uses considerable network bandwidth to pull the code. (~6 minutes during the day, ~3 minutes on weekends or late evenings). I checked a baseline version of our code into CVS, and was able with -z9 to pull the code from my home linux box via the cable modem uplink in about 2.5 minutes, and pull from a machine on the local 100Mbit ethernet in about 40 seconds. However, this doesn't account for stuff like back revisions, etc. The last time I did any investigation of benchmarking CVS, it had a full version of the head, and then backreferenced all changes from that point. We have files that are version 2.1380, which on this file would be a nightmare to grab -r2.1. Also, the algorithm doesn't save any history of parsing when comparing 2 different versions, so doing a compare of "-r2.1" and "-r2.2" is the worst case for performance -- it involves 2 linear searches from 2.1380. A sample worst case check can be done by taking 2 completely different source files, and alternatively checking them in over the top of each other as the revision numbers grow: copy test-a.txt test.txt cvs commit -m "test" test.txt copy test-b.txt test.txt cvs commit -m "test" test.txt and so on... For this sort of reason, I am not sure how well CVS necessarilly scales to very large implementations like in the presentation you mentioned. I have heard complaints about tagging speed too, though I haven't done any investigation in that matter myself. A "modern" version control system (something like, say, Rational, PVCS Dimensions, or any of the others) uses a database to hold all their changes, and keeps their archive in a format with more checkpoints, as well as a data representation of what the revision history tree for any given file looks like, that allow it to make more intelligent decisions regarding how to parse the changes as fast as possible when differencing two files, pulling out a specific version, etc. They pay 2 penalties for doing their versioning in this manner: 1) unless you have 1000+ revisions of a single file, the database access penalty exceeds the time to just perform a linear algorithm on the file to get what you want -- the "overkill" factor. And 2) storing full checkpoints is space intensive. However, the size of disk drives is definitely growing faster than our ability to fill them with lines of source code. (I do not write 5x more lines now than I did 2 years ago) With space getting close to free for most people, there is little reason not to use a more space intensive algorithm that saves CPU cycles. A simple database with hash-based searches, something like MySQL, can still sort millions of entries in a matter of seconds on modest hardware. (I wrote a bug tracking database for fun one spring break (like bugzilla, but a bit simpler) and with 500K bugs in the database, each bug having 3-8 events attached to it. It took me about 40 seconds on a P3-450 running linux to sort the database ordered by priority then ordered by age to the second, then return the top 200 results to me via a web page... 
If anyone else has thoughts on this, or any CVS performance numbers, I'd be interested to hear them too. I am not trying to suggest that CVS isn't a great package; I would just need to see some proof that it could handle an NT-esque development team before I would try to implement it in one.

--eric

> -----Original Message-----
> From: Kari Hoijarvi [mailto:hoijarvi at me.wustl.edu]
> Sent: Saturday, March 23, 2002 9:39 AM
> To: cvsnt
> Subject: [Cvsnt] CVS(NT) with huge repositories?
>
> There is an interesting slide show about NT development:
> http://www.usenix.org/events/usenix-win2000/invitedtalks/lucovsky_html/
> especially the slide about version control:
> http://www.usenix.org/events/usenix-win2000/invitedtalks/lucovsky_html/sld015.htm
>
> I wonder if CVS(NT) actually could handle it? They wrote their own
> tool. I was on the Outlook team 1995-1998 and used an earlier version
> of it. It was able to handle Office 2000 fairly well, maybe 1/6th of
> the size of Win2000. My experience with the synchronization effort is
> consistent with that, about 15 minutes vs. 2 hours.
>
> Has someone benchmarked CVS(NT) with 200 projects (250 MB each) and
> about 1000 updated files per day?
>
> Kari

_______________________________________________
Cvsnt mailing list
Cvsnt at cvsnt.org
http://www.cvsnt.org/cgi-bin/mailman/listinfo/cvsnt