|| When the going gets tough, the tough get empirical. ||

So, we have a few prototypes, now what? How do we choose?

candidates

All candidates should run with blob support and appropriate block size, since these could potentially have non-linear effects.

A big question is how much tuning should be done on these. The goal is to get an idea of how each technique would perform in a "proper implementation", but without spending ages optimizing each possibility before we even know whether it is worth it. So we should do the things that seem likely to make a qualitative difference. E.g., it's probably worth making sure that netsync is smart enough to avoid reconstructing versions when that is unnecessary.

A big question: is it worth implementing the lookup-chains-by-rowid optimizations that drh suggested for these tests? In principle they should not affect locality much, since we are not VACUUMing, but avoiding traversing indexes should be a win of some magnitude.
 * When profiling checkouts and pulls, there doesn't seem to be a significant amount of (wait-)time spent in things that look index-related for sqlite. (An exception is when the database hasn't been ANALYZEd recently.) It's possible that I'm not looking at the right call stacks, though. - Matt Johnston

testing methodology

We should test both full pulls, and repeated incremental pulls, since the various methods may interact in complicated ways with repeated incremental pulls. (These can screw up disk locality in various ways, cause non-optimal ordering for forced delta linearization like classic revlogs do, etc.)

So, for each test dataset, we have a single db. Call this PRISTINE-0. From this, we use db kill_rev_locally repeatedly to produce smaller and smaller dbs, PRISTINE-1, PRISTINE-2, PRISTINE-3, etc. PRISTINE-(n+1) is produced from a copy of PRISTINE-n by repeatedly running automate leaves, picking a random leaf, and running db kill_rev_locally on it. I'm not sure how many revs we should remove each time, or how many of these sets we should generate. Note that the randomness here impacts reproducibility; either save these databases so everyone can re-run the tests using the same dbs, or specify the random seed used.
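The pruning step could be scripted roughly as follows. This is only a sketch, not an agreed-upon tool: the binary name mtn, the helper names (run_mtn, prune_leaves, make_next), and the choice to sort the leaves before picking one are all assumptions. Sorting is there so that a given seed always selects the same revisions on the same Python version.

```python
import random
import shutil
import subprocess


def run_mtn(db, *args):
    """Run one monotone command against `db` and return its stdout."""
    # Assumption: the binary is called "mtn"; older builds install "monotone".
    result = subprocess.run(["mtn", "--db=" + db, *args],
                            check=True, capture_output=True, text=True)
    return result.stdout


def prune_leaves(db, count, rng, run=run_mtn):
    """Kill up to `count` randomly chosen leaf revisions, one at a time.

    `rng` is a random.Random instance, so the caller controls the seed.
    `run` is injectable so the selection logic can be exercised without
    a real database. Returns the killed revision ids in order.
    """
    killed = []
    for _ in range(count):
        # "automate leaves" prints one revision id per line.
        leaves = sorted(run(db, "automate", "leaves").split())
        if not leaves:
            break  # nothing left to remove
        victim = rng.choice(leaves)
        run(db, "db", "kill_rev_locally", victim)
        killed.append(victim)
    return killed


def make_next(src, dst, count, seed):
    """Copy PRISTINE-n (src) to PRISTINE-n+1 (dst), then prune the copy."""
    shutil.copyfile(src, dst)
    return prune_leaves(dst, count, random.Random(seed))
```

Something like make_next("PRISTINE-0.db", "PRISTINE-1.db", 50, seed=1) would then produce the next database; the file names and the 50 are placeholders, since the number of revs to remove per step is still an open question. Recording the seed together with the returned list of killed revisions is one way to make a run repeatable.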

Up until this point, everything can be done with any mainline version of monotone.

Now, for each n, serve up PRISTINE-n using mainline, and pull it into a fresh db using the version under test. Call the result TEST-n. These are the databases we will actually use for testing.

Now, measure the following things:
 * Pulling TEST-0 into a fresh database. Measure server CPU time, client CPU and real time, and bytes transferred. Measure the size of the resulting db (both du and du -a, for the revlog case), the time for a cold-cache checkout, and the time for a hot-cache checkout. Also measure the time for a cold-cache log --diffs --last=20, and the same on a hot cache. (To do a cold-cache checkout on Linux, put the database on a separate partition, then umount that partition, re-mount it, and immediately run a checkout from it.)
 * Pulling each of the TEST-n databases into a single database, to simulate a user tracking a project by doing a daily incremental pull. Measure the size of the resulting db (du and du -a again), the time for a cold-cache checkout, the time for a hot-cache checkout, and hot and cold log --diffs --last=20. (Maybe measure the time each pull takes too?) Finally, measure the server CPU time and the client CPU and real time for doing a fresh pull from the incrementally pulled database. (This simulates the db a server would generally have, produced by incremental pushes.)
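A minimal sketch of how the client-side timings above might be collected, assuming a Unix host and Python. The function names measure and remount are made up for illustration. remount assumes the partition has an fstab entry and that the caller is allowed to mount it. Server CPU time is not covered here; it would have to be collected separately for the server process.

```python
import resource
import subprocess
import time


def measure(cmd):
    """Run `cmd` to completion; return (real_seconds, cpu_seconds).

    cpu_seconds is user + system time charged to child processes that
    finished during the call, so nothing else should be spawned meanwhile.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = ((after.ru_utime - before.ru_utime)
           + (after.ru_stime - before.ru_stime))
    return real, cpu


def remount(mountpoint):
    """Unmount and re-mount the partition so its files leave the page cache."""
    # Assumption: `mountpoint` is listed in /etc/fstab, so a bare
    # "mount <mountpoint>" is enough to bring it back.
    subprocess.run(["umount", mountpoint], check=True)
    subprocess.run(["mount", mountpoint], check=True)
```

A cold-cache number would come from calling remount on the database's partition and then immediately passing the checkout command to measure; running the same command a second time gives the hot-cache number. The command is passed in as a list, so the exact monotone invocation stays up to whoever writes the test script.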

data sets

Some interesting ones:
 * net.venge.monotone*
 * net.venge.monotone* --exclude net.venge.monotone.contrib*
 * xaraya core, modules, themes, languages -- http://mt.xaraya.com/ (many checkins, not so large a tree, I think)
 * openembedded -- http://ewi546.ewi.utwente.nl/OE/OE.db.bz2 (large tree, fair number of checkins)
 * CTWM -- http://ctwm.free.lp.se/monotone-crash-course.html (tiny all around)
 * something big in both tree and history... anyone got a kernel import around to play with?

questions

It is more important to get some numbers than to get perfect numbers, so probably the most important thing is to get the script working, and then to get the candidates basically working, and the last priority is getting super-good data sets.