The job of the Leveled Compaction Strategy (LCS) is to maintain this structure while keeping L0 empty.

Let’s now explain why LCS fulfills its ambition to provide low space amplification, and therefore solves STCS’s main problem.
In the previous post, we saw that space amplification comes in two varieties: the first is temporary disk space used during compaction, and the second is space wasted by storing different values for the same overwritten rows.
In other words, a run is a collection of sstables with non-overlapping token ranges.
The benefit of using a run of fragments (small sstables) instead of one huge sstable is that with a run, we can compact only parts of the huge sstable instead of all of it.
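To make the idea of a run concrete, here is a minimal Python sketch. The `SSTable` class, its `first_token` and `last_token` fields, and both helper functions are hypothetical names chosen for illustration; they are not Scylla's or Cassandra's actual data structures. The sketch checks that a set of sstables forms a run, and then selects only the fragments that intersect a given token range, which is what lets a compaction touch part of the data instead of all of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable: only its token range matters here."""
    first_token: int
    last_token: int  # inclusive


def is_run(sstables):
    """Return True if no two sstables have overlapping token ranges."""
    ordered = sorted(sstables, key=lambda s: s.first_token)
    # After sorting by first token, checking neighbours is enough.
    return all(a.last_token < b.first_token for a, b in zip(ordered, ordered[1:]))


def fragments_overlapping(run, first_token, last_token):
    """Pick only the fragments of the run that intersect the given token range."""
    return [s for s in run
            if s.first_token <= last_token and first_token <= s.last_token]


run = [SSTable(0, 99), SSTable(100, 199), SSTable(200, 299)]
print(is_run(run))
print(fragments_overlapping(run, 120, 180))
```

In this toy example, a compaction covering tokens 120 to 180 needs to read and rewrite only the middle fragment, while the other two fragments are left untouched.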
With LCS, this problem is gone, as we can see in the graph below. Compaction does require some temporary space, as evidenced by the green spikes in the graph, but these spikes are much smaller than the purple (STCS) spikes and are not proportional to the amount of data in the database. Note that in this test, we lowered the LCS sstable size from the default 160 MB to 10 MB, so that we can meaningfully demonstrate LCS on this relatively small data set.
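As a rough back-of-the-envelope sketch of why the LCS spikes do not grow with the data set: assuming each compaction step involves roughly 11 fixed-size sstables (the figure used in this post), the temporary space depends only on the sstable size. The function name and its parameters below are hypothetical and exist only for this illustration.

```python
def lcs_temp_space_mb(sstable_size_mb, sstables_per_step=11):
    """Approximate temporary space of one LCS compaction step, in MB.

    Assumes a step touches about `sstables_per_step` sstables of a fixed
    size; note that the total amount of data does not appear in the formula.
    """
    return sstables_per_step * sstable_size_mb


for size_mb in (160, 10):
    print(f"sstable size {size_mb} MB -> about "
          f"{lcs_temp_space_mb(size_mb)} MB of temporary space")
```

With the default 160 MB sstables this gives 1760 MB, and with the 10 MB sstables used in this test it gives 110 MB, regardless of how much data the database holds.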
LCS does not have the temporary disk space problem which plagued STCS: while STCS may need to do huge compactions and temporarily have both input and output on disk, LCS always does small compaction steps, involving roughly 11 input and output sstables of a fixed size. This means we may need roughly 11*160 MB, less than 2 GB, of temporary disk space, not half the disk as in STCS.

The second example we saw in the previous post was an overwrite workload, where the same 1.2 GB data set was written over and over, 15 times. With LCS, the space wasted on overwritten data stays low as well. The reason is that most of the data is stored in the biggest level, and since this level is a run (with different sstables having no overlap), we cannot have any duplicates inside this run. The best case for LCS is that the last level is filled. For example, if the last level is L3, it has 1000 sstables. In this case, L2 and L1 together have just 110 sstables, compared to 1000 sstables in L3.

LCS actually has a worst case where we can get 2-fold space amplification. This happens when the last level is not filled, but rather only filled as much as the previous level. The paper Optimizing Space Amplification in RocksDB suggests that this can be fixed by changing the level sizes so that instead of insisting that L3 has exactly 1000 sstables, we focus on L3 having 10 times more sstables than L2. Neither Scylla nor Cassandra has this fix yet, so in the worst case, during massive overwrites, their LCS may still have a space amplification of 2.

This post and the rest of this series are based on a talk that I gave (with Raphael Carvalho) at the last annual Scylla Summit in San Francisco. The video and slides for the talk are available on our Tech Talk page.
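The best-case and worst-case level fills described above can be turned into a small calculation. The sketch below is a simplified model, not code from Scylla, Cassandra, or RocksDB: it assumes the last level is a run with no duplicates, and that in the worst case every sstable in the smaller levels only overwrites data already in the last level. The function names are hypothetical, and `targets_from_last_level` is only a loose illustration of sizing levels relative to the last one, not the algorithm from the paper.

```python
FANOUT = 10  # each level holds 10 times more sstables than the previous one


def space_amp_bound(sstables_per_level):
    """Upper bound on space amplification for levels [L1, L2, ..., Llast].

    The last level is a run, so it holds no duplicates. In the worst case
    every sstable in the smaller levels only overwrites data in the last level.
    """
    return sum(sstables_per_level) / sstables_per_level[-1]


def targets_from_last_level(last_level_count, num_levels, fanout=FANOUT):
    """Size the smaller levels relative to the actual last level (illustrative)."""
    return [last_level_count / fanout ** (num_levels - 1 - i)
            for i in range(num_levels)]


best = [10, 100, 1000]   # last level (L3) is full
worst = [10, 100, 100]   # L3 holds only as much as L2
adjusted = targets_from_last_level(100, 3)  # levels sized from L3's real size

print(round(space_amp_bound(best), 2))
print(round(space_amp_bound(worst), 2))
print(round(space_amp_bound(adjusted), 2))
```

Under these assumptions the bound is about 1.11 when L3 is full, and roughly 2 when L3 is only as full as L2 (2.1 if the small L1 is counted too). Sizing the smaller levels from the actual size of the last level brings the bound back to about 1.11 in this model.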