WIP: feat(storage): gate streaming cache refill locality #25352
Conversation
Pull request overview
This PR introduces streaming-only locality gating for Hummock cache refill by deriving a live “streaming table/vnode view” from read_version_mapping, then using that view to skip meta-cache refills and filter non-L0 data-cache refill units that don’t overlap local streaming ownership.
Changes:
- Add a streaming-only refill context/view built from live `read_version_mapping`.
- Gate meta cache refill by streaming table overlap and add locality metrics.
- Gate non-L0 data cache refill units by streaming table/vnode overlap and add tests for locality filtering behavior.
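The "streaming table/vnode view" described above can be sketched as follows. This is a hypothetical stand-alone model, not the PR's code: `u32` table ids and `Vec<bool>` vnode bitmaps stand in for the crate's `TableId` and `Bitmap` types, and `build_streaming_view` is an invented name.

```rust
use std::collections::HashMap;

// Hypothetical sketch: derive a table -> vnode-ownership view from a
// `read_version_mapping`-like input. Each entry is (table_id, vnode bitmap)
// for one local streaming read version; ownership is unioned per table.
fn build_streaming_view(read_version_mapping: &[(u32, Vec<bool>)]) -> HashMap<u32, Vec<bool>> {
    let mut view: HashMap<u32, Vec<bool>> = HashMap::new();
    for (table_id, vnodes) in read_version_mapping {
        view.entry(*table_id)
            .and_modify(|acc| {
                // Union vnode ownership across read versions of the same table.
                for (a, b) in acc.iter_mut().zip(vnodes.iter()) {
                    *a |= *b;
                }
            })
            .or_insert_with(|| vnodes.clone());
    }
    view
}
```

A table absent from this view has no local streaming ownership, which is what lets the refiller skip its cache refills entirely.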
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/storage/src/hummock/event_handler/refiller.rs | Adds streaming locality view, meta/data refill locality gates, metrics, and unit tests. |
| src/storage/src/hummock/event_handler/hummock_event_handler.rs | Wires read_version_mapping into CacheRefiller::new to power locality gating. |
Comments suppressed due to low confidence (1)
src/storage/src/hummock/event_handler/refiller.rs:726
`skip_recent_filter` used to force the full (non-inheritance) refill path for non-L0 deltas; after this change only L0 uses `data_file_cache_refill_full_impl`. With `skip_recent_filter = true`, non-L0 refills now depend on parent SST metas being present in the meta cache (`sstable_cached` hits), otherwise `get_units_to_refill_by_inheritance` can yield no units and the refill becomes a no-op. If `skip_recent_filter` is intended as a debugging/single-node "refill regardless of recent filter" mode (per config docs), consider preserving the previous full-refill behavior for non-L0 (while still applying the streaming locality filter), or update the option/doc semantics accordingly.
```rust
if delta.insert_sst_level == 0 {
    Self::data_file_cache_refill_full_impl(context, delta, holders).await;
} else {
    Self::data_file_cache_impl(context, delta, holders).await;
}
```
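The dispatch the comment proposes to preserve can be modeled in a stand-alone stub. The names mirror the snippet above, but this is an illustrative sketch, not the real `CacheRefiller` code:

```rust
// Illustrative model of the suggested dispatch: `skip_recent_filter` keeps
// forcing the full refill path even for non-L0 deltas, while L0 always
// takes it. Non-L0 without the flag uses the unit-based, locality-filtered path.
#[derive(Debug, PartialEq)]
enum RefillPath {
    Full,        // stands for data_file_cache_refill_full_impl
    Inheritance, // stands for data_file_cache_impl
}

fn choose_refill_path(insert_sst_level: u32, skip_recent_filter: bool) -> RefillPath {
    if insert_sst_level == 0 || skip_recent_filter {
        RefillPath::Full
    } else {
        RefillPath::Inheritance
    }
}
```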
```rust
let block_smallest_key =
    FullKey::decode(&sstable.meta.block_metas[block_index].smallest_key)
        .to_vec()
        .into_bytes();
let table_id = block_smallest_key.user_key.table_id;
let Some(streaming_bitmap) = self.streaming.get(&table_id) else {
    return false;
};

let block_largest_key =
    if let Some(next_block_meta) = sstable.meta.block_metas.get(block_index + 1) {
        if next_block_meta.table_id() != table_id {
            // The next block belongs to another table, so we no longer have a safe upper
            // bound for vnode_range. Admit to avoid false negative filtering.
            return true;
        }
        FullKey::decode(&next_block_meta.smallest_key)
            .to_vec()
            .into_bytes()
    } else {
        let largest_key = FullKey::decode(&sstable.meta.largest_key)
            .to_vec()
            .into_bytes();
        if largest_key.user_key.table_id != table_id {
            // Multi-table SSTs can end with another table's largest key. Admit when the local
            // table is present to avoid false negative filtering on the boundary block.
            return true;
        }
        largest_key
    };
```
This locality gate decodes and clones full keys (`FullKey::decode(...).to_vec().into_bytes()`) for each block being considered, which adds allocations and CPU overhead on the refill hot path. If possible, avoid allocating by decoding into a borrowed form or extracting just the table_id/vnode range from the encoded key bytes (e.g., reuse existing helpers like `BlockMeta::table_id()` and parse vnode/table_key without `to_vec()`).
Suggested change:

```rust
let block_meta = &sstable.meta.block_metas[block_index];
let table_id = block_meta.table_id();
let Some(streaming_bitmap) = self.streaming.get(&table_id) else {
    return false;
};
let block_smallest_key = FullKey::decode(&block_meta.smallest_key);
let block_largest_key =
    if let Some(next_block_meta) = sstable.meta.block_metas.get(block_index + 1) {
        if next_block_meta.table_id() != table_id {
            // The next block belongs to another table, so we no longer have a safe upper
            // bound for vnode_range. Admit to avoid false negative filtering.
            return true;
        }
        FullKey::decode(&next_block_meta.smallest_key)
    } else {
        let largest_key = FullKey::decode(&sstable.meta.largest_key);
        if largest_key.user_key.table_id != table_id {
            // Multi-table SSTs can end with another table's largest key. Admit when the local
            // table is present to avoid false negative filtering on the boundary block.
            return true;
        }
        largest_key
    };
```
```diff
         .sum::<u64>(),
     );

-    if delta.insert_sst_level == 0 || context.config.skip_recent_filter {
+    if delta.insert_sst_level == 0 {
         Self::data_file_cache_refill_full_impl(context, delta, holders).await;
     } else {
         Self::data_file_cache_impl(context, delta, holders).await;
```
Because `meta_cache_refill` now filters by streaming locality, it can return an empty holders list even when `delta.insert_sst_infos` is non-empty. `data_cache_refill` will still proceed into the non-L0 path and do parent-meta cache lookups / unit computation with an empty `holders`, which is wasted work. Consider adding an early return in `data_cache_refill` when `holders.is_empty()` (after the existing basic guards).
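The suggested guard can be sketched in isolation. This is a hypothetical stub, not the crate's code: `u64` stands in for the real table-holder type, and the body is a placeholder for the inheritance-unit computation.

```rust
// Hypothetical early-return guard: if streaming locality gating filtered
// every SST out of `holders`, skip the non-L0 unit computation entirely
// instead of doing parent-meta lookups that can yield nothing.
fn units_to_refill(holders: &[u64]) -> Vec<u64> {
    if holders.is_empty() {
        // Nothing survived the locality gate; avoid the wasted work.
        return Vec::new();
    }
    // Placeholder for the real get_units_to_refill_by_inheritance(...) work.
    holders.to_vec()
}
```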
```rust
    Bound::Excluded(block_largest_key.user_key.table_key),
);
let vnode_range = vnode_range(&table_key_range);
let bitmap = Bitmap::from_range(streaming_bitmap.len(), vnode_range.0..vnode_range.1);
```
`check_table_refill_vnodes` builds a `Bitmap::from_range(streaming_bitmap.len(), vnode_range.0..vnode_range.1)`, but `vnode_range` is explicitly "vnode count unaware" and can legally return an end bound larger than the actual vnode bitmap length. That will trigger `Bitmap::from_range` assertions and cause the locality gate to panic/log-warn and fail-open. Consider clamping the computed vnode range to `0..=streaming_bitmap.len()` (or computing the intersection without constructing a bitmap) so this path never panics in normal operation.
Suggested change:

```rust
let vnode_count = streaming_bitmap.len();
let clamped_start = vnode_range.0.min(vnode_count);
let clamped_end = vnode_range.1.min(vnode_count);
if clamped_start >= clamped_end {
    return false;
}
let bitmap = Bitmap::from_range(vnode_count, clamped_start..clamped_end);
```
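A self-contained version of this clamping logic, with a plain `&[bool]` standing in for the crate's `Bitmap` type and an invented function name, shows that an out-of-bounds range degrades to "no overlap" rather than panicking:

```rust
// Illustrative clamp-then-check: the vnode range end may legally exceed the
// bitmap length, so clamp both bounds to the vnode count before indexing.
fn overlaps_streaming(streaming_bitmap: &[bool], vnode_range: (usize, usize)) -> bool {
    let vnode_count = streaming_bitmap.len();
    let start = vnode_range.0.min(vnode_count);
    let end = vnode_range.1.min(vnode_count);
    if start >= end {
        // Empty (or fully out-of-range) range: no overlap, no panic.
        return false;
    }
    // Any locally-owned vnode inside the clamped range counts as overlap.
    streaming_bitmap[start..end].iter().any(|&owned| owned)
}
```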
Summary

`read_version_mapping`

Relation to #24452

This is the streaming-only first-stage split of the table refill gating work.

Included in this PR:

`read_version_mapping`

Explicitly not included:

Role

Notes

- Uses `read_version_mapping` inside the refiller instead of maintaining a separate long-lived cache
- `vnode_statistics` for whole-SST rejection

Testing
- `cargo fmt --all`
- `cargo test -p risingwave_storage refiller::tests -- --nocapture`
- `cargo test -p risingwave_storage --no-run`