Conversation
Implement the subnet_pool_utilization_view endpoint, which was previously a stub returning 500. The implementation follows the IP pool utilization pattern: fetch subnet CIDRs from the database, compute sizes in Rust using u128 arithmetic, and return remaining/capacity as f64.

Changes:

- Add a new API version (SUBNET_POOL_UTILIZATION_REMAINING) that changes SubnetPoolUtilization from allocated/capacity to remaining/capacity, matching IpPoolUtilization.
- Add datastore methods: subnet_pool_utilization, helpers for fetching member and external subnet CIDRs, and accumulate_subnet_sizes.
- Wire up the app layer to call the datastore and compute remaining.
- Add datastore unit tests for IPv4 and IPv6 pools.
- Add integration tests via an assert_subnet_pool_utilization helper.
- Update verify_endpoints from GetUnimplemented to Get.
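The size computation described above (u128 in Rust, f64 only at the API edge) can be sketched roughly as follows; `subnet_size` is an illustrative name for this sketch, not the actual omicron helper:

```rust
// Illustrative sketch only: compute the address count of a CIDR block in
// u128, converting to f64 only at the API boundary.
fn subnet_size(is_ipv4: bool, prefix_len: u8) -> u128 {
    let bits: u32 = if is_ipv4 { 32 } else { 128 };
    let host_bits = bits - prefix_len as u32;
    // A /0 IPv6 subnet would need 2^128, which overflows u128; this sketch
    // saturates rather than handling that edge case properly.
    1u128.checked_shl(host_bits).unwrap_or(u128::MAX)
}

fn main() {
    assert_eq!(subnet_size(true, 28), 16); // IPv4 /28
    assert_eq!(subnet_size(false, 48), 1u128 << 80); // IPv6 /48
    // Round to f64 only when building the API response.
    let capacity = subnet_size(false, 48) as f64;
    println!("capacity = {capacity}");
}
```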
Force-pushed from b699fb4 to 4e58295.
```rust
    }
    Ok(count)
}
```
Compare to IP pool utilization: omicron/nexus/db-queries/src/db/datastore/ip_pool.rs, lines 864 to 986 in b2d7cde.
```rust
fn from(new: SubnetPoolUtilization) -> Self {
    Self { allocated: new.capacity - new.remaining, capacity: new.capacity }
}
```
This is silly and I think loses precision, but the endpoint was unimplemented, so there can't be anyone who is relying on the current version.
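A quick standalone illustration of the precision loss mentioned here (the numbers are made up): once both values have been rounded to f64, deriving allocated as capacity minus remaining can erase small allocations in a large IPv6 pool entirely.

```rust
// Illustrative only: computing allocated = capacity - remaining after
// both have been rounded to f64 can lose small allocations.
fn main() {
    let capacity_u128: u128 = 1u128 << 80; // one IPv6 /48 pool
    let remaining_u128 = capacity_u128 - 16; // 16 addresses allocated

    let capacity = capacity_u128 as f64;
    let remaining = remaining_u128 as f64; // rounds back up to exactly 2^80

    let allocated = capacity - remaining;
    assert_eq!(allocated, 0.0); // the 16 allocated addresses vanished
}
```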
```rust
    let capacity = (1u128 << 80) as f64;
    assert_subnet_pool_utilization(client, SUBNET_POOL_NAME, 0.0, capacity)
        .await;
}
```
Compare to omicron/nexus/tests/integration_tests/ip_pools.rs, lines 939 to 1026 in b2d7cde.
```rust
    db.terminate().await;
    logctx.cleanup_successful();
}
```
Compare to omicron/nexus/db-queries/src/db/datastore/ip_pool.rs, lines 2784 to 3039 in b2d7cde.
```rust
    };
    let remaining = remaining as f64;
    let capacity = capacity as f64;
    Ok(subnet_pool_types::SubnetPoolUtilization { remaining, capacity })
```
Compare to the below. I noticed this is in a utilization file instead of the IP pools file. Not sure that really makes sense, but I'll probably try to make it consistent.
omicron/nexus/src/app/utilization.rs, lines 35 to 65 in db6f6b0.
mergeconflict left a comment
Sorry for the very late review!
```rust
    .filter(subnet_pool_member::subnet_pool_id.eq(pool_id))
    .filter(subnet_pool_member::time_deleted.is_null())
    .select(subnet_pool_member::subnet)
    .limit(10000)
```
I have no idea how likely we are to ever hit the limit(10000) here (and below in subnet_pool_list_allocated_subnets_on_connection), but if we do, we'll start missing stuff.
I wonder if it's possible to do this DB work in a more pipelined fashion so that we don't have to return a giant Vec<IpNet>. Like, would it be possible to join subnet_pool_member with external_subnet, paginate that query, and keep a running tally of the allocations? Or maybe better, is it possible to express the whole thing as SQL so you can just get a pair of scalar values back from the query?
There's a reason it's not in SQL, but I don't remember what it is. Let me find the PR where I added the IP pool utilization logic.
Found it: #5258 (comment)
> neither CRDB nor Postgres support subtraction of IPv6 addresses when the result is greater than a 64 bit integer, and we couldn't figure out how to convert the addresses to 128 bit integers first and then subtract
I'll let the robots have a shot at it, but most likely you can't do it.
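For what it's worth, the limitation quoted above is DB-side only; in Rust the subtraction is trivial, since `Ipv6Addr` converts losslessly to `u128`. A minimal sketch:

```rust
use std::net::Ipv6Addr;

// Illustrative: IPv6 address subtraction with a result wider than 64 bits,
// which the DB can't do but u128 arithmetic handles directly.
fn main() {
    let start: Ipv6Addr = "fd00::".parse().unwrap();
    let end: Ipv6Addr = "fd00:0:0:1::".parse().unwrap();
    let size = u128::from(end) - u128::from(start);
    // 2^64 addresses: does not fit in any 64-bit integer type.
    assert_eq!(size, 1u128 << 64);
}
```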
We needed subtraction in the IP Range case because they are generic (start, end) tuples. We don't need that here. We do need addition, but since we're working on subnets, which are explicitly powers-of-two, that gives us some options. We can either:
- Convert the sizes to floats in the DB and sum that way. We lose precision, but we already lose precision in the API response anyway.
- Keep track of everything as powers of two, and then do the final exponentiation in Rust where we have a native 128-bit type. There are probably a few ways we could do that, such as keeping track of the count of all subnet sizes / mask lengths grouped by the subnet size. There are at most 128 values to keep track of here.
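The second option can be sketched in Rust roughly like this (IPv6-only and with illustrative names; real code would also handle IPv4 and the /0 edge case, which checked_shl turns into None here):

```rust
// Illustrative: given (prefix_len, count) pairs from a GROUP BY query,
// sum the total address count exactly using checked u128 arithmetic.
fn total_addresses(counts_by_prefix: &[(u8, u64)]) -> Option<u128> {
    let mut total: u128 = 0;
    for &(prefix_len, count) in counts_by_prefix {
        // Each IPv6 /n subnet holds 2^(128 - n) addresses; a /0 would
        // overflow u128, so checked_shl returns None for it.
        let size = 1u128.checked_shl(128 - prefix_len as u32)?;
        total = total.checked_add(size.checked_mul(count as u128)?)?;
    }
    Some(total)
}

fn main() {
    // Two /48s and one /64: 2 * 2^80 + 2^64 addresses, exactly.
    let total = total_addresses(&[(48, 2), (64, 1)]).unwrap();
    assert_eq!(total, 2 * (1u128 << 80) + (1u128 << 64));
}
```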
This also raises a question IMO: do we want to keep the subnet sizes in the response? For IP Pool utilization, the unit is always a single address, so counting 1s is correct. Here, someone might care about how the members in a pool are broken down as a function of the sizes of those subnets. It would tell you something about fragmentation, which is part of why we added the min / max prefix lengths in the first place.
Do you have an idea of how to represent fragmentation in a useful way without listing everything there is? It also matters how it's packed, right? Not just the sizes. Like maybe what you really want is the size of the largest gap or something.
Claude likes option 1 a lot (it ran a bunch of SQL to test it), will try it.
Details
OK, here's the full picture. Three viable approaches for eliminating the limit(10000):
Approach 1: Pure float in SQL — return two scalar f64 values directly:

```sql
SELECT
  (SELECT SUM(pow(2::FLOAT8, (CASE WHEN family(subnet) = 4 THEN 32 ELSE 128 END
      - masklen(subnet))::FLOAT8))
   FROM subnet_pool_member WHERE ...) AS capacity,
  (SELECT SUM(pow(2::FLOAT8, ...))
   FROM external_subnet WHERE ...) AS allocated;
```

This works and returns exactly the (f64, f64) that the API needs (SubnetPoolUtilization is { allocated: f64, capacity: f64 }). The precision loss is real — a /16 (65536 addresses) vanishes when summed with two /48s — but since the API is already f64, this doesn't make things any worse than what the client sees. The imprecision exists today too, just introduced at a different point (Rust u128 → f64 in the API response). This is the simplest option and eliminates both the limit and the round-trip of row data entirely.
Approach 2: GROUP BY masklen, compute in Rust — your colleague's second suggestion:

```sql
SELECT 'member' AS source, family(subnet) AS ip_ver, masklen(subnet) AS prefix_len, count(*) AS cnt
FROM subnet_pool_member WHERE ...
GROUP BY family(subnet), masklen(subnet)
UNION ALL
SELECT 'allocated', family(subnet), masklen(subnet), count(*)
FROM external_subnet WHERE ...
GROUP BY family(subnet), masklen(subnet);
```

Returns at most ~258 rows (129 possible prefix lengths × 2 sources). Rust then does sum(cnt * 2^(bits - prefix_len)) with u128 arithmetic. This is exact and eliminates the limit, but requires a custom Diesel query and more Rust-side processing.
Approach 3: Paginate the existing queries. Replace limit(10000) with proper pagination. Most complex, least benefit — you'd still be fetching all the rows and accumulating in Rust.
Given that the API type is already f64, approach 1 seems like the obvious winner: one SQL query, two scalar values, no limit, no row transfer, and no precision regression relative to what the API already exposes. The only argument for approach 2 is if you wanted to keep exact u128 internally for some reason (e.g., logging, assertions, or a future API change away from f64).
> Do you have an idea of how to represent fragmentation in a useful way without listing everything there is? It also matters how it's packed, right? Not just the sizes. Like maybe what you really want is the size of the largest gap or something.
How it's packed does matter, yeah. Ideally you'd want to know how many "slots" there are of each legal size, but that might not be worth it. The largest gap could be enough. We're getting solidly into the "fancy" and "subjective" realm, but you could imagine a "fragmentation index". Something like the ratio of the number of subnets there actually are to the smallest number that could "cover" the same portion of the address space. For example, if you had a bunch of small subnets spaced throughout, those could be compacted into one larger one. 1 - (minimum subnets to cover / actual subnets that cover) would give you an index in [0, 1] with higher numbers being more fragmented.
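The proposed index is simple enough to sketch. The cover counts below are hypothetical inputs; computing the true minimum cover from the actual subnets is the hard part this sketch skips entirely:

```rust
// Illustrative: 1 - (minimum subnets that could cover the allocated space
// / subnets actually used), giving a value in [0, 1] where higher means
// more fragmented.
fn fragmentation_index(actual_subnets: u32, minimum_cover: u32) -> f64 {
    1.0 - (minimum_cover as f64 / actual_subnets as f64)
}

fn main() {
    // Four aligned, contiguous /26s could be compacted into one /24.
    assert_eq!(fragmentation_index(4, 1), 0.75);
    // A single subnet is maximally compact.
    assert_eq!(fragmentation_index(1, 1), 0.0);
}
```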
I admit this is pretty esoteric, and I'm not sure what you do with it. The next question is going to just be, "Ok, how can I rejigger this super-fragmented pool?", which requires looking at the actual allocated external subnets.
All that's to say, converting to floats in the DB seems fine to me too at this point.
Got it in c620b02. All existing tests pass. Huge victory for the robot — I don't think I could have done this. First it wrote straight SQL, and that was fine. Then I had it convert it to proper Diesel and it was even better.
Add assert_subnet_pool_utilization calls to verify that utilization tracks correctly as subnets are allocated/deleted and pool members are added/removed.

external_subnets.rs (allocated utilization):

- external_subnet_basic_crud: 0/256 before, 16/256 after /28, 0/256 after delete
- external_subnet_pagination: 160/256 after 10 /28s
- attach_test_impl (running + stopped): 16/256 after /28
- cannot_attach_too_many_subnets: MAX*4/256, then (MAX+1)*4/256
- external_subnet_create_name_conflict: 16/256 after first, still 16/256 after failed dup
- test_limited_collaborator_external_subnet_lifecycle: 16/256 after create, 0/256 after delete

subnet_pools.rs (capacity utilization):

- basic_subnet_pool_member_crd: 0 initially, grows with 100 /48 members, shrinks after removing one

Add a comment to the dedicated utilization tests pointing to the external subnet tests for allocated coverage.
Force-pushed from 708d46f to 45d1151.
oxidecomputer/omicron#10157 implements this on the API side. Because subnet pools don't have a type field like IP pools, I had to add last modified time to the properties table on the details page to get an even number of cells.

<img width="1122" height="401" alt="image" src="https://github.com/user-attachments/assets/4d7b0f48-67d5-4b54-945f-a2343be471f5" />
<img width="1116" height="245" alt="image" src="https://github.com/user-attachments/assets/825e445d-46e7-4a7c-bccf-22ea2e24f0e1" />
Closes #10109
Follows the same pattern as IP pool utilization. Counting addresses feels a little weird, since subnets are blocks of addresses, but there isn't really another unit we can use.
- `accumulate_subnet_sizes` sums subnet sizes using checked 128-bit arithmetic, analogous to `accumulate_ip_range_sizes`.
- `remaining = capacity - allocated` is computed in u128 before converting to f64, matching `ip_pool_utilization_view`.
- The response is `{ remaining, capacity }` (both f64), matching `IpPoolUtilization`.
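A minimal sketch of that remaining/capacity computation, subtracting in u128 first and rounding to f64 only for the response; the `utilization` helper name is illustrative, not the actual function:

```rust
// Illustrative: subtract exactly in u128, then convert each value to f64
// at the API edge. checked_sub guards against allocated > capacity.
fn utilization(capacity: u128, allocated: u128) -> Option<(f64, f64)> {
    let remaining = capacity.checked_sub(allocated)?; // exact in u128
    Some((remaining as f64, capacity as f64)) // round only here
}

fn main() {
    let capacity = 1u128 << 80; // one IPv6 /48 member
    let allocated = 3 * (1u128 << 64); // three /64s allocated
    let (remaining, cap) = utilization(capacity, allocated).unwrap();
    assert_eq!(cap, (1u128 << 80) as f64);
    assert!(remaining < cap);
}
```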