
Integration

Integrating ColonyOS with a satellite constellation raises problems that do not exist in traditional distributed computing.

Problems

  • Intermittent ground contact — satellites are in line of sight of a ground station for only 5–15 minutes per ~95-minute orbit. Even with inter-satellite links, there are periods where no satellite in the constellation has line of sight to any ground station — the entire constellation is temporarily unreachable from the ground. ColonyOS requires executors to respond to the server regularly; a satellite executor would be considered dead during these gaps, which are the normal state, not a failure.
  • Split originator and completer — Earth rotates during a job. The satellite that had line of sight when the job was uploaded may not be the one with line of sight when the result is ready. ColonyOS expects the assigned executor to close the process, but the result may arrive through a different satellite.
  • Multi-node jobs — a single SpaceCoMP job requires multiple satellites working together (collectors, mappers, reducer). ColonyOS assigns a process to a single executor. Workflows (DAGs) could chain multiple processes, but each step requires a round-trip to the ColonyOS server to complete one process and assign the next. With the server on the ground and intermittent contact, this adds a full ground round-trip per step.
  • Position-dependent assignment — which satellite runs which role depends on its orbital position relative to the area of interest, not on generic worker availability. The planner must use orbital predictions. ColonyOS's executor-type matching does not account for geographic position.
  • Pull-based model — ColonyOS executors pull work by calling assign(), which requires a persistent connection to the server. Maintaining this connection through intermittent ground links is difficult. ColonyOS does not natively support push — the server cannot send a job to an executor without the executor first requesting one.
  • Embedded execution — satellites run no_std Rust on cFS with no HTTP, no Go runtime, and no general-purpose OS. While ColonyOS can support different protocols, the on-board communication stack (SRSPP, ISL routing) is fundamentally different from the server-client model used by existing ColonyOS executors.
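To make the intermittent-contact problem concrete, the following sketch models a server that marks an executor dead when it has not been heard from within a timeout. The 10-minute pass, the 60-second timeout, and all function names are illustrative assumptions; they are not ColonyOS settings or APIs.

```python
ORBIT_S = 95 * 60          # ~95-minute orbit, from the text above
PASS_S = 10 * 60           # assumed 10-minute pass (the text gives 5-15 minutes)
LIVENESS_TIMEOUT_S = 60    # hypothetical server-side timeout, not a ColonyOS default


def in_contact(t: int) -> bool:
    """True while the satellite is over the ground station (start of each orbit)."""
    return (t % ORBIT_S) < PASS_S


def dead_fraction(orbits: int = 3, step_s: int = 10) -> float:
    """Fraction of sampled time a timeout-based liveness check reports 'dead'."""
    last_seen = 0
    dead = 0
    total = 0
    for t in range(0, orbits * ORBIT_S, step_s):
        if in_contact(t):
            last_seen = t  # executor can answer the server only during a pass
        if t - last_seen > LIVENESS_TIMEOUT_S:
            dead += 1
        total += 1
    return dead / total
```

Under these assumptions the executor would be reported dead for most of each orbit, even though the satellite is healthy the whole time.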

Solutions

Ground Coordinator

The chosen design runs a process on the ground that acts as the ColonyOS executor. Because the server can always reach it, the keepalive problem disappears. ColonyOS sees a single, always-available executor and requires no modifications. Satellites execute autonomously over ISL, outside ColonyOS's awareness.

Sat A  ← ISL →  Sat B  ← ISL →  Sat C
↕ ISL
LOS Satellite
↕ ground link
Ground Coordinator (executor)
↕ assign / result
ColonyOS Server (cloud)

Legend: Satellite (outside ColonyOS), Executor, ColonyOS Server

Ground-Originated Jobs

  1. User submits a SpaceCoMP job to ColonyOS (AOI, algorithm, parameters).
  2. Ground coordinator pulls the job via assign().
  3. Coordinator computes the plan: which satellites are collectors, mappers, reducer (using orbital predictions).
  4. Coordinator uploads assignments to the constellation via the current LOS satellite.
  5. Satellites execute autonomously over ISL. Data flows directly between satellites via SRSPP.
  6. Result routes back through whichever satellite has LOS at completion — Earth rotates during the job, so the originating LOS satellite may no longer be visible.
  7. Ground coordinator reports the result to ColonyOS.
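The steps above can be sketched as a single coordinator function. This is a minimal sketch, not the actual implementation: `server` and `constellation` are hypothetical wrapper objects (the method names `assign`, `close`, `upload`, and `await_result` are assumptions, not the ColonyOS SDK), and the planner is a toy that ranks satellites by predicted distance to the AOI rather than a real orbital planner.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Job:
    job_id: str
    aoi: Tuple[float, float]  # (lat, lon) of the area of interest, degrees
    algorithm: str


@dataclass
class SatState:
    name: str
    lat: float  # predicted sub-satellite latitude, degrees
    lon: float  # predicted sub-satellite longitude, degrees


def angular_distance(a, b):
    """Great-circle angle in radians between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def plan_roles(job, predicted, n_collectors=2, n_mappers=2):
    """Toy planner: nearest satellites collect, the next nearest map, one reduces."""
    ranked = sorted(predicted, key=lambda s: angular_distance(job.aoi, (s.lat, s.lon)))
    if len(ranked) < n_collectors + n_mappers + 1:
        raise ValueError("not enough satellites for this plan")
    return {
        "collectors": [s.name for s in ranked[:n_collectors]],
        "mappers": [s.name for s in ranked[n_collectors:n_collectors + n_mappers]],
        "reducer": ranked[n_collectors + n_mappers].name,
    }


def run_ground_originated(server, constellation, predicted):
    """Steps 2-7: pull the job, plan, upload, wait for the result, report it."""
    job = server.assign()                             # step 2
    plan = plan_roles(job, predicted)                 # step 3
    constellation.upload(job.job_id, plan)            # step 4, via the current LOS satellite
    result = constellation.await_result(job.job_id)   # steps 5-6, via any LOS satellite
    server.close(job.job_id, result)                  # step 7
    return plan, result
```

Note that the server only ever talks to the coordinator: the satellites appear in the plan, never as ColonyOS executors.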

Satellite-Originated Jobs

  1. A satellite detects an anomaly in sensor data (e.g., displacement threshold exceeded in a workflow).
  2. It routes a job request through the ISL mesh to the LOS satellite.
  3. LOS satellite relays the request to the ground coordinator.
  4. Ground coordinator submits to ColonyOS, pulls it back, and orchestrates as above.
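Steps 1 to 3 can be illustrated with a small sketch. It assumes the ISL mesh is known as an adjacency map and uses a breadth-first search as a stand-in for the real ISL routing, which this page does not describe. The request fields and function names are hypothetical.

```python
from collections import deque


def route_to_los(links, source, los_sats):
    """Shortest hop path from source to any satellite currently in LOS (BFS)."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in los_sats:
            return path
        for neighbor in links.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no LOS satellite reachable right now


def maybe_request_job(sat, displacement_mm, threshold_mm, links, los_sats):
    """On a threshold breach, build a job request and pick a relay path."""
    if displacement_mm <= threshold_mm:
        return None  # no anomaly, nothing to request
    path = route_to_los(links, sat, los_sats)
    if path is None:
        return None  # caller would retry once a LOS satellite is reachable
    return {
        "origin": sat,
        "reason": "displacement",
        "value_mm": displacement_mm,
        "path": path,  # last element relays the request to the ground
    }
```

For example, with links `{"A": ["B"], "B": ["A", "C"], "C": ["B"]}` and `C` in LOS, a request from `A` would be routed along `A`, `B`, `C`.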

For time-critical cases, a satellite could coordinate directly in orbit — planning and assigning roles without involving the ground. This eliminates the ground round-trip but cannot leverage ColonyOS for orchestration.

GEO Relay

A GEO relay satellite (such as those of TDRSS or EDRS) provides near-continuous low-bandwidth connectivity between LEO satellites and the ground, largely closing the ground contact gap. This makes the ground coordinator more responsive: job uploads and status updates no longer wait for a direct ground pass.

The ground coordinator design still applies with a GEO relay. Satellite orbits are deterministic — the ground knows exactly where every satellite is and when it will be available. There is no need for satellites to continuously check in with a server to advertise their availability. The ground coordinator computes the plan using orbital predictions, uploads it when a link is available (faster with a GEO relay), and satellites execute autonomously. This requires no changes to ColonyOS.
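Because contact windows can be predicted in advance, the coordinator can schedule uploads instead of polling. The sketch below picks the earliest upload time from a list of predicted windows; the window values and the function name are made-up illustrations, not real pass predictions.

```python
def next_upload_time(now_s, windows):
    """Earliest time >= now_s that falls inside a predicted contact window.

    windows is a list of (start_s, end_s) pairs; returns None if none remain.
    """
    candidates = []
    for start, end in windows:
        if end <= now_s:
            continue  # window already over
        candidates.append(max(start, now_s))
    return min(candidates) if candidates else None


# Assumed example windows, in seconds from an arbitrary epoch.
DIRECT_PASSES = [(0, 600), (5700, 6300)]       # short direct ground passes
WITH_GEO_RELAY = [(0, 5400), (5500, 11000)]    # near-continuous relay coverage

direct_upload = next_upload_time(1000, DIRECT_PASSES)   # waits for the next pass
relay_upload = next_upload_time(1000, WITH_GEO_RELAY)   # can upload immediately
```

The planning logic is the same in both cases; the relay only changes how soon the next window opens.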

Alternative Designs Considered

Direct Communication

In this design every satellite is a ColonyOS executor, each holding a blocking assign() connection routed through the LOS gateway. This does not work well with the current version of ColonyOS: hundreds of connections pass through one gateway, every phase transition bounces through the ground, and satellites are constantly considered dead as they cycle in and out of LOS.

Sat A (executor)    Sat B (executor)    Sat C (executor)
↕ assign / result per satellite
LOS Gateway
↕ ground link
ColonyOS Server (cloud)
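The cost of bouncing every phase transition through the ground can be estimated with a small model. In this sketch, each ground interaction has to wait for the next contact window; the window times and phase durations are assumed values, and transmission time is ignored.

```python
def wait_for_contact(t, windows):
    """Earliest time >= t inside a contact window given as (start_s, end_s)."""
    for start, end in sorted(windows):
        if t < end:
            return max(t, start)
    raise ValueError("no contact window after t")


def finish_time(phase_durations, windows, ground_sync_every_phase):
    """Time at which the final result reaches the ground, starting at t = 0."""
    t = wait_for_contact(0, windows)  # job upload
    for i, duration in enumerate(phase_durations):
        t += duration
        is_last = i == len(phase_durations) - 1
        if ground_sync_every_phase or is_last:
            # Close this process on the server (and get the next assignment).
            t = wait_for_contact(t, windows)
    return t


# Assumed: 10-minute passes once per 95-minute orbit, three 700-second phases.
WINDOWS = [(0, 600), (5700, 6300), (11400, 12000), (17100, 17700)]
PHASES = [700, 700, 700]

per_phase = finish_time(PHASES, WINDOWS, ground_sync_every_phase=True)
in_out_only = finish_time(PHASES, WINDOWS, ground_sync_every_phase=False)
```

With these numbers the per-phase variant finishes at 17100 s and the job-in/job-out variant at 5700 s, since every phase that ends outside a pass stalls until the next one.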

Satellite Coordinator

One satellite is the ColonyOS executor. It receives jobs from the server and coordinates other satellites over ISL. Data flows directly between satellites, and ColonyOS sees one job in, one result out. But the coordinator satellite also goes in and out of LOS — the same keepalive problem remains.

Sat A  ← ISL →  Sat B  ← ISL →  Sat C
↕ ISL
Coordinator Sat (executor)
↕ ground link
ColonyOS Server (cloud)

Satellite Server (Future)

In this design the ColonyOS server runs on a satellite. Executors (other satellites) communicate with it entirely over ISL, so orchestration needs no ground round-trips. The keepalive problem is reduced because satellites can always reach the server over ISL. However, ColonyOS is implemented in Go and requires a full OS environment, so running it on a satellite would require an embedded reimplementation (no_std, cFS). A single satellite server is also a single point of failure.

Sat A (executor)  ← ISL →  Sat B (executor)  ← ISL →  Sat C (executor)
↕ ISL
ColonyOS Server (on satellite)

Comparison

Criterion                    Direct       Sat Coordinator  Ground Coordinator  Sat Server
Keepalive issue              Yes          Yes              No                  Reduced
Ground round-trips           Every phase  Job in/out       Job in/out          None
ISL data flow                No           Yes              Yes                 Yes
ColonyOS changes             None         None             None                Reimplementation
Works with current ColonyOS  No           No               Yes                 No