Geff specification
The graph exchange file format is zarr-based. A graph is stored in a zarr group, which can have any name. This allows storing multiple geff graphs inside the same zarr root directory. A geff group is identified by the presence of a geff key in the .zattrs file. Other geff metadata is also stored in the .zattrs file of the geff group, nested under the geff key. The geff group must contain a nodes group and an edges group (albeit both can be empty). geff graphs have the option to provide properties for nodes and edges.
geff graphs have the option to provide time and spatial dimensions as special attributes. These attributes are specified in the axes section of the metadata, inspired by the OME-zarr axes specification.
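As a rough illustration of the identification rule above, the following sketch scans a directory tree for groups whose .zattrs file contains a geff key. It assumes a zarr v2 directory store, where group attributes live in a JSON file named .zattrs; the function name find_geff_groups and the toy directory layout are hypothetical, and a real reader would normally go through the zarr library instead of reading the files directly.

```python
import json
import tempfile
from pathlib import Path


def find_geff_groups(root: Path) -> list[Path]:
    """Return every directory under `root` whose .zattrs has a top-level "geff" key."""
    found = []
    for attrs_file in sorted(root.rglob(".zattrs")):
        attrs = json.loads(attrs_file.read_text())
        if isinstance(attrs, dict) and "geff" in attrs:
            found.append(attrs_file.parent)
    return found


# Build a toy zarr-v2-like directory tree: two geff groups and one unrelated group.
demo_root = Path(tempfile.mkdtemp()) / "demo.zarr"
for name, attrs in [
    ("tracks_a", {"geff": {"directed": True}}),
    ("tracks_b", {"geff": {"directed": False}}),
    ("raw", {"note": "not a geff group"}),
]:
    group_dir = demo_root / name
    group_dir.mkdir(parents=True)
    (group_dir / ".zattrs").write_text(json.dumps(attrs))

geff_group_names = [p.name for p in find_geff_groups(demo_root)]
print(geff_group_names)  # ['tracks_a', 'tracks_b']
```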
Zarr specification
Currently, geff supports zarr specifications 2 and 3. However, geff will default to writing specification 2 because graphs written to the zarr v3 spec will not be compatible with all applications. When zarr 3 is more fully adopted by other libraries and tools, we will move to a zarr spec 3 default.
Geff metadata
geff_metadata
Type: object
Geff Version
Type: string
Geff version string following semantic versioning (MAJOR.MINOR.PATCH), optionally with .devN and/or +local parts (e.g., 0.3.1.dev6+g61d5f18).
If not provided, the version will be set to the current geff package version.
Must match regular expression: ^\d+\.\d+(?:\.\d+)?(?:\.dev\d+)?(?:\+[a-zA-Z0-9]+)?
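The pattern above can be tried out with Python's re module. This is a minimal sketch; the helper name looks_like_geff_version is hypothetical and not part of geff. Note that the pattern is anchored only at the start, so any string that begins with a valid version is accepted.

```python
import re

# Pattern copied from the schema. It is anchored at the start only,
# so re.match accepts any string that begins with a valid version.
GEFF_VERSION_PATTERN = re.compile(
    r"^\d+\.\d+(?:\.\d+)?(?:\.dev\d+)?(?:\+[a-zA-Z0-9]+)?"
)


def looks_like_geff_version(version: str) -> bool:
    """Return True if `version` starts with a string accepted by the schema pattern."""
    return GEFF_VERSION_PATTERN.match(version) is not None


candidates = ["0.3.1.dev6+g61d5f18", "1.0", "v1.0", "1"]
checks = {v: looks_like_geff_version(v) for v in candidates}
print(checks)
```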
Directed
Type: boolean
True if the graph is directed, otherwise False.
Axes
Default: null
Optional list of Axis objects defining the axes of each node in the graph. Each object's name must be an existing attribute on the nodes. The optional type key must be one of space, time or channel, though readers may not use this information. Each axis can additionally optionally define a unit key, which should match the valid OME-Zarr units, and min and max keys to define the range of the axis.
No Additional Items
Each item of this array must be:
Axis
Type: object
Name
Type: string
Type
Default: null
Unit
Default: null
Min
Default: null
Max
Default: null
Node Props Metadata
Default: null
Metadata for node properties. The keys are the property identifiers, and the values are PropMetadata objects describing the properties.
Each additional property must conform to the following schema
PropMetadata
Type: object
Metadata describing a property in the geff graph.
Identifier
Type: string
Dtype
Type: string
Encoding
Default: null
Unit
Default: null
Name
Default: null
Description
Default: null
Edge Props Metadata
Default: null
Metadata for edge properties. The keys are the property identifiers, and the values are PropMetadata objects describing the properties.
Each additional property must conform to the following schema
PropMetadata
Type: object
Metadata describing a property in the geff graph.
Same definition as PropMetadata
Node property: Detections as ellipsoids
Default: null
Name of the `ellipsoid` property.
An ellipsoid is assumed to be in the same coordinate system as the `space` type properties.
It is defined by
- a center point $c$, already given by the `space` type properties
- a covariance matrix $\Sigma$, symmetric and positive-definite, stored in this property as a `2x2`/`3x3` array.
To plot the ellipsoid:
- Compute the eigendecomposition of the covariance matrix $\Sigma = Q \Lambda Q^{\top}$
- Sample points $z$ on the unit sphere
- Transform the points to the ellipsoid by $x = c + Q \Lambda^{1/2} z$.
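The plotting steps above can be sketched for the 2D case using only the standard library. This is an illustrative sketch, not geff's implementation: the function name ellipse_points is hypothetical, and the eigendecomposition uses the closed form for a symmetric 2x2 matrix. For 3x3 covariances a numerical routine such as numpy.linalg.eigh would be the usual choice.

```python
import math


def ellipse_points(center, cov, n_points=8):
    """Sample points x = c + Q Lambda^(1/2) z on a 2D ellipse, with z on the unit circle."""
    (a, b), (_, d) = cov
    # Closed-form eigendecomposition of the symmetric 2x2 matrix [[a, b], [b, d]].
    mean = (a + d) / 2.0
    spread = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mean + spread, mean - spread  # eigenvalues; positive for SPD input
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # direction of the lam1 eigenvector
    q = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]   # columns are the eigenvectors
    s1, s2 = math.sqrt(lam1), math.sqrt(lam2)
    points = []
    for k in range(n_points):
        t = 2.0 * math.pi * k / n_points
        z = (math.cos(t), math.sin(t))         # point on the unit circle
        scaled = (s1 * z[0], s2 * z[1])        # Lambda^(1/2) z
        x = center[0] + q[0][0] * scaled[0] + q[0][1] * scaled[1]
        y = center[1] + q[1][0] * scaled[0] + q[1][1] * scaled[1]
        points.append((x, y))
    return points


center = (10.0, 20.0)
cov = [[4.0, 1.0], [1.0, 2.0]]
boundary = ellipse_points(center, cov)
```

Every sampled point x satisfies (x - c)^T Sigma^{-1} (x - c) = 1, which is a convenient way to check the construction.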
Track Node Props
Default: null
Node properties denoting tracklet and/or lineage IDs.
A tracklet is defined as a simple path of connected nodes where the initiating node has any incoming degree and outgoing degree at most 1, the terminating node has incoming degree at most 1 and any outgoing degree, and all other nodes along the path have in/out degree of 1. Each tracklet must contain the maximal set of connected nodes that match this definition - no sub-tracklets.
A lineage is defined as a weakly connected component on the graph.
The dictionary can store one or both of 'tracklet' or 'lineage' keys.
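To make the lineage definition concrete, here is a small union-find sketch that assigns every node the ID of its weakly connected component. The function name lineage_ids is hypothetical, and the choice of the smallest node ID as the lineage label is an assumption made for the example; the specification does not prescribe how lineage IDs are chosen.

```python
def lineage_ids(node_ids, edges):
    """Label each node with the smallest node ID in its weakly connected component."""
    parent = {n: n for n in node_ids}

    def find(n):
        # Walk to the root, halving the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for source, target in edges:
        # Edge direction is ignored: weak connectivity treats edges as undirected.
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[max(root_s, root_t)] = min(root_s, root_t)

    return {n: find(n) for n in node_ids}


# Three lineages: a division (1 -> 2, 1 -> 3, 3 -> 4), a path (5 -> 6), and the isolated node 7.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (1, 3), (3, 4), (5, 6)]
lineages = lineage_ids(nodes, edges)
print(lineages)
```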
Each additional property must conform to the following schema
Type: string
Affine transformation matrix to transform the graph coordinates to the physical coordinates. The matrix must have the same number of dimensions as the number of axes in the graph.
Affine
Type: object
Affine transformation class following scipy conventions.
Internally stores transformations as homogeneous coordinate matrices (N+1, N+1).
The transformation matrix follows the scipy.ndimage.affine_transform convention, where the matrix maps output coordinates to input coordinates (inverse/pull transformation).
For a point p_out in output space, the corresponding input point p_in is computed as:
p_in_homo = matrix @ p_out_homo
where p_out_homo = [p_out; 1] and p_in = p_in_homo[:-1]
Attributes:
matrix: Homogeneous transformation matrix as list of lists (ndim+1, ndim+1)
Matrix
Type: object
Homogeneous transformation matrix as list of lists (ndim+1, ndim+1)
Metadata indicating how spatiotemporal axes are displayed by a viewer
DisplayHint
Type: object
Metadata indicating how spatiotemporal axes are displayed by a viewer
Display Horizontal
Type: string
Which spatial axis to use for horizontal display
Display Vertical
Type: string
Which spatial axis to use for vertical display
Display Depth
Default: null
Optional, which spatial axis to use for depth display
Display Time
Default: null
Optional, which temporal axis to use for time
Extra
Type: object
Extra metadata that is not part of the schema
Note
The axes dictionary is modeled after the OME-zarr specifications and is used to identify spatio-temporal properties on the graph nodes. If the same names are used in the axes metadata of the related image or segmentation data, applications can use this information to align graph node locations with image data.
geff.units.VALID_AXIS_TYPES
module-attribute
VALID_AXIS_TYPES = ['space', 'time', 'channel']
geff.units.VALID_SPACE_UNITS
module-attribute
VALID_SPACE_UNITS = [
None,
"angstrom",
"attometer",
"centimeter",
"decimeter",
"exameter",
"femtometer",
"foot",
"gigameter",
"hectometer",
"inch",
"kilometer",
"megameter",
"meter",
"micrometer",
"mile",
"millimeter",
"nanometer",
"parsec",
"petameter",
"picometer",
"terameter",
"yard",
"yoctometer",
"yottameter",
"zeptometer",
"zettameter",
]
geff.units.VALID_TIME_UNITS
module-attribute
VALID_TIME_UNITS = [
None,
"attosecond",
"centisecond",
"day",
"decisecond",
"exasecond",
"femtosecond",
"gigasecond",
"hectosecond",
"hour",
"kilosecond",
"megasecond",
"microsecond",
"millisecond",
"minute",
"nanosecond",
"petasecond",
"picosecond",
"second",
"terasecond",
"yoctosecond",
"yottasecond",
"zeptosecond",
"zettasecond",
]
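The lists above lend themselves to a simple check of a single axis entry. The sketch below is hypothetical and not geff's validator: the function name axis_problems is invented, and the unit lists are abbreviated copies of the full lists for brevity. Because the specification says units "should" match, a real tool might report unit mismatches as warnings instead of errors.

```python
VALID_AXIS_TYPES = ["space", "time", "channel"]
# Abbreviated copies of the unit lists above, for illustration only.
SPACE_UNITS_SUBSET = [None, "nanometer", "micrometer", "millimeter", "meter"]
TIME_UNITS_SUBSET = [None, "millisecond", "second", "minute", "hour"]


def axis_problems(axis: dict) -> list[str]:
    """Return human-readable problems found in one axis entry (empty list if none)."""
    problems = []
    if not isinstance(axis.get("name"), str):
        problems.append("name must be a string")
    axis_type = axis.get("type")
    if axis_type is not None and axis_type not in VALID_AXIS_TYPES:
        problems.append(f"unknown type {axis_type!r}")
    unit = axis.get("unit")
    if axis_type == "space" and unit not in SPACE_UNITS_SUBSET:
        problems.append(f"unit {unit!r} is not a listed space unit")
    if axis_type == "time" and unit not in TIME_UNITS_SUBSET:
        problems.append(f"unit {unit!r} is not a listed time unit")
    lo, hi = axis.get("min"), axis.get("max")
    if lo is not None and hi is not None and lo > hi:
        problems.append("min is greater than max")
    return problems


good = axis_problems({"name": "x", "type": "space", "unit": "micrometer", "min": 0, "max": 10})
bad = axis_problems({"name": "t", "type": "time", "unit": "seconds", "min": 5, "max": 1})
print(good)
print(bad)
```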
Affine transformations
The optional affine field allows specifying a global affine transformation that maps the graph coordinates stored in the node properties to a physical coordinate system. The value matrix is stored as a (N + 1) × (N + 1) homogeneous matrix following the scipy.ndimage.affine_transform convention, where N equals the number of spatio-temporal axes declared in axes.
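The homogeneous-coordinate arithmetic can be sketched in a few lines of plain Python. The helper apply_homogeneous and the example matrix are hypothetical; the code only demonstrates the product p_in_homo = matrix @ p_out_homo and the removal of the trailing coordinate. Which side counts as "graph" and which as "physical" space is left to the convention described above.

```python
def apply_homogeneous(matrix, point):
    """Compute (matrix @ [point; 1])[:-1] for an (N+1) x (N+1) matrix and an N-D point."""
    n = len(point)
    if len(matrix) != n + 1 or any(len(row) != n + 1 for row in matrix):
        raise ValueError(f"expected a {n + 1}x{n + 1} matrix for a {n}-D point")
    homogeneous = list(point) + [1.0]
    result = [sum(m * p for m, p in zip(row, homogeneous)) for row in matrix]
    return result[:-1]  # drop the trailing homogeneous coordinate


# Hypothetical 2-axis example: scale the first axis by 2 and shift the second by 5.
matrix = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 1.0],
]
mapped = apply_homogeneous(matrix, [3.0, 4.0])
print(mapped)  # [6.0, 9.0]
```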
Extra attributes
The optional extra object is a free-form dictionary that can hold any additional, application-specific metadata that is not covered by the core geff schema. Users may place arbitrary keys and values inside extra without fear of clashing with future reserved fields. Although the core geff reader makes these attributes available, their meaning and use are left entirely to downstream applications.
The nodes group
The nodes group will contain an ids array and optionally a props group.
The ids array
The nodes/ids array is a 1D array of node IDs of length N >= 0, where N is the number of nodes in the graph. Node IDs must be unique. Node IDs can have any type supported by zarr (except floats), but we recommend integer dtypes. For large graphs, uint64 might be necessary to provide enough range for every node to have a unique ID. In the minimal case of an empty graph, the ids array will be present but empty.
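The two hard constraints on node IDs (uniqueness, no floats) are easy to check before writing. The sketch below works on a plain Python list as a stand-in for the zarr array, and the function names are hypothetical.

```python
def check_node_ids(ids):
    """Raise ValueError if `ids` contains floats or repeated entries."""
    if any(isinstance(i, float) for i in ids):
        raise ValueError("node IDs must not be floats")
    if len(set(ids)) != len(ids):
        raise ValueError("node IDs must be unique")


def ids_are_valid(ids):
    """Boolean wrapper around check_node_ids."""
    try:
        check_node_ids(ids)
    except ValueError:
        return False
    return True


# An empty ID list is valid: it represents an empty graph.
results = [ids_are_valid(ids) for ids in ([], [0, 1, 2], [0, 1, 1], [0.5, 1.5])]
print(results)  # [True, True, False, False]
```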
The props group and node property groups
The nodes/props group is optional and will contain one or more node property groups, each with a values array and an optional missing array.
- values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the node ids array, such that each row of the property values array stores the property for the node at that index in the ids array.
- The missing array is an optional one-dimensional boolean array to support properties that are not present on all nodes. A 1 at an index in the missing array indicates that the value of that property for the node at that index is None, and the value in the values array at that index should be ignored. If the missing array is not present, all nodes have values for the property.
- Geff provides special support for spatio-temporal properties, although they are not required. When axes are specified in the geff metadata, each axis name identifies a spatio-temporal property. Spatio-temporal properties are not allowed to have missing arrays. Otherwise, they are identical to other properties from a storage specification perspective.
- The seg_id property is an optional, special node property that stores the segmentation label for each node. The seg_id values do not need to be unique, in case labels are repeated between time points. If the seg_id property is not present, it is assumed that the graph is not associated with a segmentation.
- Geff provides special support for predefined shape properties, although they are not required. These currently include: sphere, ellipsoid. Values can be marked as missing, and a geff graph may contain multiple different shape properties. Units of shapes are assumed to be the same as the units on the spatial axes. Otherwise, shape properties are identical to other properties from a storage specification perspective.
  - sphere: Hypersphere in n spatial dimensions, defined by a scalar radius.
  - ellipsoid: Defined by a symmetric positive-definite covariance matrix, whose dimensionality is assumed to match the spatial axes.
Note
When writing a graph with missing properties to the geff format, you must fill in a dummy value in the values array for the nodes that are missing the property, in order to keep the indices aligned with the node ids.
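The interplay between the values and missing arrays can be sketched with plain Python lists standing in for the zarr arrays. The helper names encode_property and decode_property and the fill_value parameter are hypothetical; any dummy value of the right dtype would do, since readers are expected to ignore masked entries.

```python
def encode_property(node_ids, values_by_node, fill_value=0):
    """Build aligned `values` and `missing` lists for one property."""
    values, missing = [], []
    for node_id in node_ids:
        if node_id in values_by_node:
            values.append(values_by_node[node_id])
            missing.append(False)
        else:
            values.append(fill_value)  # dummy entry that keeps indices aligned
            missing.append(True)
    return values, missing


def decode_property(node_ids, values, missing):
    """Inverse of encode_property: masked entries come back as None."""
    return {n: (None if m else v) for n, v, m in zip(node_ids, values, missing)}


node_ids = [10, 11, 12]
radius = {10: 2.5, 12: 4.0}  # node 11 has no radius
values, missing = encode_property(node_ids, radius, fill_value=0.0)
print(values, missing)  # [2.5, 0.0, 4.0] [False, True, False]
roundtrip = decode_property(node_ids, values, missing)
```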
The edges group
Similar to the nodes group, the edges group will contain an ids array and an optional props group.
The ids array
The edges/ids array is a 2D array with the same dtype as the nodes/ids array. It has shape (E, 2), where E is the number of edges in the graph. If there are no edges in the graph, the edges group and ids array must still be present, with shape (0, 2). All elements in the edges/ids array must also be present in the nodes/ids array, and the data types of the two id arrays must match.
Each row represents an edge between two nodes. For directed graphs, the first column holds the source nodes and the second column holds the target nodes. For undirected graphs, the order is arbitrary.
Edges should be unique (no multiple edges between the same two nodes) and edges from a node to itself are not supported.
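The edge constraints above can be checked with a short routine. This is a hypothetical sketch on plain Python lists, not geff's validator. One assumption is labeled in the code: for directed graphs only a repeated (source, target) pair is treated as a duplicate, while for undirected graphs (u, v) and (v, u) are treated as the same edge.

```python
def edge_problems(node_ids, edge_ids, directed):
    """Return a list of rule violations found in an (E, 2) list of edges."""
    problems = []
    known = set(node_ids)
    seen = set()
    for row, (source, target) in enumerate(edge_ids):
        if source not in known or target not in known:
            problems.append(f"row {row}: endpoint not in nodes/ids")
        if source == target:
            problems.append(f"row {row}: self-loop")
        # Assumption: in undirected graphs (u, v) and (v, u) describe the same edge.
        key = (source, target) if directed else tuple(sorted((source, target)))
        if key in seen:
            problems.append(f"row {row}: duplicate edge")
        seen.add(key)
    return problems


node_ids = [1, 2, 3]
edge_ids = [(1, 2), (2, 1), (3, 3), (1, 4)]
directed_report = edge_problems(node_ids, edge_ids, directed=True)
undirected_report = edge_problems(node_ids, edge_ids, directed=False)
print(directed_report)
print(undirected_report)
```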
The props group and edge property groups
The edges/props group will contain zero or more edge property groups, each with a values array and an optional missing array.
- values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the edges/ids array, such that each row of the property values array stores the property for the edge at that index in the ids array.
- The missing array is an optional one-dimensional boolean array to support properties that are not present on all edges. A 1 at an index in the missing array indicates that the value of that property for the edge at that index is missing, and the value in the values array at that index should be ignored. If the missing array is not present, all edges have values for the property.
The edges/props group is optional. If you do not have any edge properties, the edges/props group can be absent.
Example file structure and metadata
Here is a schematic of the expected file structure.
/path/to.zarr
    /tracking_graph
        .zattrs  # graph metadata with `geff_version`
        nodes/
            ids  # shape: (N,) dtype: uint64
            props/
                t/
                    values  # shape: (N,) dtype: uint16
                z/
                    values  # shape: (N,) dtype: float32
                y/
                    values  # shape: (N,) dtype: float32
                x/
                    values  # shape: (N,) dtype: float32
                radius/
                    values  # shape: (N,) dtype: int | float
                    missing  # shape: (N,) dtype: bool
                covariance3d/
                    values  # shape: (N, 3, 3) dtype: float
                    missing  # shape: (N,) dtype: bool
                color/
                    values  # shape: (N, 4) dtype: float16
                    missing  # shape: (N,) dtype: bool
        edges/
            ids  # shape: (E, 2) dtype: uint64
            props/
                distance/
                    values  # shape: (E,) dtype: float16
                score/
                    values  # shape: (E,) dtype: float16
                    missing  # shape: (E,) dtype: bool
    # optional:
    /segmentation
    # unspecified, but totally okay:
    /raw
# /path/to.zarr/tracking_graph/.zattrs
{
    "geff": {
        "directed": true,
        "geff_version": "0.1.3.dev4+gd5d1132.d20250616",
        "axes": [ # optional
            {"name": "t", "type": "time", "unit": "second", "min": 0, "max": 125},
            {"name": "z", "type": "space", "unit": "micrometer", "min": 1523.36, "max": 4398.1},
            {"name": "y", "type": "space", "unit": "micrometer", "min": 81.667, "max": 1877.7},
            {"name": "x", "type": "space", "unit": "micrometer", "min": 764.42, "max": 2152.3},
        ],
        # predefined node attributes for storing detections as spheres or ellipsoids
        "sphere": "radius", # optional
        "ellipsoid": "covariance3d", # optional
        "display_hints": {
            "display_horizontal": "x",
            "display_vertical": "y",
            "display_depth": "z",
            "display_time": "t",
        },
        # node attributes corresponding to tracklet and/or lineage IDs
        "track_node_props": {
            "lineage": "ultrack_lineage_id",
            "tracklet": "ultrack_id"
        },
        "related_objects": [
            {
                "type": "labels", "path": "../segmentation/", "label_prop": "seg_id",
            },
            {
                "type": "image", "path": "../raw/",
            },
        ],
        # optional coordinate transformation is defined as homogeneous coordinates
        # It is expected to be a (D+1)x(D+1) matrix where D is the number of axes
        "affine": [
            [1, 0, 0, 0, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 1, 0, 0],
            [0, 0, 0, 1, 0],
            [0, 0, 0, 0, 1],
        ],
        # custom other things must be placed **inside** the extra attribute
        "extra": {
            ...
        }
    }
}
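A few cross-field rules from this page can be checked on a parsed metadata dictionary: the affine matrix must be (D+1)x(D+1) for D axes, and display hints should name declared axes of the matching type. The sketch below is hypothetical and not geff's validation code. It accepts the affine either as a bare list of lists, as in the example above, or as an object with a matrix key, as in the schema; both readings are assumptions.

```python
def metadata_problems(geff_meta: dict) -> list[str]:
    """Return cross-field inconsistencies in the contents of the "geff" key."""
    problems = []
    axes = geff_meta.get("axes") or []
    axis_types = {axis["name"]: axis.get("type") for axis in axes}

    affine = geff_meta.get("affine")
    if isinstance(affine, dict):  # schema-style object holding a "matrix" key
        affine = affine.get("matrix")
    if affine is not None:
        size = len(axes) + 1
        if len(affine) != size or any(len(row) != size for row in affine):
            problems.append(f"affine must be {size}x{size} for {len(axes)} axes")

    hints = geff_meta.get("display_hints") or {}
    for key, wanted_type in [
        ("display_horizontal", "space"),
        ("display_vertical", "space"),
        ("display_depth", "space"),
        ("display_time", "time"),
    ]:
        name = hints.get(key)
        if name is None:
            continue
        if name not in axis_types:
            problems.append(f"{key}={name!r} is not a declared axis")
        elif axis_types[name] not in (None, wanted_type):
            problems.append(f"{key}={name!r} is not a {wanted_type} axis")
    return problems


axes = [
    {"name": "t", "type": "time"},
    {"name": "y", "type": "space"},
    {"name": "x", "type": "space"},
]
identity4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ok = metadata_problems({
    "axes": axes,
    "affine": identity4,
    "display_hints": {"display_horizontal": "x", "display_vertical": "y", "display_time": "t"},
})
bad = metadata_problems({
    "axes": axes,
    "affine": [[1.0, 0.0], [0.0, 1.0]],
    "display_hints": {"display_horizontal": "t", "display_vertical": "y"},
})
print(ok)
print(bad)
```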