Geff specification
The graph exchange file format (geff) is zarr-based. A graph is stored in a zarr group, which can have any name. However, the group name can include the .geff suffix to indicate that the group contains geff data. This allows storing multiple geff graphs inside the same zarr root directory. A geff group is identified by the presence of a geff key in the .zattrs file. Other geff metadata is also stored in the .zattrs file of the geff group, nested under the geff key. The geff group must contain a nodes group and an edges group (although both can be empty). geff graphs can optionally provide properties for nodes and edges.
geff graphs have the option to provide time and spatial dimensions as special attributes. These attributes are specified in the axes section of the metadata, inspired by the OME-zarr axes specification.
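The identification rule above (a group is a geff group when its .zattrs holds a geff key) can be sketched with the standard library alone. This is a minimal illustration, not part of the geff package: the helper name find_geff_groups and the demo group names are hypothetical, and it assumes a zarr v2 directory store, where group attributes live in a .zattrs JSON file (zarr v3 stores keep attributes in zarr.json instead).

```python
import json
import tempfile
from pathlib import Path


def find_geff_groups(zarr_root):
    """Return the groups under zarr_root whose .zattrs file has a "geff" key.

    Hypothetical helper; assumes a zarr v2 directory store on local disk.
    """
    found = []
    for attrs_file in sorted(Path(zarr_root).rglob(".zattrs")):
        try:
            attrs = json.loads(attrs_file.read_text())
        except json.JSONDecodeError:
            continue  # unreadable attributes, so not treated as a geff group
        if isinstance(attrs, dict) and "geff" in attrs:
            found.append(attrs_file.parent)
    return found


# Build a throwaway directory tree that mimics a zarr root with three groups.
demo_root = Path(tempfile.mkdtemp()) / "to.zarr"
demo_groups = [
    ("tracking_graph.geff", {"geff": {"directed": True}}),
    ("other_graph", {"geff": {"directed": False}}),  # the .geff suffix is optional
    ("raw", {"description": "image data, no geff key"}),
]
for name, attrs in demo_groups:
    group_dir = demo_root / name
    group_dir.mkdir(parents=True)
    (group_dir / ".zattrs").write_text(json.dumps(attrs))

geff_group_names = [path.name for path in find_geff_groups(demo_root)]
print(geff_group_names)
```

Both graph groups are reported regardless of their names, while the raw group is skipped because its attributes have no geff key.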
Zarr specification
Currently, geff supports zarr specifications 2 and 3. However, geff will default to writing specification 2 because graphs written to the zarr v3 spec will not be compatible with all applications. When zarr 3 is more fully adopted by other libraries and tools, we will move to a zarr spec 3 default.
geff_spec.GeffMetadata
Bases: BaseModel
Geff metadata schema to validate the attributes json file in a geff zarr
Parameters:
- geff_version (str, default: '1.1') – Geff version string following semantic versioning (MAJOR.MINOR.PATCH), optionally with .devN and/or +local parts (e.g., 0.3.1.dev6+g61d5f18). If not provided, the version will be set to the current geff package version.
- directed (bool) – True if the graph is directed, otherwise False.
- axes (list[Axis] | None, default: None) – Optional list of Axis objects defining the axes of each node in the graph. The axes list is modeled after the OME-zarr specifications and is used to identify spatio-temporal properties on the graph nodes. If the same names are used in the axes metadata of the related image or segmentation data, applications can use this information to align graph node locations with image data. The order of the axes in the list is meaningful. First, any downstream properties that are an array of values with one value per (spatial) axis will be in the order of the axis list (filtering to only the spatial axes by the type field if needed). Second, if associated image or segmentation data does not have axes metadata, the order of the spatiotemporal axes is a good default guess for aligning the graph and the image data, although there is no way to denote the channel dimension in the graph spec. If you are writing out a geff with an associated segmentation and/or image dataset, we highly recommend providing the axis names for your segmentation/image using the OME-zarr spec, including channel dimensions if needed.
- node_props_metadata (dict[str, PropMetadata]) – Metadata for node properties. The keys are the property identifiers, and the values are PropMetadata objects describing the properties. There must be one entry for each node property.
- edge_props_metadata (dict[str, PropMetadata]) – Metadata for edge properties. The keys are the property identifiers, and the values are PropMetadata objects describing the properties. There must be one entry for each edge property.
- sphere (str | None, default: None) – Name of the optional sphere property. A sphere is defined by
    - a center point, already given by the space type properties
    - a radius scalar, stored in this property
- ellipsoid (str | None, default: None) – Name of the ellipsoid property. An ellipsoid is assumed to be in the same coordinate system as the space type properties. It is defined by
    - a center point \(c\), already given by the space type properties
    - a covariance matrix \(\Sigma\), symmetric and positive-definite, stored in this property as a 2x2/3x3 array.
  To plot the ellipsoid:
    - Compute the eigendecomposition of the covariance matrix \(\Sigma = Q \Lambda Q^{\top}\)
    - Sample points \(z\) on the unit sphere
    - Transform the points to the ellipsoid by \(x = c + Q \Lambda^{1/2} z\).
- track_node_props (dict[Literal['lineage', 'tracklet'], str] | None, default: None) – Node properties denoting tracklet and/or lineage IDs. A tracklet is defined as a simple path of connected nodes where the initiating node has any incoming degree and outgoing degree at most 1, the terminating node has incoming degree at most 1 and any outgoing degree, and other nodes along the path have in/out degree of 1. Each tracklet must contain the maximal set of connected nodes that match this definition, with no sub-tracklets. A lineage is defined as a weakly connected component on the graph. The dictionary can store one or both of the 'tracklet' and 'lineage' keys.
- related_objects (list[RelatedObject] | None, default: None) – A list of dictionaries of related objects such as labels or images. Each dictionary must contain 'type', 'path', and optionally 'label_prop' properties. The 'type' represents the data type. 'labels' and 'image' should be used for label and image objects, respectively. Other types are also allowed. The 'path' should be relative to the geff zarr-attributes file. It is strongly recommended that all related objects are stored as siblings of the geff group within the top-level zarr group. The 'label_prop' is only valid for type 'labels' and specifies the node property that will be used to identify the labels in the related object.
- display_hints (DisplayHint | None, default: None) – Metadata indicating how spatiotemporal axes are displayed by a viewer.
- extra (dict[str, Any], default: <class 'dict'>) – The optional extra object is a free-form dictionary that can hold any additional, application-specific metadata that is not covered by the core geff schema. Users may place arbitrary keys and values inside extra without fear of clashing with future reserved fields. Although the core geff reader makes these attributes available, their meaning and use are left entirely to downstream applications.
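The ellipsoid plotting recipe in the parameter list (eigendecomposition, unit-sphere samples, then \(x = c + Q \Lambda^{1/2} z\)) can be sketched for the 2x2 case using only the standard library. This is an illustrative sketch, not code from the geff package: the function name ellipse_outline is hypothetical, and the closed-form eigendecomposition used here only applies to symmetric 2x2 matrices; a 3x3 covariance would need a general symmetric eigensolver.

```python
import math


def ellipse_outline(center, cov, n_points=64):
    """Sample points x = c + Q Lambda^(1/2) z, with z on the unit circle.

    center: (c0, c1) in the order of the spatial axes.
    cov: symmetric positive-definite 2x2 matrix as nested lists.
    """
    (a, b), (b_check, d) = cov
    assert math.isclose(b, b_check), "covariance must be symmetric"

    # Closed-form eigendecomposition of the symmetric matrix [[a, b], [b, d]].
    mean = (a + d) / 2.0
    spread = math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mean + spread, mean - spread  # eigenvalues, lam1 >= lam2
    assert lam2 > 0, "covariance must be positive-definite"
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # direction of the lam1 eigenvector
    q = [
        [math.cos(theta), -math.sin(theta)],
        [math.sin(theta), math.cos(theta)],
    ]  # columns of Q are the unit eigenvectors

    points = []
    for k in range(n_points):
        phi = 2.0 * math.pi * k / n_points
        # Unit-circle sample z = (cos phi, sin phi), scaled by Lambda^(1/2).
        s0 = math.sqrt(lam1) * math.cos(phi)
        s1 = math.sqrt(lam2) * math.sin(phi)
        # Rotate by Q and shift by the center c.
        points.append(
            (
                center[0] + q[0][0] * s0 + q[0][1] * s1,
                center[1] + q[1][0] * s0 + q[1][1] * s1,
            )
        )
    return points


demo_center = (10.0, -3.0)
demo_cov = [[4.0, 1.0], [1.0, 2.0]]
outline = ellipse_outline(demo_center, demo_cov, n_points=16)
```

Every returned point satisfies \((x - c)^{\top} \Sigma^{-1} (x - c) = 1\), so the outline is the contour at one unit of Mahalanobis distance from the center.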
The nodes group
The nodes group will contain an ids array and optionally a props group.
The ids array
The nodes/ids array is a 1D array of node IDs of length N >= 0, where N is the number of nodes in the graph. Node IDs must be unique and must have an integer dtype. For large graphs, int64 might be necessary to provide enough range for every node to have a unique ID. In the minimal case of an empty graph, the ids array will be present but empty.
The props group and node property groups
The nodes/props group is optional. When present, it will contain one or more node property groups, each with a values array and an optional missing array.
- values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the node ids array, such that each row of the property values array stores the property for the node at that index in the ids array. String values will be stored according to the zarr-extensions string specification, as variable-length UTF-8 strings.
- The missing array is an optional one-dimensional boolean array to support properties that are not present on all nodes. A 1 at an index in the missing array indicates that the value of that property for the node at that index is None, and the value in the values array at that index should be ignored. If the missing array is not present, all nodes have values for the property.
- Geff provides special support for spatio-temporal properties, although they are not required. When axes are specified in the geff metadata, each axis name identifies a spatio-temporal property. Spatio-temporal properties are not allowed to have missing arrays. Otherwise, they are identical to other properties from a storage specification perspective.
- The seg_id property is an optional, special node property that stores the segmentation label for each node. The seg_id values do not need to be unique, in case labels are repeated between time points. If the seg_id property is not present, it is assumed that the graph is not associated with a segmentation.
- Geff provides special support for predefined shape properties, although they are not required. These currently include: sphere, ellipsoid. Values can be marked as missing, and a geff graph may contain multiple different shape properties. Units of shapes are assumed to be the same as the units on the spatial axes. Otherwise, shape properties are identical to other properties from a storage specification perspective.
    - sphere: Hypersphere in n spatial dimensions, defined by a scalar radius.
    - ellipsoid: Defined by a symmetric positive-definite covariance matrix, whose dimensionality is assumed to match the spatial axes.
Note
When writing a graph with missing properties to the geff format, you must fill in a dummy value in the values array for the nodes that are missing the property, in order to keep the indices aligned with the node ids.
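The relationship between the ids, values, and missing arrays can be sketched with plain Python lists standing in for zarr arrays. The helper names and the choice of 0.0 as the dummy fill value are assumptions made for illustration; the specification only requires that some dummy value is written so that indices stay aligned.

```python
def to_values_and_missing(node_ids, prop_by_node, fill_value=0.0):
    """Build aligned values/missing lists for one property (hypothetical helper).

    prop_by_node maps node id -> property value; absent ids or None values
    are treated as missing and receive fill_value as a placeholder.
    """
    values, missing = [], []
    for node_id in node_ids:
        value = prop_by_node.get(node_id)
        if value is None:
            values.append(fill_value)  # dummy value, ignored by readers
            missing.append(True)
        else:
            values.append(value)
            missing.append(False)
    return values, missing


def from_values_and_missing(node_ids, values, missing=None):
    """Recover a node id -> value mapping; a missing entry becomes None."""
    if missing is None:
        # No missing array means every node has a value for the property.
        missing = [False] * len(node_ids)
    return {
        node_id: (None if is_missing else value)
        for node_id, value, is_missing in zip(node_ids, values, missing)
    }


node_ids = [10, 11, 12]
score_by_node = {10: 0.9, 12: 0.4}  # node 11 has no score

values, missing = to_values_and_missing(node_ids, score_by_node)
restored = from_values_and_missing(node_ids, values, missing)
print(values, missing, restored)
```

Position i in values and missing always refers to the node at position i in ids, which is why the placeholder cannot simply be omitted.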
Variable length properties
While most properties can be represented as normal arrays, where each node has a property of the same shape, the specification also supports properties where each node can have an array property of a variable shape. This is useful for properties such as polygons, meshes, or crops of bounding boxes.
Variable length properties will have a data array in addition to the values and missing arrays. For variable length properties, the data array will contain a 1D flattened array of the actual values for all the nodes. The values array will contain the offset and shape of the relevant section of data in the data array.
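The offset-and-shape bookkeeping can be sketched with nested Python lists standing in for arrays. Two details here are assumptions rather than statements from the specification: each values row is laid out as [offset, *shape], and entries are flattened in row-major order. All function names are hypothetical.

```python
from math import prod


def shape_of(nested):
    """Shape of a rectangular nested list, e.g. [[0, 0], [4, 0]] -> [2, 2]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def flatten(nested):
    """Row-major flattening of a nested list into a flat list."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def reshape(flat, shape):
    """Inverse of flatten for a known shape."""
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [reshape(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


def encode_varlength(entries):
    """Pack same-ndim entries into one flat data list plus one values row each."""
    data, values = [], []
    for entry in entries:
        values.append([len(data), *shape_of(entry)])  # assumed layout: offset, then shape
        data.extend(flatten(entry))
    return data, values


def decode_entry(data, row):
    """Read one entry back out of data using its values row."""
    offset, *shape = row
    return reshape(data[offset:offset + prod(shape)], shape)


triangle = [[0, 0], [4, 0], [0, 3]]          # shape (3, 2)
square = [[1, 1], [2, 1], [2, 2], [1, 2]]    # shape (4, 2)
data, values = encode_varlength([triangle, square])
print(values)
```

With two-dimensional entries, each values row has ndim + 1 = 3 columns, matching the (N, ndim + 1) shape shown for the polygon property in the example file structure.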
The edges group
Similar to the nodes group, the edges group will contain an ids array and an optional props group.
The ids array
The edges/ids array is a 2D array with the same dtype as the nodes/ids array. It has shape (E, 2), where E is the number of edges in the graph. If there are no edges in the graph, the edges group and ids array must be present with shape (0, 2). All elements in the edges/ids array must also be present in the nodes/ids array, and the data types of the two id arrays must match.
Each row represents an edge between two nodes. For directed graphs, the first column holds the source nodes and the second column holds the target nodes. For undirected graphs, the order is arbitrary.
Edges should be unique (no multiple edges between the same two nodes) and edges from a node to itself are not supported.
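The constraints on the edge ids (two columns, endpoints present in the node ids, no self-edges, no repeated edges) can be checked with a short sketch over plain Python lists. The function name check_edge_ids is hypothetical. One interpretation is an assumption here: for directed graphs, (a, b) and (b, a) are treated as distinct edges, while for undirected graphs they count as the same edge.

```python
def check_edge_ids(node_ids, edge_ids, directed):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    known = set(node_ids)
    if len(known) != len(node_ids):
        problems.append("node ids are not unique")

    seen = set()
    for row, edge in enumerate(edge_ids):
        if len(edge) != 2:
            problems.append(f"row {row}: expected 2 columns, got {len(edge)}")
            continue
        source, target = edge
        for node in (source, target):
            if node not in known:
                problems.append(f"row {row}: node {node} is not in the node ids")
        if source == target:
            problems.append(f"row {row}: self-edge on node {source}")
        # Assumption: direction only distinguishes edges in directed graphs.
        key = (source, target) if directed else tuple(sorted((source, target)))
        if key in seen:
            problems.append(f"row {row}: duplicate edge {key}")
        seen.add(key)
    return problems


demo_nodes = [1, 2, 3]
demo_edges = [[1, 2], [2, 3], [2, 1], [3, 3], [1, 4]]

undirected_problems = check_edge_ids(demo_nodes, demo_edges, directed=False)
directed_problems = check_edge_ids(demo_nodes, demo_edges, directed=True)
for problem in undirected_problems:
    print(problem)
```

An empty graph passes trivially: with no nodes and an edge list of length zero, the function returns an empty list.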
The props group and edge property groups
The edges/props group will contain zero or more edge property groups, each with a values array and an optional missing array. Variable length edge properties operate the same as variable length node properties, with an additional data array that the values array refers to.
- values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the edges/ids array, such that each row of the property values array stores the property for the edge at that index in the ids array.
- The missing array is an optional one-dimensional boolean array to support properties that are not present on all edges. A 1 at an index in the missing array indicates that the value of that property for the edge at that index is missing, and the value in the values array at that index should be ignored. If the missing array is not present, all edges have values for the property.
The edges/props group is optional. If you do not have any edge properties, it can be absent.
Example file structure and metadata
Here is a schematic of the expected file structure.
/path/to.zarr
    /tracking_graph.geff
        .zattrs  # graph metadata with `geff_version`
        nodes/
            ids  # shape: (N,)  dtype: uint64
            props/
                t/
                    values  # shape: (N,)  dtype: uint16
                z/
                    values  # shape: (N,)  dtype: float32
                y/
                    values  # shape: (N,)  dtype: float32
                x/
                    values  # shape: (N,)  dtype: float32
                radius/
                    values  # shape: (N,)  dtype: int | float
                    missing  # shape: (N,)  dtype: bool
                covariance3d/
                    values  # shape: (N, 3, 3)  dtype: float
                    missing  # shape: (N,)  dtype: bool
                color/
                    values  # shape: (N, 4)  dtype: float32
                    missing  # shape: (N,)  dtype: bool
                polygon/
                    data  # shape: (V,)  dtype: any, V is the length of all the flattened entries
                    values  # shape: (N, ndim + 1)  dtype: int64, ndim is number of dimensions in each entry array
                    missing  # shape: (N,)  dtype: bool
        edges/
            ids  # shape: (E, 2)  dtype: uint64
            props/
                distance/
                    values  # shape: (E,)  dtype: float32
                score/
                    values  # shape: (E,)  dtype: float32
                    missing  # shape: (E,)  dtype: bool
    # optional:
    /segmentation
    # unspecified, but totally okay:
    /raw
This is a geff metadata zattrs file that matches the above example structure.
// /path/to.zarr/tracking_graph.geff/.zattrs
{
  "geff": {
    "directed": true,
    "geff_version": "0.1.3",
    // axes are optional
    "axes": [
      { "name": "t", "type": "time", "unit": "second", "min": 0, "max": 125 },
      {
        "name": "z",
        "type": "space",
        "unit": "micrometer",
        "min": 1523.36,
        "max": 4398.1
      },
      {
        "name": "y",
        "type": "space",
        "unit": "micrometer",
        "min": 81.667,
        "max": 1877.7
      },
      {
        "name": "x",
        "type": "space",
        "unit": "micrometer",
        "min": 764.42,
        "max": 2152.3
      }
    ],
    // predefined node attributes for storing detections as spheres or ellipsoids
    "sphere": "radius", // optional
    "ellipsoid": "covariance3d", // optional
    "display_hints": {
      "display_horizontal": "x",
      "display_vertical": "y",
      "display_depth": "z",
      "display_time": "t"
    },
    "node_props_metadata": {
      "t": {
        "identifier": "t",
        "dtype": "uint16",
        "varlength": false,
        "unit": "second"
      },
      "z": {
        "identifier": "z",
        "dtype": "float32",
        "varlength": false,
        "unit": "micrometer"
      },
      "y": {
        "identifier": "y",
        "dtype": "float32",
        "varlength": false,
        "unit": "micrometer"
      },
      "x": {
        "identifier": "x",
        "dtype": "float32",
        "varlength": false,
        "unit": "micrometer"
      },
      "radius": {
        "identifier": "radius",
        "dtype": "float32",
        "varlength": false,
        "unit": "micrometer"
      },
      "covariance3d": {
        "identifier": "covariance3d",
        "dtype": "float32",
        "varlength": false
      },
      "color": { "identifier": "color", "dtype": "float32", "varlength": false }
    },
    "edge_props_metadata": {
      "distance": {
        "identifier": "distance",
        "dtype": "float32",
        "varlength": false
      },
      "score": { "identifier": "score", "dtype": "float32", "varlength": false }
    },
    // node attributes corresponding to tracklet and/or lineage IDs
    "track_node_props": {
      "lineage": "ultrack_lineage_id",
      "tracklet": "ultrack_id"
    },
    "related_objects": [
      {
        "type": "labels",
        "path": "../segmentation/",
        "label_prop": "seg_id"
      },
      {
        "type": "image",
        "path": "../raw/"
      }
    ],
    // optional coordinate transformation is defined as homogeneous coordinates
    // It is expected to be a (D+1)x(D+1) matrix where D is the number of axes
    "affine": [
      [1, 0, 0, 0, 0],
      [0, 1, 0, 0, 0],
      [0, 0, 1, 0, 0],
      [0, 0, 0, 1, 0],
      [0, 0, 0, 0, 1]
    ],
    // custom other things must be placed **inside** the extra attribute
    "extra": {
      // ...
    }
  }
}
Minimal geff metadata must have geff_version and directed fields under a geff field, as well as empty node_props_metadata and edge_props_metadata fields.
{
  "geff": {
    "geff_version": "0.0.0",
    "directed": false,
    "node_props_metadata": {},
    "edge_props_metadata": {}
  }
}
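A quick structural check for the minimal metadata can be sketched as follows. This is not the validation performed by the GeffMetadata schema, only a presence check on top-level keys. The function name missing_minimal_fields is hypothetical, and the version field is taken to be geff_version, the name used in the GeffMetadata parameter list.

```python
import json

# Field names follow the GeffMetadata parameter list; treating exactly these
# four as the minimal set is an assumption based on the paragraph above.
REQUIRED_FIELDS = (
    "geff_version",
    "directed",
    "node_props_metadata",
    "edge_props_metadata",
)


def missing_minimal_fields(zattrs):
    """Return the required names that are absent from zattrs["geff"]."""
    geff = zattrs.get("geff")
    if not isinstance(geff, dict):
        return ["geff"]  # without a geff key this is not a geff group at all
    return [name for name in REQUIRED_FIELDS if name not in geff]


minimal = json.loads(
    """
    {
      "geff": {
        "geff_version": "0.0.0",
        "directed": false,
        "node_props_metadata": {},
        "edge_props_metadata": {}
      }
    }
    """
)
incomplete = {"geff": {"directed": True}}

print(missing_minimal_fields(minimal))
print(missing_minimal_fields(incomplete))
```

A check like this is only a first filter before full schema validation, since it does not inspect value types or the contents of the property metadata.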