Geff specification

The graph exchange file format is zarr based. A graph is stored in a zarr group, which can have any name. This allows storing multiple geff graphs inside the same zarr root directory. A geff group is identified by the presence of a geff key in the .zattrs. Other geff metadata is also stored in the .zattrs file of the geff group, nested under the geff key. The geff group must contain a nodes group and an edges group (albeit both can be empty). geff graphs have the option to provide properties for nodes and edges.

geff graphs have the option to provide time and spatial dimensions as special attributes. These attributes are specified in the axes section of the metadata, inspired by the OME-zarr axes specification.

Zarr specification

Currently, geff supports zarr specifications 2 and 3. However, geff will default to writing specification 2 because graphs written to the zarr v3 spec will not be compatible with all applications. When zarr 3 is more fully adopted by other libraries and tools, we will move to a zarr spec 3 default.

Geff metadata

GeffSchema Type: object

geff_metadata

Type: object

Geff Version

Type: string

Geff version string following semantic versioning (MAJOR.MINOR.PATCH), optionally with .devN and/or +local parts (e.g., 0.3.1.dev6+g61d5f18).
If not provided, the version will be set to the current geff package version.

Must match regular expression: ^\d+\.\d+(?:\.\d+)?(?:\.dev\d+)?(?:\+[a-zA-Z0-9]+)?
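Here is a minimal sketch of checking a version string against this pattern with Python's standard `re` module. The helper name `is_valid_geff_version` is hypothetical. The pattern is anchored with `^` but has no trailing `$`, so `re.match` only requires a matching prefix.

```python
import re

# Pattern copied verbatim from the geff_version field description.
GEFF_VERSION_PATTERN = re.compile(
    r"^\d+\.\d+(?:\.\d+)?(?:\.dev\d+)?(?:\+[a-zA-Z0-9]+)?"
)


def is_valid_geff_version(version: str) -> bool:
    """Return True if `version` starts with a string matching the geff pattern."""
    # re.match anchors at the start only; the pattern has no trailing $.
    return GEFF_VERSION_PATTERN.match(version) is not None


for candidate in ["0.3.1.dev6+g61d5f18", "0.1", "v0.1"]:
    print(candidate, is_valid_geff_version(candidate))
```

Because only a prefix must match, the checker also accepts a local part that contains dots, such as `0.1.3.dev4+gd5d1132.d20250616`. A string like `v0.1` is rejected.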

Directed

Type: boolean

True if the graph is directed, otherwise False.

Axes

Default: null

Optional list of Axis objects defining the axes of each node in the graph.
Each object's name must be an existing attribute on the nodes. The optional type key must be one of space, time or channel, though readers may not use this information. Each axis can also optionally define a unit key, which should match the valid OME-Zarr units, and min and max keys to define the range of the axis.

Type: array
No Additional Items

Each item of this array must be:

Axis

Type: object

Name

Type: string

Type

Default: null

Unit

Default: null

Min

Default: null

Max

Default: null
Type: null
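A short helper can check the axis rules above. This is a sketch, not the geff validator. The name `check_axes` is hypothetical, and the axes are plain dicts as they appear in `.zattrs`. The requirement that `min` not exceed `max` is an assumption, drawn from their description as the range of the axis.

```python
from typing import Any

# Allowed axis types, as listed in geff.units.VALID_AXIS_TYPES.
VALID_AXIS_TYPES = ["space", "time", "channel"]


def check_axes(axes: list[dict[str, Any]], node_prop_names: set[str]) -> list[str]:
    """Return a list of problems found in an `axes` list (empty if none)."""
    problems = []
    for axis in axes:
        name = axis["name"]
        # Each axis name must refer to an existing node property.
        if name not in node_prop_names:
            problems.append(f"axis {name!r} is not a node property")
        # `type` is optional, but if given it must be one of the allowed values.
        axis_type = axis.get("type")
        if axis_type is not None and axis_type not in VALID_AXIS_TYPES:
            problems.append(f"axis {name!r} has invalid type {axis_type!r}")
        # Assumption: when both bounds are present, min should not exceed max.
        lo, hi = axis.get("min"), axis.get("max")
        if lo is not None and hi is not None and lo > hi:
            problems.append(f"axis {name!r} has min > max")
    return problems


print(check_axes([{"name": "z", "type": "depth"}], {"t", "y", "x"}))
```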

Node Props Metadata

Default: null

Metadata for node properties. The keys are the property identifiers, and the values are PropMetadata objects describing the properties.

Type: object

Each additional property must conform to the following schema

PropMetadata

Type: object

Metadata describing a property in the geff graph.

Edge Props Metadata

Default: null

Metadata for edge properties. The keys are the property identifiers, and the values are PropMetadata objects describing the properties.

Type: object

Each additional property must conform to the following schema

PropMetadata

Type: object

Metadata describing a property in the geff graph.

Same definition as PropMetadata

Node property: Detections as spheres

Default: null
Name of the optional `sphere` property.

A sphere is defined by
- a center point, already given by the `space` type properties
- a radius scalar, stored in this property

Type: string
Type: null

Node property: Detections as ellipsoids

Default: null
Name of the `ellipsoid` property.

An ellipsoid is assumed to be in the same coordinate system as the `space` type properties.

It is defined by
- a center point $c$, already given by the `space` type properties
- a covariance matrix $\Sigma$, symmetric and positive-definite, stored in this property as a `2x2`/`3x3` array.

To plot the ellipsoid:
- Compute the eigendecomposition of the covariance matrix $\Sigma = Q \Lambda Q^{\top}$
- Sample points $z$ on the unit sphere
- Transform the points to the ellipsoid by $x = c + Q \Lambda^{1/2} z$.

Type: string
Type: null
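The plotting recipe above translates directly to NumPy. This is a minimal sketch that assumes the center and covariance are given as array-likes. `sample_ellipsoid_surface` is a hypothetical helper. It draws unit-sphere points by normalising Gaussian samples, which is one common choice among several.

```python
import numpy as np


def sample_ellipsoid_surface(center, covariance, n_points=100, seed=0):
    """Sample points on the ellipsoid surface defined by (center, covariance).

    Follows the recipe: Sigma = Q Lambda Q^T, then x = c + Q Lambda^(1/2) z
    for points z on the unit sphere.
    """
    center = np.asarray(center, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    # eigh suits a symmetric matrix: it returns the eigenvalues (diagonal of
    # Lambda) and orthonormal eigenvectors (columns of Q).
    eigvals, q = np.linalg.eigh(covariance)
    # Points on the unit sphere: normalise standard Gaussian vectors.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_points, center.size))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    # Row-vector form of x = c + Q Lambda^(1/2) z.
    transform = q @ np.diag(np.sqrt(eigvals))
    return center + z @ transform.T


points = sample_ellipsoid_surface([0.0, 0.0, 0.0], np.diag([4.0, 1.0, 1.0]), n_points=5)
print(points.shape)
```

Every returned point x satisfies (x - c)^T Σ^(-1) (x - c) = 1, which makes a convenient sanity check. The same code works for `2x2` covariances, where the points lie on an ellipse.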

Track Node Props

Default: null

Node properties denoting tracklet and/or lineage IDs.
A tracklet is defined as a simple path of connected nodes where the initiating node has any incoming degree and outgoing degree at most 1, the terminating node has incoming degree at most 1 and any outgoing degree, and all other nodes along the path have in/out degree of 1. Each tracklet must contain the maximal set of connected nodes that match this definition (no sub-tracklets).
A lineage is defined as a weakly connected component of the graph.
The dictionary can store one or both of the 'tracklet' and 'lineage' keys.

Type: object

Each additional property must conform to the following schema

Type: string
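Here is one way to derive these IDs from an edge list, sketched in plain Python. It assumes a directed graph with edges given as (source, target) pairs. An edge u -> v can only lie inside a tracklet when u has out-degree 1 and v has in-degree 1. Tracklets are therefore the connected components of those edges, and lineages are the weakly connected components of all edges. The function names and the consecutive integer IDs they assign are hypothetical choices; geff only stores the resulting property values.

```python
from collections import defaultdict


def _find(parent, x):
    # Union-find "find" with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def _components(nodes, edges):
    """Label connected components of the undirected view of (nodes, edges)."""
    parent = {n: n for n in nodes}
    for u, v in edges:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
    # Map each component root to a consecutive integer ID starting at 1.
    roots = {}
    return {n: roots.setdefault(_find(parent, n), len(roots) + 1) for n in nodes}


def lineage_ids(nodes, edges):
    """Lineages are the weakly connected components of the graph."""
    return _components(nodes, edges)


def tracklet_ids(nodes, edges):
    """Tracklets: maximal paths whose interior nodes have in/out degree 1."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    # Keep only edges that can sit inside a tracklet; divisions and merges
    # break the path, so their edges are dropped here.
    chain_edges = [(u, v) for u, v in edges if out_deg[u] == 1 and in_deg[v] == 1]
    return _components(nodes, chain_edges)


nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 4), (3, 5)]  # node 3 divides into 4 and 5
print(lineage_ids(nodes, edges))
print(tracklet_ids(nodes, edges))
```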

Default: null

Affine transformation matrix to transform the graph coordinates to the physical coordinates. The matrix is homogeneous, so for a graph with N axes it must have shape (N + 1) × (N + 1).

Affine

Type: object

Affine transformation class following scipy conventions.

Internally stores transformations as homogeneous coordinate matrices (N+1, N+1).
The transformation matrix follows scipy.ndimage.affine_transform convention
where the matrix maps output coordinates to input coordinates (inverse/pull transformation).

For a point p_out in output space, the corresponding input point p_in is computed as:
p_in_homo = matrix @ p_out_homo
where p_out_homo = [p_out; 1] and p_in = p_in_homo[:-1]

Attributes:
matrix: Homogeneous transformation matrix as list of lists (ndim+1, ndim+1)

Matrix

Type: object

Homogeneous transformation matrix as list of lists (ndim+1, ndim+1)

Type: null

Default: null

Metadata indicating how spatiotemporal axes are displayed by a viewer

DisplayHint

Type: object

Metadata indicating how spatiotemporal axes are displayed by a viewer

Display Horizontal

Type: string

Which spatial axis to use for horizontal display

Display Vertical

Type: string

Which spatial axis to use for vertical display

Display Depth

Default: null

Optional, which spatial axis to use for depth display

Display Time

Default: null

Optional, which temporal axis to use for time

Type: null
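A reader might sanity-check display hints against the axes. The helper `check_display_hints` is hypothetical and assumes the hints and axes are plain dicts as stored in `.zattrs`. It reports three cases:

- a required hint is absent
- a hint names an unknown axis
- a hint names an axis whose declared type does not match the kind of hint (spatial for horizontal, vertical and depth; temporal for time)

Axes without a declared type are not flagged.

```python
def check_display_hints(hints, axes):
    """Report display hints that do not match the declared axes (sketch)."""
    # Map axis name -> declared type (None when the optional type is omitted).
    axis_types = {axis["name"]: axis.get("type") for axis in axes}
    # Kind of axis each hint is expected to name, per the field descriptions.
    expected = {
        "display_horizontal": "space",
        "display_vertical": "space",
        "display_depth": "space",
        "display_time": "time",
    }
    required = {"display_horizontal", "display_vertical"}
    problems = []
    for key, wanted in expected.items():
        name = hints.get(key)
        if name is None:
            if key in required:
                problems.append(f"{key} is required")
            continue  # depth and time hints are optional
        if name not in axis_types:
            problems.append(f"{key} refers to unknown axis {name!r}")
        elif axis_types[name] not in (None, wanted):
            problems.append(f"{key} axis {name!r} is not of type {wanted!r}")
    return problems
```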

Extra

Type: object

Extra metadata that is not part of the schema

Note

The axes dictionary is modeled after the OME-zarr specifications and is used to identify spatio-temporal properties on the graph nodes. If the same names are used in the axes metadata of the related image or segmentation data, applications can use this information to align graph node locations with image data.

geff.units.VALID_AXIS_TYPES module-attribute

VALID_AXIS_TYPES = ['space', 'time', 'channel']

geff.units.VALID_SPACE_UNITS module-attribute

VALID_SPACE_UNITS = [
    None,
    "angstrom",
    "attometer",
    "centimeter",
    "decimeter",
    "exameter",
    "femtometer",
    "foot",
    "gigameter",
    "hectometer",
    "inch",
    "kilometer",
    "megameter",
    "meter",
    "micrometer",
    "mile",
    "millimeter",
    "nanometer",
    "parsec",
    "petameter",
    "picometer",
    "terameter",
    "yard",
    "yoctometer",
    "yottameter",
    "zeptometer",
    "zettameter",
]

geff.units.VALID_TIME_UNITS module-attribute

VALID_TIME_UNITS = [
    None,
    "attosecond",
    "centisecond",
    "day",
    "decisecond",
    "exasecond",
    "femtosecond",
    "gigasecond",
    "hectosecond",
    "hour",
    "kilosecond",
    "megasecond",
    "microsecond",
    "millisecond",
    "minute",
    "nanosecond",
    "petasecond",
    "picosecond",
    "second",
    "terasecond",
    "yoctosecond",
    "yottasecond",
    "zeptosecond",
    "zettasecond",
]
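Here is a minimal sketch of validating an axis unit against these lists. It imports them from `geff.units` when geff is installed and otherwise falls back to a short, hand-copied excerpt. `is_valid_unit` is a hypothetical helper. Accepting any unit for `channel` or untyped axes is an assumption, since no list is given for them.

```python
# Prefer the lists shipped with geff (geff.units); fall back to a short
# excerpt of them so this sketch runs without geff installed.
try:
    from geff.units import VALID_SPACE_UNITS, VALID_TIME_UNITS
except ImportError:
    VALID_SPACE_UNITS = [None, "nanometer", "micrometer", "millimeter", "meter"]
    VALID_TIME_UNITS = [None, "millisecond", "second", "minute", "hour", "day"]


def is_valid_unit(axis_type, unit):
    """Check an axis unit against the list for its axis type (sketch)."""
    if axis_type == "space":
        return unit in VALID_SPACE_UNITS
    if axis_type == "time":
        return unit in VALID_TIME_UNITS
    # Assumption: no unit list is given for 'channel' or untyped axes,
    # so any unit is accepted for them here.
    return True
```

Plural forms such as `micrometers` or `seconds` are not in the lists. Use the singular `micrometer` and `second` instead.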

Affine transformations

The optional affine field allows specifying a global affine transformation that maps the graph coordinates stored in the node properties to a physical coordinate system. The value matrix is stored as a (N + 1) × (N + 1) homogeneous matrix following the scipy.ndimage.affine_transform convention, where N equals the number of spatio-temporal axes declared in axes.
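Here is a minimal NumPy sketch of applying such a matrix to node coordinates. It follows the homogeneous-coordinate recipe in the Affine description: append a 1, multiply, then drop the last entry. `apply_affine` is a hypothetical helper. It performs only the multiplication and leaves the pull/push interpretation of the scipy convention to the caller.

```python
import numpy as np


def apply_affine(matrix, points):
    """Apply an (N+1) x (N+1) homogeneous matrix to an array of N-D points."""
    matrix = np.asarray(matrix, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n = points.shape[1]
    if matrix.shape != (n + 1, n + 1):
        raise ValueError(f"expected a {(n + 1, n + 1)} matrix, got {matrix.shape}")
    # Append a column of ones to get homogeneous coordinates, shape (P, N+1).
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    # matrix @ p_homo for every row p_homo, written in row-vector form,
    # then drop the trailing homogeneous coordinate.
    return (homogeneous @ matrix.T)[:, :-1]
```

For example, take four axes (t, z, y, x) and a hypothetical matrix equal to the 5 × 5 identity except for a 2.0 at position (1, 1) and a 10.0 at position (3, 4). This matrix maps the point (0, 1, 1, 1) to (0, 2, 1, 11).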

Extra attributes

The optional extra object is a free-form dictionary that can hold any additional, application-specific metadata that is not covered by the core geff schema. Users may place arbitrary keys and values inside extra without fear of clashing with future reserved fields. Although the core geff reader makes these attributes available, their meaning and use are left entirely to downstream applications.

The nodes group

The nodes group will contain an ids array and optionally a props group.

The ids array

The nodes/ids array is a 1D array of node IDs of length N >= 0, where N is the number of nodes in the graph. Node IDs must be unique. Node IDs can have any type supported by zarr (except floats), but we recommend integer dtypes. For large graphs, uint64 might be necessary to provide enough range for every node to have a unique ID. In the minimal case of an empty graph, the ids array will be present but empty.
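These rules are easy to check with NumPy. The sketch below assumes the ids array has already been loaded into memory. `check_node_ids` is a hypothetical name. It returns a list of problem descriptions, which is empty when the array is valid.

```python
import numpy as np


def check_node_ids(ids):
    """Return problems with a nodes/ids array, per the rules above (sketch)."""
    ids = np.asarray(ids)
    problems = []
    if ids.ndim != 1:
        problems.append(f"ids must be 1D, got shape {ids.shape}")
    if np.issubdtype(ids.dtype, np.floating):
        problems.append(f"ids must not be floats, got {ids.dtype}")
    # Fewer unique values than entries means at least one duplicate ID.
    if np.unique(ids).size != ids.size:
        problems.append("ids must be unique")
    return problems


print(check_node_ids(np.array([0, 1, 2], dtype=np.uint64)))
```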

The props group and node property groups

The nodes/props group is optional and will contain one or more node property groups, each with a values array and an optional missing array.

  • values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the node ids array, such that each row of the property values array stores the property for the node at that index in the ids array.
  • The missing array is an optional one-dimensional boolean array that supports properties that are not present on all nodes. A 1 at an index in the missing array indicates that the value of that property for the node at that index is None, and the value in the values array at that index should be ignored. If the missing array is not present, all nodes have values for the property.

  • Geff provides special support for spatio-temporal properties, although they are not required. When axes are specified in the geff metadata, each axis name identifies a spatio-temporal property. Spatio-temporal properties are not allowed to have missing arrays. Otherwise, they are identical to other properties from a storage specification perspective.

  • The seg_id property is an optional, special node property that stores the segmentation label for each node. The seg_id values do not need to be unique, in case labels are repeated between time points. If the seg_id property is not present, it is assumed that the graph is not associated with a segmentation.

  • Geff provides special support for predefined shape properties, although they are not required. These currently include: sphere, ellipsoid. Values can be marked as missing, and a geff graph may contain multiple different shape properties. Units of shapes are assumed to be the same as the units on the spatial axes. Otherwise, shape properties are identical to other properties from a storage specification perspective.

    • sphere: Hypersphere in n spatial dimensions, defined by a scalar radius.
    • ellipsoid: Defined by a symmetric positive-definite covariance matrix, whose dimensionality is assumed to match the spatial axes.

Note

When writing a graph with missing properties to the geff format, you must fill in a dummy value in the values array for the nodes that are missing the property, in order to keep the indices aligned with the node ids.
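Here is a minimal sketch of the write and read sides for a scalar node property. It assumes the property values start out in a Python dict keyed by node ID. `encode_property`, `decode_property` and the dummy `fill_value` are hypothetical. The same approach applies to edge properties indexed by rows of the edge ids array.

```python
import numpy as np


def encode_property(node_ids, prop, fill_value=0, dtype=float):
    """Build aligned (values, missing) arrays for one scalar node property.

    Nodes absent from `prop` (or mapped to None) are marked missing and
    receive the dummy `fill_value`, keeping indices aligned with node_ids.
    """
    values = np.full(len(node_ids), fill_value, dtype=dtype)
    missing = np.zeros(len(node_ids), dtype=bool)
    for i, node in enumerate(node_ids):
        value = prop.get(node)
        if value is None:
            missing[i] = True  # the dummy value stays at this index
        else:
            values[i] = value
    return values, missing


def decode_property(node_ids, values, missing=None):
    """Map node id -> value, or None where the missing array is True."""
    if missing is None:  # no missing array: every node has a value
        missing = np.zeros(len(node_ids), dtype=bool)
    return {
        node: (None if is_missing else value.tolist())
        for node, value, is_missing in zip(node_ids, values, missing)
    }


values, missing = encode_property([10, 11, 12], {10: 1.5, 12: 2.5}, fill_value=-1)
print(values, missing)
```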

The edges group

Similar to the nodes group, the edges group will contain an ids array and an optional props group.

The ids array

The edges/ids array is a 2D array with the same dtype as the nodes/ids array. It has shape (E, 2), where E is the number of edges in the graph. If there are no edges in the graph, the edges group and ids array must be present with shape (0, 2). All elements in the edges/ids array must also be present in the nodes/ids array, and the data types of the two id arrays must match. Each row represents an edge between two nodes. For directed graphs, the first column holds the source nodes and the second column holds the target nodes. For undirected graphs, the order is arbitrary. Edges should be unique (no multiple edges between the same two nodes), and edges from a node to itself (self-loops) are not supported.
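Here is a sketch of validating an in-memory edges/ids array with NumPy. `check_edge_ids` is a hypothetical helper. It assumes that (u, v) and (v, u) count as distinct edges in a directed graph but as duplicates in an undirected one.

```python
import numpy as np


def check_edge_ids(edge_ids, node_ids, directed=True):
    """Return problems with an edges/ids array, per the rules above (sketch)."""
    edge_ids = np.asarray(edge_ids)
    node_ids = np.asarray(node_ids)
    if edge_ids.ndim != 2 or edge_ids.shape[1] != 2:
        return [f"edge ids must have shape (E, 2), got {edge_ids.shape}"]
    problems = []
    if edge_ids.dtype != node_ids.dtype:
        problems.append(f"dtype mismatch: {edge_ids.dtype} vs {node_ids.dtype}")
    # Every endpoint must appear in nodes/ids.
    if not np.isin(edge_ids, node_ids).all():
        problems.append("edges reference unknown node ids")
    # Self-loops are not supported.
    if np.any(edge_ids[:, 0] == edge_ids[:, 1]):
        problems.append("self-loops are not supported")
    # For undirected graphs, sort each row so (u, v) and (v, u) compare equal.
    keys = edge_ids if directed else np.sort(edge_ids, axis=1)
    if keys.shape[0] > 0 and len(np.unique(keys, axis=0)) != keys.shape[0]:
        problems.append("duplicate edges")
    return problems
```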

The props group and edge property groups

The edges/props group will contain zero or more edge property groups, each with a values array and an optional missing array.

  • values arrays can be any zarr supported dtype, and can be N-dimensional. The first dimension of the values array must have the same length as the edges/ids array, such that each row of the property values array stores the property for the edge at that index in the ids array.
  • The missing array is an optional one-dimensional boolean array that supports properties that are not present on all edges. A 1 at an index in the missing array indicates that the value of that property for the edge at that index is missing, and the value in the values array at that index should be ignored. If the missing array is not present, all edges have values for the property.

The edges/props group is optional. If you do not have any edge properties, edges/props can be absent.

Example file structure and metadata

Here is a schematic of the expected file structure.

/path/to.zarr
    /tracking_graph
        .zattrs  # graph metadata with `geff_version`
        nodes/
            ids  # shape: (N,)  dtype: uint64
            props/
                t/
                    values # shape: (N,) dtype: uint16
                z/
                    values # shape: (N,) dtype: float32
                y/
                    values # shape: (N,) dtype: float32
                x/
                    values # shape: (N,) dtype: float32
                radius/
                    values # shape: (N,) dtype: int | float
                    missing # shape: (N,) dtype: bool
                covariance3d/
                    values # shape: (N, 3, 3) dtype: float
                    missing # shape: (N,) dtype: bool
                color/
                    values # shape: (N, 4) dtype: float16
                    missing # shape: (N,) dtype: bool
        edges/
            ids  # shape: (E, 2) dtype: uint64
            props/
                distance/
                    values # shape: (E,) dtype: float16
                score/
                    values # shape: (E,) dtype: float16
                    missing # shape: (E,) dtype: bool
    # optional:
    /segmentation 

    # unspecified, but totally okay:
    /raw 
This is a geff metadata .zattrs file that matches the above example structure.
# /path/to.zarr/tracking_graph/.zattrs
{
    "geff": {
        "directed": true,
        "geff_version": "0.1.3.dev4+gd5d1132.d20250616",
        "axes": [ # optional
            {"name": "t", "type": "time", "unit": "second", "min": 0, "max": 125},
            {"name": "z", "type": "space", "unit": "micrometer", "min": 1523.36, "max": 4398.1},
            {"name": "y", "type": "space", "unit": "micrometer", "min": 81.667, "max": 1877.7},
            {"name": "x", "type": "space", "unit": "micrometer", "min": 764.42, "max": 2152.3}
        ],
        # predefined node properties for storing detections as spheres or ellipsoids
        "sphere": "radius", # optional
        "ellipsoid": "covariance3d", # optional
        "display_hints": {
            "display_horizontal": "x",
            "display_vertical": "y",
            "display_depth": "z",
            "display_time": "t"
        },
        # node properties corresponding to tracklet and/or lineage IDs
        "track_node_props": {
            "lineage": "ultrack_lineage_id",
            "tracklet": "ultrack_id"
        },
        "related_objects": [
            {
                "type": "labels", "path": "../segmentation/", "label_prop": "seg_id"
            },
            {
                "type": "image", "path": "../raw/"
            }
        ],
        # optional coordinate transformation defined in homogeneous coordinates;
        # it is expected to be a (D+1)x(D+1) matrix where D is the number of axes
        "affine": {
            "matrix": [
                [1, 0, 0, 0, 0],
                [0, 1, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [0, 0, 0, 1, 0],
                [0, 0, 0, 0, 1]
            ]
        },
        # custom other things must be placed **inside** the extra attribute
        "extra": {
            ...
        }
    }
}
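The metadata refers to node properties by name, so a reader may want to cross-check it against the properties actually present under nodes/props. This sketch assumes the `.zattrs` content is plain JSON; the `#` comments above are annotations and would not appear in a real file. `check_metadata_references` is a hypothetical helper. Applied to the example above with comments removed, it would report `ultrack_lineage_id` and `ultrack_id`, which do not appear in the example file structure.

```python
import json


def check_metadata_references(zattrs_text, node_prop_names):
    """Cross-check geff metadata against the node properties on disk (sketch)."""
    meta = json.loads(zattrs_text)["geff"]
    problems = []
    axes = meta.get("axes") or []
    # Properties referenced by name from the metadata must exist on the nodes.
    referenced = [axis["name"] for axis in axes]
    referenced += [meta[key] for key in ("sphere", "ellipsoid") if meta.get(key)]
    referenced += list((meta.get("track_node_props") or {}).values())
    for name in referenced:
        if name not in node_prop_names:
            problems.append(f"metadata references missing node property {name!r}")
    # The affine matrix must be (D+1) x (D+1) for D axes.
    affine = meta.get("affine")
    if affine is not None:
        size = len(axes) + 1
        matrix = affine["matrix"]
        if len(matrix) != size or any(len(row) != size for row in matrix):
            problems.append(f"affine matrix must be {size} x {size}")
    return problems
```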