rekall.interval_set module

An IntervalSet is a set of Intervals with a number of core operations to transform and combine sets of Intervals.

class rekall.interval_set.IntervalSet(intrvls)

Bases: object

A set of Intervals.

IntervalSet provides core operations to transform and combine sets of Intervals such as map, fold, join, union, minus, coalesce, etc.

Note

This class does not obey the uniqueness semantics of a Set.

When the number of intervals is larger than NUM_INTRVLS_THRESHOLD, binary operations join, collect_by_interval and filter_against will no longer operate on the full cross product of the intervals, and instead only look at pairs that are within a certain neighborhood of each other along the primary axis (the primary axis of the first Interval passed in during construction). This neighborhood defaults to DEFAULT_FRACTION of the overall range along that axis of the IntervalSet.

Specifically, we compute DEFAULT_FRACTION of the overall range of that axis and set optimization_window to that fraction of the range. The optimized operations (join, collect_by_interval, and filter_against) only look at pairs whose distance from each other along the primary axis is less than optimization_window.

DEFAULT_FRACTION = 0.01
NUM_INTRVLS_THRESHOLD = 1000
coalesce(axis, bounds_merge_op, payload_merge_op=<function IntervalSet.<lambda>>, predicate=None, epsilon=0)

Recursively merge all intervals that are touching or overlapping along axis.

Merge intervals in self if they meet, overlap, or are up to epsilon apart along axis. If a predicate is specified, intervals will be merged if they meet/overlap and satisfy the predicate. Repeat the process until all such intervals are merged.

Merges the bounds with bounds_merge_op and merges payloads with payload_merge_op.

Parameters:
  • axis – The axis to coalesce on.
  • bounds_merge_op – A function that takes two bounds and returns a merged version of both of them. Along axis, this function should return a bound that spans the two bounds.
  • payload_merge_op (optional) – A function that takes in two payloads and merges them. Defaults to a function that returns the first of the two payloads.
  • predicate (optional) – A function that takes an interval that is currently being coalesced and a new interval and returns whether or not the two intervals should be merged.
  • epsilon (optional) – The slack for judging if Intervals meet or overlap. Must be nonnegative. Defaults to 0 (no slack).
Returns:

A new IntervalSet of intervals that are disjoint along axis and are at least epsilon apart.

collect_by_interval(other, predicate, filter_empty=True, window=None)

Collect intervals in other and nest them under intervals in self.

For each interval in self, its payload in the output IntervalSet will be all the intervals in other that satisfy the predicate. The payload of each resulting interval will become a pair (P, IS) where P is the original payload in self, and IS is an IntervalSet containing the nested intervals.

Note

We do not check the full cross-product of self and other and instead only check pairs that are within window of each other along the primary axis. See class documentation for more details.

Parameters:
  • other (IntervalSet) – Set of intervals to collect.
  • predicate – A function that takes one interval in self and one in other and returns a bool.
  • filter_empty (optional) – Whether to remove intervals in self if no intervals in other satisfies the predicate with it. If False, such intervals will have payload (P, IntervalSet([])) where P is the original payload. Defaults to True.
  • window (optional) – Restrict interval pairs to those within window of each other along the primary axis. Defaults to None which means using the default optimization window associated with self. See class Documentation for more detail.
Returns:

A new IntervalSet of intervals in self but with nested interval set from other in the payload.

dilate(window, axis=None)

Expand the range of every interval in the set along some axis.

Parameters:
  • window (Number) – The amount to extend at each end-point of the range. The actual interval will grow by 2*window. Use negative number to shrink intervals.
  • axis (optional) – The axis to dilate on. Represented as a pair of co-ordinates, such as ('t1', 't2'). Defaults to None, which uses the primary_axis of self.
Returns:

A new IntervalSet with the dilated intervals.

duration(axis=None)

Returns the sum of length of intervals along some axis in the set.

Parameters:axis – The axis to sum the length over.
Returns:The sum of the size of the intervals along axis.
empty()

Returns whether the set is empty.

filter(predicate)

Filter the set and keep intervals that pass the predicate.

Parameters:predicate – A function that takes an Interval and returns a bool.
Returns:A new IntervalSet which is the filtered set.
filter_against(other, predicate, window=None)

Filter intervals in self against intervals in other.

Keep only intervals in self that satisfy the predicate with at least one interval in other.

Note

We do not check the full cross-product of self and other and instead only check pairs that are within window of each other. See class documentation for more details.

Parameters:
  • other (IntervalSet) – The other interval set to filter against.
  • predicate – A function that takes one interval in self and an interval in other and returns a bool.
  • window (optional) – Restrict interval pairs to those within window of each other along the primary axis. Defaults to None which means using the default optimization window associated with self. See class Documentation for more detail.
Returns:

A new IntervalSet with intervals in self that satisify predicate with at least one interval in other.

filter_size(min_size=0, max_size='infty', axis=None)

Filter the intervals by length of the bounds along some axis.

Parameters:
  • min_size (Number, optional) – minimum size to pass the filter. Defaults to 0.
  • max_size (Number, optional) – maximum size to pass the filter. Defaults to rekall.common.INFTY.
  • axis (optional) – The axis to filter on. Represented as a pair of co-ordinates, such as ('t1', 't2'). Defaults to None, which uses the primary_axis of self.
Returns:

A new IntervalSet of intervals with bound length within [min_size, max_size] along the given axis.

fold(reducer, init=None, sort_key=None)

Folds a reducer over an ordered list of intervals in the set.

Parameters:
  • reducer – A function that takes a previous state and an interval and returns an updated state.
  • init (optional) – The initial state to use in fold. Defaults to None, which means using the first interval as the initial state and run reduction from the second interval.
  • sort_key (optional) – A function that takes an Interval and returns a value as the sort key that defines the order of the list to fold over. If None, uses the primary_axis of the Bound of an Interval in the IntervalSet.
Returns:

The final value of the state after folding all intervals in set.

fold_to_set(reducer, init=None, sort_key=None, acc_to_set=<function IntervalSet.<lambda>>)

Fold over intervals in the set to produce a new IntervalSet.

The same as fold method except it returns a IntervalSet by running acc_to_set on the final state of the fold, making it an in-system operation.

Parameters:
  • reducer – A function that takes a previous state and an interval and returns an updated state.
  • init (optional) – The initial state to use in fold. Defaults to None, which means using the first interval as the initial state and run reduction from the second interval.
  • sort_key (optional) – A function that takes an Interval and returns a value as the sort key that defines the order of the list to fold over. If None, uses the primary_axis of the Bound of an Interval in the IntervalSet.
  • acc_to_set (optional) – A function that takes the final state of the fold and returns an IntervalSet. Defaults to a function that takes in a list of Intervals and constructs an IntervalSet with that.
Returns:

A new IntervalSet that is the result of acc_to_set on the output of fold.

get_intervals()

Returns a list of Intervals, ordered by their Bounds (which are sortable).

group_by(key, merge)

Group intervals by key and produces a new interval for each group.

Parameters:
  • key – A function that takes an Interval and returns a value as the key to group by.
  • merge – A function that takes a group key and an IntervalSet of the group, and returns one Interval.
Returns:

A new IntervalSet with the results of merge on each group.

group_by_axis(axis, output_bounds)

Group intervals by a particular axis.

For each group, create an Interval with bounds equal to output_bounds, except for the co-ordinates in axis. These are set to the co-ordinate values for the group. The payload is an IntervalSet containing all the intervals in the group.

Example:

iset = IntervalSet([
    Interval(Bounds3D(0, 1, 0.5, 0.75, 0.5, 0.75)),
    Interval(Bounds3D(0, 1, 0.3, 0.75, 0.1, 0.3)),
    Interval(Bounds3D(0, 2, 0.1, 0.7, 0.1, 0.2)),
    Interval(Bounds3D(0, 2, 0.05, 0.25, 0.1, 0.12)),
    Interval(Bounds3D(1, 2, 0.6, 0.75, 0.1, 0.2))
])

is_grouped = iset.group_by_axis(('t1', 't2'),
    Bounds3D(0, 1, 0, 1, 0, 1))

# This is what is_grouped looks like
is_grouped == IntervalSet([
    Interval(Bounds3D(0, 1, 0, 1, 0, 1), [
        Interval(Bounds3D(0, 1, 0.5, 0.75, 0.5, 0.75)),
        Interval(Bounds3D(0, 1, 0.3, 0.75, 0.1, 0.3)),
    ]),
    Interval(Bounds3D(0, 2, 0, 1, 0, 1), [
        Interval(Bounds3D(0, 2, 0.1, 0.7, 0.1, 0.2)),
        Interval(Bounds3D(0, 2, 0.05, 0.25, 0.1, 0.12))
    ]),
    Interval(Bounds3D(1, 2, 0, 1, 0, 1), [
        Interval(Bounds3D(1, 2, 0.6, 0.75, 0.1, 0.2))
    ])
])
Parameters:
  • axis – The axis to group by. Represented as a pair of co-ordinates, such as ('t1', 't2').
  • output_bounds – Default output bounds for the Intervals in the output. The fields in axis are overwritten by the grouped values.
Returns:

A new IntervalSet of Intervals grouped by axis, with the full IntervalSet of Intervals in the group in the payload.

join(other, predicate, merge_op, window=None)

Cross-products two sets and combines pairs that pass the predicate.

Named after the database JOIN operation, this method takes the cross- product of two IntervalSets, filters the resulting set of pairs and forms a new IntervalSet by combining each pair into a new Interval.

Note

join does not take the full cross-product but instead only consider pairs within window of each other. See notes in class documentation for more details.

Parameters:
  • other (IntervalSet) – The interval set to join with.
  • predicate – A function that takes two Intervals and returns a bool.
  • merge_op – A function that takes two Intervals and returns a single new Interval.
  • window (optional) – Restrict interval pairs to those within window of each other along the primary axis. Defaults to None which means using the default optimization window associated with self. See class Documentation for more detail.
Returns:

A new IntervalSet from the intervals produced by merge_op on pairs that pass the predicate.

map(map_fn)

Maps a function over all intervals in the set.

Parameters:map_fn – A function that takes an Interval and returns an Interval.
Returns:A new IntervalSet with the mapped intervals.
map_payload(fn)

Maps a function over payloads of all intervals in the set.

Parameters:fn – A function that takes the payload of an interval and returns the new payload.
Returns:A new IntervalSet of intervals with the same bounds but with transformed payloads.
match(pattern, exact=False)

Pattern matching among the intervals in the set.

A pattern is a list of constraints, where each constraint is a pair of (names, predicates). A name is a string that names the variable in the constraint. A predicate is a function that takes len(names) Intervals as arguments and returns True or False.

A solution is a directionary of assignments, where the key is the name of the variable defined in the pattern and the key is the interval in the set that gets assigned to that variable. The assignments in a solution will satisfy all constraints in the pattern.

Example

Here’s a simple example of matching a spatial pattern:

pattern = [
    (["harry"], [XY(height_at_least(0.5)), name_is("harry")]),
    (["ron"], [XY(height_at_least(0.5)), name_is("ron")]),
    (["harry","ron"], [XY(same_height()), XY(left_of())])
]

# solutions are cases in intervalset where harry and ron are at
# least half the screen height and have the same height and
# harry is on the left of ron.
solutions = intervalset.match(pattern)

# solutions == [
#   {"harry": Interval(...), "ron": Interval(...)},
#   {"harry": Interval(...), "ron": Interval(...)},
# ]
Parameters:
  • pattern – A pattern specified by a list of constraints. See above for more details.
  • exact (optional) – Whether all intervals in the set need to have an assignment for it to be considered a solution. Defaults to False.
Returns:

A list of solutions as described above.

minus(other, axis=None, window=None, predicate=None)

Subtract one IntervalSet from the other across some axis.

Calculates the interval difference between intervals in self and intervals in other on some axis. The other dimensions are not changed.

In particular, a.minus(b) will produce a new set of Intervals that maximally covers the Intervals in a without intersecting any of the intervals in b on axis, but only if they pass the predicate.

The difference between two intervals can produce up to two new intervals.

Suppose we try to subtract two sets of intervals from each other:

# Suppose the following interval is in self
|--------------------------------------------------------|

# and that the following two intervals are in other
          |--------------|     |----------------|

# this function will produce three intervals
|---------|              |-----|                |--------|

If an interval in self overlaps no pairs in other along the temporal dimension, then the interval is reproduced in the output.

Note

Although it is possible to compute difference on arbitrary volumes, it is unclear how to break the resulting set into bounding volumes. Therefore we currently only allow difference to be computed along a single axis.

Parameters:
  • other – The IntervalSet to subtract from self.
  • axis (optional) – The axis to subtract on. Represented as a pair of co-ordinates, such as ('t1', 't2'). Defaults to None, which uses the primary_axis of self.
  • window (optional) – Restrict interval pairs to those within window of each other along the primary axis. Defaults to None which means using the default optimization window associated with self. See class Documentation for more detail.
  • predicate (optional) – A function that takes one interval in self and an interval in other and returns a bool. Only counts intervals in other that pass the predicate for the minus operation.
Returns:

A new IntervalSet with the results from the subtraction. It may contain more intervals than before the subtraction, but none of the intervals will overlap with the intervals in other along axis.

size()

Returns the number of intervals in the set.

split(split_fn)

Splits each Interval into an IntervalSet, and returns the union of all the IntervalSets.

Parameters:
  • split_fn – A function that takes an Interval and returns an
  • IntervalSet.
Returns:

A new IntervalSet with the union of all the IntervalSets generated by split_fn applied to each Interval.

to_json(payload_to_json)

Converts the interval set to a JSON object.

Parameters:payload_to_json – Function that converts each payload to a JSON object.
Returns:JSON object for the interval set
union(other)

Set union of two IntervalSets.

Parameters:other – The other IntervalSet to union with.
Returns:A new IntervalSet with all intervals in self and other.