rekall.stdlib.ingest module
Module for ingesting data from various sources into
IntervalSetMappings. Provides some common data-loading facilities for
data sources that we've seen appear regularly in our use.
-
rekall.stdlib.ingest.attrgetter_accessor(row, field)
Accessor for iterables whose items expose their fields as attributes, like Django querysets. Returns the equivalent of row.field.
-
rekall.stdlib.ingest.django_bbox_default_schema()
A default schema for bounding box records in a database.
-
rekall.stdlib.ingest.getter_accessor(row, field)
Accessor for iterables whose items implement __getitem__, like Pandas/Spark dataframes. Returns row[field].
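The two accessor styles described above can be sketched without rekall. The names attr_access and item_access below are hypothetical stand-ins written for illustration; they mirror the documented behaviour (row.field and row[field]) but are not the library's own code.

```python
from operator import attrgetter
from types import SimpleNamespace


def attr_access(row, field):
    """Hypothetical stand-in: equivalent to row.field."""
    return attrgetter(field)(row)


def item_access(row, field):
    """Hypothetical stand-in: equivalent to row[field]."""
    return row[field]


# An object with attributes (similar in spirit to an ORM record)
# and a mapping (similar in spirit to a dataframe row).
orm_like_row = SimpleNamespace(video_id=7, min_frame=10, max_frame=20)
dict_like_row = {"video_id": 7, "min_frame": 10, "max_frame": 20}

attr_value = attr_access(orm_like_row, "min_frame")
item_value = item_access(dict_like_row, "min_frame")
```

Both calls read the same logical field; only the lookup mechanism differs, which is why the constructors below take the accessor as a parameter.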
-
rekall.stdlib.ingest.ism_from_df(df, bounds_class=<class 'rekall.bounds.bounds3D.Bounds3D'>, bounds_schema={}, progress=None, total=None)
Default constructor for Pandas-style dataframes.
This uses the right accessor for rows in a dataframe and by default creates Intervals with Bounds3D bounds.
The default schema in this case is:
{ "key": "video_id", "t1": "min_frame", "t2": "max_frame" }
The fields in bounds_schema update the fields of the default schema.
Payload is set by setting a value in the payload field of bounds_schema. For example:
{ "payload": "id" }
This will set the payload of each Interval to the id field of the row.
Parameters:
- df – A Pandas-style dataframe where every row will become an Interval.
- bounds_class (optional) – The bounds that each Interval will have. Currently only supports Bounds1D and Bounds3D. Defaults to Bounds3D.
- bounds_schema (optional) – A dictionary that overrides the default field keys.
- with_payload (optional) – A function that takes in a row of df and returns the payload for the Interval from the row. Defaults to a function that returns None. This parameter is absent from the signature above, where payload is set through the payload field of bounds_schema.
- progress (optional) – Whether to display a loading bar from tqdm. Defaults to False.
- total (optional) – Used in conjunction with progress to optionally display the total number of items in the loading bar if progress is True.
Returns: An IntervalSetMapping with Intervals from each row of df.
Raises: NotImplementedError – If bounds_class is not one of Bounds3D or Bounds1D.
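As a rough illustration of how bounds_schema entries overlay the default schema and how the payload field is read, here is a sketch in plain Python. It does not call rekall; resolve_schema and read_row are hypothetical helpers, and the row is a plain dict standing in for a dataframe row.

```python
DEFAULT_SCHEMA = {"key": "video_id", "t1": "min_frame", "t2": "max_frame"}


def resolve_schema(bounds_schema):
    """Overlay caller-supplied entries on the default schema (hypothetical helper)."""
    schema = dict(DEFAULT_SCHEMA)
    schema.update(bounds_schema)
    return schema


def read_row(row, bounds_schema):
    """Read key, bounds, and payload from one row according to the schema."""
    schema = resolve_schema(bounds_schema)
    # "payload" is optional; when it is missing the payload is None.
    payload_field = schema.pop("payload", None)
    fields = {name: row[column] for name, column in schema.items()}
    fields["payload"] = row[payload_field] if payload_field is not None else None
    return fields


row = {"id": 42, "video_id": 7, "start": 10, "max_frame": 20}
# Override only t1 and request the "id" column as the payload.
result = read_row(row, {"t1": "start", "payload": "id"})
```

Entries that are not overridden (here key and t2) keep their default column names.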
-
rekall.stdlib.ingest.ism_from_django_qs(qs, bounds_class=<class 'rekall.bounds.bounds3D.Bounds3D'>, bounds_schema={}, with_payload=None, progress=None)
Default constructor for Django QuerySets.
This uses the right accessor for rows in a Django QuerySet and by default creates Intervals with Bounds3D bounds.
The default schema in this case is:
{ "key": "video_id", "t1": "min_frame", "t2": "max_frame" }
The fields in bounds_schema update the fields of the default schema.
Payload is set by setting a value in the payload field of bounds_schema. For example:
{ "payload": "id" }
This will set the payload of each Interval to the id field of the record.
This supports nested field names. For example:
{ "t1": "face.frame.number", "t2": "face.frame.number" }
Parameters:
- qs – A Django queryset where every record will become an Interval.
- bounds_class (optional) – The bounds that each Interval will have. Currently only supports Bounds1D and Bounds3D. Defaults to Bounds3D.
- bounds_schema (optional) – A dictionary that overrides the default field keys.
- progress (optional) – Whether to display a loading bar from tqdm. The total for the loading bar is computed using qs.count(). Defaults to False.
Returns: An IntervalSetMapping with Intervals from each record of qs.
Raises: NotImplementedError – If bounds_class is not one of Bounds3D or Bounds1D.
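Nested field names such as face.frame.number can be resolved by following attributes one dot at a time. The sketch below uses operator.attrgetter from the standard library, which accepts dotted paths; whether rekall resolves nested names this way is an assumption, and the record objects are hypothetical stand-ins for Django model instances.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical record with a chain of related objects: record.face.frame.number
frame = SimpleNamespace(number=120)
face = SimpleNamespace(frame=frame)
record = SimpleNamespace(video_id=3, face=face)

schema = {
    "key": "video_id",
    "t1": "face.frame.number",
    "t2": "face.frame.number",
}

# attrgetter("a.b.c")(obj) returns obj.a.b.c
resolved = {name: attrgetter(path)(record) for name, path in schema.items()}
```

Here t1 and t2 resolve to the same frame number, which matches the schema example above where both bounds point at a single frame.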
-
rekall.stdlib.ingest.ism_from_iterable_with_schema_bounds1D(iterable, key_accessor, bounds_schema={}, with_payload=<function <lambda>>, progress=False, total=None)
Constructs an IntervalSetMapping of Intervals with Bounds1D bounds from an iterable based on a schema.
bounds_schema and key_accessor define how to access the necessary key and co-ordinate fields from items of iterable. In particular, for any value V in bounds_schema, key_accessor(item, V) should access field V of item. getter_accessor and attrgetter_accessor are two examples of key_accessor methods.
By default, this function expects the items of iterable to have fields key, t1, and t2, but bounds_schema can override those fields. For example, if the mapping key, t1, and t2 values are stored in fields video_id, min_frame, and max_frame respectively, bounds_schema should be set to:
{ 'key': 'video_id', 't1': 'min_frame', 't2': 'max_frame' }
And for each item in iterable, the key, t1, and t2 values will be accessed by key_accessor(item, 'video_id'), key_accessor(item, 'min_frame'), and key_accessor(item, 'max_frame'), respectively.
If only the key needs to be updated, bounds_schema should be set to:
{ 'key': 'video_id' }
Parameters:
- iterable – An iterable of elements whose relevant fields can be accessed by key_accessor.
- key_accessor – A function that takes an element of iterable and a key and returns the value of the key on that element.
- bounds_schema (optional) – A dictionary that overrides default field keys ('key' for the key, 't1' for co-ordinate t1, and 't2' for co-ordinate t2).
- with_payload (optional) – A function that takes in an item in iterable and returns the payload for the Interval from the item. Defaults to a function that returns None.
- progress (optional) – Whether to display a loading bar from tqdm. Defaults to False.
- total (optional) – Used in conjunction with progress to optionally display the total number of items in the loading bar if progress is True.
Returns: An IntervalSetMapping with Intervals from each item of iterable.
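The interplay of iterable, key_accessor, bounds_schema, and with_payload can be sketched in plain Python. The function group_intervals_1d is a hypothetical illustration: it groups (t1, t2, payload) tuples by key in a dict rather than building real Interval or IntervalSetMapping objects.

```python
from collections import defaultdict


def group_intervals_1d(iterable, key_accessor, bounds_schema=None,
                       with_payload=lambda item: None):
    """Group (t1, t2, payload) tuples by key (hypothetical illustration)."""
    # Default field names, overridden by any entries in bounds_schema.
    schema = {"key": "key", "t1": "t1", "t2": "t2"}
    schema.update(bounds_schema or {})

    grouped = defaultdict(list)
    for item in iterable:
        key = key_accessor(item, schema["key"])
        t1 = key_accessor(item, schema["t1"])
        t2 = key_accessor(item, schema["t2"])
        grouped[key].append((t1, t2, with_payload(item)))
    return dict(grouped)


rows = [
    {"video_id": 1, "t1": 0, "t2": 5, "label": "a"},
    {"video_id": 1, "t1": 8, "t2": 9, "label": "b"},
    {"video_id": 2, "t1": 3, "t2": 4, "label": "c"},
]

mapping = group_intervals_1d(
    rows,
    key_accessor=lambda item, field: item[field],  # item-style access
    bounds_schema={"key": "video_id"},             # only the key is overridden
    with_payload=lambda item: item["label"],
)
```

Only the key field is overridden here, so t1 and t2 are still read from the default field names.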
-
rekall.stdlib.ingest.ism_from_iterable_with_schema_bounds3D(iterable, key_accessor, bounds_schema={}, with_payload=<function <lambda>>, progress=False, total=None)
Constructs an IntervalSetMapping of Intervals with Bounds3D bounds from an iterable based on a schema.
bounds_schema and key_accessor define how to access the necessary key and co-ordinate fields from items of iterable. In particular, for any value V in bounds_schema, key_accessor(item, V) should access field V of item. getter_accessor and attrgetter_accessor are two examples of key_accessor methods.
By default, this function expects the items of iterable to have fields key, t1, and t2, but bounds_schema can override those fields and add new fields (Bounds3D has default values for x1, x2, y1, and y2; bounds_schema can override these default values). For example, if the mapping key, t1, t2, x1, and x2 values are stored in fields video_id, min_frame, max_frame, bbox_x1, and bbox_x2 respectively, bounds_schema should be set to:
{ 'key': 'video_id', 't1': 'min_frame', 't2': 'max_frame', 'x1': 'bbox_x1', 'x2': 'bbox_x2' }
And for each item in iterable, the key, t1, t2, x1, and x2 values will be accessed by key_accessor(item, 'video_id'), key_accessor(item, 'min_frame'), key_accessor(item, 'max_frame'), key_accessor(item, 'bbox_x1'), and key_accessor(item, 'bbox_x2'), respectively.
If only the key needs to be updated, bounds_schema should be set to:
{ 'key': 'video_id' }
In this case, Bounds3D will use default values for x1, x2, y1, and y2.
Parameters:
- iterable – An iterable of elements whose relevant fields can be accessed by key_accessor.
- key_accessor – A function that takes an element of iterable and a key and returns the value of the key on that element.
- bounds_schema (optional) – A dictionary that overrides default field keys ('key' for the key, 't1' for co-ordinate t1, 't2' for co-ordinate t2, 'x1' for co-ordinate x1, 'x2' for co-ordinate x2, 'y1' for co-ordinate y1, and 'y2' for co-ordinate y2).
- with_payload (optional) – A function that takes in an item in iterable and returns the payload for the Interval from the item. Defaults to a function that returns None.
- progress (optional) – Whether to display a loading bar from tqdm. Defaults to False.
- total (optional) – Used in conjunction with progress to optionally display the total number of items in the loading bar if progress is True.
Returns: An IntervalSetMapping with Intervals from each item of iterable.
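The way spatial fields fall back to defaults when the schema omits them can be sketched as follows. This is plain Python and not rekall code: read_fields_3d is a hypothetical helper, and the values in ASSUMED_DEFAULTS are placeholders chosen for illustration, since the text above does not state the actual Bounds3D defaults.

```python
# Placeholder defaults for the spatial co-ordinates (an assumption for
# illustration; consult Bounds3D for the real default values).
ASSUMED_DEFAULTS = {"x1": 0.0, "x2": 1.0, "y1": 0.0, "y2": 1.0}


def read_fields_3d(item, key_accessor, bounds_schema):
    """Read key and 3D co-ordinates, using defaults for unmapped spatial fields."""
    schema = {"key": "key", "t1": "t1", "t2": "t2"}
    schema.update(bounds_schema)

    fields = dict(ASSUMED_DEFAULTS)
    for name, source_field in schema.items():
        # Every name present in the schema is read from the item,
        # replacing the default when one exists.
        fields[name] = key_accessor(item, source_field)
    return fields


item = {"video_id": 4, "min_frame": 10, "max_frame": 30,
        "bbox_x1": 0.2, "bbox_x2": 0.6}
schema = {"key": "video_id", "t1": "min_frame", "t2": "max_frame",
          "x1": "bbox_x1", "x2": "bbox_x2"}

fields = read_fields_3d(item, lambda row, field: row[field], schema)
```

Because the schema maps x1 and x2 but not y1 and y2, the x co-ordinates come from the item while the y co-ordinates keep the placeholder defaults.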