rekall.stdlib.ingest module
Module for ingesting data from various sources into IntervalSetMappings. Provides some common data loading facilities for data sources that we have seen appear regularly in our use.
-
rekall.stdlib.ingest.attrgetter_accessor(row, field)
Accessor for iterables whose fields are exposed as class attributes, like Django querysets. Returns the equivalent of row.field.
-
rekall.stdlib.ingest.django_bbox_default_schema()
A default schema for bounding box records in a database.
-
rekall.stdlib.ingest.getter_accessor(row, field)
Accessor for iterables whose items implement __getitem__, like Pandas/Spark dataframes. Returns row[field].
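To make the difference between the two accessor styles concrete, here is a minimal sketch. `attr_accessor` and `item_accessor` are hypothetical local stand-ins that mirror the documented behaviour (`row.field` versus `row[field]`); they are not the rekall implementations, and the sample rows are made up for illustration.

```python
from collections import namedtuple


# Hypothetical stand-ins mirroring the documented behaviour;
# these are NOT the rekall implementations.
def attr_accessor(row, field):
    """Attribute-style access: equivalent to row.field."""
    return getattr(row, field)


def item_accessor(row, field):
    """Item-style access: equivalent to row[field]."""
    return row[field]


# A record exposing its fields as attributes (similar in spirit to a
# Django model instance).
Record = namedtuple("Record", ["video_id", "min_frame", "max_frame"])
attr_row = Record(video_id=7, min_frame=10, max_frame=25)

# A record exposing its fields through __getitem__ (similar in spirit to
# a dataframe row).
item_row = {"video_id": 7, "min_frame": 10, "max_frame": 25}

attr_values = [attr_accessor(attr_row, f) for f in Record._fields]
item_values = [item_accessor(item_row, f) for f in Record._fields]
```

Both lists hold the same values; only the way each field is reached differs, which is why the ingest functions take the accessor as a parameter.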
-
rekall.stdlib.ingest.ism_from_df(df, bounds_class=<class 'rekall.bounds.bounds3D.Bounds3D'>, bounds_schema={}, progress=None, total=None)
Default constructor for Pandas-style dataframes.
This uses the right accessor for rows in a dataframe and by default creates Intervals with Bounds3D bounds.
The default schema in this case is:
{ "key": "video_id", "t1": "min_frame", "t2": "max_frame" }
The fields in bounds_schema update the fields of the default schema.
Payload is set by setting a value in the payload field of bounds_schema. For example:
{ "payload": "id" }
This will set the payload of each Interval to the id field of the row.
Parameters:
- df – A Pandas-style dataframe where every row will become an Interval.
- bounds_class (optional) – The bounds that each Interval will have. Currently only supports Bounds1D and Bounds3D. Defaults to Bounds3D.
- bounds_schema (optional) – A dictionary that overrides the default field keys.
- with_payload (optional) – A function that takes in a row of df and returns the payload for the Interval from the row. Defaults to a function that returns None.
- progress (optional) – Whether to display a loading bar from tqdm. Defaults to False.
- total (optional) – Used in conjunction with progress to optionally display the total number of items in the loading bar if progress is True.
Returns: An IntervalSetMapping with Intervals from each row of df.
Raises: NotImplementedError – If bounds_class is not one of Bounds3D or Bounds1D.
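The way bounds_schema combines with the default schema can be sketched with plain dictionaries. This assumes that "update" in the description means ordinary dict-update semantics. `merge_schema` and the `clip_id` field are hypothetical names used only for illustration, and nothing here calls rekall itself.

```python
# Default schema quoted in the description above.
DEFAULT_SCHEMA = {"key": "video_id", "t1": "min_frame", "t2": "max_frame"}


def merge_schema(bounds_schema):
    """Hypothetical helper: overlay user-supplied fields on the default schema.

    Assumes plain dict-update semantics; not taken from the rekall source.
    """
    merged = dict(DEFAULT_SCHEMA)  # copy so the default is never mutated
    merged.update(bounds_schema)
    return merged


# Adding a payload field keeps every default entry untouched.
payload_schema = merge_schema({"payload": "id"})

# Overriding "key" replaces only that entry.
renamed_schema = merge_schema({"key": "clip_id", "payload": "id"})
```

Under that assumption, a caller only needs to list the fields that differ from the defaults.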
-
rekall.stdlib.ingest.ism_from_django_qs(qs, bounds_class=<class 'rekall.bounds.bounds3D.Bounds3D'>, bounds_schema={}, with_payload=None, progress=None)
Default constructor for Django QuerySets.
This uses the right accessor for rows in a Django QuerySet and by default creates Intervals with Bounds3D bounds.
The default schema in this case is:
{ "key": "video_id", "t1": "min_frame", "t2": "max_frame" }
The fields in bounds_schema update the fields of the default schema.
Payload is set by setting a value in the payload field of bounds_schema. For example:
{ "payload": "id" }
This will set the payload of each Interval to the id field of the record.
This supports nested field names. For example:
{ "t1": "face.frame.number", "t2": "face.frame.number" }
Parameters:
- qs – A Django queryset where every record will become an Interval.
- bounds_class (optional) – The bounds that each Interval will have. Currently only supports Bounds1D and Bounds3D. Defaults to Bounds3D.
- bounds_schema (optional) – A dictionary that overrides the default field keys.
- progress (optional) – Whether to display a loading bar from tqdm. The total for the loading bar is computed using qs.count(). Defaults to False.
Returns: An IntervalSetMapping with Intervals from each record of qs.
Raises: NotImplementedError – If bounds_class is not one of Bounds3D or Bounds1D.
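One way a dotted field name such as "face.frame.number" could be resolved is with `operator.attrgetter` from the standard library, which follows dotted names one attribute at a time. Whether rekall resolves nested names exactly this way is an assumption. The `SimpleNamespace` objects below are made-up stand-ins for related Django model instances.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical stand-ins shaped like related Django model instances.
frame = SimpleNamespace(number=42)
face = SimpleNamespace(frame=frame)
record = SimpleNamespace(video_id=3, face=face)

# Schema using the nested field names from the example above.
schema = {"key": "video_id", "t1": "face.frame.number", "t2": "face.frame.number"}

# attrgetter("a.b.c")(obj) returns obj.a.b.c, so each schema entry is
# resolved by walking the attribute chain on the record.
resolved = {name: attrgetter(path)(record) for name, path in schema.items()}
```

Here `t1` and `t2` resolve to the same frame number, which matches the schema in the example where an Interval covers a single frame.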
-
rekall.stdlib.ingest.ism_from_iterable_with_schema_bounds1D(iterable, key_accessor, bounds_schema={}, with_payload=<function <lambda>>, progress=False, total=None)
Constructs an IntervalSetMapping of Intervals with Bounds1D bounds from an iterable based on a schema.
bounds_schema and key_accessor define how to access the necessary key and co-ordinate fields from items of iterable. In particular, for any value V in bounds_schema, key_accessor(item, V) should access field V of item. getter_accessor and attrgetter_accessor are two examples of key_accessor methods.
By default, this function expects items of iterable to have fields key, t1, and t2, but bounds_schema can overwrite those fields. For example, if the mapping key, t1, and t2 values are stored in fields video_id, min_frame, and max_frame respectively, bounds_schema should be set to:
{ 'key': 'video_id', 't1': 'min_frame', 't2': 'max_frame' }
And for each item in iterable, the key, t1, and t2 values will be accessed by key_accessor(item, 'video_id'), key_accessor(item, 'min_frame'), and key_accessor(item, 'max_frame'), respectively.
If only the key needs to be updated, bounds_schema should be set to:
{ 'key': 'video_id' }
Parameters:
- iterable – An iterable of elements whose relevant fields can be accessed by key_accessor.
- key_accessor – A function that takes an element of iterable and a key and returns the value of the key on that element.
- bounds_schema (optional) – A dictionary that overrides default field keys ('key' for the key, 't1' for co-ordinate t1, and 't2' for co-ordinate t2).
- with_payload (optional) – A function that takes in an item in iterable and returns the payload for the Interval from the item. Defaults to a function that returns None.
- progress (optional) – Whether to display a loading bar from tqdm. Defaults to False.
- total (optional) – Used in conjunction with progress to optionally display the total number of items in the loading bar if progress is True.
Returns: An IntervalSetMapping with Intervals from each item of iterable.
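The access pattern described for the Bounds1D constructor can be sketched without rekall. `group_intervals_1d` is a hypothetical stand-in that returns a plain dictionary of (t1, t2, payload) tuples rather than an IntervalSetMapping. The sample rows and the `label` field are made up. Only the way bounds_schema, key_accessor, and with_payload interact follows the description above.

```python
from collections import defaultdict

# Field names expected when bounds_schema does not override them.
DEFAULTS_1D = {"key": "key", "t1": "t1", "t2": "t2"}


def group_intervals_1d(iterable, key_accessor, bounds_schema=None,
                       with_payload=lambda item: None):
    """Hypothetical stand-in: returns {key: [(t1, t2, payload), ...]}."""
    schema = {**DEFAULTS_1D, **(bounds_schema or {})}
    grouped = defaultdict(list)
    for item in iterable:
        # Every field is reached through key_accessor(item, field_name).
        key = key_accessor(item, schema["key"])
        t1 = key_accessor(item, schema["t1"])
        t2 = key_accessor(item, schema["t2"])
        grouped[key].append((t1, t2, with_payload(item)))
    return dict(grouped)


rows = [
    {"video_id": 1, "min_frame": 0, "max_frame": 30, "label": "a"},
    {"video_id": 1, "min_frame": 45, "max_frame": 60, "label": "b"},
    {"video_id": 2, "min_frame": 5, "max_frame": 15, "label": "c"},
]

result = group_intervals_1d(
    rows,
    key_accessor=lambda item, field: item[field],  # item-style access
    bounds_schema={"key": "video_id", "t1": "min_frame", "t2": "max_frame"},
    with_payload=lambda item: item["label"],
)
```

Rows sharing a `video_id` end up under the same key, which is the grouping an IntervalSetMapping provides per mapping key.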
-
rekall.stdlib.ingest.ism_from_iterable_with_schema_bounds3D(iterable, key_accessor, bounds_schema={}, with_payload=<function <lambda>>, progress=False, total=None)
Constructs an IntervalSetMapping of Intervals with Bounds3D bounds from an iterable based on a schema.
bounds_schema and key_accessor define how to access the necessary key and co-ordinate fields from items of iterable. In particular, for any value V in bounds_schema, key_accessor(item, V) should access field V of item. getter_accessor and attrgetter_accessor are two examples of key_accessor methods.
By default, this function expects items of iterable to have fields key, t1, and t2, but bounds_schema can overwrite those fields and add new fields (Bounds3D has default values for x1, x2, y1, and y2; bounds_schema can overwrite these default values). For example, if the mapping key, t1, t2, x1, and x2 values are stored in fields video_id, min_frame, max_frame, bbox_x1, and bbox_x2 respectively, bounds_schema should be set to:
{ 'key': 'video_id', 't1': 'min_frame', 't2': 'max_frame', 'x1': 'bbox_x1', 'x2': 'bbox_x2' }
And for each item in iterable, the key, t1, t2, x1, and x2 values will be accessed by key_accessor(item, 'video_id'), key_accessor(item, 'min_frame'), key_accessor(item, 'max_frame'), key_accessor(item, 'bbox_x1'), and key_accessor(item, 'bbox_x2'), respectively.
If only the key needs to be updated, bounds_schema should be set to:
{ 'key': 'video_id' }
In this case, Bounds3D will use default values for x1, x2, y1, and y2.
Parameters:
- iterable – An iterable of elements whose relevant fields can be accessed by key_accessor.
- key_accessor – A function that takes an element of iterable and a key and returns the value of the key on that element.
- bounds_schema (optional) – A dictionary that overrides default field keys ('key' for the key, 't1' for co-ordinate t1, 't2' for co-ordinate t2, 'x1' for co-ordinate x1, 'x2' for co-ordinate x2, 'y1' for co-ordinate y1, and 'y2' for co-ordinate y2).
- with_payload (optional) – A function that takes in an item in iterable and returns the payload for the Interval from the item. Defaults to a function that returns None.
- progress (optional) – Whether to display a loading bar from tqdm. Defaults to False.
- total (optional) – Used in conjunction with progress to optionally display the total number of items in the loading bar if progress is True.
Returns: An IntervalSetMapping with Intervals from each item of iterable.
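The fallback to default spatial co-ordinates can be illustrated with a small stand-in. `extract_bounds_3d` is a hypothetical helper, not part of rekall. The default values used here (0.0 and 1.0) are placeholders assumed for the example, since the text above says only that Bounds3D has defaults, not what they are.

```python
# ASSUMPTION: placeholder defaults for the spatial co-ordinates. The text
# states that Bounds3D has defaults but does not give their values.
ASSUMED_SPATIAL_DEFAULTS = {"x1": 0.0, "x2": 1.0, "y1": 0.0, "y2": 1.0}

# Field names expected when bounds_schema does not override them.
REQUIRED_FIELDS = {"key": "key", "t1": "t1", "t2": "t2"}


def extract_bounds_3d(item, key_accessor, bounds_schema=None):
    """Hypothetical helper: read schema-listed fields, default the rest."""
    schema = {**REQUIRED_FIELDS, **(bounds_schema or {})}
    bounds = dict(ASSUMED_SPATIAL_DEFAULTS)
    for name, field in schema.items():
        # Only co-ordinates named in the schema are read from the item.
        bounds[name] = key_accessor(item, field)
    return bounds


def getter(item, field):
    """Item-style accessor, equivalent to item[field]."""
    return item[field]


row = {"video_id": 9, "min_frame": 100, "max_frame": 130,
       "bbox_x1": 0.2, "bbox_x2": 0.6}
full = extract_bounds_3d(row, getter, {
    "key": "video_id", "t1": "min_frame", "t2": "max_frame",
    "x1": "bbox_x1", "x2": "bbox_x2",
})

# With only the key overridden, t1 and t2 use their default field names
# and all four spatial co-ordinates keep the assumed defaults.
plain_row = {"video_id": 9, "t1": 100, "t2": 130}
key_only = extract_bounds_3d(plain_row, getter, {"key": "video_id"})
```

In the first call only y1 and y2 fall back to the assumed defaults; in the second call all four spatial co-ordinates do.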