Typing

TimeSeriesValue (TS)

The most basic type is the TS type. We have been using this in the previous examples, this represents a time-series of “scalar” values. Where a scalar value is a non-time-varying data type. These include things such as int, float, tuple, etc. Scalars should be immutable, although this constraint is not formally enforced, using a mutable data type can cause undefined behaviour if the value is mutated during processing.

We have already covered a number of examples using the key time-series properties such as value, active, and valid. There are two other important properties, namely:

delta_value - This represents the change in value but is equivalent to value in the case of the TS type.
last_modified_time - The time the value was last changed.
modified - A boolean value representing if this input was modified in this engine cycle.

Modified can be useful when there are more than one input that can change and the code behaves differently depending on which input was modified, for example:

from hgraph import compute_node, TS
from hgraph.test import eval_node

@compute_node
def my_compute_node(ts1: TS[int], ts2: TS[int]) -> TS[int]:
    if ts1.modified and ts2.modified:
        return ts1.value + ts2.value
    elif ts1.modified:
        return ts1.value
    else:
        return ts2.value  # ts2 must have been modified in this case

assert eval_node(my_compute_node, [1, 2, None], [3, None, 4]) == [4, 2, 4]

As mentioned previously, HGraph is strongly typed. Whilst Python itself does not enforce any form of typing, this is not true for HGraph functions. They require each input and output to be typed. These types are validated when connecting an output to an input. There is no automatic type conversions, thus an output of type TS[int] cannot be bound to a type of TS[float] without an explicit type cast.

Casting

To facilitate type conversions a number of casting utility functions exists, these are:

cast_
downcast_
downcast_ref

For now we will focus on cast_ and downcast_.

To convert from one type to another use cast_, this can be used on any type that supports a constructor that takes in the source value and returns an instance of the to value.

For example:

from hgraph import graph, TS, cast_
from hgraph.test import eval_node

@graph
def cast_int_to_float(ts: TS[int]) -> TS[float]:
    return cast_(float, ts)

assert eval_node(cast_int_to_float, [1, 2, 3]) == [1.0, 2.0, 3.0]

This casts the integer input into a float.

Note

As mentioned earlier, most code users write is expected to be in the form of graph code. All nodes (compute/sink/source) are wired together in the graph decorated functions. It is not possible to call a node from within a node function. Since cast_ is a compute node, it must be called within a graph decorated function.

The downcast allows a type to be re-cast to a child type, for example:

from hgraph import graph, TS, downcast_, MIN_ST
from hgraph.test import eval_node
from datetime import date, datetime

@graph
def cast_date_to_datetime(ts: TS[date]) -> TS[datetime]:
    return downcast_(datetime, ts)

assert eval_node(cast_date_to_datetime, [MIN_ST]) == [MIN_ST]

This may seem a bit strange, since we supply a datetime instance to start with. But in Python datetime is an instance of date, thus the type checking logic correctly accepts the datetime instance as a date. But if we got rid of the downcast_ operator the graph would complain.

Additionally if we supplied a date as an input, the downcast would raise an assertion error.

CompoundScalar

The downcast operator is generally more widely used when working with the CompoundScalar type. This type provides a more complex structure for a value (or scalar) type.

The compound scalar is a type safe data class. All compound scalar classes have CompoundScalar as the base. It is recommended to use the dataclass to tag the class.

Here is an example:

@dataclass
class MyCompoundScalar(CompoundScalar):
    p1: str
    p2: int

It is possible to use all the defined types available in HGraph as types for the properties. It is possible to provide default values as well, for example:

p1: str = "Hello"

As well as setting the value to None, which is useful to describe optional fields, i.e.:

p1: str = None

Scalar types are considered as immutable and atomic, in this case the compound scalar represents the collection of values that tick together.

The types can be sub-classed as well.

To use the type, these are type to time standard time-series type (TS)

TimeSeriesBundle (TSB)

Sometimes it is useful to describe a related collection of time-series values, these related values do not necessarily change in unison with each other, but do form a natural grouping. Alternatively they may represent values that, whilst they do change together may be computed separately.

An example of this scenario is a mid and spread, they are related in that they are computed from the inside bid and offer price, but depending on the market or use-case the mid price is more likely to change than the spread. Thus they are likely to tick at different rates. Also, it is quite standard for a mid price to be computed independently from the spread when pricing an instrument, but the values need to be grouped together as they are both required to know the value of the instrument when considering side.

Using this example we can group time-series values together as follows:

from hgraph import TSB, TimeSeriesSchema, graph
from dataclasses import dataclass

@dataclass
class MidSpread(TimeSeriesSchema):
    mid: TS[float]
    spread: TS[float]


@graph
def my_price_logic(price: TSB[MidSpread], ...) ->  ...

We declare the schema or shape of the bundle in much the same way as for the CompoundScalar, however, in this case the types are all time-series types. With a TimeSeriesScheam, all properties must be time-series types. Whereas for the CompoundScalar all types much also be scalar types.

With both TS of CompoundScalar and TSB of TimeSeriesSchema, it is possible to dereference the individual properties of the schemas by using the standard dot notation, for example:

@graph
def my_price_logic(price: TSB[MidSpread], ...) ->  ...
    a = price.mid

Note

When dereferencing a property of a bundle, during wiring, there is no cost. Doing the same with a TS of CompoundScalar incurs a cost of a node to extract the property from the compound scalar and emit it as a time-series value.

To construct a TSB value we consider two options, one in graph mode and one in a compute_node.

from hgraph import TS, TSB, TimeSeriesSchema, graph, CompoundScalar, combine
from hgraph.test import eval_node
from dataclasses import dataclass
from frozendict import frozendict as fd

@dataclass
class BidAsk(CompoundScalar):
    bid: float
    ask: float

@dataclass
class MidSpread(TimeSeriesSchema):
    mid: TS[float]
    spread: TS[float]

@graph
def to_mid_spread(price: TS[BidAsk]) -> TSB[MidSpread]:
    mid = (price.bid + price.ask) / 2.0
    spread = price.ask - price.bid
    return combine[TSB[MidSpread]](mid=mid, spread=spread)

assert eval_node(to_mid_spread, [BidAsk(bid=100.0, ask=101.0)]) == [fd(mid=100.5, spread=1.0)]

This shows the use of the dot dereferencing of a compound scalar. Remember this does incur two nodes to extract the bid and ask time-series values. This also shows the use of many standard operators such as divide and subtraction. HGraph supports most of the Python operators at wiring time allowing for writing code is a very similar fashion to standard Python. But this is really just building up a dependency graph of nodes with the operators being replaced with computation nodes. These nodes will be evaluated when the inputs tick.

The use of the combine operator is depicted here. The operator is a generic operator that will be resolved into the correct node (or logical) instance. In this case let the combine operator that we which to combine time-series values together into a TSB with the schema MidSpread. If no refining parameters are provide (the [TSB[MidSpread]] the combine always assumes it is producing a TSB instance and will create an un-named type. Un named TSB instances are defined dynamically and will match a named type based on the properties matching, that is:

combine[TSB[MidSpread]](mid=mid, spread=spread)

is equivalent to:

combine(mid=mid, spread=spread)

It is also possible to combine time-series values into a compound scalar, for example:

ask = ...
bid = ...
combine[TS[BidAsk]](bid=bid, ask=ask)

In this case it is required that the output type is provided to produce the correct output type, otherwise we would instead create an un-named bundle of the values.

Lets consider the other approach, using a compute_node:

from hgraph import TS, TSB, TimeSeriesSchema, graph, CompoundScalar, compute_node
from hgraph.test import eval_node
from dataclasses import dataclass
from frozendict import frozendict as fd

@dataclass
class BidAsk(CompoundScalar):
    bid: float
    ask: float

@dataclass
class MidSpread(TimeSeriesSchema):
    mid: TS[float]
    spread: TS[float]

@compute_node
def to_mid_spread(price: TS[BidAsk]) -> TSB[MidSpread]:
    price = price.value  # get the actual value
    mid = (price.bid + price.ask) / 2.0
    spread = price.ask - price.bid
    return dict(mid=mid, spread=spread)

assert eval_node(to_mid_spread, [BidAsk(bid=100.0, ask=101.0)]) == [fd(mid=100.5, spread=1.0)]

This code looks very similar to the previous example, the only real difference is the requirement to extract the value from price before performing the computations and here we return the bundle as a dictionary of modified values.

In this case the code will produce fewer nodes as the nodes to extract bid and ask are not required, not will there be nodes for the mathematical operations. This code is likely to run faster then the previous example whilst the runtime-engine remains in Python. However, once the engine is migrated to C++, experience indicates that the prior code will often outperform the second version as it is all evaluated in C++ and not in Python.

That said, with all performance statements, validation of your particular use-case is always important.

Finally, lets view how to access the properties of a TSB inside of a compute node.

from hgraph import TS, TSB, TimeSeriesSchema, graph, CompoundScalar, compute_node
from hgraph.test import eval_node
from dataclasses import dataclass
from frozendict import frozendict as fd

@dataclass
class BidAsk(CompoundScalar):
    bid: float
    ask: float

@dataclass
class MidSpread(TimeSeriesSchema):
    mid: TS[float]
    spread: TS[float]

@compute_node
def to_bid_ask(price: TSB[MidSpread]) -> TS[BidAsk]:
    mid = price.mid.value
    half_spread = price.spread.value / 2.0
    return BidAsk(bid=mid-half_spread, ask=mid+half_spread)

assert eval_node(to_bid_ask, [fd(mid=100.5, spread=1.0)]) == [BidAsk(bid=100.0, ask=101.0)]

Here we see that each time-series property is represented as a time-series within the compute node. Thus we need to get the value of the property. Each property also responds to all other time-series methods such as modified, etc.

It is also possible to request the value of the time-series bundle directly, this will return a dictionary of keys and values. This is also the first time that the delta_value returns something different, this will return the dictionary of values that was modified in this engine cycle.

Exercise

Try creating a compute node (or sink node) that prints the value and delta_value with different input combinations being ticked.

TimeSeriesList (TSL)

The TSL is the time-series equivalent of a list, at this point in time, the list have a fixed size. This list is of homogenous time-series values. This is different to the TSB which is a collection of heterogeneous time-series values. When specifying the TSL two generics need to be provided, the first is the time-series type making up the elements of the list and the second is the size of the list, for example:

from hgraph import compute_node, TSL, TS, Size
from hgraph.test import eval_node

@compute_node
def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[int]:
    return tsl[0].value + tsl[1].value

assert eval_node(my_compute_node, [(1, 2), (3, 4)]) == [3, 7]

Note

The use of the Size class to specifying the size of the list. This is done as Python does not support values as generics and only types. This provides a mechanism to specify the type including it’s size using the generic tooling provided by Python.

When accessing a collection type, as with the TSB, referencing an element of the type within a node the return value is the time-series value, in this case it is TS[int] that gets returned.

If value is called on the collection type, the returned value is the collection of recursive calls to value on the elements of the collection, for example:

from hgraph import compute_node, TSL, TS, Size
from hgraph.test import eval_node

@compute_node
def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[tuple[int, ...]]:
    return tsl.value

assert eval_node(my_compute_node, [(1, 2), (3, 4)]) == [(1, 2), (3, 4)]

Collection types can be dereferenced in graph code as well, for example:

from hgraph import graph, TSL, TS, Size
from hgraph.test import eval_node

@graph
def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[int]:
    return tsl[0] + tsl[1]

assert eval_node(my_compute_node, [(1, 2), (3, 4)]) == [3, 7]

This code is the same as the node implementation. Since we are at graph level, the + operator results in the following equivalent code:

 @graph
def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[int]:
    return add_(tsl[0], tsl[1])

Where the add_ node takes two TS inputs.

TimeSeriesSet (TSS)

Another often used data type is the set, the time-series equivalent is the time-series set or TSS. This is a collection time-series type as well, but behaves more closely to the TS type as it can only contain scalar values.

The type supports tracking the contents of a set over time and can provide the changes made in the form of the SetDelta protocol class. The delta contains the items added and removed. The type itself contains the current state (accessible via the value property). The SetDelta is obtained from the delta_value property on the time-series instance.

Here is an example of the TSS used in a compute node.

from hgraph import compute_node, TSS, set_delta
from hgraph.test import eval_node

@compute_node
def my_compute_node(tss_1: TSS[int], tss_2: TSS[int]) -> TSS[int]:
    added = (tss_1.added() - tss_2.value) | (tss_2.added() - tss_1.value)
    removed = tss_1.removed() - tss_2.value
    removed |= tss_2.removed() - tss_1.value
    return set_delta(added=added, removed=removed)

assert eval_node(my_compute_node, [frozenset({1, 2}),], [frozenset({3, 4})]) == [frozenset({1, 2, 3, 4})]

TimeSeriesDict (TSD)

This represents a dictionary of time-series values, the TSD is comprised of a key_set that is a TSS instance. The values of the dictionary are themselves time-series values in the same manor as for the TSB and TSL collection types. This is currently the only dynamic type, in that it can grow and shrink the number of collected time-series values.

Another way to think of the TSD is to view it as a multiplex of time-series values.

The TSD takes generics as for dict, i.e. TSD[K, V] where the K must be a keyable scalar value (must support the hashable protocol). and V is a time-series type.

The following key behaviours are provided by the TSD that are accessible in the node, namely:

key_set: As already discussed this is a time-series set with type K. The set contains the keys of the dictionary.
keys(), values(), items(): As for any dictionary, these represent an iterator over the keys, values, and items. Where values are the time-series type instances.
modified_keys(), modified_values(), modified_items(): As above, but will only provide values that have been modified in this engine cycle.
valid_keys(), valid_values(), valid_items(): As above, but will only provide values that are valid in this engine cycle.
added_keys(), added_values(), added_items(): As above, but will only provide values that have been added in this engine cycle.
removed_keys(), removed_values(), removed_items(): As above, but will only provide values that have been removed in this engine cycle.

Standard methods such as __len__, __iter__, and __contains__ are implemented as expected for a dict.

Here is an example to create a TSD as an output:

from hgraph import compute_node, TSD, REMOVE_IF_EXISTS, REMOVE, TS
from hgraph.test import eval_node
from frozendict import frozendict as fd

@compute_node(valid=("key", "value"))
def my_compute_node(key: TS[int], value: TS[str], remove: TS[int]) -> TSD[int, TS[str]]:
    out = {}
    if key.modified or value.modified:
        out[key.value] = value.value
    if remove.modified:
        out[remove.value] = REMOVE_IF_EXISTS
    return out

assert eval_node(my_compute_node,
            [1, None, 2],
            ["a", "b", "c", "d"],
            [None, None, None, 1]
        ) == [fd({1: "a"}), fd({1: "b"}), fd({2: "c"}), fd({1: REMOVE, 2: "d"})]

In this example we create a time-series dictionary from the time-series supplying keys and values and removing keys when the remove time-series ticks.

Note the use of valid to advice the engine that we only require the key and value attribute to be valid, thus if the remove has not ticked the code will still be evaluated. See what happens if you remove the valid constraints.

We also use REMOVE_IF_EXISTS, this is a soft instruction to the TSD to remove a key, if the key does not exist then it nothing happens. If we had used REMOVE, this will raise an exception if the key does not exist. In this example this would work, try change this and then supply a key that does not exist to see how that behaves.

The delta-value of the TSD will contain REMOVE if a key is removed.

Next an example of using a TSD as in input is considered:

from hgraph import compute_node, TSD, REMOVE, TS
from hgraph.test import eval_node
from frozendict import frozendict as fd

@compute_node
def my_compute_node(tsd: TSD[int, TS[str]], key: TS[int]) -> TS[str]:
    if key.value in tsd:
        v = tsd[key.value]
        if v.modified or key.modified:
            return v.delta_value


assert eval_node(my_compute_node,
            [fd({1: "a"}), None, fd({1: "b"}), fd({2: "c"}), fd({1: REMOVE, 2: "d"})],
            [None, 1, None, 2]
        ) == [None, "a", "b", "c", "d"]

This is a very low performing approach to extracting a value from a TSD based on the key. This shows the basic dictionary nature of the input.

Note that this has a graph solution that is more performant, here is the example of this:

from hgraph import graph, TSD, REMOVE, TS
from hgraph.test import eval_node
from frozendict import frozendict as fd

@graph
def my_compute_node(tsd: TSD[int, TS[str]], key: TS[int]) -> TS[str]:
    return tsd[key]

assert eval_node(my_compute_node,
            [fd({1: "a"}), None, fd({1: "b"}), fd({2: "c"}), fd({1: REMOVE, 2: "d"})],
            [None, 1, None, 2]
        ) == [None, "a", "b", "c", "d"]

The TSD has a number of useful features that can be accessed in graph mode, these include:

[SCALAR|TS|TSS]: By using the [] operator on the time-series dictionary with a scalar value (say tsd[1]) or a time-series of scalar values, this returns a time-series of values for the matching key, if a time-series set is used, then the set is used to filter the keys in the dictionary.
key_set: Returns a reference to the key-set of the time-series dictionary.