Typing ====== TimeSeriesValue (TS) -------------------- The most basic type is the ``TS`` type. We have been using this in the previous examples, this represents a time-series of "scalar" values. Where a scalar value is a non-time-varying data type. These include things such as int, float, tuple, etc. Scalars should be immutable, although this constraint is not formally enforced, using a mutable data type can cause undefined behaviour if the value is mutated during processing. We have already covered a number of examples using the key time-series properties such as ``value``, ``active``, and ``valid``. There are two other important properties, namely: * ``delta_value`` - This represents the change in value but is equivalent to ``value`` in the case of the ``TS`` type. * ``last_modified_time`` - The time the value was last changed. * ``modified`` - A boolean value representing if this input was modified in this engine cycle. Modified can be useful when there are more than one input that can change and the code behaves differently depending on which input was modified, for example: .. testcode:: from hgraph import compute_node, TS from hgraph.test import eval_node @compute_node def my_compute_node(ts1: TS[int], ts2: TS[int]) -> TS[int]: if ts1.modified and ts2.modified: return ts1.value + ts2.value elif ts1.modified: return ts1.value else: return ts2.value # ts2 must have been modified in this case assert eval_node(my_compute_node, [1, 2, None], [3, None, 4]) == [4, 2, 4] As mentioned previously, HGraph is strongly typed. Whilst Python itself does not enforce any form of typing, this is not true for HGraph functions. They require each input and output to be typed. These types are validated when connecting an output to an input. There is no automatic type conversions, thus an output of type ``TS[int]`` cannot be bound to a type of ``TS[float]`` without an explicit type cast. Casting ------- To facilitate type conversions a number of casting utility functions exists, these are: * cast\_ * downcast\_ * downcast_ref For now we will focus on ``cast_`` and ``downcast_``. To convert from one type to another use ``cast_``, this can be used on any type that supports a constructor that takes in the source value and returns an instance of the to value. For example: .. testcode:: from hgraph import graph, TS, cast_ from hgraph.test import eval_node @graph def cast_int_to_float(ts: TS[int]) -> TS[float]: return cast_(float, ts) assert eval_node(cast_int_to_float, [1, 2, 3]) == [1.0, 2.0, 3.0] This casts the integer input into a float. .. note:: As mentioned earlier, most code users write is expected to be in the form of graph code. All nodes (compute/sink/source) are wired together in the graph decorated functions. It is not possible to call a node from within a node function. Since ``cast_`` is a compute node, it must be called within a ``graph`` decorated function. The downcast allows a type to be re-cast to a child type, for example: .. testcode:: from hgraph import graph, TS, downcast_, MIN_ST from hgraph.test import eval_node from datetime import date, datetime @graph def cast_date_to_datetime(ts: TS[date]) -> TS[datetime]: return downcast_(datetime, ts) assert eval_node(cast_date_to_datetime, [MIN_ST]) == [MIN_ST] This may seem a bit strange, since we supply a datetime instance to start with. But in Python datetime is an instance of date, thus the type checking logic correctly accepts the datetime instance as a date. But if we got rid of the ``downcast_`` operator the graph would complain. Additionally if we supplied a date as an input, the downcast would raise an assertion error. CompoundScalar -------------- The downcast operator is generally more widely used when working with the ``CompoundScalar`` type. This type provides a more complex structure for a value (or scalar) type. The compound scalar is a type safe data class. All compound scalar classes have ``CompoundScalar`` as the base. It is recommended to use the ``dataclass`` to tag the class. Here is an example:: @dataclass class MyCompoundScalar(CompoundScalar): p1: str p2: int It is possible to use all the defined types available in HGraph as types for the properties. It is possible to provide default values as well, for example:: p1: str = "Hello" As well as setting the value to ``None``, which is useful to describe optional fields, i.e.:: p1: str = None Scalar types are considered as immutable and atomic, in this case the compound scalar represents the collection of values that tick together. The types can be sub-classed as well. To use the type, these are type to time standard time-series type (``TS``) TimeSeriesBundle (TSB) ---------------------- Sometimes it is useful to describe a related collection of time-series values, these related values do not necessarily change in unison with each other, but do form a natural grouping. Alternatively they may represent values that, whilst they do change together may be computed separately. An example of this scenario is a mid and spread, they are related in that they are computed from the inside bid and offer price, but depending on the market or use-case the mid price is more likely to change than the spread. Thus they are likely to tick at different rates. Also, it is quite standard for a mid price to be computed independently from the spread when pricing an instrument, but the values need to be grouped together as they are both required to know the value of the instrument when considering side. Using this example we can group time-series values together as follows:: from hgraph import TSB, TimeSeriesSchema, graph from dataclasses import dataclass @dataclass class MidSpread(TimeSeriesSchema): mid: TS[float] spread: TS[float] @graph def my_price_logic(price: TSB[MidSpread], ...) -> ... We declare the schema or shape of the bundle in much the same way as for the ``CompoundScalar``, however, in this case the types are all time-series types. With a ``TimeSeriesScheam``, all properties must be time-series types. Whereas for the ``CompoundScalar`` all types much also be scalar types. With both ``TS`` of ``CompoundScalar`` and ``TSB`` of ``TimeSeriesSchema``, it is possible to dereference the individual properties of the schemas by using the standard dot notation, for example:: @graph def my_price_logic(price: TSB[MidSpread], ...) -> ... a = price.mid .. note:: When dereferencing a property of a bundle, during wiring, there is no cost. Doing the same with a ``TS`` of ``CompoundScalar`` incurs a cost of a node to extract the property from the compound scalar and emit it as a time-series value. To construct a TSB value we consider two options, one in ``graph`` mode and one in a ``compute_node``. .. testcode:: from hgraph import TS, TSB, TimeSeriesSchema, graph, CompoundScalar, combine from hgraph.test import eval_node from dataclasses import dataclass from frozendict import frozendict as fd @dataclass class BidAsk(CompoundScalar): bid: float ask: float @dataclass class MidSpread(TimeSeriesSchema): mid: TS[float] spread: TS[float] @graph def to_mid_spread(price: TS[BidAsk]) -> TSB[MidSpread]: mid = (price.bid + price.ask) / 2.0 spread = price.ask - price.bid return combine[TSB[MidSpread]](mid=mid, spread=spread) assert eval_node(to_mid_spread, [BidAsk(bid=100.0, ask=101.0)]) == [fd(mid=100.5, spread=1.0)] This shows the use of the dot dereferencing of a compound scalar. Remember this does incur two nodes to extract the bid and ask time-series values. This also shows the use of many standard operators such as divide and subtraction. HGraph supports most of the Python operators at wiring time allowing for writing code is a very similar fashion to standard Python. But this is really just building up a dependency graph of nodes with the operators being replaced with computation nodes. These nodes will be evaluated when the inputs tick. The use of the ``combine`` operator is depicted here. The operator is a generic operator that will be resolved into the correct node (or logical) instance. In this case let the ``combine`` operator that we which to combine time-series values together into a ``TSB`` with the schema ``MidSpread``. If no refining parameters are provide (the ``[TSB[MidSpread]]`` the combine always assumes it is producing a ``TSB`` instance and will create an un-named type. Un named TSB instances are defined dynamically and will match a named type based on the properties matching, that is:: combine[TSB[MidSpread]](mid=mid, spread=spread) is equivalent to:: combine(mid=mid, spread=spread) It is also possible to combine time-series values into a compound scalar, for example:: ask = ... bid = ... combine[TS[BidAsk]](bid=bid, ask=ask) In this case it is required that the output type is provided to produce the correct output type, otherwise we would instead create an un-named bundle of the values. Lets consider the other approach, using a ``compute_node``: .. testcode:: from hgraph import TS, TSB, TimeSeriesSchema, graph, CompoundScalar, compute_node from hgraph.test import eval_node from dataclasses import dataclass from frozendict import frozendict as fd @dataclass class BidAsk(CompoundScalar): bid: float ask: float @dataclass class MidSpread(TimeSeriesSchema): mid: TS[float] spread: TS[float] @compute_node def to_mid_spread(price: TS[BidAsk]) -> TSB[MidSpread]: price = price.value # get the actual value mid = (price.bid + price.ask) / 2.0 spread = price.ask - price.bid return dict(mid=mid, spread=spread) assert eval_node(to_mid_spread, [BidAsk(bid=100.0, ask=101.0)]) == [fd(mid=100.5, spread=1.0)] This code looks very similar to the previous example, the only real difference is the requirement to extract the value from price before performing the computations and here we return the bundle as a dictionary of modified values. In this case the code will produce fewer nodes as the nodes to extract ``bid`` and ``ask`` are not required, not will there be nodes for the mathematical operations. This code is likely to run faster then the previous example whilst the runtime-engine remains in Python. However, once the engine is migrated to C++, experience indicates that the prior code will often outperform the second version as it is all evaluated in C++ and not in Python. That said, with all performance statements, validation of your particular use-case is always important. Finally, lets view how to access the properties of a ``TSB`` inside of a compute node. .. testcode:: from hgraph import TS, TSB, TimeSeriesSchema, graph, CompoundScalar, compute_node from hgraph.test import eval_node from dataclasses import dataclass from frozendict import frozendict as fd @dataclass class BidAsk(CompoundScalar): bid: float ask: float @dataclass class MidSpread(TimeSeriesSchema): mid: TS[float] spread: TS[float] @compute_node def to_bid_ask(price: TSB[MidSpread]) -> TS[BidAsk]: mid = price.mid.value half_spread = price.spread.value / 2.0 return BidAsk(bid=mid-half_spread, ask=mid+half_spread) assert eval_node(to_bid_ask, [fd(mid=100.5, spread=1.0)]) == [BidAsk(bid=100.0, ask=101.0)] Here we see that each time-series property is represented as a time-series within the compute node. Thus we need to get the value of the property. Each property also responds to all other time-series methods such as ``modified``, etc. It is also possible to request the value of the time-series bundle directly, this will return a dictionary of keys and values. This is also the first time that the ``delta_value`` returns something different, this will return the dictionary of values that was modified in this engine cycle. Exercise ........ Try creating a compute node (or sink node) that prints the ``value`` and ``delta_value`` with different input combinations being ticked. TimeSeriesList (TSL) -------------------- The ``TSL`` is the time-series equivalent of a list, at this point in time, the list have a fixed size. This list is of homogenous time-series values. This is different to the ``TSB`` which is a collection of heterogeneous time-series values. When specifying the ``TSL`` two generics need to be provided, the first is the time-series type making up the elements of the list and the second is the size of the list, for example: .. testcode:: from hgraph import compute_node, TSL, TS, Size from hgraph.test import eval_node @compute_node def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[int]: return tsl[0].value + tsl[1].value assert eval_node(my_compute_node, [(1, 2), (3, 4)]) == [3, 7] .. note:: The use of the ``Size`` class to specifying the size of the list. This is done as Python does not support values as generics and only types. This provides a mechanism to specify the type including it's size using the generic tooling provided by Python. When accessing a collection type, as with the ``TSB``, referencing an element of the type within a node the return value is the time-series value, in this case it is ``TS[int]`` that gets returned. If value is called on the collection type, the returned value is the collection of recursive calls to value on the elements of the collection, for example: .. testcode:: from hgraph import compute_node, TSL, TS, Size from hgraph.test import eval_node @compute_node def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[tuple[int, ...]]: return tsl.value assert eval_node(my_compute_node, [(1, 2), (3, 4)]) == [(1, 2), (3, 4)] Collection types can be dereferenced in graph code as well, for example: .. testcode:: from hgraph import graph, TSL, TS, Size from hgraph.test import eval_node @graph def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[int]: return tsl[0] + tsl[1] assert eval_node(my_compute_node, [(1, 2), (3, 4)]) == [3, 7] This code is the same as the node implementation. Since we are at graph level, the ``+`` operator results in the following equivalent code:: @graph def my_compute_node(tsl: TSL[TS[int], Size[2]]) -> TS[int]: return add_(tsl[0], tsl[1]) Where the ``add_`` node takes two TS inputs. TimeSeriesSet (TSS) ------------------- Another often used data type is the ``set``, the time-series equivalent is the time-series set or ``TSS``. This is a collection time-series type as well, but behaves more closely to the TS type as it can only contain scalar values. The type supports tracking the contents of a set over time and can provide the changes made in the form of the ``SetDelta`` protocol class. The delta contains the items added and removed. The type itself contains the current state (accessible via the ``value`` property). The ``SetDelta`` is obtained from the ``delta_value`` property on the time-series instance. Here is an example of the ``TSS`` used in a compute node. .. testcode:: from hgraph import compute_node, TSS, set_delta from hgraph.test import eval_node @compute_node def my_compute_node(tss_1: TSS[int], tss_2: TSS[int]) -> TSS[int]: added = (tss_1.added() - tss_2.value) | (tss_2.added() - tss_1.value) removed = tss_1.removed() - tss_2.value removed |= tss_2.removed() - tss_1.value return set_delta(added=added, removed=removed) assert eval_node(my_compute_node, [frozenset({1, 2}),], [frozenset({3, 4})]) == [frozenset({1, 2, 3, 4})] TimeSeriesDict (TSD) -------------------- This represents a dictionary of time-series values, the ``TSD`` is comprised of a ``key_set`` that is a ``TSS`` instance. The values of the dictionary are themselves time-series values in the same manor as for the ``TSB`` and ``TSL`` collection types. This is currently the only dynamic type, in that it can grow and shrink the number of collected time-series values. Another way to think of the ``TSD`` is to view it as a multiplex of time-series values. The ``TSD`` takes generics as for dict, i.e. ``TSD[K, V]`` where the ``K`` must be a keyable scalar value (must support the hashable protocol). and ``V`` is a time-series type. The following key behaviours are provided by the ``TSD`` that are accessible in the node, namely: ``key_set`` As already discussed this is a time-series set with type ``K``. The set contains the keys of the dictionary. ``keys()``, ``values()``, ``items()`` As for any dictionary, these represent an iterator over the keys, values, and items. Where values are the time-series type instances. ``modified_keys()``, ``modified_values()``, ``modified_items()`` As above, but will only provide values that have been modified in this engine cycle. ``valid_keys()``, ``valid_values()``, ``valid_items()`` As above, but will only provide values that are valid in this engine cycle. ``added_keys()``, ``added_values()``, ``added_items()`` As above, but will only provide values that have been added in this engine cycle. ``removed_keys()``, ``removed_values()``, ``removed_items()`` As above, but will only provide values that have been removed in this engine cycle. Standard methods such as ``__len__``, ``__iter__``, and ``__contains__`` are implemented as expected for a dict. Here is an example to create a ``TSD`` as an output: .. testcode:: from hgraph import compute_node, TSD, REMOVE_IF_EXISTS, REMOVE, TS from hgraph.test import eval_node from frozendict import frozendict as fd @compute_node(valid=("key", "value")) def my_compute_node(key: TS[int], value: TS[str], remove: TS[int]) -> TSD[int, TS[str]]: out = {} if key.modified or value.modified: out[key.value] = value.value if remove.modified: out[remove.value] = REMOVE_IF_EXISTS return out assert eval_node(my_compute_node, [1, None, 2], ["a", "b", "c", "d"], [None, None, None, 1] ) == [fd({1: "a"}), fd({1: "b"}), fd({2: "c"}), fd({1: REMOVE, 2: "d"})] In this example we create a time-series dictionary from the time-series supplying keys and values and removing keys when the remove time-series ticks. Note the use of ``valid`` to advice the engine that we only require the ``key`` and ``value`` attribute to be valid, thus if the ``remove`` has not ticked the code will still be evaluated. See what happens if you remove the ``valid`` constraints. We also use ``REMOVE_IF_EXISTS``, this is a soft instruction to the ``TSD`` to remove a key, if the key does not exist then it nothing happens. If we had used ``REMOVE``, this will raise an exception if the key does not exist. In this example this would work, try change this and then supply a key that does not exist to see how that behaves. The delta-value of the ``TSD`` will contain ``REMOVE`` if a key is removed. Next an example of using a ``TSD`` as in input is considered: .. testcode:: from hgraph import compute_node, TSD, REMOVE, TS from hgraph.test import eval_node from frozendict import frozendict as fd @compute_node def my_compute_node(tsd: TSD[int, TS[str]], key: TS[int]) -> TS[str]: if key.value in tsd: v = tsd[key.value] if v.modified or key.modified: return v.delta_value assert eval_node(my_compute_node, [fd({1: "a"}), None, fd({1: "b"}), fd({2: "c"}), fd({1: REMOVE, 2: "d"})], [None, 1, None, 2] ) == [None, "a", "b", "c", "d"] This is a very low performing approach to extracting a value from a ``TSD`` based on the key. This shows the basic dictionary nature of the input. Note that this has a graph solution that is more performant, here is the example of this: .. testcode:: from hgraph import graph, TSD, REMOVE, TS from hgraph.test import eval_node from frozendict import frozendict as fd @graph def my_compute_node(tsd: TSD[int, TS[str]], key: TS[int]) -> TS[str]: return tsd[key] assert eval_node(my_compute_node, [fd({1: "a"}), None, fd({1: "b"}), fd({2: "c"}), fd({1: REMOVE, 2: "d"})], [None, 1, None, 2] ) == [None, "a", "b", "c", "d"] The ``TSD`` has a number of useful features that can be accessed in graph mode, these include: ``[SCALAR|TS|TSS]`` By using the ``[]`` operator on the time-series dictionary with a scalar value (say ``tsd[1]``) or a time-series of scalar values, this returns a time-series of values for the matching key, if a time-series set is used, then the set is used to filter the keys in the dictionary. ``key_set`` Returns a reference to the key-set of the time-series dictionary.