Quick Start
What is Yatel?
Yatel is a reference implementation of NW-OLAP.
- Released under the Wiskey-Ware License.
- It implements most of the NW-OLAP process.
- Its first usable version, 0.3, is coming soon.
Case study (example)
Suppose we have the following problem:
We have three places called Córdoba [0] [1] [2], each separated from the others by a certain distance. We can use Yatel to state the problem and answer queries such as:
- Which ones have an area between 200 km² and 600 km²?
- Which ones speak Spanish?
- Which ones are in the UTC-6 time zone?
- Which ones have Andalucía in their name?
Loading the problem into Yatel
We load the previous model into Yatel, as follows:
>>> from yatel import dom, db
>>> from pprint import pprint
>>> # other engines: postgres, oracle, mysql, and many more
>>> nw = db.YatelNetwork("memory", mode="w")
>>> elems = [
... dom.Haplotype(0, name="Cordoba"), # left
... dom.Haplotype(1, name="Cordoba"), # right
... dom.Haplotype(2, name="Cordoba"), # bottom
...
... dom.Edge(6599, (0, 1)),
... dom.Edge(8924, (1, 2)),
... dom.Edge(9871, (2, 0)),
...
... dom.Fact(0, name="Andalucia", lang="sp", timezone="utc-3"),
... dom.Fact(1, lang="sp"),
... dom.Fact(1, timezone="utc-6"),
... dom.Fact(2, name="Andalucia", lang="sp", timezone="utc"),
... ]
>>> nw.add_elements(elems)
>>> nw.confirm_changes()
In the above code, we create a database in memory and define:
- A haplotype for each Córdoba.
- An edge connecting each pair of Córdobas, weighted by the distance between them.
- Facts that give us information about the haplotypes.
Models and Attributes
Showing the description:
>>> descriptor = nw.describe()
>>> pprint(dict(descriptor))
{'edge_attributes': {u'max_nodes': 2, u'weight': <type 'float'>},
 'fact_attributes': {'hap_id': <type 'int'>,
                     'lang': <type 'str'>,
                     'name': <type 'str'>,
                     'timezone': <type 'str'>},
 'haplotype_attributes': {'hap_id': <type 'int'>, 'name': <type 'str'>},
 'mode': 'r',
 'size': {u'edges': 3, u'facts': 4, u'haplotypes': 3}
}
Showing Haplotypes:
>>> for hap in nw.haplotypes():
...     print hap
<Haplotype (0) at 0x24faa50>
<Haplotype (1) at 0x24eae50>
<Haplotype (2) at 0x24fa990>
Showing Edges:
>>> for edge in nw.edges():
...     print edge
<Edge ([6599.0 [0, 1]] ) at 0x1f64c50>
<Edge ([8924.0 [1, 2]] ) at 0x24fa0d0>
<Edge ([9871.0 [2, 0]] ) at 0x1f64c50>
Showing Facts:
>>> for fact in nw.facts():
...     print fact
<Fact (of Haplotype '0') at 0x24eae50>
<Fact (of Haplotype '1') at 0x24fad10>
<Fact (of Haplotype '1') at 0x24eae50>
<Fact (of Haplotype '2') at 0x24fad10>
Query
Now for the queries:
>>> hap = nw.haplotype_by_id(2)
>>> hap
<Haplotype (2) at 0x24fa990>
Edges by haplotype:
>>> for edge in nw.edges_by_haplotype(hap):
...     print edge
<Edge ([9871.0 [2, 0]] ) at 0x24fa710>
<Edge ([8924.0 [1, 2]] ) at 0x1f64c50>
Facts by haplotype:
>>> for fact in nw.facts_by_haplotype(hap):
...     print dict(fact)
{u'lang': u'sp', u'timezone': u'utc', 'hap_id': 2, u'name': u'Andalucia'}
Haplotypes by lang environment:
>>> for hap in nw.haplotypes_by_environment(lang="sp"):
...     print hap
<Haplotype (0) at 0x24fa2d0>
<Haplotype (1) at 0x25c5350>
<Haplotype (2) at 0x24fa2d0>
Haplotypes by timezone environment:
>>> for hap in nw.haplotypes_by_environment(timezone="utc-6"):
...     print hap
<Haplotype (1) at 0x24eae50>
Haplotypes by name environment:
>>> for hap in nw.haplotypes_by_environment(name="Andalucia"):
...     print hap
<Haplotype (0) at 0x25c5350>
<Haplotype (2) at 0x24eae50>
Edges by Andalucia environment:
>>> for edge in nw.edges_by_environment(name="Andalucia"):
...     print edge
<Edge ([9871.0 [2, 0]] ) at 0x24fa7d0>
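To make the environment queries above concrete, here is a minimal plain-Python sketch that reproduces the same results without Yatel. The matching rules are an assumption inferred from the outputs shown: a haplotype belongs to an environment if at least one of its facts carries every requested attribute value, and an edge belongs if all of its endpoints do. The helper names `haps_by_env` and `edges_by_env` are hypothetical and are not part of Yatel's API.

```python
# Plain-Python stand-ins for the example data (not Yatel objects).
FACTS = [
    (0, {"name": "Andalucia", "lang": "sp", "timezone": "utc-3"}),
    (1, {"lang": "sp"}),
    (1, {"timezone": "utc-6"}),
    (2, {"name": "Andalucia", "lang": "sp", "timezone": "utc"}),
]
EDGES = [(6599.0, (0, 1)), (8924.0, (1, 2)), (9871.0, (2, 0))]


def haps_by_env(**env):
    """Hypothetical helper: ids of haplotypes with a fact matching every key in env."""
    return sorted({
        hap_id
        for hap_id, attrs in FACTS
        if all(attrs.get(key) == value for key, value in env.items())
    })


def edges_by_env(**env):
    """Hypothetical helper: edges whose endpoints all belong to the environment."""
    members = set(haps_by_env(**env))
    return [edge for edge in EDGES if set(edge[1]) <= members]


print(haps_by_env(lang="sp"))          # [0, 1, 2]
print(haps_by_env(timezone="utc-6"))   # [1]
print(haps_by_env(name="Andalucia"))   # [0, 2]
print(edges_by_env(name="Andalucia"))  # [(9871.0, (2, 0))]
```

Under these assumed rules, only the edge between haplotypes 2 and 0 survives the `name="Andalucia"` filter, because haplotype 1 has no fact with that name.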
All environments:
>>> for env in nw.environments():
...     print env
<Enviroment {u'lang': u'sp', u'timezone': u'utc-3', u'name': u'Andalucia'} at 0x24faad0>
<Enviroment {u'lang': u'sp', u'timezone': None, u'name': None} at 0x24db490>
<Enviroment {u'lang': None, u'timezone': u'utc-6', u'name': None} at 0x24faad0>
<Enviroment {u'lang': u'sp', u'timezone': u'utc', u'name': u'Andalucia'} at 0x24db490>
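The listing above suggests that an environment is one distinct combination of values over the fact attributes, with `None` standing in for any attribute a fact does not define. The following plain-Python sketch works under that assumption; the function name `list_environments` is hypothetical and this is not how Yatel necessarily computes the result internally.

```python
# Plain-Python stand-ins for the four facts loaded earlier (not Yatel objects).
FACTS = [
    {"name": "Andalucia", "lang": "sp", "timezone": "utc-3"},
    {"lang": "sp"},
    {"timezone": "utc-6"},
    {"name": "Andalucia", "lang": "sp", "timezone": "utc"},
]
ATTRS = ("lang", "timezone", "name")


def list_environments(facts, attrs):
    """Hypothetical helper: distinct attribute combinations, in first-seen order."""
    seen = []
    for fact in facts:
        # Attributes the fact does not define are filled with None.
        env = {attr: fact.get(attr) for attr in attrs}
        if env not in seen:
            seen.append(env)
    return seen


for env in list_environments(FACTS, ATTRS):
    print(env)
```

With the example data, each of the four facts yields a different combination, so the sketch produces four environments, matching the four lines printed by `nw.environments()`.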
Statistics
Here are some statistics:
>>> from yatel import stats
>>> stats.average(nw)  # mean edge weight: (6599 + 8924 + 9871) / 3
8464.66666667
>>> stats.std(nw, name="Andalucia")  # only one edge, (2, 0), lies in this environment
0.0
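The two figures above can be checked by hand. This sketch uses only Python's standard library and assumes, based on the numbers shown, that the statistics are computed over edge weights and that the standard deviation is the population form (a sample standard deviation is undefined for a single value).

```python
from statistics import mean, pstdev

# Weights of the three edges loaded earlier.
weights = [6599.0, 8924.0, 9871.0]

# Only the edge between haplotypes 2 and 0 lies in the name="Andalucia" environment.
andalucia_weights = [9871.0]

print(round(mean(weights), 8))    # 8464.66666667
print(pstdev(andalucia_weights))  # 0.0
```

A single weight has no spread, which is why the standard deviation for that environment is zero.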
Data Mining
Now for some data mining:
>>> from scipy.spatial.distance import euclidean
>>> from yatel.cluster import kmeans
>>> cbs, distortion = kmeans.kmeans(nw, nw.environments(), 2)
>>> for env in nw.environments():
...     coords = kmeans.hap_in_env_coords(nw, env)
...     min_euc = None
...     closest_centroid = None
...     for cb in cbs:
...         euc = euclidean(cb, coords)
...         if min_euc is None or euc < min_euc:
...             min_euc = euc
...             closest_centroid = cb
...         print "{} || {} || {}".format(dict(env), closest_centroid, euc)
{u'lang': u'sp', u'timezone': u'utc-3', u'name': u'Andalucia'} || [0 0 0] || 1.0
{u'lang': u'sp', u'timezone': u'utc-3', u'name': u'Andalucia'} || [0 0 0] || 1.41421356237
{u'lang': u'sp', u'timezone': None, u'name': None} || [0 0 0] || 1.0
{u'lang': u'sp', u'timezone': None, u'name': None} || [0 1 0] || 0.0
{u'lang': None, u'timezone': u'utc-6', u'name': None} || [0 0 0] || 1.0
{u'lang': None, u'timezone': u'utc-6', u'name': None} || [0 1 0] || 0.0
{u'lang': u'sp', u'timezone': u'utc', u'name': u'Andalucia'} || [0 0 0] || 1.0
{u'lang': u'sp', u'timezone': u'utc', u'name': u'Andalucia'} || [0 0 0] || 1.41421356237
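The nearest-centroid step of the loop above can be sketched in plain Python. The centroids are taken from the printed output; the coordinates are an assumption: each environment is represented as a one-hot vector marking which haplotype it contains, which is consistent with the distances printed above (1.0, 1.41421356237 and 0.0). The function name `closest_centroid` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from math import dist, sqrt

# Centroids as they appear in the output above.
CENTROIDS = [(0, 0, 0), (0, 1, 0)]

# Assumed one-hot coordinates: position i is 1 if haplotype i is in the environment.
ENV_COORDS = [
    ("utc-3 / Andalucia / sp", (1, 0, 0)),
    ("sp only", (0, 1, 0)),
    ("utc-6 only", (0, 1, 0)),
    ("utc / Andalucia / sp", (0, 0, 1)),
]


def closest_centroid(point, centroids):
    """Hypothetical helper: (centroid, distance) pair with the smallest Euclidean distance."""
    return min(((c, dist(c, point)) for c in centroids), key=lambda pair: pair[1])


for label, coords in ENV_COORDS:
    centroid, distance = closest_centroid(coords, CENTROIDS)
    print("{} || {} || {}".format(label, centroid, distance))
```

Unlike the transcript above, which prints a line for every centroid it visits, this sketch prints one line per environment with the final closest centroid and its distance.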