Developers Club geek daily blog

2 years, 10 months ago
It is the sixth article from the cycle "The Category Theory for Programmers". The previous articles were already published on Habré:
0. The category theory for programmers: preface
1. Category: composition essence
2. Types and functions
3. Categories, big and small
4. Kleysli's categories
5. Works and koproizvedeniye

In the previous article basic operations over types were considered: work and koproizvedeniye. Now we will show that the combination of these mechanisms allows to construct many of daily data structures. Such creation has essential applied relevance. For example, if we are able to check base data types for equality, and also we know how to reduce equality of work and a koproizvedeniye to equality a component, then operators of equality for composite types it is possible to display automatically. In Haskell for an extensive subset of composite types operators of equality and comparison, converting at line and back and many other operations are automatically brought.

Let's consider in more detail the place of work and a koproizvedeniye of types in programming.

Work of types


Canonical implementation of work of types in programming languages is a couple. In Haskell of steam is a primitive designer of types, and in C ++ it is rather difficult template from standard library.
Pair
Strictly speaking, work of types is not commutative: it is impossible to substitute couple of type (Int, Bool) instead of (Bool, Int), though they also contain the same data. However work is commutative to within the isomorphism set by function swap, which back to:
swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

It is possible to consider such couples as different formats of storage of the same information as big endian and little endian.

Putting one couples in others, it is possible to combine some types in work. The same can be received more simply if to notice that the enclosed couples are equivalent to trains. This consequence of the fact that different orders of attachment of couples are isomorphic among themselves. There are two possible orders of a combination in work of three types a, b and c (in the set sequence). Namely,
((a, b), c)
or
(a, (b, c))

These types are different in the sense that the function expecting argument of the first type cannot transfer argument of the second, however values of types are in one-to-one correspondence. There is a function which sets this display in one party:
alpha :: ((a, b), c) -> (a, (b, c))
alpha ((x, y), z) = (x, (y, z))

And here the return to it:
alpha_inv :: (a, (b, c)) -> ((a, b), c)
alpha_inv  (x, (y, z)) = ((x, y), z)

So, isomorphism of types takes place; these are just different methods of repacking of the same data.

Considering work of types as binary operation, we see that isomorphism is very similar to the associative law in monoides above:
(a * b) * c = a * (b * c)

The only difference is that in a monoida both works absolutely match, and for types — are equal to within isomorphism.

If to consider this distinction insignificant, then it is possible to prolong analogy to monoides further and to show that Singleton () is a neutral element concerning multiplication of types, just as 1 it is neutral concerning multiplication of numbers. Really, accession () to a type element a does not add any information. Type
(a, ())

it is isomorphic a, where isomorphism is set by functions
rho :: (a, ()) -> a
rho (x, ()) = x
rho_inv :: a -> (a, ())
rho_inv x = (x, ())

Our supervision can be formalized in the form of the statement that the category of sets Set is monoidal, i.e. category which is also monoidy concerning multiplication of objects (in this case — rather Cartesian product). Below strict definition will be given.

In Haskell more general method of a task of works, especially convenient as we will soon be convinced when they are combined with the sums of types is available. Let's use the named designers with several arguments. For example, alternative determination of couple looks so:
data Pair a b = P a b

Here Pair a b — this name of the designer of types parametrized by two other types a and b, and P — name of the designer of data. We define specific type, transferring in the designer of types Pair two types, also we create couple of this type, transferring suitably the typified values in the designer P. For example, we will define a variable stmt as to steam from String and Bool:
stmt :: Pair String Bool
stmt = P "This statement is" False

The first line is a declaration of type. It consists of the designer of types Pair with String and Bool instead of a and b. The second line defines the variable value received by application of the designer of data P to a specific line and logical value. Once again: designers of types are used for designing of types, designers of data — for designing of values.

As namespaces of types and data in Haskell are not crossed, often same name is used for both designers. For example,
data Pair a b = Pair a b

If to look at the built-in type of couple with a Lenin squint, then she is recognized that she actually is a variation on the last declaration, only the designer Pair it is replaced on binary operator (,). It is possible to use (,) as well as any other named designer, in the prefix notation:
stmt = (,) "This statement is" False

Similarly (,,) designs the three etc.

Instead of use of the generalized couples or trains, it is possible to enter a separate name for work of specific types. For example,
data Stmt = Stmt String Bool

represents work String and Bool, but possesses own name and the designer. The benefit of such determination is that it is possible to get many types with the same contents, but different semantics and functionality which the type system will not allow to mix among themselves.

Programming on trains and mnogoargumentny designers often leads to confusion and errors because it is necessary to trace all the time what component, for what is responsible. It would be better to have an opportunity to give to components proper names. Work of types with the named fields is called record in Haskell and struct in C.

Records


Let's review a simple example. Let's describe chemical elements the uniform structure consisting of two lines (the Latin name and the character) and the integral number corresponding to atomic mass. For this purpose it is possible to use a train (String, String, Int) and all the time to hold in the head what component, for what is responsible. A component from a train we will apply pattern matching to extraction. The following function checks whether the character of chemical element is a prefix of its Latin name (for example, HeHelium prefix):
startsWithSymbol :: (String, String, Int) -> Bool
startsWithSymbol (name, symbol, _) = isPrefixOf symbol name

It is easy to be mistaken in such code, it is difficult to read and support it. It is much better to define record instead of a train:
data Element = Element { name         :: String
                       , symbol       :: String
                       , atomicNumber :: Int }

These representations are isomorphic what it is possible to be convinced by means of the following vzaimnoobratny conversions of:
tupleToElem :: (String, String, Int) -> Element
tupleToElem (n, s, a) = Element { name = n
                                , symbol = s
                                , atomicNumber = a }
elemToTuple :: Element -> (String, String, Int)
elemToTuple e = (name e, symbol e, atomicNumber e)

Let's notice that names of writing fields at the same time are also functions-aksessorami. For example, atomicNumber e returns a field atomicNumber records e. Thus function atomicNumber has type:
atomicNumber :: Element -> Int

With use of records of type Element function startsWithSymbol becomes more readable:
startsWithSymbol :: Element -> Bool
startsWithSymbol e = isPrefixOf (symbol e) (name e)

In Haskell it is possible to turn the trick turning function isPrefixOf in infix operator, having framed her with the return apostrophes. It does a code to more readable:
startsWithSymbol e = symbol e `isPrefixOf` name e

We could lower brackets because a priority of infix operator of low priority of function call.

Sum of types


How work in category of sets induces works of types, the koproizvedeniye generates the sums of types. Canonical implementation of the sum of types in Haskell is as follows:
data Either a b = Left a | Right b

As well as couples, Either- y are commutative (to within isomorphism), can be enclosed, and the order of attachment is not important (to within isomorphism). For example, the sum of three types looks so:
data OneOfThree a b c = Sinistral a | Medial b | Dextral c

It turns out that Set forms (symmetric) monoidal category concerning operation of a koproizvedeniye. The place of binary operation is taken by the dizjyunktny sum, and a neutral element is the initial object. In terms of types Either — monoidal operation, but not inhabited type Void — its neutral element. Consider that Either — this addition, and Void — it is zero. Really, adding Void to the sum of types does not change a set of values of type. For example,
Either a Void

isomorphically a. Really, as type Void it is not inhabited, there is no method to construct Right- value. Means, the only inhabitants Either a Void are Left- values which are just a wrapper over value of type a. Symbolically it can be written as a + 0 = a.

The sums of types very often meet in Haskell. In C ++ their analogs (associations or options) are used significantly less often for a number of reasons.

First, the simplest sums of types are transfers which are implemented in C ++ with the help enum. Equivalent
data Color = Red | Green | Blue

in C ++ will be
enum { Red, Green, Blue };

Simpler sum of types
data Bool = True | False

in C ++ is primitive type bool.

Further, the sums of types coding existence or lack of value are implemented in C ++ by means of different tricks with "impossible" values, such as blank lines, negative numbers, null pointers etc. In Haskell explicit, intended optionality of value registers with the help Maybe:
data Maybe a = Nothing | Just a

Type Maybe is the sum of two types. Let's mentally turn it designers into separate types. The first will take a form:
data NothingType = Nothing

This transfer with a unique value Nothing. In other words, it is Singleton equivalent to type (). Second part
data JustType a = Just a

represents a wrapper over type a. We could write Maybe as
data Maybe a = Either () a

More difficult sums of types are fabricated in C ++ by means of pointers. The pointer can be or zero, or point out value of a certain type. For example, in Haskell the list is defined as (recursive) sum of types:
List a = Nil | Cons a (List a)

In C ++ the same type registers so:
template<class A>
class List {
    Node<A> * _head;
public:
    List() : _head(nullptr) {}  // Nil
    List(A a, List<A> l)        // Cons
      : _head(new Node<A>(a, l))
    {}
};

The null pointer codes the empty list here.

Let's notice that designers Nil and Cons from Haskell turned in two overloaded designers of a class List with similar arguments: without arguments in a case Nil, value and the list for Cons. Class List does not need a tag which would distinguish components of the sum of types; instead he uses special value nullptr for _head, to provide Nil.

Important distinction between types in Haskell and in C ++ consists that in immutabelna data structure Haskell. If the object was created by means of a certain designer, then he will forever remember what designer and with what arguments was used. So class copy Maybe, created as Just "energy", will never turn back Nothing. Similarly empty list forever will remain empty, and the three-element list will always store the same three elements.

The Immutabelnost does designers reversible: the object can always be taken to the components used during its creation. Such deconstruction is executed by pattern matching as which this or that designer acts. Arguments of the designer are substituted with names of variables (or other samples).

At type List two designers, so deconstruction of any List consists of two corresponding pattern matchings. The first sample matches the empty list Nil, the second — with the list created with the help Cons. For an example we will define simple function:
maybeTail :: List a -> Maybe (List a)
maybeTail Nil = Nothing
maybeTail (Cons _ t) = Just t

First part of determination maybeTail the designer uses Nil as the sample for comparison also returns Nothing. The second part uses as an example the designer Cons. The first argument of a sample is provided by a crossed out section as we are not interested in the value which is contained in it. Second argument Cons contacts a variable t (hereinafter we will speak about variables though, strictly speaking, they are invariable: once the variable connected with value never changes). Function value on this sample is equal Just t. So, depending on a method of creation of value of type List, it will match one of samples. If it was created with the help Cons, function will receive both used at the same time argument (first of which it will be ignored).

More difficult sums of types are implemented in C ++ by hierarchy of polymorphic classes. The family of classes with the general primogenitor can be treated as the sum of types in which as an implicit tag components serves the table of virtual functions. What serves as pattern matching in Haskell to, is implemented in C ++ by scheduling function call.

The reader will seldom meet in C ++ use union as the sum of types because of its excessive limitation. In union it is impossible to place even std::string, because this class has a designer of copying.

Algebra of types


Separately works and the sums of types allow to define a set of useful data structures, but true power results from their combination.

Let's sum up the above results. We considered two commutative monoidal structures which are the cornerstone of type system. It is the sum of types with a neutral element Void and work of types with a neutral element (). It is convenient to imagine them as addition and multiplication. In that case Void corresponds to zero, and () — to unit.

Let's look how far this analogy stretches. For example, whether truly that multiplication by zero gives zero? In other words, whether any work is isomorphic on Void to type Void?

Whether there are couples consisting from, say, Int and Void? For creation of couple both values are necessary. Value of type Int — it is not a problem, and here with Void there is a hitch: this type is not inhabited (there is no value of this type). Thus, for any type a type (a, Void) also it is not inhabited and, therefore, it is equivalent Void. In other words, a*0 = 0.

Addition and multiplication of numbers are connected by the distributive law:
a * (b + c) = a * b + a * c

Whether it is executed for the sum and work of types? Yes, to within isomorphism. The left part of identity corresponds to type
(a, Either b c)

and right —
Either (a, b) (a, c)

Let's show functions which will transform types there and back:
prodToSum :: (a, Either b c) -> Either (a, b) (a, c)
prodToSum (x, e) =
    case e of
      Left  y -> Left  (x, y)
      Right z -> Right (x, z)
sumToProd :: Either (a, b) (a, c) -> (a, Either b c)
sumToProd e =
    case e of
      Left  (x, y) -> (x, Left  y)
      Right (x, z) -> (x, Right z)

Construction case of it is used for pattern matching in function. The arrow separates a sample and the expression corresponding to it. For example, by a challenge prodToSum with argument
prod1 :: (Int, Either String Float)
prod1 = (2, Left "Hi!")

variable e in case e of it will be equal Left "Hi!". It will match a sample Left y, substituting "Hi!" instead of y. As variable x earlier it was already connected with 2, result of construction case of (and all function) will be, as expected Left (2, "Hi!").

We will leave the proof that functions are higher than a vzaimnoobratna to the reader as exercise. They only repack the same data from one format in another.

Two monoid connected by the distributive law in mathematics are called a half ring. It not a complete ring as we not in forces to define subtraction of types. A set of statements, correct for the natural numbers forming a half ring, it is possible to transfer to types. There are several examples:
Numbers
Types
0
Void
1
()
a + b
Either a b = Left a | Right b
a * b
(a, b) or Pair a b = Pair a b
2 = 1 + 1
data Bool = True | False
1 + a
data Maybe = Nothing | Just a

The type of the list is of special interest as it is defined as an equation solution. The defined type meets in both parties of equality:
List a = Nil | Cons a (List a)

Having made normal substitutions and having replaced List a on x, we will receive
x = 1 + a * x

This equation cannot be solved by traditional algebraic methods as types cannot be read or divided. Let's try to substitute recursively instead of x on the right there is an expression (1 + a*x) and to remove the brackets on distributivity. Let's receive
x = 1 + a*x
x = 1 + a*(1 + a*x) = 1 + a + a*a*x
x = 1 + a + a*a*(1 + a*x) = 1 + a + a*a + a*a*a*x
...
x = 1 + a + a*a + a*a*a + a*a*a*a...

Eventually we will come to the infinite sum of works (trains) which can be treated so: the list or is empty, 1; or consists of one element, a; or consists of couple, a*a; or from the three, a*a*a, etc. The determination received formally completely answers intuitive idea of the list as about a line where instead of letters — values of type a.

We will return to lists and other recursive structures further, after studying of functors and motionless points.

The solution of the equations with character variables is an algebra! Therefore such data types are also called algebraic (ATD).

Summing up the results, we will give one very important interpretation of algebra of types. Let's notice that work of types a and b has to contain also value of type a, and value of type b, what is attracted by density of population of both types. On the other hand, the sum of types contains or value of type a, or value of type b, so it is enough that at least one of them was inhabited. Logic operations of conjunction and a disjunction are formed by the half ring which is in the following compliance with algebra of types:
Logic
Types
false
Void
true
()
a || b
Either a b = Left a | Right b
&& b
(a, b)

This analogy can be deepened and is a Curry-Howard isomorphism basis between logic and the type theory. We will return to this question by consideration of functional types.

Exercises


  1. Show that Maybe a and Either () a are isomorphic.
  2. Let's consider the following sum of types in Haskell:
    data Shape = Circle Float
               | Rect Float Float

    To define on type Shape function area, we will use pattern matching:
    area :: Shape -> Float
    area (Circle r) = pi * r * r
    area (Rect d h) = d * h

    Implement Shape on C ++ or Java as the interface also create two classes: Circle and Rect. Then write area as virtual function.
  3. (Continuation) is simple to add new function circ, which calculates perimeter Shape. In Haskell determination of type Shape will remain invariable; the following code can be added to any place of the program:
    circ :: Shape -> Float
    circ (Circle r) = 2.0 * pi * r
    circ (Rect d h) = 2.0 * (d + h)

    Add circ in your program on C ++ or Java. What parts of the original program had to be modified?
  4. Add (continuation) to type Shape new figure Square also enter the corresponding updates to other code. What had to be changed in Haskell? And what in C ++ or Java? (Even if you do not know Haskell, changes have to be quite obvious.)
  5. Show that formal identity a + a = 2 * a it is executed for types (to within isomorphism). We remind, 2 in language of types corresponds Bool (see the table above).

Thanks


The author is grateful Gershoma Bazermana for reviewing of a post and useful comments.

This article is a translation of the original post at habrahabr.ru/post/274103/
If you have any questions regarding the material covered in the article above, please, contact the original author of the post.
If you have any complaints about this article or you want this article to be deleted, please, drop an email here: sysmagazine.com@gmail.com.

We believe that the knowledge, which is available at the most popular Russian IT blog habrahabr.ru, should be accessed by everyone, even though it is poorly translated.
Shared knowledge makes the world better.
Best wishes.

comments powered by Disqus