
How-To Tutorials - Programming


Build a C++ Binary Search Tree [Tutorial]

Pavan Ramchandani
25 Jul 2018
19 min read
A binary tree is a hierarchical data structure that, like a real tree, has a root and leaves (nodes with no children). The root of a binary tree is the topmost node. Each node can have at most two children, referred to as the left child and the right child. A node with at least one child is the parent of its children, and a node with no children is a leaf. In this tutorial, you will learn about the binary search tree data structure, its principles, and strategies for applying it to various applications.

This C++ tutorial has been taken from C++ Data Structures and Algorithms. Read more here.

Take a look at the following binary tree:

[Diagram: a binary tree with root 1, children 2 and 3, and leaves 4, 5, 6, and 7]

From the preceding binary tree diagram, we can conclude the following:

- The root of the tree is the node of element 1, since it's the topmost node
- The children of element 1 are element 2 and element 3
- The parent of elements 2 and 3 is element 1
- There are four leaves in the tree: element 4, element 5, element 6, and element 7, since they have no children

This hierarchical data structure is usually used to store information that forms a hierarchy, such as the file system of a computer.

Building a binary search tree ADT

A binary search tree (BST) is a sorted binary tree, in which we can easily search for any key using the binary search algorithm. To keep the BST sorted, it has to have the following properties:

- The node's left subtree contains only keys that are smaller than the node's key
- The node's right subtree contains only keys that are greater than the node's key
- Keys must be unique; duplicates are not allowed

With the preceding properties, we can easily search for a key value, as well as find the maximum or minimum key value. Suppose we have the following BST:

[Diagram: a balanced BST]

As we can see in the preceding tree diagram, it is sorted, since all of the keys in the root's left subtree are smaller than the root's key, and all of the keys in the root's right subtree are greater than the root's key. We can call the preceding BST a balanced BST, since both the left and right subtrees have an equal height (we are going to discuss this further in the upcoming section). However, since we have to put each greater new key in the right subtree and each smaller new key in the left subtree, we might end up with an unbalanced BST, called a skewed left or a skewed right BST. Please see the following diagram:

[Diagram: a skewed left BST]

The preceding image is a sample of a skewed left BST, since there's no right subtree. We can also find a BST that has no left subtree, which is called a skewed right BST, as shown in the following diagram:

[Diagram: a skewed right BST]

As we can see in the two skewed BST diagrams, the height of the BST becomes taller, since the height equals N - 1 (where N is the total number of keys in the BST), which is five here. Compared with the balanced BST, whose height is only three, that is a significant difference.

To create a BST in C++, we need to modify the TreeNode class from the preceding binary tree discussion, Building a binary tree ADT. We need to add a Parent property so that we can track the parent of each node. It will make things easier for us when we traverse the tree. The class should be as follows:

```cpp
class BSTNode
{
public:
    int Key;
    BSTNode * Left;
    BSTNode * Right;
    BSTNode * Parent;
};
```

A BST usually has several basic operations, and they are as follows:

- Insert() is used to add a new node to the current BST. If it's the first time we have added a node, the node we inserted will be the root node.
- PrintTreeInOrder() is used to print all of the keys in the BST, sorted from the smallest key to the greatest key.
- Search() is used to find a given key in the BST. If the key exists, it returns TRUE, otherwise it returns FALSE.
- FindMin() and FindMax() are used to find the minimum key and the maximum key that exist in the BST.
- Successor() and Predecessor() are used to find the successor and predecessor of a given key. We are going to discuss these in an upcoming section.
- Remove() is used to remove a given key from the BST.

Now, let's discuss these BST operations further.

Inserting a new key into a BST

Inserting a key into the BST is really adding a new node in a position dictated by the behavior of the BST. Each time we want to insert a key, we compare it with the root node (if there's no root beforehand, the inserted key becomes the root) and check whether it's smaller or greater than the current node's key. If the given key is greater than the currently selected node's key, we go to the right subtree; otherwise, we go to the left subtree. We keep doing this until we reach a node with no child in that direction, so that we can add the new node there. The following is the implementation of the Insert() operation in C++:

```cpp
BSTNode * BST::Insert(BSTNode * node, int key)
{
    // If the BST doesn't exist,
    // create a new node as root.
    // This branch is also reached when
    // there's no child node left,
    // so we can insert a new node here
    if(node == NULL)
    {
        node = new BSTNode;
        node->Key = key;
        node->Left = NULL;
        node->Right = NULL;
        node->Parent = NULL;
    }
    // If the given key is greater than
    // the node's key, go to the right subtree
    else if(node->Key < key)
    {
        node->Right = Insert(node->Right, key);
        node->Right->Parent = node;
    }
    // If the given key is smaller than
    // the node's key, go to the left subtree
    else
    {
        node->Left = Insert(node->Left, key);
        node->Left->Parent = node;
    }

    return node;
}
```

As we can see in the preceding code, we need to pass the selected node and a new key to the function. However, we will always pass the root node as the selected node when performing the Insert() operation, so we can invoke the preceding code with the following Insert() function:

```cpp
void BST::Insert(int key)
{
    // Invoking the Insert() function,
    // passing the root node and the given key
    root = Insert(root, key);
}
```

Based on the implementation of the Insert() operation, we can see that the time complexity of inserting a new key into the BST is O(h), where h is the height of the BST. However, if we insert a new key into an empty BST, the time complexity will be O(1), which is the best-case scenario. And if we insert a new key into a skewed tree, the time complexity will be O(N), where N is the total number of keys in the BST, which is the worst-case scenario.

Traversing a BST in order

We have successfully created a new BST and can insert a new key into it. Now, we need to implement the PrintTreeInOrder() operation, which will traverse the BST in order from the smallest key to the greatest key. To achieve this, we will go to the leftmost node first and end at the rightmost node.
The code should be as follows:

```cpp
void BST::PrintTreeInOrder(BSTNode * node)
{
    // Stop printing if no node is found
    if(node == NULL)
        return;

    // Get the smallest key first,
    // which is in the left subtree
    PrintTreeInOrder(node->Left);

    // Print the key
    std::cout << node->Key << " ";

    // Continue to the greatest key,
    // which is in the right subtree
    PrintTreeInOrder(node->Right);
}
```

Since we will always traverse from the root node, we can invoke the preceding code as follows:

```cpp
void BST::PrintTreeInOrder()
{
    // Traverse the BST
    // from the root node,
    // then print all keys
    PrintTreeInOrder(root);
    std::cout << std::endl;
}
```

The time complexity of the PrintTreeInOrder() function is O(N), where N is the total number of keys, for both the best and the worst cases, since it always traverses all keys.

Finding out whether a key exists in a BST

Suppose we have a BST and need to find out if a key exists in it. This is quite easy, since we just need to compare the given key with the current node. If the key is smaller than the current node's key, we go to the left subtree, otherwise we go to the right subtree. We do this until we find the key or there are no more nodes to visit. The implementation of the Search() operation should be as follows:

```cpp
BSTNode * BST::Search(BSTNode * node, int key)
{
    // The given key is
    // not found in the BST
    if (node == NULL)
        return NULL;
    // The given key is found
    else if(node->Key == key)
        return node;
    // The given key is greater than
    // the current node's key
    else if(node->Key < key)
        return Search(node->Right, key);
    // The given key is smaller than
    // the current node's key
    else
        return Search(node->Left, key);
}
```

Since we will always search for a key from the root node, we can create another Search() function as follows:

```cpp
bool BST::Search(int key)
{
    // Invoking the Search() operation
    // and passing the root node
    BSTNode * result = Search(root, key);

    // If the key is found, return TRUE,
    // otherwise return FALSE
    return result != NULL;
}
```

The time complexity of finding a key in the BST is O(h), where h is the height of the BST. If the key lies in the root node, the time complexity will be O(1), which is the best case. If we search for a key in a skewed tree, the time complexity will be O(N), where N is the total number of keys in the BST, which is the worst case.

Retrieving the minimum and maximum key values

Finding the minimum and maximum key values in a BST is also quite simple. To get the minimum key value, we just need to go to the leftmost node and take its key value. Conversely, to find the maximum key value, we go to the rightmost node. The following is the implementation of the FindMin() operation to retrieve the minimum key value, and the FindMax() operation to retrieve the maximum key value:

```cpp
int BST::FindMin(BSTNode * node)
{
    if(node == NULL)
        return -1;
    else if(node->Left == NULL)
        return node->Key;
    else
        return FindMin(node->Left);
}

int BST::FindMax(BSTNode * node)
{
    if(node == NULL)
        return -1;
    else if(node->Right == NULL)
        return node->Key;
    else
        return FindMax(node->Right);
}
```

We return -1 if we cannot find the minimum or maximum value in the tree, since we assume the tree only stores positive integers. If we intend to store negative integers as well, we need to modify the implementation, for instance, by returning a pointer to the node (and NULL when no minimum or maximum is found) instead of returning the key itself.
As usual, we will always find the minimum and maximum key values starting from the root node, so we can invoke the preceding operations as follows:

```cpp
int BST::FindMin()
{
    return FindMin(root);
}

int BST::FindMax()
{
    return FindMax(root);
}
```

Similar to the Search() operation, the time complexity of the FindMin() and FindMax() operations is O(h), where h is the height of the BST. However, if we find the maximum key value in a skewed left BST, the time complexity will be O(1), which is the best case, since it doesn't have any right subtree. The same happens if we find the minimum key value in a skewed right BST. The worst case appears if we try to find the minimum key value in a skewed left BST, or the maximum key value in a skewed right BST, since the time complexity will then be O(N).

Finding out the successor of a key in a BST

Other properties that we can find in a BST are the successor and the predecessor of a key. We are going to create two functions named Successor() and Predecessor() in C++. But before we write the code, let's discuss how to find the successor and the predecessor of a key in a BST. In this section, we are going to learn about the successor first, and then we will discuss the predecessor in the upcoming section.

There are three rules for finding the successor of a key in a BST. Suppose we have a key, k, that we have searched for using the previous Search() function. We will also use our preceding BST to find the successor of a specific key. The successor of k can be found as follows:

- If k has a right subtree, the successor of k will be the minimum key in the right subtree of k. From our preceding BST, if k = 31, Successor(31) will give us 53, since it's the minimum key in the right subtree of 31.
- If k does not have a right subtree, we have to traverse the ancestors of k until we find the first node, n, that is greater than node k. After we find node n, we will see that node k is the maximum element in the left subtree of n. From our preceding BST, if k = 15, Successor(15) will give us 23, since 23 is the first ancestor greater than 15.
- If k is the maximum key in the BST, there's no successor of k. From the preceding BST, if we run Successor(88), we will get -1, which means no successor has been found, since 88 is the maximum key of the BST.

Based on our preceding discussion about how to find the successor of a given key in a BST, we can create a Successor() function in C++ with the following implementation:

```cpp
int BST::Successor(BSTNode * node)
{
    // The successor is the minimum key value
    // of the right subtree
    if (node->Right != NULL)
    {
        return FindMin(node->Right);
    }
    // If there is no right subtree
    else
    {
        BSTNode * parentNode = node->Parent;
        BSTNode * currentNode = node;

        // While currentNode is not the root and
        // currentNode is its parent's right child,
        // continue moving up
        while ((parentNode != NULL) &&
               (currentNode == parentNode->Right))
        {
            currentNode = parentNode;
            parentNode = currentNode->Parent;
        }

        // If parentNode is not NULL,
        // then the key of parentNode is
        // the successor of node
        return parentNode == NULL ? -1 : parentNode->Key;
    }
}
```

However, since we have to find the given key's node first, we have to run Search() prior to invoking the preceding Successor() function.
The complete code for searching for the successor of a given key in a BST is as follows:

```cpp
int BST::Successor(int key)
{
    // Search for the key's node first
    BSTNode * keyNode = Search(root, key);

    // Return the successor's key.
    // If the key is not found or
    // the successor is not found,
    // return -1
    return keyNode == NULL ? -1 : Successor(keyNode);
}
```

From our preceding Successor() operation, we can say that the average time complexity of running the operation is O(h), where h is the height of the BST. However, if we try to find the successor of the maximum key in a skewed right BST, the time complexity of the operation is O(N), which is the worst-case scenario.

Finding out the predecessor of a key in a BST

Finding the predecessor mirrors the process of finding the successor. Again, suppose k is a key we have searched for in our preceding BST:

- If k has a left subtree, the predecessor of k will be the maximum key in the left subtree of k. From our preceding BST, if k = 12, Predecessor(12) will be 7, since it's the maximum key in the left subtree of 12.
- If k does not have a left subtree, we have to traverse the ancestors of k until we find the first node, n, that is lower than node k. After we find node n, we will see that node n is the minimum element of the traversed elements. From our preceding BST, if k = 29, Predecessor(29) will give us 23, since 23 is the first ancestor lower than 29.
- If k is the minimum key in the BST, there's no predecessor of k. From the preceding BST, if we run Predecessor(3), we will get -1, which means no predecessor has been found, since 3 is the minimum key of the BST.

Now, we can implement the Predecessor() operation in C++ as follows:

```cpp
int BST::Predecessor(BSTNode * node)
{
    // The predecessor is the maximum key value
    // of the left subtree
    if (node->Left != NULL)
    {
        return FindMax(node->Left);
    }
    // If there is no left subtree
    else
    {
        BSTNode * parentNode = node->Parent;
        BSTNode * currentNode = node;

        // While currentNode is not the root and
        // currentNode is its parent's left child,
        // continue moving up
        while ((parentNode != NULL) &&
               (currentNode == parentNode->Left))
        {
            currentNode = parentNode;
            parentNode = currentNode->Parent;
        }

        // If parentNode is not NULL,
        // then the key of parentNode is
        // the predecessor of node
        return parentNode == NULL ? -1 : parentNode->Key;
    }
}
```

And, similar to the Successor() operation, we have to search for the node of the given key prior to invoking the preceding Predecessor() function. The complete code for searching for the predecessor of a given key in a BST is as follows:

```cpp
int BST::Predecessor(int key)
{
    // Search for the key's node first
    BSTNode * keyNode = Search(root, key);

    // Return the predecessor's key.
    // If the key is not found or
    // the predecessor is not found,
    // return -1
    return keyNode == NULL ? -1 : Predecessor(keyNode);
}
```

Similar to our preceding Successor() operation, the time complexity of running the Predecessor() operation is O(h), where h is the height of the BST. However, if we try to find the predecessor of the minimum key in a skewed left BST, the time complexity of the operation is O(N), which is the worst-case scenario.

Removing a node based on a given key

The last BST operation that we are going to discuss is removing a node based on a given key. We will create a Remove() operation in C++. There are three possible cases for removing a node from a BST, and they are as follows:

- Removing a leaf (a node that doesn't have any children). In this case, we just need to remove the node. From our preceding BST, we can remove keys 7, 15, 29, and 53, since they are leaves with no children.
- Removing a node that has only one child (either a left or a right child). In this case, we have to connect the child to the node's parent, and afterwards we can remove the target node safely. As an example, if we want to remove node 3, we have to point the Parent pointer of node 7 to node 12 and make the Left pointer of node 12 point to node 7. Then, we can safely remove node 3.
- Removing a node that has two children (left and right). In this case, we have to find the successor (or predecessor) of the node's key. After that, we can replace the target node with the successor (or predecessor) node. Suppose we want to remove node 31, and we want 53 to be its successor. Then, we can remove node 31 and replace it with node 53. Now, node 53 will have two children, node 29 on the left and node 88 on the right.

Also, similar to the Search() operation, if the target node doesn't exist, we just need to return NULL. The implementation of the Remove() operation in C++ is as follows:

```cpp
BSTNode * BST::Remove(BSTNode * node, int key)
{
    // The given key is
    // not found in the BST
    if (node == NULL)
        return NULL;

    // Target node is found
    if (node->Key == key)
    {
        // If the node is a leaf node,
        // it can be safely removed
        if (node->Left == NULL && node->Right == NULL)
        {
            delete node;
            node = NULL;
        }
        // The node has only one child, at the right
        else if (node->Left == NULL && node->Right != NULL)
        {
            // The only child will be connected to
            // the node's parent directly
            node->Right->Parent = node->Parent;

            // Bypass and release the node
            BSTNode * removedNode = node;
            node = node->Right;
            delete removedNode;
        }
        // The node has only one child, at the left
        else if (node->Left != NULL && node->Right == NULL)
        {
            // The only child will be connected to
            // the node's parent directly
            node->Left->Parent = node->Parent;

            // Bypass and release the node
            BSTNode * removedNode = node;
            node = node->Left;
            delete removedNode;
        }
        // The node has two children (left and right)
        else
        {
            // Find the successor to replace the node's key
            int successorKey = Successor(key);

            // Replace the node's key with the successor's key
            node->Key = successorKey;

            // Delete the old successor's node
            node->Right = Remove(node->Right, successorKey);
        }
    }
    // Target node's key is smaller than
    // the given key, so search to the right
    else if (node->Key < key)
        node->Right = Remove(node->Right, key);
    // Target node's key is greater than
    // the given key, so search to the left
    else
        node->Left = Remove(node->Left, key);

    // Return the updated BST
    return node;
}
```

Since we will always remove a node starting from the root node, we can simplify the preceding Remove() operation by creating the following one:

```cpp
void BST::Remove(int key)
{
    root = Remove(root, key);
}
```

As shown in the preceding Remove() code, the time complexity of the operation is O(1) for both case 1 (a node that has no child) and case 2 (a node that has only one child). For case 3 (a node that has two children), the time complexity is O(h), where h is the height of the BST, since we have to find the successor of the node's key.
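To see how these operations fit together, here is a minimal, hypothetical usage sketch. It assumes a BST class that declares the public wrappers shown above, together with the private recursive helpers, and a root member initialized to NULL; the excerpt does not show the full class declaration, so this layout is an assumption. The keys recreate the example tree used throughout this section, with 23 as the root.

```cpp
#include <iostream>

// Assumed declaration, pieced together from the wrappers
// used throughout this tutorial (not shown in the excerpt).
class BST
{
public:
    BST() : root(NULL) {}
    void Insert(int key);
    void PrintTreeInOrder();
    bool Search(int key);
    int FindMin();
    int FindMax();
    int Successor(int key);
    int Predecessor(int key);
    void Remove(int key);

private:
    BSTNode * root;
    // ... the private recursive helpers shown earlier ...
};

int main()
{
    BST tree;

    // Recreate the example tree: 23 becomes the root,
    // and 7, 15, 29, and 53 end up as leaves
    int keys[] = {23, 12, 31, 3, 15, 29, 88, 7, 53};
    for (int key : keys)
        tree.Insert(key);

    tree.PrintTreeInOrder();                         // 3 7 12 15 23 29 31 53 88
    std::cout << tree.Search(15) << std::endl;       // 1 (TRUE)
    std::cout << tree.FindMin() << std::endl;        // 3
    std::cout << tree.FindMax() << std::endl;        // 88
    std::cout << tree.Successor(31) << std::endl;    // 53
    std::cout << tree.Predecessor(12) << std::endl;  // 7

    tree.Remove(15);
    tree.PrintTreeInOrder();                         // 3 7 12 23 29 31 53 88

    return 0;
}
```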
If you found this tutorial useful, do check out the book C++ Data Structures and Algorithms for more useful material on data structures and algorithms, with real-world implementations in C++.

Read next:
- Working with shaders in C++ to create 3D games
- Getting Inside a C++ Multithreaded Application
- Understanding the Dependencies of a C++ Application


Fast Array Operations with NumPy

Packt
19 Dec 2013
10 min read
(For more resources related to this topic, see here.)

Getting started with NumPy

NumPy is founded on its multidimensional array object, numpy.ndarray. NumPy arrays are collections of elements of the same data type; this fundamental restriction allows NumPy to pack the data in an efficient way. By storing the data this way, NumPy can handle arithmetic and mathematical operations at high speed.

Creating arrays

You can create NumPy arrays using the numpy.array function. It takes a list-like object (or another array) as input and, optionally, a string expressing its data type. You can interactively test array creation using an IPython shell as follows:

```python
In [1]: import numpy as np
In [2]: a = np.array([0, 1, 2])
```

Every NumPy array has a data type that can be accessed through the dtype attribute, as shown in the following code. In this example, dtype is a 64-bit integer:

```python
In [3]: a.dtype
Out[3]: dtype('int64')
```

If we want those numbers to be treated as floats, we can either pass the dtype argument to the np.array function or cast the array to another data type using the astype method, as shown in the following code:

```python
In [4]: a = np.array([0, 1, 2], dtype='float32')
In [5]: a.astype('float32')
Out[5]: array([ 0.,  1.,  2.], dtype=float32)
```

To create an array with two dimensions (an array of arrays), we can initialize the array using a nested sequence, shown as follows:

```python
In [6]: a = np.array([[0, 1, 2], [3, 4, 5]])
In [7]: print(a)
[[0 1 2]
 [3 4 5]]
```

The array created in this way has two dimensions—axes in NumPy's jargon. Such an array is like a table that contains two rows and three columns. We can access the axes structure using the ndarray.shape attribute:

```python
In [7]: a.shape
Out[7]: (2, 3)
```

Arrays can be reshaped, but only as long as the product of the shape dimensions is equal to the total number of elements in the array. For example, we can reshape an array containing 16 elements in the following ways: (2, 8), (4, 4), or (2, 2, 4). To reshape an array, we can either use the ndarray.reshape method or directly change the ndarray.shape attribute. The following code illustrates the use of the ndarray.reshape method:

```python
In [7]: a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8,
                      9, 10, 11, 12, 13, 14, 15])
In [7]: a.shape
Out[7]: (16,)
In [8]: a.reshape(4, 4)  # Equivalent: a.shape = (4, 4)
Out[8]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
```

Thanks to this property, you are also free to add dimensions of size one. You can reshape an array with 16 elements to (16, 1), (1, 16), (16, 1, 1), and so on.

NumPy provides convenience functions, shown in the following code, to create arrays filled with zeros, filled with ones, or without an initialization value (empty—their actual value is meaningless and depends on the memory state). Those functions take the array shape as a tuple and, optionally, its dtype:

```python
In [8]: np.zeros((3, 3))
In [9]: np.empty((3, 3))
In [10]: np.ones((3, 3), dtype='float32')
```

In our examples, we will use the numpy.random module to generate random floating point numbers in the (0, 1) interval:

```python
In [11]: np.random.rand(3, 3)
```

Sometimes it is convenient to initialize arrays that have the same shape as other arrays. Again, NumPy provides some handy functions for that purpose, such as zeros_like, empty_like, and ones_like.
These functions are as follows:

```python
In [12]: np.zeros_like(a)
In [13]: np.empty_like(a)
In [14]: np.ones_like(a)
```

Accessing arrays

On a shallow level, the NumPy array interface is similar to Python lists. Arrays can be indexed using integers, and can also be iterated using a for loop:

```python
In [15]: A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [16]: A[0]
Out[16]: 0
In [17]: [a for a in A]
Out[17]: [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

It is also possible to index an array in multiple dimensions. If we take a (3, 3) array (an array containing 3 triplets) and we index the first element, we obtain the first triplet, shown as follows:

```python
In [18]: A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
In [19]: A[0]
Out[19]: array([0, 1, 2])
```

We can index the triplet again by adding another index separated by a comma. To get the second element of the first triplet, we can index using [0, 1], as shown in the following code:

```python
In [20]: A[0, 1]
Out[20]: 1
```

NumPy allows you to slice arrays in single and multiple dimensions. If we slice on the first dimension, we get a collection of triplets, shown as follows:

```python
In [21]: A[0:2]
Out[21]: array([[0, 1, 2],
                [3, 4, 5]])
```

If we slice the array with [0:2, 0:2], for every selected triplet we extract the first two elements, resulting in a (2, 2) array, shown in the following code:

```python
In [22]: A[0:2, 0:2]
Out[22]: array([[0, 1],
                [3, 4]])
```

Intuitively, you can update values in the array by using both numerical indexes and slices. The syntax is as follows:

```python
In [23]: A[0, 1] = 8
In [24]: A[0:2, 0:2] = [[1, 1], [1, 1]]
```

Indexing with the slicing syntax is fast because it doesn't make copies of the array. In NumPy terminology, it returns a view over the same memory area. If we take a slice of the original array and then change one of its values, the original array will be updated as well. The following code illustrates this:

```python
In [25]: a = np.array([1, 1, 1, 1])
In [26]: a_view = a[0:2]
In [27]: a_view[0] = 2
In [28]: print(a)
[2 1 1 1]
```

We can take a look at another example that shows how the slicing syntax can be used in a real-world scenario. We define an array r_i, shown in the following line of code, which contains a set of 10 coordinates (x, y); its shape will be (10, 2):

```python
In [29]: r_i = np.random.rand(10, 2)
```

A typical operation is extracting the x component of each coordinate. In other words, you want to extract the items [0, 0], [1, 0], [2, 0], and so on, resulting in an array with shape (10,). It is helpful to think of the first index as moving while the second one is fixed (at 0). With this in mind, we slice every index on the first axis (the moving one) and take the first element (the fixed one) on the second axis, as shown in the following line of code:

```python
In [30]: x_i = r_i[:, 0]
```

On the other hand, the following expression keeps the first index fixed and the second index moving, giving the first (x, y) coordinate:

```python
In [31]: r_0 = r_i[0, :]
```

Slicing all the indexes over the last axis is optional; using r_i[0] has the same effect as r_i[0, :].

NumPy allows you to index an array using another NumPy array made of either integer or Boolean values—a feature called fancy indexing. If you index with an array of integers, NumPy will interpret the integers as indexes and will return an array containing their corresponding values. If we index an array containing 10 elements with [0, 2, 3], we obtain an array of size 3 containing the elements at positions 0, 2, and 3.
The following code gives us an illustration of this concept:

```python
In [32]: a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
In [33]: idx = np.array([0, 2, 3])
In [34]: a[idx]
Out[34]: array([9, 7, 6])
```

You can use fancy indexing on multiple dimensions by passing an array for each dimension. If we want to extract the elements [0, 2] and [1, 1], we have to pack all the indexes acting on the first axis in one array, and the ones acting on the second axis in another. This can be seen in the following code:

```python
In [35]: a = np.array([[0, 1, 2], [3, 4, 5],
                       [6, 7, 8], [9, 10, 11]])
In [36]: idx1 = np.array([0, 1])
In [37]: idx2 = np.array([2, 1])
In [38]: a[idx1, idx2]
```

You can also use normal lists as index arrays, but not tuples. For example, the following two statements are equivalent:

```python
>>> a[np.array([0, 1])] # is equivalent to
>>> a[[0, 1]]
```

However, if you use a tuple, NumPy will interpret the following statement as an index on multiple dimensions:

```python
>>> a[(0, 1)] # is equivalent to
>>> a[0, 1]
```

The index arrays are not required to be one-dimensional; we can extract elements from the original array in any shape. For example, we can select elements from the original array to form a (2, 2) array, shown as follows:

```python
In [39]: idx1 = [[0, 1], [3, 2]]
In [40]: idx2 = [[0, 2], [1, 1]]
In [41]: a[idx1, idx2]
Out[41]: array([[ 0,  5],
                [10,  7]])
```

The array slicing and fancy indexing features can be combined. For example, this is useful if we want to swap the x and y columns in a coordinate array. In the following code, the first index will be running over all the elements (a slice), and for each of those we extract the element in position 1 (the y) first and then the one in position 0 (the x):

```python
In [42]: r_i = np.random.rand(10, 2)
In [43]: r_i[:, [0, 1]] = r_i[:, [1, 0]]
```

When the index array is Boolean, the rules are slightly different. The Boolean array acts like a mask; every element corresponding to True is extracted and put in the output array. This procedure is shown as follows:

```python
In [44]: a = np.array([0, 1, 2, 3, 4, 5])
In [45]: mask = np.array([True, False, True, False, False, False])
In [46]: a[mask]
Out[46]: array([0, 2])
```

The same rules apply when dealing with multiple dimensions. Furthermore, if the index array has the same shape as the original array, the elements corresponding to True are selected and put in the resulting array.

Indexing in NumPy is a reasonably fast operation. When speed is critical, though, you can use the slightly faster numpy.take and numpy.compress functions to squeeze out a little more performance. The first argument of numpy.take is the array we want to operate on, and the second is the list of indexes we want to extract. The last argument is axis; if not provided, the indexes act on the flattened array, otherwise they act along the specified axis:

```python
In [47]: r_i = np.random.rand(100, 2)
In [48]: idx = np.arange(50) # integers 0 to 49
In [49]: %timeit np.take(r_i, idx, axis=0)
1000000 loops, best of 3: 962 ns per loop
In [50]: %timeit r_i[idx]
100000 loops, best of 3: 3.09 us per loop
```

The similar, but faster, version for Boolean arrays is numpy.compress, which works in the same way.
The use of numpy.compress is shown as follows:

```python
In [51]: idx = np.ones(100, dtype='bool') # all True values
In [52]: %timeit np.compress(idx, r_i, axis=0)
1000000 loops, best of 3: 1.65 us per loop
In [53]: %timeit r_i[idx]
100000 loops, best of 3: 5.47 us per loop
```

Summary

This article covered the basics of NumPy arrays: how to create them and the different ways to access them.

Further resources on this subject:
- Getting Started with Spring Python [Article]
- Python Testing: Installing the Robot Framework [Article]
- Python Multimedia: Fun with Animations using Pyglet [Article]


What is a multi layered software architecture?

Packt Editorial Staff
17 May 2018
7 min read
Multi layered software architecture is one of the most popular architectural patterns today. It moderates the increasing complexity of modern applications and makes it easier to work in a more agile manner. That's important when you consider the dominance of DevOps and other similar methodologies today. Sometimes called tiered architecture, or n-tier architecture, a multi layered software architecture consists of various layers, each of which corresponds to a different service or integration. Because each layer is separate, making changes to one layer is easier than having to tackle the entire architecture. Let's take a look at how a multi layered software architecture works, and what its advantages and disadvantages are.

This has been taken from the book Architectural Patterns. Find it here.

What does a layered software architecture consist of?

Before we get into multi layered architecture, let's start with the simplest form of layered architecture: three tier architecture. This is a good place to start because all layered software architectures contain these three elements. These are the foundations:

- Presentation layer: This is the first and topmost layer of the application. It presents content to the end user through a GUI, and can be accessed through any type of client device, such as a desktop, laptop, tablet, mobile, or thin client. For the content to be displayed to the user, the relevant web pages must be fetched by the web browser or another presentation component running on the client device. To present the content, this tier has to interact with the tiers behind it.
- Application layer: This is the middle tier of the architecture, where the business logic of the application runs. Business logic is the set of rules required for running the application according to the guidelines laid down by the organization. The components of this tier typically run on one or more application servers.
- Data layer: This is the lowest tier of the architecture, mainly concerned with the storage and retrieval of application data. The application data is typically stored in a database server, file server, or any other device or media that supports data access logic. The data tier exposes only an API to the application tier, without giving any direct access to the underlying storage and retrieval mechanisms. This transparency means that updates or upgrades to the systems in this tier do not affect the application tier.

The diagram below shows how a simple layered architecture with three tiers works:

[Diagram: a three tier architecture with presentation, application, and data layers]

These three layers are essential, but other layers can be built on top of them. That's when we get into multi layered architecture. It's sometimes called n-tier architecture because the number of tiers, or layers (n), could be anything! It depends on what you need and how much complexity you're able to handle.

Multi layered software architecture

A multi layered software architecture still has the presentation layer and the data layer. It simply splits up and expands the application layer. These additional aspects within the application layer are essentially different services.
This means your software should now be more scalable and have extra dimensions of functionality. Of course, the distribution of application code and functions among the various tiers will vary from one architectural design to another, but the concept remains the same. The diagram below illustrates what a multi layered software architecture looks like. As you can see, it's a little more complex than a three tier architecture, but it does increase scalability quite significantly:

[Diagram: a multi layered (n-tier) software architecture]

What are the benefits of a layered software architecture?

A layered software architecture has a number of benefits; that's why it has become such a popular architectural pattern in recent years. Most importantly, tiered segregation allows you to manage and maintain each layer separately. In theory, it should greatly simplify the way you manage your software infrastructure. The multi layered approach is particularly good for developing web-scale, production-grade, and cloud-hosted applications very quickly and relatively risk-free. It also makes it easier to update any legacy systems; when your architecture is broken up into multiple layers, the changes that need to be made should be simpler and less extensive than they might otherwise have to be.

When should you use a multi layered software architecture?

Clearly, the argument for a multi layered software architecture is pretty strong. However, there are some instances when it is particularly appropriate:

- If you are building a system in which it is possible to split the application logic into smaller components that could be spread across several servers. This could lead to the design of multiple tiers in the application tier.
- If the system under consideration requires faster network communications, high reliability, and great performance. The n-tier pattern has the capability to provide that, as it is designed to reduce the overhead caused by network traffic.

An example of a multi layered software architecture

We can illustrate the workings of a multi layered architecture with the example of a shopping cart web application, something present on every e-commerce site. The shopping cart web application is used by the site's customers to complete the purchase of items. You'd expect the application to have several features that allow the user to:

- Add selected items to the cart
- Change the quantity of items in their cart
- Make payments

The client tier of the shopping cart application interacts with the end user through a GUI, and also interacts with the application running on the application servers present in multiple tiers. Since the shopping cart is a web application, the client tier contains the web browser. The presentation tier displays information related to services such as browsing merchandise, buying items, and adding them to the shopping cart. It communicates with the other tiers by sending results to the client tier and all the other tiers in the network. The presentation tier also makes calls to database stored procedures and web services. All these activities are done with the objective of providing a quick response time to the end user.
The presentation tier plays a vital role, acting as the glue that binds the entire shopping cart application together: it allows the functions present in different tiers to communicate with each other and displays the outputs to the end user through the web browser. In this multi layered architecture, the business logic required for processing activities such as the calculation of shipping costs is pulled from the application tier to the presentation tier. The application tier also acts as the integration layer, allowing the application to communicate seamlessly with both the data tier and the presentation tier. The last tier, the data tier, is used to maintain data. This layer typically contains database servers, and it maintains data independently from the application server and the business logic. This approach provides enhanced scalability and performance to the data tier.
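To make the tier separation concrete, here is a minimal, hypothetical C++ sketch of the three foundational layers behind the shopping cart example. All class and method names here are illustrative inventions, not from the book, and in a real deployment each layer would run on its own infrastructure and communicate over the network rather than through direct calls:

```cpp
#include <iostream>
#include <map>
#include <string>

// Data layer: owns storage and exposes only a narrow API,
// hiding the storage and retrieval mechanisms behind it.
class CartRepository
{
public:
    void SaveItem(const std::string& item, int quantity)
    {
        items_[item] = quantity;  // stand-in for a database write
    }
    const std::map<std::string, int>& Items() const { return items_; }

private:
    std::map<std::string, int> items_;  // stand-in for a database
};

// Application layer: runs the business logic, such as
// validating quantities before they reach storage.
class CartService
{
public:
    explicit CartService(CartRepository& repository) : repository_(repository) {}

    bool AddToCart(const std::string& item, int quantity)
    {
        if (quantity <= 0)
            return false;  // a simple business rule
        repository_.SaveItem(item, quantity);
        return true;
    }

    const std::map<std::string, int>& Contents() const
    {
        return repository_.Items();
    }

private:
    CartRepository& repository_;
};

// Presentation layer: displays results to the end user and
// knows nothing about how the data is stored.
class CartView
{
public:
    void Render(const CartService& service)
    {
        for (const auto& entry : service.Contents())
            std::cout << entry.first << " x" << entry.second << std::endl;
    }
};

int main()
{
    CartRepository repository;
    CartService service(repository);
    CartView view;

    service.AddToCart("book", 2);
    view.Render(service);  // prints: book x2
    return 0;
}
```

Because each layer depends only on the narrow interface of the layer beneath it, the in-memory map standing in for the database could be replaced by a real database server without touching the service or view code, which is exactly the independence the pattern is designed to provide.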
Read next:
- Microservices and Service Oriented Architecture
- What is serverless architecture and why should I be interested?


Mastering the API Life Cycle: A Comprehensive Guide to Design, Implementation, Release, and Maintenance

Bruno Pedro
06 Nov 2024
15 min read
This article is an excerpt from the book "Building an API Product" by Bruno Pedro. Build cutting-edge API products confidently, excelling in today's competitive market with this comprehensive guide on API fundamentals, inner workings, and steps for successful API product development.

Introduction

The life of an API product consists of a series of stages. Those stages form a cycle that starts with the initial conception of the API product and ends with the retirement of the API. This sequence of stages is called a life cycle. The term started to gain popularity in software and product development in the 1980s. It's used as a common framework to align the different participants during the life of a software application or product. Each stage of the API life cycle has specific goals, deliverables, and activities that must be completed before advancing to the next stage.

There are many variations on the concept of API life cycles. I use my own version to simplify learning and focus on what is essential. Over the years, I have distilled the API life cycle into four easy-to-understand stages: design, implementation, release, and maintenance. Keep reading to gain an overview of what each of the stages looks like.

Figure 4.1 – The API life cycle

The goal of this chapter is to provide you with a global overview of what an API life cycle is. You will see each of the stages of the API life cycle as a transition and not simply an isolated step. You will first learn about the design stage and understand how it's foundational to the success of an API product. Then, you'll continue on to the implementation stage, where you'll learn that a big part of an API server can be generated. After that, the chapter explores the release stage, where you'll learn the importance of finding the right distribution model. Finally, you'll understand the importance of versioning and sunsetting your API in the maintenance stage.

After reading the chapter, you will understand and be able to recognize the different stages of the API life cycle, how each stage connects to the others, who the participants and stakeholders of each stage are, and the most critical aspects of each stage. In this article, you'll learn about the four stages of the API life cycle:

- Design
- Implement
- Release
- Maintain

Design

The first stage of the API life cycle is where you decide what you will build. You can view the design stage as a series of steps in which your view of what your API will become gets more refined and validated. At the end of the design stage, you will be able to confidently implement your API, knowing that it's aligned with the needs of your business and your customers. The steps I take in the design stage are as follows:

- Ideation
- Strategy
- Definition
- Validation
- Specification

These steps help me advance in holistically designing the API, involving as many different stakeholders as possible so I get complete alignment. I usually start with a rough idea of what the ideal API would look like. Then I ask different stakeholders as many questions as possible to understand whether my initial assumptions were correct. Something I always ask is why an API should be built. Even though it looks like a simple question, its answer can reveal the real intentions behind building the API. Also, the answer is different depending on whom you ask.
Your job is to synthesize the information you gather and document the pieces of evidence that back up the decisions you make about the API design. At this stage, you will interview as many stakeholders as possible. They can include potential API users, engineers who work with you, and your company's leadership team. The goal is to find out why you're building the API and to document it.

Once you know why you're building the API, you'll learn what the API should look like to fit the needs of potential users. To learn what API users need, identify the personas you want to serve and then put yourself in their shoes. You've already seen a few proto-personas in Chapter 2. In this API life cycle stage, you draw from those generic personas and identify your API users. You then contact people representing your API user personas and interview them. During the interviews, you should understand their jobs-to-be-done (JTBDs), the challenges they face during their work, and the tools they use. From the information you obtain, you can infer the benefits they would get from the API you're building and how they would use the API. This last piece of information is critical because it lets you define the architectural style of the API.

By knowing what tools your user personas use daily, you can make an informed decision about the architectural style of your API. Architectural styles are how you identify the technology and type of communication that the API will use. For example, REST is one architectural style that lets API consumers interact with remote resources by executing one of the HTTP verbs. Among those verbs, there's one that's natively supported by web browsers—HTTP GET. So, if you identify that a user persona wants to use a web browser to consume your API, then you will want to follow the REST architectural style and limit it to HTTP GET. Otherwise, that user persona won't be able to use your API directly from their tool of choice.

Something else you'll want to define is the capabilities your API will offer users. Defining capabilities is an exercise that combines the information you gathered from interviews. You translate JTBDs, benefits, and behaviors into a set of capabilities that your API will have. Ideally, those capabilities will cover all the needs of the users whom you interviewed. However, you might want to prioritize the capabilities according to their degree of urgency and the cost of implementation. In any case, you want to validate your assumptions before investing in actually implementing the API.

Validation of your API design happens first at a high level and, after a positive review, you attempt a low-level validation. High-level validation involves sharing the definition of the architectural style and capabilities that you have created with the API stakeholders. You present your findings to the stakeholders, explain how you came up with the definitions, and then ask for their review. Sometimes the feedback will make you question your assumptions, and you must refine your definitions. Eventually, you will get to a point where the stakeholders are all aligned with what you think the API should be. At that point, you're ready to attempt a low-level validation. The difference between a high-level and a low-level validation is the amount of detail you share with your stakeholders and how technical the feedback you expect needs to be.
While in high-level validation you mostly expect an opinion about the design of the API, in low-level validation you actually want the stakeholders to test the API before you start building it. You do that by creating what is called an API mock server. It allows anyone to make real API requests to a server as if they were making requests to the real API. The mock server responds with data that is not real but has the same shape that the responses of the real API would have. Stakeholders can then test making requests to the mock server from their tools of choice to see how the API would work. You might need to make changes during this low-level validation process until the stakeholders are comfortable with how your API will work.

After that, you're ready to translate the API design into a machine-readable definition document that will be used during the implementation stage of the API life cycle. The type of machine-readable definition depends on the architectural style identified earlier. If, for example, the architectural style is REST, then you'll create an OpenAPI document. Otherwise, you will work with the type of machine-readable definition most appropriate for the architectural style of the API. Once you have a machine-readable API definition, you're ready to advance to the implementation stage of the API life cycle.

Implementation

Having a machine-readable API definition is halfway to getting an entire API server up and running. I won't focus on any particular architectural style, so you can keep all options open at this point. The goal of the machine-readable definition is to make it easy to generate server code and configuration, and to give your API consumers a simple way to interact with your API. Some API server solutions require almost no coding as long as you have a machine-readable definition. One type of coding you'll need to do—or ask an engineer to do—is the code responsible for the business logic behind each API capability. While the API itself can be almost entirely generated, the logic behind each capability must be programmed and linked to the API.

Usually, you'll start with a first version of your API server that can run locally and will be used to iteratively implement all the business logic behind each of the capabilities. Later, you'll make your API server publicly available to your API consumers. When I say publicly available, I mean that your API consumers should be able to securely make requests. One of the elements of security that you should think about is authentication. Many APIs are fully open to the public without requiring any type of authentication. However, when building an API product, you want to identify who your users are. Monetization is only possible if you know who is making requests to your API. Other security factors to consider have already been covered in Chapter 3. They include things such as logging, monitoring, and rate limiting.

In any case, you should always test your API thoroughly during the implementation stage to make sure that everything is working according to plan. One type of test that is particularly useful at this stage is contract testing. This type of test aims to verify whether the API responses include the expected information in the expected format. The word contract is used to describe the API definition as something that both you—the API producer—and your consumers agree to.
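As a minimal illustration of the idea, here is a hypothetical sketch of a contract check. A real contract test would exercise a running API and compare its parsed JSON responses against a machine-readable definition such as an OpenAPI document; this sketch only simulates that comparison with hand-written field-to-type maps, and all names in it are invented for the example:

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>

// Hypothetical contract: field name -> expected JSON type.
// In a real test this would be derived from the machine-readable
// API definition (for example, an OpenAPI document).
std::map<std::string, std::string> expectedContract()
{
    return {{"id", "integer"}, {"name", "string"}, {"price", "number"}};
}

// Stand-in for calling the API under test and inspecting
// the shape of the response it returns.
std::map<std::string, std::string> fetchResponseShape()
{
    return {{"id", "integer"}, {"name", "string"}, {"price", "number"}};
}

int main()
{
    std::map<std::string, std::string> expected = expectedContract();
    std::map<std::string, std::string> actual = fetchResponseShape();

    // The contract holds if every expected field is present
    // in the response with the expected type.
    for (const auto& field : expected)
    {
        assert(actual.count(field.first) == 1);
        assert(actual[field.first] == field.second);
    }

    std::cout << "The response matches the contract" << std::endl;
    return 0;
}
```

If the implementation drifts from the agreed definition, for example by renaming a field or changing its type, the checks fail, which is exactly the signal a contract test is meant to give.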
By performing a contract test, you verify whether the implementation of the API has been done according to what has been designed and defined in the machine-readable document. For example, you can verify whether a particular capability responds with the type of data that you defined. Before deploying your API to production, though, you want to be more thorough with your testing. Other types of tests that are well suited to this stage are functional and performance testing. Functional tests, in particular, can help you identify areas of the API that are not behaving as intended.

Testing different elements of your API helps you increase its quality. Nevertheless, there's another activity that focuses on API quality and relies on tests to obtain insights. Quality assurance, or QA, is an activity in which you test your API capabilities using different inputs and check whether the responses are the expected ones. QA can be performed manually, or automatically by following a programmable script. Performing API QA has the advantage of improving the quality of your API, its overall user experience, and even the security of the product. Since a QA process can identify defects early, during the implementation stage of an API product, it can reduce the cost of fixing those defects compared with finding them when consumers are already using the API. While contract and functional tests provide information on how an API works, QA offers a broader perspective on how consumers experience the API. A QA process can be part of the release process of your API and can determine whether the proposed changes have production quality.

Release

In software development, you can say that a release happens whenever you make your software available to users. Different release environments target different kinds of users. You can have a development environment that is mostly used to share your software with other developers and to make testing easy. There can also be a staging environment where the software is available to a broader audience and QA testing can happen. Finally, there is a production environment where the software is made generally available to your customers.

Releasing software—and API products—can be done manually or automatically. While manual releases work well for small projects, things can get more complicated if you have a large code base and a growing team working on the project. In those situations, you want to automate the release as much as possible with something called a build process. During implementation, you focus on developing your API and ensuring you have all tests in place. If those tests are fully automated, you can make them run every time you try to release your API. Each build process can automatically run a series of steps, including packaging the software, making it available on a mock server, and running tests. If any of the build steps fail, you can consider the whole build process to have failed, and the API isn't released. If the build process succeeds, you have a packaged API ready to be deployed into your environment of choice.

Deploying the API means it will become available to any users with access to the environment where you're doing the release. You can either manage the deployment process yourself, including the servers where your API will run, or use one of the many available API gateway products. Either way, you'll want to have a layer of control between your users and your API.
If controlling how users interact with your API is important, knowing how your API is behaving is also fundamental. If you know how your API behaves, you can understand whether its behavior is affecting your users' experience. By anticipating how users can be negatively affected, you can proactively take measures and improve the quality of your API. Using an API monitor lets you periodically receive information about the behavior and quality of your API. You can understand whether any part of your API is not working as expected by using a solution such as a Postman Monitor. Different solutions let you gather information about API availability, response times, and error rates. If you want to go deeper and understand how the API server is performing, you can also use an Application Performance Monitor (APM). Services such as New Relic give you information about the performance and error rate of the server and the code that is running your API.

Another area that you want to pay attention to during the release stage of the API life cycle is documentation. While you can have an API reference automatically built from your machine-readable definition, you'll want to pay attention to other aspects of documentation. As you've seen in Chapter 2, good API documentation is fundamental to obtaining a good user experience. In Chapter 3, you learned how documentation can enhance support and help users get answers to their questions when interacting with your API. Documentation also involves tutorials covering the JTBDs of the API user personas and clearly showing how consumers can interact with each API feature.

To promote the whole API and the features you're releasing, you can make an announcement to your customers and the community. Announcing a release is a good idea because it raises the general public's awareness and helps users understand what has changed since the last release. Depending on the size of your company, your available marketing budget, and the importance of the release, you choose the media where you make the announcement. You could simply share the news on your blog, or go all the way and promote the new version of your API with a marketing campaign. Your goal is always to reach the existing users of your API and to make the news available to other potential users.

Sharing news about your release is one way to increase the reach of your API. Another is to distribute your API reference in existing API marketplaces that already have their own audience. Online marketplaces let you list your API so potential users can find it and start using it. There are vertical marketplaces that focus on specific sectors, such as healthcare or education. Other marketplaces are more generic and let you list any API. The elements you make available are usually your API reference, documentation, and pointers on signing up and starting to use the API. You can pick as many marketplaces as you like. Keep in mind that some of the existing solutions charge you for listing your API, so measure each marketplace as a distribution channel. You can measure how many users sign up and use your API across the marketplaces where your API is listed. Over time, you'll understand which marketplaces aren't worth keeping, and you can remove your API from those. This measurement is part of API analytics, one of the activities of the maintenance stage of the API life cycle. Keep reading to learn more about it.

Maintenance

You're now in the last stage of the API life cycle.
This is the stage where you make sure that your API is continuously running without disturbances. Of all the activities at this stage, the one where you'll spend the most time will be analyzing how users interact with your API. Analytics is where you understand who your users are, what they're doing, whether they're being successful, and if not, how you can help them succeed. The information you gather will help you identify features that you should keep, the ones that you should improve, and the ones that you should shut down. But analytics is not limited to usage. You can also obtain performance, security, and even business metrics. For example, with analytics, you can identify the customers who interact with the top features of your API and understand how much revenue is being generated. That information can tell you whether the investment in those top features is paying off. You can also understand what errors are the most common and which customers are having the most difficulties. Being able to do that allows you to proactively fix problems before users get in touch with your support team. Something to keep in mind is that there will be times when users will have difficulties working with your API. The issues can be related to your API server being slow or not working at all. There can be problems related to connectivity between some users and your API. Alternatively, individual users can have issues that only affect them. All these situations usually lead to customers contacting your support team. Having a support system in place is important because it increases the satisfaction of your users and their trust in your product. Without support, users will feel lost when they have difficulties. Worse, they'll share their problems publicly without you having a chance to help. One situation where support is particularly requested is when you need to release a new version of your API. Versioning happens whenever you introduce new features, fix existing ones, or deprecate some part of your API. Having a version helps your users know what they should expect when interacting with your API. Versioning also enables you to communicate and identify those changes in different categories. You can have minor bug fixes, new features, or breaking changes. All those can affect how customers use your API, and communicating them is essential to maintaining a good experience. Another aspect of versioning is the ability to keep several versions running. As the API producer, running more than one version can be helpful but can increase your costs. The advantage of having at least two versions is that you can roll back to the previous version if the current one is having issues. This is often considered a good practice (a small sketch of running two versions appears at the end of this section). Knowing when to end the life of your entire API or some of its features is not a simple task, especially when there are customers using your API regularly. First of all, it's essential that you have a communication plan so your customers know in advance when your API will stop working. Things to mention in the communication plan include a timeline of the shutdown and any alternative options, if available, even from a competitor of yours. A second aspect to account for is ensuring the API sunset is done according to existing laws and regulations. Other elements include handling the retention of data processed or generated by usage of the API and continuing to monitor accesses to the API even after you shut it down.
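Below is a rough sketch of what running two versions side by side might look like in Go. The /v1 and /v2 paths, the handler bodies, and the shutdown date are illustrative assumptions; the Sunset header comes from RFC 8594, while the Deprecation header is a common convention rather than a finished standard.

package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    mux := http.NewServeMux()

    // The current version of the hypothetical capability.
    mux.HandleFunc("/v2/status", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, `{"status":"ok","version":"v2"}`)
    })

    // The previous version stays available so consumers can migrate.
    // The headers announce the deprecation and the planned shutdown date.
    mux.HandleFunc("/v1/status", func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Deprecation", "true")
        w.Header().Set("Sunset", "Sat, 01 Nov 2025 00:00:00 GMT") // placeholder date
        w.Header().Set("Link", `</v2/status>; rel="successor-version"`)
        fmt.Fprintln(w, `{"status":"ok","version":"v1"}`)
    })

    log.Fatal(http.ListenAndServe(":8080", mux))
}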
Conclusion

At this point, you know how to identify the different stages of the API life cycle and how they're all interconnected. You also understand which stakeholders participate at each stage of the API life cycle. You can describe the most important elements of each stage of the API life cycle and know why they must be considered to build a successful API product. You first learned about my simplified version of the API life cycle and its four stages. You then went into each of them, starting with the design stage. You learned how designing an API can affect its success. You understood the connection between user personas, their attributes, and the architectural type of the API that you're building. After that, you got to know what high- and low-level design validations are and how they can help you reach a product-market fit. You then learned that having a machine-readable definition enables you to document your API but is also a shortcut to implementing its server and infrastructure. Afterward, you learned about contract testing and QA and how they connect to the implementation and release stages. You acquired knowledge about the different release environments and learned how they're used. You learned about distribution and API marketplaces and how to measure API usage and performance. Finally, you learned how to version and eventually shut down your API.

Author Bio

Bruno Pedro is a computer science professional with over 25 years of experience in the industry. Throughout his career, he has worked on a variety of projects, including Internet traffic analysis, API backends and integrations, and Web applications. He has also managed teams of developers and founded several companies, including tarpipe, an iPaaS, in 2008, and the API Changelog in 2015. In addition to his work experience, Bruno has also made contributions to the API industry through his written work, including two published books on API-related topics and numerous technical magazine and web articles. He has also been a speaker at numerous API industry conferences and events from 2013 to 2018.
Implementing memory management with Golang's garbage collector

Packt Editorial Staff
03 Sep 2019
10 min read
Did you ever think of how bulk messages are pushed in real-time that fast? How is it possible? A low-latency garbage collector (GC) plays an important role in this. In this article, we present ways to look at certain parameters to implement memory management with the Golang GC. Garbage collection is the process of freeing up memory space that is not being used. In other words, the GC sees which objects are out of scope and cannot be referenced anymore and frees the memory space they consume. This process happens in a concurrent way while a Go program is running and not before or after the execution of the program. This article is an excerpt from the book Mastering Go - Third Edition by Mihalis Tsoukalos. Mihalis runs through the nuances of Go, with deep guides to types and structures, packages, concurrency, network programming, compiler design, optimization, and more. Implementing the Golang GC The Go standard library offers functions that allow you to study the operation of the GC and learn more about what the GC does secretly. These functions are illustrated in the gColl.go utility. The source code of gColl.go is presented here in chunks. package main import ( "fmt" "runtime" "time" ) You need the runtime package because it allows you to obtain information about the Go runtime system, which, among other things, includes the operation of the GC. func printStats(mem runtime.MemStats) { runtime.ReadMemStats(&mem) fmt.Println("mem.Alloc:", mem.Alloc) fmt.Println("mem.TotalAlloc:", mem.TotalAlloc) fmt.Println("mem.HeapAlloc:", mem.HeapAlloc) fmt.Println("mem.NumGC:", mem.NumGC, "\n") } The purpose of the printStats() function is to avoid writing the same Go code all the time. The runtime.ReadMemStats() call gets the latest garbage collection statistics for you. func main() {    var mem runtime.MemStats    printStats(mem)    for i := 0; i < 10; i++ { // Allocating 50,000,000 bytes        s := make([]byte, 50000000)        if s == nil {            fmt.Println("Operation failed!")          }    }    printStats(mem) In this part, we have a for loop that creates ten byte slices of 50,000,000 bytes each. The reason for this is that by allocating large amounts of memory, we can trigger the GC. for i := 0; i < 10; i++ { // Allocating 100,000,000 bytes      s := make([]byte, 100000000)       if s == nil {           fmt.Println("Operation failed!")       }       time.Sleep(5 * time.Second)   } printStats(mem) } The last part of the program makes even bigger memory allocations – this time, each byte slice has 100,000,000 bytes. Running gColl.go on a macOS Big Sur machine with 24 GB of RAM produces the following kind of output: $ go run gColl.go mem.Alloc: 124616 mem.TotalAlloc: 124616 mem.HeapAlloc: 124616 mem.NumGC: 0 mem.Alloc: 50124368 mem.TotalAlloc: 500175120 mem.HeapAlloc: 50124368 mem.NumGC: 9 mem.Alloc: 122536 mem.TotalAlloc: 1500257968 mem.HeapAlloc: 122536 mem.NumGC: 19 The value of mem.Alloc is the bytes of allocated heap objects — allocated are all the objects that the GC has not yet freed. mem.TotalAlloc shows the cumulative bytes allocated for heap objects — this number does not decrease when objects are freed, which means that it keeps increasing. Therefore, it shows the total number of bytes allocated for heap objects during program execution. mem.HeapAlloc is the same as mem.Alloc. Last, mem.NumGC shows the total number of completed garbage collection cycles.
The bigger that value is, the more you have to consider how you allocate memory in your code and if there is a way to optimize that. If you want even more verbose output regarding the operation of the GC, you can combine go run gColl.go with GODEBUG=gctrace=1. Apart from the regular program output, you get some extra metrics—this is illustrated in the following output: $ GODEBUG=gctrace=1 go run gColl.go gc 1 @0.021s 0%: 0.020+0.32+0.015 ms clock, 0.16+0.17/0.33/0.22+0.12 ms cpu, 4->4->0 MB, 5 MB goal, 8 P gc 2 @0.041s 0%: 0.074+0.32+0.003 ms clock, 0.59+0.087/0.37/0.45+0.030 ms cpu, 4->4->0 MB, 5 MB goal, 8 P . . . gc 18 @40.152s 0%: 0.065+0.14+0.013 ms clock, 0.52+0/0.12/0.042+0.10 ms cpu, 95->95->0 MB, 96 MB goal, 8 P gc 19 @45.160s 0%: 0.028+0.12+0.003 ms clock, 0.22+0/0.13/0.081+0.028 ms cpu, 95->95->0 MB, 96 MB goal, 8 P mem.Alloc: 120672 mem.TotalAlloc: 1500256376 mem.HeapAlloc: 120672 mem.NumGC: 19 Now, let us explain the 95->95->0 MB triplet in the previous line of output. The first value (95) is the heap size when the GC is about to run. The second value (95) is the heap size when the GC ends its operation. The last value is the size of the live heap (0). Go garbage collection is based on the tricolor algorithm The operation of the Go GC is based on the tricolor algorithm, which is the subject of this subsection. Note that the tricolor algorithm is not unique to Go and can be used in other programming languages as well. Strictly speaking, the official name for the algorithm used in Go is the tricolor mark-and-sweep algorithm. It can work concurrently with the program and uses a write barrier. This means that when a Go program runs, the Go scheduler is responsible for the scheduling of the application and the GC. This is as if the Go scheduler has to deal with a regular application with multiple goroutines! The core idea behind this algorithm came from Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens and was first illustrated in a paper named On-the-Fly Garbage Collection: An Exercise in Cooperation. The primary principle behind the tricolor mark-and-sweep algorithm is that it divides the objects of the heap into three different sets according to their color, which is assigned by the algorithm. It is now time to talk about the meaning of each color set. The objects of the black set are guaranteed to have no pointers to any object of the white set. However, an object of the white set can have a pointer to an object of the black set because this has no effect on the operation of the GC. The objects of the gray set might have pointers to some objects of the white set. Finally, the objects of the white set are the candidates for garbage collection. So, when the garbage collection begins, all objects are white, and the GC visits all the root objects and colors them gray. The roots are the objects that can be directly accessed by the application, which includes global variables and other things on the stack. These objects mostly depend on the Go code of a program. After that, the GC picks a gray object, makes it black, and starts looking at whether that object has pointers to other objects of the white set or not. Therefore, when an object of the gray set is scanned for pointers to other objects, it is colored black. If that scan discovers that this particular object has one or more pointers to a white object, it puts that white object in the gray set. This process keeps going for as long as objects exist in the gray set. 
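Before moving on, here is a toy simulation in Go of the marking loop just described. It is deliberately simplified: the real collector runs concurrently with the program and uses a write barrier, both of which are omitted here. The object names anticipate the figure discussed next, where E points to F but no other object points to E.

package main

import "fmt"

type object struct {
    name string
    refs []*object // pointers to other heap objects
}

const (
    white = iota // candidate for collection
    gray         // reachable, but references not scanned yet
    black        // reachable and fully scanned
)

func main() {
    // A tiny heap: A, B, and C are roots; E points to F,
    // but nothing points to E itself.
    f := &object{name: "F"}
    e := &object{name: "E", refs: []*object{f}}
    c := &object{name: "C"}
    b := &object{name: "B"}
    a := &object{name: "A"}
    heap := []*object{a, b, c, e, f}

    // Every object starts white (the map's zero value).
    color := map[*object]int{}

    // Step 1: color the roots gray and put them on a worklist.
    grayList := []*object{a, b, c}
    for _, r := range grayList {
        color[r] = gray
    }

    // Step 2: pick a gray object, gray every white object it points to,
    // then blacken it. Repeat while gray objects exist.
    for len(grayList) > 0 {
        obj := grayList[0]
        grayList = grayList[1:]
        for _, ref := range obj.refs {
            if color[ref] == white {
                color[ref] = gray
                grayList = append(grayList, ref)
            }
        }
        color[obj] = black
    }

    // Step 3: whatever is still white is unreachable.
    for _, obj := range heap {
        if color[obj] == white {
            fmt.Println(obj.name, "is unreachable and would be collected")
        }
    }
}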
Once no gray objects remain, the objects in the white set are unreachable and their memory space can be reused. Therefore, at this point, the elements of the white set are said to be garbage collected. Please note that no object can go directly from the black set to the white set, which allows the algorithm to operate and be able to clear the objects in the white set. As mentioned before, no object of the black set can directly point to an object of the white set. Additionally, if an object of the gray set becomes unreachable at some point in a garbage collection cycle, it will not be collected in this garbage collection cycle but in the next one! Although this is not an optimal situation, it is not that bad. During this process, the running application is called the mutator. The mutator runs a small function named the write barrier that is executed each time a pointer in the heap is modified. If the pointer of an object in the heap is modified, which means that this object is now reachable, the write barrier colors it gray and puts it in the gray set. The mutator is responsible for the invariant that no element of the black set has a pointer to an element of the white set. This is accomplished with the help of the write barrier function. Failing to maintain this invariant will ruin the garbage collection process and will most likely crash your program in a pretty bad and undesirable way! So, there are three different colors: black, white, and gray. When the algorithm begins, all objects are colored white. As the algorithm keeps going, white objects are moved into one of the other two sets. The objects that are left in the white set are the ones that are going to be cleared at some point. The next figure displays the three color sets with objects in them. Figure 1: The Go GC represents the heap of a program as a graph In the presented graph, you can see that while object E, which is in the white set, can access object F, it cannot be accessed by any other object because no other object points to object E, which makes it a perfect candidate for garbage collection! Additionally, objects A, B, and C are root objects and are always reachable; therefore, they cannot be garbage collected. Graph comprehended Can you guess what will happen next in that graph? Well, it is not that difficult to realize that the algorithm will have to process the remaining elements of the gray set, which means that both objects A and F will go to the black set. Object A will go to the black set because it is a root element and F will go to the black set because it does not point to any other object while it is in the gray set. After object E is garbage collected, object F will become unreachable and will be garbage collected in the next cycle of the GC because an unreachable object cannot magically become reachable in the next iteration of the garbage collection cycle. Note: The Go garbage collection can also be applied to variables such as channels. When the GC finds out that a channel is unreachable, that is, when the channel variable cannot be accessed anymore, it will free its resources even if the channel has not been closed. Go allows you to manually initiate a garbage collection by putting a runtime.GC() statement in your Go code. However, keep in mind that runtime.GC() will block the caller and it might block the entire program, especially if you are running a very busy Go program with many objects.
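To see the blocking, manual collection in action, the following sketch forces a collection with runtime.GC() and reads the statistics before and after. The debug.SetGCPercent() call shows how the GOGC knob can also be set programmatically; the allocation sizes here are arbitrary.

package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

func main() {
    // The GOGC knob can also be set programmatically; 100 is the
    // default, and the call returns the previous setting.
    fmt.Println("previous GC percent:", debug.SetGCPercent(200))

    // Create some garbage: each iteration drops the previous slice.
    var keep []byte
    for i := 0; i < 5; i++ {
        keep = make([]byte, 10_000_000)
    }
    _ = keep

    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)
    runtime.GC() // blocks the caller until the collection completes
    runtime.ReadMemStats(&after)

    fmt.Println("GC cycles during the forced collection:", after.NumGC-before.NumGC)
    fmt.Println("heap in use after forced GC:", after.HeapAlloc, "bytes")
}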
Blocking mainly happens because you cannot perform garbage collections while everything else is rapidly changing, as this will not give the GC the opportunity to clearly identify the members of the white, black, and gray sets. This garbage collection status is also called a garbage collection safe-point. You can find the long and relatively advanced Go code of the GC at https://github.com/golang/go/blob/master/src/runtime/mgc.go, which you can study if you want to learn even more information about the garbage collection operation. You can even make changes to that code if you are brave enough!

Understanding Go Internals: defer, panic() and recover() functions [Tutorial]
Implementing hashing algorithms in Golang [Tutorial]
Is Golang truly community driven and does it really matter?
Functional Programming in C#

Packt
07 Jul 2016
4 min read
In this article, we are going to explore the following topics: an introduction to the functional programming concept, and a comparison between the functional and imperative approaches. (For more resources related to this topic, see here.) Introduction to functional programming In functional programming, we use a mathematical approach to construct our code. The functions we write in code are similar to the mathematical functions we use every day. The variables in a code function represent the values of the function parameters, just as they do in a mathematical function. The idea is that a programmer defines functions, which contain the expression, the definition, and the parameters that can be expressed by variables, in order to solve problems. After a programmer builds a function and sends it to the computer, it's the computer's turn to do its job. In general, the role of the computer is to evaluate the expression in the function and return the result. We can imagine that the computer acts like a calculator, since it analyzes the expression in the function and yields the result to the user in a printed format. Suppose we have the expression 3 + 5 inside a function. The computer will return 8 as the result just after it has completely evaluated the expression. However, this is just a trivial example of how the computer evaluates an expression. In fact, a programmer can increase the ability of the computer by writing complex definitions and expressions inside the function. Not only can the computer evaluate trivial expressions, but it can also evaluate complex calculations and expressions. Comparison to imperative programming The main difference between functional and imperative programming is the existence of side effects. In functional programming, since it applies the pure function concept, side effects are avoided. This is different from imperative programming, which accesses I/O and modifies state outside the function, producing side effects. In addition, with an imperative approach, the programmer focuses on the way of performing the task and tracking changes in state, while with a functional approach, the programmer focuses on the kind of desired information and the kind of required transformation. Changes of state are important in imperative programming, while no changes of state exist in functional programming. The order of execution is also important in imperative programming, but not really in functional programming, since we concern ourselves more with constructing the problem as a set of functions to be executed rather than with the detailed steps of the flow. We will continue our discussion about the functional and imperative approaches by creating some code in the next topics. Summary We have become acquainted with the functional approach so far by discussing the introduction to functional programming. We have also compared the functional approach with the mathematical concept of a function. It's now clear that the functional approach uses a mathematical approach to compose functional programs. The comparison between functional and imperative programming has also given us important points to distinguish the two. It's now clear that in functional programming the programmer focuses on the kind of desired information and the kind of required transformation, while in the imperative approach the programmer focuses on the way of performing the task and tracking changes in state. A short code sketch after this summary makes the comparison concrete.
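As a small illustration of the contrast just summarized, here is a sketch written in Go rather than C# (the function names are ours): an imperative sum that mutates state step by step, next to a functional-style reduce that only describes the required transformation.

package main

import "fmt"

// Imperative style: spell out the steps and mutate state on the way.
func sumImperative(nums []int) int {
    total := 0 // state that changes on every iteration
    for _, n := range nums {
        total += n
    }
    return total
}

// Functional style: describe the required transformation and let a
// general helper apply it; callers never see any mutable state.
func reduce(nums []int, init int, f func(acc, n int) int) int {
    acc := init
    for _, n := range nums {
        acc = f(acc, n)
    }
    return acc
}

func main() {
    nums := []int{1, 2, 3, 4}
    fmt.Println(sumImperative(nums))                                  // 10
    fmt.Println(reduce(nums, 0, func(a, n int) int { return a + n })) // 10
}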
For more information on C#, visit the following books: C# 5 First Look (https://www.packtpub.com/application-development/c-5-first-look) C# Multithreaded and Parallel Programming (https://www.packtpub.com/application-development/c-multithreaded-and-parallel-programming) C# 6 and .NET Core 1.0: Modern Cross-Platform Development (https://www.packtpub.com/application-development/c-6-and-net-core-10) Resources for Article: Further resources on this subject: Introduction to Object-Oriented Programming using Python, JavaScript, and C#[article] C# Language Support for Asynchrony[article] C# with NGUI[article]
How Concurrency and Parallelism works in Golang [Tutorial]

Natasha Mathur
06 Jul 2018
11 min read
Computer and software programs are useful because they do a lot of laborious work very fast and can also do multiple things at once. We want our programs to be able to do multiple things simultaneously, and the success of a programming language can depend on how easy it is to write and understand multitasking programs. Concurrency and parallelism are two terms that are bound to come across often when looking into multitasking and are often used interchangeably. However, they mean two distinctly different things. In this article, we will look at how concurrency and parallelism work in Go using simple examples for better understanding. Let's get started! This article is an excerpt from a book 'Distributed Computing with Go' written by V.N. Nikhil Anurag. The standard definitions given on the Go blog are as follows: Concurrency: Concurrency is about dealing with lots of things at once. This means that we manage to get multiple things done at once in a given period of time. However, we will only be doing a single thing at a time. This tends to happen in programs where one task is waiting and the program decides to run another task in the idle time. In the following diagram, this is denoted by running the yellow task in idle periods of the blue task. Parallelism: Parallelism is about doing lots of things at once. This means that even if we have two tasks, they are continuously working without any breaks in between them. In the diagram, this is shown by the fact that the green task is running independently and is not influenced by the red task in any manner: It is important to understand the difference between these two terms. Let's look at a few concrete examples to further elaborate upon the difference between the two. Concurrency Let's look at the concept of concurrency using a simple example of a few daily routine tasks and the way we can perform them. Imagine you start your day and need to get six things done: Make hotel reservation Book flight tickets Order a dress Pay credit card bills Write an email Listen to an audiobook The order in which they are completed doesn't matter, and for some of the tasks, such as  writing an email or listening to an audiobook, you need not complete them in a single sitting. Here is one possible way to complete the tasks: Order a dress. Write one-third of the email. Make hotel reservation. Listen to 10 minutes of audiobook. Pay credit card bills. Write another one-third of the email. Book flight tickets. Listen to another 20 minutes of audiobook. Complete writing the email. Continue listening to audiobook until you fall asleep. In programming terms, we have executed the above tasks concurrently. We had a complete day and we chose particular tasks from our list of tasks and started to work on them. For certain tasks, we even decided to break them up into pieces and work on the pieces between other tasks. We will eventually write a program which does all of the preceding steps concurrently, but let's take it one step at a time. Let's start by building a program that executes the tasks sequentially, and then modify it progressively until it is purely concurrent code and uses goroutines. The progression of the program will be in three steps: Serial task execution. Serial task execution with goroutines. Concurrent task execution. Code overview The code will consist of a set of functions that print out their assigned tasks as completed. In the cases of writing an email or listening to an audiobook, we further divide the tasks into more functions. 
This can be seen as follows: writeMail, continueWritingMail1, continueWritingMail2 listenToAudioBook, continueListeningToAudioBook Serial task execution Let's first implement a program that will execute all the tasks in a linear manner. Based on the code overview we discussed previously, the following code should be straightforward: package main import ( "fmt" ) // Simple individual tasks func makeHotelReservation() { fmt.Println("Done making hotel reservation.") } func bookFlightTickets() { fmt.Println("Done booking flight tickets.") } func orderADress() { fmt.Println("Done ordering a dress.") } func payCreditCardBills() { fmt.Println("Done paying Credit Card bills.") } // Tasks that will be executed in parts // Writing Mail func writeAMail() { fmt.Println("Wrote 1/3rd of the mail.") continueWritingMail1() } func continueWritingMail1() { fmt.Println("Wrote 2/3rds of the mail.") continueWritingMail2() } func continueWritingMail2() { fmt.Println("Done writing the mail.") } // Listening to Audio Book func listenToAudioBook() { fmt.Println("Listened to 10 minutes of audio book.") continueListeningToAudioBook() } func continueListeningToAudioBook() { fmt.Println("Done listening to audio book.") } // All the tasks we want to complete in the day. // Note that we do not include the sub tasks here. var listOfTasks = []func(){ makeHotelReservation, bookFlightTickets, orderADress, payCreditCardBills, writeAMail, listenToAudioBook, } func main() { for _, task := range listOfTasks { task() } } We take each of the main tasks and start executing them in simple sequential order. Executing the preceding code should produce unsurprising output, as shown here: Done making hotel reservation. Done booking flight tickets. Done ordering a dress. Done paying Credit Card bills. Wrote 1/3rd of the mail. Wrote 2/3rds of the mail. Done writing the mail. Listened to 10 minutes of audio book. Done listening to audio book. Serial task execution with goroutines We took a list of tasks and wrote a program to execute them in a linear and sequential manner. However, we want to execute the tasks concurrently! Let's start by first introducing goroutines for the split tasks and see how it goes. We will only show the code snippet where the code actually changed here: /******************************************************************** We start by making Writing Mail & Listening Audio Book concurrent. *********************************************************************/ // Tasks that will be executed in parts // Writing Mail func writeAMail() { fmt.Println("Wrote 1/3rd of the mail.") go continueWritingMail1() // Notice the addition of 'go' keyword. } func continueWritingMail1() { fmt.Println("Wrote 2/3rds of the mail.") go continueWritingMail2() // Notice the addition of 'go' keyword. } func continueWritingMail2() { fmt.Println("Done writing the mail.") } // Listening to Audio Book func listenToAudioBook() { fmt.Println("Listened to 10 minutes of audio book.") go continueListeningToAudioBook() // Notice the addition of 'go' keyword. } func continueListeningToAudioBook() { fmt.Println("Done listening to audio book.") } The following is a possible output: Done making hotel reservation. Done booking flight tickets. Done ordering a dress. Done paying Credit Card bills. Wrote 1/3rd of the mail. Listened to 10 minutes of audio book. Whoops! That's not what we were expecting. 
The output from the continueWritingMail1, continueWritingMail2, and continueListeningToAudioBook functions is missing; the reason being that we are using goroutines. Since goroutines are not waited upon, the code in the main function continues executing and once the control flow reaches the end of the main function, the program ends. What we would really like to do is to wait in the main function until all the goroutines have finished executing. There are two ways we can do this—using channels or using WaitGroup.  We'll use WaitGroup now. In order to use WaitGroup, we have to keep the following in mind: Use WaitGroup.Add(int) to keep count of how many goroutines we will be running as part of our logic. Use WaitGroup.Done() to signal that a goroutine is done with its task. Use WaitGroup.Wait() to wait until all goroutines are done. Pass WaitGroup instance to the goroutines so they can call the Done() method. Based on these points, we should be able to modify the source code to use WaitGroup. The following is the updated code: package main import ( "fmt" "sync" ) // Simple individual tasks func makeHotelReservation(wg *sync.WaitGroup) { fmt.Println("Done making hotel reservation.") wg.Done() } func bookFlightTickets(wg *sync.WaitGroup) { fmt.Println("Done booking flight tickets.") wg.Done() } func orderADress(wg *sync.WaitGroup) { fmt.Println("Done ordering a dress.") wg.Done() } func payCreditCardBills(wg *sync.WaitGroup) { fmt.Println("Done paying Credit Card bills.") wg.Done() } // Tasks that will be executed in parts // Writing Mail func writeAMail(wg *sync.WaitGroup) { fmt.Println("Wrote 1/3rd of the mail.") go continueWritingMail1(wg) } func continueWritingMail1(wg *sync.WaitGroup) { fmt.Println("Wrote 2/3rds of the mail.") go continueWritingMail2(wg) } func continueWritingMail2(wg *sync.WaitGroup) { fmt.Println("Done writing the mail.") wg.Done() } // Listening to Audio Book func listenToAudioBook(wg *sync.WaitGroup) { fmt.Println("Listened to 10 minutes of audio book.") go continueListeningToAudioBook(wg) } func continueListeningToAudioBook(wg *sync.WaitGroup) { fmt.Println("Done listening to audio book.") wg.Done() } // All the tasks we want to complete in the day. // Note that we do not include the sub tasks here. var listOfTasks = []func(*sync.WaitGroup){ makeHotelReservation, bookFlightTickets, orderADress, payCreditCardBills, writeAMail, listenToAudioBook, } func main() { var waitGroup sync.WaitGroup // Set number of effective goroutines we want to wait upon waitGroup.Add(len(listOfTasks)) for _, task := range listOfTasks{ // Pass reference to WaitGroup instance // Each of the tasks should call on WaitGroup.Done() task(&waitGroup) } // Wait until all goroutines have completed execution. waitGroup.Wait() } Here is one possible output order; notice how continueWritingMail1 and continueWritingMail2 were executed at the end after listenToAudioBook and continueListeningToAudioBook: Done making hotel reservation. Done booking flight tickets. Done ordering a dress. Done paying Credit Card bills. Wrote 1/3rd of the mail. Listened to 10 minutes of audio book. Done listening to audio book. Wrote 2/3rds of the mail. Done writing the mail. Concurrent task execution In the final output of the previous part, we can see that all the tasks in listOfTasks are being executed in serial order, and the last step for maximum concurrency would be to let the order be determined by Go runtime instead of the order in listOfTasks. 
This might sound like a laborious task, but in reality this is quite simple to achieve. All we need to do is add the go keyword in front of task(&waitGroup): func main() { var waitGroup sync.WaitGroup // Set number of effective goroutines we want to wait upon waitGroup.Add(len(listOfTasks)) for _, task := range listOfTasks { // Pass reference to WaitGroup instance // Each of the tasks should call on WaitGroup.Done() go task(&waitGroup) // Achieving maximum concurrency } // Wait until all goroutines have completed execution. waitGroup.Wait() Following is a possible output: Listened to 10 minutes of audio book. Done listening to audio book. Done booking flight tickets. Done ordering a dress. Done paying Credit Card bills. Wrote 1/3rd of the mail. Wrote 2/3rds of the mail. Done writing the mail. Done making hotel reservation. If we look at this possible output, the tasks were executed in the following order: Listen to audiobook. Book flight tickets. Order a dress. Pay credit card bills. Write an email. Make hotel reservations. Now that we have a good idea on what concurrency is and how to write concurrent code using goroutines and WaitGroup, let's dive into parallelism. Parallelism Imagine that you have to write a few emails. They are going to be long and laborious, and the best way to keep yourself entertained is to listen to music while writing them, that is, listening to music "in parallel" to writing the emails. If we wanted to write a program that simulates this scenario, the following is one possible implementation: package main import ( "fmt" "sync" "time" ) func printTime(msg string) { fmt.Println(msg, time.Now().Format("15:04:05")) } // Task that will be done over time func writeMail1(wg *sync.WaitGroup) { printTime("Done writing mail #1.") wg.Done() } func writeMail2(wg *sync.WaitGroup) { printTime("Done writing mail #2.") wg.Done() } func writeMail3(wg *sync.WaitGroup) { printTime("Done writing mail #3.") wg.Done() } // Task done in parallel func listenForever() { for { printTime("Listening...") } } func main() { var waitGroup sync.WaitGroup waitGroup.Add(3) go listenForever() // Give some time for listenForever to start time.Sleep(time.Nanosecond * 10) // Let's start writing the mails go writeMail1(&waitGroup) go writeMail2(&waitGroup) go writeMail3(&waitGroup) waitGroup.Wait() } The output of the program might be as follows: Done writing mail #3. 19:32:57 Listening... 19:32:57 Listening... 19:32:57 Done writing mail #1. 19:32:57 Listening... 19:32:57 Listening... 19:32:57 Done writing mail #2. 19:32:57 The numbers represent the time in terms of Hour:Minutes:Seconds and, as can be seen, they are being executed in parallel. You might have noticed that the code for parallelism looks almost identical to the code for the final concurrency example. However, in the function listenForever, we are printing Listening... in an infinite loop. If the preceding example was written without goroutines, the output would keep printing Listening... and never reach the writeMail function calls. Goroutines are concurrent and, to an extent, parallel; however, we should think of them as being concurrent. The order of execution of goroutines is not predictable and we should not rely on them to be executed in any particular order. We should also take care to handle errors and panics in our goroutines because even though they are being executed in parallel, a panic in one goroutine will crash the complete program. 
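One common way to guard against that is to recover from panics inside each goroutine. Here is a hedged sketch of one possible pattern; the safeGo helper name is ours, not from the book.

package main

import (
    "fmt"
    "sync"
)

// safeGo runs a task in a goroutine and recovers from any panic
// so that one failing task cannot crash the whole program.
func safeGo(wg *sync.WaitGroup, task func()) {
    go func() {
        defer wg.Done()
        defer func() {
            if r := recover(); r != nil {
                fmt.Println("recovered from panic:", r)
            }
        }()
        task()
    }()
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    safeGo(&wg, func() { panic("something went wrong") })
    safeGo(&wg, func() { fmt.Println("other task finished normally") })
    wg.Wait()
}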
Goroutines can also block on system calls; however, this will not block the execution of the program nor slow down its overall performance. We looked at how goroutines can be used to run concurrent programs and also learned how parallelism works in Go. If you found this post useful, do check out the book 'Distributed Computing with Go' to learn more about goroutines, channels and messages, and other concepts in Go.

Golang Decorators: Logging & Time Profiling
Essential Tools for Go Programming
Why is Go the go-to language for cloud-native development? – An interview with Mina Andrawos
Introduction to the Functional Programming

Packt
01 Dec 2016
19 min read
In this article by Wisnu Anggoro, the author of the book Functional C#, we are going to explore functional programming by trying it out. We will use the power of C# to construct some functional code. We will also deal with the features in C# that are mostly used in developing functional programs. By the end of this chapter, we will have an idea of what the functional approach in C# looks like. Here are the topics we will cover in this chapter: An introduction to the functional programming concept A comparison between the functional and imperative approaches The concepts of functional programming The advantages and disadvantages of functional programming (For more resources related to this topic, see here.) In functional programming, we write functions without side effects, the way we write them in mathematics. The variables in a code function represent the values of the function parameters, similar to a mathematical function. The idea is that a programmer defines functions that contain the expression, the definition, and the parameters, which can be expressed by variables, in order to solve problems. After a programmer builds a function and sends it to the computer, it's the computer's turn to do its job. In general, the role of the computer is to evaluate the expression in the function and return the result. We can imagine that the computer acts like a calculator, since it analyzes the expression in the function and yields the result to the user in a printed format. The calculator evaluates a function composed of variables passed as parameters and expressions that form the body of the function. Variables are substituted by their values in the expression. We can build simple and compound expressions using algebraic operators. Since expressions without assignments never alter their values, subexpressions need to be evaluated only once. Suppose we have the expression 3 + 5 inside a function. The computer will definitely return 8 as the result right after it completely evaluates it. However, this is just a simple example of how the computer acts when evaluating an expression. In fact, a programmer can increase the ability of the computer by creating complex definitions and expressions inside the function. Not only can the computer evaluate simple expressions, but it can also evaluate complex calculations and expressions. Understanding definitions, scripts, and sessions Continuing with the idea of a calculator that analyzes the expressions in functions, let's imagine we have a calculator that has a console panel like a computer does. The difference between it and a conventional calculator is that we have to press Enter instead of = (equal to) in order to run the evaluation process of the expression. Here, we can type the expression and then press Enter. Now, imagine that we type the following expression: 3 x 9 Immediately after pressing Enter, the computer will print 27 in the console, and that's what we are expecting. The computer has done a great job of evaluating the expression we gave. Now, let's move on to analyzing the following definitions. Imagine that we type them on our functional calculator: square a = a * a max a b = a, if a ≥ b = b, if b > a We have created two definitions, square and max. We can call this list of definitions a script. By calling the square function followed by any number representing variable a, we will be given the square of that number. A rough rendering of these definitions in code follows below.
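As a rough illustration (we use Go here purely for runnable sketches; the chapter's own examples are in C#), the script's square and max definitions and the session expressions discussed next might look like this. The maxOf name stands in for the script's max.

package main

import "fmt"

// square a = a * a
func square(a int) int { return a * a }

// max a b = a, if a >= b
//         = b, if b > a
func maxOf(a, b int) int {
    if a >= b {
        return a
    }
    return b
}

func main() {
    fmt.Println(square(1 + 2))       // 9
    fmt.Println(maxOf(1, 2))         // 2
    fmt.Println(square(maxOf(2, 5))) // 25
}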
Also, in the max definition, we supply two numbers to represent variables a and b, and then the computer evaluates the expression to find the bigger of the two. Having created these two definitions, we can use them as functions in what we can call a session, as follows: square (1 + 2) The computer will print 9 after evaluating the preceding function. The computer will also be able to evaluate the following function: max 1 2 It will return 2 as the result, based on the definition we created earlier. This is also possible if we provide the following expression: square (max 2 5) Then, 25 will be displayed in our calculator console panel. We can also build a definition using a previous definition. Suppose we want to quadruple an integer number and take advantage of the definition of the square function; here is what we can send to our calculator: quad q = square q * square q quad 10 The first line of the preceding expression is a definition of the quad function. In the second line, we call that function, and we will be provided with 10000 as the result. The script can also define variable values; for instance, take a look at the following: radius = 20 So, we should expect the computer to be able to evaluate the following definition: area = (22 / 7) * square (radius) Understanding the functions for functional programming Functional programming uses a technique of emphasizing functions and their application instead of commands and their execution. Most values in functional programming are function values. Let's take a look at the following mathematical notation: f :: A -> B From the preceding notation, we can say that function f is a relation between the elements stated there, A and B. We call A the source type and B the target type. In other words, the notation A -> B states that A is an argument where we have to input the value, and B is a return value, or the output of the function evaluation. Consider that x denotes an element of A and x + 2 denotes an element of B, so we can create the mathematical notation as follows: f(x) = x + 2 In mathematics, we use f(x) to denote a functional application. In functional programming, the function will be passed an argument and will return the result after the evaluation of the expression. We can construct many definitions for one and the same function. The following two definitions are similar and will triple the input passed as an argument: triple y = y + y + y triple' y = 3 * y As we can see, triple and triple' have different expressions. However, they are the same function, so we can say that triple = triple'. Although we have many definitions to express one function, we will find that there is only one definition that proves to be the most efficient in the procedure of evaluation, in the sense of reducing the expression we discussed previously. Unfortunately, we cannot determine which one is the most efficient from our preceding two definitions, since that depends on the characteristics of the evaluation mechanism.
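The idea that functions are values with types can be sketched in Go as follows (again, Go is used here only for illustration); triple and triple2 mirror the two definitions above and print the same result.

package main

import "fmt"

func main() {
    // Two different definitions of one and the same function.
    triple := func(y int) int { return y + y + y }
    triple2 := func(y int) int { return 3 * y }

    // Functions are values: they can be stored in slices,
    // passed to other functions, and returned as results.
    for _, f := range []func(int) int{triple, triple2} {
        fmt.Println(f(7)) // both print 21
    }
}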
Forming the definition Now, let's go back to our discussion on definitions at the beginning of this chapter. We have the following definition in order to retrieve a value from a case analysis: max a b = a, if a ≥ b = b, if b > a There are two expressions in this definition, distinguished by a Boolean-valued expression. This distinguisher is called a guard, and we use guards to evaluate to True or False. The first line is one of the alternative result values for this function. It states that the return value will be a if the expression a ≥ b is True. In contrast, the function will return value b if the expression b > a is True. Using these two cases, a ≥ b and b > a, the max value depends on the values of a and b. The order of the cases doesn't matter. We can also define the max function using the special word otherwise. This word ensures that the otherwise case will be executed if no preceding expression results in a True value. Here, we will refactor our max function using the word otherwise: max a b = a, if a ≥ b = b, otherwise From the preceding function definition, we can see that if the first expression is False, the function will return b immediately without performing any further evaluation. In other words, the otherwise case always applies if all previous guards return False. Another special word usually used in mathematical notation is where. This word is used to set a local definition for the expression of the function. Let's take a look at the following example: f x y = (z + 2) * (z + 3) where z = x + y In the preceding example, we have a function f with a variable z, whose value is determined by x and y. There, we introduce a local z definition to the function. This local definition can also be used along with the case analysis we discussed earlier. Here is an example of a local definition in conjunction with the case analysis: f x y = x + z, if x > 100 = x - z, otherwise where z = triple(y + 3) In the preceding function, there is a local z definition, which qualifies for both the x + z and x - z expressions. As we discussed earlier, although the function has two equal to (=) signs, only one expression will return a value. Currying Currying is a technique of restructuring a function's arguments into a sequence: it transforms an n-ary function into n unary functions. It was created to work around a limitation of lambda functions, which are unary. Let's go back to our max function again and get the following definition: max a b = a, if a ≥ b = b, if b > a We can see that there is no bracket in the max a b function name. Also, there is no comma separating a and b in the function name. We can add a bracket and a comma to the function definition, as follows: max' (a,b) = a, if a ≥ b = b, if b > a At first glance, we find the two functions to be the same since they have the same expression. However, they are different because of their different types. The max' function has a single argument, which consists of a pair of numbers. The type of the max' function can be written as follows: max' :: (num, num) -> num On the other hand, the max function has two arguments. The type of this function can be written as follows: max :: num -> (num -> num) The max function takes a number and returns a function from a number to a number. We pass the variable a to the max function, which returns a function; that function then takes b and returns the maximum of the two numbers. A sketch of this distinction in code appears below.
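Here is a hedged sketch of the distinction in Go (used for illustration only): maxPair models the uncurried max' :: (num, num) -> num, while maxCurried models max :: num -> (num -> num) and also shows partial application, a key benefit of currying.

package main

import "fmt"

// Uncurried: max' :: (num, num) -> num
func maxPair(a, b int) int {
    if a >= b {
        return a
    }
    return b
}

// Curried: max :: num -> (num -> num)
func maxCurried(a int) func(int) int {
    return func(b int) int {
        if a >= b {
            return a
        }
        return b
    }
}

func main() {
    fmt.Println(maxPair(1, 2))    // 2
    fmt.Println(maxCurried(1)(2)) // 2

    // Partial application: fix the first argument and reuse the rest.
    atLeastTen := maxCurried(10)
    fmt.Println(atLeastTen(3))  // 10
    fmt.Println(atLeastTen(42)) // 42
}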
Comparison between functional and imperative programming The main difference between functional and imperative programming is that imperative programming produces side effects while functional programming doesn't. In imperative programming, expressions are evaluated and their resulting values are assigned to variables. So, when we group a series of expressions into a function, the resulting value depends upon the state of the variables at that point in time; these state changes are called side effects. Because of the continuous change in state, the order of evaluation matters. In the functional programming world, destructive assignment is forbidden, and each time an assignment happens, a new variable is introduced. Concepts of functional programming We can also distinguish functional programming from imperative programming by its concepts. The core ideas of functional programming are encapsulated in constructs such as first-class functions, higher-order functions, purity, recursion over loops, and partial functions. We will discuss these concepts in this topic. First-class and higher-order functions In imperative programming, data is given more importance and is passed through a series of functions (with side effects). Functions are special constructs with their own semantics. In effect, functions do not have the same standing as variables and constants. Since a function cannot be passed as a parameter or returned as a result, it is regarded as a second-class citizen of the programming world. In the functional programming world, we can pass a function as a parameter and return a function as a result. Functions obey the same semantics as variables and their values. Thus, they are first-class citizens. We can also create functions of functions, called second-order functions, through composition. There is no limit imposed on the composability of functions, and such functions are called higher-order functions. Fortunately, the C# language has supported these two concepts since it has a feature called the function object, which has types and values. To discuss more details about the function object, let's take a look at the following code: class Program { static void Main(string[] args) { Func<int, int> f = (x) => x + 2; int i = f(1); Console.WriteLine(i); f = (x) => 2 * x + 1; i = f(1); Console.WriteLine(i); } } We can find the code in FuncObject.csproj, and if we run it, it will display 3 twice on the console screen, since both lambda expressions yield 3 when applied to 1. Let's continue the discussion on function types and function values. Hit Ctrl + F5 instead of F5 in order to run the code without the debugger. It's useful to stop the console from closing on exit. Pure functions In functional programming, most of the functions do not have side effects. In other words, a function doesn't change any variables outside the function itself. Also, it is consistent, which means that it always returns the same value for the same input data. The following are example actions that will generate side effects in programming: Modifying a global variable or static variable, since it will make a function interact with the outside world. Modifying the arguments of a function. This usually happens if we pass a parameter by reference. Raising an exception. Taking input and output outside—for instance, getting a keystroke from the keyboard or writing data to the screen. Although it does not satisfy the rule of a pure function, we will use many Console.WriteLine() methods in our program in order to ease our understanding of the code samples. The following is a sample non-pure function that we can find in NonPureFunction1.csproj: class Program { private static string strValue = "First"; public static void AddSpace(string str) { strValue += ' ' + str; } static void Main(string[] args) { AddSpace("Second"); AddSpace("Third"); Console.WriteLine(strValue); } } If we run the preceding code, as expected, First Second Third will be displayed on the console. In this code, we modify the strValue global variable inside the AddSpace function.
Since it modifies a variable outside itself, it's not considered a pure function. Let's take a look at another non-pure function example in NonPureFunction2.csproj: class Program { public static void AddSpace(StringBuilder sb, string str) { sb.Append(' ' + str); } static void Main(string[] args) { StringBuilder sb1 = new StringBuilder("First"); AddSpace(sb1, "Second"); AddSpace(sb1, "Third"); Console.WriteLine(sb1); } } We see the AddSpace function again, but this time with an additional StringBuilder argument. In the function, we append a space and str to the sb argument. Since we pass the sb variable by reference, it also modifies the sb1 variable in the Main function. Note that it will display the same output as NonPureFunction1.csproj. To convert the preceding two non-pure code examples into pure function code, we can refactor them as follows. This code can be found in PureFunction.csproj: class Program { public static string AddSpace(string strSource, string str) { return (strSource + ' ' + str); } static void Main(string[] args) { string str1 = "First"; string str2 = AddSpace(str1, "Second"); string str3 = AddSpace(str2, "Third"); Console.WriteLine(str3); } } Running PureFunction.csproj, we will get the same output as the two previous non-pure examples. However, in this pure function code, we have three variables in the Main function. This is because in functional programming, we cannot modify a variable we have initialized earlier. In the AddSpace function, instead of modifying the global variable or the argument, it now returns a string value to satisfy the functional rule. The following are the advantages we will have if we implement pure functions in our code: Our code will be easier to read and maintain because the function does not depend on external state and variables. It is also designed to perform specific tasks, which increases maintainability. The design will be easier to change since it is easier to refactor. Testing and debugging will be easier since it's quite easy to isolate a pure function. Recursive functions In the imperative programming world, we have destructive assignment to mutate the state of a variable. By using loops, one can change multiple variables to achieve a computational objective. In the functional programming world, since variables cannot be destructively assigned, we need recursive function calls to achieve the objective of looping. Let's create a factorial function. In mathematical terms, the factorial of a nonnegative integer N is the product of all positive integers less than or equal to N. This is usually denoted by N!. We can denote the factorial of 7 as follows: 7! = 7 x 6 x 5 x 4 x 3 x 2 x 1 = 5040 If we look deeper at the preceding formula, we will discover that the pattern of the formula is as follows: N! = N * (N-1) * (N-2) * (N-3) * (N-4) * (N-5) ... Now, let's take a look at the following factorial function in C#. It's an imperative approach and can be found in the RecursiveImperative.csproj file. public partial class Program { private static int GetFactorial(int intNumber) { if (intNumber == 0) { return 1; } return intNumber * GetFactorial(intNumber - 1); } } As we can see, we invoke the GetFactorial() function from the GetFactorial() function itself. This is what we call a recursive function.
We can use this function by creating a Main() method containing the following code: public partial class Program { static void Main(string[] args) { Console.WriteLine( "Enter an integer number (Imperative approach)"); int inputNumber = Convert.ToInt32(Console.ReadLine()); int factorialNumber = GetFactorial(inputNumber); Console.WriteLine( "{0}! is {1}", inputNumber, factorialNumber); } } We invoke the GetFactorial() method and pass our desired number as the argument. The method then multiplies our number by the result of calling GetFactorial() with the argument decremented by 1. The recursion continues until the argument is equal to 0, at which point the method returns 1. Now, let's compare the preceding recursive function in the imperative approach with one in the functional approach. We will use the power of the Aggregate operator in the LINQ feature to achieve this goal. We can find the code in the RecursiveFunctional.csproj file. The code will look like what is shown in the following: class Program { static void Main(string[] args) { Console.WriteLine( "Enter an integer number (Functional approach)"); int inputNumber = Convert.ToInt32(Console.ReadLine()); IEnumerable<int> ints = Enumerable.Range(1, inputNumber); int factorialNumber = ints.Aggregate((f, s) => f * s); Console.WriteLine( "{0}! is {1}", inputNumber, factorialNumber); } } In the preceding code, we initialize the ints variable, which contains the values from 1 to our desired integer number, and then we iterate over ints using the Aggregate operator. The output of RecursiveFunctional.csproj will be exactly the same as the output of RecursiveImperative.csproj. However, the code in RecursiveFunctional.csproj uses the functional approach. The advantages and disadvantages of functional programming So far, we have dealt with functional programming by creating code using the functional approach. Now, we can look at the advantages of the functional approach, such as the following: The order of execution doesn't matter, since it is handled by the system to compute the value we have given rather than prescribed by the programmer. In other words, the declarative nature of the expressions makes their meaning unique regardless of execution order. Because functional programs are oriented toward mathematical concepts, the system is designed with a notation as close as possible to the mathematical way of expressing concepts. Variables can be replaced by their values, since the evaluation of an expression can be done at any time. Functional code is then more mathematically traceable, because the program is allowed to be manipulated or transformed by substituting equals with equals. This feature is called referential transparency. Immutability makes functional code free of side effects. A shared variable, which is an example of a side effect, is a serious obstacle to creating parallel code and results in non-deterministic execution. By removing side effects, we can have a good coding approach. The power of lazy evaluation will make the program run faster, because it only computes what we really require for the query's result. Suppose we have a large amount of data and want to filter it by a specific condition, such as showing only the data that contains the word Name. In imperative programming, we will have to evaluate each operation on all the data. The problem is that when the operation takes a long time, the program will need more time to run as well. Fortunately, functional programming that applies LINQ will perform the filtering operation only when it is needed.
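As a sketch of the lazy idea (in Go rather than LINQ, for illustration), the following generator only produces matches as they are consumed, so items beyond what the consumer asks for are never examined.

package main

import (
    "fmt"
    "strings"
)

// filterLazy yields matching items one at a time; items beyond what
// the consumer asks for are never examined.
func filterLazy(items []string, pred func(string) bool) <-chan string {
    out := make(chan string)
    go func() {
        defer close(out)
        for _, it := range items {
            if pred(it) {
                out <- it
            }
        }
    }()
    return out
}

func main() {
    data := []string{"Name: Ada", "Age: 36", "Name: Grace", "City: NYC"}
    matches := filterLazy(data, func(s string) bool {
        return strings.HasPrefix(s, "Name")
    })

    // Take only the first match; the producer blocks instead of
    // scanning the whole data set up front. A production version
    // would add cancellation so the goroutine is not leaked.
    fmt.Println(<-matches)
}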
The advantages and disadvantages of functional programming So far, we have dealt with functional programming by creating code using the functional approach. Now, we can look at the advantages of the functional approach, such as the following: The order of execution doesn't matter, since expressions are evaluated by the system to compute the values we have declared rather than in an order defined by the programmer. In other words, a declarative expression yields a unique value regardless of when it is evaluated. Because functional programs are based on mathematical concepts, the system can be designed with notation as close as possible to the mathematical notation of the problem. Variables can be replaced by their values, since the evaluation of an expression can be done at any time. The functional code is then more mathematically traceable because the program can be manipulated or transformed by substituting equals with equals. This feature is called Referential Transparency. Immutability makes the functional code free of side effects. A shared variable, which is an example of a side effect, is a serious obstacle to creating parallel code and results in non-deterministic execution. By removing side effects, we can have a good coding approach. The power of lazy evaluation will make the program run faster because it only computes what is really required for the query result. Suppose we have a large amount of data and want to filter it by a specific condition, such as showing only the data that contains the word Name. In imperative programming, we would have to evaluate the operation on all the data. The problem is that when the operation takes a long time, the program needs more time to run as well. Fortunately, functional programming that applies LINQ will perform the filtering operation only when it is needed. That's why lazy evaluation in functional programming saves us a lot of time. We also have a solution for complex problems using composability. It is a principle that manages a problem by dividing it into pieces and giving the pieces of the problem to several functions. The concept is similar to a situation where we organize an event and ask different people to take up particular responsibilities. By doing this, we can ensure that everything will be done properly by each person. Besides the advantages of functional programming, there are several disadvantages as well. Here are some of them: Since there's no state and no update of variables is allowed, a loss of performance can take place. The problem occurs when we deal with a large data structure that needs to be duplicated, even though only a small part of its data changes. Compared to imperative programming, more garbage will be generated in functional programming due to the concept of immutability, which needs more variables to handle specific assignments. Because we cannot control garbage collection, performance can decrease as well. Summary So we have become acquainted with the functional approach by discussing the introduction of functional programming. We have also compared the functional approach to the mathematical concepts it draws on when we create a functional program. It's now clear that the functional approach uses a mathematical approach to compose a functional program. The comparison between functional and imperative programming also led us to the important point of distinguishing the two. It's now clear that in functional programming, the programmer focuses on the kind of desired information and the kind of required transformation, while in the imperative approach, the programmer focuses on the way of performing the task and on tracking changes in state. Resources for Article: Further resources on this subject: Introduction to C# and .NET [article] Why we need Design Patterns? [article] Parallel Computing [article]

Setting Gradle properties to build a project [Tutorial]

Savia Lobo
30 Jul 2018
10 min read
A Gradle script is a program. We use a Groovy DSL to express our build logic. Gradle has several useful built-in methods to handle files and directories, as we often deal with files and directories in our build logic. In today's post, we will take a look at how to set Gradle properties in a project build. We will also see how to use the Gradle Wrapper task to distribute a configurable Gradle with our build scripts. This article is an excerpt taken from 'Gradle Effective Implementations Guide - Second Edition', written by Hubert Klein Ikkink. Setting Gradle project properties In a Gradle build file, we can access several properties that are defined by Gradle, but we can also create our own properties. We can set the value of our custom properties directly in the build script, and we can also do this by passing values via the command line. The default properties that we can access in a Gradle build are the following (name, type, and default value):
project (Project): The project instance.
name (String): The name of the project directory. The name is read-only.
path (String): The absolute path of the project.
description (String): The description of the project.
projectDir (File): The directory containing the build script. The value is read-only.
buildDir (File): The directory with the build name in the directory containing the build script.
rootDir (File): The directory of the project at the root of a project structure.
group (Object): Not specified.
version (Object): Not specified.
ant (AntBuilder): An AntBuilder instance.
The following build file has a task that shows the values of the properties: version = '1.0' group = 'Sample' description = 'Sample build file to show project properties' task defaultProperties << { println "Project: $project" println "Name: $name" println "Path: $path" println "Project directory: $projectDir" println "Build directory: $buildDir" println "Version: $version" println "Group: $project.group" println "Description: $project.description" println "AntBuilder: $ant" } When we run the build, we get the following output: $ gradle defaultProperties :defaultProperties Project: root project 'props' Name: defaultProperties Path: :defaultProperties Project directory: /Users/mrhaki/gradle-book/Code_Files/props Build directory: /Users/mrhaki/gradle-book/Code_Files/props/build Version: 1.0 Group: Sample Description: Sample build file to show project properties AntBuilder: org.gradle.api.internal.project.DefaultAntBuilder@3c95cbbd BUILD SUCCESSFUL Total time: 1.458 secs Defining custom properties in script To add our own properties, we have to define them in an ext{} script block in a build file. Prefixing the property name with ext. is another way to set the value. To read the value of the property, we don't have to use the ext. prefix; we can simply refer to the name of the property. The property is automatically added to the internal project property as well. In the following script, we add a customProperty property with a String value of custom. In the showProperties task, we show the value of the property: // Define new property. ext.customProperty = 'custom' // Or we can use ext{} script block.
ext { anotherCustomProperty = 'custom' } task showProperties { ext { customProperty = 'override' } doLast { // We can refer to the property // in different ways: println customProperty println project.ext.customProperty println project.customProperty } } After running the script, we get the following output: $ gradle showProperties :showProperties override custom custom BUILD SUCCESSFUL Total time: 1.469 secs Defining properties using an external file We can also set the properties for our project in an external file. The file needs to be named gradle.properties, and it should be a plain text file with each property name and its value on a separate line. We can place the file in the project directory or in the Gradle user home directory. The default Gradle user home directory is $USER_HOME/.gradle. A property defined in the properties file in the Gradle user home directory overrides the property values defined in a properties file in the project directory. We will now create a gradle.properties file in our project directory, with the following contents. We use our build file to show the property values: task showProperties { doLast { println "Version: $version" println "Custom property: $customProperty" } } If we run the build file, we don't have to pass any command-line options; Gradle will use gradle.properties to get the values of the properties: $ gradle showProperties :showProperties Version: 4.0 Custom property: Property value from gradle.properties BUILD SUCCESSFUL Total time: 1.676 secs Passing properties via the command line Instead of defining the property directly in the build script or an external file, we can use the -P command-line option to add an extra property to a build. We can also use the -P command-line option to set a value for an existing property. If we define a property using the -P command-line option, we can override a property with the same name defined in the external gradle.properties file. The following build script has a showProperties task that shows the value of an existing property and a new property: task showProperties { doLast { println "Version: $version" println "Custom property: $customProperty" } } Let's run our script and pass the values for the existing version property and the non-existent customProperty: $ gradle -Pversion=1.1 -PcustomProperty=custom showProperties :showProperties Version: 1.1 Custom property: custom BUILD SUCCESSFUL Total time: 1.412 secs Defining properties via system properties We can also use Java system properties to define properties for our Gradle build. We use the -D command-line option just like in a normal Java application. The name of the system property must start with org.gradle.project, followed by the name of the property we want to set, and then by the value. We can use the same build script that we created before: task showProperties { doLast { println "Version: $version" println "Custom property: $customProperty" } } However, this time we use different command-line options to get a result: $ gradle -Dorg.gradle.project.version=2.0 -Dorg.gradle.project.customProperty=custom showProperties :showProperties Version: 2.0 Custom property: custom BUILD SUCCESSFUL Total time: 1.218 secs
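A property supplied only via -P or -D may be absent on some runs, and referring to a missing property fails the build. A small sketch of ours (not from the book) that guards the access with hasProperty() and falls back to a default:

task showOptionalProperty {
    doLast {
        // Use a default value when -PcustomProperty was not supplied.
        def value = project.hasProperty('customProperty') ?
                project.customProperty : 'default value'
        println "Custom property: $value"
    }
}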
Adding properties via environment variables Using the command-line options provides much flexibility; however, sometimes we cannot use the command-line options because of environment restrictions, or because we don't want to retype the complete command-line options each time we invoke the Gradle build. Gradle can also use environment variables set in the operating system to pass properties to a Gradle build. The environment variable name starts with ORG_GRADLE_PROJECT_ and is followed by the property name. We use our build file to show the properties: task showProperties { doLast { println "Version: $version" println "Custom property: $customProperty" } } Firstly, we set the ORG_GRADLE_PROJECT_version and ORG_GRADLE_PROJECT_customProperty environment variables, then we run our showProperties task, as follows: $ ORG_GRADLE_PROJECT_version=3.1 ORG_GRADLE_PROJECT_customProperty="Set by environment variable" gradle showProp :showProperties Version: 3.1 Custom property: Set by environment variable BUILD SUCCESSFUL Total time: 1.373 secs Using the Gradle Wrapper Normally, if we want to run a Gradle build, we must have Gradle installed on our computer. Also, if we distribute our project to others and they want to build the project, they must have Gradle installed on their computers. The Gradle Wrapper can be used to allow others to build our project even if they don't have Gradle installed on their computers. The wrapper is a batch script on Microsoft Windows operating systems or a shell script on other operating systems that will download Gradle and run the build using the downloaded Gradle. By using the wrapper, we can make sure that the correct Gradle version for the project is used. We can define the Gradle version, and if we run the build via the wrapper script file, the version of Gradle that we defined is used. Creating wrapper scripts To create the Gradle Wrapper batch and shell scripts, we can invoke the built-in wrapper task. This task is already available if we have installed Gradle on our computer. Let's invoke the wrapper task from the command line: $ gradle wrapper :wrapper BUILD SUCCESSFUL Total time: 0.61 secs After the execution of the task, we have two script files, gradlew.bat and gradlew, in the root of our project directory. These scripts contain all the logic needed to run Gradle. If Gradle is not downloaded yet, the Gradle distribution will be downloaded and installed locally. In the gradle/wrapper directory, relative to our project directory, we find the gradle-wrapper.jar and gradle-wrapper.properties files. The gradle-wrapper.jar file contains a couple of class files necessary to download and invoke Gradle. The gradle-wrapper.properties file contains settings, such as the URL, to download Gradle. The gradle-wrapper.properties file also contains the Gradle version number. If a new Gradle version is released, we only have to change the version in the gradle-wrapper.properties file and the Gradle Wrapper will download the new version so that we can use it to build our project. All the generated files are now part of our project. If we use a version control system, then we must add these files to the version control. Other people that check out our project can use the gradlew scripts to execute tasks from the project. The specified Gradle version is downloaded and used to run the build file.
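For reference, a generated gradle-wrapper.properties file typically looks something like the following (the exact values depend on the Gradle version used; these are illustrative):

distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-2.12-bin.zip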
If we want to use another Gradle version, we can invoke the wrapper task with the --gradle-version option. We must specify the Gradle version that the Wrapper files are generated for. By default, the Gradle version that is used to invoke the wrapper task is the Gradle version used by the wrapper files. To specify a different download location for the Gradle installation file, we must use the --gradle-distribution-url option of the wrapper task. For example, we could have a customized Gradle installation on our local intranet, and with this option, we can generate the Wrapper files that will use the Gradle distribution on our intranet. In the following example, we generate the wrapper files for Gradle 2.12 explicitly: $ gradle wrapper --gradle-version=2.12 :wrapper BUILD SUCCESSFUL Total time: 0.61 secs Customizing the Gradle Wrapper If we want to customize properties of the built-in wrapper task, we must add a new task to our Gradle build file with the org.gradle.api.tasks.wrapper.Wrapper type. We will not change the default wrapper task, but create a new task with the new settings that we want to apply. We need to use our new task to generate the Gradle Wrapper shell scripts and support files. We can change the names of the script files that are generated with the scriptFile property of the Wrapper task. To change the name and location of the generated JAR and properties files, we can change the jarFile property: task createWrapper(type: Wrapper) { // Set Gradle version for wrapper files. gradleVersion = '2.12' // Rename shell scripts name to // startGradle instead of default gradlew. scriptFile = 'startGradle' // Change location and name of JAR file // with wrapper bootstrap code and // accompanying properties files. jarFile = "${projectDir}/gradle-bin/gradle-bootstrap.jar" } If we run the createWrapper task, we get a Windows batch file and a shell script, and the Wrapper bootstrap JAR file with the properties file is stored in the gradle-bin directory: $ gradle createWrapper :createWrapper BUILD SUCCESSFUL Total time: 0.605 secs $ tree . . ├── gradle-bin │ ├── gradle-bootstrap.jar │ └── gradle-bootstrap.properties ├── startGradle ├── startGradle.bat └── build.gradle 2 directories, 5 files To change the URL from where the Gradle version must be downloaded, we can alter the distributionUrl property. For example, we could publish a fixed Gradle version on our company intranet and use the distributionUrl property to reference a download URL on our intranet. This way we can make sure that all developers in the company use the same Gradle version: task createWrapper(type: Wrapper) { // Set URL with custom Gradle distribution. distributionUrl = 'http://intranet/gradle/dist/gradle-custom-2.12.zip' } We discussed the Gradle properties and how to use the Gradle Wrapper to allow users to build our projects even if they don't have Gradle installed. We discussed how to customize the Wrapper to download a specific version of Gradle and use it to run our build. If you've enjoyed reading this post, do check out our book 'Gradle Effective Implementations Guide - Second Edition' to know more about how to use Gradle for Java projects. Top 7 Python programming books you need to read 4 operator overloading techniques in Kotlin you need to know 5 Things you need to know about Java 10

Why Golang is the fastest growing language on GitHub

Sugandha Lahoti
09 Aug 2018
4 min read
Google's Go language, or alternatively Golang, is currently one of the fastest growing programming languages in the software industry. Its speed, simplicity, and reliability make it the perfect choice for all kinds of developers. Now, its popularity has further gained momentum. According to a report, Go is the fastest growing language on GitHub in Q2 of 2018. Go has grown almost 7% overall, with a 1.5% change from the previous quarter. Source: Madnight.github.io What makes Golang so popular? A person was quoted on Reddit saying, "What I would have done in Python, Ruby, C, C# or C++, I'm now doing in Go." Such is the impact of Go. Let's see what makes Golang so popular. Go is cross-platform, so you can target an operating system of your choice when compiling a piece of code. Go offers a native concurrency model that is unlike most mainstream programming languages. Go relies on a concurrency model called CSP (Communicating Sequential Processes). Instead of locking variables to share memory, Golang allows you to communicate the value stored in your variable from one thread to another (see the channel sketch at the end of this article). Go has a fairly mature standard library of its own. Once you install Go, you can build production-level software that can cover a wide range of use cases, from RESTful web APIs to encryption software, before needing to consider any third-party packages. Go code typically compiles to a single native binary, which basically makes deploying an application written in Go as easy as copying the application file to the destination server. Go is also rapidly being adopted as the go-to cloud-native language, and is used by leading projects like Docker and Ethereum. Its concurrency features and easy deployment make it a popular choice for cloud development. Can Golang replace Python? Reddit is abuzz with people sharing their thoughts about whether Golang would replace Python. A user commented that "Writing a utility script is quicker in Go than in Python or JS. Not quicker as in performance, but in terms of raw development speed." Another Reddit user pointed out three reasons not to use Python in a Reddit discussion, Why are people ditching python for go?: Dynamic compilation of Python can result in errors that exist in code, but they are in fact not detected. CPython really is very slow; very specifically, procedures that are invoked multiple times are not optimized to run more quickly in future runs (like pypy); they always run at the same slow speed. Python has a terrible distribution story; it's really hard to ship all your Python dependencies onto a new system. Go addresses those points pretty sharply. It has a good distribution story with static binaries. It has a repeatable build process, and it's pretty fast. In the same discussion, however, a user nicely sums it up saying, "There is nothing wrong with python except maybe that it is not statically typed and can be a bit slow, which also depends on the use case. Go is the new kid on the block, and while Go is nice, it doesn't have nearly as many libraries as python does. When it comes to stable, mature third-party packages, it can't beat python at the moment." If you're still thinking about whether or not to begin coding with Go, here's a quirky rendition of the popular song Let it Go from Disney's Frozen to inspire you. Write in Go! Write in Go!
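To make the CSP model described above concrete, here is a minimal runnable sketch of ours (not from the article): a goroutine sends values over a channel and the main goroutine receives them, with no shared memory or locks involved.

package main

import "fmt"

func main() {
    ch := make(chan int)
    go func() {
        for i := 1; i <= 3; i++ {
            ch <- i // send each value to the receiver
        }
        close(ch) // signal that no more values will come
    }()
    for v := range ch { // receive until the channel is closed
        fmt.Println("received:", v)
    }
}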
Go Cloud is Google's bid to establish Golang as the go-to language of cloud Writing test functions in Golang [Tutorial] How Concurrency and Parallelism works in Golang [Tutorial]

Java Multithreading: How to synchronize threads to implement critical sections and avoid race conditions

Fatema Patrawala
30 May 2018
13 min read
One of the most common situations in concurrent programming occurs when more than one execution thread shares a resource. In a concurrent application, it is normal for multiple threads to read or write the same data structure or have access to the same file or database connection. These shared resources can provoke error situations or data inconsistency, and we have to implement some mechanism to avoid these errors. These situations are called race conditions and they occur when different threads have access to the same shared resource at the same time. Therefore, the final result depends on the order of the execution of threads, and most of the time, it is incorrect. You can also have problems with change visibility. So if a thread changes the value of a shared variable, the changes would only be written in the local cache of that thread; other threads will not have access to the change (they will only be able to see the old value). We present to you a Java multithreading tutorial taken from the book Java 9 Concurrency Cookbook - Second Edition, written by Javier Fernández González. The solution to these problems lies in the concept of a critical section. A critical section is a block of code that accesses a shared resource and can't be executed by more than one thread at the same time. To help programmers implement critical sections, Java (and almost all programming languages) offers synchronization mechanisms. When a thread wants access to a critical section, it uses one of these synchronization mechanisms to find out whether there is any other thread executing the critical section. If not, the thread enters the critical section. If yes, the thread is suspended by the synchronization mechanism until the thread that is currently executing the critical section ends it. When more than one thread is waiting for a thread to finish the execution of a critical section, JVM chooses one of them and the rest wait for their turn. The Java language offers two basic synchronization mechanisms: The synchronized keyword The Lock interface and its implementations In this article, we explore the use of the synchronized keyword to implement synchronization in Java. So let's get started: Synchronizing a method In this recipe, you will learn how to use one of the most basic methods of synchronization in Java, that is, the use of the synchronized keyword to control concurrent access to a method or a block of code. All the synchronized sentences (used on methods or blocks of code) use an object reference. Only one thread can execute a method or block of code protected by the same object reference. When you use the synchronized keyword with a method, the object reference is implicit. When you use the synchronized keyword in one or more methods of an object, only one execution thread will have access to all these methods. If another thread tries to access any method declared with the synchronized keyword of the same object, it will be suspended until the first thread finishes the execution of the method. In other words, every method declared with the synchronized keyword is a critical section, and Java only allows the execution of one of the critical sections of an object at a time. In this case, the object reference used is the object itself, represented by the this keyword. Static methods have a different behavior.
Only one execution thread will have access to one of the static methods declared with the synchronized keyword, but a different thread can access other non-static methods of an object of that class. You have to be very careful with this point because two threads can access two different synchronized methods if one is static and the other is not. If both methods change the same data, you can have data inconsistency errors. In this case, the object reference used is the class object. When you use the synchronized keyword to protect a block of code, you must pass an object reference as a parameter. Normally, you will use the this keyword to reference the object that executes the method, but you can use other object references as well. Normally, these objects will be created exclusively for this purpose. You should keep the objects used for synchronization private. For example, if you have two independent attributes in a class shared by multiple threads, you must synchronize access to each variable; however, it wouldn't be a problem if one thread is accessing one of the attributes and another accessing a different attribute at the same time. Take into account that if you use the object itself (represented by the this keyword), you might interfere with other synchronized code (as mentioned before, the this object is used to synchronize the methods marked with the synchronized keyword). In this recipe, you will learn how to use the synchronized keyword to implement an application simulating a parking area, with sensors that detect when a car or a motorcycle enters or leaves the parking area, an object to store the statistics of the vehicles being parked, and a mechanism to control cash flow. We will implement two versions: one without any synchronization mechanisms, where we will see how we obtain incorrect results, and one that works correctly as it uses the two variants of the synchronized keyword. The example of this recipe has been implemented using the Eclipse IDE. If you use Eclipse or a different IDE, such as NetBeans, open it and create a new Java project. How to do it... Follow these steps to implement the example: First, create the application without using any synchronization mechanism. Create a class named ParkingCash with an internal constant and an attribute to store the total amount of money earned by providing this parking service: public class ParkingCash { private static final int cost=2; private long cash; public ParkingCash() { cash=0; } Implement a method named vehiclePay() that will be called when a vehicle (a car or motorcycle) leaves the parking area. It will increase the cash attribute: public void vehiclePay() { cash+=cost; } Finally, implement a method named close() that will write the value of the cash attribute in the console and reinitialize it to zero: public void close() { System.out.printf("Closing accounting"); long totalAmmount; totalAmmount=cash; cash=0; System.out.printf("The total amount is : %d", totalAmmount); } } Create a class named ParkingStats with three private attributes and the constructor that will initialize them: public class ParkingStats { private long numberCars; private long numberMotorcycles; private ParkingCash cash; public ParkingStats(ParkingCash cash) { numberCars = 0; numberMotorcycles = 0; this.cash = cash; } Then, implement the methods that will be executed when a car or motorcycle enters or leaves the parking area.
When a vehicle leaves the parking area, cash should be incremented: public void carComeIn() { numberCars++; } public void carGoOut() { numberCars--; cash.vehiclePay(); } public void motoComeIn() { numberMotorcycles++; } public void motoGoOut() { numberMotorcycles--; cash.vehiclePay(); } Finally, implement two methods to obtain the number of cars and motorcycles in the parking area, respectively. Create a class named Sensor that will simulate the movement of vehicles in the parking area. It implements the Runnable interface and has a ParkingStats attribute, which will be initialized in the constructor: public class Sensor implements Runnable { private ParkingStats stats; public Sensor(ParkingStats stats) { this.stats = stats; } Implement the run() method. In this method, simulate that two cars and a motorcycle arrive in and then leave the parking area. Every sensor will perform this action 10 times: @Override public void run() { for (int i = 0; i< 10; i++) { stats.carComeIn(); stats.carComeIn(); try { TimeUnit.MILLISECONDS.sleep(50); } catch (InterruptedException e) { e.printStackTrace(); } stats.motoComeIn(); try { TimeUnit.MILLISECONDS.sleep(50); } catch (InterruptedException e) { e.printStackTrace(); } stats.motoGoOut(); stats.carGoOut(); stats.carGoOut(); } } Finally, implement the main method. Create a class named Main with the main() method. It needs ParkingCash and ParkingStats objects to manage parking: public class Main { public static void main(String[] args) { ParkingCash cash = new ParkingCash(); ParkingStats stats = new ParkingStats(cash); System.out.printf("Parking Simulator\n"); Then, create the Sensor tasks. Use the availableProcessors() method (that returns the number of available processors to the JVM, which normally is equal to the number of cores in the processor) to calculate the number of sensors our parking area will have. Create the corresponding Thread objects and store them in an array: int numberSensors = 2 * Runtime.getRuntime().availableProcessors(); Thread threads[]=new Thread[numberSensors]; for (int i = 0; i<numberSensors; i++) { Sensor sensor=new Sensor(stats); Thread thread=new Thread(sensor); thread.start(); threads[i]=thread; } Then wait for the finalization of the threads using the join() method: for (int i=0; i<numberSensors; i++) { try { threads[i].join(); } catch (InterruptedException e) { e.printStackTrace(); } } Finally, write the statistics of the parking area: System.out.printf("Number of cars: %d\n", stats.getNumberCars()); System.out.printf("Number of motorcycles: %d\n", stats.getNumberMotorcycles()); cash.close(); } } In our case, we executed the example on a four-core processor, so we will have eight Sensor tasks. Each task performs 10 iterations, and in each iteration, three vehicles enter the parking area and the same three vehicles go out. Therefore, each Sensor task will simulate 30 vehicles. If everything goes well, the final stats will show the following: There are no cars in the parking area, which means that all the vehicles that came into the parking area have moved out Eight Sensor tasks were executed, where each task simulated 30 vehicles and each vehicle was charged 2 dollars; therefore, the total amount of cash earned was 480 dollars When you execute this example, each time you will obtain different results, and most of them will be incorrect. The following screenshot shows an example: We had race conditions, and the different shared variables accessed by all the threads gave incorrect results.
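Before fixing the parking example, it helps to see the same failure in isolation. The following standalone sketch of ours (not from the book) loses updates because counter++ is a non-atomic read-modify-write operation:

public class RaceDemo {
    private static int counter = 0; // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100000; i++) {
                counter++; // read, add, write: another thread can interleave
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Usually prints less than 200000 because increments get lost.
        System.out.println("Counter: " + counter);
    }
}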
Let's modify the previous code using the synchronized keyword to solve these problems: First, add the synchronized keyword to the vehiclePay() method of the ParkingCash class: public synchronized void vehiclePay() { cash+=cost; } Then, add a synchronized block of code using the this keyword to the close() method: public void close() { System.out.printf("Closing accounting"); long totalAmmount; synchronized (this) { totalAmmount=cash; cash=0; } System.out.printf("The total amount is : %d",totalAmmount); } Now add two new attributes to the ParkingStats class and initialize them in the constructor of the class: private final Object controlCars, controlMotorcycles; public ParkingStats (ParkingCash cash) { numberCars=0; numberMotorcycles=0; controlCars=new Object(); controlMotorcycles=new Object(); this.cash=cash; } Finally, modify the methods that increment and decrement the number of cars and motorcycles, including the synchronized keyword. The numberCars attribute will be protected by the controlCars object, and the numberMotorcycles attribute will be protected by the controlMotorcycles object. You must also synchronize the getNumberCars() and getNumberMotorcycles() methods with the associated reference object: public void carComeIn() { synchronized (controlCars) { numberCars++; } } public void carGoOut() { synchronized (controlCars) { numberCars--; } cash.vehiclePay(); } public void motoComeIn() { synchronized (controlMotorcycles) { numberMotorcycles++; } } public void motoGoOut() { synchronized (controlMotorcycles) { numberMotorcycles--; } cash.vehiclePay(); } Execute the example now and see the difference when compared to the previous version. How it works... The following screenshot shows the output of the new version of the example. No matter how many times you execute it, you will always obtain the correct result: Let's see the different uses of the synchronized keyword in the example: First, we protected the vehiclePay() method. If two or more Sensor tasks call this method at the same time, only one will execute it and the rest will wait for their turn; therefore, the final amount will always be correct. We used two different objects to control access to the car and motorcycle counters. This way, one Sensor task can modify the numberCars attribute and another Sensor task can modify the numberMotorcycles attribute at the same time; however, no two Sensor tasks will be able to modify the same attribute at the same time, so the final value of the counters will always be correct. Finally, we also synchronized the getNumberCars() and getNumberMotorcycles() methods. Using the synchronized keyword, we can guarantee correct access to shared data in concurrent applications. As mentioned in the introduction of this recipe, only one thread can access the methods of an object that use the synchronized keyword in their declaration. If thread (A) is executing a synchronized method and thread (B) wants to execute another synchronized method of the same object, it will be blocked until thread (A) is finished. But if thread (B) has access to different objects of the same class, none of them will be blocked. When you use the synchronized keyword to protect a block of code, you use an object as a parameter. JVM guarantees that only one thread can have access to all the blocks of code protected with this object (note that we always talk about objects, not classes).
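To make the block form concrete, here is a small sketch of ours (not from the book's parking example) that protects only the shared update with a dedicated lock object, keeping slow work outside the critical section:

public class Building {
    private int occupancy; // shared state
    private final Object occupancyLock = new Object();

    public void personArrives(String name) {
        String record = "entry: " + name;  // thread-local work stays outside
        synchronized (occupancyLock) {
            occupancy++;                   // only the shared update is protected
        }
        System.out.println(record);        // I/O is also kept out of the lock
    }
}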
We used the TimeUnit class as well. The TimeUnit class is an enumeration with the following constants: DAYS, HOURS, MICROSECONDS, MILLISECONDS, MINUTES, NANOSECONDS, and SECONDS. These indicate the units of time we pass to the sleep method. In our case, we let the thread sleep for 50 milliseconds. There's more... The synchronized keyword penalizes the performance of the application, so you must only use it on methods that modify shared data in a concurrent environment. If you have multiple threads calling a synchronized method, only one will execute it at a time while the others remain waiting. If the operation doesn't use the synchronized keyword, all the threads can execute the operation at the same time, reducing the total execution time. If you know that a method will not be called by more than one thread, don't use the synchronized keyword. Anyway, if the class is designed for multithreaded access, it should always be correct. You must promote correctness over performance. Also, you should include documentation in methods and classes in relation to their thread safety. You can use recursive calls with synchronized methods. As the thread has access to the synchronized methods of an object, you can call other synchronized methods of that object, including the method that is being executed. It won't have to gain access to the synchronized methods again. We can use the synchronized keyword to protect access to a block of code instead of an entire method. We should use the synchronized keyword in this way to protect access to shared data, leaving the rest of the operations out of this block and obtaining better performance for the application. The objective is to have the critical section (the block of code that can be accessed only by one thread at a time) as short as possible. Also, avoid calling blocking operations (for example, I/O operations) inside a critical section. In the preceding building sketch, the synchronized keyword protects access to the instruction that updates the number of persons in the building, leaving the long operations that don't use shared data out of the block. When you use the synchronized keyword in this way, you must pass an object reference as a parameter. Only one thread can access the synchronized code (blocks or methods) of this object. Normally, we will use the this keyword to reference the object that is executing the method: synchronized (this) { // Java code } To summarize, we learnt to use the synchronized keyword to implement synchronization in multithreaded Java code. You read an excerpt from the book Java 9 Concurrency Cookbook - Second Edition. This book will help you master the art of fast, effective Java development with the power of concurrent and parallel programming. Concurrency programming 101: Why do programmers hang by a thread? How to create multithreaded applications in Qt Getting Inside a C++ Multithreaded Application

Why Guido van Rossum quit as the Python chief (BDFL)

Amey Varangaonkar
20 Jul 2018
7 min read
It was the proverbial 'end of an era' for Python as Guido van Rossum stepped down as the Python chief, almost 3 decades since he created the programming language. It came as a shock to many Python users, and left a few bewildered. Many core developers thought this day might come, but they didn't expect it to come so soon. However, looking at the post that Guido shared with the community, does this decision really come as a surprise? In this article, we dive deep into the possibilities and the circumstances that could've played a major role in van Rossum's resignation. *Disclaimer: The views presented in this article are based purely on our research. They are not to be considered as inputs directly received from the Python community or Guido van Rossum himself. What can we make of Guido's post? I'm pretty sure you've already read the mailing list post that Guido shared with the community last week. Aptly titled 'Transfer of Power', the mail straightaway begins on a negative note: "Now that PEP 572 is done, I don't ever want to have to fight so hard for a PEP and find that so many people despise my decisions." Some way to start a mail. The anger, disappointment and tiredness are quite evident. Guido goes on to state that he would be removing himself from all the decision-making processes and will be available only for a while as a core developer and a mentor. From the tone of the mail, the three main reasons for his departure can be figured out quite easily: Guido felt there were questions around his decision-making and overall administration capabilities. The backlash on PEP 572 is a testament to this. van Rossum is 62 now. Maybe the stress of leading this project for close to 30 years has finally taken a toll on his health, as he wryly talked about the piling medical issues. This is also quite evident from the last sentence of his mail: "I'm tired, and need a very long break" Guido thinks this is the right time for the baton to be passed over to the other core committers. He leaves everything for the core developers to figure out - from finalizing the PEPs (Python Enhancement Proposals) to deciding how the new core developers are inducted. Understanding the backlash behind PEP 572 For a mature language such as Python, you'd think there wouldn't be much left to get excited about. However, a proposal to add a new feature to Python - PEP 572 - has caused a furore in the Python community in the last few months. What PEP 572 is all about The idea behind PEP 572 is quite simple - to allow assignment to variables within expressions. To make things simpler, consider the following lines of code in Python: a = b - this is a simple assignment statement, while: a == b - this is a test for equality With PEP 572 comes a brand new operator := which is available in some other programming languages, and is an equivalent of an in-expression assignment. So the way you would use this operator would be: while a := b.read(10): print(a) Looks like a simple statement, isn't it? It keeps reading 10 bytes at a time from b into a, and printing a, until the read returns nothing. So what's all the hue and cry about? In principle, the := operator both assigns the value of an expression and returns it to the surrounding code, almost as if no assignment ever happened. This can get really tricky when complex expressions are involved. Ideally, an expression assignment is useful when one needs to retain the result of an expression while it is being used for some other purpose. The use of := beyond such cases goes against this best practice, and has therefore led to many disagreements.
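For completeness, here is a small sketch (ours, using the final Python 3.8 syntax rather than a snippet from the PEP) of the case where an assignment expression genuinely helps - reusing a match object without a separate assignment statement:

import re

line = "error: disk full"
if (match := re.search(r"error: (.+)", line)) is not None:
    # The match object is bound and tested in a single expression.
    print(match.group(1))  # disk full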
The community response to PEP 572 Many Python users thought PEP 572 was a bad idea for the reasons mentioned above, and they did not hide their feelings regarding this either. In fact, some of the comments were quite brutal: Even some of the core developers were unhappy with this proposal, saying it did not fit the fundamental Python best practice, i.e. the preference for simplicity over complexity. This practice is a part of PEP 20, titled 'The Zen of Python'. As the Python BDFL, van Rossum personally signed off each PEP. This is in stark contrast to how other programming languages such as PHP finalize their proposals, i.e., by voting on them. On the PEP 572 objections, Guido's response befitted that of a BDFL perfectly: Some developers still disagreed with this proposal, believing that it deviated from the standard best practices and rather reflected van Rossum's preferred style of coding. So much so that van Rossum had to ask the committers to give him time to respond to the queries. Eventually PEP 572 was accepted by Guido van Rossum, as he settled the matter with the following note: Thank you all. I will accept the PEP as is. I am happy to accept *clarification* updates to the PEP if people care to submit them as PRs to the peps repo, and that could even (to some extent) include summaries of discussion we've had, or outright rejected ideas. But even without any of those I think the PEP is very clear so I will not wait very long (maybe a week). Normally, in the case of some other language, such an argument could have gone on forever, with both sides reluctant to give in. The progress of the language would be stuck in limbo as a result of this polarity. With Guido gone now, one cannot help but wonder if this is going to be the case with Python going forward. Could van Rossum have been pressurized less if he had adopted a consensus-based voting system to sign proposals off too? And if that was the case, would the proposal still have gone through an opposing majority of core developers? "Tired of the hatred" It would be wrong to say that the BDFL quit mainly because of how working on PEP 572 left a bitter taste in his mouth. However, it is fair to say that the negativity surrounding PEP 572 must've pushed van Rossum off the ledge finally. The fact that he thinks stepping down from his role as Python chief would mean people would not 'despise his decisions' must've played a major role in his announcement. Guido's decision to quit was rather an inevitable outcome of a series of past bad experiences accrued over the years, with backlashes over his decisions on Python's direction. Leading one of the most successful and long-running open source projects in the world is no joke, and it brings more than its fair share of burden to carry. In many ways, CEOs of big tech companies have it easier. For starters, they have a lot of funding and they mainly worry about how to make their shareholders happy (make more money). More importantly, they aren't directly exposed to the end users the way open source leaders are, for every decision they make. What's next for Guido? Guido van Rossum isn't going away for good. His mail states that he will still be around as a core dev, and as a mentor to other budding developers for some time. He says he just wants to move away from the leadership role, away from all the responsibilities that once made him the BDFL.
His tweet corroborates this: https://twitter.com/gvanrossum/status/1017546023227424768?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet Call him a dictator if you will, but his contributions to Python cannot be taken away. From being a beginner's coding language to being used in enterprise applications, Python's rise under van Rossum as one of the most popular and versatile programming languages in the world has been incredible. Perhaps the time was right for the sun to set, and the PEP 572 scenario and the circumstances surrounding it might just have given Guido the platform to ride away into the sunset. Read more Python founder resigns – Guido van Rossum goes 'on a permanent vacation from being BDFL' Top 7 Python programming books you need to read Python, Tensorflow, Excel and more – Data professionals reveal their top tools

Will Rust Replace C++?

Aaron Lazar
26 Jul 2018
6 min read
This question has been asked several times, showing that developers like yourself want to know whether Rust will replace the good old, painfully difficult to program, C++. Let's find out, shall we? Going with the trends If I compare Rust vs C++ on Google Trends, this is what I get. C++ beats Rust to death. Each of C++'s troughs is like a dagger piercing through Rust, pinning it down to the floor! C++ seems to have its own ups and downs, but it's maintaining a pretty steady trend over the past 5 years. Now if I knock C++ out of the way, this is what I get, That's a pretty interesting trend there! I'd guess it's about a 25 degree slope there. Never once has Rust seen a major dip in its gradual rise to fame. But what's making it grow that well? What Developers Love and Why Okay, if you're in a mood for funsies, try this out at your workplace: Assemble your team members in a room and then tell them there's a huge project coming up. Tell them that the requirements state that it's to be developed in Rust. You might find 78.9% of them beaming! Give it a few moments, then say you're sorry and that you actually meant C++. Watch those smiles go right out the window! ;) You might wonder why I used the very odd percentage, 78.9%. Well, that's just the percentage of developers who love Rust, as per the 2018 StackOverflow survey. Now this isn't something that happened overnight, as Rust topped the charts even in 2017, with 73.1% of respondents loving the language. You want me to talk about C++ too? Okay, if you insist, where is it? Ahhhhh… there it is!!! C++ coming up at 4th place…. from the bottom! So why this great love for Rust and this not so great love for C++? C++ is a great language, you get awesome performance, you can build super fast applications with its rich function library. You can build a wide variety of applications from GUI apps to 3D graphics, games, desktop apps, as well as hard core computer vision applications. On the other hand, Rust is pretty fast too. It can be used just about anywhere C++ can be used. It has a superb community and most of all, it's memory safe! Rust's concurrency capabilities have often been hailed as being superior to C++'s, and developers all around are eager to get their hands on Rust for this feature! Wondering how I know? I have access to a dashboard that puts a smile on my face, every time I check the sales of Hands-On Concurrency with Rust! ;) You should get the book too, you know. Coming back to our discussion, Rust's build tool and dependency manager, Cargo, is a breeze to work with. Why Rust is a winner When compared with C++, the main advantage of using Rust is safety. C++ doesn't protect its own abstractions, and so, doesn't allow programmers to protect theirs either. Rust, on the other hand, does both. If you make a mistake in C++, your program can technically have no meaning (undefined behavior), which can result in arbitrary runtime behavior. Unlike C++, Rust protects you from such dangers, so you can instead concentrate on solving problems. If you're already a C++ programmer, Rust will allow you to be more effective, while allowing those with little to no low level programming experience to create things they might not have been capable of doing before. Mozilla was very wise in creating Rust, and the reason behind it was that they wanted web developers to have a practical and efficient language at hand, should they need to write low level code. Kudos to Mozilla!
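To make the safety argument concrete, here's a small sketch of ours (not from the article): ownership moves into a spawned thread, so the compiler itself rules out a data race on the vector.

fn main() {
    let data = vec![1, 2, 3];
    // `move` transfers ownership of `data` into the new thread; after this
    // point, the main thread can no longer touch `data`, so no data race
    // on it is even possible.
    let handle = std::thread::spawn(move || {
        let sum: i32 = data.iter().sum();
        println!("sum inside thread: {}", sum);
    });
    handle.join().unwrap();
}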
Now back to the question - Will Rust replace C++? Should C++ really worry about Rust replacing it someday? Honestly speaking, I think it has a pretty good shot at replacing C++. Rust is much better in several aspects, like memory safety and concurrency, and it lets you think more carefully about memory usage and pointers. Rust will make you a better and more efficient programmer. The transition is already happening in various fields. In game development, for example, the AAA game studio Ready At Dawn Studios is switching entirely to Rust, after many years of using C++. That's a pretty huge step, given the amount of rework and workarounds there might be to figure out. But if you look at the conversations on Twitter, the Rust team is delighted at this move and is willing to offer any kind of support if need be. Don't you just want to give the Rust team a massive bear hug? IoT is another booming field, where Rust is finding rapid adoption. Hardware makers like Tessel provide support for Rust already. In terms of security, Microsoft created an open source repo on GitHub for an IoT Edge Security Daemon, written entirely in Rust. Rust seems to be doing pretty well in the GUI department too, with tools like Piston. In fact, you might also find Rust being used along with the popular GUI framework, Qt. All this shows that Rust is seriously growing in adoption. While I say it might eventually be the next C++, it's probably going to take years for that to happen. This is mainly because entire ecosystems are built on C++ and they will continue to be. Today there are many dead programming languages whose applications still live on and breed newer generations of developers. (I'm looking at you, COBOL!) In this world of Polyglotism, if that's even a word, the bigger question we should be asking is how much we will benefit if both C++ and Rust are implemented together. There is definitely a strong case for C++ developers to learn Rust. The question then really is: Do you want to be a programmer working in mature industries and projects or do you want to be a code developer working at the cutting edge of technological progress? I'll flip the original question and pose it to you: Will you replace C++ with Rust? Perform Advanced Programming with Rust Learn a Framework; forget the language! Firefox 61 builds on Firefox Quantum, adds Tab Warming, WebExtensions, and TLS 1.3

How to dockerize an ASP.NET Core application

Aaron Lazar
27 Apr 2018
5 min read
There are many reasons why you might want to dockerize an ASP.NET Core application. But ultimately, it's simply going to make life much easier for you. It's great for isolating components, especially if you're building microservices or planning to deploy your application on the cloud. So, if you want an easier life (possibly), follow this tutorial to learn how to dockerize an ASP.NET Core application. Get started: Dockerize an ASP.NET Core application Create a new ASP.NET Core Web Application in Visual Studio 2017 and click OK: On the next screen, select Web Application (Model-View-Controller) or any type you like, while ensuring that ASP.NET Core 2.0 is selected from the drop-down list. Then check the Enable Docker Support checkbox. This will enable the OS drop-down list. Select Windows here and then click on the OK button: If you see the following message, you need to switch to Windows containers. This is because you have probably kept the default container setting for Docker as Linux: If you right-click on the Docker icon in the taskbar, you will see that you have an option to enable Windows containers there too. You can switch to Windows containers from the Docker icon in the taskbar by clicking on the Switch to Windows containers option: Switching to Windows containers may take several minutes to complete, depending on your line speed and the hardware configuration of your PC. If, however, you don't click on this option, Visual Studio will ask you to change to Windows containers when selecting the OS platform as Windows. There is a good reason that I am choosing Windows containers as the target OS. This reason will become clear later on in the chapter when working with Docker Hub and automated builds. After your ASP.NET Core application is created, you will see the following project setup in Solution Explorer: The Docker support that is added to Visual Studio comes not only in the form of the Dockerfile, but also in the form of the Docker configuration information. This information is contained in the global docker-compose.yml file at the solution level: Clicking on the Dockerfile in Solution Explorer, you will see that it doesn't look complicated at all. Remember, the Dockerfile is the file that creates your image. The image is a read-only template that outlines how to create a Docker container. The Dockerfile, therefore, contains the steps needed to generate the image and run it. The instructions in the Dockerfile create layers in the image. This means that if anything changes in the Dockerfile, only the layers that have changed will be rebuilt when the image is rebuilt. The Dockerfile looks as follows: FROM microsoft/aspnetcore:2.0-nanoserver-1709 AS base WORKDIR /app EXPOSE 80 FROM microsoft/aspnetcore-build:2.0-nanoserver-1709 AS build WORKDIR /src COPY *.sln ./ COPY DockerApp/DockerApp.csproj DockerApp/ RUN dotnet restore COPY . . WORKDIR /src/DockerApp RUN dotnet build -c Release -o /app FROM build AS publish RUN dotnet publish -c Release -o /app FROM base AS final WORKDIR /app COPY --from=publish /app . ENTRYPOINT ["dotnet", "DockerApp.dll"]
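Visual Studio drives the image build for us, but the same Dockerfile can also be built and run by hand. A minimal sketch (the image and container names are our own choices, and the Docker host must already be switched to Windows containers as described above):

# Build the image from the solution root, where the Dockerfile lives.
docker build -t dockerapp:dev .

# Run the container in the background.
docker run -d --name dockerapp dockerapp:dev

# Windows containers are reached via the container IP rather than localhost;
# this prints the IP on the default "nat" network.
docker inspect -f "{{ .NetworkSettings.Networks.nat.IPAddress }}" dockerapp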
When you have a look at the menu in Visual Studio 2017, you will notice that the Run button has been changed to Docker: Clicking on the Docker button to debug your ASP.NET Core application, you will notice that there are a few things popping up in the Output window. Of particular interest is the IP address at the end. In my case, it reads Launching http://172.24.12.112 (yours will differ): When the browser is launched, you will see that the ASP.NET Core application is running at the IP address listed previously in the Output window. Your ASP.NET Core application is now running inside a Windows Docker container: This is great and really easy to get started with. But what do you need to do to dockerize an ASP.NET Core application that already exists? As it turns out, this isn't as difficult as you may think. How to add Docker support to an existing .NET Core application Imagine that you have an ASP.NET Core application without Docker support. To add Docker support to this existing application, simply add it from the context menu: Right-click on your project in Solution Explorer Click on the Add menu item Click on Docker Support in the fly-out menu: Visual Studio 2017 now asks you what the target OS is going to be. In our case, we are going to target Windows: After clicking on the OK button, Visual Studio 2017 will begin to add the Docker support to your project: It's actually extremely easy to create ASP.NET Core applications that have Docker support baked in, and even easier to add Docker support to existing ASP.NET Core applications. Lastly, if you experience any issues, such as file access issues, ensure that your antivirus software has excluded your Dockerfile from scanning. Also, make sure that you run Visual Studio as Administrator. This tutorial has been taken from C# 7 and .NET Core Blueprints. More Docker tutorials Building Docker images using Dockerfiles How to install Keras on Docker and Cloud ML

Using Gerrit with GitHub

Packt
04 Sep 2013
14 min read
In this article by Luca Milanesio, author of the book Learning Gerrit Code Review, we will learn about Gerrit Code Review. GitHub is the world's largest platform for the free hosting of Git projects, with over 4.5 million registered developers. We will now provide a step-by-step example of how to connect Gerrit to an external GitHub server so as to share the same set of repositories. Additionally, we will provide guidance on how to use the Gerrit Code Review workflow and GitHub concurrently. By the end of this article we will have our Gerrit installation fully integrated and ready to be used for both open source public projects and private projects on GitHub. GitHub workflow GitHub has become the most popular website for open source projects, thanks to the migration of some major projects to Git (for example, Eclipse) and new projects adopting it, along with the introduction of a social aspect to software projects that piggybacks on the Facebook hype. The following diagram shows the GitHub collaboration model: The key aspects of the GitHub workflow are as follows: Each developer pushes to their own repository and pulls from others Developers who want to make a change to another repository create a fork on GitHub and work on their own clone When forked repositories are ready to be merged, pull requests are sent to the original repository maintainer The pull requests include all of the proposed changes and their associated discussion threads Whenever a pull request is accepted, the change is merged by the maintainer and pushed to their repository on GitHub GitHub controversy The preceding workflow works very effectively for most open source projects; however, when projects get bigger and more complex, the tools provided by GitHub are too unstructured, and a more defined review process with proper tools, additional security, and governance is needed. In May 2012 Linus Torvalds, the creator of the Git version control system, openly criticized GitHub as a commit editing tool directly on a pull request discussion thread: "I consider GitHub useless for these kinds of things. It's fine for hosting, but the pull requests and the online commit editing, are just pure garbage" and additionally, "the way you can clone a (code repository), make changes on the web, and write total crap commit messages, without GitHub in any way making sure that the end result looks good." See https://github.com/torvalds/linux/pull/17#issuecomment-5654674. Gerrit provides the additional value that Linus Torvalds claimed was missing in the GitHub workflow: Gerrit and GitHub together allow the open source development community to reuse the extended hosting reach and social integration of GitHub with the power of governance of the Gerrit review engine. GitHub authentication The list of authentication backends supported by Gerrit does not include GitHub, and it cannot be used out of the box, as GitHub does not support OpenID authentication. However, a GitHub plugin for Gerrit has been recently released in order to fill the gap and allow a seamless integration. GitHub implements OAuth 2.0 for allowing external applications, such as Gerrit, to integrate using a three-step browser-based authentication. Using this scheme, a user can leverage their existing GitHub account without the need to provision and manage a separate one in Gerrit. Additionally, the Gerrit instance will be able to self-provision the SSH public keys needed for pushing changes for review.
In order for us to use GitHub OAuth authentication with Gerrit, we need to do the following:

1. Build the Gerrit GitHub plugin.
2. Install the GitHub OAuth filter into the Gerrit libraries (/lib under the Gerrit site directory).
3. Reconfigure Gerrit to use the HTTP authentication type.

Building the GitHub plugin

The Gerrit GitHub plugin can be found under the Gerrit plugins/github repository on https://gerrit-review.googlesource.com/#/admin/projects/plugins/github. It is open source under the Apache 2.0 license and can be cloned and built using the Java 6 JDK and Maven. Refer to the following example:

$ git clone https://gerrit.googlesource.com/plugins/github
$ cd github
$ mvn install
[…]
[INFO] BUILD SUCCESS
[INFO] -------------------------------------------------------
[INFO] Total time: 9.591s
[INFO] Finished at: Wed Jun 19 18:38:44 BST 2013
[INFO] Final Memory: 12M/145M
[INFO] -------------------------------------------------------

The Maven build should generate the following artifacts:

github-oauth/target/github-oauth*.jar, the GitHub OAuth library for authenticating Gerrit users
github-plugin/target/github-plugin*.jar, the Gerrit plugin for integrating with GitHub repositories and pull requests

Installing GitHub OAuth library

The GitHub OAuth JAR file needs to be copied to the Gerrit /lib directory; this is required to allow Gerrit to use it for filtering all HTTP requests and enforcing the GitHub three-step authentication process:

$ cp github-oauth/target/github-oauth-*.jar /opt/gerrit/lib/

Installing GitHub plugin

The GitHub plugin includes the additional support for the overall configuration, the advanced GitHub repositories replication, and the integration of pull requests into the Code Review process. We now need to install the plugin before running the Gerrit init again, so that we can benefit from the simplified automatic configuration steps:

$ cp github-plugin/target/github-plugin-*.jar /opt/gerrit/plugins/github.jar

Register Gerrit as a GitHub OAuth application

Before going through the Gerrit init, we need to tell GitHub to trust Gerrit as a partner application. This is done through the generation of a ClientId/ClientSecret pair associated with the exact Gerrit URLs that will be used for initiating the three-step OAuth authentication. We can register a new application in GitHub through the URL https://github.com/settings/applications/new, where the following three fields are requested:

Application name: the logical name of the application authorized to access GitHub, for example, Gerrit.
Main URL: the Gerrit canonical web URL used for redirecting to the GitHub OAuth authentication, for example, https://myhost.mydomain:8443.
Callback URL: the URL that GitHub should redirect to when the OAuth authentication is successfully completed, for example, https://myhost.mydomain:8443/oauth.

GitHub will automatically generate a unique ClientId/ClientSecret pair that has to be provided to Gerrit, identifying it as a trusted authentication partner. The ClientId/ClientSecret are not GitHub credentials and cannot be used by an interactive user to access any GitHub data or information; they are only used for authorizing the integration between a Gerrit instance and GitHub.

Running Gerrit init to configure GitHub OAuth

We now need to stop Gerrit and go through the init steps again in order to reconfigure the Gerrit authentication. We need to enable HTTP authentication by choosing an HTTP header to be used to verify the user's credentials, and to go through the GitHub settings wizard to configure the OAuth authentication:

$ /opt/gerrit/bin/gerrit.sh stop
Stopping Gerrit Code Review: OK
$ cd /opt/gerrit
$ java -jar gerrit.war init
[...]
*** User Authentication ***
Authentication method []: HTTP
Get username from custom HTTP header [Y/n]? Y
Username HTTP header []: GITHUB_USER
SSO logout URL : /oauth/reset
*** GitHub Integration ***
GitHub URL [https://github.com]:
Use GitHub for Gerrit login? [Y/n]? Y
ClientId []: 384cbe2e8d98192f9799
ClientSecret []: f82c3f9b3802666f2adcc4
Initialized /opt/gerrit
$ /opt/gerrit/bin/gerrit.sh start
Starting Gerrit Code Review: OK
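For reference, the answers recorded by the wizard end up in Gerrit's configuration files under /opt/gerrit/etc. The following is an illustrative sketch of what the relevant gerrit.config sections might look like after init; the exact key names of the [github] section depend on the plugin version, and the ClientSecret is normally kept in the separate secure.config file rather than here:

[auth]
        type = HTTP
        httpHeader = GITHUB_USER
        logoutUrl = /oauth/reset
[github]
        url = https://github.com
        clientId = 384cbe2e8d98192f9799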
Using GitHub login for Gerrit

Gerrit is now fully configured to register and authenticate users through GitHub OAuth. When opening the browser to access any Gerrit web page, we are automatically redirected to GitHub for login. If we have already visited and authenticated with GitHub previously, the browser cookie will be automatically recognized and used for the authentication, instead of presenting the GitHub login page. Alternatively, if we do not yet have a GitHub account, we can create a new GitHub profile by clicking on the SignUp button.

Once the authentication process is successfully completed, GitHub requests the user's authorization to grant access to their public profile information. The following screenshot shows the GitHub OAuth authorization for Gerrit:

The authorization status is then stored under the user's GitHub application preferences on https://github.com/settings/applications. Finally, GitHub redirects back to Gerrit, propagating the user's profile securely using a one-time code, which is used to retrieve the full data profile including username, full name, e-mail, and the associated SSH public keys.

Replication to GitHub

The next step in the Gerrit to GitHub integration is to share the same Git repositories and then keep them up-to-date; this can easily be achieved by using the Gerrit replication plugin. The standard Gerrit replication is master-slave, where Gerrit always plays the role of the master node and pushes to remote slaves. We will refer to this scheme as push replication, because the actual control of the action is given to Gerrit through a git push operation of new commits and branches.

Configure Gerrit replication plugin

In order to configure push replication we need to enable the Gerrit replication plugin through Gerrit init:

$ /opt/gerrit/bin/gerrit.sh stop
Stopping Gerrit Code Review: OK
$ cd /opt/gerrit
$ java -jar gerrit.war init
[...]
*** Plugins ***
Prompt to install core plugins [y/N]? y
Install plugin reviewnotes version 2.7-rc4 [y/N]?
Install plugin commit-message-length-validator version 2.7-rc4 [y/N]?
Install plugin replication version 2.6-rc3 [y/N]? y
Initialized /opt/gerrit
$ /opt/gerrit/bin/gerrit.sh start
Starting Gerrit Code Review: OK
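At this point it can be worth double-checking that both the github and replication plugins have actually been loaded. A quick sketch using Gerrit's SSH plugin administration command (assuming an administrator account and the default SSH port 29418; the output shown is only illustrative):

$ ssh -p 29418 admin@myhost.mydomain gerrit plugin ls
Name                 Version    Status   File
---------------------------------------------------------
github               1.0        ENABLED  github.jar
replication          2.6-rc3    ENABLED  replication.jar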
The Gerrit replication plugin relies on the replication.config file under the /opt/gerrit/etc directory to identify the list of target Git repositories to push to. The configuration syntax is a standard .ini format, where each group section represents a target replica slave. See the following simplest replication.config script for replicating to GitHub:

[remote "github"]
url = git@github.com:myorganisation/${name}.git

The preceding configuration enables all of the repositories in Gerrit to be replicated to GitHub under the myorganisation GitHub team account.

Authorizing Gerrit to push to GitHub

Now that Gerrit knows where to push, we need GitHub to authorize the write operations to its repositories. To do so, we need to upload the SSH public key of the underlying OS user where Gerrit is running to one of the accounts in the GitHub myorganisation team, with the permissions to push to any of the GitHub repositories. Assuming that Gerrit runs under the OS user gerrit, we can copy and paste the SSH public key value from ~gerrit/.ssh/id_rsa.pub (or ~gerrit/.ssh/id_dsa.pub) into the Add an SSH Key section of the GitHub account at https://github.com/settings/ssh.

Start working with Gerrit replication

Everything is now ready to start playing with Gerrit to GitHub replication. Whenever a change to a repository is made on Gerrit, it will be automatically replicated to the corresponding GitHub repository.

In reality there is one additional operation that is needed on the GitHub side: the actual creation of the empty repositories, using https://github.com/new, associated with the ones created in Gerrit. We need to make sure that we select an organization name and repository name consistent with the ones defined in Gerrit and in the replication.config file (this step can also be scripted, as shown in the sketch below). Never initialize the repository from GitHub with an empty commit or readme file; otherwise the first replication attempt from Gerrit will result in a conflict and will then fail.
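Creating many empty repositories by hand can be tedious, so the same can be done against the GitHub v3 API. The following hedged sketch creates an uninitialized repository under the organization; PERSONAL_TOKEN is a placeholder for a GitHub token with sufficient organization rights, and myrepository must match the project name defined in Gerrit. Leaving auto_init disabled honors the warning above about not initializing the repository:

$ curl -H "Authorization: token PERSONAL_TOKEN" \
    -d '{"name": "myrepository", "auto_init": false}' \
    https://api.github.com/orgs/myorganisation/repos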
Now GitHub and Gerrit are fully connected, and whenever a repository in GitHub matches one of the repositories in Gerrit, it will be linked and synchronized with the latest set of commits pushed to Gerrit. Thanks to the Gerrit-GitHub authentication previously configured, Gerrit and GitHub share the same set of users, and the commit authors will be automatically recognized and formatted by GitHub. The following screenshot shows Gerrit commits replicated to GitHub:

Reviewing and merging to GitHub branches

The final goal of the Code Review process is to agree on and merge changes to their branches. The merging strategies need to be aligned with the real-life scenarios that may arise when using Gerrit and GitHub concurrently. During the Code Review process, the alignment between Gerrit and GitHub is at the change level, not influenced by the evolution of their target branches: Gerrit changes and GitHub pull requests are isolated branches managed by their review lifecycle. When a change is merged, it needs to align with the latest status of its target branch using a fast-forward, merge, rebase, or cherry-pick strategy.

Using the standard Gerrit merge functionality, we can apply the configured project merge strategy to the current status of the target branch on Gerrit. The situation on GitHub may have changed as well, so even if the Gerrit merge has succeeded, there is no guarantee that the subsequent synchronization to GitHub will do the same! The GitHub plugin mitigates this risk by implementing a two-phase submit + merge operation for merging open changes, as follows:

Phase 1: The change's target branch is checked against its remote peer on GitHub and fast-forwarded if needed. If the two branches diverge, the submit + merge is aborted and manual merge intervention is requested.
Phase 2: The change is merged on its target branch in Gerrit and an additional ad hoc replication is triggered. If the merge succeeds, the GitHub pull request is marked as completed.

At the end of Phase 2 the Gerrit and GitHub statuses will be completely aligned. The pull request author will then receive the notification that their commit has been merged.

Using Gerrit and GitHub on http://gerrithub.io

When using Gerrit and GitHub on the web with public or private repositories, all of the commits are replicated from Gerrit to GitHub, and each one of them has a complete copy of the data. If we are using a Git and collaboration server on GitHub over the Internet, why can't we do the same for its Gerrit counterpart? Can we avoid installing a standalone instance of Gerrit just for the purpose of going through a formal Code Review?

One hassle-free solution is to use the GerritHub service (http://gerrithub.io), which offers a free Gerrit instance on the cloud, already configured and connected to GitHub through the github-plugin and the github-oauth authentication library. All of the flows that we have covered in this article are completely automated, including the replication and the automatic conversion of pull requests into changes. As accounts are shared with GitHub, we do not need to register or create another account to use GerritHub; we can just visit http://gerrithub.io and start using Gerrit Code Review with our existing GitHub projects, without having to teach our existing community about a new tool. GerritHub also includes an initial setup wizard for the configuration and automation of the Gerrit projects, and the option to configure the Gerrit groups using the existing GitHub organization memberships. Once Gerrit is configured, the Code Review and GitHub can be used seamlessly for achieving maximum control and social reach within your developer community.

Summary

We have now integrated our Gerrit installation with GitHub authentication for a seamless single sign-on experience. Using an existing GitHub account, we started using Gerrit replication to automatically mirror all the commits to GitHub repositories, allowing our projects to have an extended reach to external users, who are free to fork our repositories and to contribute changes as pull requests.

Finally, we have completed our Code Review in Gerrit and managed the merge to GitHub with a two-phase change submit + merge process, to ensure that the target branches on both Gerrit and GitHub have been merged and aligned accordingly. Similarly to GitHub, this Gerrit setup can be leveraged for free on the web without having to manage a separate private instance, thanks to the free http://gerrithub.io service available on the cloud.