Abstract Many science and engineering applications involve solving a linear least-squares system formed from field measurements. In distributed cyber-physical systems (CPS), each sensor node used for measurement often knows only a subset of the independent rows of the least-squares system. To solve the least-squares problem, all the measurements must be gathered at a centralized location, where the computation is then performed. Such data collection and computation are inefficient because of bandwidth and time constraints, and are sometimes infeasible because of data privacy concerns. Yet distributed computation is strongly preferred or even demanded in many real-world applications, e.g., smart-grid monitoring and target tracking. Iterative methods are natural candidates for solving the aforementioned problem, and there are many studies on them; however, most of the proposed solutions concern centralized/parallel computation, and only a few have the potential to be applied in distributed networks. This paper surveys representative iterative methods for distributed least-squares in networks.
Keywords distributed computing; iterative methods; least-squares; mesh network
1 Introduction
Many physical phenomena can be described by partial differential equations [1], which form large sparse systems of linear equations after discretization. Problems such as state estimation, target tracking, and tomography inversion are often formulated as a large-scale linear system based on field measurements. Those measurements may contain errors, so extra measurements are usually sampled to form an over-determined linear system:
$$A x = b, \qquad (1)$$
where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^{n}$, and $b \in \mathbb{R}^{m}$ with $m > n$. The extra information can smooth out the errors but produces an over-determined system that usually has no exact solution. The method of least-squares is the common remedy and can be defined as
$$\hat{x} = \arg\min_{x} \| A x - b \|_{2}^{2}. \qquad (2)$$
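As a concrete illustration (with arbitrary sizes and noise level), the following Python snippet builds a small over-determined system from synthetic "measurements" and solves (2) in the ordinary centralized way; the distributed algorithms surveyed later aim to reach the same solution without gathering $A$ and $b$ in one place.

```python
import numpy as np

# Hypothetical sizes: m noisy measurements of an n-dimensional unknown, m > n.
m, n = 200, 5
rng = np.random.default_rng(0)

x_true = rng.standard_normal(n)                  # unknown parameters
A = rng.standard_normal((m, n))                  # measurement (coefficient) matrix
b = A @ x_true + 0.01 * rng.standard_normal(m)   # noisy measurements

# Centralized least-squares solution of (2): minimize ||Ax - b||_2
x_hat, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print("estimation error:", np.linalg.norm(x_hat - x_true))
```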
The coefficient matrix A is often built from data obtained from sensors observing a physical phenomenon, as in a cyber-physical system. Each sensor (node) observes the phenomenon only partially due to spatial and temporal restrictions and thus forms only a subset of the rows of the least-squares system. Large-scale cyber-physical systems are often built on a mesh network, which could be a wired, wireless, or wired-wireless hybrid multi-hop network. For instance, the problems of target tracking, seismic tomography, and smart-grid state estimation all have an inherently distributed system of linear equations. However, the least-squares methods currently used for solving these problems assume a centralized setup, where the partial row information from all the nodes is collected at a server and then solved using a centralized least-squares algorithm. In many of those cyber-physical systems, distributed computation in mesh networks is strongly demanded or preferred over the centralized approach for the following reasons (but not limited to these):
1) In some applications, such as imaging seismic tomography with the aid of a mesh network, real-time retrieval of data from a large-scale seismic mesh network into a central server is virtually impossible due to the sheer amount of data and resource limitations. Distributed computation can process data inside the network in real time, which reduces the bandwidth demand and spreads the communication and computation load over the nodes of the network.
2) The mesh network may be disruption-prone in the real world, and data collection with centralized computation may suffer from node failures or link disruptions. These become a bottleneck especially when a failure or disruption happens close to the sink node, which leads to the loss of a high volume of raw data. With distributed computation, the remaining nodes in the network can still finish the computation and obtain approximate results.
3) In smart-grid state estimation, data collection for centralized computation may even be infeasible due to data privacy concerns or inter-agency policy constraints.
4) In applications that need real-time control, distributed computation also has an advantage over centralized schemes, since some decisions can be made locally in real time. Current computational devices such as smartphones enable in-network computing and make distributed computation over a mesh network practical.
Iterative methods are natural candidates for large sparse systems and especially for the distributed computation of least-squares. Although there are many studies on iterative least-squares, most of them are concerned with the efficiency of centralized/parallel computation, and only a few explicitly address distributed computation or have the potential to be applied to mesh networks. In mesh networks, where computers communicate by passing messages over multiple hops, the key challenges are speeding up the computation and reducing the communication cost. More attention should be paid to communication than to computation cost, especially when solving a big problem in a large-scale mesh network.
In this paper, we select and survey representative iterative methods from several research communities that have the potential to be used for solving least-squares problems over mesh networks. A skeleton sketch of each algorithm is provided, and we then analyze the time-to-completion and communication cost of these algorithms and compare them. Some of the algorithms presented here were not originally designed for our setting, so we slightly modify them for consistency. The rest of the paper is organized as follows. In Section 2, we present the network model and the evaluation criteria used for comparison. In Section 3, we describe the surveyed algorithms in detail and analyze and compare their communication costs and time-to-completion. Finally, we conclude the paper in Section 4.
2 Model and Assumption
We consider a wired and/or wireless mesh network with $N$ nodes that form a connected graph, so that any node can be reached through multi-hop message relays. Without loss of generality, we assume that the diameter of the network is $D$ (i.e., any message can be sent from one node to another through at most $D$ hops). We also assume that each node has a single radio and that the link between neighboring nodes has unit bandwidth; the communication delay of one unit of data delivered between direct neighbors (either through a unicast to one direct neighbor or a multicast/broadcast to all direct neighbors) is therefore one unit of time. We note that the link layer supports broadcast, which is often true in mesh networks. If the link layer only supports unicast, the analysis can be done similarly by treating a one-hop broadcast as multiple unicasts, and we skip this straightforward extension. We also understand that link-layer communication may take more than one unit of time per unit of data due to network interference and media contention. We classify the communication patterns in a mesh network into three categories: unicast (one-hop or multi-hop), one-hop broadcast (local broadcast to all neighbors), and network flooding (broadcast to all nodes in the network). For simplicity and convenience, in the rest of the paper we use the term “broadcast” for a local broadcast to one-hop neighbors and “flood” for network flooding. We use the above assumptions on communication cost and delay to seek the fundamental limit of each surveyed algorithm in an ideal mesh network.
We assume a random communication network (which must be a connected graph) in our analysis. The influence of the communication network on performance has been studied in [2], [3]. It is known that the network connectivity ratio, defined as the number of edges divided by the number of all possible edges, affects the performance: the algorithms tend to converge faster when each node has a higher number of neighbors. A higher neighbor count means that more nodes receive the information after one transmission of a given node, which accelerates information diffusion in the network and thus helps all the nodes reach consensus faster. To conduct a fair comparison, we use the same communication network (topology) for all the benchmarks (a sketch of how such a topology might be generated is given after the criteria below). For comparison and evaluation, the following two performance criteria are considered:
·Communication cost: For solving a large least-squares system, the communication cost has a big influence on algorithm performance. We define the communication cost as the cost of the messages exchanged in the mesh network during a single iteration of an iterative method. Since iterative methods typically converge only after many iterations, the overall communication cost depends on both the cost of one iteration and the number of iterations.
·Time-to-completion: The time taken by the network to finish one iteration of the iterative method is referred to as the time-to-completion in this paper. Note that it is different from the computational time complexity and must account for the message size and the number of hops each packet traverses.
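As an aside, the following sketch shows how a random connected benchmark topology of the kind assumed above might be generated and how its connectivity ratio, degrees, and diameter can be computed. It uses the networkx package, and all graph parameters are illustrative assumptions rather than the settings used in [2], [3].

```python
import networkx as nx

def random_connected_topology(num_nodes=50, edge_prob=0.1, seed=0):
    """Draw Erdos-Renyi graphs until a connected topology is obtained."""
    while True:
        g = nx.erdos_renyi_graph(num_nodes, edge_prob, seed=seed)
        if nx.is_connected(g):
            return g
        seed += 1

g = random_connected_topology()
num_nodes = g.number_of_nodes()
num_edges = g.number_of_edges()
connectivity_ratio = num_edges / (num_nodes * (num_nodes - 1) / 2)
avg_degree = 2 * num_edges / num_nodes
max_degree = max(dict(g.degree()).values())
print("connectivity ratio:", connectivity_ratio)
print("average degree d:", avg_degree, "max degree d_max:", max_degree)
print("diameter D:", nx.diameter(g))
```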
In the time-to-completion analysis we focus on communication delays and ignore the computation time at each node. The rationale is as follows. First, the cost of communication is very high in practice: we are highly constrained by the physical bandwidth and energy consumption, in large-scale sensor networks in particular. Second, the communication stage is much more time-consuming than the computation step. Consequently, reducing the total number of communication rounds is the key to speeding up an algorithm in terms of time-to-completion.
The least-squares problem in (2), when formed over the mesh network, is inherently distributed, i.e., each node only knows part of $A$ and $b$. We assume that each node in the network holds $m/N$ consecutive rows of matrix $A$ and the corresponding part of vector $b$. For example, in Fig. 1, block $A_1$ indicates the first $m/N$ rows of matrix $A$, which are assigned to node 1 along with the corresponding part $b_1$ of the right-hand side vector. Note that the algorithms surveyed in this paper do not require $A$ and $b$ to be equally partitioned over the network; the assumption of an equal partition is only for simplicity of presentation and analysis. The distributed system then takes the form
$$A = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_N \end{bmatrix}, \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{bmatrix},$$
where $A_i$ and $b_i$ reside on node $i$.
The least-squares problem keeps the form of (2), but since there is no central coordinator that holds the entire $A$ and $b$, the computation of the optimum has to be done distributedly. As mentioned above, the communication cost becomes crucial for a distributed solution over a sensor network, and the goal of this paper is to survey distributed least-squares algorithms originating from different domains. We also compare the different algorithms under the criteria above, so that the reader can see the basic differences between them and choose the type of algorithm suitable for a given application. The notations used in this paper are described in Table 1.
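The row partition can be made concrete with a short sketch: the data are split into consecutive row blocks, one per node, and the global normal equations can be recovered by summing the local contributions $A_i^T A_i$ and $A_i^T b_i$. The node count and block sizes below are illustrative; in a real CPS each node would only ever see its own block.

```python
import numpy as np

def partition_rows(A, b, num_nodes):
    """Split (A, b) into consecutive row blocks A_i, b_i, one block per node."""
    return np.array_split(A, num_nodes, axis=0), np.array_split(b, num_nodes)

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

A_blocks, b_blocks = partition_rows(A, b, num_nodes=10)

# Each node i can form its local contributions A_i^T A_i and A_i^T b_i;
# summing these over the network recovers the global normal equations.
AtA = sum(Ai.T @ Ai for Ai in A_blocks)
Atb = sum(Ai.T @ bi for Ai, bi in zip(A_blocks, b_blocks))
x_ls = np.linalg.solve(AtA, Atb)
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])
```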
3 Survey and Analysis
Methods for solving linear least-squares problems are typically classified into two categories: direct methods and iterative methods. Direct methods are based on factorizing the coefficient matrix into easily invertible matrices, whereas iterative methods solve the system by generating a sequence of improving approximate solutions. Until recently, direct methods were often preferred over iterative methods [4] because of their robustness and predictable behavior (one can estimate the amount of time and storage required by a direct solver) [5], [6]. On the other hand, a number of iterative methods have been developed that require less memory and approach the solution quality of direct solvers [6]. The least-squares problems arising from real-world three-dimensional models can be extremely large, comprising hundreds of millions of equations and unknowns. Despite such huge dimensions, the matrices that arise are typically sparse and can be stored easily. Given the dimension and sparsity of the matrices, iterative methods become almost mandatory for solving them [7]. Moreover, iterative methods are gaining ground because they are easier to implement efficiently on high-performance computers than direct methods [6].
Methods for solving least-squares problems in distributed networks can be classified into two categories, distributed and decentralized (fully distributed) methods; we discuss both in detail in this section.
3.1 Distributed Least-Squares Methods
To achieve high computational performance, researchers have studied distributed iterative methods for solving large linear systems and linear least-squares problems [8], [9]. These works leverage both shared- and distributed-memory architectures. In this section, we only present those distributed iterative methods that can potentially be distributed over a mesh network. The Distributed Multisplitting (D-MS), Distributed Modified Conjugate Gradient Least-Squares (D-MCGLS), Distributed Component-Average Row Projection (D-CARP), Distributed Cooperative Estimation (D-CE), Distributed Least Mean Squares (D-LMS), and Distributed Recursive Least-Squares (D-RLS) methods are discussed and compared, but only D-LMS and D-RLS are analyzed in detail in this paper, since they are more relevant and more promising than the other distributed algorithms. Table 2 summarizes the communication cost and time-to-completion of the selected algorithms when running in a distributed network; the details of each algorithm and its analysis are given in the following subsections. For the least-squares problem in (2) with $A \in \mathbb{R}^{m \times n}$, we suppose that the iterative algorithm converges within $K$ iterations over a network of $N$ nodes, and we let $d$ and $d_{\max}$ denote the average and maximum node degrees of the network, respectively. The algorithms discussed in this section have been proved to converge, but the number of iterations depends strongly on the condition number of the matrix; these algorithms may need hundreds to thousands of iterations to converge over a network with hundreds of nodes for a large system. Besides, some algorithms require either flooding communication in the network in every iteration or a Hamiltonian path in the network along which the computation proceeds node by node.
3.1.1 Distributed Least Mean Squares Method
Schizas, Mateos, and Giannakis [10]-[13] introduced the D-LMS algorithm. The algorithm lets each node maintain its own local estimate and, to reach consensus, exchange that estimate only with its neighbors. The advantage of methods such as D-LMS and D-CE from signal processing is that only local information exchange is required; the drawback is that they may converge slowly in a large-scale network.
In their formulation, a wireless sensor network is deployed to estimate a signal vector $x$. At each time instant $t$, node $i$ acquires a regression vector $h_i(t)$ and a scalar observation $y_i(t)$; both are assumed to have zero mean. The observations of all nodes are stacked into one global vector $y(t)$, $H(t)$ is the regression matrix combined over the network, and the global LMS estimator is then described as
$$\hat{x} = \arg\min_{x} \; \mathbb{E}\big[\, \| y(t) - H(t)\, x \|_{2}^{2} \,\big].$$
Let $x_i$ represent the local estimate of the global variable at node $i$ (each node keeps its own estimate of the signal vector). In terms of these local variables, we consider the convex constrained minimization problem
$$\{\hat{x}_i\} = \arg\min_{\{x_i\}} \; \sum_{i=1}^{N} \mathbb{E}\big[\big( y_i(t) - h_i^{T}(t)\, x_i \big)^{2}\big] \quad \text{s.t.} \quad x_i = x_j, \; j \in \mathcal{N}_i, \; i = 1,\dots,N,$$
where $\mathcal{N}_i$ is the neighbor set of node $i$.
The equality constraints above only involve the local estimates of each node's neighbors and force an agreement among them. Since we assume that the network is connected, these constraints induce consensus across the whole network, so that eventually $x_i = x_j$ for all $i, j$. The distributed estimation problem is therefore equivalent to the original problem in the sense that their optimal solutions coincide, i.e., $\hat{x}_i = \hat{x}$ for all $i$. To construct the distributed algorithm, the authors resort to the Alternating Direction Method of Multipliers (ADMM) and obtain two recursions for updating the local estimates and the Lagrange multipliers,
where a constant step size and a local a priori error appear in the recursions, and additive communication noise is present in the reception of the neighbors' estimates and multipliers. Algorithm 1 describes the distributed least mean squares algorithm. In detail, during time instant $t$, node $i$ receives the local estimates of its neighbors and plugs them into the recursions above to evaluate its local Lagrange multipliers. Each updated multiplier is then transmitted to the corresponding neighbor. Upon reception of the multipliers, node $i$ uses them together with the newly acquired local data to obtain its updated estimate $x_i(t+1)$ via the recursions above. The $(t+1)$-st iteration is concluded after node $i$ broadcasts $x_i(t+1)$ to its neighbors.
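The exact ADMM-based recursions of D-LMS are involved; purely to illustrate the communication pattern (local exchange with one-hop neighbors followed by a local update), the sketch below implements a much simpler diffusion-style LMS step in which each node averages its neighbors' estimates and then takes a local LMS step on its newest sample. The function and parameter names are hypothetical, and this is not the algorithm of [10]-[13].

```python
import numpy as np

def diffusion_lms_step(x_local, neighbor_lists, h, y, mu=0.05):
    """One synchronous iteration of a simplified diffusion-LMS update.

    x_local        : (N, n) array; row i is node i's current estimate
    neighbor_lists : list of index lists; neighbor_lists[i] contains i and its neighbors
    h, y           : (N, n) regressors and (N,) observations for this time instant
    mu             : local step size (illustrative value)
    """
    x_next = np.empty_like(x_local)
    for i, nbrs in enumerate(neighbor_lists):
        # combine: average the estimates received from one-hop neighbors
        x_avg = x_local[nbrs].mean(axis=0)
        # adapt: local LMS (stochastic-gradient) step on the newest sample
        err = y[i] - h[i] @ x_avg
        x_next[i] = x_avg + mu * err * h[i]
    return x_next
```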
1) Communication Cost
Algorithm 1 is simple, and only steps 3 and 5 involve communication. Applied to the system $Ax = b$, the local estimate and multiplier vectors are both of length $n$ (the number of columns of $A$). In step 3, node $i$ broadcasts its estimate to all its neighbors; in step 5, it transmits a different multiplier to each neighbor (Fig. 2). If the average degree of the network is $d$, the communication cost of one node in each iteration is $O(dn)$, so the cost of the whole network per iteration is $O(Ndn)$. Suppose the algorithm converges after $K$ iterations (the number of iterations may exceed the number of time instants, in which case the samples on each node are reused cyclically); the total communication cost is then $O(KNdn)$.
2) Time-to-Completion
In step 3 of Algorithm 1, node $i$ broadcasts its estimate to all its neighbors. On the receiver side, each node must receive a different update from every one of its neighbors, and since the algorithm is synchronous, the delay of the whole network is determined by the maximum node degree; this delay is $O(d_{\max} n)$ per iteration. In step 5, node $i$ sends a different multiplier to each neighbor, so the delay is also $O(d_{\max} n)$. The total communication delay over $K$ iterations is then $O(K d_{\max} n)$.
3.1.2 Distributed Recursive Least-Squares Method
Sayed and Lopes [14] developed a distributed least-squares estimation strategy that exploits the space-time structure of the data through collaboration techniques and achieves an exact recursive solution in a fully distributed manner. This D-RLS strategy requires a cyclic path in the network so that the computation can proceed node by node. The advantage of this method is that the number of iterations is fixed (equal to the network size) for a given set of data; the drawback is that a large dense matrix needs to be exchanged between nodes. The details and analysis of the D-RLS strategy are given in this section, and Algorithm 2 shows the classic RLS procedure [15].
To distribute the exact algorithm for estimating the vector $x$ over the network of $N$ nodes, each node $i$ is given access to its own regressors and measurement data. At each time instant $t$, the network as a whole has access to the space-time data collected so far:
where the per-instant snapshot matrices reveal the network data status at time $t$. We collect all the data available up to time $t$ into global matrices $A$ and $b$:
Note that this is equivalent to solving the least-squares problem $Ax = b$ with $A$ and $b$ partitioned row-wise, where each node holds one partition of consecutive rows of $A$ and $b$. Applying the RLS algorithm here differs from the D-LMS estimator in that it yields the least-squares solution of the whole block of data. Therefore, when using the distributed RLS algorithm to solve an ordinary least-squares problem, we use $A_i$ to denote the entire row block on node $i$ rather than a single vector collected at time instant $t$ (it can be viewed as all the data collected up to time $t$).
Assuming that an incremental path is defined across the network, cycling from node 1 to node 2 and so forth until node $N$, the RLS algorithm can be rewritten as the distributed version in Algorithm 3 [14].
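A minimal sketch of this incremental pattern is given below: each node folds its own data block into the running pair $(P, w)$ using the standard block RLS update based on the matrix-inversion lemma and forwards the pair to its successor on the Hamiltonian path. The variable names and the regularized initialization are assumptions for illustration and do not follow the exact notation of [14].

```python
import numpy as np

def node_rls_update(P, w, A_i, b_i):
    """Fold node i's rows (A_i, b_i) into the running estimate (P, w).

    P : (n, n) inverse of the accumulated information matrix (plus regularization)
    w : (n,)   current estimate
    """
    m_i = A_i.shape[0]
    # Gain via the matrix-inversion lemma for a block of m_i new rows
    G = P @ A_i.T @ np.linalg.inv(np.eye(m_i) + A_i @ P @ A_i.T)
    w_new = w + G @ (b_i - A_i @ w)
    P_new = P - G @ A_i @ P
    return P_new, w_new

def incremental_rls(A_blocks, b_blocks, n, delta=1e8):
    """Pass (P, w) node by node along a Hamiltonian path of the network."""
    P, w = delta * np.eye(n), np.zeros(n)
    for A_i, b_i in zip(A_blocks, b_blocks):   # one pass over the cycle
        P, w = node_rls_update(P, w, A_i, b_i)
    return w
```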
1) Communication Cost
In the distributed RLS algorithm, communication occurs only in step 6. Each node shares with its successor on the cyclic path the quantities needed to continue the recursion, namely an $n \times n$ information matrix and a length-$n$ estimate vector. For example, node $i$ receives the message from node $i-1$ and sends the updated quantities to node $i+1$ (Fig. 3). Therefore, in each iteration only one node communicates and the cost is $O(n^2)$. Since the algorithm finishes after one cycle through the network, the total communication cost is $O(Nn^2)$. Note that this algorithm requires a Hamiltonian path, and finding such a path requires extra communication; this is a separate problem outside the scope of our analysis, so we omit that cost here.
2) Time-to-Completion
In the distributed RLS algorithm, the delay of one step is $O(n^2)$ and there are $N$ steps in total, so the total time-to-completion is $O(Nn^2)$.
3.2 Decentralized Optimization Methods
In recent years, much attention has been paid to fully distributed (decentralized) consensus optimization, especially in applications such as distributed machine learning and multi-agent optimization. Several algorithms have been proposed for minimizing general convex, (sub)differentiable objective functions. Setting the objective to a least-squares function, the decentralized least-squares problem can be seen as a special case of the model
$$\min_{x} \; \sum_{i=1}^{N} f_i(x), \qquad (11)$$
where the $N$ nodes in the network collaboratively estimate the model parameters $x$; for least squares, $f_i(x) = \|A_i x - b_i\|_2^2$. Each node locally holds its function $f_i$ and can communicate only with its immediate neighbors.
For the problem in (11), (sub)gradient-based methods have been proposed [16]-[20]. However, it has been shown that these methods can only converge to a neighborhood of an optimal solution when a fixed step size is used [21]. Modified algorithms have been developed in [16] and [17] that use diminishing step sizes in order to guarantee convergence to an exact solution; other related algorithms sharing similar ideas are discussed in [22]-[28]. The D-NC algorithm proposed in [16] was shown to have an outer-loop convergence rate of $O(1/k^2)$ in terms of the objective value error. This rate matches the optimal centralized Nesterov accelerated gradient method, whereas decentralized algorithms usually converge more slowly than their centralized counterparts. However, the number of consensus iterations within each outer loop grows significantly as the iterations proceed. Shi et al. [29] developed a correction to the mixing matrix of the Decentralized Gradient Descent (DGD) method [21] that removes the need for diminishing step sizes.
The algorithms mentioned above are based on synchronous models. Distributed optimization methods for asynchronous models have been designed in [30]-[32]; however, their convergence rates are usually slower than those of their synchronous counterparts. In what follows, we show the derivations of several decentralized optimization methods. To obtain a compact form, let $\mathbf{x} = [x_1, x_2, \dots, x_N]^{T}$, where $x_i$ is the column vector containing the local estimate of the common variable of interest at node $i$. Similarly, define $f(\mathbf{x}) = \sum_{i=1}^{N} f_i(x_i)$ and $\nabla f(\mathbf{x}) = [\nabla f_1(x_1), \dots, \nabla f_N(x_N)]^{T}$, where $\nabla f_i$ denotes the gradient of $f_i$.
Without loss of generality, we assume $p = 1$ (the size of the decision variable $x$), so that $\mathbf{x}$ is a vector of length $N$, and the aforementioned problem can be formulated as:
$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{s.t.} \quad W\mathbf{x} = \mathbf{x}. \qquad (12)$$
The constraint requires $\mathbf{x}$ to be consensual due to the property of the mixing matrix $W$ (symmetric, doubly stochastic, with the null space of $I - W$ spanned by the all-ones vector). We first derive the DGD algorithm [21] from (12). Since $W$ is symmetric, $I - W$ is symmetric positive semidefinite and admits a symmetric square root, so that $I - W = \big[(I-W)^{1/2}\big]^{2}$. Then $W\mathbf{x} = \mathbf{x}$ holds if and only if $(I-W)^{1/2}\mathbf{x} = 0$ holds. The original problem in (12) is thus equivalent to:
$$\min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{s.t.} \quad (I-W)^{1/2}\mathbf{x} = 0. \qquad (13)$$
If we use an external penalty method, an unconstrained (but inexact) reformulation can be expressed as:
$$\min_{\mathbf{x}} \; f(\mathbf{x}) + \frac{1}{2\alpha}\, \big\| (I-W)^{1/2}\mathbf{x} \big\|_{2}^{2} \;=\; f(\mathbf{x}) + \frac{1}{2\alpha}\, \mathbf{x}^{T}(I-W)\mathbf{x}. \qquad (14)$$
Clearly, the penalized problem (14) approaches the constrained problem (13) as the penalty weight grows, i.e., as $\alpha \to 0$. Applying the gradient descent method with step size $\alpha$ to (14), we obtain the following update rule:
$$\mathbf{x}^{k+1} = \mathbf{x}^{k} - \alpha\Big( \nabla f(\mathbf{x}^{k}) + \tfrac{1}{\alpha}(I-W)\mathbf{x}^{k} \Big) = W\mathbf{x}^{k} - \alpha \nabla f(\mathbf{x}^{k}). \qquad (16)$$
If we choose a constant step size $\alpha$, the iteration solves (14) only for that fixed $\alpha$ and hence does not yield the exact consensus solution. The DGD algorithm is shown in Algorithm 4.
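A minimal sketch of DGD applied to the decentralized least-squares objective $f_i(x) = \tfrac{1}{2}\|A_i x - b_i\|_2^2$ is given below; the Metropolis mixing matrix and the constant step size are illustrative choices.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric, doubly stochastic mixing matrix W from a 0/1 adjacency matrix."""
    N = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))
    return W

def dgd(A_blocks, b_blocks, W, alpha=1e-3, iters=2000):
    """DGD update (16): x^{k+1} = W x^k - alpha * grad f(x^k)."""
    N, n = len(A_blocks), A_blocks[0].shape[1]
    X = np.zeros((N, n))                           # row i holds node i's estimate
    for _ in range(iters):
        grad = np.vstack([A_blocks[i].T @ (A_blocks[i] @ X[i] - b_blocks[i])
                          for i in range(N)])
        X = W @ X - alpha * grad
    return X
```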
We also show the derivation of the Exact First-Order Algorithm (EXTRA) (Algorithm 6) [29]. EXTRA can be interpreted as applying the Alternating Direction Method of Multipliers (ADMM) [34] to the original problem in (13), whose constraint is to be satisfied exactly in the limit. To this end, an unconstrained reformulation using the augmented Lagrangian method can be written as:
$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\lambda}^{T} (I-W)^{1/2}\mathbf{x} + \frac{1}{2\alpha}\, \mathbf{x}^{T}(I-W)\mathbf{x},$$
where $\boldsymbol{\lambda}$ is the vector of Lagrange multipliers.
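For comparison, a compact sketch of the EXTRA iteration on the same least-squares objectives follows, using $\tilde{W} = (I + W)/2$ as in [29]; it is a simplified restatement for illustration, and the step-size conditions of [29] still apply.

```python
import numpy as np

def extra(A_blocks, b_blocks, W, alpha=1e-3, iters=2000):
    """EXTRA [29]: x^{k+2} = (I + W) x^{k+1} - W_tilde x^k - alpha (grad^{k+1} - grad^k),
    with W_tilde = (I + W) / 2 and f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    N, n = len(A_blocks), A_blocks[0].shape[1]
    I = np.eye(N)
    W_tilde = (I + W) / 2.0

    def grad(X):
        return np.vstack([A_blocks[i].T @ (A_blocks[i] @ X[i] - b_blocks[i])
                          for i in range(N)])

    X_prev = np.zeros((N, n))
    g_prev = grad(X_prev)
    X = W @ X_prev - alpha * g_prev                # first, DGD-like step
    for _ in range(iters):
        g = grad(X)
        X_next = (I + W) @ X - W_tilde @ X_prev - alpha * (g - g_prev)
        X_prev, X, g_prev = X, X_next, g
    return X
```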
Finally, we discuss the Fast Decentralized Gradient Descent (FDGD) method [35], which can be seen as an accelerated version of EXTRA. FDGD does not require diminishing step sizes, and the method is accelerated to reach an optimal convergence rate (of order $O(1/k^2)$) for general convex differentiable functions $f_i$. It adopts the idea of Nesterov's optimal gradient method for centralized smooth optimization [36] together with the mixing-matrix techniques used in network gossip and consensus averaging algorithms [29], [37]. In Algorithm 7, the superscript "ag" stands for "aggregated" and "md" stands for "middle". The matrix $\tilde{W}$ is a half-mixing matrix derived from $W$. At iteration $k$, each node sends its current estimate to all its immediate neighbors and receives theirs (one round of communication); the final aggregated iterate is output as the solution. Algorithm 7 is a first-order method, since only the gradients $\nabla f_i$ are required in each iteration, so the subproblem has low computational complexity. It does not need step sizes that diminish to zero yet still ensures both convergence and consensus. Moreover, with a trivial choice of the aggregation weights, Algorithm 7 reduces to a version very similar to regular decentralized gradient descent (16); with the weight sequence chosen as in [35], however, the aggregated sequence changes faster than the plain iterate sequence, which explains intuitively why the multi-step scheme in Algorithm 7 can accelerate convergence over regular DGD. The comparison of the various decentralized least-squares methods is summarized in Table 3.
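Putting the sketches together, a hypothetical experiment comparing DGD and EXTRA on the same partitioned data might look as follows; it reuses the helper functions defined in the earlier sketches, and all sizes and parameters are illustrative.

```python
import numpy as np
import networkx as nx

# Assumes random_connected_topology, metropolis_weights, partition_rows,
# dgd and extra are the sketches defined earlier in this survey.
rng = np.random.default_rng(2)
A = rng.standard_normal((500, 8))
x_true = rng.standard_normal(8)
b = A @ x_true + 0.01 * rng.standard_normal(500)

g = random_connected_topology(num_nodes=20, edge_prob=0.2)
W = metropolis_weights(nx.to_numpy_array(g))
A_blocks, b_blocks = partition_rows(A, b, num_nodes=20)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
X_dgd = dgd(A_blocks, b_blocks, W, alpha=1e-3, iters=3000)
X_extra = extra(A_blocks, b_blocks, W, alpha=1e-3, iters=3000)
print("DGD   consensus error:", np.linalg.norm(X_dgd.mean(axis=0) - x_star))
print("EXTRA consensus error:", np.linalg.norm(X_extra.mean(axis=0) - x_star))
```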
4 Conclusions
In this paper, we surveyed developments in distributed and parallel iterative methods that can potentially be applied to solve least-squares problems in a mesh network. We covered the traditional iterative methods for solving linear systems, including relaxation methods, conjugate gradient methods, and row-action methods, and selected one algorithm from each category to describe how it can be applied to least-squares problems in a mesh network. We also surveyed consensus- and diffusion-based strategies for parameter estimation in signal processing over mesh networks. Such strategies only require local communication; however, in a large-scale network they may need more iterations to reach the required accuracy and an agreement among all the nodes. Algorithm selection depends on the context of the problem and of the mesh network. We also analyzed and compared the performance of the selected representative algorithms in terms of communication cost and time-to-completion, two concerns that are critical for evaluating distributed algorithms in the context of mesh networks. Finally, we believe that a future research direction for distributed computing in mesh networks is data-loss tolerance: will an algorithm still approximate the optimal estimate well if a fraction of the packets is lost in the network? Unlike traditional parallel machines, where data delivery is usually guaranteed, in many distributed network applications preventing data loss is either very expensive (as in sensor networks), because it requires retransmissions, or subject to real-time constraints (as in the smart grid) that make retransmitted data useless.
References
[1] Å. Björck, Numerical Methods for Least Squares Problems. Philadelphia, USA: SIAM, 1996.
[2] L. Zhao, W. Z. Song, and X. Ye, “Decentralized consensus in distributed networks,” International Journal of Parallel, Emergent and Distributed Systems, vol. 20, pp. 1-20, 2016. doi: 10.1080/17445760.2016.1233552.
[3] L. Zhao, W. Z. Song, X. Ye, and Y. Gu, “Asynchronous broadcast-based decentralized learning in sensor networks,” International Journal of Parallel, Emergent and Distributed Systems, vol. 32, pp. 1-19, 2017.
[4] Y. Saad and H. A. van der Vorst, “Iterative solution of linear systems in the 20th century,” Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 1-33, 2000. doi: 10.1016/S0377-0427(00)00412-X.
[5] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices (Numerical Mathematics and Scientific Computation). Oxford, UK: Clarendon Press, 1989.
[6] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd edition. Philadelphia, USA: SIAM, 2003.
[7] C. C. Paige and M. A. Saunders, “LSQR: an algorithm for sparse linear equations and sparse least squares,” ACM Transactions on Mathematical Software, vol. 8, no. 1, pp. 43-71, Mar. 1982.
[8] M. Baboulin, “Solving large dense linear least squares problems on parallel distributed computers. Application to the Earth’s gravity field computation,” Ph.D. Thesis, Toulouse Institute of Technology, Toulouse, France, 2006.
[9] M. T. Heath, E. Ng, and B. W. Peyton, “Parallel algorithms for sparse linear systems,” SIAM Review, vol. 33, no. 3, pp. 420-460, 1991.
[10] G. Mateos and G. B. Giannakis, “Distributed recursive least-squares: stability and performance analysis,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3740-3754, 2012.
[11] G. Mateos, I. D. Schizas, and G. B. Giannakis, “Performance analysis of the consensus-based distributed LMS algorithm,” EURASIP Journal on Advances in Signal Processing, 2009. doi: 10.1155/2009/981030.
[12] I. D. Schizas, G. B. Giannakis, S. I. Roumeliotis, and A. Ribeiro, “Consensus in ad hoc WSNs with noisy links—part II: distributed estimation and smoothing of random signals,” IEEE Transactions on Signal Processing, vol. 56, no. 4, pp. 1650-1666, 2008. doi: 10.1109/TSP.2007.908943.
[13] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, “Consensus in ad hoc WSNs with noisy links—part I: distributed estimation of deterministic signals,” IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350-364, 2008. doi: 10.1109/TSP.2007.906734.
[14] A. H. Sayed and C. G. Lopes, “Distributed recursive least-squares strategies over adaptive networks,” in Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC’06), Pacific Grove, USA, 2006, pp. 233-237. doi: 10.1109/ACSSC.2006.356622.
[15] V. K. Madisetti, Digital Signal Processing Fundamentals. Boca Raton, USA: CRC Press, 2009.
[16] D. Jakovetic, J. Xavier, and J. M. F. Moura, “Fast distributed gradient methods,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1131-1146, May 2014. doi: 10.1109/TAC.2014.2298712.
[17] I.-A. Chen, “Fast distributed first-order methods,” Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, USA, 2012.
[18] I. Matei and J. S. Baras, “Performance evaluation of the consensus-based distributed subgradient method under random communication topologies,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 754-771, 2011. doi: 10.1109/JSTSP.2011.2120593.
[19] A. Nedic and A. Olshevsky, “Distributed optimization over time-varying directed graphs,” in IEEE 52nd Annual Conference on Decision and Control (CDC), Florence, Italy, 2013, pp. 6855-6860.
[20] A. Nedic and A. Olshevsky, “Stochastic gradient-push for strongly convex functions on time-varying directed graphs,” arXiv:1406.2075, 2014.
[21] K. Yuan, Q. Ling, and W. Yin, “On the convergence of decentralized gradient descent,” arXiv:1310.7063, 2013.
[22] H. Terelius, U. Topcu, and R. Murray, “Decentralized multi-agent optimization via dual decomposition,” in 18th IFAC World Congress, Milano, Italy, 2011.
[23] J. N. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. Thesis, MIT, Cambridge, USA, 1984.
[24] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Third International Symposium on Information Processing in Sensor Networks, Berkeley, USA, 2004, pp. 20-27. doi: 10.1145/984622.984626.
[25] G. Shi and K. H. Johansson, “Finite-time and asymptotic convergence of distributed averaging and maximizing algorithms,” arXiv:1205.1733, 2012.
[26] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,” IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803-812, 1986.
[27] L. Xiao, S. Boyd, and S. Lall, “A scheme for robust distributed sensor fusion based on average consensus,” in Proc. 4th International Symposium on Information Processing in Sensor Networks, Piscataway, USA, 2005. doi: 10.1109/IPSN.2005.1440896.
[28] M. Zargham, A. Ribeiro, and A. Jadbabaie, “A distributed line search for network optimization,” in American Control Conference (ACC), Montreal, Canada, 2012, pp. 472-477. doi: 10.1109/ACC.2012.6314986.
[29] W. Shi, Q. Ling, G. Wu, and W. Yin, “EXTRA: an exact first-order algorithm for decentralized consensus optimization,” arXiv:1404.6264, 2014.
[30] F. Iutzeler, P. Bianchi, P. Ciblat, and W. Hachem, “Asynchronous distributed optimization using a randomized alternating direction method of multipliers,” arXiv:1303.2837, 2013.
[31] E. Wei and A. Ozdaglar, “On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers,” arXiv:1307.8254, 2013.
[32] L. Zhao, W.-Z. Song, L. Shi, and X. Ye, “Decentralised seismic tomography computing in cyber-physical sensor systems,” Cyber-Physical Systems, vol. 1, no. 2-4, pp. 91-112, 2015. doi: 10.1080/23335777.2015.1062049.
[33] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Applied Optimization), 1st edition. New York, USA: Springer, 2004. doi: 10.1007/978-1-4419-8853-9.
[34] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, Jan. 2011. doi: 10.1561/2200000016.
[35] L. Zhao, W. Z. Song, and X. Ye, “Fast decentralized gradient descent method and applications to in-situ seismic tomography,” IEEE International Conference on Big Data, Santa Clara, USA, 2015, pp. 908-917. doi: 10.1109/BigData.2015.7363839.
[36] Y. Nesterov, “Gradient methods for minimizing composite objective function,” CORE Discussion Papers, 2007076, Catholic University of Louvain, Louvain-la-Neuve, Belgium, Sept. 2007.
[37] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508-2530, Jun. 2006. doi: 10.1109/TIT.2006.874516.
Keywords distributed computing; iterative methods; least?squares; mesh network
1 Introduction
Many physical phenomena can be described by partial differential equations [1] which forms large sparse system of linear equations after further discretized. Problems, such as state estimation, target tracking and tomography inversion, are often formulated as a large?scale linear system based on some field measurements. Those field measurements may contain errors, thus an extra amount of measurement is often sampled to form an over?determined linear system:
where , and . Such extra information can smooth out the errors but produces an overdetermined system that usually has no exact solution. The method of least?squares is a common solution to the above problem and can be defined as
The coefficient A is often modeled from the data obtain from sensors used for observing the physical phenomena such as cyber physical system. Each sensor or node observes partial phenomena due to the spatial and temporal restriction and thus only forms partial rows of the least?squares systems. The large?scale cyber?physical systems are often built on a mesh network, which could be a wired, wireless or wired?wireless hybrid multi?hop network. For instance, the problems from target tracking, seismic tomography and smart grid state estimation all have an inherently distributed system of linear equations. However, the least squares method used currently for solving these problems assumes a centralized setup, where partial row information from all the nodes are collected in a server and then solved using the centralized least?square algorithm. In many of those cyber?physical systems, the distributed computation in mesh networks is strongly demanded or preferred over the centralized computation approach, due to the following reasons (but not limited to):
1) In some applications such as imaging seismic tomography with the aid of mesh network, the real?time data retrieval from a large?scale seismic mesh network into a central server is virtually impossible due to the sheer amount of data and resource limitations. The distributed computation may process data inside the network in real time to reduce the bandwidth demand as well as distribute the communication and computation load to each node in the network.
2) The mesh network may be disruptive in real world and the data collection and centralized computation may suffer from node failure or link disruption. These become a bottleneck especially when the node failure or link disruption happens near to the sink node which leads to loss of high volume of raw data. However, with distributed computation, the remaining nodes in the network can finish the computation and get the approximated results.
3) In smart grid state estimation, the data collection for centralized computation is even infeasible due to data privacy concerns or inter?agency policy constraints.
4) In some applications that need real?time control, the distributed computation also has advantage over centralized schemes, since some decisions can be made locally in real time. The current state of the art computational device such as smart phones enables us to perform in?network computing and carry out distributed computation over a mesh network.
Iterative methods are natural candidates when it comes for large sparse system and especially for distributed computation of least?squares. Although there are a lot of studies on iterative least?squares, most of them are concerned with the efficiency of centralized/parallel computation, and only a few are explicitly about distributed computation or have the potential to be applied on mesh networks. In mesh networks, since the computers need to communicate with each other through messages passing over a multi?hop network, the key challenges are speeding up the computation and reducing the communication cost. More attention shall be paid to communication instead of the computation cost, especially when solving a big problem in a large?scale mesh network.
In this paper, we select and survey the representative iterative methods from several research communities. These methods have the potential to be used in solving least squares problem over mesh networks. Here, a skeleton sketch of each algorithm is provided and later we analyze the time?to?completion and communication cost of these algorithms and provide the comparison. Some of the algorithms presented here were not originally designed for meeting our requirements, so we slightly modify them to maintain consistency. The rest of the paper is organized as follows. In section 2, we present the network model and the evaluation criteria for comparison. Then in section 3, we describe the state of art on surveyed algorithms in details, analyze and compare their communication costs and time?to?completion. Finally, we conclude the paper in section 4.
2 Model and Assumption
We denote a wired and/or wireless mesh network with nodes which form connected graph and can be reached through multi?hop message relays. Without loss of generality, we assume that the diameter of the network is (i.e., any message can be sent from one node to another through at most hops). We also assume that each node has a single radio and the link between the neighboring nodes has a unit bandwidth. Therefore, the communication delay of one unit data delivery between direct neighbors (either through a unicast to one direct neighbor or multicast/broadcast to all direct neighbors) would be one unit time. It is noted that the link layer supports broadcast, which is often true in many mesh networks. If the link layer only supports unicast, the analysis can be similarly done by considering a one?hop broadcast as multiple unicast and we skip this analysis as it is trivial. We also understand that the link layer communication may take more than one unit time for one unit data due to network interference and media contentions. Therefore, we classify the communication patterns in a mesh network into three categories, unicast (one?hop or multi?hop), one?hop broadcast (local broadcast to all neighbors) and network flooding (broadcast to all the nodes in network). For simplicity and convenience, in the rest of the paper we use the term “broadcast” for local broadcast to one?hop neighbors and “flood” for network flooding. We use the aforementioned assumption on communication cost and delay for the quest of the fundamental limit of each surveyed algorithm in an ideal mesh network.
We assume a random communication network (has to be a connected graph) in our analysis. The influence of the communication network to the performance has been studied in [2], [3]. It is known that the network connectivity ratio, which is defined as the number of edges divided by the number of all possible edgeswill affect the performance. The algorithms are supposed to obtain a faster convergence speed when each node has higher number of neighbors. Higher neighbor count means more nodes can receive the information after one transmission of certain node. It accelerates information diffusion among all the nodes in the network and thus help all the nodes reach consensus faster. To conduct a fair comparison, we use the same communication network (topology) for all the benchmarks. For comparison and evaluation, the following two performance criteria are considered:
·Communication cost: To solve a least?squares problem of large?size system, communication cost has a big influence on the algorithm performance. Here we refer the communication cost as the cost involved in the messages exchanged in the mesh network during a single iteration of the iterative methods. Since iterative methods typically converge after many iterations, the communication cost of the iterative methods depends on both the cost in one iteration and the iteration number.
·Time?to?completion: The time taken for a network to finish one iteration in the iterative method is referred as time?to?completion in this paper. Note that it is different from the computational time complexity and shall include the consideration of the message size and number of hops the packet has traversed.
We also focus on the analysis of communication delays for time?to?completion while ignoring the computation time in each node. The rationale is summarized as follows. First, the cost of communication is very high in practice. We are highly constrained by the physical bandwidth and energy consumption in large?scale sensor networks in particular. Furthermore, the communication stage is much more time?consuming than the computation step. It turns out that reducing the total communication rounds is the key for speeding up the algorithm in terms of time?to?completion.
The least?squares problem in (2) formed over the mesh network is inherently distributed, i.e. each node only knows part of A and b. We assume that each node in the network holds consecutive rows of matrix and the corresponding part of vector b. For example in Fig. 1, block A1 indicates the first rows of matrix A which is assigned to node along with the right hand side vector . Note that the algorithm surveyed in this paper does not require that matrix A and b be equally partitioned over the network. Here the assumption of equal partition is for the simplicity of presentation and analysis and the new distributed equation takes the form:
The least squares problem takes the form and since there is no central coordinator which has entire A and b the computation of optimum has to be done distributedly. As mentioned above, the communication cost becomes crucial for distributed solution over sensor network and the goal of this paper is to survey various distributed least squares algorithm originating from different domains. We also try to compare different algorithms under similar criteria as mentioned above so that it provides the reader some basic differences between them and also help them to choose the type of algorithms suitable for their application. The notations used in this paper are described in Table 1.
3 Survey and Analysis
The methods for solving the linear least?squares problems are typically classified into two categories, direct methods and iterative methods. Direct methods are based on the factorization of the coefficient matrix into easily invertible matrices whereas iterative methods solve the system by generating a sequence of improving approximate solutions for the problem. Until recently direct methods were often preferred over iterative methods [4] due to their robustness and predictable behaviors (one can estimate the amount of resources required by direct solvers in terms of time and storage) [5], [6]. On the other hand, a number of iterative methods have also been discovered, which require fewer memory and are approaching the solution quality of direct solvers [6]. The size of the least squares problem arising from real world three?dimension problem models could be significantly large comprising hundreds of millions of equations as well as the unknowns. Despite such a huge dimension, the matrices arising are typically sparse and can be easily stored. Now given the dimension and sparsity property of the matrix, iterative methods become almost mandatory for solving them [7]. Moreover, iterative methods are gaining ground because they are easier to be implemented efficiently in high?performance computers than direct methods [6].
The methods for solving least?squares problems in distributed networks can be classified into two categories, distributed and decentralized (fully distributed) methods. We will discuss it in detail in this section.
3.1 Distributed Least?Squares Methods
To achieve high performance in computation, researchers have studied distributed iterative methods to solve large linear systems/linear least?squares problems [8], [9]. The researches leverage both shared and distributed memory architecture. In this section, we only present those distributed iterative methods that can be potentially distributed over a mesh network. The Distributed Multisplitting (D?MS), Distributed Modified Conjugate Gradient Least?Squares (D?MCGLS), Distributed Component?Average Row Projection (D?CARP), Distributed Cooperative Estimation (D?CE), Distributed Least Mean Squares (D?LMS), Distributed Recursive Least?Squares (D?RLS) methodsare discussed and compared, but only D?LMS and D?RLS are analyzed in details in this paper since they are more relevant and more promising than other distributed algorithms. Table 2 gives a summary of the communication cost and time?to?completion of the selected algorithms running in the distributed network. The details about the algorithm description and analysis are shown in Section 3. When the least?squares problem in (2) where , and is considered, we suppose that the iterative algorithm converges within iterations in the network, and and denote the average and maximum node degrees of the network respectively. The algorithms discussed in this section have been proved to be convergent, but the iteration number highly depends on the matrix condition number; these algorithms may need hundreds to thousands of iterations to converge over a network with hundreds of nodes for a large system. Besides, some algorithms either requires flooding communication in the network per iteration or a Hamiltonian path in the network to perform the computation node by node.
3.1.1 Distributed Least Mean Squares Method
Schizas, Mateos and Giannakis [10]-[13] introduce the D?LMS algorithm. This algorithm lets each node maintain its own local estimation and, to reach the consensus, exchange the local estimation only within its neighbors. The advantage of the methods like D?LMS and D?CE in signal processing is that only local information exchange is required. The problem is that these methods may converge slow in a large?scale network.
In their discussion, the wireless sensor network is deployed to estimate a signal vector . Each node has a regression vector , where denotes the time instants, and there is a observation on time ; both of them are assumed to have zero mean. One global vector is used for all the observations on nodes in the network. is the regression vector combined over the network, and the global LMS estimator is then described as
Let represent the local estimation of the global variable of one node (each node has its own estimation of the signal vector). In conjunction with these local variables, we consider the convex constrained minimization problem as
where is the neighbor set of node .
The equality constraints above only involve the local estimations of the neighbors of each node and force an agreement among each node’s neighbors. Since we assume that the network is connected, the constraints above will introduce a consensus in the network. We can finally have for all . We find that the distributed estimation problem is equivalent to the original problem in the sense that their optimal solutions coincide such as , for all . To construct the distributed algorithm, the authors resort to the Alternating Direction Method of Multipliers (ADMM) (algorithm, and get the following two equations for estimation updating,
where is a constant step?size and is a local priori error. and denote the additive communication noise present in the reception of and . Algorithm 1 gives the description of the distributed least mean square algorithm. In detail, during the time instant node receives the local estimates and plugs them into the equations above to evaluate for . Each updated local Lagrange multiplier is subsequently transmitted to the corresponding neighbor . Then upon reception of , the multipliers are jointly used along with and the newly acquired local data to obtain via the above equations. The ?st iteration is concluded after node broadcasts to its neighbors.
1) Communication Cost
Algorithm 1 is simple and only steps 3 and 5 involve communication. Applied to system , vectors and are both of length (columns of ). For example, in step 3, node needs to transmit to all its neighbors; in step 5, needs to transmit different to different neighbors (Fig. 2). Suppose that the average degree of network is , in each iteration, the communication cost of one node is of , so the communication of the network is . Suppose that after iterations (the iteration number might be greater than the time instants, so after the ?th sample on node is involved, the first sample is used as the ?st sample), the algorithm converges and the total communication cost is .
2) Time?to?Completion
In step 3 of Algorithm 1, node needs to broadcast to all its neighbors. From the receiver side, each node needs to receive different updates from all its neighbors, and then the delay of the whole network depends on the maximum node degree of the network since the algorithm is synchronous; this delay is . In step 5, node needs to send different to different neighbors, the delay is also . The total communication delay is then .
3.1.2 Distributed Recursive Least?Squares Method
Sayed and Lopes [14] developed a distributed least?squares estimation strategy by appealing to collaboration techniques that exploit the space?time structure of the data, achieving an exact recursive solution that is fully distributed. This D?RLS strategy is developed by appealing to collaboration techniques to achieve an exact recursive solution. It requires a cyclic path in the network to perform the computation node by node. The advantage of this method is the iteration number is fixed (the network size) for a give set of data to solve a least?squares problem, but the problem is a large dense matrix needs to be exchanged between nodes. The details and analysis of D?RLS strategy are given in this section, and Algorithm 2 gives the classic RLS procedure [15] .
To distribute the exact algorithm for estimating the vector in the network of nodes, each node has access to regressors and measurement data and , where and . At each time instant , the network has access to space?time data:
where and are snapshot matrices revealing the network data status at time . We collect all the data available up to time into global matrices and :
Note that here is equivalent to solving the least?squares problem of by partition and row?wise and each node has one partition of consecutive rows of and . Applying the RLS algorithm is different to the D?LMS estimator since it gives the least?squares solution of the whole block of data . Therefore, in the distributed RLS algorithm to solve a normal least?squares problem, we use to indicate the row block on node but not only one vector collected at time instant (we can treat it as all the data collected till time ).
By assuming an incremental path is defined across the network cycling from node to and so forth, until node , The RLS algorithm can be rewritten as a distributed version in Algorithm 3 [14].
1) Communication Cost
In the distributed RLS algorithm, the communication is in step 6. Each node shares with its successor node in the cycle path of network the quantities where and . For example, node receives the message from and sends it to (Fig. 3). Therefore, in each iteration, the communication cost only happens in one node and it is . Since the algorithm can converge after one cycle in the network, the total communication cost is . Note that a Hamiltonian path is required by this algorithm, and to find such a path, extra communication is required. This is another problem and out of the scope of the analysis in this paper, we omit this cost here.
2) Time?to?Completion
In distributed RLS algorithm, it is easy to see that the delay in one step is and there are totally steps in the algorithm, so the total time?to?completion is .
3.2 Decentralized Optimization Methods
In recent years, much attention has been paid to fully distributed (decentralized) consensus optimization problems, especially in applications like distributed machine learning, multi?agent optimization, etc. Several algorithms have been proposed for solving general convex and (sub)differentiable functions. By setting the objective function as least?square, the decentralized least?square problem can be seen as a special case of the following model. where nodes are in the network and they need to collaboratively estimate the model parameters . Each node locally holds the function and can communicate only with its immediate neighbors.
Considering the problem in (11), (sub)gradient?based methods have been proposed [16]-[20]. However, it has been analyzed that the aforementioned methods can only converge to a neighborhood of an optimal solution in the case of fixed step size [21]. Modified algorithms have been developed in [16] and [17], which use diminishing step sizes in order to guarantee converging to an true solution. Other related algorithms were discussed in [22]-[28], which share similar ideas. The D?NC algorithm proposed in [16] was demonstrated to have an outer?loop convergence rate of in terms of objective value error. The rate is same as the optimal centralized Nesterov’s accelerated gradient method and decentralized algorithms usually have slower convergence rate than the centralized versions. However, the number of consensus iterations within outer?loop is growing significantly along the iteration. Shi [29] developed a method based on correction on mixing matrix for Decentralized Gradient Descent (DGD) method [21] without diminishing step sizes.
The algorithms mentioned above are based on synchronous models. Distributed optimization methods for asynchronous models have been designed in [30]-[32]. However, it is worth noting that their convergence rates are usually slower than the counterparts in synchronous models. In the next, we show the derivations of various decentralized optimization methods. In order to have a compact form, let , where is a column vector containing local estimate of common interest at node . Similarly, define and , where denotes the gradient of .
Without loss of generality, we assume (the size of decision variable ), and the aforementioned problem can be formulated as:
The constraint requires $\mathbf{x}$ to be consensual due to the property of the mixing matrix $W$. Now we first derive the DGD algorithm [21] from (5). Assuming $W$ is symmetric and taking the symmetric square root $U = (I - W)^{1/2}$ (which exists since $I - W$ is positive semidefinite), we get $U^{\top}U = I - W$. Then $W\mathbf{x} = \mathbf{x}$ holds if and only if $U\mathbf{x} = 0$ holds. The original problem in (12) is thus equivalent to:
$$\min_{\mathbf{x} \in \mathbb{R}^{N}} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad U\mathbf{x} = 0. \qquad (13)$$
If we use the exterior penalty method, an unconstrained (but inexact) reformulation can be expressed as:
$$\min_{\mathbf{x} \in \mathbb{R}^{N}} \; \sum_{i=1}^{N} f_i(x_i) + \frac{1}{2\alpha}\,\|U\mathbf{x}\|_2^2 \;=\; \sum_{i=1}^{N} f_i(x_i) + \frac{1}{2\alpha}\,\mathbf{x}^{\top}(I - W)\,\mathbf{x}, \qquad (14)$$
where $\alpha > 0$ is the penalty parameter.
It is clear to see that (14) approximates (13) as $\alpha \to 0$. Applying the gradient descent method to (14) with step size $\alpha$, we can obtain the following update rule:
$$\mathbf{x}^{k+1} = \mathbf{x}^{k} - \alpha\left(\nabla f(\mathbf{x}^{k}) + \frac{1}{\alpha}(I - W)\mathbf{x}^{k}\right), \qquad (15)$$
which simplifies to the DGD iteration
$$\mathbf{x}^{k+1} = W\mathbf{x}^{k} - \alpha \nabla f(\mathbf{x}^{k}). \qquad (16)$$
If we choose a constant step size $\alpha$, this iteration solves (14) for that fixed $\alpha$, and hence does not yield the exact consensus solution. The DGD algorithm is shown in Algorithm 4.
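To illustrate the iteration, below is a minimal Python sketch of DGD specialized to the decentralized least-squares case, i.e. with $f_i(x) = \tfrac{1}{2}\|A_i x - b_i\|_2^2$. It assumes a symmetric, doubly stochastic mixing matrix $W$ is given (how $W$ is built from the network topology, e.g. via Metropolis weights, is a separate choice); the function and variable names are illustrative.

```python
import numpy as np

def dgd_least_squares(A_blocks, b_blocks, W, alpha, iters):
    """Decentralized gradient descent: x_i <- sum_j W_ij x_j - alpha * grad f_i(x_i).
    A_blocks[i], b_blocks[i] are node i's local rows; W is the N x N mixing matrix."""
    N = len(A_blocks)
    n = A_blocks[0].shape[1]
    X = np.zeros((N, n))                      # row i holds node i's estimate x_i
    for _ in range(iters):
        G = np.stack([A_blocks[i].T @ (A_blocks[i] @ X[i] - b_blocks[i])
                      for i in range(N)])     # local gradients, computable in parallel
        X = W @ X - alpha * G                 # mix with neighbors, then descend
    return X
```

With a fixed $\alpha$ the iterates converge only to a neighborhood of the least-squares solution, as discussed above; shrinking $\alpha$ tightens the neighborhood at the cost of slower convergence.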
We also show the derivation of the Exact First-Order Algorithm (EXTRA) (Algorithm 6) [29]. We claim that EXTRA is equivalent to applying the Alternating Direction Method of Multipliers (ADMM) [34] to the original problem in (13). In (13), we would like to satisfy the constraint exactly in the limit. To this end, an unconstrained reformulation using the augmented Lagrangian method can be described as:
$$\mathcal{L}_{\alpha}(\mathbf{x}, \boldsymbol{\lambda}) \;=\; \sum_{i=1}^{N} f_i(x_i) + \boldsymbol{\lambda}^{\top} U\mathbf{x} + \frac{1}{2\alpha}\,\|U\mathbf{x}\|_2^2,$$
where $\boldsymbol{\lambda}$ is the dual variable associated with the constraint $U\mathbf{x} = 0$.
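For reference, the resulting EXTRA recursion in [29] combines two consecutive iterates through two mixing matrices $W$ and $\tilde{W} = (I + W)/2$. Below is a minimal Python sketch of that recursion for the least-squares case, using the same illustrative data layout as the DGD sketch above; it is a sketch of the published update, not a reproduction of Algorithm 6's exact bookkeeping.

```python
import numpy as np

def extra_least_squares(A_blocks, b_blocks, W, alpha, iters):
    """EXTRA-style recursion (cf. [29]) for decentralized least-squares:
         x^{1}   = W x^{0} - alpha * grad(x^{0})
         x^{k+2} = (I + W) x^{k+1} - Wt x^{k} - alpha * (grad(x^{k+1}) - grad(x^{k}))
       with Wt = (I + W) / 2."""
    N = len(A_blocks)
    n = A_blocks[0].shape[1]
    I = np.eye(N)
    Wt = (I + W) / 2.0

    def grad(X):
        # Stacked local least-squares gradients A_i^T (A_i x_i - b_i).
        return np.stack([A_blocks[i].T @ (A_blocks[i] @ X[i] - b_blocks[i])
                         for i in range(N)])

    X_prev = np.zeros((N, n))
    G_prev = grad(X_prev)
    X = W @ X_prev - alpha * G_prev            # first (DGD-like) step
    for _ in range(iters):
        G = grad(X)
        X_next = (I + W) @ X - Wt @ X_prev - alpha * (G - G_prev)
        X_prev, X, G_prev = X, X_next, G
    return X
```

Unlike DGD, the gradient-difference correction lets EXTRA use a fixed step size and still converge to the exact consensus least-squares solution.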
Finally, we discuss the Fast Decentralized Gradient Descent (FDGD) method [35], which can be seen as an accelerated version of EXTRA. FDGD does not require a diminishing step size, and the method is accelerated to reach an optimal convergence rate for general convex differentiable functions $f_i$. We adopt the idea of Nesterov's optimal gradient method for centralized smooth optimization [36] and the mixing matrix technique used in network gossip and consensus averaging algorithms [29], [37]. In Algorithm 7, the superscript “ag” stands for “aggregated”, and “md” stands for “middle”. The matrix $\tilde{W}$ is a half-mixing matrix based on $W$. At iteration $k$, each node sends its current iterate to all its immediate neighbors and receives theirs in return (one round of communication). The final aggregated iterate is output as the solution. Algorithm 7 is a first-order method since only the local gradient $\nabla f_i$ is required in each iteration, and hence the subproblem has low computational complexity. We do not need diminishing step sizes that converge to 0, yet can still ensure both convergence and consensus. Besides, for a degenerate choice of the weighting parameters (for all $k$), Algorithm 7 reduces to a version very similar to regular decentralized gradient descent (16). However, with the parameters chosen as below, the change from the input iterate to the output iterate in each round is faster than that between successive DGD iterates. This implies that Algorithm 7 can converge faster than regular DGD. The last remark explains intuitively why the multi-step scheme defined in Algorithm 7 can potentially accelerate the convergence. The comparison of various decentralized least-squares methods is summarized in Table 3.
4 Conclusions
In this paper, we surveyed some of the developments in distributed and parallel iterative methods that can potentially be applied to solve least-squares problems in a mesh network. We covered the traditional iterative methods for solving linear systems, including the relaxation methods, the conjugate gradient methods, and the row-action methods. One algorithm from each category was selected to describe how to apply it to solving the least-squares problem in a mesh network. We also surveyed some consensus- and diffusion-based strategies for parameter estimation in signal processing over mesh networks. Such strategies only require local communications; however, in a large-scale network they may take many iterations to converge to the required accuracy and reach an agreement among all the nodes. Algorithm selection depends on the context of the problem and the mesh network. We also analyzed and compared the performance of the selected representative algorithms in terms of communication cost and time-to-completion, two concerns that are critical for evaluating the performance of distributed algorithms in the context of mesh networks. Besides, we think that a future research direction of distributed computing in mesh networks is data-loss tolerance: will an algorithm still approximate the optimal estimate well if a certain percentage of the packets get lost in the network? Different from traditional parallel machines, where data delivery is usually guaranteed, in many distributed network applications preventing data losses can either be very expensive (as in sensor networks), since it requires retransmissions, or be subject to time constraints in real-time applications (such as the smart grid) that render retransmitted data useless.
References
[1] Å. Björck, Numerical Methods for Least Squares Problems. Philadelphia, USA: SIAM, 1996.
[2] L. Zhao, W. Z. Song, and X. Ye, “Decentralized consensus in distributed networks,” International Journal of Parallel, Emergent and Distributed Systems, vol. 20, pp. 1-20, 2016. doi: 10.1080/17445760.2016.1233552.
[3] L. Zhao, W. Z. Song, X. Ye, and Y. Gu, “Asynchronous broadcast-based decentralized learning in sensor networks,” International Journal of Parallel, Emergent and Distributed Systems, vol. 32, pp. 1-19, 2017.
[4] Y. Saad and H. A. van der Vorst, “Iterative solution of linear systems in the 20th century,” Journal of Computational and Applied Mathematics, vol. 123, no. 1-2, pp. 1-33, 2000. doi: 10.1016/S0377-0427(00)00412-X.
[5] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices of Numerical Mathematics and Scientific Computation. Oxford, UK: Clarendon Press, 1989.
[6] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd edition. Philadelphia, USA: SIAM, 2003.
[7] C. C. Paige and M. A. Saunders, “LSQR: an algorithm for sparse linear equations and sparse least squares,” ACM Transactions on Mathematical Software, vol. 8, no. 1, pp. 43-71, Mar. 1982.
[8] M. Baboulin, “Solving large dense linear least squares problems on parallel distributed computers. Application to the Earth’s gravity field computation,” Ph.D. Thesis, Toulouse Institute of Technology, Toulouse, France, 2006.
[9] M. T. Heath, E. Ng, and B. W. Peyton, “Parallel algorithms for sparse linear systems,” SIAM Review, vol. 33, no. 3, pp. 420-460, 1991.
[10] G. Mateos and G. B. Giannakis, “Distributed recursive least-squares: stability and performance analysis,” IEEE Transactions on Signal Processing, vol. 60, no. 7, pp. 3740-3754, 2012.
[11] G. Mateos, I. D. Schizas, and G. B. Giannakis, “Performance analysis of the consensus-based distributed LMS algorithm,” EURASIP Journal on Advances in Signal Processing, 2009. doi: 10.1155/2009/981030.
[12] I. D. Schizas, G. B. Giannakis, S. I. Roumeliotis, and A. Ribeiro, “Consensus in ad hoc WSNs with noisy links—part II: distributed estimation and smoothing of random signals,” IEEE Transactions on Signal Processing, vol. 56, no. 4, pp. 1650-1666, 2008. doi: 10.1109/TSP.2007.908943.
[13] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, “Consensus in ad hoc WSNs with noisy links—part I: distributed estimation of deterministic signals,” IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350-364, 2008. doi: 10.1109/TSP.2007.906734.
[14] A. H. Sayed and C. G. Lopes, “Distributed recursive least-squares strategies over adaptive networks,” in Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC’06), Pacific Grove, USA, 2006, pp. 233-237. doi: 10.1109/ACSSC.2006.356622.
[15] V. K. Madisetti, Digital Signal Processing Fundamentals. Boca Raton, USA: CRC Press, 2009.
[16] D. Jakovetic, J. Xavier, and J. M. F. Moura, “Fast distributed gradient methods,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1131-1146, May 2014. doi: 10.1109/TAC.2014.2298712.
[17] I.-A. Chen, “Fast distributed first-order methods,” Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, USA, 2012.
[18] I. Matei and J. S. Baras, “Performance evaluation of the consensus-based distributed subgradient method under random communication topologies,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 4, pp. 754-771, 2011. doi: 10.1109/JSTSP.2011.2120593.
[19] A. Nedic and A. Olshevsky, “Distributed optimization over time-varying directed graphs,” in IEEE 52nd Annual Conference on Decision and Control (CDC), Florence, Italy, 2013, pp. 6855-6860.
[20] A. Nedic and A. Olshevsky, “Stochastic gradient-push for strongly convex functions on time-varying directed graphs,” arXiv:1406.2075, 2014.
[21] K. Yuan, Q. Ling, and W. Yin, “On the convergence of decentralized gradient descent,” arXiv:1310.7063, 2013.
[22] H. Terelius, U. Topcu, and R. Murray, “Decentralized multi-agent optimization via dual decomposition,” in 18th IFAC World Congress, Milano, Italy, 2011.
[23] J. N. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. Thesis, MIT, Cambridge, USA, 1984.
[24] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Third International Symposium on Information Processing in Sensor Networks, Berkeley, USA, 2004, pp. 20-27. doi: 10.1145/984622.984626.
[25] G. Shi and K. H. Johansson, “Finite-time and asymptotic convergence of distributed averaging and maximizing algorithms,” arXiv:1205.1733, 2012.
[26] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,” IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803-812, 1986.
[27] L. Xiao, S. Boyd, and S. Lall, “A scheme for robust distributed sensor fusion based on average consensus,” in Proc. 4th International Symposium on Information Processing in Sensor Networks, Piscataway, USA, 2005. doi: 10.1109/IPSN.2005.1440896.
[28] M. Zargham, A. Ribeiro, and A. Jadbabaie, “A distributed line search for network optimization,” in American Control Conference (ACC), Montreal, Canada, 2012, pp. 472-477. doi: 10.1109/ACC.2012.6314986.
[29] W. Shi, Q. Ling, G. Wu, and W. Yin, “EXTRA: an exact first-order algorithm for decentralized consensus optimization,” arXiv:1404.6264, 2014.
[30] F. Iutzeler, P. Bianchi, P. Ciblat, and W. Hachem, “Asynchronous distributed optimization using a randomized alternating direction method of multipliers,” arXiv:1303.2837, 2013.
[31] E. Wei and A. Ozdaglar, “On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers,” arXiv:1307.8254, 2013.
[32] L. Zhao, W.-Z. Song, L. Shi, and X. Ye, “Decentralised seismic tomography computing in cyber-physical sensor systems,” Cyber-Physical Systems, vol. 1, no. 2-4, pp. 91-112, 2015. doi: 10.1080/23335777.2015.1062049.
[33] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Applied Optimization), 1st edition. New York, USA: Springer, 2004. doi: 10.1007/978-1-4419-8853-9.
[34] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, Jan. 2011. doi: 10.1561/2200000016.
[35] L. Zhao, W. Z. Song, and X. Ye, “Fast decentralized gradient descent method and applications to in?situ seismic tomography,” IEEE International Conference on Big Data, Santa Clara, USA, 2015, pp. 908-917. doi: 10.1109/BigData.2015.7363839.
[36] Y. Nesterov, “Gradient methods for minimizing composite objective function,” CORE Discussion Papers, 2007076, Catholic University of Louvain, Louvain-la-Neuve, Belgium, Sept. 2007.
[37] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2508-2530, Jun. 2006. doi: 10.1109/TIT.2006.874516.