1 The Steiner Tree Problem

An instance of the Steiner Tree problem is a connected graph on a set of vertices and edges equipped with a positive weight function . Given a set of terminals , a valid solution is s set that induces a sub-graph in which any pair of terminals is connected. The weight of a solution is defined by , and we want to find a valid solution with minimum weight, usually denoted . Note that since the initial graph is connected there is always a valid solution.
Finding an exact optimal solution is NP-Complete. However, there is polynomial time approximation algorithm that yields a factor 2 approximation. We say that a solution is 2-approximation for the problem if for some optimal solution .

2 Adding Assumptions “for Free”

Given a graph and weight-function , consider the following construction: Let be a clique on the same set of vertices , that is, . Define the weight function as follows, let be the weight of the shortest path from to in the original . When “shortest path” is defined to be the path with minimum sum of weights among all the simple paths that from to . Furthermore, it worth noting that by its definition, obeys the triangle inequality, meaning that for any triple , if , and then .
Fix some , we will show that the weight of any optimal solution for the original instance is exactly the same weight as of any optimal solution for . We will do that by showing two inequalities: let be optimal solutions for the original and the modified instances respectively. Then and .
Obviously, the inequality holds, because contains all the edges of and for any edge in its weight over the modified instance have , by the definition of . Therefore, the optimal weight over can only be equal to or smaller than the one of .
The other direction is less trivial. Take any and convert it to be a valid solution over the original as follows: for any , if and then add . Otherwise, take all the edges along the shortest path connecting the endpoints of other the original graph and add them to . Note that the entire weight of that path cannot exceeds by the definition of . Therefore we end up with a valid solution , that also have . In conclusion, .
The proof above gave us a concrete polynomial-time procedure to convert any valid solution over to a valid solution over such that . From that we can conclude that it is suffice to find a 2-approximation for the “easier” instance and then converts it to a 2-approximation solution for the original instance , note also that the construction of from can be done in polynomial time (for example use Dijkstra's algorithm for computing the shortest paths).
In other words, we can assume for free (i.e. without losing anything) that is a clique and have the triangle inequality, and design an approximation algorithm that deals only with such instances.

3 Approximation Algorithm for the Easier Problem

Let be a clique, be a weight function that obeys the triangle inequality, and is a set of terminals. Use any known polynomial time algorithm to get a Minimum Spanning Tree (MST) on (for instance, use Kruskal's algorithm), let the edges of that tree be . Output as the approximated solution for the Steiner Tree problem.
Putting it differently, we claim that , when is some optimal solution for the Steiner Tree problem for that “easier” instance.
And now we will prove that.
Let be some optimal solution for the Steiner Tree problem. It must be a tree because otherwise we have a redundant edge such that removing it will decrease the weight of the solution and keep it valid (the terminals will stay connected); in contradiction to the optimality of . Run a Depth First Search (DFS) at the root of , traverse all the nodes in the tree and return back to the root. Memorize all the nodes traversed that way (with repetitions) and denote that sequence . By the way we build , it is a cycle that touches all the vertices of (and maybe also others) and uses twice any edge of , once for the forward pass and once for the backward pass, so it has a length . Suppose that , then we conclude that
Define the following order on : for any two terminals , we say that comes before if the first appearance of in comes before the first appearance of in . Let be the sequence of ordered terminals, note that it is also defines a path because is a clique, that is, for any . Denote that path .
By the triangle inequality, for any pair the weighed subpath in , which is the weight of a single edge, is less or equal to the weighted length in (which may use more than one edge), thus .
Since touches all the terminals in exactly once, it has no cycles, hence it is also a spanning tree over the vertices and edges of the clique. Recall that the proposed solution is a minimum spanning tree on , therefore . We conclude that as required.