Policy Gradient Reinforcement Learning in TensorFlow 2

In a series of recent posts, I have been reviewing the various Q-based methods of deep reinforcement learning (see the previous posts on this site). Deep Q-based reinforcement learning operates by training a neural network to learn the Q value for each action of an agent which resides in a certain state $s$ of the environment. The policy which guides the actions of the agent in this paradigm operates by a random selection of actions at the beginning of training (the epsilon-greedy method), but as training progresses the agent increasingly selects the action with the highest predicted Q value in each state $s$. The Q value is simply an estimate of the future rewards which will result from taking action $a$. An alternative to deep Q-based reinforcement learning is to forget about the Q value altogether and instead have the neural network estimate the optimal policy directly. Reinforcement learning methods based on this idea are often called Policy Gradient methods.

This post will review the REINFORCE, or Monte-Carlo, version of the Policy Gradient methodology. This methodology will be applied to the OpenAI Gym Cartpole environment. All code used and explained in this post can be found on this site’s Github repository.  

Eager to build deep learning systems in TensorFlow 2? Get the book here

Policy Gradients and their theoretical foundation

This section will review the theory of Policy Gradients, and how we can use them to train our neural network for deep reinforcement learning. This section will feature a fair bit of mathematics, but I will try to explain each step and idea carefully for those who aren’t as familiar with the mathematical ideas. We’ll also skip over a step at the end of the analysis for the sake of brevity.  

In Policy Gradient based reinforcement learning, the objective function which we are trying to maximise is the following:

$$J(\theta) = \mathbb{E}_{\pi_\theta} \left[\sum_{t=0}^{T-1} \gamma^t r_t \right]$$
The function above means that we are attempting to find a policy ($\pi$) with parameters ($\theta$) which maximises the expected value of the sum of the discounted rewards of an agent in an environment. Therefore, we need to find a way of varying the parameters of the policy $\theta$ such that the expected value of the discounted rewards is maximised. In the deep reinforcement learning case, the parameters $\theta$ are the parameters of the neural network.
Note the difference from the deep Q-learning case: in deep Q-based learning, the parameters we are trying to find are those that minimise the difference between the actual Q values (drawn from experience) and the Q values predicted by the network. In Policy Gradient methods, however, the neural network directly determines the actions of the agent – usually by producing a softmax output and sampling an action from it. 
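To make this contrast concrete, here is a minimal sketch of the two action-selection schemes. Note that the output vectors below are made-up numbers standing in for network outputs:

```python
import numpy as np

rng = np.random.default_rng(42)

# Deep Q learning: act greedily on the predicted Q values
q_values = np.array([1.2, 3.4])        # hypothetical Q estimates for 2 actions
q_action = int(np.argmax(q_values))    # always picks action 1

# Policy Gradient: sample an action from the softmax policy output
policy_probs = np.array([0.25, 0.75])  # hypothetical softmax output
pg_action = int(rng.choice(2, p=policy_probs))  # action 1 with probability 0.75
```

The Q-based agent is deterministic once trained (up to the epsilon-greedy exploration), whereas the policy network is stochastic by construction.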
Also note that, because environments are usually non-deterministic, under any given policy ($\pi_\theta$) we are not always going to get the same reward. Rather, we are going to be sampling from some probability function as the agent operates in the environment, and therefore we are trying to maximise the expected sum of rewards, not the 100% certain, we-will-get-this-every-time reward sum.  
Ok, so we want to learn the optimal $\theta$. The way we generally learn parameters in deep learning is by performing some sort of gradient based search of $\theta$. So we want to iteratively execute the following:
$$\theta \leftarrow \theta + \alpha \nabla J(\theta)$$
So the question is, how do we find $\nabla J(\theta)$?

Finding the Policy Gradient

First, let’s make the expectation a little more explicit. Remember, the expectation of the value of a function $f(x)$ is the summation of all the possible values due to variations in x multiplied by the probability of x, like so:

$$\mathbb{E}[f(x)] = \sum_x P(x)f(x)$$
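As a quick numerical illustration of this definition (with a made-up distribution $P(x)$ and function $f(x)$):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])  # possible values of x
P = np.array([0.2, 0.5, 0.3])  # P(x) - must sum to 1
f = x ** 2                     # f(x)

expectation = np.sum(P * f)    # 0.2*0 + 0.5*1 + 0.3*4 = 1.7
```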
Ok, so what does the expansion of the expectation in $J(\theta)$ look like? First, we have to define the function which produces the rewards, i.e. the rewards equivalent of $f(x)$ above. Let’s call this $R(\tau)$ (where
$R(\tau) = \sum_{t=0}^{T-1}r_t$, ignoring discounting for the moment).  The value $\tau$ is the trajectory of the agent “moving” through the environment. It can be defined as:
$$\tau = (s_0, a_0, r_0, s_1, a_1, r_1, \ldots, s_{T-1}, a_{T-1}, r_{T-1}, s_T)$$
The trajectory, as can be seen, is the progress of the agent through an episode of a game of length T. This trajectory is the fundamental factor which determines the sum of the rewards – hence $R(\tau)$. This covers the $f(x)$ component of the expectation definition. What about the $P(x)$ component? In this case, it is equivalent to $P(\tau)$ – but what does this actually look like in a reinforcement learning environment? 
It consists of the two components – the probabilistic policy function which yields an action $a_t$ from states $s_t$ with a certain probability, and a probability that state $s_{t+1}$ will result from taking action $a_t$ from state $s_t$. The latter probabilistic component is uncertain due to the random nature of many environments. These two components operating together will “roll out” the trajectory of the agent $\tau$. 
$P(\tau)$ looks like:
$$P(\tau) = \prod_{t=0}^{T-1} P_{\pi_{\theta}}(a_t|s_t)P(s_{t+1}|s_t,a_t)$$
If we take the first step, starting in state $s_0$, our neural network will produce a softmax output with each action assigned a certain probability. The action is then selected by weighted random sampling subject to these probabilities – therefore, we have a probability of action $a_0$ being selected according to $P_{\pi_{\theta}}(a_0|s_0)$. This probability is determined by the policy $\pi$, which in turn is parameterised according to $\theta$ (i.e. a neural network with weights $\theta$). The next term will be $P(s_1|s_0,a_0)$, which expresses any non-determinism in the environment.
(Note: the vertical line in the probability functions above denotes a conditional probability. $P_{\pi_{\theta}}(a_t|s_t)$ refers to the probability of action $a_t$ being selected, given the agent is in state $s_t$.) 
These probabilities are multiplied out over all the steps in the episode of length T to produce the trajectory $\tau$. (Note, the probability of being in the first state, $s_0$, has been excluded from this analysis for simplicity). Now we can substitute $P(\tau)$ and $R(\tau)$ into the original expectation and take the derivative to get to $\nabla J(\theta)$ which is what we need to do the gradient based optimisation. However, to get there, we first need to apply a trick or two. 
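Before moving on, the product form of $P(\tau)$ can be illustrated numerically. All the probabilities below are invented for a short hypothetical 3-step trajectory:

```python
import numpy as np

# per-step policy probabilities P_pi(a_t|s_t) for a 3-step trajectory
policy_probs = np.array([0.7, 0.6, 0.9])
# per-step environment transition probabilities P(s_{t+1}|s_t, a_t)
transition_probs = np.array([0.8, 1.0, 0.5])

# P(tau) is the product of both factors over every step
p_tau = float(np.prod(policy_probs * transition_probs))  # 0.56 * 0.6 * 0.45 = 0.1512
```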
First, let’s take the log derivative of $P(\tau)$ with respect to $\theta$ i.e. $\nabla_\theta$ and work out what we get:
$$\nabla_\theta \log P(\tau) = \nabla_\theta \log \left(\prod_{t=0}^{T-1} P_{\pi_{\theta}}(a_t|s_t)P(s_{t+1}|s_t,a_t)\right) $$
$$ =\nabla_\theta \left[\sum_{t=0}^{T-1} (\log P_{\pi_{\theta}}(a_t|s_t) + \log P(s_{t+1}|s_t,a_t)) \right]$$
$$ =\nabla_\theta \sum_{t=0}^{T-1}\log P_{\pi_{\theta}}(a_t|s_t)$$
The reason we are taking the log will be made clear shortly. As can be observed, taking the log of the product operator ($\prod$) converts it to a summation, as the log of a product of terms is equivalent to the sum of the logs of each term. In the final line, it can be seen that taking the derivative with respect to the parameters ($\theta$) removes the dynamics of the environment ($\log P(s_{t+1}|s_t,a_t)$), as these are independent of the neural network parameters $\theta$. 
Let’s go back to our original expectation function, substituting in our new trajectory based functions, and apply the derivative (again ignoring discounting for simplicity):
$$J(\theta) = \mathbb{E}[R(\tau)]$$
$$ = \int P(\tau) R(\tau) \, d\tau$$
$$\nabla_\theta J(\theta) = \nabla_\theta \int P(\tau) R(\tau) \, d\tau$$
So far so good. Now, we are going to utilise the following rule which is sometimes called the “log-derivative” trick:
$$\frac{\nabla_\theta p(X,\theta)}{p(X, \theta)} = \nabla_\theta \log p(X,\theta)$$
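This identity is just the chain rule applied to the log function. We can verify it numerically with a hypothetical scalar function, say $p(\theta) = \theta^2$:

```python
import numpy as np

def p(theta):
    return theta ** 2  # hypothetical positive function of theta

theta, eps = 2.0, 1e-6

# left-hand side: (grad p) / p, via a central finite difference
lhs = (p(theta + eps) - p(theta - eps)) / (2 * eps) / p(theta)
# right-hand side: grad of log p, also via finite difference
rhs = (np.log(p(theta + eps)) - np.log(p(theta - eps))) / (2 * eps)

# analytically, both equal d/d(theta) log(theta^2) = 2 / theta = 1.0
```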
We can then apply the $\nabla_{\theta}$ operator within the integral, and cajole our equation so that we get the $\frac{\nabla_{\theta} P(\tau)}{P(\tau)}$ expression like so:
$$\nabla_\theta J(\theta)=\int P(\tau) \frac{\nabla_\theta P(\tau)}{P(\tau)} R(\tau) \, d\tau$$
Then, using the log-derivative trick and applying the definition of expectation, we arrive at:
$$\nabla_\theta J(\theta)=\mathbb{E}\left[R(\tau) \nabla_\theta \log P(\tau)\right]$$
We can then substitute our previous derivation of $\nabla_{\theta} \log P(\tau)$ into the above to arrive at:
$$\nabla_\theta J(\theta) =\mathbb{E}\left[R(\tau) \nabla_\theta \sum_{t=0}^{T-1} \log P_{\pi_{\theta}}(a_t|s_t)\right]$$
This is now close to being something we can work with in our learning algorithm. Let’s take it one step further by recognising that, during our learning process, we are randomly sampling trajectories from the environment and using these samples to estimate the expectation. For each sampled trajectory, then, we take a gradient step on the argument of the expectation, i.e. we maximise:
$$\nabla_\theta J(\theta) \sim R(\tau) \nabla_\theta \sum_{t=0}^{T-1} \log P_{\pi_{\theta}}(a_t|s_t)$$
Recall that $R(\tau)$ is equal to $\sum_{t=0}^{T-1}r_t$ (ignoring discounting). Therefore, we have two summations that need to be multiplied out, element by element. After doing so, and discarding the terms in which the reward was received before the action was taken (an action cannot influence the rewards that preceded it – this is the step we are skipping over for brevity), we arrive at an expression like so:
$$\nabla_\theta J(\theta) \sim \sum_{t=0}^{T-1} \nabla_\theta \log P_{\pi_{\theta}}(a_t|s_t) \left(\sum_{t'= t + 1}^{T-1} \gamma^{t'-t-1} r_{t'} \right)$$
As can be observed, each log-probability term is now weighted by the discounted sum of the rewards that followed it. Note the bounds of the inner summation – these will be explained in the next section.

Calculating the Policy Gradient

In the REINFORCE method of the Policy Gradient algorithm, we compute the gradient expressed above by sampling trajectories through the environment to estimate the expectation, as discussed previously. REINFORCE is therefore a kind of Monte-Carlo algorithm. Let’s consider this a bit more concretely. 
Let’s say we initialise the agent and let it play a trajectory $\tau$ through the environment. The actions of the agent will be selected by performing weighted sampling from the softmax output of the neural network – in other words, we’ll be sampling the action according to $P_{\pi_{\theta}}(a_t|s_t)$. At each step in the trajectory, we can easily calculate $\log P_{\pi_{\theta}}(a_t|s_t)$ by simply taking the log of the softmax output values from the neural network. So, for the first step in the trajectory, the neural network would take the initial state $s_0$ as input, and produce a vector of pseudo-probabilities over the actions, generated by the softmax operation in the final layer. 
What about the second part of the $\nabla_\theta J(\theta)$ equation – $\sum_{t'= t + 1}^{T-1} \gamma^{t'-t-1} r_{t'}$? For the first step in the trajectory ($t = 0$), the summation starts at $t' = 1$ and runs to the final step of the episode. Let’s say the episode was 4 steps long – this term would then look like $\gamma^0 r_1 + \gamma^1 r_2 + \gamma^2 r_3$, where $\gamma$ is the discounting factor and is < 1. 
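This discounted future-reward term can be checked numerically for the 4-step case, assuming a discount factor of 0.95 and a hypothetical reward of 1.0 at every step:

```python
GAMMA = 0.95                    # discount factor (assumed value)
rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical 4-step episode, rewards r_0..r_3
T = len(rewards)

t = 0  # first step in the trajectory
# sum over t' = t+1 .. T-1 of gamma^(t'-t-1) * r_t'
future_return = sum(GAMMA ** (tp - t - 1) * rewards[tp] for tp in range(t + 1, T))
# = 1 + 0.95 + 0.9025 = 2.8525
```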
Straight-forward enough. However, you may have realised that, in order to calculate the gradient $\nabla_\theta J(\theta)$ at the first step in the trajectory/episode, we need to know the reward values of all subsequent states in the episode. Therefore, in order to execute this method of learning, we can only take gradient learning steps after the full episode has been played to completion. Only after the episode is complete can we perform the training step. 
We are almost ready to move onto the code part of this tutorial. However, this is a good place for a quick discussion about how we would actually implement the calculation of $\nabla_\theta J(\theta)$ in TensorFlow 2 / Keras. It turns out we can just use the standard cross entropy loss function to execute these calculations. Recall that cross entropy is defined as (for a deeper explanation of entropy, cross entropy, information and KL divergence, see this post):
$$CE = -\sum_x p(x) \log(q(x))$$
which is just the summation of one function $p(x)$ multiplied by the log of another function $q(x)$, over the possible values of the argument $x$. If we look at the source code of the Keras implementation of cross-entropy, we can see the following:
[Image: Keras source code of the cross-entropy loss function]

The output tensor here is simply the softmax output of the neural network, which, for our purposes, will be a tensor of size (num_steps_in_episode, num_actions). Note that the log of output is calculated in the above. The target value, for our purposes, can be the discounted rewards calculated at each step in the trajectory, and will be of size (num_steps_in_episode, 1). The summation of the multiplication of these terms is then calculated (reduce_sum). Gradient based training in TensorFlow 2 is generally a minimisation of the loss function; however, we want to maximise the calculation as discussed above. The good thing is, the sign of the cross entropy calculation shown above is inverted – so we are good to go. 
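Here is a minimal numpy sketch of that cross entropy calculation for a single step, with made-up numbers:

```python
import numpy as np

# one step: the agent took action 1 out of 3 possible actions
output = np.array([0.2, 0.7, 0.1])  # hypothetical softmax output of the network
target = np.array([0.0, 1.0, 0.0])  # one-hot encoding of the selected action

ce = -np.sum(target * np.log(output))  # cross entropy = -log(0.7)
```

In the Policy Gradient case, each such per-step term is weighted by the discounted reward for that step, so actions that preceded high rewards receive a stronger upward push in probability.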

To call this training step utilising Keras, all we have to do is execute something like the following:

network.train_on_batch(states, discounted_rewards)

Here, we supply all the states gathered over the length of the episode, and the discounted rewards at each of those steps. The Keras backend will pass the states through the network and apply the softmax function, and this will become the output variable in the Keras source code snippet above. Likewise, discounted_rewards is the same as target in the source code snippet above. 

Now that we have covered all the pre-requisite knowledge required to build a REINFORCE-type method of Policy Gradient reinforcement learning, let’s have a look at how this can be coded and applied to the Cartpole environment.

Policy Gradient reinforcement learning in TensorFlow 2 and Keras

In this section, I will detail how to code a Policy Gradient reinforcement learning algorithm in TensorFlow 2 applied to the Cartpole environment. As always, the code for this tutorial can be found on this site’s Github repository.

First, we define the network which we will use to produce $P_{\pi_{\theta}}(a_t|s_t)$ with the state as the input:

import gym
import numpy as np
import datetime as dt
import tensorflow as tf
from tensorflow import keras

GAMMA = 0.95

env = gym.make("CartPole-v0")
state_size = 4
num_actions = env.action_space.n

network = keras.Sequential([
    keras.layers.Dense(30, activation='relu', kernel_initializer=keras.initializers.he_normal()),
    keras.layers.Dense(30, activation='relu', kernel_initializer=keras.initializers.he_normal()),
    keras.layers.Dense(num_actions, activation='softmax')
])
network.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam())

As can be observed, first the environment is initialised. Next, the network is defined using the Keras Sequential API. The network consists of 3 densely connected layers. The first 2 layers have ReLU activations, and the final layer has a softmax activation to produce the pseudo-probabilities to approximate $P_{\pi_{\theta}}(a_t|s_t)$. Finally, the network is compiled with a cross entropy loss function and an Adam optimiser. 

The next part of the code chooses the action from the output of the model:

def get_action(network, state, num_actions):
    softmax_out = network(state.reshape((1, -1)))
    selected_action = np.random.choice(num_actions, p=softmax_out.numpy()[0])
    return selected_action

As can be seen, first the softmax output is extracted from the network by inputting the current state. The action is then selected by making a random choice from the number of possible actions, with the probabilities weighted according to the softmax values. 

The next function is the main function involved in executing the training step:

def update_network(network, rewards, states, actions, num_actions):
    reward_sum = 0
    discounted_rewards = []
    for reward in rewards[::-1]:  # reverse buffer r
        reward_sum = reward + GAMMA * reward_sum
        discounted_rewards.append(reward_sum)
    discounted_rewards.reverse()
    discounted_rewards = np.array(discounted_rewards)
    # standardise the rewards
    discounted_rewards -= np.mean(discounted_rewards)
    discounted_rewards /= np.std(discounted_rewards)
    states = np.vstack(states)
    loss = network.train_on_batch(states, discounted_rewards)
    return loss

First, the discounted rewards list is created: this is a list where each element corresponds to the discounted sum of the rewards from that step onwards, i.e. $\sum_{t'=t}^{T-1} \gamma^{t'-t} r_{t'}$ (note that the code includes the reward $r_t$ at the current step, a slight variation on the formula derived above). The input argument rewards is a list of all the rewards achieved at each step in the episode. The rewards[::-1] operation reverses the order of the rewards list, so the first run through the for loop will deal with the last reward recorded in the episode. As can be observed, a reward sum is accumulated each time the for loop is executed. Let’s say that the episode length is equal to 4 – $r_3$ will refer to the last reward recorded in the episode. In this case, the discounted_rewards list would look like:

[$r_3$, $r_2 + \gamma r_3$, $r_1 + \gamma r_2 + \gamma^2 r_3$, $r_0 + \gamma r_1 + \gamma^2 r_2 + \gamma^3 r_3$]


This list is in reverse to the order of the actual state value list (i.e. [$s_0$, $s_1$, $s_2$, $s_3$]), so the next line after the for loop reverses the list (discounted_rewards.reverse()). 

Next, the list is converted into a numpy array, and the rewards are normalised to reduce the variance in the training. Finally, the states list is stacked into a numpy array and both this array and the discounted rewards array are passed to the Keras train_on_batch function, which was detailed earlier. 
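The reversed accumulation can be verified against the direct discounting formula with a short hypothetical reward list:

```python
GAMMA = 0.95
rewards = [1.0, 2.0, 3.0, 4.0]  # hypothetical episode rewards

reward_sum = 0.0
discounted_rewards = []
for reward in rewards[::-1]:    # iterate from the last reward backwards
    reward_sum = reward + GAMMA * reward_sum
    discounted_rewards.append(reward_sum)
discounted_rewards.reverse()

# direct computation for the first element: r0 + g*r1 + g^2*r2 + g^3*r3
direct = sum(GAMMA ** i * r for i, r in enumerate(rewards))
# both give 1 + 1.9 + 2.7075 + 3.4295 = 9.037
```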

The next part of the code is the main episode and training loop:

num_episodes = 10000000
train_writer = tf.summary.create_file_writer(STORE_PATH + f"/PGCartPole_{dt.datetime.now().strftime('%d%m%Y%H%M')}")
for episode in range(num_episodes):
    state = env.reset()
    rewards = []
    states = []
    actions = []
    while True:
        action = get_action(network, state, num_actions)
        new_state, reward, done, _ = env.step(action)
        states.append(state)
        rewards.append(reward)
        actions.append(action)

        if done:
            loss = update_network(network, rewards, states, actions, num_actions)
            tot_reward = sum(rewards)
            print(f"Episode: {episode}, Reward: {tot_reward}, avg loss: {loss:.5f}")
            with train_writer.as_default():
                tf.summary.scalar('reward', tot_reward, step=episode)
                tf.summary.scalar('avg loss', loss, step=episode)
            break

        state = new_state

As can be observed, at the beginning of each episode, three lists are created which will contain the state, reward and action values for each step in the episode / trajectory. These lists are appended to until the done flag is returned from the environment signifying that the episode is complete. At the end of the episode, the training step is performed on the network by running update_network. Finally, the rewards and loss are logged in the train_writer for viewing in TensorBoard. 

The training results can be observed below:

[Image: Training progress of Policy Gradient RL in the Cartpole environment]

As can be observed, the rewards steadily progress until they “top out” at the maximum possible reward summation for the Cartpole environment, which is equal to 200. However, the user can verify that repeated runs of this version of Policy Gradient training have a high variance in their outcomes. Therefore, improvements to the Policy Gradient REINFORCE algorithm are required and available – these improvements will be detailed in future posts.




