PPO Proximal Policy Optimization reinforcement learning in TensorFlow 2

In previous posts (here and here), I have been covering policy gradient-based reinforcement learning methods. In this post, I will continue the series by covering another pseudo-policy gradient based method called Proximal Policy Optimization (PPO). The PPO technique was designed to make some improvements on the Trust Region Policy Optimization (TRPO) algorithm, which in turn was designed to improve the Advantage Actor Critic (A2c) method. In this post, I’ll explain the theory surrounding the development of Proximal Policy Optimization and provide a worked coding example using PPO to train an agent to play the Cartpole environment from the Open AI Gym.

All code contained in this post can be found on this site’s Github repository here.

PPO theory

A policy gradient and A2C recap

A policy gradient-based method of reinforcement learning selection agent actions based on the output of a neural network, with each output corresponding to the probability that a certain action should be taken. This probability distribution is sampled from to produce actions during training.

The A2c algorithm within the family of policy gradient-based reinforcement learning methods has a policy gradient term equal to:

$$\nabla_\theta J(\theta) \sim \left(\sum_{t=0}^{T-1} log P_{\pi_{\theta}}(a_t|s_t)\right)A(s_t, a_t)$$

It should be noted that the above term is for gradient ascent, not descent i.e. this term should be maximized. Recall that the advantage is expressed as:

$$A(s_t, a_t) = Q(s_t, a_t) – V(s_t)$$

Which is a kind of “relative” or normalized benefit of taking action $a_t$ from state $s_t$, with the normalization arising from the overall value of being in that state $V(s_t)$ (for further discussion on the advantage and value estimates, see this post). This loss weights the log probability of taking the action with the advantage. The probability of taking the action is generated from the policy output of the neural network, with a softmax activation applied. So if an action is taken with high probability, let’s say 0.8, and this results in a high advantage – then this policy output will be reinforced. If the advantage is low, alternatively, then this high probability policy output will be discouraged.

Via Bellman’s equation, the advantage can actually be estimated purely from the value estimate:

$$A(s_t, a_t) = r_{t+1} + \gamma V(s_{t+1}) – V(s_t)$$

This enables us to create an advantaged weighted policy gradient method by also estimating the value of any given state $V(s_t)$. This requires a two-output neural network, one for estimating action probabilities (actor part) and the other for state values (critic part), with architecture as shown below:

A2C architecture

A2C architecture


The architecture shown below is for a network whose state inputs are game pixels. For other state types, such as the state variables for the Cartpole environment, the input layers are usually densely connected. The loss function for the value output in the A2C architecture is usually simply the mean-squared error between the predicted values and the actual values calculated from the discounted future rewards.

What’s the disadvantage with the A2C method?

There are a couple of disadvantages of the A2C method. These can be summarised as follows:

  1. Large gradient steps can move the network weights into sub-optimal areas and wreck (or at least slow) the training progress
  2. There is poor sample efficiency – the batch samples can only inform the network training once

These disadvantages were addressed in the Trust Region Policy Optimization (TRPO) method, which ultimately leads to the PPO method, and will be discussed in the following section in that context.

Trust Region Policy Optimization

There are some very interesting ideas in TRPO, however, in this post, I will only be covering them briefly without too much theory.

Improving sampling efficiency

The first problem to address is the poor sample efficiency of A2C (and other Policy Gradient methods). A set of training samples are collected by playing the game, which in reinforcement learning can be a relatively lengthy process, and they are only used to train the network once. This is not great from a computational perspective, especially given that reinforcement learning may require hundreds of thousands or millions of training steps to achieve good results. In policy gradient-based methods, ideally we would like to use samples generated via the rolling out of a trajectory using an older version of the policy, to train a new policy.

But how can we do this in a mathematically valid way? It turns out we can use the concept of Importance Sampling (IS). Importance sampling allows us to compute the expected value of function $f(x)$, where $x$ is sampled from a distribution $p(x)$, by instead sampling $x$ from another distribution $q(x)$. That’s probably a bit confusing, but this is what it looks like mathematically:

$$\mathbb{E}_{x \sim p(x)} \left[f(x)\right] = \mathbb{E}_{x \sim q(x)} \left[\frac{p(x)}{q(x)} f(x) \right]$$

Where $x \sim p(x)$ in the expectation subscript refers to $x$ being sampled from $p(x)$ and the same goes for $x \sim q(x)$. We can apply this formula to the case where we might want to use actions, states and advantage values sampled during the roll-out of an old policy $\pi_{\theta old}$ to train the current policy. First recall that the policy gradient part of the loss for A2C, with the expectation still shown (before it is cashed out into an empirical sum sampled from a trajectory), looks like:

$$\nabla_\theta J(\theta) = \mathbb{E} \left[ \nabla_{\theta} log P_{\pi_{\theta}}(a_t|s_t) A(s_t, a_t)\right]$$

Now, using importance sampling, we can replace this with:

$$\nabla_\theta J(\theta) = \mathbb{E} \left[\frac{\nabla_{\theta} P_{\pi_{\theta}}(a_t|s_t)}{P_{\pi_{\theta old}}(a_t|s_t)} A_{\theta old}(s_t, a_t)\right]$$

You may be wondering where the $log$ went in the formula above, however it turns out that the derivative of both of these expressions are equivalent for small step sizes, as discussed in the original TRPO paper.

This use of importance sampling now allows the same batch of samples from a trajectory rollout to be used many times to train the current policy, making the policy gradient method much more sample efficient.

Improving step stability

As mentioned above, one of the issues of the A2C method of reinforcement learning is that it is unstable. In other words, the gradient steps during training can derail the agent’s learning process. Consider the image below of the Hilary Step on Mount Everest:

Hilary Step – Mt Everest By Debasish biswas kolkata – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=46637445

Imagine the training process of the agent is to ascend the very narrow path to the summit, which represents optimal agent behavior. However, as can be observed, the path is very narrow, and a step too large in any direction will send the agent “over the cliff” perhaps permanently damaging the training process for the agent. This is perhaps an extreme example, though maybe not – reinforcement learning optimization landscapes are infamously sharp and “treacherous”. In any case, this fact restricts both the speed of the training and the learning rates that are possible during training. If the learning rates or steps are too high, then there will be a greater risk of derailing the training process.

This is especially a problem if we are hoping to improve sample efficiency and run multiple training steps over the same set of trajectory samples. This is where the “trust region” part of TRPO comes in. Trust region optimization involves creating a small area around the current point which specifies the maximum radius of a step that can be taken in the next gradient descent/ascent iteration. Consider the two diagrams below, from this good overview of trust-region methods:

First trust region example

First trust region example

Second trust region example

Second trust region example

The dark circles around the points in the diagrams above are the trust regions, which define the boundaries of the maximum next step in whatever gradient method is being used. The trust region is either reduced or expanded depending on how well the last step performed – if it resulted in a significant improvement, the trust region size is expanded, if it resulted in a degradation in performance, the trust region size is reduced. However, the trust region size can also be a simple fixed hyper-parameter, as is the case in TRPO as will be shown below.

The optimization problem solved in TRPO is approximately as follows:

$$J(\theta) = \mathbb{E} \left[\frac{P_{\pi_{\theta}}(a_t|s_t)}{P_{\pi_{\theta old}}(a_t|s_t)} A_{\theta old}(s_t, a_t)\right]$$

With an optimization constraint of:

$$D_{KL}(\theta | \theta_{old}) \leq \delta $$

Here $D_{KL}$ is the Kullback–Leibler divergence (KL divergence). The KL divergence is a way of measuring the distance between two probability distributions, so this constraint ensures that the new parameters of the policy aren’t too different from the old parameters. In other words, this constraint enforces a trust region on the parameter / policy updates.

In the original paper, the conjugate gradient optimization method is used to solve this optimization problem with the aforementioned $D_{KL}(\theta | \theta_{old})$ based constraint. However, compared to the standard optimization methods commonly used in deep learning, the conjugate gradient is slow. Therefore, an improvement of TRPO that removes the need for strict constraints and the use of conjugate gradient optimization, while still keeping both the trust region concept and enabling the improvement of sample efficiency, would clearly be desirable. Proximal Policy Optimization (PPO) is one way of achieving these goals.

Proximal Policy Optimization (PPO)

As shown above, the importance sampling version of the policy loss function, which enables better sample efficiency, is expressed as follows in the original PPO paper:

$$L^{CPI}(\theta) = \mathbb{E}_t \left[\frac{\pi_\theta (a_t | s_t)}{\pi_{\theta old}(a_t | s_t)}A_t\right] = \mathbb{E}_t\left[r_t(\theta)A_t\right]$$

Here the superscipt CPI stands for “conservative policy iteration”, and the $\pi_\theta (a_t | s_t)$ notation expresses the same probability as shown previously (i.e. $P_{\pi_{\theta}}(a_t|s_t)$). This loss function, while allowing greater sample efficiency by using importance sampling, as discussed above, will still be subject to large gradient updates derailing training progress. In order to introduce a quasi-trust region limitation, while not introducing conjugate gradient-based optimization or strict constraints, the authors of the PPO paper introduce the following alternative loss function:

$$L^{CLIP}(\theta) = \mathbb{E}_t\left[min(r_t(\theta))A_t, clip(r_t(\theta), 1 – \epsilon, 1 + \epsilon)A_t)\right]$$

Where $\epsilon$ is a hyperparameter recommended to be between 0.1 to 0.3. This CLIP loss function is essentially the same as the CPI loss function, apart from when the probability of an action under the new parameters ($\pi_\theta (a_t | s_t)$) is greater than the old parameters by a certain degree (i.e. r > 1 + $\epsilon$), after which the loss value is clipped. This ensures that no large steps can occur in the gradient updates, which is essentially enforcing a trust region. The plots below for both positive and negative advantage values illustrate the behavior of $L^{CLIP}$ with respect to $r$:

PPO clipped loss function

PPO clipped loss function (from https://arxiv.org/pdf/1707.06347.pdf)

The total loss function for the PPO method consists of $L^{CLIP}$, the mean-square error loss of the value estimator, and a term to encourage higher entropy / greater exploration (with the latter two terms identical to the components in A2C). It can be expressed as follows, with signs introduced to convert a maximization function to a minimization function:

$$L_{TOTAL} = -L^{CLIP} + L^{VALUE} * k_1 – L^{ENTROPY} * k_2$$

Where the $k$ values are hyperparameters that weigh the various components of the loss against $L^{CLIP}$ (with $k_1$ usually on the order of 0.5, and $k_2$ on the order of 0.01). In the following section, I will present a code work-through of PPO being utilized to train a CartPole agent.

PPO code walk-through

As stated above, all code in this walk-through can be found on this site’s Github repository. The RL environment that will be used is Open AI’s Cartpole environment. In this example, I’ll be making use of TensorFlow’s GradientTape functionality and the Keras model API.

The code below shows the establishment of some constants and the creation of the neural network model structure:

import tensorflow as tf
from tensorflow import keras
import tensorflow_probability as tfp
import numpy as np
import gym
import datetime as dt

STORE_PATH = 'C:\\Users\\andre\\TensorBoard\\PPOCartpole'
GAMMA = 0.99
LR = 0.001


env = gym.make("CartPole-v0")
state_size = 4
num_actions = env.action_space.n

ent_discount_val = ENTROPY_LOSS_WEIGHT

class Model(keras.Model):
    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.dense1 = keras.layers.Dense(64, activation='relu',
        self.dense2 = keras.layers.Dense(64, activation='relu',
        self.value = keras.layers.Dense(1)
        self.policy_logits = keras.layers.Dense(num_actions)

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dense2(x)
        return self.value(x), self.policy_logits(x)

    def action_value(self, state):
        value, logits = self.predict_on_batch(state)
        dist = tfp.distributions.Categorical(logits=logits)
        action = dist.sample()
        return action, value

Most of the constants declared in the first part of the code above are familiar from the A2C tutorial – the critic and entropy loss weight values, gamma value (reward discounting factor), and so on. However, a few new values need to be discussed. First, I have included an entropy discounting rate (ENT_DISCOUNT_RATE) – after each episode, the entropy loss weight value will be discounted by this value to discourage exploration as the number of episodes played piles up. The constant CLIP_VALUE refers to the value $\epsilon$ in the $L^{CLIP}$ loss function ($\mathbb{E}_t\left[min(r_t(\theta))A_t, clip(r_t(\theta), 1 – \epsilon, 1 + \epsilon)A_t)\right]$). The value NUM_TRAIN_EPOCHS refers to the number of times a given trajectory of rewards, actions, states, etc. will be used to train the network, making use of the importance sampling configuration of the loss function.

The Model class is a fairly standard Keras model API implementation, with two common dense layers and a separate value/critic output and a policy logits output. For more details on this model implementation, see the A2C tutorial.

The loss functions

Next, we’ll review the custom loss functions:

def critic_loss(discounted_rewards, value_est):
    return tf.cast(tf.reduce_mean(keras.losses.mean_squared_error(discounted_rewards, value_est)) * CRITIC_LOSS_WEIGHT,

First the critic loss – this loss function calculates the mean squared error between the discounted rewards (extracted by discounting the trajectory of rewards collated by running the agent in the environment) and the predicted values (value_est) from the neural network from the trajectory of states.

Next, we look at the entropy loss:

def entropy_loss(policy_logits, ent_discount_val):
    probs = tf.nn.softmax(policy_logits)
    entropy_loss = -tf.reduce_mean(keras.losses.categorical_crossentropy(probs, probs))
    return entropy_loss * ent_discount_val

In this loss function, the policy logits directly from the neural network (and based on the trajectory of states sampled from the environment) and the discounted entropy weight (ent_discount_val) are passed as arguments. The policy logits are then converted to quasi-probabilities by applying the softmax function, and then the entropy calculation is performed by applying the Keras categorical cross-entropy function on probs twice (again, for an explanation of how this works, see here). Notice the minus sign to invert the entropy calculation – in a minimization optimization calculation, this will act to enhance the entropy / exploration of the agent. The discounted entropy weight value is then applied to the entropy and the product is returned.

def actor_loss(advantages, old_probs, action_inds, policy_logits):
    probs = tf.nn.softmax(policy_logits)
    new_probs = tf.gather_nd(probs, action_inds)

    ratio = new_probs / old_probs

    policy_loss = -tf.reduce_mean(tf.math.minimum(
        ratio * advantages,
        tf.clip_by_value(ratio, 1.0 - CLIP_VALUE, 1.0 + CLIP_VALUE) * advantages
    return policy_loss

The actor loss takes in the advantages (calculated in another function, which will be explained as follows), the old probability values (again, calculated elsewhere), the action indices, and the policy logits from the neural network model. The action indices are the array indices of the actions actually taken during the trajectory roll-out. In other words, if, for a certain state in the game, the action probabilities are [0.01, 0.3, 0.5, 0.2] and the 0.5 action is taken, the action index, in this case, is 2 (with a zero-based index system). The action_inds is a tensor indices which correspond to the action taken for every record in the batch.

The first line of the function is where the policy logit values are converted to quasi-probabilities using the softmax function. These new policy logits are generated by using the most recent neural network parameters – in other words, after the softmax has been applied, they refer to the $\pi_\theta (a_t | s_t)$ values in the $L^{CLIP}$ loss function. The old_probs values refer to the $\pi_{\theta old}(a_t | s_t)$ values. We are only after the action probabilities that correspond to the actions actually taken during the agent’s playing of the game. The tf.gather_nd function, therefore, takes the probs and extracts those probability outputs which correspond to the actions that were actually taken, specified by the action_inds. This action_inds based selection has already been performed on the probabilities from the neural network parameterized by the old network weight values $\theta_{old}$, as will be shown later.

Then the r value (i.e. the $\frac{\pi_\theta (a_t | s_t)}{\pi_{\theta old}(a_t | s_t)}$ value) is computed. After this, the clipping function is applied as per the $L^{CLIP}$ calculation, shown again below for reference:

$$L^{CLIP}(\theta) = \mathbb{E}_t\left[min(r_t(\theta))A_t, clip(r_t(\theta), 1 – \epsilon, 1 + \epsilon)A_t)\right]$$

Notice the negative sign in the calculation of policy_loss – this ensures that $L^{CLIP}$ is maximized during the minimization optimization that will be performed in the TensorFlow environment.

Next, I introduce the train_model function which calls the various loss function and makes the gradient step:

def train_model(action_inds, old_probs, states, advantages, discounted_rewards, optimizer, ent_discount_val):
    with tf.GradientTape() as tape:
        values, policy_logits = model.call(tf.stack(states))
        act_loss = actor_loss(advantages, old_probs, action_inds, policy_logits)
        ent_loss = entropy_loss(policy_logits, ent_discount_val)
        c_loss = critic_loss(discounted_rewards, values)
        tot_loss = act_loss + ent_loss + c_loss
    grads = tape.gradient(tot_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return tot_loss, c_loss, act_loss, ent_loss

This function takes advantage of the tf.GradientTape() context manager – for more details, see this post. Within this context, the model is called on the list of states collected during the unrolling of a trajectory of an agent through the game. The states variable will be a list of input states, of length equal to BATCH_SIZE. The model returns value and policy logit estimates. These values are then passed through to the various loss functions and summated. The gradients of the trainable variables with respect to this total loss value are then calculated, and an optimization step is undertaken using the apply_gradients method.

The next function calculates the discounted rewards and the advantages:

def get_advantages(rewards, dones, values, next_value):
    discounted_rewards = np.array(rewards + [next_value[0]])

    for t in reversed(range(len(rewards))):
        discounted_rewards[t] = rewards[t] + GAMMA * discounted_rewards[t+1] * (1-dones[t])
    discounted_rewards = discounted_rewards[:-1]
    # advantages are bootstrapped discounted rewards - values, using Bellman's equation
    advantages = discounted_rewards - np.stack(values)[:, 0]
    # standardise advantages
    advantages -= np.mean(advantages)
    advantages /= (np.std(advantages) + 1e-10)
    # standardise rewards too
    discounted_rewards -= np.mean(discounted_rewards)
    discounted_rewards /= (np.std(discounted_rewards) + 1e-8)
    return discounted_rewards, advantages

This function has been discussed in previous tutorials, for instance, this post. In this case, both the advantages (used in the actor loss function) and the discounted rewards (used in the critic loss function) are normalized to improve training stability.

model = Model(num_actions)
optimizer = keras.optimizers.Adam(learning_rate=LR)

train_writer = tf.summary.create_file_writer(STORE_PATH + f"/PPO-CartPole_{dt.datetime.now().strftime('%d%m%Y%H%M')}")

num_steps = 10000000
episode_reward_sum = 0
state = env.reset()
episode = 1
total_loss = None
for step in range(num_steps):
    rewards = []
    actions = []
    values = []
    states = []
    dones = []
    probs = []
    for _ in range(BATCH_SIZE):
        _, policy_logits = model(state.reshape(1, -1))

        action, value = model.action_value(state.reshape(1, -1))
        new_state, reward, done, _ = env.step(action.numpy()[0])

        episode_reward_sum += reward

        state = new_state

        if done:
            state = env.reset()
            if total_loss is not None:
                print(f"Episode: {episode}, latest episode reward: {episode_reward_sum}, "
                      f"total loss: {np.mean(total_loss)}, critic loss: {np.mean(c_loss)}, "
                      f"actor loss: {np.mean(act_loss)}, entropy loss {np.mean(ent_loss)}")
            with train_writer.as_default():
                tf.summary.scalar('rewards', episode_reward_sum, episode)
            episode_reward_sum = 0

            episode += 1

    _, next_value = model.action_value(state.reshape(1, -1))
    discounted_rewards, advantages = get_advantages(rewards, dones, values, next_value[0])

    actions = tf.squeeze(tf.stack(actions))
    probs = tf.nn.softmax(tf.squeeze(tf.stack(probs)))
    action_inds = tf.stack([tf.range(0, actions.shape[0]), tf.cast(actions, tf.int32)], axis=1)

    total_loss = np.zeros((NUM_TRAIN_EPOCHS))
    act_loss = np.zeros((NUM_TRAIN_EPOCHS))
    c_loss = np.zeros(((NUM_TRAIN_EPOCHS)))
    ent_loss = np.zeros((NUM_TRAIN_EPOCHS))
    for epoch in range(NUM_TRAIN_EPOCHS):
        loss_tuple = train_model(action_inds, tf.gather_nd(probs, action_inds),
                                 states, advantages, discounted_rewards, optimizer,
        total_loss[epoch] = loss_tuple[0]
        c_loss[epoch] = loss_tuple[1]
        act_loss[epoch] = loss_tuple[2]
        ent_loss[epoch] = loss_tuple[3]
    ent_discount_val *= ENT_DISCOUNT_RATE

    with train_writer.as_default():
        tf.summary.scalar('tot_loss', np.mean(total_loss), step)
        tf.summary.scalar('critic_loss', np.mean(c_loss), step)
        tf.summary.scalar('actor_loss', np.mean(act_loss), step)
        tf.summary.scalar('entropy_loss', np.mean(ent_loss), step)

The majority of this trajectory roll-out / training loop has been covered in this post. The main difference is the loop over NUM_TRAIN_EPOCHS:

for epoch in range(NUM_TRAIN_EPOCHS):
    loss_tuple = train_model(action_inds, tf.gather_nd(probs, action_inds),
                             states, advantages, discounted_rewards, optimizer,
    total_loss[epoch] = loss_tuple[0]
    c_loss[epoch] = loss_tuple[1]
    act_loss[epoch] = loss_tuple[2]
    ent_loss[epoch] = loss_tuple[3]

In this section of the code, the train_model function is run NUM_TRAIN_EPOCHS times. This can be performed in PPO using the importance sampling functionality i.e.:

$$L^{CPI}(\theta) = \mathbb{E}_t \left[\frac{\pi_\theta (a_t | s_t)}{\pi_{\theta old}(a_t | s_t)}A_t\right] = \mathbb{E}_t\left[r_t(\theta)A_t\right]$$

In this case, probs are the probabilities extracted from the model over the rolling out of the trajectory by the agent. In other words, they are the probabilities generated from the neural network with the parameters of the network based on the previous round of training. Therefore, these are the old probabilities or $\pi_{\theta old}(a_t | s_t)$ in the formula above. The new probabilities $\pi_{\theta}(a_t | s_t)$ are generated within the gradient tape context of train_model – a new set of parameters for each training loop. However, through each of these iterations, the old probabilities, or probs, stay constant.

Running this code on the CartPole environment yields the following rewards during training:

PPO Cartpole rewards

PPO Cartpole rewards

As can be observed, the agent manages to score the maximum Cartpole reward of 200 quite early on in the training process, however, it takes some time to achieve this result consistently. Further fine-tuning of the entropy reduction factor, or other hyperparameters, would likely yield a more consistent high score sooner.

This concludes my introductory post on the Proximal Policy Optimization (PPO) method and its implementation in TensorFlow 2. I hope it was informative for you.

27 thoughts on “PPO Proximal Policy Optimization reinforcement learning in TensorFlow 2”

  1. By way of introduction, I am Mark Schaefer, and I represent Nutritional Products International. We serve both international and domestic manufacturers who are seeking to gain more distribution within the United States. Your brand recently caught my attention, so I am contacting you today to discuss the possibility of expanding your national distribution reach.We provide expertise in all areas of distribution, and our offerings include the following: Turnkey/One-stop solution, Active accounts with major U.S. distributors and retailers, Our executive team held executive positions with Walmart and Amazon, Our proven sales force has public relations, branding, and marketing all under one roof, We focus on both new and existing product lines, Warehousing and logistics. Our company has a proven history of initiating accounts and placing orders with major distribution outlets. Our history allows us to have intimate and unique relationships with key buyers across the United States, thus giving your brand a fast track to market in a professional manner. Please contact me directly so that we can discuss your brand further. Kind Regards, Mark Schaefer, marks@nutricompany.com, VP of Business Development, Nutritional Products International, 101 Plaza Real S, Ste #224, Boca Raton, FL 33432, Office: 561-544-0719

  2. Howdy would you mind letting me know which hosting company you’re using? I’ve loaded your blog in 3 different internet browsers and I must say this blog loads a lot faster then most. Can you recommend a good internet hosting provider at a honest price? Cheers, I appreciate it!

  3. An Easy Way to Promote Clickbank Products


    Making Massive Affiliate Commission from
    Clickbank Marketplace is Easier than Ever.

    People are Generating Thousands of Dollars
    using this DFY Clickbank Affiliate Funnels
    Created for Highly Converting Health Niche,
    Dietary-Supplements and Nutrition sub-niches.

    This is a Golden Opportunity to Skyrocket
    Your Profits on Clickbank. Check this out:

    This Package gives Everything you need to
    create a Profitable Affiliate Marketing

    These CB Affiliate Funnel Package includes
    10 Customized PDF Reports, Landing Pages,
    Follow-up messages and ready to use Graphics.

    This package is focused on Highly profitable
    Health and Wellness niches like Body Detox,
    Obesity, Anti-Aging, Prostate Health, Weight
    Loss, Natural Immunity, Diabetes, Belly Fat,
    Metabolism and Tooth Care.

    These Funnels will be Customized with Your
    Clickbank Affiliate links. All the PDF Reports
    will have your Affiliate links and you will get
    credits for all the sales. Around 50 Clickbank
    Products will be promoted with your Affiliate IDs.

    This is an Easy to use Cloud based Software.
    You can use it on a PC, Mac or mobile device.

    This is 100% Newbie friendly. Step by step Video
    Tutorials also included in this package.

    Get Instant Access Right now:

  4. Howdy just wanted to give you a brief heads up and let you know a few of the images aren’t loading correctly. I’m not sure why but I think its a linking issue. I’ve tried it in two different browsers and both show the same results.

  5. Hello, i think that i saw you visited my weblog so i came to go back the prefer?.I am trying
    to to find issues to improve my site!I suppose its adequate to make use of some of your ideas!!

    Here is my web-site fish oil

  6. My partner and I absolutely love your blog and find nearly all of your post’s to be
    exactly what I’m looking for. can you offer guest writers to write content for you personally?

    I wouldn’t mind publishing a post or elaborating on a lot of
    the subjects you write regarding here. Again, awesome
    web site!

  7. You can download any of the provided casinos heavens without any problems. If you download the casino app, it command travail like a standard reproduction of the power supply milieu from the responsive version. Unfortunately, all the apps you download will only do one’s daily dozen on Android. Download casino in 1 click from the schedule, these are the most advanced responsive applications.

    The most suitable casino apps benefit of Android
    The can of worms is that determination Android apps and downloading them can be hazardous, as Google doesn’t allow real money Android casinos to be placed in the On Store.

    Download casino app
    But don’t worry, there is a unassuming discovery, you can download the casino app from casinoapk2.xyz.

    As so multitudinous users attired in b be committed to been asking around casino gaming on their Android phones or tablets. We dug round a tittle to discover you the most desirable casino apps offering the in spite of verifiable money experience.

    The most simplified [url=”https://casinoapk3.xyz/”]http://casinoapk3.xyz[/url], download Pin-up.

    Reviewers check out each pertinence against asylum to ensure positiveness;
    We inclination relieve you find legal money gambling apps with the most beneficent Android apps;
    The casinos extend the best group of games.
    Download Loyal On easy street Casino
    It is not as a last resort effortless to download real lettuce casinos, on a par on more public smartphones like Samsung Galaxy, HTC Complete or Sony Xperia. So if you wish for to download the app to out first simoleons, decipher all below.

    Our collaborate ground the pre-eminent casinos contribution excellence gambling exchange for your trick and ran an intense 25-step verification manage benefit of them.

    On this epoch you will come on an relevancy in spite of Android:

    Reception Promotions – We conscious how much players hanker after to take interest of the bonuses, so we made firm that our featured sites proffer aristocratic deals in support of Android.
    Discrepancy of games. Unpropitious choice is a big minus. We not recommend the app, the recreation portfolio is elephantine and varied.
    Deposits – You be in want of as hardly restrictions as viable when it comes to depositing and withdrawing shin-plasters to your casino app account. We shape inevitable that all apps we approve brook a encyclopedic sort of payment methods.
    Promiscuous payouts. All applications offer lustfully payments with verified money, credited to the account in a few hours.
    Mobile Compatibility – Play Apps Anywhere.
    Customer Hold up – To be featured on the Featured Record, we call for online casinos to offer sweeping and receptive person service.
    Advantages of an online [url=”https://casinoapk2.xyz/skachat-joy-kazino-na-android/”]скачать Джой[/url] app in behalf of Android
    Excess video graphics and usability in Android apps.
    Experience the nevertheless stunning PC experience.
    Intelligent access from the application.
    Casino apps – looking for the greatest
    We check and download casino apps to protect they come together great standards. The criteria habituated to to excellent a casino app are upstanding as stringent as the criteria utilized to quantify a PC casino. Each relevance has:

    Highest quality graphics;
    Untroubled loading and playing epoch;
    Express payouts.

Leave a Reply

Your email address will not be published. Required fields are marked *