which depends on the worker’s actions. Providing a weighted intrinsic reward therefore helps the worker align its actions to follow a distribution around the manager’s goal state.
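As a rough sketch (not the authors’ code), the intrinsic reward from the paper averages the cosine similarity between recent state changes and the corresponding manager goals, and the weight α mixes it with the environment reward. The names `states`, `goals`, `horizon_c` and `alpha` are my own placeholders.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def intrinsic_reward(states, goals, t, horizon_c):
    """FuN-style intrinsic reward at time t: average cosine similarity
    between (s_t - s_{t-i}) and the goal g_{t-i} over the last c steps."""
    sims = [cosine_sim(states[t] - states[t - i], goals[t - i])
            for i in range(1, horizon_c + 1) if t - i >= 0]
    return sum(sims) / max(len(sims), 1)

def worker_reward(env_reward, r_intrinsic, alpha):
    """The worker maximizes the environment reward plus the weighted intrinsic term."""
    return env_reward + alpha * r_intrinsic
```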
The paper also proposes a novel Dilated LSTM network, analogous to dilated CNNs, which helps the manager learn a larger receptive field over states across time. The dilation factor of the dLSTM is r, which is set to 10 for most experiments in the paper. This matches the value of the horizon c, and the paper points out that the manager’s goals are always ≤ c time steps into the future. The paper also proposes a novel policy-embedding idea, where the action embedding matrix U is combined, via a matrix-vector product, with a linear transformation of the sum of the past c goals. This setup helps with exploration when goals are emitted at random; see the sketch below.
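A minimal PyTorch sketch of the dilated-LSTM idea (my own approximation, not the paper’s implementation): the recurrent state is split into r independent cores, and at step t only core t mod r is updated, so each core sees the input stream at an effective stride of r (r = 10 in most of the paper’s experiments). How the cores’ outputs are pooled is an assumption here; I simply average the hidden states.

```python
import torch
import torch.nn as nn

class DilatedLSTM(nn.Module):
    """Dilated LSTM sketch: r independent LSTMCell states; only core t % r
    is updated at step t, giving each core a temporal stride of r."""
    def __init__(self, input_size, hidden_size, r=10):
        super().__init__()
        self.r = r
        self.hidden_size = hidden_size
        self.cell = nn.LSTMCell(input_size, hidden_size)

    def init_state(self, batch_size):
        return [(torch.zeros(batch_size, self.hidden_size),
                 torch.zeros(batch_size, self.hidden_size))
                for _ in range(self.r)]

    def forward(self, x, state, t):
        idx = t % self.r                  # only this core is updated at step t
        h_i, c_i = self.cell(x, state[idx])
        state[idx] = (h_i, c_i)
        # Pool the hidden states of all cores for the output
        # (a simple choice for this sketch, not necessarily the paper's).
        out = torch.stack([h for h, _ in state]).mean(dim=0)
        return out, state
```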
I really enjoyed reading this paper and think it is a significant step towards the goal of general artificial intelligence. It moves in that direction because the network learns strategies in semi-MDP environments without explicitly receiving a model of the sub-tasks useful for the environment. The paper presents a novel technique to train a manager network that generates sub-tasks/goals for the worker without explicit knowledge of the domain. It shows that FuN can be the basis of deep hierarchical RL architectures for building strong agents in complex real-world environments, which are hard to model as MDPs.
The significance of the algorithm is also evident from Section 5.4 of the original paper, an ablative study of the intrinsic motivation weight. Figure 11 of the original paper shows a scatter plot of reward versus α (the intrinsic reward weight), which indicates that only games requiring long-term strategy need a high α. For less complex games (in terms of the number of different objectives to accomplish), the α values scatter everywhere, and the state-of-the-art results of A3C and Dueling DQN show that hierarchical RL is not required for such games.