Guiding Pretraining in Reinforcement Learning with Large Language Models
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan,
J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G.,
Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G.,
Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu,
J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin,
M., Gray, S., Chess, B., Clark, J., Berner, C., McCan-
dlish, S., Radford, A., Sutskever, I., and Amodei, D.
Language models are few-shot learners, 2020. URL
https://arxiv.org/abs/2005.14165.
Burda, Y., Edwards, H., Storkey, A., and Klimov, O. Ex-
ploration by random network distillation. In Seventh
International Conference on Learning Representations,
pp. 1–17, 2019.
Chan, H., Wu, Y., Kiros, J., Fidler, S., and Ba, J.
ACTRCE: Augmenting experience via teacher’s advice
for multi-goal reinforcement learning. arXiv preprint
arXiv:1902.04546, 2019.
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. d. O.,
Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman,
G., et al. Evaluating large language models trained on
code. arXiv preprint arXiv:2107.03374, 2021.
Choi, K., Cundy, C., Srivastava, S., and Ermon, S. LMPriors:
Pre-trained language models as task-specific priors. arXiv
preprint arXiv:2210.12530, 2022.
Colas, C., Sigaud, O., and Oudeyer, P.-Y. GEP-PG: Decou-
pling exploration and exploitation in deep reinforcement
learning algorithms. In International conference on ma-
chine learning, pp. 1039–1048. PMLR, 2018.
Colas, C., Karch, T., Lair, N., Dussoux, J.-M., Moulin-
Frier, C., Dominey, P., and Oudeyer, P.-Y. Language
as a cognitive tool to imagine goals in curiosity driven
exploration. Advances in Neural Information Processing
Systems, 33:3761–3774, 2020.
Colas, C., Karch, T., Sigaud, O., and Oudeyer, P.-
Y. Autotelic agents with intrinsically motivated goal-
conditioned reinforcement learning: a short survey. Jour-
nal of Artificial Intelligence Research, 74:1159–1199,
2022.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT:
Pre-training of deep bidirectional transformers for lan-
guage understanding. arXiv preprint arXiv:1810.04805,
2018.
Dubey, R., Agrawal, P., Pathak, D., Griffiths, T. L., and
Efros, A. A. Investigating human priors for playing video
games. arXiv preprint arXiv:1802.10217, 2018.
Hafner, D. Benchmarking the spectrum of agent capabilities.
arXiv preprint arXiv:2109.06780, 2021.
Hermann, K. M., Hill, F., Green, S., Wang, F., Faulkner, R.,
Soyer, H., Szepesvari, D., Czarnecki, W. M., Jaderberg,
M., Teplyashin, D., et al. Grounded language learning in
a simulated 3d world. arXiv preprint arXiv:1706.06551,
2017.
Hill, F., Lampinen, A., Schneider, R., Clark, S., Botvinick,
M., McClelland, J. L., and Santoro, A. Environmental
drivers of systematicity and generalization in a situated
agent. arXiv preprint arXiv:1910.00571, 2019.
Hill, F., Mokra, S., Wong, N., and Harley, T. Hu-
man instruction-following with deep reinforcement learn-
ing via transfer-learning from text. arXiv preprint
arXiv:2005.09382, 2020.
Huang, W., Abbeel, P., Pathak, D., and Mordatch, I. Lan-
guage models as zero-shot planners: Extracting action-
able knowledge for embodied agents. arXiv preprint
arXiv:2201.07207, 2022a.
Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence,
P., Zeng, A., Tompson, J., Mordatch, I., Chebotar, Y., et al.
Inner monologue: Embodied reasoning through planning
with language models. arXiv preprint arXiv:2207.05608,
2022b.
Kant, Y., Ramachandran, A., Yenamandra, S., Gilitschenski,
I., Batra, D., Szot, A., and Agrawal, H. Housekeep: Tidy-
ing virtual households using commonsense reasoning.
In Avidan, S., Brostow, G., Cissé, M., Farinella, G. M.,
and Hassner, T. (eds.), Computer Vision – ECCV 2022,
pp. 355–373, Cham, 2022. Springer Nature Switzerland.
ISBN 978-3-031-19842-7.
Kong, Y. and Fu, Y. Human action recognition and predic-
tion: A survey. International Journal of Computer Vision,
130(5):1366–1401, 2022.
Kwon, M., Xie, S. M., Bullard, K., and Sadigh, D. Re-
ward design with language models. In International
Conference on Learning Representations, 2023. URL
https://openreview.net/forum?id=10uNUgI5Kl.
Ladosz, P., Weng, L., Kim, M., and Oh, H. Exploration
in deep reinforcement learning: A survey. Information
Fusion, 2022.
Lehman, J., Stanley, K. O., et al. Exploiting open-endedness
to solve problems through the search for novelty. In
ALIFE, pp. 329–336, 2008.
Lehman, J., Clune, J., Misevic, D., Adami, C., Altenberg,
L., Beaulieu, J., Bentley, P. J., Bernard, S., Beslon, G.,
Bryson, D. M., Cheney, N., Chrabaszcz, P., Cully, A.,
Doncieux, S., Dyer, F. C., Ellefsen, K. O., Feldt, R., Fis-
cher, S., Forrest, S., Frénoy, A., Gagné, C., Le Goff,
L., Grabowski, L. M., Hodjat, B., Hutter, F., Keller,