DADS reward implementation #13

Thank you for sharing your great code :)

I think the reward function in this repo differs slightly from the one defined in the paper (ICLR 2020):
https://github.com/google-research/dads/blob/abc37f532c26658e41ae309b646e8963bd7a8676/unsupervised_skill_learning/dads_agent.py#L142-L144

As far as I understand, the first reward term defined in eq. 6 of the paper is $\log q(s' \mid s, z) - \log \sum_{i=1}^{L} q(s' \mid s, z_i)$. In this repo, however, the reward appears to be computed as $\sum_{i=1}^{L} \left[ \log q(s' \mid s, z) - \log q(s' \mid s, z_i) \right]$, using NumPy's broadcasting. Did I misunderstand something, or is there a practical technique I'm missing?


	# final DADS reward
	intrinsic_reward = np.log(num_reps + 1) - np.log(1 + np.exp(
	    np.clip(logp_altz - logp.reshape(1, -1), -50, 50)).sum(axis=0))
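
One way to compare the two readings is to evaluate them side by side on made-up numbers. The sketch below is only an illustration: the function names, the shapes (`logp` of shape `(B,)`, `logp_altz` of shape `(L, B)`), and the random log-probabilities are assumptions, not taken from the repo. `reward_as_in_snippet` copies the expression from the snippet above, `reward_log_ratio` is a log-ratio form that averages over `z` and the `L` sampled `z_i`, and `reward_sum_of_differences` is the sum-of-differences reading from the question.

```python
import numpy as np


def reward_as_in_snippet(logp, logp_altz):
    """Expression copied from the quoted snippet (shapes are assumed)."""
    num_reps = logp_altz.shape[0]
    return np.log(num_reps + 1) - np.log(1 + np.exp(
        np.clip(logp_altz - logp.reshape(1, -1), -50, 50)).sum(axis=0))


def reward_log_ratio(logp, logp_altz):
    """log q(s'|s,z) minus the log of the mean of q over z and the L samples z_i."""
    all_logp = np.concatenate([logp.reshape(1, -1), logp_altz], axis=0)  # (L+1, B)
    # Log of the mean, computed in log space for numerical stability.
    log_mean = np.logaddexp.reduce(all_logp, axis=0) - np.log(all_logp.shape[0])
    return logp - log_mean


def reward_sum_of_differences(logp, logp_altz):
    """Reading from the question: sum_i [log q(s'|s,z) - log q(s'|s,z_i)]."""
    return (logp.reshape(1, -1) - logp_altz).sum(axis=0)


# Made-up values purely for illustration: batch of 4 transitions, L = 8 samples.
rng = np.random.default_rng(0)
logp = rng.normal(size=4)
logp_altz = rng.normal(size=(8, 4))

snippet = reward_as_in_snippet(logp, logp_altz)
ratio = reward_log_ratio(logp, logp_altz)
summed = reward_sum_of_differences(logp, logp_altz)

print("snippet vs. log-ratio form:      ", np.allclose(snippet, ratio))
print("snippet vs. sum-of-differences:  ", np.allclose(snippet, summed))
```

When the clip is inactive, the snippet's expression rearranges algebraically to $\log q(s' \mid s, z) - \log \frac{1}{L+1}\left[ q(s' \mid s, z) + \sum_{i=1}^{L} q(s' \mid s, z_i) \right]$, so the first comparison should agree. How that relates to eq. 6 (the extra $q(s' \mid s, z)$ term in the average and the constant offset) still needs to be checked against the paper.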
