In a given environment, what is the best way of collecting reward? Hierarchical RL? Some sort of POMDP-RL, or perhaps analogy-based RL? Combinations thereof? Or other, nameless approaches to exploiting algorithmic regularities in solution space? A smart learner should find out by itself, using experience to improve its own credit assignment strategy (metalearning or ``learning to learn'') [Lenat, 1983; Schmidhuber, 1987]. In principle, such a learner should be able to run arbitrary credit assignment strategies, and discover and use ``good'' ones, without wasting too much of its limited lifetime [Schmidhuber et al., 1997a]. It seems obvious that DPRL does not provide a useful basis for achieving this goal, while DS seems more promising, as it does allow for searching spaces populated with arbitrary algorithms, including metalearning algorithms. I will come back to this issue later.
Disclaimer: of course, solutions to almost all possible problems are irregular and do not share mutual algorithmic information [Kolmogorov, 1965; Solomonoff, 1964; Chaitin, 1969; Li and Vitányi, 1993]. In general, learning and generalization are therefore impossible for any algorithm. But it's the comparatively few, exceptional, low-complexity problems that receive almost all attention of computer scientists.