50 Essential Information Theoretic Works
Forecastability as a formal research question draws on a wide body of foundational work. The entries below span the theory of predictive information, entropy estimation, algorithmic complexity, and the empirical study of dependence in time series: the sources from which the methods and framing of this programme most directly descend.
Foundations of predictive information and limits
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423; 27(4), 623–656. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
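Shannon's source entropy is the quantity nearly every later entry refines, bounds, or estimates. As a minimal illustration (our own sketch, not drawn from any entry above), the plug-in formula H = −Σ p log2 p in Python:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally unpredictable over a binary alphabet: 1 bit.
h_fair = shannon_entropy([0.5, 0.5])    # 1.0
h_biased = shannon_entropy([0.9, 0.1])  # ~0.469
```

A biased source carries less than one bit per symbol, which is exactly the slack that forecasting methods try to exploit.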
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620–630. https://doi.org/10.1103/PhysRev.106.620
Khinchin, A. I. (1957). Mathematical foundations of information theory. Dover Publications. https://books.google.com/books/about/Mathematical_Foundations_of_Information.html?id=Bn2gjlsoi2UC
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (pp. 547–561). University of California Press. http://projecteuclid.org/euclid.bsmsp/1200512181
Fano, R. M. (1961). Transmission of information: A statistical theory of communication. The MIT Press. https://mitpress.mit.edu/9780262561693/transmission-of-information/
Kolmogorov, A. N. (1958). A new metric invariant of transitive dynamical systems and automorphisms of Lebesgue spaces. Doklady Akademii Nauk SSSR, 119(5), 861–864. https://www.mathnet.ru/eng/dan22553
Sinai, Y. G. (1959). On the notion of entropy of a dynamical system. Doklady Akademii Nauk SSSR, 124, 768–771. https://www.mathnet.ru/eng/dan22675
Brudno, A. A. (1978). Entropy and the complexity of the trajectories of a dynamical system. Russian Mathematical Surveys, 33(1), 197–198. https://doi.org/10.1070/RM1978v033n01ABEH002243
Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I. Information and Control, 7(1), 1–22. https://doi.org/10.1016/S0019-9958(64)90223-2
Solomonoff, R. J. (1964). A formal theory of inductive inference. Part II. Information and Control, 7(2), 224–254. https://doi.org/10.1016/S0019-9958(64)90131-7
Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1), 3–11. https://www.mathnet.ru/eng/ppi1024
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9(6), 602–619. https://doi.org/10.1016/S0019-9958(66)80018-9
Chaitin, G. J. (1966). On the length of programs for computing finite binary sequences. Journal of the ACM, 13(4), 547–569. https://doi.org/10.1145/321356.321363
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14(5), 465–471. https://doi.org/10.1016/0005-1098(78)90005-5
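Akaike's, Schwarz's, and Rissanen's criteria all trade model fit against complexity; the first two have simple closed forms. A sketch (notation: k parameters, n observations, ln L the maximised log-likelihood; lower scores are better):

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Schwarz's Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# For n > e^2 (about 7.4 observations), BIC penalises each extra
# parameter more heavily than AIC does.
```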
Information dynamics in time series (dependence, memory, and horizons)
Packard, N. H., Crutchfield, J. P., Farmer, J. D., & Shaw, R. S. (1980). Geometry from a time series. Physical Review Letters, 45(9), 712–716. https://doi.org/10.1103/PhysRevLett.45.712
Takens, F. (1981). Detecting strange attractors in turbulence. In D. A. Rand & L.-S. Young (Eds.), Dynamical systems and turbulence, Warwick 1980 (Lecture Notes in Mathematics, Vol. 898, pp. 366–381). Springer. https://doi.org/10.1007/BFb0091924
Grassberger, P., & Procaccia, I. (1983). Measuring the strangeness of strange attractors. Physica D: Nonlinear Phenomena, 9(1–2), 189–208. https://doi.org/10.1016/0167-2789(83)90298-1
Fraser, A. M., & Swinney, H. L. (1986). Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2), 1134–1140. https://doi.org/10.1103/PhysRevA.33.1134
Grassberger, P. (1986). Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics, 25(9), 907–938. https://doi.org/10.1007/BF00668821
Crutchfield, J. P., & Young, K. (1989). Inferring statistical complexity. Physical Review Letters, 63(2), 105–108. https://doi.org/10.1103/PhysRevLett.63.105
Massey, J. L. (1990). Causality, feedback and directed information. In Proceedings of the International Symposium on Information Theory and Its Applications (ISITA-90) (pp. 303–305). https://www.isiweb.ee.ethz.ch/archive/massey_pub/pdf/BI532.pdf
Feder, M., Merhav, N., & Gutman, M. (1992). Universal prediction of individual sequences. IEEE Transactions on Information Theory, 38(4), 1258–1270. https://doi.org/10.1109/18.144706
Rissanen, J. (1989). Stochastic complexity in statistical inquiry. World Scientific. https://www.worldscientific.com/worldscibooks/10.1142/0822
Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464. https://doi.org/10.1103/PhysRevLett.85.461
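Schreiber's transfer entropy TE(Y→X) = Σ p(x_{t+1}, x_t, y_t) log2[ p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t) ] measures how much Y's past reduces uncertainty about X's next value beyond X's own past. A plug-in sketch for discrete series with history length 1 (our own illustration; plug-in estimates are biased on short series):

```python
import math
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in transfer entropy TE(Y -> X) in bits, history length 1."""
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))   # (x_{t+1}, x_t, y_t)
    pairs_xx = Counter(zip(x[1:], x[:-1]))          # (x_{t+1}, x_t)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))         # (x_t, y_t)
    singles = Counter(x[:-1])
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n
        p_x1_given_xy = c / pairs_xy[(x0, y0)]
        p_x1_given_x = pairs_xx[(x1, x0)] / singles[x0]
        te += p_joint * math.log2(p_x1_given_xy / p_x1_given_x)
    return te

# x copies y with a one-step lag, so y's past fully determines x's future.
y = [0, 1, 1, 0, 1, 0, 0, 1] * 10
x = [0] + y[:-1]
```

On the copy example above, TE(Y→X) is strictly positive, while a constant target series yields exactly zero.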
Bialek, W., Nemenman, I., & Tishby, N. (2001). Predictability, complexity, and learning. Neural Computation, 13(11), 2409–2463. https://doi.org/10.1162/089976601753195969
Crutchfield, J. P., & Feldman, D. P. (2003). Regularities unseen, randomness observed: Levels of entropy convergence. Chaos, 13(1), 25–54. https://doi.org/10.1063/1.1530990
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing. University of Illinois at Urbana-Champaign. https://research.google/pubs/the-information-bottleneck-method-2/
Barnett, L., Barrett, A. B., & Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103, 238701. https://doi.org/10.1103/PhysRevLett.103.238701
Song, C., Qu, Z., Blumm, N., Wang, D., & Barabási, A.-L. (2010). Limits of predictability in human mobility. Science, 327(5968), 1018–1021. https://doi.org/10.1126/science.1177170
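Song et al. obtain their mobility bound by inverting Fano's inequality (Fano 1961, above): given an entropy-rate estimate S over an alphabet of N locations, the maximum predictability Π^max solves S = H(Π) + (1 − Π) log2(N − 1), where H is the binary entropy. A bisection sketch (function names are ours):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_predictability(entropy_rate, n_symbols):
    """Solve S = H(pi) + (1 - pi) log2(N - 1) for pi by bisection.

    The right-hand side decreases from log2(N) at pi = 1/N to 0 at pi = 1,
    so the root is unique for 0 < S < log2(N).
    """
    f = lambda pi: h2(pi) + (1 - pi) * math.log2(n_symbols - 1) - entropy_rate
    lo, hi = 1.0 / n_symbols, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > 0:   # bound still exceeds S: pi must be larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a binary source at full entropy (S = 1 bit) this returns Π^max = 1/2, i.e. no better than chance.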
Bandt, C., & Pompe, B. (2002). Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88, 174102. https://doi.org/10.1103/PhysRevLett.88.174102
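Bandt and Pompe's measure replaces amplitudes with ordinal patterns, making it robust to monotone distortions of the data. A minimal sketch for embedding dimension m (ties broken by index, a simplification of the original):

```python
import math
from collections import Counter

def permutation_entropy(series, m=3):
    """Permutation entropy in bits over ordinal patterns of length m."""
    patterns = Counter(
        tuple(sorted(range(m), key=lambda k: series[i + k]))
        for i in range(len(series) - m + 1)
    )
    n = sum(patterns.values())
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())

# A monotone ramp produces a single ordinal pattern, hence zero entropy.
```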
Pincus, S. M. (1991). Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences, 88(6), 2297–2301. https://doi.org/10.1073/pnas.88.6.2297
Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039–H2049. https://doi.org/10.1152/ajpheart.2000.278.6.H2039
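Richman and Moorman's sample entropy is −ln(A/B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance) and A counts the pairs that still match at length m + 1; unlike approximate entropy, self-matches are excluded. A brute-force O(n²) sketch (our simplification of the canonical template counts):

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy: -ln(A/B) with Chebyshev tolerance r."""
    def count_matches(length):
        templates = [series[i:i + length]
                     for i in range(len(series) - length + 1)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):   # excludes self-matches
                if max(abs(a - b)
                       for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits

    b, a = count_matches(m), count_matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")
```

A perfectly regular series scores near zero; irregular series score higher, with fewer length-(m + 1) matches surviving.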
Estimation and operational forecastability diagnostics (finite data, bias, and algorithmic lenses)
Kozachenko, L. F., & Leonenko, N. N. (1987). Sample estimate of the entropy of a random vector. Problems of Information Transmission, 23(2), 95–101. https://dmitripavlov.org/scans/kozachenko-leonenko.pdf
Grassberger, P. (1988). Finite sample corrections to entropy and dimension estimates. Physics Letters A, 128(6–7), 369–373. https://doi.org/10.1016/0375-9601(88)90193-4
Darbellay, G. A., & Vajda, I. (1999). Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45(4), 1315–1321. https://doi.org/10.1109/18.761290
Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. https://doi.org/10.1162/089976603321780272
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138. https://doi.org/10.1103/PhysRevE.69.066138
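The Kozachenko–Leonenko estimator (entry above) turns nearest-neighbour spacings into a differential-entropy estimate; in one dimension, Ĥ = ψ(N) − ψ(1) + ln 2 + (1/N) Σ ln ε_i nats, with ε_i the distance from sample i to its nearest neighbour. A sketch under two stated assumptions: our own digamma approximation, and distinct sample values (duplicates give ε_i = 0):

```python
import math
import random

def digamma(x):
    """Digamma via upward recurrence plus a short asymptotic series."""
    r = 0.0
    while x < 6:
        r -= 1.0 / x
        x += 1
    return r + math.log(x) - 1 / (2 * x) - 1 / (12 * x**2) + 1 / (120 * x**4)

def kl_entropy_1d(sample):
    """Kozachenko-Leonenko nearest-neighbour entropy estimate (nats), d = 1."""
    xs = sorted(sample)
    n = len(xs)
    log_eps = 0.0
    for i in range(n):
        gaps = []
        if i > 0:
            gaps.append(xs[i] - xs[i - 1])
        if i < n - 1:
            gaps.append(xs[i + 1] - xs[i])
        log_eps += math.log(min(gaps))   # nearest-neighbour distance
    return digamma(n) - digamma(1) + math.log(2) + log_eps / n

# Uniform(0, 1) has differential entropy 0 nats; the estimate should be close.
rng = random.Random(0)
est = kl_entropy_1d([rng.random() for _ in range(1000)])
```

Kraskov, Stögbauer, and Grassberger's mutual-information estimator builds directly on this construction.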
Nemenman, I., Shafee, F., & Bialek, W. (2002). Entropy and inference, revisited. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14. MIT Press. https://arxiv.org/abs/physics/0107078
Kontoyiannis, I., Algoet, P. H., Suhov, Y. M., & Wyner, A. J. (1998). Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Transactions on Information Theory, 44(3), 1319–1327. https://doi.org/10.1109/18.669425
Ziv, J., & Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3), 337–343. https://doi.org/10.1109/TIT.1977.1055714
Ziv, J., & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5), 530–536. https://doi.org/10.1109/TIT.1978.1055934
Kaspar, F., & Schuster, H. G. (1987). Easily calculable measure for the complexity of spatiotemporal patterns. Physical Review A, 36(2), 842–848. https://doi.org/10.1103/PhysRevA.36.842
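Kaspar and Schuster's procedure counts the phrases in the Lempel–Ziv (1976) parsing, where each new phrase grows until it no longer occurs as a substring of everything before its final character; normalised by n / log2 n, the count converges to the entropy rate for stationary ergodic binary sources. A compact sketch:

```python
def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of string s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # Grow the phrase while it still appears inside everything that
        # precedes its final character.
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

# A constant string parses into 2 phrases; a period-2 string into 3.
```

Random strings parse into many short phrases, which is why compressor output length doubles as a practical forecastability diagnostic.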
Costa, M., Goldberger, A. L., & Peng, C.-K. (2002). Multiscale entropy analysis of complex physiologic time series. Physical Review Letters, 89, 068102. https://doi.org/10.1103/PhysRevLett.89.068102
Barnett, L., & Seth, A. K. (2014). The MVGC multivariate Granger causality toolbox: A new approach to Granger-causal inference. Journal of Neuroscience Methods, 223, 50–68. https://doi.org/10.1016/j.jneumeth.2013.10.018
Wibral, M., Vicente, R., & Lindner, M. (Eds.). (2014). Transfer entropy in neuroscience. Springer. https://doi.org/10.1007/978-3-319-04298-2
Amigo, G., Díaz-Pachón, D. A., Marks, R. J., & Baylis, C. (2023). Algorithmic information forecastability. arXiv preprint arXiv:2304.10752. https://arxiv.org/abs/2304.10752
Li, M., & Vitányi, P. (2019). An introduction to Kolmogorov complexity and its applications (4th ed.). Springer. https://doi.org/10.1007/978-3-030-11298-1