50 Essential Information Theoretic Works
Forecastability as a formal research question draws on a wide body of foundational work. The entries below span the theory of predictive information, entropy estimation, algorithmic complexity, and the empirical study of dependence in time series: the sources from which the methods and framing of this programme most directly descend.
Foundations of predictive information and limits
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423; 27(4), 623–656. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
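Shannon's source entropy is the quantity nearly every later entry refines, bounds, or estimates. As a minimal illustration (our own sketch, not drawn from any entry above), the plug-in formula H = −Σ p log2 p in Python:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally unpredictable over a binary alphabet: 1 bit.
h_fair = shannon_entropy([0.5, 0.5])    # 1.0
h_biased = shannon_entropy([0.9, 0.1])  # ~0.469
```

A biased source carries less than one bit per symbol, which is exactly the slack that forecasting methods try to exploit.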
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620–630. https://doi.org/10.1103/PhysRev.106.620
Khinchin, A. I. (1957). Mathematical foundations of information theory. Dover Publications. https://books.google.com/books/about/Mathematical_Foundations_of_Information.html?id=Bn2gjlsoi2UC
Rényi, A. (1961). On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics (pp. 547–561). University of California Press. http://projecteuclid.org/euclid.bsmsp/1200512181
Fano, R. M. (1961). Transmission of information: A statistical theory of communication. The MIT Press. https://mitpress.mit.edu/9780262561693/transmission-of-information/
Kolmogorov, A. N. (1958). A new metric invariant of transitive dynamical systems and automorphisms of Lebesgue spaces. Doklady Akademii Nauk SSSR, 119(5), 861–864. https://www.mathnet.ru/eng/dan22553
Sinai, Y. G. (1959). On the notion of entropy of a dynamical system. Doklady Akademii Nauk SSSR, 124, 768–771. https://www.mathnet.ru/eng/dan22675
Brudno, A. A. (1978). Entropy and the complexity of the trajectories of a dynamical system. Russian Mathematical Surveys, 33(1), 197–198. https://doi.org/10.1070/RM1978v033n01ABEH002243
Solomonoff, R. J. (1964). A formal theory of inductive inference. Part I. Information and Control, 7(1), 1–22. https://doi.org/10.1016/S0019-9958(64)90223-2
Solomonoff, R. J. (1964). A formal theory of inductive inference. Part II. Information and Control, 7(2), 224–254. https://doi.org/10.1016/S0019-9958(64)90131-7
Kolmogorov, A. N. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1), 3–11. https://www.mathnet.ru/eng/ppi1024
Martin-Löf, P. (1966). The definition of random sequences. Information and Control, 9(6), 602–619. https://doi.org/10.1016/S0019-9958(66)80018-9
Chaitin, G. J. (1966). On the length of programs for computing finite binary sequences. Journal of the ACM, 13(4), 547–569. https://doi.org/10.1145/321356.321363
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. https://doi.org/10.1109/TAC.1974.1100705
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14(5), 465–471. https://doi.org/10.1016/0005-1098(78)90005-5
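Akaike's, Schwarz's, and Rissanen's criteria all trade model fit against complexity; the first two have simple closed forms. A sketch (notation: k parameters, n observations, ln L the maximised log-likelihood; lower scores are better):

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Schwarz's Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

# For n > e^2 (about 7.4 observations), BIC penalises each extra
# parameter more heavily than AIC does.
```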
Information dynamics in time series (dependence, memory, and horizons)
Packard, N. H., Crutchfield, J. P., Farmer, J. D., & Shaw, R. S. (1980). Geometry from a time series. Physical Review Letters, 45(9), 712–716. https://doi.org/10.1103/PhysRevLett.45.712
Takens, F. (1981). Detecting strange attractors in turbulence. In D. A. Rand & L.-S. Young (Eds.), Dynamical systems and turbulence, Warwick 1980 (Lecture Notes in Mathematics, Vol. 898, pp. 366–381). Springer. https://doi.org/10.1007/BFb0091924
Grassberger, P., & Procaccia, I. (1983). Measuring the strangeness of strange attractors. Physica D: Nonlinear Phenomena, 9(1–2), 189–208. https://doi.org/10.1016/0167-2789(83)90298-1
Fraser, A. M., & Swinney, H. L. (1986). Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2), 1134–1140. https://doi.org/10.1103/PhysRevA.33.1134
Grassberger, P. (1986). Toward a quantitative theory of self-generated complexity. International Journal of Theoretical Physics, 25(9), 907–938. https://doi.org/10.1007/BF00668821
Crutchfield, J. P., & Young, K. (1989). Inferring statistical complexity. Physical Review Letters, 63(2), 105–108. https://doi.org/10.1103/PhysRevLett.63.105
Massey, J. L. (1990). Causality, feedback and directed information. In Proceedings of the International Symposium on Information Theory and Its Applications (ISITA-90) (pp. 303–305). https://www.isiweb.ee.ethz.ch/archive/massey_pub/pdf/BI532.pdf
Feder, M., Merhav, N., & Gutman, M. (1992). Universal prediction of individual sequences. IEEE Transactions on Information Theory, 38(4), 1258–1270. https://doi.org/10.1109/18.144706
Rissanen, J. (1989). Stochastic complexity in statistical inquiry. World Scientific. https://www.worldscientific.com/worldscibooks/10.1142/0822
Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461–464. https://doi.org/10.1103/PhysRevLett.85.461
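Schreiber's transfer entropy TE(Y→X) = Σ p(x_{t+1}, x_t, y_t) log2[ p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t) ] measures how much Y's past reduces uncertainty about X's next value beyond X's own past. A plug-in sketch for discrete series with history length 1 (our own illustration; plug-in estimates are biased on short series):

```python
import math
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in transfer entropy TE(Y -> X) in bits, history length 1."""
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))   # (x_{t+1}, x_t, y_t)
    pairs_xx = Counter(zip(x[1:], x[:-1]))          # (x_{t+1}, x_t)
    pairs_xy = Counter(zip(x[:-1], y[:-1]))         # (x_t, y_t)
    singles = Counter(x[:-1])
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n
        p_x1_given_xy = c / pairs_xy[(x0, y0)]
        p_x1_given_x = pairs_xx[(x1, x0)] / singles[x0]
        te += p_joint * math.log2(p_x1_given_xy / p_x1_given_x)
    return te

# x copies y with a one-step lag, so y's past fully determines x's future.
y = [0, 1, 1, 0, 1, 0, 0, 1] * 10
x = [0] + y[:-1]
```

On the copy example above, TE(Y→X) is strictly positive, while a constant target series yields exactly zero.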
Bialek, W., Nemenman, I., & Tishby, N. (2001). Predictability, complexity, and learning. Neural Computation, 13(11), 2409–2463. https://doi.org/10.1162/089976601753195969
Crutchfield, J. P., & Feldman, D. P. (2003). Regularities unseen, randomness observed: Levels of entropy convergence. Chaos, 13(1), 25–54. https://doi.org/10.1063/1.1530990
Tishby, N., Pereira, F. C., & Bialek, W. (1999). The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing. University of Illinois at Urbana-Champaign. https://research.google/pubs/the-information-bottleneck-method-2/
Barnett, L., Barrett, A. B., & Seth, A. K. (2009). Granger causality and transfer entropy are equivalent for Gaussian variables. Physical Review Letters, 103, 238701. https://doi.org/10.1103/PhysRevLett.103.238701
Song, C., Qu, Z., Blumm, N., Wang, D., & Barabási, A.-L. (2010). Limits of predictability in human mobility. Science, 327(5968), 1018–1021. https://doi.org/10.1126/science.1177170
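Song et al. obtain their mobility bound by inverting Fano's inequality (Fano 1961, above): given an entropy-rate estimate S over an alphabet of N locations, the maximum predictability Π^max solves S = H(Π) + (1 − Π) log2(N − 1), where H is the binary entropy. A bisection sketch (function names are ours):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_predictability(entropy_rate, n_symbols):
    """Solve S = H(pi) + (1 - pi) log2(N - 1) for pi by bisection.

    The right-hand side decreases from log2(N) at pi = 1/N to 0 at pi = 1,
    so the root is unique for 0 < S < log2(N).
    """
    f = lambda pi: h2(pi) + (1 - pi) * math.log2(n_symbols - 1) - entropy_rate
    lo, hi = 1.0 / n_symbols, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > 0:   # bound still exceeds S: pi must be larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a binary source at full entropy (S = 1 bit) this returns Π^max = 1/2, i.e. no better than chance.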
Bandt, C., & Pompe, B. (2002). Permutation entropy: A natural complexity measure for time series. Physical Review Letters, 88, 174102. https://doi.org/10.1103/PhysRevLett.88.174102
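Bandt and Pompe's measure replaces amplitudes with ordinal patterns, making it robust to monotone distortions of the data. A minimal sketch for embedding dimension m (ties broken by index, a simplification of the original):

```python
import math
from collections import Counter

def permutation_entropy(series, m=3):
    """Permutation entropy in bits over ordinal patterns of length m."""
    patterns = Counter(
        tuple(sorted(range(m), key=lambda k: series[i + k]))
        for i in range(len(series) - m + 1)
    )
    n = sum(patterns.values())
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())

# A monotone ramp produces a single ordinal pattern, hence zero entropy.
```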
Pincus, S. M. (1991). Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences, 88(6), 2297–2301. https://doi.org/10.1073/pnas.88.6.2297
Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039–H2049. https://doi.org/10.1152/ajpheart.2000.278.6.H2039
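Richman and Moorman's sample entropy is −ln(A/B), where B counts pairs of length-m templates within tolerance r (Chebyshev distance) and A counts the pairs that still match at length m + 1; unlike approximate entropy, self-matches are excluded. A brute-force O(n²) sketch (our simplification of the canonical template counts):

```python
import math

def sample_entropy(series, m=2, r=0.2):
    """Sample entropy: -ln(A/B) with Chebyshev tolerance r."""
    def count_matches(length):
        templates = [series[i:i + length]
                     for i in range(len(series) - length + 1)]
        hits = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):   # excludes self-matches
                if max(abs(a - b)
                       for a, b in zip(templates[i], templates[j])) <= r:
                    hits += 1
        return hits

    b, a = count_matches(m), count_matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")
```

A perfectly regular series scores near zero; irregular series score higher, with fewer length-(m + 1) matches surviving.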
Estimation and operational forecastability diagnostics (finite data, bias, and algorithmic lenses)
Kozachenko, L. F., & Leonenko, N. N. (1987). Sample estimate of the entropy of a random vector. Problems of Information Transmission, 23(2), 95–101. https://dmitripavlov.org/scans/kozachenko-leonenko.pdf
Grassberger, P. (1988). Finite sample corrections to entropy and dimension estimates. Physics Letters A, 128(6–7), 369–373. https://doi.org/10.1016/0375-9601(88)90193-4
Darbellay, G. A., & Vajda, I. (1999). Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45(4), 1315–1321. https://doi.org/10.1109/18.761290
Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15(6), 1191–1253. https://doi.org/10.1162/089976603321780272
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69(6), 066138. https://doi.org/10.1103/PhysRevE.69.066138
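The Kozachenko–Leonenko estimator (entry above) turns nearest-neighbour spacings into a differential-entropy estimate; in one dimension, Ĥ = ψ(N) − ψ(1) + ln 2 + (1/N) Σ ln ε_i nats, with ε_i the distance from sample i to its nearest neighbour. A sketch under two stated assumptions: our own digamma approximation, and distinct sample values (duplicates give ε_i = 0):

```python
import math
import random

def digamma(x):
    """Digamma via upward recurrence plus a short asymptotic series."""
    r = 0.0
    while x < 6:
        r -= 1.0 / x
        x += 1
    return r + math.log(x) - 1 / (2 * x) - 1 / (12 * x**2) + 1 / (120 * x**4)

def kl_entropy_1d(sample):
    """Kozachenko-Leonenko nearest-neighbour entropy estimate (nats), d = 1."""
    xs = sorted(sample)
    n = len(xs)
    log_eps = 0.0
    for i in range(n):
        gaps = []
        if i > 0:
            gaps.append(xs[i] - xs[i - 1])
        if i < n - 1:
            gaps.append(xs[i + 1] - xs[i])
        log_eps += math.log(min(gaps))   # nearest-neighbour distance
    return digamma(n) - digamma(1) + math.log(2) + log_eps / n

# Uniform(0, 1) has differential entropy 0 nats; the estimate should be close.
rng = random.Random(0)
est = kl_entropy_1d([rng.random() for _ in range(1000)])
```

Kraskov, Stögbauer, and Grassberger's mutual-information estimator builds directly on this construction.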
Nemenman, I., Shafee, F., & Bialek, W. (2002). Entropy and inference, revisited. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14. MIT Press. https://arxiv.org/abs/physics/0107078
Kontoyiannis, I., Algoet, P. H., Suhov, Y. M., & Wyner, A. J. (1998). Nonparametric entropy estimation for stationary processes and random fields, with applications to English text. IEEE Transactions on Information Theory, 44(3), 1319–1327. https://doi.org/10.1109/18.669425
Ziv, J., & Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3), 337–343. https://doi.org/10.1109/TIT.1977.1055714
Ziv, J., & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5), 530–536. https://doi.org/10.1109/TIT.1978.1055934
Kaspar, F., & Schuster, H. G. (1987). Easily calculable measure for the complexity of spatiotemporal patterns. Physical Review A, 36(2), 842–848. https://doi.org/10.1103/PhysRevA.36.842
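Kaspar and Schuster's procedure counts the phrases in the Lempel–Ziv (1976) parsing, where each new phrase grows until it no longer occurs as a substring of everything before its final character; normalised by n / log2 n, the count converges to the entropy rate for stationary ergodic binary sources. A compact sketch:

```python
def lz76_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of string s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        l = 1
        # Grow the phrase while it still appears inside everything that
        # precedes its final character.
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1
        i += l
    return c

# A constant string parses into 2 phrases; a period-2 string into 3.
```

Random strings parse into many short phrases, which is why compressor output length doubles as a practical forecastability diagnostic.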
Costa, M., Goldberger, A. L., & Peng, C.-K. (2002). Multiscale entropy analysis of complex physiologic time series. Physical Review Letters, 89, 068102. https://doi.org/10.1103/PhysRevLett.89.068102
Barnett, L., & Seth, A. K. (2014). The MVGC multivariate Granger causality toolbox: A new approach to Granger-causal inference. Journal of Neuroscience Methods, 223, 50–68. https://doi.org/10.1016/j.jneumeth.2013.10.018
Wibral, M., Vicente, R., & Lindner, M. (Eds.). (2014). Transfer entropy in neuroscience. Springer. https://doi.org/10.1007/978-3-319-04298-2
Amigo, G., Díaz-Pachón, D. A., Marks, R. J., & Baylis, C. (2023). Algorithmic information forecastability. arXiv preprint arXiv:2304.10752. https://arxiv.org/abs/2304.10752
Li, M., & Vitányi, P. (2019). An introduction to Kolmogorov complexity and its applications (4th ed.). Springer. https://doi.org/10.1007/978-3-030-11298-1