The development of modern technologies, particularly machine learning and generative artificial intelligence, enhances the capabilities of statisticians. They can draw on new data sources and quickly analyze vast information resources. As computing tools advance, the way users seek statistical data is changing. Reports prepared by institutions such as the Central Statistical Office (GUS) and Eurostat will have to compete with those prepared by non-specialized entities or even language models, which often rely on uncertain sources.
“Statistics greatly benefits from digital transformation, starting with a very general topic, which is efficiency. The funds allocated to statistics haven’t been increasing for years; however, the number and range of statistics and information we provide increase very dynamically each year. From this viewpoint, it is apparent how digitization aids public statistics. It poses certain challenges, but also opportunities we wish to seize,” emphasizes Dr. Dominik Rozkrut, President of GUS, in a conversation with the Newseria Innovations agency.
Statisticians are making use of new, sometimes experimental data sources, along with new techniques for acquiring and processing them, to provide society with the best possible body of reports. GUS is interested in data sources such as satellite images (for example, for statistics related to agriculture), ship positions at sea, real-time traffic intensity data, and data supplied by payment card operators or telecommunication service providers.
“These are all new data sources which did not exist before. We did not have the possibility to use them because they simply did not exist. However, today completely new possibilities are opening,” says the President of GUS.
In July, the IV Congress of Polish Statistics took place in Warsaw, and a significant part of it was devoted to the role of new technologies in statistics. One of the speakers, Mariana Kotzeva from Eurostat, pointed out in her presentation that generative artificial intelligence in particular is changing the way users access data. Using such solutions carries a high risk of obtaining incorrect data, which reduces user trust. Even so, it appears that in the near future official statistics will have to compete with figures from unofficial reports, often based on uncertain and unstructured data.
“The question about quality is a very good question. We had certain entrenched quality control methods in traditional statistical research, and while all new data sources of course offer great possibilities, they are also a challenge when it comes to achieving quality. Integrating sources of various kinds makes it harder to manage quality. But we devote a lot of attention to this. We join forces with other statistical offices around the world, with Eurostat, the OECD, and the United Nations, and we are working out appropriate methods, techniques, and regulations that will allow us to effectively communicate quality and also care for it,” declares Dr. Dominik Rozkrut.
According to the expert, large data sets can provide valuable statistical information in virtually every field.
“These data allow us to increase the accuracy, precision, and detail of the results we present, but also to learn about things and phenomena that we could not examine before, due to the limitations of traditional research instruments. The potential is huge, and the Central Statistical Office is seeking to cooperate with the managers of these new data sources. Sometimes these data are public, but largely they are held by private managers. We undertake cooperation and form partnerships in order to use these data widely,” says the President of GUS.