The goal of the National Oceanic and Atmospheric Administration (NOAA) is to put all of its data, on weather, climate, oceans, coasts, fisheries, and ecosystems, into the hands of the people who need it most. The trick is translating the hard data to make it useful to people who aren’t necessarily subject matter experts, said Edward Kearns, NOAA’s first-ever data officer, speaking at the recent Open Source Leadership Summit (OSLS).
NOAA’s mission is similar to NASA’s in that it is science-based, but “our mission is operations; to get the quality information to the American people that they need to run their businesses, to protect their lives and property, to manage their water resources, to manage their ocean resources,” said Kearns, during his talk titled “Realizing the Full Potential of NOAA’s Open Data.”
He said that NOAA was doing Big Data long before the term was coined and that the agency has way too much of it – to the tune of 30 petabytes in its archives with another 200 petabytes of data in a working data store. Not surprisingly, NOAA officials have a hard time moving it around and managing it, Kearns said.
Data Sharing
NOAA is a big consumer of open source, and sharing everything openly is part of the organization’s modus operandi. On a global level, “the agency has been a leader for the entire United States in trying to broker data sharing among countries,” Kearns said. One of the most successful examples has been through the United Nations, via the World Meteorological Organization (WMO).
Agency officials tend to default to releasing their products into the public domain, something Kearns said he’d like to change. By adopting some modern licensing practices, he believes NOAA could share even more information with the public. “The Linux Foundation has made progress on the Community Data License Agreement. This is one of the things I’d like to possibly consider adopting for our organization,’’ he added.
One of NOAA’s great success stories in getting critical data to the public came after Hurricane Irma hit Florida in September 2017, he said.
“As you can imagine, there were a lot of American citizens that were hungry for information and were hitting the NOAA websites very hard and data sites very hard,’’ he said. “Typically, we have a hard time keeping up with that kind of demand.” The National Hurricane Center (NHC) is part of NOAA, and the agency moved the NHC’s website onto Amazon’s cloud.
This gave the agency the ability to handle over a billion hits a day during the peak hurricane season. But, he continued, “we are still … just starting to get into how to adopt some of these more modern technologies to do our job better.”
Equal Access
Now NOAA is looking for a way to make the data available to an even wider group of people and easier to understand. Those are its two biggest challenges, Kearns said: how to disseminate the data and how to help people understand it.
“We’re getting hammered every day by a lot of companies that want the data… and we have to make sure everybody’s got an equal chance of getting the data,” he said.
This is becoming a harder job because demand is growing exponentially, he said. “Our costs are going up because we need more servers, we need more networks,” and it’s a problem due to budget constraints.
The agency decided that partnering with industry would help facilitate the delivery of data.
NOAA is going into the fourth year of a deal it signed with Amazon, Microsoft, IBM, Google, and a nonprofit out of the University of Chicago called the Open Commons Consortium (OCC), Kearns said. The agreement is that NOAA data will remain free and open, and the partners will host it at no cost to taxpayers and monetize services around the data.
The agency is using an academic partner acting as a data broker to help it “flip this data and figure out how to drop it into all of our collaborators’ cloud platforms, and they turn it around and serve many consumers from that,” Kearns explained. “We went from a one-to-many model of distribution to a one-to-a-few-to-many model.”
People trust NOAA’s data today because they get it from a NOAA data service, he said. Now the agency is asking them to trust the NOAA data that exists outside the federal system on a partner system.
On AWS alone, NOAA has seen more than double the number of people using the data, he said. The agency, in turn, has seen a 50 percent reduction in hits on NOAA’s own servers.
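NOAA’s NEXRAD Level II radar archive is one of the datasets the agency hosts through AWS’s open-data program, and it can be read anonymously from a public S3 bucket. The sketch below only builds the key prefix for such a listing; the bucket name and the YYYY/MM/DD/STATION key layout reflect the public dataset as documented, while the station and date are illustrative assumptions.

```python
from datetime import date

# Public AWS open-data bucket holding NOAA's NEXRAD Level II radar archive.
BUCKET = "noaa-nexrad-level2"

def nexrad_prefix(day: date, station: str) -> str:
    """Build the S3 key prefix for one radar station on one day.

    Objects in the bucket are organized as YYYY/MM/DD/STATION/.
    """
    return f"{day.year:04d}/{day.month:02d}/{day.day:02d}/{station}/"

# With boto3 installed, an anonymous (unsigned) listing would look like:
#   import boto3
#   from botocore import UNSIGNED
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
#   resp = s3.list_objects_v2(
#       Bucket=BUCKET,
#       Prefix=nexrad_prefix(date(2017, 9, 10), "KAMX"),  # KAMX: Miami radar
#   )

print(nexrad_prefix(date(2017, 9, 10), "KAMX"))  # 2017/09/10/KAMX/
```

Because the bucket is public, no NOAA server is involved in such a request, which is exactly how the partner clouds absorb demand that would otherwise hit the agency directly.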
Google has loaded a lot of the agency’s climate data to its BigQuery data warehouse, “and they’ve been able to move petabytes of this data just in a few months, just because the data now has been loaded into a tool people are already using.”
This “reduces that obstacle of understanding,’’ Kearns noted. “You don’t have to understand a scientific data format, you can go right into BigQuery… and do analyses.”
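As a concrete illustration of that point, NOAA’s Global Surface Summary of the Day (GSOD) tables are among the agency’s datasets in BigQuery’s public-data program. The sketch below only builds standard SQL for a simple analysis against that dataset; the dataset and table names are real, while the station ID and the particular query are illustrative assumptions.

```python
# NOAA GSOD daily surface observations, hosted in BigQuery's public datasets.
TABLE = "`bigquery-public-data.noaa_gsod.gsod2017`"

def mean_temp_query(station_id: str) -> str:
    """Return SQL computing the mean daily temperature (deg F)
    reported by one station across 2017."""
    return (
        "SELECT AVG(temp) AS mean_temp_f "
        f"FROM {TABLE} "
        f"WHERE stn = '{station_id}'"
    )

# With google-cloud-bigquery installed and credentials configured,
# running the query would look like:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   rows = client.query(mean_temp_query("722020")).result()

print(mean_temp_query("722020"))
```

The point Kearns makes holds here: no knowledge of NOAA’s scientific file formats is needed, only the SQL that BigQuery users already write every day.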
Data Trust
Being able to trust data is also an important component of any shared initiative, and through the NOAA’s Big Data Project, the agency is seeking ways of ensuring that the trust that comes with the NOAA brand is conveyed with the data, he said, so people continue to trust it as they use it.
“We have a very proud history of this open data leadership, we’re continuing on that path, and we’re trying to see how we can amplify that,’’ Kearns said.
NOAA officials are now wondering whether making the data available through these modern cloud platforms will make it easier for users to create information products for themselves and their customers.
“Of course, we’re also looking for other ways of just doing our business better,’’ he added. The agency also wants to figure out whether it makes sense to continue this experiment with its partners. That, he said, it will likely know by early next year.