Daniel Ernst, distinguished technologist, high-performance computing & AI, Hewlett Packard Enterprise, joins Peggy to talk about HPC and exascale computing. He explains why Exascale Day is celebrated on October 18 and the meaning behind the date. Daniel shares the details behind exascale computing, how researchers leverage it, the challenges it can help solve, and where it is currently being used. He also explains the relevance of having three HPE Cray Exascale Supercomputers coming online between 2021 and 2023 and what the future holds beyond that.
Below is an excerpt from the interview. To hear the entire interview on The Peggy Smedley Show, visit www.peggysmedleyshow.com, and select 10/20/2020 from the archives.
Peggy Smedley: Dan, I have to ask. This is all really exciting to hear about this Exascale Day, and I know it was October 18th and is a day that celebrates organizations and supercomputing. You have to talk to me about this computational science to change the world. People have to be out there going, “What does that mean, to better help things?” I want you to dig a little in to that. I know people listening will go, “What do we mean, we’re going to make things better?” I mean, right now, we need to make a lot of things better if you ask me because I think we’ve gone a little crazy. But I really want you to kind of walk me through what that really means.
Daniel Ernst: Sure. It’s a lot of big words and big promises and it would be good to talk about how we get there. Exascale computing is what Exascale Day is about and one of the core pieces of exascale computing is the rate at which we can do calculations. We often call these FLOPS, floating-point operations per second, for computing. That’s one of the ways we’ve measured supercomputer performance over the years. Exascale means 10 to the 18th FLOPS. That’s a billion billion floating-point operations per second, a quintillion.
That’s a massive amount of calculations and the important thing about it is this is the calculations that we can put to work on a single problem. It’s not distributed over lots of little things. This is a capability to apply to a scientific problem. There’s a lot of other complex pieces around it to build a system that can actually do that. But the nice thing about exascale, or the thing we’re really looking forward to about it, is what our customers and the people who are going to deploy it are going to do, right? The things that they’re going to do to change the world, the advances that they’re going to make with this capability.
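To put the 10 to the 18th figure in perspective, here is a minimal arithmetic sketch. The 1 gigaFLOPS "laptop" rate is an assumed round number chosen for illustration, not a measurement, and the names `EXA`, `BILLION`, and `seconds_to_match` are hypothetical.

```python
# Illustrative arithmetic only: how long would a slower machine need to do
# the work an exascale system does in one second?
EXA = 10**18      # exascale: 10^18 floating-point operations per second
BILLION = 10**9

def seconds_to_match(exa_seconds: float, device_flops: float) -> float:
    """Seconds a device at device_flops needs to match exa_seconds of exascale work."""
    return exa_seconds * EXA / device_flops

# A quintillion is a billion billion.
assert EXA == BILLION * BILLION

laptop_flops = 10**9  # assumption: a sustained 1 gigaFLOPS, for round numbers
secs = seconds_to_match(1.0, laptop_flops)
years = secs / (365.25 * 24 * 3600)
print(f"{years:.1f} years")
```

Under that assumed rate, one second of exascale work would take the slower machine about a billion seconds, which is a little under 32 years.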
Smedley: This is when I get my geek on, because I’m not a geek by trade, but I love when you say people can be innovative, in my mind that’s what comes to, that you can solve single problems. Because I think the next innovators, that’s what we have to do. And I personally, when you say quintillion to me, I’m like, “I don’t know really what that really means in the scheme of things.” But what it really means to me is that we’re inspiring people, that we’re creating the what ifs, the why nots, the what’s next.
That’s what to me that you guys talk about that I think that’s what that means. And so for the science people, I think they get the 10 to the 18th FLOPS that you said. To me, someone who’s not a science person, who’s not a math person, but who’s a journalist says, “I can imagine the next thing that you’re solving from a science, a neuroscience perspective, a math problem.” That’s what I see, is that what you’re seeing with this?
Ernst: Yeah. And in fact, as we have been working towards this milestone and working towards these capabilities, we’ve been working very closely with applications people from places like the U.S. Dept. of Energy, right, who is going to deploy the first systems. And those scientists were asked as we’re moving along this pathway, what could you do if you had this capability? And then as we’re talking to them about what they want to try and accomplish in the science world and the engineering world, they can give us feedback into how can we make a system to really enable them to do what they want to do, right?
Honestly, having a big computer is cool and everything, but if you can’t use it to make advances in the world, to make innovations, to change how people live and work, right? It’s a really expensive thing to show off, right? It needs to be more than that and that’s what our goal has been all throughout this process.
Smedley: When you talk to people about this and you say, “What are we going to do?” We said, “Let’s have this day.” And did people say, “Are they going to come with the next great big idea?” Are they going to say, “Let’s inspire people to think that we can think outside ourselves and think bigger and imagine, and get ourselves fired up about what’s next.” Is that what we’re really thinking about?
Ernst: Yeah. It’s really we’ve been working towards this milestone. We want people to think about what kinds of things could they do with it. And we’ve had these conversations with a lot of scientists over the years. And we’re starting to see them be able to deploy capabilities that we couldn’t get to before. I mean, the advancements in computing have enabled a lot of cool things. Some examples, weather prediction is something that’s been a computing problem for a long time.
Smedley: It’s still a problem, I’m sorry.
Ernst: And while you can say it’s still a problem, it’s not perfect. But I will tell you, they can do things today that they could not do by any measure. As an example, the recent Hurricane Laura, right, they did two things from a computational standpoint that they could not have done before.
National Hurricane Center, one, three days out, they gave a prediction of the track, right? And this prediction of the track is based on very detailed, scientific simulation that runs on these supercomputers. They were able to pinpoint where Laura was going to make landfall. They got it right within about a mile, and within the hour they said it would land. From three days in advance, that’s a huge advantage to giving people the ability to evacuate and for cities to make planning and emergency services to be ready. 10 years ago, there’s no way we would’ve had any of that capability. They also made a prediction of rapid intensification. And that, again, is something from a computational standpoint is really complicated.
And until they had the level of supercomputing that they have today, they could not run simulations in enough detail to be able to pull out that important information and make that prediction, giving people a warning that Laura was going to go from a tropical storm all the way to a Category 4 very quickly. This is an example of the kinds of things more capability brings. We want people to think about, “All right, from here, where do we go?”
Smedley: I kid around about weather because I always say that’s something that’s so unpredictable. But I think what you just said is so powerful because we can then save lives in an entirely different way. Sometimes we can’t predict weather patterns and that’s the serious side of it because the devastating impact that it can have when it comes to land. But on the other side of it, we can tell people to evacuate, to get where they have to be, and save lives.
So we have to look at what we couldn’t do 10 years ago, what you just said, and then the power of being able to do the right calculations, to be able to see things and to predict and to do so much more. And I think that’s where the power of this, that you just said, the supercomputing power now of this takes us somewhere. And then now we’re limited again or not limited by what we’re able to do with this. And I think that’s what you’re saying, “If we take this information and how do we keep using it to do more and more, and what’s next?” I mean, is that what you’re saying is, “Okay, if we can do this now, what’s the next thing we can do if we keep pushing those boundaries?”
Ernst: Yeah, absolutely. One of the big ones we’re working on very hard and we’re seeing a huge use case for is advanced artificial intelligence. And using that as a tool in the toolbox to solve some of these problems. I think one that’s been talked about a lot is personalized medicine. It involves an enormous amount of understanding and not just of the science behind how medicine operates, but analysis of huge amounts of data, understanding track records of how certain types of people, features in people change the reactions they get just with the treatment.
And until you can address all of that data in one place and learn something from it, it’s very difficult for anything to show up in anybody’s clinic. We can talk about it, but gathering all that in one place and having it able to give a recommendation, for example, based on not just typical case studies, but what would be specifically good for a specific person. That’s a capability, for example, that we see, that’s “What’s next?” That’s something we could be able to do with these kinds of systems and enable that we couldn’t do before.
Smedley: And, are we saying that it’s now? I mean, we are able to do it now, or are these things we’re still saying it’s coming?
Ernst: These are things we’re still saying it’s coming. And it’s not to say that people have not been working on it. So this has been a dream of a lot of people for a lot of years, but we’re just now getting to the capabilities, not just in raw floating-point performance, just the sort of calculations, but in data storage, in high-performance networking, all these different pieces of the puzzle from a technology standpoint that let us build the capability.
Smedley: But is that the idea of what we want people to understand? You have to keep moving forward. You have to keep looking at the computational power that you just said in order for us to keep seeing what we’re able to accomplish. Because you just said you’re able to then move forward and continue to find complex challenges that you can overcome. That’s what this day is all about is to say, “Look, let’s get excited about the next thing we could do, personalized medicine, or what we could do is a major weather problem. We’re solving things that are affecting people’s lives and changing them forever.”
Ernst: Yeah, high-performance computing historically has had these kind of impacts on people. It’s a capability at a national level, in many ways to solve some of the problems in not just science and engineering and technology, but in a societal sense have impact on people’s lives. So we think about computing a lot of times in solving things like engineering problems. You can think about modeling aircraft engines or Formula One cars or weather. And it’s starting to be bridged with a movement towards easier access to things like AI and analytics.
You can look at things like political science. Not a place you would expect to see high-performance computing. But there was work done a few years ago where somebody tried to tackle the problem of gerrymandering, of how do we do redistricting in a way that’s not politically biased. It’s a really difficult problem but they were able to work through using a national high-performance computing resource at University of Illinois, the Blue Waters system, they were able to show what the characteristics of a non-gerrymandered district map would look like.
And that they were able to statistically draw out those characteristics such that they can compare it to anybody’s prepared map and be able to show pretty conclusively whether or not it was politically biased or not. And that was actually brought in and used in a court case last year that helped show that the Ohio map, for example, was one that needed to be changed. It had a real impact in an area that most people would not associate with computing.
Smedley: You’ve just raised something interesting. We’ve gone from medical research to weather, to gerrymandering. Let’s talk about that, because right now, prior to the pandemic, we’ve been all thinking about where scientific research, whether it was on a specific gender or whatever. But now we have to think about things because we have an election. We’re thinking about voter fraud, we’re thinking about all of the things that happen, security breaches, whatever it might be.
What should we be focusing on? I mean, is that something we need to be thinking about? There’s so much on our mind right now, are these the kind of things that we say, “Look, this supercomputing power, this high-performance computing power that we’re talking about, in time is it going to be able to contribute to that?” as well as saying, “Look, we’re going to be able to put that to an end.” Is that our hope at some point?
Ernst: Yeah. I think it’s very clear that there are always ways to apply or very often, are ways to apply these kinds of capabilities. The way to think about it is this, there’s sort of the standards and pillars in science, you think about what’s your theory and experiment? What everybody learned in high school. But really, there’s more pillars. There’s simulation, which is the modeling of things and trying to create based on the math what would happen in a certain system?
And then on top of that, there’s data analysis where we try and infer information from data. There is no problem on earth these days that doesn’t have some amount of data to it. Some of it’s very simple and maybe doesn’t need a high-performance computer, but it’s absolutely applicable across the lines. It’s not something that’s just limited to scientific pursuit.
And specific to the pandemic, the world’s HPC resources have been turned very quickly to spend a lot of time on that problem. And specifically investigations of different drugs, investigations of vaccines. So a couple of examples, work done at Oak Ridge looking at pre-filtering a huge number of medical compounds trying to model and simulate how they would bind with the COVID virus. And understand how that could be applied or not. That applied a pre-filter that reduced the number of things we needed to put through experimental trials by orders of magnitude. It’s a huge efficiency gain, but it’s not just limited to the pursuit of drug discovery.
The Fugaku, the fastest supercomputer in the world right now, is located in Japan. And that system has spent a lot of its cycles in its short life so far simulating the fluid dynamics of air and understanding the transmission of virus through the air. I think the recent discovery was they talked about the impact of humidity on viral transmission. And all of these things feed into things like, how do we better prepare our world for dealing with sort of the long haul of having viral transmission, right? How do we build more efficient air filtration? How do we guide public policy on social distancing and mask-wearing and things like that, right? These are all things that have contributions from advanced modeling.
Smedley: And when we look at all that, there’s always something new that comes on. When we think about more advances we make, there’s something else we’re going to look at. I mean, that’s what you’re telling us. Because we didn’t think there was going to be a pandemic. We didn’t think we were going to have to say, “What’s this advanced modeling we’re going to have to look at?”
And I guess that begs then the question, is the more awareness we have for all of this, the more we’re scientifically going to have to look at, and which direction we’re going to have to go, is it all depends on either whether it’s going to be a researcher, whether it’s going to have to be scientific research in each direction. What’s going to be the most pressing need at that moment is what I’m kind of hearing you say.
Ernst: Absolutely. And in fact, the way I like to think about these kinds of systems is that while we’re building them and designing them… I mean, we’ve been working on exascale computing research and development towards these systems for almost a decade now.
Smedley: That’s crazy.
Ernst: From when we first set the targets to now, we are actually going into building these things. What we were talking about as use cases half a dozen years ago were one subset of what will actually get done. The real important thing is we build these resources and our country has a lot of very smart people. They have experts in various domains, in science, in technology, in mathematics, and social science.
We want them to be able to have an idea and then go say, “Let me take that idea and run it through and use this tool to help run it to ground and see if we can actually make a big change.” We can’t necessarily predict what those things are. And so, yeah, it is absolutely important that we have awareness that these exist and that this capability is there.
We’ve actually done things from a technological standpoint to try and enable more people to use our systems. We’re trying to get to easier programming methods and models, more deployable types of ways to use it. We’re trying to bring high-performance capabilities into the computer languages that people use every day to make it more accessible. We want this technology to be something that we can put in the hands of more people.
Smedley: So then, Dan, what’s the biggest hurdles then? Is the biggest hurdles based on what our needs are? So we know that we have to solve a personalized medicine issue or is our biggest hurdles that we are having dramatic, let’s say climate change, that’s a big debate. People are saying, “So it’s weather patterns.” Or is it right now we have a pandemic. Does it really matter what’s happening? Or is it, we can’t get both sides of the aisle to agree on something because we’re going to have a major infrastructure failure.
I mean, is it whatever happens at that moment, or is it at the time that we can figure out from what you just said, high-performance computing that you’ve been working on for a decade? And now we look at what the Dept. of Energy just says, “We figured out something else we can do that’s going to change American society, mankind’s life.” Does it matter, or if some great innovator figures out something you could solve a great problem?
Ernst: Yeah. That’s really the thought. We want to enable the people that are those smart domain scientists or just people with really good ideas to be able to go and use these tools to solve whatever problem comes up. And there are obviously challenges in the ways of those things. It is still a capability that takes some work. But the national labs, for example, are there to help enable that stuff. And they’ve done a lot of work with private industry, for example, to try and get various parts of private industry to use these capabilities and give them training, give them access to resources, and so on.
Smedley: Is it gaining awareness that’s the hurdle? Is it a little bit of everything? I mean, because that’s what it seems like. You said there’s a lot of really smart people out there, so why is it taking, I don’t want to say it’s taking a long time, but it seems to me somebody hears that and goes, a decade’s a long time. To you, it’s probably not a long time, but maybe it is. I don’t know. It just seems like a long time to most people when you think it’s taken a decade to make this work or to get where you want to be or even further along.
Ernst: Right. I mean, I think when you start from the decade ago position, you’re starting with, “What problems do we know about now? How can we solve bigger versions of those problems? Or how can we solve those same problems faster.” And that’s what you start from. And what you hope to get to eventually is an overall system capability that lets you do those things, but also anything else other people come up with along the way. And we make adjustments as we learn new use cases and things like that. And I will point out that you say, “Well, what’s the hurdle there? Is it awareness? Is it getting people to use these systems?” It’s some of each. There are technological hurdles that we’ve had to scrape past to get to this point, but the big one is that those people, we need to get them there.
Smedley: Are we doing enough? Because we talked in the beginning about STEM, science, technology, and that. Are we doing enough to inspire, let’s say the great innovators, to want to be helping and doing the things we need to do to make these kind of things happen? Are there enough great innovators to make that happen to what we want to do to make high-performance computation happen? Are there enough great minds to keep it going forward to where you want it to be so that you can keep celebrating an Exascale Day, year after year after year on October 18th?
Ernst: I’m hoping so. I mean, that’s going to be a lot of fun to see what people come up with, right?
Smedley: Yeah, right!
Ernst: One of the joys of my job is we come from the kind of the Cray background. Our goal was this area, to create these capabilities for people to build something new. And we always want to try and stand back when we put things out there. We put these systems in play and see what people can do with them. So yeah, we certainly hope that we get to continue this conversation year after year. There does need to be work bringing up people into this category, into this area. We are seeing a big boom right now in the educational space around things like artificial intelligence. And those are going to be at a certain level, HPC problems.
We’ve seen actually a huge amount of growth in using this type of technology there because of a few reasons. One is it uses a lot of the same kinds of computer resources. But I think, more importantly, they’ve certainly designed their software stacks to be way more accessible. People can define neural networks and image processing in a way that is… Quite literally, I have kids that are doing it. And to think that they someday would be able to very easily run a large capability job on something like this is pretty exciting. And I think we’re on that path.
Smedley: When we think about what we have in the next decade, what will we be seeing between 2021 that’s in a year from now, in 10 years from now, what will we see coming online next year, in the next decade? What do you see? I mean, we want to see what we have now. What do you want to see next year? And then let’s say in the next decade, what do you envision?
Ernst: That is a great question, Peggy. So if you look backwards, right, to sort of try and paint a path forward: every 10 to 12 years, we’ve been going up by a prefix. So we had terascale in the late ’90s, we got to petascale, I think, around 2007, and now we’re talking about exascale in 2021. What you might note is that the window has gotten bigger. It is taking us longer to get a prefix out. A prefix is a factor of a thousand. We managed to do it for a long time by Moore’s Law in the progress of moving forward. That’s getting a lot harder these days. It’s not completely gone, but the rate of change there is smaller and we’ve needed to do more to do this capability in terms of scaling up in larger numbers of components and all that, all those kinds of technical details.
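As a rough illustration of what "a prefix is a factor of a thousand" implies for pace, the sketch below converts a years-per-prefix span into an average yearly speedup and a doubling time. The function names are hypothetical, and the spans of 10 and 14 years are assumptions taken from the approximate dates mentioned in the interview (petascale around 2007, exascale in 2021), not precise milestones.

```python
import math

# One prefix step (tera -> peta -> exa) is a factor of 1000 in FLOPS.
PREFIX_FACTOR = 1000

def annual_growth(years_per_prefix: float) -> float:
    """Average yearly speedup multiplier implied by gaining one prefix in the given span."""
    return PREFIX_FACTOR ** (1.0 / years_per_prefix)

def doubling_time_years(years_per_prefix: float) -> float:
    """Years needed to double performance at that average pace."""
    return years_per_prefix * math.log(2) / math.log(PREFIX_FACTOR)

# Illustrative spans: about 10 years per prefix versus about 14 years.
for span in (10, 14):
    print(span, round(annual_growth(span), 2), round(doubling_time_years(span), 2))
```

Under these assumptions, a 10-year prefix works out to performance roughly doubling every year, while a 14-year prefix stretches that to roughly every 1.4 years, which is one way to read the remark that the window has gotten bigger.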
As we see going forward, I think the first step we need to do is start working forward on these problems. This process has repeated itself over the years where we, as a sort of HPC community, stop, take time at target setting, where do we want to get next? And then just start doing the work to figure out what do we need to do to get there. That work is just starting. But I will tell you that it’s going to be a really large challenge to continue this pace. And so we’re looking at all kinds of different interesting technological advances thinking about very customized computing in terms of hardware, but also trying to use new software techniques to just change the algorithms, to make them less compute-intensive and more data-intensive, for example.
Those are all pathways we have going for the next decade, but we’re still really early. So at that point, it’s time to do this all over again, where we start the process of setting targets and trying to understand what needs to be done.
Smedley: I love the idea that you say we’re starting all new targets. It’s really exciting. So, Dan Ernst, distinguished technologist, high-performance computing & AI, Hewlett Packard Enterprise, thank you for joining me today for the first in a three-part series to discuss how technology is driving the future. This has been a really exciting interview on exascale. And so stay tuned as we continue our series. Dan, before we go, what is your URL so our listeners can really learn more about everything we shared today?
Ernst: Well, if you want to learn more about exascale computing, what we’re doing at HPE, actually, there’s a really unique URL you can go to. And that is to go to 000000000000000000.com. And by that, I mean type the number zero 18 times and add .com and that’ll take you to the site.