The overarching goal is to maximize the discovery potential of scientific collaborations by developing revolutionary open source products and methods in SDN, virtualization, and global system operation and optimization. This will be accomplished by exploiting, and contributing to, the remarkable synergy emerging between two developments: (1) deeply programmable, software-defined, agile and adaptive network infrastructures that are emerging as multi-service, multi-domain network “operating systems” interconnecting next-generation Science DMZs, and (2) the global workflow, scheduling and data management systems developed by the data-intensive science programs. While the initial focus will be on the challenging LHC use case, the products developed will be general and will apply to many fields of data-intensive science. They will be informed by the LSST and bioinformatics/genomics use cases, which will be explored during the latter part of the project.
We will construct autonomous, intelligent site-resident services that dynamically interact with network-resident services, and with the science programs’ principal data distribution and management tools, to request or command network resources in support of high-throughput petascale-to-exascale workflows, using:
smart middleware that interfaces to SDN-orchestrated data flows over network paths with guaranteed bandwidth, all the way to a set of high-performance end-host data transfer nodes (DTNs);
protocol-agnostic SDN-based QoS and traffic-shaping services at the site egress that provide stable, predictable data transfer rates, together with auto-configuration of the data transfer nodes; and
host- and site-level agent systems coupled to machine learning methods.
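To make the first item concrete, a site-resident agent's request for a guaranteed-bandwidth path between two DTNs might carry the fields sketched below. This is a minimal illustrative Python schema; the class name, fields, and DTN hostnames are assumptions for illustration, not the project's actual interface.

```python
from dataclasses import dataclass


@dataclass
class PathRequest:
    """Reservation request a site-resident agent might send to a
    network-resident service (illustrative schema, not the project's)."""
    src_dtn: str      # source data transfer node hostname
    dst_dtn: str      # destination data transfer node hostname
    gbps: float       # requested guaranteed bandwidth, Gbit/s
    volume_tb: float  # dataset size in TB, for deadline estimation

    def transfer_hours(self) -> float:
        """Estimated transfer time at the guaranteed rate.
        1 TB = 8e12 bits; 1 Gbit/s = 1e9 bits/s."""
        return (self.volume_tb * 8e12) / (self.gbps * 1e9) / 3600.0


# Hypothetical example: a 500 TB dataset over a 40 Gbit/s guaranteed path
req = PathRequest("dtn01.site-a.example", "dtn02.site-b.example",
                  gbps=40.0, volume_tb=500.0)
```

An agent could use such an estimate to negotiate with the network-resident service, trading requested bandwidth against the workflow's deadline.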
At the core of SDN-NGenIA is an OpenDaylight controller using Application-Layer Traffic Optimization (ALTO) and a min-max fair resource allocation algorithm set under development in collaboration with Y. Yang's group at Yale.
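Min-max (more commonly called max-min) fairness can be illustrated by the classic progressive-filling scheme on a single shared link: every unsatisfied flow absorbs an equal share of the remaining capacity until its demand is met. The sketch below is this generic textbook version, not the ALTO-integrated algorithm set under development.

```python
def max_min_fair(capacity, demands):
    """Max-min fair allocation of one link's capacity (Gbit/s) among
    flows with the given demands, via progressive filling."""
    alloc = {flow: 0.0 for flow in demands}
    remaining = dict(demands)  # still-unsatisfied flows and their demands
    cap = capacity
    while remaining and cap > 1e-9:
        share = cap / len(remaining)
        # Flows whose demand fits within the equal share are fully satisfied.
        satisfied = [f for f, d in remaining.items() if d <= share]
        if not satisfied:
            # No flow can be satisfied: split the rest equally and stop.
            for f in remaining:
                alloc[f] += share
            break
        for f in satisfied:
            alloc[f] = demands[f]
            cap -= remaining[f]
            del remaining[f]
    return alloc
```

For example, with 10 Gbit/s shared by flows demanding 2, 8, and 10 Gbit/s, the small flow is fully satisfied and the other two split the remainder equally (4 Gbit/s each).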
The knowledge accumulated in this development program will also serve to inform the design of following generations of distributed petabit/sec systems, including continental-scale instruments such as the SKA, and the exascale computing systems of the next decade harnessing zettabyte datasets.