ALCF Aurora Early Science Program: Data and Learning Proposal Instructions

General Information and Submission Instructions

Aurora 2021 Data and Learning Call For Proposals Is Now Closed.

Our intent is for Aurora Early Science Program (ESP) proposals to be relatively simple and short: a stripped-down version of an INCITE proposal. The sections of the proposal are:

  1. PI and co-PI Information
  2. Project Summary
    • Executive Summary
    • Benefit to Community
    • Impact Statement
    • Science Summary
    • Application Summary
  3. Estimate of Resources Requested
  4. Participation in Other Applications-Readiness Programs
  5. Project Team Members, Research Funding
  6. Commitments/Expectations

For details on how proposals will be evaluated, see the Aurora 2021 Data and Learning ESP Call for Proposals webpage.

Submission

  • Submission deadline: April 8, 2018 before midnight in any time zone (Anywhere On Earth)
  • We are using the EasyChair system for proposal submission. You’ll need to create an account if you don’t have one already, and log in to the Aurora ESP EasyChair website.
  • Prepare your proposal using the instructions below.
  • Upload your proposal to EasyChair as a single PDF document. You may resubmit with revisions as needed up until the deadline.

Please direct any questions to earlyscience@alcf.anl.gov. If needed, contact Tim Williams at 630-252-1154.

Proposal Instructions

Please create your proposal document with a project title and the section headings noted below. A template Word document based on these headings is available; please use or refer to it.

Section 1: PI and co-PI Information

1a. Principal Investigator (PI) Information

  • Last name, first name, title (Dr., Mr., Ms., etc.)
  • Institution
  • Street address
  • Email address

1b. Co-Principal Investigator (co-PI) Information

For each co-investigator:

  • Last name, first name, title (Dr., Mr., Ms., etc.)
  • Institution
  • Street address
  • Email address

Section 2: Project Summary

2a. Executive Summary

Write an executive summary that accurately describes your proposed research and the high-impact scientific advances you will achieve with access to early resources at the ALCF. (1/2 page)

2b. Benefit to Community

Write a description of the benefit your project will provide to the science and HPC community. (1/2 page)

2c. Impact Statement

Provide a two-sentence project summary that can be used to describe the impact of your project to the public (50 words maximum).

2d. Science Summary

Write a description of the science problem you would like to address in the late 2021 time frame. This problem should be appropriate for an exascale system, with characteristics as described in Section 3 below. Include research that will need to be completed in the next two years to lead up to this work. (1 page)

2e. Application Summary

2e.i. Application Software Requirements

Write a list of your application requirements, including languages, libraries, I/O middleware, databases, machine learning/deep learning frameworks, workflow software, containers, etc. Indicate your current approaches to parallelism (Spark, Swift, MPI, OpenMP, etc.). Please include “productivity” languages such as Python, R, Julia, etc. (1 page).

2e.ii. Application Data Requirements

Write a description of your data management requirements. Example topics to include:

  • Streaming/realtime data feeds
  • Access to remote databases
  • Setup/access of local databases
  • Data formats (HDF5, etc.)
  • Data sizes/scales (ingested/output data for each stage in end-to-end workflow)
  • ALCF persistent storage requirements (number of bytes, maximum number of files, other measurements that pertain to non-file-based storage)

2e.iii. Application Description

Write a description of the current application, including methods, parallelization, I/O, etc. (1 page).

2e.iv. Application Development Needed

Write a description of the code and/or algorithmic development you believe will be necessary to exploit an increase in parallelism per-node and an increase in overall levels of parallelism. Include work that will be needed in MPI parallelism. Show measurements of current application performance and scaling, indicating where development/optimization is needed to achieve performance goals for Aurora. (1 page).

Section 3: Estimate of Resources Requested

You'll be making two CPU resource requests. The first request is for development time on our current machines, and should be modest: on the order of one to a few million core-hours at most. This covers development work that does not depend strongly on having the new hardware: implementing new algorithms, incorporating new kinds of networks, exploring to find optimal training datasets, adding new physics modules, introducing or scaling up threads, refactoring code or data structures to map to the Aurora hardware architecture, etc.

The second request is for Early Science period time on our next-generation machine, Aurora. This is a large request, for the CPU time you'll need to run your proposed science problem. Assuming 3 months of Early Science dedicated availability, and a peak speed of 1 exaFLOPS for the full machine, the total amount of Aurora CPU time available for Early Science is on the order of 1300 exaFLOPS-hours. Divided 20 ways, this would be about 65 exaFLOPS-hours for a single project. Your request should be in this general ballpark, though it could be somewhat higher or lower.

As a comparison reference point, consider the equivalent for today’s system, Theta. Theta has a peak speed of 11.69 petaFLOPS. Dedicated access for 3 months would give a total of around 15,300 petaFLOPS-hours, or 15.3 exaFLOPS-hours. Divided 20 ways, this would be about 0.77 exaFLOPS-hours for a single project. Given that Theta has 4392 nodes with 64 cores, this would translate into 18.5 million Theta core-hours. Roughly speaking, what you’ll get on Aurora is about 100x what you could get on Theta in terms of processing speed. An ESP project on Aurora would be the equivalent of about 1.8 billion Theta core-hours.
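The Theta comparison above can be reproduced with a few lines of arithmetic. The following Python sketch is illustrative only: the variable and function names are hypothetical, and because the text does not state how many dedicated hours are assumed for the 3-month period, that value is inferred here from the quoted 15,300 petaFLOPS-hours and Theta's peak speed.

```python
# Figures quoted in the text above.
THETA_PEAK_PFLOPS = 11.69          # Theta peak speed in petaFLOPS
THETA_TOTAL_PFLOPS_HOURS = 15_300  # quoted total for 3 months of dedicated access
THETA_NODES = 4392
THETA_CORES_PER_NODE = 64
NUM_PROJECTS = 20                  # "divided 20 ways"

# Assumption: the number of dedicated hours is not stated in the text,
# so it is inferred from the quoted total and the peak speed.
dedicated_hours = THETA_TOTAL_PFLOPS_HOURS / THETA_PEAK_PFLOPS


def per_project_exaflops_hours(total_pflops_hours, num_projects=NUM_PROJECTS):
    """Convert a machine-wide petaFLOPS-hours total to a per-project exaFLOPS-hours share."""
    return total_pflops_hours / 1000 / num_projects


def per_project_core_hours(hours, nodes, cores_per_node, num_projects=NUM_PROJECTS):
    """Per-project share of the core-hours delivered by the full machine."""
    return hours * nodes * cores_per_node / num_projects


theta_share_exaflops_hours = per_project_exaflops_hours(THETA_TOTAL_PFLOPS_HOURS)
theta_share_core_hours = per_project_core_hours(
    dedicated_hours, THETA_NODES, THETA_CORES_PER_NODE
)

print(f"Inferred dedicated hours:  {dedicated_hours:.0f}")
print(f"Per-project share:         {theta_share_exaflops_hours:.3f} exaFLOPS-hours")
print(f"Per-project Theta time:    {theta_share_core_hours / 1e6:.1f} million core-hours")
```

Under these assumptions the per-project share works out to about 18.4 million Theta core-hours, consistent with the rounded 18.5 million figure quoted above.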

Please make your best effort to project your Aurora resource needs using units of exaFLOPS-hours. Because we cannot give you any details about the architecture now, it doesn’t make sense to estimate in terms of core-hours. However you make your estimates, please explain your estimation method; the "brief schedule" section in your proposal is a good place to do this explanation.
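As one hypothetical way to arrive at an estimate in exaFLOPS-hours, the Python sketch below starts from an application's total floating-point operation count and an assumed fraction of peak speed. It assumes that exaFLOPS-hours are counted against peak speed, as in the Aurora and Theta figures above. The function name, its parameters, and all input values are illustrative assumptions, not part of the official instructions.

```python
def exaflops_hours(flops_per_run, num_runs, fraction_of_peak):
    """Estimate a request in peak exaFLOPS-hours.

    flops_per_run    -- floating-point operations needed by one run (assumed known)
    num_runs         -- number of runs in the campaign
    fraction_of_peak -- assumed sustained fraction of peak speed, in (0, 1]
    """
    if not 0 < fraction_of_peak <= 1:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    # Operations the machine must be able to deliver at peak to sustain the work.
    peak_flops_needed = flops_per_run * num_runs / fraction_of_peak
    # One exaFLOPS-hour is 1e18 operations per second sustained for 3600 seconds.
    return peak_flops_needed / (1e18 * 3600)


# Hypothetical campaign: 50 runs of 4.5e20 operations each at 10% of peak.
estimate = exaflops_hours(flops_per_run=4.5e20, num_runs=50, fraction_of_peak=0.10)
print(f"Estimated request: {estimate:.1f} exaFLOPS-hours")
```

With these made-up inputs the estimate comes to 62.5 exaFLOPS-hours. Whatever method you use, the inputs (operation counts, number of runs, assumed efficiency) are the part worth explaining in the proposal.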

Processing time is, of course, only part of the computing story; memory matters too. When making your estimates and formulating problems you think you could solve on Aurora, keep in mind the total amount of memory you need. While the aggregate compute speed of Aurora will be about 100x that of Theta, the total memory of Aurora is expected to be only on the order of 10x that of Theta.

3a. Current-Generation System (Theta) Resources:

  • Theta time in Theta core-hours. This time is primarily for development, not for science runs. The ballpark of your request should be a few to 10 million core-hours per year, comparable to a Director’s Discretionary allocation.
    • Please specify the amount of time you need per calendar year (2018, 2019, 2020, and 2021).
  • Disk space in TB
  • Tape archive space in TB
  • Brief schedule for how you would use that time on Theta to prepare for early access to next-generation hardware and the final next-generation system: scaling tests, development (e.g. algorithms, physics modules), verification, parameter sweeps, porting to Xeon Phi architecture, etc. Assume that your Mira and/or Theta access begins on 1 October 2018 and continues until the start of the Early Science period on Aurora (second half of 2021; exact date subject to change). Break this down into milestones as appropriate for your project. (1/2 page).

3b. Next-Generation System (Aurora) Resources:

  • Aurora time in exaFLOPS-hours
  • Persistent storage space in TB (disk files/local databases/local object store/etc.)
  • Tape archive space in TB
  • Breakdown for how you would use time on Aurora to make final preparations for science runs, and for the science runs themselves. Preparations might include final scaling tests, science problem spin-up runs, etc. For the science runs themselves, estimate the total exaFLOPS-hours and break it down into separate components/milestones as appropriate. You should plan to complete all of this during the (approximately) three-month Early Science period, when you and the other Aurora ESP projects will have dedicated pre-production access. Early Science starts in the second half of 2021 (exact date subject to change). You will have continued access after those three months, but you will then be sharing the machine with all our production users, and may run at lower priority. (1/2 page).

Section 4: Participation in Other Applications-Readiness Programs

Indicate whether your team, or others you are aware of using the same code base, have projects under the OLCF CAAR program. Also indicate if you have an active project in the Exascale Computing Project.

Section 5: Project Team Members, Research Funding

5a. ALCF Funded Postdoc

Assuming 100% effort by an ALCF postdoc on your project for 2 years, identify the roles and responsibilities you expect for the postdoc. What activities do you expect him/her to focus on?

5b. Names and Levels of Effort

List the names and levels of effort (as a percentage of full-time) for all team members you expect to do work on the ESP project. Indicate which aspects/areas of the project each person would work on.

For each person, include a CV. If you have trouble getting all of the CVs into the PDF proposal document you are submitting, email earlyscience@alcf.anl.gov for assistance.

5c. Funding Sources

List the funding source(s) for your research. Other than the ALCF postdoc, you are expected to have funding to cover your team’s effort.

Section 6: Commitments/Expectations

Please confirm that should your proposal be awarded as an ESP project, you will commit to meeting the following four requirements:

  1. Having your institution(s) sign a multiparty RSNDA (restricted-secret nondisclosure agreement) with system vendor(s), so that you may speak with ALCF and other ESP participants about RSNDA information.
  2. Helping recruit an ALCF postdoc to work on your project team in a timely manner. The goal is to hire within the first year of the project.
  3. Preparing, during the first two months of your project (after selection), a detailed project plan with tasks/milestones we can use to document and report progress until Aurora is accepted and the Early Science dedicated access period begins. ALCF will help with guidance on this.
  4. Providing simple planning documentation in the form of short-term activities/milestones, and reporting monthly on the estimated percentage completion of milestones, as a means to track the progress of your project.

Indicate this on your proposal by copying the requirements and indicating “Confirmed” next to each.

Good Luck!