UCSD-2019: Technical Working Group: Data Management

Charge

  1. Identify key decisions that must be made (and justified) prior to CD-1/PDR,
  2. Make progress on (or actually make) those decisions,
  3. Lay out a timeline and process for making each decision, consistent with the post-decision work and internal reviews that will be needed to complete preparations for CD-1/PDR,
  4. Ensure that those timelines and processes are understood and supported by the collaboration, and that we (together) believe we can follow them.

Agenda

  1. L2 Overview (Julian Borrill/Tom Crawford) slides
    1. Subsystem Management (Julian Borrill) slides
    2. Data Movement (Sasha Rahlin) slides
    3. Software Infrastructure (Ted Kisner) slides
    4. Data Synthesis (Sara Simon/Andrea Zonca) slides
    5. Data Reduction (Colin Bischoff/Reijo Keskitalo) slides
    6. Transients (Don Petravik/Nathan Whitehorn) slides (note conflict with the Transients parallel session)
    7. Site Hardware (Tom Crawford) slides
  2. Simulations for Flowdown (Sara Simon) slides

Remote attendance

Zoom link


Notes

Intro

Big questions to answer here:

  • What are we missing?

Note that the L2-level stuff, including the bi-weekly telecon, is all coordination and management; real work is done at L3 and below.

DM scope redefined to run from raw data coming off the telescopes to well-characterized "reduced data" (maps, etc.). (Used to be "well-characterized maps," but that does not cover transients.)

  • also responsible for mock data sets to support decisions in other WBSs.

DM transitions to operations in ~2026

  • but there's a data challenge scheduled for 2027; should we change that? (MEM)
  • maybe say DM "begins transition" to ops in 2026.

Discussion about boundary between roles of project DM (raw data to maps) and collaboration analyzers (maps to science).

What does "well-characterized" mean? (RS)

  • Something we probably need to define better, along with analysis working groups.

Is there a document stating "at stage X in DOE/NSF project maturity, we need set Y of simulations"? (SH)

  • No. There probably should be.
    • A worry about asking AWGs what is needed is that they will say "everything," which is hard. (KH)

Do we really need HPC for anything? (JV)

  • Yes, for anything that depends on interprocess communication, which matters for capturing some types of correlations.
  • We are not the only ones building HPC/HTC interoperability, so we should be able to piggyback on existing efforts. (SH)


L3: Transients

Draft WBS expected next week.

DOE doesn't do transients, so... (GG)

  • before he can finish, many people jump in with "yes it does"
  • so resources come from both sides?
    • hardware at Pole definitely from NSF


L3: Subsystem Management

Why are Pole and Atacama computing resources being crossed off? (ASR)

  • because there's a new L3 for that.

Is there software for all of the Data Challenge stuff in place? (KH)

  • much of it, yes, but not necessarily validated against all sites and instruments we want

If someone asked you "what actually needs to be simulated to get to Baseline Design," what would you say? (GG)

  • hardest thing is going to be instrument non-idealities like beam details


L3: Data Movement

What is the current TDRSS coverage? (MEM)

  • 4 hours/day, but the total bandwidth we are allocated is only ~125 GB/day, which is a factor of at least 40 too low.
  • What about Starlink and other commercial options? (TK)
    • Looking into it.
  • What sort of lossy compression has been investigated? (SH)
    • Only downsampling. And sending back maps instead of TOD.
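
As a rough check on the TDRSS numbers quoted above, here is a minimal Python sketch of the implied arithmetic. Only the three constants come from the notes. The function names are hypothetical. It assumes decimal units (1 GB = 10^9 bytes) and that the whole daily allocation is sent inside the 4-hour coverage window, neither of which was stated in the session.

```python
# Back-of-envelope check of the TDRSS figures quoted in the notes.
# Only the three constants below come from the notes; everything else
# (decimal GB, data sent only during the coverage window) is an assumption.

ALLOCATED_GB_PER_DAY = 125.0    # "~125 GB/day" (approximate)
SHORTFALL_FACTOR = 40.0         # "a factor of at least 40 too low"
COVERAGE_HOURS_PER_DAY = 4.0    # "4 hours/day"


def implied_required_gb_per_day(allocated_gb, shortfall_factor):
    """Lower bound on the daily volume implied by the quoted shortfall."""
    return allocated_gb * shortfall_factor


def mean_rate_mbps(gb_per_day, window_hours):
    """Mean link rate (megabits/s) needed to move gb_per_day inside the window."""
    bits = gb_per_day * 1e9 * 8          # decimal gigabytes to bits
    seconds = window_hours * 3600.0
    return bits / seconds / 1e6


required_gb = implied_required_gb_per_day(ALLOCATED_GB_PER_DAY, SHORTFALL_FACTOR)
allocated_rate = mean_rate_mbps(ALLOCATED_GB_PER_DAY, COVERAGE_HOURS_PER_DAY)
required_rate = mean_rate_mbps(required_gb, COVERAGE_HOURS_PER_DAY)

print(f"Implied requirement: at least {required_gb / 1000:.1f} TB/day")
print(f"Mean rate at current allocation: {allocated_rate:.1f} Mbit/s")
print(f"Mean rate at implied requirement: {required_rate:.1f} Mbit/s")
```

Under these assumptions the quoted shortfall corresponds to at least about 5 TB/day, which is the scale any lossy compression or commercial-link option would have to address.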

How much data traffic do you anticipate once the data is stored?

  • Any such traffic (with real data) will be tiny compared to distributing sims.

What is the cost model for data movement? (GG)

  • "FedEx" model is fully costed; buying more bandwidth is not.


L3: Software Infrastructure

What database options have you looked into? (SH)

  • Depends on how big these metadata will be (partially depends on how transients go).


L3: Data Synthesis

Jason Stevens will be working on scan strategy implementation.

If we go to the trouble of simulating data sets for other experiments, will we provide those simulated data sets to those collaborations? (TC)

  • Yes. In fact, we will make them publicly available. (SO is planning on making their sims public.)

Just a note that an S4 noise calculator for the LATs does exist. (TC)

  • What about SATs?
    • Ummmmmmmmmm, it's complicated. We need to somehow merge what has happened so far on the SAT side and the proposed unified simulation framework.

Want to make sure that AWGs have input on the input to the simulations. (TC)

  • Absolutely. We usually have the opposite problem: we go to the AWGs and beg them for input, and they spurn us. (AZ)
  • This is the main purpose of the proposed cross-cutting simulation telecons.


L3: Data Reduction

Major milestones on the DM calendar need to be shifted around to reconcile fiscal-year (FY) and calendar-year (CY) dates. (MEM)

  • Or at least we should make sure they are ok.


L3: Site Computing Hardware

Can Chilean transient analysis be done with local Chilean computing as well?

  • Also, maybe rewrite that risk as: "What is the risk to transient science if the link to Chile goes down for a certain amount of time, and no real-time analysis can happen during that time, and what is the likelihood of that actually occurring?"


Simulations for Flowdown

At what level has this been done for ACT and SPT, and, if these experiments have not done it at this level, what is the justification for doing it at this level now? (KH)

  • It has been done at some level for ACT and SPT but not at the fidelity proposed here. But it was critical for Planck.


Question for Gil G. (as representative of TBD group): How are we doing? Are we on the right track or off on a wild goose chase?

  • You're in pretty good shape, at least comparatively.


Important question for plenary: "How long before an agency review do the relevant simulations need to be produced?"