LSST AGN Science Collaboration 2021 Data Challenge

gtrichards · July 9, 2021, 5:04pm

We are announcing the LSST AGN Science Collaboration’s 2021 Data Challenge. The purpose of this challenge is to help get more people involved in the work needed to do AGN science with the upcoming LSST data. For this purpose, we have produced a common exploratory dataset that can be used to develop tools for 1) parameterization of AGN light curves, 2) AGN selection, and 3) AGN photo-z. A panel of judges (consisting of the AGN SC leadership team and multiple members of other LSST science collaborations) will award prizes for derivative work that advances the goals of the LSST AGN SC and AGN science with LSST in general. We have LSSTC funding to award 1st, 2nd, and 3rd prizes of $2000, $1500, and $1000, respectively. In addition there is $5000 of funding for participation awards (10-20 at $250-$500) and $3000 for page charges to encourage publications that are derived from the competition. The deadline for submissions will be 17 September 2021.

More details about the competition and the data set(s) available for your use can be found in the RichardsGroup/AGN_DataChallenge repository on GitHub (Information for LSSTC AGN Data Challenge).

Submissions would ideally be in the form of Jupyter notebooks, but the panel of judges will consider all reasonable submissions adhering to the following format:

Introduction (What is the main goal your submission addresses with the data challenge? Does this goal relate to items in the AGN SC roadmap? If not, should a new item be added to the AGN SC roadmap?)
Data (What data sets are you using for the challenge? See the repository for details about what data are available. Why do these data sets allow you to address your main goal?)
Methods (Describe your method to extract the information for your main goal from the data. What is innovative about your implementation/application of this method to the data?)
Results (Summarize your results. Include plots and statistics that illustrate your results. Discuss future improvements to the method or what future features could help to improve your results.)
Code (Provide enough code that your results can be confirmed and tested by the judges on a “blinded” subsample; see below. Prize winners will be required to make their code available to the AGN SC and/or broader LSST community.)

We imagine that most submissions will fall into 3 categories:

1. AGN Classification as measured by both completeness and efficiency.
2. AGN Photo-z accuracy as measured by a robust estimator of the RMS and an outlier fraction, specifically sigma_NMAD and f_out (see Lee & Chary 2020, Equation 1).
3. Other results not otherwise specified herein, along the lines of “Most creative effort to do things that we haven’t thought of”, for example, characterization of light curve data. This category is important because we are using real data (rather than simulated) that often lack “truth” (class and redshift). Thus there may be submissions that are exploratory in nature and do not address category 1 or 2. Such submissions could include code that allows for additional data to be added to the challenge (e.g., thumbnail images). Category 3 submissions will largely be considered for participation awards unless the submitter is able to make their relevance/importance exceedingly clear to the judges.
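
For anyone who wants a starting point for the category 1 and 2 metrics, here is a minimal sketch in plain Python. The definitions are assumptions based on common usage, not the judges' official scoring code: completeness is taken as recall, efficiency as precision, sigma_NMAD as 1.48 times the median absolute deviation of the normalized redshift residuals, and the outlier cut of 0.15 is a commonly used value. Please check Lee & Chary 2020, Equation 1 for the exact form before quoting numbers. The function names are hypothetical.

```python
from statistics import median


def completeness_efficiency(true_labels, predicted_labels):
    """Return (completeness, efficiency) for a binary AGN classifier.

    Assumed definitions (not spelled out in the announcement):
      completeness = true AGN recovered / all true AGN         (recall)
      efficiency   = true AGN recovered / all objects selected (precision)
    Labels are truthy for AGN and falsy otherwise.
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t and p)
    n_true = sum(1 for t, _ in pairs if t)
    n_selected = sum(1 for _, p in pairs if p)
    completeness = true_pos / n_true if n_true else float("nan")
    efficiency = true_pos / n_selected if n_selected else float("nan")
    return completeness, efficiency


def photoz_metrics(z_spec, z_phot, outlier_threshold=0.15):
    """Return (sigma_nmad, f_out) for photometric redshift estimates.

    Assumed convention: dz = (z_phot - z_spec) / (1 + z_spec),
    sigma_NMAD = 1.48 * median(|dz - median(dz)|),
    f_out = fraction of objects with |dz| > outlier_threshold.
    """
    dz = [(zp - zs) / (1.0 + zs) for zs, zp in zip(z_spec, z_phot)]
    centre = median(dz)
    sigma_nmad = 1.48 * median(abs(d - centre) for d in dz)
    f_out = sum(1 for d in dz if abs(d) > outlier_threshold) / len(dz)
    return sigma_nmad, f_out
```

Both functions take plain sequences, so they can be called on columns pulled from whatever table format you are working with.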

Blinded data: Users will need to generate their own training and test sets from the data provided, but the builders have set aside a “blinded” subsample that will be used to test submissions addressing categories 1 and 2 (or 3 if appropriate).
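
As an illustration of building your own training and test sets, here is a minimal sketch of a reproducible random split over object identifiers. The function name, the 20% test fraction, and the seed are all hypothetical choices, and nothing here reflects how the blinded subsample was drawn.

```python
import random


def split_train_test(object_ids, test_fraction=0.2, seed=42):
    """Shuffle IDs reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    # Sort first so the result does not depend on the input order.
    ids = sorted(object_ids)
    rng = random.Random(seed)  # local generator; global random state is untouched
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]
```

Fixing the seed and recording it in your notebook makes it easier for others to reproduce exactly the same split.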

Judges will be guided by these categories, metrics, and analysis of blinded data, but not beholden to them as we imagine that submissions may have value beyond such statistics.

MelissaGraham · July 9, 2021, 6:32pm

ywx649999311 · August 31, 2021, 7:01pm

Hi,
Just a quick announcement regarding the ongoing LSST AGN Data Challenge: we have added thumbnails/cutouts from SDSS DR16 for all objects (except ~130) in the Object table. A new example notebook showing how to access those thumbnails/cutouts has also been added to the GitHub repo. See 05_Cutout.ipynb on the main branch of RichardsGroup/AGN_DataChallenge.

Enjoy!
Weixiang, Gordon

ywx649999311 · September 9, 2021, 5:20pm

Hi,

After seeking input from the AGN SC members (those who attended the September telecon), we have decided to extend the deadline for submission to October 15th, 2021. Please stay tuned for the submission procedure and feel free to contact us (@ywx649999311 and @gtrichards) if you have any questions.

Best,
Weixiang, Gordon

ywx649999311 · October 14, 2021, 5:42pm

Hi All,

The deadline for the LSSTC AGN Data Challenge is approaching!
Please submit your solution to Gordon Richards (gtr@physics.drexel.edu)
with “LSSTC AGN Data Challenge” in your email subject line by
11:59 PM Eastern Time on Friday, October 15 (tomorrow!!). Please follow the
instructions specified below:

- Submissions should be in the form of a Jupyter notebook, with
  sufficient narrative (i.e., not just code and plots) for judges to
  follow what you have done.
- We must be able to run that notebook in the SciServer environment.
  Ideally, a detailed procedure on how to reproduce your result should be
  provided, which should at least specify the exact computing image and
  conda environment used. If you have included/used additional
  data/software, please let us know how to recreate your environment.
- Your submission doesn’t have to be run on the blinded data set, but
  if you would like it evaluated against it, please let us know.
- If your ML model takes a long time to train and you would like to
  test your model on the blinded data set, you should provide the trained
  model as well (and the code to read in your trained model in the
  submitted Jupyter notebook).
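
As a rough illustration of shipping a trained model with the code to read it back, here is a minimal sketch using Python's standard pickle module. The helper names and the dictionary of fitted parameters are hypothetical stand-ins; many ML libraries have their own preferred save/load routines, which you should use where they exist. Note that unpickling can execute arbitrary code, so only load files from a trusted source.

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize a trained model (any picklable object) to disk."""
    with Path(path).open("wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Read a previously saved model back into memory."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-in for a trained model: a dictionary of fitted parameters.
example_model = {"features": ["feature_a", "feature_b"], "threshold": 0.7}
```

In a submitted notebook, the cell that calls `load_model` would sit just before the cell that applies the model, so that no retraining is needed to evaluate it.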

Thanks again for your interest in the LSSTC AGN Data Challenge and
we are looking forward to seeing your solutions.

Regards,
Weixiang, Gordon