Tutorial

This tutorial is going to go through the process of generating BAO constraints from the Pathfinder data. Just kidding! We’re actually just going to generate some simulated data and then turn it into maps.

Setting up the Pipeline

Before you start, make sure you have access to the CHIME bitbucket organisation, and have set up your ssh keys for access to the bitbucket repositories from the machine you want to run the pipeline on. Unless you are working on Scinet, you’ll also want an account on niedermayer, with your ssh keys set up to allow password-less login, so that the database connection can be established.

There are a few software prerequisites to make sure you have installed. Obviously python is one of them, with numpy and scipy, but you also need virtualenv, which lets us install the pipeline and its dependencies without disturbing the base python installation. To check that it is installed, try running:

$ virtualenv --help

If you get an error, it isn’t installed properly and you’ll need to fix that before continuing.
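
If the virtualenv command is not found, it can usually be installed with pip (or through your system package manager), for example:

$ pip install virtualenv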

With that all sorted, we’re ready to start. First step, download the pipeline repository to wherever you want it installed:

$ git clone git@github.com:radiocosmology/draco.git

Then change into that directory, and run the script mkvenv.sh:

$ cd draco
$ ./mkvenv.sh

The script will do three things. First, it will create a python virtual environment to isolate the CHIME pipeline installation. Second, it will fetch the python prerequisites for the pipeline and install them into the virtualenv. Finally, it will install draco itself into the new virtualenv. Look carefully through the output for errors to make sure it completed successfully. You’ll need to activate the environment whenever you want to use the pipeline. To do that, simply do:

$ source <path to pipeline>/venv/bin/activate
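
When you are finished working with the pipeline, you can leave the virtualenv again with:

$ deactivate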

You can check that it’s installed correctly by firing up python, and attempting to import some of the packages. For example:

>>> from drift.core import telescope
>>> print(telescope.__file__)
/Users/richard/code/draco/venv/src/driftscan/drift/core/telescope.pyc
>>> from draco import containers
>>> print(containers.__file__)
/Users/richard/code/draco/draco/containers.pyc

External Products

If you are here, you’ve got the pipeline successfully installed. Congratulations.

There are a few data products we’ll need to run the pipeline that must be generated externally. Fortunately, installing the pipeline has already set up all the tools we need to do this.

We’ll start with the beam transfer matrices, which describe how the sky gets mapped into our measured visibilities. These are used both for simulating observations given a sky map, and for making maps from visibilities (real or simulated). To generate them we use the driftscan package, telling it exactly what to generate with a YAML configuration file such as the one below.

config:
    # Only generate Beam Transfers.
    beamtransfers:      Yes
    kltransform:        No
    psfisher:           No

    output_directory:   beams

telescope:
    type:
        # Specify a custom class
        class:  PolarisedCylinderTelescope
        module: drift.telescope.cylinder

    freq_lower:         400.0
    freq_upper:         410.0
    num_freq:           5

    num_cylinders:      2
    num_feeds:          4
    feed_spacing:       0.3
    cylinder_width:     10.0

This file is run with the command:

$ drift-makeproducts run product_params.yaml
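
Once that completes, the beam transfer matrices are written under the output_directory given in the configuration (beams in the example above). If you want to poke at them interactively, the driftscan ProductManager can load everything back from the same configuration file. A rough sketch (the attribute names and call below follow my reading of the driftscan API, so treat it as a guide rather than gospel):

>>> from drift.core import manager
>>> # Re-read the configuration and load the generated products.
>>> m = manager.ProductManager.from_config("product_params.yaml")
>>> tel = m.telescope        # the telescope model
>>> bt = m.beamtransfer      # the beam transfer matrices
>>> print(tel.frequencies)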

To simulate the timestreams we also need a sky map to base them on. The cora package contains several different sky models we can use to produce one. The easiest method is to use the cora-makesky command, e.g.:

$ cora-makesky foreground 64 401.0 411.0 5 foreground_map.h5

which will generate an HDF5 file containing simulated foreground maps for each polarisation (Stokes I, Q, U and V), with five frequency channels between 401.0 and 411.0 MHz. Each map is in Healpix format with NSIDE=64. There are also options to produce 21cm signal simulations, as well as point-source-only and galactic synchrotron maps.
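
If you want a quick look at what was produced, the file can be opened directly with h5py. A minimal sketch, assuming the map lives in a dataset called map with shape (frequency, polarisation, pixel); the exact layout may differ between cora versions:

>>> import h5py
>>> f = h5py.File("foreground_map.h5", "r")
>>> print(list(f.keys()))
>>> sky = f["map"][:]        # expect shape (nfreq, npol, npix)
>>> print(sky.shape)
>>> f.close()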

Map-making with the Pipeline

The CHIME pipeline is built using the infrastructure developed by Kiyo in the caput.pipeline module. Python classes are written to perform tasks on the data, and a YAML configuration file describes how these should be configured and connected together. Below I’ve put the configuration file we are going to use to make maps from simulated data:

pipeline:
    tasks:
        -   type:       draco.core.io.LoadBeamTransfer
            out:        tel_and_bt
            params:
                product_directory:  "beams/bt/"

        -   type:       draco.synthesis.stream.SimulateSidereal
            requires:   tel_and_bt
            out:        sstream
            params:
                maps:   [ "foreground_map.h5" ]
                save:   Yes
                output_root: teststream_

        -   type:       draco.analysis.transform.MModeTransform
            in:         sstream
            out:        mmodes

        -   type:       draco.analysis.mapmaker.DirtyMapMaker
            requires:   tel_and_bt
            in:         mmodes
            out:        dirtymap
            params:
                nside:      128
                save:   Yes
                output_root: map_dirty2_

        -   type:       draco.analysis.mapmaker.WienerMapMaker
            requires:   tel_and_bt
            in:         mmodes
            out:        wienermap
            params:
                nside:      128
                save:   Yes
                output_root: map_wiener2_

Before we jump into making the maps, let’s briefly go over what this all means. For further details you can consult the caput documentation on the pipeline.

The bulk of this configuration file is a list of tasks being configured. There is a type field where the class is specified by its fully qualified python name (for example, the first task draco.core.io.LoadBeamTransfer). To connect one task to another, you simply specify a label for the output of one task, and give the same label to the in or requires entry of the other task. The labels themselves are dummy variables: any string will do, provided it does not clash with another label. The distinction between in and requires is that the former is an input passed in on every cycle of the pipeline, while the latter is something required only once, when the task is initialised.
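
To make the distinction concrete, here is a sketch of a hypothetical task (it is not part of draco) following the usual pattern: whatever is listed under requires is delivered to the task’s setup method once, whatever is listed under in is delivered to process on every cycle, and the return value of process is passed on under the out label. It assumes a draco Map-like container with a map dataset:

from caput import config
from draco.core import task


class ScaleMap(task.SingleTask):
    """Hypothetical task that multiplies a map by a constant factor."""

    factor = config.Property(proptype=float, default=1.0)

    def setup(self, bt):
        # Items listed under `requires` are delivered here once, at start up.
        self.beamtransfer = bt

    def process(self, mapcont):
        # Items listed under `in` arrive here on each pipeline cycle; the
        # return value is passed on under the label given in `out`.
        mapcont.map[:] *= self.factor
        return mapcont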

Often we might want to configure a task from the YAML file itself. This is done with the params section of each task. The named items within this section are passed to the pipeline class when it is created. Each entry corresponds to a config.Property attribute on the class. For example the SimulateSidereal class has parameters that can be specified:

class SimulateSidereal(task.SingleTask):
    """Create a simulated timestream.

    Attributes
    ----------
    maps : list
        List of map filenames. The sum of these form the simulated sky.
    ndays : float, optional
        Number of days of observation. Setting `ndays = None` (default) uses
        the default stored in the telescope object; `ndays = 0` assumes the
        observation time is infinite so that the noise is zero. This allows a
        fractional number to account for higher noise.
    seed : integer, optional
        Set the random seed used for the noise simulations. Default (None) is
        to choose a random seed.
    """
    maps = config.Property(proptype=list)
    ndays = config.Property(proptype=float, default=0.0)
    seed = config.Property(proptype=int, default=None)

    ...

In the YAML file we configured the task as follows:

-   type:       draco.synthesis.stream.SimulateSidereal
    requires:   tel_and_bt
    out:        sstream
    params:
        maps:   [ "foreground_map.h5" ]
        save:   Yes
        output_root: teststream_

Of the three properties available in the definition of SimulateSidereal, we have only configured one of them: the list of maps to process. The remaining two entries of the params section are inherited from the pipeline base task. These simply tell the pipeline to save the output of the task, with a base name given by output_root.

The pipeline is run with the caput-pipeline script:

$ caput-pipeline run pipeline_params.yaml

What has it actually done? Let’s just quickly go through the tasks in order:

  1. Load the beam transfer manager from disk. This just gives the pipeline access to all the beam transfer matrices produced by the driftscan code.

  2. Load a map from disk and use the beam transfers to transform it into a sidereal timestream.

  3. Select the products from the timestream that are understood by the given beam transfer manager. In this case it won’t change anything, but this task can subset frequencies and products as well as average over redundant baselines.

  4. Perform the m-mode transform on the sidereal timestream.

  5. Apply the map maker to the m-modes to produce a dirty map.

  6. Apply the map maker to generate a Wiener-filtered map.
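
If you want to look at the maps themselves, they are HDF5 files whose names begin with the output_root prefixes from the configuration (map_dirty2_ and map_wiener2_). A rough sketch using h5py and healpy, assuming the map dataset has shape (nfreq, npol, npix); substitute the actual filename your run produced:

>>> import h5py
>>> import healpy as hp
>>> f = h5py.File("map_wiener2_0.h5", "r")   # hypothetical filename
>>> sky = f["map"][:]
>>> hp.mollview(sky[0, 0])                   # Stokes I, first frequency channel
>>> f.close()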

Ninja Techniques

Running on a cluster. Coming soon….