Data Reduction Tutorial

This service enables reduction and analysis of precision radial velocity (PRV) data from the Keck HIRES instrument.

This notebook is meant as a template for setting up your own processing using the code snippets provided here. It may not run exactly as presented if you execute the full notebook directly, as the code has not been rigorously tested from within a notebook environment.

This notebook introduces the Keck HIRES Precision Radial Velocity (PRV) pipeline service and works through one specific example. There are a number of variations, mostly having to do with planning the processing, which will be covered in more detail in other notebooks, but here you will see all the basics.

The notebook is kept with the HIRES PRV Python access toolkit in GitHub: https://github.com/Caltech-IPAC/hiresprv

Login

Logging in for the first time creates a workspace for the user and associates it with a KOA account. This workspace is initialized with pre-processed products for all HIRES PRV-compatible data in the archive. Please have a look around and search for existing observations of your target stars before collecting new observations.

Users of this service must have Keck Observatory Archive (KOA) accounts and must log in here with those credentials to gain access to their data. Even researchers planning to use only public data will need a KOA login, as this service maintains persistent storage under that ID.

The login is persisted through HTTP cookies, and logging in from multiple clients will connect the user to the same account, storage, and processing history. This environment (user workspace) is permanent, as we expect some on-going research to span years. The login for a given client machine need only be done once, assuming the cookie file in local storage is not deleted. If it is, logging in again will reconstruct it.

The cookie file, processing state information, and downloaded results such as 1D spectra and RV curve tables will be kept locally in the same space as this notebook. If you wish to change that, simply add whatever directory management and navigation you like.
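For example, if you prefer a dedicated working directory, a few lines of standard Python directory management before logging in is all that is needed. A minimal sketch (the path below is just an example, not something the service requires):

import os

# Optional: keep the cookie file and downloaded products in one place.
workdir = os.path.expanduser('~/hiresprv_work')  # example path; choose your own
os.makedirs(workdir, exist_ok=True)
os.chdir(workdir)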

[1]:
from hiresprv.auth import login

login('prv.cookies')
Successful login as bfulton

KOA Data Retrieval

The PRV workspace first needs to be populated with your data from KOA. This can be done all at once, if the data already exist, or incrementally as the data are taken/identified. If a requested night is already contained in the pre-processed dataset, or if you have already processed that night, it will be skipped.

This step is more than a simple data transfer. “Raw reduction” of the data, which converts the 2D CCD echelle images to 1D spectra, is done up-front as the data are retrieved a night at a time. The UT dates you give here are actually shifted a few hours to catch any calibration data collected in the afternoon of the same (Hawaii-local) day.

[1]:
from hiresprv.archive import Archive

koa = Archive('prv.cookies')

# These nights are known to work, but were not included in the pre-processed dataset
rtn = koa.by_dates("""2018-09-02
2018-09-03
2018-09-04
2018-09-05
2018-09-22
""")

{
    "status": "ok",
    "msg": "Processing dates in background."
}

Note that since the data in the workspace are permanent, repeated requests for the same data will not change anything, so those dates will simply be ignored. You can therefore add to the above list or replace it with new dates as you choose. Dates must be formatted as YYYY-MM-DD.
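Because by_dates() takes a newline-separated string, the date list can also be assembled programmatically, which is convenient when adding nights incrementally. A minimal sketch reusing the koa object from the cell above:

# Build the newline-separated date string from an ordinary Python list.
nights = ['2018-09-02', '2018-09-03', '2018-09-22']  # example dates from above
rtn = koa.by_dates('\n'.join(nights))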

The above service responds immediately with an acknowledgement of the request and starts the actual transfer and raw data reduction (which can take some time) as a background job. The job status can be checked by polling or can be monitored using the function below. While one retrieval job (or processing job below) is running, no others can be initiated.

PRV Processing Monitor

Some steps in the PRV processing can take quite a long time (hours) and we do not want to tie up this notebook page waiting for them to finish. Below we show how to retrieve a snapshot of the status (you would have to poll manually to track the progress of a job), but the preferred approach is to start a real-time monitor in a separate page/tab, which uses JavaScript and an HTTP event stream. Run the next cell to generate a link to this monitor:

[3]:
from hiresprv.status import Status
from IPython.display import IFrame, HTML, display

monitor = Status('prv.cookies')

link = monitor.generate_link()

HTML(link)
[3]:

Metadata

Once data have been retrieved and the “nightly” raw reduction performed, a set of records is added to a persistent metadata table, one row per observation. These observations are all taken through the HIRES PRV instrument (2D CCD) and will have been reduced to 1D spectra by the raw reduction. They fall into five classes:

  • RV observations – Multiple observations of a star with the iodine cell in the light path. Precision, relative radial velocities are calculated for this type of observation.

  • Templates – One long observation of the same star without iodine, for reference.

  • B stars – A set of observations of rapidly rotating B stars bracketing the template observation and used to reduce it.

  • Iodine – Reference observation of iodine for nightly calibration.

  • Miscellaneous other calibration observations (labelled as “Unknown”).

By inspecting this table, the user can determine what objects were observed, whether there are template observations for them (and adequate B star data to reduce a template), and whether there are enough RV measurements to generate a final RV curve.

With a small metadata table this is simple enough to do by inspection, but a typical workspace can easily contain thousands of files covering tens or hundreds of objects. Furthermore, since observations of a single object are frequently spread out over years, the metadata table is often fairly thoroughly mixed in time.

Luckily, there are a number of client-side Python tools available to subset and organize the metadata, so we provide it for download as a CSV table, as an SQLite binary file, or even, as here, as a simple HTML table. The workspace copy of the data is maintained in an SQLite database, so we also provide a basic filtering mechanism as an optional addition to the download. This filtering is often adequate for basic processing scenarios.

Note that metadata retrieval can’t be done while the system is “busy” (downloading additional data or further reducing data in the workspace). Otherwise, metadata downloads can be done at any time.

Also note that the client-side file will become out-of-date once new data download or processing requests are submitted. It is up to the user to re-request the new metadata.
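If you opt for the SQLite download, the standard-library sqlite3 module gives you the same SQL filtering locally. A sketch, assuming the downloaded file has been saved as workspace_metadata.db (a placeholder name, not one the service dictates):

import sqlite3
import pandas as pd

# Query a locally downloaded copy of the workspace metadata.
con = sqlite3.connect('workspace_metadata.db')  # placeholder local file name
counts = pd.read_sql_query(
    'select TARGET, OBTYPE, count(*) as N from FILES group by TARGET, OBTYPE;',
    con)
con.close()

print(counts.head())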

[2]:
from hiresprv.database import Database
import pandas as pd

state = Database('prv.cookies')

# We limit to 100 entries so that the query returns quickly in this tutorial
search_string = "select * from FILES LIMIT 100;"
url = state.search(sql=search_string)

df = pd.read_html(url, header=0)[0]
df.head(15)
[2]:
DATE DEACTIVATED OBTYPE FILENAME TARGET MJD BJD BCVEL RADVEL RA DEC EPOCH HRANG RA_MOTION DEC_MOTION PARALLAX ORIGFILENAME KOAID
0 20091231 0 Unknown r20091231.1 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820001.fits HI.20091231.06409.fits
1 20091231 0 Unknown r20091231.2 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820002.fits HI.20091231.06653.fits
2 20091231 0 Unknown r20091231.3 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820003.fits HI.20091231.07296.fits
3 20091231 0 Unknown r20091231.4 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820004.fits HI.20091231.07594.fits
4 20091231 0 Unknown r20091231.5 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820005.fits HI.20091231.07645.fits
5 20091231 0 Unknown r20091231.6 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820006.fits HI.20091231.07801.fits
6 20091231 0 Unknown r20091231.7 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820007.fits HI.20091231.07848.fits
7 20091231 0 Unknown r20091231.8 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820008.fits HI.20091231.07964.fits
8 20091231 0 Unknown r20091231.9 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820009.fits HI.20091231.08011.fits
9 20091231 0 Iodine calibration r20091231.10 iodine 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820010.fits HI.20091231.08160.fits
10 20091231 0 Iodine calibration r20091231.11 iodine 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820011.fits HI.20091231.08233.fits
11 20091231 0 RV observation r20091231.66 HD204277 2.455197e+06 2.455197e+06 -20657.870 9320.0 21 27 06.53 +16 07 25.00 2015.5 3.250 -0.0783 -0.0962 0.030195 j820066.fits HI.20091231.15897.fits
12 20091231 0 RV observation r20091231.67 HD210460 2.455197e+06 2.455197e+06 -23885.341 20630.0 22 10 19.12 +19 36 57.00 2015.5 2.557 0.0917 -0.0935 0.017691 j820067.fits HI.20091231.15988.fits
13 20091231 0 RV observation r20091231.68 HD201091 2.455197e+06 2.455197e+06 -16012.432 NaN 21 06 53.74 +38 44 29.00 2015.5 3.646 NaN NaN NaN j820068.fits HI.20091231.16113.fits
14 20091231 0 RV observation r20091231.69 HD201092 2.455197e+06 2.455197e+06 -16013.712 NaN 21 06 53.74 +38 44 29.00 2015.5 3.675 NaN NaN NaN j820069.fits HI.20091231.16208.fits
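Once the metadata are in a pandas DataFrame, the usual client-side subsetting applies. For example, a quick census of RV observations per target, reusing the df from the cell above (which here holds only the first 100 rows):

# Count RV observations per target in the downloaded metadata.
rv_obs = df[df['OBTYPE'] == 'RV observation']
per_target = rv_obs.groupby('TARGET').size().sort_values(ascending=False)
print(per_target.head())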

Reducing RV Measurements for a Star

Subsetting the Metadata: Single Target

Ultimately, to make an RV curve for one star we need to reduce its observations into RV measurements. Assuming there are adequate B star observations to reduce the template, we can isolate the appropriate records in the metadata above by simply filtering on the TARGET name. There are many ways to do this; in our case we used the remote SQLite query capability and filtered with

select DATE, OBTYPE, FILENAME, TARGET, BJD, BCVEL from FILES where TARGET like 'HD185144';

The resulting records are shown below.

[5]:

search_string = "select DATE,OBTYPE,FILENAME,TARGET,BJD,BCVEL from FILES where TARGET like 'HD185144';"

url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df.head(15)
[5]:
DATE OBTYPE FILENAME TARGET BJD BCVEL
0 20091231 RV observation r20091231.72 HD185144 2.455197e+06 -4620.095
1 20091231 RV observation r20091231.73 HD185144 2.455197e+06 -4620.211
2 20091231 RV observation r20091231.74 HD185144 2.455197e+06 -4620.321
3 20091231 Template r20091231.79 HD185144 2.455197e+06 -4621.229
4 20091231 Template r20091231.80 HD185144 2.455197e+06 -4621.324
5 20091231 Template r20091231.81 HD185144 2.455197e+06 -4621.420
6 20091231 Template r20091231.82 HD185144 2.455197e+06 -4621.511
7 20091231 Template r20091231.83 HD185144 2.455197e+06 -4621.597
8 20150606 RV observation r20150606.145 HD185144 2.457180e+06 3189.795
9 20150606 RV observation r20150606.146 HD185144 2.457180e+06 3189.328
10 20150606 RV observation r20150606.147 HD185144 2.457180e+06 3188.863
11 20080516 RV observation r20080516.177 HD185144 2.454603e+06 2061.135
12 20080517 RV observation r20080517.181 HD185144 2.454604e+06 2124.858
13 20080617 RV observation r20080617.132 HD185144 2.454635e+06 3884.757
14 20080617 RV observation r20080617.133 HD185144 2.454635e+06 3884.242

Templates and B-stars

Another subset that comes up is matching B-star observations with the template observations they will be used with. This can be many-to-many, so the easiest quick look is simply to list all B-star and Template observations in time order and then match them visually (a client-side sort example follows the table below):

select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';

The resulting records are shown below.

[14]:
search_string = "select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';"

url = state.search(sql=search_string)

df = pd.read_html(url, header=0)[0]
df.head(15)
[14]:
DATE OBTYPE FILENAME TARGET BJD
0 20091231 B star r20091231.75 HR9071 2.455197e+06
1 20091231 B star r20091231.76 HR9071 2.455197e+06
2 20091231 B star r20091231.77 HR9071 2.455197e+06
3 20091231 B star r20091231.78 HR9071 2.455197e+06
4 20091231 Template r20091231.79 HD185144 2.455197e+06
5 20091231 Template r20091231.80 HD185144 2.455197e+06
6 20091231 Template r20091231.81 HD185144 2.455197e+06
7 20091231 Template r20091231.82 HD185144 2.455197e+06
8 20091231 Template r20091231.83 HD185144 2.455197e+06
9 20091231 B star r20091231.84 HR8047 2.455197e+06
10 20091231 B star r20091231.85 HR8047 2.455197e+06
11 20091231 B star r20091231.86 HR8047 2.455197e+06
12 20091231 B star r20091231.87 HR8047 2.455197e+06
13 20091231 Template r20091231.88 HD216520 2.455197e+06
14 20091231 Template r20091231.89 HD216520 2.455197e+06
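Note that the SQL above imposes no ordering, so a client-side sort by BJD puts the B-star and Template rows into strict time order for visual matching. A minimal sketch using the DataFrame from the previous cell:

# Sort the B-star/Template listing by time to make visual matching easier.
df_sorted = df.sort_values('BJD').reset_index(drop=True)
print(df_sorted.to_string(index=False))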

RV Pipeline Processing

This shows that on 2009-12-31 three separate RV observations were made of HD185144, followed by five template observations (which the pipeline will combine into a single template). Five years later, another three RV observations were made.

As with the data download, the further reduction steps in the pipeline can be quite lengthy (minutes to hours each), so rather than have the user monitor each one, we provide a scripting mechanism so that complex reduction jobs can be submitted in one shot.

In order to turn any of the RV observations into an RV value, we need the template. So we will generate that first. Since it is possible to repeat the template observations on more than one day, we need to explicitly state which object and which day. The script command for this is:

template 185144 20091231

To reduce an RV measurement, we have to refer to this template (the target name is enough) and specify which file to reduce. For example:

rv 185144 r20091231.72

Finally, once we have a set of RV measurements for an object, we generate an RV curve (the pipeline finds all the appropriate RV measurements):

rvcurve 185144

As long as we follow the general rules that we need a template before we can reduce an RV measurement, and at least three RV measurements before we can generate an RV curve, we can otherwise script things in whatever order we wish (e.g., all the templates first).

All of this is submitted to the pipeline as a text script, but first let's figure out which observations need to be processed.

[24]:
search_string = "select * from FILES where TARGET like 'HD185144' and DATE = '20180922';"

url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df
[24]:
DATE DEACTIVATED OBTYPE FILENAME TARGET MJD BJD BCVEL RADVEL RA DEC EPOCH HRANG RA_MOTION DEC_MOTION PARALLAX ORIGFILENAME KOAID
0 20180922 0 RV observation r20180922.60 HD185144 2.458384e+06 2.458384e+06 2405.133 26580.0 19 32 23.37 +69 39 13.00 2015.5 -1.023 0.5979 -1.7383 0.17324 j3040160.fits HI.20180922.17329.fits
1 20180922 0 RV observation r20180922.61 HD185144 2.458384e+06 2.458384e+06 2404.570 26580.0 19 32 23.37 +69 39 13.00 2015.5 -1.009 0.5979 -1.7383 0.17324 j3040161.fits HI.20180922.17378.fits
2 20180922 0 RV observation r20180922.62 HD185144 2.458384e+06 2.458384e+06 2404.007 26580.0 19 32 23.37 +69 39 13.00 2015.5 -0.996 0.5979 -1.7383 0.17324 j3040162.fits HI.20180922.17427.fits
[25]:
from hiresprv.idldriver import Idldriver

idl = Idldriver('prv.cookies')

# This example would create the template, reduce the RV observations, and then construct the
# RV time series
full_example = """
template 185144 20091231
rv 185144 r20091231.72
rv 185144 r20091231.73
rv 185144 r20091231.74
rv 185144 r20150606.145
rv 185144 r20150606.146
rv 185144 r20150606.147
rvcurve 185144
"""

# Since the template for HD185144 has already been created in the pre-processed dataset
# we can start by calculating the RVs only for the new observations.
new_rvs = """
rv 185144 r20180922.60
rv 185144 r20180922.61
rv 185144 r20180922.62
"""

rtn = idl.run_script(new_rvs)

print(rtn)
status= ok
msg= Script running in background. Consult monitor for status.
None
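For larger jobs, the script can be assembled programmatically from the filtered metadata rather than typed by hand. A minimal sketch, reusing the DataFrame df from the query in cell [24] above:

# Build one "rv" line per observation in the filtered metadata.
script_lines = [f"rv 185144 {fname}" for fname in df['FILENAME']]
script_lines.append('rvcurve 185144')  # regenerate the RV curve afterwards
script = '\n'.join(script_lines)

print(script)
# rtn = idl.run_script(script)  # uncomment to actually submit the job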

Monitoring (again)

To monitor the pipeline processing request, the best approach is to use the same monitor page as above. It stops whenever a given script finishes, but you can restart it at any time to see the currently-running job. You could also insert a monitor start-up or status polling call here.

Product Retrieval

There is a utility function for retrieving the RV curves (CSV files) for each target; similarly, there is a data.spectrum function for retrieving the 1D FITS spectrum files.

[5]:
from hiresprv.download import Download

# second argument is path where files will be downloaded into
data = Download('prv.cookies', './')

rtn = data.rvcurve('185144')

with open('vst185144.csv', 'r') as file:
  for line in file:
    print(line, end='')

BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.860674413602231,0.789710,-4620.095214843750,52362,1.05080
15196.69270200003,1.730315543645411,0.812607,-4620.210937500000,51591,1.04891
15196.69329199987,1.431450932849171,0.803164,-4620.320800781250,48950,1.05804
17180.10972899990,-2.845576383390323,0.890091,3189.794921875000,55029,1.10542
17180.11030799989,1.235438116773508,0.825978,3189.327880859375,55717,1.10771
17180.11088699987,1.047083913731319,0.828699,3188.863037109375,48769,1.10120
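As a quick sanity check, the downloaded CSV loads directly into pandas for plotting. A sketch, assuming matplotlib is available; the RV units are taken to be m/s, and the BJD_TDB column appears to be offset by 2440000 days (consistent with the full BJD values in the metadata table above):

import pandas as pd
import matplotlib.pyplot as plt

# Load the RV curve downloaded above and plot it as a quick look.
vst = pd.read_csv('vst185144.csv')

plt.errorbar(vst['BJD_TDB'], vst['RV'], yerr=vst['RV_ERR'], fmt='o')
plt.xlabel('BJD_TDB - 2440000 [days]')  # offset inferred from the metadata BJDs
plt.ylabel('RV [m/s]')                  # assumed units for the relative RVs
plt.title('HD185144 relative radial velocities')
plt.show()

Individual 1D spectra can be fetched in the same way with the data.spectrum function mentioned above, passing (we assume) the observation FILENAME from the metadata table.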