Data Reduction Tutorial

This service enables reduction and analysis of precision radial velocity (PRV) data from the Keck HIRES instrument.

This notebook is meant as a template for setting up your own processing using the code snippets provided here. It may not run exactly as presented if you execute the full notebook directly, as the code has not been rigorously tested from within a notebook environment.

This notebook introduces the Keck HIRES Precision Radial Velocity (PRV) pipeline service and works through one specific example. There are a number of variations, mostly having to do with planning the processing, which will be covered in more detail in other notebooks, but here you will see all the basics.

The notebook is kept with the HIRES PRV Python access toolkit in GitHub: https://github.com/Caltech-IPAC/hiresprv

Login

Logging in for the first time creates a workspace for the user and associates it with a KOA account. This workspace is initialized with pre-processed products for all HIRES PRV-compatible data in the archive. Please have a look around and search for existing observations of your target stars before collecting new observations.

Users of this service must have Keck Observatory Archive (KOA) accounts and must log in here with those credentials to gain access to their data. Even researchers planning to use only public data will need a KOA login, as this service maintains persistent storage under that ID.

The login is persisted through HTTP cookies, and logging in from multiple clients will connect the user to the same account, storage, and processing history. This environment (user workspace) is permanent, as we expect some on-going research to span years. The login for a given client machine need only be done once, assuming the cookie file in local storage is not deleted. If it is, logging in again will reconstruct it.

The cookie file, processing state information, and downloaded results such as 1D spectra and RV curve tables will be kept locally in the same space as this notebook. If you wish to change that, simply add whatever directory management and navigation you like.
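For example, if you prefer a dedicated working directory, a few lines of standard Python directory management before logging in is all that is needed. A minimal sketch (the path below is just an example, not something the service requires):

import os

# Optional: keep the cookie file and downloaded products in one place.
workdir = os.path.expanduser('~/hiresprv_work')  # example path; choose your own
os.makedirs(workdir, exist_ok=True)
os.chdir(workdir)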

[1]:
from hiresprv.auth import login

login('prv.cookies')
Successful login as bfulton

KOA Data Retrieval

The PRV workspace first needs to be populated with your data from KOA. This can be done all at once, if the data already exist, or incrementally as the data are taken/identified. If a requested night is already contained in the pre-processed dataset, or if you have already processed that night, it will be skipped.

This step is more than a simple data transfer. “Raw reduction” of the data, which converts the 2D CCD echelle images to 1D spectra, is done up-front as the data are retrieved a night at a time. The UT dates you give here are actually shifted a few hours to catch any calibration data collected in the afternoon of the same (Hawaii-local) day.

[1]:
from hiresprv.archive import Archive

koa = Archive('prv.cookies')

# These nights are known to work, but were not included in the pre-processed dataset
rtn = koa.by_dates("""2018-09-02
2018-09-03
2018-09-04
2018-09-05
2018-09-22
""")

{
    "status": "ok",
    "msg": "Processing dates in background."
}

Note that since the data in the workspace are permanent, repeated requests for the same data will not change anything, so those dates will simply be ignored. You can therefore add to the above list or replace it with new dates as you choose. Dates must be formatted as YYYY-MM-DD.
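Because by_dates() takes a newline-separated string, the date list can also be assembled programmatically, which is convenient when adding nights incrementally. A minimal sketch reusing the koa object from the cell above:

# Build the newline-separated date string from an ordinary Python list.
nights = ['2018-09-02', '2018-09-03', '2018-09-22']  # example dates from above
rtn = koa.by_dates('\n'.join(nights))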

The above service responds immediately with an acknowledgement of the request and starts the actual transfer and raw data reduction (which can take some time) as a background job. The job status can be checked by polling or can be monitored using the function below. While one retrieval job (or processing job below) is running, no others can be initiated.

PRV Processing Monitor

Some steps in the PRV processing can take quite a long time (hours) and we do not want to tie up this notebook page waiting for them to finish. Below we show how to retrieve a snapshot of the status (you would have to poll manually to track the progress of a job), but the preferred approach is to start a real-time monitor in a separate page/tab, which uses JavaScript and an HTTP event stream. Run the next cell to generate a link to this monitor:

[3]:
from hiresprv.status import Status
from IPython.display import IFrame, HTML, display

monitor = Status('prv.cookies')

link = monitor.generate_link()

HTML(link)
[3]:

Metadata

Once data have been retrieved and the “nightly” raw reduction performed, a set of records is added to a persistent metadata table, one row per observation. These observations are all taken through the HIRES PRV instrument (2D CCD) and will have been reduced to 1D spectra by the raw reduction. They fall into five classes:

  • RV observations – Multiple observations of a star with the iodine cell in the light path. Precision, relative radial velocities are calculated for this type of observation.

  • Templates – One long observation of the same star without iodine, for reference.

  • B stars – A set of observations of rapidly rotating B stars bracketing the template observation and used to reduce it.

  • Iodine – Reference observation of iodine for nightly calibration.

  • Miscellaneous other calibration observations (labelled as “Unknown”).

By inspecting this table, the user can determine what objects were observed, whether there are template observations for them (and adequate B star data to reduce a template), and whether there are enough RV measurements to generate a final RV curve.

With a small metadata table this is simple enough to do by inspection, but a typical workspace can easily contain thousands of files covering tens or hundreds of objects. Furthermore, since observations of a single object are frequently spread out over years, the metadata table is often fairly thoroughly mixed in time.

Luckily, there are a number of client-side Python tools available to subset and organize the metadata, so we provide it for download as a CSV table, as an SQLite binary file, or even, as here, as a simple HTML table. The workspace copy of the data is maintained in an SQLite database, so we also provide a basic filtering mechanism as an optional addition to the download. This filtering is often adequate for basic processing scenarios.

Note that metadata retrieval can’t be done while the system is “busy” (downloading additional data or further reducing data in the workspace). Otherwise, metadata downloads can be done at any time.

Also note that the client-side file will become out-of-date once new data download or processing requests are submitted. It is up to the user to re-request the new metadata.
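If you opt for the SQLite download, the standard-library sqlite3 module gives you the same SQL filtering locally. A sketch, assuming the downloaded file has been saved as workspace_metadata.db (a placeholder name, not one the service dictates):

import sqlite3
import pandas as pd

# Query a locally downloaded copy of the workspace metadata.
con = sqlite3.connect('workspace_metadata.db')  # placeholder local file name
counts = pd.read_sql_query(
    'select TARGET, OBTYPE, count(*) as N from FILES group by TARGET, OBTYPE;',
    con)
con.close()

print(counts.head())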

[2]:
from hiresprv.database import Database
import pandas as pd

state = Database('prv.cookies')

# We limit to 100 entries so that the query returns quickly in this tutorial
search_string = "select * from FILES LIMIT 100;"
url = state.search(sql=search_string)

df = pd.read_html(url, header=0)[0]
df.head(15)
[2]:
DATE DEACTIVATED OBTYPE FILENAME TARGET MJD BJD BCVEL RADVEL RA DEC EPOCH HRANG RA_MOTION DEC_MOTION PARALLAX ORIGFILENAME KOAID
0 20091231 0 Unknown r20091231.1 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820001.fits HI.20091231.06409.fits
1 20091231 0 Unknown r20091231.2 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820002.fits HI.20091231.06653.fits
2 20091231 0 Unknown r20091231.3 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820003.fits HI.20091231.07296.fits
3 20091231 0 Unknown r20091231.4 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820004.fits HI.20091231.07594.fits
4 20091231 0 Unknown r20091231.5 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820005.fits HI.20091231.07645.fits
5 20091231 0 Unknown r20091231.6 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820006.fits HI.20091231.07801.fits
6 20091231 0 Unknown r20091231.7 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820007.fits HI.20091231.07848.fits
7 20091231 0 Unknown r20091231.8 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820008.fits HI.20091231.07964.fits
8 20091231 0 Unknown r20091231.9 th-ar 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820009.fits HI.20091231.08011.fits
9 20091231 0 Iodine calibration r20091231.10 iodine 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820010.fits HI.20091231.08160.fits
10 20091231 0 Iodine calibration r20091231.11 iodine 2.455197e+06 2.455197e+06 NaN NaN NaN NaN NaN 0.000 NaN NaN NaN j820011.fits HI.20091231.08233.fits
11 20091231 0 RV observation r20091231.66 HD204277 2.455197e+06 2.455197e+06 -20657.870 9320.0 21 27 06.53 +16 07 25.00 2015.5 3.250 -0.0783 -0.0962 0.030195 j820066.fits HI.20091231.15897.fits
12 20091231 0 RV observation r20091231.67 HD210460 2.455197e+06 2.455197e+06 -23885.341 20630.0 22 10 19.12 +19 36 57.00 2015.5 2.557 0.0917 -0.0935 0.017691 j820067.fits HI.20091231.15988.fits
13 20091231 0 RV observation r20091231.68 HD201091 2.455197e+06 2.455197e+06 -16012.432 NaN 21 06 53.74 +38 44 29.00 2015.5 3.646 NaN NaN NaN j820068.fits HI.20091231.16113.fits
14 20091231 0 RV observation r20091231.69 HD201092 2.455197e+06 2.455197e+06 -16013.712 NaN 21 06 53.74 +38 44 29.00 2015.5 3.675 NaN NaN NaN j820069.fits HI.20091231.16208.fits
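Once the metadata are in a pandas DataFrame, the usual client-side subsetting applies. For example, a quick census of RV observations per target, reusing the df from the cell above (which here holds only the first 100 rows):

# Count RV observations per target in the downloaded metadata.
rv_obs = df[df['OBTYPE'] == 'RV observation']
per_target = rv_obs.groupby('TARGET').size().sort_values(ascending=False)
print(per_target.head())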

Reducing RV Measurements for a Star

Subsetting the Metadata: Single Target

Ultimately, to make an RV curve for one star we need to reduce its observations into RV measurements. Assuming there are adequate B star observations to reduce the template, we can isolate the appropriate records in the metadata above by simply filtering on the TARGET name. There are many ways to do this; in our case we used the remote SQLite query capability and filtered with

select DATE, OBTYPE, FILENAME, TARGET, BJD, BCVEL from FILES where TARGET like 'HD185144';

The resulting records are shown below.

[5]:

search_string = "select DATE,OBTYPE,FILENAME,TARGET,BJD,BCVEL from FILES where TARGET like 'HD185144';"

url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df.head(15)
[5]:
DATE OBTYPE FILENAME TARGET BJD BCVEL
0 20091231 RV observation r20091231.72 HD185144 2.455197e+06 -4620.095
1 20091231 RV observation r20091231.73 HD185144 2.455197e+06 -4620.211
2 20091231 RV observation r20091231.74 HD185144 2.455197e+06 -4620.321
3 20091231 Template r20091231.79 HD185144 2.455197e+06 -4621.229
4 20091231 Template r20091231.80 HD185144 2.455197e+06 -4621.324
5 20091231 Template r20091231.81 HD185144 2.455197e+06 -4621.420
6 20091231 Template r20091231.82 HD185144 2.455197e+06 -4621.511
7 20091231 Template r20091231.83 HD185144 2.455197e+06 -4621.597
8 20150606 RV observation r20150606.145 HD185144 2.457180e+06 3189.795
9 20150606 RV observation r20150606.146 HD185144 2.457180e+06 3189.328
10 20150606 RV observation r20150606.147 HD185144 2.457180e+06 3188.863
11 20080516 RV observation r20080516.177 HD185144 2.454603e+06 2061.135
12 20080517 RV observation r20080517.181 HD185144 2.454604e+06 2124.858
13 20080617 RV observation r20080617.132 HD185144 2.454635e+06 3884.757
14 20080617 RV observation r20080617.133 HD185144 2.454635e+06 3884.242

Templates and B-stars

Another subset that comes up is matching B-star observations with the template observations they will be used with. This can be many-to-many, so the easiest quick look is simply to list all B-star and Template observations in time order and then match them visually (a client-side sort example follows the table below):

select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';

The resulting records are shown below.

[14]:
search_string = "select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';"

url = state.search(sql=search_string)

df = pd.read_html(url, header=0)[0]
df.head(15)
[14]:
DATE OBTYPE FILENAME TARGET BJD
0 20091231 B star r20091231.75 HR9071 2.455197e+06
1 20091231 B star r20091231.76 HR9071 2.455197e+06
2 20091231 B star r20091231.77 HR9071 2.455197e+06
3 20091231 B star r20091231.78 HR9071 2.455197e+06
4 20091231 Template r20091231.79 HD185144 2.455197e+06
5 20091231 Template r20091231.80 HD185144 2.455197e+06
6 20091231 Template r20091231.81 HD185144 2.455197e+06
7 20091231 Template r20091231.82 HD185144 2.455197e+06
8 20091231 Template r20091231.83 HD185144 2.455197e+06
9 20091231 B star r20091231.84 HR8047 2.455197e+06
10 20091231 B star r20091231.85 HR8047 2.455197e+06
11 20091231 B star r20091231.86 HR8047 2.455197e+06
12 20091231 B star r20091231.87 HR8047 2.455197e+06
13 20091231 Template r20091231.88 HD216520 2.455197e+06
14 20091231 Template r20091231.89 HD216520 2.455197e+06
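Note that the SQL above imposes no ordering, so a client-side sort by BJD puts the B-star and Template rows into strict time order for visual matching. A minimal sketch using the DataFrame from the previous cell:

# Sort the B-star/Template listing by time to make visual matching easier.
df_sorted = df.sort_values('BJD').reset_index(drop=True)
print(df_sorted.to_string(index=False))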

RV Pipeline Processing

This shows that on 2009-12-31 three separate RV observations were made of HD185144, followed by five template observations (which the pipeline will combine into a single template). Five years later, another three RV observations were made.

As with the data download, the further reduction steps in the pipeline can be quite lengthy (minutes to hours each), so rather than have the user monitor each one, we provide a scripting mechanism so that complex reduction jobs can be submitted in one shot.

In order to turn any of the RV observations into an RV value, we need the template. So we will generate that first. Since it is possible to repeat the template observations on more than one day, we need to explicitly state which object and which day. The script command for this is:

template 185144 20091231

To reduce an RV measurement, we have to refer to this template (the target name is enough) and specify which file to reduce. For example:

rv 185144 r20091231.72

Finally, once we have a set of RV measurements for an object, we generate an RV curve (the pipeline finds all the appropriate RV measurements):

rvcurve 185144

As long as we follow the general rules that we need a template before we can reduce an RV measurement, and at least three RV measurements before we can generate an RV curve, we can otherwise script things in whatever order we wish (e.g., all the templates first).

All of this is submitted to the pipeline as a text script, but first let's figure out which observations need to be processed.

[24]:
search_string = "select * from FILES where TARGET like 'HD185144' and DATE = '20180922';"

url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df
[24]:
DATE DEACTIVATED OBTYPE FILENAME TARGET MJD BJD BCVEL RADVEL RA DEC EPOCH HRANG RA_MOTION DEC_MOTION PARALLAX ORIGFILENAME KOAID
0 20180922 0 RV observation r20180922.60 HD185144 2.458384e+06 2.458384e+06 2405.133 26580.0 19 32 23.37 +69 39 13.00 2015.5 -1.023 0.5979 -1.7383 0.17324 j3040160.fits HI.20180922.17329.fits
1 20180922 0 RV observation r20180922.61 HD185144 2.458384e+06 2.458384e+06 2404.570 26580.0 19 32 23.37 +69 39 13.00 2015.5 -1.009 0.5979 -1.7383 0.17324 j3040161.fits HI.20180922.17378.fits
2 20180922 0 RV observation r20180922.62 HD185144 2.458384e+06 2.458384e+06 2404.007 26580.0 19 32 23.37 +69 39 13.00 2015.5 -0.996 0.5979 -1.7383 0.17324 j3040162.fits HI.20180922.17427.fits
[25]:
from hiresprv.idldriver import Idldriver

idl = Idldriver('prv.cookies')

# This example would create the template, reduce the RV observations, and then construct the
# RV time series
full_example = """
template 185144 20091231
rv 185144 r20091231.72
rv 185144 r20091231.73
rv 185144 r20091231.74
rv 185144 r20150606.145
rv 185144 r20150606.146
rv 185144 r20150606.147
rvcurve 185144
"""

# Since the template for HD185144 has already been created in the pre-processed dataset
# we can start by calculating the RVs only for the new observations.
new_rvs = """
rv 185144 r20180922.60
rv 185144 r20180922.61
rv 185144 r20180922.62
"""

rtn = idl.run_script(new_rvs)

print(rtn)
status= ok
msg= Script running in background. Consult monitor for status.
None
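For larger jobs, the script can be assembled programmatically from the filtered metadata rather than typed by hand. A minimal sketch, reusing the DataFrame df from the query in cell [24] above:

# Build one "rv" line per observation in the filtered metadata.
script_lines = [f"rv 185144 {fname}" for fname in df['FILENAME']]
script_lines.append('rvcurve 185144')  # regenerate the RV curve afterwards
script = '\n'.join(script_lines)

print(script)
# rtn = idl.run_script(script)  # uncomment to actually submit the job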

Monitoring (again)

To monitor the pipeline processing request, the best approach is to use the same monitor page as above. It stops whenever a given script finishes, but you can restart it at any time to see the currently-running job. You could also insert a monitor start-up or status polling call here.

Product Retrieval

There is a utility function for retrieving the RV curves (CSV files) for each target; similarly, there is a data.spectrum function for retrieving the 1D FITS spectrum files.

[5]:
from hiresprv.download import Download

# second argument is path where files will be downloaded into
data = Download('prv.cookies', './')

rtn = data.rvcurve('185144')

with open('vst185144.csv', 'r') as file:
  for line in file:
    print(line, end='')

BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.860674413602231,0.789710,-4620.095214843750,52362,1.05080
15196.69270200003,1.730315543645411,0.812607,-4620.210937500000,51591,1.04891
15196.69329199987,1.431450932849171,0.803164,-4620.320800781250,48950,1.05804
17180.10972899990,-2.845576383390323,0.890091,3189.794921875000,55029,1.10542
17180.11030799989,1.235438116773508,0.825978,3189.327880859375,55717,1.10771
17180.11088699987,1.047083913731319,0.828699,3188.863037109375,48769,1.10120
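As a quick sanity check, the downloaded CSV loads directly into pandas for plotting. A sketch, assuming matplotlib is available; the RV units are taken to be m/s, and the BJD_TDB column appears to be offset by 2440000 days (consistent with the full BJD values in the metadata table above):

import pandas as pd
import matplotlib.pyplot as plt

# Load the RV curve downloaded above and plot it as a quick look.
vst = pd.read_csv('vst185144.csv')

plt.errorbar(vst['BJD_TDB'], vst['RV'], yerr=vst['RV_ERR'], fmt='o')
plt.xlabel('BJD_TDB - 2440000 [days]')  # offset inferred from the metadata BJDs
plt.ylabel('RV [m/s]')                  # assumed units for the relative RVs
plt.title('HD185144 relative radial velocities')
plt.show()

Individual 1D spectra can be fetched in the same way with the data.spectrum function mentioned above, passing (we assume) the observation FILENAME from the metadata table.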