Data Reduction Tutorial¶
This service enables reduction and analysis of precision radial velocity (PRV) data from the HIRES instrument at the Keck Observatory.
This notebook is meant as a template for setting up your own processing using the code snippets provided here. It may not run exactly as presented if you attempt to run the full notebook directly; the code has not been rigorously tested from within a notebook environment.
This notebook introduces the Keck HIRES Precision Radial Velocity (PRV) pipeline service and works through one specific example. There are a number of variations, mostly having to do with planning processing, which will be covered in more detail in other notebooks, but here you will see all the basics.
The notebook is kept with the HIRES PRV Python access toolkit in GitHub: https://github.com/Caltech-IPAC/hiresprv
Login¶
Logging in for the first time creates a workspace for the user and associates it with a KOA account. The workspace is initialized with pre-processed products for all HIRES PRV-compatible data in the archive. Please have a look around and search for existing observations of your target stars before collecting new observations.
Users of this service must have Keck Observatory Archive (KOA) accounts and use that login here to gain access to their data. Even researchers planning to use only public data will need a KOA login, as this service maintains persistent storage under that ID.
The login is persisted through the use of HTTP cookies, and logging in from multiple clients will connect the user to the same account, storage, and processing history. This environment (user workspace) is permanent, as we expect some ongoing research to span years. The login for a given client machine need only be done once, assuming the cookie file in local storage is not deleted. If it is, logging in again will reconstruct it.
The cookie file, processing state information, and downloaded results like 1D spectra and RV curve tables will be kept locally in the same space as this notebook. If you wish to change that, simply add in whatever directory management and navigation you like.
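As a minimal sketch of such directory management, using only the standard library (the directory name `prv_workspace` is an arbitrary choice, not something the service requires):

```python
import os
from pathlib import Path

# Create a dedicated working directory for the cookie file, state
# information, and downloaded products, then switch into it.
workdir = Path("prv_workspace")
workdir.mkdir(exist_ok=True)
os.chdir(workdir)
```

Everything the service writes locally (cookies, CSV tables, FITS spectra) will then land in that directory instead of alongside the notebook.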
[1]:
from hiresprv.auth import login
login('prv.cookies')
Successful login as bfulton
KOA Data Retrieval¶
The PRV workspace first needs to be populated with your data from KOA. This can be done all at once, if the data already exist, or incrementally as the data are taken or identified. If a requested night is already contained in the pre-processed dataset, or if you have already processed that night, it will be skipped over.
This step is more than a simple data transfer. “Raw reduction” of the data, which converts the 2D CCD echelle images to 1D spectra, is done up-front as the data are retrieved a night at a time. The UT dates you give here are actually shifted a few hours to catch any calibration data collected in the afternoon of the same (Hawaii-local) day.
[1]:
from hiresprv.archive import Archive
koa = Archive('prv.cookies')
# These nights are known to work, but were not included in the pre-processed dataset
rtn = koa.by_dates("""2018-09-02
2018-09-03
2018-09-04
2018-09-05
2018-09-22
""")
{
"status": "ok",
"msg": "Processing dates in background."
}
Note that since the data in the workspace are permanent, repeated requests for the same data would not change anything, so those dates will be ignored. Therefore, you can add to the above list or replace it with new dates as you choose. Dates must be formatted as YYYY-MM-DD.
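If you prefer to generate the date list programmatically rather than typing it out, a small standard-library sketch (the nights here are simply the ones from the cell above):

```python
from datetime import date

# Build the newline-separated YYYY-MM-DD block expected by by_dates()
nights = [date(2018, 9, 2), date(2018, 9, 3), date(2018, 9, 4),
          date(2018, 9, 5), date(2018, 9, 22)]
date_block = "\n".join(d.strftime("%Y-%m-%d") for d in nights)
print(date_block)
```

The resulting string can be passed directly as the argument to `koa.by_dates()`.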
The above service responds immediately with an acknowledgement of the request and starts the actual transfer and raw data reduction (which can take some time) as a background job. The job status can be checked by polling or can be monitored using the function below. While one retrieval job (or processing job below) is running, no others can be initiated.
PRV Processing Monitor¶
Some steps in the PRV processing can take quite a long time (hours) and we do not want to tie up this notebook page waiting for them to finish. Below we show how to retrieve a snapshot of the status (you would have to poll manually to track the progress of a job), but the preferred approach is to start a real-time monitor in a separate page/tab, which uses JavaScript and an HTTP event stream. Run the next cell to generate a link to this monitor:
[3]:
from hiresprv.status import Status
from IPython.display import IFrame, HTML, display
monitor = Status('prv.cookies')
link = monitor.generate_link()
HTML(link)
[3]:
Metadata¶
Once data have been retrieved and the “nightly” raw reduction performed, a set of records is added to a persistent metadata table, one row per observation. These observations are all taken through the HIRES PRV instrument (2D CCD) and will have been reduced to 1D spectra by the raw reduction. They fall into five classes:
RV observations – Multiple observations of a star with the iodine cell in the light path. Precision, relative radial velocities are calculated for this type of observation.
Templates – One long observation of the same star without iodine, for reference.
B stars – A set of observations of rapidly rotating B stars bracketing the template observation and used to reduce it.
Iodine – Reference observation of iodine for nightly calibration.
Miscellaneous other calibration observations (labelled as “Unknown”).
By inspecting this table, the user can determine what objects were observed, whether there are template observations for them (and adequate B star data to reduce a template), and whether there are enough RV measurements to generate a final RV curve.
With a small metadata table this is simple enough to do by inspection, but a typical workspace can easily have thousands of files covering tens or hundreds of objects. Furthermore, since observations of a single object are frequently spread out over years, the metadata table is often fairly thoroughly mixed in time.
Luckily, there are a number of tools available in client-side Python to subset and organize the metadata, so we provide it for download as a CSV table, an SQLite binary file, or even, as here, a simple HTML table. The workspace copy of the data is maintained in an SQLite database, so we also provide a basic filtering mechanism as an optional addition to the download. This filtering is often adequate for basic processing scenarios.
Note that metadata retrieval can’t be done while the system is “busy” (downloading additional data or further reducing data in the workspace). Otherwise, metadata downloads can be done at any time.
Also note that the client-side file will become out-of-date once new data download or processing requests are submitted. It is up to the user to re-request the new metadata.
[2]:
from hiresprv.database import Database
import pandas as pd
state = Database('prv.cookies')
# We limit to 100 entries so that the query returns quickly in this tutorial
search_string = "select * from FILES LIMIT 100;"
url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df.head(15)
[2]:
DATE | DEACTIVATED | OBTYPE | FILENAME | TARGET | MJD | BJD | BCVEL | RADVEL | RA | DEC | EPOCH | HRANG | RA_MOTION | DEC_MOTION | PARALLAX | ORIGFILENAME | KOAID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 20091231 | 0 | Unknown | r20091231.1 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820001.fits | HI.20091231.06409.fits |
1 | 20091231 | 0 | Unknown | r20091231.2 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820002.fits | HI.20091231.06653.fits |
2 | 20091231 | 0 | Unknown | r20091231.3 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820003.fits | HI.20091231.07296.fits |
3 | 20091231 | 0 | Unknown | r20091231.4 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820004.fits | HI.20091231.07594.fits |
4 | 20091231 | 0 | Unknown | r20091231.5 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820005.fits | HI.20091231.07645.fits |
5 | 20091231 | 0 | Unknown | r20091231.6 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820006.fits | HI.20091231.07801.fits |
6 | 20091231 | 0 | Unknown | r20091231.7 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820007.fits | HI.20091231.07848.fits |
7 | 20091231 | 0 | Unknown | r20091231.8 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820008.fits | HI.20091231.07964.fits |
8 | 20091231 | 0 | Unknown | r20091231.9 | th-ar | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820009.fits | HI.20091231.08011.fits |
9 | 20091231 | 0 | Iodine calibration | r20091231.10 | iodine | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820010.fits | HI.20091231.08160.fits |
10 | 20091231 | 0 | Iodine calibration | r20091231.11 | iodine | 2.455197e+06 | 2.455197e+06 | NaN | NaN | NaN | NaN | NaN | 0.000 | NaN | NaN | NaN | j820011.fits | HI.20091231.08233.fits |
11 | 20091231 | 0 | RV observation | r20091231.66 | HD204277 | 2.455197e+06 | 2.455197e+06 | -20657.870 | 9320.0 | 21 27 06.53 | +16 07 25.00 | 2015.5 | 3.250 | -0.0783 | -0.0962 | 0.030195 | j820066.fits | HI.20091231.15897.fits |
12 | 20091231 | 0 | RV observation | r20091231.67 | HD210460 | 2.455197e+06 | 2.455197e+06 | -23885.341 | 20630.0 | 22 10 19.12 | +19 36 57.00 | 2015.5 | 2.557 | 0.0917 | -0.0935 | 0.017691 | j820067.fits | HI.20091231.15988.fits |
13 | 20091231 | 0 | RV observation | r20091231.68 | HD201091 | 2.455197e+06 | 2.455197e+06 | -16012.432 | NaN | 21 06 53.74 | +38 44 29.00 | 2015.5 | 3.646 | NaN | NaN | NaN | j820068.fits | HI.20091231.16113.fits |
14 | 20091231 | 0 | RV observation | r20091231.69 | HD201092 | 2.455197e+06 | 2.455197e+06 | -16013.712 | NaN | 21 06 53.74 | +38 44 29.00 | 2015.5 | 3.675 | NaN | NaN | NaN | j820069.fits | HI.20091231.16208.fits |
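Once the table is in a pandas DataFrame (as `df` above), client-side organization is straightforward. A sketch using a toy frame standing in for the real metadata, counting observations of each type per target:

```python
import pandas as pd

# Toy stand-in for a few rows of the FILES metadata table
df = pd.DataFrame({
    "TARGET": ["HD185144", "HD185144", "HD204277", "HD185144", "iodine"],
    "OBTYPE": ["RV observation", "Template", "RV observation",
               "RV observation", "Iodine calibration"],
})

# Count observations of each type per target: one row per TARGET,
# one column per OBTYPE
counts = df.groupby(["TARGET", "OBTYPE"]).size().unstack(fill_value=0)
print(counts)
```

A summary like this quickly shows which targets have a template and enough RV observations to be worth reducing.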
Reducing RV Measurements for a Star¶
Subsetting the Metadata: Single Target¶
Ultimately, to make an RV curve for one star we need to reduce its observations into RV measurements. Assuming there are adequate B star observations to reduce the template, we can isolate the appropriate records in the metadata above by simply filtering on TARGET name. There are many ways to do this; in our case we used the remote SQLite query capability and filtered with
select DATE, OBTYPE, FILENAME, TARGET, BJD, BCVEL from FILES where TARGET like 'HD185144';
The resulting records are shown below.
[5]:
search_string = "select DATE,OBTYPE,FILENAME,TARGET,BJD,BCVEL from FILES where TARGET like 'HD185144';"
url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df.head(15)
[5]:
DATE | OBTYPE | FILENAME | TARGET | BJD | BCVEL | |
---|---|---|---|---|---|---|
0 | 20091231 | RV observation | r20091231.72 | HD185144 | 2.455197e+06 | -4620.095 |
1 | 20091231 | RV observation | r20091231.73 | HD185144 | 2.455197e+06 | -4620.211 |
2 | 20091231 | RV observation | r20091231.74 | HD185144 | 2.455197e+06 | -4620.321 |
3 | 20091231 | Template | r20091231.79 | HD185144 | 2.455197e+06 | -4621.229 |
4 | 20091231 | Template | r20091231.80 | HD185144 | 2.455197e+06 | -4621.324 |
5 | 20091231 | Template | r20091231.81 | HD185144 | 2.455197e+06 | -4621.420 |
6 | 20091231 | Template | r20091231.82 | HD185144 | 2.455197e+06 | -4621.511 |
7 | 20091231 | Template | r20091231.83 | HD185144 | 2.455197e+06 | -4621.597 |
8 | 20150606 | RV observation | r20150606.145 | HD185144 | 2.457180e+06 | 3189.795 |
9 | 20150606 | RV observation | r20150606.146 | HD185144 | 2.457180e+06 | 3189.328 |
10 | 20150606 | RV observation | r20150606.147 | HD185144 | 2.457180e+06 | 3188.863 |
11 | 20080516 | RV observation | r20080516.177 | HD185144 | 2.454603e+06 | 2061.135 |
12 | 20080517 | RV observation | r20080517.181 | HD185144 | 2.454604e+06 | 2124.858 |
13 | 20080617 | RV observation | r20080617.132 | HD185144 | 2.454635e+06 | 3884.757 |
14 | 20080617 | RV observation | r20080617.133 | HD185144 | 2.454635e+06 | 3884.242 |
Templates and B-stars¶
Another subset that comes up is matching B-star observations with the template observations they will be used with. This can be many-to-many, so the easiest quick look is simply to list all B-star and template observations in time order and then match them visually:
select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';
The resulting records are shown below.
[14]:
search_string = "select DATE, OBTYPE, FILENAME, TARGET, BJD from FILES where OBTYPE like 'TEMPLATE' or OBTYPE like 'B Star';"
url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df.head(15)
[14]:
DATE | OBTYPE | FILENAME | TARGET | BJD | |
---|---|---|---|---|---|
0 | 20091231 | B star | r20091231.75 | HR9071 | 2.455197e+06 |
1 | 20091231 | B star | r20091231.76 | HR9071 | 2.455197e+06 |
2 | 20091231 | B star | r20091231.77 | HR9071 | 2.455197e+06 |
3 | 20091231 | B star | r20091231.78 | HR9071 | 2.455197e+06 |
4 | 20091231 | Template | r20091231.79 | HD185144 | 2.455197e+06 |
5 | 20091231 | Template | r20091231.80 | HD185144 | 2.455197e+06 |
6 | 20091231 | Template | r20091231.81 | HD185144 | 2.455197e+06 |
7 | 20091231 | Template | r20091231.82 | HD185144 | 2.455197e+06 |
8 | 20091231 | Template | r20091231.83 | HD185144 | 2.455197e+06 |
9 | 20091231 | B star | r20091231.84 | HR8047 | 2.455197e+06 |
10 | 20091231 | B star | r20091231.85 | HR8047 | 2.455197e+06 |
11 | 20091231 | B star | r20091231.86 | HR8047 | 2.455197e+06 |
12 | 20091231 | B star | r20091231.87 | HR8047 | 2.455197e+06 |
13 | 20091231 | Template | r20091231.88 | HD216520 | 2.455197e+06 |
14 | 20091231 | Template | r20091231.89 | HD216520 | 2.455197e+06 |
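The visual matching can also be done client-side by sorting the query result on BJD, so that the B-star observations bracketing each template sit next to it. A sketch with a toy frame (the real frame would come from the query above):

```python
import pandas as pd

# Toy stand-in for the B star / Template query result, out of time order
df = pd.DataFrame({
    "OBTYPE": ["Template", "B star", "B star", "Template", "B star"],
    "TARGET": ["HD185144", "HR9071", "HR9071", "HD216520", "HR8047"],
    "BJD":    [2455197.2, 2455197.1, 2455197.15, 2455197.4, 2455197.3],
})

# Sort in time so bracketing B-star observations appear adjacent
# to the templates they support
ordered = df.sort_values("BJD").reset_index(drop=True)
print(ordered)
```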
RV Pipeline Processing¶
This shows that on 2009-12-31 three separate RV observations were made of HD185144, followed by five template observations (which the pipeline will combine into a single template). More than five years later, on 2015-06-06, another three RV observations were made.
As with the data download, the further reduction steps in the pipeline can be quite lengthy (minutes to hours each), so rather than have the user monitor each one, we provide a scripting mechanism so that complex reduction jobs can be submitted in one shot.
In order to turn any of the RV observations into an RV value, we need the template. So we will generate that first. Since it is possible to repeat the template observations on more than one day, we need to explicitly state which object and which day. The script command for this is:
template 185144 20091231
To reduce an RV measurement, we have to refer to this template (the target name is enough) and specify which file to reduce. For example:
rv 185144 r20091231.7
Finally, once we have a set of RV measurements for an object, we generate an RV curve (the pipeline finds all the appropriate RV measurements):
rvcurve 185144
As long as we follow the general rules that a template is needed before we can reduce an RV measurement, and at least three RV measurements are needed before we can generate an RV curve, we can otherwise script things in whatever order we wish (e.g. all the templates first).
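Since the script is just plain text, it can be assembled programmatically from a list of filenames rather than typed by hand. A sketch using the HD185144 filenames from the metadata query above (note the target name drops the "HD" prefix in script commands):

```python
# Assemble a pipeline script: the template first, then one "rv" line
# per observation file, then the final RV curve.
target = "185144"
template_date = "20091231"
rv_files = ["r20091231.72", "r20091231.73", "r20091231.74"]

lines = [f"template {target} {template_date}"]
lines += [f"rv {target} {f}" for f in rv_files]
lines.append(f"rvcurve {target}")
script = "\n".join(lines)
print(script)
```

In practice `rv_files` would be drawn from the FILENAME column of a metadata query, so the script scales to any number of observations.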
All of this is submitted to the pipeline as a text script, but first let's figure out which observations need to be processed.
[24]:
search_string = "select * from FILES where TARGET like 'HD185144' and DATE = '20180922';"
url = state.search(sql=search_string)
df = pd.read_html(url, header=0)[0]
df
[24]:
DATE | DEACTIVATED | OBTYPE | FILENAME | TARGET | MJD | BJD | BCVEL | RADVEL | RA | DEC | EPOCH | HRANG | RA_MOTION | DEC_MOTION | PARALLAX | ORIGFILENAME | KOAID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 20180922 | 0 | RV observation | r20180922.60 | HD185144 | 2.458384e+06 | 2.458384e+06 | 2405.133 | 26580.0 | 19 32 23.37 | +69 39 13.00 | 2015.5 | -1.023 | 0.5979 | -1.7383 | 0.17324 | j3040160.fits | HI.20180922.17329.fits |
1 | 20180922 | 0 | RV observation | r20180922.61 | HD185144 | 2.458384e+06 | 2.458384e+06 | 2404.570 | 26580.0 | 19 32 23.37 | +69 39 13.00 | 2015.5 | -1.009 | 0.5979 | -1.7383 | 0.17324 | j3040161.fits | HI.20180922.17378.fits |
2 | 20180922 | 0 | RV observation | r20180922.62 | HD185144 | 2.458384e+06 | 2.458384e+06 | 2404.007 | 26580.0 | 19 32 23.37 | +69 39 13.00 | 2015.5 | -0.996 | 0.5979 | -1.7383 | 0.17324 | j3040162.fits | HI.20180922.17427.fits |
[25]:
from hiresprv.idldriver import Idldriver
idl = Idldriver('prv.cookies')
# This example would create the template, reduce the RV observations, and then construct the
# RV time series
full_example = """
template 185144 20091231
rv 185144 r20091231.72
rv 185144 r20091231.73
rv 185144 r20091231.74
rv 185144 r20150606.145
rv 185144 r20150606.146
rv 185144 r20150606.147
rvcurve 185144
"""
# Since the template for HD185144 has already been created in the pre-processed dataset
# we can start by calculating the RVs only for the new observations.
new_rvs = """
rv 185144 r20180922.60
rv 185144 r20180922.61
rv 185144 r20180922.62
"""
rtn = idl.run_script(new_rvs)
print(rtn)
status= ok
msg= Script running in background. Consult monitor for status.
None
Monitoring (again)¶
To monitor the pipeline processing request, the best approach is to use the same monitor page from above. It stops whenever a given script finishes, but you can restart it at any time to see the currently running job. You can also insert a monitor start-up or status-polling call here as well.
Product Retrieval¶
There is a utility function for retrieving the RV curves (CSV files) for each target (similarly, there is a function – data.spectrum – for retrieving the 1D FITS spectrum files).
[5]:
from hiresprv.download import Download
# second argument is path where files will be downloaded into
data = Download('prv.cookies', './')
rtn = data.rvcurve('185144')
with open('vst185144.csv', 'r') as file:
for line in file:
print(line, end='')
BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.860674413602231,0.789710,-4620.095214843750,52362,1.05080
15196.69270200003,1.730315543645411,0.812607,-4620.210937500000,51591,1.04891
15196.69329199987,1.431450932849171,0.803164,-4620.320800781250,48950,1.05804
17180.10972899990,-2.845576383390323,0.890091,3189.794921875000,55029,1.10542
17180.11030799989,1.235438116773508,0.825978,3189.327880859375,55717,1.10771
17180.11088699987,1.047083913731319,0.828699,3188.863037109375,48769,1.10120
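The downloaded CSV can be read back with pandas for plotting or quick statistics. A sketch using the first few rows of the output above, inlined here purely for illustration (in practice you would pass the filename `vst185144.csv` to `pd.read_csv` directly):

```python
import io
import pandas as pd

# First rows of the downloaded RV curve, inlined for illustration
csv_text = """BJD_TDB,RV,RV_ERR,BC,ADU,CHI2
15196.69208800001,-2.860674413602231,0.789710,-4620.095214843750,52362,1.05080
15196.69270200003,1.730315543645411,0.812607,-4620.210937500000,51591,1.04891
15196.69329199987,1.431450932849171,0.803164,-4620.320800781250,48950,1.05804
"""

rv = pd.read_csv(io.StringIO(csv_text))

# Typical quick-look numbers: the RMS scatter of the relative velocities
# and the mean internal error
print(f"N = {len(rv)}")
print(f"RMS = {rv['RV'].std():.3f} m/s")
print(f"mean err = {rv['RV_ERR'].mean():.3f} m/s")
```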