Category: Python

Create and populate a SQL database using astrodbkit

The astrodbkit package can be used to modify an existing SQL database (such as The BDNYC Database) but it can also be used to create and populate a SQL database from scratch.

To do this, import the astrodb module and create a new database with

from astrodbkit import astrodb
dbpath = '/path/to/new_database.db'
astrodb.create_database(dbpath)

Then load your new database with

db = astrodb.get_database(dbpath)

and start adding tables! The db.table() method accepts as its arguments the table name, list of field names, and list of data types like so:

db.table('my_new_table', ['field1','field2'], ['INTEGER','TEXT'], new_table=True)

Note new_table=True is necessary to create a new table. Otherwise, it looks for an existing table to modify (which you could do as well!).

To populate your new database with data, read the documentation here or a summary at Adding Data to the BDNYC Database.

As always, I recommend the SQLite Browser for a nice GUI to make changes outside of the command line.

Happy databasing!

Example Query: SpeX Prism Spectra for Very Red, Field Age, Early-L Dwarfs

Here we needed the SpeX Prism spectra for all the field age L0-L5 dwarfs with J-Ks color greater than a certain value that is different for each spectral type.

To perform this query we needed to join the SOURCES, SPECTRAL_TYPES, PHOTOMETRY, and SPECTRA tables by source_id. Then in the ‘where’ clause, we had to specify:

  • Our band 1 is ‘J’ and band 2 is ‘Ks’ both from system 2 (2MASS) to get the J-Ks color,
  • The instrument_id for the spectra must be 6 (SpeX),
  • The gravity must not be $$\beta$$ or $$\gamma$$ if it is field age,
  • And then we joined a bunch of conditionals for each spectral type/color requirement in parentheses with ‘or’ operators.

import BDdb
db = BDdb.get_db('/path/to/Dropbox/BDNYCdb/BDNYC.db')
red_field_Ls = db.query.execute("""
    SELECT a.id, a.names, b.spectral_type, b.gravity,
           (c1.magnitude-c2.magnitude) AS color,
           d.wavelength, d.flux, d.unc
    FROM sources a
    JOIN spectral_types b ON a.id=b.source_id
    JOIN photometry c1 ON a.id=c1.source_id
    JOIN photometry c2 ON a.id=c2.source_id
    JOIN spectra d ON a.id=d.source_id
    WHERE b.gravity IS NULL
      AND c1.band='J' AND c1.system=2
      AND c2.band='Ks' AND c2.system=2
      AND d.instrument_id=6 AND b.regime='OPT'
      AND ((color>=1.3 AND b.spectral_type=10)
        OR (color>=1.35 AND b.spectral_type=11)
        OR (color>=1.48 AND b.spectral_type=12)
        OR (color>=1.64 AND b.spectral_type=13)
        OR (color>=1.69 AND b.spectral_type=14)
        OR (color>=1.72 AND b.spectral_type=15))
    """).fetchall()

This returns a list of records of the form [source_id, name, optical spectral type, gravity suffix (which should be None), J-Ks color, wavelength array, flux array, uncertainty array].
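Each record can then be unpacked in one line. Here's a sketch, with a hypothetical row standing in for a real query result:

```python
# Hypothetical row with the same shape as a real query result;
# real rows come from db.query.execute(...).fetchall()
row = (202, '2MASS J0355+1133', 14.0, None, 1.69,
       [1.0, 1.1], [2.3e-15, 2.1e-15], [1.0e-16, 9.0e-17])

# Unpack all eight fields at once
source_id, name, spt, gravity, color, wavelength, flux, unc = row
```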

Example Query: High Resolution NIRSPEC data

Here we needed the high resolution J band spectra for NIRSPEC orders 61 and 65 for all available objects in the database. In this snippet, we use the dict method instead of the usual query method so each spectrum and its metadata come back as a dictionary, collected in the list D.

import BDdb
db = BDdb.get_db('/path/to/Dropbox/BDNYCdb/BDNYC.db')
D = db.dict.execute("select * from spectra where wavelength_order in (61,65)").fetchall()

Then we can just use a list comprehension to iterate through the list and grab all the data like a Python dictionary with the column names as keys, i.e. D[n][‘flux’] gives you the flux array, D[n][‘source_id’] the source_id, etc. for the n results in your query.
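For instance, pulling out all the flux arrays at once looks like this (with a couple of hypothetical rows standing in for real results):

```python
# Hypothetical query results shaped like the dict rows described above
D = [{'source_id': 1, 'flux': [2.0, 2.1]},
     {'source_id': 2, 'flux': [3.0, 2.9]}]

fluxes = [d['flux'] for d in D]        # one flux array per spectrum
ids = [d['source_id'] for d in D]      # the matching source_ids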

Setting Up Your BDNYC Astro Python Environment

Here’s a step-by-step tutorial on how to set up your Python development environment.

Distribution and Compiler

The first step is to install Anaconda. Click here, select the appropriate version for your machine and operating system, download the .dmg file, open it and double-click the installer.

Make sure that when the installer asks where to put the distribution, you choose your actual hard drive and not some sub-directory on your machine. This will make things much easier later on.

What’s nice about this distribution is that it includes common plotting, computing, and astronomy packages such as numpy and scipy, astropy, matplotlib, and many others.

Next you’ll need a C compiler. If you’re using MacOSX, I recommend using Xcode which you can download free here. (This step may take a while so go get a beverage.) Once the download is complete, run the installer.

Required Packages

For future installation of packages, I recommend using Pip. To install this, at your Terminal command line type sudo easy_install pip. Then whenever you want to install a package you might need, you just open Terminal and do pip install package_name.

Not every package is this easy (though most are). If you can’t get something through Pip just download, unzip and put the folder of your new module with the rest of your packages in the directory /anaconda/pkgs/.

Then in Terminal, navigate to that directory with cd /anaconda/pkgs/package_dir_name and do python setup.py install.

Development Tools

MacOSX comes with the text editing application TextEdit but it is not good for editing code. I strongly recommend using TextMate though it is not free so you should ask your advisor to buy a license for you! Otherwise, some folks find the free TextWrangler to be pretty good.

Next, you’ll want to get access to the BDNYC database. Detailed instructions are here on how to setup Dropbox and Github on your machine in order to interact with the database.

Launching Python

Now to use Python, in Terminal just type ipython --pylab. I recommend always launching Python this way to have the plotting library preloaded.

Enjoy!

The BDNYC Data Archive

Setting Up the Database

To install, just do:

pip install BDNYCdb

Then download the bdnyc198.db database file. This initial release contains the astrometry, photometry and spectra for the 198 objects in the Filippazzo et al. (2015) sample.

To use the published and unpublished spectra, photometry and astrometry for all 1300 sources, ask someone who has access to the private database to share the BDNYCdb Dropbox folder with you.

Accessing the Database

Create a database instance by launching the Python interpreter and pointing the get_db() function to the database file by doing:

import BDdb
db = BDdb.get_db(filepath)

Voila! You can see an inventory of all data for a specific source by passing a source_id to the inventory() method.

db.inventory(767)

This will also plot all available spectra for that source for visual inspection if you set plot=True.

Querying the Database

Now that you have the database at your fingertips, you’ll want to get some info out of it. To do this, you can pass SQL queries wrapped in double quotes (") to the query() method.

data = db.query( "SQL_query_goes_here" )

Here is a detailed post about how to write a SQL query.

The result of a SQL query is a sequence of tuples with the data requested from each record. For example, we can get a source’s photometry from the PHOTOMETRY table with:

db.query("select band,magnitude from photometry where source_id=202")

which gives the output

[('J', 13.526),('H', 12.807),('Ks', 12.503),('W1', 12.486),('W2', 12.386),('W3', 12.313),('W4', 8.525)]
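Since each tuple is a (band, magnitude) pair, you can cast the result straight to a dictionary for easy lookups (rows truncated here for brevity):

```python
# A subset of the query result above
rows = [('J', 13.526), ('H', 12.807), ('Ks', 12.503)]

mags = dict(rows)            # band -> magnitude lookup
J_Ks = mags['J'] - mags['Ks']  # the J-Ks color
```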

Alternatively, we can have this data returned as a Python dictionary with:

db.query("select band,magnitude from photometry where source_id=202", DICT=True)

which looks like

[{'band': 'J', 'magnitude': 13.526},{'band': 'H', 'magnitude': 12.807},{'band': 'Ks', 'magnitude': 12.503},{'band': 'W1', 'magnitude': 12.486},{'band': 'W2', 'magnitude': 12.386},{'band': 'W3', 'magnitude': 12.313},{'band': 'W4', 'magnitude': 8.525}]
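With DICT=True the same lookup takes a dict comprehension instead (rows truncated here for brevity):

```python
# A subset of the DICT=True query result above
rows = [{'band': 'J', 'magnitude': 13.526},
        {'band': 'H', 'magnitude': 12.807}]

# Collapse the list of row dicts into one band -> magnitude dict
mags = {row['band']: row['magnitude'] for row in rows}
```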

Example Queries

Some SQL query examples to pass to the query() method (wrapped in double quotes (") of course) are:

  1. SELECT shortname, ra, dec FROM sources WHERE (222<ra AND ra<232) AND (5<dec AND dec<15)
  2. SELECT band, magnitude, magnitude_unc FROM photometry WHERE source_id=58
  3. SELECT source_id, band, magnitude FROM photometry WHERE band='z' AND magnitude<15
  4. SELECT wavelength, flux, unc FROM spectra WHERE observation_id=75

As you hopefully gathered:

  1. Returns the shortname, ra and dec of all objects in a 10-degree-by-10-degree patch of sky centered at RA = 227, DEC = 10
  2. Returns every band, magnitude and uncertainty available for object 58
  3. Returns the source_id, band and magnitude of all objects with z magnitude brighter than 15
  4. Returns the wavelength, flux and uncertainty arrays for the spectrum with observation_id 75

The above examples are for querying individual tables only. We can query from multiple tables at the same time with the JOIN command like so:

  1. SELECT t.name, p.band, p.magnitude, p.magnitude_unc FROM telescopes AS t JOIN photometry AS p ON p.telescope_id=t.id WHERE p.source_id=58
  2. SELECT p1.magnitude-p2.magnitude FROM photometry AS p1 JOIN photometry AS p2 ON p1.source_id=p2.source_id WHERE p1.band='J' AND p2.band='H'
  3. SELECT src.designation, src.unum, spt.spectral_type FROM sources AS src JOIN spectral_types AS spt ON spt.source_id=src.id WHERE spt.spectral_type>=10 AND spt.spectral_type<20 AND spt.regime='optical'
  4. SELECT s.unum, p.parallax, p.parallax_unc, p.publication_id FROM sources AS s JOIN parallaxes AS p ON p.source_id=s.id

As you may have gathered:

  1. Returns the telescope name, band and magnitude for all photometry of source 58
  2. Returns the J-H color for every object
  3. Returns the designation, U-number and optical spectral type for all L dwarfs
  4. Returns the parallax measurements and publications for all sources

Database Schema and Browsing

In order to write the SQL queries above, you of course need to know what the fields in each table are called. One way to find out is:

db.query("PRAGMA table_info('PHOTOMETRY')")
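Each PRAGMA row describes one column, with the field name in the second position, so a list comprehension pulls out just the names. Here's a sketch using Python's built-in sqlite3 with a throwaway table (against the real database you'd run the same PRAGMA through db.query):

```python
import sqlite3

# Build a throwaway photometry table so the PRAGMA has something to describe
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE photometry "
            "(id INTEGER PRIMARY KEY, source_id INTEGER, band TEXT, magnitude REAL)")

# Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk); index 1 is the name
cols = [row[1] for row in con.execute("PRAGMA table_info('photometry')")]
print(cols)  # → ['id', 'source_id', 'band', 'magnitude']
```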

Even easier is to use the DB Browser for SQLite pictured at left, which lets you expand and collapse each table, sort and order columns, and other fun stuff.

It even allows you to manually create/edit/destroy records with a very nice GUI.

IMPORTANT: Keep in mind that if you change a database record, you immediately change it for everyone since we share the same database file on Dropbox. Be careful!

Always check and double-check that you are entering the correct data for the correct source before you save any changes with the SQLite Database Browser.

SQL Queries

An SQL database is comprised of a bunch of tables (kind of like a spreadsheet) that have fields (column names) and records (rows of data). For example, our database might have a table called students that looks like this:

id first last grade GPA
1 Al Einstein 6 2.7
2 Annie Cannon 6 3.8
3 Artie Eddington 8 3.2
4 Carlie Herschel 8 3.2

So in our students table, the fields are [id, first, last, grade, GPA], and there are a total of four records, each with a required yet arbitrary id in the first column.
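If you want to follow along, this toy table can be built in memory with Python's built-in sqlite3 module; the db object below then behaves like the one in the examples that follow:

```python
import sqlite3

# An in-memory database holding the toy students table from above
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE students "
           "(id INTEGER PRIMARY KEY, first TEXT, last TEXT, grade INTEGER, GPA REAL)")
db.executemany("INSERT INTO students VALUES (?,?,?,?,?)",
               [(1, 'Al', 'Einstein', 6, 2.7),
                (2, 'Annie', 'Cannon', 6, 3.8),
                (3, 'Artie', 'Eddington', 8, 3.2),
                (4, 'Carlie', 'Herschel', 8, 3.2)])
```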

To pull these records out, we tell SQL to SELECT values for the following fields FROM a certain table. In SQL this looks like:

In [1]: db.execute("SELECT id, first, last, grade, GPA FROM students").fetchall()
Out[1]: [(1,'Al','Einstein',6,2.7),(2,'Annie','Cannon',6,3.8),(3,'Artie','Eddington',8,3.2),(4,'Carlie','Herschel',8,3.2)]

Or equivalently, we can just use a wildcard “*” if we want to return all fields with the SQL query "SELECT * FROM students".

We can modify our SQL query to change the order of fields or only return certain ones as well. For example:

In [2]: db.execute("SELECT last, first, GPA FROM students").fetchall()
Out[2]: [('Einstein','Al',2.7),('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Now that we know how to get records from tables, we can restrict which records it returns with the WHERE statement:

In [3]: db.execute("SELECT last, first, GPA FROM students WHERE GPA>3.1").fetchall()
Out[3]: [('Cannon','Annie',3.8),('Eddington','Artie',3.2),('Herschel','Carlie',3.2)]

Notice the first student had a GPA less than 3.1 so he was omitted from the result.

Now let’s say we have a second table called quizzes which is a table of every quiz grade for all students that looks like this:

id student_id quiz_number score
1 1 3 89
2 2 3 96
3 3 3 94
4 4 3 92
5 1 4 78
6 3 4 88
7 4 4 91

Now if we want to see only Al’s grades, we have to JOIN the tables ON some condition. In this case, we want to tell SQL that the student_id (not the id) in the quizzes table should match the id in the students table (since only those grades are Al’s). This looks like:

In [4]: db.execute("SELECT quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON students.id=quizzes.student_id WHERE students.last='Einstein'").fetchall()
Out[4]: [(3,89),(4,78)]

So students.id=quizzes.student_id associates each quiz with a student from the students table and students.last='Einstein' specifies that we only want the grades from the student with last name Einstein.

Similarly, we can see who scored 90 or greater on which quiz with:

In [5]: db.execute("SELECT students.last, quizzes.quiz_number, quizzes.score FROM quizzes JOIN students ON students.id=quizzes.student_id WHERE quizzes.score>=90").fetchall()
Out[5]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]

That’s it! We can JOIN as many tables as we want with as many restrictions we need to pull out data in the desired form.

This is powerful, but the queries can become lengthy. A slight shortcut is to use the AS statement to assign each table a short alias (e.g. students => s, quizzes => q) like so:

In [6]: db.execute("SELECT s.last, q.quiz_number, q.score FROM quizzes AS q JOIN students AS s ON s.id=q.student_id WHERE q.score>=90").fetchall()
Out[6]: [('Cannon',3,96),('Eddington',3,94),('Herschel',3,92),('Herschel',4,91)]
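To show a three-way join in the same style, here's a sketch that rebuilds the toy tables in sqlite3 and adds a hypothetical teachers table (one teacher per grade) that is not part of the original example:

```python
import sqlite3

# Rebuild the toy tables, plus a hypothetical 'teachers' table (grade -> teacher)
db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE students "
           "(id INTEGER PRIMARY KEY, first TEXT, last TEXT, grade INTEGER, GPA REAL)")
db.execute("CREATE TABLE quizzes "
           "(id INTEGER PRIMARY KEY, student_id INTEGER, quiz_number INTEGER, score INTEGER)")
db.execute("CREATE TABLE teachers (id INTEGER PRIMARY KEY, grade INTEGER, last TEXT)")
db.executemany("INSERT INTO students VALUES (?,?,?,?,?)",
               [(1,'Al','Einstein',6,2.7), (2,'Annie','Cannon',6,3.8),
                (3,'Artie','Eddington',8,3.2), (4,'Carlie','Herschel',8,3.2)])
db.executemany("INSERT INTO quizzes VALUES (?,?,?,?)",
               [(1,1,3,89), (2,2,3,96), (3,3,3,94), (4,4,3,92),
                (5,1,4,78), (6,3,4,88), (7,4,4,91)])
db.executemany("INSERT INTO teachers VALUES (?,?,?)",
               [(1,6,'Curie'), (2,8,'Hubble')])

# Which teacher saw which high quiz scores: three tables, two ON conditions
rows = db.execute("SELECT t.last, s.last, q.score FROM quizzes AS q "
                  "JOIN students AS s ON s.id=q.student_id "
                  "JOIN teachers AS t ON t.grade=s.grade "
                  "WHERE q.score>=90").fetchall()
```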

Photon Flux Density vs. Energy Flux Density

One of the subtleties of photometry is the difference between magnitudes and colors calculated using energy flux densities (EFD) and photon flux densities (PFD).

The complication arises since the photometry presented by many surveys is calculated using PFD but spectra (specifically the synthetic variety) are given as EFD. The difference is small but measurable, so let’s do it right.

The following is the process I used to remedy the situation by switching my models to PFD so they could be directly compared to the photometry from the surveys. Thanks to Mike Cushing for the guidance.

Filter Zero Points

Before we can calculate the magnitudes, we need filter zero points calculated from PFD. To do this, I started with a spectrum of Vega in units of [erg s^-1 cm^-2 A^-1] snatched from STScI.

Then the zero point flux density in [photons s^-1 cm^-2 A^-1] is:

$$!F_{zp}=\frac{\int p_V(\lambda)\, S(\lambda)\, d\lambda}{\int S(\lambda)\, d\lambda}=\frac{\int e_V(\lambda)\left(\frac{\lambda}{hc}\right) S(\lambda)\, d\lambda}{\int S(\lambda)\, d\lambda}$$

Where $$e_V$$ is the given energy flux density in [erg s^-1 cm^-2 A^-1] of Vega, $$p_V$$ is the photon flux density in [photons s^-1 cm^-2 A^-1], and $$S(\lambda)$$ is the scalar filter throughput.

Since I’m starting with a spectrum of Vega in EFD units, I need to multiply by $$\frac{\lambda}{hc}$$ to convert it to PFD units.

In Python, this looks like:

def zp_flux(band):
    from numpy import trapz, interp
    wave, flux = vega()                      # Vega spectrum in [erg s^-1 cm^-2 A^-1]
    filt = get_filters()[band]               # filter curve with 'wav' and 'rsr' arrays
    h, c = 6.6260755E-27, 2.998E14           # Planck constant [erg*s], speed of light [um/s]
    # Interpolate the filter throughput onto the spectrum's wavelength grid
    I = interp(wave, filt['wav'], filt['rsr'], left=0, right=0)
    # Convert EFD to PFD via lambda/(hc), then integrate over the bandpass
    return trapz(I*flux*wave/(h*c), x=wave)/trapz(I, x=wave)

Calculating Magnitudes

Now that we have the filter zero points, we can calculate the magnitudes using:

$$!m = -2.5\log\left(\frac{F_\lambda}{F_{zp}}\right)$$

Where $$m$$ is the apparent magnitude and $$F_\lambda$$ is the flux from our source given similarly by:

$$!F_{\lambda}=\frac{\int p_\lambda(\lambda)\, S(\lambda)\, d\lambda}{\int S(\lambda)\, d\lambda}=\frac{\int e_\lambda(\lambda)\left(\frac{\lambda}{hc}\right) S(\lambda)\, d\lambda}{\int S(\lambda)\, d\lambda}$$

Since the synthetic spectra I’m using are given in EFD units, I need to multiply by $$\frac{\lambda}{hc}$$ to convert them to PFD units just as I did with my spectrum of Vega.

In Python the magnitudes are obtained the same way as above but we use the source spectrum in [erg s^-1 cm^-2 A^-1] instead of Vega. Then the magnitude is just:

mag = -2.5*log10(source_flux(band)/zp_flux(band))

Below is an image that shows the discrepancy between using EFD and PFD to calculate colors for comparison with survey photometry.

The circles are colors calculated from synthetic spectra of low surface gravity (large circles) to high surface gravity (small circles). The grey lines are iso-temperature contours. The jumping shows the different results using PFD and EFD. The stationary blue stars, green squares and red triangles are catalog photometric points calculated from PFD.

Other Considerations

However, the discrepancy I get between the same color calculated from PFD and EFD is as much as 0.244 mags (in r-W3 at 1050K), which seems excessive. The magnitude calculation reduces to:

$$!m = -2.5\log\left(\frac{\int e_\lambda(\lambda)\, S(\lambda)\, \lambda\, d\lambda}{\int e_V(\lambda)\, S(\lambda)\, \lambda\, d\lambda}\right)$$

Since the filter profile is interpolated onto the spectrum before integration, the remaining discrepancy must be due to the difference in resolution between the synthetic and Vega spectra. In other words, I have to make sure the wavelength arrays for Vega and the source are identical so the trapezoidal sums use the same width bins.
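The resampling step can be sketched with numpy; the wavelength and flux arrays below are placeholders, not real spectra:

```python
import numpy as np

# Placeholder arrays standing in for the real source and Vega spectra
source_wave = np.linspace(1.0, 2.5, 500)   # source wavelength grid [um]
vega_wave = np.linspace(0.9, 2.6, 3000)    # Vega's (different) native grid [um]
vega_flux = 1e-9 * vega_wave**-4           # toy EFD values on that grid

# Interpolate Vega onto the source's wavelength grid so both trapezoidal
# sums use identical bin widths
vega_flux_resampled = np.interp(source_wave, vega_wave, vega_flux)
```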

This reduces the discrepancy in r-W3 at 1050K from -0.244 mags to -0.067 mags, which is better. However, the discrepancy in H-[3.6] goes from 0.071 mags to -0.078 mags.

To Recapitulate

In summary, I had a spectrum of Vega and some synthetic spectra all in energy flux density units of [erg s^-1 cm^-2 A^-1] and some photometric points from the survey catalogs calculated from photon flux density units of [photons s^-1 cm^-2 A^-1].

In order to compare apples to apples, I first converted my spectra to PFD by multiplying by $$\frac{\lambda}{hc}$$ at each wavelength point before integrating to calculate my zero points and magnitudes.
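As a quick sanity check on the conversion itself, here is the single-wavelength version with illustrative numbers (not a real measurement):

```python
h = 6.6260755e-27     # Planck constant [erg s]
c = 2.998e18          # speed of light [A/s]

wavelength = 10000.0  # [A]
efd = 1e-15           # energy flux density [erg s^-1 cm^-2 A^-1]

# PFD = EFD * lambda / (h*c), in [photons s^-1 cm^-2 A^-1]
pfd = efd * wavelength / (h * c)
```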

Converting Between Decimal Degrees and Hours, Minutes, Seconds

Here’s a quick Python snippet I wrote to convert right ascension in decimal degrees to hours, minutes, seconds and declination to (+/-)degrees, minutes, seconds.

def deg2HMS(ra='', dec='', round=False):
  RA, DEC, rs, ds = '', '', '', '+'  # dec gets a '+' prefix unless negative
  if dec:
    if str(dec)[0] == '-':
      ds, dec = '-', abs(dec)
    deg = int(dec)
    decM = abs(int((dec-deg)*60))
    if round:
      decS = int((abs((dec-deg)*60)-decM)*60)
    else:
      decS = (abs((dec-deg)*60)-decM)*60
    DEC = '{0}{1} {2} {3}'.format(ds, deg, decM, decS)
  
  if ra:
    if str(ra)[0] == '-':
      rs, ra = '-', abs(ra)
    raH = int(ra/15)
    raM = int(((ra/15)-raH)*60)
    if round:
      raS = int(((((ra/15)-raH)*60)-raM)*60)
    else:
      raS = ((((ra/15)-raH)*60)-raM)*60
    RA = '{0}{1} {2} {3}'.format(rs, raH, raM, raS)
  
  if ra and dec:
    return (RA, DEC)
  else:
    return RA or DEC

For example:
In [1]: f.deg2HMS(ra=66.918277)
Out[1]: '4 27 40.386'

Or even:
In [2]: f.deg2HMS(dec=24.622590)
Out[2]: '+24 37 21.324'

Or if you want to round the seconds, just do:
In [3]: f.deg2HMS(dec=24.622590,round=True)
Out[3]: '+24 37 21'

And to convert hours, minutes and seconds into decimal degrees, we have:

def HMS2deg(ra='', dec=''):
  RA, DEC, rs, ds = '', '', 1, 1
  if dec:
    D, M, S = [float(i) for i in dec.split()]
    if str(D)[0] == '-':
      ds, D = -1, abs(D)
    deg = D + (M/60) + (S/3600)
    DEC = '{0}'.format(deg*ds)
  
  if ra:
    H, M, S = [float(i) for i in ra.split()]
    if str(H)[0] == '-':
      rs, H = -1, abs(H)
    deg = (H*15) + (M/4) + (S/240)
    RA = '{0}'.format(deg*rs)
  
  if ra and dec:
    return (RA, DEC)
  else:
    return RA or DEC

So we can get back our decimal degrees with:

In [4]: f.HMS2deg(ra='4 27 40.386', dec='+24 37 21.324')
Out[4]: (66.918, 24.622)

How to Make Animated GIFs of Plots

So you’re swimming in countless plots of your data over some changing parameter. Sure they’re nice to look at, but are they animated? Didn’t think so.

Here’s how to create an animated .gif image of those Python plots using Photoshop.

Step 1: Generate the plots

I’ve found that a few lines of Python to programmatically draw and save your plots to a folder eliminates a lot of editing and tweaking later on.

For this tutorial, I’ll use my synthetic photometry code to generate color-parameter plots across surface gravities of 3.0dex to 6.0dex.

First I created a folder on my desktop to dump the plots called RI_teff.

Then in the Python interpreter it looks like:

In [1]: grav = [round(i*0.1,1) for i in range(30,61)]
In [2]: import syn_phot as s
In [3]: for g in grav:
   ...:     s.color_param('R','I',logg=g,save='/Desktop/RI_teff/')

Now I can create the animated .gif in Photoshop.

Step 2: Pull plots into Photoshop as layers

Open Photoshop and click File > Scripts > Load Files into Stack…

Select the folder on your Desktop that has all the plots and click ok.

Photoshop will open each image as a layer in a new project window.

Step 3: Create frames from each layer

Next click Window > Timeline to show the Timeline across the bottom of the program.

In the top-right corner of your Timeline, you’ll see a little button that has all the Timeline options. Click it and select Make Frames From Layers. Here’s what it looks like:

This will populate your Timeline with one frame for each image layer.

Click the Timeline options button again and click Reverse Frames if necessary. Otherwise, you can drag and drop the frames into the desired order.

Step 4: Timing is everything

Next we need to set the timing of each frame. Select the first frame from the Timeline then hold down the Shift button and select the last frame to select them all.

Next click on any frame where it says 0 sec. with a down arrow. Then select the display time for each frame in the animation.

I typically set the frames to be 0.2 or 0.5 seconds each, depending on the number of total frames. Then I set the last frame to be 2 seconds so it’s as if the image pauses when it finishes before starting the animation over.

Step 5: Save it!

Finally, click File > Save for Web… and make sure you have GIF filetype selected. Click Save and you’re done! Here’s the result:

Editing PYTHONPATH (or “Where’s my module?!”)

Python not finding your modules with import MyModule for some reason? It’s probably your PYTHONPATH. Here’s how you edit it so that Python sees all your modules.

A Word About Directories

Firstly, if you are the kind of person who keeps files scattered about your computer, dumps everything into the root directory, or is afraid to use those nice little blue folders, then perhaps programming and computers in general are not for you.

Logical and neat directory structure will make your own, your computer’s and your collaborators’ lives much easier.

My recommendation: Create a master code directory (called “Modules” or something) in your Documents folder. This will be the new home of all your future Python projects.

Now every time you create a .py file, it should go into a project folder inside your Modules directory, i.e. /Documents/Modules/NewProject/MyModule.py. Note that you should have no actual modules inside your Modules directory! Keep those puppies in their warm, snuggly project subdirectories.
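So a typical layout would look something like this (project and module names are placeholders):

```
Documents/
└── Modules/
    ├── NewProject/
    │   ├── __init__.py
    │   └── MyModule.py
    └── OtherProject/
        └── OtherModule.py
```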

This way you can also initialize that project folder (i.e. /NewProject) as a Git repository and just push and pull the contents to keep it up-to-date!

Editing PYTHONPATH

Python won’t just search your computer for the MyModule.py file you’re trying to import. You have to tell it explicitly each time where to get it. The PYTHONPATH is a list of directories for your computer to check whenever you type import MyModule into the interpreter.

To add a path, launch ipython and type:

import sys
sys.path.append("path/to/Modules")
print sys.path

Note: sys.path.append() only lasts for the current session. To make the change permanent, add the line export PYTHONPATH=$PYTHONPATH:/path/to/Modules to your shell startup file (e.g. ~/.bash_profile).

So now your path is updated but this is only the path to your master Python folder.

In order to have Python see the modules inside each subdirectory, add a blank file called __init__.py to each subdirectory (with two underscores on each side).

Now to import the module and use a function called foo() do:

from NewProject import MyModule as m
m.foo()

That is, it’s checking the main python directory you added to your PYTHONPATH, then looking within the NewProject subdirectory via the __init__.py file for the module you’re trying to import.