Data Containers (sbpy.data.DataClass)¶
sbpy relies heavily on DataClass data containers, which encapsulate data and propagate them through your workflow. sbpy provides data containers for orbital elements (Orbit), ephemerides (Ephem), observational data (Obs), and physical properties (Phys).
What are Data Containers?¶
DataClass
- and hence all the data containers presented
here - uses a QTable
object under the hood. You can
think of those as tables - consisting of fields (or columns)
and rows - that have units
attached to them, allowing
you to propagate these units through your programs. We strongly urge
the user to make use of units
in the definition of data
containers in order to minimize confusion and tap the full potential
of sbpy
. Finally, DataClass
objects have a meta
attribute that enables the user to label these objects with
unstructured meta data.
The user is free to add any fields they want to a
DataClass
object. However, in order to enable the
seamless use of sbpy
functions and methods, we require the user to
pick among a few common field names for different properties as listed
here. DataClass
objects
are able to identify alternative field names as listed in this
document, as well as to perform transformations between a few field
names - see below for more details.
A DataClass
object can hold as many data rows as you
want. All rows can refer to a single object, or each row can refer to
a separate object - this is usually up to the user, restrictions exist
only in a few cases as detailed in this documentation.
Why are there different Data Container Types?¶
Although the data container classes Ephem
,
Orbit
, Obs
and Phys
look very
similar and are based on the same base class (DataClass
),
there are subtle differences. The container classes have been
introduced to minimize confusion between properties of different
natures, i.e., to avoid mixing apples with oranges. For instance,
Ephem
objects deal with ephemerides (e.g., RA and DEC and
other properties that change as a function of time); the diameter of
an object, however, usually does not vary with time. Adding a
“diameter” field to an Ephem
object would hence be
inefficient: this object can contain hundreds of rows describing a
target’s ephemerides, while the diameter would be the same in each of
these rows. Furthermore, providing separate data containers for
different properties enables the implementation of specifically
designed methods to query and modify the data held by these classes.
Data Container Type Overview¶
Ephem¶
Ephem
has been designed to hold
ephemerides, i.e., properties that vary with time.
Ephem
currently provides convenience functions to query
ephemerides from the JPL Horizons system
(from_horizons
), the Minor Planet Center
(from_mpc
), IMCCE’s Miriade system
(from_miriade
) as well as a convenience function to
derive ephemerides from an Orbit
object using pyoorb.
Obs¶
Obs
is tailored to holding observational data, e.g.,
magnitudes as a function of time. The Obs
class is the
only data container that is not directly derived from
DataClass
, but from Ephem
, providing the
same functionality as the latter. from_mpc
enables
you to query observations reported to the MPC for a specific target,
and the supplement
method queries one of the ephemeris services
to add ephemeris data to your observations.
Orbit¶
Orbit
should be used to hold orbital elements of one
or several bodies. Elements can be retrieved using the convenience
function from_horizons
, propagated using
oo_propagate
, and transformed into other frames
using oo_transform
.
Phys¶
Phys
objects are meant to hold physical properties
that do not change over time. Known physical properties can currently
be queried from the JPL Small-Body Database Browser system using
from_sbdb
.
Names¶
Names
objects are somewhat different from the other data
containers, as they don’t hold properties but only object
names. These names can be used to identify the nature of an object
(asteroid_or_comet
) and they can be parsed to
extract individual identifier components
(parse_asteroid
and
parse_comet
).
How to use Data Containers¶
All of the data objects dealt with in sbpy.data
share the same
common base class: sbpy.data.DataClass
. DataClass
defines the basic functionality and makes sure that all sbpy.data
objects can be used in the exact same way.
In plain words, this means that in the following examples you can
replace DataClass
, Ephem
,
Orbit
, Obs
, and Phys
objects
with each other. In order to show some useful use cases, we will
iterate between these types, but keep in mind: they all work the exact
same way.
Building Data Containers¶
While Ephem
, Orbit
, Obs
, and
Phys
provide a range of convenience functions to build
objects containing data, for instance from online data archives, it is
easily possible to build these objects from scratch. This can be done
for input data stored in dictionaries
(from_dict
), lists or arrays
(from_columns
and
from_rows
), Table
objects
(from_table
), or from data files
(from_file
).
Depending on how your input data are organized, you can use different options in different cases:
Building a Data Container from a Dictionary¶
Assume that you want to build an Orbit
object to
propagate this orbit and obtain ephemerides. Since you are dealing
with a single orbit, the most convenient solution might be to use a
dictionary to build your object:
>>> from sbpy.data import Orbit
>>> from astropy.time import Time
>>> import astropy.units as u
>>> elements = {'a':1.234*u.au, 'e':0.1234, 'i':12.34*u.deg,
... 'argper': 123.4*u.deg, 'node': 45.2*u.deg,
... 'epoch': Time(2451200.5, format='jd'), 'true_anom':23.1*u.deg}
>>> orb = Orbit.from_dict(elements)
>>> orb
<QTable length=1>
a e i argper node epoch true_anom
AU deg deg deg deg
float64 float64 float64 float64 float64 Time float64
------- ------- ------- ------- ------- --------- ---------
1.234 0.1234 12.34 123.4 45.2 2451200.5 23.1
One quick note on building DataClass objects from dictionaries: the columns of the resulting data table follow the order in which the dictionary is read out item by item when the data table is built in the background. Since Python 3.7, plain dictionaries preserve insertion order, so 'a' will be located before 'e' in the example above. In older Python versions there is no such guarantee, and the order of the resulting data table columns has to be considered arbitrary. If you want to make a specific column order explicit, or need to support older Python versions, you can use an OrderedDict instead of a simple dictionary. The order of elements in an OrderedDict will be the same as the order of the data table columns.
For details on how to build objects from dictionaries, see
from_dict
.
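The ordering behavior can be checked with the standard library alone. This is a small sketch with made-up element values. On Python 3.7 and later a plain dictionary keeps insertion order, while an OrderedDict states that intent explicitly.

```python
from collections import OrderedDict

# Plain dict: keys come back in insertion order on Python 3.7+.
elements = {"a": 1.234, "e": 0.1234, "i": 12.34}
print(list(elements))

# OrderedDict: the order is part of the type's contract.
ordered = OrderedDict([("a", 1.234), ("e", 0.1234), ("i", 12.34)])
print(list(ordered))

# To force a different column order, rebuild the mapping in that order.
reordered = OrderedDict((key, elements[key]) for key in ("i", "e", "a"))
print(list(reordered))
```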
Building a Data Container from Columns¶
Now assume that you want to build an Obs
object holding
RA, Dec, and observation midtime for some target that you observed. In
this case, you can use from_columns
as shown
here:
>>> from sbpy.data import Obs
>>> import astropy.units as u
>>> from astropy.time import Time
>>> from numpy import array
>>> ra = [10.223423, 10.233453, 10.243452]*u.deg
>>> dec = [-12.42123, -12.41562, -12.40435]*u.deg
>>> epoch = Time(2451523.5 + array([0.1234, 0.2345, 0.3525]), format='jd')
>>> obs = Obs.from_columns([ra, dec, epoch], names=['ra', 'dec', 't'])
>>> obs
<QTable length=3>
ra dec t
deg deg
float64 float64 Time
--------- --------- ------------
10.223423 -12.42123 2451523.6234
10.233453 -12.41562 2451523.7345
10.243452 -12.40435 2451523.8525
Note how epoch
is handled differently: it is provided to
Obs.from_columns
as a Time
object (see
Design Principles - The Zen of sbpy for a discussion).
For details on how to build objects from lists or arrays, see
from_columns
and also
from_rows
, depending on whether your data is
represented as rows or columns. Note that you could also use
from_dict
by providing column data to the
different fields.
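Whether rows or columns are the more natural input depends on how the data were recorded. The sketch below uses plain Python only (no sbpy calls) to show how the same hypothetical observations look in both layouts and how one can be transposed into the other.

```python
# Hypothetical observations, one tuple per row: (ra_deg, dec_deg, jd)
rows = [
    (10.223423, -12.42123, 2451523.6234),
    (10.233453, -12.41562, 2451523.7345),
    (10.243452, -12.40435, 2451523.8525),
]
names = ["ra", "dec", "t"]

# Row layout -> column layout: one list per field.
columns = [list(col) for col in zip(*rows)]

# Column layout paired with field names, as a dictionary would hold it.
as_dict = dict(zip(names, columns))

# Column layout -> row layout again.
rows_again = list(zip(*columns))

print(as_dict["ra"])
print(rows_again[0])
```

The row layout corresponds to the input expected by from_rows, the column layout to from_columns, and the dictionary to from_dict.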
Building a Data Container from a Table¶
If your data are already available as a Table
or
QTable
, you can simply convert it into a
DataClass
object using from_table
.
Building a Data Container from a File¶
You can also read in data from a properly formatted file
using from_file
. This function merely
serves as a wrapper for astropy.table.Table.read
and uses the same
parameters as the latter function; please refer to its documentation for a review.
As an example, you can read in a properly formatted ASCII file using the following lines:
>>> from sbpy.data import Ephem
>>> data = Ephem.from_file('data.txt', format='ascii')
Please note that the file formats available (see here
for a list of available formats) provide varying support for units and
meta data. For instance, basic
, csv
, html
, and latex
do not provide unit or meta data information. However, fits
,
cds
, daophot
, ecsv
, and ipac
do support units and meta
data.
Building a Data Container from an Online Query¶
Most DataClass
data containers offer convenience
functions to query data from online services. Please refer to the
corresponding classes for information and examples for querying data.
A Note on Field Names¶
In order for sbpy
to properly identify the fields that might be
necessary for calculations, default column names should be used to
name these fields. For instance, a column of Right Ascensions should
be named 'RA'
or 'ra'
. For a list of acceptable field names,
please refer to the Data Container Field Name Reference.
Also note that sbpy
is able to use alternative field names, but only those that are listed in the
Data Container Field Name Reference.
Accessing data¶
In order to obtain a list of field names in a DataClass
object, you can use field_names
:
>>> obs.field_names
['ra', 'dec', 't']
You can also use the in
operator to check if a field is contained in
a DataClass
object. Alternative field names can be used
for the in
test:
>>> 'ra' in obs
True
>>> 'RA' in obs
True
Each of these columns can be accessed easily, for instance:
>>> obs['ra']
<Quantity [10.223423, 10.233453, 10.243452] deg>
which will return a Quantity
object if that
column has a Unit
attached to it or a Column
otherwise.
Similarly, if you are interested in the first set of observations in
obs
, you can use:
>>> obs[0]
<QTable length=1>
ra dec t
deg deg
float64 float64 Time
--------- --------- ------------
10.223423 -12.42123 2451523.6234
which returns a new instance of the same class as your original object with only the requested subset of the data. In order to retrieve RA from the second observation, you can combine both examples and do:
>>> obs[1]['ra']
<Quantity [10.233453] deg>
Just like in any Table
or QTable
object, you can use slicing to obtain subset tables from your data,
for instance:
>>> obs['ra', 'dec']
<QTable length=3>
ra dec
deg deg
float64 float64
--------- ---------
10.223423 -12.42123
10.233453 -12.41562
10.243452 -12.40435
>>> obs[:2]
<QTable length=2>
ra dec t
deg deg
float64 float64 Time
--------- --------- ------------
10.223423 -12.42123 2451523.6234
10.233453 -12.41562 2451523.7345
>>> obs[obs['ra'] <= 10.233453 * u.deg]
<QTable length=2>
ra dec t
deg deg
float64 float64 Time
--------- --------- ------------
10.223423 -12.42123 2451523.6234
10.233453 -12.41562 2451523.7345
The results of these examples will be of the same data type as obs
(or really just any type derived from DataClass
, e.g.,
Ephem
, Orbit
, …). The latter example shown
here uses a condition to filter data (only those observations with RA
less than or equal to 10.233453 degrees; note that it is necessary
here to apply u.deg
to the value that all the RAs are compared
against) but selects all the columns in the original table.
If you ever need to access the actual QTable
object
that is inside each DataClass
object, you can access it
as obs.table
.
Modifying an object¶
Individual elements, entire rows, and columns can be modified by directly addressing them:
>>> obs['ra']
<Quantity [10.223423, 10.233453, 10.243452] deg>
>>> obs['ra'] = obs['ra'] + 0.1 * u.deg
>>> obs['ra']
<Quantity [10.323423, 10.333453, 10.343452] deg>
The basic functionalities to modify the data table are implemented in
DataClass
, including adding rows and columns and stacking a
DataClass object with another DataClass object or a Table
object.
Let’s assume you want to add some more observations to your obs
object:
>>> obs.add_row([10.255460 * u.deg, -12.39460 * u.deg, 2451523.94653 * u.d])
>>> obs
<QTable length=4>
ra dec t
deg deg
float64 float64 Time
--------- --------- -------------
10.323423 -12.42123 2451523.6234
10.333453 -12.41562 2451523.7345
10.343452 -12.40435 2451523.8525
10.25546 -12.3946 2451523.94653
or if you want to add a column to your object:
>>> obs.apply(['V', 'V', 'R', 'i'], name='filter')
>>> obs
<QTable length=4>
ra dec t filter
deg deg
float64 float64 Time str32
--------- --------- ------------- ------
10.323423 -12.42123 2451523.6234 V
10.333453 -12.41562 2451523.7345 V
10.343452 -12.40435 2451523.8525 R
10.25546 -12.3946 2451523.94653 i
The same result can be achieved using the following syntax:
>>> obs['filter2'] = ['V', 'V', 'R', 'i']
>>> obs
<QTable length=4>
ra dec t filter filter2
deg deg
float64 float64 Time str32 str1
--------- --------- ------------- ------ -------
10.323423 -12.42123 2451523.6234 V V
10.333453 -12.41562 2451523.7345 V V
10.343452 -12.40435 2451523.8525 R R
10.25546 -12.3946 2451523.94653 i i
Similarly, existing columns can be modified using:
>>> obs['filter'] = ['g', 'i', 'R', 'V']
If you want to stack two observations into a single object:
>>> ra = [20.223423, 20.233453, 20.243452] * u.deg
>>> dec = [12.42123, 12.41562, 12.40435] * u.deg
>>> phase = [10.1, 12.3, 15.6] * u.deg
>>> epoch = Time(2451623.5 + array([0.1234, 0.2345, 0.3525]), format='jd')
>>> obs2 = Obs.from_columns([ra, dec, epoch, phase],
... names=['ra', 'dec', 't', 'phase'])
>>>
>>> obs.vstack(obs2)
>>> obs
<QTable length=7>
ra dec t filter filter2 phase
deg deg deg
float64 float64 Time str1 str1 float64
--------- --------- ------------- ------ ------- -------
10.323423 -12.42123 2451523.6234 g V ———
10.333453 -12.41562 2451523.7345 i V ———
10.343452 -12.40435 2451523.8525 R R ———
10.25546 -12.3946 2451523.94653 V i ———
20.223423 12.42123 2451623.6234 -- -- 10.1
20.233453 12.41562 2451623.7345 -- -- 12.3
20.243452 12.40435 2451623.8525 -- -- 15.6
Note that the data table to be stacked doesn’t have to have the same
columns as the original data table. A keyword join_type
is used to
decide how to process the different sets of columns. See
vstack()
for more detail.
Because the underlying QTable
can be exposed by the
table
property, it is possible to modify the data
table by directly accessing the underlying QTable
object.
However, this is not generally advised. You should use the mechanisms provided
by DataClass
to manipulate the data table as much as possible
to maintain the integrity of the data table.
Additional Data Container Concepts¶
Alternative field names¶
If you ask 3 different planetary astronomers which field name or
variable name they use for the orbital inclination, you will receive 3
different answers. Good candidates might be 'i'
, 'inc'
, or
'incl'
- it’s a matter of personal taste. The sbpy
developers
are aware of this ambiguity and hence DataClass
provides
some flexibility in the use of field names. This functionality is
established through a list of acceptable field names that are
recognized by sbpy
, which is provided in the
Data Container Field Name Reference.
As an example, if your Orbit
object has a column named
'incl'
but you try to get column 'i'
, the object will
internally check if 'i'
is a legitimate field name and what its
alternatives are, and it will find that a field name 'incl'
exists
in the object. The corresponding 'incl'
column is then
returned. If you try to get a field name that is not connected to any
existing field name, a KeyError
will be raised.
>>> from sbpy.data import Orbit
>>> orb = Orbit.from_dict({'incl': [1, 2, 3]*u.deg})
>>> orb['i']
<Quantity [1., 2., 3.] deg>
The definition of alternative field names is done in the file
sbpy/data/__init__.py
, using the list fieldnames
. This list is
automatically tested for potential naming conflicts, i.e., different
properties that share the same alternative field names, and a
human-readable list is compiled upon building sbpy
.
The full list of field names is available here: Data Container Field Name Reference.
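The lookup described in this section can be sketched in a few lines of plain Python. This is not the sbpy implementation: the alias groups and the function name resolve_field are hypothetical and only illustrate the idea of checking a requested name against its alternatives.

```python
# Hypothetical alias groups; the list used by sbpy is much longer.
ALIAS_GROUPS = [
    ["i", "inc", "incl"],
    ["ra", "RA"],
    ["dec", "DEC", "Dec"],
]


def resolve_field(requested, available):
    """Return the name in `available` matching `requested` or one of its aliases."""
    if requested in available:
        return requested
    for group in ALIAS_GROUPS:
        if requested in group:
            # Look for any alternative name that exists in the data.
            for alias in group:
                if alias in available:
                    return alias
    raise KeyError(f"field {requested!r} not available")


columns = {"incl": [1.0, 2.0, 3.0], "ra": [10.2, 10.3, 10.4]}
print(resolve_field("i", columns))
print(resolve_field("RA", columns))
```

Requesting a name with no matching alternative in the data, such as 'dec' here, raises a KeyError.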
Field conversions¶
Some parameters and properties can be derived directly from one another; a
good example is an object’s radius and diameter. sbpy
acknowledges identities like this by providing internal conversions
for such properties. Consider the following example:
>>> from sbpy.data import Phys
>>> import astropy.units as u
>>> data = Phys.from_dict({'d': 10*u.km})
>>> print('{:.1f}'.format(data['d'][0]))
10.0 km
>>> print('{:.1f}'.format(data['radius'][0]))
5.0 km
Note that the radius is not explicitly defined in data
, but
derived internally upon querying it and added to the internal data table:
>>> print(data.field_names)
['d', 'radius']
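The behavior shown above, where a derived field is computed on first access and then stored, can be sketched with a plain dictionary. The conversion table and the function get_field are hypothetical names for illustration, not part of sbpy.

```python
# Hypothetical conversion rules: target field -> (source field, function).
CONVERSIONS = {
    "radius": ("d", lambda d: d / 2),
    "d": ("radius", lambda r: r * 2),
}


def get_field(table, name):
    """Return a field, deriving and storing it if a conversion rule applies."""
    if name in table:
        return table[name]
    if name in CONVERSIONS:
        source, func = CONVERSIONS[name]
        if source in table:
            # Derive the field and keep it for later queries.
            table[name] = [func(value) for value in table[source]]
            return table[name]
    raise KeyError(name)


data = {"d": [10.0]}  # diameter in km
radius = get_field(data, "radius")
print(radius)
print(list(data))
```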
Epochs and the use of astropy.time¶
Epochs and data referring to specific points in time have to be
provided as Time
objects. The advantage of these
objects is their flexibility in terms of format and time
scale. Time
objects can be readily transformed into a
wide range of formats; for instance, Time('2019-07-23 10:49').jd
can be used to convert an ISO epoch to a Julian date.
More importantly, Time
provides functionality to
transform epochs between different time scales. Hence, every
Time
object comes with a time scale (UTC is used
by default) and can be easily transformed into a different time
scale. The following example defines an epoch as a Julian date in
UTC and transforms it to TDB:
>>> from astropy.time import Time
>>> epoch = Time(2451200, format='jd')
>>> epoch
<Time object: scale='utc' format='jd' value=2451200.0>
>>> epoch.tdb
<Time object: scale='tdb' format='jd' value=2451200.000742876>
>>> epoch.tdb.iso
'1999-01-21 12:01:04.184'
Using Time
in DataClass
objects is
straightforward. The following example builds a simple
Obs
object from a dictionary:
>>> from sbpy.data import Obs
>>> times = ['2018-10-01', '2018-11-01', '2018-12-01']
>>> obs = Obs.from_dict({'epoch': Time(times), 'mag': [10, 12, 14]*u.mag})
>>> obs
<QTable length=3>
epoch mag
mag
Time float64
----------------------- -------
2018-10-01 00:00:00.000 10.0
2018-11-01 00:00:00.000 12.0
2018-12-01 00:00:00.000 14.0
The 'epoch'
column in obs
can be used like any other field or
Time
object. The following example converts the epoch
to TAI and Julian date:
>>> obs['epoch'].tai.jd
array([2458392.50042824, 2458423.50042824, 2458453.50042824])
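The offset in these values can be checked by hand: TAI has been ahead of UTC by 37 s since 2017-01-01, so each Julian date shifts by 37/86400 days. The sketch below repeats that arithmetic in plain Python, with the UTC Julian dates of the three epochs entered by hand.

```python
# TAI - UTC has been 37 s since 2017-01-01, which covers all three epochs.
TAI_MINUS_UTC_S = 37.0
SECONDS_PER_DAY = 86400.0

# 2018-10-01, 2018-11-01, and 2018-12-01 at 00:00 UTC as Julian dates.
jd_utc = [2458392.5, 2458423.5, 2458453.5]

# Shift each epoch by the constant offset, expressed in days.
jd_tai = [jd + TAI_MINUS_UTC_S / SECONDS_PER_DAY for jd in jd_utc]

for value in jd_tai:
    print(f"{value:.8f}")
```

This shortcut only works for scales that differ by a fixed offset; conversions such as UTC to TDB involve additional terms and are best left to Time.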
Note that different functions in sbpy
have different requirements on
the time scale of Time
objects. Fortunately,
Time
objects are able to convert most time scales
seamlessly. However, this means that a user-defined time scale may be
converted internally to another time scale for compatibility
reasons, and that output epochs usually
follow this forced time scale. In order to notify the user that the
time scale has been changed, a TimeScaleWarning
will be
issued.
Writing object data to a file¶
DataClass
objects can be written to files using
to_file
:
>>> obs.to_file('observations.dat')
By default, the data are written in ASCII format, but other formats
are available, too (list of file formats). Please
note that not all file formats support units and meta data. For
instance, basic
, csv
, html
, and latex
do not provide
unit or meta data information. However, fits
, cds
,
daophot
, ecsv
, and ipac
do support units and meta data.