SeisIO: A Fast, Efficient Geophysical Data Architecture for the Julia Language

Joshua P. Jones1^1, Kurama Okubo2^2, Tim Clements2^2and Marine A. Denolle2^2

1^1Portland, Oregon, USA

2^2Department of Earth and Planetary Sciences, Harvard University, Cambridge, MA, USA

This is a tutorial to accompany the article published in Seismological Research Letters doi: https://doi.org/10.1785/0220190295

Abstract

SeisIO for the Julia language is a new geophysical data framework that combines the intuitive syntax of a high‐level language with performance comparable to FORTRAN or C. Benchmark comparisons against recent versions of popular programs for seismic data download and analysis demonstrate significant improvements in file read speed and orders‐of‐magnitude improvements in memory overhead. Because the Julia language natively supports parallel computing with an intuitive syntax, we benchmark test parallel download and processing of multiweek segments of contiguous data from two sets of 10 broadband seismic stations, and find that SeisIO outperforms two popular Python‐based tools for data downloads. The current capabilities of SeisIO include file read support for several geophysical data formats, online data access using a variety of services, and optimized versions of several common data processing operations. Tutorial notebooks and extensive documentation are available to improve the user experience. As an accessible example of performant scientific computing for the next generation of researchers, SeisIO offers ease of use and rapid learning without sacrificing computational efficiency.

Installation

SeisIO.jl is a registered Julia package and can be installed using the Julia package manager.

using Pkg
Pkg.add("SeisIO")
2.6s

Getting Started

Start SeisIO by loading the package into memory:

using SeisIO 
0.2s

You'll need to do this at the start of every Julia session. SeisIO uses an array-like structure called SeisChannel for single-channel seismic data.

C = SeisChannel()
2.1s
SeisChannel with 0 samples ID: NAME: LOC: 0.0 N, 0.0 E, 0.0 m FS: 0.0 GAIN: 1.0 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: MISC: 0 entries NOTES: 0 entries T: X: (empty)

Access fields by name to view their contents directly:

C.x
0.8s
0-element Array{Float32,1}

You can also overwrite them by name:

C.x = rand(Float32,1024)
0.3s
1024-element Array{Float32,1}: 0.137569 0.07175 0.5004 0.815085 0.477524 0.84512 0.245006 0.798862 0.519067 0.797156 ⋮ 0.202904 0.877695 0.151946 0.89937 0.220425 0.704212 0.350717 0.498756 0.0601382
C.loc = GeoLoc(lat=-90.,lon=-0.0, el=9300.)
0.4s
-90.0 N, -0.0 E, 9300.0 m
C
0.2s
SeisChannel with 1024 samples ID: NAME: LOC: -90.0 N, -0.0 E, 9300.0 m FS: 0.0 GAIN: 1.0 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: MISC: 0 entries NOTES: 0 entries T: X: +1.376e-01 +7.175e-02 ... +6.014e-02 (nx = 1024)

The SeisData structure is the basic container that SeisIO uses. These objects behave like arrays of SeisChannel objects: they store multiple channels of data, but they're not matrices (more like arrays of arrays). You can initialize SeisData structures as empty containers for 0 or more channels:

S = SeisData(6)
1.5s

Let's create a random example for tutorial purposes.

using SeisIO.RandSeis
S = randSeisData(6)
8.6s

SeisData fields are accessed by name and index.

x1 = S.x[1]
0.2s
333420-element Array{Float32,1}: -0.0105186 -0.863169 2.49564 0.163625 0.0840347 0.456468 1.54621 0.899303 0.255382 0.19731 ⋮ 0.0327795 1.18892 0.00487551 -0.627559 0.862498 -1.53446 -2.26515 1.09544 0.83175

If you access a SeisData structure by a single integer index, the output is a SeisChannel:

C_new = S[1]
0.2s
SeisChannel with 333420 samples ID: AU.NNIDZ.I8.SAN NAME: Lq7kJItjdOGTz7Y13dQo85ayt3dTGG4TWKeLkGQZ LOC: [-86.7881, 148.786, 919.623, 37.31, 118.166, -66.2316] FS: 60.0 GAIN: 1.42901e8 RESP: a0 1.0, f0 1.0, 8z, 8p UNITS: rad SRC: randSeisChannel(c=false, nx=0) MISC: 4 entries NOTES: 1 entries T: 2020-04-29T01:37:42.083 (2 gaps) X: -1.052e-02 -8.632e-01 ... +8.318e-01 (nx = 333420)

If you access a SeisData structure by a range of indices, the output is a SeisData structure whose channel indices match your request:

S2 = S[1:3]
1.0s

These commands access identical information, but the first is faster:

x1 = S.x[1]        # get first index of data field :x
x2 = S[1].x        # extract S[1] to a SeisChannel, then get its data field :x
0.2s
333420-element Array{Float32,1}: -0.0105186 -0.863169 2.49564 0.163625 0.0840347 0.456468 1.54621 0.899303 0.255382 0.19731 ⋮ 0.0327795 1.18892 0.00487551 -0.627559 0.862498 -1.53446 -2.26515 1.09544 0.83175
x1 == x2           # true unless there are NaNs
0.4s
true

Working with Structures

A collection of SeisChannel objects becomes a SeisData structure:

S = SeisData(randSeisChannel(), randSeisChannel())
0.5s

You can push channels onto existing SeisData structures:

push!(S, C_new)
S
0.6s

SeisData structures can be concatenated:

append!(S, randSeisData(4))
S
0.5s

The addition operator also does this:

S += randSeisChannel()
S
0.5s

Thus, this command works: (note: please don't actually code like this)

S = SeisData(
        randSeisData(5), 
        randSeisChannel(), 
        SeisChannel(
            id="UW.SEP..EHZ", 
            name="Darth Exploded", 
            loc=GeoLoc(lat=46.1967, lon=-122.1875, el=1440.0), 
            t=[1 1569374150379000; 1024 0], 
            x=rand(1024)
            )
        )
0.9s

Warning: Combining structures calls prune! to remove empty channels, defined as channels with no data (empty :x); this is one of two operations\ needed to guarantee commutativity (i.e., for any SeisData structures S1, S2,\ S1 + S2 == S2 + S1). Thus, in the (heinous) creation syntax below, the command SeisData(3), which should add three empty channels, does nothing; prune! deletes the new channels. (To actually append empty channels, use append!)

S1 = SeisData(
        randSeisData(5), 
        randSeisChannel(), 
        SeisChannel(
            id="UW.SEP..EHZ", 
            name="Darth Exploded", 
            loc=GeoLoc(lat=46.1967, lon=-122.1875, el=1440.0), 
            t=[1 1569374150379000; 1024 0], 
            x=rand(1024)
            )
        ) + SeisData(3)
0.5s

Navigating Structures

There are two easy ways to find channels of interest in a data structure. The command findid returns the first channel number that matches the ID supplied:

i = findid("UW.SEP..EHZ", S)
0.2s
7

If you want partial matches, use the function findchan:

findchan("EHZ", S)
0.8s
1-element Array{Int64,1}: 7

Variant Structures

SeisData is a subtype of GphysData. The Abstract Type GphysData includes variant data structures, all of which have the same basic fields (:id, :x, etc). Other data structures will be added to GphysData as modules develop, but they will always contain the same basic fields. For example, in the Quake submodule, the EventTraceData structure has additional fields for location parameters like azimuth (:az) and has a custom Type for a phase catalog (:pha). You can convert between GphysData subtypes with convert, but extraneous fields will be lost, and fields that are not contained in the source Type will be initialized to default values:

using SeisIO.Quake
S_ev = randSeisData(6)
Tr = convert(EventTraceData, S_ev)
Tr
1.1s

Let's add some phases to see how conversion works:

Tr.pha[1]["P"] = SeisPha(amp = 1.0e6, d = 40075.0, tt = 1344.0, unc = 1.5)
Tr.pha[2] = randPhaseCat()
S_ev2 = convert(SeisData, Tr)
S_ev == S_ev2                    # true unless S_ev.x contains NaNs
1.7s
true

...but since the phase catalog wasn't retained by the SeisData object S_ev2,

Tr2 = convert(EventTraceData, S_ev2)
Tr == Tr2
0.3s
false

File I/O with SeisIO

Here we'll explain how to read and download data.

Before we do that, we'll need to grab some data from the SeisIO github page using svn

sudo apt-get update 
sudo apt-get install subversion
svn export https://github.com/jpjones76/SeisIO-TestData/trunk/Tutorial/
5.9s
Bash in Julia

Loading Entire Files

read_data reads one or more entire files into memory. Use this with legacy data formats like SAC, SEG Y, and mini-SEED:

?read_data
2.0s

If you know the file format that you're trying to read, pass it as the first argument to read_data in lowercase:

S = read_data("mseed", "Tutorial/2018.224.00.00.00.000.TA.C26K..BHZ.R.mseed")
0.2s
SeisData with 1 channels (1 shown) ID: TA.C26K..BHZ NAME: TA.C26K..BHZ LOC: 0.0 N, 0.0 E, 0.0 m FS: 40.0 GAIN: 1.0 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: MISC: 0 entries NOTES: 0 entries T: 2018-08-12T00:00:00.000 (0 gaps) X: +1.113e+03 +1.112e+03 ... +9.769e+03 (nx = 3456001) C: 0 open, 0 total
S = read_data("sac", "Tutorial/2018.224.00.00.00.000.TA.C26K..BHZ.R.SAC")
0.1s
SeisData with 1 channels (1 shown) ID: TA.C26K5..BHZ NAME: LOC: 69.9175 N, -144.912 E, 139.0 m FS: 40.0 GAIN: 6.28316e8 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: Tutorial/2018.224.00.00.00.000.TA… MISC: 0 entries NOTES: 1 entries T: 2018-08-12T00:00:00.000 (0 gaps) X: +1.113e+03 +1.112e+03 ... +9.769e+03 (nx = 3456001) C: 0 open, 0 total

f you don't know a file's format, read_data calls a (somewhat slower) function called guessthat can usually identify it:

S2 = read_data("Tutorial/2018.224.00.00.00.000.TA.C26K..BHZ.R.SAC")
1.6s
SeisData with 1 channels (1 shown) ID: TA.C26K5..BHZ NAME: LOC: 69.9175 N, -144.912 E, 139.0 m FS: 40.0 GAIN: 6.28316e8 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: Tutorial/2018.224.00.00.00.000.TA… MISC: 0 entries NOTES: 1 entries T: 2018-08-12T00:00:00.000 (0 gaps) X: +1.113e+03 +1.112e+03 ... +9.769e+03 (nx = 3456001) C: 0 open, 0 total
S == S2
0.2s
true

Reading from large volumes

Modern data formats create large volumes on disk, from which data are read in user-specified time windows. SeisIO currently supports reading from ASDF files through the function read_hdf5:

?read_hdf5
0.6s
hdf = "Tutorial/2days-40hz.h5"
id  = "CI.SDD..HHZ"
ts  = "2019-07-07T23:00:00"
te  = "2019-07-08T01:59:59.975"
S = read_hdf5(hdf, ts, te, id = id)
4.3s
SeisData with 1 channels (1 shown) ID: CI.SDD..HHZ NAME: Saddleback LOC: 33.5526 N, -117.662 E, 120.0 m FS: 40.0 GAIN: 6.2652e8 RESP: MultiStageResp (4 stages) with fi… UNITS: m/s SRC: /Tutorial/2days-40hz.h5 MISC: 4 entries NOTES: 2 entries T: 2019-07-07T23:00:00.000 (0 gaps) X: +5.957e+02 +6.266e+02 ... +1.651e+02 (nx = 432000) C: 0 open, 0 total

FDSN-style wildcards can be used for the ID string.

idr = "C*.SDD..HH?"
S1 = read_hdf5(hdf, ts, te, id = id)
0.3s
SeisData with 1 channels (1 shown) ID: CI.SDD..HHZ NAME: Saddleback LOC: 33.5526 N, -117.662 E, 120.0 m FS: 40.0 GAIN: 6.2652e8 RESP: MultiStageResp (4 stages) with fi… UNITS: m/s SRC: /Tutorial/2days-40hz.h5 MISC: 4 entries NOTES: 2 entries T: 2019-07-07T23:00:00.000 (0 gaps) X: +5.957e+02 +6.266e+02 ... +1.651e+02 (nx = 432000) C: 0 open, 0 total
S == S1
0.4s
true

If the id keyword isn't used, then all channels with data matching the time query are read into memory.

S2 = read_hdf5(hdf, ts, te)
0.2s
SeisData with 1 channels (1 shown) ID: CI.SDD..HHZ NAME: Saddleback LOC: 33.5526 N, -117.662 E, 120.0 m FS: 40.0 GAIN: 6.2652e8 RESP: MultiStageResp (4 stages) with fi… UNITS: m/s SRC: /Tutorial/2days-40hz.h5 MISC: 4 entries NOTES: 2 entries T: 2019-07-07T23:00:00.000 (0 gaps) X: +5.957e+02 +6.266e+02 ... +1.651e+02 (nx = 432000) C: 0 open, 0 total

Contents of an HDF5 volume can be scanned at the "station" (default) or "trace" level with scan_hdf5:

scan_hdf5(hdf)
0.9s
1-element Array{String,1}: "CI.SDD"
scan_hdf5(hdf, level="channel")
0.2s
2-element Array{String,1}: "/Waveforms/CI.SDD/CI.SDD..HHZ" "/Waveforms/CI.SDD/StationXML"

File and Format Information

Information on files and formats can be found in a number of places, including the command-line interface.

guess("Tutorial/2018.224.00.00.00.000.TA.C26K..BHZ.R.SAC")
0.6s
("sac", false)
ls Tutorial
0.7s
Bash in Julia
fname = "Tutorial/99011116541W"
g = guess("Tutorial/99011116541W")
0.2s
("uw", true)
SeisIO.formats[g[1]] # what are we looking at...?
0.6s
empty
S2 = read_data(g[1], fname)
0.6s

Saving Data

SeisData and SeisChannel structures can be written to ASDF, SAC, or to SeisIO's native format. ASDF format is portable (most languages can read HDF5) and well- suited to large data sets. SAC has the advantage that it's almost universally readable, but only writes 32-bit float data, and each contiguous data segment creates one file. SeisIO format saves complete information and does low-level object writing with a file index; it's the fastest of the three writers.

Writing to ASDF files

ASDF is an HDF5 file format designed for seismic data. Write to ASDF with the\ command write_hdf5.

?write_hdf5
0.4s
write_hdf5( hdf_out::String, S::GphysData )

Write data in a seismic HDF5 format to file hdf_out from structure S.

See also: read_hdf5

The simplest ASDF write creates a new file:

hdf_out = "test.h5"
write_hdf5(hdf_out, S)
1.7s

Writing to an existing ASDF volume adds the data to the Waveforms/ group, creating new Waveforms/(net.sta)/ paths as needed. Channel information is written to Waveforms/(net.sta)/StationXML:

write_hdf5(hdf_out, S1)
0.2s

Writing to SAC files

Writing to SAC creates one file for each contiguous data segment in each data channel. Although large collections of small files have been an obsolete data archival strategy since the early 2000s, SeisIO supports writing to SAC because the data format is almost universal.

writesac(S)                         # filenames are auto-generated. no need to specify.
writesacpz(S, "req_1.pz")           # in case you need instrument responses later.
2.6s

Two useful keywords control the creation of SAC files:

  • fname=FF uses filename FF, rather than creating file names automatically. This keywork only works with GphysChannel objects.

  • xy=true writes generic x-y data to file with time as the independent variable.

writesac(S[1], fname="tmp.SAC")     # writesac also works with SeisChannel objects.
0.4s

Writing to SeisIO files

The SeisIO low-level file format is comprehensive and fast. A SeisIO file can contain multiple structures, and structures written to file can be any Type in SeisIO (including, for example, just an instrument response). Information in structures is fully preserved except for exotic data Types stored in the :misc field. To read from a SeisIO file, you'll need to specify one or more object numbers, or the output will be of Type Array{Any,1}:

wseis("req_1.seis", S)
S2 = rseis("req_1.seis")[1]
S == S2
1.3s
true

Data Requests

Here, we'll learn to download data using SeisIO.

get_data is the wrapper to online time-series data requests. You can use it with FDSN dataselect and IRIS timeseries functions.

?get_data
1.0s

Let's try an example.

First, we'll get the current local time.

using Dates 
ds = Dates.now(); ds -= (Day(1) + Millisecond(ds) + Second(ds))
s = string(ds)
0.7s
"2020-04-29T03:03:00"

Now, let's use that to request some data. From the help text, the keywords s= and t= accept Strings, DateTime objects, and numbers. So let's start at s, as defined above, and end at t=600, or 10 minutes later.

cha_str = "UW.MBW..EHZ, UW.SHW..EHZ, UW.HSR..EHZ, UW.TDH..EHZ, CC.PALM..EH?" 
S = get_data("FDSN", cha_str, src="IRIS", s=s, t=600)
19.8s

What each positional argument does

  • "FDSN" tells get_data to use the FDSN dataselect service for our request

  • The long string gives the data channels requested as a comma-separated list

What each keyword does

  • s=s sets the start time to s, the string created in the cell above.

  • t=600 sets the termination (end) time to 600 seconds after s.

  • src="IRIS" tells get_data to check the IRIS FDSN dataselect server.

Note: the last keyword isn't equivalent to setting the first positional argument to "IRIS", because IRIS runs FDSN dataselect and its own timeseries request service (plus many others).

Any sign of TDH?

findid("UW.TDH..EHZ", S)
0.3s
4

Where can we look for data? What servers are available?

?seis_www
0.6s

I bet that CalTech is happy to handle a random download request!

S2 = get_data("FDSN", "CI.SDD..BHZ", src="SCEDC", s=s, t=600, fmt="mseed", msr=true, w=true, demean=true, rr=true)
4.3s
SeisData with 1 channels (1 shown) ID: CI.SDD..BHZ NAME: Saddleback LOC: 33.5526 N, -117.662 E, 120.0 m FS: 40.0 GAIN: 6.2652e8 RESP: MultiStageResp (4 stages) with fi… UNITS: m/s SRC: http://service.scedc.caltech.edu/… MISC: 4 entries NOTES: 2 entries T: 2020-04-29T03:02:59.994 (0 gaps) X: +3.517e+02 +2.694e+02 ... +6.756e+01 (nx = 24001) C: 0 open, 0 total

What the new keywords do:

  • src="SCEDC" tells get_data to use the SCEDC FDSN servers.

  • fmt="mseed" specifies the data format for the download. (Note: mseed is actually the default, but including this keyword is useful for tutorial purposes.)

  • w=true write the download directly to disk, byte for byte, before any parsing happens. The file extension is always ".fmt". The entire request is saved even if a parsing error happens -- which is rare, but possible with SEED. (Some Blockettes and data decoders are so rare that we've literally never seen them)

  • demean=true removes the mean of each channel after downloading.

  • rr=true removes the instrument response, flattening to DC.

  • msr=true uses the multi-stage instrument response. Most users don't need that much detail, so msr defaults to false.

# Example of single-stage response
S.resp[1]
0.4s
a0 2.49365e8, f0 1.0, 2z, 10p
# Example of multi-stage response
S2.resp[1]
0.4s
MultiStageResp (4 stages) with fields: 4-stage MultiStageResp

Check logs early and often

All file I/O and data processing operations are logged to the :notes \ fields of SeisIO data structures. For example:

S2.notes[1]
0.3s
2-element Array{String,1}: "2020-04-30T03:07:00.223: demean!" "2020-04-30T03:07:00.935: translate_resp!, wl = 1.1920929e-7, resp_old = PZResp64(a0=5.73035e8, f0=0.03, p=Complex{Float64}[-1131.0 + 0.0im, -1005.0 + 0.0im, -502.65 + 0.0im, -0.037008 - 0.037008im, -0.037008 + 0.037008im], z=Complex{Float64}[0.0 + 0.0im, 0.0 + 0.0im])"

Saving requests

Remember, from above: data requests can be written directly to disk with keyword w=true. This writes raw output to file, even if data parsing somehow fails.

In addition, SeisData and SeisChannel structures can be written to ASDF, SAC, or to SeisIO's native format, as we saw above.

wseis("req_1.seis", S)
0.3s

Data request syntax is always the same

NN.SSSSS.LL.CC (net.sta.loc.cha, separated by periods) is the expected syntax for all web functions. The maximum field width in characters corresponds to the length of each field (e.g. 2 for network). Fields can’t contain whitespace.

Data requests in SeisIO all use this syntax, even though IRIS timeseries, FDSN dataselect, and SeedLink format strings differently. Request strings are converted to the appropriate syntax for the request protocol.

# these are identical requests
channels = "UW.KMO.., IU.COR.00.BHZ, CC.LON..BH?"                          # single String
channels = ["UW.KMO..", "IU.COR.00.BHZ", "CC.LON..BH?"]                    # Vector{String}
channels = ["UW" "KMO" "" ""; "IU" "COR" "00" "BHZ"; "CC" "LON" "" "BH?"]  # Matrix{String}
1.2s
3×4 Array{String,2}: "UW" "KMO" "" "" "IU" "COR" "00" "BHZ" "CC" "LON" "" "BH?"
?chanspec # where to find help on channel specification
0.7s

See also: https://seisio.readthedocs.io/en/latest/src/Appendices/web_syntax.html

Multiple data requests to one structure

\ Web requests to a structure S always create a new channel in S for each channel in each request, even if a channel with the same ID exists. This is necessary to prevent TCP congestion. \

This is different from multiple file reads to one structure; file reads \ always attempt to append channels with matching IDs.

You can "flatten" structures with redundant channels by calling merge!. To see how this works, let's append a new data request to our first one:

get_data!(S, "FDSN", cha_str, src="IRIS", s=ds+Minute(10), t=600) 
S.id
4.4s
8-element Array{String,1}: "UW.HSR..EHZ" "UW.MBW.01.EHZ" "UW.SHW..EHZ" "UW.TDH..EHZ" "UW.HSR..EHZ" "UW.MBW.01.EHZ" "UW.SHW..EHZ" "UW.TDH..EHZ"

With two sets of requests to the same channels, each channel ID should appear twice. Let's clean this up.

merge!(S)
2.6s

Check the results:

S.id
0.2s
4-element Array{String,1}: "UW.HSR..EHZ" "UW.MBW.01.EHZ" "UW.SHW..EHZ" "UW.TDH..EHZ"

Saving your work

All data from this tutorial can be written to file using commands from the File IO tutorial. To review:

wseis("fname.seis", S) writes S to low-level SeisIO native format file fname.seis.

writesac(S) writes S to SAC files with auto-generated names.

write_hdf5("fname.h5", S) writes S to ASDF (HDF5) file fname.h5.

Additional Examples

The examples below are also found in https://seisio.readthedocs.io/en/latest/src/Appendices/examples.html

FDSN get_data

Request the last 600 seconds of data from the IRIS FDSNWS server \ for channels CC.PALM, UW.HOOD, CC.TIMB, CC.HIYU, UW.TDH

S_fdsn = get_data("FDSN", "CC.PALM, UW.HOOD, CC.TIMB, CC.HIYU, UW.TDH", src="IRIS", s=-600, t=0)
5.6s
S.id
0.3s
4-element Array{String,1}: "UW.HSR..EHZ" "UW.MBW.01.EHZ" "UW.SHW..EHZ" "UW.TDH..EHZ"
IRIS get_data
S_iris = get_data("IRIS", ["CC.TIMB..BHE", "CC.TIMB..BHN", "CC.TIMB..BHZ", "UW.HOOD..HHE", "UW.HOOD..HHN", "UW.HOOD..HHZ"], s=-3600, t=-1800)
15.3s
FDSNevt

Request waveform data for the Tohoku-Oki great earthquake, recorded by some borehole strain meters and seismometers in WA (USA), from IRIS (USA). This function is part of the Quake submodule.

using SeisIO.Quake
S_evt = FDSNevt("201103110547", "PB.B004..EH?,PB.B004..BS?,PB.B001..BS?,PB.B001..EH?")
20.3s
Event 3279407: SeisEvent with 14 channels (.hdr) ID: 3279407 INT: 0 LOC: 38.2963 N, 142.498 E, 19.7 km MAG: MW 9.1 (g 0.0°, n 0) OT: 2011-03-11T05:46:23.2 SRC: TYP: earthquake MISC: 0 items NOTES: 0 entries (.source) m₀ = 0.0; S = Float64[]; NP = Array{Float64}(undef,0,0); PAX = Array{Float64}… (.data) EventTraceData with 14 channels

Processing

Now, we'll explain how to process data. Let's start with the data we acquired at the end of the last tutorial:

S = (
        if isfile("req_1.seis")
            rseis("req_1.seis")[1]
        else
            ds = Dates.now(); ds -= (Day(1) + Millisecond(ds) + Second(ds))
            s = string(ds)
            get_data("FDSN", "UW.MBW..EHZ, UW.SHW..EHZ, UW.HSR..EHZ, UW.TDH..EHZ, CC.PALM..EH?", src="IRIS", s=s, t=600) 
        end
    )
if isfile("req_1.seis") == false
    wseis("req_1.seis", S)
end
0.2s

List of Data Processing Functions

  • convert_seis: differentiate and integrate seismograms

  • demean: remove mean

  • detrend: remove trend line

  • env: compute envelope

  • filtfilt: zero-phase filter

  • nanfill: fill NaNs

  • resample: Data resampling

  • sync: time-synchronize a SeisData structure

  • merge: merge data

  • ungap: remove gaps in channels

  • unscale: divide out :gain from all trace data

"In-Place" vs. "Safe" Operations

Processing functions each have two versions: an "in-place" variant that ends with an exclamation mark ("!"), and a "safe" version that stores the result in a new data structure. The in-place variant overwrites data in place. This generally isn't reversible. Here's a simple set of processing operations, common for seismic data:

U = deepcopy(S) # back up, just ... in... case...
0.2s
detrend!(S)
ungap!(S)
resample!(S, fs=50.0)
filtfilt!(S, rt="Lowpass", fh=10.0)
S.notes[1]
0.2s
3-element Array{String,1}: "2020-04-30T03:31:13.883: detrend!, n = 1, polyfit result = Float32[0.0012666576, 23.360363]" "2020-04-30T03:31:13.888: resample!, fs=50.0" "2020-04-30T03:31:13.899: filtfilt!, fl = 1.0, fh = 10.0, np = 4, rp = 8, rs = 30, rt = Lowpass, dm = Butterworth"

The filter uses a mix of default values and custom keyword parameters; the result applies a 4-pole Butterworth filter to each (regularly-sampled) channel of S in place.

Instrument Response

Translating instrument responses can be tricky. Let's work an example of that next. First, we create a target instrument response. Since the data in S are mostly short-period geophones, let's keep it reasonable -- a 10s rolloff frequency.

?fctoresp
0.4s
fctoresp(f)
fctoresp(f, c)

Create PZResp or PZResp64 instrument response from lower corner frequency f and damping constant c. If no damping constant is supplies, assumes c = 1/sqrt(2).

See Also

PZResp, PZResp64

resp_new = fctoresp(0.1)
0.5s
a0 1.0, f0 0.1, 1z, 2p

Now let's update the sensitivity, so that the translated instrument responses have meaningful velocity amplitudes.

resp_a0!(resp_new)
resp_new
0.4s
a0 0.888577, f0 0.1, 1z, 2p
?translate_resp
0.7s

Update the first three channels of S to the new response:

translate_resp!(S, resp_new, chans=1:3)
0.7s

Remove the instrument response from the rest completely, flattening to DC:

Sf = remove_resp(S, chans=4:S.n)
0.8s

Synchronization

In some situations, it's useful to force all traces in a SeisData container to start (and perhaps end) at the same time(s). Do this with the command sync:

S = deepcopy(U)
0.3s
# don't worry about memorizing these commands; they're
# only for demonstration purposes.
d0 = u2d(1.0e-6*(minimum([S.t[i][1,2] for i in 1:S.n]) - 1000000))
d0 -= Millisecond(d0)
d0 += Millisecond(5)
t0 = string(d0)
t1 = string(d0+Second(300))
0.3s
"2020-04-29T03:07:59.005"

Synchronized traces can still have trace lengths that differ by a sample if the start and end times align exactly with one trace's sample intervals but not another's; that's why we added 5 ms to t0 above.

sync!(S, s=t0, t=t1)
S
0.9s
[length(i) for i in S.x]
0.2s
4-element Array{Int64,1}: 30000 30000 30000 30000

...are they all equal? They should be within two samples.

For more help

Please consult the Processing chapter of the official documentation: https://seisio.readthedocs.io/en/latest/src/Processing/processing.html

Shift+Enter to run
Runtimes (1)