SeisNoise.jl GPU Computing Tutorial

Abstract We introduce SeisNoise.jl, a library for high-performance ambient seismic noise cross-correlation, written entirely in the computing language Julia. Julia is a new language, with syntax and a learning curve similar to MATLAB, R, or Python and performance close to Fortran or C. SeisNoise.jl is compatible with high-performance computing resources using both the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU). SeisNoise.jl is a modular toolbox, giving researchers common tools and data structures to design custom ambient seismic cross-correlation workflows in Julia.

Check out the Paper in SRL!

SeisNoise.jl: Ambient Seismic Noise Cross-Correlation on the CPU and GPU in Julia

Tim Clementsinline_formula not implemented, and Marine A. Denolleinline_formula not implemented

inline_formula not implementedDepartment of Earth and Planetary Sciences, Harvard University, Cambridge, MA

This is a tutorial to accompany paper in Seismological Research Letters

Why Julia?

Julia is fast!

Julia was designed from the beginning for high performance. Julia programs compile to efficient native code for multiple platforms via LLVM.

Dynamic

Julia is dynamically typed, feels like a scripting language, and has good support for interactive use.

Optionally typed

Julia has a rich language of descriptive datatypes, and type declarations can be used to clarify and solidify programs.

General

Julia uses multiple dispatch as a paradigm, making it easy to express many object-oriented and functional programming patterns. It provides asynchronous I/O, debugging, logging, profiling, a package manager, and more.

Easy to use

Julia has high-level syntax, making it an accessible language for programmers from any background or experience level. Browse the Julia microbenchmarks to get a feel for the language.

Open source

Julia is provided under the MIT license, free for everyone to use. All source code is publicly viewable on GitHub.

Learning Julia

We recommend the learning Julia page for resources on learning Julia.

Installation

SeisNoise.jl is a registered Julia package and can be installed using the Julia package manager. Here we'll install SeisNoise.jl and a couple other packages, such as SeisIO.jl for loading/downloading data and CUDA.jl for using the GPU.

using Pkg 
Pkg.add("SeisIO")
Pkg.add("CUDA")
Pkg.add("BenchmarkTools")
Pkg.add(PackageSpec(url="https://github.com/tclements/SeisNoise.jl", rev="master")) # add the most recent version from github
3.0s

Before we can use a package in Julia, we need to import it with the using syntax. We'll load the following modules

  • Dates : For time manipulation

  • CUDA : Interface to GPU computations

  • SeisIO : For loading/downloading/pre-processing seismic data

  • SeisNoise : Ambient Noise Cross-Correlation

The first time these modules are loaded will take some time, as they need to be pre-compiled...

using BenchmarkTools, CUDA, Dates, SeisIO, SeisNoise 
0.5s

Brief Primer on Ambient Noise Cross-Correlation

Given two seismometers, inline_formula not implemented and inline_formula not implemented, placed at some distance apart on the surface, the stations will record ground motion as a function of time: inline_formula not implemented and inline_formula not implemented. Over long periods of time, the cross-correlation of ground motions of inline_formula not implemented and inline_formula not implemented

inline_formula not implemented

yields a band-limited approximation of the elastodynamic Green's function, or impulse response, between inline_formula not implemented and inline_formula not implemented. We call inline_formula not implemented the time-domain cross-correlation function. Using the convolution theorem, we can re-write cross-correlation in the frequency domain:

inline_formula not implemented

where inline_formula not implemented is the frequency-domain cross-correlation, inline_formula not implemented is the Fourier transform of inline_formula not implemented and inline_formula not implemented is the Fourier transform of inline_formula not implemented, and inline_formula not implemented denotes the complex conjugate. We can go retrieve the time-domain cross-correlation from the frequency-domain cross-correlation using the inverse Fourier transform:

inline_formula not implemented

SeisNoise.jl computes cross-correlations using inline_formula not implemented and inline_formula not implemented rather than inline_formula not implemented due to the efficiency of the Fast Fourier Transform (FFT). We'll show in the following sections how to implement this with SeisNoise.jl.

SeisNoise Structure

Processing in SeisNoise.jl follows a flow of data through custom structures for ambient noise cross-correlation.

We use the SeisIO.jl module to load seismic data. SeisIO.jl can read data in the following formats:

  • ASDF, GeoCSV, QuakeML, SAC, SEED, SEG Y, SLIST, SUDS, StationXML, Win32

SeisIO.jl provides two types of structures for handling data:

  • SeisChannel: stores single-channel geophysical data

  • SeisData : stores multi-channel geophysical

SeisNoise.jl provides three custom structures for computing ambient noise cross-correlations:

  • RawData : stores ambient seismic noise data in short, overlapping time windows -> inline_formula not implemented and inline_formula not implemented, as above

  • FFTData : stores Fourier transforms of these time windows -> inline_formula not implemented and inline_formula not implemented, as above

  • CorrData : stores the corresponding NCFs -> inline_formula not implemented, as above

Cross-Correlation Pre-Processing

SeisNoise.jl uses SeisIO.jl for reading data and pre-processing. Here we'll use the Southern California Earthquake Data Center (SCEDC) Open Dataset on Amazon Web Services (AWS) to test some cross-correlation with SeisNoise.jl.

scedc-pds
Public

First let's load some data using the SeisIO get_data function:

S = read_data("/scedc-pds/continuous_waveforms/2019/2019_031/CISDD__BHZ___2019031.ms")
3.2s
SeisData with 1 channels (1 shown) ID: CI.SDD..BHZ NAME: CI.SDD..BHZ LOC: 0.0 N, 0.0 E, 0.0 m FS: 40.0 GAIN: 1.0 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: /scedc-pds/continuous_waveforms/2… MISC: 0 entries NOTES: 1 entries T: 2019-01-31T00:00:00 (0 gaps) X: +1.430e+02 +2.900e+01 ... -6.130e+02 (nx = 3456000) C: 0 open, 0 total

S holds metadata, processing notes, timing information and data for each channel. Below is a simple SeisIO.jl workflow to downsample S from 40 to 20 Hz:

ungap!(S) # remove gaps
detrend!(S) # remove mean and trend
taper!(S) # taper ends
resample!(S,fs=20.) # downsample from 40 to 20 [Hz]
0.2s

Functions in Julia that end in ! (e.g. detrend!(S) as above), by convention modify their arguments in-place, while functions without !’s (e.g. detrend(S)) allocate new arrays or structures.

resample(S,fs=10.) # allocates new SeisData structure 
S
0.5s
SeisData with 1 channels (1 shown) ID: CI.SDD..BHZ NAME: CI.SDD..BHZ LOC: 0.0 N, 0.0 E, 0.0 m FS: 20.0 GAIN: 1.0 RESP: a0 1.0, f0 1.0, 0z, 0p UNITS: SRC: /scedc-pds/continuous_waveforms/2… MISC: 0 entries NOTES: 6 entries T: 2019-01-31T00:00:00 (0 gaps) X: -1.157e-06 -1.140e-06 ... -1.157e-06 (nx = 1728000) C: 0 open, 0 total

Note that the call to resample(S,fs=10.) does not change S's sampling rate to 10 Hz, but rather allocates a new SeisData, leaving S unchanged.

Pre-Processing with RawData

SeisNoise.jl provides the RawData structure for pre-processing windows of ambient noise. The advantage of RawData over SeisData is data can be processed as a matrix. This is especially helpful when processing on the GPU. To create a RawData structure, we need to specify:

  • cc_len: length of each time window in seconds

  • cc_step: step between time windows in seconds

RawData structures can be created from either SeisData or SeisChannel:

cc_len = 1800 # time window length [s] = 30 minutes 
cc_step = 450 # step between time windows [s] = 7.5 minutes or 75% overlap 
R = RawData(S,cc_len,cc_step) # RawData struct
0.3s
RawData with 189 windows NAME: "CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m FS: 20.0 GAIN: 1.0 FREQMIN: 0.000555556 FREQMAX: 10.0 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: false TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 9 entries T: 2019-01-31T00:00:00 … X: 36000×189 Array{Float32,2}

Ambient noise data is accessible via the .x field, while .t gives the start time of each window in UNIX time. Here, we'll filter the RawData between 0.1 and 0.3 Hz, apply clipping and then whiten the data between 0.12 and 0.28 Hz.

freqmin,freqmax = 0.1,0.3 # min/max freqs [Hz]
detrend!(R) # detrend each window 
taper!(R) # taper 
bandpass!(R,freqmin,freqmax)
clip!(R,3) # clip data at 3x RMS of each window
whitemin, whitemax = 0.12, 0.28 # whiten min/max [Hz]
whiten!(R,whitemin,whitemax) # apply whitening
0.7s

Fourier Transforms with FFTData

Once the RawData is pre-processed, we take the Fast Fourier Transform (fft) to convert to the frequency domain. In SeisNoise.jl the rfft function computes the real Fourier transform of the data in a RawData structure and returns a FFTData structure. We use the real fft because it is 2-3x faster than a regular fft ambient noise data.

FFT = rfft(R)
0.6s
FFTData with 189 ffts NAME: "CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m FS: 20.0 GAIN: 1.0 FREQMIN: 0.12 FREQMAX: 0.28 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: true TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 8 entries T: 2019-01-31T00:00:00 … FFT: 18001×189 Array{Complex{Float32},…

FFTData allow for spectral smoothing or whitening before cross-correlating

whiten!(FFT,whitemin,whitemax)
0.5s

Cross-correlating with SeisNoise

Cross-correlating with Seisnoise is done with the correlate function. correlate accepts two FFTData structures and a maxlag value - the maximum number of seconds of coda to return, as input. Here we'll input the same FFTData twice, giving an autocorrelation.

maxlag = 60. # maximum lag time in correlation
C = correlate(FFT,FFT,maxlag)
0.3s
CorrData with 189 Corrs NAME: "CI.SDD..BHZ.CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m COMP: "ZZ" ROTATED: false CORR_TYPE: "CC" FS: 20.0 GAIN: 1.0 FREQMIN: 0.12 FREQMAX: 0.28 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: true TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 8 entries DIST: 0.0 AZI: 0.0 BAZ: 0.0 MAXLAG: 60.0 T: 2019-01-31T00:00:00 … CORR: 2401×189 Array{Float32,2}

Note that correlation data is stored in the .corr field. CorrData have a few new parameters, namely

  • comp: cross-correlation component (e.g. "ZZ","EN","RT")

  • rotated: true/false for if the CorrData has been rotated to ZRT reference frame

  • corr_type: Type of correlation: `cross-correlation`, `coherence` or

    `deconv`.

  • dist: Distance between stations in km

  • azi: Azimuth from first station to second station

  • baz: Backazimuth from first station to second station

  • maxlag: The maximum number of seconds of coda

Post-Processing

The most common post-processing is stacking. The SeisNoise.jl stack function can stack arbitrary time periods. The default is to stack by day:

stack(C)
0.2s
CorrData with 1 Corrs NAME: "CI.SDD..BHZ.CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m COMP: "ZZ" ROTATED: false CORR_TYPE: "CC" FS: 20.0 GAIN: 1.0 FREQMIN: 0.12 FREQMAX: 0.28 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: true TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 8 entries DIST: 0.0 AZI: 0.0 BAZ: 0.0 MAXLAG: 60.0 T: 2019-01-31T00:00:00 … CORR: 2401×1 Array{Float32,2}

But you can specify the stack interval from seconds, hours, days, months, years, etc..

stack(C,interval=Minute(30))
stack(C,interval=Hour(2))
0.2s
CorrData with 12 Corrs NAME: "CI.SDD..BHZ.CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m COMP: "ZZ" ROTATED: false CORR_TYPE: "CC" FS: 20.0 GAIN: 1.0 FREQMIN: 0.12 FREQMAX: 0.28 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: true TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 8 entries DIST: 0.0 AZI: 0.0 BAZ: 0.0 MAXLAG: 60.0 T: 2019-01-31T00:00:00 … CORR: 2401×12 Array{Float32,2}

SeisNoise.jl with GPUs

Why Graphical Processing Units (GPUs)?

GPU have many cores vs few cores on a PC. GPU throughput is often an order of magnitude greater than CPU throughput. This is good for linear algebra + computationally intensive tasks aka Ambient Noise Cross Correlation.

GPU Computing in Julia

We use the CUDA.jl package. CUDA.jl makes it easy to move data onto NVIDIA GPUs using the cu function

A = cu(rand(Float32,10,10))
0.2s
10×10 CuArray{Float32,2,Nothing}: 0.877502 0.796006 0.0489516 0.24144 … 0.318584 0.387789 0.407335 0.106569 0.458002 0.0622241 0.71611 0.23505 0.0230458 0.133288 0.611094 0.898317 0.225385 0.513265 0.837151 0.141588 0.644944 0.190527 0.471491 0.469424 0.0547967 0.885947 0.424645 0.0738708 0.343791 0.300269 0.0389909 0.0665386 0.745775 0.924983 0.902264 0.567523 0.0552524 0.379971 0.922858 … 0.620628 0.860605 0.481297 0.798965 0.161512 0.011459 0.448125 0.196272 0.200665 0.943493 0.506931 0.48542 0.630714 0.190289 0.486066 0.497644 0.691146 0.762641 0.688526 0.138963 0.956698 0.20216 0.583944 0.949885 0.508377 0.554122 0.832786 0.24891 0.457112 0.841514 0.239463

Array A is on the GPU. We can do most array operations with A, such as

A + A # add A to itself 
A .+ 2 # add 2 to each element
A .- 4 # subtract 4 from each element
A * A # multiply A with itself
A .* A # square each element
0.2s
10×10 CuArray{Float32,2,Nothing}: 0.77001 0.633626 0.00239626 … 0.101496 0.15038 0.165921 0.0113569 0.209766 0.00387184 0.0552484 0.000531108 0.0177656 0.373436 0.806974 0.0507982 0.700821 0.0200471 0.415953 0.0363007 0.222304 0.220359 0.784901 0.180323 0.00545689 0.118192 0.0901615 0.00152029 0.55618 0.855594 0.81408 0.322082 0.00305283 0.144378 … 0.385179 0.74064 0.231647 0.638345 0.026086 0.000131309 0.0385228 0.0402664 0.890179 0.256979 0.235633 0.3978 0.23626 0.247649 0.477683 0.581622 0.474068 0.0193106 0.0408689 0.340991 0.902282 0.258447 0.307051 0.693532 0.208951 0.708146 0.0573426

Let's look at the time difference between matrix multiplication on the CPU and GPU

B = rand(Float32,1000,1000) # this array is on the CPU 
C = cu(B); # on the GPU 
b = @benchmark B * B # benchmark on the CPU
c = @benchmark CUDA.@sync C * C # benchmark on the GPU 
ratio(minimum(b),minimum(c))
26.7s
BenchmarkTools.TrialRatio: time: 14.7089 gctime: 1.0 memory: 7353.09 allocs: 0.1

Matrix multiplication is ~14x faster on the GPU! Let's leverage this power for ambient noise cross-correlation in the next section.

GPUs + SeisNoise

GPU computing with SeisNoise.jl is relatively easy - there's no need to write any GPU code. Use the gpu function to move data stored in RAM on the CPU to the GPU

RGPU = R |> gpu 
0.2s
RawData with 189 windows NAME: "CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m FS: 20.0 GAIN: 1.0 FREQMIN: 0.000555556 FREQMAX: 10.0 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: false TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 9 entries T: 2019-01-31T00:00:00 … X: 36000×189 CuArray{Float32,2,Nothi…

The RawData structure RGPU is now on the GPU! (in actuality, only R.x) is on the GPU. We can apply the same processing as before but on the GPU

detrend!(RGPU) # detrend each window 
taper!(RGPU) # taper 
bandpass!(RGPU,freqmin,freqmax)
clip!(RGPU,3) # clip data at 3x RMS of each window
whitemin, whitemax = 0.12, 0.28 # whiten min/max [Hz]
whiten!(RGPU,whitemin,whitemax) # apply whitening
0.2s

Let's look at the performance of detrending on the CPU and GPU

dCPU = @benchmark detrend!(R)
dGPU = @benchmark detrend!(RGPU)
ratio(minimum(dCPU),minimum(dGPU))
27.2s
BenchmarkTools.TrialRatio: time: 23.86 gctime: 1.0 memory: 2801.92 allocs: 0.0576923

It's 23x faster! Taking an fft of a RawData works similarly on the GPU

FFTGPU = rfft(RGPU)
0.3s
FFTData with 189 ffts NAME: "CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m FS: 20.0 GAIN: 1.0 FREQMIN: 0.000555556 FREQMAX: 10.0 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: false TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 9 entries T: 2019-01-31T00:00:00 … FFT: 18001×189 CuArray{Complex{Float32…

Once we have ffts on the GPU, we can cross-correlate on the GPU.

CGPU = correlate(FFTGPU,FFTGPU,maxlag)
2.0s
CorrData with 189 Corrs NAME: "CI.SDD..BHZ.CI.SDD..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m COMP: "ZZ" ROTATED: false CORR_TYPE: "CC" FS: 20.0 GAIN: 1.0 FREQMIN: 0.000555556 FREQMAX: 10.0 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: false TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 9 entries DIST: 0.0 AZI: 0.0 BAZ: 0.0 MAXLAG: 60.0 T: 2019-01-31T00:00:00 … CORR: 2401×189 CuArray{Float32,2,Nothin…

Full GPU Cross-correlation workflow

Here is a complete cross-correlation routine on the GPU. Most of the time is spent loading data from Amazon Web Services. The only difference to a CPU-based processing routine is calling |> gpu to send waveform data to the GPU and |> cpu to send correlation back to the CPU.

# load SCEDC data from AWS 
S1 = S = read_data("/scedc-pds/continuous_waveforms/2019/2019_031/CISDD__BHZ___2019031.ms")
S2 = S = read_data("/scedc-pds/continuous_waveforms/2019/2019_031/CIPER__BHZ___2019031.ms")
# convert SeisData to RawData and send to GPU
R1 = RawData(S1, cc_len, cc_step) |> gpu
R2 = RawData(S2, cc_len, cc_step) |> gpu
R = [R1,R2]
# preprocess on the GPU
detrend!.(R)
taper!.(R)
bandpass!.(R,freqmin,freqmax,zerophase=true)
# Real FFT on GPU
FFT = rfft.(R)
whiten!.(FFT,freqmin,freqmax)
# compute correlation and send to cpu
C = correlate(FFT[1],FFT[2],maxlag) |> cpu
4.8s
CorrData with 188 Corrs NAME: "CI.SDD..BHZ.CI.PER..BHZ" ID: "2019-01-31" LOC: 0.0 N, 0.0 E, 0.0 m COMP: "ZZ" ROTATED: false CORR_TYPE: "CC" FS: 40.0 GAIN: 1.0 FREQMIN: 0.1 FREQMAX: 0.3 CC_LEN: 1800.0 CC_STEP: 450.0 WHITENED: true TIME_NORM: "" RESP: a0 1.0, f0 1.0, 0z, 0p MISC: 0 entries NOTES: 2 entries DIST: 0.0 AZI: 0.0 BAZ: 0.0 MAXLAG: 60.0 T: 2019-01-31T00:07:30 … CORR: 4801×188 Array{Float32,2}

Now let's plot the correlation:

corrplot(C)
23.0s

Extending SeisNoise.jl

Let's say you want to make your own function to work with RawData, FFTData, or CorrData structures. We recommend using multiple dispatch to write kernel functions for computations on Arrays and then writing higher-level functions that take RawData, FFTData, or CorrData as input, then call kernel functions. Here's an example to double the amplitude of data:

# here is a kernel function 
function double!(A::AbstractArray{T}) where T <: Real# here T is the element type 
  A .*= T(2)
  return nothing 
end
# here is function for RawData
double!(R::RawData) = double!(R.x)
# here is function for FFTData
double!(F::FFTData) = double!(F.fft)
# and here is CorrData
double!(C::CorrData) = double!(C.corr)
0.6s
double! (generic function with 4 methods)

There are now 4 different versions of double! . Julia will pick the correct one depending on input:

A = collect(1:4)
double!(A)
A
0.9s
4-element Array{Int64,1}: 2 4 6 8

And now we'll try it on a RawData structure

R = RawData() # empty structure
R.x = reshape(collect(1.:9),3,3) |> Array
double!(R)
R.x
1.0s
3×3 Array{Float64,2}: 2.0 8.0 14.0 4.0 10.0 16.0 6.0 12.0 18.0

If you're interested in what's going on here, we can use the @code_lowered to show the byte code involved

@code_lowered double!(R)
0.3s
CodeInfo( 1 ─ %1 = Base.getproperty(R, :x) │ %2 = Main.double!(%1) └── return %2 )

double!(R) first gets R.x , then calls the kernel version of double! , as shown below

@code_lowered double!(A)
0.2s
CodeInfo( 1 ─ %1 = ($(Expr(:static_parameter, 1)))(2) │ %2 = Base.broadcasted(Main.:*, A, %1) │ Base.materialize!(A, %2) └── return Main.nothing )

which is just a broadcast operation. If you've made it this far and are interested in contributing, take a look at the SeisNoise.jl source code on Github. Happy correlating!

Runtimes (1)