[Scedc_users] SCEDC Newsletter - Volume 1, Issue 3
scedc_users at hungabee.gps.caltech.edu
Thu Jul 29 11:12:26 PDT 2004
Welcome to the third issue of the Southern California Earthquake Data
Center's electronic newsletter. We produce this quarterly newsletter as part
of our continuing efforts to make SCEDC data more accessible to our users,
to improve our communication and outreach and to promote the tools and
services we provide.
This newsletter will be archived at: www.data.scec.org/about/chronicle/.
Please send your questions, comments, suggestions to:
webmgr at quakedc.gps.caltech.edu.
Contents:
A. The Archive
B. What's new with STP (Seismic Transfer Program)?
C. Dataless SEED volumes now Available at the SCEDC
D. DHI at the SCEDC
E. Highlight: The SCEDC/SCSN Database System
F. Strong-Motion Naming and Aliases
G. USArray - BigFoot
A. The Archive
The Archive: By the Numbers
Total size of the waveform archive: 3,449 GB
Size of SCEDC parametric and waveform database: 235,647,421 rows
For the period of April 1 - June 30, 2004:
Data transferred via STP:
* 1,695,677 waveforms = average of 18,768 waveforms daily.
* 204 gigabytes of waveform data = average of 2,250 megabytes daily =
26 kilobytes per second.
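The daily and per-second figures above follow from the period totals by unit conversion. The sketch below reproduces the data-volume figures. It assumes a 91-day period (April 1 through June 30, inclusive) and decimal units (1 GB = 1,000 MB, 1 MB = 1,000 kB). The function names are illustrative only.

```python
"""Reproduce the STP data-volume averages for April 1 - June 30, 2004."""

from datetime import date

# Length of the reporting period, counting both end dates (assumption: inclusive).
PERIOD_DAYS = (date(2004, 6, 30) - date(2004, 4, 1)).days + 1  # 91 days

SECONDS_PER_DAY = 86_400


def daily_average(total: float, days: int = PERIOD_DAYS) -> float:
    """Average amount per day over the reporting period."""
    return total / days


def kb_per_second(mb_per_day: float) -> float:
    """Convert megabytes/day to kilobytes/second (decimal units)."""
    return mb_per_day * 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    mb_daily = daily_average(204 * 1000)  # 204 GB expressed in MB
    print(f"{PERIOD_DAYS} days, {mb_daily:,.0f} MB/day, "
          f"{kb_per_second(2_250):.0f} kB/s")
    # prints: 91 days, 2,242 MB/day, 26 kB/s
```

The computed 2,242 MB/day rounds to the 2,250 MB/day quoted, and 2,250 MB/day corresponds to about 26 kB/s.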
The SCEDC archived:
* 4,045 events
* 935,082 waveforms
* 70,356 arrivals
* 195,284 amplitudes
Magnitude   Number of local events (le)
----------------------------------------------------
0-1         1236
1-2         1801
2-3         241
3-4         23
4-5         1
5-6         1
# events   Event type
---------------------------
3304       le (local event)
191        qb (quarry blast)
323        re (regional event)
26         sn (sonic blast)
1          st (subnet trigger)
200        ts (teleseism)
-----------------------------------------
4045       Total
Six month summary of requests for catalog information:
Jan 102,347
Feb 84,199
Mar 154,047
Apr 96,764
May 45,025
Jun 103,796
-------------------------------------
Total: 586,178
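The totals in the event-type and catalog-request tables can be checked mechanically. Here is a minimal sketch, with the counts copied from the tables above. The dictionary names are illustrative.

```python
"""Cross-check the totals in the event-type and catalog-request tables."""

# Events archived April 1 - June 30, 2004, by SCSN event type.
EVENTS_BY_TYPE = {
    "le": 3304,  # local event
    "qb": 191,   # quarry blast
    "re": 323,   # regional event
    "sn": 26,    # sonic blast
    "st": 1,     # subnet trigger
    "ts": 200,   # teleseism
}

# Requests for catalog information per month, January - June 2004.
CATALOG_REQUESTS = {
    "Jan": 102_347, "Feb": 84_199, "Mar": 154_047,
    "Apr": 96_764, "May": 45_025, "Jun": 103_796,
}

if __name__ == "__main__":
    print(f"events:   {sum(EVENTS_BY_TYPE.values()):,}")    # events:   4,045
    print(f"requests: {sum(CATALOG_REQUESTS.values()):,}")  # requests: 586,178
```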
Continuous Archiving of High-Sample Rate Data
The SCEDC continuously archived 7 hours of HH_ and HL_ (80 sps) and EH_ and
EL_ (100 sps) data from the entire CI and AZ networks for the June 15, 2004
magnitude 5.3 offshore event (42 miles SE of San Clemente Island; EVID:
14065544).
More information on this topic is available at
http://www.data.scec.org/about/sigeventsshot.html
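To give a sense of scale, the number of samples a single channel records over the 7-hour window follows directly from its sample rate. The number of channels involved is not given here, so this sketch stops at one channel and ignores compression. The names are illustrative.

```python
"""Samples recorded per channel during the 7-hour continuous archive window."""

SECONDS_PER_HOUR = 3600
WINDOW_HOURS = 7

# Sample rates quoted above, in samples per second.
SAMPLE_RATES = {"HH_/HL_": 80, "EH_/EL_": 100}


def samples_per_channel(hours: float, sps: int) -> int:
    """Number of samples one channel produces over `hours` at `sps`."""
    return round(hours * SECONDS_PER_HOUR * sps)


if __name__ == "__main__":
    for group, sps in SAMPLE_RATES.items():
        print(group, f"{samples_per_channel(WINDOW_HOURS, sps):,}")
    # HH_/HL_ 2,016,000
    # EH_/EL_ 2,520,000
```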
B. What's new with STP (Seismic Transfer Program)?
New STP Client - Version 1.4 for Windows
In response to requests from the user community, we have recently released an
STP console client for Windows. This client is virtually identical to the
UNIX and Linux versions, but it runs in the Windows environment and
allows users to download SCEDC data directly onto their PCs.
To get STP 1.4 for Windows:
1. go to http://www.data.scec.org/ftp/programs/stp/
2. left click on stp.exe
3. save file to disk
4. double-click on stp.exe
Any data downloaded will be saved into the same directory that you run
stp.exe from. As an example, try:
STP> PHASE -f northridge.txt -e 3144585
This command saves phase information for event 3144585 (selected with the
-e option), the event ID of the magnitude 6.7 Northridge earthquake, to a
file called northridge.txt (named with the -f option) in your working
directory.
The SCEDC has also developed a GUI version of STP that runs on Windows. To
get the GUI version of STP 1.4 for Windows:
1. go to http://www.data.scec.org/ftp/programs/stp/
2. left click on stp_gui.exe
3. save file to disk
4. double-click on stp_gui.exe
The Windows console client functions similarly to the UNIX and Linux
clients, while the GUI version looks similar to the Java version of
STP that runs on the SCEDC website. If you experience any difficulties using
either client program, please let us know by emailing:
mullaney at gps.caltech.edu.
Differences between the Windows and UNIX/Linux Client Versions
Although the Windows client of STP looks and works almost exactly like the
UNIX and Linux clients, and most of the code was unchanged in the port to
Windows, there is a significant difference in how the programs communicate
with the server. In the UNIX and Linux versions of STP, the client
communicates with the server by sending files to, and receiving files from,
the server. Because this method does not work on Windows, the Windows client
uses raw socket functions to communicate with the server instead. This
required setting up Windows Sockets (Winsock), the network programming
interface for Windows.
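One common reason a UNIX network client needs rework on Windows is that a UNIX socket is an ordinary file descriptor. C code can wrap it with fdopen() and exchange data through stream calls such as fprintf() and fgets(). A Winsock SOCKET cannot be used that way and must be driven with send() and recv(). STP's own code is not shown here, so the Python sketch below only contrasts the two styles. socket.makefile() stands in for the stream approach, and sendall()/recv() stand in for the raw one. The one-line request/reply exchange is a hypothetical protocol, not STP's.

```python
import socket


def request_via_file_stream(sock: socket.socket, command: str) -> str:
    """Stream style: treat the connection as a pair of text files.

    Comparable to wrapping a UNIX socket descriptor with fdopen().
    """
    with sock.makefile("w", encoding="ascii", newline="\n") as out, \
         sock.makefile("r", encoding="ascii", newline="\n") as inp:
        out.write(command + "\n")
        out.flush()
        return inp.readline().rstrip("\n")


def request_via_raw_socket(sock: socket.socket, command: str) -> str:
    """Raw style: explicit send and receive calls, as Winsock code requires."""
    sock.sendall((command + "\n").encode("ascii"))
    reply = b""
    # recv() may return a partial reply, so keep reading until the newline.
    while not reply.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        reply += chunk
    return reply.decode("ascii").rstrip("\n")
```

Both functions can be exercised locally with socket.socketpair(), without any server. Closing the file objects returned by makefile() does not close the underlying socket.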
C. Dataless SEED volumes now Available at the SCEDC
The SCEDC and SCSN have cooperated to complete the production of station
metadata in the form of dataless SEED volumes for the present configuration
of all currently active SCSN broadband stations. A list of the stations,
with links to their volumes, is available from the SCEDC website at
http://www.data.scec.org/stations/seed/dl_seed.php. This effort is being
expanded to provide a complete station history for all SCSN stations.
Users can download individual dataless SEED volumes (file name format:
dataless.STANAME) from the /pub/stations/seed/ directory of the Data Center's
anonymous FTP site, scec.gps.caltech.edu, or via the web at
http://www.data.scec.org/ftp/stations/seed/. A compressed file containing
all volumes (CI.dataless.gz) is available from the same locations. ASCII RESP
files are also available for individual stations and channels from
/pub/stations/response/ on the anonymous FTP site, or via the web at
http://www.data.scec.org/ftp/stations/response/.
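For scripted downloads, the anonymous-FTP route can be automated with Python's standard ftplib. This minimal sketch assumes the host, directory, and dataless.STANAME naming given above, with upper-case station codes and anonymous login. These 2004 locations may have changed since, and the helper names are hypothetical.

```python
from ftplib import FTP
from pathlib import Path

# Locations as published in this newsletter (2004); they may have moved since.
FTP_HOST = "scec.gps.caltech.edu"
SEED_DIR = "/pub/stations/seed/"
WEB_BASE = "http://www.data.scec.org/ftp/stations/seed/"


def dataless_name(station: str) -> str:
    """File name of a station's dataless SEED volume (assumed upper-case code)."""
    return f"dataless.{station.upper()}"


def dataless_web_url(station: str) -> str:
    """Web address of the same volume."""
    return WEB_BASE + dataless_name(station)


def fetch_dataless(station: str, dest_dir: Path = Path(".")) -> Path:
    """Download one dataless SEED volume by anonymous FTP (needs network access)."""
    target = dest_dir / dataless_name(station)
    with FTP(FTP_HOST) as ftp:
        ftp.login()  # no arguments: anonymous login
        ftp.cwd(SEED_DIR)
        with open(target, "wb") as fh:
            ftp.retrbinary(f"RETR {dataless_name(station)}", fh.write)
    return target
```

For example, fetch_dataless("ARV") would retrieve dataless.ARV into the current directory. The all-stations file CI.dataless.gz can be fetched the same way by passing its name to retrbinary().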
D. DHI at the SCEDC
Work is currently underway to install a Data Handling Interface at the
SCEDC. The Data Handling Interface (DHI) provides well-defined standardized
methods to remotely access information from the SCEDC and other data centers
worldwide. The DHI can be thought of as an Application Programming Interface
(API) that presents the same well-specified, standardized interface at any
seismic data center that runs it. Three DHI servers are being installed at
the SCEDC: a Network Information Server (station, channel, and response
information), a Seismogram Server, and an Event Server. The Network Server
is installed and running, and the Seismogram Server is in the final stages
of testing. Once the Seismogram Server is installed, work will begin on the
Event Server installation.
The DHI servers are an offshoot of the FISSURES project supported by the
IRIS DMS. FISSURES uses CORBA (Common Object Request Broker Architecture), a
distributed computing technology, to allow software systems to work across
the Internet in a platform-independent, programming-language-neutral manner.
In the DHI, CORBA manages the socket connections, creating robust, reliable
connections between clients and servers. By writing clients that can access
information from a DHI server, one may easily access similar information
from any data center that has DHI servers installed. Currently, DHI servers
are running at the IRIS DMC, the NCEDC and the University of South Carolina.
For more information about the SCEDC DHI servers, please refer to:
http://www.data.scec.org/research/DHI.html. This page will be updated as the
status of the DHI servers progresses. General information about the DHI
project is available directly from IRIS at: http://www.iris.edu/DHI/.
This work is supported by the IRIS DMS as part of its role in the NSF-funded
SCEC-ITR project and has been facilitated by the prior efforts of the IRIS
DMC, the NCEDC and the University of South Carolina.
E. Highlight: The SCEDC/SCSN Database System
The SCEDC Oracle 9i database is part of a database system that is used by
the Data Center and the Southern California Seismic Network's Real-Time
System (RTS). SCSN data is processed by the RTS, and events and supporting
parametric information are immediately copied to the SCEDC database.
Therefore, in addition to providing long-term storage and catalog
information for the SCSN, the SCEDC database is also the source of
information for network alarming, post-processing analysis and applications
such as ShakeMap immediately following an earthquake.
The database system was designed as part of the TriNet project in 1999 with
the following fundamental requirements:
* Data from the RTS would be available to the archive in near-real
time.
* The system must operate 24/7, with unavailability due to maintenance
or failures in software, hardware, and network connectivity minimized.
* Rapid query access to a very large data set that includes events,
locations, arrivals, amplitudes, codas and waveforms for southern California
from 1932 to the present.
To achieve these design goals, our system is set up as follows:
* The RTS has two servers, each with its own local database: one is
primary, the other operates as a shadow. Event information from the RTS
databases is replicated to the SCEDC databases within 4 seconds, and
applications accessing the SCEDC database use data generated by the primary
system.
* The SCEDC has two independent databases on two separate servers that
are continually synchronized with one another.
* The two sets of RTS and SCEDC servers are housed in separate
buildings: the USGS building in Pasadena and the Seismo Lab at Caltech.
The systems can operate independently from either site.
* The most common queries on the database are for the parameters of
the most recent earthquakes (magnitude, location, time) and their associated
waveforms. The most frequent queries are made by SCEDC/SCSN internal
applications, which poll the database at regular intervals to get the most
up-to-date information about new events. Other common queries are catalog
searches made by the public and researchers, either through the web catalog
on the SCEDC website or through STP. These queries also request parametric
and waveform data, but may span a long period of time. As a result, the
database schema has been specifically designed to optimize both types of
searches, and a number of database indexes have been created to improve
performance. In fact, indexes account for 43% of the space used by objects
in the SCEDC database.
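The two query patterns described above are frequent polling for newly added events and long-span catalog searches. Both can be illustrated with an in-memory SQLite database. The table, column, and index names below are hypothetical, since the SCEDC schema is not given here. The polling query also assumes that event IDs increase as events are added.

```python
import sqlite3

# Hypothetical, heavily simplified event table with an index on origin time.
SCHEMA = """
CREATE TABLE event (
    evid  INTEGER PRIMARY KEY,  -- event ID (primary key, so it is indexed)
    otime REAL NOT NULL,        -- origin time, epoch seconds
    lat   REAL,
    lon   REAL,
    mag   REAL
);
CREATE INDEX event_otime_idx ON event (otime);
"""


def poll_new_events(conn: sqlite3.Connection, last_evid: int) -> list:
    """Internal-application pattern: fetch only events newer than the last seen."""
    return conn.execute(
        "SELECT evid, otime, mag FROM event WHERE evid > ? ORDER BY evid",
        (last_evid,),
    ).fetchall()


def poll_once(conn: sqlite3.Connection, last_evid: int) -> tuple:
    """One polling cycle: return new rows and the updated high-water mark."""
    rows = poll_new_events(conn, last_evid)
    return rows, (rows[-1][0] if rows else last_evid)


def catalog_search(conn: sqlite3.Connection, t0: float, t1: float,
                   min_mag: float) -> list:
    """Catalog pattern: a possibly long time window that the otime index can serve."""
    return conn.execute(
        "SELECT evid, mag FROM event "
        "WHERE otime BETWEEN ? AND ? AND mag >= ? ORDER BY otime",
        (t0, t1, min_mag),
    ).fetchall()
```

An application would call poll_once() at a fixed interval, carrying the returned event ID into the next call. Prefixing a query with EXPLAIN QUERY PLAN shows whether SQLite actually uses an index for it.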
Why use Oracle?
* Caltech owns an Oracle Enterprise site license that provides the
database server software and Advanced Replication feature. The SCEDC pays
Oracle directly for licensing the partitioning feature.
* Oracle allows objects such as tables and indexes to be partitioned,
i.e., objects are divided into smaller, more manageable portions. The SCEDC
database is currently 49 gigabytes, and the largest table (waveform) is 9
gigabytes. The tables that contain waveform, amplitude, and arrival data
are partitioned by year, so users can query the entire table, or they can
reference a smaller piece of the table, which significantly improves
performance. Partitioning is also a method to reduce maintenance because
administration can be focused on particular portions of tables, dividing the
maintenance process into more manageable segments.
* Oracle database software with Advanced Replication allows our system
to have multiple, continually synchronized databases. Oracle also provides
stored procedures and integration with Java, which is used by
post-processing applications. Further information on Advanced Replication is
included at the end of this article.
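The idea behind partitioning by year can be shown without Oracle. Rows are stored in per-year buckets, and a query restricted to certain years reads only those buckets. Maintenance can then act on one bucket at a time. The Python sketch below is an in-memory analogy with hypothetical class and field names. It does not reproduce Oracle's partitioning syntax or optimizer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Arrival:
    evid: int
    year: int
    sta: str


class YearPartitionedTable:
    """Rows kept in one bucket per year, loosely analogous to range partitioning."""

    def __init__(self) -> None:
        self._parts = defaultdict(list)

    def insert(self, row: Arrival) -> None:
        self._parts[row.year].append(row)

    def partitions_for(self, first: Optional[int] = None,
                       last: Optional[int] = None) -> List[int]:
        """Years whose buckets a query over [first, last] must read (pruning)."""
        return [y for y in sorted(self._parts)
                if (first is None or y >= first) and (last is None or y <= last)]

    def scan(self, first: Optional[int] = None,
             last: Optional[int] = None) -> Iterator[Arrival]:
        """Whole-table scan when no years are given, else only matching buckets."""
        for year in self.partitions_for(first, last):
            yield from self._parts[year]

    def drop_partition(self, year: int) -> None:
        """Maintenance confined to one year's bucket; other years are untouched."""
        self._parts.pop(year, None)
```

A query for 2004 arrivals touches only the 2004 bucket. A scan with no year bounds still sees the whole table, matching the two access modes described above.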
Looking to the future, we are exploring a switch to an open-source database
system such as MySQL for our main systems. Clearly, migrating to a system
with the same functionality and performance at a fraction of the cost is
desirable and we have been impressed by the speed of MySQL in our
performance tests. However, the current production release of MySQL (4.0)
lacks a number of features that are used heavily by our system:
* Multi-master replication
* Stored procedures
* Views
* Triggers
* Sequences
Many of these features are slated to be included in the 5.1 release. In the
meantime, we continue to monitor new developments, including PostgreSQL
(pgSQL).
Oracle Advanced Replication:
The SCEDC/SCSN database system uses Oracle's Advanced Replication feature to
replicate data among four databases. Each database has its own copy of the
data. When a transactional statement (such as an insert, update, or delete)
is executed on one database, that database sends the statement to the other
databases so that they can apply it to their own data.
The system employs both one-way and two-way (also known as "multi-master")
replication. The RTS-to-SCEDC replication is one-way: the source database
(RTS) pushes the data to the target database, but does not receive updates
from the SCEDC databases. Data on the RTS are kept for one week before they
are purged. The SCEDC archive databases use two-way, multi-master
replication to push updates from either database to the other; i.e., each
target database is also a source, so the two databases stay synchronized.
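The difference between the one-way and multi-master links can be sketched with a toy model. Each database pushes its own local changes to a list of targets and simply applies changes it receives. Two assumptions are labeled in the code: the RTS pushes to both SCEDC databases, and received changes are not forwarded again, which keeps the two masters from echoing changes back and forth. Pushes happen immediately here for brevity, whereas the system described in this article queues them. The class and database names are hypothetical.

```python
from typing import Dict, List


class Database:
    """Toy replicated database: local writes are pushed, received ones are not."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, str] = {}
        self.targets: List["Database"] = []

    def local_write(self, key: str, value: str) -> None:
        """A change made directly on this database: apply it, then push it."""
        self.rows[key] = value
        for target in self.targets:
            target.apply_remote(key, value)

    def apply_remote(self, key: str, value: str) -> None:
        """A change received from another site: apply it but do not forward it
        (assumption, to avoid replication loops)."""
        self.rows[key] = value


rts = Database("RTS")
scedc_a = Database("SCEDC-A")
scedc_b = Database("SCEDC-B")

rts.targets = [scedc_a, scedc_b]  # one-way (assumption: RTS feeds both archives)
scedc_a.targets = [scedc_b]       # multi-master: each archive is both a source
scedc_b.targets = [scedc_a]       # and a target for the other
```

A change made on either archive reaches the other archive but never flows back to the RTS, which is the one-way property described above.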
Advanced Replication can be thought of as a collection of tables, stored
procedures, and triggers in the database. When a transactional statement, an
insert for example, is executed on a replicated table, it sets off a trigger
(a program stored inside the database) that instructs the database to store
all the information needed to execute the original insert statement in a
queue, which is also stored inside the database. At regular intervals (every
4 seconds), an Oracle job is executed to look for any outstanding
transactions in the queue. If any are found, they are pushed to the remote
database site. If this push is successful, the database marks the
transaction as sent. Another Oracle job (executed every 10 minutes) then
removes all sent transactions.
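The trigger, queue, push, and purge cycle described above can be simulated in a few lines. This sketch is an illustration, not Oracle's implementation. An insert appends a deferred transaction to the local queue. A push job every 4 seconds sends unsent entries and marks them sent, and a purge job every 10 minutes removes sent entries. Time is simulated with a counter rather than a real scheduler, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PUSH_INTERVAL_S = 4      # push job period quoted above
PURGE_INTERVAL_S = 600   # purge job period quoted above (10 minutes)


@dataclass
class DeferredTxn:
    key: int
    value: str
    sent: bool = False


@dataclass
class Site:
    name: str
    table: Dict[int, str] = field(default_factory=dict)
    queue: List[DeferredTxn] = field(default_factory=list)
    remote: Optional["Site"] = None
    online: bool = True

    def insert(self, key: int, value: str) -> None:
        self.table[key] = value
        # Stand-in for the replication trigger: queue what the remote needs.
        self.queue.append(DeferredTxn(key, value))

    def push_job(self) -> None:
        """Send unsent transactions in order; stop if the remote is unreachable."""
        for txn in self.queue:
            if txn.sent:
                continue
            if self.remote is None or not self.remote.online:
                break  # keep the rest queued so the remote can catch up later
            self.remote.table[txn.key] = txn.value
            txn.sent = True

    def purge_job(self) -> None:
        """Remove transactions that have already been sent."""
        self.queue = [txn for txn in self.queue if not txn.sent]


def run_jobs(site: Site, seconds: int) -> None:
    """Advance a simulated clock, firing each job on its schedule."""
    for t in range(1, seconds + 1):
        if t % PUSH_INTERVAL_S == 0:
            site.push_job()
        if t % PURGE_INTERVAL_S == 0:
            site.purge_job()
```

Taking the remote site offline between pushes leaves its entries in the queue until the site returns. This mirrors the graceful resynchronization the newsletter lists among the benefits.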
Benefits:
* The ability to have two database archives that are continually
synchronized allows the Data Center to load-balance applications across
them, which improves performance.
* Having two independent databases on separate servers allows for the
possibility of failover if one database should become unavailable. Because
the data are replicated, each database has its own copy, so if one
database becomes disconnected from the system (e.g., in a network
outage), its local database objects are still accessible.
* By storing transactions in a queue, the system also has the ability
to send these transactions to the target database at a later time, allowing
the target database to resynchronize gracefully. Because the process of
manually synchronizing databases can be time consuming, this functionality
has proven to be very useful when a database becomes unavailable due to
maintenance or unforeseen failure.
* The Advanced Replication feature allows for interoperability, which
means that the databases and servers do not have to be at the same version
level or operating system (within limits). This allows the DBA flexibility
in upgrading database versions and flexibility in choosing operating
systems. For example, the SCEDC is currently testing an Oracle 10g
development database on a Linux platform within our Oracle 9i Solaris
system. It is also fairly easy to add additional databases and/or replicated
objects in this system. For example, as part of our efforts to integrate
with the NCEDC in Berkeley, the SCEDC is using this method of replication to
share station data.
Costs:
* Synchronizing databases every 4 seconds carries substantial
database resource overhead. Especially costly are large batch
operations in which several million rows are affected. Although normal
transactions involving the seismic network never exceed 10,000 rows per
transaction, activities such as legacy data migration or data quality
control can severely impact system performance and they are usually done
with replication temporarily suspended.
* There is also an added administrative cost to maintain the
replication triggers and stored procedures. Simple maintenance operations on
a single database, such as altering table structure, become significantly
more complicated within a replicated environment. Failure to run these
procedures properly can result in the object being unavailable for update on
all databases, not just the original target database.
F. Strong-Motion Naming and Aliases
The SCEDC archives strong-motion data from the National Strong-Motion
Program (NSMP; network code NP) and the California Strong-Motion
Instrumentation Program (CSMIP; network code CE). These organizations
identify their stations with a numerical code which previously could not be
processed by the SCSN real-time and post-processing systems. To work around
this problem, the SCSN assigned an alias to each of these stations until a
method of processing numerical station names was developed.
A solution was recently implemented by the SCSN and most new strong-motion
data is now available under the numerical name assigned by its originating
network. In the short-term, users will need to be aware of both the
numerical name and the alias applied by the SCSN. In the future, we aim to
serve the data only under its numerical station identifier. The list of
aliases is available at: http://www.data.scec.org/stations/stamapping.html
NET  ALIAS  NUMBER  LOCATION DESCRIPTION
CE   400K   24400   East Los Angeles, Obregon Park
CE   G405   14405   Rolling Hills Estates, Vista School
CE   J732   23732   San Bernardino, Devils Canyon Rd.
CE   K851   24851   Los Angeles, W. 3rd & Cloverdale
CE   K853   24853   Los Angeles, W. Temple & N. Virgil
NP   BBA    5398    Burbank, Burbank Airport
NP   BBB    5271    Bombay Beach, Hwy 111
NP   BVH    5402    Beverly Hills, Civic Ctr and Foothill
NP   CAB    5404    Calabasas, Pk Sorrento and Pk Granada
NP   FLL    5401    Fillmore, Santa Clara & Chambersburg Rd
NP   GRF    141     Los Angeles, Griffith Observatory
NP   JAB    655     Sylmar, Balboa Blvd.
NP   JFP    655     Sylmar, Balboa Blvd.
NP   JGB    655     Sylmar, Balboa Blvd.
NP   LAX    5399    Los Angeles International Airport
NP   LT2    5030    Little Rock, Off Pearblossom Hwy (138)
NP   OKV    5403    Oak View, Hwy. 33
NP   SSW    5062    Calipatria, Salton Sea Wildlife Refuge
NP   TCF    5081    Fernwood, Topanga Canyon Blvd.
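Until all strong-motion data are served under their numerical names, a data search may need to try both identifiers. A small lookup built from the table above can return every name under which a station's data might be filed. Three aliases (JAB, JFP, JGB) all map to NP station 655. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (network, SCSN alias, numerical station code), copied from the table above.
ALIASES: List[Tuple[str, str, str]] = [
    ("CE", "400K", "24400"), ("CE", "G405", "14405"), ("CE", "J732", "23732"),
    ("CE", "K851", "24851"), ("CE", "K853", "24853"),
    ("NP", "BBA", "5398"), ("NP", "BBB", "5271"), ("NP", "BVH", "5402"),
    ("NP", "CAB", "5404"), ("NP", "FLL", "5401"), ("NP", "GRF", "141"),
    ("NP", "JAB", "655"), ("NP", "JFP", "655"), ("NP", "JGB", "655"),
    ("NP", "LAX", "5399"), ("NP", "LT2", "5030"), ("NP", "OKV", "5403"),
    ("NP", "SSW", "5062"), ("NP", "TCF", "5081"),
]

ALIAS_TO_NUMBER: Dict[Tuple[str, str], str] = {
    (net, alias): number for net, alias, number in ALIASES
}

# One numerical station may carry several aliases (e.g. NP 655).
NUMBER_TO_ALIASES: Dict[Tuple[str, str], List[str]] = defaultdict(list)
for net, alias, number in ALIASES:
    NUMBER_TO_ALIASES[(net, number)].append(alias)


def station_names(net: str, code: str) -> List[str]:
    """All names (numerical code plus SCSN aliases) for the station `code`.

    `code` may be either an alias or a numerical code; codes not in the
    table are returned unchanged.
    """
    number = ALIAS_TO_NUMBER.get((net, code), code)
    return sorted({number, *NUMBER_TO_ALIASES.get((net, number), [])})
```

For example, station_names("NP", "JFP") returns ["655", "JAB", "JFP", "JGB"].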
G. USArray - BigFoot
The transportable array component of USArray ("Bigfoot") formally began
operation in California in January 2004 and will stay until 2007. The
southern California contribution to USArray includes the 40
currently-operating SCSN broadband stations listed below. The 40 sps BH_
data from these stations will be transmitted from the SCSN facility in
Pasadena to both the Array Network Facility (ANF) and the IRIS Data
Management Center (DMC) for archiving.
SCSN stations contributing to USArray:
STA   Station Name                   Latitude   Longitude    Datalogger
ARV   Arvin                          35.1269    -118.83009   Q330
BBR   Big Bear Solar Observatory     34.2623    -116.92075   Q730
BC3   Big Chuckawalla Mountains      33.65515   -115.45366   Q4120
BCC   Bear Creek Country Club        33.57508   -117.26119   Q730
BEL   Belle Mountain                 34.0006    -115.9982    Q730
BFS   Mt. Baldy Ranger Station       34.237     -117.6582    Q730
CIA   Catalina Island Airport        33.40186   -118.41372   Q4120
CWC   Cottonwood Creek               36.43988   -118.08016   Q680
DAN   Danby                          34.63745   -115.38115   Q4120
DEC   Green Verdugo Microwave        34.25353   -118.33383   Q730
DVT   Desert View Tower              32.65915   -116.10061   Q730
EDW2  Edwards Air Force Base 2       34.8811    -117.99388   Q330
FMP   Fort MacArthur Park            33.71264   -118.29381   Q730
FUR   Furnace Creek                  36.46703   -116.86322   Q4120
GLA   Glamis                         33.05149   -114.82706   Q980
GRA   Grapevine Ranger Station       36.99608   -117.36621   Q730
GSC   Goldstone                      35.30177   -116.80574   Q4120
HEC   Hector                         34.8294    -116.335     Q4120
IRM   Iron Mountain Pumping Station  34.15738   -115.14513   Q4120
ISA   Isabella                       35.66278   -118.47403   Q4120
LGU   Laguna Peak                    34.10819   -119.06587   Q4120
LRL   Laurel Mountain                35.47954   -117.68212   Q4120
MPM   Manual Prospect Mine           36.05799   -117.48901   Q330
MPP   McPherson Peak                 34.88848   -119.81362   Q730
NEE   Needles                        34.82482   -114.59942   Q980
OSI   Osito Audit                    34.6145    -118.7235    Q980
PDM   Parker Dam                     34.30336   -114.14152   Q4120
RCT   Rector                         36.30523   -119.243842  Q730
RRX   Barstow Service Center         34.87533   -116.99684   Q4120
SBC   Santa Barbara                  34.44076   -119.71492   Q680
SCI2  San Clemente Island 2          32.9799    -118.54697   Q330
SCZ2  Santa Cruz Island 2            33.99543   -119.6351    Q330
SDP   Sudden Peak                    34.56547   -120.50137   Q730
SHO   Shoshone                       35.89953   -116.2753    Q4120
SMM   Simmler                        35.3142    -119.99581   Q730
SNCC  San Nicolas Island             33.248     -119.524     Q980
SWS   Sam W. Stewart                 32.9408    -115.7958    Q4120
TIN   Tinemaha                       37.05422   -118.23009   Q4120
TUQ   Turquoise Mountain             35.43584   -115.92389   Q4120
VES   Vestal                         35.84089   -119.08469   Q4120