[Scedc_users] SCEDC Newsletter - Volume 1, Issue 3

scedc_users at hungabee.gps.caltech.edu scedc_users at hungabee.gps.caltech.edu
Thu Jul 29 11:12:26 PDT 2004


Welcome to the third issue of the Southern California Earthquake Data
Center's electronic newsletter. We produce this quarterly newsletter as part
of our continuing efforts to make SCEDC data more accessible to our users,
to improve our communication and outreach and to promote the tools and
services we provide. 

This newsletter will be archived at: www.data.scec.org/about/chronicle/.
Please send your questions, comments, and suggestions to:
webmgr at quakedc.gps.caltech.edu. 

Contents:
		A. The Archive
		B. What's new with STP (Seismic Transfer Program)?
		C. Dataless SEED volumes now Available at the SCEDC
		D. DHI at the SCEDC
		E. Highlight: The SCEDC/SCSN Database System
		F. Strong-Motion Naming and Aliases
		G. USArray - BigFoot


A. The Archive

The Archive: By the Numbers

Total size of the waveform archive:	3,449 GB
Size of SCEDC parametric and waveform database: 	235,647,421 rows

For the period of April 1 - June 30, 2004:

Data transferred via STP:
*	1,695,677 waveforms = average of 18,768 waveforms daily.
*	204 gigabytes of waveform data = average of 2,250 megabytes daily =
26 kilobytes per second.
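
As a rough check on the last conversion, the daily average can be turned
into a per-second rate. The sketch below assumes decimal units (1 megabyte =
1,000 kilobytes) and an 86,400-second day; the variable names are
illustrative only.

```python
# Convert the reported daily STP transfer average to a per-second rate.
# Assumption: decimal units, i.e. 1 megabyte = 1,000 kilobytes.
MEGABYTES_PER_DAY = 2250        # daily average quoted above
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

kilobytes_per_second = MEGABYTES_PER_DAY * 1000 / SECONDS_PER_DAY
print(f"{kilobytes_per_second:.1f} kilobytes per second")
```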

The SCEDC archived:
*	4,045 events
*	935,082 waveforms
*	70,356 arrivals
*	195,284 amplitudes

magnitude	Number of local events (le):
----------------------------------------------------
0-1	1236
1-2	1801
2-3	241
3-4	23
4-5	1
5-6	1

# events: 	event type
---------------------------
3304		le (local event)
191		qb (quarry blast)
323	 	re (regional event)
26		sn (sonic blast)
1		st (subnet trigger)
200		ts (teleseism)
-----------------------------------------
4045		Total


Six month summary of requests for catalog information:

Jan	102,347
Feb	84,199
Mar	154,047
Apr	96,764
May	45,025
Jun	103,796
-------------------------------------
Total:	586,178


Continuous Archiving of High-Sample Rate Data

The SCEDC continuously archived 7 hours of HH_, HL_ (80 sps) and EH_, EL_
(100 sps) data from the entire CI and AZ array for the June 15th, 2004
magnitude 5.3 offshore event (42 miles SE of San Clemente Island; EVID:
14065544).

More information on this topic is available at
http://www.data.scec.org/about/sigeventsshot.html


B. What's new with STP (Seismic Transfer Program)?

New STP Client - Version 1.4 for Windows

In response to requests from the user community, we have recently released an
STP console client for Windows. This client is virtually identical to the
UNIX and Linux versions, but it operates in the Windows environment and
allows users to download SCEDC data directly onto their PC.

To get STP 1.4 for Windows:

1. go to http://www.data.scec.org/ftp/programs/stp/ 
2. left click on stp.exe
3. save file to disk
4. double-click on stp.exe

Any data you download will be saved in the directory from which you run
stp.exe. As an example, try:
STP> PHASE -f northridge.txt -e 3144585
This command saves phase information for event 3144585 (given with the -e
option), the event ID of the magnitude 6.7 Northridge earthquake, to a file
called northridge.txt (given with the -f option) in your working directory.

The SCEDC has also developed a GUI version of STP that runs on Windows. To
get the GUI version of STP 1.4 for Windows:

1. go to http://www.data.scec.org/ftp/programs/stp/ 
2. left click on stp_gui.exe
3. save file to disk
4. double-click on stp_gui.exe

The Windows console client functions similarly to the UNIX and Linux
client, while the GUI version looks similar to the Java version of
STP that runs on the SCEDC website. If you experience any difficulties using
either client program, please let us know by emailing:
mullaney at gps.caltech.edu.

Differences between the Windows and UNIX/Linux Client Versions.

Although the Windows client looks and works almost exactly like the UNIX and
Linux clients, and most of the code was left unchanged when STP was modified
for Windows, there is a significant difference in how the programs
communicate. In the UNIX and Linux versions of STP, the client communicates
with the server by sending to, and receiving files from, the server.
Because this method does not work on Windows, the Windows client uses raw
socket functions to communicate with the server instead. This required
setting up Windows Sockets (Winsock), the network programming interface for
Windows.
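
The socket approach described above can be illustrated in general terms. The
sketch below is not the STP client code and does not use the STP protocol: it
is written in Python for brevity, the one-line text command is made up, and
socket.socketpair() stands in for a real network connection. It only shows a
request and a reply travelling over a socket rather than through files.

```python
import socket


def handle_request(conn: socket.socket) -> None:
    """Stand-in server: read one line and send back a one-line reply."""
    with conn.makefile("rwb") as stream:
        request = stream.readline().strip()
        stream.write(b"OK " + request + b"\n")
        stream.flush()


# A connected pair of sockets replaces a real client/server connection here.
client_sock, server_sock = socket.socketpair()
with client_sock, server_sock:
    # Client -> server: a hypothetical one-line command (not a real STP command).
    client_sock.sendall(b"HYPOTHETICAL-REQUEST 12345\n")
    # Server reads the command and writes its reply to the same connection.
    handle_request(server_sock)
    # Server -> client: read the one-line reply.
    with client_sock.makefile("rb") as reader:
        reply = reader.readline().decode("ascii").strip()

print(reply)
```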


C. Dataless SEED volumes now Available at the SCEDC

The SCEDC and SCSN have cooperated to complete the production of station
metadata in the form of dataless SEED volumes for the present configuration
of all currently-active SCSN broadband stations. A listing of the stations
available and links to the volumes are available from the SCEDC website at
http://www.data.scec.org/stations/seed/dl_seed.php. This effort is being
expanded to provide a complete station history for all SCSN stations. 

Users can download individual dataless SEED volumes (format:
dataless.STANAME) from the Data Center's anonymous FTP site
(scec.gps.caltech.edu, directory /pub/stations/seed/) or via the web at:
http://www.data.scec.org/ftp/stations/seed/. A compressed file containing
all volumes (CI.dataless.gz) is available from the same location. ASCII RESP
files are also available for individual stations and channels at the
anonymous FTP site from /pub/stations/response/ or via the web at:
http://www.data.scec.org/ftp/stations/response/.
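
For users who script their downloads, the web location and naming pattern
above can be combined as in the following sketch. It assumes that file names
follow the dataless.STANAME pattern exactly and that station codes are upper
case; the function names are hypothetical and GSC is used only as an example
station. The download helper is defined but not called here.

```python
import os
from urllib.request import urlretrieve

# Web location of the dataless SEED volumes, as given in the text above.
SEED_BASE_URL = "http://www.data.scec.org/ftp/stations/seed/"


def dataless_url(station: str) -> str:
    """Build the URL of one station's volume (assumes 'dataless.STANAME')."""
    return f"{SEED_BASE_URL}dataless.{station.upper()}"


def download_dataless(station: str, directory: str = ".") -> str:
    """Fetch one volume into `directory` and return the local file path."""
    target = os.path.join(directory, f"dataless.{station.upper()}")
    urlretrieve(dataless_url(station), target)
    return target


print(dataless_url("gsc"))
```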


D. DHI at the SCEDC

Work is currently underway to install a Data Handling Interface at the
SCEDC. The Data Handling Interface (DHI) provides well-defined standardized
methods to remotely access information from the SCEDC and other data centers
worldwide. The DHI can be thought of as an Application Programming Interface
(API) that can be used as a well-specified, standardized interface to any
seismic data center. There are three different DHI servers being installed
at the SCEDC: a Network Information Server (Station/Channel/Response
information), a Seismogram Server, and an Event Server. The Network server
is installed and running and the Seismogram Server is in the final testing
stages. Once the Seismogram Server is installed, work will begin on the
Event Server installation.

The DHI Servers are an offshoot of the FISSURES project supported by the
IRIS DMS. FISSURES uses the distributed computing technology CORBA (Common
Object Request Broker Architecture) to allow software systems to work across
the Internet in a platform-independent and computer-language neutral manner.
In the DHI, CORBA manages the socket connections, creating robust, reliable
connections between clients and servers. By writing clients that can access
information from a DHI server, one may easily access similar information
from any data center that has DHI servers installed. Currently, DHI servers
are running at the IRIS DMC, the NCEDC and the University of South Carolina.

For more information about the SCEDC DHI servers, please refer to:
http://www.data.scec.org/research/DHI.html. This page will be updated as the
status of the DHI servers progresses. General information about the DHI
project is available directly from IRIS at: http://www.iris.edu/DHI/. 

This work is supported by the IRIS DMS as part of its role in the NSF-funded
SCEC-ITR project and has been facilitated by the prior efforts of the IRIS
DMC, the NCEDC and the University of South Carolina.


E. Highlight: The SCEDC/SCSN Database System

The SCEDC Oracle 9i database is part of a database system that is used by
the Data Center and the Southern California Seismic Network's Real-Time
System (RTS). SCSN data is processed by the RTS and events and supporting
parametric information are immediately copied to the SCEDC database.
Therefore, in addition to providing long-term storage and catalog
information for the SCSN, the SCEDC database is also the source of
information for network alarming, post-processing analysis and applications
such as ShakeMap immediately following an earthquake.

The database system was designed as part of the TriNet project in 1999 with
the following fundamental requirements: 

*	Data from the RTS would be available to the archive in near-real
time.
*	The system must operate 24/7, with unavailability due to maintenance
or failures in software, hardware, and network connectivity minimized.
*	The system must provide rapid query access to a very large data set
that includes events, locations, arrivals, amplitudes, codas and waveforms
for southern California from 1932 to the present.

To achieve these design goals, our system is set up as follows:

*	The RTS has two servers, each with its own local database: one is
primary, the other operates as a shadow. Event information from the RTS
databases is replicated to the SCEDC databases within 4 seconds, and
applications accessing the SCEDC database use data generated by the primary
system. 
*	The SCEDC has two independent databases on two separate servers that
are continually synchronized with one another.
*	The two sets of RTS and SCEDC servers are housed in separate
buildings: the USGS building in Pasadena and the Seismo Lab at Caltech.
The systems can operate independently from either site. 
*	The most common queries on the database are for parameters of
the most recent earthquakes (magnitude, location, time) and associated
waveforms. The most frequent queries come from SCEDC/SCSN internal
applications, which poll the database at regular intervals to get the most
up-to-date information about new events. Other common queries are catalog
searches made by the public and researchers, either through the web catalog
on the SCEDC website or through STP. These queries also request parametric
and waveform data, but may span a long period of time. As a result, the
database schema has been specifically designed to optimize both types of
searches, and a number of database indexes have been created to increase
performance. In fact, indexes account for 43% of the space used by objects
in the SCEDC database.

Why use Oracle?

*	Caltech owns an Oracle Enterprise site license that provides the
database server software and Advanced Replication feature. The SCEDC pays
Oracle directly for licensing the partitioning feature. 
*	Oracle allows objects such as tables and indexes to be partitioned,
i.e., divided into smaller, more manageable portions. The SCEDC
database is currently 49 gigabytes and the largest table (waveform) is 9
gigabytes. The tables that contain waveform, amplitude, and arrival data
are partitioned by year, so users can query the entire table or
reference a smaller piece of it, which significantly improves
performance. Partitioning also reduces maintenance effort because
administration can be focused on particular portions of tables, dividing the
maintenance process into more manageable segments. 
*	Oracle database software with Advanced Replication allows our system
to have multiple, continually synchronized databases. Oracle also provides
stored procedures and integration with Java, which is used by
post-processing applications. Further information on Advanced Replication is
included at the end of this article.
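
The year-based partitioning described above can be pictured with a small
sketch. This is a plain Python analogy, not Oracle syntax and not our schema:
the rows and the query() function are made up for illustration. It shows why
a query limited to one year only has to touch one partition.

```python
from collections import defaultdict

# Hypothetical rows: (year, record id). The real partitioned tables hold
# waveform, amplitude and arrival data.
rows = [(1994, "a"), (1994, "b"), (2003, "c"), (2004, "d"), (2004, "e")]

# Group the rows into one partition per year.
partitions = defaultdict(list)
for year, record in rows:
    partitions[year].append(record)


def query(year=None):
    """Return records for one year (a single partition) or for all years."""
    if year is not None:
        return list(partitions.get(year, []))  # touches one partition only
    return [rec for yr in sorted(partitions) for rec in partitions[yr]]


print(query(2004))   # one partition
print(len(query()))  # whole table
```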

Looking to the future, we are exploring a switch to an open-source database
system such as MySQL for our main systems. Clearly, migrating to a system
with the same functionality and performance at a fraction of the cost is
desirable and we have been impressed by the speed of MySQL in our
performance tests.  However, the current production release of MySQL (4.0)
lacks a number of features that are used heavily by our system:

			Multi-master replication
			Stored procedures
			Views
			Triggers
			Sequences

Many of these features are slated to be included in the 5.1 release. In the
meantime, we continue to monitor new developments, including PostgreSQL
(pgSQL).


Oracle Advanced Replication:

The SCEDC/SCSN database system uses Oracle's Advanced Replication feature to
replicate data among four databases. Each of our databases has a separate
copy of the data. When a transactional statement (an insert, update or
delete) is executed on one database, that database sends the same
instructions to the other databases, which apply them to their own data. 

The system employs both one-way and two-way (also known as "multi-master")
replication. The RTS-to-SCEDC replication is one-way: the source database
(RTS) pushes the data to the target database, but does not receive updates
from the SCEDC databases. Data on the RTS are kept for one week before they
are purged. The SCEDC archive databases use two-way, multi-master
replication to push updates from either database to the other; i.e., the
target database is also a source, so the two databases are synchronized. 

Advanced Replication can be thought of as a collection of tables, stored
procedures, and triggers in the database. When a transactional statement, an
insert for example, is executed on a replicated table, it sets off a trigger
(a program stored inside the database), that instructs the database to store
all necessary information to execute the original insert statement into a
queue which is also stored inside the database. At regular intervals (every
4 seconds), an Oracle job is executed to look for any outstanding
transactions in the queue. If any are found, they are pushed to the remote
database site. If this push is successful, the database marks the
transaction as sent. Another Oracle job (executed every 10 minutes) then
removes all sent transactions.
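
The queue-and-push cycle described above can be sketched as a small
simulation. This is not Oracle code and does not reproduce Advanced
Replication's internals; the class and method names are hypothetical, the
"trigger" is an ordinary method call, and the timed jobs are invoked by hand
instead of running every 4 seconds or 10 minutes. Only the one-way case is
shown.

```python
from dataclasses import dataclass, field


@dataclass
class QueuedTransaction:
    statement: str
    sent: bool = False


@dataclass
class Database:
    name: str
    rows: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def insert(self, row: str) -> None:
        """Apply an insert locally; the 'trigger' also records it in the queue."""
        self.rows.append(row)
        self.queue.append(QueuedTransaction(statement=row))

    def push(self, remote: "Database", remote_available: bool = True) -> int:
        """Push job: send outstanding transactions and mark them as sent."""
        pushed = 0
        for txn in self.queue:
            if txn.sent or not remote_available:
                continue  # already delivered, or remote site unreachable
            remote.rows.append(txn.statement)  # remote applies the statement
            txn.sent = True
            pushed += 1
        return pushed

    def purge(self) -> None:
        """Purge job: remove transactions that were marked as sent."""
        self.queue = [txn for txn in self.queue if not txn.sent]


source = Database("source")
target = Database("target")
source.insert("event 1")
source.push(target, remote_available=False)  # target down: stays queued
source.insert("event 2")
pushed = source.push(target)                 # target back: both delivered
source.purge()
print(pushed, target.rows, len(source.queue))
```

Because unsent transactions simply remain in the queue, the target catches up
on the next successful push without any manual resynchronization.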

Benefits:
*	The ability to have two database archives that are continually
synchronized allows the Data Center to load-balance applications, which
provides better performance.
*	Having two independent databases on separate servers allows for the
possibility of failover if one database should become unavailable. Having
replicated data means that each database has its own copy of the data, so if
the database becomes disconnected from the system (e.g., in a network
outage) the local database objects are still accessible. 
*	By storing transactions in a queue, the system also has the ability
to send these transactions to the target database at a later time, allowing
the target database to resynchronize gracefully. Because the process of
manually synchronizing databases can be time consuming, this functionality
has proven to be very useful when a database becomes unavailable due to
maintenance or unforeseen failure. 
*	The Advanced Replication feature allows for interoperability, which
means that the databases and servers do not have to be at the same version
level or operating system (within limits). This allows the DBA flexibility
in upgrading database versions and flexibility in choosing operating
systems. For example, the SCEDC is currently testing an Oracle 10g
development database on a Linux platform within our Oracle 9i Solaris
system. It is also fairly easy to add additional databases and/or replicated
objects in this system. For example, as part of our efforts to integrate
with the NCEDC in Berkeley, the SCEDC is using this method of replication to
share station data. 

Costs:
*	Synchronizing databases every 4 seconds requires a substantial
amount of database resource overhead. Especially costly are large batch
operations where several million rows are affected. Although normal
transactions involving the seismic network never exceed 10,000 rows per
transaction, activities such as legacy data migration or data quality
control can severely impact system performance and they are usually done
with replication temporarily suspended.
*	There is also added administrative cost needed to maintain
replication triggers and stored procedures. Simple maintenance operations on
a single database, such as altering table structure, become significantly
more complicated within a replicated environment. Failure to run these
procedures properly can result in the object being unavailable for update on
all databases, not just the original target database.


F. Strong-Motion Naming and Aliases

The SCEDC archives strong-motion data from the National Strong-Motion
Program (NSMP; network code NP) and the California Strong-Motion
Instrumentation Program (CSMIP; network code CE). These organizations
identify their stations with a numerical code which previously could not be
processed by the SCSN real-time and post-processing systems. To work around
this problem, the SCSN assigned an alias to each of these stations until a
method of processing numerical station-names was developed.  

A solution was recently implemented by the SCSN and most new strong-motion
data is now available under the numerical name assigned by its originating
network. In the short term, users will need to be aware of both the
numerical name and the alias applied by the SCSN. In the future, we aim to
serve the data only under its numerical station identifier. The list of
aliases is available at: http://www.data.scec.org/stations/stamapping.html 

NET 	ALIAS 	NUMBER 	LOCATION DESCRIPTION 	
CE 	400K 	24400 	East Los Angeles, Obregon Park 	
CE 	G405 	14405 	Rolling Hills Estates, Vista School 	
CE 	J732 	23732 	San Bernardino, Devils Canyon Rd. 	
CE 	K851 	24851 	Los Angeles, W. 3rd & Cloverdale 	
CE 	K853 	24853 	Los Angeles, W. Temple & N. Virgil 	
NP 	BBA 	5398 	Burbank, Burbank Airport 	
NP 	BBB 	5271 	Bombay Beach, Hwy 111 	
NP 	BVH 	5402 	Beverly Hills, Civic Ctr and Foothill 	
NP 	CAB 	5404 	Calabasas, Pk Sorrento and Pk Granada 	
NP 	FLL 	5401 	Fillmore, Santa Clara & Chamberburg Rd 	
NP 	GRF 	141 	Los Angeles, Griffith Observatory 	
NP 	JAB 	655 	Sylmar, Balboa Blvd. 	
NP 	JFP 	655 	Sylmar, Balboa Blvd. 	
NP 	JGB 	655 	Sylmar, Balboa Blvd. 	
NP 	LAX 	5399 	Los Angeles International Airport 	
NP 	LT2 	5030 	Little Rock, Off Pearblossom Hwy (138) 	
NP 	OKV 	5403 	Oak View, Hwy. 33 	
NP 	SSW 	5062 	Calipatria, Salton Sea Wildlife Refuge 	
NP 	TCF 	5081 	Fernwood, Topanga Canyon Blvd. 	
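
While both names are in use, a script can translate an SCSN alias into the
numerical code with a simple lookup. The sketch below copies a few rows from
the table above; the dictionary and function names are hypothetical, and the
full list should be taken from the alias page rather than from this excerpt.

```python
# (network, alias) -> numerical station code, from a few rows of the table.
ALIAS_TO_NUMBER = {
    ("CE", "400K"): "24400",
    ("CE", "G405"): "14405",
    ("NP", "BBA"): "5398",
    ("NP", "GRF"): "141",
    ("NP", "LAX"): "5399",
}


def numerical_name(network: str, station: str) -> str:
    """Return the numerical code for a known alias, else the input unchanged."""
    return ALIAS_TO_NUMBER.get((network, station), station)


print(numerical_name("NP", "LAX"))   # alias is translated
print(numerical_name("NP", "5399"))  # numerical name passes through
```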


G. USArray - BigFoot

The transportable array component of USArray ("Bigfoot") formally began
operation in California in January, 2004 and will stay until 2007. The
southern California contribution to USArray includes the 40
currently-operating SCSN broadband stations listed below. The 40 sps BH_
data from these stations will be transmitted from the SCSN facility in
Pasadena to both the Array Network Facility (ANF) and the IRIS Data
Management Center (DMC) for archiving.

SCSN stations contributing to USArray:

STA	Station Name	Latitude	Longitude	Datalogger
ARV	Arvin	35.1269	-118.83009	Q330
BBR	Big Bear Solar Observatory	34.2623	-116.92075	Q730
BC3	Big Chuckawalla Mountains	33.65515	-115.45366	Q4120
BCC	Bear Creek Country Club	33.57508	-117.26119	Q730
BEL	Belle Mountain	34.0006	-115.9982	Q730
BFS	Mt. Baldy Ranger Station	34.237	-117.6582	Q730
CIA	Catalina Island Airport	33.40186	-118.41372	Q4120
CWC	Cottonwood Creek	36.43988	-118.08016	Q680
DAN	Danby	34.63745	-115.38115	Q4120
DEC	Green Verdugo Microwave	34.25353	-118.33383	Q730
DVT	Desert View Tower	32.65915	-116.10061	Q730
EDW2	Edwards Air Force Base 2	34.8811	-117.99388	Q330
FMP	Fort Macarthur Park	33.71264	-118.29381	Q730
FUR	Furnace Creek	36.46703	-116.86322	Q4120
GLA	Glamis	33.05149	-114.82706	Q980
GRA	Grapevine Ranger Station	36.99608	-117.36621	Q730
GSC	Goldstone	35.30177	-116.80574	Q4120
HEC	Hector	34.8294	-116.335	Q4120
IRM	Iron Mountain Pumping Station	34.15738	-115.14513	Q4120
ISA	Isabella	35.66278	-118.47403	Q4120
LGU	Laguna Peak	34.10819	-119.06587	Q4120
LRL	Laurel Mountain	35.47954	-117.68212	Q4120
MPM	Manual Prospect Mine	36.05799	-117.48901	Q330
MPP	McPherson Peak	34.88848	-119.81362	Q730
NEE	NEEDLES	34.82482	-114.59942	Q980
OSI	Osito Audit	34.6145	-118.7235	Q980
PDM	Parker Dam	34.30336	-114.14152	Q4120
RCT	Rector	36.30523	-119.243842	Q730
RRX	Barstow Service Center	34.87533	-116.99684	Q4120
SBC	Santa Barbara	34.44076	-119.71492	Q680
SCI2	San Clemente Island 2	32.9799	-118.54697	Q330
SCZ2	Santa Cruz Island 2	33.99543	-119.6351	Q330
SDP	Sudden Peak	34.56547	-120.50137	Q730
SHO	Shoshone	35.89953	-116.2753	Q4120
SMM	Simmler	35.3142	-119.99581	Q730
SNCC	San Nicolas Island	33.248	-119.524	Q980
SWS	SAM W. STEWART	32.9408	-115.7958	Q4120
TIN	Tinemaha	37.05422	-118.23009	Q4120
TUQ	Turquoise Mountain	35.43584	-115.92389	Q4120
VES	Vestal	35.84089	-119.08469	Q4120


