.ig >>
<STYLE TYPE="text/css">
<!--
        A:link{text-decoration:none}
        A:visited{text-decoration:none}
        A:active{text-decoration:none}
-->
</STYLE>
<title>ploticus: input data formats</title>
<body bgcolor=D0D0EE vlink=0000FF>
<br>
<br>
<center>
<table cellpadding=2 bgcolor=FFFFFF width=550 ><tr>
<td>
  <table cellpadding=2 width=550><tr>
  <td><br><h2>Input data formats</h2></td>
  <td align=right>
  <small>
  <a href="../doc/Welcome.html"><img src="../doc/ploticus.gif" border=0></a><br>
  <a href="../doc/Welcome.html">Welcome</a> &nbsp; &nbsp;
  <a href="../gallery/index.html">Gallery</a> &nbsp; &nbsp;
  <a href="../doc/Contents.html">Handbook</a> 
  <td></tr></table>
</td></tr>
<td>
<br>
<br>
.>>

.TH Input_data_formats PL "13-JUN-2002   PL ploticus.sourceforge.net"

.LP
Ploticus can read tabular ASCII data from files, from commands, or from the standard input;
data may also be embedded in ploticus scripts.
If you're using prefabs, the \fBdata\fR parameter specifies the data source (either a file or the standard input).
If you're writing scripts,
.ig >>
<a href="getdata.html">
.>>
\0proc getdata
.ig >>
</a>
.>>
is used to read or specify plotting data;
.ig >>
<a href="trailer.html">
.>>
\0proc trailer
.ig >>
</a>
.>>
may be used to place larger amounts of embedded plot data 
at the end of the script file, to get it out of the way.
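.LP
As a minimal sketch of the embedded case, a few rows might be given directly in a script.
This assumes the \fCdata\fR attribute of \fBproc getdata\fR, with a blank line ending the data;
the values are illustrative only.
.nf
	#proc getdata
	  data:
	  F1 2.43 0.47
	  F2 2.79 0.28
	  F3 2.62 0.37

.fi
To read the same rows from a file instead, the \fCdata\fR attribute would be replaced
by something like \fCfile: mydata\fR (file name hypothetical).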

.ig >>
<br><br><br>
.>>
.SH Plotting from data fields
.LP
Plotting and data display operations are done
using fields.  Suppose we have a data set like this:
.nf
	F1 2.43 0.47 "Jane Doe"   PF7955
	F2 2.79 0.28 "John Smith" PT2705
	F3 2.62 0.37 "Ken Brown"  PB2702
	F4 "" "" "Bud Flippner"   PX7205
.fi
We might draw a bar graph using the values in field 2,
and draw error bars using the values in field 3.
The bars could be labeled with the values in field 4, or
perhaps field 1.
.LP
Data fields may always be referenced by number, where the first is \fB1\fR.
For example, to produce a line plot using fields 1 and 2 of a data set
you might use the prefab command: \fCpl -prefab lines data=mydata x=1 y=2\fR.
Or the script equivalent:
.nf
	#proc lineplot
	  xfield: 1
	  yfield: 2
.fi
.LP
\fBNaming data fields:\fR
You may be able to reference data fields by name.
Sometimes data sets carry field names in the first row.  This is called a
\fBfield name header\fR.  If your data set has a field name header, you can
reference fields using those names (if you're using prefabs specify \fBheader=yes\fR;
if you're writing scripts set
.ig >>
<a href="getdata.html">
.>>
\0proc getdata fieldnameheader
.ig >>
</a>
.>>
to \fCyes\fR).
The field name header is expected to use the same delimitation as the rest of the data.
Here's an example:
.nf
	date	time	alevel	blevel
	020402	13:11	102	392
	020402	13:28	128	402
	...
.fi
You can also assign names explicitly in your ploticus script, by using one of the
.ig >>
<a href="getdata.html">
.>>
\0proc getdata fieldname
.ig >>
</a>
.>>
attributes.
Field names (whether from header or specified explicitly) are like variable names; they
cannot contain embedded white space, comma, or quote characters.
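.LP
The two naming approaches might look like this in a script.
This is a sketch only: \fCfieldnames\fR is assumed to be the \fBproc getdata\fR attribute
used for explicit naming, the file name \fClevels.dat\fR is hypothetical, and the
\fBproc areadef\fR that a real plot needs is omitted.
.nf
	#proc getdata
	  file: levels.dat
	  delim: tab
	  fieldnameheader: yes

	#proc scatterplot
	  xfield: alevel
	  yfield: blevel
.fi
For a file without a header row, the names could instead be assigned explicitly:
.nf
	#proc getdata
	  file: levels.dat
	  delim: tab
	  fieldnames: date time alevel blevel
.fi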
.LP
\fBMassaging data:\fR
If you are developing ploticus scripts and your data cannot be plotted as is,
you may be able to do the necessary processing within ploticus.
To select certain fields, reformat fields, concatenate fields, etc., try using a
.ig >>
<a href="getdata.html">
.>>
\0proc getdata filter.
.ig >>
</a>
.>>
To perform accumulation, tabulation and counting, rewriting
as percents, computation of totals, reversing record order,
rotation of row/column matrix, break processing, etc.,
.ig >>
<a href="processdata.html">
.>>
\0proc processdata
.ig >>
</a>
.>>
may be useful (it operates on the data after they have been read in).
.LP

.ig >>
<br><br><br>
.>>
.SH Recognized data formats
Data files or streams should be plain ASCII text, not binary, and should be organized as a
collection of rows having one or more fields.
Fields may have numeric or alphanumeric content and may be delimited in one of these ways:


.ig >>
<br><br><br>
.>>
.IP \(bu
\fBspacequoted\fR 
.br
.nf
	F1 2.43 0.47 "Jane Doe"   PF7955
	F2 2.79 0.28 "John Smith" PT2705
	F3 2.62 0.37 "Ken Brown"  PB2702
	F4 "" "" "Bud Flippner"   PX7205
.fi
Fields are delimited by one or more spaces or tabs.
Fields may be enclosed in double quotes ("), and such fields may have 
embedded white space.  Blank fields may be represented as shown.

.ig >>
<br><br><br>
.>>
.IP \(bu
\fBwhitespace\fR 
.br
.nf
	F1 2.43 0.47 Jane_Doe   PF7955   
	F2 2.79 0.28 John_Smith  PT2705
	F3 2.62 0.37 Ken_Brown  PB2702
	F4 - - Bud_Flippner   PX7205
	...
.fi
Fields are delimited by one or more spaces or tabs.
No quote processing is done.
Blank fields must be represented using a code (\fC-\fR in row F4 above), and
alphanumeric fields cannot contain white space.
Parsing of \fCwhitespace\fR data is faster than processing
of \fCspacequoted\fR data.


.ig >>
<br><br><br>
.>>
.IP \(bu
\fBtab delimited\fR 
.br
.nf
	F1	2.43	0.47	Jane Doe
	F2	2.79	0.28	John Smith
	F3	2.62	0.37	Ken Brown
	F4			Bud Flippner
	...
.fi
Fields are separated by a single tab.  
Zero length fields are taken to be blank.
Data fields cannot have embedded tabs.
The first field must start at the very beginning of the line.
The last field in a row may be terminated by a tab or not.

.ig >>
<br><br><br>
.>>
.IP \(bu
\fBcomma delimited\fR 
.nf
	"F1",2.43,0.47,"Jane Doe"
	"F2",2.79,0.28,"John Smith"
	"F3",2.62,0.37,"Ken Brown"
	"F4",,,"Hello""world"
	...
.fi
Also known as comma-quote delimited or CSV.  Fields are separated by commas.  
Alphanumeric fields are enclosed in double quotes (although ploticus really
doesn't care about this unless a field contains embedded whitespace or comma
characters).
Zero length fields and fields containing "" are taken to be blank.
An embedded double quote is represented using ("") as seen in row F4 above.
No whitespace is allowed before or after fields (although this
apparently is tolerated in the CSV spec).
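.LP
As a sketch, a comma delimited file such as the one above might be read by setting the
\fBproc getdata\fR \fCdelim\fR attribute to \fCcomma\fR.
The value \fCcomma\fR is assumed by analogy with the \fCdelim: tab\fR setting used elsewhere on this page,
and the file name is hypothetical.
.nf
	#proc getdata
	  file: people.csv
	  delim: comma
.fi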

.ig >>
<br><br><br>
.>>
.LP
\fBNotes regarding data input and parsing:\fR
.LP
\fBEmpty rows and commented rows\fR are ignored (the comment marker may be specified via
.ig >>
<a href="getdata.html">
.>>
\0proc getdata
.ig >>
</a>
.>>
)\0.
.LP
\fBData sets with a variable number of fields\fR may be accommodated by specifying the
.ig >>
<a href="getdata.html">
.>>
\0proc getdata
.ig >>
</a>
.>>
attribute \fCnfields\fR.
Otherwise, the first usable row will dictate the expected number of fields per record.
If a row has \fBmore\fR than the expected number of fields, extra fields are silently ignored.
If a row has \fBfewer\fR than the expected number of fields, blank fields are silently added
until the record has the same number of fields as other records.
\fCnfields\fR may also be used to read only the first few fields on every row, and ignore the rest.
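.LP
For instance, a sketch that reads only the first three fields of every row
(the file name is hypothetical):
.nf
	#proc getdata
	  file: mydata
	  nfields: 3
.fi
With the five-field \fCspacequoted\fR sample rows shown earlier, fields 4 and 5 would then be ignored.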
.LP
Leading white space is allowed when using \fCspacequoted\fR or \fCwhitespace\fR delimitation.
It is not allowed on the other types.
.LP
Each row, including the last one, should be terminated with the standard line terminator
for your system.  For unix systems this is the newline character.
For Win32 it is CR/LF; these are handled properly by MinGW builds but not by unix builds.
.LP
The data parser was improved for version 2.02; earlier versions did not support zero-length
fields or data sets with variable number of fields.
.LP
Data that is specified within a ploticus script is subject to script processing: leading white space
is stripped off and the script interpreter will attempt to evaluate constructs that look like 
operators or variables.


.ig >>
<br><br><br>
.>>
.SH Missing data
Missing data values may be represented using a code or by a zero-length field, if the
specific delimitation method allows them.
When plotting,
missing values are generally skipped over, but exactly what occurs depends on
what kind of plot operation is being done.  The individual plotting
proc manual pages give details.

.ig >>
<a name=set></a>
.>>
.ig >>
<br><br><br>
.>>
.SH Embedded #set statements
Data files may contain embedded \fC#set\fR statements for setting ploticus
variables directly from the data file.  The syntax is:
.IP
\fC#set VARIABLE = value\fR.
.LP
Here's an example of a data file with embedded #set statements:
.IP
.nf
\0 #set x = 1 
\0 #set y = 4
\0 ABC	3	4	11	42.3
\0 DEF	5	2	48	27.4
\0 GHI	9	1	79	37.3
\0 ...
.fi
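.LP
A script that reads this file could then refer to the variables as \fC@x\fR and \fC@y\fR.
The following is a sketch, assuming that variables set this way are available to procs
that execute after \fBproc getdata\fR; the file name is hypothetical and the
\fBproc areadef\fR that a real plot needs is omitted.
.nf
	#proc getdata
	  file: mydata

	#proc lineplot
	  xfield: @x
	  yfield: @y
.fi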

.ig >>
<br><br><br>
.>>
.SH Database retrievals
There are currently no direct interfaces to databases; the recommended procedure is to use
your database's command line tool to extract tabular ASCII data, and use the
.ig >>
<a href="getdata.html">
.>>
\0proc getdata
.ig >>
</a>
.>>
\fCcommand\fR attribute to invoke it.  For example:
.nf
\0  #proc getdata
\0  command: mysql acars_monitor < select_delay_gs.sql
\0  delim: tab
.fi


.ig >>
<br><br><br>
.>>
.SH Examples
Gallery examples include:
.br
.ig >>
<a href="../gallery/scat7.dat">
.>>
\0scat7.dat
.ig >>
</a>
.>>
(white-space delimited)
.br
.ig >>
<a href="../gallery/stock.csv">
.>>
\0stock.csv
.ig >>
</a>
.>>
(comma delimited)
.br
.ig >>
<a href="../gallery/timeline3.htm">
.>>
\0timeline3
.ig >>
</a>
.>>
(data specified within script)
.br
.ig >>
<a href="../gallery/km2.htm">
.>>
\0km2
.ig >>
</a>
.>>
(data specified within script).


.ig >>
<br>
<br>
</td></tr>
<td align=right>
<a href="../doc/Welcome.html">
<img src="../doc/ploticus.gif" border=0></a><br><small>data display engine &nbsp; <br>
<a href="../doc/Copyright.html">Copyright Steve Grubb</a>
<br>
<br>
<center>
<img src="../gallery/all.gif">
</center>
</td></tr>
</table>
.>>
