Previous | Contents | Next


5. Developer Notes

The best way to learn about the structure of libmtcelledit is to study the public API header file, and the programs that use it.

5.1 Example Programs

I find libmtcelledit a very useful library when developing my own software so I thought it would be helpful to include a few of these programs with the mtCellEdit package to demonstrate what can be done.

5.1.1 mtNetLog

This program demonstrates how you can create a small console program to monitor network activity, and then use spreadsheet formulas for accounting. Data is saved as values in a TSV file so its contents can be read at the command line at any time (even while logging). You can run this program at bootup and it will record all network activity. I do this by inserting the following code at the end of file like /etc/rc.d/rc.S or /etc/rc.local :

/usr/bin/mtnetlog -network ppp0 -delay 10 /var/log/mtnetlog_ppp0.tsv &
/usr/bin/mtnetlog -network eth0 -delay 10 /var/log/mtnetlog_eth0.tsv &

5.1.2 mtRaft

This program demonstrates how you can use a spreadsheet to make the following jobs easier:

  • Format numerical output in a table.
  • Table sorting.
  • Copying table output to the system clipboard.

5.2 Porting and Hacking mtCellEdit

mtCellEdit was designed and built on a GNU/Linux system, with the GUI created with the help of Qt 5. It should possible to port it anywhere else (with a little effort) as it is just a C/C++ program. I don't have enough time and energy to do this porting myself but feel free to port the code if you want (according to the GNU GPL of course) and tell me or the world about your experiences.

5.3 Command Line Examples

Because the TSV file format is so simple its very easy to read using the command line or a script. This is completely independent of the mtCellEdit GUI or even the core libmtcelledit library. This ensures that your data is completely accessible on any properly equipped GNU operating system.

Here is a summary of some useful commands for manipulating a TSV spreadsheet:

Command Description
awk General purpose parsing, manipulation, and reporting
cat Concatenate various files together
cut Remove columns
head Output the top rows of a file
paste Paste one files data after another files data (line by line)
sed Simple regexp's
sort Sort rows
tac Reverse the order of the rows (vertical flip)
tail Output the bottom of a file
tee Read from stdin and output to stdout and files
tr Translate or delete characters
uniq Remove duplicate rows (see also sort -u)

The shell also provides the usual redirections:

> Create a new file
>> Append rows to a file

5.3.1 awk

Writing a small awk program to process a TSV sheet offers a few advantages over a C++ program. For example some simple tasks require just a few awk instructions to complete, compared to a C++ program that would need to be much larger and therefore more time consuming to create and maintain.

5.3.1.1 mtNetLog Queries

You can use the awk program to process the TSV file created by mtNetLog and present the data in a more human readable form like this:

cat /var/log/mtnetlog_ppp0.tsv | awk -F "\t" 'NR==2{for (i=1; i<=68; i++) printf "-"; printf "\n"} NR!=2{printf "%20s %10.2fMB %10.2fMB %10.2fMB %8.2f\n", $1, $7, $8, $9, 100*$8/$7}'

Which displays the in/out/total columns and a perctange of input to output flow. This will convert the data from this file:

		61.555277776904	253420306	30916922	330063404	301.02403259277	13.748958587646	314.77299118042
'Start	'End	'Hours	'Bytes In	'Bytes Out	'Bytes Total	'MB In	'MB Out	'MB Total
2010-11-10 12:10:42	2010-11-11 00:39:11	12.474722223356	36193057	11935767	86299634	73.300669670105	9.0010747909546	82.30174446106
2010-11-14 09:59:13	2010-11-15 01:36:16	15.617500000633	15034107	2117117	70948296	67.253649711609	0.40791797637939	67.661567687988
2010-11-19 11:58:12	2010-11-20 02:43:36	14.756666664965	68171569	6224867	36326937	32.803199768066	1.8408670425415	34.644066810608
2010-11-20 10:09:43	2010-11-21 02:21:46	16.200833332725	84574392	1741061	56100792	52.855527877808	0.6463565826416	53.501884460449
2010-11-21 12:29:45	2010-11-21 15:00:05	2.5055555552244	49447181	8898110	80387745	74.810985565186	1.8527421951294	76.663727760315

Into this:

                         301.02MB      13.75MB     314.77MB     4.57
--------------------------------------------------------------------
 2010-11-10 12:10:42      73.30MB       9.00MB      82.30MB    12.28
 2010-11-14 09:59:13      67.25MB       0.41MB      67.66MB     0.61
 2010-11-19 11:58:12      32.80MB       1.84MB      34.64MB     5.61
 2010-11-20 10:09:43      52.86MB       0.65MB      53.50MB     1.22
 2010-11-21 12:29:45      74.81MB       1.85MB      76.66MB     2.48

You can also use the sort command line program in order to sort according to a given field:

cat /var/log/mtnetlog_ppp0.tsv | tail -n+3 | sort -t $'\t' -nk 3 | awk -F "\t" '{ printf "%s\t%5.2f hours\n", $1, $3 }'

Creates this output:

2010-11-21 12:29:45	 2.51 hours
2010-11-10 12:10:42	12.47 hours
2010-11-19 11:58:12	14.76 hours
2010-11-14 09:59:13	15.62 hours
2010-11-20 10:09:43	16.20 hours

5.3.1.2 Splitting fixed width text files

If a text file contains a rectangular lump of text with various columns set up using spaces it is possible to chop this up into a TSV file ready for spreadsheet editing:

cat fixed.txt | awk -v FIELDWIDTHS='20 11 2 11 2 11 2 9' -v OFS='\t' '{ $1=$1; print }' > fixed.tsv

This cuts up the output from the mtnetlog example in 5.3.1.1. You may also want to remove spaces using the following sections.

5.3.1.3 Removing all spaces

cat fixed.tsv | tr -d " "

This is a very crude and indiscriminate weapon of mass destruction, so care is needed when using it. For example it destroys the space between the date and time in the example from cutting up the text from 5.3.1.1. To avoid this problem use the following example.

5.3.1.4 Removing all leading and trailing spaces

You can use sed or awk to chop off leading and trailing spaces in each cell. This is useful if you want to preserve the spaces within the cell, but remove useless spaces after splitting a fixed width file:

cat fixed.tsv | sed -e "s/^[ ]\+//" | sed -e "s/[ ]\+$//" | sed -e "s/\t[ ]\+/\t/g" | sed -e "s/[ ]\+\t/\t/g"

cat fixed.tsv | awk '{ sub( /^ +/, "" ); sub( / +$/, "" );  gsub( /\t +/, "\t" ); gsub( / +\t/, "\t" ); print }'

5.3.1.5 Creating fixed width text files

To reverse the operation in 5.3.1.3 and to create a fixed width file from a TSV file:

cat fixed.tsv | awk -v FS='\t' '{ printf "%20s%11s%2s%11s%2s%11s%2s%9s\n", $1, $2, $3, $4, $5, $6, $7, $8 }'

In this example, we want all of the chunks of text to be right justified, but if we wanted to have them left aligned we would simply place a '-' before the number like this:

cat fixed.tsv | awk -v FS='\t' '{ printf "%-20s%-11s%-2s%-11s%-2s%-11s%-2s%-9s\n", $1, $2, $3, $4, $5, $6, $7, $8 }'

5.3.2 unzip

The command line program unzip allows you to access the sheets held inside a CED/ZIP file. Here is an example using the example data in the file 'test_suite.tsv.zip' from the mtCellEdit source code distribution:

unzip -p test_suite.tsv.zip "test_suite.tsv/sheet/Data (1959 GE)" | awk -F "\t" 'NR>340 {  printf "%s - %s\n", $1, $2 } NR>350 {exit}'

This outputs the first 2 columns from rows 341 to 350, which results in this being displayed:

'King's Lynn - 'DG Bullard
'Kingston upon Thames - 'JA Boyd-Carpenter
'Kinross and West Perthshire - 'WG Leburn
'Kirkcaldy - 'HPH Gourlay
'Knutsford - 'WH Bromley-Davenport
'Lanark - 'JCM Hart
'Lanarkshire North - 'MM Herbison
'Lancaster - 'HJ Berkeley
'Leeds East - 'DW Healey
'Leeds North East - 'K Joseph
'Leeds North West - 'D Kaberry

As you can see it prints out the ' character at the beginning of each field which is undesirable, so we ask awk to cut this out for us using the sub function:

unzip -p test_suite.tsv.zip "test_suite.tsv/sheet/Data (1959 GE)" | awk -F "\t" 'NR>340 { sub(/^\x27/, "", $0); gsub(/\x09\x27/, "\t", $0); printf "%s - %s\n", $1, $2 } NR>350 {exit}'

Which creates this:

King's Lynn - DG Bullard
Kingston upon Thames - JA Boyd-Carpenter
Kinross and West Perthshire - WG Leburn
Kirkcaldy - HPH Gourlay
Knutsford - WH Bromley-Davenport
Lanark - JCM Hart
Lanarkshire North - MM Herbison
Lancaster - HJ Berkeley
Leeds East - DW Healey
Leeds North East - K Joseph
Leeds North West - D Kaberry

If you wanted to see the last 10 rows in a file you would simply call the tail command first:

unzip -p test_suite.tsv.zip "test_suite.tsv/sheet/Data (1959 GE)" | tail | awk -F "\t" '{ sub(/^\x27/, "", $1); sub(/^\x27/, "", $2); printf "%s - %s\n", $1, $2 }'

Which creates this:

Worcester - GR Ward
Worcestershire South - PG Agnew
Workington - TF Peart
Worthing - OL Prior-Palmer
Wrekin, The - W Yates
Wrexham - JI Jones
Wycombe - J Hall
Yarmouth - A Fell
Yeovil - JWW Peyton
York - CB Longbottom

5.3.3 zcat

If you have saved some TSV data inside a compressed GZ file you cat use the zcat command to access this data. This example extracts the first and third columns from a file and stores the results in a new compressed output file:

zcat input.tsv.gz | awk -F "\t" '{ printf "%s\t%s\n", $1, $3 }' | gzip > output.tsv.gz

5.3.4 tr

5.3.4.1 Changing Field Separators

Occasionally you may come across a file with an unusual field separator such as : ; or @. Assuming these characters only exist as field separators then its very easy to swap the characters to tabs using tr:

cat input.txt | tr "@" "\t" > output.tsv

For exotic multi-character separators like @;@ you can use awk:

cat input.txt | awk '{ gsub( /@;@/, "\t" ); print }' > output.tsv

You can also change a normal TSV file into a file with rather more eccentric field separators:

cat input.tsv | tr "\t" "@" > output.txt
cat input.tsv | awk '{ gsub( /\t/, "@;@" ); print }' > output.txt

5.4 cedutils

As mentioned in section 5.3, a GNU system can manipulate a TSV file in a number of ways. However there are some jobs that are not possible to complete quickly and easily so I have assembled a group of command line utilities that can be used by the shell to manipulate spreadsheet files.

These utilities are designed for accessing CSV or TSV files using basic scripts. If you need access to data inside a CED/ZIP file then you should unzip it to a temp directory and access the sheets there. Any complex or CPU intensive tasks should be left to custom built C++ programs.

These utilities are accessed via the binary program cedutils. You access its functions by calling it via a symlink to execute a single command. Because all commands really exist in a central program, it is also possible to do multiple different operations within the same line by using the argument "-com" followed by the new command. The advantage of this is in terms of speed and fewer file operations. When you call 2 commands separately in two script lines, you action a load and a save operation twice. However, with multiple commands in a single line you only have a single load and a single save operation which saves CPU cycles, and involves fewer file read/write operations. Here is an example which flips a sheet and inserts 10 empty rows:

cedflip input.tsv -com insert -total 10 "" -o output.tsv

Compare this to the two operations done separately:

cedflip input.tsv -o output.tsv
cedinsert -total 10 output.tsv -o output.tsv

More information can be obtained by consulting the man page:

man cedutils

Previous | Contents | Next