Friday, July 10, 2009

On Romanian GIS data, Stereo 1970 and WGS84

GIS data for Romania is sparse. The government has it, but is slow to share it, and even when it does, it fails to release the data under clear licensing terms.

Once GIS data is obtained, another problem arises: the data is most likely in the Stereo 1970 projection (EPSG 31700). Many modern GIS tools, OpenStreetMap included, expect coordinates in WGS 84 (EPSG 4326) and cannot interpret anything else.

A translation is needed from Stereo 1970 to WGS84.

One can attempt to translate coordinates using the excellent libproj.
Given the coordinates for Cabana Fantanele, a hut in the Ceahlau mountains, in Stereo 1970, we can use the tools provided by libproj to translate to WGS84:


cristi:proj diciu$ ./src/cs2cs +proj=stere +lat_0=46 +lon_0=25 +k=0.999750 \
> +x_0=500000 +y_0=500000 +ellps=krass +units=m +no_defs +to \
>+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs -f "%.6f"
572176.849 611484.777
25.949088 46.999202 0.000000


However, if you plot this WGS84 coordinate set on Google Earth (pin name is "Fantanele proj"), you will see the coordinates are approximately 100 meters away from the hut.







Thanks to the work of a government agency called ANCPI (Agentia Nationala de Cadastru si Publicitate Imobiliara) we have a tool that allows accurate translations between Stereo 1970 and WGS84: TransDatRO.

Unfortunately, accurate translation between Stereo 1970 and ETRS89 requires a set of common points, and the tool only includes enough points for the following counties: Caras Severin, Gorj, Valcea, Dambovita, Teleorman, Braila, Botosani, Suceava.





Degrees, minutes and seconds can be converted to decimal degrees (as required by OpenStreetMap) using libproj's cs2cs:


cristi:proj diciu$ ./src/cs2cs -f "%.6f" +proj=latlong +datum=WGS84 +to +proj=latlong +datum=WGS84
46d59'56.28495N 25d56'51.25243E
46.998968 25.947570 0.000000
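The conversion itself is simple enough to check by hand: decimal degrees are degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. Here is a minimal Python sketch, assuming the `46d59'56.28495N` notation used above; the helper name `dms_to_decimal` is made up for illustration and is not part of libproj.

```python
import re

# Hypothetical helper: parses strings such as 46d59'56.28495N (the notation used above).
_DMS = re.compile(r"^(\d+)d(\d+)'([\d.]+)([NSEW])$")

def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to decimal degrees."""
    match = _DMS.match(text.strip())
    if match is None:
        raise ValueError("not a d/m/s coordinate: %r" % text)
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative in decimal-degree notation.
    return -value if hemisphere in "SW" else value

print("%.6f %.6f" % (dms_to_decimal("46d59'56.28495N"),
                     dms_to_decimal("25d56'51.25243E")))
# prints: 46.998968 25.947570
```

The printed values agree with the first two numbers cs2cs returns for the same input.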


When we plot the coordinates as returned by TransDat, even though Neamt county is not part of the accurate county list, we observe that coordinates returned by TransDat are much closer to what's revealed by the satellite imagery (see how "Fantanele Topo" is in close proximity to the hut's roof):
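To put a number on the gap between the two pins, one can compute the great-circle distance between the cs2cs result and the TransDat result. This is only a sketch: it uses the haversine formula on a sphere with a mean Earth radius of 6371 km (my assumption, not something either tool uses), which is more than accurate enough at this scale.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

fantanele_proj = (25.949088, 46.999202)  # cs2cs result (lon, lat)
fantanele_topo = (25.947570, 46.998968)  # TransDat result (lon, lat)

print(round(haversine_m(*fantanele_proj, *fantanele_topo)), "m")
```

The result comes out at roughly 120 m, consistent with the "approximately 100 meters" offset seen in Google Earth.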





Note: I am not a geographer. Everything explained above is derived from personal, rather empirical research.
I have lots of questions I couldn't find an answer to, such as why there is no apparent difference when converting from ETRS89 to WGS84. You might notice that I use ETRS89 and WGS84 interchangeably.

Thursday, June 18, 2009

postgresql drop Romanian diacritics


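-- Replace Romanian (and a few look-alike) accented letters with plain ASCII.
-- The first TRANSLATE argument lists these letters as UTF-8 byte escapes:
--   Â ă â ã Ă Î î Ş ş Ţ ţ Ǎ ǎ Ș ș Ț ț Ã
-- The second argument gives the replacement for each letter, in the same order.
create or replace function strip_diacritics(varchar) returns varchar as $$
DECLARE
src ALIAS FOR $1;
result VARCHAR;

BEGIN result := TRANSLATE(src, E'\xc3\x82\xc4\x83\xc3\xa2\xc3\xa3\xc4\x82\xc3\x8e\xc3\xae\xc5\x9e\xc5\x9f\xc5\xa2\xc5\xa3\xc7\x8d\xc7\x8e\xc8\x98\xc8\x99\xc8\x9a\xc8\x9b\xc3\x83', 'AaaaAIiSsTtAaSsTtA');
return result;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;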
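A quick way to sanity-check a mapping like this outside the database is to rebuild it in Python, where each letter can be written as a code point instead of a run of byte escapes. The sketch below assumes the intended source letters are Â ă â ã Ă Î î Ş ş Ţ ţ Ǎ ǎ Ș ș Ț ț Ã, paired position by position with the 18-character replacement string from the function.

```python
# Source letters as code points, in the same order as the replacement string:
# Â ă â ã Ă Î î Ş ş Ţ ţ Ǎ ǎ Ș ș Ț ț Ã
SOURCE = ("\u00c2\u0103\u00e2\u00e3\u0102\u00ce\u00ee\u015e\u015f"
          "\u0162\u0163\u01cd\u01ce\u0218\u0219\u021a\u021b\u00c3")
TARGET = "AaaaAIiSsTtAaSsTtA"

# Both strings must have the same length, otherwise the pairing is off by one somewhere.
assert len(SOURCE) == len(TARGET)

TABLE = str.maketrans(SOURCE, TARGET)

def strip_diacritics(text):
    """Python counterpart of the SQL function, for checking the mapping."""
    return text.translate(TABLE)

print(strip_diacritics("F\u00e2nt\u00e2nele, Ceahl\u0103u, Boto\u0219ani"))
# prints: Fantanele, Ceahlau, Botosani
```

Encoding each character of `SOURCE` to UTF-8 gives the byte sequence to put in the `E'...'` literal, which makes a mistyped byte easy to spot.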

Wednesday, June 10, 2009

Note to self

OpenStreetMap places less than 2000 meters apart whose names have the same length (in bytes, as measured by octet_length):



select p1.name, p2.name, p1.osm_id, p2.osm_id, p1.place, p2.place,
       distance(p1.way, p2.way)
from planet_osm_point p1, planet_osm_point p2
where p1.place != '' and p2.place != ''
  and p1.osm_id != p2.osm_id
  and distance(p1.way, p2.way) < 2000
  and octet_length(p1.name) = octet_length(p2.name)
order by p1.name;

Tuesday, April 28, 2009

Spell checking in Objective-C



I've finally beaten Python at running the very simple spell checker described in an earlier post. The code is not "pure" Objective-C: some parts were just too slow, so I had to mix in some C.

The major difference from the versions described in the earlier post is an additional condition in the loop of - (NSSet *) knownEdits2:(NSString *) word:

if([ts intersectsSet:knownWords])
[edits2 unionSet:[self edits1:w]];

The intersectsSet: condition very effectively keeps the set of words at edit distance 2 from growing. In fact, I suspect the Python reference code does exactly that, but I'm not sure how to unroll the Python construct into distinct loops, so I can't tell for sure: return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)
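For what it's worth, a generator expression with several `for` clauses reads left to right as nested loops, and the trailing `if` filters the innermost variable. The sketch below spells that out. `edits1` follows the formulation in Norvig's article, and the tiny `NWORDS` dictionary is made up purely for illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away from word (delete, transpose, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

# Made-up stand-in for the trained model (word -> count).
NWORDS = {"where": 3, "here": 5, "there": 2, "were": 4}

def known_edits2(word):
    # The one-liner from the reference code.
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known_edits2_loops(word):
    # The same thing as explicit loops: the filter applies to each e2.
    result = set()
    for e1 in edits1(word):
        for e2 in edits1(e1):
            if e2 in NWORDS:
                result.add(e2)
    return result

print(known_edits2("wheere") == known_edits2_loops("wheere"))  # prints: True
```

Read this way, the Python version still generates every edits1(e1) set, but it tests each candidate against the dictionary one at a time and never stores the unknown ones, so the large distance-2 set is never materialized.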


CSSpell.h

#import <Cocoa/Cocoa.h>

@interface CSSpell : NSObject
{
NSMutableDictionary * model;
NSDictionary * NWORDS;
NSMutableSet * knownWords;
}
@end


CSSpell.m


#import "CSSpell.h"

@interface CSMutableNumber:NSObject
{
NSUInteger mNumber;
}
@end

@implementation CSMutableNumber
- (id)initWithInt:(int)value
{
self = [super init];
if(!self) return nil;
mNumber = value;
return self;
}
+ (CSMutableNumber *)numberWithInt:(int)value
{
return [[[CSMutableNumber alloc] initWithInt:value] autorelease];
}
- (void)setIntValue:(int)value
{
mNumber = value;
}
- (void) increment
{
if(mNumber < NSUIntegerMax) mNumber++;
}
- (int) intValue
{
return mNumber;
}
@end

@implementation CSSpell

- (NSDictionary *) trainv2:(NSString *)text
{
if(model == nil)
{
model = [NSMutableDictionary dictionaryWithCapacity:40000];
[model retain];
}

NSString * t = [text lowercaseString];
const char * ascii = [t cStringUsingEncoding:NSASCIIStringEncoding];

/* strsep() writes NULs into the buffer it scans, so tokenize a private copy
   instead of the read-only storage handed out by NSString */
char * copy = ascii ? strdup(ascii) : NULL;
char * buf = copy;
char *tok;
while ((tok = strsep(&buf, " \r\t,.*()[]#\n\"-!?&/'~;:")))
if(strlen(tok)>2)
{
NSString * w = [NSString stringWithCString:tok];

CSMutableNumber * wordVal = [model objectForKey:w];
if(wordVal)
[wordVal increment];
else
[model setObject:[CSMutableNumber numberWithInt:1] forKey:w];

}
free(copy);

NSLog(@"Caching set of known words");

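/* own the set: an autoreleased object stored in an ivar would not outlive the current autorelease pool */
[knownWords release];
knownWords = [[NSMutableSet alloc] initWithArray:[model allKeys]];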
return model;
}



/* returns a set of words at edit distance 1 away */
- (NSSet *) edits1:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0, len;
char c;
#define MAX_LEN_C 500
char str[MAX_LEN_C];
char temp[MAX_LEN_C];


strcpy(str, [word cStringUsingEncoding:NSASCIIStringEncoding]);
for(i=0;i<[word length];i++)
{
strcpy(temp, str); temp[i] = str[i+1]; temp[i+1] = str[i];
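len = strlen(str); /* full word length; strlen(temp) is one short on the last pass, where the swap above moves the NUL */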

if(i<[word length]-1)
{
NSString * transposed = [NSString stringWithCString:temp length:len];
[s addObject:transposed];
}

strcpy(temp + i, str+i+1);
NSString * deletion = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:deletion];

for(c='a';c<='z';c++)
{
temp[i] = c; strcpy(temp + i + 1, str+i+1);
NSString * replaced = [NSString stringWithCString:temp length:len];
[s addObject:replaced];

strcpy(temp + i + 1, str+i);
NSString * inserted = [NSString stringWithCString:temp length:len+1];
[s addObject:inserted];
}
}

return s;
}



/* returns a set of known words based at edit distance 2 away */
- (NSSet *) knownEdits2:(NSString *) word
{
NSMutableSet * edits2 = [NSMutableSet setWithCapacity:5000];
NSSet * edits1 = [self edits1:word];

NSEnumerator * en1 = [edits1 objectEnumerator];
NSString * w;
while(w=[en1 nextObject])
{
NSSet * ts = [self edits1:w];


if([ts intersectsSet:knownWords])
[edits2 unionSet:[self edits1:w]];
}

//NSLog(@"Edits2 has %d", [edits2 count]);

[edits2 intersectSet:knownWords];
NSLog(@"KW for %@: %@", word, edits2);

return edits2;
}

/* returns a set of known words from a given set of words */
- (NSSet *) known:(NSSet *) words
{
NSMutableSet * s = [NSMutableSet setWithCapacity:100];
[s unionSet:words];
[s intersectSet:knownWords];
return s;
}

- (NSString *) correct:(NSString *) word
{
if([word length] > 300)
return nil;


NSMutableSet * candidates = [NSMutableSet setWithCapacity:10];

if([candidates count] < 1)
{
[candidates unionSet:[self known:[NSSet setWithObject:word]]];
//NSLog(@"Adding known words set: %@", candidates);
}

if([candidates count] < 1)
{
[candidates unionSet:[self known:[self edits1:word]]];
//NSLog(@"Adding edit distance 1 set: %@", candidates);
}

if([candidates count] < 1)
{
[candidates unionSet:[self knownEdits2:word]];
//[candidates unionSet:[self knownEdits2STL:word]];
//NSLog(@"Adding edit distance 2 set: %@", candidates);
}
if([candidates count] < 1)
{
//NSLog(@"Adding the word itself");
[candidates addObject:word];
}

int max=0;
NSString * chosen = @"Unknown#";
NSEnumerator * e = [candidates objectEnumerator];
NSString * w;
while(w=[e nextObject])
{
//CSMutableNumber * wordVal = [NWORDS valueForKey:w];
CSMutableNumber * wordVal = [NWORDS objectForKey:w];
if(wordVal)
{
if([wordVal intValue] > max)
{
chosen = w;
max = [wordVal intValue];
//break;
}
}
}


return chosen;
}

- (void) awakeFromNib
{

NSString * cnt = [NSString stringWithContentsOfFile:@"/Users/diciu/Desktop/big.txt" encoding:NSASCIIStringEncoding error:nil];
NSLog(@"Dictionary has been loaded");

NWORDS = [self trainv2:cnt];
NSLog(@"Training done, %d words", [NWORDS count]);

NSDate * startDate = [NSDate date];

NSArray * tests = [NSArray arrayWithObjects:@"generataed", @"acount", @"guidlines",
@"wheere", @"myne", @"graet", @"silenc", @"aggresive",
@"appreceiated", @"aquantance", @"beggining", nil];
NSEnumerator * en = [tests objectEnumerator];
NSString * c;
while(c=[en nextObject])
{
NSDate * sd = [NSDate date];
NSString * correct = [self correct:c];
NSLog(@"%@ / %@ \t\t\t in %f seconds", c, correct, [sd timeIntervalSinceNow]);
}


NSLog(@"Total runtime: %f", [startDate timeIntervalSinceNow]);

}
@end

Sunday, April 19, 2009

On (my) Objective-C code that's slower than expected

I was looking into building a spell checker for the geographical names in BucharestApp, based on Peter Norvig's article on building a spell checker, so I started translating Peter's Python script into Objective-C.

The unpleasant surprise came when I timed my Objective-C code against the python script.
Running the spell checker on one of my word lists in Python took 1.11 seconds; the Objective-C code was 4.69 times slower.

After factoring out training time (which is easily optimized by something as simple as serializing the training dictionary instead of computing it at runtime), I watched my Objective-C code take a whopping 5.3 seconds to spell-check 11 words. The timings below are negative because they are measured with -[NSDate timeIntervalSinceNow] on a start date that lies in the past.

generataed generate -1.459070 s
acount count -0.003310 s
guidlines guideline -1.173020 s
wheere where -0.002277 s
myne mine -0.001197 s
graet great -0.002184 s
silenc silent -0.001916 s
aggresive aggressive -0.003051 s
appreceiated appreciated -0.004764 s
aquantance acquaintance -1.483899 s
beggining beginning -1.185744 s
Total runtime -5.329394



The data set shows one cluster of words taking around 0.003 seconds and another taking between 1.2 and 1.5 seconds.
What the misspellings "generataed", "guidlines", "aquantance" and "beggining" have in common is edit distance: unlike the other misspellings, which are all at edit distance 1, they are at edit distance 2. The problem with edit distance 2 is that it produces many variations of the misspelled word (around 200000); Peter explains all this in his article.
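Norvig's article gives the size of the candidate set for a word of length n as n deletions, n-1 transpositions, 26n replacements and 26(n+1) insertions, or 54n+25 strings in total before removing duplicates. Applying the same step to each of those strings is what makes edit distance 2 explode. The following Python sketch counts the candidates for one of the test words; `edits1` follows the article's formulation, so the exact numbers may differ slightly from what the Objective-C versions in this post generate.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away from word (delete, transpose, replace, insert)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away: edits1 applied to every edits1 result."""
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1))

word = "generataed"
n = len(word)
print("upper bound for distance 1:", 54 * n + 25)
print("distinct strings at distance 1:", len(edits1(word)))
print("distinct strings at distance 2:", len(edits2(word)))
```

Every one of the distance-2 strings has to be built and hashed before it can be compared with the dictionary, which is why string construction dominates the profile for these four words.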

My first instinct was to blame NSSet operations. After all, I was intersecting a set of 200000 words (the words at edit distance 2) with a set of 40000 (the dictionary).
This must be slow, right? Wrong. After messing with the initial capacity of my NSSets and poking around Google for some explanation of what I was seeing, I started looking at the data Shark shows.

As expected, Shark shows the bottleneck in knownEdits2, the method which computes variations at edit distance 2.
But it's not the NSSet operations that take the most CPU; rather unexpectedly, it's +[NSString stringWithFormat:].





-[CSSpell knownEdits2] does most of its work in -[CSSpell edits1StringsInObjCUnoptimized]:


- (NSSet *) edits1StringsInObjCUnoptimized:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0;
char c;
//deletions
for(i=0;i<[word length];i++)
[s addObject:[NSString stringWithFormat:@"%@%@",
[word substringToIndex:i],
[word substringFromIndex:i+1]]];


//replaces
for(i=0;i<[word length];i++)
for(c='a';c<='z';c++)
[s addObject:[NSString stringWithFormat:@"%@%c%@",
[word substringToIndex:i],
c,
[word substringFromIndex:i+1]]];

// inserts
for(i=0;i<[word length];i++)
for(c='a';c<='z';c++)
[s addObject:[NSString stringWithFormat:@"%@%c%@",
[word substringToIndex:i],
c,
[word substringFromIndex:i]]];

//transposes
for(i=0;i<[word length]-1;i++)
[s addObject:[NSString stringWithFormat:@"%@%@%@%@",
[word substringToIndex:i],
[word substringWithRange:NSMakeRange(i+1, 1)],
[word substringWithRange:NSMakeRange(i, 1)],
[word substringFromIndex:i+2]]];

return s;
}




-[CSSpell edits1StringsInObjCUnoptimized] is shown by Shark to call -[NSString stringWithFormat] lots of times.
Which makes sense, since edits are formed using loops. But when I looked at how otx shows the assembler code generated from this method, I had another surprise (many parts cut for brevity):


-(id)[CSSpell edits1StringsInObjCUnoptimized:]
[..]
+36 0000281b e8ef270000 calll 0x0000500f +[NSMutableSet setWithCapacity:]
[..]
+101 0000285c e8ae270000 calll 0x0000500f -[(%esp,1) substringToIndex:] ///--
[..]
+134 0000287d e88d270000 calll 0x0000500f +[NSString stringWithFormat:]
[..]
+158 00002895 e875270000 calll 0x0000500f -[(%esp,1) addObject:]
[..]
+268 00002903 e807270000 calll 0x0000500f -[(%esp,1) substringToIndex:] ///--
[..]
+306 00002929 e8e1260000 calll 0x0000500f +[NSString stringWithFormat:]
[..]


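The surprise is that substringToIndex: is called twice.
Since it's called on an immutable string and thus yields the same result within the method's scope, I would have expected GCC to optimize the second call away. (That is asking too much of the compiler, though: Objective-C message sends are dispatched dynamically at run time, so it cannot assume the call is free of side effects and has to emit every one of them.)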

So I rewrote the method, optimizing calls to substring methods:


- (NSSet *) edits1StringsInObjC:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0;
char c;

for(i=0;i<[word length];i++)
{
NSString * toIndex = [word substringToIndex:i];
NSString * fromIndex = [word substringFromIndex:i+1];
NSString * fromIndex2 = [word substringFromIndex:i];


if(i<[word length]-1)
[s addObject:[NSString stringWithFormat:@"%@%c%c%@",
toIndex,
[word characterAtIndex:i+1],
[word characterAtIndex:i],
[word substringFromIndex:i+2]]]; // transposes


[s addObject:[NSString stringWithFormat:@"%@%@", toIndex, fromIndex]]; //deletion

for(c='a';c<='z';c++)
{
[s addObject:[NSString stringWithFormat:@"%@%c%@",
toIndex,
c,
fromIndex]]; //replace
[s addObject:[NSString stringWithFormat:@"%@%c%@",
toIndex,
c,
fromIndex2]]; //insert
}
}

return s;
}



Timing the code after the optimization shows:

generataed generate -1.142463 s
acount count -0.001646 s
guidlines guideline -0.931142 s
wheere where -0.002310 s
myne mine -0.001241 s
graet great -0.001417 s
silenc silent -0.001597 s
aggresive aggressive -0.002322 s
appreceiated appreciated -0.003206 s
aquantance acquaintance -1.157321 s
beggining beginning -0.938545 s
Total runtime -4.192067



This optimization has shaved off roughly 0.25 to 0.3 seconds per misspelling at edit distance 2.
The total runtime is down to about 79% of the initial runtime (4.19 versus 5.33 seconds).

If we check this new version using Shark, the bottleneck is still in +[NSString stringWithFormat:]; in fact, the stringWithFormat: calls have increased their share of CPU time to 34% of the total run-time. Considering that the Shark data includes training time too, stringWithFormat: is probably burning around 50% of our CPU time.






Googling around yet again, I've arrived at the conclusion that NSString might indeed be slow because of the complete Unicode support it offers. For the purpose of the spell checker, Unicode is not needed, but unfortunately you just can't tell NSString not to care about it.
How about replacing stringWithFormat: and forming an NSString out of pure C strings?



- (NSSet *) edits1StringsInC:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0;
char c;

#define MAX_LEN_C 500
char toIndex[MAX_LEN_C];
char fromIndex[MAX_LEN_C];
char fromIndex2[MAX_LEN_C];
char str[MAX_LEN_C];
char temp[MAX_LEN_C];

if([word length] > MAX_LEN_C - 2) /* leave room for one inserted character plus the terminating NUL */
return [self edits1StringsInObjC:word];

strcpy(str, [word cStringUsingEncoding:NSASCIIStringEncoding]);

for(i=0;i<[word length];i++)
{
strncpy(toIndex, str, i);
toIndex[i] = '\0';
strcpy(fromIndex, str+i+1);
strcpy(fromIndex2, str+i);

if(i<[word length]-1)
{
/* build the transposition only when there is a next character to swap with */
sprintf(temp, "%s%c%c%s", toIndex, str[i+1], str[i], str+i+2);
NSString * transposed = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:transposed];
}

sprintf(temp, "%s%s", toIndex, fromIndex);
NSString * deletion = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:deletion];

for(c='a';c<='z';c++)
{
sprintf(temp, "%s%c%s", toIndex, c, fromIndex);
NSString * replaced = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:replaced];
sprintf(temp, "%s%c%s", toIndex, c, fromIndex2);
NSString * inserted = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:inserted];
}
}

return s;
}



Timing the code after switching string forming to C shows a total runtime of 2.04 seconds.
This optimization has shaved off about 1 second when checking misspellings at edit distance 2, compared to the original implementation.
In the total runtime, we're down to 38% of the initial implementation's runtime. Mind you, it's still almost twice as slow as the Python equivalent.

In this latest version, the CPU time is split between edits1StringsInC (3X), intersectSet (2X) and unionSet (1X).
edits1StringsInC still takes most of the CPU time, with the most intensive operations being sprintf at 41% and stringWithCString at 30%.

The next thing I tried is replacing sprintf with strcpy.
Yet another half a second in speed gain, but the code becomes less and less readable:


- (NSSet *) edits1StringsInCv2:(NSString *) word
{
NSMutableSet * s = [NSMutableSet setWithCapacity:2000];

int i=0, len;
char c;

#define MAX_LEN_C 500
char str[MAX_LEN_C];
char temp[MAX_LEN_C];

if([word length] > MAX_LEN_C-2) /* leave room for one inserted character plus the terminating NUL */
return [self edits1StringsInObjC:word];

strcpy(str, [word cStringUsingEncoding:NSASCIIStringEncoding]);
for(i=0;i<[word length];i++)
{
strcpy(temp, str); temp[i] = str[i+1]; temp[i+1] = str[i];
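len = strlen(str); /* full word length; strlen(temp) is one short on the last pass, where the swap above moves the NUL */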

if(i<[word length]-1)
{
NSString * transposed = [NSString stringWithCString:temp length:len];
[s addObject:transposed];
}

strcpy(temp + i, str+i+1);
NSString * deletion = [NSString stringWithCString:temp length:strlen(temp)];
[s addObject:deletion];

for(c='a';c<='z';c++)
{
temp[i] = c; strcpy(temp + i + 1, str+i+1);
NSString * replaced = [NSString stringWithCString:temp length:len];
[s addObject:replaced];

strcpy(temp + i + 1, str+i);
NSString * inserted = [NSString stringWithCString:temp length:len+1];
[s addObject:inserted];
}
}

return s;
}




Using strcpy and hard-to-follow pointer arithmetic instead of sprintf, we get down to 1.5 seconds:

generataed generate -0.402929 s
acount count -0.001699 s
guidlines guideline -0.329685 s
wheere where -0.000743 s
myne mine -0.000473 s
graet great -0.001423 s
silenc silent -0.000769 s
aggresive aggressive -0.001243 s
appreceiated appreciated -0.001449 s
aquantance acquaintance -0.421072 s
beggining beginning -0.340965 s
Total runtime -1.509337



In knownEdits2, Shark shows a tie between edits1StringsInCv2 and -[NSMutableSet intersectSet:].




At this point, further optimizations in knownEdits2 probably don't make sense unless NSMutableSet is replaced too. And working with sets in plain C is not fun, so for now I'll settle for a run-time of 1.50 seconds, about 35% slower than the Python equivalent.
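The percentages quoted throughout this post are easy to lose track of, so here is a small Python sketch that recomputes them from the total runtimes reported above. The labels are my own shorthand for the four variants; the numbers are the totals from the timing listings and the 1.11 second Python run.

```python
PYTHON_S = 1.11  # Python reference run, as quoted in the post

# Total runtimes of the four Objective-C variants, in seconds.
runs = [
    ("stringWithFormat, unoptimized", 5.329394),
    ("stringWithFormat, substrings reused", 4.192067),
    ("sprintf + stringWithCString", 2.04),
    ("strcpy + stringWithCString", 1.509337),
]

def summarize(runs, baseline_s, reference_s):
    """Return (label, percent of baseline, ratio to reference) for each run."""
    rows = []
    for label, seconds in runs:
        rows.append((label,
                     round(100 * seconds / baseline_s),
                     round(seconds / reference_s, 2)))
    return rows

for label, pct, ratio in summarize(runs, runs[0][1], PYTHON_S):
    print("%-38s %3d%% of initial, %.2fx Python" % (label, pct, ratio))
```

The last row is the one behind the "about 35% slower" figure: 1.51 seconds against 1.11 seconds is a ratio of roughly 1.36.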

Conclusions:

  • Python string handling is fast

  • Python set operations are fast

  • Objective-C string operations are slow

Friday, April 17, 2009

Network connections by geographic location

A couple of days ago, a colleague made me aware that iplocationtools.com exposes a cool web service that allows you to get the geographical location based on an IP address.

Here's a script that parses the IP addresses the workstation is connected to, and then resolves each one to its geographical location:




clear; netstat -an | grep ESTABLISHED | awk {'print $5'} | sort -u | sed "s/\([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\).*/\1/g" | while read sw; do url="http://iplocationtools.com/ip_query.php?ip=$sw"; curl $url -s | ~/tools/xmlstarlet sel -t -m "/Response/CountryName" -v . -o "," -m "/Response/City" -v . -; done


Note the dependency on xmlstarlet, which allows easy extraction of values from XML content. Unlike xsltproc, which demands an XSL file, xmlstarlet lets you run an XPath query and builds the needed XSL file by itself.

And here's a rundown of the script:

clear just clears the shell screen.

clear; netstat -an -> get a list of connections, without attempting to resolve names

clear; netstat -an | grep ESTABLISHED -> keep only lines containing ESTABLISHED, that is active connections


clear; netstat -an | grep ESTABLISHED | awk {'print $5'} -> keep only the 5th field from each line, containing the IP address we are connected to

clear; netstat -an | grep ESTABLISHED | awk {'print $5'} | sed "s/\([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\).*/\1/g" -> strip out the port number from the IP address.

Without sed, we are left with netstat output like 74.125.43.83.443, where 443 is the TCP port we are connected to.
To strip out the port, we can use sed, matching the IP address as 4 groups of digits separated by dots, i.e. \([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\). The captured group is then referenced by "\1" in the replacement, so everything after it (the port) is dropped.
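The same substitution can be tried out in Python, which makes it easy to experiment with the pattern. This is only a sketch of the sed step: the pattern is the one from the script, written without the backslashes that sed's basic syntax needs around the group, and the function name `strip_port` is made up.

```python
import re

# Four dot-separated runs of digits are captured; the rest of the line is matched and discarded.
IP_WITH_PORT = re.compile(r"([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*).*")

def strip_port(address):
    """Keep only the captured IP address, like sed's s/.../\\1/ does."""
    return IP_WITH_PORT.sub(r"\1", address)

print(strip_port("74.125.43.83.443"))  # prints: 74.125.43.83
```

Because the group stops after the fourth run of digits, the trailing `.443` falls into the `.*` part and disappears in the replacement.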

clear; netstat -an | grep ESTABLISHED | awk {'print $5'} | sort -u | sed "s/\([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\).*/\1/g" | while read sw; do command done

What follows is a while loop, where we read each ip address, one at a time and execute the localization command.

command is:

url="http://iplocationtools.com/ip_query.php?ip=$sw"; curl $url -s | ~/tools/xmlstarlet sel -t -m "/Response/CountryName" -v . -o "," -m "/Response/City" -v . -

We use curl to get an XML document that lists the geolocation of the IP we are connected to, and then process the XML document using xmlstarlet, extracting the country and city name.
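For readers without xmlstarlet, the extraction step can be sketched with Python's standard library. Only the element names Response, CountryName and City are taken from the XPath expressions above; the sample document and its values are made up for illustration and are not a real reply from the service.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the real response may carry more fields than shown here.
SAMPLE = """<Response>
  <CountryName>United States</CountryName>
  <City>Mountain View</City>
</Response>"""

def country_and_city(xml_text):
    """Return 'Country,City' from a Response document, like the xmlstarlet call."""
    root = ET.fromstring(xml_text)  # root element is <Response>
    country = root.findtext("CountryName", default="")
    city = root.findtext("City", default="")
    return "%s,%s" % (country, city)

print(country_and_city(SAMPLE))  # prints: United States,Mountain View
```

`findtext` with a default keeps the output well-formed even when the service leaves one of the two elements out.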

Tuesday, March 10, 2009

Network statistics on OS X using the sysctl interface

sysctl offers read access to a couple of kernel structures that contain network statistics.

I have yet to figure out why the IP statistics read from the "net.inet.ip.stats" sysctl value differ from the values reported by netstat (as seen below, the IP stats show 114243 packets received while netstat reports 116840).


cristi:tmp diciu$ ./a.out
IP packets received: 114243
IP packets generated here: 85251
TCP connection attempts: 3337
TCP total packets sent: 81779
TCP total packets received: 105700



cristi:tmp diciu$ netstat -bi
Name Mtu Network Address Ipkts Ierrs Ibytes Opkts Oerrs Obytes Coll
[..]
en0 1500 Link#4 xx:xx:xx:xx:xx:xx 116840 0 113745333 85202 0 13307330 0
[..]


Here's the C source for the sysctl reader:



#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdlib.h>
#include <stdio.h>

#include <sys/socketvar.h>
#include <netinet/ip.h>
#include <netinet/ip_var.h>
#include <netinet/tcp.h>
#include <netinet/tcp_var.h>

int main(void)
{
    struct ipstat ip;
    struct tcpstat tcp;
    size_t len;

    /* Read-only queries: no new value is supplied, so newp is NULL and newlen is 0. */
    len = sizeof(ip);
    if (sysctlbyname("net.inet.ip.stats", &ip, &len, NULL, 0) == -1) {
        perror("net.inet.ip.stats");
        return EXIT_FAILURE;
    }
    /* The counters are unsigned; cast so the format matches whatever width the headers use. */
    printf("IP packets received: %lu\n", (unsigned long) ip.ips_total);
    printf("IP packets generated here: %lu\n", (unsigned long) ip.ips_localout);

    len = sizeof(tcp);
    if (sysctlbyname("net.inet.tcp.stats", &tcp, &len, NULL, 0) == -1) {
        perror("net.inet.tcp.stats");
        return EXIT_FAILURE;
    }
    printf("TCP connection attempts: %lu\n", (unsigned long) tcp.tcps_connattempt);
    printf("TCP total packets sent: %lu\n", (unsigned long) tcp.tcps_sndtotal);
    printf("TCP total packets received: %lu\n", (unsigned long) tcp.tcps_rcvtotal);

    return EXIT_SUCCESS;
}