Archive for July, 2009

XML vs raw data

We store a lot of data in XML format, as small files on disk, in many of our application projects.
We hit a crisis with one of these projects, where the 700 GB hard disk was filling up rather quickly during an intensive data entry event. They were losing 5 to 10 GB each day.

We spent days tracing the issue, and realized that all the small XML files were actually eating up quite a lot of disk space.
We had 2 choices:
- store the data in a database.
- convert it to something more compact.

As the server was losing disk space rather quickly, we had only a few days to perform the stunt.

We tried:
- migrating the data to a database. No luck, since the database server was also running low on resources.
- compressing the XML data files with GZip. No luck either, as it eats up too much processor power to compress and expand.
- changing the XML format to a good old raw data format. Yay!
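
For anyone who wants to weigh the GZip option on their own data, here is a minimal sketch of how the cost could be measured. The sample record and the `gzip_round_trip` helper are made up for illustration; this is not the test we ran on the server, and the numbers will differ from machine to machine.

```python
import gzip
import time

# Sample record in the simplified XML layout (hypothetical test data).
xml_record = (
    "<name>John A. Sam</name>\n"
    "<id>1234567890</id>\n"
    "<optional-data></optional-data>\n"
).encode("utf-8")

def gzip_round_trip(data, repeats=1000):
    """Compress and expand `data` `repeats` times.

    Returns (compressed size in bytes, elapsed seconds).
    """
    start = time.perf_counter()
    for _ in range(repeats):
        packed = gzip.compress(data)
        unpacked = gzip.decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == data  # the round trip must be lossless
    return len(packed), elapsed

packed_size, seconds = gzip_round_trip(xml_record)
print(f"original: {len(xml_record)} bytes, gzipped: {packed_size} bytes")
print(f"1000 round trips took {seconds:.4f} s")
```

Keep in mind that every gzip stream carries a fixed header and trailer, so on very small files the space saved can be modest while the CPU cost is paid on every read and write.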

We managed to free up 370 GB just by abandoning the XML format! That was almost 1/3 of the original disk space required.
And we were already using a simplified XML before! The full XML standard would have been even larger.

Now, let us look at the difference here:

Simple XML:
<name>John A. Sam</name>
<id>1234567890</id>
<optional-data></optional-data>

Raw data format:
name=John A. Sam
id=1234567890
optional-data=
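
To put rough numbers on the two layouts above, here is a small sketch that serializes the same record both ways and counts the bytes. The helper names `to_simple_xml` and `to_raw` are my own for this illustration, and the sketch assumes one field per line with no escaping.

```python
# The example record from this post.
record = {"name": "John A. Sam", "id": "1234567890", "optional-data": ""}

def to_simple_xml(fields):
    """Serialize as one <key>value</key> element per line (no escaping)."""
    return "".join(f"<{key}>{value}</{key}>\n" for key, value in fields.items())

def to_raw(fields):
    """Serialize as one key=value pair per line (no escaping)."""
    return "".join(f"{key}={value}\n" for key, value in fields.items())

xml_bytes = len(to_simple_xml(record).encode("utf-8"))
raw_bytes = len(to_raw(record).encode("utf-8"))

print(f"simple XML: {xml_bytes} bytes")  # 77 bytes
print(f"raw data:   {raw_bytes} bytes")  # 46 bytes
print(f"saved:      {xml_bytes - raw_bytes} bytes")  # 31 bytes
```

Each XML field pays for its name twice plus five bytes of angle brackets and a slash, while the raw format pays for the name once plus a single `=`. The longer the field names, the bigger the gap.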

See the difference? Look at the chart on the day we triggered the migration. It took the server 4 days to convert countless XML files. But the result was worth it.




XML definitely carries more wasted bytes here.
One less reason for us to use XML in server-side processing.
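
For the curious, the conversion step itself can be sketched in a few lines. This is only an illustration under assumptions, not the script we actually ran: it assumes flat, non-nested `<key>value</key>` fields like the example above, and the function name `simple_xml_to_raw` is made up here.

```python
import re

# Matches one flat <key>value</key> field; nesting is not supported.
_FIELD = re.compile(r"<([^<>/\s]+)>(.*?)</\1>", re.DOTALL)

def simple_xml_to_raw(xml_text):
    """Convert flat <key>value</key> fields into key=value lines."""
    lines = []
    for key, value in _FIELD.findall(xml_text):
        if "\n" in value:
            # A line-based format cannot hold multi-line values as-is.
            raise ValueError(f"field {key!r} contains a newline")
        lines.append(f"{key}={value}\n")
    return "".join(lines)

sample = (
    "<name>John A. Sam</name>\n"
    "<id>1234567890</id>\n"
    "<optional-data></optional-data>\n"
)
print(simple_xml_to_raw(sample), end="")
```

A real migration would also need to decide what to do with values that contain newlines or XML entities, which this sketch simply rejects or passes through untouched.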

Uptime 1000 days

One of my servers has just hit 1000 days of uptime.

[root@us root]# uptime
18:04:20 up 1000 days, 11:30, 1 user, load average: 0.00, 0.00, 0.00

I have been in this business far too long.

whois google.com?

I’ve just performed a whois lookup on google.com. The result was rather amusing. Here goes:

whois google.com
Whois Server Version 2.0

Domain names in the .com and .net domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

GOOGLE.COM.ZZZZZ.GET.LAID.AT.WWW.SWINGINGCOMMUNITY.COM
GOOGLE.COM.ZZZZZ.DOWNLOAD.MOVIE.ONLINE.ZML2.COM
GOOGLE.COM.ZOMBIED.AND.HACKED.BY.WWW.WEB-HACK.COM
GOOGLE.COM.ZNAET.PRODOMEN.COM
GOOGLE.COM.WORDT.DOOR.VEEL.WHTERS.GEBRUIKT.SERVERTJE.NET
GOOGLE.COM.VN
GOOGLE.COM.UY
GOOGLE.COM.UA
GOOGLE.COM.TW
GOOGLE.COM.TR
GOOGLE.COM.SUCKS.FIND.CRACKZ.WITH.SEARCH.GULLI.COM
GOOGLE.COM.SPROSIUYANDEKSA.RU
GOOGLE.COM.SERVES.PR0N.FOR.ALLIYAH.NET
GOOGLE.COM.SA
GOOGLE.COM.PLZ.GIVE.A.PR8.TO.AUDIOTRACKER.NET
GOOGLE.COM.MX
GOOGLE.COM.IS.SHIT.SQUAREBOARDS.COM
GOOGLE.COM.IS.NOT.HOSTED.BY.ACTIVEDOMAINDNS.NET
GOOGLE.COM.IS.HOSTED.ON.PROFITHOSTING.NET
GOOGLE.COM.IS.APPROVED.BY.NUMEA.COM
GOOGLE.COM.HAS.LESS.FREE.PORN.IN.ITS.SEARCH.ENGINE.THAN.SECZY.COM
GOOGLE.COM.DO
GOOGLE.COM.CO
GOOGLE.COM.BR
GOOGLE.COM.BEYONDWHOIS.COM
GOOGLE.COM.AU
GOOGLE.COM.ACQUIRED.BY.CALITEC.NET
GOOGLE.COM

To single out one record, look it up with “xxx”, where xxx is one of the
of the records displayed above. If the records are the same, look them up
with “=xxx” to receive a full display for each record.

>>> Last update of whois database: Fri, 10 Jul 2009 04:23:42 UTC <<<

NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar. Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.

TERMS OF USE: You are not authorized to access or query our Whois
database through the use of electronic processes that are high-volume and
automated except as reasonably necessary to register domain names or
modify existing registrations; the Data in VeriSign Global Registry
Services' ("VeriSign") Whois database is provided by VeriSign for
information purposes only, and to assist persons in obtaining information
about or related to a domain name registration record. VeriSign does not
guarantee its accuracy. By submitting a Whois query, you agree to abide
by the following terms of use: You agree that you may use this Data only
for lawful purposes and that under no circumstances will you use this Data
to: (1) allow, enable, or otherwise support the transmission of mass
unsolicited, commercial advertising or solicitations via e-mail, telephone,
or facsimile; or (2) enable high volume, automated, electronic processes
that apply to VeriSign (or its computer systems). The compilation,
repackaging, dissemination or other use of this Data is expressly
prohibited without the prior written consent of VeriSign. You agree not to
use electronic processes that are automated and high-volume to access or
query the Whois database except as reasonably necessary to register
domain names or modify existing registrations. VeriSign reserves the right
to restrict your access to the Whois database in its sole discretion to ensure
operational stability. VeriSign may restrict or terminate your access to the
Whois database for failure to abide by these terms of use. VeriSign
reserves the right to modify these terms at any time.

The Registry database contains ONLY .COM, .NET, .EDU domains and
Registrars.

Sinchew-i being hacked

At around 3:38 PM today (6 July 2009), some code was injected into the pages that www.sinchew-i.com served.

It is now fixed ;)
Here is the page, the source code.