A world without RSS would not hurt

Posted on 2 January, 2011 by karussell

This post is a quick reply to the post RSS Is Dying, and You Should Be Very Worried.

I really like RSS readers, and I subscribe to a lot of blogs. But I don’t think that RSS (Really Simple Syndication) has a future, simply because there will always be at least one service similar to Twitter. People simply like that more: “to follow *people*” and not websites!

Twitter is very powerful. You can build your own RSS-like reader on top of Twitter, like I did with jetwick. I mean, what do you want from an RSS reader? You want news, right? Personalized news? With RSS you get news from blogs. With Twitter you get personalized news from people, as I said before, or, if you are interested in certain topics, you can get personalized news via search terms too! That’s very powerful: getting news about topics. That’s what you want, I guess!?

There are some good statements in the post that I want to comment on.

“IF RSS DIES, WE LOSE THE ABILITY TO READ IN PRIVATE”
First: why would you use chrome if you want to read in private??? 🙂
Second: Yes, privacy is a problem with twitter and facebook. But there will be tools like jetwick where you can “silently” follow people if you want. So this is not an issue for me. Of course, you’ll need to host your own version of jetwick to really have no privacy issues 😉
“The ability for a website operator to be in control of what he advertise to his users … “
I don’t understand this. A website operator will always have options to advertise … and with an RSS reader it is even more likely that the reader skips advertisements. Or what did you mean?
“If every website on the web has to have a Facebook account in order to exist in practical terms, the web is dead—competition is dead“
I don’t think that will be the case. Browsers will soon add the ability to post to popular services, similar to the RSS icon in the late 20XX 😉
“The ability for us to aggregate, mash-up and interpret news without having to go through a closed API that may change on a whim, or disagree with our particular usage”
“A developer should not have to be fluent in Twitter, Facebook and a million different private APIs just to aggregate content from different websites you read”
Valid arguments. But you are more likely to get the latest news via Twitter than from your static set of blogs. In the end there will be something like a big realtime, RSS-like API without the problems you describe 😉 But yes, this is a strong argument: being tied to an external service is bad.

It is also very powerful that with Twitter-like web services:

even people without a blog can share valuable information (e.g. links)
a lot of people are already there. For a user, RSS is very complicated: you have to search for blogs you are interested in. And because of Twitter’s massive user base you don’t need to mash things up, IMHO.

BTW: why do you use the RSS reader of the browser? I’m using lifera …

BTW2: you can follow me at twitter – and see not only what I’m posting here 😉

Memory Efficient XML Processing not only with DOM

Posted on 29 April, 2010 by karussell

How can I efficiently parse large xml files that can be several GB in size? With SAX? Hmmh, well, yes: you can! But this is somewhat ugly. If you prefer a more maintainable approach you should definitely try joost, which does not load the entire xml file into memory but is quite similar to xslt.

But how can you do this with DOM, or even better dom4j, if you only have 50 MB of RAM or even less? Well, this is not always possible, but under some circumstances you can do it with a small helper class. Read on!

E.g. you have the xml file:

<products>
  <product id="1"> CONTENT1 .. </product>
  <product id="2"> CONTENT2 .. </product>
  <product id="3"> CONTENT3 .. </product>
  ...
</products>

Then you can parse it product by product via:

// final, so that the anonymous class below may access it
final List<String> idList = new ArrayList<String>();
ContentHandler productHandler =
         new GenericXDOMHandler("/products/product") {
  @Override
  public void writeDocument(String localName, Element element)
        throws Exception {
    // use DOM here
    String id = element.getAttribute("id");
    idList.add(id);
  }
};
GenericXDOMHandler.execute(new File(inputFile), productHandler);

How does this work? Every time the SAX handler detects the <product> element it will read the product tree (which is quite small) into RAM and call the writeDocument function. Technically we have added a listener to all the product elements with that and are waiting for ‘events’ from our GenericXDOMHandler. The code was developed for my xvantage project but is also used in production code on big files:


import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * License: http://en.wikipedia.org/wiki/Public_domain
 * This software comes without WARRANTY about anything! Use it at your own risk!
 *
 * Reads an xml via sax and creates an Element object per document.
 *
 * @author Peter Karich, peathal 'at' yahoo 'dot' de
 */
public abstract class GenericXDOMHandler extends DefaultHandler {

 private Document factory;
 private Element current;
 private List<String> rootPath;
 private int depth = 0;

 public GenericXDOMHandler(String forEachDocument) {
  rootPath = new ArrayList<String>();
  for (String str : forEachDocument.split("/")) {
    str = str.trim();
    if (str.length() > 0)
      rootPath.add(str);
  }

  if (rootPath.size() < 2)
    throw new UnsupportedOperationException("forEachDocument"
       + " must have at least one sub element in it."
       + " E.g. /root/subPath but it was:" + rootPath);
 }

 @Override
 public void startDocument() throws SAXException {
  try {
    factory = DocumentBuilderFactory.newInstance().
         newDocumentBuilder().newDocument();
  } catch (Exception e) {
    throw new RuntimeException("can't get DOM factory", e);
  }
 }

 @Override
 public void startElement(String uri, String local,
      String qName, Attributes attrs) throws SAXException {

  // go further only if we add something to our sub tree (defined by rootPath)
  if (depth + 1 < rootPath.size()) {
    current = null;
    if (rootPath.get(depth).equals(local))
      depth++;

    return;
  } else if (depth + 1 == rootPath.size()) {
    if (!rootPath.get(depth).equals(local))
      return;
  }

  if (current == null) {
    // start a new subtree
    current = factory.createElement(local);
  } else {
    Element childElement = factory.createElement(local);
    current.appendChild(childElement);
    current = childElement;
  }

  depth++;

  // Add every attribute.
  for (int i = 0; i < attrs.getLength(); ++i) {
    String nsUri = attrs.getURI(i);
    String qname = attrs.getQName(i);
    String value = attrs.getValue(i);
    Attr attr = factory.createAttributeNS(nsUri, qname);
    attr.setValue(value);
    current.setAttributeNodeNS(attr);
  }
 }

 @Override
 public void endElement(String uri, String localName,
     String qName) throws SAXException {

  if (current == null)
    return;

  Node parent = current.getParentNode();

  // current is the root of the subtree: merge adjacent text nodes
  if (parent == null)
    current.normalize();

  if (depth == rootPath.size()) {
    try {
      writeDocument(localName, current);
    } catch (Exception ex) {
      throw new RuntimeException("Exception"
        + " while writing one element of path:" + rootPath, ex);
    }
  }

  // climb up one level
  current = (Element) parent;
  depth--;
 }

 @Override
 public void characters(char buf[], int offset, int length)
       throws SAXException {
  if (current != null)
    current.appendChild(factory.createTextNode(
       new String(buf, offset, length)));
 }

 // called once for every element that matches the configured path
 public abstract void writeDocument(String localName, Element element)
     throws Exception;

 public static void execute(File inputFile,
     ContentHandler handler)
     throws SAXException, FileNotFoundException, IOException {

   execute(new FileInputStream(inputFile), handler);
 }

 public static void execute(InputStream input,
     ContentHandler handler)
     throws SAXException, FileNotFoundException, IOException {

   XMLReader xr = XMLReaderFactory.createXMLReader();
   xr.setContentHandler(handler);
   InputSource iSource = new InputSource(new InputStreamReader(input, "UTF-8"));
   xr.parse(iSource);
 }
}
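
If you just want to try the mechanism without the whole class, it can be condensed into a JDK-only sketch. This is not the GenericXDOMHandler above: the names RecordStreamSketch and collectIds are made up for illustration, it matches a single element name instead of a path, and it assumes the default (non-namespace-aware) SAX parser so that qName carries the tag name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, condensed variant: builds one small DOM subtree per record
// element and hands it to a callback, so only one record is held at a time.
class RecordStreamSketch extends DefaultHandler {

    private final String recordName;
    private final Consumer<Element> callback;
    private final Document factory = newDocument();
    private Element current; // null while we are outside of a record

    RecordStreamSketch(String recordName, Consumer<Element> callback) {
        this.recordName = recordName;
        this.callback = callback;
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException("can't get DOM factory", ex);
        }
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes attrs) {
        if (current == null && !recordName.equals(qName))
            return; // still outside of a record, nothing to build

        Element element = factory.createElement(qName);
        for (int i = 0; i < attrs.getLength(); i++)
            element.setAttribute(attrs.getQName(i), attrs.getValue(i));

        if (current != null)
            current.appendChild(element);
        current = element;
    }

    @Override
    public void characters(char[] buf, int offset, int length) {
        if (current != null)
            current.appendChild(factory.createTextNode(new String(buf, offset, length)));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null)
            return;

        Node parent = current.getParentNode();
        if (parent == null) {
            // the record is complete: pass it on and forget it
            current.normalize();
            callback.accept(current);
            current = null;
        } else {
            current = (Element) parent; // climb up one level
        }
    }

    // Collects the id attribute of every <product> element in the given xml.
    static List<String> collectIds(String xml) {
        List<String> ids = new ArrayList<>();
        RecordStreamSketch handler =
            new RecordStreamSketch("product", e -> ids.add(e.getAttribute("id")));
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception ex) {
            throw new IllegalStateException("parsing failed", ex);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<products><product id=\"1\"><name>A</name></product>"
                   + "<product id=\"2\"><name>B</name></product></products>";
        System.out.println(collectIds(xml));
    }
}
```

The in-memory string only keeps the demo short; for a really big file you would pass a FileInputStream to the parser instead.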

PS: It should be simple to adapt this class to your needs, e.g. to use dom4j instead of DOM. You could even register several paths, and not only one rootPath, via a BindingTree. For an implementation of this look at my xvantage project.

PPS: If you want to process xpath expressions in the writeDocument method, make sure that the ordinary xpath engine does not become a performance bottleneck, because the method may be called many times. In my case I had several thousand documents, but jaxen solved this problem!
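
One thing worth trying before switching engines is to compile the expression once and reuse it for every document. The sketch below uses the JDK's javax.xml.xpath, not jaxen, and whether that is fast enough for your data is an assumption you would have to measure. The class name CompiledXPathSketch, the helper names and the name child element are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: compile an xpath expression once, evaluate it many times.
class CompiledXPathSketch {

    // Compiled a single time; note that XPathExpression is not thread safe.
    private final XPathExpression nameExpr = compile("name");

    private static XPathExpression compile(String expression) {
        try {
            return XPathFactory.newInstance().newXPath().compile(expression);
        } catch (Exception ex) {
            throw new IllegalStateException("cannot compile " + expression, ex);
        }
    }

    // This is what a writeDocument implementation could call per element.
    String nameOf(Element product) {
        try {
            return nameExpr.evaluate(product);
        } catch (Exception ex) {
            throw new IllegalStateException("cannot evaluate xpath", ex);
        }
    }

    // Demo helper: returns the text of the name child of every <product>.
    static List<String> names(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            CompiledXPathSketch sketch = new CompiledXPathSketch();
            NodeList products = doc.getElementsByTagName("product");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < products.getLength(); i++)
                result.add(sketch.nameOf((Element) products.item(i)));
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("cannot read xml", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("<products><product><name>A</name></product>"
            + "<product><name>B</name></product></products>"));
    }
}
```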

PPPS: If you want to handle xml writing and reading (‘xml serialization’) from Java classes check this list out!

Xvantage – Yet Another Xml Serializer!

Posted on 29 September, 2009 by karussell

In one of my last posts I listed some xml serializers and binding tools. After trying XStream, JAXB, XmlEncoder, Apache Digester and X2JB I decided to create another one: xvantage!

Why yet another xml tool?

To make it clear: at the moment xvantage is at an early stage (one week old), and I hacked it together in a few hours of my spare time. So it is not a full-fledged and bug-free solution like the other ones should be.

But it includes some new ideas to make serialization as easy as with XStream and XmlEncoder while keeping the xml checkable through an xsd file. The advantages of valid xml are, first of all, “then you know all is fine” and, second, “it is nice to read and editable by a person!”. The latter was very important for me and is really important for xml configuration files.

Apache Digester was another candidate, but the setup was relatively complex and the dependencies are too big for me. And last but not least: you need an additional library that is no longer being developed (Betwixt) to make writing work too! So xvantage seems to solve at least these two problems: it is small, and it can write xml from your POJOs and read it back.

What would it look like to write my objects?

// Create a data pool with all your POJOs you want to serialize
DataPool pool = new DefaultDataPool();
Map<Long, SimpleObj> map = pool.getData(SimpleObj.class);
map.put(0L, new SimpleObj("test"));
StringWriter writer = new StringWriter();
xadv.mount("/path/", SimpleObj.class);
xadv.saveObjects(pool, writer);

The resulting xml looks like

<?xml version="1.0" encoding="UTF-8"?>
<path>
<simpleObj id="0">
<name>test</name>
</simpleObj>
</path>

And reading?

// get xml from somewhere
StringReader iStream = new StringReader(
 "<path>" +
 "   <myobject><name>test</name></myobject>" +
 "</path>");

// mount to /path/ with an alternative name 'myobject' instead of the default which would be simpleObj
// this is the preferred way for mounting, because otherwise class refactoring results in different xml
xadv.mount("/path/myobject", SimpleObj.class);
DataPool pool = xadv.readObjects(iStream);

// get the first SimpleObj and check the name
SimpleObj obj = pool.getData(SimpleObj.class).values().iterator().next();
assertEquals("test", obj.getName());

Why does xvantage need the DataPool interface?

Without it, xvantage couldn’t handle references properly. And with this DataPool interesting use cases arise, e.g. where parts of an object graph should be refreshed through xml (imagine you grab some objects as xml through HTTP GET …)

Why do we need to mount classes?

To mount a class means: xvantage should track every occurrence of that class as a reference and should NOT nest the object within other objects.

This looks interesting, but does it work for more complex objects?

Yes, it should. I could successfully embed this in my TimeFinder project, where I tried to persist 4 entities (some of them with hundreds of objects and several references) and read them successfully back. Objects which are not mounted explicitly will be nested within mounted objects like in xstream.

Look into this for more information.

Is xvantage an xml binding tool?

No, it is an xml processor. So serialization and deserialization are easily possible if you start from Java code. At the moment it is nearly impossible to start from xml. Use JAXB or JiBX in that case.

What are the disadvantages?

As I mentioned earlier: it may not be as stable as all the others, because it is in early development
It is not (yet?) as powerful and configurable as JAXB and all the others. So at the moment you cannot make your dream xml happen (i.e. it is not suited for binding)
it may not be as fast
it is not thread safe; you have to use multiple instances of xvantage
a no-arg constructor (at least a private one) as well as getters and setters are required for all classes used with xvantage

And what are the advantages?

There are several

easy xml (de-)serialization
small library <50KB (without dependencies!)
junit tested
cross references are allowed! So you can reference even between documents and you could read/write from/to multiple files!
the xml can be checked via xsd (but this is not a must)
no deeply nested unreadable xml (the same as 6.)
no checked exceptions
no license issues and free source code (public domain!)

How can I use it?

Just clone the git repository:

git clone git://github.com/karussell/xvantage.git
cd xvantage
mvn clean install -Dmaven.test.skip=true

Currently it is a netbeans project and the tests in maven will not pass because some resource files are not available.(?)

How does it work in theory?

Writing

Xvantage writes the object to xml (via SAX). The object will be converted directly to xml if it is a primitive type, a collection (list, set), a map, an array, your own implementation of such an interface or a BitSet.
If a value of a collection or a property references a mounted POJO, it will write only the id
If no id was found (in case we have an unmounted POJO), the POJO will be written directly as a subtree of the current object.

Reading

It reads the xml tree via SAX and reads discrete objects according to the mounted classes via DOM
If the object is a mounted POJO it will read the id and try to set the properties from the xml structure. If an object with the given id already exists, this instance will be used to fill in the properties from the xml nodes.
If the parameter type of a setter is one of the mounted classes, a new object with the id (from xml) will be created and its properties will be filled in later.
If the object is a collection it will fill the collection from the values
If the object is not a mounted POJO it will try to read the nested xml structure to set the properties
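
The "create now, fill later" handling of references in the reading steps can be sketched with a small id-based pool. This is only an illustration of the idea and not xvantage's actual code: IdPoolSketch, getOrCreate, Room and Event are all made-up names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of id-based reference handling; not xvantage's real code.
class IdPoolSketch {

    // Two made-up entities: an Event refers to a Room.
    static class Room {
        String name;
    }

    static class Event {
        Room room;
    }

    // One id -> instance map per class, similar in spirit to a data pool.
    private final Map<Class<?>, Map<Long, Object>> data = new HashMap<>();

    // Returns the instance registered under the id, creating an empty one if needed.
    <T> T getOrCreate(Class<T> type, long id, Supplier<T> creator) {
        Map<Long, Object> byId = data.computeIfAbsent(type, k -> new HashMap<>());
        return type.cast(byId.computeIfAbsent(id, k -> creator.get()));
    }

    // Simulates reading an event that references room 5 before room 5 itself is read.
    static String demo() {
        IdPoolSketch pool = new IdPoolSketch();

        Event event = pool.getOrCreate(Event.class, 0L, Event::new);
        // only the id is known here, so an empty placeholder is created
        event.room = pool.getOrCreate(Room.class, 5L, Room::new);

        // later the room with id 5 is read: the same instance gets filled in
        Room room = pool.getOrCreate(Room.class, 5L, Room::new);
        room.name = "A1";

        return event.room.name;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because both lookups return the same instance, the event sees the room's name even though the reference was resolved before the room itself was read.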

Provide Feedback!

Be constructive and tell me facts (maybe such a tool already exists :0!). Use comments or peathal at yaahooo dot de.

Thanks a lot for your attention!

Xml to Xsd-Schema (Xml2Xsd)

Posted on 14 September, 2009 by karussell

wget http://jing-trang.googlecode.com/files/trang-20090818.zip
unzip trang-20090818.zip
cd trang-20090818/
wget -O test.xml "http://www.metapartei.de/polls/show/10.xml"
java -jar trang.jar test.xml test.xsd
cat test.xsd

Xml Serializers For Java

Posted on 3 September, 2009 by karussell

At the moment I only know a few xml serializers, with different capabilities, that write an object to an xml file and read it back:

JDK classes (XmlEncoder and XmlDecoder)
XStream (BSD license)
The Simple project (Apache License 2.0) it even handles cycles.
Apache Digester (Apache license) can only read xml. To write xml use Betwixt.
X2JB (LGPL)
XmlFormat from javolution.org (BSD license)
cedarsoft serialization (GPL with classpath exception)
Xerces XmlSerializer (Apache license) is this used as the implementation for the first one? I discovered the class com.sun.org.apache.xml.internal.serialize.XMLSerializer in the jdk …
Burlap from Caucho (Apache license) used as serialization format over http.
JAXB (Apache license)
Smooks (LGPL)
Moose (LGPL)
Datanucleus (Apache license) JDO persistence into relational db, oo-db, xml, excel, …
vtd-xml (GPL)
WAX (LGPL) xml writer
… more libraries (and an old benchmark)
… or web services tools

Although xstream is very easy to use (like XmlEncoder), it is not that convenient if you want your xml to fit a specific schema (if you have cyclic references). For the latter case you can use JiBX, JAXB, Castor or whatever …

Please write a comment of your choice and include some information about the library and why you chose it.

Update:

The Simple framework looks very promising!
See this performance comparison (not only xml serializers! kryo looks simple and fast!)

BTW: If you want to unit test reading and writing xml then xmlunit will help you!

IE 8 Accelerator

Posted on 16 January, 2009 by karussell

I don’t like IE8 (*), but for a customer of mine I had to create the xml for an accelerator.

See here for more info:

Now the reason I created this blog entry: always use the same url! Use e.g. http://www.metapartei.de in <os:homepageUrl> AND in ALL the action attributes of <os:preview> and <os:execute>. Otherwise you cannot install it.

* Why don’t I like it? I don’t like IE in general, because

it has no firebug
it is not standards compliant (although Firefox isn’t either)
it is not as innovative as Opera…
it comes from MS

Object Oriented, Relational Databases and XML

Posted on 13 June, 2008 by karussell

I invested a long time exploring the project called Datanucleus, where you can use object oriented DBs like db4o, relational DBs like derby or just XML files as a datasource!

That’s fantastic and open source :-). You can find more implementations of JDO here.

Other resources:

Derby is a good database which can pass ACID tests. You can run it in embedded mode, which makes it suitable for use in your standalone swing application.

A pure xml library is XStream, where you can (de)serialize your objects with one method call. No configuration! Really.

No OOXML

Posted on 11 April, 2008 by karussell

OOXML is now ISO standard!

OOXML will divide the (internet) communication into a Microsoft-world and the rest. With the doc-format MS started the ‘way of division’ and now they will be successful!?

“Finally I call on users all around the world to look to Norway and follow the example we have set. Raise a storm of protest! Uncover the irregularities that have taken place in your country! Insist that your Governments change their vote to reflect the interests of ordinary people and not the interests of monopolists and bureaucrats.” — Steve Pepper

So, nevertheless, go to http://www.noooxml.org/petition and see the reasons why OOXML shouldn’t be a standard.

Relax NG

Posted on 22 February, 2008 by karussell

For my open source timetabling application (gstpl) I want to create a file format. There are some standard timetabling formats, but none of them is easy and general.

I decided to use a schema to validate the created xml file. Checking well-formedness is easy, but for validation you need to define a grammar, i.e. define which tags (and attributes) are allowed and in which combinations.
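
To make the difference concrete, here is a small sketch with the JDK's own classes. It validates against an XML Schema, because the JDK ships a validator for that out of the box; for Relax NG you would need one of the external validators. The class name ValidationSketch and the tiny Header/title schema are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: well-formedness check vs. validation against a grammar.
class ValidationSketch {

    // A made-up grammar: a Header element that contains exactly one title.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Header\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"title\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Well-formed means: the parser can read it at all (tags closed, nested properly).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false; // the parser rejected the document
        } catch (Exception ex) {
            throw new IllegalStateException("parser problem", ex);
        }
    }

    // Valid means: well-formed AND matching the grammar defined in XSD.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException ex) {
            throw new IllegalStateException("the schema itself is broken", ex);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException ex) {
            return false; // the document violates the grammar
        } catch (IOException ex) {
            throw new IllegalStateException("cannot read xml", ex);
        }
    }

    public static void main(String[] args) {
        String good = "<Header><title>plan</title></Header>";
        String wrongTag = "<Header><foo/></Header>";
        System.out.println(isWellFormed(good) + " " + isValid(good));
        System.out.println(isWellFormed(wrongTag) + " " + isValid(wrongTag));
    }
}
```

The second document is perfectly well-formed but not valid, which is exactly the kind of error a schema is supposed to catch.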

There are three main schema types: DTD, XML Schema and Relax NG (.dtd, .xsd and .rng/.rnc). The first one is not so powerful, so perhaps it is better to use XML Schema. There are a lot of editors like “NetBeans XML” which can handle the validation of XML Schema and even the schema creation from within the editor. There is a graphical editor available:

For Relax NG I couldn’t easily find such a good and free editor for linux, although I can write a lot faster with Relax NG. E.g. in XML Schema you need:

<xs:element name="Header">
<xs:complexType>
<xs:sequence> …

In Relax NG you only need one line:

And attributes are handled comparably to elements. In this way you can define e.g. that either an attribute ‘test’ or an element ‘test’ should exist:

</choice>

I found no way to define it in XML Schema.

Now here are some editors:

XML Copy Editor RelaxNG editing and validation (no support for compact style)
Etna (only xml editor where you ensure that the created xml is valid against some given schemas)
nXML addon for emacs

some (external) validators:

jing compact and xml style
rnv only compact style
relax ng + XML Schema via Sun Multi-Schema XML Validator (MSV), select msv.20080213.zip
relax ng with embedded schematron constraints via MSV, select relames.20060319.zip.

some (external) conversion tools:

via MSV (select rngconv.20060319.zip) you can create a relax ng file from various schema types.
trang (can even create a schema from a given example xml; I haven’t tried it)
relaxer is a compiler that generates various artifacts from a schema (download version 1.0); it can even create a schema from a given example xml, and it works!! A lot more interesting features are provided.

some tools:

via MSV (select xmlgen.20060319.zip) you can create xml files from various schema types, to get some test instances for the created schema.
XML Schema Datatypes implementation via MSV (select xsdlib.20060319.zip) to use XML Schema Part 2

So after creating my schema in relax ng (this really was faster than with XML Schema) I can convert it into XML Schema. Then NetBeans offers a JAXB Binding (schema to java code) for XML Schemas:

(You have to install the whole netbeans – I think the xml + soa stuff??)

Karussell

Thoughts about Java and more

Category Archives: XML