
blinder's Journal: Java HTML Renderer Question
first, a little background: the project site that i run catalogs links (provided by users), and every night a batch process queries the database and builds a lucene index (which is then searched on the front end).
anyway, something i want to incorporate is the ability for the indexing engine to go out to a URL it's processing and render the page it gets as an image (i.e. png or jpg), much like how alexa does it.
i've had very limited success with things like flying saucer's xhtml rendering engine (it doesn't seem to support CSS at all).
i'm basically looking for a method to render HTML and save it as an image, sorta like bulk screen scraping. anyone have any experience with this type of functionality?
yeah, it has to be java, or accessible via a java framework (i wrote a very elegant and highly scalable back-end for this site so incorporating new functionality is quite simple)... i just need to figure out the best way to do this.
there is a commercial product called web renderer, but i'd prefer not to rely on a 3rd party product, plus i would rather spend my own resources on the solution than spend money on a product.
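to make the question concrete, here's a minimal sketch of the kind of thing i mean, using only what ships with the JDK. it assumes swing's JEditorPane is acceptable as the renderer, which it probably isn't for real pages, since its HTML and CSS support is quite limited. the class and method names (HtmlSnapshot, render) are made up for illustration.

```java
import javax.imageio.ImageIO;
import javax.swing.JEditorPane;
import java.awt.Color;
import java.awt.Dimension;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

// Hypothetical helper: renders an HTML string to an image with Swing only.
class HtmlSnapshot {

    static {
        // A nightly batch job usually has no display, so ask AWT to run headless.
        // This only takes effect if AWT has not been initialised yet.
        System.setProperty("java.awt.headless", "true");
    }

    /** Renders the HTML at a fixed width; the height follows the content. */
    static BufferedImage render(String html, int width) {
        JEditorPane pane = new JEditorPane();
        pane.setEditable(false);
        pane.setContentType("text/html");
        pane.setText(html);

        // Fix the width first so the preferred height reflects wrapped content.
        pane.setSize(width, Short.MAX_VALUE);
        Dimension preferred = pane.getPreferredSize();
        int height = Math.max(1, preferred.height);
        pane.setSize(width, height);

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            pane.print(g); // paints the component without needing a visible window
        } finally {
            g.dispose();
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>hello</h1><p>rendered by swing</p></body></html>";
        BufferedImage image = render(html, 400);
        ImageIO.write(image, "png", new File("snapshot.png"));
    }
}
```

this takes an HTML string rather than a URL on purpose: JEditorPane.setPage loads asynchronously, so fetching the page yourself and handing over the markup keeps the batch job simple. it still won't cope with modern CSS or javascript, which is exactly why i'm asking about a real rendering engine.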
Browser embedding (Score:1)
Check out embedding a readily available browser, maybe FF (not sure about embedding, but the executable does take command-line arguments on both linux and windows).
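A rough sketch of the command-line route from Java, assuming a Firefox build recent enough to support the --headless and --screenshot options (check firefox --help on your install; treat the flags as an assumption). The class and method names here are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: shells out to a browser binary to capture a page as an image.
class BrowserScreenshot {

    /** Builds the command line; assumes Firefox-style --headless/--screenshot flags. */
    static List<String> buildCommand(String browser, String url, Path output) {
        return Arrays.asList(browser, "--headless", "--screenshot", output.toString(), url);
    }

    /** Runs the browser and returns true if it exited cleanly within the timeout. */
    static boolean capture(String browser, String url, Path output, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(browser, url, output))
                .inheritIO()
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // don't let one slow page stall the whole batch
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        boolean ok = capture("firefox", "http://example.com/", Paths.get("example.png"), 60);
        System.out.println(ok ? "captured" : "failed");
    }
}
```

The timeout matters for a nightly batch: a page that never finishes loading would otherwise hang the indexer. Spawning a full browser per URL is slow, so this probably suits a modest number of links better than a large crawl.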
Let me make sure I understand (Score:2)
Re:Let me make sure I understand (Score:2)
Re:Let me make sure I understand (Score:2)
my goal here is to either find an alternate rendering engine (which is what is required for this sort of thing) or learn
Unhelpful reply... (Score:1)
Or maybe this [ubrowser.com] or this [mozdev.org] might provide inspiration.
Sorry I can't offer anything useful though dude.
Me Too... (Score:1)
I did find this note:
http://weblogs.mozillazine.org/roc/archives/2005/05/rendering_web_p.html [mozillazine.org]
but I don't know enough about mozilla/Gecko internals to take advantage of it, or even figure out whether it's useful or not in this context, plus it is, for me, a rather heavyweight solution to my tiny little problem.
I also looked at a half dozen other solutions that don't q