Universal Verticalization Format

The Universal Verticalization Format is one of the input formats accepted by the verticalizer. It is intended primarily for information that the verticalizer cannot process in its original format. The advantage of this format is that scripts processing various sources of information do not have to deal with verticalization themselves; they only need to convert the information to this universal format.

1 Format description

The Universal Verticalization Format is a text format that carries information in files structured with XML tags. The suffix .ufv is recommended for such files. The verticalizer itself does not require it, because the input type is selected with the -t (--inputtype) parameter. The verticalizer can also process files compressed with gzip. In that case the file must have the .gz extension, and the combined extension .ufv.gz is recommended so that the format can be clearly identified.
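A conversion script does not have to treat compressed and uncompressed output differently. The following Python sketch chooses gzip or plain text based on the file extension. The helper name open_ufv is hypothetical, and UTF-8 is an assumption, because the format description does not name an encoding.

```python
import gzip
from pathlib import Path
from typing import IO


def open_ufv(path: str, mode: str = "rt") -> IO[str]:
    """Open a UVF file in text mode for reading ("rt") or writing ("wt").

    Names ending in .gz are (de)compressed with gzip; any other name is
    opened as plain text. UTF-8 is an assumed encoding.
    """
    if mode not in ("rt", "wt"):
        raise ValueError("mode must be 'rt' or 'wt'")
    if Path(path).suffix == ".gz":
        # gzip.open accepts text modes together with an encoding.
        return gzip.open(path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")
```

For example, writing through open_ufv("corpus.ufv.gz", "wt") produces a gzip-compressed file whose name both satisfies the .gz requirement and identifies the format.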

XML tag names and attribute names must be written in lowercase, and attribute values must be enclosed in double quotes, not in apostrophes.
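When a script generates tags itself, it is easiest to enforce these two rules in one place. The sketch below lowercases tag and attribute names and always uses double quotes. The function name start_tag is hypothetical. Python's xml.sax.saxutils.quoteattr is deliberately not used, because it switches to apostrophes when a value contains a double quote. Whether the verticalizer decodes entities such as &quot; and &amp; is not stated in this description, so the escaping is an assumption.

```python
from typing import Mapping, Optional
from xml.sax.saxutils import escape


def start_tag(name: str, attrs: Optional[Mapping[str, str]] = None) -> str:
    """Build an opening tag with lowercase names and double-quoted values.

    &, <, > and " inside values are replaced by entities; how the
    verticalizer treats such entities is an assumption.
    """
    parts = [name.lower()]
    for key, value in (attrs or {}).items():
        # Always wrap the value in double quotes and escape any " inside it.
        quoted = escape(value, {'"': "&quot;"})
        parts.append(f'{key.lower()}="{quoted}"')
    return "<" + " ".join(parts) + ">"
```

For instance, start_tag("DOC", {"URL": "https://example.com/", "id": "1"}) yields <doc url="https://example.com/" id="1">.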

One file can contain several documents for verticalization. In that case, however, the <doc> and </doc> tags enclosing each document must each be on a line of their own, and no other characters may appear on that line (beware of spaces and tabs). A file containing several documents has no root element, so it is not valid XML.
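Because document boundaries are defined line by line, a multi-document file can be split without an XML parser, which would reject it anyway for lacking a root element. The following Python sketch treats a line as a boundary only when it consists of the tag alone. The name split_documents is hypothetical, and the regular expression for the opening tag assumes lowercase attribute names with double-quoted values, as required above.

```python
import re
from typing import Iterable, Iterator, List, Optional

# Opening tag alone on its line: lowercase attribute names, double-quoted values.
DOC_START = re.compile(r'<doc(?: [a-z_]+="[^"]*")*>')
DOC_END = "</doc>"


def split_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each document of a multi-document file as a list of lines.

    Only a line that is exactly <doc ...> or </doc> (apart from the trailing
    newline) counts as a boundary, so " </doc>" is treated as content and
    leads to an "unterminated" error. Lines between documents are skipped.
    """
    current: Optional[List[str]] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if DOC_START.fullmatch(line):
            if current is not None:
                raise ValueError("<doc> found before the previous </doc>")
            current = [line]
        elif line == DOC_END:
            if current is None:
                raise ValueError("</doc> without a matching <doc>")
            current.append(line)
            yield current
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        raise ValueError("unterminated <doc> at end of input")
```

An open file object can be passed directly, since iterating over it yields lines.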


2 XML tags used

2.1 <doc></doc>

The element encloses one document to be verticalized. As mentioned above, a file may contain several documents. In that case, the opening tag must be on a separate line at the beginning of the document and the closing tag on a separate line at its end.

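Attributes available:

url: the URL of the document. Relative URLs in <a href> and <img src> elements are resolved against it during verticalization.
id: the identifier of the document.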
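The sketch below shows one way a script might wrap finished header and body lines in a <doc> element. The function name wrap_document is hypothetical, the url and id attributes are the ones used in the examples in section 3, and the values are assumed to contain no characters that need escaping.

```python
from typing import List


def wrap_document(url: str, doc_id: str, inner_lines: List[str]) -> str:
    """Wrap already formatted <head>/<body> lines in a <doc> element.

    The opening and closing tags are written flush left on lines of their
    own, so the result can be concatenated with other documents in one file.
    """
    for value in (url, doc_id):
        if '"' in value:
            raise ValueError("attribute values must not contain double quotes")
    return "\n".join([f'<doc url="{url}" id="{doc_id}">', *inner_lines, "</doc>"]) + "\n"
```

Concatenating the strings returned for several documents gives a multi-document file of the kind shown in section 3.2.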

2.2 <head></head>

As in HTML, this element encloses information about the document that is not part of its content. If no additional information about the document is needed, the element can be omitted; otherwise it must be placed immediately at the beginning of the content of the <doc></doc> element.

2.3 <title></title>

The document title (name). The element is optional; if used, it must occur inside the <head></head> element.

2.4 <meta name="some_name">some_value</meta>

Information about the document that does not belong to its content. The name attribute specifies the type of information (the variable name), and the inner content of the element specifies its value. The element must occur within the <head></head> element. Metadata differ depending on the type of document; they may be, for example, the author's name or the release date. If the document contains several <meta> elements with the same value of the name attribute, the vertical output will contain every value, each as a separate piece of information.
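Because the same name may occur more than once, a script should keep metadata as an ordered list of name/value pairs rather than a dictionary. A minimal Python sketch follows; build_head is a hypothetical helper, and escaping &, < and > in text is an assumption.

```python
from typing import List, Optional, Sequence, Tuple
from xml.sax.saxutils import escape


def build_head(title: Optional[str], meta: Sequence[Tuple[str, str]]) -> List[str]:
    """Return the lines of a <head> element, or [] when it can be omitted."""
    if title is None and not meta:
        return []  # no title and no metadata: leave <head> out entirely
    lines = ["\t<head>"]
    if title is not None:
        lines.append(f"\t\t<title>{escape(title)}</title>")
    for name, value in meta:
        # Repeated names are kept; each becomes its own <meta> element.
        attr = escape(name, {'"': "&quot;"})
        lines.append(f'\t\t<meta name="{attr}">{escape(value)}</meta>')
    lines.append("\t</head>")
    return lines
```

A dictionary would silently keep only the last value for a repeated name, which is why a sequence of pairs is used here.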

2.5 <body></body>

The element encloses the content (text) of the document. If the document includes a <head></head> element, the body must follow it. During verticalization, the closing tag </head> splits the document into a header and a body, and the two parts are processed separately.
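The split described above can be illustrated with a few lines of Python. This is only a model of the behaviour, not the verticalizer's code, and the function name split_header_body is hypothetical.

```python
from typing import Tuple


def split_header_body(document: str) -> Tuple[str, str]:
    """Split a document's text at the first closing </head> tag.

    Everything up to and including </head> is the header, the rest is the
    body. A document without </head> yields an empty header.
    """
    marker = "</head>"
    pos = document.find(marker)
    if pos == -1:
        return "", document
    end = pos + len(marker)
    return document[:end], document[end:]
```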

2.6 <p></p>

One paragraph of text. Text that is not inside this element is ignored. The element must occur within the <body></body> element. A document can contain multiple paragraphs, but paragraphs must not be nested; they may only follow one another.
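A converter that starts from plain text could map each block of text separated by blank lines to one <p> element, which keeps paragraphs flat by construction. In this Python sketch, the function build_body, the blank-line convention and the escaping of &, < and > are assumptions rather than requirements of the format.

```python
import re
from typing import List
from xml.sax.saxutils import escape


def build_body(text: str) -> List[str]:
    """Turn plain text into <body> lines, one <p> per blank-line-separated block.

    Paragraphs are emitted one after another and never nested. Line breaks
    and runs of whitespace inside a block collapse to single spaces.
    """
    lines = ["\t<body>"]
    for block in re.split(r"\n\s*\n", text.strip()):
        paragraph = " ".join(block.split())
        if paragraph:  # skip empty input
            lines.append(f"\t\t<p>{escape(paragraph)}</p>")
    lines.append("\t</body>")
    return lines
```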

2.7 <a href="some_url">some_text</a>

A reference to another document, as in HTML. The href attribute contains the URL of the link. The address may be relative; during verticalization, the URL of the document is used to compute the absolute URL. The content of the element is the link text. The tag must be located inside a <p></p> element. No other HTML attributes are supported.
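The resolution of relative addresses can be modelled with Python's urllib.parse.urljoin, which follows the rules of RFC 3986. That the verticalizer resolves URLs in exactly the same way is an assumption. The document URL below is taken from the example in section 3.1; the relative links and the helper name absolute_url are made up for illustration.

```python
from urllib.parse import urljoin

# Document URL taken from the example in section 3.1.
DOC_URL = "https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/"


def absolute_url(doc_url: str, link: str) -> str:
    """Resolve a possibly relative link against the URL of its document."""
    return urljoin(doc_url, link)


if __name__ == "__main__":
    print(absolute_url(DOC_URL, "/2016/10/06/some-post/"))
    # https://techcrunch.com/2016/10/06/some-post/
    print(absolute_url(DOC_URL, "../../05/other/"))
    # https://techcrunch.com/2016/10/05/other/
    print(absolute_url(DOC_URL, "https://example.org/x"))
    # https://example.org/x
```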

2.8 <img src="some_url">

An occurrence of an image in the document. The src attribute contains the URL of the image. The address may be relative; during verticalization, the URL of the document is used to compute the absolute URL. The tag must be located inside a <p></p> element. No other HTML attributes are supported.
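Attribute names must be lowercase, values must be in double quotes, and <a> and <img> support only href and src respectively. Under these rules, a simple regular expression is enough to list the links and images in a paragraph. The following Python sketch also resolves each address with urljoin, which is assumed to match the verticalizer's behaviour. The helper name links_in_paragraph is hypothetical.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

# Relies on the UVF rules: lowercase names, double-quoted values, and only
# href on <a> / only src on <img>. Arbitrary HTML would need a real parser.
LINK = re.compile(r'<a href="([^"]*)"|<img src="([^"]*)"')


def links_in_paragraph(doc_url: str, paragraph: str) -> List[Tuple[str, str]]:
    """Return ("a" or "img", absolute URL) for each link and image, in order."""
    found = []
    for match in LINK.finditer(paragraph):
        href, src = match.groups()
        if href is not None:
            found.append(("a", urljoin(doc_url, href)))
        else:
            found.append(("img", urljoin(doc_url, src)))
    return found
```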


3 Examples

3.1 Single document

 <doc url="https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/" id="f54876ab654f">
	 <head>
		 <title>Everything you need to know from Google’s Pixel event</title>
		 <meta name="author">Devin Coldewey</meta>
		 <meta name="tags">google artificial intelligence pixel</meta>
	 </head>
	 <body>
		 <p>Google unveiled a gaggle of new products and services today at its event in San Francisco. The company was all about making things easy and seamless, so we thought we’d do the same. Here’s all the stuff you need to know, in one place.</p>
		 <p>CEO Sundar Pichai came out first to set the stage, touting the company’s advances in artificial intelligence. Google’s research has yielded improved image recognition, speech synthesis and translation capabilities — and Go skills, though those are less useful in the average home.</p>
		 <p>“AI is going to lead the way,” he said. Well, it certainly did today. Every product had some kind of intelligence baked in — though whether that’s a plus or a minus is up to you.</p>
		 <p>First up were the new Pixel phones. These sleek devices are “the first phones designed by Google inside and out.” The accuracy of that claim is perhaps debatable, but the devices are definitely focused on the pure Google experience.</p>
		 <p>The specs are flagship-level, with a 12.3-megapixel camera that scored higher than even the mighty iPhone 7 in DxOMark’s labs. “No unsightly camera bump” either, teased hardware head Rick Osterloh. A new camera app means microscopic shutter lag, intelligent HDR photos and anyone buying a Pixel gets unlimited storage of full-resolution images in the Cloud.</p>
	 </body>
 </doc>

3.2 Multiple documents

 <doc url="https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/" id="f54876ab654f">
	 <head>
		 <title>Everything you need to know from Google’s Pixel event</title>
		 <meta name="author">Devin Coldewey</meta>
		 <meta name="tags">google artificial intelligence pixel</meta>
	 </head>
	 <body>
		 <p>Google unveiled a gaggle of new products and services today at its event in San Francisco. The company was all about making things easy and seamless, so we thought we’d do the same. Here’s all the stuff you need to know, in one place.</p>
		 <p>CEO Sundar Pichai came out first to set the stage, touting the company’s advances in artificial intelligence. Google’s research has yielded improved image recognition, speech synthesis and translation capabilities — and Go skills, though those are less useful in the average home.</p>
		 <p>“AI is going to lead the way,” he said. Well, it certainly did today. Every product had some kind of intelligence baked in — though whether that’s a plus or a minus is up to you.</p>
		 <p>First up were the new Pixel phones. These sleek devices are “the first phones designed by Google inside and out.” The accuracy of that claim is perhaps debatable, but the devices are definitely focused on the pure Google experience.</p>
		 <p>The specs are flagship-level, with a 12.3-megapixel camera that scored higher than even the mighty iPhone 7 in DxOMark’s labs. “No unsightly camera bump” either, teased hardware head Rick Osterloh. A new camera app means microscopic shutter lag, intelligent HDR photos and anyone buying a Pixel gets unlimited storage of full-resolution images in the Cloud.</p>
	 </body>
 </doc>
 <doc url="https://techcrunch.com/2016/10/06/european-startups-get-on-the-disrupt-london-battlefield-stage-deadline-extended/" id="abd542c6f3c9">
	 <head>
		 <title>European Startups! Get on the Disrupt London Battlefield stage! Deadline extended!</title>
		 <meta name="author">Mike Butcher</meta>
	 </head>
	 <body>
		 <p>The deadline to apply for the Startup Battlefield at Disrupt London has been extended! Startups from across the world now have until 12pm PT on October 13th to get their application in front of the selection team (TechCrunch journalists and editors). Apply now!</p>
		 <p>Why compete in the Startup Battlefield? For starters, you’ll get to present your company to the best and brightest entrepreneurs and investors in the world who will give you incredible feedback on how to make your product or service even better.</p>
		 <p>If that isn’t enough awesome stuff to encourage you to apply, the winner of the Startup Battlefield receives £30,000 and the coveted Disrupt Cup. And the best part? There are absolutely no fees, equity or otherwise, to participate in the Startup Battlefield. TechCrunch recently launched the Startup Battlefield Scholarship Fund, so there are really no excuses. Sounds pretty good, huh? Submit an application today.</p>
		 <p>Our live video stream attracts around 1M views live, and over-all we get around 10M on all the video when all is said and done. It’s an astounding platform, not least for all the external media/press we invite to cover the single stage we have.</p>
	 </body>
 </doc>
 <doc url="https://techcrunch.com/2016/10/06/linkedin-will-now-let-you-quietly-signal-when-youre-looking-for-a-job/" id="c6f3c9abd542">
	 <head>
		 <title>LinkedIn will now let you quietly signal when you’re looking for a job</title>
		 <meta name="author">Ingrid Lunden</meta>
	 </head>
	 <body>
		 <p>One of the hallmarks of any social network is how it pushes people to share their experiences with others, but increasingly we’ve seen a lot of moves by social platforms to give people the option to remain very private, if they choose.</p>
		 <p>One of the latest moves on this front comes from LinkedIn, the social network for the working world with some 450 million members and currently getting acquired by Microsoft for $26.2 billion: today the company is turning on a new feature for its users who may be interested in quietly looking for a new job, while still employed somewhere else.</p>
		 <p>Open Candidates, as the new feature is called, will let those users essentially create a signal that will be viewable only to recruiters looking for candidates like that person, to let them know they would be open to getting contacted about new job opportunities.</p>
		 <p>The new feature is being turned on globally as part of a larger revamp of LinkedIn’s recruitment products, which also include new and more dynamic Career Pages — customised pages that companies that are recruiting employees use to advertise themselves —  and a new way for recruiters to connect at the backend with their clients who are hiring to provide more seamless integration.</p>
	 </body>
 </doc>

3.3 Single document without a header

 <doc url="https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/" id="f54876ab654f">
	 <body>
		 <p>Google unveiled a gaggle of new products and services today at its event in San Francisco. The company was all about making things easy and seamless, so we thought we’d do the same. Here’s all the stuff you need to know, in one place.</p>
		 <p>CEO Sundar Pichai came out first to set the stage, touting the company’s advances in artificial intelligence. Google’s research has yielded improved image recognition, speech synthesis and translation capabilities — and Go skills, though those are less useful in the average home.</p>
		 <p>“AI is going to lead the way,” he said. Well, it certainly did today. Every product had some kind of intelligence baked in — though whether that’s a plus or a minus is up to you.</p>
		 <p>First up were the new Pixel phones. These sleek devices are “the first phones designed by Google inside and out.” The accuracy of that claim is perhaps debatable, but the devices are definitely focused on the pure Google experience.</p>
		 <p>The specs are flagship-level, with a 12.3-megapixel camera that scored higher than even the mighty iPhone 7 in DxOMark’s labs. “No unsightly camera bump” either, teased hardware head Rick Osterloh. A new camera app means microscopic shutter lag, intelligent HDR photos and anyone buying a Pixel gets unlimited storage of full-resolution images in the Cloud.</p>
	 </body>
 </doc>