Universal Verticalization Format

The Universal Verticalization Format is one of the possible inputs of the verticalizer. Information that the verticalizer could not process in its original form should be converted into this format. The advantage of the format is that scripts processing various sources of information do not have to deal with verticalization, only with conversion into this universal format.

1 Format description

The Universal Verticalization Format is a text format; the content of the files is structured with XML tags. The .ufv extension is recommended for files, although the verticalizer itself does not require it: the input type is selected with the -t/--inputtype parameter. The verticalizer can also process files compressed with gzip. In that case the file must have the .gz extension, and .ufv.gz is recommended so that the format can be identified unambiguously.
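As a minimal sketch of the file-name rule above (written for this page in Python, not taken from the verticalizer), a conversion or checking script could open either variant as follows. The function name open_ufv and the UTF-8 encoding are assumptions.

```python
import gzip
from typing import TextIO


def open_ufv(path: str) -> TextIO:
    """Open a UVF file for reading as text.

    A file is treated as gzip-compressed only when its name ends in ".gz"
    (e.g. "corpus.ufv.gz"); anything else is read as plain text.
    UTF-8 is an assumed encoding, not something the format states.
    """
    if path.endswith(".gz"):
        # mode "rt" makes gzip.open return a text stream instead of bytes
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```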

XML tags and their attribute names must be written in lowercase, and attribute values must be enclosed in double quotes, not in apostrophes (single quotes).
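Both lexical rules can be checked with a rough regular-expression pass. The helper check_tags below is hypothetical and not part of the verticalizer; it assumes attribute values never contain ">" and makes no attempt to be a full XML parser.

```python
import re
from typing import List

# Any opening or closing tag: captures the tag name and the raw attribute text.
TAG_RE = re.compile(r"</?([A-Za-z]+)([^>]*)>")
# One well-formed attribute: lowercase name, value in double quotes.
ATTR_OK_RE = re.compile(r'\s+([a-z]+)="[^"]*"')


def check_tags(text: str) -> List[str]:
    """Return a list of problems found in the tags of `text` (empty if none)."""
    problems = []
    for match in TAG_RE.finditer(text):
        name, attrs = match.groups()
        if name != name.lower():
            problems.append(f"tag name not lowercase: {match.group(0)}")
        # After removing all well-formed attributes, nothing may remain;
        # leftovers mean single quotes, uppercase names or other syntax.
        if ATTR_OK_RE.sub("", attrs).strip():
            problems.append(f"bad attribute syntax: {match.group(0)}")
    return problems
```

For example, check_tags("<meta name='author'>x</meta>") reports the single-quoted attribute, while a correctly written tag produces no problems.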

A single file may contain several documents to be verticalized. In that case, however, the <doc> and </doc> tags that delimit the individual documents must each be on a line of their own, with no other character before them (watch out for spaces and tabs). If a file contains more than one document, no root element is used (so the file is not valid XML).
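The line-based document boundaries can be detected without an XML parser. A minimal sketch, assuming that an opening tag is recognized by a line starting with "<doc>" or "<doc " (how strictly the real tool matches is not specified here); split_documents is a hypothetical helper.

```python
from typing import Iterable, Iterator, List, Optional


def split_documents(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each <doc>...</doc> block of a multi-document file.

    A boundary is recognized only at the very start of a line, so an
    indented "<doc>" or "</doc>" is treated as ordinary content.
    """
    current: Optional[List[str]] = None
    for line in lines:
        if line.startswith(("<doc>", "<doc ")):
            # a new document starts; an unterminated previous one is dropped
            current = [line]
        elif line.startswith("</doc>"):
            if current is not None:
                current.append(line)
                yield "".join(current)
            current = None
        elif current is not None:
            current.append(line)
```

Because it only needs an iterable of lines, an open file object can be passed directly. In this sketch a " <doc>" line with a leading space is not a boundary, which illustrates why leading whitespace matters.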


2 XML tags used

2.1 <doc></doc>

The element delimits one document to be verticalized. As mentioned above, one file may contain several documents. In that case both the opening and the closing tag must be on a line of their own, right at its beginning.

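Attributes (as used in the examples in section 3):

url: URL of the document; relative addresses in <a> and <img> are resolved against it
id: identifier of the document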

2.2 <head></head>

As in HTML, the element delimits information about the document that is not part of its content. If no additional information about the document needs to be provided, this element can be omitted. Otherwise, it must be placed immediately at the beginning of the content of the <doc></doc> element.

2.3 <title></title>

The title (name) of the document. The element must be placed inside the <head></head> element. It is optional.

2.4 <meta name="some_name">some_value</meta>

Information about the document that is not part of its content. The name attribute specifies the type of information (the variable name), and the value is given by the content of the element. The element must be placed inside the <head></head> element. Metadata vary with the type of document; they may be, for example, the name of the article's author, the publication date, and so on. If a document contains several elements with the same value of the name attribute, all of the values make it into the output vertical, but they are kept as separate pieces of information.
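Because repeated names must all be kept, a list of (name, value) pairs is a more natural in-memory representation than a dictionary. A minimal sketch, assuming each <meta> element is written exactly as <meta name="..."> with no other attributes; parse_meta is a hypothetical name, and stripping surrounding whitespace from values is an assumption.

```python
import re
from typing import List, Tuple

META_RE = re.compile(r'<meta name="([^"]*)">(.*?)</meta>', re.DOTALL)


def parse_meta(head: str) -> List[Tuple[str, str]]:
    """Return (name, value) pairs for every <meta> element, in document order.

    A list is used instead of a dict so that repeated names stay separate.
    """
    return [(name, value.strip()) for name, value in META_RE.findall(head)]
```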

2.5 <body></body>

The element delimits the actual content of the document (the text). If the document contains a <head></head> element, the body must come after it. During verticalization, the closing tag </head> splits the document into a header and a body, and the two parts are then processed separately.
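The split at </head> can be sketched with str.partition. The helper split_head_body is hypothetical; returning an empty header for a document without <head> is an assumption consistent with <head> being optional.

```python
from typing import Tuple


def split_head_body(doc: str) -> Tuple[str, str]:
    """Split one document into (header, body) at the first "</head>".

    The "</head>" tag stays with the header part. Without a header,
    the header part is empty and the whole document is the body.
    """
    head, sep, body = doc.partition("</head>")
    if not sep:
        return "", doc
    return head + sep, body
```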

2.6 <p></p>

One paragraph of text. Text that is not inside this element is ignored. The element must be placed inside the <body></body> element. A document may contain several paragraphs; paragraphs may not be nested, only placed one after another.
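Since paragraphs cannot be nested, a non-greedy regular expression is enough to pull out their contents and drop everything else. A sketch only: paragraphs is a hypothetical helper and assumes the exact tag spelling <p> without attributes.

```python
import re
from typing import List

# Non-greedy matching is safe here only because <p> elements never nest.
P_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def paragraphs(body: str) -> List[str]:
    """Return the inner content of each <p> element; text outside is dropped."""
    return P_RE.findall(body)
```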

2.7 <a href="some_url">some_text</a>

A link to another document, as in HTML. The href attribute contains the URL of the link; the address may be relative, in which case the absolute address is computed during verticalization from the URL of the whole document. The content of the element is the anchor text. The tag must be placed inside the <p></p> element. No other HTML attributes are supported.
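Resolving relative href values against the document URL can be sketched with the standard urllib.parse.urljoin, which performs RFC 3986-style reference resolution. That the verticalizer resolves addresses in exactly the same way is an assumption; links is a hypothetical helper name.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

A_RE = re.compile(r'<a href="([^"]*)">(.*?)</a>', re.DOTALL)


def links(paragraph: str, doc_url: str) -> List[Tuple[str, str]]:
    """Return (absolute_url, anchor_text) for each <a> element in a paragraph.

    Relative addresses are resolved against the URL of the whole document;
    absolute addresses are returned unchanged.
    """
    return [(urljoin(doc_url, href), text) for href, text in A_RE.findall(paragraph)]
```

The same resolution would apply to the src attribute of <img>.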

2.8 <img src="some_url">

An occurrence of an image in the document. The src attribute specifies the URL of the image file; the address may be relative, in which case the absolute address is computed during verticalization from the URL of the whole document. The tag must be placed inside the <p></p> element. No other HTML attributes are supported.
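To tie the elements of this section together, here is a minimal sketch of the writer side, i.e. what a conversion script producing this format might look like. The function names, the tab indentation and the XML-style escaping of &, < and > (and of " inside attribute values) are assumptions; check whether the verticalizer actually decodes such entities before relying on them. For brevity, paragraphs are plain text without <a> or <img>.

```python
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape


def attr(value: str) -> str:
    # Always use double quotes, as the format requires; escape any '"' inside.
    return '"' + escape(value, {'"': "&quot;"}) + '"'


def write_document(url: str, doc_id: str, title: Optional[str],
                   meta: List[Tuple[str, str]], paragraphs: List[str]) -> str:
    """Serialize one document as UVF text (hypothetical helper)."""
    lines = [f"<doc url={attr(url)} id={attr(doc_id)}>"]
    if title is not None or meta:
        # <head> is optional, but when present it must come first inside <doc>
        lines.append("\t<head>")
        if title is not None:
            lines.append(f"\t\t<title>{escape(title)}</title>")
        for name, value in meta:
            # repeated names are simply written as separate <meta> elements
            lines.append(f"\t\t<meta name={attr(name)}>{escape(value)}</meta>")
        lines.append("\t</head>")
    lines.append("\t<body>")
    for text in paragraphs:
        lines.append(f"\t\t<p>{escape(text)}</p>")
    lines.append("\t</body>")
    # </doc> starts at column 0 so several documents can share one file
    lines.append("</doc>")
    return "\n".join(lines) + "\n"
```

Concatenating the outputs of several calls yields a multi-document file, because every <doc> and </doc> line starts at the beginning of the line.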


3 Examples

3.1 Single document

<doc url="https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/" id="f54876ab654f">
	 <head>
		 <title>Everything you need to know from Google’s Pixel event</title>
		 <meta name="author">Devin Coldewey</meta>
		 <meta name="tags">google artificial intelligence pixel</meta>
	 </head>
	 <body>
		 <p>Google unveiled a gaggle of new products and services today at its event in San Francisco. The company was all about making things easy and seamless, so we thought we’d do the same. Here’s all the stuff you need to know, in one place.</p>
		 <p>CEO Sundar Pichai came out first to set the stage, touting the company’s advances in artificial intelligence. Google’s research has yielded improved image recognition, speech synthesis and translation capabilities — and Go skills, though those are less useful in the average home.</p>
		 <p>“AI is going to lead the way,” he said. Well, it certainly did today. Every product had some kind of intelligence baked in — though whether that’s a plus or a minus is up to you.</p>
		 <p>First up were the new Pixel phones. These sleek devices are “the first phones designed by Google inside and out.” The accuracy of that claim is perhaps debatable, but the devices are definitely focused on the pure Google experience.</p>
		 <p>The specs are flagship-level, with a 12.3-megapixel camera that scored higher than even the mighty iPhone 7 in DxOMark’s labs. “No unsightly camera bump” either, teased hardware head Rick Osterloh. A new camera app means microscopic shutter lag, intelligent HDR photos and anyone buying a Pixel gets unlimited storage of full-resolution images in the Cloud.</p>
	 </body>
</doc>

3.2 Multiple documents

<doc url="https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/" id="f54876ab654f">
	 <head>
		 <title>Everything you need to know from Google’s Pixel event</title>
		 <meta name="author">Devin Coldewey</meta>
		 <meta name="tags">google artificial intelligence pixel</meta>
	 </head>
	 <body>
		 <p>Google unveiled a gaggle of new products and services today at its event in San Francisco. The company was all about making things easy and seamless, so we thought we’d do the same. Here’s all the stuff you need to know, in one place.</p>
		 <p>CEO Sundar Pichai came out first to set the stage, touting the company’s advances in artificial intelligence. Google’s research has yielded improved image recognition, speech synthesis and translation capabilities — and Go skills, though those are less useful in the average home.</p>
		 <p>“AI is going to lead the way,” he said. Well, it certainly did today. Every product had some kind of intelligence baked in — though whether that’s a plus or a minus is up to you.</p>
		 <p>First up were the new Pixel phones. These sleek devices are “the first phones designed by Google inside and out.” The accuracy of that claim is perhaps debatable, but the devices are definitely focused on the pure Google experience.</p>
		 <p>The specs are flagship-level, with a 12.3-megapixel camera that scored higher than even the mighty iPhone 7 in DxOMark’s labs. “No unsightly camera bump” either, teased hardware head Rick Osterloh. A new camera app means microscopic shutter lag, intelligent HDR photos and anyone buying a Pixel gets unlimited storage of full-resolution images in the Cloud.</p>
	 </body>
</doc>
<doc url="https://techcrunch.com/2016/10/06/european-startups-get-on-the-disrupt-london-battlefield-stage-deadline-extended/" id="abd542c6f3c9">
	 <head>
		 <title>European Startups! Get on the Disrupt London Battlefield stage! Deadline extended!</title>
		 <meta name="author">Mike Butcher</meta>
	 </head>
	 <body>
		 <p>The deadline to apply for the Startup Battlefield at Disrupt London has been extended! Startups from across the world now have until 12pm PT on October 13th to get their application in front of the selection team (TechCrunch journalists and editors). Apply now!</p>
		 <p>Why compete in the Startup Battlefield? For starters, you’ll get to present your company to the best and brightest entrepreneurs and investors in the world who will give you incredible feedback on how to make your product or service even better.</p>
		 <p>If that isn’t enough awesome stuff to encourage you to apply, the winner of the Startup Battlefield receives £30,000 and the coveted Disrupt Cup. And the best part? There are absolutely no fees, equity or otherwise, to participate in the Startup Battlefield. TechCrunch recently launched the Startup Battlefield Scholarship Fund, so there are really no excuses. Sounds pretty good, huh? Submit an application today.</p>
		 <p>Our live video stream attracts around 1M views live, and over-all we get around 10M on all the video when all is said and done. It’s an astounding platform, not least for all the external media/press we invite to cover the single stage we have.</p>
	 </body>
</doc>
<doc url="https://techcrunch.com/2016/10/06/linkedin-will-now-let-you-quietly-signal-when-youre-looking-for-a-job/" id="c6f3c9abd542">
	 <head>
		 <title>LinkedIn will now let you quietly signal when you’re looking for a job</title>
		 <meta name="author">Ingrid Lunden</meta>
	 </head>
	 <body>
		 <p>One of the hallmarks of any social network is how it pushes people to share their experiences with others, but increasingly we’ve seen a lot of moves by social platforms to give people the option to remain very private, if they choose.</p>
		 <p>One of the latest moves on this front comes from LinkedIn, the social network for the working world with some 450 million members and currently getting acquired by Microsoft for $26.2 billion: today the company is turning on a new feature for its users who may be interested in quietly looking for a new job, while still employed somewhere else.</p>
		 <p>Open Candidates, as the new feature is called, will let those users essentially create a signal that will be viewable only to recruiters looking for candidates like that person, to let them know they would be open to getting contacted about new job opportunities.</p>
		 <p>The new feature is being turned on globally as part of a larger revamp of LinkedIn’s recruitment products, which also include new and more dynamic Career Pages — customised pages that companies that are recruiting employees use to advertise themselves —  and a new way for recruiters to connect at the backend with their clients who are hiring to provide more seamless integration.</p>
	 </body>
</doc>

3.3 Single document without a header

<doc url="https://techcrunch.com/2016/10/04/everything-you-need-to-know-from-googles-pixel-event/" id="f54876ab654f">
	 <body>
		 <p>Google unveiled a gaggle of new products and services today at its event in San Francisco. The company was all about making things easy and seamless, so we thought we’d do the same. Here’s all the stuff you need to know, in one place.</p>
		 <p>CEO Sundar Pichai came out first to set the stage, touting the company’s advances in artificial intelligence. Google’s research has yielded improved image recognition, speech synthesis and translation capabilities — and Go skills, though those are less useful in the average home.</p>
		 <p>“AI is going to lead the way,” he said. Well, it certainly did today. Every product had some kind of intelligence baked in — though whether that’s a plus or a minus is up to you.</p>
		 <p>First up were the new Pixel phones. These sleek devices are “the first phones designed by Google inside and out.” The accuracy of that claim is perhaps debatable, but the devices are definitely focused on the pure Google experience.</p>
		 <p>The specs are flagship-level, with a 12.3-megapixel camera that scored higher than even the mighty iPhone 7 in DxOMark’s labs. “No unsightly camera bump” either, teased hardware head Rick Osterloh. A new camera app means microscopic shutter lag, intelligent HDR photos and anyone buying a Pixel gets unlimited storage of full-resolution images in the Cloud.</p>
	 </body>
</doc>