ASRAEL

Template Acquisition for Open Event Extraction

ANR Project 2016-2019 (ANR-15-CE23-0018: Acquisition de Schémas pour la Reconnaissance et l'Annotation d'Événements Liés)

Partners: LIMSI-CNRS (Team ILES, @LIMSI_NLP); CEA-LIST (lab. LVIC); AFP (MediaLab); EURECOM

Main contact: Xavier Tannier (Univ. Paris-Sud, LIMSI-CNRS)

Overview

The information and communication society has led to the production of huge volumes of content. This content is still mostly unstructured (text, images, videos), and the promise of a "Web of Knowledge" is still a long way off. The situation is evolving with the development of Open Data portals and resources such as DBpedia, which have made it easier to access information stored in databases (economic or demographic statistics, world knowledge contained in Wikipedia infoboxes, etc.). However, most knowledge is still conveyed by textual data.

Among the information that is hard to access because it is locked in text, information about events is of great interest, notably in the context of the emergence of data journalism. Data journalism has so far been fed by publicly available statistical data but, paradoxically, it has made little use of the most journalistic material of all: events. The ASRAEL project aims at bridging this gap.

Our proposal falls within the general scientific framework of information extraction (IE). We aim at extracting events from a large set of textual documents, without prior knowledge about them, and at populating and publishing a knowledge base of events. This knowledge base will support a dedicated event search engine.

We define an event in the traditional information extraction sense: an event is a structured representation of something that happens, with a nucleus, a spatio-temporal context and some arguments.
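As a rough illustration of this definition, an event could be modelled as a small record holding a nucleus, a spatio-temporal context and a set of arguments. This is only a sketch: the class name `Event`, its field names, the helper `is_grounded` and all example values are assumptions made for illustration, not the project's actual data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Union

# Argument values are kept deliberately simple in this sketch.
Value = Union[str, int, float]


@dataclass
class Event:
    """Hypothetical event record: nucleus + spatio-temporal context + arguments."""
    nucleus: str                      # what happened, e.g. "earthquake"
    where: Optional[str] = None       # spatial part of the context
    when: Optional[date] = None       # temporal part of the context
    arguments: Dict[str, Value] = field(default_factory=dict)  # attribute/value pairs

    def is_grounded(self) -> bool:
        """True when both the place and the date of the event are known."""
        return self.where is not None and self.when is not None


# All values below are invented for illustration only.
quake = Event(
    nucleus="earthquake",
    where="Turkey",
    when=date(2016, 1, 1),
    arguments={"magnitude": 6.1},
)
```

Keeping the arguments in an open dictionary rather than in fixed fields reflects the fact that the set of attributes is not known in advance.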
The "event type" groups comparable instances of events, such as "earthquake", "election" or "car race". Arguments are attribute/value pairs that characterize an event type (for an earthquake: its location, date, magnitude, casualties...). A template is the set of arguments that can describe an event type (earthquake template, election template).

The generic representation of an event is based on the rule of the "5 Ws" (What, Who, Where, When, Why) that prevails in the "Anglo-Saxon" way of writing articles. This rule stipulates that a good description of an event must make these five elements explicit. In automatic information extraction, the "Who", "Where" and "When" are extracted by a traditional and fairly generic named entity recognition approach. The "What", on the other hand, is very domain-specific. For this reason, traditional IE systems rely on templates predefined by experts and identify events in texts with either rule-based systems or statistical models. In the general domain, however, the huge number of possible events makes manual definition of these templates impossible; information retrieval ("bag of words") methods take over, but they do not provide a structured answer.

In this project, we aim to tackle the following challenges:

Automatically discover event templates from very large text corpora, and populate a knowledge base dedicated to events. This implies a mixture of supervised and unsupervised approaches, which becomes necessary as soon as one considers such a generic problem.

Use this knowledge base to build an event aggregator and a semantic search engine. With this engine, a user (either journalist or end user) will be able to query for an event type (e.g. earthquake) and filter on attribute values (location = Turkey, magnitude > 8, etc.). The knowledge base will also be published following the linked data principles for others to reuse.

Work packages

1. Coordination

2. Extraction of events and generic attributes

Tasks 2, 3 and 4 are intended to automatically discover the schemas (sets of attributes/values) corresponding to events and, in parallel, to build instances of these schemas by annotating the documents, in preparation for the search engine built in task 5. The first attributes explored will be the generic ones, in particular dates and locations (task 2), followed by the others (task 3). Task 4 describes how the knowledge base will be populated and the documents annotated with the templates put in place.

The seeds of this schema discovery process are names representing types of events. These names will be extracted from the list of International Press Telecommunications Council (IPTC) categories, a very comprehensive hierarchy that is already the entry point of the existing event search engine at AFP. This hierarchy contains theme names, many of which are event names (for example, "road accidents" is a subtype of "transport accidents", itself a subtype of "disasters and accidents"). All themes are codified and internationalized.

3. Structure of the event base

Even though the global event representation framework is already defined (kernel and arguments in the form of attribute/value pairs), a preliminary step will be to design the structure and content of the event database. This effort starts with a thorough reflection on the modeling of events, in particular on the types of attributes, their granularity and their evolution over time. Events can be linked by causal relationships. They can also belong to series (e.g. the Olympic Games, Grammy Awards). An ontology of media events will be created at the beginning of the project, taking into account the needs expressed by journalists and the exchanges between the AFP Medialab and the scientists of CEA LIST, LIMSI and EURECOM.
In this field, the partners will be able to rely on the knowledge acquired by the AFP Medialab during the European project Glocal (search and indexing of events, XML modeling) and the French project SCRIBO (semantic web), as well as that of EURECOM (EventMedia project and animation of the dedicated schema.org community).

Defining a structured base of events also requires choosing its implementation, that is to say a representational formalism. Because this base evolves as new events and new types of events are discovered, a triplestore is well suited. A flexible data model as defined by OWL will make it possible to add new attributes to event schemas, to run SPARQL queries (in some cases involving inference), to use the IPTC category taxonomy (Subject, Matter, Details) and to support multilingualism ("Putin", "Poutine"...). In the ontology model, an event type will be a class, and the attributes of that type's template will be ObjectProperty or DataProperty instances. Finally, authority lists (e.g. event categories) will be represented in SKOS to allow the same type of query.

The search engine (Task 5) will use these attributes to build the index, with the attributes of event types as facets. The documents will be related to the ontology through the extracted events and their constituents. We can also use the ontology to make inferences about indexed resources and exploit the results in full-text search: the documents returned by a search can, for example, be filtered or grouped by types of events or entities that are inferred rather than explicitly mentioned in them. We will use Virtuoso (for querying structured data in SPARQL), ElasticSearch and Solr (for full-text querying), as we did in the HyperTED project.

In order to get closer to AFP and IPTC standards, the processed documents will be exported in NewsML-G2 format.
This will make it easy to associate metadata about events with each document, and also to annotate the HTML content of the document with micro-formats (RDFa or Microdata, using the rNews and schema.org vocabularies). An OSGI content annotation string will
Whois Information
Whois is a protocol that gives access to domain registration information. From the record below you can find out when the website's domain was registered, when it will expire, and the contact details associated with it:
%%
%% This is the AFNIC Whois server.
%%
%% complete date format : DD/MM/YYYY
%% short date format : DD/MM
%% version : FRNIC-2.5
%%
%% Rights restricted by copyright.
%% See https://www.afnic.fr/en/products-and-services/services/whois/whois-special-notice/
%%
%% Use '-h' option to obtain more information about this service.
%%
%% [2600:3c03:0000:0000:f03c:91ff:feae:779d REQUEST] >> eurecom.fr
%%
%% RL Net [#######...] - RL IP [######....]
%%
domain: eurecom.fr
status: ACTIVE
hold: NO
holder-c: IE110-FRNIC
admin-c: UF2-FRNIC
tech-c: GRST1-FRNIC
tech-c: JCD1-FRNIC
tech-c: PG7865-FRNIC
zone-c: NFC1-FRNIC
nsl-id: NSL150618-FRNIC
registrar: GIP RENATER
Expiry Date: 07/03/2018
created: 08/03/1996
last-update: 07/03/2017
source: FRNIC
ns-list: NSL150618-FRNIC
nserver: dns.eurecom.fr [193.55.113.200]
nserver: ns-auth1.enst.fr
nserver: ns-auth2.enst.fr
source: FRNIC
registrar: GIP RENATER
type: Isp Option 1
address: 23-25 Rue Daviel
address: PARIS
country: FR
phone: +33 1 53 94 20 30
fax-no: +33 1 53 94 20 31
e-mail: domaine@renater.fr
website: http://www.renater.fr
anonymous: NO
registered: 01/01/1998
source: FRNIC
nic-hdl: IE110-FRNIC
type: ORGANIZATION
contact: Institut Eurecom
address: 450, route des Chappes
address: 06410 Biot
country: FR
phone: +33 4 93 00 81 00
fax-no: +33 4 93 00 82 00
e-mail: postmaster@eurecom.fr
registrar: GIP RENATER
changed: 27/03/2013 nic@nic.fr
anonymous: NO
obsoleted: NO
source: FRNIC
nic-hdl: UF2-FRNIC
type: PERSON
contact: Ulrich Finger
address: Institut Eurecom
address: 450, route de Chappes
address: 06410 Biot
country: FR
phone: +33 4 93 00 81 00
fax-no: +33 4 93 00 82 00
e-mail: finger@eurecom.fr
registrar: GIP RENATER
changed: 27/03/2013 nic@nic.fr
anonymous: NO
obsoleted: NO
source: FRNIC
nic-hdl: GRST1-FRNIC
type: PERSON
contact: Gip Renater Support Technique Dns
address: GIP RENATER
address: 23-25, rue Daviel
address: 75013 Paris
country: FR
phone: +33 1 53 94 20 40
e-mail: support@renater.fr
registrar: GIP RENATER
changed: 21/07/2015 nic@nic.fr
anonymous: NO
obsoleted: NO
source: FRNIC
nic-hdl: JCD1-FRNIC
type: PERSON
contact: Jean-Christophe Delaye
address: Eurecom - Eurecom Institut
address: 450, route des Chappes
address: 06410 Biot
country: FR
phone: +33 4 93 00 81 07
fax-no: +33 4 93 00 82 00
e-mail: delaye@eurecom.fr
registrar: GIP RENATER
changed: 27/03/2013 nic@nic.fr
anonymous: NO
obsoleted: NO
source: FRNIC
nic-hdl: PG7865-FRNIC
type: PERSON
contact: Pascal Gros
address: EURECOM
address: 450, route des Chappes
address: 06410 Biot
country: FR
phone: +33 4 93 00 81 22
fax-no: +33 4 93 00 82 00
e-mail: gros@eurecom.fr
registrar: GIP RENATER
changed: 27/03/2013 nic@nic.fr
anonymous: NO
obsoleted: NO
source: FRNIC
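
The record above is a sequence of `key: value` lines, with comment lines starting with `%` and dates in the DD/MM/YYYY format announced in its header. A minimal sketch of reading such a record is given below; the function names `parse_record` and `parse_afnic_date` are hypothetical, the sample is a short excerpt copied from the record, and the code makes no claim to cover every AFNIC output variant.

```python
from datetime import date, datetime


def parse_record(text):
    """Collect 'key: value' pairs; a repeated key keeps all its values in order."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and server comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a key/value line
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


def parse_afnic_date(value):
    """Parse the 'complete date format' (DD/MM/YYYY) stated in the header."""
    return datetime.strptime(value, "%d/%m/%Y").date()


# Excerpt copied from the record above.
SAMPLE = """\
%% complete date format : DD/MM/YYYY
domain: eurecom.fr
status: ACTIVE
tech-c: GRST1-FRNIC
tech-c: JCD1-FRNIC
Expiry Date: 07/03/2018
created: 08/03/1996
last-update: 07/03/2017
"""

record = parse_record(SAMPLE)
created = parse_afnic_date(record["created"][0])
expiry = parse_afnic_date(record["expiry date"][0])
```

Keys are lower-cased so that `Expiry Date` and `created` can be looked up uniformly, and values are stored as lists because keys such as `tech-c` occur several times in one record.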
REFERRER http://www.nic.fr
REGISTRAR AFNIC
SERVERS
SERVER fr.whois-servers.net
ARGS eurecom.fr
PORT 43
TYPE domain
RegrInfo
REGISTERED yes
ADMIN
HANDLE UF2-FRNIC
TYPE PERSON
CONTACT Ulrich Finger
ADDRESS
Institut Eurecom
450, route de Chappes
06410 Biot
COUNTRY FR
PHONE +33 4 93 00 81 00
FAX +33 4 93 00 82 00
EMAIL finger@eurecom.fr
SPONSOR GIP RENATER
CHANGED 2013-03-27
ANONYMOUS NO
OBSOLETED NO
SOURCE FRNIC
TECH
HANDLE PG7865-FRNIC
TYPE PERSON
CONTACT Pascal Gros
ADDRESS
EURECOM
450, route des Chappes
06410 Biot
COUNTRY FR
PHONE +33 4 93 00 81 22
FAX +33 4 93 00 82 00
EMAIL gros@eurecom.fr
SPONSOR GIP RENATER
CHANGED 2013-03-27
ANONYMOUS NO
OBSOLETED NO
SOURCE FRNIC
OWNER
HANDLE IE110-FRNIC
TYPE ORGANIZATION
CONTACT Institut Eurecom
ADDRESS
450, route des Chappes
06410 Biot
COUNTRY FR
PHONE +33 4 93 00 81 00
FAX +33 4 93 00 82 00
EMAIL postmaster@eurecom.fr
SPONSOR GIP RENATER
CHANGED 2013-03-27
ANONYMOUS NO
OBSOLETED NO
SOURCE FRNIC
DOMAIN
STATUS ACTIVE
HOLD NO
SPONSOR GIP RENATER
EXPIRY DATE 07/03/2018
CREATED 1996-03-08
CHANGED 2017-03-07
SOURCE FRNIC
HANDLE NSL150618-FRNIC
NSERVER
DNS.EURECOM.FR 193.55.113.200
NS-AUTH1.ENST.FR 137.194.2.156
NS-AUTH2.ENST.FR 137.194.2.157
NAME eurecom.fr