Laboratory of Interdisciplinary Computer Science
Most of our research falls along the following lines:
Geographic Information Systems (GIS) are information systems that manage geographic data, that is, data that represent objects and phenomena for which geographic location is an essential feature and is indispensable for analysis. GIS provide a broader perspective to information systems, since geographic aspects, such as location, geometric shape and relationships to nearby objects, can be taken into consideration. With richer representations, GIS excel in applications designed for various specialized activities, such as urban planning, environmental sustainability, transportation and transit, public health and epidemiology, demography, cartography, utility networks, agriculture, marketing and many others.
The wide applicability and flexibility of GIS have provided the basis for much of the evolution in data management over the last decades. The interest in the computational representation of geography-related objects and phenomena inspired the development of a wide range of data management tools and techniques, including computational geometry algorithms, multidimensional data structures, spatial indexing, geographic data modeling, geographic visualization, geostatistics, and much more.
Spatial Data Infrastructures (SDIs) are a new approach to the creation, distribution and use of geographic information, with an emphasis on interoperability. SDIs seek to go beyond the simple distribution of previously existing maps and cartographic data, to serve as data sources powered by standardized Web services. SDIs have the potential to become fundamental elements for understanding space, disseminating geographic data and information along with metadata on provenance, quality and semantics. The typical SDI user is someone who needs to combine and integrate data from various sources to generate new insights into a field of study or application. From this perspective, SDIs can play a central role in areas such as environmental management and urban planning.
The volume and variety of geographic data available on the Web to the common citizen are increasing rapidly. Since the onset of Web 2.0, there has been much interest in tools that allow people to geographically locate and describe aspects of their daily life, and to share such knowledge with other people. Initial applications show that it is possible to mobilize the interest of large numbers of citizens for the creation, dissemination and maintenance of geographic information on socially relevant themes. This line of research focuses on the design and implementation of computational tools and techniques that allow groups of people to act as human sensors, voluntarily (or unconsciously) contributing information for the common good. The research agenda includes investigating user motivation, data quality, user feedback and spatial coverage of contributions. We also work on methods for active and passive crowdsourcing and crowdsensing, seeking to apply recommendation systems to enhance volunteered contributions.
The first data models developed for geographic applications were guided by existing GIS internal structures, forcing users to adjust their interpretation of spatial phenomena to whatever structures were available. As a consequence, the modeling process did not offer mechanisms that would allow for the representation of reality according to the user's mental model. Even well-known semantic and object-oriented data models, such as the Entity-Relationship (ER) model, the Object Modeling Technique (OMT) model, and the IFO model, do not offer adequate facilities to represent geographic applications. Even though these models are highly expressive, they are limited in modeling such applications, since they do not include geographic primitives that would allow for a satisfactory representation of spatial data.
Considering these limitations, OMT-G was proposed in 2001, and has evolved steadily since. OMT-G is a data model for the design of geographic database systems and applications. OMT-G starts with Unified Modeling Language (UML) class diagram primitives and introduces geographic primitives in order to enhance UML's semantic representation capabilities, thus reducing the distance between the designer's mental model of reality and the usual representation tools. OMT-G provides primitives for modeling the geometric shape and location of geographic objects, supporting spatial and topological relationships, “whole-part” structures, networks, and multiple representations. Furthermore, the model allows the specification of alphanumeric attributes and methods associated with each class. The model's main strengths include its graphical expressivity and its compactness, since textual annotations are replaced by pictograms and symbols indicating explicit relationships, which are able to denote the dynamic nature of the interaction between spatial and non-spatial objects. From the model, it is also possible to derive spatial integrity constraints, specified along with the usual constraints found in conventional database design. Using these assets, the mapping between the conceptual schema and the physical implementation can be executed more soundly, preserving the semantics contained in the higher abstraction level.
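As an illustration of what a spatial integrity constraint can look like once it reaches the implementation level, the following minimal Python sketch checks a containment rule: every point object must lie inside a given region. It is not part of OMT-G or of any OMT-G tool. The function names, the classes implied by the sample data (bus stops, a district) and the coordinates are all hypothetical, and the point-in-polygon test is a plain ray-casting routine that does not treat points lying exactly on the boundary specially.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the first vertex is not repeated


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting test. Points exactly on the boundary get no special handling."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def violations(points: Dict[str, Point], region: Polygon) -> List[str]:
    """Ids of point objects that break a 'must be contained in region' rule."""
    return [oid for oid, p in points.items() if not point_in_polygon(p, region)]


# Hypothetical sample data: a square district and two bus stops.
district = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
bus_stops = {"stop_a": (2.0, 3.0), "stop_b": (12.0, 5.0)}
print(violations(bus_stops, district))  # ['stop_b']
```

In a spatial database the same rule would more likely be enforced inside the database manager than in application code; the sketch only makes the logic of the constraint explicit.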
OMT-G is the basis for our research on geographic data modeling. We are interested in various related themes, including modeling tools and techniques, spatial integrity constraints, automatic mapping to physical object-relational implementations and mapping to NoSQL database managers.
The demand for geographic data in applications on the Web is increasing. One of the most important resources to support this increased interest is the ability to recognize references to places in Web documents. If documents can be correctly and efficiently linked to places mentioned directly or indirectly in them, it becomes possible to improve and innovate in directions such as geographic indexing and querying, finding relationships based on spatial proximity or containment, and detecting localized trends for events and phenomena mentioned in social media.
A large share of the information available on the Web is geographically specific. References to geographic locations appear in the form of place names, postal addresses, postcodes, historical dates, demonyms, ethnicity, typical food and others. Many queries include place names and other geographic terms. Therefore, there is demand for mechanisms to search for documents both thematically (for instance, using a set of keywords) and geographically, based on places mentioned or referenced by the text. Similar techniques and resources can also apply to streaming data, such as Twitter messages or RSS feeds, providing the opportunity to index content in near-real-time, based on references to places.
However, ambiguity and uncertainty arise when finding references to places in Web documents. Places can share a name with other places (Paris, besides being the capital of France, refers to more than sixty places around the world). Places are named using common language words (Park, Hope and Independence are American cities) and proper names (Washington, Houston and San Francisco). Also, a place can be associated with many names, like New York, NYC or The Big Apple, and with names in various languages. Ambiguity makes the resolution of references to places intrinsically context-based. Although there is important work on place-based information integration and retrieval, areas such as disambiguation are still in their infancy.
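To make the idea of context-based resolution concrete, here is a minimal Python sketch that picks among places sharing a name by counting how many context terms of each candidate appear in the surrounding text. The gazetteer, its context terms and the `resolve` function are hypothetical and hand-built for illustration; they do not describe a method used in our research, and a realistic resolver would draw on a full gazetteer and much richer evidence.

```python
import re

# Toy gazetteer (hypothetical): each candidate carries hand-picked context terms.
GAZETTEER = {
    "paris": [
        {"id": "Paris, France", "context": {"france", "seine", "louvre", "eiffel"}},
        {"id": "Paris, Texas, USA", "context": {"texas", "usa", "dallas", "county"}},
    ],
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def resolve(place_name, text):
    """Return the candidate whose context terms overlap most with the text.

    Returns None when the name is unknown or when no candidate is supported
    by any word in the text. Ties go to the first candidate listed.
    """
    candidates = GAZETTEER.get(place_name.lower(), [])
    words = tokenize(text)
    best, best_score = None, 0
    for cand in candidates:
        score = len(cand["context"] & words)
        if score > best_score:
            best, best_score = cand["id"], score
    return best


print(resolve("Paris", "A rodeo near Paris, a small town northeast of Dallas, Texas."))
print(resolve("Paris", "The Louvre is one of the most visited museums in Paris."))
print(resolve("Paris", "I would love to visit Paris."))
```

The third call returns None: without supporting context, the sketch declines to guess, which mirrors the point that resolution depends on context.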
References to places can be straightforward and unambiguous, as with geographic coordinates, or they can be ambiguous. Other sources of geographic location information can be structured (postal addresses) or unstructured (place descriptions in text). They can also be direct (place names) or indirect (references to cultural characteristics associated with places), explicit (news headers) or implicit (“9/11”). Humans are often able to recognize references to places based on such evidence, but this association does not come so easily to automated systems. Addressing this problem is one of the tasks for Geographic Information Retrieval (GIR) research.
GIR extends Information Retrieval with geographic locations and metadata, taking it beyond the use of keywords. GIR studies methods and techniques for the retrieval of information from unstructured or partially structured sources, including relevance ranking, based on queries that specify both theme and geographic scope.
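A query that specifies both theme and geographic scope can be sketched as follows. This is a deliberately simple Python illustration under stated assumptions: each document is assumed to have already been linked to a single point location, the geographic scope is a bounding box, and the ranking is a weighted sum of keyword overlap and a binary inside/outside test. The `Doc` class, the `search` function, the `alpha` weight and the sample documents are all hypothetical, and the scoring formula is not an established GIR ranking function.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    lon: float  # location assumed to be already resolved for the document
    lat: float


def theme_score(keywords, text):
    """Fraction of query keywords that occur as words in the text."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for k in keywords if k.lower() in words) / len(keywords)


def in_bbox(doc, bbox):
    """bbox is (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= doc.lon <= max_lon and min_lat <= doc.lat <= max_lat


def search(docs, keywords, bbox, alpha=0.5):
    """Rank documents by alpha * theme + (1 - alpha) * geography.

    Documents that match no keyword are dropped, wherever they are located.
    """
    results = []
    for d in docs:
        t = theme_score(keywords, d.text)
        if t == 0:
            continue
        g = 1.0 if in_bbox(d, bbox) else 0.0
        results.append((alpha * t + (1 - alpha) * g, d.doc_id))
    # Highest score first; ties broken by document id for a stable order.
    return [doc_id for score, doc_id in sorted(results, key=lambda r: (-r[0], r[1]))]


# Hypothetical documents with made-up coordinates.
docs = [
    Doc("d1", "flood warning issued for the river district", 1.0, 1.0),
    Doc("d2", "flood damage reported downtown", 5.0, 5.0),
    Doc("d3", "new bakery opens downtown", 1.5, 1.5),
]
print(search(docs, ["flood", "warning"], bbox=(0.0, 0.0, 2.0, 2.0)))  # ['d1', 'd2']
```

Document d3 lies inside the bounding box but matches no keyword, so it is excluded, while d2 matches the theme only partially and lies outside the box, so it ranks below d1.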
The expression Urban Computing designates the process of acquiring, integrating and analyzing large volumes of heterogeneous data, generated by various sources in the urban space. These sources range from environmental sensors to official governmental data, and include the direct participation of citizens in crowdsourcing or volunteered information initiatives. Data and information managed in this process are directed to the understanding and solution of urban problems that are typical of large cities in Brazil and abroad, such as mobility, public health, air and sound pollution, water and energy consumption, and many others. There is a three-fold concern: improving the urban environment for human (co)existence, improving urban quality of living, and improving the conditions under which governmental authorities and public utility companies operate the various systems that comprise the city. Our objective in Urban Computing research is to establish a qualified cycle for the collection, integration and use of geographic information to the benefit of society, fostering the evolution of the state of the art in topics along this cycle, such as spatial data infrastructures. Research outcomes are applied to typical urban problems, with an emphasis on the use of geographic location as a factor for data integration and for communicating findings, as feedback to society.