Sunday, May 3, 2015

Things that caught my interest over May Day weekend

Clear and breezy, just what you expect from early May.

  • XML External Entity (XXE) Processing
    There exists a specific type of entity, an external general parsed entity (often shortened to external entity), that can access local or remote content via a declared system identifier. The system identifier is assumed to be a URI that can be dereferenced (accessed) by the XML processor when processing the entity. The XML processor then replaces occurrences of the named external entity with the contents dereferenced by the system identifier. If the system identifier contains tainted data and the XML processor dereferences it, the processor may disclose confidential information normally not accessible to the application.

    Attacks can include disclosing local files, which may contain sensitive data such as passwords or private user data, using file: schemes or relative paths in the system identifier. Since the attack occurs relative to the application processing the XML document, an attacker may use this trusted application to pivot to other internal systems, possibly disclosing other internal content via http(s) requests.

  • XML External Entity (XXE) Vulnerabilities
    The Billion Laughs Denial-of-Service (DoS) attack consists of defining 10 entities, each after the first consisting of 10 references to the previous one, with the document body consisting of a single reference to the largest entity, which expands to one billion (10^9) copies of the first entity.
  • Notes on indexes and index-like structures
    One of the best examples of a DBMS winning largely on the basis of its indexing approach is Sybase IQ, which popularized bitmap indexing. But when last I asked, some years ago, Sybase IQ actually used 9 different kinds of indexing. Oracle surely has yet more. This illustrates that different kinds of indexes are good in different use cases, which in turn suggests obvious reasons why clever indexing rarely gives a great competitive advantage.
  • Why Are Geospatial Databases So Hard To Build?
    If your data model is inherently non-scalar, you enter an algorithm wasteland in the computer science literature. Paths, vectors, polygons, and other elementary aggregations of scalar coordinates used in spatial analysis are non-scalar data types. Computational relationships are topological instead of graph-like.

    Spatial data types, among a few other common data types, are interval data types. An interval data type cannot be represented with fewer than two scalar values of arbitrary dimensionality, like the boundary of a hyper-rectangle. These differ from scalar types in two important ways: sets have no meaningful linearization, and intersection relationships are not equivalent to equality relationships. The algorithms that do exist in the literature for interval data are poor.

  • Space efficient indexes for the big data era
    An index is typically used to restrict access to only the parts of the data relevant to the query. By space efficient, we emphasize that the index has to be significantly smaller than the original data and typically has to reside at least one level higher in the memory hierarchy than the indexed data. During query evaluation, before accessing and transferring data from slower memory to faster memory, we consult the space-efficient index, which resides in much faster memory. The index reveals which parts, if any, of the "slower" data are relevant to the query and should be transferred.
  • Distance Metrics for Fun and Profit: "People Who Like This Also Like ... "
    People building search engines have developed some pretty nice models for calculating similarity between query strings and text documents. These models can be easily adapted to our purposes by treating each artist as a document and each user as a term in those documents.
  • Interop Liveblog: The Post-Cloud
    If the problem was “I want to deploy my web service”, the initial answer was the x86 server (which made computing power more accessible to people). The next answer was to use multiple virtual servers, for more density. Next we wanted workload mobility (live migration), and then deploying web services as a service (IaaS). That led to “automagically configured” web services (PaaS). Then we wanted captured and immutable images of our web services (Docker), the ability to turn them up extremely quickly (Linux containers and Docker), and easy integration into our Continuous Development lifecycle. Finally, we wanted to do all that on demand, with failed instances quickly replaced (Mesos, CF Diego, Kubernetes). That, in turn, leads to wanting to manage and place the workload intelligently based on data from any level, i.e., I just want to run a web service and have the data center do all the rest (the Post-Cloud).
  • Beej's Guide to Network Programming Using Internet Sockets
    This document has been written as a tutorial, not a complete reference. It is probably at its best when read by individuals who are just starting out with socket programming and are looking for a foothold. It is certainly not the complete and total guide to sockets programming, by any means.
  • Committers' FAQ
    This document is targeted at Apache committers. A committer is an individual who was given write access to the codebase of any Apache project.
  • Why our future depends on libraries, reading and daydreaming
    The simplest way to make sure that we raise literate children is to teach them to read, and to show them that reading is a pleasurable activity. And that means, at its simplest, finding books that they enjoy, giving them access to those books, and letting them read them.
  • Ladies and Gentlemen, The English Language…
    Place the word "only" anywhere in the sentence: "She told him that she loved him."
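A note on the XXE item above: since internal and external entities can only be declared inside a DTD, one blunt mitigation is to refuse any untrusted document that carries a DOCTYPE at all. The following is a minimal sketch, assuming your inputs never legitimately need a DTD. It uses the standard library's expat binding as a pre-check before handing the bytes to ElementTree; the names `parse_untrusted` and `DTDForbidden` are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Hypothetical error type: raised when untrusted XML declares a DOCTYPE."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Any DOCTYPE could declare entities (external or nested), so refuse it outright.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed in untrusted input")


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML only if it contains no DTD (a sketch, not a complete defense)."""
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # an exception raised in the handler propagates here
    # No DTD means no entity declarations, so this parse cannot dereference
    # a system identifier such as file:///etc/passwd.
    return ET.fromstring(data)


if __name__ == "__main__":
    print(parse_untrusted(b"<order><item>42</item></order>").find("item").text)
    attack = (b'<?xml version="1.0"?>'
              b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
              b"<foo>&xxe;</foo>")
    try:
        parse_untrusted(attack)
    except DTDForbidden as exc:
        print("rejected:", exc)
```

Rejecting every DTD is stricter than disabling only external entity resolution. The trade-off is that it also blocks entity-expansion tricks and keeps the check independent of parser defaults.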
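The arithmetic behind the Billion Laughs item can be checked without detonating it. This sketch builds the entity chain as described: `lol0` is a literal string and every later entity is 10 references to the previous one. It then computes the expanded size with a closed form rather than by parsing. The entity names and the `"lol"` payload follow the attack's traditional form; the function names are hypothetical. Only a tiny instance (3 levels, fanout 3) is actually parsed.

```python
import xml.etree.ElementTree as ET


def billion_laughs_doc(levels: int = 10, fanout: int = 10) -> str:
    """Build the entity chain: lol0 is "lol"; each lol{i} is `fanout` refs to lol{i-1}."""
    lines = ['<?xml version="1.0"?>', "<!DOCTYPE lolz [", ' <!ENTITY lol0 "lol">']
    for i in range(1, levels):
        refs = f"&lol{i - 1};" * fanout
        lines.append(f' <!ENTITY lol{i} "{refs}">')
    lines.append("]>")
    lines.append(f"<lolz>&lol{levels - 1};</lolz>")  # single reference to the largest entity
    return "\n".join(lines)


def expanded_length(levels: int = 10, fanout: int = 10, base: str = "lol") -> int:
    """Characters produced by fully expanding the top entity, computed without expanding it."""
    # lol{k} expands to fanout**k copies of the base string.
    return len(base) * fanout ** (levels - 1)


if __name__ == "__main__":
    doc = billion_laughs_doc()
    print(len(doc), "characters of input")
    print(expanded_length(), "characters after full expansion")
    # Safe to parse only at toy scale: 3 levels with fanout 3 gives 9 copies of "lol".
    small = billion_laughs_doc(levels=3, fanout=3)
    print(len(ET.fromstring(small).text), expanded_length(3, 3))
```

With the default 10 levels, the input stays well under a kilobyte while the expansion is 3 * 10^9 characters, which is why the attack is so cheap to send and so expensive to process.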
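To make the geospatial excerpt's point about interval types concrete: an axis-aligned box needs two corner points, and "intersects" behaves very differently from "equals". This is an illustrative sketch, not code from the linked article; the `Box` type is hypothetical. The example boxes show that intersection is not transitive, whereas equality is.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyper-rectangle: an interval value stored as two corner points."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def __post_init__(self):
        if len(self.lo) != len(self.hi):
            raise ValueError("corners must have the same dimensionality")
        if any(l > h for l, h in zip(self.lo, self.hi)):
            raise ValueError("lo must not exceed hi in any dimension")

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap iff their intervals overlap on every axis.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi))


if __name__ == "__main__":
    a = Box((0, 0), (2, 2))
    b = Box((1, 1), (4, 4))
    c = Box((3, 3), (5, 5))
    # Equality is transitive; intersection is not: a meets b and b meets c,
    # yet a and c are disjoint.
    print(a.intersects(b), b.intersects(c), a.intersects(c))
```

Non-transitivity is one reason hash- or sort-based techniques built for equality on scalars do not carry over directly to spatial joins.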
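The space-efficient index idea above (consult a small summary in fast memory, then fetch only the relevant parts of the slow data) can be illustrated with one of the simplest such structures: a per-block min/max summary, often called a zone map. This sketch is a stand-in chosen for illustration and is not necessarily the structure the linked article proposes. A plain Python list plays the role of "slow" storage, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ZoneMap:
    """Hypothetical per-block summary: two numbers per block instead of the block itself."""
    block_size: int
    mins: List[int]
    maxs: List[int]


def build_zone_map(data: Sequence[int], block_size: int) -> ZoneMap:
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    mins, maxs = [], []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        mins.append(min(block))
        maxs.append(max(block))
    return ZoneMap(block_size, mins, maxs)


def candidate_blocks(zm: ZoneMap, lo: int, hi: int) -> List[int]:
    # Consult only the small summary: a block can hold matches iff [min, max] overlaps [lo, hi].
    return [i for i, (mn, mx) in enumerate(zip(zm.mins, zm.maxs)) if mx >= lo and mn <= hi]


def range_query(data: Sequence[int], zm: ZoneMap, lo: int, hi: int) -> List[int]:
    out = []
    for i in candidate_blocks(zm, lo, hi):
        start = i * zm.block_size
        # Only these blocks would be transferred from the slower memory level.
        out.extend(v for v in data[start:start + zm.block_size] if lo <= v <= hi)
    return out


if __name__ == "__main__":
    data = list(range(1000))
    zm = build_zone_map(data, block_size=100)
    print(candidate_blocks(zm, 250, 260))  # only block 2 needs to be read
    print(range_query(data, zm, 250, 260))
```

The pruning works well here because the data is sorted. On randomly ordered data, almost every block's min/max range would overlap the query, which is the kind of case richer structures try to handle.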
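For the "People Who Like This Also Like" item, the artist-as-document, user-as-term mapping can be sketched with plain TF-IDF weights and cosine similarity. This is one standard search-engine model; the linked post may use other weightings (BM25, for instance), and the play counts below are made-up illustration data. A user who listens to every artist gets an IDF of zero and so carries no signal, much like a stop word.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tfidf_vectors(plays: Dict[str, Dict[str, int]]) -> Dict[str, Vector]:
    """Treat each artist as a document and each listener as a term (play count = term frequency)."""
    n_artists = len(plays)
    df = Counter(user for listeners in plays.values() for user in listeners)
    return {
        artist: {user: count * math.log(n_artists / df[user]) for user, count in listeners.items()}
        for artist, listeners in plays.items()
    }


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def most_similar(vectors: Dict[str, Vector], artist: str) -> List[Tuple[str, float]]:
    scores = [(other, cosine(vectors[artist], vec)) for other, vec in vectors.items() if other != artist]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical listening data: artist -> {user: play count}.
    plays = {
        "artist_a": {"u1": 3, "u2": 1, "u4": 2},
        "artist_b": {"u1": 2, "u2": 2, "u4": 1},
        "artist_c": {"u3": 5, "u4": 1},
    }
    print(most_similar(tfidf_vectors(plays), "artist_a"))
```

Here `u4` listens to all three artists, so their shared listener does not make `artist_a` and `artist_c` look alike.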
