Sandhaus says the fundamentally challenging structure of the web isn’t helping the cause, though. The web is predominantly written in HTML, a markup language that focuses on expressing how information on a webpage should look, not what it means. As a result, important pieces of information within webpages, such as headlines, bylines and publish dates in news articles, are formatted within HTML but aren’t explicitly labeled as “headline,” “byline” and “publish date.” “As a consequence,” Sandhaus explains, “it makes it difficult for a wider web ecosystem to have an idea of the structured nature of content.” That is, while webpages are formatted for humans to read easily, machines can’t easily determine the underlying meaning of content on a page if it doesn’t follow a consistent structure, which diminishes the usefulness of the data.
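To make the distinction concrete, here is a minimal sketch in Python of what a machine sees in each case. The article text, class names and `itemprop` labels are made up for illustration and are not taken from any real page or from Sandhaus; the parser is Python's standard `html.parser`, and `extract_fields` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Hypothetical markup: styled for human readers, but nothing says which
# line is the headline, the byline, or the publish date.
PRESENTATIONAL = """
<div class="big-bold">City Council Approves Budget</div>
<div class="small-gray">By Jane Doe</div>
<div class="small-gray">March 3, 2014</div>
"""

# The same hypothetical content, with each field explicitly labeled
# through an itemprop attribute (label names are illustrative).
LABELED = """
<h1 itemprop="headline">City Council Approves Budget</h1>
<span itemprop="author">Jane Doe</span>
<time itemprop="datePublished" datetime="2014-03-03">March 3, 2014</time>
"""


class LabeledFieldParser(HTMLParser):
    """Collects the text of every element that carries an itemprop label."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # label of the element currently being read

    def handle_starttag(self, tag, attrs):
        label = dict(attrs).get("itemprop")
        if label is not None:
            self._current = label
            self.fields[label] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes labeled elements are not nested.
        self._current = None


def extract_fields(html):
    """Return a dict mapping each itemprop label to its text content."""
    parser = LabeledFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    print(extract_fields(PRESENTATIONAL))  # no labels, so nothing is found
    print(extract_fields(LABELED))
```

Both snippets would look much the same to a reader in a browser, but only the labeled version lets a program pick out the headline, byline and publish date without guessing from fonts or position.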