Today involved digging into details about different classification schemes and controlled vocabularies and I realized I have enough to start a list! I’m interested to see how this list grows and what meta-characteristics the schemes have in common. So far I’m tracking whether the classification scheme or controlled vocabulary is available online, whether it is in Linked Data format, and where I am finding it (online resource, in a book, in print some other way, etc.).
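Mostly as a note to myself, here’s a rough sketch (in Python, just my own scaffolding, nothing from the readings) of the fields I have in mind for each entry. The field names and the second entry are placeholders; the LCSH values reflect my understanding that it is published as Linked Data at id.loc.gov.

```python
# Rough tracking structure for the list -- field names are my own placeholders
schemes = [
    {
        "name": "Library of Congress Subject Headings",
        "available_online": True,
        # my understanding: LCSH is published as Linked Data at id.loc.gov
        "linked_data": True,
        "found_via": "online resource",  # or "book", "other print", ...
    },
    {
        "name": "Example local vocabulary",  # hypothetical entry
        "available_online": False,
        "linked_data": False,
        "found_via": "book",
    },
]

# quick check: which schemes on the list are already available as Linked Data?
print([s["name"] for s in schemes if s["linked_data"]])
```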
My readings today were about American Indian classification and subject heading issues in Dewey Decimal Classification, Library of Congress Classification, and Library of Congress Subject Headings, as well as more information about Dorothy B. Porter and her work to organize, increase, and provide access to the African and African American collections that became the Moorland-Spingarn Research Center at Howard University. Practices for classifying American Indian resources have placed much of this content in the historic past, under sections of the catalog about the history of North America (in both DDC and LCC), as if American Indians don’t even exist anymore. And Porter recalled a time when many libraries grouped anything by an African American author under a DDC heading for colonization (and migration). There are clunky ways to work somewhat within these classification systems, but only to a point and only for some material. The limited room for DDC to expand and the slow pace of change at LC just seem to allow these problems to languish. So new classification schemes and controlled vocabularies have been developed, and I’m learning how they have been used and how they can be applied to aid the research process. This is where my thoughts turn to Linked Data possibilities, but those thoughts aren’t well formed yet.
And just to make sure I have some warning lights going off in my head regarding Linked Data, I also read about issues of bias in Knowledge Graphs related to the Semantic Web:
- data bias (the underlying Linked Data sources are mostly about Europe, Japan, Australia, and the US)
- schema bias (depending on the ontology you choose, you can get very different results for the same concept; the article’s example was theater)
- inferential bias (taking data from a source like DBpedia and running inference produces high-confidence rules from the graph like: “if X is a US president, X is male”).
That graph could use some more learning.
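To make the inferential bias point a little more concrete, here’s a minimal sketch, not anything from the article, of how a rule-mining style confidence score can make a biased generalization look authoritative. The triples and the function below are hypothetical toy examples standing in for DBpedia-style data.

```python
# Toy (subject, predicate, object) triples -- a deliberately skewed sample
triples = [
    ("George_Washington", "type", "US_President"),
    ("George_Washington", "gender", "male"),
    ("Abraham_Lincoln", "type", "US_President"),
    ("Abraham_Lincoln", "gender", "male"),
    ("Barack_Obama", "type", "US_President"),
    ("Barack_Obama", "gender", "male"),
]

def rule_confidence(triples, if_pred, if_obj, then_pred, then_obj):
    """Confidence of the rule 'if (s, if_pred, if_obj) then (s, then_pred, then_obj)':
    the share of subjects matching the rule body that also match the rule head."""
    body = {s for s, p, o in triples if p == if_pred and o == if_obj}
    head = {s for s, p, o in triples if p == then_pred and o == then_obj}
    return len(body & head) / len(body) if body else 0.0

conf = rule_confidence(triples, "type", "US_President", "gender", "male")
print(f"confidence('US president => male') = {conf:.2f}")  # 1.00 on this sample
```

The rule comes back with 1.0 confidence only because every president in the sample happens to be male; nothing in the score itself signals that the pattern reflects who has held the office so far rather than anything inherent to it.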
This brings up something that is coming across in other readings. Bias on its own isn’t necessarily a problem; everyone has implicit biases. It becomes a problem when that implicit bias turns systemic and gets reflected back as the appropriate or authorized way to organize and interpret classifications and subject matter: bias without recognition or documentation, without transparency. Or, in the case of this knowledge graph example, results delivered without context simply display the bias.