FiNETIK – Asia and Latin America – Market News Network

Asia and Latin America News Network focusing on Financial Markets, Energy, Environment, Commodity and Risk, Trading and Data Management

Markets in Financial Instruments Regulation (MiFIR): A New Breed of Data Requirements – A-TEAM

Rather than opting for an all in one directive to herald the second coming of MiFID, the European Commission has split the update into two parts: a regulation and a directive. The Markets in Financial Instruments Regulation (MiFIR) should be of particular interest to the data management community due to its focus on all aspects of data transparency, from trade data through to transaction reporting.

According to the draft of MiFIR, which is available to download at the bottom of the blog, the regulation: “sets out requirements in relation to the disclosure of trade transparency data to the public and transaction data to competent authorities, the authorisation and ongoing obligations applicable to providers of data services, the mandatory trading of derivatives on organised venues, and specific supervisory actions regarding financial instruments and positions in derivatives.” The data transparency requirements have therefore been neatly tied together under one regulatory banner, leaving the directive to deal with aspects such as the provision of investment services and conduct of business requirements for investment firms.

The draft regulation is the culmination of the work of the European Securities and Markets Authority (ESMA) and its predecessor over the last couple of years to gather industry feedback on the implementation of the first version of MiFID and to fill in any gaps, as well as to extend the regulation beyond the equities market. The draft paper notes that the European Commission has focused on assessing the impact of these new requirements including cost effectiveness and transparency; hence it is adopting a defensive stance ahead of any possible industry backlash on the subject.

Much like its predecessor, MiFIR is focused on improving cross border transparency and ensuring a level playing field with regards to data reporting requirements and access. Although the regulation contains a number of important pre-trade data transparency requirements such as equal access to data about trading opportunities, the most important aspects for data managers will likely reside in the post-trade section of MiFIR.

The extension of transparency requirements to OTC derivatives and fixed income instruments, and to the multilateral trading facility (MTF) and organised trading facility (OTF) contingents in the market, is one such development. These markets, however, will not face the same level of transparency requirements as the equity markets, although “equity like” instruments such as depositary receipts and exchange traded funds will see the MiFID requirements extended to cover them directly. All trading venues and their related trades will therefore be subject to transparency requirements, but these will be tailored to the instrument types in question: the level of transparency will be determined by instrument type rather than venue.

On transaction reporting (the area of most relevance with regards to reference data standards), MiFIR aims to improve the quality of the data underlying these reports (a common theme across a lot of recent regulation – see commentary on which here) by being much more prescriptive about the standards that must be used. The idea is for firms to provide “full access to records at all stages in the order execution process” and for trading venues (not just traditional exchanges, but also MTFs and OTFs) to store relevant data for a period of five years. This data includes legal entity identification data, which the regulation indicates must be reported via approved mechanisms and formatted in a manner that makes it accessible for cross-border regulatory oversight.
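The retention obligation lends itself to a small illustration. The record layout below is invented for the sketch (MiFIR does not prescribe these field names), and five years is approximated as 5 * 365 days:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical transaction record; field names are illustrative,
# not the actual MiFIR reporting schema.
@dataclass
class TransactionRecord:
    trade_date: date
    buyer_lei: str   # 20-character Legal Entity Identifier
    seller_lei: str
    instrument_id: str
    quantity: int
    price: float

RETENTION_YEARS = 5  # venues must store relevant data for five years

def must_retain(record: TransactionRecord, today: date) -> bool:
    """True while the record is still inside the retention window."""
    # Approximate five years as 5 * 365 days for this sketch.
    return today - record.trade_date <= timedelta(days=RETENTION_YEARS * 365)
```

A venue's archiving job could run such a check before purging old records; the real rule would of course be implemented against the regulation's precise wording.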

The exact nature of the legal entity identification (LEI) and instrument identification standards that are to be used by firms in their transaction reports is likely to be impacted by the ongoing work at a global level as part of the systemic risk monitoring effort (see more here). At the moment, a range of identifiers is acceptable, and the regulatory community has been pushing towards the Bank Identifier Code (BIC) for some time (see more on which here), but this may change before MiFIR comes into force.

Another important section of MiFIR is the one devoted to the “increased and more efficient data consolidation” for market data, which necessarily entails a reduction in the cost of this data. A City of London paper published earlier this year addressed this issue directly, noting that the majority of the European firms participating in the study believe poor data quality, high costs of pricing data and a reliance on vendors are the main barriers to post-trade transparency (see more here), and MiFIR appears to be aiming to directly address those issues.

The argument for some form of consolidated tape or tapes is an integral part of that endeavour (see recent industry commentary on this issue here) and MiFIR indicates that the aim is for data to be “reliable, timely and available at a reasonable cost.” On that last point, the regulation also includes a provision that all trading venues must make post-trade information available free of charge 15 minutes after execution, thus enabling data vendors to stay in business but increasing transparency overall (or so the logic goes). Moreover, the regulator is keen for a number of consolidated tape providers to offer market data services and improve access to a comparison of prices and trades across venues, rather than a single utility version.
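The 15-minute provision lends itself to a one-line check. A minimal sketch, assuming simple execution and request timestamps (the function name and comparison are illustrative, not taken from the regulation's text):

```python
from datetime import datetime, timedelta

# MiFIR provision: post-trade information free of charge 15 minutes
# after execution. Threshold handling here is an assumption.
FREE_AFTER = timedelta(minutes=15)

def is_free_of_charge(executed_at: datetime, requested_at: datetime) -> bool:
    """True once the delayed-publication window has elapsed."""
    return requested_at - executed_at >= FREE_AFTER
```

A vendor's entitlement system would consult a check like this before deciding whether to bill for a given post-trade data request.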

In order to tackle the issue of a lack of data quality for trade reporting, all firms will also be required to publish their trade reports through approved publication arrangements (APAs), thus ensuring certain standards are adhered to.

The full MiFIR draft paper is downloadable here from A-TEAM.

Source: A-Team Virgina's Blog, 08.09.2011

Filed under: Data Management, Market Data, News, Reference Data, Risk Management, Standards

Semantics and Enterprise/Master Data Management: Data Strategy Spring 2009

Selected white papers and articles from the Data Strategy Journal, Spring 2009, on semantic technology, Enterprise Data Management (EDM) and Master Data Management (MDM).

The Power of Semantic Technology: Mind over Meta (David Newman)

We have witnessed over the years the progression from basic machine languages, to higher-level procedural languages, and then to object-oriented languages. Each advance introduced dramatic improvements in software capabilities that resulted in major leaps forward in fulfilling information technology requirements. We are again on the verge of another major advance in the evolution of software.

Semantic Technology and Master Data Management (Brian Schulte)

Master Data Management is now mainstream and those of us who have practiced it for a few years are battered, bruised and wearily displaying our scars. Typically defined as the people, processes and systems that govern the core data (e.g. products, customers, suppliers) needed to run a business, Master Data Management (or MDM) requires painstaking work in three broad areas: data standardization, architecture, and governance.

Bring Semantic technology to Enterprise Data (Paul Miller)

World Wide Web inventor Sir Tim Berners-Lee declared the Semantic Web ‘open for business’ in 2008, celebrating the ratification of the SPARQL query specification by the World Wide Web Consortium (W3C); the organization of which he is Director. “I think we’ve got all the pieces to be able to go ahead and do pretty much everything,” he stated in an interview. “You should be able to implement a huge amount of the dream, we should be able to get huge benefits from interoperability using what we’ve got. So, people are realizing it’s time to just go do it.”
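The "pieces" Berners-Lee refers to include RDF triples queried with SPARQL. As a toy illustration of the idea (not real SPARQL, which is a W3C query language run against dedicated triple stores), subject-predicate-object pattern matching can be sketched in a few lines, with an invented mini-graph:

```python
# A tiny in-memory graph of (subject, predicate, object) triples.
# The data and predicate names are invented for illustration.
TRIPLES = [
    ("acme",     "isA",      "Supplier"),
    ("acme",     "supplies", "widget-1"),
    ("widget-1", "isA",      "Product"),
]

def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard,
    much like an unbound variable in a SPARQL triple pattern."""
    return [t for t in TRIPLES
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]
```

The wildcard query `match(predicate="isA")` plays the role of `SELECT ?s ?o WHERE { ?s :isA ?o }` in this simplified setting.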

Source: Data Strategy Journal, Spring Issue 2009

Filed under: Data Management, Library, News, Reference Data, Risk Management, Standards

Is Data Modeling Still Relevant?

Some people believe data modeling has become very passé these days. The belief is that because data modeling theory is more than 30 years old and, because some data modeling tools have been around for 10 to 20 years, somehow data modeling is no longer relevant. Nothing could be further from the truth. In fact, data modeling may now be more necessary than ever before.

While there are other modeling techniques and notations, such as business process modeling and Unified Modeling Language, the need to accurately capture business data requirements and transform them into a reliable database structural design is as paramount as ever. The key differentiator is that data modeling is the only technique and notation that focuses on the “data at rest.” All the others tend to focus more on “data in motion.” Put another way, data modeling concentrates on issues that lead to a solid database design, while other approaches tend to focus more on issues that will result in better application design or things useful to programmers, such as data structures, objects, classes, methods and application code generation.
Case in point: I’ve personally served as an expert witness in several court trials where plaintiffs sued defendants for substantial financial damages when custom database applications had performance and/or data accuracy problems. In every case, there was a failure to data model the business requirements, and data effectiveness suffered as a result. Moreover, ad hoc database design, or database design using more programmatic-oriented techniques and tools, often resulted in inefficient database design. No amount of coding could overcome the resulting bad database design. So, in every case, the plaintiff won.
The other reason data modeling has seen a measurable resurgence is the data warehousing phenomenon. With cheap storage these days, most companies can afford, and benefit from, retaining historical aggregate and/or summary data for making significant strategic decisions. With the accumulation of numerous source legacy online transaction processing systems, there are two key ways to approach populating a data warehouse: directly from source to warehouse (as shown in Figure 1) or through an intermediary database often referred to as an operational data store (as shown in Figure 2).
Sufficient debate exists as to which approach is superior, but I won’t address that here. Regardless of which approach is selected, the database design (i.e., the data at rest) is paramount because, in a data warehouse, the data itself – and the business information it contains – is the most relevant and valuable asset. Typical data warehouse queries and reports issued via business intelligence tools process that asset to yield strategic decision-making results.
The other key area where data modeling often supports the whole data warehousing and BI effort is the mapping of legacy data fields to their DW and BI counterparts. This metadata mapping about how frontline business data maps to the data warehouse helps with the design of both queries and/or reports, as well as with extract, transform and load programming efforts. Without such mapping, there would be no automatic tie to the dependent data warehousing information as OLTP legacy systems evolve. Hence, one would have to almost totally re-engineer rather than simply follow the OLTP source data ramifications and ripples downstream to the DW and BI endpoints.
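The legacy-to-warehouse mapping described above is, at its simplest, a lookup table applied during the transform step of ETL. A minimal sketch, with field names invented for illustration (no real system is implied):

```python
# Hypothetical metadata mapping from legacy OLTP field names to their
# data-warehouse counterparts; the names are invented for illustration.
FIELD_MAP = {
    "cust_nm": "customer_name",
    "ord_dt":  "order_date",
    "ord_amt": "order_amount_usd",
}

def to_warehouse_row(legacy_row: dict) -> dict:
    """Rename mapped fields and drop anything unmapped, a (much)
    simplified version of the transform step of an ETL job."""
    return {FIELD_MAP[k]: v for k, v in legacy_row.items() if k in FIELD_MAP}
```

Keeping such a mapping as explicit metadata, rather than buried in ETL code, is what lets downstream warehouse objects be traced back to their OLTP sources when the legacy systems evolve.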
For those not involved with data warehousing projects – perhaps those performing more traditional OLTP-type systems development – data modeling still is important. Often, however, people get so caught up in novel paradigms such as extreme programming, agile software development or scrum that they compromise data modeling, or even skip it entirely. The problem is that these new approaches don’t always spell out exactly how data modeling should be incorporated, so people often forego it.
My belief is that no matter what latest and greatest approach you use, data modeling should be integrated into your development process wherever it makes sense. Figure 3 shows how both conceptual and physical data modeling should fit into an overall database design process – whether it’s for a totally new system or for one that’s being updated or re-engineered.
There is one final reason why data modeling has been getting more attention these days. In many cases, organizations finally are requiring data models as a sign-off deliverable of the development process. I attribute this to their attempt to adhere to the Software Engineering Institute’s Capability Maturity Model and Capability Maturity Model Integration concepts. The idea here is quite simple: to mature your development process, regardless of technique, you need to mature both the processes and the tools used to achieve the desired end result. Both processes and tools can lead to maturity, helping to reinvigorate many people’s interest in data modeling.
Now comes the hard part. Which data modeling tool should you use? That might seem like a tough or loaded question; there are numerous data modeling tools available. Plus, many enterprise modeling suites contain data modeling capabilities. Rather than advise any particular tool, I’m going to outline some basic guidelines for things to avoid. I believe that any tool that meets some standard and minimal requirements will help you produce effective and efficient data models – and, hence, the resulting databases.
Avoid drawing tools that aspire to be data modeling tools. A good data modeling tool supports defining tons of metadata with business relevance. Think of the diagram as just the tip of the iceberg – where you don’t see the 90 percent of the mass that is underwater. The same is true for data modeling. If you concentrate only on what the picture is, you’ll probably compromise the effectiveness of the resulting database.
Choose a tool that fits your needs. Often, people purchase a killer modeling tool that offers everything imaginable. But, if all you need or will use is the data modeling portion, why pay for more? The key concern here is that the more any tool does besides data modeling, the better the chance its data modeling capabilities may have been compromised to do everything else. Sometimes more is not better.
Data definition language. This is another case where more might not be better. It is better for your tool to generate 100 percent accurate CREATE or ALTER scripts for the few databases that matter to you than to support all of them at a lesser level. But be very careful: many tools, even those focusing on just a few databases, often generate less than optimal DDL. You have to know what to look for, so engage your database administrator in making the decision, just to be safe.
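To make the DBA-review advice concrete, here is a deliberately naive sketch of model-to-DDL generation; real modeling tools emit far richer, dialect-specific DDL (storage clauses, constraints, indexes), which is exactly why the output still needs expert review:

```python
# Toy model-to-DDL generator: a table name plus a column->type mapping
# becomes a CREATE TABLE statement. Purely illustrative; it handles
# none of the dialect-specific details a real tool must address.
def create_table_ddl(table: str, columns: dict) -> str:
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    return f"CREATE TABLE {table} (\n  {cols}\n);"
```

Even this trivial generator illustrates the review problem: nothing stops it from emitting a type or clause your target database does not support.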
Verify that your data modeling tool provides robust model consistency and accuracy checking reports and/or utilities. As data models grow (and they will), it can be quite overwhelming to have to manually check everything. And you cannot expect the poor DBA to sanity check the thousands or tens of thousands of DDL lines a data modeling tool can quickly generate. Effectiveness is mostly on your shoulders, but efficiency can be aided by good data modeling checking utilities.
Data modeling has come a long way since its inception. Even though the heydays of CASE and software engineering passed with the ‘90s, the need for and usefulness of data models has not subsided. Data modeling can assist with any effort, regardless of development methodology or paradigm. So, don’t pass on data modeling just because it’s a mature technique – you might be very sorry if you do.

Source: Information Management, 28.05.2009 by Bert Scalzo

Filed under: Data Management, Library, News, Risk Management

Master Data: The Product Information Challenge

Master data management for product data (known as PIM, for product information management) is a different kettle of fish altogether from MDM for customer data (also known as customer data integration, or CDI).

It is important to recognize and consider the fundamental differences between the two. One distinction is complexity. Product data typically requires more attributes (or fields) than customer data. A customer might require 10 to 20 attributes for unique identification and to capture the minimum set of data needed to do business with each. But it’s not uncommon for product data to have dozens or hundreds of required attributes.

Standardization is an issue, too. In the customer data realm, ZIP codes and other address elements can be verified against postal standards; no comparable universal references exist for most product attributes. In certain industries, there has been recent progress toward standardizing some elements of product information, with industry associations and government bodies promoting standards like the United Nations Standard Products and Services Code, but there is still a long way to go. Many manufacturers are also reluctant to release too much detailed information because of concerns about becoming commoditized on the Web.

And although customer data is commonly structured into various hierarchies (such as a corporate family tree or a sales geographic rollup), the hierarchy requirements for product data are usually more complex, including bills of material, product/product line/product family rollups and financial reporting breakouts.

A lot of product data is unstructured (such as engineering or marketing documents) or poorly structured (like description fields overloaded with lots of information that should ideally be broken out into separate fields like size, weight, color, packaging, etc.). This variability in structure requires a specialized parsing engine if you want any hope of automating the standardization of your data.
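Such a parsing engine, reduced to its essence, applies extraction patterns to the overloaded description text to recover structured fields. A minimal sketch, with a description format and field names invented for illustration:

```python
import re

# Illustrative parser for an overloaded product description such as
# "WIDGET 10x20cm 1.5kg BLUE BOX/12"; the format and field names are
# assumptions, not any real catalog standard.
PATTERNS = {
    "size":      re.compile(r"(\d+x\d+)cm"),
    "weight":    re.compile(r"(\d+(?:\.\d+)?)kg"),
    "packaging": re.compile(r"BOX/(\d+)"),
}

def parse_description(desc: str) -> dict:
    """Extract whichever structured fields the patterns can find."""
    out = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(desc)
        if m:
            out[field] = m.group(1)
    return out
```

A production parsing engine would need per-category pattern sets, unit normalization and fallback handling for the many descriptions that match nothing, which is where the real difficulty lies.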

The availability of outside reference databases is much more common for customer data than for product data because a customer is a real entity that exists independently of the enterprise, while a product may exist in the imagination, factories and stores of the company. So, third-party content providers such as D&B and Acxiom, which can be very helpful in cleansing, matching and enriching customer data, may be of limited or no assistance with your product data.

Volumes tend to be higher, too. One of my software industry clients had approximately 25,000 customers but separately managed more than 50,000 individual product records where poor system designs sometimes forced the creation of multiple product records to allow for minor differences or variations of a product.

Other industries like retail or high-tech manufacturing also have very high volumes of product data and can easily have many millions of unique (or supposedly unique) items in their product or materials master databases.

While initial quality levels of customer data are often worse than expected, with product data, quality levels are typically even worse. Using the ACT+C (accuracy, completeness, timeliness and consistency) definition of data quality for assessment, I’m usually shocked by how inaccurate, incomplete, out-of-date and inconsistent product information is.
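One piece of an ACT+C assessment, completeness, is straightforward to compute once the required attributes for a record are known. A toy sketch (the attribute names are invented):

```python
# Completeness: the share of required attributes actually populated.
# Treats None and empty string as "missing"; a real assessment would
# also catch placeholder junk values like "N/A" or "TBD".
def completeness(record: dict, required: list) -> float:
    filled = sum(1 for f in required if record.get(f) not in (None, ""))
    return filled / len(required)
```

Running such a metric over an entire product master, and repeating it for accuracy, timeliness and consistency, is what turns the vague sense that "the data is bad" into a number management can track.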

All of this may sound like a lot of complexity – and it is – but the real kicker seems to be that there are so many categories of product data. What do I mean by categories? Well, what are the rules that tell you a piece of data describing a printed circuit board is valid? Whatever your answer, it’s a different answer for sheet metal, which is different from ball bearings, which is different from MP3 players, which is different from digital cameras. You get the point – different rules for each type of product means exploding levels of complexity!
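The "different rules per category" point can be made concrete: each category carries its own validation predicate, and the rule set grows with every category the business adds. The categories and rules below are invented for illustration:

```python
# Per-category validation rules, as described above. Each category
# needs its own predicate; these examples are invented.
RULES = {
    "printed_circuit_board": lambda p: p.get("layer_count", 0) > 0,
    "ball_bearing":          lambda p: "inner_diameter_mm" in p,
    "mp3_player":            lambda p: "storage_gb" in p,
}

def is_valid(category: str, product: dict) -> bool:
    """Apply the category's rule; unknown categories fail validation."""
    rule = RULES.get(category)
    return bool(rule and rule(product))
```

Multiply a handful of predicates per category by hundreds or thousands of categories and the "exploding levels of complexity" the article describes become very tangible.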

A Different Approach

Do the widespread differences mean that data mastering and data quality approaches that work well for customer MDM won’t work for product MDM? Unfortunately, yes.

Andrew White, research VP at Gartner, Inc. said, “Product data is inherently variable, and its lack of structure is generally too much for traditional, pattern-based data quality approaches. Product and item data requires a semantic-based approach that can quickly adapt and ‘learn’ the nuances of each new product category. With this as a foundation, standardization, validation, matching and repurposing are possible. Without it, the task can be overwhelming and is likely to include lots of manual effort, lots of custom code – and a whole lot of frustration.”

I think Andrew’s right on the money here. The variability and relative lack of structure, the lack of external standards and third-party referential data sources, the overloading of the description field, the number of requirements for classification and categorization and the differences in hierarchy management all add up to a problem that most data quality tools designed for customer information would have a hard time solving.

Yet all of these same issues make trying to handle product information without a tool-based approach even less appealing. Investigate the semantic-based tools now on the market. They can help you to standardize, enrich, match, repurpose and govern your product information.

Source: Information Management Magazine, May 1, 2009 by Dan Power

Dan Power is the founder and president of Hub Solution Designs, Inc., a management and technology consulting firm specializing in master data management (MDM) and data governance. He has 21 years of experience in management consulting, enterprise applications, strategic alliances and marketing at companies like Dun & Bradstreet, Deloitte Touche Tohmatsu, Computer Sciences Corporation, eCredit and Parson Consulting. Power speaks frequently at technology conferences and advises clients on using MDM to solve business problems.

Filed under: Data Management, Library, News, Reference Data, Risk Management, Standards

Data Management: A new Dawn for Data Governance?

The data management world has long been addressing the data governance problem. Yet firms continue to adopt erroneous approaches or avoid the issue altogether. Carla Mangado explores how the current economic environment will push firms to give data governance the attention it requires.

Data governance has typically been passed over by firms that have not realized the long-term business benefits a successful program can deliver. While the discipline continues to evolve, the required level of maturity is simply not yet present in the industry.

But this is already changing. As word spreads about how a good data governance program can help with essentials such as reducing risk and enabling compliance, the current economic environment is already pushing firms to focus on introducing new frameworks.

Software vendor DataFlux released a financial services sector research report in March in partnership with research consultancy Business Development Research Consultants (BDRC), which revealed data governance is clearly on the agenda. The results showed increasing momentum, with 34% of participants having already completed a data governance project and a further 29% currently implementing one or considering doing so.

Awareness is growing fast. In July 2008, the IBM Data Governance Council, a group that focuses on data governance challenges and explores best practices in the field, predicted that within three years data governance would become a regulatory requirement, with the value of data treated as an asset on the balance sheet and reported by the chief financial officer, who would become responsible for reporting on data quality and risk to the board of directors.

The predictions are now being realized sooner than initially expected. “Eight months on from releasing the research, we have seen these issues accelerating,” says New York-based Steven Adler, chairman, IBM Data Governance Council, adding that he expects 2009 to be a big year for data governance due to the already increasing focus across many industries.

Such an immediate response is not unexpected. At a time when risk and compliance keep top management awake at night, awareness of the need for good data quality is rising, and data governance is an essential factor in achieving this goal. In fact, Adler says there is far more activity in data governance today, and he thinks the risk environment and the repercussions around the economy are driving this focus.

Orlando-based Gwen Thomas, president at the Data Governance Institute, says everyone is now confident to move forward in this space, ensuring leaders have the information they need to deal with the credit crunch and the future. “One of the most important benefits of data governance is the transparency it offers into the true state of an organization’s data,” says Thomas, adding that this is a must for effective decision-making, to resolve long-standing data issues and pro-actively to set standards and business rules to avoid future problems.

Now that everything revolves around transparency and control, data governance is certainly seen as the way forward. Methodologies must be put in place to achieve this. Data governance can enable firms to deal with a wide range of systems, creating a consistent and single view and facilitating more effective risk management. “For the business to accurately and effectively manage risk, they need to have a better method or some methods in place to effect information control and awareness,” says Toronto-based Mark Cowan, managing director at First Spike, a provider of data management strategies and SaaS integration.

A clear vision of data is essential. Datanomic, the data integration and data quality software provider, recently started a data governance program with a US-based financial institution. The main driver behind the project is Basel II compliance – which is now becoming a data governance project-driver in the US market – and achieving a single view of the data. London-based Steve Tuck, chief strategy officer at Datanomic, says the first steps consisted of identifying the key areas that need attention and their different requirements. Knowing the state of the data is essential, he says, explaining that he often advises clients to start small.

Firms that already have data governance programs in place are now increasingly turning to such frameworks to achieve goals and provide the required alignment with the cross-functional input needed to effectively tackle data issues. “What I see everywhere is awareness that governance is needed so that these assumptions about quality and standardization and definitions can turn from untested assumptions to actionable knowledge,” says Thomas, adding that data governance can bring all the cross-functional groups together so that they can bring all the pieces of the puzzle to the table.

Cost, Organization and the Business Challenge

Still, with the current cost-cutting initiatives, limited budgets and need for immediate business and economic benefits, some firms may not be willing to start data governance projects. But with increasing talk of regulation having a greater say in the emerging governance, risk and compliance (GRC) space, those not yet involved in data governance will soon have to be.

In fact, according to a white paper released by data quality system provider Harte-Hanks Trillium Software called Data Intelligence and Governance, such needs are already at the forefront of the debate, and regulators now require GRC teams to open their risk governance processes to detailed scrutiny.

London-based Colin Rickard, managing director, north and west Europe at DataFlux, says: “Data governance is about proving to the regulator that what you are doing is correct … creating transparency is equally as important as cutting costs.”

Such programs do provide results. Thomas says that data governance projects, which offer among the highest ROI of many firms’ efforts, are the critical path that enables other projects to succeed, making money while reducing complexity.

Firms are already responding. “They want better quality reports and visibility into the underlying data, a demand now usually coming from the director and senior level of organizations – then from the innovation side of these businesses, they all have to see horizontally across the data sets, better, faster, cheaper than before,” says Cowan.

But firms should not be put off by thinking such projects will not give short-term benefits. Berkshire-based Ed Wrazen, vice-president, product management and strategy at Harte-Hanks Trillium Software, says data governance is all about reaching both short-term and long-term goals with an overall strategy in mind.

“The value is where data governance can be applied in the context of short- and medium-term business goals not losing sight of the overall strategy,” he says, adding that it is all about having a long-term vision while generating value for the organization.

And while tackling projects with an overall strategy in mind, firms also have to set achievable expectations. London-based Peter Moss, global head of content, technology and operations, Thomson Reuters, says some organizations are totally committed to data governance but struggle to get results. “They try to define a perfect world. Instead, they should define a vision where they want to get to and take very pragmatic steps to get there,” says Moss.

To be successful, firms need to tackle the organizational issue. Internal data governance councils, which take the lead when it comes to data governance, have been regarded as part of the solution. These have emerged both at a very high executive level, treating the council as a small board of directors at the management level and at lower levels focusing more on the operational aspects of data governance. “I have seen companies around the world create such groups in past years, but the financial crisis is accelerating it and it’s happening today,” says IBM’s Adler.

But such councils can only be successful if the organization has the right alignment within the business to tackle the challenges. “If you haven’t got the right alignment within the business to make sure everybody throughout the organization is engaging appropriately, then the governance board isn’t going to be of much value, but if you have business alignment and don’t have a back-up board that makes the key strategic decisions, the business alignment will not be as effective,” says Moss, adding that it’s all about combining both together. “Data governance is more about understanding how to manage the journey than defining where you want to get to,” he says.

The main challenge remains achieving enterprise-wide recognition that data governance is linked to business initiatives. This is problematic because, Cowan says, data governance done in isolation, away from the operating pieces of the business or technology, will not deliver the expected benefits.

Terminology could represent one of the first issues that need to be overcome. “Data governance doesn’t inspire a lot of excitement or interest from the business side, but it is increasingly being linked to business initiatives,” says Wrazen. “If you can link data governance to a business program or goal and show the value it will generate with tangible evidence through the improved quality of data, that’s where we will see the success in data governance,” he adds.

Firms are no longer seeing data governance as an optional task but a necessary process – not one project in isolation with a specific timeframe but several projects creating an enterprise-wide program with specific long-term and short-term goals. The industry is even beginning to notice the rise of data governance-related job titles. Perhaps this is the ultimate sign that the approach to data governance is changing.

The Governance Forecast

The DataFlux and BDRC primary research survey of a senior IT audience reveals UK financial services sector attitudes to data quality, data governance, regulation and compliance.

34% of firms have already implemented a data governance project

11% of firms are implementing a data governance project

18% of firms are considering implementing a data governance project

16% claim no-one has specific responsibility for data quality or governance initiatives

36% claim IT is responsible for data quality or governance initiatives

14% claim the data steward/manager is responsible for data quality or governance initiatives

91% of firms have established an enterprise-wide view of data

9% of firms have no enterprise-wide view of data but plan to change this

73% say compliance is the main driver for investment in data management

52% say organizational efficiency is the main driver for investment in data management

By: DataFlux, BDRC.

Source: IRD – Inside Reference Data, April 2009

Filed under: Data Management, Library, News, Reference Data, Risk Management, Standards

Master Data versus Reference Data

Master data and reference data are two major data categories that are often thought of as the same. The reality is that they are quite different, even though they have strong dependencies on each other. Failure to recognize these differences is risky, particularly given the current explosion of interest in master data. Projects that approach master data as just “data” and fail to address its unique needs are likely to encounter problems. This is particularly true if such projects experience scope creep that leads them into the very different realm of reference data management.

Basic Categories of Data

Figure 1 shows how data can be separated into a number of different categories, arranged as layers. Perhaps the most important category is the transaction activity data. This represents the transactions that operational systems are designed to automate. It is the traditional focus of IT, including things such as orders, sales and trades. Below it is transaction audit data, which is data that tracks the progress of an individual transaction, such as Web logs and database logs. Just above transaction activity data is enterprise structure data. This is the data that represents the structure of the enterprise, particularly for reporting business activity by responsibility. It includes things such as organizational structure and charts of accounts. Enterprise structure data is often a problem because when it changes it becomes difficult to do historical reporting. (For example, when a unit splits into two, each responsible for a distinct set of products, how do we compare their current product sales performance to their performance prior to the split?) Enterprise structure data is a subject in its own right, which, alas, there is not enough space to discuss here.

Then we come to master data. Master data represents the parties to the transactions of the enterprise. It describes the things that interact when a transaction occurs. For instance, master data representing the product and the customer must exist before a transaction to sell that product to that customer can be recorded. Reference data is any kind of data that is used solely to categorize other data found in a database, or solely for relating data in a database to information beyond the boundaries of the enterprise.

Figure 1: Categories of Data

Know Your Data

If we accept these definitions, there is a big difference between master data and reference data. Definitions, however, are a cloudy issue. Some people tend to regard reference data as any data used in an application, but not created in that application. Thus a sales application that gets product data from some other application can view the product data as reference data. This is a big problem. If we do not have precise definitions for what we are talking about, then it is difficult to even exchange ideas, let alone implement solutions to address problems of either reference data or master data management. It is something I have witnessed in my work, and I have been particularly disappointed by projects where people with different and rather fuzzy ideas of what master data is try to work together. Unless everyone involved in such projects has a clear idea of what they are dealing with, they cannot understand where the boundaries of the projects lie, and they are forced back to very general and, frankly, fruitless approaches to whatever problems of reference or master data they are trying to solve.

The Difference of Identification

Let us look at some of the specific differences between reference and master data. Identification is a major one. In master data, the same entity instance, such as a product or customer, can be known by different names or IDs. For example, a product typically follows a lifecycle from a concept to a laboratory project to a prototype to a production run to a phase where it is supported under warranty and perhaps to a phase of obsolescence where it may no longer be produced or supported but is still covered by product liability responsibilities. In each of these phases, the name of the product may change, and its product identifier may, too. For instance, Microsoft’s Cairo project was eventually named Windows 2000. I worked at an organization that funded special projects, and the year was part of the project number. When a project took a long time to be formulated, management usually changed the year node in the project number to give a more up-to-date impression. Beyond product, we are all aware that customers can change their names, or have identical names, and how difficult it is for enterprises that interact with a large customer base to know which individual they are dealing with.

By contrast, reference data typically has much less of a problem with identification. This is partly because the volumes of reference data are much lower than what is involved in master data and because reference data changes more slowly. Existing issues tend to revolve around the use of acronyms as codes. Reference data, such as product line, gender, country or customer type, often consists of a code, a description and little else. The code is usually an acronym, which is actually very useful, because acronyms can be used in system outputs, even views of data, and still be recognizable to users. Thus the acronym USA can be used instead of United States of America. Some IT staff try to replace acronyms in reference data with meaningless surrogate keys, and think they are buying stability by such an approach. In reality, they are causing more problems because reference data is even more widely shared than master data, and when surrogate keys pass across system boundaries, their values must be changed to whatever identification scheme is used in the receiving system.
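The surrogate-key drawback can be made concrete. In the sketch below (all values hypothetical), an acronym code such as USA travels between systems unchanged, while a surrogate key must be re-mapped at every system boundary:

```python
# Two systems sharing the same country reference data, each with its own
# meaningless surrogate keys. The acronym is the only stable identifier.
SYSTEM_A = {1: "USA", 2: "CAN"}    # system A's surrogate key -> acronym code
SYSTEM_B = {17: "USA", 42: "CAN"}  # system B assigned different surrogates

def translate_surrogate(key_a: int) -> int:
    """Map a system A surrogate key to system B's key via the shared acronym -
    an extra translation step that acronym codes avoid entirely."""
    code = SYSTEM_A[key_a]  # recover the meaningful acronym
    for key_b, code_b in SYSTEM_B.items():
        if code_b == code:
            return key_b
    raise KeyError(f"no mapping for {code}")
```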

Thus we can see that in the area of identification, quite dissimilar problems exist in master data and reference data. A single approach is never going to adequately address identification problems in both categories of data.

The Problem of Meaning

Reference data has one unique property that it shares with metadata but which is totally lacking in master data. This is semantic meaning at the row level. We are all accustomed to the idea that metadata items, such as an attribute of an entity in a logical data model (or column of a table in a physical database) have definitions. It is a little less obvious that items of reference data also have definitions. For instance, what is the definition of USA in a country code table? Does it include Puerto Rico, Guam or the U.S. Virgin Islands? For some enterprises, it may only be the lower 48 states. Consider a database table of customer credit category. It may have rows for platinum, gold, silver, bronze and plutonium. The definitions of these rows are very important for interpretation of reports that are organized by customer credit category and for understanding what business rules may be triggered when a customer is assigned a particular customer credit category.
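One practical response is to store the definition alongside each code, so the row-level meaning is explicit rather than tribal knowledge. A sketch, in which the table layout and the definitions themselves are hypothetical:

```python
# Each reference data row carries code, description and an explicit
# definition - the row-level semantic meaning that master data rows lack.
from dataclasses import dataclass

@dataclass(frozen=True)
class RefRow:
    code: str
    description: str
    definition: str  # row-level business meaning

COUNTRY = [
    RefRow("USA", "United States of America",
           "The 50 states and DC; excludes Puerto Rico, Guam and the USVI"),
]
CREDIT_CATEGORY = [
    RefRow("PLAT", "Platinum",
           "Customers exceeding $1m annual purchases; triggers priority terms"),
]

def definition_of(table: list[RefRow], code: str) -> str:
    """Look up the stored definition for a reference data code."""
    return next(row.definition for row in table if row.code == code)
```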

By contrast, definitions are meaningless for individual rows of master data. Customer A is just Customer A, and Product X is just Product X. Rows of master data do not have meanings. On the other hand, there can be huge disputes about meaning when it comes to the entity level in master data. What is a customer? What is a product? I would love to know how many millions of dollars have been wasted trying to get single enterprise-wide answers to these questions. It is a little like chasing rainbows. The reality is that the definition of master data entities depends on context. A marketing department may view prospects as customers, whereas for accounts receivable, a customer may only be somebody who has paid for a purchase. Understanding and managing these contexts and the various definitions that go with them is a major challenge in master data management.

Therefore, semantic issues are yet another significant difference between master data and reference data. The problem of getting, storing and making available definitions for individual rows of reference data is not the same as the need to understand the contexts and related definitions at the entity level in master data. These diverse challenges require very different solutions.

Links between Master and Reference Data

There are many other detailed differences between master and reference data, but there are also important linkages that complicate management approaches. Perhaps the most critical is the integration of the update cycle. When a new product, customer or other item of master data is introduced, there is always a possibility that new reference data will be required. Perhaps a new product will require a new product line or product category. This is particularly true in enterprises where the operationalization of the business is not well integrated with information systems. In such cases, nobody really understands the impact on data of introducing a new product as part of an entirely new product line. Perhaps it is not fair to blame business personnel for this, because they typically have enough problems to deal with in such a situation. Notwithstanding, in such cases, the addition of a new product record now requires the coordinated addition of a new product line record. If there is complete separation of master and reference data management, this can be a nightmare. It is particularly true if master and reference data are so separated that updates occur in different databases at different times. The result can be “orphan” product line and product records floating around the databases of the enterprise for a period of time.
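The orphan-record risk can be contained by coordinating both additions in one unit of work. A minimal sketch using SQLite foreign key enforcement; the table and column names are hypothetical, but the pattern is the coordination the article calls for:

```python
# A product referencing a not-yet-existing product line is rejected as an
# orphan; adding both records in one transaction succeeds.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity
con.execute("CREATE TABLE product_line (code TEXT PRIMARY KEY, descr TEXT)")
con.execute("""CREATE TABLE product (
    sku TEXT PRIMARY KEY,
    line_code TEXT NOT NULL REFERENCES product_line(code))""")

# Uncoordinated update: the product record arrives before its product line.
try:
    con.execute("INSERT INTO product VALUES ('SKU-1', 'NEWLINE')")
except sqlite3.IntegrityError:
    print("orphan product rejected")

# Coordinated update: master and reference records added together.
with con:  # commits both inserts, or rolls both back on failure
    con.execute("INSERT INTO product_line VALUES ('NEWLINE', 'New line')")
    con.execute("INSERT INTO product VALUES ('SKU-1', 'NEWLINE')")
```

Where master and reference data live in different databases updated at different times, no single transaction is available, which is exactly why the orphan records the article describes appear.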

What this shows is that while it is important to understand the differences between reference and master data, we must still think carefully about enterprise information architecture as a whole. The specific approaches that can solve the specific problems of master and reference data management must be set within a strategy of the overall management of enterprise information. None of this is particularly easy. What is exciting about the present moment in information management is that closer attention is being paid to these difficult problems, and the will and resources exist to try to solve them. Hopefully in the next few years we will see conceptual and practical innovations in master and reference data management that will be of enormous benefit to the enterprises that adopt them.

Source: Information Management Magazine, April 2006, by Malcolm Chisholm

Filed under: Data Management, Library, News, Reference Data, Risk Management

Master Data Management: Data Strategy Autumn 2008

Selected white papers and articles from the Data Strategy Journal Autumn 2008 issue on Master Data Management (MDM).

Golden Copy or Unified View (David Loshin)

Perusing the published literature on Master Data Management (MDM), as well as reading press releases, news stories, and listening to case studies and assorted podcasts and web seminars, it is not unusual to come across the phrase “golden copy” in reference to the result of a master data consolidation activity. Presumably, the term “golden copy” implies a record that epitomizes the highest quality information, on which any application can depend. The expectation is that the data integration and consolidation features of the MDM program are able to absorb all the various and sundry records from across the enterprise into this perfect data set.

Master Data Management Governance (Malcolm Chisholm)

Data administration has in the past placed great emphasis on the logical level of data management. This has meant that data models, business processes, architectures, and application functionality have received a great deal more attention than physical data values in production databases. Yet, master data management (MDM) is fundamentally concerned with managing physical data values in production databases. These management requirements cannot be fully answered by abstracting the problems of MDM back to the logical layer of data models, architectures, and so on.

Six Sigma and Master Data Management (Joe Danielewicz)

Master Data Management (MDM) programs are inherently risky because they are long running and span several business functions. The business thinks master data quality is an IT problem, and IT knows MDM won’t succeed without business ownership. Applying Six Sigma methodologies helps you manage a challenging program using well-understood management techniques that reduce risk and improve communication.

Source: Data Strategy Journal, Autumn 2008 issue

Filed under: Data Management, Library, News, Reference Data, Risk Management, Standards

Enterprise Data Management Special Nov 2008 – Reference Data Review

Download: RDR_EnterpriseDataManagement_Special_Nov2008_A-TEAM

The current financial crisis has highlighted that financial institutions do not have a sufficient handle on their data and has prompted many of these institutions to re-evaluate their approaches to data management. Moreover, the increased regulatory scrutiny of the financial services community during the past year has meant that data management has become a key area for investment.

But given that IT spend is down as a whole, how is this impacting the implementation of enterprise data management (EDM) projects? Is strategic investment in particular areas of the data processing chain taking priority over wider EDM implementations? What lies ahead for the EDM community in 2009?

Source: A-TeamGroup, November 2008

Filed under: Data Management, Data Vendor, Library, News, Reference Data, Risk Management, Services, Standards