FiNETIK – Asia and Latin America – Market News Network

Asia and Latin America News Network focusing on Financial Markets, Energy, Environment, Commodity and Risk, Trading and Data Management

Coming to Grips With Big Data Challenges by Dan Watkins

The rate of data growth in financial markets has scaled beyond the means of manageability.

Some debates go so far as to dismiss the idea that Big Data can be tamed and controlled in the near term with the computing architecture commonly adopted today as an acceptable solution. I agree, but argue that conventional data transport – not management – is the real challenge of handling and utilizing Big Data effectively.

From exchange to trading machine, new ticks and added market data depth arrive only as fast as the delivery pipes can carry them. The market data feeds commonly used in conventional exchange trading are but a fraction of the market information actually available.

Perhaps because of costs as high as $100,000 per terabyte, many market participants deem the use of more data a bit too aggressive. Or they believe that high-performance computing (HPC) is the next-generation technology solution for any Big Data issue. Firms, therefore, are advancing their information technology at a sluggish cadence, in tune with the old adage: “if it ain’t broke, don’t fix it.”

Over the last decade, Wall Street business heads have agreed with engineers that the immense complexity of Big Data is best categorized by the three V’s of Doug Laney’s 2001 META Group report: Big Volume, Big Velocity and Big Variety.

When looking at “Big Volume” 10 years ago, the markets had just fragmented under Regulation ATS. A flurry of new market centers arose in U.S. equities, as did dark liquidity pools, giving rise to a global “electronic trading reformation.” Straight-through processing (STP) advocates evangelized platforms such as BRASS, REDIPlus and Bloomberg order management systems (OMS), resulting in voluminous and fragmented market data streaming to 5,000 NASD/FINRA trading firms and 700,000 professional traders.

Today, the U.S. has more than 30 Securities and Exchange Commission-recognized self-regulatory organizations (SROs), commonly known as exchanges and ECNs. For the first time since 2002, full market depth feeds from NASDAQ allow firms to collect, cache, react to, store and retrieve feeds covering six hours of trading for nearly 300 days a year more transparently than ever. Big Data volume has grown 1,000 percent and now reaches three terabytes of market data depth per day.
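To put that volume in perspective, a rough back-of-the-envelope calculation (a hypothetical sketch, assuming the three terabytes per day cited above, a conservative 250 trading days a year and the seven-year retention requirement discussed below) shows how quickly full-depth capture adds up:

```python
# Back-of-the-envelope storage estimate for full-depth market data capture.
# Assumptions (illustrative only): ~3 TB of depth-of-book data per trading day,
# ~250 trading days per year, 7-year regulatory retention.

TB_PER_DAY = 3
TRADING_DAYS_PER_YEAR = 250
RETENTION_YEARS = 7

annual_tb = TB_PER_DAY * TRADING_DAYS_PER_YEAR   # ~750 TB per year
retained_tb = annual_tb * RETENTION_YEARS        # ~5,250 TB over 7 years

print(f"Annual capture:   {annual_tb:,} TB")
print(f"7-year retention: {retained_tb:,} TB (~{retained_tb / 1024:.1f} PB)")
```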

Billions of dollars are being spent on increasing “Big Velocity.” The pipes that wire exchanges through the STP chain to the trader have become 100 times faster and larger, but still not fast enough to funnel the bulk of information lying idle back in the database. Through “proximity hosting,” the telco is eliminated and latency is lowered. This structure accommodates larger packets but not really more information, as Big Data remains the big, quiet elephant in the corner.

Five years after Reg ATS, markets are bursting at the seams with electronic trading that produces explosive market data, breaking new peak levels seemingly every day. The SEC’s Regulation National Market System (Reg NMS), which took effect in 2007, requires exchanges and firms to calculate the best price for execution in order to be compliant. Firms are also now mandated to sweep all exchanges’ order books and process all of that data for smart execution.
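As a simplified illustration of what that sweep involves (a minimal sketch with made-up venue names and quotes, not any firm’s actual smart order router), the logic below consolidates top-of-book quotes from several venues and picks the best price for a buy order:

```python
# Minimal sketch of a best-price sweep across venues (illustrative data only).
# A real smart order router works from live depth-of-book feeds and also weighs
# fees, latency and fill probability; here we only pick the best displayed offer.

quotes = {
    # venue: (best_bid, best_ask) -- hypothetical values
    "VENUE_A": (100.01, 100.03),
    "VENUE_B": (100.02, 100.04),
    "VENUE_C": (100.00, 100.02),
}

def best_venue_for_buy(quotes):
    """Return the venue showing the lowest ask (best price for a buyer)."""
    return min(quotes.items(), key=lambda kv: kv[1][1])

venue, (bid, ask) = best_venue_for_buy(quotes)
print(f"Route buy order to {venue} at {ask}")   # -> VENUE_C at 100.02
```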

After the execution, traders have to track the “order trail” from price to execution for every trade and store all of that information for seven years in the event of an audit recall of a transaction.

Under Reg NMS, subscribing to the full depth of all 30+ markets in “real time” would mean a firm would need a terabit-scale pipe to keep latency low. Since such a pipe is not realistic, data moves at gigabit rates, which is relatively slow, with the data queue running 50 to 100 terabytes deep. Multi-gigabit pipes, as fast as they seem, are still like driving five miles an hour on a 55 mph highway.

Analysts typically call data from a database with R (Revolution Analytics) and SAS connectors. The process involves bringing data into an analytical environment in which the user runs models and computations on subsets of a larger store before moving on to the next data-crunching job. The R and SAS connectors between the file servers and the database run at 10/100BASE-T, making the movement of a 50-terabyte environment like driving one mile per hour in a 55 mph zone.
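The highway analogies in the two paragraphs above come down to simple transfer-time arithmetic. The sketch below (a hypothetical calculation, assuming the 50-terabyte figure above and ideal link utilization with no protocol overhead) shows how long moving that much data takes at various line rates:

```python
# Rough transfer-time arithmetic for moving a 50 TB data set (illustrative only).
# Assumes perfect utilization and no protocol overhead, so real-world transfers
# would take longer still.

DATA_TB = 50
DATA_BITS = DATA_TB * 1e12 * 8          # terabytes -> bits

link_speeds_bps = {
    "100BASE-T (100 Mb/s)":  100e6,
    "1 Gb/s":                1e9,
    "10 Gb/s":               10e9,
    "1 Tb/s (hypothetical)": 1e12,
}

for name, bps in link_speeds_bps.items():
    hours = DATA_BITS / bps / 3600
    print(f"{name:24s}: {hours:10.1f} hours ({hours / 24:.1f} days)")
```

At 100BASE-T the transfer runs to roughly 46 days; even at 10 Gb/s it is still about half a day, which is the gap the rest of the article tries to close.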

We all hear the polemics regarding data formats, the jigsaw puzzle of unstructured data, and the claim that “Big Variety” is the obstacle. Even after the standardization of SQL-based queries, where analysts can ask any “ad hoc” question, too many sources and too many pipes from analytic servers cause traffic jams. SQL databases are ideal for ad hoc queries but slow at compiling unstructured data. Aggregating market information is where much of the market’s processing technology is being evaluated today, to meet the requirements of regulation, of sweeping for best execution and of risk management.

Comparing current stock prices against the bids and asks available across multiple exchanges, markets, sources, asset classes and clients is essentially the Big Data task of risk management. In addition to managing data changes, firms are also tasked with managing their trading accounts, client portfolios and trading limits, such as with the implementation of Credit Valuation Adjustments (CVAs) for counterparty risk.
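To make the CVA reference concrete, the sketch below computes a simplified unilateral CVA using the common textbook approximation (a hedged illustration with made-up exposure, default-probability and discount figures, not a production counterparty-risk model):

```python
# Simplified unilateral CVA: CVA ~= (1 - R) * sum( DF_i * EE_i * PD_i )
# where R is the recovery rate, DF_i the discount factor, EE_i the expected
# exposure and PD_i the marginal default probability for time bucket i.
# All inputs below are made up, for illustration only.

recovery_rate = 0.40

# (discount factor, expected exposure in $, marginal default probability)
buckets = [
    (0.99, 1_000_000, 0.004),
    (0.97, 1_200_000, 0.005),
    (0.95,   900_000, 0.006),
    (0.93,   700_000, 0.007),
]

cva = (1 - recovery_rate) * sum(df * ee * pd for df, ee, pd in buckets)
print(f"Approximate CVA charge: ${cva:,.0f}")
```

The point for Big Data is less the formula than its inputs: the expected-exposure profile has to be rebuilt continuously from exactly the kind of cross-venue, cross-asset data described above.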

So why are we still piping data around the enterprise when we just need more compute and memory power? Hardware-accelerated core processing in databases such as XtremeData’s dbX and IBM’s Netezza is powered by FPGAs (field-programmable gate arrays). Processing of massive amounts of data with FPGAs can now occur at wire speed. Along with high performance computing, high-speed messaging technology from companies like TIBCO, Solace Systems and Informatica has redefined transport times in ultra-low-latency terms, moving data from one database to another in single-digit microseconds, sometimes nanoseconds, from memory cache to memory cache.

The colloquial phrase “in-database” analytics describes the approach of running analytics and computations as close as possible to where the data is located, inside the database itself. Fuzzy Logix, an algorithmic HPC vendor, replaces the SAS and R connector-based analytics that stretch along the wire from the database to the analyst. With Fuzzy Logix, the need to call the database for small files is eliminated because computations can be done alongside the rest of the database in real time, turning jobs that took days into ones that take seconds.
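The difference is easy to see in miniature. The sketch below (using SQLite purely as a small, self-contained stand-in; Fuzzy Logix, Netezza and similar platforms work at far larger scale and with their own interfaces) contrasts pulling every row across the connector to compute an average against pushing the same computation into the database:

```python
# Contrast "pull the data, then compute" with "compute inside the database."
# SQLite is only a stand-in here; the point is where the computation runs,
# not which engine runs it.

import sqlite3
import random

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?)",
    [("XYZ", 100 + random.random()) for _ in range(100_000)],
)

# 1) Client-side: every row crosses the connector before we can average it.
rows = conn.execute("SELECT price FROM ticks WHERE symbol = 'XYZ'").fetchall()
client_avg = sum(p for (p,) in rows) / len(rows)

# 2) In-database: only the single aggregated result crosses the connector.
(db_avg,) = conn.execute(
    "SELECT AVG(price) FROM ticks WHERE symbol = 'XYZ'"
).fetchone()

print(f"client-side average: {client_avg:.4f}")
print(f"in-database average: {db_avg:.4f}")
```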

With in-database or in-memory analytics, BI engineers can eliminate transport latency altogether and compute at server speeds, with computations sitting inside the database or in memory so that tasks are completed locally, not on the transport wire.

Wall Street is as risk averse as ever in today’s atmosphere, so the adoption of new technology or new vendors continues to present operational risk challenges. ParAccel is a company that appears to be addressing the operational risk of new-technology adoption by helping firms harness the power of parallel processing for Big Data analytics on OEM hardware.

Since ParAccel is software, an IBM, HP or Dell shop could essentially keep relying on the reliability of its well-known, established hardware vendor while gaining next-generation Big Data analytic processing an order of magnitude faster than what is currently in place. ParAccel allows firms to aggregate, load and assimilate different data sets faster than traditional platforms through its “columnar database” nodal system. The columns in a ParAccel environment provide firms with the flexibility to first run analytics in-database or in-memory, then bring massive amounts of data to a common plane and, finally, aggregate the unstructured data, all at lightning speed.
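The appeal of a columnar layout for this kind of analytic work can be shown in a few lines (a minimal sketch of the storage idea, not ParAccel’s actual engine): when a query touches only one or two fields, a column-oriented store scans just those values instead of every field of every row.

```python
# Minimal illustration of row-oriented vs column-oriented access for analytics.
# Not ParAccel's engine -- just the storage idea behind columnar databases.

# Row-oriented: each record carries every field.
rows = [
    {"symbol": "XYZ", "price": 100.02, "size": 300, "venue": "VENUE_A"},
    {"symbol": "XYZ", "price": 100.03, "size": 200, "venue": "VENUE_B"},
    {"symbol": "ABC", "price":  55.10, "size": 500, "venue": "VENUE_A"},
]

# Column-oriented: each field is stored contiguously on its own.
columns = {
    "symbol": ["XYZ", "XYZ", "ABC"],
    "price":  [100.02, 100.03, 55.10],
    "size":   [300, 200, 500],
    "venue":  ["VENUE_A", "VENUE_B", "VENUE_A"],
}

# An aggregate over one column only touches that column's values...
avg_price_columnar = sum(columns["price"]) / len(columns["price"])

# ...whereas the row layout forces a pass through every whole record.
avg_price_rowwise = sum(r["price"] for r in rows) / len(rows)

print(avg_price_columnar, avg_price_rowwise)
```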

Other companies, like NVIDIA, have been building graphics processing units (GPUs) for the video game industry for three decades and are now swamped with customer requests to help build parallel computing environments, giving financial firms the ability to run trillions of algorithmic simulations in microseconds for less than $10,000 per card. A single NVIDIA Tesla card can have up to 2,000 processing cores embedded inside. A GPU appliance can be attached to a data warehouse for advanced, complex computations. Low-latency processing can also be achieved because data moves only a short distance, analyzing most of what Wall Street calls Big Data in seconds, compared with the days it takes now.
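The workloads that map well onto those thousands of cores are embarrassingly parallel simulations. The sketch below (a hedged illustration using NumPy on the CPU as a stand-in; on a GPU the same pattern would be expressed with a CUDA-backed array library) prices a European call option by Monte Carlo, where every simulated path is independent and can run on its own core:

```python
# Monte Carlo pricing of a European call option -- an embarrassingly parallel
# workload of the sort GPUs accelerate. NumPy on the CPU is used here as a
# stand-in; each simulated path is independent, which is what lets thousands
# of GPU cores work on them simultaneously.

import numpy as np

def mc_european_call(spot, strike, rate, vol, maturity,
                     n_paths=1_000_000, seed=42):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)                 # one random draw per path
    # Terminal price under geometric Brownian motion (risk-neutral measure).
    s_t = spot * np.exp((rate - 0.5 * vol ** 2) * maturity
                        + vol * np.sqrt(maturity) * z)
    payoff = np.maximum(s_t - strike, 0.0)           # call payoff per path
    return np.exp(-rate * maturity) * payoff.mean()  # discounted average

# Hypothetical inputs, for illustration only.
print(mc_european_call(spot=100.0, strike=105.0, rate=0.02,
                       vol=0.25, maturity=1.0))
```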

The vendors and players are ready to get to work; there just needs to be some consensus that the Big Elephant in the room is there and it’s standing on a straw when it could be surfing a Big Wave!

Source: Tabb Forum, 02.05.2012, by Dan Watkins, President @ CC-Speed, dwatkins@cc-speed.com

Filed under: Data Management, Market Data, Risk Management, Trading Technology

White Paper: Big Data Solutions in Capital Markets – A Reality Check

Big Data has emerged in recent months as a potential technology solution to the issue of dealing with vast amounts of data within the enterprise. As in other industries, financial services firms of all kinds are drowning in data, both in terms of the sheer volume of information they generate and/or have to deal with, and in terms of the growing and diverse types of data they confront in those efforts.

But the relative immaturity of Big Data solutions, and a widespread lack of understanding of what the term really means, leads some to question whether ‘Big Data’ is no more than a technology solution looking for a Big Problem to solve.

So is Big Data for real? Can so-called Big Data solutions provide relief to the embattled data architects at financial institutions? Or is Big Data a solution looking for a set of problems to solve?

Research conducted by A-Team Group on behalf of Platform Computing suggests that current market sentiment, financial hardships and regulatory scrutiny may be conspiring to create the perfect conditions for Big Data solutions to provide value to financial institutions.

Download the White Paper Now

Source: A-Team, 15.02.2012

Filed under: Data Management, Data Vendor, Library, Market Data, Reference Data

A-TEAM launches Big Data 4 Finance

A-Team Group today launched BigDataForFinance.com, where it will cover the emerging science of big data and how it relates to financial markets applications – such as analysis of time series pricing, management of reference data and determination of sentiment from news archives. A-Team will also cover the evolving technology infrastructure that underpins big data applications, from storage to analytics and business intelligence.

A-TEAM: Let’s start by addressing a working definition for big data, as we see it.  Wikipedia has a pretty good starter: “Datasets that grow so large that they become awkward to work with using on-hand database management tools.”

But here’s our improvement on that: “Datasets whose characteristics – size, data type and frequency – are beyond efficient processing, storage and extraction by traditional database management tools.”

And let’s be clear, the focus is as much on the analysis of data to derive actionable business information as it is on handling different data types and high frequency updates.

Make sure you don’t miss news and contributions that could be valuable: sign up for the weekly email update here.

Source: A-TEAM, 18.01.2012

Filed under: Data Management, Data Vendor, Market Data, Reference Data, Risk Management

IDC White Paper: Solving Big Data’s Big Challenges Can Lead to Big Advantages

The volumes and complexity of market data required by financial institutions today are immense and growing rapidly. Ongoing market changes are accelerating the growth in demand for data, and forcing financial institutions to address the challenges of what has come to be known as “Big Data”. This demand is fueled as firms develop and deploy new, more sophisticated cross-asset investment strategies.

At the same time, regulatory changes are forcing firms to source and report increasingly large amounts of trade data, as well as to adopt higher-quality – and usually data-hungry – risk and pricing models. Investors are making similar demands of their asset managers.

Interactive Data, the reference data powerhouse, has authored a new white paper which describes these challenges in depth. It also outlines the steps financial firms may need to take in order to address them effectively. Those that do could have a notable competitive advantage over their more slow-footed rivals.

Download your complimentary copy here.

Source: IDC, 18.01.2012

Filed under: Data Management, Data Vendor, Market Data, Reference Data, Risk Management, Trading Technology