

How is it that your financial institution, cell phone company or local cable provider knows everything about you from your postal code to your mother's maiden name, but they can't figure out how to correct extra charges on your monthly statement without bouncing you from one department to another, or putting you on hold for an interminable amount of time?
It's an all-too-familiar scenario that underscores one of the challenges of big data that extends beyond volume, velocity and variety.
For Fei Chiang, the challenge is ensuring quality over quantity, focussing on the veracity and value of the data: whether or not it's accurate, consistent and timely.
"Many companies simply aren't sharing the data they've collected on their customers," says Chiang, associate director of the MacDATA Research Institute. "If updates aren't done across departments, then the department that knows the details about your service plan won't know that you've moved, or perhaps changed the PIN on your account."
For the computing and software professor, an easy solution is a central repository that keeps everyone in sync with the client's most current data. However, this is not always feasible in practice because of synchronization and concurrency issues.
Chiang and her research group focus on developing the techniques and tools that can repair, clean, validate and safeguard the massive datasets generated by telecommunications corporations, financial institutions, health care systems and industry. Their work ensures that high-quality data is produced, which in turn supports valuable insights and real-time decision making, right down to solving the issue with your cell phone provider in one short, efficient phone call.
Chiang's Data Science Research Lab is involved in a number of projects, one of which involves a collaboration with IBM Canada. Her research team is tackling the challenge of developing efficient data cleaning algorithms that can be used in IBM's product portfolio.
The software technology they're developing would enable data cleaning tools that assess and predict a user's intentions with the data. This would enable more customized software solutions that curate and clean the data according to the specific information a user is looking for.
"Take for example the sports industry: we would create a customized data profile view that could be used by the fan who wants the latest stats for their football pool, and another view for the sports station that requires in-depth analysis and predictions for each and every team in the NHL," Chiang explains.
"Whether it's a basketball scout, the wagering industry or a marketing firm trying to foresee the World Series MVP so they can lock up a promotions contract, we are building automated software tools to provide users with consistent and accurate data based on their specific information needs."
The added bonus of working with IBM is that the prestigious global enterprise provides Chiang's graduate students with entrée to the latest "real-world" technologies and tools, plus access to modern data sets and the opportunity to solve data quality challenges.
Yu Huang, one of Chiang's PhD students, has also had the opportunity to work with IBM's Centre for Advanced Studies as an intern. The internship runs each year for three years and connects Huang with architects and software developers, giving him not only leading-edge technological knowledge but also industry-relevant experience.
"It's graduate students like Yu who are going to solve the next-generation challenges of Big Data, to harness its potential and power," says Chiang.
"It isn't just the vast amounts of data that make Big Data 'big'. The real value lies in extracting the value from this data to produce accurate, high quality data, and to reap the information and insights this data provides."