Getting data out of SQL Server is easy; you don't need special tools. With just a few lines of code, you can send the results of a query to a file. When you're building a prototype, this is also the fastest way to work: you create quick and dirty tools that get the job done. ETL (extract, transform, load) processes are no exception. During a recent project working with SQL Server and Hive, I assumed that most of the pain would involve Hive. I was sorely mistaken: SQL Server and dirty data were the source of our problems.
Extracting the Errors
Getting data out of SQL Server seemed painless. It only took a few SQL statements and several lines of C# before I had a working data extractor. Data came out of SQL Server and was written to disk in a delimited file format—to make life easy, I decided to use a tab character as the delimiter. The hardest part of the extraction process was checking the data for funny characters.
Hive doesn’t support quoted strings; this made it especially important to make sure that the source data was as clean as possible. Hive’s schema on read makes it possible to support files with a different number of fields in each record, but the downside of this flexibility is that stray delimiter characters will be picked up as a new column.
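Because Hive treats every delimiter literally, a quick pre-flight check of an extract can catch rows with the wrong field count before they reach HDFS. Here is a minimal sketch of that idea in Python (the original extractor was written in C#; the function name and sample rows are illustrative):

```python
# Scan a tab-delimited extract and report rows whose field count
# doesn't match the expected schema width. Stray tabs or newlines
# in the source data show up here as extra or missing fields.
def find_bad_rows(lines, expected_fields):
    bad = []
    for lineno, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split("\t"))
        if count != expected_fields:
            bad.append((lineno, count))
    return bad

rows = [
    "1\tWidget\t9.99\n",
    "2\tGad\tget\t4.50\n",   # a user typed a tab inside "Gadget"
    "3\tSprocket\t1.25\n",
]
print(find_bad_rows(rows, expected_fields=3))  # row 2 has 4 fields
```

Running a check like this on the first few million rows would have saved me the debugging session described below.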
The initial portion of the data looked correct, so I proceeded with the data loads. As we loaded more and more data, problems started appearing in the queries. Data wasn't where it should be and some of the aggregations were failing: Hive was reporting errors converting strings to numbers. Perplexed, I started looking at the data in the source system. Everything looked correct. So I turned to the extracted data instead, and rapidly got lost in a sea of massive files, since Hive's MapReduce jobs create a detailed log of all activity.
While digging through the chain of MapReduce logs, I found references to the problem data. Ultimately, I tracked it down to several rows where an enterprising user had put a tab character in a string. Not to be outdone, a few other users had managed to sneak newline characters into text as well. After creating additional data cleansing rules, I reloaded all of the source data. Somewhat surprisingly, everything began working at this point. The users and their desire to put random characters in text had been defeated by a simple REPLACE.
A Transformative Experience
Applying data transformations in Hive was anticlimactic. Really. The process of transforming source data into a queryable format was painless—barring date and time bugs, there’s nothing to report. Hive was able to fly through the source data, pivot several hundred gigabytes of tables, and produce complex aggregations in a matter of minutes.
The only difficulty came from figuring out the right techniques for aggregating data. That, however, is a different story.
The Letdown
There’s always a letdown. After a reasonable run of success building a prototype, something had to break and break it did.
After getting data out of SQL Server, pushing data into Hive, running queries, and pulling results out of Hive, you would assume that the hard part was over. After all, the only thing left to do was load the results into SQL Server. And that’s where the problems started.
There are many ways to load data into SQL Server, and each poses its own problems. The most efficient techniques use bcp or BULK INSERT, but both require that SQL Server itself can access the files. That works, but troubleshooting a combination of Active Directory and SQL Server security is nobody's idea of a good time; sorting out the required permissions proved time consuming and difficult. Once the security issues were resolved, a new error cropped up: BULK INSERT may encounter problems loading data into tables created with LOB data types. The fix was simply to create the target tables with the correct data types and appropriate precision, but it still slowed down our end-to-end testing.
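For reference, the load side of a pipeline like this is a BULK INSERT whose terminators match the extract format. A small Python helper that builds the statement (the table name and file path are placeholders, not the project's actual names):

```python
# Build a BULK INSERT statement whose field and row terminators
# match a tab-delimited extract. SQL Server itself must be able to
# read data_file, which is what made the security setup painful.
def bulk_insert_sql(table, data_file):
    return (
        f"BULK INSERT {table} "
        f"FROM '{data_file}' "
        "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n', TABLOCK);"
    )

print(bulk_insert_sql("dbo.HiveResults", r"\\fileserver\extracts\results.tsv"))
```

Generating the statement this way keeps the terminators in one place, so the extract writer and the loader can't drift apart.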
In the end…
After a lot of troubleshooting of security and data formatting issues, we were able to resolve our problems and get an end-to-end test running. It's rewarding to watch data move through a processing pipeline, especially after struggling with multiple ETL problems.
ETL remains the most difficult part of database projects. Moving data between different instances of SQL Server can be problematic, depending on the methods used; moving data between SQL Server and other databases can cause problems in ways that you didn't expect. Building end-to-end systems is difficult. If you haven't done it before, make sure you pad your schedule; I was able to solve these problems quickly through previous experience and by reaching out to my network, but not everyone will be so lucky. ETL can be problematic even for the most seasoned SQL Server professional.
Curious whether you might have a problem suitable for Hadoop, but don't want to run into some of the same problems with your implementation? I'll be at SQLIntersection in April presenting Reporting in Production: A Hadoop Case Study, talking about the problems we ran into implementing a Hadoop-driven solution and how we solved them. Head on over, look at the sessions, and register to get your learn on in Vegas.