"Big Tech" පරිසරවල (ඔබ දන්නවා නම්, පරිශීලකයන් ටොන්, විශාල දත්ත සබඳතා සහ වේගයෙන් වර්ධනය වන අවශ්යතා සහිත) දත්ත සබඳතා මත පදනම්ව ඔබ හිතන තරම් සාර්ථක නොවනු ඇත - සෑම සෙනෙහසකටම නිවැරදි විය යුතු බව මූල්ය සමුදාය වැනි දෙයක් සඳහා නොවේ නම් - දත්ත සැපයුම් වළක්වා ගැනීම සඳහා සීමා - අමතරව, ඒවා තබාගැනීමේ වියදම පුදුමාකාරව ඉහළ විය හැකිය. වඩා හොඳ ප්රවේශයක් බොහෝ විට යෙදුම් මට්ටමේ Deduplication ලෝගිකයේ ප්රධාන ප්රතිකාර කිරීම වේ. UNIQUE INDEX 1.ඇයි මම තනි ඉංජිනේරුවන් නැවත සිතන්න පටන් ගත්තේ? Database unique indexes sound pretty reliable, right?The last line of defense against data duplication.I used to think so too.When a field in a table needed to be unique, I would randomly slapp a unique index on it.Data database unique indexes sound pretty reliable, right?Data database unique indexes sound pretty reliable, right?Data database unique indexes sound pretty reliable, right?Data base unique indexes sound pretty reliable, right?Data database unique, right?Data database unique indexes sound pretty reliable, right?Data base unique indexes sound pretty reliable, right?Data base unique, right?Data base unique indexes sound pretty reliable, right?Data base unique. සැබෑ සැබැවින්ම මට හදිසි අවදිවීමක් දුන්නා. බොහෝ කලකට පෙර, මගේ කොණ්ඩය වඩාත් පූර්ණ වූ විට, මම මිලියන ගණනක් රේඛාවක් සහිත ටැබ් එකකට සංසන්දනීය සුවිශේෂී ඉංජිනේරුවක් එකතු කළ යුතුය (ඔව්, එවැනි ක්ෂේත්ර සඳහා සහ සෑහෙන්න සරලයි, නැද්ද?ඔව්, මුළු වෙනස් කිරීමේ ක් රියාවලිය දිගින් දිගටම මෙම කාලය තුළ, ස්වාමිපුරුෂ-ස්වාමිපුරුෂ ප්රතිපත්තිය ප්රතිපත්තිය ප්රමාණය රැඳී සිටියා, සහ අපි සෑම විටම ප්රශ්න විය හැකි සේවා හයිකප්ප. tenant_id is_deleted දින ඉන්පසු තවත් අපහසු තත්ත්වයක් විය. ව්යාපාරික අවබෝධයෙන්, අපි හැමෝම දන්නවා සහ සැබවින්ම එකම ඊ-තැපැල් ය. ඔබගේ යෙදුම කේතය අනිවාර්යයෙන්ම ඒවා සම්මත කරනු ඇත (උදාහරණයක් ලෙස, පහතට) ලියාපදිංචි වීමේදී ද්විත්වයන් සඳහා පරීක්ෂා කිරීමට පෙර. නමුත් දත්ත සංකේතයේ සුවිශේෂී ඉංජිනේරුව (එය නිතරම උපායමාර්ගයෙන් ප්රමාණවත් වේ) ඒ ආකාරයට එය නොපෙනේ. 
සමහර වෙලාවට, ඓතිහාසික දත්ත හෝ දත්ත නිවැරදිව සම්මත නොකළ දත්ත සකස් කිරීම නිසා, ඔබ දත්ත සකස් නොකළ විට, දත්ත සංකේතයේ "මෙම" ඊ-තැපැල් දෙකම සම්මත වනු ඇත. එවැනි අවස්ථා වලදී, සුවිශේෂී ඉංජිනේරුවා user@example.com USER@EXAMPLE.COM උදාහරණයක් ලෙස, සමහර විට "තැපැල් සුවිශේෂීත්වය" මීට පෙර ප්රමාණවත් විය හැකි නමුත් දැන් අවශ්යතාවය "තැපැල් ID + තැපැල් සුවිශේෂීත්වය" බවට වෙනස් වේ. පීඩනය සහ නව එකක් d. ඔබ මෙම ක් රියාකාරකම් දෙකම කොන්දේසි කරන්නේ කෙසේද? පළමු ක් රියාකාරකම් කුමක්ද? අතරින් යම් දෙයක් වැරදියි නම් කුමක්ද? විශාල මේසවල මෙම ක් රියාකාරකම් සිදු කිරීම සෑම අවස්ථාවකදීම බෝම්බයක් ඉවත් කිරීම වැනි හැඟීමකි. DROP CREATE මෙම අත්දැකීම් මට සිතා බලන්නට විය: විශාල දත්ත ප්රමාණයන්, උසස් අනුකූලතාවය සහ වේගයෙන් වෙනස් වන අවශ්යතා සහිත පරිසරයකදී, සුවිශේෂී ඉංජිනේරුවන්ට සදාචාරාත්මක ප්රවේශය තවමත් නිවැරදිද? මේ ලිපිය මේ ගැන මගේ සිතුවිලි බෙදාගැනීම ගැනයි. 2.  : ඇයි අපි එතරම් විශ්වාස කරන්නේ? තනි අංකය තනි අංකය මම පැමිණිලි වලට පිවිසීමට පෙර, අපි සාධාරණ විය යුතු අතර, සුවිශේෂී ඉංජිනේරු එතරම් ජනප්රිය ඇයි බව පිළිගනිමු. 
 
 
 
 
The ultimate safeguard for data integrity: the final barrier against duplicate data.

Easy to implement: a few lines of SQL when creating the table, or a DDL statement added later, and you're done.

Schema as documentation: it is written into the schema that this field cannot contain duplicates.

A potential query performance boost: since it is an index, queries on this key can be faster.

These benefits are genuinely attractive for small projects, or wherever data volumes are manageable and the business logic isn't overly complex.

3. Unique Indexes Under the "Big Tech" Lens: Do Those Benefits Still Hold?

Let's examine each of the "benefits" above and see whether it still holds up in a large-scale, fast-iterating environment.
 
 
 
 
 
 
 
 
 
 
"The ultimate safeguard"? Is this safeguard reliable? What exactly is it safeguarding against?

It doesn't fully recognize business-level "duplicates"! Besides the email case-sensitivity issue I mentioned earlier (which could be solved with a collation, at the price of more complexity in the DB layer), there are phone numbers with or without +44, usernames with or without special characters stripped... These nuances, which business logic considers "the same," are beyond the grasp of a database's simplistic "byte-for-byte identical" unique index. It can't prevent "logical duplicates" at the business layer.

The application layer has to do the heavy lifting anyway. Since all these complex "sameness" checks must be handled in the application code (you can't just throw raw database errors at users, can you?), the application layer is the true workhorse ensuring "business data uniqueness." The database's unique index is, at best, an "auxiliary police officer" whose standards might not even align with the business rules.

In distributed systems, it's merely a "local bodyguard." Once you shard your tables, an in-table unique index can't ensure global uniqueness. Global uniqueness then relies on ID generation services or application-level global validation. At this point, the "safeguard" provided by the local database index becomes even less significant.

This "ultimate safeguard" might miss the mark, has limited coverage, and relying solely on it is a bit precarious.
 
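The "sameness" checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the names `normalize_email`, `normalize_phone`, `UserRegistry`, and `DuplicateError` are hypothetical, the in-memory set stands in for whatever store the application really uses, and the default country code of 44 is an assumption borrowed from the +44 example.

```python
import re
import unicodedata


class DuplicateError(ValueError):
    """Raised with a message that is safe to show to the user."""


def normalize_email(raw: str) -> str:
    # Business rule from the article: case variants are the same email.
    return unicodedata.normalize("NFKC", raw).strip().lower()


def normalize_phone(raw: str, default_country_code: str = "44") -> str:
    # Keep only digits and '+'; spaces, dashes and brackets are noise.
    cleaned = re.sub(r"[^\d+]", "", raw)
    if cleaned.startswith("+"):
        return "+" + re.sub(r"\D", "", cleaned)
    if cleaned.startswith("00"):
        return "+" + cleaned[2:]
    # National format: drop one leading trunk '0', then add the assumed
    # country code. Real numbering plans need a dedicated library.
    national = cleaned[1:] if cleaned.startswith("0") else cleaned
    return "+" + default_country_code + national


class UserRegistry:
    """Hypothetical in-memory stand-in for the application's user store."""

    def __init__(self) -> None:
        self._emails = set()

    def register(self, email: str) -> str:
        key = normalize_email(email)
        if key in self._emails:
            raise DuplicateError("This email address is already registered.")
        self._emails.add(key)
        return key


registry = UserRegistry()
registry.register("user@example.com")
try:
    registry.register("USER@EXAMPLE.COM")
    rejected = False
except DuplicateError:
    rejected = True
```

The point of the sketch is that the comparison key is defined by the business rule, so the same check catches variants that a byte-for-byte comparison would let through.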
 
 
"Easy to implement"? One-time setup, week-long headache.

Adding a unique index to a brand new table is indeed just one SQL statement. But more often, you're changing the rules for an old table that's been running for ages and has accumulated mountains of data. Trying to alter a unique index on a table with tens of millions of rows (e.g., changing from a single-field unique index to a composite one) could mean several minutes of table locking! Online DDL tools might save you from service downtime, but the entire process can still be lengthy, resource-intensive, and risky.

Agile? Not so fast! In scenarios with rapid iteration, multi-region synchronization, and compliance requirements, a single unique index change at the database level can hold you up for days. So much for agility.

So, that initial "simplicity" is like bait compared to the "hell" of modifying it later.
 
"Schema as documentation"? The documentation might not match reality!

Yes, a unique index in the table structure acts as a form of "technical documentation." But "documentation" can be misleading. If the "uniqueness" defined by this index doesn't align with the actual, more complex business rules (like the case-insensitivity example), then this "documentation" is not only useless but can also mislead future developers.

If changing this "documentation" (i.e., modifying the unique index) involves an epic struggle, why not write down the business rules properly in actual design documents, wikis, or code comments? Those are far easier to update.
 
"A potential query performance boost"? Is the tail wagging the dog?

This is a common misconception, or rather, an overemphasized "added value." If you simply want to speed up queries on a specific field or set of fields, you can absolutely create a regular, non-unique index for them! A non-unique index will boost query speeds just fine, and it comes without the extra uniqueness check on writes, the DDL pains of changing a uniqueness rule, and the rigid business logic constraints of a unique index.
 
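The claim that a plain index is enough for lookups can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption made only to keep the example self-contained; the table, column, and index names are hypothetical, and the exact plan text differs between engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

# A regular (non-unique) index: speeds up lookups, enforces nothing.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [
        ("a@example.com", "first"),
        ("a@example.com", "second"),  # accepted: the index is not unique
        ("b@example.com", "third"),
    ],
)

# Ask the engine how it would run the lookup.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?",
    ("a@example.com",),
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan_rows)

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email = ?", ("a@example.com",)
).fetchone()[0]

print(plan_text)
print(duplicate_count)
```

The plan should mention idx_users_email, which shows the lookup uses the index even though the index places no uniqueness rule on the column.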
 
Master-slave index inconsistency can instantly "paralyze" replication: I've seen it happen multiple times. The unique index configuration on the primary database is updated (e.g., a field is added, or a constraint is changed), but the index on the replica isn't modified in sync. Then, as soon as data changes on the primary (e.g., a row is inserted that the primary accepts but the replica, with its incorrect or outdated index, treats as a duplicate), the binlog is applied to the replica, and bam! Slave_SQL_Running: No. Replication just dies. When this happens, you get data lag, read-write splitting is affected, and it can even impact failover capabilities. What a nightmare, right?

4. Let the Application Layer Do the Job - It's What It's Good At!

Given these problems with database unique indexes, the responsibility for ensuring data uniqueness should rest primarily with our application layer.

Application-layer uniqueness has a number of benefits:
 
 
 
 
Flexible and accurate: whatever the business defines as a duplicate, we can code accordingly - case sensitivity, formatting, complex conditions, you name it.

Better user experience: if a user makes a mistake, we can give clear, helpful feedback, such as "This phone number is already registered."

Efficient early rejection: intercept duplicates at the service interface layer, or even at the gateway layer, before the data ever reaches the database, saving an unnecessary round trip.

Interface idempotency: this is a powerful weapon against duplicate operations. If a user double-clicks the Submit button, or a network problem triggers a retry, proper idempotency at the application layer ensures the data is not duplicated.

Conclusion: be careful about using unique indexes. Reach for one only when its benefit (in specific cases, perhaps as an absolute last-resort data backstop) clearly and significantly outweighs the cost it imposes in complex environments with large data volumes and fast iteration (reduced agility, operational pain).
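To make the interface idempotency point above concrete, here is a minimal sketch of an idempotency-key check. Everything in it is an assumption for illustration: `IdempotencyStore`, `run_once`, and `create_order` are hypothetical names, the key is assumed to be supplied by the client with each request, and the in-process dictionary stands in for a shared store that a multi-instance service would need.

```python
import threading
from typing import Any, Callable, Dict, List


class IdempotencyStore:
    """Remembers the result of each operation by its idempotency key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, Any] = {}

    def run_once(self, key: str, operation: Callable[[], Any]) -> Any:
        # Holding the lock while the operation runs keeps the sketch simple:
        # a concurrent retry waits, then receives the stored result.
        with self._lock:
            if key in self._results:
                return self._results[key]
            result = operation()
            self._results[key] = result
            return result


orders: List[dict] = []


def create_order(item: str) -> dict:
    # Hypothetical side effect that must not happen twice.
    order = {"id": len(orders) + 1, "item": item}
    orders.append(order)
    return order


store = IdempotencyStore()

# The user double-clicks Submit: same key, two requests.
first = store.run_once("req-123", lambda: create_order("book"))
second = store.run_once("req-123", lambda: create_order("book"))

# A genuinely new request carries a new key.
third = store.run_once("req-456", lambda: create_order("pen"))
```

The repeated request returns the stored result instead of creating a second order, so the duplicate never reaches the database at all.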


About Author

අදහස්

ටැග් එල්ලන්න

මෙම ලිපිය ඉදිරිපත් කරන ලදී

Related Stories

Technology's 24 Most Important Social Networks for Content Distribution

Spring Into New Features: Smarter Notifications, More Writing Stats, Pixel Icon Library & More

Meet PennyFly Entertainment, Startups of the Year Winner (Malibu, CA)

Seamless Sign-ups on the Move: Unlock Access as You Take Action!

Technology's 24 Most Important Social Networks for Content Distribution

Spring Into New Features: Smarter Notifications, More Writing Stats, Pixel Icon Library & More

Meet PennyFly Entertainment, Startups of the Year Winner (Malibu, CA)

Seamless Sign-ups on the Move: Unlock Access as You Take Action!

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps