Bug Wiki › Code Quality & Redundancy › Incorrect normalization
1 example

Incorrect normalization

Data incorrectly standardized or formatted, causing processing errors.

[ FAQ1 ]

What is incorrect normalization?

Incorrect normalization occurs when data is improperly standardized or formatted, causing inconsistencies, misinterpretations, or errors during processing. Common contexts include Unicode normalization (affecting text encoding and comparisons), database schema normalization (influencing data redundancy and consistency), and feature scaling in data science (impacting model accuracy). For example, improperly normalized Unicode strings might cause matching issues, while incorrect database normalization leads to redundant or inconsistent records.
[ FAQ2 ]

How to fix incorrect normalization issues

To fix incorrect normalization, clearly define and consistently apply appropriate normalization standards or forms relevant to your data type or context—such as Unicode Normalization Forms (NFC, NFD) for text processing, or normal forms (1NF, 2NF, 3NF) for database schemas. Utilize reliable tools or libraries designed explicitly for correct data normalization and ensure consistent application across your system. Regularly review and validate data handling processes, employing unit tests and automated checks to identify normalization errors proactively. Maintaining clear guidelines and documentation around normalization practices helps ensure data consistency and integrity throughout your application.
```diff
   }
   return []
 }
+
+interface KeywordDBQueryResult extends DBQueryResult {
+  keyword_score?: number
+}
+
+const keywordSearch = async (query: string, limit: number, filter?: string): Promise<KeywordDBQueryResult[]> => {
+  try {
+    const keywords = query
+      .toLowerCase()
+      .split(/\s+/)
+      .filter((word: string) => word.length > 2 && !['the', 'and', 'for', 'with', 'this', 'that'].includes(word))
+
+    if (keywords.length === 0) {
+      return []
+    }
+
+    const vectorResults = await window.database.search(query, limit, filter)
+
+    const resultsWithKeywordScores = vectorResults
+      .map((result) => {
+        let score = 0
+        keywords.forEach((keyword) => {
+          const regex = new RegExp(keyword, 'gi')
+          const matches = result.content.match(regex)
+          if (matches) {
+            score += matches.length
+          }
+        })
+
+        return {
+          ...result,
+          keyword_score: score,
+        }
+      })
+      .sort((a, b) => (b.keyword_score || 0) - (a.keyword_score || 0))
+
+    return resultsWithKeywordScores
+  } catch (error) {
+    // eslint-disable-next-line no-console
+    console.error('Error performing keyword search:', error)
+    return []
+  }
+}
+
+const combineAndRankResults = (
+  vectorResults: DBQueryResult[],
+  keywordResults: KeywordDBQueryResult[],
+  limit: number,
+  vectorWeight = 0.7,
+): DBQueryResult[] => {
+  const keywordWeight = 1 - vectorWeight
+  const resultsMap = new Map<string, DBQueryResult & { combinedScore: number; keywordScore?: number }>()
+
+  const maxKeywordScore = keywordResults.length > 0 ? Math.max(...keywordResults.map((r) => r.keyword_score || 0)) : 1
```

Greptile

logic: Using 1 as fallback for maxKeywordScore when no results could lead to incorrect normalization. Should use 0 or handle this case differently.

Suggested fix:

```diff
+  const maxKeywordScore = keywordResults.length > 0 ? Math.max(...keywordResults.map((r) => r.keyword_score || 0)) : 0
```